cppdraft_translate/cppdraft/locale/codecvt.md
2025-10-25 03:02:53 +03:00


[locale.codecvt]

28 Text processing library [text]

28.3 Localization library [localization]

28.3.4 Standard locale categories [locale.categories]

28.3.4.2 The ctype category [category.ctype]

28.3.4.2.5 Class template codecvt [locale.codecvt]

28.3.4.2.5.1 General [locale.codecvt.general]

    namespace std {
      class codecvt_base {
      public:
        enum result { ok, partial, error, noconv };
      };

      template<class internT, class externT, class stateT>
        class codecvt : public locale::facet, public codecvt_base {
        public:
          using intern_type = internT;
          using extern_type = externT;
          using state_type  = stateT;

          explicit codecvt(size_t refs = 0);

          result out(
            stateT& state,
            const internT* from, const internT* from_end, const internT*& from_next,
            externT* to, externT* to_end, externT*& to_next) const;

          result unshift(
            stateT& state,
            externT* to, externT* to_end, externT*& to_next) const;

          result in(
            stateT& state,
            const externT* from, const externT* from_end, const externT*& from_next,
            internT* to, internT* to_end, internT*& to_next) const;

          int encoding() const noexcept;
          bool always_noconv() const noexcept;
          int length(stateT&, const externT* from, const externT* end, size_t max) const;
          int max_length() const noexcept;

          static locale::id id;

        protected:
          ~codecvt();
          virtual result do_out(
            stateT& state,
            const internT* from, const internT* from_end, const internT*& from_next,
            externT* to, externT* to_end, externT*& to_next) const;
          virtual result do_in(
            stateT& state,
            const externT* from, const externT* from_end, const externT*& from_next,
            internT* to, internT* to_end, internT*& to_next) const;
          virtual result do_unshift(
            stateT& state,
            externT* to, externT* to_end, externT*& to_next) const;

          virtual int do_encoding() const noexcept;
          virtual bool do_always_noconv() const noexcept;
          virtual int do_length(stateT&, const externT* from, const externT* end, size_t max) const;
          virtual int do_max_length() const noexcept;
        };
    }

1

The class codecvt<internT, externT, stateT> is for use when converting from one character encoding to another, such as from wide characters to multibyte characters or between wide character encodings such as UTF-32 and EUC.

2

The stateT argument selects the pair of character encodings being mapped between.

3

The specializations required in Table 91 ([locale.category]) convert the implementation-defined native character set.

codecvt<char, char, mbstate_t> implements a degenerate conversion; it does not convert at all.

codecvt<wchar_t, char, mbstate_t> converts between the native character sets for ordinary and wide characters.

Specializations on mbstate_t perform conversion between encodings known to the library implementer.

Other encodings can be converted by specializing on a program-defined stateT type.

Objects of type stateT can contain any state that is useful to communicate to or from the specialized do_in or do_out members.
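The degenerate case can be observed directly. The following is a minimal sketch, not part of the specification: `try_convert` is a hypothetical helper that fetches codecvt<char, char, mbstate_t> from the classic locale and calls in(). Since that specialization does not convert, the expected result is noconv and the destination buffer is left untouched, so a caller would use the source range as is.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical helper: runs the char-to-char facet of the classic locale
// over [text, text + n) and returns the result code. `out` is a
// caller-supplied destination buffer of `out_size` elements.
std::codecvt_base::result try_convert(const char* text, std::size_t n,
                                      char* out, std::size_t out_size) {
    assert(text != nullptr && out != nullptr);
    using facet_t = std::codecvt<char, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};           // initial conversion state
    const char* from_next = nullptr;  // set by in()
    char* to_next = nullptr;          // set by in()
    return cvt.in(state, text, text + n, from_next,
                  out, out + out_size, to_next);
}
```

A caller that receives noconv should read the characters from the source range rather than from the destination buffer.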

28.3.4.2.5.2 Members [locale.codecvt.members]

result out(stateT& state, const internT* from, const internT* from_end, const internT*& from_next, externT* to, externT* to_end, externT*& to_next) const;

1

Returns: do_out(state, from, from_end, from_next, to, to_end, to_next).

result unshift(stateT& state, externT* to, externT* to_end, externT*& to_next) const;

2

Returns: do_unshift(state, to, to_end, to_next).

result in(stateT& state, const externT* from, const externT* from_end, const externT*& from_next, internT* to, internT* to_end, internT*& to_next) const;

3

Returns: do_in(state, from, from_end, from_next, to, to_end, to_next).

int encoding() const noexcept;

4

Returns: do_encoding().

bool always_noconv() const noexcept;

5

Returns: do_always_noconv().

int length(stateT& state, const externT* from, const externT* from_end, size_t max) const;

6

Returns: do_length(state, from, from_end, max).

int max_length() const noexcept;

7

Returns: do_max_length().
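Each public member simply forwards to the protected virtual of the same name with a do_ prefix. As an illustration only, the sketch below derives a hypothetical `counting_codecvt` from codecvt<char, char, mbstate_t>, overrides do_max_length() to count calls before delegating to the base, and calls max_length() through a base-class reference. The class name, the counter, and `demo` are assumptions made for this example.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

// Hypothetical facet that records how often do_max_length() is reached.
class counting_codecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    mutable int max_length_calls = 0;

protected:
    int do_max_length() const noexcept override {
        ++max_length_calls;
        // Delegate to the standard behavior of the base specialization.
        return std::codecvt<char, char, std::mbstate_t>::do_max_length();
    }
};

// Calls the public member through the base interface and returns its value.
int demo() {
    counting_codecvt facet;
    const std::codecvt<char, char, std::mbstate_t>& base = facet;
    const int n = base.max_length();      // forwards to do_max_length()
    assert(facet.max_length_calls == 1);  // the override was reached once
    return n;
}
```

Because the virtuals are protected, user code customizes behavior by overriding them in a derived facet while clients keep calling the public, non-virtual members.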

28.3.4.2.5.3 Virtual functions [locale.codecvt.virtuals]

`result do_out(stateT& state, const internT* from, const internT* from_end, const internT*& from_next, externT* to, externT* to_end, externT*& to_next) const;`

`result do_in(stateT& state, const externT* from, const externT* from_end, const externT*& from_next, internT* to, internT* to_end, internT*& to_next) const;`

1

Preconditions: (from <= from_end && to <= to_end) is well-defined and true; state is initialized, if at the beginning of a sequence, or else is equal to the result of converting the preceding characters in the sequence.

2

Effects: Translates characters in the source range [from, from_end), placing the results in sequential positions starting at destination to.

Converts no more than (from_end - from) source elements, and stores no more than (to_end - to) destination elements.

3

Stops if it encounters a character it cannot convert.

It always leaves the from_next and to_next pointers pointing one beyond the last element successfully converted.

If it returns noconv, internT and externT are the same type and the converted sequence is identical to the input sequence [from, from_next); to_next is set equal to to, the value of state is unchanged, and there are no changes to the values in [to, to_end).

4

A codecvt facet that is used by basic_filebuf ([file.streams]) shall have the property that if do_out(state, from, from_end, from_next, to, to_end, to_next) would return ok, where from != from_end, then do_out(state, from, from + 1, from_next, to, to_end, to_next) shall also return ok, and that if do_in(state, from, from_end, from_next, to, to_end, to_next) would return ok, where to != to_end, then do_in(state, from, from_end, from_next, to, to + 1, to_next) shall also return ok (footnote 220).

[Note 1:

As a result of operations on state, it can return ok or partial and set from_next == from and to_next != to.

— end note]
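To make the one-internal-character-at-a-time property concrete, here is a sketch of a stateless 1-to-N facet. `crlf_codecvt` and `encode_one_at_a_time` are hypothetical names: the facet writes each '\n' as "\r\n" in do_out and returns partial when the destination cannot hold the next character. Only do_out and do_always_noconv are overridden; a complete facet would also override do_in, do_length, do_encoding, and do_max_length.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: expands '\n' to "\r\n" on output (a 1-to-N mapping).
class crlf_codecvt : public std::codecvt<char, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        assert(from <= from_end && to <= to_end);
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            const std::ptrdiff_t needed = (*from_next == '\n') ? 2 : 1;
            if (to_end - to_next < needed) return partial;  // destination full
            if (*from_next == '\n') *to_next++ = '\r';
            *to_next++ = *from_next++;
        }
        return ok;
    }
    bool do_always_noconv() const noexcept override { return false; }
};

// Feeds the facet one internal character per call, as basic_filebuf may do.
std::string encode_one_at_a_time(const std::string& text) {
    crlf_codecvt cvt;
    std::mbstate_t state{};
    std::string out;
    for (const char& c : text) {
        char buf[2];  // room for the longest expansion of one character
        const char* from_next = nullptr;
        char* to_next = nullptr;
        const auto r = cvt.out(state, &c, &c + 1, from_next, buf, buf + 2, to_next);
        assert(r == std::codecvt_base::ok);
        (void)r;
        out.append(buf, static_cast<std::size_t>(to_next - buf));
    }
    return out;
}
```

Every single-character call returns ok as long as the destination has room for two elements, which is the behavior the requirement above asks of a facet used by basic_filebuf.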

5

Returns: An enumeration value, as summarized in Table 93.

Table 93 — do_in/do_out result values [tab:locale.codecvt.inout]

| Value | Meaning |
| --- | --- |
| ok | completed the conversion |
| partial | not all source characters converted |
| error | encountered a character in [from, from_end) that cannot be converted |
| noconv | internT and externT are the same type, and input sequence is identical to converted sequence |

A return value of partial, if (from_next == from_end), indicates that either the destination sequence has not absorbed all the available destination elements, or that additional source elements are needed before another destination element can be produced.
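The result values suggest a driver loop of roughly the following shape. This is a sketch under stated assumptions: `narrow` is a hypothetical helper, the 8-element buffer is deliberately small so that partial is exercised, and the loop does not call unshift afterwards, which a state-dependent encoding would require.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts wide text to the external encoding of `loc`
// by calling out() repeatedly and acting on each result value.
std::string narrow(const std::wstring& text,
                   const std::locale& loc = std::locale::classic()) {
    using facet_t = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(loc);

    std::mbstate_t state{};
    std::string bytes;
    char buf[8];
    const wchar_t* from = text.data();
    const wchar_t* const from_end = from + text.size();

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        const auto r = cvt.out(state, from, from_end, from_next,
                               buf, buf + sizeof buf, to_next);
        if (r == std::codecvt_base::error)
            throw std::runtime_error("character cannot be converted");
        // noconv requires internT and externT to be the same type.
        assert(r != std::codecvt_base::noconv);

        bytes.append(buf, static_cast<std::size_t>(to_next - buf));
        if (from_next == from && to_next == buf)
            throw std::runtime_error("conversion made no progress");
        from = from_next;  // ok ends the loop; partial continues it
    }
    return bytes;
}
```

For example, `narrow(L"hello")` with the classic locale is expected to yield the same ASCII characters as a narrow string.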

6

Remarks: Its operations on state are unspecified.

[Note 2:

This argument can be used, for example, to maintain shift state, to specify conversion options (such as count only), or to identify a cache of seek offsets.

— end note]

result do_unshift(stateT& state, externT* to, externT* to_end, externT*& to_next) const;

7

Preconditions: (to <= to_end) is well-defined and true; state is initialized, if at the beginning of a sequence, or else is equal to the result of converting the preceding characters in the sequence.

8

Effects: Places characters starting at to that should be appended to terminate a sequence when the current stateT is given by state (footnote 221).

Stores no more than (to_end - to) destination elements, and leaves the to_next pointer pointing one beyond the last element successfully stored.

9

Returns: An enumeration value, as summarized in Table 94.

Table 94 — do_unshift result values [tab:locale.codecvt.unshift]

| Value | Meaning |
| --- | --- |
| ok | completed the sequence |
| partial | space for more than to_end - to destination elements was needed to terminate a sequence given the value of state |
| error | an unspecified error has occurred |
| noconv | no termination is needed for this state_type |
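A caller that has finished writing a sequence might handle the four unshift results as sketched below. `finish_sequence` and the alias `wide_facet` are hypothetical, and the 32-element buffer is an assumption that is not guaranteed to be large enough for every encoding, so partial is reported as an exception rather than retried.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

using wide_facet = std::codecvt<wchar_t, char, std::mbstate_t>;

// Hypothetical helper: returns the external elements that terminate the
// sequence whose current conversion state is `state`.
std::string finish_sequence(const wide_facet& cvt, std::mbstate_t& state) {
    char buf[32];  // assumed large enough for any terminating sequence
    char* to_next = buf;
    const std::codecvt_base::result r =
        cvt.unshift(state, buf, buf + sizeof buf, to_next);
    switch (r) {
    case std::codecvt_base::ok:
        assert(buf <= to_next && to_next <= buf + sizeof buf);
        return std::string(buf, static_cast<std::size_t>(to_next - buf));
    case std::codecvt_base::noconv:
        return std::string();  // nothing needs to be appended
    case std::codecvt_base::partial:
        throw std::length_error("terminator does not fit in the buffer");
    default:
        throw std::runtime_error("unshift reported an error");
    }
}
```

With a freshly initialized state there is nothing to undo, so the returned string is expected to be empty whether the facet answers ok or noconv.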

int do_encoding() const noexcept;

10

Returns: -1 if the encoding of the externT sequence is state-dependent; else the constant number of externT characters needed to produce an internal character; or 0 if this number is not a constant (footnote 222).
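The three kinds of return value can be decoded as in the following sketch. `describe_encoding` and `describe` are hypothetical helpers, and the wording of the returned strings is an assumption chosen for this illustration.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: turns the value returned by encoding() into text.
std::string describe_encoding(int enc) {
    assert(enc >= -1);
    if (enc == -1) return "state-dependent";
    if (enc == 0) return "variable width";
    return "fixed width, " + std::to_string(enc) +
           " external element(s) per character";
}

// Looks up the facet type `Facet` in `loc` and describes its encoding.
template <class Facet>
std::string describe(const std::locale& loc) {
    return describe_encoding(std::use_facet<Facet>(loc).encoding());
}
```

A buffering layer could use the fixed-width case to compute seek offsets arithmetically and fall back to sequential conversion otherwise.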

bool do_always_noconv() const noexcept;

11

Returns: true if do_in() and do_out() return noconv for all valid argument values.

codecvt<char, char, mbstate_t> returns true.

int do_length(stateT& state, const externT* from, const externT* from_end, size_t max) const;

12

Preconditions: (from <= from_end) is well-defined and true; state is initialized, if at the beginning of a sequence, or else is equal to the result of converting the preceding characters in the sequence.

13

Effects: The effect on the state argument is as if it called do_in(state, from, from_end, from, to, to + max, to) for to pointing to a buffer of at least max elements.

14

Returns: (from_next - from) where from_next is the largest value in the range [from, from_end] such that the sequence of values in the range [from, from_next) represents max or fewer valid complete characters of type internT.

The specialization codecvt<char, char, mbstate_t> returns the lesser of max and (from_end - from).
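For the char-to-char specialization the rule is easy to check. The sketch below uses a hypothetical helper, `countable_prefix`, which asks the classic locale's facet how many external elements make up at most `max` internal characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: number of external elements of `bytes` that
// length() attributes to at most `max` internal characters.
int countable_prefix(const std::string& bytes, std::size_t max) {
    using facet_t = std::codecvt<char, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};
    const char* from = bytes.data();
    const int n = cvt.length(state, from, from + bytes.size(), max);
    assert(n >= 0 && static_cast<std::size_t>(n) <= bytes.size());
    return n;
}
```

With this specialization, `countable_prefix("hello", 3)` yields 3 and `countable_prefix("hello", 10)` yields 5, the lesser of max and the length of the source range.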

int do_max_length() const noexcept;

15

Returns: The maximum value that do_length(state, from, from_end, 1) can return for any valid range [from, from_end) and stateT value state.

The specialization codecvt<char, char, mbstate_t>::do_max_length() returns 1.

220) Informally, this means that basic_filebuf assumes that the mapping from internal to external characters is 1 to N: that is, a codecvt facet that is used by basic_filebuf can translate characters one internal character at a time.

221) Typically these will be characters to return the state to stateT().

222) If encoding() yields -1, then more than max_length() externT elements can be consumed when producing a single internT character, and additional externT elements can appear at the end of a sequence after those that yield the final internT character.