2025-10-25 03:02:53 +03:00

[character.seq]
# 16 Library introduction [[library]](./#library)
## 16.3 Method of description [[description]](description#character.seq)
### 16.3.3 Other conventions [[conventions]](conventions#character.seq)
#### 16.3.3.3 Type descriptions [[type.descriptions]](type.descriptions#character.seq)
#### 16.3.3.3.4 Character sequences [character.seq]
#### [16.3.3.3.4.1](#general) General [[character.seq.general]](character.seq.general)
[1](#general-1)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L720)
The C standard library makes widespread use of characters and character sequences that follow a few uniform conventions:
- [(1.1)](#general-1.1)
Properties specified as [*locale-specific*](#def:locale-specific "16.3.3.3.4.1General[character.seq.general]") may change during program execution
by a call to setlocale(int, const char*) ([[clocale.syn]](clocale.syn "28.3.5.1Header <clocale> synopsis")), or
by a change to a locale object,
as described in [[locales]](locales "28.3.3Locales") and [[input.output]](input.output "31Input/output library")[.](#general-1.1.sentence-1)
- [(1.2)](#general-1.2)
The [*execution character set*](#def:character_set,execution "16.3.3.3.4.1General[character.seq.general]") and
the [*execution wide-character set*](#def:wide-character_set,execution "16.3.3.3.4.1General[character.seq.general]") are supersets of the basic literal character set ([[lex.charset]](lex.charset "5.3.1Character sets"))[.](#general-1.2.sentence-1)
The encodings of the execution character sets and
the sets of additional elements (if any) are locale-specific[.](#general-1.2.sentence-2)
Each element of the execution wide-character set is encoded as
a single code unit representable by a value of type wchar_t[.](#general-1.2.sentence-3)
[*Note [1](#general-note-1)*:
The encodings of the execution character sets can be unrelated
to any literal encoding[.](#general-1.2.sentence-4)
— *end note*]
- [(1.3)](#general-1.3)
A [*letter*](#def:letter "16.3.3.3.4.1General[character.seq.general]") is any of the 26 lowercase or 26 uppercase letters in the basic character set[.](#general-1.3.sentence-1)
- [(1.4)](#general-1.4)
The [*decimal-point character*](#def:character,decimal-point "16.3.3.3.4.1General[character.seq.general]") is the locale-specific
(single-byte) character used by functions that convert between a (single-byte)
character sequence and a value of one of the floating-point types[.](#general-1.4.sentence-1)
It is used
in the character sequence to denote the beginning of a fractional part[.](#general-1.4.sentence-2)
It is
represented in [[support]](support "17Language support library") through [[exec]](exec "33Execution control library") and [[depr]](depr "Annex D(normative)Compatibility features") by a period, '.',
which is
also its value in the "C" locale[.](#general-1.4.sentence-3)
- [(1.5)](#general-1.5)
A [*character sequence*](#def:character_sequence "16.3.3.3.4.1General[character.seq.general]") is an [array object](dcl.array "9.3.4.5Arrays[dcl.array]") *A* that
can be declared as *T A*[*N*],
where *T* is any of the types char, unsigned char,
or signed char ([[basic.fundamental]](basic.fundamental "6.9.2Fundamental types")), optionally qualified by any combination of const or volatile[.](#general-1.5.sentence-1)
The initial elements of the
array have defined contents up to and including an element determined by some
predicate[.](#general-1.5.sentence-2)
A character sequence can be designated by a pointer value *S* that points to its first element[.](#general-1.5.sentence-3)
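As an illustration only (not part of the normative text), the following sketch shows a character sequence designated by a pointer to its first element, together with a lookup of the locale-specific decimal-point character. The helper names `decimal_point_char` and `find_decimal_point` are hypothetical, and the sketch assumes the sequence's length is supplied separately by the caller.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>

// Hypothetical helper: the decimal-point character of the current C locale,
// as reported by std::localeconv(). In the "C" locale this is '.'.
inline char decimal_point_char() {
    return *std::localeconv()->decimal_point;
}

// Hypothetical helper: index of the first decimal-point character among the
// first n elements of the character sequence designated by s, or n if none.
// The length n is an assumption of this sketch; it is maintained by the caller.
inline std::size_t find_decimal_point(const char* s, std::size_t n) {
    assert(s != nullptr);
    const char dp = decimal_point_char();
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == dp) return i;
    }
    return n;
}
```

For an array declared as `const char a[4] = {'3', '.', '5', '0'};`, the expression `find_decimal_point(a, 4)` designates the sequence through a pointer to `a[0]`. Because the decimal-point character is locale-specific, the result may differ after a call to `setlocale`.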
#### [16.3.3.3.4.2](#byte.strings) Byte strings [[byte.strings]](byte.strings)
[1](#byte.strings-1)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L788)
A [*null-terminated byte string*](#def:ntbs "16.3.3.3.4.2Byte strings[byte.strings]"),
or ntbs,
is a character sequence whose highest-addressed element
with defined content has the value zero
(the [*terminating null character*](#def:character,terminating_null "16.3.3.3.4.2Byte strings[byte.strings]"));
no other element in the sequence has the value zero[.](#byte.strings-1.sentence-1)[139](#footnote-139 "Many of the objects manipulated by function signatures declared in <cstring> are character sequences or ntbss. The size of some of these character sequences is limited by a length value, maintained separately from the character sequence.")
[2](#byte.strings-2)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L803)
The [*length of an ntbs*](#def:ntbs,length "16.3.3.3.4.2Byte strings[byte.strings]") is the number of elements that
precede the terminating null character[.](#byte.strings-2.sentence-1)
An [*empty ntbs*](#def:ntbs,empty "16.3.3.3.4.2Byte strings[byte.strings]") has a length of zero[.](#byte.strings-2.sentence-2)
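A minimal sketch of these two definitions, for illustration: the hypothetical helper `ntbs_length` counts the elements that precede the terminating null character (the same quantity `std::strlen` reports), and `is_empty_ntbs` checks for a length of zero. Both assume that the argument really designates an ntbs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements preceding the terminating null
// character. Assumes s designates a null-terminated byte string.
inline std::size_t ntbs_length(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical helper: an empty ntbs has a length of zero, i.e. its first
// element is already the terminating null character.
inline bool is_empty_ntbs(const char* s) {
    return ntbs_length(s) == 0;
}
```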
[3](#byte.strings-3)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L810)
The [*value of an ntbs*](#def:ntbs,value "16.3.3.3.4.2Byte strings[byte.strings]") is the sequence of values of the
elements up to and including the terminating null character[.](#byte.strings-3.sentence-1)
[4](#byte.strings-4)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L815)
A [*static ntbs*](#def:ntbs,static "16.3.3.3.4.2Byte strings[byte.strings]") is an ntbs with
static storage duration[.](#byte.strings-4.sentence-1)[140](#footnote-140 "A string-literal, such as &quot;abc&quot;, is a static ntbs.")
[139)](#footnote-139)[139)](#footnoteref-139)
Many of the objects manipulated by
function signatures declared in [<cstring>](cstring.syn#header:%3ccstring%3e "27.5.1Header <cstring> synopsis[cstring.syn]") are character sequences or ntbss[.](#footnote-139.sentence-1)
The size of some of these character sequences is limited by
a length value, maintained separately from the character sequence[.](#footnote-139.sentence-2)
[140)](#footnote-140)[140)](#footnoteref-140)
A [*string-literal*](lex.string#nt:string-literal "5.13.5String literals[lex.string]"), such as "abc",
is a static ntbs[.](#footnote-140.sentence-1)
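The following sketch illustrates the value of an ntbs, a static ntbs, and a character sequence whose size is tracked separately, as the footnotes describe. All function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A string-literal is a static ntbs: it has static storage duration, so the
// returned pointer stays valid for the lifetime of the program.
inline const char* static_abc() {
    return "abc";
}

// Hypothetical helper: number of elements in the value of an ntbs, which runs
// up to and including the terminating null character (length plus one).
inline std::size_t ntbs_value_size(const char* s) {
    assert(s != nullptr);
    return std::strlen(s) + 1;
}

// Hypothetical helper: compares two character sequences that need not be
// null-terminated; their size n is maintained separately by the caller.
inline bool same_elements(const char* a, const char* b, std::size_t n) {
    return std::memcmp(a, b, n) == 0;
}
```

Note that `ntbs_value_size(static_abc())` agrees with `sizeof("abc")`, since the array type of the literal also accounts for the terminating null character.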
#### [16.3.3.3.4.3](#multibyte.strings) Multibyte strings [[multibyte.strings]](multibyte.strings)
[1](#multibyte.strings-1)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L827)
A [*multibyte character*](#def:character,multibyte "16.3.3.3.4.3Multibyte strings[multibyte.strings]") is
a sequence of one or more bytes representing the
code unit sequence for an encoded character of the
execution character set[.](#multibyte.strings-1.sentence-1)
[2](#multibyte.strings-2)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L834)
A [*null-terminated multibyte string*](#def:ntmbs "16.3.3.3.4.3Multibyte strings[multibyte.strings]"),
or ntmbs,
is an ntbs that constitutes a
sequence of valid multibyte characters, beginning and ending in the initial
shift state[.](#multibyte.strings-2.sentence-1)[141](#footnote-141 "An ntbs that contains characters only from the basic literal character set is also an ntmbs. Each multibyte character then consists of a single byte.")
[3](#multibyte.strings-3)
[#](http://github.com/Eelis/draft/tree/9adde4bc1c62ec234483e63ea3b70a59724c745a/source/lib-intro.tex#L847)
A [*static ntmbs*](#def:ntmbs,static "16.3.3.3.4.3Multibyte strings[multibyte.strings]") is an ntmbs with static storage duration[.](#multibyte.strings-3.sentence-1)
[141)](#footnote-141)[141)](#footnoteref-141)
An ntbs that contains characters only from the
basic literal character set is also an ntmbs[.](#footnote-141.sentence-1)
Each multibyte character then
consists of a single byte[.](#footnote-141.sentence-2)
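To make the footnote concrete, here is a hedged sketch that counts the multibyte characters in an ntmbs with `std::mblen`, interpreting the bytes under the current C locale. The helper name `count_multibyte_chars` and its error convention (returning `(std::size_t)-1` for an invalid sequence) are assumptions of this example. For an ntbs made up only of basic literal character set members, the count is expected to equal the length of the ntbs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: counts the multibyte characters in the ntmbs designated
// by s, using the encoding of the current C locale.
// Returns static_cast<std::size_t>(-1) if an invalid multibyte character is found.
inline std::size_t count_multibyte_chars(const char* s) {
    assert(s != nullptr);
    std::mblen(nullptr, 0);  // reset std::mblen to the initial shift state
    std::size_t count = 0;
    std::size_t remaining = std::strlen(s);
    while (remaining > 0) {
        const int len = std::mblen(s, remaining);
        if (len <= 0) return static_cast<std::size_t>(-1);
        s += len;
        remaining -= static_cast<std::size_t>(len);
        ++count;
    }
    return count;
}
```

Since `std::mblen` keeps internal conversion state, this sketch is not suitable for concurrent use from multiple threads.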