2025-10-25 03:02:53 +03:00



6.9.3 Optional extended floating-point types [basic.extended.fp]

1


If the implementation supports an extended floating-point type ([basic.fundamental]) whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std::float16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT16_T__ is defined ([cpp.predefined]), and the floating-point literal suffixes f16 and F16 are supported ([lex.fcon]).
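A minimal sketch of the pattern this paragraph implies: test the predefined macro before naming std::float16_t or writing an f16 literal, and fall back to float otherwise. The helper half_in_float16 and its float fallback are hypothetical illustrations; the sketch assumes a C++17-or-later compiler for __has_include, and the binary16 branch is compiled only where the implementation defines the macro.

```cpp
// Include <stdfloat> only if the library ships it (C++23 header).
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// Hypothetical helper: halves x using binary16 arithmetic when
// std::float16_t is provided, otherwise uses float. The result is
// returned widened to float in both cases.
float half_in_float16(float x) {
#if defined(__STDCPP_FLOAT16_T__)
    // float has a greater conversion rank than std::float16_t,
    // so narrowing to binary16 needs an explicit cast.
    std::float16_t h = static_cast<std::float16_t>(x);
    h = h * 0.5f16;                 // the f16 suffix gives the literal type std::float16_t
    return static_cast<float>(h);   // widen back to float
#else
    return x * 0.5f;                // fallback when binary16 is not provided
#endif
}
```

Guarding on the macro, not only on __has_include(<stdfloat>), matters because the header can be present while a particular typedef-name is not declared.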

2


If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary32, then the typedef-name std::float32_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT32_T__ is defined, and the floating-point literal suffixes f32 and F32 are supported.
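Because std::float32_t names an extended floating-point type, it is a distinct type from float even on implementations where float also uses binary32. The sketch below assumes C++23 support and uses a pair of overloads (the function name kind is hypothetical) to show that the f32 suffix yields std::float32_t rather than float.

```cpp
#include <string>
#include <type_traits>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// Overloads that report which type an argument has.
std::string kind(float) { return "float"; }

#if defined(__STDCPP_FLOAT32_T__)
std::string kind(std::float32_t) { return "std::float32_t"; }

// An extended floating-point type is never one of the standard
// floating-point types, so the two are different types.
static_assert(!std::is_same_v<std::float32_t, float>);
#endif
```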

3


If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary64, then the typedef-name std::float64_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT64_T__ is defined, and the floating-point literal suffixes f64 and F64 are supported.

4


If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary128, then the typedef-name std::float128_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT128_T__ is defined, and the floating-point literal suffixes f128 and F128 are supported.
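One way to observe the binary128 parameters is through std::numeric_limits, which is specialized for every arithmetic type, extended floating-point types included. The alias wide_t and the long double fallback are hypothetical choices for illustration; the fallback's precision is implementation-defined. Note that numeric_limits::max_exponent is one more than emax.

```cpp
#include <limits>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

#if defined(__STDCPP_FLOAT128_T__)
using wide_t = std::float128_t;   // binary128: p = 113, emax = 16383
// digits counts the implicit leading 1, matching p.
static_assert(std::numeric_limits<std::float128_t>::digits == 113);
// max_exponent is emax + 1 by the numeric_limits convention.
static_assert(std::numeric_limits<std::float128_t>::max_exponent == 16384);
#else
using wide_t = long double;       // hypothetical fallback; precision varies
#endif

// Significand precision in bits of whichever type wide_t names.
constexpr int wide_digits() { return std::numeric_limits<wide_t>::digits; }
```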

5


If the implementation supports an extended floating-point type with the properties, as specified by ISO/IEC 60559, of radix (b) of 2, storage width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax) of 127, and exponent field width in bits (w) of 8, then the typedef-name std::bfloat16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_BFLOAT16_T__ is defined, and the floating-point literal suffixes bf16 and BF16 are supported.
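The bfloat16 parameters keep binary32's exponent field (w = 8, emax = 127) while cutting the precision to p = 8. A short sketch, assuming C++23 support for std::bfloat16_t and std::float32_t; round_to_bf16 is a hypothetical helper that returns its argument unchanged when the type is not provided.

```cpp
#include <limits>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

#if defined(__STDCPP_BFLOAT16_T__) && defined(__STDCPP_FLOAT32_T__)
// Same exponent range as binary32...
static_assert(std::numeric_limits<std::bfloat16_t>::max_exponent ==
              std::numeric_limits<std::float32_t>::max_exponent);
// ...but only 8 bits of precision instead of 24.
static_assert(std::numeric_limits<std::bfloat16_t>::digits == 8);
#endif

// Hypothetical helper: rounds x to a bfloat16 value and widens it back.
float round_to_bf16(float x) {
#if defined(__STDCPP_BFLOAT16_T__)
    // Explicit cast: float has a greater conversion rank than bfloat16_t.
    return static_cast<float>(static_cast<std::bfloat16_t>(x));
#else
    return x;  // fallback: no rounding performed
#endif
}
```

With p = 8, consecutive values just above 1 are 2^-7 apart, so 1.01 becomes 1.0078125, while a value near 3e38 stays finite because the binary32 exponent range is kept.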

6


[Note 1:

A summary of the parameters for each type is given in Table 15.

The precision p includes the implicit 1 bit at the beginning of the significand, so the storage used for the significand is p−1 bits.

ISO/IEC 60559 does not assign a name for a type having the parameters specified for std::bfloat16_t.

— end note]

Table 15 — Properties of named extended floating-point types [tab:basic.extended.fp]

Parameter                        float16_t   float32_t   float64_t   float128_t  bfloat16_t
ISO/IEC 60559 name               binary16    binary32    binary64    binary128   (none)
k, storage width in bits         16          32          64          128         16
p, precision in bits             11          24          53          113         8
emax, maximum exponent           15          127         1023        16383       127
w, exponent field width in bits  5           8           11          15          8
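The parameters in Table 15 are not independent: one sign bit, w exponent bits, and p−1 stored significand bits fill the k-bit storage width, and emax equals the exponent bias 2^(w−1)−1. The compile-time sketch below checks both identities against the table values; it uses no extended floating-point types, so it should build with any C++17 compiler. The struct FpParams and the helper names are hypothetical.

```cpp
// One column of Table 15 (hypothetical struct for illustration).
struct FpParams {
    const char* name;
    int k;     // storage width in bits
    int p;     // precision in bits, including the implicit leading 1
    int emax;  // maximum exponent
    int w;     // exponent field width in bits
};

constexpr FpParams table15[] = {
    {"float16_t",   16,  11,    15,  5},
    {"float32_t",   32,  24,   127,  8},
    {"float64_t",   64,  53,  1023, 11},
    {"float128_t", 128, 113, 16383, 15},
    {"bfloat16_t",  16,   8,   127,  8},
};

// 1 sign bit + w exponent bits + (p - 1) stored significand bits == k.
constexpr bool widths_add_up(const FpParams& t) {
    return 1 + t.w + (t.p - 1) == t.k;
}

// emax equals the exponent bias 2^(w-1) - 1.
constexpr bool emax_matches_w(const FpParams& t) {
    return t.emax == (1 << (t.w - 1)) - 1;
}

constexpr bool table_consistent() {
    for (const auto& t : table15)
        if (!widths_add_up(t) || !emax_matches_w(t)) return false;
    return true;
}

static_assert(table_consistent());
```

For example, binary16 gives 1 + 5 + 10 = 16 and 2^4 − 1 = 15, and bfloat16 gives 1 + 8 + 7 = 16 and 2^7 − 1 = 127.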

7


Recommended practice: Any names that the implementation provides for the extended floating-point types described in this subsection that are in addition to the names declared in the header <stdfloat> should be chosen to increase compatibility and interoperability with the interchange types _Float16, _Float32, _Float64, and _Float128 defined in ISO/IEC TS 18661-3 and with future versions of ISO/IEC 9899.
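As an illustration of this recommended practice in one implementation (an observation about libstdc++ in GCC 13 and later, not a requirement of this document), libstdc++'s <stdfloat> defines std::float16_t as an alias of _Float16, the ISO/IEC TS 18661-3 name. The check below is a sketch under that assumption; the function name is hypothetical, and it reports false wherever it cannot make the comparison.

```cpp
#include <type_traits>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// True when std::float16_t and the TS 18661-3 name _Float16 are the
// same type. Only attempted under libstdc++ with binary16 support.
constexpr bool float16_is_Float16() {
#if defined(__GLIBCXX__) && defined(__STDCPP_FLOAT16_T__)
    return std::is_same_v<std::float16_t, _Float16>;
#else
    return false;  // comparison not attempted on this implementation
#endif
}
```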