cppannotations/annotations/yo/first/unicode.yo
2017-11-16 19:37:52 +01:00

In bf(C++) string literals can be defined as NTBSs. Prefixing an NTBS
with tt(L) (e.g., tt(L"hello")) defines a tt(wchar_t) string literal.
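bf(C++) also supports 8, 16 and 32 bit i(Unicode) encoded
strings. Furthermore, two new data types are introduced: tt(char16_t) and
tt(char32_t) storing, respectively, a ti(UTF-16) and a ti(UTF-32) code
unit.
A tt(char) can hold a single UTF-8 code unit, which suffices for the ASCII
characters. All other characters are encoded in UTF-8 as sequences of two to
four tt(char) values. A tt(char32_t) holds any unicode character in a single
element. A tt(char16_t) does so for the characters up to tt(0xFFFF); the
remaining characters require two tt(char16_t) values (a surrogate pair).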
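String literals for the various types of unicode encodings (and associated
variables) can be defined as follows (note that since C++20 tt(u8) literals
consist of tt(char8_t) elements, in which case the first variable must be
defined as tt(char8_t utf_8[])):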
verb(
char utf_8[] = u8"This is UTF-8 encoded.";
char16_t utf16[] = u"This is UTF-16 encoded.";
char32_t utf32[] = U"This is UTF-32 encoded.";
)
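The relation between characters, code units and storage can be checked at
compile time. The following is merely a minimal sketch: tt(codeUnits) is a
hypothetical helper introduced for this illustration, and the literal
tt("Unicode") was chosen because it only contains ASCII characters.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the text above): the number of code units
// in a string literal, not counting the terminating zero.
template <typename Char, std::size_t N>
constexpr std::size_t codeUnits(Char const (&)[N])
{
    return N - 1;
}

// "Unicode" consists of ASCII characters only, so all three encodings need
// exactly one code unit per character.
static_assert(codeUnits(u8"Unicode") == 7, "UTF-8 uses 7 code units");
static_assert(codeUnits(u"Unicode") == 7, "UTF-16 uses 7 code units");
static_assert(codeUnits(U"Unicode") == 7, "UTF-32 uses 7 code units");

// The storage differs: each array holds 8 elements (7 + the terminator)
// of its own element type.
static_assert(sizeof(u"Unicode") == 8 * sizeof(char16_t), "UTF-16 storage");
static_assert(sizeof(U"Unicode") == 8 * sizeof(char32_t), "UTF-32 storage");
```

Because the helper deduces the literal's element type it does not depend on
whether tt(u8) literals consist of tt(char) or tt(char8_t) elements.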
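Alternatively, unicode constants may be defined using the ti(\u) escape
sequence, followed by exactly four hexadecimal digits (values exceeding
tt(0xFFFF) are specified using tt(\U), followed by eight hexadecimal
digits). The literal's prefix (tt(u8, u) or tt(U)) determines whether the
character's tt(UTF-8, UTF-16) or tt(UTF-32) encoding is stored. E.g.,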
verb(
char utf_8[] = u8"\u2018";
char16_t utf16[] = u"\u2018";
char32_t utf32[] = U"\u2018";
)
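What is actually stored for such an escape sequence depends on the encoding.
The following sketch illustrates this for tt(\u2018), used above, and for
tt(\U0001F600), a value exceeding tt(0xFFFF) that was chosen for this
illustration only. tt(codeUnits) is again a hypothetical helper, not a
library facility.

```cpp
#include <cstddef>

// Hypothetical helper: the number of code units in a string literal,
// not counting the terminating zero.
template <typename Char, std::size_t N>
constexpr std::size_t codeUnits(Char const (&)[N])
{
    return N - 1;
}

// U+2018 needs three code units in UTF-8 (0xE2 0x80 0x98) ...
static_assert(codeUnits(u8"\u2018") == 3, "three UTF-8 code units");
static_assert(static_cast<unsigned char>(u8"\u2018"[0]) == 0xE2, "lead byte");

// ... but only one in UTF-16 and in UTF-32.
static_assert(codeUnits(u"\u2018") == 1 && u"\u2018"[0] == 0x2018, "UTF-16");
static_assert(codeUnits(U"\u2018") == 1 && U"\u2018"[0] == 0x2018, "UTF-32");

// U+1F600 exceeds 0xFFFF: UTF-16 stores it as a surrogate pair, UTF-32
// stores it as a single value.
static_assert(codeUnits(u"\U0001F600") == 2, "surrogate pair");
static_assert(u"\U0001F600"[0] == 0xD83D, "high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "low surrogate");
static_assert(U"\U0001F600"[0] == 0x1F600, "single UTF-32 value");
```

The element counts show why the number of elements of a UTF-8 or UTF-16
array should not be taken for its number of characters.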
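Unicode string literals are usually delimited by double quotes, but raw
string literals can also be used: in that case tt(R) is inserted between the
encoding prefix and the opening double quote (e.g., tt(u8R"(...)")).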
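As a brief sketch of combining an encoding prefix with a raw string literal:
the functions tt(escapedPath) and tt(rawPath) and the path they return are
hypothetical, and only serve to show that both notations define the same
UTF-16 text.

```cpp
#include <string>

// Hypothetical example: an ordinary UTF-16 literal, in which every
// backslash must be written as an escape sequence.
inline std::u16string escapedPath()
{
    return u"C:\\temp\\new";
}

// The same text as a raw UTF-16 literal: the characters between R"( and )"
// are taken literally, so \t and \n are not interpreted as escape sequences.
inline std::u16string rawPath()
{
    return uR"(C:\temp\new)";
}
```

Both functions return eleven tt(char16_t) values; the raw variant merely
avoids doubling the backslashes.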