Basic helpers for reading and writing UTF-8-encoded strings.
Classes | |
class | pw::utf::CodePointAndSize |
class | pw::utf8::EncodedCodePoint |
Encapsulates the result of encoding a single code point as UTF-8. | |
Functions | |
constexpr bool | pw::utf::IsValidCodepoint (uint32_t code_point) |
constexpr bool | pw::utf::IsValidCharacter (uint32_t code_point) |
constexpr pw::Result< utf::CodePointAndSize > | pw::utf8::ReadCodePoint (std::string_view str) |
Reads the first code point from a UTF-8 encoded str. | |
constexpr bool | pw::utf8::IsStringValid (std::string_view str) |
Determines if str is a valid UTF-8 string. | |
constexpr Result< EncodedCodePoint > | pw::utf8::EncodeCodePoint (uint32_t code_point) |
Encodes a single code point as UTF-8. | |
Status | pw::utf8::WriteCodePoint (uint32_t code_point, pw::StringBuilder &output) |
Helper that writes a code point to the provided pw::StringBuilder. | |
constexpr Result< EncodedCodePoint > pw::utf8::EncodeCodePoint (uint32_t code_point)
Encodes a single code point as UTF-8.
UTF-8 encodes code points in the range [0, 0x10FFFF] as 1 to 4 bytes.
A 1-byte encoding has a top bit of zero (0b0xxx'xxxx) and covers [0, 0x7F].
An N-byte sequence (N = 2 to 4) is denoted by annotating the top N+1 bits of the leading byte (N one bits followed by a zero bit) and then using a 2-bit continuation marker (0b10) on each following byte.
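The bit layout described above can be sketched as follows. This is a minimal illustration, not the pw::utf8::EncodeCodePoint implementation: SketchEncoded and SketchEncode are hypothetical names introduced here, and the sketch only checks the upper bound of the range (it does not reject surrogates).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not pw::utf8::EncodedCodePoint.
struct SketchEncoded {
  std::array<std::uint8_t, 4> bytes{};
  std::size_t size = 0;  // 0 means the code point was above 0x10FFFF.
};

// Hypothetical helper that only illustrates the UTF-8 bit layout.
constexpr SketchEncoded SketchEncode(std::uint32_t cp) {
  SketchEncoded out{};
  if (cp <= 0x7F) {
    // [0x00, 0x7F], 1 byte: 0b0xxx'xxxx
    out.bytes[0] = static_cast<std::uint8_t>(cp);
    out.size = 1;
  } else if (cp <= 0x7FF) {
    // [0x80, 0x7FF], 2 bytes: 0b110x'xxxx 0b10xx'xxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 2;
  } else if (cp <= 0xFFFF) {
    // [0x800, 0xFFFF], 3 bytes: 0b1110'xxxx 0b10xx'xxxx 0b10xx'xxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out.bytes[2] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 3;
  } else if (cp <= 0x10FFFF) {
    // [0x10000, 0x10FFFF], 4 bytes:
    // 0b1111'0xxx 0b10xx'xxxx 0b10xx'xxxx 0b10xx'xxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
    out.bytes[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out.bytes[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 4;
  }
  return out;
}

// U+20AC (euro sign) needs three bytes: E2 82 AC.
static_assert(SketchEncode(0x20AC).size == 3);
static_assert(SketchEncode(0x20AC).bytes[0] == 0xE2);
```

Each continuation byte carries six payload bits, which is why the shifts step down by 6.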
inline constexpr bool pw::utf::IsValidCharacter (uint32_t code_point)
Checks if the code point is a valid character.
Excludes non-characters (U+FDD0..U+FDEF and all code points ending in 0xFFFE or 0xFFFF) from the set of valid code points.
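A check along these lines might look like the sketch below. SketchIsValidCharacter is a hypothetical name, not the library function, and the sketch assumes that "the set of valid code points" means everything outside the surrogate range and at or below 0x10FFFF.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration; not pw::utf::IsValidCharacter.
constexpr bool SketchIsValidCharacter(std::uint32_t cp) {
  // Assumption: start from the valid code point range (no surrogates,
  // nothing above 0x10FFFF).
  if (cp > 0x10FFFF) return false;
  if (cp >= 0xD800 && cp <= 0xDFFF) return false;
  // Non-characters U+FDD0..U+FDEF.
  if (cp >= 0xFDD0 && cp <= 0xFDEF) return false;
  // Non-characters ending in 0xFFFE or 0xFFFF in any plane.
  if ((cp & 0xFFFE) == 0xFFFE) return false;
  return true;
}

static_assert(SketchIsValidCharacter(0x41));
static_assert(!SketchIsValidCharacter(0xFDD0));
static_assert(!SketchIsValidCharacter(0x1FFFE));
```

Masking with 0xFFFE clears the lowest bit, so one comparison covers both the 0xFFFE and 0xFFFF endings.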
inline constexpr bool pw::utf::IsValidCodepoint (uint32_t code_point)
Checks if the code point is in a valid range.
Excludes the surrogate code points ([0xD800, 0xDFFF]) and code points larger than 0x10FFFF (the highest code point allowed). Non-characters and unassigned code points are allowed.
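The range rule described above can be expressed as a short sketch. SketchIsValidCodepoint is a hypothetical name used only for illustration; it is not the library function.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration; not pw::utf::IsValidCodepoint.
constexpr bool SketchIsValidCodepoint(std::uint32_t cp) {
  // Valid: everything below the surrogate block, plus everything from
  // just above it up to and including 0x10FFFF.
  return cp < 0xD800 || (cp >= 0xE000 && cp <= 0x10FFFF);
}

static_assert(SketchIsValidCodepoint(0xD7FF));
static_assert(!SketchIsValidCodepoint(0xD800));
static_assert(SketchIsValidCodepoint(0xFFFF));  // Non-characters pass here.
```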
constexpr pw::Result< utf::CodePointAndSize > pw::utf8::ReadCodePoint (std::string_view str)
Reads the first code point from a UTF-8 encoded str.
This is a very basic decoder, written without much thought for performance. Its validation is equally basic: it checks that the correct number of bytes is available and that each trailing byte of a multibyte sequence carries the continuation marker. See pw::utf8::EncodeCodePoint() for encoding details.
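A decoder with this level of validation could be sketched as below. SketchDecoded and SketchReadCodePoint are hypothetical names, and std::optional stands in for pw::Result so that the example has no Pigweed dependency. Like the decoder described above, the sketch checks only the byte count and the continuation markers; it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type; not pw::utf::CodePointAndSize.
struct SketchDecoded {
  std::uint32_t code_point;
  std::size_t size;  // Number of bytes consumed from the input.
};

// Hypothetical decoder for illustration; not pw::utf8::ReadCodePoint.
constexpr std::optional<SketchDecoded> SketchReadCodePoint(
    std::string_view str) {
  if (str.empty()) return std::nullopt;
  const auto lead = static_cast<std::uint8_t>(str[0]);
  std::size_t size = 0;
  std::uint32_t cp = 0;
  // The leading byte's top bits give the sequence length.
  if ((lead & 0x80) == 0x00) {
    size = 1;
    cp = lead;
  } else if ((lead & 0xE0) == 0xC0) {
    size = 2;
    cp = lead & 0x1Fu;
  } else if ((lead & 0xF0) == 0xE0) {
    size = 3;
    cp = lead & 0x0Fu;
  } else if ((lead & 0xF8) == 0xF0) {
    size = 4;
    cp = lead & 0x07u;
  } else {
    return std::nullopt;  // Stray continuation byte or invalid lead byte.
  }
  // Check that enough bytes are available.
  if (str.size() < size) return std::nullopt;
  // Each trailing byte must look like 0b10xx'xxxx.
  for (std::size_t i = 1; i < size; ++i) {
    const auto byte = static_cast<std::uint8_t>(str[i]);
    if ((byte & 0xC0) != 0x80) return std::nullopt;
    cp = (cp << 6) | (byte & 0x3Fu);
  }
  return SketchDecoded{cp, size};
}

// E2 82 AC decodes to U+20AC and consumes three bytes.
static_assert(SketchReadCodePoint("\xE2\x82\xAC")->code_point == 0x20AC);
static_assert(SketchReadCodePoint("\xE2\x82\xAC")->size == 3);
```

Returning the consumed size alongside the code point lets a caller advance through a string one code point at a time.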