Basic helpers for reading and writing UTF-8-encoded strings.
Classes
class | pw::utf::CodePointAndSize
class | pw::utf8::EncodedCodePoint
Encapsulates the result of encoding a single code point as UTF-8.
Functions
constexpr bool | pw::utf::IsValidCodepoint (uint32_t code_point)
constexpr bool | pw::utf::IsValidCharacter (uint32_t code_point)
constexpr pw::Result< utf::CodePointAndSize > | pw::utf8::ReadCodePoint (std::string_view str)
Reads the first code point from a UTF-8 encoded str.
constexpr bool | pw::utf8::IsStringValid (std::string_view str)
Determines if str is a valid UTF-8 string.
constexpr Result< EncodedCodePoint > | pw::utf8::EncodeCodePoint (uint32_t code_point)
Encodes a single code point as UTF-8.
Status | pw::utf8::WriteCodePoint (uint32_t code_point, pw::StringBuilder &output)
Helper that writes a code point to the provided pw::StringBuilder.
constexpr Result< EncodedCodePoint > pw::utf8::EncodeCodePoint (uint32_t code_point)
Encodes a single code point as UTF-8.
UTF-8 encodes code points in the range [0, 0x10FFFF] as 1-4 bytes.
A 1-byte encoding has a top bit of zero (0xxxxxxx).
N-byte sequences are denoted by setting the top N bits of the leading byte, followed by a zero bit (N+1 marker bits in total), and by starting each following byte with the 2-bit continuation marker 10.
Returns:
OK: The code point encoded as UTF-8.
OUT_OF_RANGE: The code point was not in the valid range for UTF-8 encoding.
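The bit layout described above can be illustrated with a minimal standalone sketch. It does not use the Pigweed API: `Utf8BytesSketch` and `EncodeUtf8Sketch` are hypothetical names, and reporting an out-of-range input as a size of 0 is an assumption made for brevity in place of a `pw::Result`. The sketch checks only the upper bound of the range.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not pw::utf8::EncodedCodePoint.
struct Utf8BytesSketch {
  std::array<std::uint8_t, 4> bytes{};
  std::size_t size = 0;  // 0 means the code point was above 0x10FFFF.
};

// Encodes one code point using the standard UTF-8 bit layout.
constexpr Utf8BytesSketch EncodeUtf8Sketch(std::uint32_t cp) {
  Utf8BytesSketch out;
  if (cp <= 0x7F) {
    // 1 byte: 0xxxxxxx
    out.bytes[0] = static_cast<std::uint8_t>(cp);
    out.size = 1;
  } else if (cp <= 0x7FF) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 2;
  } else if (cp <= 0xFFFF) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out.bytes[2] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 3;
  } else if (cp <= 0x10FFFF) {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out.bytes[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
    out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
    out.bytes[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out.bytes[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
    out.size = 4;
  }
  return out;  // size stays 0 when cp is out of range.
}
```

For example, U+20AC (the euro sign) falls in the 3-byte range and encodes as the bytes E2 82 AC.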
inline constexpr bool pw::utf::IsValidCharacter (uint32_t code_point)
Checks if the code point is a valid character.
Excludes non-characters (U+FDD0..U+FDEF, and all code points ending in 0xFFFE or 0xFFFF) from the set of valid code points.
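The non-character rule can be expressed as a small predicate. This is a sketch, not the Pigweed implementation: `IsNoncharacterSketch` and `IsValidCharacterSketch` are hypothetical names, and the sketch assumes that a valid character must also lie outside the surrogate block and at or below 0x10FFFF.

```cpp
#include <cstdint>

// True for U+FDD0..U+FDEF and for any code point whose low 16 bits are
// 0xFFFE or 0xFFFF (the last two code points of every plane).
constexpr bool IsNoncharacterSketch(std::uint32_t cp) {
  const bool in_fdd0_block = cp >= 0xFDD0 && cp <= 0xFDEF;
  const bool ends_in_fffe_or_ffff = (cp & 0xFFFE) == 0xFFFE;
  return in_fdd0_block || ends_in_fffe_or_ffff;
}

// Assumption for this sketch: valid characters are also outside the
// surrogate block [0xD800, 0xDFFF] and no larger than 0x10FFFF.
constexpr bool IsValidCharacterSketch(std::uint32_t cp) {
  const bool in_range = cp < 0xD800 || (cp >= 0xE000 && cp <= 0x10FFFF);
  return in_range && !IsNoncharacterSketch(cp);
}
```

The mask `0xFFFE` clears the lowest bit, so a single comparison covers both the `0xFFFE` and `0xFFFF` endings.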
inline constexpr bool pw::utf::IsValidCodepoint (uint32_t code_point)
Checks if the code point is in a valid range.
Excludes the surrogate code points ([0xD800, 0xDFFF]) and code points larger than 0x10FFFF (the highest code point allowed). Non-characters and unassigned code points are allowed.
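A minimal sketch of the range check described above, written as a standalone function. `IsValidCodepointSketch` is a hypothetical name used only for illustration; the actual implementation may be structured differently.

```cpp
#include <cstdint>

// Accepts everything below the surrogate block, and everything from the end
// of the surrogate block up to and including 0x10FFFF.
constexpr bool IsValidCodepointSketch(std::uint32_t cp) {
  return cp < 0xD800 || (cp >= 0xE000 && cp <= 0x10FFFF);
}
```

Under this check a non-character such as 0xFFFF is accepted, while 0xD800 and 0x110000 are rejected.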
constexpr pw::Result< utf::CodePointAndSize > pw::utf8::ReadCodePoint (std::string_view str)
Reads the first code point from a UTF-8 encoded str.
This is a very basic decoder that is not optimized for performance. Its validation is limited to checking that the correct number of bytes is available and that each following byte of a multibyte sequence carries the continuation marker. See pw::utf8::EncodeCodePoint() for encoding details.
Returns:
OK: The decoded code point and the number of bytes read.
INVALID_ARGUMENT: The string was empty or malformed.
OUT_OF_RANGE: The decoded code point was not in the valid range.
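The decoding steps described above (determine the sequence length from the leading byte, check that enough bytes are available, check each continuation marker) can be sketched as follows. This is an illustration rather than the Pigweed code: `DecodeResultSketch` and `ReadCodePointSketch` are hypothetical names, every failure is collapsed into a size of 0 instead of distinct status codes, and, like the basic decoder described, the sketch does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical result type for this sketch; not pw::utf::CodePointAndSize.
struct DecodeResultSketch {
  std::uint32_t code_point = 0;
  std::size_t size = 0;  // Number of bytes read; 0 signals any failure.
};

constexpr DecodeResultSketch ReadCodePointSketch(std::string_view str) {
  if (str.empty()) return {};
  const auto lead = static_cast<std::uint8_t>(str[0]);

  // The leading byte's top bits give the sequence length and payload bits.
  std::size_t size = 0;
  std::uint32_t cp = 0;
  if ((lead & 0x80) == 0x00) {
    size = 1;
    cp = lead;
  } else if ((lead & 0xE0) == 0xC0) {
    size = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    size = 3;
    cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    size = 4;
    cp = lead & 0x07;
  } else {
    return {};  // Stray continuation byte or invalid leading byte.
  }

  if (str.size() < size) return {};  // Truncated sequence.

  // Each following byte must start with the 2-bit marker 10.
  for (std::size_t i = 1; i < size; ++i) {
    const auto byte = static_cast<std::uint8_t>(str[i]);
    if ((byte & 0xC0) != 0x80) return {};
    cp = (cp << 6) | (byte & 0x3F);
  }

  // Range check: no surrogates, nothing above 0x10FFFF.
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return {};
  return {cp, size};
}
```

Because the result carries the number of bytes consumed, a caller can decode a whole string by repeatedly calling the function and advancing with `str.remove_prefix(result.size)` until the view is empty or a size of 0 is returned.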