Tokenization customization options.
Macros
#define PW_TOKENIZER_CFG_ARG_TYPES_SIZE_BYTES 4
#define PW_TOKENIZER_CFG_C_HASH_LENGTH 128
#define PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES 52
#define PW_TOKENIZER_NESTED_PREFIX_STR "$"
#define PW_TOKENIZER_CFG_ARG_TYPES_SIZE_BYTES 4
For a tokenized string with arguments, the types of the arguments are encoded in either 4 bytes (uint32_t) or 8 bytes (uint64_t). 4 bytes supports up to 14 tokenized string arguments; 8 bytes supports up to 29 arguments. Using 8 bytes increases code size for 32-bit machines.
Argument types are encoded two bits per argument, in little-endian order. The 4 or 6 least-significant bits, respectively, store the number of arguments, while the remaining bits encode the argument types.
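The layout described above can be sketched for the 4-byte case as follows. This is a minimal illustration, not the library's implementation: the `ArgType` enum, its numeric codes, and the helper names are hypothetical, and the sketch assumes that "little-endian order" means the first argument occupies the lowest type bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical 2-bit type codes; the real values are defined by pw_tokenizer.
enum class ArgType : uint32_t { kInt = 0, kInt64 = 1, kDouble = 2, kString = 3 };

constexpr unsigned kCountBits = 4;   // count field width for the 4-byte encoding
constexpr unsigned kBitsPerArg = 2;  // two bits per argument type
constexpr std::size_t kMaxArgs = (32 - kCountBits) / kBitsPerArg;
static_assert(kMaxArgs == 14, "4 bytes leave room for 14 argument types");

// Stores the argument count in the low bits and each type above it,
// with the first argument in the lowest type bits (assumed ordering).
inline uint32_t PackArgTypes(std::initializer_list<ArgType> types) {
  assert(types.size() <= kMaxArgs);
  uint32_t packed = static_cast<uint32_t>(types.size());
  unsigned shift = kCountBits;
  for (ArgType type : types) {
    packed |= static_cast<uint32_t>(type) << shift;
    shift += kBitsPerArg;
  }
  return packed;
}

// Reads the argument count back out of the low bits.
inline std::size_t ArgCount(uint32_t packed) {
  return packed & ((1u << kCountBits) - 1u);
}

// Reads the type of the argument at the given zero-based index.
inline ArgType ArgTypeAt(uint32_t packed, std::size_t index) {
  return static_cast<ArgType>((packed >> (kCountBits + kBitsPerArg * index)) & 0x3u);
}
```

The same arithmetic with a 64-bit word and a 6-bit count field gives (64 - 6) / 2 = 29 arguments, which matches the limit stated for the 8-byte setting.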
#define PW_TOKENIZER_CFG_C_HASH_LENGTH 128
Maximum number of characters to hash in C. In C code, strings shorter than this length are treated as if they were zero-padded up to the length. Strings that are the same length and share a common prefix longer than this value hash to the same value. Increasing PW_TOKENIZER_CFG_C_HASH_LENGTH increases the compilation time for C due to the complexity of the hashing macros.
PW_TOKENIZER_CFG_C_HASH_LENGTH has no effect on C++ code. In C++, hashing is done with a constexpr function instead of a macro. There are no string length limitations and compilation times are unaffected by this macro.
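To make the effect of a fixed hash length concrete, here is a small constexpr sketch of a 65599-style hash that examines at most a given number of characters. The function name `HashFixedLength` is hypothetical, and the exact mixing steps are an assumption for illustration; the library's own hash may differ in detail. It requires C++17 for `std::string_view`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// 65599-style hash that looks at no more than max_length characters.
// Assumption: the string length seeds the hash and each character is
// weighted by a growing power of 65599. Illustrative only.
constexpr uint32_t HashFixedLength(std::string_view text, std::size_t max_length) {
  uint32_t hash = static_cast<uint32_t>(text.size());
  uint32_t coefficient = 65599u;
  const std::size_t limit = text.size() < max_length ? text.size() : max_length;
  for (std::size_t i = 0; i < limit; ++i) {
    hash += coefficient * static_cast<uint8_t>(text[i]);
    coefficient *= 65599u;
  }
  return hash;
}

// Same length and same first 4 characters: identical hash when only
// 4 characters are examined.
static_assert(HashFixedLength("abcdX", 4) == HashFixedLength("abcdY", 4));

// Raising the limit past the differing character separates the two strings.
static_assert(HashFixedLength("abcdX", 8) != HashFixedLength("abcdY", 8));
```

In this sketch, zero-padding a short string would add terms of the form coefficient * 0, so stopping at the end of the string gives the same result as padding it to the full length.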
Only hash lengths for which there is a corresponding macro header (pw_tokenizer/internal/pw_tokenizer_65599_fixed_length_#_hash_macro.) are supported. Additional macros may be generated with the generate_hash_macro.py script. New macro headers must then be added to pw_tokenizer/internal/tokenize_string.h.
This MUST match the value of DEFAULT_C_HASH_LENGTH in pw_tokenizer/py/pw_tokenizer/tokens.py.
#define PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES 52
PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES is deprecated. It is used as the default value for pw_log_tokenized's PW_LOG_TOKENIZED_ENCODING_BUFFER_SIZE_BYTES. This value should not be configured; set PW_LOG_TOKENIZED_ENCODING_BUFFER_SIZE_BYTES instead.
#define PW_TOKENIZER_NESTED_PREFIX_STR "$"
This character is used to mark the start of all tokenized messages. For consistency, it is recommended to always use $ if possible. If required, a different non-Base64 character may be used as a prefix.
A string version of the character is required for format-string-literal concatenation.
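The need for a string form can be seen in a short sketch. Adjacent string literals are merged at compile time, which works for "$" but not for the character literal '$'. The macro `EXAMPLE_NESTED_PREFIX_STR` and the constants below are hypothetical stand-ins used only for illustration; the sketch requires C++17.

```cpp
#include <string_view>

// Hypothetical stand-in for a string-valued prefix macro.
#define EXAMPLE_NESTED_PREFIX_STR "$"

// Adjacent string literals are concatenated by the compiler, so the prefix
// can be spliced into a format string with no runtime work.
constexpr std::string_view kNestedFormat =
    "value: " EXAMPLE_NESTED_PREFIX_STR "%s";

// The single-character form can still be recovered from the string.
constexpr char kNestedPrefixChar = EXAMPLE_NESTED_PREFIX_STR[0];

static_assert(kNestedFormat == "value: $%s");
static_assert(kNestedPrefixChar == '$');
```

Defining the prefix as a string and deriving the character from it keeps a single source of truth for both uses.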