Overview

Tokenization customization options.

Macros
#define PW_TOKENIZER_CFG_ARG_TYPES_SIZE_BYTES 4

#define PW_TOKENIZER_CFG_C_HASH_LENGTH 128

#define PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES 52

#define PW_TOKENIZER_NESTED_PREFIX_STR "$"

Macro Definition Documentation

◆ PW_TOKENIZER_CFG_ARG_TYPES_SIZE_BYTES

#define PW_TOKENIZER_CFG_ARG_TYPES_SIZE_BYTES 4

For a tokenized string with arguments, the types of the arguments are encoded in either 4 bytes (uint32_t) or 8 bytes (uint64_t). 4 bytes supports up to 14 tokenized string arguments; 8 bytes supports up to 29 arguments. Using 8 bytes increases code size for 32-bit machines.

Argument types are encoded two bits per argument, in little-endian order. The 4 least-significant bits (4-byte encoding) or 6 least-significant bits (8-byte encoding) store the number of arguments, while the remaining bits encode the argument types.
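The layout for the 4-byte case can be sketched as follows. This is an illustration only, not the pw_tokenizer implementation: the names ArgType, PackArgTypes, ArgCount and ArgTypeAt and the numeric type codes are hypothetical, and the sketch assumes that "little-endian order" means the first argument occupies the lowest pair of type bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 2-bit type codes, chosen for illustration only.
enum ArgType : uint32_t { kInt = 0, kInt64 = 1, kDouble = 2, kString = 3 };

// 4-byte encoding: 4 count bits, leaving 28 bits for 2-bit type fields.
constexpr int kCountBits = 4;
constexpr size_t kMaxArgs = (32 - kCountBits) / 2;  // 14 arguments

// Packs the argument count into the low bits and each type into the
// following 2-bit fields. The caller must ensure count <= kMaxArgs.
inline uint32_t PackArgTypes(const ArgType* types, size_t count) {
  uint32_t packed = static_cast<uint32_t>(count);
  for (size_t i = 0; i < count; ++i) {
    packed |= static_cast<uint32_t>(types[i]) << (kCountBits + 2 * i);
  }
  return packed;
}

// Reads the argument count back from the low kCountBits bits.
inline size_t ArgCount(uint32_t packed) {
  return packed & ((1u << kCountBits) - 1u);
}

// Reads the 2-bit type code of the argument at the given index.
inline ArgType ArgTypeAt(uint32_t packed, size_t index) {
  return static_cast<ArgType>((packed >> (kCountBits + 2 * index)) & 0x3u);
}
```

Under these assumptions, packing the three types kInt, kString, kDouble gives a count field of 3 followed by the fields 0, 3 and 2. The 8-byte variant would use a uint64_t with 6 count bits, leaving 58 bits for 29 arguments.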

◆ PW_TOKENIZER_CFG_C_HASH_LENGTH

#define PW_TOKENIZER_CFG_C_HASH_LENGTH 128

Maximum number of characters to hash in C. In C code, strings shorter than this length are treated as if they were zero-padded up to the length. Strings that are the same length and share a common prefix at least this long hash to the same value. Increasing PW_TOKENIZER_CFG_C_HASH_LENGTH increases compilation time for C because of the complexity of the hashing macros.
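The collision behavior described above can be illustrated with a small sketch. It assumes a hash that starts from the string length and adds each of the first hash_length characters multiplied by increasing powers of 65599. That algorithm is an assumption made for illustration and is not stated on this page; FixedLengthHashSketch is a hypothetical name, not a pw_tokenizer function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative fixed-length hash: only the first hash_length characters
// contribute, but the full string length seeds the result.
inline uint32_t FixedLengthHashSketch(const std::string& s,
                                      size_t hash_length) {
  uint32_t hash = static_cast<uint32_t>(s.size());
  uint32_t coefficient = 65599u;
  const size_t n = s.size() < hash_length ? s.size() : hash_length;
  for (size_t i = 0; i < n; ++i) {
    // Unsigned arithmetic wraps modulo 2^32, which is intended here.
    hash += coefficient * static_cast<uint8_t>(s[i]);
    coefficient *= 65599u;
  }
  return hash;
}
```

With hash_length set to 4, "abcdX" and "abcdY" produce the same value in this sketch because they have the same length and differ only after the fourth character, while "abcd" and "abcdX" differ because their lengths differ. With hash_length set to 128, all five characters are hashed and "abcdX" and "abcdY" no longer collide.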

PW_TOKENIZER_CFG_C_HASH_LENGTH has no effect on C++ code. In C++, hashing is done with a constexpr function instead of a macro. There are no string length limitations and compilation times are unaffected by this macro.

Only hash lengths for which there is a corresponding macro header (pw_tokenizer/internal/pw_tokenizer_65599_fixed_length_#_hash_macro.) are supported. Additional macros may be generated with the generate_hash_macro.py script. New macro headers must then be added to pw_tokenizer/internal/tokenize_string.h.

This MUST match the value of DEFAULT_C_HASH_LENGTH in pw_tokenizer/py/pw_tokenizer/tokens.py.

◆ PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES

#define PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES 52

PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES is deprecated. It is used as the default value for pw_log_tokenized's PW_LOG_TOKENIZED_ENCODING_BUFFER_SIZE_BYTES. This value should not be configured; set PW_LOG_TOKENIZED_ENCODING_BUFFER_SIZE_BYTES instead.

◆ PW_TOKENIZER_NESTED_PREFIX_STR

#define PW_TOKENIZER_NESTED_PREFIX_STR "$"

This character is used to mark the start of all tokenized messages. For consistency, use $ whenever possible. If required, a different non-Base64 character may be used as the prefix.
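A compile-time check along the following lines could guard against choosing a prefix that collides with Base64 data. It is a sketch only: IsBase64Char and IsValidPrefix are hypothetical helpers, not part of pw_tokenizer, and the check assumes the standard Base64 alphabet (A-Z, a-z, 0-9, +, /) plus the = padding character. A project using a URL-safe alphabet would also need to reject - and _.

```cpp
// Returns true for characters of the standard Base64 alphabet and padding.
constexpr bool IsBase64Char(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
}

// A prefix is accepted here only if it is exactly one character long and
// that character cannot appear in Base64 output.
constexpr bool IsValidPrefix(const char* s) {
  return s[0] != '\0' && s[1] == '\0' && !IsBase64Char(s[0]);
}

static_assert(IsValidPrefix("$"), "the default prefix must be accepted");
static_assert(!IsValidPrefix("A"), "Base64 characters must be rejected");
```

In a real build, the argument to the static_assert would be the configured PW_TOKENIZER_NESTED_PREFIX_STR value rather than a literal.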