pw_tokenizer
Compress strings to shrink logs by +75%
Status: Stable. Languages: C++, C11, Python, Rust, TypeScript, Java. Code size impact: 50% reduction in log size.
Logging is critical, but developers are often forced to choose between additional logging and saving crucial flash space. The pw_tokenizer module enables extensive logging with substantially less memory usage by replacing printf-style strings with binary tokens during compilation. It is designed to integrate easily into existing logging systems.
Although the most common application of pw_tokenizer is binary logging, the tokenizer is general purpose and can be used to tokenize any strings, with or without printf-style arguments.
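As a minimal sketch of this general-purpose use (assuming the PW_TOKENIZE_STRING macro and the pw_tokenizer/tokenize.h header from this module; the transport function is hypothetical):

#include <cstddef>
#include <cstdint>

#include "pw_tokenizer/tokenize.h"

// Hypothetical transport used only for illustration.
void TransmitBytes(const uint8_t* data, size_t size_bytes);

// PW_TOKENIZE_STRING hashes the literal at compile time; the full string is
// recorded for the token database rather than kept in the firmware image.
constexpr uint32_t kBootMessageToken = PW_TOKENIZE_STRING("Device booted");

void SendBootEvent() {
  // Send the 4-byte token instead of the string itself.
  TransmitBytes(reinterpret_cast<const uint8_t*>(&kBootMessageToken),
                sizeof(kBootMessageToken));
}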
Why tokenize strings?
- Dramatically reduce binary size by removing string literals from binaries.
- Reduce I/O traffic, RAM, and flash usage by sending and storing compact tokens instead of strings. We’ve seen over 50% reduction in encoded log contents.
- Reduce CPU usage by replacing snprintf calls with simple tokenization code.
- Remove potentially sensitive log, assert, and other strings from binaries.
Tokenized logging in action
Here’s an example of how pw_tokenizer enables you to store and send the same logging information using significantly fewer resources:
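As a rough illustration (the wrapper function below is assumed, not taken from this page), the plain-text version of the log statement discussed next could look like this; with the pw_log_tokenized backend, the same call emits a token plus encoded arguments instead of the formatted string:

#include "pw_log/log.h"

void LogBatteryVoltage(int voltage) {
  // With a plain-text backend, the full format string is stored in flash and
  // sent over the wire. With pw_log_tokenized, only a 4-byte token and the
  // varint-encoded argument are emitted.
  PW_LOG_INFO("Battery Voltage: %d mV", voltage);
}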
A quick overview of how the tokenized version works:
1. You tokenize "Battery Voltage: %d mV" with a macro like PW_TOKENIZE_STRING. You can use pw_log_tokenized to handle the tokenization automatically.
2. After tokenization, "Battery Voltage: %d mV" becomes d9 28 47 8e.
3. The first 4 bytes sent over the wire are the tokenized version of "Battery Voltage: %d mV". The last 2 bytes are the value of voltage converted to a varint using pw_varint.
4. The logs are converted back to the original, human-readable message via the Detokenization API and a token database.
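To make the encoding step concrete, here is a hedged sketch that tokenizes the same message directly into a buffer; it assumes the PW_TOKENIZE_TO_BUFFER macro from pw_tokenizer, and the byte values mentioned in the comments are the illustrative ones from the walkthrough above, not guaranteed output:

#include <cstddef>
#include <cstdint>

#include "pw_tokenizer/tokenize.h"

// Hypothetical transport used only for illustration.
void SendOverWire(const uint8_t* data, size_t size_bytes);

void EncodeBatteryMessage(int voltage) {
  uint8_t buffer[32];
  size_t size_bytes = sizeof(buffer);

  // Writes the 4-byte token for "Battery Voltage: %d mV" followed by the
  // argument encoded as a varint (e.g. d9 28 47 8e plus 2 bytes for voltage,
  // as in the walkthrough above). size_bytes is updated to the encoded size.
  PW_TOKENIZE_TO_BUFFER(buffer, &size_bytes, "Battery Voltage: %d mV", voltage);

  // On the host, the Detokenization API plus a token database turn these
  // bytes back into the original human-readable message.
  SendOverWire(buffer, size_bytes);
}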