Stable · C11 · C++14 · Python · TypeScript · Code Size Impact: 50% reduction in binary log size
Logging is critical, but developers are often forced to choose between adding
more logging and saving crucial flash space. The pw_tokenizer module helps
address this by replacing printf-style strings with binary tokens during
compilation. This enables extensive logging with substantially less memory
usage.
Note
This usage of the term “tokenizer” is not related to parsing! The module is called tokenizer because it replaces a whole string literal with an integer token. It does not parse strings into separate tokens.
The most common application of pw_tokenizer is binary logging, and the module
is designed to integrate easily into existing logging systems. However, the
tokenizer is general purpose and can be used to tokenize any strings, with or
without printf-style arguments.
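The general idea (a whole string becomes one integer) fits in a few lines of Python. This is a minimal sketch under stated assumptions, not pw_tokenizer's API: `tokenize` and `build_database` are hypothetical names, and the hash is an assumed 65599-style polynomial hash used only for illustration, with no claim that it reproduces the module's real token values. Because the token is derived from the string alone, a tool can compute the same token-to-string mapping ahead of time and use it later to recover the text.

```python
# Hypothetical sketch: assign each string a 32-bit token and keep a
# token -> string database for decoding. The 65599-style polynomial hash is an
# illustrative assumption, not a guarantee of pw_tokenizer's real token values.

HASH_MULTIPLIER = 65599
MASK_32 = 0xFFFFFFFF


def tokenize(string: str) -> int:
    """Return a 32-bit token for a string using an illustrative hash."""
    data = string.encode("utf-8")
    hash_value = len(data) & MASK_32  # Seed the hash with the byte length.
    coefficient = HASH_MULTIPLIER
    for byte in data:
        hash_value = (hash_value + coefficient * byte) & MASK_32
        coefficient = (coefficient * HASH_MULTIPLIER) & MASK_32
    return hash_value


def build_database(strings):
    """Map token -> string. A real tool would also need to detect collisions."""
    return {tokenize(s): s for s in strings}


database = build_database(["Hello, world!", "Connection lost"])
for token, text in database.items():
    print(f"0x{token:08x} -> {text!r}")
```

Only the small integer has to be stored or sent; the database of full strings can live off-device.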
Why tokenize strings?
Dramatically reduce binary size by removing string literals from binaries.
Reduce I/O traffic, RAM, and flash usage by sending and storing compact tokens instead of strings. We’ve seen over 50% reduction in encoded log contents.
Reduce CPU usage by replacing snprintf calls with simple tokenization code.
Remove potentially sensitive log, assert, and other strings from binaries.
See Design for a more detailed explanation of how pw_tokenizer works.
Example: tokenized logging
This example demonstrates using pw_tokenizer for logging. In this example,
tokenized logging saves ~90% in binary size (41 → 4 bytes) and ~70% in encoded
size (49 → 15 bytes).
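The percentages follow directly from the byte counts in the tables that follow. A quick check in Python (the `savings` helper is a hypothetical name used only for this illustration):

```python
def savings(before_bytes: int, after_bytes: int) -> float:
    """Percentage of bytes saved when going from before_bytes to after_bytes."""
    return 100.0 * (before_bytes - after_bytes) / before_bytes


binary = savings(41, 4)    # String literal in the binary vs. a 4-byte token.
encoded = savings(49, 15)  # Transmitted plain text vs. transmitted tokenized message.
print(f"binary: {binary:.1f}% saved, encoded: {encoded:.1f}% saved")
# binary: 90.2% saved, encoded: 69.4% saved
```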
Before: plain text logging
| Location | Logging Content | Size in bytes |
|---|---|---|
| Source contains | | |
| Binary contains | | 41 |
| (log statement is called with | | |
| Device transmits | | 49 |
| When viewed | | |
After: tokenized logging
| Location | Logging Content | Size in bytes |
|---|---|---|
| Source contains | | |
| Binary contains | | 4 |
| (log statement is called with | | |
| Device transmits | | 15 |
| When viewed | | |
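To make the size column more concrete, the following Python sketch packs a token and its arguments into a compact message on the "device" side and reconstructs the text on the "host" side. It is a minimal illustration under stated assumptions, not pw_tokenizer's wire format or API: the 4-byte little-endian token, ZigZag varint integers, and 1-byte length-prefixed strings are assumptions, and `encode_message`, `detokenize`, the token value, and the format string are all hypothetical.

```python
import re
import struct

# Hypothetical message layout: 4-byte little-endian token, then each argument.
# Integers: ZigZag-encoded varints (assumes values fit in a signed 64-bit int).
# Strings: 1-byte length prefix followed by UTF-8 bytes (assumes <= 127 bytes).


def zigzag(value: int) -> int:
    """Map signed to unsigned so small magnitudes encode in few bytes."""
    return (value << 1) ^ (value >> 63)


def unzigzag(value: int) -> int:
    """Invert zigzag()."""
    return (value >> 1) ^ -(value & 1)


def varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # High bit set: more bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_message(token: int, *args) -> bytes:
    """Device side: pack the token and raw arguments; no text formatting."""
    out = bytearray(struct.pack("<I", token))
    for arg in args:
        if isinstance(arg, int):
            out += varint(zigzag(arg))
        elif isinstance(arg, str):
            data = arg.encode("utf-8")
            if len(data) > 0x7F:
                raise ValueError("string argument too long for this sketch")
            out.append(len(data))
            out += data
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return bytes(out)


def detokenize(message: bytes, database: dict) -> str:
    """Host side: look up the format string and substitute %s and %d in order."""
    (token,) = struct.unpack_from("<I", message, 0)
    fmt = database[token]
    pos = 4

    def substitute(match):
        nonlocal pos
        if match.group(0) == "%d":
            raw, pos = read_varint(message, pos)
            return str(unzigzag(raw))
        length = message[pos]
        text = message[pos + 1 : pos + 1 + length].decode("utf-8")
        pos += 1 + length
        return text

    return re.sub(r"%[sd]", substitute, fmt)


TOKEN = 0x12345678                      # Hypothetical token value.
DATABASE = {TOKEN: "Sensor %s reads %d"}  # Hypothetical format string.

message = encode_message(TOKEN, "temp", -40)
print(len(message))                    # 10
print(detokenize(message, DATABASE))   # Sensor temp reads -40
```

Under these assumptions, the cost of a message depends on the token and argument bytes rather than on the length of the format string, which is why the tokenized "Device transmits" row is smaller than the plain text one. The host needs the token database to turn messages back into readable text.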