Guides#

Getting started#

Integrating pw_tokenizer requires a few steps beyond building the code. This section describes one way pw_tokenizer might be integrated with a project. These steps can be adapted as needed.

  1. Add pw_tokenizer to your build. Build files for GN, CMake, and Bazel are provided. For Make or other build systems, add the files specified in the BUILD.gn’s pw_tokenizer target to the build.

  2. Use the tokenization macros in your code. See Tokenization.

  3. Add the contents of pw_tokenizer_linker_sections.ld to your project’s linker script. In GN and CMake, this step is done automatically.

  4. Compile your code to produce an ELF file.

  5. Run database.py create on the ELF file to generate a CSV token database. See Managing token databases.

  6. Commit the token database to your repository. See notes in Database management.

  7. Integrate a database.py add command to your build to automatically update the committed token database. In GN, use the pw_tokenizer_database template to do this. See Update a database.

  8. Integrate detokenize.py or the C++ detokenization library with your tools to decode tokenized logs. See Detokenization.
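
As a concrete illustration of steps 5 and 7, the database commands might be invoked as follows. The file names here are hypothetical; the commands match those described in Managing token databases.

./database.py create --database tokens.csv out/my_image.elf
./database.py add --database tokens.csv out/my_image.elf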

Using with Zephyr#

When building pw_tokenizer with Zephyr, three Kconfig options are currently available:

  • CONFIG_PIGWEED_TOKENIZER will automatically link pw_tokenizer as well as any dependencies.

  • CONFIG_PIGWEED_TOKENIZER_BASE64 will automatically link pw_tokenizer.base64 as well as any dependencies.

  • CONFIG_PIGWEED_DETOKENIZER will automatically link pw_tokenizer.decoder as well as any dependencies.
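
For example, a project that only needs tokenization might enable it in its prj.conf (a minimal sketch; enable the other options as needed):

CONFIG_PIGWEED_TOKENIZER=y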

Once enabled, the tokenizer headers can be included like any Zephyr headers:

#include <pw_tokenizer/tokenize.h>

Note

Zephyr handles the additional linker sections via pw_tokenizer_zephyr.ld, which is appended to the end of the linker script by a call to zephyr_linker_sources(SECTIONS ...).

Tokenization guides#

Tokenize a message with arguments in a custom macro#

The following example implements a custom tokenization macro similar to pw_log_tokenized.

#include "pw_tokenizer/tokenize.h"

#ifdef __cplusplus
extern "C" {
#endif

void EncodeTokenizedMessage(uint32_t metadata,
                            pw_tokenizer_Token token,
                            pw_tokenizer_ArgTypes types,
                            ...);

#ifdef __cplusplus
}  // extern "C"
#endif

#define PW_LOG_TOKENIZED_ENCODE_MESSAGE(metadata, format, ...)         \
  do {                                                                 \
    PW_TOKENIZE_FORMAT_STRING(                                         \
        PW_TOKENIZER_DEFAULT_DOMAIN, UINT32_MAX, format, __VA_ARGS__); \
    EncodeTokenizedMessage(metadata,                                   \
                           _pw_tokenizer_token,                        \
                           PW_TOKENIZER_ARG_TYPES(__VA_ARGS__)         \
                               PW_COMMA_ARGS(__VA_ARGS__));            \
  } while (0)

In this example, the EncodeTokenizedMessage function would handle encoding and processing the message. Encoding is done by the pw::tokenizer::EncodedMessage class or pw::tokenizer::EncodeArgs() function from pw_tokenizer/encode_args.h. The encoded message can then be transmitted or stored as needed.

#include "pw_log_tokenized/log_tokenized.h"
#include "pw_tokenizer/encode_args.h"

void HandleTokenizedMessage(pw::log_tokenized::Metadata metadata,
                            pw::span<std::byte> message);

extern "C" void EncodeTokenizedMessage(const uint32_t metadata,
                                       const pw_tokenizer_Token token,
                                       const pw_tokenizer_ArgTypes types,

Base64 guides#

See Base64 format for a conceptual overview of Base64.

Encoding Base64#

To encode with the Base64 format, add a call to pw::tokenizer::PrefixedBase64Encode or pw_tokenizer_PrefixedBase64Encode in the tokenizer handler function. For example,

void TokenizedMessageHandler(const uint8_t encoded_message[],
                             size_t size_bytes) {
  pw::InlineBasicString base64 = pw::tokenizer::PrefixedBase64Encode(
      pw::span(encoded_message, size_bytes));

  TransmitLogMessage(base64.data(), base64.size());
}

Decoding Base64#

The Python Detokenizer class supports decoding and detokenizing prefixed Base64 messages with detokenize_base64 and related methods.
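
For example, a minimal sketch (the database path is hypothetical; detokenize_base64 replaces recognized prefixed Base64 messages in the input and returns the result):

import pw_tokenizer

detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv')

# Detokenizes any prefixed Base64 messages embedded in the input.
print(detokenizer.detokenize_base64(b'Nested message: $RhYjmQ=='))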

Tip

The Python detokenization tools support recursive detokenization for prefixed Base64 text. Tokenized strings found in detokenized text are detokenized, so prefixed Base64 messages can be passed as %s arguments.

For example, the tokenized string for “Wow!” is $RhYjmQ==. This could be passed as an argument to the printf-style string Nested message: %s, which encodes to $pEVTYQkkUmhZam1RPT0=. The detokenizer would decode the message as follows:

"$pEVTYQkkUmhZam1RPT0=" → "Nested message: $RhYjmQ==" → "Nested message: Wow!"

Base64 decoding is supported in C++ or C with the pw::tokenizer::PrefixedBase64Decode or pw_tokenizer_PrefixedBase64Decode functions.

Investigating undecoded messages#

Tokenized messages cannot be decoded if the token is not recognized. The Python package includes the parse_message tool, which parses tokenized Base64 messages without looking up the token in a database. This tool attempts to guess the types of the arguments and displays potential ways to decode them.

This tool can be used to extract argument information from an otherwise unusable message. It could help identify which statement in the code produced the message. This tool is not particularly helpful for tokenized messages without arguments, since all it can do is show the value of the unknown token.

The tool is executed by passing Base64 tokenized messages, with or without the $ prefix, to pw_tokenizer.parse_message. Pass -h or --help to see full usage information.

Example#

$ python -m pw_tokenizer.parse_message '$329JMwA=' koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw== --specs %s %d

INF Decoding arguments for '$329JMwA='
INF Binary: b'\xdfoI3\x00' [df 6f 49 33 00] (5 bytes)
INF Token:  0x33496fdf
INF Args:   b'\x00' [00] (1 bytes)
INF Decoding with up to 8 %s or %d arguments
INF   Attempt 1: [%s]
INF   Attempt 2: [%d] 0

INF Decoding arguments for '$koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw=='
INF Binary: b'\x92\x84\xa5\xe7n\x13FAILED_PRECONDITION\x02OK' [92 84 a5 e7 6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (28 bytes)
INF Token:  0xe7a58492
INF Args:   b'n\x13FAILED_PRECONDITION\x02OK' [6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (24 bytes)
INF Decoding with up to 8 %s or %d arguments
INF   Attempt 1: [%d %s %d %d %d] 55 FAILED_PRECONDITION 1 -40 -38
INF   Attempt 2: [%d %s %s] 55 FAILED_PRECONDITION OK

Detokenizing command line utilities#

See Detokenizing command line utilities.

Smaller tokens with masking#

Pigweed AI summary: The "pw_tokenizer" module allows users to use smaller tokens by providing a mask to apply to the token. This helps reduce memory usage when tokens are packed into data structures or stored in arrays. The masked token is not a masked version of the full 32-bit token, but rather the actual token itself. This makes it easy to decode tokens that use fewer than 32 bits. The module provides masking functionality through the "*_MASK" versions of the macros. Tokens are hashes, so there

pw_tokenizer uses 32-bit tokens. On 32-bit or 64-bit architectures, using fewer than 32 bits does not improve runtime or code size efficiency. However, when tokens are packed into data structures or stored in arrays, the size of the token directly affects memory usage. In those cases, every bit counts, and it may be desirable to use fewer bits for the token.

pw_tokenizer allows users to provide a mask to apply to the token. This masked token is used in both the token database and the code. The masked token is not a masked version of the full 32-bit token; the masked token is the token. This makes it trivial to decode tokens that use fewer than 32 bits.

Masking functionality is provided through the *_MASK versions of the macros. For example, the following generates 16-bit tokens and packs them into an existing value.

constexpr uint32_t token = PW_TOKENIZE_STRING_MASK("domain", 0xFFFF, "Pigweed!");
uint32_t packed_word = (other_bits << 16) | token;

Tokens are hashes, so tokens of any size have a collision risk. The fewer bits used for tokens, the more likely two strings are to hash to the same token. See Token collisions.

Masked tokens without arguments may be encoded in fewer bytes. For example, the 16-bit token 0x1234 may be encoded as two little-endian bytes (34 12) rather than four (34 12 00 00). The detokenizer tools zero-pad data smaller than four bytes. Tokens with arguments must always be encoded as four bytes.

Keep tokens from different sources separate with tokenization domains#

pw_tokenizer supports having multiple tokenization domains. A domain is a string label associated with each tokenized string. Domains allow projects to keep tokens from different sources separate. Potential use cases include the following:

  • Keep large sets of tokenized strings separate to avoid collisions.

  • Create a separate database for a small number of strings that use truncated tokens, for example only 10 or 16 bits instead of the full 32 bits.

If no domain is specified, the domain is empty (""). For many projects, this default domain is sufficient, so no additional configuration is required.

// Tokenizes this string to the default ("") domain.
PW_TOKENIZE_STRING("Hello, world!");

// Tokenizes this string to the "my_custom_domain" domain.
PW_TOKENIZE_STRING_DOMAIN("my_custom_domain", "Hello, world!");

The database and detokenization command line tools default to reading from the default domain. The domain may be specified for ELF files by appending #DOMAIN_NAME to the file path. Use #.* to read from all domains. For example, the following reads strings in some_domain from my_image.elf.

./database.py create --database my_db.csv path/to/my_image.elf#some_domain

See Managing token databases for information about the database.py command line tool.

Managing token databases#

See Token databases for a conceptual overview of token databases.

Token databases are managed with the database.py script. This script can be used to extract tokens from compilation artifacts and manage database files. Invoke database.py with -h for full usage information.

An example ELF file with tokenized logs is provided at pw_tokenizer/py/example_binary_with_tokenized_strings.elf. You can use that file to experiment with the database.py commands.

Create a database#

Pigweed AI summary: The "create" command allows users to create a new token database from various file formats such as ELF files, archives, existing token databases, or a JSON file. The command syntax is "./database.py create --database DATABASE_NAME ELF_OR_DATABASE_FILE...". The command supports two output formats: CSV and binary. By default, it generates a CSV database, but users can specify "--type binary" to create a binary database instead. CSV databases are useful for source control and human review, while binary databases

The create command makes a new token database from ELF files (.elf, .o, .so, etc.), archives (.a), existing token databases (CSV or binary), or a JSON file containing an array of strings.

./database.py create --database DATABASE_NAME ELF_OR_DATABASE_FILE...

Two database output formats are supported: CSV and binary. Pass --type binary to the create command to generate a binary database instead of the default CSV. CSV databases are well suited to checking into source control and to human review. Binary databases are more compact and simpler to parse. The C++ detokenizer library currently only supports binary databases.
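
For example, the following would create a binary database instead of a CSV one (file names here are hypothetical):

./database.py create --type binary --database tokens.bin out/my_image.elf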

Update a database#

As new tokenized strings are added, update the database with the add command.

./database.py add --database DATABASE_NAME ELF_OR_DATABASE_FILE...

This command adds new tokens from ELF files or other databases to the database. Adding tokens already present in the database updates the date removed, if any, to the latest.

A CSV token database can be checked into a source repository and updated as code changes are made. The build system can invoke database.py to update the database after each build.

GN integration#

Token databases may be updated or created as part of a GN build. The pw_tokenizer_database template provided by $dir_pw_tokenizer/database.gni automatically updates an in-source tokenized strings database or creates a new database with artifacts from one or more GN targets or other database files.

To create a new database, set the create variable to the desired database type ("csv" or "binary"). The database will be created in the output directory. To update an existing database, provide the path to the database with the database variable.

import("//build_overrides/pigweed.gni")

import("$dir_pw_tokenizer/database.gni")

pw_tokenizer_database("my_database") {
  database = "database_in_the_source_tree.csv"
  targets = [ "//firmware/image:foo(//targets/my_board:some_toolchain)" ]
  input_databases = [ "other_database.csv" ]
}

Instead of specifying GN targets, paths or globs to output files may be provided with the paths option.

pw_tokenizer_database("my_database") {
  database = "database_in_the_source_tree.csv"
  deps = [ ":apps" ]
  optional_paths = [ "$root_build_dir/**/*.elf" ]
}

Note

The paths and optional_targets arguments do not add anything to deps, so there is no guarantee that the referenced artifacts will exist when the database is updated. Provide targets or deps, or build other GN targets first, if this is a concern.

CMake integration#

Token databases may be updated or created as part of a CMake build. The pw_tokenizer_database template provided by $dir_pw_tokenizer/database.cmake automatically updates an in-source tokenized strings database or creates a new database with artifacts from a CMake target.

To create a new database, set the CREATE variable to the desired database type ("csv" or "binary"). The database will be created in the output directory.

include("$dir_pw_tokenizer/database.cmake")

pw_tokenizer_database("my_database") {
  CREATE binary
  TARGET my_target.ext
  DEPS ${deps_list}
}

To update an existing database, provide the path to the database with the DATABASE variable.

pw_tokenizer_database("my_database") {
  DATABASE database_in_the_source_tree.csv
  TARGET my_target.ext
  DEPS ${deps_list}
}

Working with token collisions#

See Token collisions for a conceptual overview of token collisions.

Collisions may occur occasionally. Run the command python -m pw_tokenizer.database report <database> to see information about a token database, including any collisions.

If there are collisions, take the following steps to resolve them.

  • Change one of the colliding strings slightly to give it a new token.

  • In C (not C++), artificial collisions may occur if strings longer than PW_TOKENIZER_CFG_C_HASH_LENGTH are hashed. If this is happening, consider setting PW_TOKENIZER_CFG_C_HASH_LENGTH to a larger value. See pw_tokenizer/public/pw_tokenizer/config.h.

  • Run the mark_removed command with the latest version of the build artifacts to mark missing strings as removed. This deprioritizes them in collision resolution.

    python -m pw_tokenizer.database mark_removed --database <database> <ELF files>
    

    The purge command may be used to delete these tokens from the database.
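
    For example (a sketch; run the command with -h to confirm the exact options):

    python -m pw_tokenizer.database purge --database <database>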

Detokenization guides#

See Detokenization for a conceptual overview of detokenization.

Python#

To detokenize in Python, import Detokenizer from the pw_tokenizer package, and instantiate it with paths to token databases or ELF files.

import pw_tokenizer

detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv', 'other/path.elf')

def process_log_message(log_message):
    result = detokenizer.detokenize(log_message.payload)
    # Log or otherwise handle the detokenized result.
    print(result)

The pw_tokenizer package also provides the AutoUpdatingDetokenizer class, which can be used in place of the standard Detokenizer. This class monitors database files for changes and automatically reloads them when they change. This is helpful for long-running tools that use detokenization. The class also supports token domains for the given database files in the <path>#<domain> format.
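
For example, a minimal sketch (the path and domain are hypothetical):

import pw_tokenizer

# Watches the database file and reloads it when it changes. The optional
# "#<domain>" suffix selects a token domain within that file.
detokenizer = pw_tokenizer.AutoUpdatingDetokenizer('path/to/database.csv#my_domain')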

For messages that are optionally tokenized and may be encoded as binary, Base64, or plaintext UTF-8, use pw_tokenizer.proto.decode_optionally_tokenized(). This will attempt to determine the correct method to detokenize and always provide a printable string. For more information on this feature, see Tokenized fields in protocol buffers.
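
A minimal sketch, assuming decode_optionally_tokenized takes a detokenizer and the raw field bytes and returns a printable string:

from pw_tokenizer import proto

text = proto.decode_optionally_tokenized(detokenizer, message_field_bytes)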

C++#

The C++ detokenization libraries can be used in C++ or any language that can call into C++ with a C-linkage wrapper, such as Java or Rust. A reference Java Native Interface (JNI) implementation is provided.

The C++ detokenization library uses binary-format token databases (created with database.py create --type binary). Read a binary format database from a file or include it in the source code. Pass the database array to TokenDatabase::Create, and construct a detokenizer.

Detokenizer detokenizer(TokenDatabase::Create(token_database_array));

std::string ProcessLog(span<uint8_t> log_data) {
  return detokenizer.Detokenize(log_data).BestString();
}

The TokenDatabase class verifies that its data is valid before using it. If the data is invalid, TokenDatabase::Create returns an empty database for which ok() returns false. If the token database is included in the source code, this check can be done at compile time.

// This line fails to compile with a static_assert if the database is invalid.
constexpr TokenDatabase kDefaultDatabase = TokenDatabase::Create<kData>();

Detokenizer OpenDatabase(std::string_view path) {
  std::vector<uint8_t> data = ReadWholeFile(path);

  TokenDatabase database = TokenDatabase::Create(data);

  // This checks if the file contained a valid database. It is safe to use a
  // TokenDatabase that failed to load (it will be empty), but it may be
  // desirable to provide a default database or otherwise handle the error.
  if (database.ok()) {
    return Detokenizer(database);
  }
  return Detokenizer(kDefaultDatabase);
}

TypeScript#

To detokenize in TypeScript, import Detokenizer from the pigweedjs package, and instantiate it with a CSV token database.

import { pw_tokenizer, pw_hdlc } from 'pigweedjs';
const { Detokenizer } = pw_tokenizer;
const { Frame } = pw_hdlc;

const detokenizer = new Detokenizer(String(tokenCsv));

function processLog(frame: Frame){
  const result = detokenizer.detokenize(frame);
  console.log(result);
}

For messages that are encoded in Base64, use Detokenizer::detokenizeBase64. detokenizeBase64 will also attempt to detokenize nested Base64 tokens. There is also detokenizeUint8Array that works just like detokenize but expects Uint8Array instead of a Frame argument.

Protocol buffers#

Pigweed AI summary: The "pw_tokenizer" library offers tools for managing tokenized fields in protocol buffers. For more information on tokenized fields in protocol buffers, refer to the "Tokenized fields in protocol buffers" section in the design documentation.

pw_tokenizer provides utilities for handling tokenized fields in protobufs. See Tokenized fields in protocol buffers for details.