API reference#
Compatibility#
C11
C++14
Python 3
Tokenization#
Tokenization converts a string literal to a token. If it’s a printf-style string, its arguments are encoded along with it. The results of tokenization can be sent off device or stored in place of a full string.
- typedef uint32_t pw_tokenizer_Token#
The type of the 32-bit token used in place of a string. Also available as pw::tokenizer::Token.
Tokenization macros#
Adding tokenization to a project is simple. To tokenize a string, include pw_tokenizer/tokenize.h and invoke one of the PW_TOKENIZE_ macros.
Tokenize a string literal#
pw_tokenizer provides macros for tokenizing string literals with no arguments.
- PW_TOKENIZE_STRING(string_literal)#
Converts a string literal to a pw_tokenizer_Token (uint32_t) token in a standalone statement. C and C++ compatible. In C++, the string may be a literal or a constexpr char array, including function variables like __func__. In C, the argument must be a string literal. In either case, the string must be null terminated, but may contain any characters (including ‘\0’).

constexpr uint32_t token = PW_TOKENIZE_STRING("Any string literal!");
- PW_TOKENIZE_STRING_DOMAIN(domain, string_literal)#
Tokenizes a string literal in a standalone statement using the specified domain. C and C++ compatible.
- PW_TOKENIZE_STRING_MASK(domain, mask, string_literal)#
Tokenizes a string literal in a standalone statement using the specified domain and bit mask. C and C++ compatible.
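As a sketch of how the domain and mask variants might be used (the "metrics" domain name and 16-bit mask are illustrative, not part of the API):

constexpr uint32_t domain_token =
    PW_TOKENIZE_STRING_DOMAIN("metrics", "Battery level is low");

// Keep only the lower 16 bits of the token.
constexpr uint32_t masked_token =
    PW_TOKENIZE_STRING_MASK("metrics", 0x0000FFFF, "Battery level is low");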
The tokenization macros above cannot be used inside other expressions.
Yes: Assign PW_TOKENIZE_STRING to a constexpr variable.
constexpr uint32_t kGlobalToken = PW_TOKENIZE_STRING("Wowee Zowee!");
void Function() {
constexpr uint32_t local_token = PW_TOKENIZE_STRING("Wowee Zowee?");
}
No: Use PW_TOKENIZE_STRING in another expression.
void BadExample() {
ProcessToken(PW_TOKENIZE_STRING("This won't compile!"));
}
Use PW_TOKENIZE_STRING_EXPR instead.
An alternate set of macros is provided for use inside expressions. These make use of lambda functions, so while they can be used inside expressions, they require C++ and cannot be assigned to constexpr variables or be used with special function variables like __func__.
- PW_TOKENIZE_STRING_EXPR(string_literal)#
Converts a string literal to a uint32_t token within an expression. Requires C++.

DoSomething(PW_TOKENIZE_STRING_EXPR("Succeed"));
- PW_TOKENIZE_STRING_DOMAIN_EXPR(domain, string_literal)#
Tokenizes a string literal using the specified domain within an expression. Requires C++.
- PW_TOKENIZE_STRING_MASK_EXPR(domain, mask, string_literal)#
Tokenizes a string literal using the specified domain and bit mask within an expression. Requires C++.
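For illustration, these fit directly inside expressions, here reusing the ProcessToken function from the earlier examples with an illustrative "metrics" domain:

void ExprDomainExample() {
  ProcessToken(PW_TOKENIZE_STRING_DOMAIN_EXPR("metrics", "In a domain!"));
  ProcessToken(
      PW_TOKENIZE_STRING_MASK_EXPR("metrics", 0x0000FFFF, "Masked!"));
}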
When to use these macros
Use PW_TOKENIZE_STRING and related macros to tokenize string literals that do not need %-style arguments encoded.
Yes: Use PW_TOKENIZE_STRING_EXPR within other expressions.
void GoodExample() {
ProcessToken(PW_TOKENIZE_STRING_EXPR("This will compile!"));
}
No: Assign PW_TOKENIZE_STRING_EXPR to a constexpr variable.
constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR("This won't compile!");
Instead, use PW_TOKENIZE_STRING to assign to a constexpr variable.
No: Tokenize __func__ in PW_TOKENIZE_STRING_EXPR.
void BadExample() {
// This compiles, but __func__ will not be the outer function's name, and
// there may be compiler warnings.
constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR(__func__);
}
Instead, use PW_TOKENIZE_STRING to tokenize __func__ or similar macros.
Tokenize a message with arguments to a buffer#
- PW_TOKENIZE_TO_BUFFER(buffer, buffer_size_pointer, format, ...)#
Encodes a tokenized string and arguments to the provided buffer. The size of the buffer is passed via a pointer to a size_t. After encoding is complete, the size_t is set to the number of bytes written to the buffer.

The macro’s arguments are equivalent to the following function signature:

TokenizeToBuffer(void* buffer, size_t* buffer_size_pointer, const char* format, ...);  // printf-style arguments
For example, the following encodes a tokenized string with a temperature to a buffer. The buffer is passed to a function to send the message over a UART.
uint8_t buffer[32];
size_t size_bytes = sizeof(buffer);
PW_TOKENIZE_TO_BUFFER(
    buffer, &size_bytes, "Temperature (C): %0.2f", temperature_c);
MyProject_EnqueueMessageForUart(buffer, size_bytes);
While PW_TOKENIZE_TO_BUFFER is very flexible, it must be passed a buffer, which increases its code size footprint at the call site.
- PW_TOKENIZE_TO_BUFFER_DOMAIN(domain, buffer, buffer_size_pointer, format, ...)#
Same as PW_TOKENIZE_TO_BUFFER, but tokenizes to the specified domain.
- PW_TOKENIZE_TO_BUFFER_MASK(domain, mask, buffer, buffer_size_pointer, format, ...)#
Same as PW_TOKENIZE_TO_BUFFER_DOMAIN, but applies a bit mask to the token.
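For illustration, the domain variant mirrors the earlier buffer example; the "metrics" domain and the voltage_mv variable are assumptions for this sketch:

uint8_t buffer[32];
size_t size_bytes = sizeof(buffer);
PW_TOKENIZE_TO_BUFFER_DOMAIN(
    "metrics", buffer, &size_bytes, "Battery voltage: %d mV", voltage_mv);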
Why use this macro
Encode a tokenized message for consumption within a function.
Encode a tokenized message into an existing buffer.
Avoid using PW_TOKENIZE_TO_BUFFER in widely expanded macros, such as a logging macro, because it will result in larger code size than passing the tokenized data to a function.
Tokenize a message with arguments in a custom macro#
Projects can leverage the tokenization machinery in whichever way best suits their needs. The most efficient way to use pw_tokenizer is to pass tokenized data to a global handler function. A project’s custom tokenization macro can handle tokenized data in a function of its choosing.

pw_tokenizer provides two low-level macros for projects to use to create custom tokenization macros.
- PW_TOKENIZE_FORMAT_STRING(domain, mask, format, ...)#
Tokenizes a format string with optional arguments and sets the _pw_tokenizer_token variable to the token. Must be used in its own scope, since the same variable is used in every invocation.

The tokenized string uses the specified tokenization domain. Use PW_TOKENIZER_DEFAULT_DOMAIN for the default. The token also may be masked; use UINT32_MAX to keep all bits.

This macro checks that the printf-style format string matches the arguments, stores the format string in a special section, and calculates the string’s token at compile time.
- PW_TOKENIZER_ARG_TYPES(...)#
Converts a series of arguments to a compact format that replaces the format string literal. Evaluates to a pw_tokenizer_ArgTypes value.

Depending on the size of pw_tokenizer_ArgTypes, the bottom 4 or 6 bits store the number of arguments and the remaining bits store the types, two bits per type. The arguments are not evaluated; only their types are used.
The outputs of these macros are typically passed to an encoding function. That function encodes the token, argument types, and argument data to a buffer using helpers provided by pw_tokenizer/encode_args.h.
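For illustration only, a custom tokenization macro might combine these pieces roughly as follows. HandleTokenizedMessage is a hypothetical project-supplied function that receives the token, argument types, and arguments:

// Sketch of a custom tokenization macro. HandleTokenizedMessage is
// hypothetical; a real project would supply its own handler.
#define MY_TOKENIZE_TO_HANDLER(format, ...)                             \
  do {                                                                  \
    PW_TOKENIZE_FORMAT_STRING(                                          \
        PW_TOKENIZER_DEFAULT_DOMAIN, UINT32_MAX, format, __VA_ARGS__);  \
    HandleTokenizedMessage(_pw_tokenizer_token,                         \
                           PW_TOKENIZER_ARG_TYPES(__VA_ARGS__),         \
                           __VA_ARGS__);                                 \
  } while (0)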
- size_t pw::tokenizer::EncodeArgs(pw_tokenizer_ArgTypes types, va_list args, span<std::byte> output)#
Encodes a tokenized string’s arguments to a buffer. The pw_tokenizer_ArgTypes parameter specifies the argument types, in place of a format string.

Most tokenization implementations should use the EncodedMessage class.
- template<size_t kMaxSizeBytes = PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES>
class EncodedMessage#
Encodes a tokenized message to a fixed size buffer. By default, the buffer size is set by the PW_TOKENIZER_CFG_ENCODING_BUFFER_SIZE_BYTES config macro. This class is used to encode tokenized messages passed in from tokenization macros.

To use pw::tokenizer::EncodedMessage, construct it with the token, argument types, and va_list from the variadic arguments:

void SendLogMessage(span<std::byte> log_data);

extern "C" void TokenizeToSendLogMessage(pw_tokenizer_Token token,
                                         pw_tokenizer_ArgTypes types,
                                         ...) {
  va_list args;
  va_start(args, types);
  EncodedMessage encoded_message(token, types, args);
  va_end(args);

  SendLogMessage(encoded_message);  // EncodedMessage converts to span
}
- size_t pw_tokenizer_EncodeArgs(pw_tokenizer_ArgTypes types, va_list args, void* output_buffer, size_t output_buffer_size)#
C function that encodes arguments to a tokenized buffer. Use the pw::tokenizer::EncodeArgs() function from C++.
Tokenizing function names#
The string literal tokenization functions support tokenizing string literals or constexpr character arrays (constexpr const char[]). In GCC and Clang, the special __func__ variable and __PRETTY_FUNCTION__ extension are declared as static constexpr char[] in C++ instead of the standard static const char[]. This means that __func__ and __PRETTY_FUNCTION__ can be tokenized while compiling C++ with GCC or Clang.
// Tokenize the special function name variables.
constexpr uint32_t function = PW_TOKENIZE_STRING(__func__);
constexpr uint32_t pretty_function = PW_TOKENIZE_STRING(__PRETTY_FUNCTION__);
Note that __func__ and __PRETTY_FUNCTION__ are not string literals. They are defined as static character arrays, so they cannot be implicitly concatenated with string literals. For example, printf(__func__ ": %d", 123); will not compile.
Buffer sizing helper#
- template<typename ...ArgTypes>
constexpr size_t pw::tokenizer::MinEncodingBufferSizeBytes()#
Calculates the minimum buffer size to allocate that is guaranteed to support encoding the specified arguments.

The contents of strings are NOT included in this total. The string’s length/status byte is guaranteed to fit, but the string contents may be truncated. Encoding is considered to succeed as long as the string’s length/status byte is written, even if the actual string is truncated.
Examples:
Message with no arguments:
MinEncodingBufferSizeBytes() == 4
Message with an int argument:
MinEncodingBufferSizeBytes<int>() == 9 (4 + 5)
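For example, the helper could size a stack buffer at compile time; the argument types here are purely illustrative:

// Guaranteed to fit the token plus an int and a float argument.
std::byte buffer[pw::tokenizer::MinEncodingBufferSizeBytes<int, float>()];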
Tokenization in Python#
The Python pw_tokenizer.encode module has limited support for encoding tokenized messages with the encode_token_and_args function.
- pw_tokenizer.encode.encode_token_and_args(token: int, *args: Union[int, float, bytes, str]) -> bytes#
Encodes a tokenized message given its token and arguments.
This function assumes that the token represents a format string with conversion specifiers that correspond with the provided argument types. Currently, only 32-bit integers are supported.
This function requires that a string’s token has already been calculated. Typically these tokens are provided by a database, but they can be manually created using the tokenizer hash.
- pw_tokenizer.tokens.pw_tokenizer_65599_hash(string: Union[str, bytes], *, hash_length: Optional[int] = None) -> int#
Hashes the string with the hash function used to generate tokens in C++.
This hash function is used to calculate tokens from strings in Python. It is not used when extracting tokens from an ELF, since the token is stored in the ELF as part of tokenization.
This is particularly useful for offline token database generation in cases where tokenized strings in a binary cannot be embedded as parsable pw_tokenizer entries.
Note
In C, the hash length of a string has a fixed limit controlled by PW_TOKENIZER_CFG_C_HASH_LENGTH. To match tokens produced by C (as opposed to C++) code, pw_tokenizer_65599_hash() should be called with a matching hash length limit. When creating an offline database, it’s a good idea to generate tokens for both, and merge the databases.
Protobuf tokenization library#
The pw_tokenizer.proto Python module defines functions that may be used to detokenize protobuf objects in Python. The function pw_tokenizer.proto.detokenize_fields() detokenizes all fields annotated as tokenized, replacing them with their detokenized version. For example:
my_detokenizer = pw_tokenizer.Detokenizer(some_database)
my_message = SomeMessage(tokenized_field=b'$YS1EMQ==')
pw_tokenizer.proto.detokenize_fields(my_detokenizer, my_message)
assert my_message.tokenized_field == b'The detokenized string! Cool!'
pw_tokenizer.proto#
Utilities for working with tokenized fields in protobufs.
- pw_tokenizer.proto.decode_optionally_tokenized(detokenizer: Detokenizer, data: bytes, prefix: str = '$') -> str#
Decodes data that may be plain text or binary / Base64 tokenized text.
- pw_tokenizer.proto.detokenize_fields(detokenizer: Detokenizer, proto: Message, prefix: str = '$') -> None#
Detokenizes fields annotated as tokenized in the given proto.
The fields are replaced with their detokenized version in the proto. Tokenized fields are bytes fields, so the detokenized string is stored as bytes. Call .decode() to convert the detokenized string from bytes to str.