Design Discussion
There were a handful of key requirements going into the design of pw_snapshot:
Pre-established file format - Building and maintaining custom tooling to parse binary snapshot data is a high maintenance burden, which makes a pre-existing, widely known and supported format far more appealing.
Incremental writing - Needing to build an entire snapshot before committing it as a finished file is a big limitation on embedded devices where RAM is often very constrained. It is important that a snapshot can be built in smaller in-memory segments that can be committed incrementally to a larger sink (e.g. UART, off-chip flash).
Extensible - Pigweed doesn’t know everything users might want to capture in a snapshot. It’s important that users can include their own information in snapshots with minimal friction.
Relatively compact - It’s important that snapshots can contain useful information even when they are limited to a few hundred bytes in size.
Why Proto?
Protobufs are widely used and supported across many languages and platforms. This greatly reduces the encode/decode tooling maintenance introduced by using custom or unstructured formats. While using a format like JSON provides similarly wide tooling support, encoding the same information as a proto significantly reduces the final file size.
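As a rough illustration of the size difference, the sketch below hand-encodes three fields using the protobuf wire format and compares the result with compact JSON. The record and its field names and numbers are hypothetical and are not taken from the actual Snapshot schema.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Field key (wire type 0) followed by the varint value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    """Field key (wire type 2), length prefix, then the payload."""
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


# Hypothetical record; not the real Snapshot schema.
record = {"reason": "assert failed", "uptime_ms": 123456, "thread_count": 4}

proto_bytes = (
    bytes_field(1, record["reason"].encode())
    + varint_field(2, record["uptime_ms"])
    + varint_field(3, record["thread_count"])
)
json_bytes = json.dumps(record, separators=(",", ":")).encode()

print(len(proto_bytes), len(json_bytes))
```

For this record the proto encoding is 21 bytes and the compact JSON is 62 bytes, mostly because JSON repeats the field names as text while the proto carries only small numeric keys.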
Protobuf messages aren’t truly streamable (i.e. they can’t be written without any intermediate buffers) because of how message nesting works: each nested message is prefixed with its encoded length, so it must be fully encoded before it can be written. Even so, a large message can be written incrementally as long as there’s enough buffer space to encode the largest single sub-message in the proto.
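A minimal sketch of that incremental approach follows. It assumes a hypothetical append-only `ByteSink` (standing in for a UART or flash writer) and a hypothetical `write_submessage` helper; neither is part of the pw_snapshot API. Each sub-message is framed and committed on its own, and because a serialized proto is just a concatenation of its fields, the bytes in the sink match what encoding the whole message at once would produce.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


LENGTH_DELIMITED = 2  # protobuf wire type for nested messages and bytes


class ByteSink:
    """Hypothetical append-only sink, e.g. a UART or off-chip flash."""

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.data += chunk


def write_submessage(sink: ByteSink, field_number: int, payload: bytes,
                     scratch_size: int) -> None:
    """Frame one already-encoded sub-message and commit it to the sink."""
    key = encode_varint((field_number << 3) | LENGTH_DELIMITED)
    framed = key + encode_varint(len(payload)) + payload
    if len(framed) > scratch_size:
        raise ValueError("sub-message does not fit in the scratch buffer")
    sink.write(framed)


# Three small, pre-encoded sub-messages (field numbers are made up).
submessages = [(1, b"\x08\x01"), (2, b"\x0a\x03abc"), (2, b"\x0a\x03xyz")]

sink = ByteSink()
for number, payload in submessages:
    # Only one sub-message needs to be held in memory at a time.
    write_submessage(sink, number, payload, scratch_size=16)

# Reference: the same top-level message encoded in a single pass.
whole = b"".join(
    encode_varint((n << 3) | LENGTH_DELIMITED) + encode_varint(len(p)) + p
    for n, p in submessages
)
```

The scratch buffer only has to be as large as the biggest framed sub-message (7 bytes here), not the full 18-byte message.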
Why overlay multiple protos?
Proto 2 supported a feature called “extensions” that explicitly allowed multiple proto definitions to be overlaid on the same message. While proto 3 removed this feature, it doesn’t disallow the old behavior of serializing two overlaid protos to the same data stream. Proto 3 recommends using an “Any” proto instead of extensions, as it is more explicit and eliminates the issue of field collisions in proto messages. Unfortunately, proto “Any” messages introduce unacceptable overhead. For a single integer that would encode to a few bytes using extensions, an Any submessage quickly expands to tens of bytes.
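The overhead can be estimated by hand-encoding both forms. In this sketch the field numbers 1000 and 1001 are hypothetical, and the Any payload is assumed to wrap the integer in the well-known `google.protobuf.Int32Value` type, which requires carrying the full type URL as a string.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


VALUE = 300

# Extension-style: the integer is written directly as a top-level field.
direct = varint_field(1000, VALUE)  # hypothetical field number

# Any-style: google.protobuf.Any has type_url = 1 and value = 2, and the
# value holds an encoded Int32Value message (its integer is field 1).
type_url = b"type.googleapis.com/google.protobuf.Int32Value"
any_payload = bytes_field(1, type_url) + bytes_field(2, varint_field(1, VALUE))
via_any = bytes_field(1001, any_payload)  # hypothetical field number

print(len(direct), len(via_any))
```

Under these assumptions the direct field takes 4 bytes and the Any-wrapped field takes 56 bytes, almost all of it the type URL.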
pw_snapshot’s proto format takes advantage of “extensions” from proto 2 without explicitly relying on the feature. To reduce the risk of collisions and maximize encoding efficiency, certain field number ranges are reserved to allow Pigweed to grow while ensuring downstream customers have equivalent flexibility when using the Snapshot proto format.
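The sketch below shows the overlay idea with two hypothetical schemas that share one byte stream. The field numbers (1 for a core field, 1000 for a project field) are invented for illustration and are not the ranges actually reserved by the Snapshot proto. Each decoder picks out the field numbers it knows and skips the rest, which is how protobuf parsers treat unknown fields.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def varint_field(field_number: int, value: int) -> bytes:
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


def iter_fields(data: bytes):
    """Yield (field_number, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, value


def decode(data: bytes, known_fields: dict) -> dict:
    """Keep only the fields this schema knows; ignore everything else."""
    return {known_fields[n]: v for n, v in iter_fields(data) if n in known_fields}


# Hypothetical schemas with non-overlapping field numbers.
CORE_FIELDS = {1: "reason"}
PROJECT_FIELDS = {1000: "battery_mv"}

# Both "messages" are serialized back to back into the same stream.
stream = bytes_field(1, b"crash") + varint_field(1000, 3700)

core_view = decode(stream, CORE_FIELDS)
project_view = decode(stream, PROJECT_FIELDS)
```

Because the field numbers never overlap, neither decoder needs to know about the other's schema, and the project field costs only its key and value bytes.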
Why no file header?
Right now it’s assumed that anything that is storing or transferring a serialized snapshot implicitly tracks its size (and a checksum, if desired). While a container format might be introduced independently, pw_snapshot focuses on treating an encoded snapshot as raw serialized proto data.
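As an illustration of what the storing or transferring layer might do on its own, the sketch below wraps an opaque snapshot payload in a size and CRC-32 prefix. This framing is entirely hypothetical: it is not defined by pw_snapshot, and the layout (two little-endian 32-bit integers) is just one possible choice.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload size, CRC-32 of payload


def frame(snapshot: bytes) -> bytes:
    """Prefix raw serialized snapshot bytes with their size and checksum."""
    return HEADER.pack(len(snapshot), zlib.crc32(snapshot)) + snapshot


def unframe(blob: bytes) -> bytes:
    """Validate the prefix and return the untouched snapshot bytes."""
    size, expected_crc = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated snapshot")
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch")
    return payload


# The payload is treated as opaque bytes; these are arbitrary example bytes.
snapshot = b"\x0a\x05crash\xc0\x3e\xf4\x1c"
stored = frame(snapshot)
recovered = unframe(stored)
```

Keeping this concern outside the snapshot itself means the payload stays a plain serialized proto that any standard protobuf tool can decode.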