pw_metric#

Attention

This module is not yet production ready; ask us if you are interested in trying it out or have ideas about how to improve it.

Overview#

Pigweed’s metric module is a lightweight manual instrumentation system for tracking system health metrics like counts or set values. For example, pw_metric could help with tracking the number of I2C bus writes, or the number of times a buffer was filled before it could drain in time, or safely incrementing counters from ISRs.

Key features of pw_metric:

Tokenized names - Names are tokenized using pw_tokenizer, enabling long metric names that don’t bloat your binary.
Tree structure - Metrics can form a tree, enabling grouping of related metrics for clearer organization.
Per object collection - Metrics and groups can live on object instances and be flexibly combined with metrics from other instances.
Global registration - For legacy code bases or just because it’s easier, pw_metric supports automatic aggregation of metrics. This is optional but convenient in many cases.
Simple design - There are only two core data structures: Metric and Group, which are both simple to understand and use. The only metric value types supported are uint32_t and float. This module does not support complicated aggregations like running average or min/max.

Example: Instrumenting a single object#

The example below illustrates what instrumenting a class with a metric group and metrics might look like. In this case, the object’s MySubsystem::metrics() member is not globally registered; the user is responsible for combining this subsystem’s metrics with others.

#include "pw_metric/metric.h"

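class MySubsystem {
 public:
  void DoSomething() {
    attempts_.Increment();
    // ActionSucceeds() stands in for the subsystem's real work.
    if (ActionSucceeds()) {
      successes_.Increment();
    }
  }
  // Exposes this instance's metrics so the caller can attach them to a tree.
  pw::metric::Group& metrics() { return metrics_; }

 private:
  // The group is declared before the metrics that are added to it.
  PW_METRIC_GROUP(metrics_, "my_subsystem");
  PW_METRIC(metrics_, attempts_, "attempts", 0u);
  PW_METRIC(metrics_, successes_, "successes", 0u);
};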

The metrics subsystem has no canonical output format at this time, but a JSON dump might look something like this:

{
  "my_subsystem" : {
    "successes" : 1000,
    "attempts" : 1200
  }
}

In this case, every instance of MySubsystem will have unique counters.

Example: Instrumenting a legacy codebase#

A common situation in embedded development is debugging legacy code, or code which is hard to change, where it may be impossible to plumb metrics objects around with dependency injection. The alternative to plumbing metrics is to register them through a global mechanism. pw_metric supports this use case. For example:

Before instrumenting:

// This code was passed down from generations of developers before; no one
// knows what it does or how it works. But it needs to be fixed!
void OldCodeThatDoesntWorkButWeDontKnowWhy() {
  if (some_variable) {
    DoSomething();
  } else {
    DoSomethingElse();
  }
}

After instrumenting:

#include "pw_metric/global.h"
#include "pw_metric/metric.h"

// PW_METRIC_GLOBAL takes an initial value; 0u makes these integer counters.
PW_METRIC_GLOBAL(legacy_do_something, "legacy_do_something", 0u);
PW_METRIC_GLOBAL(legacy_do_something_else, "legacy_do_something_else", 0u);

// This code was passed down from generations of developers before; no one
// knows what it does or how it works. But it needs to be fixed!
void OldCodeThatDoesntWorkButWeDontKnowWhy() {
  if (some_variable) {
    legacy_do_something.Increment();
    DoSomething();
  } else {
    legacy_do_something_else.Increment();
    DoSomethingElse();
  }
}

In this case, the developer merely had to add the metrics header, define some metrics, and then start incrementing them. These metrics will be available globally through the pw::metric::global_metrics object defined in pw_metric/global.h.

Why not just use simple counter variables?#

One might wonder what the point of leveraging a metric library is when it is trivial to make some global variables and print them out. There are a few reasons:

Metrics offload - To make it easy to get metrics off-device by sharing the infrastructure for offloading.
Consistent format - To get the metrics in a consistent format (e.g. protobuf or JSON) for analysis.
Uncoordinated collection - To provide a simple and reliable way for developers on a team to all collect metrics for their subsystems, without having to coordinate to offload. This could extend to code in libraries written by other teams.
Pre-boot or interrupt visibility - Some of the most challenging bugs come from early system boot when not all system facilities are up (e.g. logging or UART). In those cases, metrics provide a low-overhead approach to understand what is happening. During early boot, metrics can be incremented; after boot, dumping the metrics provides insight into what happened. While basic counter variables can work in these contexts too, one still has to deal with the offloading problem, which the library handles.

Metrics API reference#

The metrics API consists of just a few components:

The core data structures pw::metric::Metric and pw::metric::Group
The macros for scoped metrics and groups PW_METRIC and PW_METRIC_GROUP
The macros for globally registered metrics and groups PW_METRIC_GLOBAL and PW_METRIC_GROUP_GLOBAL
The global groups and metrics list: pw::metric::global_groups and pw::metric::global_metrics.

Metric#

The pw::metric::Metric provides:

A 31-bit tokenized name
A 1-bit discriminator for int or float
A 32-bit payload (int or float)
A 32-bit next pointer (intrusive list)

The metric object is 12 bytes on 32-bit platforms.
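
As a rough illustration of how those four fields add up to 12 bytes, the sketch below packs them into a plain struct. It is not the actual pw::metric::Metric definition: the type name MetricLayoutSketch, the field names, and the use of a 32-bit integer as a stand-in for a 32-bit pointer are all assumptions made so the example compiles on any host. Bit-field packing is implementation-defined, so the size check reflects what common compilers such as GCC and Clang do.

```cpp
#include <cstdint>

// Hypothetical stand-in for the layout described above; not Pigweed's class.
struct MetricLayoutSketch {
  std::uint32_t name_token : 31;  // 31-bit tokenized name
  std::uint32_t is_float : 1;     // 1-bit discriminator: 0 = int, 1 = float
  union {
    std::uint32_t as_int;
    float as_float;
  } payload;                      // 32-bit payload (int or float)
  std::uint32_t next;             // stand-in for a 32-bit intrusive-list pointer
};

// Three 32-bit words: token + discriminator, payload, next pointer.
static_assert(sizeof(MetricLayoutSketch) == 12,
              "expected the sketch to occupy three 32-bit words");
```

Sharing one word between the token and the discriminator is what keeps the per-metric cost at three words rather than four.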

class pw::metric::Metric#

Increment(uint32_t amount = 1)#: Increment the metric by the given amount. Results in undefined behaviour if the metric is not of type int.

Set(uint32_t value)#: Set the metric to the given value. Results in undefined behaviour if the metric is not of type int.

Set(float value)#: Set the metric to the given value. Results in undefined behaviour if the metric is not of type float.

Group#

The pw::metric::Group object is simply:

A name for the group
A list of children groups
A list of leaf metrics
A 32-bit next pointer (intrusive list)

The group object is 16 bytes on 32-bit platforms.

class pw::metric::Group#

Dump(int indent_level = 0)#

Recursively dump a metrics group to pw_log. Produces output like:

"$6doqFw==": {
  "$05OCZw==": {
    "$VpPfzg==": 1,
    "$LGPMBQ==": 1.000000,
    "$+iJvUg==": 5,
  }
  "$9hPNxw==": 65,
  "$oK7HmA==": 13,
  "$FCM4qQ==": 0,
}

Note that the metric names are tokenized and Base64-encoded. Decoding requires using the Pigweed detokenizer. With a detokenizing-enabled logger, you could get something like:

"i2c_1": {
  "gyro": {
    "num_samples": 1,
    "init_time_us": 1.000000,
    "initialized": 5,
  }
  "bus_errors": 65,
  "transactions": 13,
  "bytes_sent": 0,
}
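
To make the relationship between the two dumps more concrete, here is a minimal sketch of pulling the numeric token out of a key such as "$6doqFw==" from the first dump above. It assumes the key is a "$" prefix followed by standard Base64, and that the first four decoded bytes hold the 32-bit token in little-endian order; both are assumptions about the tokenizer's encoding rather than something this page states. The function names are hypothetical, and this is not a replacement for the Pigweed detokenizer, which also has to look the token up in a token database.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Maps one Base64 character to its 6-bit value, or -1 if it is not in the
// standard alphabet (this includes the '=' padding character).
int Base64Value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Decodes standard Base64, stopping at padding or any unexpected character.
std::vector<std::uint8_t> DecodeBase64(const std::string& text) {
  std::vector<std::uint8_t> bytes;
  std::uint32_t accumulator = 0;
  int bits = 0;
  for (char c : text) {
    const int value = Base64Value(c);
    if (value < 0) break;
    accumulator = (accumulator << 6) | static_cast<std::uint32_t>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push_back(
          static_cast<std::uint8_t>((accumulator >> bits) & 0xFFu));
    }
  }
  return bytes;
}

// Extracts the 32-bit token from a dump key. Returns false unless the key is
// '$' followed by at least four Base64-encoded bytes.
bool TokenFromDumpKey(const std::string& key, std::uint32_t& token) {
  if (key.empty() || key[0] != '$') return false;
  const std::vector<std::uint8_t> bytes = DecodeBase64(key.substr(1));
  if (bytes.size() < 4) return false;
  // Assumption: the token is stored least-significant byte first.
  token = static_cast<std::uint32_t>(bytes[0]) |
          (static_cast<std::uint32_t>(bytes[1]) << 8) |
          (static_cast<std::uint32_t>(bytes[2]) << 16) |
          (static_cast<std::uint32_t>(bytes[3]) << 24);
  return true;
}
```

Under those assumptions, "$6doqFw==" decodes to the bytes E9 DA 2A 17, which gives the token 0x172ADAE9. Mapping that number back to "i2c_1" still requires the token database.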

Macros#

The macros are the primary mechanism for creating metrics, and should be used instead of directly constructing metrics or groups. The macros handle tokenizing the metric and group names.

PW_METRIC(identifier, name, value)#

PW_METRIC(group, identifier, name, value)#

PW_METRIC_STATIC(identifier, name, value)#

PW_METRIC_STATIC(group, identifier, name, value)#

Declare a metric, optionally adding it to a group.

identifier - An identifier name for the created variable or member. For example: i2c_transactions might be used as a local or global metric; inside a class, it could be named following the member naming convention (i2c_transactions_ for Google’s C++ style).
name - The string name for the metric. This will be tokenized. There are no restrictions on the contents of the name; however, consider restricting these to be valid C++ identifiers to ease integration with other systems.
value - The initial value for the metric. Must be either a floating point value (e.g. 3.2f) or unsigned int (e.g. 21u).
group - A pw::metric::Group instance. If provided, the metric is added to the given group.

The macro declares a variable or member named identifier with type pw::metric::Metric, and works in three contexts: global, local, and member.

If the _STATIC variant is used, the macro declares a variable with static storage. These can be used in function scopes, but not in classes.

At global scope:

// Increment() is only defined for integer metrics, so start from 0u.
PW_METRIC(foo, "foo", 0u);

void MyFunc() {
  foo.Increment();
}

At local function or member function scope:

void MyFunc() {
  PW_METRIC(foo, "foo", 0u);
  foo.Increment();
  // foo goes out of scope here; be careful!
}

At member level inside a class or struct:

struct MyStructy {
  void DoSomething() {
    somethings.Increment();
  }
  // Every instance of MyStructy will have a separate somethings counter.
  PW_METRIC(somethings, "somethings", 0u);
};

You can also put a metric into a group with the macro. A metric can belong to at most one group; adding it to a second group causes an assertion failure. Example:

PW_METRIC_GROUP(my_group, "my_group");
PW_METRIC(my_group, foo, "foo", 0.2f);
PW_METRIC(my_group, bar, "bar", 44000u);
PW_METRIC(my_group, zap, "zap", 3.14f);

Tip

If you want a globally registered metric, see pw_metric/global.h; in that context, metrics are globally registered without the need to centrally register them in a single place.

PW_METRIC_GROUP(identifier, name)#

PW_METRIC_GROUP(parent_group, identifier, name)#

PW_METRIC_GROUP_STATIC(identifier, name)#

PW_METRIC_GROUP_STATIC(parent_group, identifier, name)#

Declares a pw::metric::Group with name name; the name is tokenized. Works similarly to PW_METRIC and can be used in the same contexts (global, local, and member). Optionally, the group can be added to a parent group.

If the _STATIC variant is used, the macro declares a variable with static storage. These can be used in function scopes, but not in classes.

Example:

PW_METRIC_GROUP(my_group, "my_group");
PW_METRIC(my_group, foo, "foo", 0.2f);
PW_METRIC(my_group, bar, "bar", 44000u);
PW_METRIC(my_group, zap, "zap", 3.14f);

PW_METRIC_GLOBAL(identifier, name, value)#

Declare a pw::metric::Metric with name name, and register it in the global metrics list pw::metric::global_metrics.

Example:

#include "pw_metric/metric.h"
#include "pw_metric/global.h"

// No need to coordinate collection of foo and bar; they're autoregistered.
PW_METRIC_GLOBAL(foo, "foo", 0.2f);
PW_METRIC_GLOBAL(bar, "bar", 44000u);

Note that metrics defined with PW_METRIC_GLOBAL should never be added to groups defined with PW_METRIC_GROUP_GLOBAL. Each metric can only belong to one group, and metrics defined with PW_METRIC_GLOBAL are pre-registered with the global metrics list.

Attention

Do not create PW_METRIC_GLOBAL instances anywhere other than global scope. Putting these on an instance (member context) would lead to dangling pointers and misery. Metrics are never deleted or unregistered!

PW_METRIC_GROUP_GLOBAL(identifier, name)#

Declare a pw::metric::Group with name name, and register it in the global metric groups list pw::metric::global_groups.

Note that metrics created with PW_METRIC_GLOBAL should never be added to groups! Instead, just create a freestanding metric and register it into the global group (like in the example below).

Example:

#include "pw_metric/metric.h"
#include "pw_metric/global.h"

// No need to coordinate collection of this group; it's globally registered.
PW_METRIC_GROUP_GLOBAL(legacy_system, "legacy_system");
PW_METRIC(legacy_system, foo, "foo", 0.2f);
PW_METRIC(legacy_system, bar, "bar", 44000u);

Attention

Do not create PW_METRIC_GROUP_GLOBAL instances anywhere other than global scope. Putting these on an instance (member context) would lead to dangling pointers and misery. Metrics are never deleted or unregistered!

Usage & Best Practices#

This library makes several tradeoffs to enable low memory use per-metric, and one of those tradeoffs results in requiring care in constructing the metric trees.

Use the Init() pattern for static objects with metrics#

A common pattern in embedded systems is to allocate many objects globally, and reduce reliance on dynamic allocation (or eschew malloc entirely). This leads to a pattern where rich/large objects are statically constructed at global scope, then interacted with via tasks or threads. For example, consider a hypothetical global Uart object:

class Uart {
 public:
  Uart(pw::span<std::byte> rx_buffer, pw::span<std::byte> tx_buffer)
      : rx_buffer_(rx_buffer), tx_buffer_(tx_buffer) {}

  // Send/receive here...

 private:
  pw::span<std::byte> rx_buffer_;
  pw::span<std::byte> tx_buffer_;
};

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer, uart_tx_buffer);

Through the course of building a product, the team may want to add metrics to the UART, for example to gain insight into which operations are triggering lots of data transfer. When adding metrics to the above imaginary UART object, one might consider the following approach:

class Uart {
 public:
  Uart(pw::span<std::byte> rx_buffer,
       pw::span<std::byte> tx_buffer,
       pw::metric::Group& parent_metrics)
      : rx_buffer_(rx_buffer),
        tx_buffer_(tx_buffer) {
    // PROBLEM! parent_metrics may not be constructed if it's a reference
    // to a static global.
    parent_metrics.Add(tx_bytes_);
    parent_metrics.Add(rx_bytes_);
  }

  // Send/receive here which increment tx/rx_bytes.

 private:
  pw::span<std::byte> rx_buffer_;
  pw::span<std::byte> tx_buffer_;

  PW_METRIC(tx_bytes_, "tx_bytes", 0u);
  PW_METRIC(rx_bytes_, "rx_bytes", 0u);
};

PW_METRIC_GROUP(global_metrics, "/");
PW_METRIC_GROUP(global_metrics, uart1_metrics, "uart1");

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer,
           uart_tx_buffer,
           uart1_metrics);

However, this is incorrect, since parent_metrics (referring to uart1_metrics in this case) may not be constructed at the point uart1 is constructed. Thankfully, in the case of pw_metric this will result in an assertion failure (or it will work correctly if the constructors happen to be called in a favorable order), so the problem will not go unnoticed. Instead, consider using the Init() pattern for static objects, where references to dependencies may be stored during construction, but no methods on the dependencies are called.

The Init() approach separates global object construction into two phases: the constructor, where references are stored, and an Init() function, which is called after all static constructors have run. This approach works correctly, even when the objects are allocated globally:

class Uart {
 public:
  // Note that metrics is not passed in here at all.
  Uart(pw::span<std::byte> rx_buffer,
       pw::span<std::byte> tx_buffer)
      : rx_buffer_(rx_buffer),
        tx_buffer_(tx_buffer) {}

  // Precondition: parent_metrics is already constructed.
  void Init(pw::metric::Group& parent_metrics) {
    parent_metrics.Add(tx_bytes_);
    parent_metrics.Add(rx_bytes_);
  }

  // Send/receive here which increment tx/rx_bytes.

 private:
  pw::span<std::byte> rx_buffer_;
  pw::span<std::byte> tx_buffer_;

  PW_METRIC(tx_bytes_, "tx_bytes", 0u);
  PW_METRIC(rx_bytes_, "rx_bytes", 0u);
};

PW_METRIC_GROUP(root_metrics, "/");
PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer,
           uart_tx_buffer);

int main() {
  // uart1_metrics is guaranteed to be initialized by this point, so it is
  // safe to pass it to Init().
  uart1.Init(uart1_metrics);
}

Attention

Be extra careful about static global metric registration. Consider using the Init() pattern.

Metric member order matters in objects#

The order of declaring in-class groups and metrics matters if the metrics are within a group declared inside the class. For example, the following class will work fine:

#include "pw_metric/metric.h"

class PowerSubsystem {
 public:
  pw::metric::Group& metrics() { return metrics_; }
  const pw::metric::Group& metrics() const { return metrics_; }

 private:
  PW_METRIC_GROUP(metrics_, "power");  // Note metrics_ declared first.
  PW_METRIC(metrics_, foo, "foo", 0.2f);
  PW_METRIC(metrics_, bar, "bar", 44000u);
};

However, the following class will not work, since the group is declared after the metrics that refer to it; this results in a compile error:

#include "pw_metric/metric.h"

class PowerSubsystem {
 public:
  pw::metric::Group& metrics() { return metrics_; }
  const pw::metric::Group& metrics() const { return metrics_; }

 private:
  PW_METRIC(metrics_, foo, "foo", 0.2f);
  PW_METRIC(metrics_, bar, "bar", 44000u);
  PW_METRIC_GROUP(metrics_, "power");  // Error: metrics_ must be first.
};

Attention

Put groups before metrics when declaring metrics members inside classes.

Thread safety#

pw_metric has no built-in synchronization for manipulating the tree structure. Users are expected to either rely on a shared global mutex when constructing the metric tree, or do the metric construction in a single thread (e.g. a boot/init thread). The same applies to destruction, though we do not advise destructing metrics or groups.

Individual metrics have atomic Increment(), Set(), and the value accessors as_float() and as_int() which don’t require separate synchronization, and can be used from ISRs.
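
The sketch below illustrates why such operations need no extra locking. It is a hypothetical stand-in built on std::atomic, not the pw_metric implementation; the class name CounterSketch and the choice of relaxed memory ordering are assumptions. The pattern is only interrupt-safe where std::atomic<uint32_t> is lock-free on the target.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an integer metric; not Pigweed's implementation.
class CounterSketch {
 public:
  // A single atomic read-modify-write. No lock is taken, so a thread that is
  // interrupted mid-update cannot leave the value half-written.
  void Increment(std::uint32_t amount = 1) {
    value_.fetch_add(amount, std::memory_order_relaxed);
  }

  // Overwrites the value in one atomic store.
  void Set(std::uint32_t value) {
    value_.store(value, std::memory_order_relaxed);
  }

  // Reads the current value in one atomic load.
  std::uint32_t value() const {
    return value_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<std::uint32_t> value_{0};
};
```

Relaxed ordering is enough for a standalone counter because readers only need some recent value, not ordering against other memory. Note that this says nothing about the tree structure, which still needs external synchronization while it is being built.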

Attention

You must synchronize construction of the metric tree yourself; pw_metric does not internally synchronize access during construction. Metric Set() and Increment() are safe to call concurrently.

Lifecycle#

Metric objects are not designed to be destructed, and are expected to live for the lifetime of the program or application. If you need dynamic creation/destruction of metrics, pw_metric does not attempt to cover that use case. Instead, pw_metric covers the case of products with two execution phases:

A boot phase where the metric tree is created.
A run phase where metrics are collected. The tree structure is fixed.

Technically, it is possible to destruct metrics provided care is taken to remove the given metric (or group) from the list it’s contained in. However, there are no helper functions for this, so be careful.

Below is an example that is incorrect. Don’t do what follows!

#include "pw_metric/metric.h"

int main() {
  PW_METRIC_GROUP(root, "/");
  {
    // BAD! The metrics have a different lifetime than the group.
    PW_METRIC(root, temperature, "temperature_f", 72.3f);
    PW_METRIC(root, humidity, "humidity_relative_percent", 33.2f);
  }
  // OOPS! root now has a linked list that points to the destructed
  // "humidity" object.
}

Attention

Don’t destruct metrics. Metrics are designed to be registered / structured upfront, then manipulated during a device’s active phase. They do not support destruction.

Exporting metrics#

Collecting metrics on a device is not useful without a mechanism to export those metrics for analysis and debugging. pw_metric offers optional RPC service libraries (:metric_service_nanopb based on nanopb, and :metric_service_pwpb based on pw_protobuf) that enable exporting a user-supplied set of on-device metrics via RPC. This facility is intended to function from the early stages of device bringup through production in the field.

The metrics are fetched by calling the MetricService.Get RPC method, which streams all registered metrics to the caller in batches (server streaming RPC). Batching the returned metrics avoids requiring a large buffer or large RPC MTU.
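
As a rough sketch of the batching idea, the function below walks a list of records and hands them to a sender in fixed-size chunks, so that only one chunk ever has to fit in a message. Everything here is hypothetical: MetricRecord, StreamInBatches, and the send callback are illustrative names, and the callback merely stands in for writing one message to a server-streaming response. The real service's batch size and message layout are not described on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical wire record: a token standing in for the path, plus a value.
struct MetricRecord {
  std::uint32_t token;
  std::uint32_t value;
};

// Sends `metrics` in chunks of at most `batch_size` records by calling `send`
// once per chunk. Returns the number of chunks sent.
std::size_t StreamInBatches(
    const std::vector<MetricRecord>& metrics,
    std::size_t batch_size,
    const std::function<void(const MetricRecord*, std::size_t)>& send) {
  if (batch_size == 0) return 0;  // A zero-sized batch can never make progress.
  std::size_t batches = 0;
  for (std::size_t start = 0; start < metrics.size(); start += batch_size) {
    // The final chunk may be shorter than batch_size.
    const std::size_t count = std::min(batch_size, metrics.size() - start);
    send(metrics.data() + start, count);
    ++batches;
  }
  return batches;
}
```

With seven records and a batch size of three, the callback runs three times with three, three, and one records, so the largest message is bounded by the batch size rather than by the total number of metrics.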

The returned metric objects have flattened paths to the root. For example, the returned metrics (post detokenization and jsonified) might look something like:

{
  "/i2c1/failed_txns": 17,
  "/i2c1/total_txns": 2013,
  "/i2c1/gyro/resets": 24,
  "/i2c1/gyro/hangs": 1,
  "/spi1/thermocouple/reads": 242,
  "/spi1/thermocouple/temp_celsius": 34.52
}

Note that there is no nesting of the groups; the nesting is implied from the path.
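
The flattening can be pictured as a depth-first walk that carries the path so far. The sketch below works on an already-detokenized, host-side view of the tree; GroupNode and Flatten are hypothetical names, and the real service works on tokens rather than strings.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, already-detokenized view of a metric tree.
struct GroupNode {
  std::string name;
  std::vector<std::pair<std::string, double>> metrics;
  std::vector<GroupNode> children;
};

// Writes every metric under `group` into `out`, keyed by its full path.
// `prefix` is the path of the parent group ("" for a top-level group).
void Flatten(const GroupNode& group,
             const std::string& prefix,
             std::map<std::string, double>& out) {
  const std::string path = prefix + "/" + group.name;
  for (const auto& metric : group.metrics) {
    out[path + "/" + metric.first] = metric.second;
  }
  for (const GroupNode& child : group.children) {
    Flatten(child, path, out);  // Children inherit the parent's path.
  }
}
```

Flattening a group "i2c1" that contains a child group "gyro" produces keys such as "/i2c1/total_txns" and "/i2c1/gyro/hangs", matching the shape of the example above.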

RPC service setup#

To expose a MetricService in your application, do the following:

Define metrics around the system, and put them in a group or list of metrics. Easy choices include the global_groups and global_metrics variables, or you can create your own.
Create an instance of pw::metric::MetricService.
Register the service with your RPC server.

For example:

#include "pw_rpc/server.h"
#include "pw_metric/metric.h"
#include "pw_metric/global.h"
#include "pw_metric/metric_service_nanopb.h"

// Note: You must customize the RPC server setup; see pw_rpc.
// uart_output is a placeholder for your product's channel output.
pw::rpc::Channel channels[] = {
    pw::rpc::Channel::Create<1>(&uart_output),
};
pw::rpc::Server server(channels);

// Metric service instance, pointing to the global metric objects.
// This could also point to custom per-product or application objects.
pw::metric::MetricService metric_service(
    pw::metric::global_metrics,
    pw::metric::global_groups);

void RegisterServices() {
  server.RegisterService(metric_service);
  // Register other services here.
}

int main() {
  // ... system initialization ...

  RegisterServices();

  // ... start your application ...
}

Attention

Take care when exporting metrics. Ensure appropriate access control is in place. In some cases it may make sense to entirely disable metrics export for production builds. Although reading metrics via RPC won’t influence the device, in some cases the metrics could expose sensitive information if product owners are not careful.

Attention

MetricService::Get is a synchronous RPC method

Calls to MetricService::Get are blocking and will send all metrics immediately, even though it is a server-streaming RPC. This will work fine if the device doesn’t have too many metrics, or doesn’t have concurrent RPCs like logging, but could be a problem in some cases.

We plan to offer an async version where the application is responsible for pumping the metrics into the streaming response. This gives flow control to the application.

Size report#

The size report below shows the cost in code and memory for a few examples of metrics. This does not include the RPC service.

Label | Segment | Delta
1 metric and 1 group no dump or export | | +404
(+) dump group and metrics to log | | +1,052
(+) 1 group (+) 4 metrics | | +248

Attention

At time of writing, the above sizes show an unexpectedly large flash impact. We are investigating why GCC is inserting large global static constructors per group, when all the logic should be reused across objects.

Metric Parser#

The metric_parser Python module requests the system metrics via RPC, parses the response while detokenizing the group and metric names, and returns the metrics in a dictionary organized by group and value.

Design tradeoffs#

There are many possible approaches to metrics collection and aggregation. We’ve chosen some points on the tradeoff curve:

Atomic-sized metrics - Using simple metric objects with just uint32/float enables atomic operations. While it might be nice to support larger types, it is more useful to have safe metric increments from interrupt service routines (ISRs).
No aggregate metrics (yet) - Aggregate metrics (e.g. average, max, min, histograms) are not supported, and must be built on top of the simple base metrics. By taking this route, we can considerably simplify the core metrics system and have aggregation logic in separate modules. Those modules can then feed into the metrics system - for example by creating multiple metrics for a single underlying metric. For example: “foo”, “foo_max”, “foo_min” and so on.

The other problem with automatic aggregation is that what period the aggregation happens over is often important, and it can be hard to design this cleanly into the API. Instead, this responsibility is pushed to the user who must take more care.

Note that we will add helpers for aggregated metrics.
No virtual metrics - An alternate approach to the concrete Metric class in the current module is to have a virtual interface for metrics, and then allow those metrics to have their own storage. This is attractive but can lead to many vtables and excess memory use in simple one-metric use cases.
Linked list registration - Using linked lists for registration is a tradeoff, accepting some memory overhead in exchange for flexibility. Other alternatives include a global table of metrics, which has the disadvantage of requiring centralizing the metrics – an impossibility for middleware like Pigweed.
Synchronization - The only synchronization guarantee provided by pw_metric is that increment and set are atomic. Other than that, users are on their own to synchronize metric collection and updating.
No fast metric lookup - The current design does not make it fast to look up a metric at runtime; instead, one must run a linear search of the tree to find the matching metric. In most non-dynamic use cases, this is fine in practice, and saves having a more involved hash table. Metric updates will be through direct member or variable accesses.
Relying on C++ static initialization - In short, the convenience outweighs the cost and risk. Without static initializers, it would be impossible to automatically collect the metrics without post-processing the C++ code to find the metrics; a huge and debatably worthwhile approach. We have carefully analyzed the static initializer behaviour of Pigweed’s IntrusiveList and are confident it is correct.
Both local & global support - Potentially just one approach (the local or global one) could be offered, making the module less complex. However, we feel the additional complexity is worthwhile since there are legitimate use cases for both, e.g. PW_METRIC and PW_METRIC_GLOBAL. We’d prefer to have a well-tested upstream solution for these use cases rather than have customers re-implement one of these.
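
To make the "No aggregate metrics (yet)" point concrete, here is a minimal sketch of aggregation layered on top of simple values, in the spirit of the “foo”, “foo_min”, “foo_max” naming mentioned above. MinMaxSketch is a hypothetical helper, not part of pw_metric, and plain integers stand in for three separate simple metrics. Unlike a single metric update, the three fields are not updated as one atomic unit, so a helper like this would need its own synchronization if samples arrive from several contexts.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper that derives "foo", "foo_min" and "foo_max" from one
// stream of samples. Each field stands in for one simple metric.
struct MinMaxSketch {
  std::uint32_t last = 0;
  std::uint32_t min_value = std::numeric_limits<std::uint32_t>::max();
  std::uint32_t max_value = 0;

  // Folds one sample into the three values.
  void Record(std::uint32_t sample) {
    last = sample;
    min_value = std::min(min_value, sample);
    max_value = std::max(max_value, sample);
  }

  // Starts a new aggregation period. The caller decides when that happens,
  // which is the responsibility the design pushes to the user.
  void Reset() { *this = MinMaxSketch(); }
};
```

Keeping the period explicit through Reset() avoids baking a windowing policy into the core metric type.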

Roadmap & Status#

String metric names - pw_metric stores metric names as tokens. On one hand, this is great for production where having a compact binary is often a requirement to fit the application in the given part. However, in early development before flash is a constraint, string names are more convenient to work with since there is no need for host-side detokenization. We plan to add optional support for string names.
Aggregate metrics - We plan to add support for aggregate metrics on top of the simple metric mechanism, either as another module or as additional functionality inside this one. Likely examples include min/max,
Selectively enable or disable metrics - Currently the metrics are always enabled once included. In practice this is not ideal since many times only a few metrics are wanted in production, but having to strip all the metrics code is error prone. Instead, we will add support for controlling which metrics are enabled or disabled at compile time. This may rely on C++20’s support for zero-sized members to fully remove the cost.
Async RPC - The current RPC service exports the metrics by streaming them to the client in batches. However, the current solution streams all the metrics to completion; this may block the RPC thread. In the future we will have an async solution where the user is in control of flow priority.
Timer integration - We would like to add a stopwatch type mechanism to time multiple in-flight events.
C support - In practice it’s often useful or necessary to instrument C-only code. While it will be impossible to support the global registration system that the C++ version supports, we will figure out a solution to make instrumenting C code relatively smooth.
Global counter - We may add a global metric counter to help detect cases where post-initialization metrics manipulations are done.
Proto structure - It may be possible to directly map metrics to a custom proto structure, where instead of a name or token field, a tag field is provided. This could result in elegant export to an easily machine parsable and compact representation on the host. We may investigate this in the future.
Safer data structures - At a cost of 4B per metric and 4B per group, it may be possible to make metric structure instantiation safe even in static constructors, and also make it safe to remove metrics dynamically. We will consider whether this tradeoff is the right one, since a 4B cost per metric is substantial on projects with many metrics.