pw_metric

Attention

This module is not yet production ready; ask us if you are interested in trying it out or have ideas about how to improve it.

Overview

Pigweed’s metric module is a lightweight manual instrumentation system for tracking system health metrics like counts or set values. For example, pw_metric could help with tracking the number of I2C bus writes, or the number of times a buffer was filled before it could drain in time, or safely incrementing counters from ISRs.

Key features of pw_metric:

  • Tokenized names - Names are tokenized using pw_tokenizer, enabling long metric names that don’t bloat your binary.

  • Tree structure - Metrics can form a tree, enabling grouping of related metrics for clearer organization.

  • Per object collection - Metrics and groups can live on object instances and be flexibly combined with metrics from other instances.

  • Global registration - For legacy code bases or just because it’s easier, pw_metric supports automatic aggregation of metrics. This is optional but convenient in many cases.

  • Simple design - There are only two core data structures: Metric and Group, which are both simple to understand and use. The only supported metric types are uint32_t and float. This module does not support complicated aggregations like running average or min/max.

Example: Instrumenting a single object

The below example illustrates what instrumenting a class with a metric group and metrics might look like. In this case, the object’s MySubsystem::metrics() member is not globally registered; the user is on their own for combining this subsystem’s metrics with others.

#include "pw_metric/metric.h"

class MySubsystem {
 public:
  void DoSomething() {
    attempts_.Increment();
    if (ActionSucceeds()) {
      successes_.Increment();
    }
  }
  Group& metrics() { return metrics_; }

 private:
  PW_METRIC_GROUP(metrics_, "my_subsystem");
  PW_METRIC(metrics_, attempts_, "attempts", 0u);
  PW_METRIC(metrics_, successes_, "successes", 0u);
};

The metrics subsystem has no canonical output format at this time, but a JSON dump might look something like this:

{
  "my_subsystem" : {
    "successes" : 1000,
    "attempts" : 1200
  }
}

In this case, every instance of MySubsystem will have unique counters.

Example: Instrumenting a legacy codebase

A common situation in embedded development is debugging legacy code, or code which is hard to change, where it may be impossible to plumb metrics objects around with dependency injection. The alternative to plumbing metrics is to register the metrics through a global mechanism. pw_metric supports this use case. For example:

Before instrumenting:

// This code was passed down from generations of developers before; no one
// knows what it does or how it works. But it needs to be fixed!
void OldCodeThatDoesntWorkButWeDontKnowWhy() {
  if (some_variable) {
    DoSomething();
  } else {
    DoSomethingElse();
  }
}

After instrumenting:

#include "pw_metric/global.h"
#include "pw_metric/metric.h"

PW_METRIC_GLOBAL(legacy_do_something, "legacy_do_something", 0u);
PW_METRIC_GLOBAL(legacy_do_something_else, "legacy_do_something_else", 0u);

// This code was passed down from generations of developers before; no one
// knows what it does or how it works. But it needs to be fixed!
void OldCodeThatDoesntWorkButWeDontKnowWhy() {
  if (some_variable) {
    legacy_do_something.Increment();
    DoSomething();
  } else {
    legacy_do_something_else.Increment();
    DoSomethingElse();
  }
}

In this case, the developer merely had to add the metrics header, define some metrics, and then start incrementing them. These metrics will be available globally through the pw::metric::global_metrics object defined in pw_metric/global.h.

Why not just use simple counter variables?

One might wonder what the point of leveraging a metric library is when it is trivial to make some global variables and print them out. There are a few reasons:

  • Metrics offload - To make it easy to get metrics off-device by sharing the infrastructure for offloading.

  • Consistent format - To get the metrics in a consistent format (e.g. protobuf or JSON) for analysis.

  • Uncoordinated collection - To provide a simple and reliable way for developers on a team to all collect metrics for their subsystems, without having to coordinate to offload. This could extend to code in libraries written by other teams.

  • Pre-boot or interrupt visibility - Some of the most challenging bugs come from early system boot when not all system facilities are up (e.g. logging or UART). In those cases, metrics provide a low-overhead approach to understand what is happening. During early boot, metrics can be incremented; after boot, dumping the metrics provides insight into what happened. While basic counter variables can work in these contexts too, one still has to deal with the offloading problem, which the library handles.

Metrics API reference

The metrics API consists of just a few components:

  • The core data structures pw::metric::Metric and pw::metric::Group

  • The macros for scoped metrics and groups PW_METRIC and PW_METRIC_GROUP

  • The macros for globally registered metrics and groups PW_METRIC_GLOBAL and PW_METRIC_GROUP_GLOBAL

  • The global groups and metrics list: pw::metric::global_groups and pw::metric::global_metrics.

Metric

The pw::metric::Metric provides:

  • A 31-bit tokenized name

  • A 1-bit discriminator for int or float

  • A 32-bit payload (int or float)

  • A 32-bit next pointer (intrusive list)

The metric object is 12 bytes on 32-bit platforms.
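The layout above can be illustrated with a minimal sketch. It is not Pigweed's implementation: the SketchMetric name, the choice of the top bit as the discriminator, and the raw next pointer are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the layout described above; not Pigweed's code.
class SketchMetric {
 public:
  SketchMetric(uint32_t token, uint32_t value)
      : name_and_type_(token & kTokenMask), payload_(value) {}

  SketchMetric(uint32_t token, float value)
      : name_and_type_((token & kTokenMask) | kFloatBit), payload_(0) {
    // Store the float's bit pattern in the shared 32-bit payload.
    std::memcpy(&payload_, &value, sizeof(payload_));
  }

  bool is_float() const { return (name_and_type_ & kFloatBit) != 0; }
  uint32_t token() const { return name_and_type_ & kTokenMask; }
  SketchMetric* next() const { return next_; }

  uint32_t as_int() const {
    assert(!is_float());
    return payload_;
  }

  float as_float() const {
    assert(is_float());
    float value;
    std::memcpy(&value, &payload_, sizeof(value));
    return value;
  }

 private:
  // Assumption: the discriminator is the most significant bit.
  static constexpr uint32_t kFloatBit = 0x80000000u;
  static constexpr uint32_t kTokenMask = 0x7fffffffu;

  uint32_t name_and_type_;        // 31-bit token plus 1-bit discriminator.
  uint32_t payload_;              // Integer value or float bit pattern.
  SketchMetric* next_ = nullptr;  // Intrusive list link.
};
```

With 4-byte pointers the three fields add up to the 12 bytes quoted above; on a 64-bit host the pointer is wider, so the sketch is larger there.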

class pw::metric::Metric
Increment(uint32_t amount = 1)

Increment the metric by the given amount. Results in undefined behaviour if the metric is not of type int.

Set(uint32_t value)

Set the metric to the given value. Results in undefined behaviour if the metric is not of type int.

Set(float value)

Set the metric to the given value. Results in undefined behaviour if the metric is not of type float.

Group

The pw::metric::Group object is simply:

  • A name for the group

  • A list of children groups

  • A list of leaf metrics

  • A 32-bit next pointer (intrusive list)

The group object is 16 bytes on 32-bit platforms.
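As a rough illustration of how such a tree can be built and walked without dynamic allocation, here is a minimal sketch. SketchMetric, SketchGroup, and CountMetrics are hypothetical stand-ins; Pigweed itself uses its IntrusiveList rather than the raw pointers assumed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Metric and Group; not Pigweed's implementation.
struct SketchMetric {
  explicit SketchMetric(uint32_t token_in) : token(token_in) {}
  uint32_t token;
  uint32_t value = 0;
  SketchMetric* next = nullptr;  // Link in the owning group's metric list.
};

struct SketchGroup {
  explicit SketchGroup(uint32_t token_in) : token(token_in) {}

  // Push onto the front of the singly linked lists; no allocation needed.
  void Add(SketchMetric& metric) {
    metric.next = metrics;
    metrics = &metric;
  }
  void Add(SketchGroup& child) {
    child.next = children;
    children = &child;
  }

  uint32_t token;
  SketchMetric* metrics = nullptr;  // Head of the leaf metrics list.
  SketchGroup* children = nullptr;  // Head of the child groups list.
  SketchGroup* next = nullptr;      // Link in the parent's child list.
};

// Recursively counts every metric reachable from the given group.
inline int CountMetrics(const SketchGroup& group) {
  int count = 0;
  for (const SketchMetric* m = group.metrics; m != nullptr; m = m->next) {
    ++count;
  }
  for (const SketchGroup* g = group.children; g != nullptr; g = g->next) {
    count += CountMetrics(*g);
  }
  return count;
}

inline void SketchGroupDemo() {
  SketchGroup i2c_1(1);
  SketchGroup gyro(2);
  SketchMetric bus_errors(3);
  SketchMetric transactions(4);
  SketchMetric num_samples(5);
  i2c_1.Add(gyro);
  i2c_1.Add(bus_errors);
  i2c_1.Add(transactions);
  gyro.Add(num_samples);
  assert(CountMetrics(i2c_1) == 3);
}
```

Because the links live inside the objects themselves, a group only stores list heads, which is what keeps it at four 32-bit words on a 32-bit platform.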

class pw::metric::Group
Dump(int indent_level = 0)

Recursively dump a metrics group to pw_log. Produces output like:

"$6doqFw==": {
  "$05OCZw==": {
    "$VpPfzg==": 1,
    "$LGPMBQ==": 1.000000,
    "$+iJvUg==": 5,
  }
  "$9hPNxw==": 65,
  "$oK7HmA==": 13,
  "$FCM4qQ==": 0,
}

Note the metric names are tokenized with base64. Decoding requires using the Pigweed detokenizer. With a detokenizing-enabled logger, you could get something like:

"i2c_1": {
  "gyro": {
    "num_samples": 1,
    "init_time_us": 1.000000,
    "initialized": 5,
  }
  "bus_errors": 65,
  "transactions": 13,
  "bytes_sent": 0,
}
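To make the tokenized form less opaque, the following sketch encodes a 32-bit token the way the names in the first dump appear to be encoded. It assumes the format is a "$" prefix followed by standard Base64 of the token's four bytes in little-endian order; SketchPrefixedBase64 is a hypothetical helper, not part of pw_tokenizer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Encodes a 32-bit token as "$" plus Base64 of its four little-endian bytes.
// Hypothetical helper for illustration only.
inline std::string SketchPrefixedBase64(uint32_t token) {
  static constexpr char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const std::array<uint8_t, 4> bytes = {
      static_cast<uint8_t>(token),
      static_cast<uint8_t>(token >> 8),
      static_cast<uint8_t>(token >> 16),
      static_cast<uint8_t>(token >> 24)};

  std::string out = "$";
  // The first three bytes map to four Base64 characters.
  const uint32_t first = (uint32_t{bytes[0]} << 16) |
                         (uint32_t{bytes[1]} << 8) | uint32_t{bytes[2]};
  out += kAlphabet[(first >> 18) & 0x3f];
  out += kAlphabet[(first >> 12) & 0x3f];
  out += kAlphabet[(first >> 6) & 0x3f];
  out += kAlphabet[first & 0x3f];
  // The remaining byte maps to two characters plus "==" padding.
  out += kAlphabet[bytes[3] >> 2];
  out += kAlphabet[(bytes[3] & 0x03) << 4];
  out += "==";
  return out;
}

inline void SketchBase64Demo() {
  // Under the assumptions above, this token yields the first name in the dump.
  assert(SketchPrefixedBase64(0x172adae9u) == "$6doqFw==");
}
```

Mapping a token back to a string such as "i2c_1" still requires the token database, which is why a detokenizer is needed on the host side.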

Macros

The macros are the primary mechanism for creating metrics, and should be used instead of directly constructing metrics or groups. The macros handle tokenizing the metric and group names.

PW_METRIC(identifier, name, value)
PW_METRIC(group, identifier, name, value)
PW_METRIC_STATIC(identifier, name, value)
PW_METRIC_STATIC(group, identifier, name, value)

Declare a metric, optionally adding it to a group.

  • identifier - An identifier name for the created variable or member. For example, i2c_transactions might be used for a local or global metric; inside a class, the identifier could follow the member naming convention (i2c_transactions_ in Google’s C++ style).

  • name - The string name for the metric. This will be tokenized. There are no restrictions on the contents of the name; however, consider restricting these to be valid C++ identifiers to ease integration with other systems.

  • value - The initial value for the metric. Must be either a floating point value (e.g. 3.2f) or unsigned int (e.g. 21u).

  • group - A pw::metric::Group instance. If provided, the metric is added to the given group.

The macro declares a variable or member named by identifier, with type pw::metric::Metric, and works in three contexts: global, local, and member.

If the _STATIC variant is used, the macro declares a variable with static storage. These can be used in function scopes, but not in classes.

  1. At global scope:

PW_METRIC(foo, "foo", 15u);  // Integer metric, so Increment() is valid.

void MyFunc() {
  foo.Increment();
}

  2. At local function or member function scope:

void MyFunc() {
  PW_METRIC(foo, "foo", 15u);  // Integer metric, so Increment() is valid.
  foo.Increment();
  // foo goes out of scope here; be careful!
}

  3. At member level inside a class or struct:

struct MyStructy {
  void DoSomething() {
    somethings.Increment();
  }
  // Every instance of MyStructy will have a separate somethings counter.
  PW_METRIC(somethings, "somethings", 0u);
};

You can also put a metric into a group with the macro. A metric can belong to at most one group; adding it to a second group will fail an assertion. Example:

PW_METRIC_GROUP(my_group, "my_group");
PW_METRIC(my_group, foo, "foo", 0.2f);
PW_METRIC(my_group, bar, "bar", 44000u);
PW_METRIC(my_group, zap, "zap", 3.14f);

Tip

If you want a globally registered metric, see pw_metric/global.h; in that context, metrics are globally registered without the need to centrally register them in a single place.

PW_METRIC_GROUP(identifier, name)
PW_METRIC_GROUP(parent_group, identifier, name)
PW_METRIC_GROUP_STATIC(identifier, name)
PW_METRIC_GROUP_STATIC(parent_group, identifier, name)

Declares a pw::metric::Group with name name; the name is tokenized. Works similarly to PW_METRIC and can be used in the same contexts (global, local, and member). Optionally, the group can be added to a parent group.

If the _STATIC variant is used, the macro declares a variable with static storage. These can be used in function scopes, but not in classes.

Example:

PW_METRIC_GROUP(my_group, "my_group");
PW_METRIC(my_group, foo, "foo", 0.2f);
PW_METRIC(my_group, bar, "bar", 44000u);
PW_METRIC(my_group, zap, "zap", 3.14f);

PW_METRIC_GLOBAL(identifier, name, value)

Declare a pw::metric::Metric with name name, and register it in the global metrics list pw::metric::global_metrics.

Example:

#include "pw_metric/metric.h"
#include "pw_metric/global.h"

// No need to coordinate collection of foo and bar; they're autoregistered.
PW_METRIC_GLOBAL(foo, "foo", 0.2f);
PW_METRIC_GLOBAL(bar, "bar", 44000u);

Note that metrics defined with PW_METRIC_GLOBAL should never be added to groups defined with PW_METRIC_GROUP_GLOBAL. Each metric can only belong to one group, and metrics defined with PW_METRIC_GLOBAL are pre-registered with the global metrics list.

Attention

Do not create PW_METRIC_GLOBAL instances anywhere other than global scope. Putting these on an instance (member context) would lead to dangling pointers and misery. Metrics are never deleted or unregistered!
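The reason for this restriction can be seen in a minimal sketch of constructor-based self-registration. The names below (SketchGlobalMetric, g_sketch_head, CountRegistered) are hypothetical and the list is a plain pointer chain rather than Pigweed's IntrusiveList; the point is only that registration stores a pointer to the object and nothing ever removes it.

```cpp
#include <cassert>
#include <cstdint>

struct SketchGlobalMetric;

// Constant-initialized, so it is valid before any dynamic initializer runs.
SketchGlobalMetric* g_sketch_head = nullptr;

// Hypothetical self-registering metric; not Pigweed's implementation.
struct SketchGlobalMetric {
  SketchGlobalMetric(uint32_t token_in, uint32_t value_in)
      : token(token_in), value(value_in), next(g_sketch_head) {
    g_sketch_head = this;  // Self-register; there is no matching unregister.
  }
  uint32_t token;
  uint32_t value;
  SketchGlobalMetric* next;
};

// Global scope: these objects outlive every reader of the list.
SketchGlobalMetric sketch_foo(0x1u, 0u);
SketchGlobalMetric sketch_bar(0x2u, 44000u);

// Walks the global list and counts the registered metrics.
inline int CountRegistered() {
  int count = 0;
  for (const SketchGlobalMetric* m = g_sketch_head; m != nullptr;
       m = m->next) {
    ++count;
  }
  return count;
}

inline void SketchGlobalDemo() {
  assert(CountRegistered() == 2);
}
```

If one of these objects were instead a class member or a local variable, the list would keep pointing at its storage after the owner was destroyed, which is the dangling pointer the warning above describes.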

PW_METRIC_GROUP_GLOBAL(identifier, name)

Declare a pw::metric::Group with name name, and register it in the global metric groups list pw::metric::global_groups.

Note that metrics created with PW_METRIC_GLOBAL should never be added to groups! Instead, just create a freestanding metric and register it into the global group (like in the example below).

Example:

#include "pw_metric/metric.h"
#include "pw_metric/global.h"

// No need to coordinate collection of this group; it's globally registered.
PW_METRIC_GROUP_GLOBAL(legacy_system, "legacy_system");
PW_METRIC(legacy_system, foo, "foo", 0.2f);
PW_METRIC(legacy_system, bar, "bar", 44000u);

Attention

Do not create PW_METRIC_GROUP_GLOBAL instances anywhere other than global scope. Putting these on an instance (member context) would lead to dangling pointers and misery. Metrics are never deleted or unregistered!

Usage & Best Practices

This library makes several tradeoffs to enable low memory use per-metric, and one of those tradeoffs results in requiring care in constructing the metric trees.

Use the Init() pattern for static objects with metrics

A common pattern in embedded systems is to allocate many objects globally, and reduce reliance on dynamic allocation (or eschew malloc entirely). This leads to a pattern where rich/large objects are statically constructed at global scope, then interacted with via tasks or threads. For example, consider a hypothetical global Uart object:

class Uart {
 public:
  Uart(std::span<std::byte> rx_buffer, std::span<std::byte> tx_buffer)
    : rx_buffer_(rx_buffer), tx_buffer_(tx_buffer) {}

  // Send/receive here...

 private:
  std::span<std::byte> rx_buffer_;
  std::span<std::byte> tx_buffer_;
};

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer, uart_tx_buffer);

Through the course of building a product, the team may want to add metrics to the UART, for example to gain insight into which operations are triggering lots of data transfer. When adding metrics to the above imaginary UART object, one might consider the following approach:

class Uart {
 public:
  Uart(std::span<std::byte> rx_buffer,
       std::span<std::byte> tx_buffer,
       Group& parent_metrics)
    : rx_buffer_(rx_buffer),
      tx_buffer_(tx_buffer) {
    // PROBLEM! parent_metrics may not be constructed if it's a reference
    // to a static global.
    parent_metrics.Add(tx_bytes_);
    parent_metrics.Add(rx_bytes_);
  }

  // Send/receive here which increment tx/rx_bytes.

 private:
  std::span<std::byte> rx_buffer_;
  std::span<std::byte> tx_buffer_;

  PW_METRIC(tx_bytes_, "tx_bytes", 0u);
  PW_METRIC(rx_bytes_, "rx_bytes", 0u);
};

PW_METRIC_GROUP(global_metrics, "/");
PW_METRIC_GROUP(global_metrics, uart1_metrics, "uart1");

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer,
           uart_tx_buffer,
           uart1_metrics);

However, this is incorrect: parent_metrics (which refers to uart1_metrics in this case) may not yet be constructed when uart1 is constructed. Thankfully, in the case of pw_metric this results in an assertion failure (or works correctly if the constructors happen to run in a favorable order), so the problem will not go unnoticed. Instead, consider using the Init() pattern for static objects, where references to dependencies may be stored during construction, but no methods on the dependencies are called.

The Init() approach separates global object construction into two phases: the constructor, where references are stored, and an Init() function, which is called after all static constructors have run. This approach works correctly, even when the objects are allocated globally:

class Uart {
 public:
  // Note that metrics is not passed in here at all.
  Uart(std::span<std::byte> rx_buffer,
       std::span<std::byte> tx_buffer)
    : rx_buffer_(rx_buffer),
      tx_buffer_(tx_buffer) {}

  // Precondition: parent_metrics is already constructed.
  void Init(Group& parent_metrics) {
    parent_metrics.Add(tx_bytes_);
    parent_metrics.Add(rx_bytes_);
  }

  // Send/receive here which increment tx/rx_bytes.

 private:
  std::span<std::byte> rx_buffer_;
  std::span<std::byte> tx_buffer_;

  PW_METRIC(tx_bytes_, "tx_bytes", 0u);
  PW_METRIC(rx_bytes_, "rx_bytes", 0u);
};

PW_METRIC_GROUP(root_metrics, "/");
PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

std::array<std::byte, 512> uart_rx_buffer;
std::array<std::byte, 512> uart_tx_buffer;
Uart uart1(uart_rx_buffer,
           uart_tx_buffer);

int main() {
  // uart1_metrics is guaranteed to be initialized by this point, so it is
  // safe to pass it to Init().
  uart1.Init(uart1_metrics);
}

Attention

Be extra careful about static global metric registration. Consider using the Init() pattern.

Metric member order matters in objects

The order of declaring in-class groups and metrics matters if the metrics are within a group declared inside the class. For example, the following class will work fine:

#include "pw_metric/metric.h"

class PowerSubsystem {
 public:
   Group& metrics() { return metrics_; }
   const Group& metrics() const { return metrics_; }

 private:
  PW_METRIC_GROUP(metrics_, "power");  // Note metrics_ declared first.
  PW_METRIC(metrics_, foo, "foo", 0.2f);
  PW_METRIC(metrics_, bar, "bar", 44000u);
};

but the following one will not since the group is constructed after the metrics (and will result in a compile error):

#include "pw_metric/metric.h"

class PowerSubsystem {
 public:
   Group& metrics() { return metrics_; }
   const Group& metrics() const { return metrics_; }

 private:
  PW_METRIC(metrics_, foo, "foo", 0.2f);
  PW_METRIC(metrics_, bar, "bar", 44000u);
  PW_METRIC_GROUP(metrics_, "power");  // Error: metrics_ must be first.
};

Attention

Put groups before metrics when declaring metrics members inside classes.
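The underlying C++ rule is that non-static data members are constructed in declaration order. The sketch below demonstrates that rule in isolation; SketchTracer, SketchSubsystem, and SketchLog are hypothetical names and no pw_metric macros are involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Shared log of construction events, in the order they happen.
inline std::vector<std::string>& SketchLog() {
  static std::vector<std::string> log;
  return log;
}

// Records its own construction in the log.
struct SketchTracer {
  explicit SketchTracer(const char* name) { SketchLog().push_back(name); }
};

// Members are constructed top to bottom, in declaration order.
struct SketchSubsystem {
  SketchTracer group_{"group"};  // Declared first, so constructed first.
  SketchTracer metric_a_{"metric_a"};
  SketchTracer metric_b_{"metric_b"};
};

inline void SketchMemberOrderDemo() {
  SketchLog().clear();
  SketchSubsystem subsystem;
  (void)subsystem;
  assert(SketchLog().size() == 3);
  assert(SketchLog()[0] == "group");
  assert(SketchLog()[1] == "metric_a");
  assert(SketchLog()[2] == "metric_b");
}
```

A metric that registers itself with a group during its own construction therefore needs the group to appear earlier in the class body.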

Thread safety

pw_metric has no built-in synchronization for manipulating the tree structure. Users are expected to either rely on a shared global mutex when constructing the metric tree, or do the metric construction in a single thread (e.g. a boot/init thread). The same applies for destruction, though we do not advise destructing metrics or groups.

Individual metrics have atomic Increment(), Set(), and the value accessors as_float() and as_int() which don’t require separate synchronization, and can be used from ISRs.

Attention

You must synchronize construction of the metric tree. pw_metric does not internally synchronize access during construction. Metric Set/Increment are safe.
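One way to obtain this kind of guarantee for a 32-bit counter is to store the value in a std::atomic. The sketch below is an assumption about how such a counter could look, not a description of pw_metric's internals; SketchCounter and the relaxed memory ordering are illustrative choices.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter whose updates need no external lock.
class SketchCounter {
 public:
  void Increment(uint32_t amount = 1) {
    // A single atomic read-modify-write; concurrent callers cannot lose
    // updates.
    value_.fetch_add(amount, std::memory_order_relaxed);
  }
  void Set(uint32_t value) { value_.store(value, std::memory_order_relaxed); }
  uint32_t as_int() const { return value_.load(std::memory_order_relaxed); }

 private:
  std::atomic<uint32_t> value_{0};
};

inline uint32_t SketchCounterDemo() {
  SketchCounter writes;
  writes.Increment();   // For example, from a task.
  writes.Increment(3);  // For example, from an interrupt handler.
  assert(writes.as_int() == 4u);
  writes.Set(10u);
  return writes.as_int();
}
```

Relaxed ordering is enough when the counter is only a statistic; it orders nothing else in the program, so it should not be used to publish other data.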

Lifecycle

Metric objects are not designed to be destructed, and are expected to live for the lifetime of the program or application. If you need dynamic creation/destruction of metrics, pw_metric does not attempt to cover that use case. Instead, pw_metric covers the case of products with two execution phases:

  1. A boot phase where the metric tree is created.

  2. A run phase where metrics are collected. The tree structure is fixed.

Technically, it is possible to destruct metrics provided care is taken to remove the given metric (or group) from the list it’s contained in. However, there are no helper functions for this, so be careful.

Below is an example that is incorrect. Don’t do what follows!

#include "pw_metric/metric.h"

int main() {
  PW_METRIC_GROUP(root, "/");
  {
    // BAD! The metrics have a different lifetime than the group.
    PW_METRIC(root, temperature, "temperature_f", 72.3f);
    PW_METRIC(root, humidity, "humidity_relative_percent", 33.2f);
  }
  // OOPS! root now has a linked list that points to the destructed
  // "humidity" object.
}

Attention

Don’t destruct metrics. Metrics are designed to be registered / structured upfront, then manipulated during a device’s active phase. They do not support destruction.

Exporting metrics

Collecting metrics on a device is not useful without a mechanism to export those metrics for analysis and debugging. pw_metric offers an optional RPC service library (:metric_service_nanopb) that enables exporting a user-supplied set of on-device metrics via RPC. This facility is intended to function from the early stages of device bringup through production in the field.

The metrics are fetched by calling the MetricService.Get RPC method, which streams all registered metrics to the caller in batches (server streaming RPC). Batching the returned metrics avoids requiring a large buffer or large RPC MTU.
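The batching idea can be sketched independently of the RPC machinery. In the example below, SketchEntry, StreamInBatches, and the send callback are hypothetical; the real service writes protobuf responses to an RPC stream, which is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical flattened metric record.
struct SketchEntry {
  uint32_t token;
  uint32_t value;
};

// Sends entries in batches of at most batch_size entries each and returns
// the number of batches sent. Only one batch needs to be buffered at a time.
inline std::size_t StreamInBatches(
    const std::vector<SketchEntry>& entries,
    std::size_t batch_size,
    const std::function<void(const SketchEntry*, std::size_t)>& send) {
  assert(batch_size > 0);
  std::size_t batches = 0;
  for (std::size_t start = 0; start < entries.size(); start += batch_size) {
    const std::size_t count = std::min(batch_size, entries.size() - start);
    send(entries.data() + start, count);
    ++batches;
  }
  return batches;
}
```

For instance, ten entries with a batch size of four are delivered as batches of four, four, and two entries, so the transmit buffer only ever has to hold four.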

The returned metric objects have flattened paths to the root. For example, the returned metrics (post detokenization and jsonified) might look something like:

{
  "/i2c1/failed_txns": 17,
  "/i2c1/total_txns": 2013,
  "/i2c1/gyro/resets": 24,
  "/i2c1/gyro/hangs": 1,
  "/spi1/thermocouple/reads": 242,
  "/spi1/thermocouple/temp_celsius": 34.52,
}

Note that there is no nesting of the groups; the nesting is implied from the path.
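A host-side view of this flattening might look like the following sketch. SketchNode and Flatten are hypothetical names, string names stand in for tokens, and values are held as double purely to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node with string names instead of tokens.
struct SketchNode {
  std::string name;
  std::vector<std::pair<std::string, double>> metrics;
  std::vector<SketchNode> children;
};

// Walks the tree and records each metric under its full path from the root.
inline void Flatten(const SketchNode& node,
                    const std::string& prefix,
                    std::map<std::string, double>& out) {
  // The root is named "/", so avoid producing a doubled separator.
  const std::string path =
      (node.name == "/") ? "/" : prefix + node.name + "/";
  for (const auto& metric : node.metrics) {
    out[path + metric.first] = metric.second;
  }
  for (const SketchNode& child : node.children) {
    Flatten(child, path, out);
  }
}

inline std::map<std::string, double> SketchFlattenDemo() {
  SketchNode gyro;
  gyro.name = "gyro";
  gyro.metrics.emplace_back("resets", 24.0);

  SketchNode i2c1;
  i2c1.name = "i2c1";
  i2c1.metrics.emplace_back("failed_txns", 17.0);
  i2c1.children.push_back(gyro);

  SketchNode root;
  root.name = "/";
  root.children.push_back(i2c1);

  std::map<std::string, double> flat;
  Flatten(root, "", flat);
  assert(flat.size() == 2);
  return flat;  // Keys: "/i2c1/failed_txns" and "/i2c1/gyro/resets".
}
```

Flattening keeps each exported record self-describing, so a client can process batches in any order without reconstructing the tree first.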

RPC service setup

To expose a MetricService in your application, do the following:

  1. Define metrics around the system, and put them in a group or list of metrics. Easy choices include the global_groups and global_metrics variables, or you can create your own.

  2. Create an instance of pw::metric::MetricService.

  3. Register the service with your RPC server.

For example:

#include "pw_rpc/server.h"
#include "pw_metric/metric.h"
#include "pw_metric/global.h"
#include "pw_metric/metric_service_nanopb.h"

// Note: You must customize the RPC server setup; see pw_rpc.
Channel channels[] = {
 Channel::Create<1>(&uart_output),
};
Server server(channels);

// Metric service instance, pointing to the global metric objects.
// This could also point to custom per-product or application objects.
pw::metric::MetricService metric_service(
    pw::metric::global_metrics,
    pw::metric::global_groups);

void RegisterServices() {
  server.RegisterService(metric_service);
  // Register other services here.
}

int main() {
  // ... system initialization ...

  RegisterServices();

  // ... start your application ...
}

Attention

Take care when exporting metrics. Ensure appropriate access control is in place. In some cases it may make sense to entirely disable metrics export for production builds. Although reading metrics via RPC won’t influence the device, in some cases the metrics could expose sensitive information if product owners are not careful.

Attention

MetricService::Get is a synchronous RPC method

Calls to MetricService::Get are blocking and will send all metrics immediately, even though it is a server-streaming RPC. This will work fine if the device doesn’t have too many metrics, or doesn’t have concurrent RPCs like logging, but could be a problem in some cases.

We plan to offer an async version where the application is responsible for pumping the metrics into the streaming response. This gives flow control to the application.

Size report

The below size report shows the cost in code and memory for a few examples of metrics. This does not include the RPC service.

Label                                     Segment   Before    Delta    After

1 metric and 1 group, no dump or export   FLASH     20,808     +692   21,500
                                          RAM          672      +32      704

(+) dump group and metrics to log         FLASH     20,808   +1,460   22,268
                                          RAM          672      +32      704

(+) 1 group (+) 4 metrics                 FLASH     22,268     +256   22,524
                                          RAM          704      +64      768

Attention

At time of writing, the above sizes show an unexpectedly large flash impact. We are investigating why GCC is inserting large global static constructors per group, when all the logic should be reused across objects.

Design tradeoffs

There are many possible approaches to metrics collection and aggregation. We’ve chosen some points on the tradeoff curve:

  • Atomic-sized metrics - Using simple metric objects with just uint32/float enables atomic operations. While it might be nice to support larger types, it is more useful to be able to safely increment metrics from interrupt service routines.

  • No aggregate metrics (yet) - Aggregate metrics (e.g. average, max, min, histograms) are not supported, and must be built on top of the simple base metrics. By taking this route, we can considerably simplify the core metrics system and have aggregation logic in separate modules. Those modules can then feed into the metrics system - for example by creating multiple metrics for a single underlying metric. For example: “foo”, “foo_max”, “foo_min” and so on.

    The other problem with automatic aggregation is that what period the aggregation happens over is often important, and it can be hard to design this cleanly into the API. Instead, this responsibility is pushed to the user who must take more care.

    Note that we will add helpers for aggregated metrics.

  • No virtual metrics - An alternate approach to the concrete Metric class in the current module is to have a virtual interface for metrics, and then allow those metrics to have their own storage. This is attractive but can lead to many vtables and excess memory use in simple one-metric use cases.

  • Linked list registration - Using linked lists for registration is a tradeoff, accepting some memory overhead in exchange for flexibility. Other alternatives include a global table of metrics, which has the disadvantage of requiring centralizing the metrics – an impossibility for middleware like Pigweed.

  • Synchronization - The only synchronization guarantee provided by pw_metric is that increment and set are atomic. Other than that, users are on their own to synchronize metric collection and updating.

  • No fast metric lookup - The current design does not make it fast to look up a metric at runtime; instead, one must run a linear search of the tree to find the matching metric. In most non-dynamic use cases, this is fine in practice, and saves having a more involved hash table. Metric updates will be through direct member or variable accesses.

  • Relying on C++ static initialization - In short, the convenience outweighs the cost and risk. Without static initializers, it would be impossible to automatically collect the metrics without post-processing the C++ code to find the metrics; a huge and debatably worthwhile approach. We have carefully analyzed the static initializer behaviour of Pigweed’s IntrusiveList and are confident it is correct.

  • Both local & global support - Potentially just one approach (the local or global one) could be offered, making the module less complex. However, we feel the additional complexity is worthwhile since there are legitimate use cases for both, e.g. PW_METRIC and PW_METRIC_GLOBAL. We’d prefer to have a well-tested upstream solution for these use cases rather than have customers re-implement one of these.
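To illustrate the "No aggregate metrics" point above, an aggregation helper can be layered on top of plain 32-bit metrics. The sketch below is hypothetical: SketchMetric stands in for an integer metric, and SketchMinMax plays the role of a separate module that maintains "foo", "foo_min", and "foo_max" as three simple values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a simple integer metric.
struct SketchMetric {
  uint32_t value = 0;
  void Set(uint32_t new_value) { value = new_value; }
};

// Maintains last/min/max as three plain metrics. The caller decides when a
// new aggregation period starts by calling Reset().
class SketchMinMax {
 public:
  void Record(uint32_t sample) {
    if (!has_sample_ || sample < min_.value) {
      min_.Set(sample);
    }
    if (!has_sample_ || sample > max_.value) {
      max_.Set(sample);
    }
    last_.Set(sample);
    has_sample_ = true;
  }

  // Starts a new period; the next sample re-seeds min and max.
  void Reset() { has_sample_ = false; }

  uint32_t last() const { return last_.value; }
  uint32_t min() const { return min_.value; }
  uint32_t max() const { return max_.value; }

 private:
  SketchMetric last_;  // Would be exported as "foo".
  SketchMetric min_;   // Would be exported as "foo_min".
  SketchMetric max_;   // Would be exported as "foo_max".
  bool has_sample_ = false;
};

inline void SketchMinMaxDemo() {
  SketchMinMax latency;
  latency.Record(5);
  latency.Record(2);
  latency.Record(9);
  assert(latency.min() == 2u);
  assert(latency.max() == 9u);
  assert(latency.last() == 9u);
}
```

Keeping Reset() explicit leaves the choice of aggregation period with the caller, which matches the tradeoff described in the bullet.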

Roadmap & Status

  • String metric names - pw_metric stores metric names as tokens. On one hand, this is great for production where having a compact binary is often a requirement to fit the application in the given part. However, in early development before flash is a constraint, string names are more convenient to work with since there is no need for host-side detokenization. We plan to add optional support for string names.

  • Aggregate metrics - We plan to add support for aggregate metrics on top of the simple metric mechanism, either as another module or as additional functionality inside this one. Likely examples include min/max.

  • Selectively enable or disable metrics - Currently the metrics are always enabled once included. In practice this is not ideal since many times only a few metrics are wanted in production, but having to strip all the metrics code is error prone. Instead, we will add support for controlling what metrics are enabled or disabled at compile time. This may rely on C++20’s support for zero-sized members to fully remove the cost.

  • Async RPC - The current RPC service exports the metrics by streaming them to the client in batches. However, the current solution streams all the metrics to completion; this may block the RPC thread. In the future we will have an async solution where the user is in control of flow priority.

  • Timer integration - We would like to add a stopwatch type mechanism to time multiple in-flight events.

  • C support - In practice it’s often useful or necessary to instrument C-only code. While it will be impossible to support the global registration system that the C++ version supports, we will figure out a solution to make instrumenting C code relatively smooth.

  • Global counter - We may add a global metric counter to help detect cases where post-initialization metrics manipulations are done.

  • Proto structure - It may be possible to directly map metrics to a custom proto structure, where instead of a name or token field, a tag field is provided. This could result in elegant export to an easily machine parsable and compact representation on the host. We may investigate this in the future.

  • Safer data structures - At a cost of 4B per metric and 4B per group, it may be possible to make metric structure instantiation safe even in static constructors, and also make it safe to remove metrics dynamically. We will consider whether this tradeoff is the right one, since a 4B cost per metric is substantial on projects with many metrics.