pw_enum#

pw_enum: Rich enum support

pw_enum supports automatic stringifying and tokenizing of C++ enums. It works by parsing C++ standard header files and generating versions of those headers with minimal additions needed to support these features.

Why use pw_enum?

  • Efficient string or tokenized logging: Enums in log statements are stringified or tokenized automatically, depending on the logging backend.

  • Automatic content-based versioning: Generates version hashes to prevent collisions as values change.

Automatic tokenized and stringified enums#

pw_enum works on enums declared in standard C++ header files. To use pw_enum:

  1. Declare one or more enums in a header file.

  2. Include the header file in a pw_cc_enum target instead of a standard cc_library.

  3. Include pw_enum/generate.h in the header file.

  4. Register the enum using the PW_ENUM(MyEnum, …) macro at global scope. List the fully qualified enum name, followed by its enumerators. If several enumerators share a value (aliases), list each alias that should appear in the string representation; aliases left out of PW_ENUM are omitted.

Important

The PW_ENUM macro must be called at global scope (outside of any namespace blocks, class definitions, or functions).

If PW_ENUM is called inside a namespace block, class, or function, the C++ compiler will reject it with a compilation error indicating that the template specialization of _PW_ENUM_cannot_be_used_within_namespaces must occur at global scope.
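The global-scope requirement can be illustrated with a small sketch. It assumes the check relies on the C++ rule that an explicit specialization of a global-scope template cannot be declared inside an unrelated namespace. `SketchMustBeGlobal` and `SKETCH_REGISTER` are hypothetical names used only for illustration; they are not the actual pw_enum implementation.

```cpp
// Hypothetical guard template, declared at global scope.
template <typename Enum>
struct SketchMustBeGlobal {
  static constexpr bool kRegistered = false;
};

// Hypothetical registration macro. It expands to an explicit specialization
// of the guard, which the compiler only accepts at global scope.
#define SKETCH_REGISTER(enum_type)            \
  template <>                                 \
  struct SketchMustBeGlobal<enum_type> {      \
    static constexpr bool kRegistered = true; \
  }

namespace demo {

enum class Color { kRed, kGreen };

// Uncommenting the next line is a compilation error, because the
// specialization would be declared inside namespace demo.
// SKETCH_REGISTER(Color);

}  // namespace demo

SKETCH_REGISTER(demo::Color);  // OK: called at global scope.
```

Because the rejection comes from the language rules rather than from a custom check, the error appears at the exact line of the misplaced macro call.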

pw_enum headers are parsed during the build to support versioned tokenization and stringification with pw::EnumToString().

Example#

Declare enums in a standard C++ header and call PW_ENUM(MyEnum, …) at the bottom of the file, outside of any namespace blocks (in the global namespace).

#pragma once

#include <cstdint>

#include "pw_enum/generate.h"

namespace my::nested::pkg {

// Declare the enum as normal.
enum class MyEnum : uint8_t {
  kAlpha = 0,
  kBeta,
  kAliasedBeta = kBeta,
};

}  // namespace my::nested::pkg

PW_ENUM(my::nested::pkg::MyEnum, kAlpha, kBeta, kAliasedBeta);

Use the enum normally. It is tokenized with PW_TOKENIZE_ENUM and works with tokenized logs and pw::EnumToString().

#include "enum_example/basic_enum.h"

#include "pw_enum/to_string.h"
#include "pw_log/log.h"
#include "pw_log/tokenized_args.h"

namespace my::nested::pkg {

const char* HandleEnum(MyEnum value) {
  // Log the enum as a string or token, depending on the logging backend.
  PW_LOG_INFO("The enum value is: " MY_NESTED_PKG_MY_ENUM, PW_LOG_ENUM(value));

  switch (value) {
    case MyEnum::kAlpha:
      // Handle case kAlpha
      break;
    case MyEnum::kBeta:
      // Handle case kBeta
      break;
  }

  // The const char* string version of the enum is always available.
  return pw::EnumToString(value);
}

}  // namespace my::nested::pkg

Enums can reference values from other enums, even if they reside in different files and namespaces.

#pragma once

#include <cstdint>

#include "enum_example/basic_enum.h"
#include "pw_enum/generate.h"

namespace my::nested::pkg {

enum class OtherEnum : uint8_t {
  kFirst = 0,
  kSecond = static_cast<uint8_t>(my::nested::pkg::MyEnum::kBeta),
};

enum class AnotherEnum {
  kX = 1,
  kY = 2,
};

}  // namespace my::nested::pkg

PW_ENUM(my::nested::pkg::OtherEnum, kFirst, kSecond);
PW_ENUM(my::nested::pkg::AnotherEnum, kX, kY);

#pragma once

#include "enum_example/basic_enum.h"
#include "pw_enum/generate.h"

namespace my::other::pkg {

enum class ReferencesOtherEnum {
  kFirstValue = 0,
  kFromOther = static_cast<int>(my::nested::pkg::MyEnum::kAlpha),
};

}  // namespace my::other::pkg

PW_ENUM(my::other::pkg::ReferencesOtherEnum, kFirstValue, kFromOther);

pw_cc_enum(
    name = "advanced_enum",
    hdrs = ["enum_example/references_other_enum.h"],
    strip_include_prefix = ".",
    deps = [":basic_enum"],
)

Enumerator names#

By default, enumerator names that follow Google’s kEnumName style are converted to upper snake case, without the k prefix (ENUM_NAME). Names that do not follow Google style are used directly.

To override the default enumerator name, specify it in the PW_ENUM(name, …) macro with a string literal after =. For example:

PW_ENUM(my::Enum,                 // String name:
        kStandardStyle,           // "STANDARD_STYLE"
        kCustom = "custom_name",  // "custom_name"
        nonStandard,              // "nonStandard"
);
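The default naming rule can be sketched as a small function. This is only an illustration of the conversion described above: `DefaultDisplayName` is a hypothetical helper, the real generator is a Python script, and its handling of digits or acronyms may differ from this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of pw_enum. Converts kCamelCase names to
// UPPER_SNAKE_CASE and returns every other name unchanged.
std::string DefaultDisplayName(std::string_view name) {
  const bool google_style =
      name.size() >= 2 && name[0] == 'k' &&
      std::isupper(static_cast<unsigned char>(name[1])) != 0;
  if (!google_style) {
    return std::string(name);
  }

  std::string result;
  // Skip the leading 'k', then uppercase the rest.
  for (std::size_t i = 1; i < name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(name[i]);
    // Start a new word at each uppercase letter after the first one.
    if (i > 1 && std::isupper(c) != 0) {
      result.push_back('_');
    }
    result.push_back(static_cast<char>(std::toupper(c)));
  }
  return result;
}
```

With this sketch, kStandardStyle becomes STANDARD_STYLE and nonStandard is returned as-is, matching the comments in the PW_ENUM example.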

Enumerator aliases#

If multiple enumerator names share the same value (aliases), they can be registered together in the PW_ENUM macro. The generator groups registered aliases, sorting their display names alphabetically and joining them with | (e.g. "ALIAS_ALPHA|ALPHA"). To omit aliases, simply leave them out of PW_ENUM.
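As a rough illustration of the grouping, the sketch below collects display names by value, sorts each group, and joins it with |. `GroupAliases` is a hypothetical helper written for this page; the real generator is a Python script and its sort order for mixed-case names is not specified here.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: maps each distinct value to its joined display names.
std::map<int, std::string> GroupAliases(
    const std::vector<std::pair<std::string, int>>& enumerators) {
  // Collect every registered display name under its value.
  std::map<int, std::vector<std::string>> names_by_value;
  for (const auto& [name, value] : enumerators) {
    names_by_value[value].push_back(name);
  }

  std::map<int, std::string> result;
  for (auto& [value, names] : names_by_value) {
    // Sort the aliases, then join them with '|'.
    std::sort(names.begin(), names.end());
    std::string joined;
    for (const std::string& name : names) {
      if (!joined.empty()) {
        joined += '|';
      }
      joined += name;
    }
    result[value] = joined;
  }
  return result;
}
```

For the MyEnum example earlier on this page, the names ALPHA, BETA, and ALIASED_BETA would produce "ALPHA" for value 0 and "ALIASED_BETA|BETA" for value 1.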

Logging enums#

Enums generated by pw_enum natively support Pigweed’s tokenized logging infrastructure.

  • Versioned format macro: pw_enum generates a macro to use in the format string for the enum. The macro is named for the namespace and enum name (e.g. MY_NESTED_PKG_MY_ENUM). The macro evaluates to a string literal that can be concatenated into a format string.

    The macro is versioned based on the enum’s contents. The version changes automatically when the enum changes, so tokenized logs of enums never have collisions.

    A *_DOMAIN macro (e.g. MY_NESTED_PKG_MY_ENUM_DOMAIN) is also generated with the enum’s tokenization domain, for use with nested tokenization.

  • Argument macro: Include pw_log/tokenized_args.h and use PW_LOG_ENUM(value) as the argument to the log statement.

When using a tokenizing logging backend, the generated format macro evaluates to PW_TOKEN_FMT(::namespace::Enum), and PW_LOG_ENUM resolves to pw::tokenizer::EnumToToken(), logging the 32-bit token. When using a standard string-based logging backend, the format macro yields the string format specifier %s, and PW_LOG_ENUM resolves to pw::EnumToString(), which yields the string representation.

Example#

  PW_LOG_INFO("State " MY_NESTED_PKG_OTHER_ENUM ": received packet",
              PW_LOG_ENUM(state));
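The string-backend behavior described above can be approximated without Pigweed. In this sketch, `SKETCH_STATE_FMT` and `SKETCH_LOG_ENUM` are hypothetical stand-ins for the generated format macro and PW_LOG_ENUM, and `SketchStateToString` stands in for pw::EnumToString(). It only shows why the format macro must be a string literal: adjacent literals are concatenated into one format string at compile time.

```cpp
#include <cstdio>
#include <string>

enum class State { kIdle, kActive };

// Stand-in for the generated string conversion.
constexpr const char* SketchStateToString(State state) {
  switch (state) {
    case State::kIdle:
      return "IDLE";
    case State::kActive:
      return "ACTIVE";
  }
  return "UNKNOWN";
}

// Hypothetical stand-ins for a string-based logging backend: the format
// macro is a plain "%s" and the argument macro yields a const char*.
#define SKETCH_STATE_FMT "%s"
#define SKETCH_LOG_ENUM(value) SketchStateToString(value)

std::string FormatStateLog(State state) {
  char buffer[64];
  // The three adjacent string literals form a single format string.
  std::snprintf(buffer,
                sizeof(buffer),
                "State " SKETCH_STATE_FMT ": received packet",
                SKETCH_LOG_ENUM(state));
  return buffer;
}
```

A tokenizing backend would swap both macros together, so the call site stays identical regardless of which backend is in use.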

Build integration#

pw_enum provides build integration for Bazel, GN, and CMake.

In Bazel, use the pw_cc_enum rule from //pw_enum:pw_cc_enum.bzl.

pw_cc_enum(
    name = "basic_enum",
    hdrs = ["private/pw_enum_private/basic_enum.h"],
    strip_include_prefix = "private",
    deps = [":base_enum"],
)

In GN, use the pw_cc_enum template from “$dir_pw_enum/pw_cc_enum.gni”.

pw_cc_enum("basic_enum") {
  public = [ "private/pw_enum_private/basic_enum.h" ]
  deps = [ ":base_enum" ]
  include_dirs = [ "private" ]
}

In CMake, use the pw_cc_enum function from pw_enum/pw_cc_enum.cmake.

pw_cc_enum(pw_enum.basic_enum
  HEADERS
    private/pw_enum_private/basic_enum.h
  PUBLIC_INCLUDES
    private
  PUBLIC_DEPS
    pw_enum.base_enum
)

Stringifying enums#

pw::EnumToString() returns a string version of an enum. It uses a FTADLE extension point PwEnumToString(enum). FTADLE is a pattern that enables customization by searching for a matching function via Argument-Dependent Lookup (ADL). For more information, see Designing Extension Points With FTADLE.

If you don’t use pw_cc_enum, you can manually use PW_TOKENIZE_ENUM in pw_tokenizer/enum.h to tokenize the enum and implement PwEnumToString.

Cross language support#

pw_enum is currently C++-only, but could be expanded to support other languages. The parser extracts the full enum definition and resolves all enumerator values, so it would be straightforward to generate compatible enum definitions for other languages from a C++ header. pw_enum could also support an alternate format, such as JSON, for the original enum definition, and generate C++ and other languages from that.

Background#

pw_tokenizer is one of Pigweed’s most widely adopted features. It has supported nested tokenization—a tokenized message inside another tokenized message—since the early days. Initially, only Base64-encoded messages were supported, which is inefficient. Support for directly encoding nested messages as 32-bit integers was added later (see 0105: Nested Tokens and Tokenized Log Arguments).

With support for encoding tokens as integers, supporting rich enums was a clear next step. This culminated in the creation of pw_tokenizer/enum.h and its supporting macros. This approach uses the enum’s integral value as a nested token, discriminated by its namespace to avoid collisions between different enum types. The result is highly efficient enum logs that are still readable and user-friendly.

The need for versioning#

Real-world deployment of pw_tokenizer/enum.h soon revealed a critical flaw. When enum values were changed or reordered during development, the resulting tokens changed. When merging token databases from different builds, this led to collisions, where the same token mapped to different string representations. It became clear that enum tokenization required versioning.

Several alternatives were explored for automatic enum versioning. A key constraint was that enumerator values may be defined by expressions that reference constants or other enum values. Approaches considered included:

  • Tokenize names instead of values: Hash the enumerator names and generate a function with a switch statement to map values to tokens at runtime.

  • Version in the domain: Incorporate a hash of the enum’s contents (names and values) into the tokenization domain, requiring two arguments to log an enum (the version and the value).

  • Calculate tokens from a base: This approach used a hash of the enum’s contents as a base offset, adding the enum value to it at the call site.

Ultimately, these approaches were ruled out because they increased code size relative to the existing implementation, primarily due to the additional code required at the call site.

The code size penalty could be avoided if there were a constexpr way to insert the enum’s version into the log format string. Then, the existing token logging macros could be used (PW_LOG_FMT). For example:

// If there were a way to define this macro during compilation, versioned
// enums would have no code size cost relative to unversioned enums.
#define MY_ENUM_FMT PW_LOG_FMT("::my::Enum::version_1234")

PW_LOG_INFO("My enum: " MY_ENUM_FMT, PW_LOG_ENUM(my_enum));

Unfortunately, there is no way to get the versioned enum domain into a concatenatable string literal within the compiler itself. A string literal is required for compatibility with pw_log’s C-style API. If pw_log offered a C++-only API, this would be feasible, but adding such an API was out of scope.

Generating enums#

Generating enums appeared to be the only way to get the enum’s version into a string literal at compile time. This led to the creation of pw_enum.

JSON definition#

The initial implementation of pw_enum generated C++ headers from JSON files. While this worked well technically, it proved too difficult for projects to adopt due to the friction of maintaining JSON definitions for standard C++ enums. Protocol Buffers were considered in place of JSON, but they are too limited for this use case. Protobufs do not support setting enumerator values based on other enums or external constants.

Parse C++ source#

Finally, multiple approaches for parsing enum definitions out of C++ source code were explored. These included:

  • Use libclang from Python to parse header files. This would be robust and even perform constexpr evaluation of enumerator values. Unfortunately, libclang is a large dependency, and is not readily available on all platforms.

  • Parse clang’s -ast-dump output. This would be fairly robust, but would involve parsing moderately complex, non-standard text output intended for human consumption. It also requires clang, which not all projects build with.

  • Use a custom Python parser. This approach would be utterly impractical and brittle.

Ultimately, parsing C++ source directly proved infeasible. The final design avoids parsing arbitrary C++ source by having users register enums with the PW_ENUM(name, …) macro.

Design#

The final design of pw_enum addresses the constraints identified during its evolution by combining standard C++ header files with a specialized build-time generator powered by compile-time template evaluation.

This architecture provides a seamless user experience with zero runtime overhead and robust protection against database collisions, while maintaining the full expressiveness of standard C++ enum definitions.

Source files#

Users define enums in standard C++ header files. To opt in to pw_enum, the header includes pw_enum/generate.h and registers each enum by calling the PW_ENUM(…) macro with the enum name and the names of its enumerators.

The macro’s primary purpose is to capture enum metadata in an easily parsable format. PW_ENUM() expands to the enum name and list of enumerators, surrounded by unique markers. A Python script searches a preprocessed source file for the markers and extracts the enum metadata.

The macro also serves to require that users list the file in a pw_cc_enum target (see Build system integration), which is necessary for it to be processed. If the file is not processed by pw_enum machinery, the macro expands to static_assert(false), causing the build to fail with an informative error.
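The marker-based extraction can be sketched as a simple text search. The real extraction is done by a Python script; this version is written in C++ only to match the rest of the page. The marker strings `SKETCH_ENUM_BEGIN` and `SKETCH_ENUM_END` and the function `ExtractEnumFields` are hypothetical, and the actual markers and field layout may differ.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical markers; the real markers are not documented on this page.
constexpr std::string_view kBeginMarker = "SKETCH_ENUM_BEGIN";
constexpr std::string_view kEndMarker = "SKETCH_ENUM_END";

// Returns the comma-separated fields between the first pair of markers:
// the enum name first, then the enumerator names.
std::vector<std::string> ExtractEnumFields(std::string_view text) {
  std::vector<std::string> fields;
  const std::size_t begin = text.find(kBeginMarker);
  if (begin == std::string_view::npos) {
    return fields;
  }
  const std::size_t start = begin + kBeginMarker.size();
  const std::size_t end = text.find(kEndMarker, start);
  if (end == std::string_view::npos) {
    return fields;
  }

  std::string_view body = text.substr(start, end - start);
  while (true) {
    const std::size_t comma = body.find(',');
    const std::string_view field = body.substr(0, comma);
    // Trim surrounding whitespace and skip empty fields.
    const std::size_t first = field.find_first_not_of(" \t\n");
    if (first != std::string_view::npos) {
      const std::size_t last = field.find_last_not_of(" \t\n");
      fields.emplace_back(field.substr(first, last - first + 1));
    }
    if (comma == std::string_view::npos) {
      break;
    }
    body.remove_prefix(comma + 1);
  }
  return fields;
}
```

Searching for unique markers keeps the script independent of C++ syntax: everything outside the markers is ignored, so arbitrary code in the header never needs to be parsed.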

Enumerator evaluation#

Enumerator values can be defined by arbitrary C++ expressions. The values may change, even if the individual source file does not change.

The parse.py script evaluates enumerators by generating a source file that references them. The source file instantiates a template with the enumerators as template arguments. Compilation fails by design, and the compiler’s error output includes the enumerator values in an easily parsable form.

This solution is far from ideal, but has proven to be robust. It evaluates enumerators with the same toolchain as the rest of the project. Printing compile-time constants with failed template instantiations is a common workaround to achieve compile-time “printf” functionality. The script searches for a unique template name and doesn’t depend on a particular compiler or version.
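The failed-instantiation technique looks roughly like the sketch below. `SketchPrintValues` and the `SKETCH_SHOW_VALUES` switch are hypothetical names; the source file that parse.py actually generates may be structured differently. The failing declaration is guarded so that the sketch itself compiles by default.

```cpp
enum class Sketch : unsigned char {
  kFirst = 0,
  kSecond = kFirst + 1,  // Value defined by an expression.
  kAlias = kSecond,
};

// Declared but never defined. Declaring a variable of this type requires a
// complete type, so the compiler reports an error that names the full
// specialization, including the evaluated template arguments.
template <long long... kValues>
struct SketchPrintValues;

#ifdef SKETCH_SHOW_VALUES
// Compiling with -DSKETCH_SHOW_VALUES fails here. The diagnostic mentions
// something like SketchPrintValues<0, 1, 1>; the exact wording varies by
// compiler.
SketchPrintValues<static_cast<long long>(Sketch::kFirst),
                  static_cast<long long>(Sketch::kSecond),
                  static_cast<long long>(Sketch::kAlias)>
    show_values;
#endif
```

Casting to an integer type makes the compiler print numeric values rather than enumerator names, which is what a script needs in order to recover the resolved values.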

Build-time generation#

After parsing, the pw_cc_enum target runs a Python script (pw_enum/py/pw_enum/generate.py) that generates a header.

  1. Enum generation: The script generates a “shadowed” version of the header in the build directory. This generated header contains the original content, plus a footer with tokenization metadata. It also replaces the PW_ENUM(...) calls with _PW_ENUM_GENERATED(...). In a header that has not been processed this way, PW_ENUM(...) expands to static_assert(false), which requires users to build headers with pw_cc_enum.

  2. Versioning: A unique version hash is calculated for each enum based on its fully qualified name and the names and values of all its enumerators. This hash is used to construct a unique tokenization domain (e.g., ::namespace::_pw_enum_HASH::EnumName). This ensures that if the enum changes, the domain changes, preventing collisions in merged token databases.

  3. Tokenization: The generated footer includes a call to PW_TOKENIZE_ENUM_CUSTOM from pw_tokenizer, which registers the enum values and their string representations in the database.
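The versioning step can be sketched as follows, assuming a simple content hash. The hash function (32-bit FNV-1a), the serialization of the enum contents, and the names `EnumSketch` and `VersionedDomain` are all assumptions made for illustration; only the general shape of the domain (::namespace::_pw_enum_HASH::EnumName) comes from the description above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one parsed enum.
struct EnumSketch {
  std::string ns;    // For example "::my::nested::pkg".
  std::string name;  // For example "MyEnum".
  std::vector<std::pair<std::string, long long>> enumerators;
};

// 32-bit FNV-1a, used here only as a stand-in for the real version hash.
std::uint32_t Fnv1a(const std::string& text) {
  std::uint32_t hash = 2166136261u;
  for (unsigned char c : text) {
    hash ^= c;
    hash *= 16777619u;
  }
  return hash;
}

// Builds a domain that changes whenever the name, an enumerator name, or an
// enumerator value changes.
std::string VersionedDomain(const EnumSketch& e) {
  std::string contents = e.ns + "::" + e.name;
  for (const auto& [name, value] : e.enumerators) {
    contents += ';' + name + '=' + std::to_string(value);
  }
  char hex[9];
  std::snprintf(hex, sizeof(hex), "%08x",
                static_cast<unsigned>(Fnv1a(contents)));
  return e.ns + "::_pw_enum_" + hex + "::" + e.name;
}
```

Two builds in which the enumerator values were swapped produce different domains under this scheme, so their entries can coexist in a merged token database instead of colliding.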

Build system integration#

The pw_cc_enum build rule automates the process of parsing C++ headers and generating versioned enum metadata. It invokes the pw_enum generator with the correct compilation flags and ensures the generated headers are prioritized during compilation.

  • Bazel (pw_cc_enum.bzl) creates an internal library target to collect the compilation flags (includes, defines) required to parse the header correctly and passes them to the Python script.

  • CMake (pw_cc_enum.cmake) creates an internal interface library to collect includes and defines from dependencies, and uses file(GENERATE) to produce a flags file for the generator. It uses -iquote to ensure that the build system prioritizes the generated shadowed header over the original source header.

  • GN (pw_cc_enum.gni) compiles a placeholder C++ file with the enum’s dependencies to generate a target Ninja file. The generator script parses the target Ninja file and the toolchain’s Ninja file to extract the compiler and its compilation flags (defines, includes, and flags) to run the evaluation step. This is similar to how pw_compilation_testing works. Like CMake, the GN build uses -iquote to ensure that the build system prioritizes the generated shadowed header over the original source header.