The pw_log protobuf#

pw_log defines a protocol buffer for storing and transmitting log messages. The protobuf is optimized to keep each encoded log entry as small as possible, since on-device log buffers and transport bandwidth are limited.

Fields#

The pw_log protobuf is defined in log.proto.

syntax = "proto3";

package pw.log;

import "pw_protobuf_protos/common.proto";
import "pw_tokenizer_proto/options.proto";

option java_outer_classname = "Log";

// A log message and metadata. Logs come in a few different forms:
//
//  1. A tokenized log message (recommended for production)
//  2. A non-tokenized log message (good for development)
//  3. A "log missed" tombstone, indicating that some logs were dropped
//
// Size analysis for tokenized log messages, including each field's proto tag:
//
//  - message     - 6-12 bytes; depending on number and value of arguments
//  - line_level  - 3 bytes; 4 bytes if line > 2048 (uncommon)
//  - timestamp   - 3 bytes; assuming delta encoding
//  - thread      - 2-6 bytes; depending on whether value is a token or string
//
// Adding the fields gives the total proto message size:
//
//    6-12 bytes - log
//    9-15 bytes - log + level + line
//   12-18 bytes - log + level + line + timestamp
//
// An analysis of a project's log token database revealed the following
// distribution of the number of arguments to log messages:
//
//   # args   # messages
//     0         2,700
//     1         2,400
//     2         1,200
//     3+        1,000
//
// Note: The below proto makes some compromises compared to what one might
// expect for a "clean" proto design, in order to shave bytes off of the
// messages. It is critical that the log messages are as small as possible to
// enable storing more logs in limited memory. This is why, for example, there
// is no separate "DroppedLog" type, or a "TokenizedLog" and "StringLog", which
// would add at least 2 extra bytes per message.
message LogEntry {
  // The log message, which may be tokenized.
  //
  // If tokenized logging is used, implementations may encode metadata in the
  // log message rather than as separate proto fields. This reduces the size of
  // the protobuf with no overhead.
  //
  // The standard format for encoding metadata in the log message is defined by
  // the pw_log_tokenized module. The message and metadata are encoded as
  // key-value pairs using ■ and ♦ as delimiters. For example:
  //
  //  ■msg♦This is the log message: %d■module♦wifi■file♦../path/to/file.cc
  //
  // See http://pigweed.dev/pw_log_tokenized for full details. When
  // pw_log_tokenized is used, this metadata is automatically included as
  // described.
  //
  // The level and flags are not included since they may be runtime values and
  // thus cannot always be tokenized. The line number is not included because
  // line numbers change frequently and a new token is created for each line.
  //
  // Size analysis when tokenized:
  //
  //   tag+wire = 1 byte
  //   size     = 1 byte; payload will almost always be < 127 bytes
  //   payload  = N bytes; typically 4-10 in practice
  //
  // Total: 2 + N ~= 6-12 bytes
  optional bytes message = 1 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  // Packed log level and line number. Structure:
  //
  //   Level: Bottom 3 bits; level = line_level & 0x7
  //   Line: Remaining bits; line = (line_level >> 3)
  //
  // Note: This packing saves two bytes per log message in most cases compared
  // to having line and level separately; and is zero-cost if the log backend
  // omits the line number.
  uint32 line_level = 2;

  // Some log messages have flags to indicate attributes such as whether they
  // are from an assert or if they contain PII. The particular flags are
  // product- and implementation-dependent.
  uint32 flags = 3;

  // Timestamps are either specified with an absolute timestamp or relative to
  // the previous log entry.
  oneof time {
    // The absolute timestamp in implementation-defined ticks. Applications
    // determine how to interpret this on the receiving end. In the simplest
    // case, these ticks might be milliseconds or microseconds since boot.
    // Applications could also access clock information out-of-band with a
    // ClockParameters protobuf.
    int64 timestamp = 4;

    // Time since the last entry in implementation-defined ticks, as for the
    // timestamp field. This enables delta encoding when batching entries
    // together.
    //
    // Size analysis for this field including tag and varint, assuming 1 kHz
    // ticks:
    //
    //           < 127 ms gap == 127 ms      ==  7 bits == 2 bytes
    //        < 16,000 ms gap ==  16 seconds == 14 bits == 3 bytes
    //     < 2,000,000 ms gap ==  35 minutes == 21 bits == 4 bytes
    //   < 300,000,000 ms gap ==  74 hours   == 28 bits == 5 bytes
    //
    // Log bursts will thus consume just 2 bytes (tag + up to 127ms delta) for
    // the timestamp, which is a good improvement over an absolute timestamp.
    int64 time_since_last_entry = 5;
  }

  // When the log buffers are full but more logs come in, the logs are counted
  // and a special log message is emitted with only counts for the number of
  // messages dropped.
  uint32 dropped = 6;

  // The PW_LOG_MODULE_NAME for this log message.
  bytes module = 7 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  // The file path where this log was created, if not encoded in the message.
  bytes file = 8 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  // The task or thread name that created the log message. If the log was not
  // created on a thread, it should use a name appropriate to that context.
  bytes thread = 9 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  // The following fields are planned but will not be added until they are
  // needed. Protobuf field numbers over 15 use an extra byte, so these fields
  // are left out for now to avoid reserving field numbers unnecessarily.

  // Represents the device from which the log originated. The meaning of this
  // field is implementation defined.
  // uint32 source_id = ?;

  // Some messages are associated with trace events, which may carry additional
  // contextual data. This is a tuple of a data format string which could be
  // used by the decoder to identify the data (e.g. printf-style tokens) and the
  // data itself in bytes.
  // bytes data_format = ?
  //     [(tokenizer.format) = TOKENIZATION_OPTIONAL];
  // bytes data = ?;
}

message LogRequest {}

message LogEntries {
  repeated LogEntry entries = 1;
  uint32 first_entry_sequence_id = 2;
}

// RPC service for accessing logs.
service Logs {
  rpc Listen(LogRequest) returns (stream LogEntries);
}

message FilterRule {
  // Log level values match pw_log/levels.h. Enum names avoid collisions with
  // possible macros.
  enum Level {
    ANY_LEVEL = 0;
    DEBUG_LEVEL = 1;
    INFO_LEVEL = 2;
    WARN_LEVEL = 3;
    ERROR_LEVEL = 4;
    CRITICAL_LEVEL = 5;
    FATAL_LEVEL = 7;
  };
  // Condition 1: log.level >= level_greater_than_or_equal.
  Level level_greater_than_or_equal = 1;

  // Condition 2: (module_equals.size() == 0) || (log.module == module_equals);
  bytes module_equals = 2 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  // Condition 3: (any_flags_set == 0) || ((log.flags & any_flags_set) != 0)
  uint32 any_flags_set = 3;

  // Action to take if all conditions are met and rule is not inactive.
  enum Action {
    INACTIVE = 0;  // Ignore the rule entirely.
    KEEP = 1;      // Keep the log entry if all conditions are met.
    DROP = 2;      // Drop the log entry if all conditions are met.
  };
  Action action = 4;

  // Condition 4: (thread_equals.size() == 0 || log.thread == thread_equals).
  bytes thread_equals = 5 [(tokenizer.format) = TOKENIZATION_OPTIONAL];
}

// A filter is a series of rules. First matching rule wins.
message Filter {
  repeated FilterRule rule = 1;
}

message SetFilterRequest {
  // A filter can be identified by a human readable string, token, or number.
  bytes filter_id = 1 [(tokenizer.format) = TOKENIZATION_OPTIONAL];

  Filter filter = 2;
}

message GetFilterRequest {
  bytes filter_id = 1 [(tokenizer.format) = TOKENIZATION_OPTIONAL];
}

message FilterIdListRequest {}

message FilterIdListResponse {
  repeated bytes filter_id = 1 [(tokenizer.format) = TOKENIZATION_OPTIONAL];
}

// RPC service for retrieving and modifying log filters.
service Filters {
  rpc SetFilter(SetFilterRequest) returns (pw.protobuf.Empty);
  rpc GetFilter(GetFilterRequest) returns (Filter);
  rpc ListFilterIds(FilterIdListRequest) returns (FilterIdListResponse);
}
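The filter semantics described in the FilterRule comments above (first matching active rule wins) can be sketched in standalone C++. The struct names below are hypothetical mirrors of the proto messages for illustration only, not Pigweed's implementation; what to do when no rule matches is a policy choice, and this sketch keeps such logs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plain-C++ mirrors of the proto messages, for illustration only.
enum class Action { kInactive, kKeep, kDrop };

struct FilterRule {
  uint32_t level_greater_than_or_equal = 0;  // 0 = ANY_LEVEL
  std::string module_equals;                 // empty = match any module
  uint32_t any_flags_set = 0;                // 0 = match any flags
  std::string thread_equals;                 // empty = match any thread
  Action action = Action::kInactive;
};

struct LogEntry {
  uint32_t level;
  std::string module;
  uint32_t flags;
  std::string thread;
};

// Returns true if the log should be kept. The first matching active rule
// wins. Keeping logs that match no rule is an assumption of this sketch.
bool ShouldKeep(const std::vector<FilterRule>& filter, const LogEntry& log) {
  for (const FilterRule& rule : filter) {
    if (rule.action == Action::kInactive) {
      continue;  // Ignore the rule entirely.
    }
    const bool matches =
        log.level >= rule.level_greater_than_or_equal &&
        (rule.module_equals.empty() || log.module == rule.module_equals) &&
        (rule.any_flags_set == 0 || (log.flags & rule.any_flags_set) != 0) &&
        (rule.thread_equals.empty() || log.thread == rule.thread_equals);
    if (matches) {
      return rule.action == Action::kKeep;
    }
  }
  return true;  // No rule matched; keep the log.
}
```

For example, a filter whose first rule drops `wifi` logs at INFO and above, followed by a catch-all KEEP rule, drops matching `wifi` entries and keeps everything else.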

Timestamps#

Timestamps are specified in implementation-defined ticks. Ticks could be milliseconds, microseconds, or any arbitrary duration derived from the device’s clock.

For many applications, the timestamp field can be interpreted based on prior knowledge of the system. For example, timestamp might be known to be milliseconds since boot for one core or microseconds since boot for another.

If desired, a project could collect information about clock parameters separately from pw_log, and use that to interpret log timestamps. For example, it may call an RPC that returns a pw_chrono ClockParameters protobuf, then use those values to interpret timestamps from that device.

The pw_log proto contains two timestamp fields, only one of which may be set at a time:

  • timestamp – Absolute time when message was logged.

  • time_since_last_entry – When the message was logged, relative to the previously encoded log message. This is used when multiple log entries are sent in a single LogEntries proto. The previous log entry must use the same time source. If logs with multiple time sources are intermingled in a single LogEntries proto, they must use an absolute timestamp each time the time source changes.
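When decoding a batched LogEntries proto, a receiver reconstructs absolute tick values by accumulating the deltas onto the most recent absolute timestamp. The following is a minimal standalone sketch of that accumulation, using hypothetical types rather than a Pigweed API; entries whose deltas arrive before any absolute timestamp cannot be resolved.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors the proto's `time` oneof: each decoded entry carries either an
// absolute timestamp or a delta from the previous entry.
struct DecodedTime {
  bool is_absolute;
  int64_t ticks;  // `timestamp` if is_absolute, else `time_since_last_entry`
};

// Converts a batch of decoded times to absolute tick values. Deltas that
// precede any absolute timestamp resolve to std::nullopt.
std::vector<std::optional<int64_t>> ResolveTimestamps(
    const std::vector<DecodedTime>& entries) {
  std::vector<std::optional<int64_t>> resolved;
  std::optional<int64_t> last;
  for (const DecodedTime& t : entries) {
    if (t.is_absolute) {
      last = t.ticks;  // Reset the accumulator to the new absolute time.
    } else if (last.has_value()) {
      last = *last + t.ticks;  // Accumulate the delta.
    }
    resolved.push_back(last);
  }
  return resolved;
}
```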

Optionally tokenized text fields#

Several fields in the pw_log proto store text. Examples include message, module, and thread. These fields may contain either plain or tokenized text; both are represented as a single bytes field. These fields are marked with a protocol buffer option so the pw_tokenizer.proto module can detect and detokenize tokenized fields as appropriate.

See Tokenized fields in protocol buffers for details.

Packing and unpacking line_level#

As a way to minimize on-the-wire log message size, the log level and the line number of a given log statement are packed into a single proto field. There are helpers in pw_log/proto_utils.h for properly packing and unpacking this field.

#include "pw_assert/check.h"
#include "pw_bytes/span.h"
#include "pw_log/levels.h"
#include "pw_log/proto_utils.h"
#include "pw_protobuf/decoder.h"

bool FilterLog(pw::ConstByteSpan serialized_log) {
  pw::protobuf::Decoder log_decoder(serialized_log);
  while (log_decoder.Next().ok()) {
    if (log_decoder.FieldNumber() == 2) {  // line_level
      uint32_t line_and_level;
      PW_DCHECK_OK(log_decoder.ReadUint32(&line_and_level));

      uint8_t level = std::get<1>(pw::log::UnpackLineLevel(line_and_level));
      if (level < PW_LOG_LEVEL_INFO) {
        return false;
      }
    }
  }

  return true;
}
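The arithmetic behind the packing is simple bit manipulation, as described in the line_level comment in the proto: the level occupies the bottom 3 bits and the line number the remaining bits. A standalone sketch of that layout follows; in real code, prefer the pw_log/proto_utils.h helpers, whose exact signatures may differ from this illustration.

```cpp
#include <cstdint>
#include <tuple>

// Sketch of the line_level bit layout: level = bottom 3 bits, line = the rest.
constexpr uint32_t PackLineLevel(uint32_t line, uint8_t level) {
  return (line << 3) | (level & 0x7u);
}

// Returns {line, level}, matching the tuple order used in the example above.
constexpr std::tuple<uint32_t, uint8_t> UnpackLineLevel(uint32_t line_level) {
  return {line_level >> 3, static_cast<uint8_t>(line_level & 0x7u)};
}
```

Because the packed value is a varint, a line number up to 2048 fits in a 2-byte varint payload alongside the level, which is where the "3 bytes; 4 bytes if line > 2048" estimate in the proto comments comes from.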

Log encoding helpers#

Encoding logs to the log.proto format can be performed using the helpers provided in the pw_log/proto_utils.h header. Separate helpers are provided for encoding tokenized logs and string-based logs.

The following example shows a pw_log_tokenized_HandleLog() implementation that encodes the results to a protobuf.

#include "pw_log/proto_utils.h"

extern "C" void pw_log_tokenized_HandleLog(
    uint32_t payload, const uint8_t data[], size_t size) {
  pw::log_tokenized::Metadata metadata(payload);
  // kLogBufferSize is a product-defined constant sized for the largest entry.
  std::byte log_buffer[kLogBufferSize];

  pw::Result<pw::ConstByteSpan> result = pw::log::EncodeTokenizedLog(
      metadata,
      pw::as_bytes(pw::span(data, size)),
      log_buffer);
  if (result.ok()) {
    // This makes use of the encoded log proto and is custom per-product.
    // It should be implemented by the caller and is not in Pigweed.
    EmitProtoLogEntry(result.value());
  }
}
}