#include <token_database.h>

Classes
class	Entries

struct	Entry
	An entry in the token database. More...

class	iterator
	Iterator for `TokenDatabase` values. More...

Public Types
using	value_type = Entry

using	size_type = std::size_t

using	difference_type = std::ptrdiff_t

using	reference = value_type &

using	const_reference = const value_type &

using	pointer = const value_type *

using	const_pointer = const value_type *

using	const_iterator = iterator

using	reverse_iterator = std::reverse_iterator< iterator >

using	const_reverse_iterator = std::reverse_iterator< const_iterator >

Public Member Functions
constexpr	TokenDatabase ()
	Creates a database with no data. `ok()` returns false.

Entries	Find (uint32_t token) const
	Returns all entries associated with this token. This is `O(n)`.

constexpr size_type	size () const
	Returns the total number of entries (unique token-string pairs).

constexpr bool	ok () const

constexpr iterator	begin () const
	Returns an iterator for the first token entry.

constexpr iterator	end () const
	Returns an iterator for one past the last token entry.

Static Public Member Functions
template<typename ByteArray >
static constexpr bool	IsValid (const ByteArray &bytes)

template<const auto & kDatabaseBytes>
static constexpr TokenDatabase	Create ()

template<typename ByteArray >
static constexpr TokenDatabase	Create (const ByteArray &database_bytes)

Static Public Attributes
static constexpr uint32_t	kDateRemovedNever = 0xFFFFFFFF

Detailed Description

Reads entries from a v0 binary token string database. This class does not copy or modify the contents of the database.

The v0 token database has two significant shortcomings:

Strings cannot contain null terminators (\0). If a string contains a \0, the database will not work correctly.
The domain is not included in entries. All tokens belong to a single domain, which must be known independently.

A v0 binary token database consists of a 16-byte header followed by an array of 8-byte entries and a table of null-terminated strings. The header specifies the number of entries. Each entry contains information about a tokenized string: the token and the removal date, if any. All fields are little-endian.

The token removal date is stored within an unsigned 32-bit integer. It is stored as <day> <month> <year>, where <day> and <month> are 1 byte each and <year> is two bytes. The fields are set to their maximum value (0xFF or 0xFFFF) if they are unset. With this format, dates may be compared naturally as unsigned integers.
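The packing described above can be sketched as follows. `PackRemovalDate`, `UnpackRemovalDate`, and `RemovalDate` are hypothetical names invented for this illustration and are not part of pw_tokenizer. The sketch assumes the four date bytes are read as one little-endian integer, so the day lands in the least significant byte and the year in the upper 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of pw_tokenizer: packs a removal date into a
// 32-bit value. Read as a little-endian integer, the day is the least
// significant byte, the month is next, and the year fills the upper 16 bits.
constexpr uint32_t PackRemovalDate(uint16_t year, uint8_t month, uint8_t day) {
  return (static_cast<uint32_t>(year) << 16) |
         (static_cast<uint32_t>(month) << 8) | day;
}

// Hypothetical view of the three unpacked fields.
struct RemovalDate {
  uint16_t year;
  uint8_t month;
  uint8_t day;
};

constexpr RemovalDate UnpackRemovalDate(uint32_t packed) {
  return {static_cast<uint16_t>(packed >> 16),
          static_cast<uint8_t>((packed >> 8) & 0xFF),
          static_cast<uint8_t>(packed & 0xFF)};
}

// With every field unset (0xFFFF, 0xFF, 0xFF) the packed value is 0xFFFFFFFF,
// the same value as kDateRemovedNever.
static_assert(PackRemovalDate(0xFFFF, 0xFF, 0xFF) == 0xFFFFFFFF,
              "unset date should be all ones");

// Later dates compare greater as plain unsigned integers.
static_assert(PackRemovalDate(2020, 1, 31) < PackRemovalDate(2020, 2, 1),
              "month outranks day");
static_assert(PackRemovalDate(2019, 12, 31) < PackRemovalDate(2020, 1, 1),
              "year outranks month");
```

Because the year occupies the most significant bits, an entry that was never removed also compares greater than any real removal date.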

======  ====  =========================
Header (16 bytes)
---------------------------------------
Offset  Size  Field
======  ====  =========================
     0     6  Magic number (``TOKENS``)
     6     2  Version (``00 00``)
     8     4  Entry count
    12     4  Reserved
======  ====  =========================

======  ====  ==================================
Entry (8 bytes)
------------------------------------------------
Offset  Size  Field
======  ====  ==================================
     0     4  Token
     4     1  Removal day (1-31, 255 if unset)
     5     1  Removal month (1-12, 255 if unset)
     6     2  Removal year (65535 if unset)
======  ====  ==================================

Entries are sorted by token. A string table with a null-terminated string for each entry in order follows the entries.

Entries are accessed by iterating over the database. An O(n) Find function is also provided. In typical use, a TokenDatabase is preprocessed by a pw::tokenizer::Detokenizer into a std::unordered_map.
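To make the layout concrete, here is a minimal standalone sketch that walks a v0 database by hand: header, entry array, then the string table in entry order. It does not use the TokenDatabase class. `ParsedEntry`, `ReadUint32`, `ParseV0Database`, and `kSampleDatabase` are hypothetical names for this illustration, and the sample bytes are made up. The sketch assumes the data is already known to be well formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not the pw_tokenizer Entry struct.
struct ParsedEntry {
  uint32_t token;
  uint32_t date_removed;
  std::string text;
};

// Reads a little-endian 32-bit integer starting at data[offset].
inline uint32_t ReadUint32(const unsigned char* data, size_t offset) {
  return static_cast<uint32_t>(data[offset]) |
         static_cast<uint32_t>(data[offset + 1]) << 8 |
         static_cast<uint32_t>(data[offset + 2]) << 16 |
         static_cast<uint32_t>(data[offset + 3]) << 24;
}

// Walks the header, the entry array, and the string table. The asserts only
// document the assumptions; this sketch is not a validator.
inline std::vector<ParsedEntry> ParseV0Database(const unsigned char* data,
                                                size_t size) {
  constexpr size_t kHeaderSize = 16;
  constexpr size_t kEntrySize = 8;
  assert(size >= kHeaderSize);
  assert(std::memcmp(data, "TOKENS", 6) == 0);

  const uint32_t count = ReadUint32(data, 8);  // Entry count at offset 8.
  size_t string_offset = kHeaderSize + count * kEntrySize;
  assert(string_offset <= size);

  std::vector<ParsedEntry> entries;
  for (uint32_t i = 0; i < count; ++i) {
    const size_t entry_offset = kHeaderSize + i * kEntrySize;
    // The i-th string in the table belongs to the i-th entry.
    const void* end =
        std::memchr(data + string_offset, '\0', size - string_offset);
    assert(end != nullptr);
    const size_t length = static_cast<size_t>(
        static_cast<const unsigned char*>(end) - (data + string_offset));
    entries.push_back(
        {ReadUint32(data, entry_offset), ReadUint32(data, entry_offset + 4),
         std::string(reinterpret_cast<const char*>(data + string_offset),
                     length)});
    string_offset += length + 1;  // Skip the null terminator.
  }
  return entries;
}

// Made-up sample: two entries, sorted by token, followed by their strings.
static const unsigned char kSampleDatabase[] = {
    'T', 'O', 'K', 'E', 'N', 'S',  // Magic number
    0x00, 0x00,                    // Version 0
    0x02, 0x00, 0x00, 0x00,        // Entry count: 2
    0x00, 0x00, 0x00, 0x00,        // Reserved
    0x01, 0x00, 0x00, 0x00,        // Token 0x00000001
    0xFF, 0xFF, 0xFF, 0xFF,        // Never removed
    0x78, 0x56, 0x34, 0x12,        // Token 0x12345678
    0x01, 0x02, 0xE4, 0x07,        // Removed: day 1, month 2, year 2020
    'h', 'i', '\0',                // String for the first entry
    'b', 'y', 'e', '\0',           // String for the second entry
};
```

Calling `ParseV0Database(kSampleDatabase, sizeof(kSampleDatabase))` yields the two entries in token order. Real code would normally go through TokenDatabase rather than parse the bytes directly.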

Member Function Documentation

◆ Create() [1/2]

template<const auto & kDatabaseBytes>

static constexpr TokenDatabase pw::tokenizer::TokenDatabase::Create ( )

inline static constexpr

Creates a TokenDatabase and checks if the provided data is valid at compile time. Accepts references to constexpr containers (array, span, string_view, etc.) with static storage duration. For example:

constexpr char kMyData[] = ...;

constexpr TokenDatabase db = TokenDatabase::Create<kMyData>();


◆ Create() [2/2]

template<typename ByteArray >

static constexpr TokenDatabase pw::tokenizer::TokenDatabase::Create ( const ByteArray & database_bytes )

inline static constexpr

Creates a TokenDatabase from the provided byte array. The array may be a span, array, or other container type. If the data is not valid, returns a default-constructed database for which ok() is false.

Prefer the Create overload that takes the data as a template parameter when possible, since that overload verifies data integrity at compile time.

◆ IsValid()

template<typename ByteArray >

static constexpr bool pw::tokenizer::TokenDatabase::IsValid ( const ByteArray & bytes )

inline static constexpr

Returns true if the provided data is a valid token database. This checks the magic number (TOKENS), the version (which must be 0), and that there is one string for each entry in the database. A database with extra strings or other trailing data is considered valid.
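The three checks listed above can be approximated by the following standalone sketch. It is not the library's implementation. `ReadLe32` and `LooksLikeValidV0Database` are hypothetical names, and counting null terminators after the entry array is one assumed way to confirm that every entry has a string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a little-endian 32-bit integer from the four bytes at p.
inline uint32_t ReadLe32(const unsigned char* p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// Hypothetical stand-in for the checks described above; not pw_tokenizer code.
inline bool LooksLikeValidV0Database(const unsigned char* data, size_t size) {
  constexpr size_t kHeaderSize = 16;
  constexpr size_t kEntrySize = 8;

  if (size < kHeaderSize) {
    return false;  // Too short to hold a header.
  }
  if (std::memcmp(data, "TOKENS", 6) != 0) {
    return false;  // Wrong magic number.
  }
  if (data[6] != 0 || data[7] != 0) {
    return false;  // Only version 0 is accepted.
  }

  // 64-bit arithmetic so a huge entry count cannot overflow the offset.
  const uint64_t count = ReadLe32(data + 8);
  const uint64_t strings_begin = kHeaderSize + count * kEntrySize;
  if (strings_begin > size) {
    return false;  // The entry array runs past the end of the data.
  }

  // Each entry needs one null-terminated string. Extra strings or trailing
  // bytes are tolerated, so only a lower bound is checked.
  uint64_t terminators = 0;
  for (size_t i = static_cast<size_t>(strings_begin); i < size; ++i) {
    if (data[i] == '\0') {
      ++terminators;
    }
  }
  return terminators >= count;
}
```

Under these rules a header-only database with an entry count of zero passes, which is consistent with an empty database still being valid.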

◆ ok()

constexpr bool pw::tokenizer::TokenDatabase::ok ( ) const

inline constexpr

True if this database was constructed with valid data. The database might be empty, but it has an intact header and a string for each entry.

Member Data Documentation

◆ kDateRemovedNever

constexpr uint32_t pw::tokenizer::TokenDatabase::kDateRemovedNever = 0xFFFFFFFF

static constexpr

Default date_removed for an entry in the token database if it was never removed.

The documentation for this class was generated from the following file:

pw_tokenizer/public/pw_tokenizer/token_database.h
