ParityDb

ParityDb is an embedded persistent key-value store optimized for blockchain applications. Designed by Parity Technologies, it focuses on efficient storage and retrieval of blockchain state data, making it well suited to high-performance blockchain nodes.

Design Considerations

ParityDb is specifically designed to handle the unique demands of blockchain applications:

  • Efficient Storage: Stores blockchain state encoded in a Patricia-Merkle trie. Keys are fixed size and uniformly distributed, and most values are small.
  • Read-Optimized: Prioritizes read performance over write performance to enhance blockchain transaction throughput. Writes are performed in large batches during block imports, with idle periods in between.
  • Reference Counting: Supports reference counting since trie nodes may be shared by multiple tries and branches.
  • Transactional: Supports transactions, ensuring atomicity of writes and preventing partially committed data from being queried.

Key Features

API

ParityDb provides universal key-value storage with transaction support. Data can be partitioned into columns, each containing entries of a single data type, such as state trie nodes, block headers, or blockchain transactions. Two types of column index are supported: Hash and Btree.
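
As a rough sketch of this API, assuming the `parity-db` Rust crate's `Options` and `Db` types: open a database with two columns (the second configured with a Btree index), commit a write, and read it back. Error handling is minimal.

```rust
use parity_db::{Db, Options};

fn main() -> Result<(), parity_db::Error> {
    // Two columns: column 0 keeps the default Hash index,
    // column 1 is configured with a Btree index.
    let mut options = Options::with_columns(std::path::Path::new("./example-db"), 2);
    options.columns[1].btree_index = true;

    let db = Db::open_or_create(&options)?;

    // All writes go through transactions: (column, key, Some(value) | None).
    db.commit(vec![(0, b"key".to_vec(), Some(b"value".to_vec()))])?;
    assert_eq!(db.get(0, b"key")?, Some(b"value".to_vec()));
    Ok(())
}
```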

Transactions

The database supports multiple concurrent readers, while writes are serialized and performed in batches (transactions). Transactions are applied atomically, ensuring all data in a transaction is either written or not written at all.
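
Continuing the sketch above (with `db` already open and error handling elided), a single commit can mix writes and deletions across columns; the whole batch becomes visible atomically or not at all.

```rust
db.commit(vec![
    (0, b"alice".to_vec(), Some(b"100".to_vec())),
    (0, b"bob".to_vec(), None), // `None` deletes the key
    (1, b"height".to_vec(), Some(42u32.to_le_bytes().to_vec())),
])?;
```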

No Cache

ParityDb relies on the OS page cache instead of implementing custom data caching, making the performance of a large database dependent on available system memory.

Durability

The database remains consistent even if IO is interrupted, although it may recover to a state slightly older than the latest one: committed writes that background threads had not yet saved to disk are lost.

Implementation Details

Data Structure

Each column stores data in a set of 256 value tables. The first 255 tables hold entries within fixed size ranges, up to a 32 kB limit; the 256th table stores entries over 32 kB, split into multiple parts. Hash columns also include a hash index file.
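
A hypothetical sketch of the bucket selection this implies. The actual size thresholds are not spelled out here, so `entry_sizes` is an assumed input, and the function name is ours.

```rust
const MULTIPART_LIMIT: usize = 32 * 1024; // 32 kB, per the description above

/// Pick a value table for a value of `len` bytes: tables 0..=254 cover
/// increasing fixed size ranges, and table 255 holds multipart entries.
fn table_for(len: usize, entry_sizes: &[usize; 255]) -> usize {
    if len > MULTIPART_LIMIT {
        return 255; // stored as a chain of multipart entries
    }
    // The first table whose fixed entry size fits this value.
    entry_sizes
        .iter()
        .position(|&size| len <= size)
        .expect("the largest table covers everything up to 32 kB")
}
```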

Metadata

A metadata file defines the database structure, including a set of columns with specific configurations.
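
For example, the column configuration supplied when creating the database is what ends up recorded in the metadata file. A sketch, assuming the `parity-db` crate's `Options` fields:

```rust
use parity_db::{Db, Options};

fn open_with_layout() -> Result<Db, parity_db::Error> {
    let mut options = Options::with_columns(std::path::Path::new("./example-db"), 2);
    options.columns[0].ref_counted = true; // e.g. shared state trie nodes
    options.columns[1].btree_index = true; // ordered Btree index
    // Creating the database records this layout in the metadata file;
    // later opens are checked against it.
    Db::open_or_create(&options)
}
```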

Hash Index

The hash index is an mmap-backed, dynamically sized probing hash table. The first n bits of a key's 256-bit hash k select one of the 512-byte index pages. Each page contains an unordered list of 64 8-byte entries. If an insertion is attempted into a full index page, a reindex is triggered.
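
To make the arithmetic concrete, an illustrative sketch; the constant names, the function, and the parameter n are ours, not ParityDb's.

```rust
const PAGE_SIZE: usize = 512; // bytes per index page
const ENTRY_SIZE: usize = 8;  // bytes per index entry
const ENTRIES_PER_PAGE: usize = PAGE_SIZE / ENTRY_SIZE; // = 64

/// With 2^n pages, the page for hash k is selected by k's first n bits.
fn page_for(k: &[u8; 32], n: u32) -> u64 {
    assert!(n >= 1 && n <= 64);
    // Interpret the first 64 bits of k as big-endian and keep the top n.
    let prefix = u64::from_be_bytes(k[..8].try_into().unwrap());
    prefix >> (64 - n)
}
```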

Value Tables

Value tables are linear arrays of fixed-size entries. Each entry can contain one of the following (modeled in the sketch after this list):

  • Filled Entry: Contains 240 bits of k, a 15-bit value size, a compression flag, an optional reference counter, and the actual value.
  • Tombstone Entry: Contains the index of the previous tombstone, forming a linked list of empty entries.
  • Multipart Entry: Similar to a filled entry, but also holds the address of the next entry continuing the data.
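
These three kinds can be modeled loosely as a Rust enum. The field names and widths below follow the description; real entries use a packed binary layout, not an enum.

```rust
/// Rough model of a value table entry.
enum Entry {
    Filled {
        partial_key: [u8; 30],  // the remaining 240 bits of k
        size: u16,              // 15-bit value size
        compressed: bool,       // compression flag
        ref_count: Option<u32>, // present only in ref-counted columns
        value: Vec<u8>,
    },
    Tombstone {
        prev_tombstone: u64,    // links empty entries into a free list
    },
    Multipart {
        partial_key: [u8; 30],
        next: u64,              // address of the entry holding the next part
        value_part: Vec<u8>,
    },
}
```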

Hash Index Operations

Lookup

Compute k, locate the index page using the first n bits, scan the page for an entry whose key bits match, and follow the entry's address into the value table, where the stored partial k is checked before the value is returned.
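
A toy, self-contained model of this path, with hashes shortened to 64 bits and the full hash standing in for the stored 240-bit partial k; all names are ours.

```rust
const ENTRIES_PER_PAGE: usize = 64;

#[derive(Clone, Copy, Default)]
struct IndexEntry {
    fragment: u16,  // a few bits of k kept in the 8-byte index entry
    address: usize, // 1-based offset into the value table; 0 = empty
}

struct ToyDb {
    n: u32, // the index holds 2^n pages
    pages: Vec<[IndexEntry; ENTRIES_PER_PAGE]>,
    values: Vec<(u64, Vec<u8>)>, // (stored hash, value)
}

impl ToyDb {
    fn lookup(&self, k: u64) -> Option<&[u8]> {
        // 1. Find the index page from the first n bits of k.
        let page = &self.pages[(k >> (64 - self.n)) as usize];
        // 2. Scan the page for occupied entries whose key bits match.
        let fragment = k as u16;
        for entry in page.iter().filter(|e| e.address != 0) {
            if entry.fragment == fragment {
                // 3. Confirm against the partial key in the value table.
                let (stored, value) = &self.values[entry.address - 1];
                if *stored == k {
                    return Some(value.as_slice());
                }
            }
        }
        None
    }
}
```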

Insertion

An insertion into a full index page triggers a reindex. Reindexing creates a new index table with twice the capacity and starts moving entries from the old table to the new one; both tables are checked during the process.
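
A toy sketch of the capacity doubling, again with 64-bit hashes: with twice the pages, each entry is re-placed by one more bit of its hash. ParityDb migrates entries incrementally in the background; this version migrates eagerly.

```rust
/// Double the page count of an index that used the first `n` bits of each
/// hash, re-placing every entry by its first n + 1 bits.
fn reindex(old_pages: Vec<Vec<u64>>, n: u32) -> Vec<Vec<u64>> {
    let mut new_pages = vec![Vec::new(); old_pages.len() * 2];
    for k in old_pages.into_iter().flatten() {
        new_pages[(k >> (64 - (n + 1))) as usize].push(k);
    }
    new_pages
}
```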

Transaction Pipeline

On commit, all modified data is placed in an in-memory overlay, making it immediately available to queries, and the transaction is added to the commit queue. A commit worker processes the queue, writes the modified data to a binary log file, and flushes it to disk. A finalization thread then reads the log file, applies the changes to the tables, and clears the overlay.
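
A toy, single-threaded rendering of this pipeline; ParityDb runs the commit worker and finalization on background threads, and all names here are ours.

```rust
use std::collections::HashMap;

struct Pipeline {
    overlay: HashMap<Vec<u8>, Option<Vec<u8>>>, // committed, not yet enacted
    log: Vec<(Vec<u8>, Option<Vec<u8>>)>,       // stand-in for the binary log file
    tables: HashMap<Vec<u8>, Vec<u8>>,          // stand-in for the value tables
}

impl Pipeline {
    /// Commit: the batch becomes queryable through the overlay and is
    /// queued for the commit worker, which writes and flushes the log.
    fn commit(&mut self, tx: Vec<(Vec<u8>, Option<Vec<u8>>)>) {
        for (key, value) in &tx {
            self.overlay.insert(key.clone(), value.clone());
        }
        self.log.extend(tx);
    }

    /// Finalization: apply logged changes to the tables, then clear
    /// them from the overlay.
    fn enact_log(&mut self) {
        for (key, value) in self.log.drain(..) {
            match value {
                Some(v) => { self.tables.insert(key.clone(), v); }
                None => { self.tables.remove(&key); }
            }
            self.overlay.remove(&key);
        }
    }

    /// Reads check the overlay first, so committed data is visible before
    /// finalization; a `None` in the overlay is a pending deletion.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        match self.overlay.get(key) {
            Some(pending) => pending.as_deref(),
            None => self.tables.get(key).map(Vec::as_slice),
        }
    }
}
```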

On startup, existing log files are validated and applied to restore the database state.

Similar Projects