Materialize

Materialize is a cloud-native data warehouse purpose-built for operational workloads where an analytical data warehouse would be too slow, and a stream processor would be too complicated. Using SQL and common tools in the wider data ecosystem, Materialize allows you to build real-time automation, engaging customer experiences, and interactive data products that drive value for your business while reducing the cost of data freshness.

Key Features

Real-Time Data Interaction

Materialize is designed to help you interactively explore your streaming data, perform analytics against live relational data, or increase data freshness while reducing the load of your dashboard and monitoring tasks. The moment you need a refreshed answer, you can get it in milliseconds.

Correct and Consistent Answers

Materialize focuses on providing correct and consistent answers with minimal latency. It does not ask you to accept approximate answers or eventual consistency. Whenever Materialize answers a query, that answer is the correct result on some specific (and recent) version of your data.

SQL Queries as Dataflows

Materialize recasts your SQL queries as dataflows, which can react efficiently to changes in your data as they happen. This allows for seamless real-time updates and low-latency query results.

Fully Managed Cloud Service

Materialize offers a fully managed cloud-native service featuring high availability via multi-active replication, horizontal scalability by seamlessly scaling dataflows across multiple machines, and near-infinite storage by leveraging cloud object storage (e.g., Amazon S3).

Comprehensive PostgreSQL Support

Materialize supports a large fraction of PostgreSQL, including many built-in functions. It uses the PostgreSQL protocol, allowing you to use your favorite SQL client, including psql.

Get Data In

Materialize can read data from various sources:

  • Kafka: Integrate with Kafka and other Kafka API-compatible systems like Redpanda.
  • PostgreSQL: Use PostgreSQL replication streams.
  • SaaS Applications: Connect via webhooks from various SaaS applications.
  • Database Tables: Support for regular database tables to insert, update, and delete rows.

Transform, Manipulate, and Read Your Data

Materialize supports a wide range of SQL features, using the PostgreSQL dialect and protocol:

  • Multi-column join conditions, multi-way joins, self-joins, cross-joins, inner joins, outer joins, etc.
  • Delta-joins to avoid intermediate state blowup.
  • Subquery decorrelation by the SQL optimizer.
  • Incremental maintenance of views with arbitrary inserts, updates, and deletes.
  • Aggregations: min, max, count, sum, stddev, etc.
  • SQL features: HAVING, ORDER BY, LIMIT, DISTINCT.
  • JSON support including operators and functions.
  • Efficient use of nested views and shared underlying indices for space and compute efficiency.

Example Query

Here's an example join query, TPC-H query 15:

-- Views define commonly reused subqueries.
CREATE VIEW revenue (supplier_no, total_revenue) AS
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1 - l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= DATE '1996-01-01'
        AND l_shipdate < DATE '1996-01-01' + INTERVAL '3' month
    GROUP BY
        l_suppkey;
 
-- The MATERIALIZED keyword triggers incremental maintenance.
CREATE MATERIALIZED VIEW tpch_q15 AS
  SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            max(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;
 
-- Creating an index keeps results up-to-date and in memory.
CREATE INDEX tpch_q15_idx ON tpch_q15 (s_suppkey);

Stream inserts, updates, and deletes on the underlying tables (lineitem and supplier), and Materialize keeps the materialized view incrementally updated. Querying SELECT * FROM tpch_q15 provides current results immediately.

Get Data Out

Pull Based

Use any PostgreSQL-compatible driver to make SELECT queries against your views.

Push Based

Listen to changes using SUBSCRIBE or configure Materialize to stream results to a Kafka topic as soon as the views change.

Documentation

Check out the Materialize documentation.

For Developers

Materialize is primarily written in Rust. Developers can find documentation at doc/developer, and Rust API documentation is hosted here.

Contributions are welcome. Prospective code contributors might find the D-good for external contributors label useful. See CONTRIBUTING.md for additional guidance.

Similar Projects