Real-time Analytics | Data Ecosystem Wiki

Real-time Analytics

Real-time analytics databases are OLAP systems built to answer analytical queries over fresh, high-volume event data in milliseconds rather than seconds. ClickHouse has become the dominant choice; Druid and Pinot are the prior generation; Rockset was acquired by OpenAI in 2024.

A real-time analytics database is a specialized OLAP system designed to do two things at the same time, both of which traditional warehouses do badly: ingest events as they happen, and answer analytical queries over those events in milliseconds. Cloud data warehouses like Snowflake and BigQuery are powerful, but they were built for "scan a billion rows in seconds when an analyst clicks Run." Real-time analytics databases are built for "scan a billion rows in 50 milliseconds when a user loads a dashboard, and do this for thousands of users concurrently, with data that's no more than a few seconds old."

The mental model worth holding: a data warehouse is for asking ad-hoc questions about yesterday. A real-time analytics database is for powering a live dashboard, an in-product analytics feature, or an alerting system that needs answers about what is happening right now. They are not the same product even though they are both "OLAP databases."

Why Warehouses Aren't Enough

If warehouses are so good, why do we need a separate category at all? Three reasons:

1. Latency floor. Snowflake, BigQuery, and Redshift have query latency in the seconds-to-tens-of-seconds range for typical analytical queries. That is fine for an analyst running a query, but unacceptable for a user-facing dashboard that needs to render in under a second when 10,000 users hit it at the same time. Real-time OLAP databases hit p95 query latencies in the tens of milliseconds for the same kinds of aggregations.

2. Concurrency. Warehouses are designed for tens to low hundreds of concurrent queries. A user-facing analytics product might need thousands or tens of thousands of concurrent queries. Real-time OLAP engines are built for that level of concurrency on commodity hardware.

3. Streaming ingestion. Warehouses traditionally batch-load data on intervals (Snowpipe Streaming has narrowed this gap, but not fully). Real-time OLAP databases ingest events directly from Kafka or Kinesis with end-to-end freshness in seconds.

The classic use case where this matters: in-product analytics. Think LinkedIn's "who viewed your profile" page, or Stripe's transaction dashboard, or any SaaS product with a live "your usage this month" view. These are queries running over event data, served to end users, with low latency and high concurrency. A warehouse cannot do this economically. A real-time OLAP database can.
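
The latency and concurrency points compound each other. By Little's law, the average number of queries in flight equals the arrival rate times the mean latency, so slower queries also mean far more simultaneous work. A back-of-envelope sketch in Python; the load figures are illustrative assumptions, not measurements from any system:

```python
def in_flight_queries(queries_per_second: float, mean_latency_seconds: float) -> float:
    """Little's law: average concurrent queries = arrival rate * mean latency."""
    return queries_per_second * mean_latency_seconds


# Hypothetical load: 10,000 users each open a dashboard with 5 tiles
# (one query per tile) within the same 10-second window.
qps = 10_000 * 5 / 10  # 5,000 queries per second

warehouse_style = in_flight_queries(qps, mean_latency_seconds=5.0)   # seconds-range latency
realtime_style = in_flight_queries(qps, mean_latency_seconds=0.05)   # tens of milliseconds

print(f"{qps:.0f} qps at 5 s   -> {warehouse_style:.0f} queries in flight")
print(f"{qps:.0f} qps at 50 ms -> {realtime_style:.0f} queries in flight")
```

Under these assumed numbers the same traffic needs roughly 25,000 concurrent query slots at warehouse-style latency and roughly 250 at real-time OLAP latency.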

How These Engines Work (the Common Pattern)

Almost every real-time OLAP database shares a similar architectural recipe:

1. Columnar storage. Like warehouses, they store data column-by-column, not row-by-row. Analytical queries that touch a few columns over many rows scan only the relevant columns.

2. Aggressive compression and encoding. Specialized codecs (delta-of-delta for timestamps, dictionary and run-length encoding for low-cardinality strings, Gorilla-style XOR compression for floats) shrink data dramatically. Some encodings, dictionary encoding in particular, also let filters and group-bys operate on the encoded values without fully decoding them.

3. Pre-aggregation or rollups. Most engines support rolling up incoming data into pre-computed aggregations at ingest time (Druid is famous for this). This trades some query flexibility for massive speedups: detail finer than the rollup granularity is no longer available to query.

4. Time-partitioned segments. Data is split into time-based segments (hourly, daily) so queries that filter by recent time ranges only scan recent segments. This is the single most important optimization for time-series-shaped event data.

5. In-memory or SSD-resident hot data. The recent data lives in memory or local SSD; older data is offloaded to cheaper storage. The mix is engine-specific.

6. Streaming ingestion. Direct Kafka/Kinesis consumers, with events queryable seconds after ingestion.
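
To make the encoding step in the recipe above concrete, here is a minimal Python sketch of two of the transforms: delta-of-delta for timestamps and run-length encoding for repetitive strings. It shows only the transforms; real codecs also bit-pack the resulting small integers, and the function names are hypothetical rather than any engine's API.

```python
from itertools import groupby


def delta_of_delta_encode(timestamps: list[int]) -> list[int]:
    """Keep the first value and the first delta, then only second-order deltas."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    second_order = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0], *second_order]


def delta_of_delta_decode(encoded: list[int]) -> list[int]:
    """Invert delta_of_delta_encode by accumulating deltas back into values."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        values.append(values[-1] + delta)
    return values


def run_length_encode(values: list[str]) -> list[tuple[str, int]]:
    """Collapse each run of equal adjacent values into (value, run length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


timestamps = [1000, 1010, 1020, 1030, 1041]  # events arriving about every 10 s
encoded = delta_of_delta_encode(timestamps)
print(encoded)                                # [1000, 10, 0, 0, 1]
assert delta_of_delta_decode(encoded) == timestamps

print(run_length_encode(["us", "us", "us", "eu", "eu", "us"]))
# [('us', 3), ('eu', 2), ('us', 1)]
```

Regularly spaced timestamps turn into long runs of zeros, which is why this family of codecs compresses event data so well.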

The differences between ClickHouse, Druid, and Pinot are in the details of this recipe: how they handle joins (some do not handle them well), how they manage schema evolution, what kinds of queries they optimize for, and how they trade off ingestion latency vs. query latency vs. operational complexity.
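
Rollups and time-partitioned segments from the recipe can be illustrated together. The Python sketch below is a toy in-memory model, not how any particular engine stores data: events are rolled up to per-minute counts at ingest, grouped into hourly segments, and a time-range query skips every segment that cannot overlap the range. All names and granularities are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# hourly segment start -> {(minute bucket, country): rolled-up event count}
segments = defaultdict(lambda: defaultdict(int))


def ingest(event_time: datetime, country: str) -> None:
    """Roll the event up to a per-minute count inside its hourly segment."""
    minute = event_time.replace(second=0, microsecond=0)
    hour = minute.replace(minute=0)
    segments[hour][(minute, country)] += 1


def count_events(start: datetime, end: datetime) -> tuple[int, int]:
    """Count events in [start, end); returns (total, segments scanned).

    start and end are assumed to be minute-aligned, because detail below
    the rollup granularity was discarded at ingest.
    """
    total = scanned = 0
    for hour, rows in segments.items():
        if hour + timedelta(hours=1) <= start or hour >= end:
            continue  # segment cannot overlap the query range: skip it entirely
        scanned += 1
        total += sum(n for (minute, _), n in rows.items() if start <= minute < end)
    return total, scanned


base = datetime(2025, 1, 1, tzinfo=timezone.utc)
for i in range(3 * 60 * 6):  # one event every 10 seconds for 3 hours
    ingest(base + timedelta(seconds=10 * i), "US" if i % 2 == 0 else "DE")

stored_rows = sum(len(rows) for rows in segments.values())
total, scanned = count_events(base + timedelta(hours=2, minutes=30), base + timedelta(hours=3))
print(stored_rows, total, scanned)  # 360 180 1
```

In this toy run, 1,080 raw events are stored as 360 rolled-up rows, and a query over the last 30 minutes touches one of the three segments.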

The Honest Vendor Take: ClickHouse Won

For a long time, the conventional wisdom in this category was a three-way race between Apache Druid (Metamarkets origin, ad-tech roots, the first of the three to be open-sourced), Apache Pinot (LinkedIn-built, used to power LinkedIn's profile views), and ClickHouse (Yandex origin, originally for clickstream analytics). All three solved similar problems with similar architectures.

That race is over. ClickHouse won. It is the dominant real-time OLAP database in 2026 by every meaningful measure — adoption, ecosystem, mindshare, GitHub activity, hiring demand. The reasons:

Operational simplicity. A single C++ binary with no JVM and, in newer versions, no ZooKeeper (ClickHouse Keeper replaces it), so it is straightforward to operate even at small scale. Druid, by comparison, runs historical, broker, coordinator, overlord, and middle manager processes and depends heavily on ZooKeeper. The operational tax is not even close.
Query power. ClickHouse supports a much richer SQL surface than Druid or Pinot, including joins (for years a weakness of the Druid/Pinot lineage). For mixed analytical workloads, this matters.
A real commercial vehicle. ClickHouse Inc. (founded 2021 by Aaron Katz with the original ClickHouse creators from Yandex) raised significant funding and built ClickHouse Cloud, the managed offering. Druid's commercial sponsor Imply has been smaller and quieter; Pinot's commercial sponsor StarTree similarly.
Community momentum. The ClickHouse community is vastly larger and more active than the Druid or Pinot communities in 2025-2026.

Apache Druid is fading, not dead. It remains in production at large adopters like Netflix, Airbnb, and Lyft. New deployments are rarer.

Apache Pinot still has its niche — LinkedIn famously uses it to power user-facing features at scale, and StarTree is doing real work commercializing it — but it has not displaced ClickHouse outside its core constituency.

Rockset was the venture-backed entrant that pitched "real-time analytics with full SQL on schemaless data." It was technically interesting and well-funded but never reached the scale of the others. OpenAI acquired Rockset in June 2024, ostensibly to use its real-time analytics capabilities in OpenAI's infrastructure. Rockset's external commercial product was wound down. The lesson, perhaps: this category is hard to monetize as a standalone vendor when ClickHouse exists as a free open-source alternative.

Where Real-Time Analytics Fits in the Stack

Real-time OLAP databases sit downstream of event streaming and stream processing, and at the same level as data warehouses — but for a different use case. The typical architecture: events flow through Kafka, are optionally enriched by Flink, and are written into ClickHouse (or Druid, or Pinot) where they become queryable for live dashboards, in-product analytics, observability tools, and operational alerting.
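
The flow described above can be sketched end to end in plain Python. This is a simulation with stand-ins, not real client code: a list plays the Kafka topic, `enrich` plays the optional Flink step, and `MicroBatcher` (a hypothetical helper) buffers rows and writes them to the table in batches. Batching writes by size or age is a common ingestion pattern and is an assumption here, not something the text above specifies.

```python
from dataclasses import dataclass, field
from typing import Optional

REGION_BY_COUNTRY = {"US": "amer", "DE": "emea"}  # hypothetical lookup table


def enrich(event: dict) -> dict:
    """Stand-in for the stream-processing step: attach a region to each event."""
    return {**event, "region": REGION_BY_COUNTRY.get(event["country"], "unknown")}


@dataclass
class MicroBatcher:
    """Buffers rows and flushes them to the sink when the batch is full or old."""
    sink: list  # stands in for the OLAP table; each element is one inserted batch
    max_rows: int = 3
    max_age_seconds: float = 1.0
    _buffer: list = field(default_factory=list)
    _oldest: Optional[float] = None

    def add(self, row: dict, now: float) -> None:
        if not self._buffer:
            self._oldest = now  # remember when the current batch started
        self._buffer.append(row)
        if len(self._buffer) >= self.max_rows or now - self._oldest >= self.max_age_seconds:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.sink.append(self._buffer)
            self._buffer = []
            self._oldest = None


stream = [  # stands in for a Kafka topic; "ts" is seconds on an arbitrary clock
    {"ts": 0.0, "country": "US"},
    {"ts": 0.2, "country": "DE"},
    {"ts": 0.4, "country": "US"},
    {"ts": 0.5, "country": "JP"},
    {"ts": 1.7, "country": "DE"},
]

table: list = []
batcher = MicroBatcher(sink=table)
for event in stream:
    batcher.add(enrich(event), now=event["ts"])
batcher.flush()  # write whatever is still buffered at shutdown

print([len(batch) for batch in table])  # [3, 2]
```

The first batch flushes because it reached three rows; the second flushes because its oldest row had waited more than one second. The age limit is what bounds end-to-end freshness.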

Real-time OLAP databases are not a replacement for warehouses. They are a complement. Most large data organizations run both: a warehouse for batch analytics, ad-hoc questions, and ML training data; a real-time OLAP database for the use cases that need millisecond latency on fresh event data.

When You Actually Need a Real-Time OLAP Database

Scenario	Real-time OLAP?	Alternative / notes
---	---	---
User-facing live dashboard with thousands of concurrent users	Yes	A warehouse cannot handle the concurrency
In-product analytics ("your usage this month")	Yes	A warehouse cannot handle the concurrency
Observability / log analytics at high volume	Yes (often ClickHouse)	Elasticsearch is the legacy alternative
Real-time fraud detection or alerting	Yes	Combined with stream processing
Internal BI dashboards refreshed daily	No	Warehouse
Ad-hoc analyst queries on historical data	No	Warehouse
ML training data preparation	No	Lakehouse / warehouse
Real-time leaderboards with millisecond freshness	Maybe	Materialize is an alternative
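
The table boils down to three questions: is the workload user-facing, how many queries run at once, and how stale can the data be. A rough triage helper in Python; the function name and both thresholds are illustrative assumptions, not recommendations from any vendor:

```python
def recommend_store(user_facing: bool, peak_concurrent_queries: int,
                    max_staleness_seconds: float) -> str:
    """Rough triage mirroring the table above; thresholds are assumptions."""
    needs_concurrency = peak_concurrent_queries >= 1_000
    needs_freshness = max_staleness_seconds <= 60
    if user_facing or needs_concurrency or needs_freshness:
        return "real-time OLAP"
    return "warehouse"


print(recommend_store(True, 5_000, 5))      # live user-facing dashboard
print(recommend_store(False, 20, 86_400))   # internal BI refreshed daily
print(recommend_store(False, 10, 5))        # alerting on fresh events
```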

Tools in This Category

ClickHouse — The dominant real-time OLAP database. Yandex origin, now ClickHouse Inc.
Apache Druid — The original "interactive analytics on streaming data" engine. Fading vs ClickHouse.
Apache Pinot — LinkedIn-built real-time OLAP, strong at user-facing analytics, commercialized by StarTree.
Rockset — Venture-backed real-time analytics database, acquired by OpenAI in June 2024.

How TextQL Works with Real-Time Analytics

Real-time OLAP databases are first-class TextQL connections. Because ClickHouse, Druid, and Pinot all speak SQL (with some dialect variation), TextQL Ana can query them directly the same way it queries Snowflake or BigQuery. The interesting use case is freshness: when a business user asks "how many sign-ups did we get in the last hour, by region," TextQL pointed at ClickHouse can answer with data that is seconds old, not yesterday's batch. Real-time OLAP backends are how TextQL goes from "analytics on historical data" to "analytics on what's happening right now."
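
For illustration, the sign-ups question above maps to a short aggregation query. The Python sketch below runs it against an in-memory SQLite database purely as a stand-in so the example is self-contained; the `signups` table, its columns, and the fixed clock are hypothetical, and this is not a description of the SQL that TextQL actually generates. A ClickHouse version would express the time filter in its own dialect, for example with `now() - INTERVAL 1 HOUR`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# created_at holds Unix seconds; the schema is a hypothetical example.
conn.execute("CREATE TABLE signups (user_id INTEGER, region TEXT, created_at INTEGER)")

now = 1_700_000_000  # fixed "current time" so the example is reproducible
rows = [
    (1, "emea", now - 120),
    (2, "emea", now - 1_800),
    (3, "amer", now - 3_000),
    (4, "amer", now - 7_200),   # two hours old: outside the window
    (5, "apac", now - 86_400),  # yesterday: outside the window
]
conn.executemany("INSERT INTO signups VALUES (?, ?, ?)", rows)

query = """
    SELECT region, COUNT(*) AS signup_count
    FROM signups
    WHERE created_at >= ? - 3600
    GROUP BY region
    ORDER BY signup_count DESC, region
"""
result = conn.execute(query, (now,)).fetchall()
print(result)  # [('emea', 2), ('amer', 1)]
```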

Real-time Analytics

Category: Sub-second OLAP databases for event data
Also called: Real-time OLAP, streaming analytics databases, columnar event stores
Not to be confused with: Cloud data warehouses (slower, larger), stream processors (compute engines)
Key vendors: ClickHouse, Druid, Pinot, Rockset
Typical latency: Sub-second query, sub-minute ingestion
Typical use cases: Live dashboards, in-product analytics, observability, ad-tech
Monthly mindshare: ~80K (specialized OLAP-on-events category; smaller than warehouses)