Apache Pinot
Apache Pinot is a real-time analytics database created at LinkedIn in 2013-2014 to power user-facing analytics features like "Who Viewed Your Profile." Open-sourced in 2015, it became an Apache top-level project in 2021, and is commercialized by StarTree.
Apache Pinot is a real-time OLAP database designed for the specific use case of user-facing analytics at extreme scale — the kind of analytics where every LinkedIn user, every Uber driver, every Slack admin, can pull up a personalized dashboard that aggregates millions of underlying events in under 100 milliseconds. It was built at LinkedIn, by LinkedIn engineers, to solve LinkedIn's specific problem: powering features like "Who Viewed Your Profile," "Talent Insights," and "Article Analytics" against the entire LinkedIn dataset, for hundreds of millions of users, with sub-second latency.
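The query shape described above can be sketched in a few lines. This is a toy in-memory model, not Pinot's implementation: it shows why a query keyed by a single user ID is cheap when the data is indexed by that key, which is the core of the user-facing analytics pattern.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy model of the user-facing analytics query shape (illustrative only):
# billions of events in production, a handful here. Each event is
# (viewer_id, viewed_profile_id, timestamp).
NOW = datetime(2026, 1, 1)
events = [
    ("u2", "u1", NOW - timedelta(days=5)),
    ("u3", "u1", NOW - timedelta(days=40)),
    ("u4", "u1", NOW - timedelta(days=200)),  # outside the 90-day window
    ("u2", "u9", NOW - timedelta(days=1)),
]

# The trick: every query is keyed by a single ID, so an inverted index
# from viewed_profile_id -> rows turns a full scan into a short list walk.
by_profile = defaultdict(list)
for viewer, viewed, ts in events:
    by_profile[viewed].append((viewer, ts))

def who_viewed(profile_id, days=90):
    """Return viewers of `profile_id` within the last `days` days."""
    cutoff = NOW - timedelta(days=days)
    return [viewer for viewer, ts in by_profile[profile_id] if ts >= cutoff]

print(who_viewed("u1"))  # ['u2', 'u3'] — u4's view is too old
```

The production version layers columnar storage, segment pruning, and purpose-built indexes on top of this same idea.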
If Apache Druid was the engine of ad-tech analytics in the 2010s, Pinot was the engine of social-media-scale user-facing analytics. The two share a similar architectural lineage but were designed with different scale assumptions and different query patterns. In 2026, both are being squeezed by ClickHouse — but Pinot has a more defensible niche than Druid because of its specific strengths in extreme-concurrency, low-latency point lookups.
Pinot was built starting around 2013-2014 inside LinkedIn by an engineering team led by Kishore Gopalakrishna, Praveen Neppalli Naga, and Jean-François Im. The motivating problem was a feature LinkedIn product managers kept wanting to ship: "show every user a personalized dashboard with stats about their profile, their posts, their network." The naive implementation — precomputed daily aggregates per user — was too stale and didn't allow drill-down. The "right" implementation — live aggregations over the raw event stream — was too slow on existing infrastructure to serve hundreds of millions of users at LinkedIn's traffic levels.
LinkedIn already had Druid in production for some use cases, but Druid wasn't optimized for the kinds of workloads LinkedIn wanted to support (millions of QPS, high-cardinality user-keyed queries, sub-100ms p99 latencies). So they built Pinot, with a deliberate design goal: support the LinkedIn-specific pattern of "every user gets a fast personalized dashboard," not just the ad-tech pattern of "a few analysts drill into aggregate data."
Pinot was open-sourced in 2015, became an Apache incubator project in 2018, and graduated to a top-level Apache project in December 2021. In 2019, several of the original creators left LinkedIn to found StarTree, the commercial company that stewards Pinot and builds StarTree Cloud, a managed Pinot offering.
Pinot, Druid, and ClickHouse all share the basic real-time OLAP recipe (columnar storage, time-partitioned segments, streaming ingestion from Kafka). The differences are in the optimizations:
Pinot's distinctive strengths:
- Star-tree indexes that pre-aggregate along dimension combinations, turning common dashboard group-bys into near-constant-time lookups.
- Extreme query concurrency — hundreds of thousands of queries per second against the same tables.
- Sub-100ms point lookups keyed by a single dimension (a user ID, an account ID): the shape of every personalized-dashboard query.

Druid's strengths over Pinot:
- A longer production track record in ad-tech-style aggregate analytics, and the more established operational playbook and ecosystem that come with it.

ClickHouse's strengths over Pinot:
- Faster on diverse, ad-hoc query shapes, as opposed to a fixed set of dashboard queries.
- Simpler to deploy and operate, with fewer moving parts and a richer general-purpose SQL surface.
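The shared recipe — columnar storage, time-partitioned segments, streaming append — can be illustrated with a minimal sketch. This is not Pinot's actual storage format; it only shows why time-partitioning lets the engine skip whole segments at query time.

```python
# Minimal sketch of the shared real-time OLAP recipe (illustrative):
# columnar segments partitioned by time, appended as events stream in,
# pruned by time range at query time.
class Segment:
    def __init__(self, start_hour, end_hour):
        self.start, self.end = start_hour, end_hour
        # Columnar layout: one list per column instead of one dict per row.
        self.cols = {"hour": [], "user": [], "views": []}

    def append(self, hour, user, views):
        self.cols["hour"].append(hour)
        self.cols["user"].append(user)
        self.cols["views"].append(views)

segments = [Segment(0, 12), Segment(12, 24)]
for hour, user, views in [(1, "a", 3), (5, "b", 2), (13, "a", 7), (20, "a", 1)]:
    seg = segments[0] if hour < 12 else segments[1]
    seg.append(hour, user, views)

def total_views(user, lo, hi):
    """Sum `views` for `user` over hours [lo, hi), skipping whole segments."""
    total = 0
    for seg in segments:
        if seg.end <= lo or seg.start >= hi:
            continue  # time-based segment pruning: never touched
        for h, u, v in zip(seg.cols["hour"], seg.cols["user"], seg.cols["views"]):
            if lo <= h < hi and u == user:
                total += v
    return total

print(total_views("a", 12, 24))  # 8 — only the second segment is scanned
```

Pinot, Druid, and ClickHouse all start from this skeleton; the differences lie in the indexes and optimizations layered on top.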
The honest summary: if your workload is "thousands of users hitting personalized dashboards with similar query shapes," Pinot is the best-fit engine in this category. If your workload is "general-purpose real-time OLAP with diverse query patterns," ClickHouse is probably faster, simpler, and easier to operate.
LinkedIn's "Who Viewed Your Profile" page is the canonical Pinot use case. Every LinkedIn user, when they load that page, triggers a query like "show this user the list of people who viewed their profile in the last 90 days, with counts and aggregations." The query is keyed by a single user ID. The data is sourced from billions of profile-view events across all of LinkedIn. The latency budget is under 100 milliseconds. The concurrency is hundreds of thousands of these queries per second across the LinkedIn user base.
This is a hard problem. Star-tree indexes plus high concurrency plus real-time ingestion from Kafka is exactly what Pinot was built for, and it does this kind of work better than anything else in its category.
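The star-tree idea mentioned above can be sketched as follows. This is a hedged simplification of the concept, not Pinot's implementation: pre-aggregate a metric over every combination of dimension values, using "*" as a wildcard, so that common group-by queries become dictionary lookups instead of scans.

```python
from collections import defaultdict
from itertools import product

STAR = "*"
rows = [
    # (country, device, views) — a toy fact table
    ("US", "mobile", 10),
    ("US", "desktop", 5),
    ("DE", "mobile", 7),
]

# Each row contributes to 4 pre-aggregated cells:
# (country, device), (country, *), (*, device), (*, *).
pre_agg = defaultdict(int)
for country, device, views in rows:
    for key in product((country, STAR), (device, STAR)):
        pre_agg[key] += views

def views_for(country=STAR, device=STAR):
    """Answer an aggregate query from the precomputed cells — O(1)."""
    return pre_agg[(country, device)]

print(views_for(country="US"))     # 15
print(views_for(device="mobile"))  # 17
print(views_for())                 # 22 (grand total)
```

The real star-tree index bounds the space blow-up by only materializing nodes past a configurable row threshold, but the query-time effect is the same: aggregations that would scan millions of rows are answered from precomputed cells.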
Other companies have adopted Pinot for similar patterns — the Uber driver and Slack admin dashboards mentioned earlier are the best-known examples. The pattern: large consumer platforms with millions of end users who each get a personalized analytical view. Pinot is the right tool when the workload looks like this.
Outside its niche, Pinot has the same problems as Druid: a multi-component architecture (controllers, brokers, servers, plus ZooKeeper and Helix for cluster coordination) that is heavy to operate, and weaker performance on diverse, ad-hoc query shapes. For general-purpose real-time analytics, ClickHouse is the easier and faster choice. Pinot's case rests on its specific optimizations for the user-facing analytics pattern.
StarTree, like Imply for Druid and Confluent for Kafka, is the commercial company hoping to turn an open-source project into a sustainable business. StarTree's bet is that the user-facing analytics niche is large enough to support a managed-Pinot business, and that the operational simplification of StarTree Cloud will pull customers away from self-hosting. The competitive question is whether ClickHouse Cloud (which has more momentum, more capital, and more brand recognition) will absorb the user-facing analytics market over time.
Pinot sits downstream of event streaming (Kafka is the standard ingest path) and serves queries to applications, BI tools, and end users. It is an analytical serving database, not a transformation engine — like Druid and ClickHouse, you typically pair it with Flink or another stream processor for non-trivial enrichment.
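The standard Kafka ingest path is configured declaratively on the table. The fragment below is a trimmed, illustrative REALTIME table config using Pinot's documented config keys — the table name, topic, broker address, and time column are placeholders:

```json
{
  "tableName": "profile_views",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "viewTimeMillis",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "90"
  },
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "profile-view-events",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel"
    }
  }
}
```

Pinot servers consume the topic directly and build queryable segments as events arrive — which is why any non-trivial enrichment has to happen upstream, in Flink or another stream processor, before the events hit the topic.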
TextQL Ana connects to Pinot via its SQL interface (REST or JDBC) and queries it the same way it queries other SQL backends. Where Pinot is genuinely interesting for TextQL users is in organizations that have already built user-facing analytics on Pinot — TextQL becomes a natural-language interface to data that already powers customer-visible features, with the same freshness and concurrency characteristics.
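For the REST path, a client posts SQL to the broker's `/query/sql` endpoint (default port 8099). The sketch below only builds the request payload — the host, table, and column names are placeholders, and the actual HTTP call is left commented out:

```python
import json

# Illustrative Pinot REST query (host, table, and columns are placeholders).
BROKER_URL = "http://pinot-broker:8099/query/sql"

sql = (
    "SELECT viewerId, COUNT(*) AS views "
    "FROM profile_views "
    "WHERE viewedProfileId = 'u1' "
    "AND viewTimeMillis > ago('P90D') "
    "GROUP BY viewerId ORDER BY views DESC LIMIT 50"
)
payload = json.dumps({"sql": sql})

# A real call would be e.g.:
#   requests.post(BROKER_URL, data=payload,
#                 headers={"Content-Type": "application/json"})
print(payload)
```

The same query works over JDBC; either way, the client sees an ordinary SQL interface, which is what lets tools like TextQL treat Pinot as just another SQL backend.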