Rockset
Rockset was a real-time analytics database founded in 2016 by ex-Facebook engineers Venkat Venkataramani and Dhruba Borthakur. It pitched 'schemaless real-time SQL analytics' as a venture-backed alternative to ClickHouse and Druid, and was acquired by OpenAI in June 2024 for an estimated $200-300 million.
Rockset was a real-time analytics database that, between 2016 and 2024, pitched the most differentiated technical story in its category: fully-indexed real-time SQL analytics on schemaless JSON data, with no data modeling required. It was founded by a team of ex-Facebook engineers who had built core pieces of Facebook's online data infrastructure, including RocksDB, the embedded storage engine that databases such as CockroachDB, TiDB, and YugabyteDB were built on (CockroachDB has since moved to its own engine, Pebble). It raised over $100M in venture funding to compete with ClickHouse, Druid, and Pinot.
In June 2024, OpenAI acquired Rockset for a reported $200-300 million. The Rockset team and technology were absorbed into OpenAI, and the external commercial product was wound down. This is the rare case where a real-time analytics database company exited not to a database vendor, not via IPO, not by absorbing the open-source ecosystem — but by being bought to power the internal infrastructure of a hyperscaling AI company.
Rockset was founded in 2016 by Venkat Venkataramani, Dhruba Borthakur, Shruti Bhat, and Tudor Bosman. Their backgrounds were heavily concentrated in Facebook's online data infrastructure:
Their thesis: the analytical database market was stuck in a false choice between data warehouses (slow, batch-oriented, and dependent on extensive modeling) and real-time OLAP databases like Druid (fast, but dependent on careful schema design and carrying a high operational tax). What if you could build a database that ingested raw JSON, indexed everything automatically, let you write SQL against it with no upfront modeling, and ran with the latency profile of an online system?
This is a real engineering challenge, and Rockset's approach to it was technically interesting.
Rockset's key innovations:
1. Converged indexing. Rockset indexed every field three ways at once: in a row index, a column index, and an inverted index. This let the query optimizer choose the best access path for each query at runtime (point lookups, range scans, full-text search, aggregations) without the user needing to declare which indexes to build. The cost was higher storage and ingest overhead, but the result was that "any query was fast."
2. Schemaless ingestion. You could pipe JSON or any semi-structured data into Rockset and query it with SQL immediately. New fields were automatically detected and indexed. Schema evolution was free. This was a real differentiator for use cases where data shape changed frequently or where teams didn't want to invest in upfront schema design.
3. Compute-storage separation built in. Rockset was cloud-native from day one, with separate compute units (called "virtual instances") that could be sized independently and scaled per workload. This was years ahead of where Druid and self-hosted ClickHouse were on this dimension.
4. RocksDB as the storage engine. Building on Borthakur's expertise, Rockset used RocksDB as the foundation. This gave it efficient storage management and good single-node performance characteristics out of the box.
5. SQL-first interface. Rockset spoke standard SQL (with some extensions for working with nested JSON), unlike Druid's historical JSON query language. This made it accessible to a broader audience.
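To make the first two ideas concrete, here is a minimal in-memory sketch of schemaless ingestion feeding a converged index. It is a toy illustration only, not Rockset's implementation: the class name ConvergedIndex, its methods, and the dotted field-path convention are all assumptions made for this example, and it only handles nested objects with scalar leaf values.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (field_path, value) pairs for a nested JSON-like dict.

    Assumption for this sketch: leaves are scalars (str, int, float, bool).
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class ConvergedIndex:
    """Toy store that writes every field to three structures on ingest."""

    def __init__(self):
        self.rows = {}                    # row index: doc_id -> {path: value}
        self.columns = defaultdict(dict)  # column index: path -> {doc_id: value}
        self.inverted = defaultdict(set)  # inverted index: (path, value) -> doc_ids

    def ingest(self, doc_id, doc):
        # No schema is declared: new field paths simply appear as new keys.
        fields = dict(flatten(doc))
        self.rows[doc_id] = fields
        for path, value in fields.items():
            self.columns[path][doc_id] = value
            self.inverted[(path, value)].add(doc_id)

    def lookup(self, doc_id):
        """Point lookup, served from the row index."""
        return self.rows[doc_id]

    def find_equal(self, path, value):
        """Selective equality filter, served from the inverted index."""
        return sorted(self.inverted.get((path, value), set()))

    def total(self, path):
        """Aggregation over one field, served from the column index."""
        return sum(self.columns.get(path, {}).values())


index = ConvergedIndex()
index.ingest(1, {"user": {"id": "u1", "country": "US"}, "amount": 30})
index.ingest(2, {"user": {"id": "u2", "country": "DE"}, "amount": 12, "coupon": "SPRING"})
index.ingest(3, {"user": {"id": "u3", "country": "US"}, "amount": 8})

print(index.find_equal("user.country", "US"))
print(index.total("amount"))
print(index.find_equal("coupon", "SPRING"))
```

Note that the second document introduces a coupon field that the others lack, and it becomes queryable without any schema change. A real optimizer would choose among the three structures based on estimated selectivity; here the choice is hard-coded per method to keep the sketch short.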
Rockset was technically interesting, well-funded, and well-marketed. It nonetheless failed to reach the scale of ClickHouse or even Druid as an independent commercial entity. The honest reasons:
1. ClickHouse existed and was free. ClickHouse offered comparable (or better) raw query performance, was open source under the Apache 2.0 license, and required no commercial relationship. For most prospects, the incremental value of Rockset's auto-indexing and schemaless features wasn't enough to justify a paid SaaS over self-hosted ClickHouse.
2. The auto-indexing tradeoff cut both ways. Indexing every column on every event is expensive in storage and ingest cost. At scale, this made Rockset more expensive than ClickHouse, which lets users choose what to index. Customers with sophisticated data engineering teams preferred to pay engineering time to model data well and run cheaper infrastructure.
3. Closed source in an open-source category. Rockset was a proprietary cloud service with no open-source version. In a category dominated by open-source projects (Druid, Pinot, ClickHouse), this was a hard sell to engineers who wanted to evaluate a system on their own terms.
4. The market category was already crowded. By 2020-2021, real-time OLAP had three established open-source contenders plus first-party offerings from every cloud (BigQuery BI Engine, Snowflake's evolution toward lower latency, Databricks SQL Photon). Carving out independent commercial space was hard.
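The indexing tradeoff in point 2 can be put in rough numbers with a back-of-the-envelope model. The sketch below counts index entries written per event under two policies: converged indexing (three entries per field, as described earlier) and a hypothetical engine that stores each field once and adds a secondary index only for fields the user picks. The field counts are made up for illustration, and the model ignores compression, encoding, and compaction, so it shows the shape of the tradeoff rather than any measured cost.

```python
# Row, column, and inverted index: three entries per field under converged indexing.
INDEXES_PER_FIELD_CONVERGED = 3


def converged_entries(num_fields: int) -> int:
    """Entries written per event when every field is indexed three ways."""
    return num_fields * INDEXES_PER_FIELD_CONVERGED


def selective_entries(num_fields: int, num_indexed_fields: int) -> int:
    """Entries written per event by a hypothetical engine that stores each
    field once and indexes only the fields the user chooses."""
    if not 0 <= num_indexed_fields <= num_fields:
        raise ValueError("indexed fields must be between 0 and num_fields")
    return num_fields + num_indexed_fields


# Illustrative, made-up workload: 40 fields per event, 3 of them worth indexing.
fields_per_event = 40
indexed_by_choice = 3

converged = converged_entries(fields_per_event)
selective = selective_entries(fields_per_event, indexed_by_choice)
amplification = converged / selective

print(f"converged: {converged} entries/event")
print(f"selective: {selective} entries/event")
print(f"write amplification: {amplification:.2f}x")
```

Under these assumed numbers the converged policy writes a little under three times as many entries per event. The gap narrows as a team chooses to index more fields, which is why the tradeoff favored Rockset for ad hoc workloads and favored hand-modeled systems for stable, well-understood ones.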
In June 2024, OpenAI announced its acquisition of Rockset. The strategic rationale, as stated publicly: OpenAI wanted Rockset's real-time analytics and search infrastructure to power retrieval and analytics inside OpenAI's products. The Rockset technology — particularly its converged indexing and low-latency query engine — was valuable for the kind of structured-plus-unstructured retrieval that LLM-powered products need.
For the broader real-time analytics market, the acquisition signaled two things:
As an external product, Rockset is gone. The cloud service has been wound down for new customers, existing customers have been migrated off, and the team works inside OpenAI on infrastructure that is not externally sold. If you are picking a real-time analytics database in 2026, Rockset is not on the list.
What endures is the lesson: the technical problems Rockset tried to solve — schemaless ingestion, automatic indexing, low operational overhead, true cloud-native architecture — are real problems that the rest of the category continues to work on. ClickHouse Cloud has absorbed many of the cloud-native ideas. The "schemaless" pitch has migrated into the broader story of semi-structured data support in modern OLAP engines.
Rockset sat downstream of streaming sources (Kafka, Kinesis, DynamoDB Streams, MongoDB change streams) and served queries to applications, dashboards, and operational systems via a SQL API. Its closest analogs were Druid, Pinot, and ClickHouse, with the differentiating features being its schemaless ingestion and full automatic indexing.
When Rockset existed as a commercial product, TextQL Ana could connect to it via standard SQL like any other backend. With Rockset's external product wound down, TextQL users who previously ran on Rockset have generally migrated to ClickHouse, Snowflake, or other analytical backends.