NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →

Dremio

Dremio is an open data lakehouse SQL engine built around Apache Iceberg, Apache Arrow, and a query acceleration feature called Reflections. It positions itself as the open challenger to Snowflake and Databricks for analytics on the lake.

Dremio is a SQL query engine and data platform built specifically for the data lakehouse. It reads Apache Iceberg tables sitting on object storage, executes queries with a vectorized Apache Arrow engine, and accelerates them with a feature called Reflections — materialized views the optimizer can use transparently. Its pitch is straightforward: warehouse-grade SQL performance without moving your data into a warehouse.

If Trino is the federated query engine that wants to read everything, Dremio is the focused lakehouse engine that wants to replace your warehouse. It is one of the most direct open-architecture challengers to Snowflake in the market.

Origin Story

Dremio was founded in 2015 by Tomer Shiran and Jacques Nadeau, both of whom came from MapR and were the original creators of Apache Drill — another SQL-on-Hadoop engine. Drill was technically interesting but never won decisive commercial traction, and Dremio is best understood as Shiran and Nadeau's second swing at the same problem with the lessons learned. They also co-created Apache Arrow — the in-memory columnar format that has since become the lingua franca for moving columnar data between systems — which gives you a sense of the technical pedigree.

The original Dremio pitch in 2017 was "self-service analytics on the data lake without ETL." A business analyst could point a BI tool at Dremio, browse virtual datasets, build queries against raw lake files, and Dremio's optimizer would figure out the rest. That pitch was early, and the market wasn't quite ready — most enterprises were still busy moving everything into Snowflake.

Dremio's second life started around 2021 when it pivoted hard onto Apache Iceberg. The company became one of the loudest voices in the Iceberg ecosystem, contributed significantly to the project, created Project Nessie — an open-source Iceberg catalog with Git-like branching that underpins its managed catalog — and rebranded as the "Open Data Lakehouse." This bet has aged well. As Iceberg became the dominant open table format, Dremio became one of the natural commercial homes for Iceberg-first analytics.

What's Distinctive About Dremio

Reflections. This is the technical idea Dremio is best known for. A Reflection is a materialized view — a precomputed slice or aggregation of your data, persisted as a Parquet/Iceberg dataset. When you query a base table, Dremio's optimizer transparently rewrites the query to read from a matching Reflection if one exists. The user doesn't know or care; their query just runs faster. This is essentially what BI extracts (Tableau extracts, Power BI imports) do, but managed and queried at the engine level. Reflections are Dremio's answer to "but lake queries are slow." You don't make the lake faster — you precompute the slow parts.
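The optimizer substitution described above can be sketched in a few lines. This is a toy illustration of the idea — match a query's shape against a catalog of precomputed materializations and scan the small one when it covers the query — not Dremio's actual planner, and every name in it is hypothetical.

```python
# Toy sketch of Reflection-style query rewriting: if a precomputed
# aggregate (a "reflection") covers the query's grouping and measure,
# the planner scans the small materialization instead of the base table.
# All names here are hypothetical, not Dremio's internal API.

REFLECTIONS = {
    # (table, group_by columns, measure) -> materialized dataset path
    ("sales", ("region",), "sum_amount"): "refl/sales_by_region.parquet",
}

def plan(table, group_by, measure):
    """Return the dataset the engine should scan for this query shape."""
    key = (table, tuple(group_by), measure)
    # Transparent substitution: the user still writes SELECT ... FROM sales.
    return REFLECTIONS.get(key, f"raw/{table}/")

print(plan("sales", ["region"], "sum_amount"))   # -> refl/sales_by_region.parquet
print(plan("sales", ["product"], "sum_amount"))  # -> raw/sales/ (no match, full scan)
```

The key property is that the rewrite is invisible to the user: the same SQL runs either way, and only the physical plan changes.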

Apache Arrow native. Dremio's execution engine works in Arrow buffers end-to-end. This means data doesn't get copied or reformatted as it moves through the query pipeline, which is genuinely faster than engines that round-trip through other formats. Arrow Flight, the high-speed wire protocol Dremio uses, also lets clients pull query results in parallel without the overhead of a row-by-row JDBC/ODBC driver.
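To see why the columnar layout matters, here is a minimal pure-Python contrast between row-oriented and column-oriented storage. This illustrates the layout principle only — Arrow's real advantage also comes from contiguous typed buffers and vectorized CPU instructions, which plain Python lists don't capture.

```python
# Why columnar (Arrow-style) layout helps analytics: an aggregation
# scans one contiguous column instead of touching every field of
# every row. Pure-Python illustration of the layout, not Arrow itself.

rows = [  # row-oriented: each record stored together
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

columns = {  # column-oriented: each field stored contiguously
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [10.0, 25.0, 5.0],
}

# SUM(amount): the columnar form reads only the buffer it needs.
row_sum = sum(r["amount"] for r in rows)  # walks whole records
col_sum = sum(columns["amount"])          # walks one column
assert row_sum == col_sum == 40.0
```

Arrow Flight extends the same idea to the wire: results are shipped as these columnar batches, in parallel streams, instead of being serialized row by row through a JDBC/ODBC driver.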

Iceberg-native everything. Dremio reads and writes Iceberg tables natively, including merge, update, delete, and time-travel queries. It bundles a Nessie-based catalog that gives Iceberg tables Git-like branching and tagging — so you can create a branch of your data, run an experiment, and merge it back. This is one of the features Iceberg backers point to as a reason the open lakehouse is more interesting than the closed warehouse.
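The branch-and-merge workflow is easier to grasp once you see that a data branch is cheap: it copies pointers to table snapshots, not the data itself. The sketch below is a conceptual miniature of that model, not the Nessie API.

```python
# Git-like data branching in miniature: branches are named pointers
# to table snapshots, so an experiment on a branch never disturbs
# readers on main. Conceptual sketch only, not Nessie's actual API.

catalog = {"main": {"orders": "snapshot-41"}}

def create_branch(name, source="main"):
    catalog[name] = dict(catalog[source])  # cheap: copies pointers, not data

def commit(branch, table, snapshot):
    catalog[branch][table] = snapshot

def merge(source, target="main"):
    catalog[target].update(catalog[source])

create_branch("experiment")
commit("experiment", "orders", "snapshot-42")      # rewrite on the branch
assert catalog["main"]["orders"] == "snapshot-41"  # main is untouched
merge("experiment")
assert catalog["main"]["orders"] == "snapshot-42"  # change lands atomically
```

This is why the feature is pitched as "Git for data": the experiment is isolated until the merge, and the merge itself is a pointer swap rather than a data copy.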

Semantic layer / virtual datasets. Dremio includes a built-in semantic layer where analysts curate "virtual datasets" — saved SQL views with documentation and lineage — that BI tools see as if they were tables. This is real semantic-layer functionality baked into the engine, which most query engines don't have.
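Concretely, a virtual dataset is saved SQL plus metadata that tools can introspect. A rough sketch of that shape — the field names here are illustrative, not Dremio's metadata schema:

```python
# Virtual datasets in miniature: a curated view is stored SQL plus
# metadata (description, lineage) that BI tools and other clients can
# introspect. Field names are illustrative, not Dremio's schema.

virtual_datasets = {
    "marts.revenue_by_region": {
        "sql": "SELECT region, SUM(amount) AS revenue "
               "FROM raw.sales GROUP BY region",
        "description": "Curated revenue rollup for BI dashboards",
        "lineage": ["raw.sales"],
    },
}

vds = virtual_datasets["marts.revenue_by_region"]
# A BI tool sees this as an ordinary table; the engine expands it
# to the saved SQL at query time.
print(vds["lineage"])  # -> ['raw.sales']
```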

Where Dremio Sits in the Market

Dremio's competitive position is interesting and a little uncomfortable. It is technically excellent, demonstrably fast on Iceberg workloads, and has a clear architectural story. But it sits in a market where the giants on either side are absorbing its features:

  • Snowflake added Iceberg table support and Polaris (their own Iceberg catalog) in 2024.
  • Databricks has Photon and Unity Catalog and is aggressively pursuing the same lakehouse pitch.
  • Starburst / Trino competes for the same "open SQL on the lake" buyer.

Dremio's bet is that none of those players will ever be truly open the way an Iceberg-first independent vendor can be — and that as Iceberg adoption grows, customers will want a compute layer they don't have to buy from the same company that owns their catalog or storage. That is a real bet. It is also a hard one, because Snowflake and Databricks have enormous distribution.

The honest take: Dremio is the most credible "we are not Snowflake or Databricks" alternative for organizations that want a lakehouse and want it open. It has good technology, a clear identity, and committed customers in regulated industries that prize portability. But it has not (yet) become the default choice the way Snowflake did for the cloud warehouse era. Whether that changes depends a lot on how the Iceberg-vs-everyone-else story plays out over the next two or three years.

What Dremio Is Not

It is not a federated query engine. Dremio can connect to a few external sources (relational databases, MongoDB, Elasticsearch) but federation is not its focus. If you want to join data from 20 different sources, Trino is the better tool. Dremio is optimized for "SQL on the lake," not "SQL on everything."

It is not a transactional database. Like Trino, Dremio is for analytical queries against analytical data. It is not where you put your application's writes.

It is not embedded. Dremio runs as a coordinator + executors cluster, similar to Trino. It is not a single-node tool like DuckDB.

TextQL Fit

TextQL connects to Dremio as a SQL endpoint. For organizations running a Dremio-on-Iceberg lakehouse, this is one of the cleanest natural-language-to-data integrations available — the semantic layer that Dremio exposes (virtual datasets, descriptions, lineage) gives Ana strong context to ground its SQL generation, and Reflections handle the latency problem that would otherwise make interactive lake queries painful.

See TextQL in action

Dremio
Founded: 2015
Headquarters: Santa Clara, CA
Founders: Tomer Shiran, Jacques Nadeau (Apache Drill / MapR alumni)
License: Apache 2.0 (Community); Commercial (Cloud / Enterprise)
Query language: ANSI SQL
Built on: Apache Arrow, Apache Iceberg, Apache Calcite
Category: Query Engines
Monthly mindshare: ~20K · ~1K customers; Iceberg-native lakehouse positioning