Delta Lake
Delta Lake is Databricks' open table format. Technically open source under the Linux Foundation, it remains tightly coupled to Databricks in practice — and is being squeezed by Iceberg's rise.
It does the same job as Apache Iceberg — ACID transactions, schema evolution, time travel, and the other warehouse-style guarantees — but on top of Parquet files. It is open source. It is governed by the Linux Foundation. And in practice, it is still a Databricks product in ways that matter.
The metaphor: Delta Lake is a filing system on top of files, written by the company that runs the biggest filing room in the world. Databricks built Delta to fix the specific pain points its customers hit on Spark. That is its strength (it was shaped by real production workloads) and its weakness (it is optimized for one vendor's runtime, and non-Databricks engines have always been second-class citizens).
Databricks started as the commercial company around Apache Spark in 2013. By 2017, Databricks customers were running huge Spark workloads against Parquet files on S3 and hitting the usual data lake problems: no ACID guarantees, partial writes after failures, bad behavior under concurrent writers, and no reliable way to do updates or merges.
Databricks built an internal solution called Delta Lake to address these issues. The core idea was simple and good: maintain a transaction log — an append-only sequence of JSON files in a _delta_log/ directory alongside the Parquet data — that records every change to the table. Each commit is an atomic write to that log. Readers reconstruct the current state of the table by replaying the log. Schema changes, updates, deletes, and time travel all fall out naturally.
Delta was open-sourced in April 2019 at the Spark + AI Summit, under an Apache 2.0 license. Databricks positioned it as the foundation of a new category it called the lakehouse — a term Databricks has promoted aggressively ever since. In October 2019, Databricks donated Delta Lake to the Linux Foundation to give it a veneer of neutral governance.
Delta Lake delivers all four table format properties, though the mechanism differs from Iceberg:
ACID transactions. Delta uses a transaction log (the _delta_log directory of JSON files, periodically compacted into Parquet checkpoints). Commits are ordered, atomic log appends. Conflicts are detected via optimistic concurrency control.
Schema evolution. Delta supports adding, dropping, renaming, and reordering columns. Column mapping (added in Delta 2.0) tracks columns by ID rather than name, bringing it closer to Iceberg's model. Older Delta tables without column mapping have weaker guarantees around renames and drops.
Time travel. Because the transaction log preserves history, you can query any previous version of a table by version number or timestamp. This is one of Delta's earliest and most-loved features.
Partition evolution. This is where Delta lags Iceberg. Delta supports partitioning, but changing the partitioning scheme is less clean than in Iceberg. Delta introduced liquid clustering in 2023 as an alternative to traditional partitioning — a more flexible, automatic data layout strategy — which is arguably better than either format's traditional partitioning model for many workloads, but is currently a Databricks-specific feature.
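The optimistic concurrency control behind Delta's ACID guarantees can be sketched as: a writer reads the latest version, prepares its commit, and attempts an atomic put-if-absent of version N+1; if another writer got there first, it re-reads and retries. The class and function names below are hypothetical, and a lock stands in for the conditional-write primitive that object storage or a commit service provides:

```python
import threading

class ToyDeltaLog:
    """Toy in-memory stand-in for a _delta_log directory."""

    def __init__(self) -> None:
        self._commits: dict[int, list[dict]] = {}
        self._lock = threading.Lock()  # models storage-level atomicity

    def latest_version(self) -> int:
        return max(self._commits, default=-1)

    def try_commit(self, version: int, actions: list[dict]) -> bool:
        """Atomically create commit `version`; fail if it already exists.
        This put-if-absent step is what makes each commit atomic."""
        with self._lock:
            if version in self._commits:
                return False  # another writer won the race
            self._commits[version] = actions
            return True

def write_with_retry(log: ToyDeltaLog, actions: list[dict],
                     retries: int = 5) -> int:
    """Optimistic concurrency: read latest version, try latest+1, retry
    on conflict. A real writer would also check that the intervening
    commit does not logically clash (e.g. both rewrote the same file)
    before retrying."""
    for _ in range(retries):
        target = log.latest_version() + 1
        if log.try_commit(target, actions):
            return target
    raise RuntimeError("too many concurrent writers")

log = ToyDeltaLog()
v1 = write_with_retry(log, [{"add": {"path": "a.parquet"}}])
v2 = write_with_retry(log, [{"add": {"path": "b.parquet"}}])
print(v1, v2)  # 0 1
```

The design choice to detect conflicts at commit time, rather than lock tables up front, is what lets many readers and writers share a table on dumb object storage with no coordination service in the read path.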
Here is the uncomfortable truth about Delta Lake that Databricks marketing will not tell you plainly: Delta Lake is open source, but the best Delta experience is inside Databricks, and the gap is intentional.
Features have repeatedly landed in Databricks' proprietary runtime before (or instead of) the open source Delta project — liquid clustering, noted above, is one example.
The Delta Standalone reader and the newer Delta Kernel project exist, and they are genuine open-source efforts to let non-Databricks engines read Delta tables. But third-party support has historically been patchy. Trino has good Delta support. Flink has serviceable support. DuckDB reads Delta but with caveats. Snowflake can read external Delta tables. None of these match the experience you get inside Databricks. That asymmetry is the entire point of Delta from a commercial standpoint.
Between 2022 and 2024, it became increasingly clear that Apache Iceberg was winning the table format war. Snowflake picked Iceberg. BigQuery supported Iceberg through BigLake. Trino, Flink, and Spark all shipped excellent Iceberg connectors. The Apache Iceberg REST catalog specification emerged as a universal standard. Tabular — Iceberg's commercial company — grew fast.
Databricks' response came in two moves.
Move one: UniForm. Announced at Data + AI Summit 2023, Delta Universal Format ("UniForm") lets a Delta table write its metadata in a way that is also readable as an Iceberg table. In practice, UniForm generates Iceberg-compatible metadata pointing at the same underlying Parquet files. The intent: a customer writes Delta, but external engines read it as Iceberg. This is a defensive compatibility layer. When a format ships a compatibility layer for the rival format, you know which one is winning.
Move two: the Tabular acquisition. In June 2024, Databricks acquired Tabular — the commercial company founded by Iceberg's original Netflix creators — for a reported $1–2 billion. Databricks pledged to invest in Iceberg, unify Delta and Iceberg metadata, and make the two formats converge. The explicit corporate message was "we're embracing Iceberg." The implicit market message was "we lost the standard war, so we bought the winner."
What this means for Delta going forward is ambiguous. Databricks insists Delta is not deprecated and will continue to receive investment. But the strategic center of gravity is visibly shifting. Any customer making a greenfield lakehouse decision in 2025 should seriously consider whether writing Delta (even via UniForm) is better than just writing native Iceberg. Many are concluding it isn't.
Delta sits at the table format layer, on top of Parquet files in object storage, under Spark, Databricks SQL warehouses, and — to a lesser degree — Trino, Flink, and other external engines. Its catalog story is most complete inside Unity Catalog, Databricks' governance layer, which is where Delta tables get their most powerful access controls, lineage, and sharing features.
Delta Lake is a genuinely good table format. For a pure Databricks shop with no intention of using another query engine, it works extremely well and the ecosystem integration inside Databricks is first class. The issue is not Delta's technical quality; it's its strategic position. The rest of the industry has picked Iceberg, and Databricks itself has tacitly conceded the point by shipping UniForm and buying Tabular. If you are building a new lakehouse in 2026, the safe architectural bet is Iceberg. Delta makes sense if Databricks is your chosen platform and you want to stay on the preferred path inside it.
TextQL Ana connects to Delta Lake tables through Databricks SQL warehouses, via Unity Catalog, or through Trino where Delta is exposed. Delta's transaction log gives Ana consistent, versioned table metadata, which — like Iceberg — is exactly the kind of structured grounding LLM-driven SQL needs to stay correct.