Delta Lake
Delta Lake is Databricks' open table format. Technically open source under the Linux Foundation, it remains tightly coupled to Databricks in practice — and is being squeezed by Iceberg's rise.
It does the same job as Apache Iceberg — ACID transactions, schema evolution, time travel, and the other warehouse-style guarantees — but on top of Parquet files. It is open source. It is governed by the Linux Foundation. And in practice, it is still a Databricks product in ways that matter.
The metaphor: Delta Lake is a filing system on top of files, written by the company that runs the biggest filing room in the world. Databricks built Delta to fix the specific pain points its customers hit on Spark. That is its strength (it was shaped by real production workloads) and its weakness (it is optimized for one vendor's runtime, and non-Databricks engines have always been second-class citizens).
Databricks started as the commercial company around Apache Spark in 2013. By 2017, Databricks customers were running huge Spark workloads against Parquet files on S3 and hitting the usual data lake problems: no ACID guarantees, partial writes after failures, bad behavior under concurrent writers, and no reliable way to do updates or merges.
Databricks built an internal solution called Delta Lake to address these issues. The core idea was simple and good: maintain a transaction log — an append-only sequence of JSON files in a _delta_log/ directory alongside the Parquet data — that records every change to the table. Each commit is an atomic write to that log. Readers reconstruct the current state of the table by replaying the log. Schema changes, updates, deletes, and time travel all fall out naturally.
Delta was open-sourced in April 2019 at the Spark + AI Summit, under an Apache 2.0 license. Databricks positioned it as the foundation of a new category it called the lakehouse — a term Databricks has promoted aggressively ever since. In October 2019, Databricks donated Delta Lake to the Linux Foundation to give it a veneer of neutral governance.
Delta Lake delivers all four table format properties, though the mechanism differs from Iceberg:
ACID transactions. Delta uses a transaction log (the _delta_log directory of JSON files, periodically compacted into Parquet checkpoints). Commits are ordered, atomic log appends. Conflicts are detected via optimistic concurrency control.
Schema evolution. Delta supports adding, dropping, renaming, and reordering columns. Column mapping (added in Delta 2.0) tracks columns by ID rather than name, bringing it closer to Iceberg's model. Older Delta tables without column mapping have weaker guarantees around renames and drops.
Time travel. Because the transaction log preserves history, you can query any previous version of a table by version number or timestamp. This is one of Delta's earliest and most-loved features.
Partition evolution. This is where Delta lags Iceberg. Delta supports partitioning, but changing the partitioning scheme is less clean than in Iceberg. Delta introduced liquid clustering in 2023 as an alternative to traditional partitioning — a more flexible, automatic data layout strategy — which is arguably better than either format's traditional partitioning model for many workloads, but is currently a Databricks-specific feature.
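The optimistic concurrency control behind Delta's ACID guarantees can be sketched as: a writer reads the latest version, prepares its commit, and attempts an atomic put-if-absent of version N+1; if another writer got there first, it re-reads and retries. The class and function names below are hypothetical, and a lock stands in for the conditional-write primitive that object storage or a commit service provides:

```python
import threading

class ToyDeltaLog:
    """Toy in-memory stand-in for a _delta_log directory."""

    def __init__(self) -> None:
        self._commits: dict[int, list[dict]] = {}
        self._lock = threading.Lock()  # models storage-level atomicity

    def latest_version(self) -> int:
        return max(self._commits, default=-1)

    def try_commit(self, version: int, actions: list[dict]) -> bool:
        """Atomically create commit `version`; fail if it already exists.
        This put-if-absent step is what makes each commit atomic."""
        with self._lock:
            if version in self._commits:
                return False  # another writer won the race
            self._commits[version] = actions
            return True

def write_with_retry(log: ToyDeltaLog, actions: list[dict],
                     retries: int = 5) -> int:
    """Optimistic concurrency: read latest version, try latest+1, retry
    on conflict. A real writer would also check that the intervening
    commit does not logically clash (e.g. both rewrote the same file)
    before retrying."""
    for _ in range(retries):
        target = log.latest_version() + 1
        if log.try_commit(target, actions):
            return target
    raise RuntimeError("too many concurrent writers")

log = ToyDeltaLog()
v1 = write_with_retry(log, [{"add": {"path": "a.parquet"}}])
v2 = write_with_retry(log, [{"add": {"path": "b.parquet"}}])
print(v1, v2)  # 0 1
```

The design choice to detect conflicts at commit time, rather than lock tables up front, is what lets many readers and writers share a table on dumb object storage with no coordination service in the read path.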
Here is the uncomfortable truth about Delta Lake that Databricks marketing will not tell you plainly: Delta Lake is open source, but the best Delta experience is inside Databricks, and the gap is intentional.
Features have repeatedly landed in Databricks' proprietary runtime before (or instead of) the open source Delta project — liquid clustering, noted above, is one example.
The Delta Standalone reader and the newer Delta Kernel project exist, and they are genuine open-source efforts to let non-Databricks engines read Delta tables. But third-party support has historically been patchy. Trino has good Delta support. Flink has serviceable support. DuckDB reads Delta but with caveats. Snowflake can read external Delta tables. None of these match the experience you get inside Databricks. That asymmetry is the entire point of Delta from a commercial standpoint.
Between 2022 and 2024, it became increasingly clear that Apache Iceberg was winning the table format war. Snowflake picked Iceberg. BigQuery supported Iceberg through BigLake. Trino, Flink, and Spark all shipped excellent Iceberg connectors. The Apache Iceberg REST catalog specification emerged as a universal standard. Tabular — Iceberg's commercial company — grew fast.
Databricks' response came in two moves.
Move one: UniForm. Announced at Data + AI Summit 2023, Delta Universal Format ("UniForm") lets a Delta table write its metadata in a way that is also readable as an Iceberg table. In practice, UniForm generates Iceberg-compatible metadata pointing at the same underlying Parquet files. The intent: a customer writes Delta, but external engines read it as Iceberg. This is a defensive compatibility layer. When a format ships a compatibility layer for the rival format, you know which one is winning.
Move two: the Tabular acquisition. In June 2024, Databricks acquired Tabular — the commercial company founded by Iceberg's original Netflix creators — for a reported $1–2 billion. Databricks pledged to invest in Iceberg, unify Delta and Iceberg metadata, and make the two formats converge. The explicit corporate message was "we're embracing Iceberg." The implicit market message was "we lost the standard war, so we bought the winner."
What this means for Delta going forward is ambiguous. Databricks insists Delta is not deprecated and will continue to receive investment. But the strategic center of gravity is visibly shifting. Any customer making a greenfield lakehouse decision in 2025 should seriously consider whether writing Delta (even via UniForm) is better than just writing native Iceberg. Many are concluding it isn't.
Delta sits at the table format layer, on top of Parquet files in object storage, under Spark, Databricks SQL warehouses, and — to a lesser degree — Trino, Flink, and other external engines. Its catalog story is most complete inside Unity Catalog, Databricks' governance layer, which is where Delta tables get their most powerful access controls, lineage, and sharing features.
Delta Lake is a genuinely good table format. For a pure Databricks shop with no intention of using another query engine, it works extremely well and the ecosystem integration inside Databricks is first class. The issue is not Delta's technical quality; it's its strategic position. The rest of the industry has picked Iceberg, and Databricks itself has tacitly conceded the point by shipping UniForm and buying Tabular. If you are building a new lakehouse in 2026, the safe architectural bet is Iceberg. Delta makes sense if Databricks is your chosen platform and you want to stay on the preferred path inside it.
TextQL Ana connects to Delta Lake tables through Databricks SQL warehouses, via Unity Catalog, or through Trino where Delta is exposed. Delta's transaction log gives Ana consistent, versioned table metadata, which — like Iceberg — is exactly the kind of structured grounding LLM-driven SQL needs to stay correct.