NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →


Databricks Lakehouse Platform

The Databricks Lakehouse Platform is the actual product Databricks sells: a unified analytics, ML, and SQL platform built on Delta Lake and cloud object storage. It is the concept of "lakehouse" packaged as a commercial product.

The Databricks Lakehouse Platform is what Databricks actually sells. The word "lakehouse" gets thrown around as if it were a generic architecture, but in 2026 it is overwhelmingly associated with one specific company and one specific product. The lakehouse, as a category, is the concept that Databricks coined, and the Databricks Lakehouse Platform is the concept packaged as a commercial product. Other vendors have lakehouse-shaped offerings — Snowflake on Iceberg, BigQuery + BigLake, Dremio, Microsoft Fabric — but the canonical implementation is Databricks's.

This page is about the product, not the architectural concept. For the concept of a lakehouse in general, see the Data Lakehouse overview.

What It Actually Is

The Databricks Lakehouse Platform is a unified data and AI platform that runs in your cloud account (AWS, Azure, or GCP) and combines several distinct capabilities into a single workspace:

  • Storage: data lives as Delta Lake tables — which are Parquet files plus a transaction log — on the underlying cloud object store (S3, ADLS Gen2, or GCS). The cloud object store, not Databricks, holds the data.
  • Compute: Spark clusters and Photon (Databricks's vectorized C++ query engine) run elastically against the Delta tables. You spin clusters up, run a workload, spin them down.
  • Databricks SQL: a SQL warehouse experience that runs Photon against Delta tables and feels (intentionally) like a cloud data warehouse for BI users.
  • Notebooks and ML: collaborative notebooks (Python, SQL, Scala, R), MLflow for model tracking, the Mosaic AI / Foundation Model APIs, and an end-to-end ML lifecycle.
  • Unity Catalog: a unified governance layer that handles permissions, lineage, audit, and discovery across every workload above. Unity Catalog is the glue.
  • Delta Live Tables: a declarative pipeline framework for data engineering.
  • Workflows: an orchestration product for scheduling jobs across the platform.
  • Mosaic AI / Generative AI: model training, fine-tuning, vector search, and serving for AI workloads, integrated with the same Unity Catalog governance.
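The storage bullet above is the crux of the whole architecture: a Delta table is just a directory of Parquet data files plus an ordered log of JSON commits under `_delta_log/`, and the current table state is whatever the replayed log says it is. Here is a toy illustration of that idea in plain Python — not the real Delta reader; file names and action shapes are simplified (real commits also carry protocol, metaData, and commitInfo actions, partition values, stats, and periodic checkpoints):

```python
import json
import pathlib
import tempfile

def live_files(table_dir: pathlib.Path) -> set[str]:
    """Replay a Delta-style log: 'add' registers a data file, 'remove' tombstones one."""
    files: set[str] = set()
    log_dir = table_dir / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):  # commits replay in version order
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

# Build a throwaway table with two commits: an append, then a rewrite
# that replaces part-0 (as a DELETE or OPTIMIZE would).
tmp = pathlib.Path(tempfile.mkdtemp())
log = tmp / "_delta_log"
log.mkdir()
(log / "00000000000000000000.json").write_text(
    json.dumps({"add": {"path": "part-0.parquet"}}) + "\n"
    + json.dumps({"add": {"path": "part-1.parquet"}}) + "\n"
)
(log / "00000000000000000001.json").write_text(
    json.dumps({"remove": {"path": "part-0.parquet"}}) + "\n"
    + json.dumps({"add": {"path": "part-2.parquet"}}) + "\n"
)

print(sorted(live_files(tmp)))  # → ['part-1.parquet', 'part-2.parquet']
```

Because readers only trust files the log names, concurrent writers can commit atomically by racing to create the next numbered log file — that single trick is what lets plain object storage behave like a transactional table.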

The pitch is that all of this is a single platform with a single security model, a single catalog, a single billing relationship, and a single user experience — in contrast to the alternative, which is to assemble Snowflake + a separate ML platform + a separate notebook tool + a separate catalog + a separate orchestration tool from five different vendors.

The Origin Story

Databricks was founded in 2013 by the original creators of Apache Spark (Matei Zaharia, Ali Ghodsi, Reynold Xin, Patrick Wendell, Andy Konwinski, Ion Stoica, and Scott Shenker) to commercialize Spark as a managed service. For its first several years, Databricks was, essentially, "managed Spark on AWS" — the company you paid if you wanted Spark without operating clusters yourself.

The pivot from "managed Spark" to "lakehouse platform" happened in stages:

  • 2018: Delta Lake announced. This was the foundational piece — Parquet files with an ACID transaction log — that made it possible to treat lake-resident data as if it were warehouse-resident data. Without Delta Lake (or its later cousins Iceberg and Hudi), the lakehouse architecture is not possible.
  • 2020: The seminal paper "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" by Armbrust, Ghodsi, Xin, and Zaharia, presented at CIDR 2021. This paper coined "lakehouse" as a term and laid out the architectural argument. From this point on, Databricks was no longer "the Spark company" — it was "the lakehouse company."
  • 2020-2022: The Databricks SQL product launched, giving Databricks a credible warehouse-style SQL surface. Photon was released to make it fast. Unity Catalog launched to give the platform a real governance story.
  • 2023-2025: Databricks acquired MosaicML for $1.3B and pivoted hard into generative AI, integrating model training and serving into the same lakehouse platform. The company filed for IPO and reached the highest private valuation of any data infrastructure company in history.

The Opinionated Take: The Lakehouse Concept Is Just the Databricks Sales Motion

Here is the part the marketing won't tell you plainly: "lakehouse" started as an architectural idea, but it has become, in practice, the marketing wedge Databricks uses to compete with Snowflake. The technical claim of the lakehouse paper — that you can have warehouse-style SQL performance and governance on top of open table formats and object storage — is true and it is important. But the term "lakehouse" itself is, in 2026, almost synonymous with "the way Databricks wants you to think about your data architecture, in contrast to the way Snowflake wants you to think about it."

This is fine. Snowflake also coined a category ("Data Cloud") to describe their preferred architecture. Vendors invent terms; that's how marketing works. The honest framing is: the Databricks Lakehouse Platform is one vendor's opinionated bundle of storage format (Delta Lake), query engine (Photon), governance (Unity Catalog), notebooks, ML, and AI, all sold as one product. It is the most complete instantiation of the lakehouse concept, and it is the reference implementation against which other lakehouse offerings (Iceberg + Trino, Snowflake on Iceberg, Microsoft Fabric, Dremio) are measured.

The strongest version of the Databricks pitch is real: if you have both SQL analytics and serious ML/AI workloads, the integration story matters. Putting Snowflake next to a separate ML platform with separate governance is genuinely harder than running everything in Databricks. The weakest version of the pitch is that "lakehouse" is a meaningfully distinct architecture from "warehouse on open table formats" — in 2026, those two phrases describe approximately the same thing, and the convergence is the actual story.

How It's Different from "Just Databricks"

This is a frequent confusion. "Databricks" the company sells the "Databricks Lakehouse Platform" as its core product. They are the same thing in casual conversation, but the platform brand exists to emphasize that Databricks is no longer a Spark service — it is a complete data and AI platform in which Spark is one component among many. Databricks SQL, Mosaic AI, Unity Catalog, Delta Live Tables, and Workflows are all products inside the platform, not separate things you buy.

Where Databricks Lakehouse Fits in the Stack

The Databricks Lakehouse Platform spans multiple layers of what other vendors would call separate categories:

  • Storage: Delta Lake on cloud object storage.
  • Compute: Spark + Photon (and now SQL warehouses).
  • Catalog: Unity Catalog.
  • Notebooks / IDE: Databricks Notebooks.
  • Orchestration: Workflows / Jobs.
  • ML: MLflow, Mosaic AI, model serving.
  • BI: Databricks SQL + AI/BI Genie + the dashboarding product (and partnerships with Tableau, Power BI, Looker, Hex).

This breadth is the value proposition. It is also the reason customer evaluations of Databricks are complicated — you are not buying a single tool, you are buying a bundle, and the relevant comparison depends on which workload you care about.

How TextQL Works with Databricks Lakehouse

TextQL Ana connects to Databricks SQL as a query backend and treats Delta tables (governed by Unity Catalog) as first-class queryable assets. Business users can ask Ana plain-English questions about Databricks-resident data; Ana generates the SQL, runs it through Databricks SQL, and returns the answer. Ana also uses Unity Catalog metadata — table descriptions, column comments, lineage — to ground its understanding of what each table means. Ana sits above the Databricks Lakehouse Platform the same way it sits above Snowflake or BigQuery, and it complements Databricks's own AI/BI Genie product (which is scoped to Databricks-resident data) by spanning the customer's full multi-vendor stack.
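One way to picture the grounding step described above (hypothetical names throughout — the function and sample table here are illustrative, not Ana's actual internals): pull table and column comments from the catalog and render them into the context the SQL-generating model sees, so "revenue" can be mapped to the right column.

```python
# Hypothetical sketch of metadata grounding for NL-to-SQL. Unity Catalog
# exposes table and column comments (e.g. via its information schema); a
# system like Ana can inline them into the model's prompt so the model
# knows what each table and column actually means.

def build_context(tables: dict) -> str:
    """Render catalog metadata as a schema-context fragment for the model."""
    lines = []
    for name, meta in tables.items():
        lines.append(f"Table {name}: {meta['comment']}")
        for col, desc in meta["columns"].items():
            lines.append(f"  - {col}: {desc}")
    return "\n".join(lines)

# Illustrative metadata, shaped like what a catalog lookup might return.
catalog = {
    "sales.orders": {
        "comment": "One row per customer order, landed nightly from the OMS.",
        "columns": {
            "order_id": "Primary key.",
            "amount_usd": "Order total in USD, net of refunds.",
            "placed_at": "Order timestamp (UTC).",
        },
    }
}

prompt = (
    "Schema context:\n"
    + build_context(catalog)
    + "\n\nQuestion: What was total order revenue last month?"
)
print(prompt)
```

The design point is that the comments do double duty: the same descriptions that make Unity Catalog useful for human discovery are what make machine-generated SQL trustworthy.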

See TextQL in action

Databricks Lakehouse Platform
  • Vendor: Databricks
  • Concept introduced: 2020 (Databricks Lakehouse paper)
  • HQ: San Francisco, CA
  • Category: Data Lakehouse
  • Built on: Delta Lake, Spark, Photon, Unity Catalog
  • Underlying storage: S3, ADLS Gen2, GCS
  • Workloads: SQL analytics, data engineering, ML, streaming, generative AI
  • Monthly mindshare: ~200K