Stack Overview | Data Ecosystem Wiki


# Stack Overview

An opinionated map of the modern data stack as TextQL sees it, layer by layer, from storage at the bottom to AI analysts at the top.

The modern data stack is the name for the loose collection of cloud-native, best-of-breed tools that have, over the last decade, replaced the monolithic on-premise data warehouse as the default architecture for analytics at most companies. It is not a single product. It is a set of layers, each owned by a different category of vendors, that collectively turn raw operational data into insight.

This wiki is organized around those layers. This page is the map.

## The Layers, Bottom to Top

The data stack is best read from the bottom up: data flows from the bottom (where it's stored cheaply and durably) toward the top (where humans actually use it to make decisions). Every tool in this wiki sits at one of these layers.

### 1. Storage (the substrate)

At the bottom of everything is cloud object storage: Amazon S3, Google Cloud Storage, or Azure Blob Storage. These are giant, cheap, durable key-value stores where bytes live: each object is a blob of bytes addressed by a key. They replaced the HDFS clusters of the previous generation and became the foundation that everything else is built on.
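
As a rough illustration of that model, here is a toy in-memory stand-in for an object store. The `ToyObjectStore` class and its method names are hypothetical and do not mirror any cloud provider's SDK; the sketch only shows that objects are whole blobs addressed by a key, and that "folders" are nothing more than shared key prefixes.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: a flat map from key to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Writing replaces the whole object stored under this key.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # There is no real directory tree: listing just filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("lake/orders/part-000.parquet", b"...orders bytes...")
store.put("lake/orders/part-001.parquet", b"...more orders bytes...")
store.put("lake/users/part-000.parquet", b"...users bytes...")

order_files = store.list_keys(prefix="lake/orders/")
```

Everything above this layer (table formats, query engines) has to build table semantics out of these few primitive operations.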

See: Storage / Data Lake

### 2. Table Formats (giving the storage shape)

Raw object storage holds files. To turn those files into something queryable like a database table, you need a table format: Apache Iceberg, Delta Lake, or Apache Hudi, each of which layers metadata over data files that are most commonly Parquet. Table formats add a transaction log (or equivalent snapshot metadata), schema enforcement, time travel, and the ACID guarantees that turn a folder of files into a real table.
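
To make the log idea concrete, here is a toy sketch in Python. It is not how Iceberg, Delta Lake, or Hudi are implemented; `ToyTable` and its methods are hypothetical names used only for illustration. It shows how an append-only list of commits over immutable data files is enough to define "the current table" and to travel back to an earlier version.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table format: an append-only log of commits over immutable data files."""

    # Each entry is (files_added, files_removed); the list index is the version.
    log: list = field(default_factory=list)

    def commit(self, add=(), remove=()) -> int:
        """Record one change as a single log entry and return the new version."""
        self.log.append((frozenset(add), frozenset(remove)))
        return len(self.log) - 1

    def snapshot(self, version: int | None = None) -> set:
        """Replay the log up to `version` to find the files that make up the table."""
        if version is None:
            version = len(self.log) - 1
        files = set()
        for added, removed in self.log[: version + 1]:
            files -= removed
            files |= added
        return files


table = ToyTable()
v0 = table.commit(add=["part-000.parquet"])
v1 = table.commit(add=["part-001.parquet"])
# A compaction swaps two small files for one larger file in a single commit.
v2 = table.commit(
    add=["part-002.parquet"],
    remove=["part-000.parquet", "part-001.parquet"],
)

current_files = table.snapshot()          # files in the latest version
files_at_v1 = table.snapshot(version=v1)  # "time travel" to an earlier version
```

Because readers only ever see the files listed by a committed version, a half-finished write is invisible to them, which is the intuition behind the atomicity guarantee.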

See: Table Formats

### 3. Compute / Warehouses / Query Engines

This is where queries actually run. There are two flavors:

- Cloud data warehouses (Snowflake, BigQuery, Redshift, and Databricks SQL) own both the storage and the compute as a managed service.
- Open query engines (Trino/Starburst, Athena, DuckDB, Dremio) run against data stored in open table formats on object storage you own.

The lakehouse (Databricks Lakehouse Platform, Snowflake on Iceberg, Microsoft Fabric) is the architectural pattern where these two flavors converge: warehouse-style SQL on lake-style storage.

See: Data Warehouses | Query Engines | Data Lakehouse

### 4. Ingestion (ETL/ELT)

Getting data into the warehouse. Fivetran, Airbyte, and Stitch handle SaaS sources. Custom pipelines and event tracking handle product data.

See: ETL / ELT | Event Tracking / CDP

### 5. Transformation

Modeling raw data into clean, business-ready tables inside the warehouse. dbt is the dominant tool. SQLMesh is the modern challenger. This is the "analytics engineering" layer that defines clean tables like `users`, `orders`, and `revenue_by_day`.
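
A minimal sketch of what such a model looks like, using Python's built-in `sqlite3` as a stand-in for the warehouse. The `raw_orders` table, its columns, and the sample rows are assumptions made up for this example; in a tool like dbt, the `SELECT` would typically live in its own model file rather than in a script.

```python
import sqlite3

# SQLite stands in for the warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (
        order_id INTEGER,
        ordered_at TEXT,
        amount_cents INTEGER,
        status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, '2024-01-01T09:15:00', 1200, 'completed'),
        (2, '2024-01-01T17:40:00',  800, 'completed'),
        (3, '2024-01-02T11:05:00', 5000, 'refunded'),
        (4, '2024-01-02T13:30:00', 2500, 'completed');

    -- The "model": a clean, business-ready table derived from raw data.
    CREATE TABLE revenue_by_day AS
    SELECT date(ordered_at) AS day,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY date(ordered_at);
    """
)

rows = conn.execute(
    "SELECT day, revenue FROM revenue_by_day ORDER BY day"
).fetchall()
# rows == [('2024-01-01', 20.0), ('2024-01-02', 25.0)]
```

The refunded order is excluded by the model itself, so every downstream consumer of `revenue_by_day` inherits that business rule.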

See: ETL / ELT

### 6. Orchestration

Scheduling and dependency management for everything above. Airflow, Dagster, and Prefect are the major players.
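
The core of dependency management can be sketched with Python's standard-library `graphlib`. The task names below are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top; this only shows how a dependency graph is turned into a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_users": set(),
    "build_revenue_by_day": {"ingest_orders"},
    "build_users": {"ingest_users"},
    "refresh_dashboard": {"build_revenue_by_day", "build_users"},
}


def run_order(deps):
    """Return one execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

Several orders are valid here (the two ingestion tasks are independent), but `refresh_dashboard` always comes last. A cycle in the graph makes `static_order()` raise `graphlib.CycleError`.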

See: Orchestration

### 7. Catalog / Governance

The "where is everything and who owns it" layer. Unity Catalog, Atlan, Collibra, and Alation handle discovery, lineage, and access control.

See: Data Catalogs

### 8. Semantic Layer / Metrics

Where business definitions live. Cube, dbt Semantic Layer, LookML, and others define what "revenue" or "active user" means once, so every downstream tool agrees.
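
The "define once" idea can be sketched as a small metric registry that compiles to SQL. This is not the syntax of Cube, the dbt Semantic Layer, or LookML; the `Metric` class, the `compile_query` function, and the `orders` table are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One business definition: what to aggregate, from where, under which filters."""

    name: str
    table: str
    expression: str
    filters: tuple = ()


# The single place where "revenue" is defined.
METRICS = {
    "revenue": Metric(
        name="revenue",
        table="orders",
        expression="SUM(amount)",
        filters=("status = 'completed'",),
    ),
}


def compile_query(metric_name: str, group_by: str) -> str:
    """Build the SQL for a metric grouped by one dimension."""
    m = METRICS[metric_name]
    where = " WHERE " + " AND ".join(m.filters) if m.filters else ""
    return (
        f"SELECT {group_by}, {m.expression} AS {m.name} "
        f"FROM {m.table}{where} GROUP BY {group_by}"
    )


by_region = compile_query("revenue", group_by="region")
by_month = compile_query("revenue", group_by="order_month")
```

Both queries share the same filter and aggregate, so a dashboard grouped by region and a notebook grouped by month cannot silently disagree about what counts as revenue.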

See: Semantic Layer

### 9. BI / Dashboards

The consumption layer for executives and business users. Looker, Tableau, Power BI, and Sigma ship governed dashboards. The newer category of data workspaces (Hex, Mode, Deepnote) is where analysts author the analyses behind those dashboards.

See: Dashboards & BI | Data Workspaces

### 10. Activation / Reverse ETL

Pushing modeled warehouse data back out to operational tools where business teams actually work. Hightouch and Census own this layer.
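
One common way to do this is to diff the modeled table against what was sent last time and push only the changes. The sketch below assumes that approach and is not a description of any vendor's implementation; `plan_sync` and the sample records are hypothetical.

```python
def plan_sync(warehouse_rows, last_synced):
    """Compare modeled rows with the previously synced state, keyed by record id.

    Returns (upserts, deletes): records to create or update in the destination,
    and ids to remove from it.
    """
    upserts = {
        key: row
        for key, row in warehouse_rows.items()
        if last_synced.get(key) != row
    }
    deletes = sorted(key for key in last_synced if key not in warehouse_rows)
    return upserts, deletes


# Hypothetical modeled table, keyed by customer id.
warehouse_rows = {
    "c1": {"email": "ana@example.com", "lifetime_value": 120},
    "c2": {"email": "bo@example.com", "lifetime_value": 75},
    "c3": {"email": "cy@example.com", "lifetime_value": 10},
}
# What the previous sync run sent to the destination (for example a CRM).
last_synced = {
    "c1": {"email": "ana@example.com", "lifetime_value": 120},  # unchanged
    "c2": {"email": "bo@example.com", "lifetime_value": 50},    # value changed
    "c4": {"email": "di@example.com", "lifetime_value": 5},     # gone from the model
}

upserts, deletes = plan_sync(warehouse_rows, last_synced)
```

Here `c2` (changed) and `c3` (new) are upserted, `c4` is deleted, and the unchanged `c1` generates no API call at all, which matters when the destination rate-limits writes.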

See: Reverse ETL

### 11. AI Analyst / Natural Language (the new top layer)

The newest layer of the stack, emerging since roughly 2023: AI analysts that sit on top of everything else and let business users ask questions in plain English. TextQL Ana is the canonical example of a vendor-neutral AI analyst that spans the whole stack. Vendor-specific versions exist too (Snowflake Cortex Analyst, Databricks AI/BI Genie, Hex Magic), but each is scoped to its own platform.

See: TextQL in the Stack

## How to Think About All This

The simple metaphor: the data stack is a lasagna. Each layer has its own job, its own vendors, and its own tradeoffs. The layers are mostly independent — you can swap out your BI tool without touching your warehouse, swap your warehouse without touching your storage, swap your reverse ETL without touching your transformation. That decoupling is the entire reason "best-of-breed" works as a strategy.

The opinionated TextQL view of where the puck is going:

- The warehouse is the center. Everything that matters (cleaned data, governed metrics, the canonical customer record) lives in the warehouse. Every adjacent category (BI, reverse ETL, semantic layer, catalog, AI analyst) is some way of getting data in or out of it.
- Open table formats are winning the storage war. Iceberg and Delta are eating proprietary warehouse formats. Storage will be open even if compute stays commercial.
- The category boundaries are blurring. Snowflake and Databricks both want to be everything. Hightouch is becoming a CDP. Hex is becoming a BI tool. dbt is becoming a semantic layer. The neat layer cake is getting messier, not cleaner.
- The top of the stack is being rewritten by AI. The "an analyst writes SQL, then a stakeholder reads a dashboard" workflow is being replaced by "a stakeholder asks a question, an AI answers it." TextQL is built around this thesis.

## How to Use This Wiki

- Start with the overview pages for each layer if you want the big picture.
- Click into individual vendor pages for opinionated takes and origin stories.
- Use the How to Read This Wiki page for navigation tips.
- The TextQL in the Stack page explains how TextQL relates to every other layer.


Stack Overview

- Term coined: ~2019, popularized in the Fivetran/dbt/Looker era
- Foundation: Cloud object storage + cloud data warehouses
- Defining principle: Best-of-breed tools at every layer, glued together by the warehouse
- Top of stack (2026): AI analysts (TextQL Ana, Hex Magic, Snowflake Cortex, Databricks Genie)