dbt | Data Ecosystem Wiki

dbt

dbt (data build tool) is the de facto SQL-based transformation layer for the modern data stack. First open-sourced in 2016 by Tristan Handy, Drew Banin, and Connor McArthur, who had worked together at RJMetrics, dbt is widely credited with popularizing analytics engineering as a discipline.

dbt (always lowercase) is the standard way to transform data inside a cloud data warehouse using SQL. If your team has a Snowflake, BigQuery, Redshift, or Databricks account in 2026, the odds that dbt is somewhere in your stack are roughly the same as the odds that Git is in your engineering org. It is, for practical purposes, the default.

Plain English: dbt lets you write SQL SELECT statements that describe the tables you want to exist, and it figures out how to create them. You write a file called customers.sql that contains a query joining raw orders and raw users. dbt wraps that query in CREATE TABLE customers AS ..., runs it against your warehouse, and now there's a customers table. Multiply that by hundreds of files, add automatic dependency resolution, testing, documentation, and Git integration, and you have what the industry calls analytics engineering.
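As a rough illustration of that wrapping step, the statement dbt sends to the warehouse for customers.sql might look something like the sketch below. The schema and table names (analytics, raw.users, raw.orders) are placeholders invented for this example, and the exact DDL depends on the warehouse adapter and on how the model is materialized.

```sql
-- Hypothetical statement generated for customers.sql.
-- Schema and table names are illustrative; real DDL varies by adapter.
CREATE TABLE analytics.customers AS (
  SELECT
    u.user_id,
    u.email,
    COUNT(o.order_id) AS lifetime_orders
  FROM raw.users AS u
  LEFT JOIN raw.orders AS o
    ON o.user_id = u.user_id
  GROUP BY u.user_id, u.email
);
```

The part you write and review is only the SELECT; the surrounding CREATE TABLE is generated.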

Origin Story: From Consulting Pain to a Movement

dbt was born inside RJMetrics, a Philadelphia-based BI startup that was acquired by Magento in 2016. After the acquisition, three of the engineers — Tristan Handy, Drew Banin, and Connor McArthur — spun out a consultancy called Fishtown Analytics (named after the Philadelphia neighborhood). Their job was to help startups set up Redshift and answer analytics questions. They kept running into the same problem at every client: there was no good way to organize SQL transformations.

The available options in 2016 were grim. You could write raw SQL into Airflow tasks (brittle, no testing, hard to read). You could buy Informatica or Matillion (heavyweight, GUI-driven, expensive). You could write stored procedures in your warehouse (proprietary, untestable, version-control-hostile). None of these matched the way the consultants actually wanted to work, which was: write SQL in a text editor, commit it to Git, run it from the command line, get errors when things break.

So they built it themselves. The first version of dbt was a command-line tool that did one thing: take a folder of .sql files, build a dependency graph from the ref() calls that point at other models, and execute them in topological order against the warehouse. They open-sourced it in 2016. Within two years, it was the most-discussed project in the data community Slack, which eventually became dbt Labs' own Slack, a community of over 100,000 data practitioners.

In 2021, Fishtown Analytics rebranded to dbt Labs and raised a $150M Series C at a $1.5B valuation. Its valuation then jumped to $4.2B with a Series D in February 2022, at the peak of the data tooling boom. Tristan Handy's blog, The Analytics Engineering Roundup, became required reading for an entire profession.

What dbt Actually Does

A dbt project is a folder of SQL and YAML files. The core unit is the model, which is a .sql file containing one SELECT statement. Here's a tiny example:

-- models/marts/customers.sql
SELECT
  u.user_id,
  u.email,
  COUNT(o.order_id) AS lifetime_orders,
  SUM(o.total) AS lifetime_revenue
FROM {{ ref('stg_users') }} u
LEFT JOIN {{ ref('stg_orders') }} o USING (user_id)
GROUP BY 1, 2

The {{ ref('stg_users') }} expression is a Jinja function call that tells dbt: "this model depends on stg_users." dbt parses every file, builds a directed acyclic graph (DAG) of dependencies, and runs the models in dependency order. When dbt run finishes, you have a customers relation in your warehouse: a view by default, or a table if the model is configured that way.
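To make the dependency chain concrete, here is a sketch of what the upstream stg_orders model could look like. The source name shop, its orders table, and the raw column names are assumptions made for illustration; source() also presumes a matching source declaration in one of the project's YAML files.

```sql
-- models/staging/stg_orders.sql (hypothetical upstream model)
-- source() points at a raw table declared in a sources YAML file;
-- the names 'shop' and 'orders' and the raw columns are assumed for this sketch.
SELECT
  id AS order_id,
  user_id,
  amount AS total,
  created_at AS ordered_at
FROM {{ source('shop', 'orders') }}
```

Because customers.sql calls ref('stg_orders'), dbt would build this model first and customers after it. Nobody has to declare that ordering by hand.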

Around that core idea, dbt adds:

Materializations. You can declare a model as a view, a table, an incremental model (process only new or changed rows), or an ephemeral model (inlined into downstream queries as a CTE). dbt handles all the CREATE TABLE / MERGE / INSERT boilerplate.
Tests. YAML-defined assertions like "this column is unique," "this column is never null," "this column references that one." When models and tests run together with dbt build, a failing test stops downstream models from building.
Documentation. dbt auto-generates a website showing every model, its columns, its lineage, and its descriptions.
Sources. Declared external tables (the raw data Fivetran loaded) that dbt can test, document, and reference.
Packages. A package manager (similar to npm) for sharing dbt code — the most famous being dbt-utils, the unofficial standard library.
Macros. Jinja-based reusable SQL snippets, which is how dbt-utils, audit logging, and dynamic schema patterns are built.
Snapshots. A built-in slowly-changing-dimension (SCD Type 2) implementation.
Semantic Layer / MetricFlow. Acquired from Transform in 2023, this is dbt's answer to the metric definition problem: define a metric once in YAML, query it from any BI tool.
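Of the features above, incremental materialization is the one that most changes what a model file looks like. A minimal sketch, assuming a stg_orders model with an ordered_at column (the model name fct_orders and the column names are hypothetical):

```sql
-- models/marts/fct_orders.sql (hypothetical incremental model)
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
  order_id,
  user_id,
  total,
  ordered_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
  {# Only applied when the target table already exists and this is not a full refresh. #}
  WHERE ordered_at > (SELECT MAX(ordered_at) FROM {{ this }})
{% endif %}
```

On the first run the filter is skipped and the whole table is built. On later runs only newer rows are selected, and unique_key lets dbt update existing orders instead of duplicating them. The exact merge strategy depends on the adapter.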

Why dbt Won

There are technically more powerful tools. SQLMesh has better incrementality and column-level lineage. Coalesce (the transformation product, not the dbt conference of the same name) has a slick GUI. Snowflake Dynamic Tables and Databricks DLT (Delta Live Tables) bring transformation native to the warehouse. But dbt won, and continues to win, for reasons that have very little to do with features.

1. It treats SQL as code. Before dbt, SQL was something you wrote in a query editor and pasted into a stored procedure. dbt insisted SQL belonged in a Git repo, with pull requests, code review, and CI. This was a cultural shift, not a technical one. It turned data work into software engineering.

2. It made a community. dbt's Slack workspace is one of the most active practitioner communities in tech. Coalesce, the annual conference, draws thousands. The dbt Labs team has been unusually generous with their attention — Tristan answers questions personally, in public. The community itself produces packages, blog posts, and patterns that compound the product's value.

3. It named a profession. The job title "analytics engineer" was all but unheard of before dbt. Now it comes with its own salary band, hiring funnel, and conference circuit. Once you give a group of people an identity, you can sell them tools forever. This was the smartest thing dbt Labs ever did.

4. It runs on every warehouse. dbt has adapters for Snowflake, BigQuery, Redshift, Databricks, Postgres, DuckDB, Trino, and a long tail of others. Your warehouse choice does not affect your dbt choice.
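Portability is not entirely free, since SQL dialects differ. One common way to handle this is to branch on the active adapter inside the model, as in the sketch below. The model name and the ordered_at column (assumed to be a DATE) are hypothetical; target.type is the adapter name dbt exposes to Jinja.

```sql
-- models/marts/orders_by_week.sql (hypothetical)
-- target.type holds the active adapter name, e.g. 'bigquery' or 'snowflake'.
SELECT
  {% if target.type == 'bigquery' %}
  DATE_TRUNC(ordered_at, WEEK) AS order_week,
  {% else %}
  DATE_TRUNC('week', ordered_at) AS order_week,
  {% endif %}
  COUNT(*) AS orders
FROM {{ ref('stg_orders') }}
GROUP BY 1
```

In practice, packages and dbt's built-in cross-database macros usually hide branches like this, so most model files stay warehouse-neutral.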

The Honest Take

dbt-core (the open-source version) is excellent and free. dbt Cloud (the commercial offering: hosted IDE, scheduler, CI, semantic layer, and now governance via dbt Mesh) is how dbt Labs makes money, and its pricing has crept toward enterprise territory. Many sophisticated teams run dbt-core on Airflow, Dagster, or GitHub Actions and skip dbt Cloud entirely.

The threats to dbt are real but slow. SQLMesh is technically superior in several ways (real virtual environments, better incrementality, column-level lineage) but has a fraction of the community. Snowflake Dynamic Tables and Databricks DLT want to absorb transformation into the warehouse itself, which would obviate dbt for single-warehouse shops. AI-assisted transformation — where an LLM writes the model and a human reviews it — could change what an analytics engineer's day looks like. None of these have dented dbt's market share yet.

The thing nobody says out loud: dbt's biggest moat is not the tool. It is the fact that every analytics engineer on the planet already knows how to use it. Switching costs are educational, not technical.

How TextQL Works with dbt

dbt is one of the most important systems TextQL Ana reads from. The dbt manifest.json artifact contains every model, every column, every test, every description, and the full lineage DAG — in other words, the closest thing to a semantic layer that most companies have. TextQL uses the manifest to understand what tables mean, which ones are trusted (marts vs staging), how they relate, and which columns answer which business questions. For teams that have invested in dbt model descriptions and tests, TextQL's accuracy on natural-language queries goes up substantially. dbt is, in a real sense, the documentation that makes LLM-generated SQL work.

dbt

Founded: 2016 (open-sourced)
Founders: Tristan Handy, Drew Banin, Connor McArthur
HQ: Philadelphia, PA
Parent: dbt Labs (formerly Fishtown Analytics)
Last valuation: $4.2B (Series D, Feb 2022)
License: Apache 2.0 (dbt-core); commercial (dbt Cloud)
Category: ETL / Integration
Monthly mindshare: ~400K · ~50K orgs running dbt Core; analytics engineering category creator; #1 in transformation