NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →

Data Observability

Data observability platforms monitor data quality, freshness, schema changes, and pipeline health, alerting teams when something breaks before the dashboard does.

Data observability is the practice of monitoring your data pipelines the same way SRE teams monitor production services. The premise is simple and slightly damning: most data teams discover their pipelines are broken when an executive opens a dashboard, sees a number that looks wrong, and sends a Slack message that ruins someone's afternoon. Data observability platforms exist so that the data team finds out first, ideally before anyone else notices.

Think of it this way: if your data warehouse is a kitchen, data observability is the smoke alarm. It does not cook for you, it does not improve the recipes, and it cannot tell you the meal is bad. But it screams the moment something is on fire, which turns out to be most of the value.

Origin Story: Barr Moses Names the Category

The phrase "data observability" was popularized — and arguably invented as a commercial category — by Barr Moses, co-founder and CEO of Monte Carlo, in 2019. Moses had been VP of Customer Success at Gainsight, where she watched customer after customer hit "data downtime": the dashboards were broken, nobody knew why, and the data team spent days bisecting pipelines by hand.

Her insight was that software engineering had already solved this problem. When a microservice goes down, Datadog or New Relic pages someone within seconds. When a deployment introduces a regression, Honeycomb shows you exactly which traces changed. Yet data engineers were running blind. There was no Datadog for data.

Monte Carlo launched in 2019 with a manifesto-style blog post titled "What is Data Observability?" that defined the five pillars: freshness, volume, schema, distribution, and lineage. Those five pillars became the de facto rubric every other vendor in the category now markets against. Whether Moses invented the idea is debatable — data quality tooling has existed since Informatica in the 1990s — but she absolutely invented the category name, and in B2B SaaS, naming the category is half the battle.

The Five Pillars, Explained Simply

1. Freshness. Is the data up to date? Your orders table normally updates every 15 minutes. It has not updated in three hours. That is a freshness incident, and it almost always means an upstream pipeline is broken.

2. Volume. Did you get the right amount of data? If your events table normally lands 10 million rows a day and today it landed 200,000, something is wrong even if no error was thrown. Volume anomalies catch silent failures — the kind where a job "succeeds" but only processes a fraction of its input.

3. Schema. Did the columns change? Someone in the product team renamed user_id to userId, the ETL job kept running, and now half your downstream models are joining on a column that no longer exists. Schema observability watches for these changes and alerts before they propagate.
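A schema check like the one above can be sketched as a diff between two column snapshots. This is a minimal illustration, not any vendor's actual implementation; the table shape and column names are hypothetical.

```python
# Hypothetical sketch: detect schema drift by diffing {column: type} snapshots
# of the same table taken on consecutive scans.

def diff_schema(previous: dict[str, str], current: dict[str, str]) -> dict:
    """Report columns added, removed, or retyped between two snapshots."""
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    retyped = sorted(
        c for c in set(previous) & set(current) if previous[c] != current[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}

drift = diff_schema(
    {"user_id": "bigint", "amount": "numeric"},
    {"userId": "bigint", "amount": "varchar"},
)
print(drift)
```

Note that a rename surfaces as one removal plus one addition — the `user_id` → `userId` incident from the example lands in both lists, which is exactly the signal an alert would fire on.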

4. Distribution. Do the values look normal? Yesterday revenue ranged from $5 to $5,000. Today it ranges from -$1,000 to $50,000,000. The job ran fine. The numbers are insane. Distribution checks catch these statistical anomalies.

5. Lineage. When something breaks, what else is affected? If the upstream raw_orders table is bad, lineage tells you which 47 dbt models, 12 dashboards, and 3 ML features depend on it. Without lineage, every incident is a manual scavenger hunt.
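The "which 47 models depend on it" question is a graph traversal. Here is a minimal sketch of downstream impact analysis as a breadth-first search; the edge list is invented for illustration (real platforms derive it from query logs and dbt manifests).

```python
from collections import deque

# Hypothetical lineage graph: table -> direct downstream dependents.
LINEAGE = {
    "raw_orders": ["stg_orders", "orders_daily"],
    "stg_orders": ["fct_revenue"],
    "orders_daily": ["exec_dashboard"],
    "fct_revenue": ["exec_dashboard", "churn_features"],
}

def downstream(node: str) -> set[str]:
    """Return every asset transitively downstream of `node` (BFS)."""
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# If raw_orders breaks, this is the blast radius to triage:
print(sorted(downstream("raw_orders")))
```

Without this index, the same answer requires grepping SQL by hand, which is the "manual scavenger hunt" the section describes.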

How It Actually Works

Most observability platforms do roughly the same thing under the hood. They connect to your warehouse via a read-only role, scan metadata (information_schema, query history, audit logs), and build a statistical baseline for every table — expected row counts, expected update times, expected distributions of key columns. Then they run anomaly detection against that baseline on a schedule.
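The baseline-then-detect loop can be sketched with a simple z-score test on daily row counts. This is an illustrative toy, assuming a made-up history and threshold; production systems use seasonality-aware models rather than a plain standard deviation.

```python
import statistics

# Hypothetical sketch: flag a volume anomaly against a learned baseline.
def is_volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """True if today's count deviates more than z_threshold std devs from history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

history = [10_200_000, 9_800_000, 10_050_000, 10_400_000, 9_950_000]
print(is_volume_anomaly(history, 10_100_000))  # in line with the baseline
print(is_volume_anomaly(history, 200_000))     # the silent-failure case from above
```

The same pattern applies to freshness (minutes since last update) and distribution (null rates, min/max): build a baseline from history, compare today's observation, alert on deviation.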

The clever part is that this requires almost no configuration. You do not have to write tests. The platform infers what "normal" looks like from history. This is the key contrast with the older paradigm represented by Great Expectations and dbt tests, which require engineers to declare assertions ("this column should never be null"). Declarative testing is precise but labor-intensive. Statistical observability is automatic but noisier. Most mature data teams end up using both.
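The contrast between the two paradigms fits in a few lines. Neither function below is a real Great Expectations or dbt API — they are hypothetical stand-ins to show the trade-off.

```python
rows = [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": None}]

# Declarative style: an engineer wrote down exactly what must hold.
def check_not_null(rows: list[dict], column: str) -> bool:
    return all(r[column] is not None for r in rows)

# Statistical style: infer a "normal" null rate from history, alert on deviation.
def check_null_rate(rows: list[dict], column: str,
                    baseline_rate: float, tolerance: float = 0.05) -> bool:
    rate = sum(r[column] is None for r in rows) / len(rows)
    return abs(rate - baseline_rate) <= tolerance

print(check_not_null(rows, "amount"))                       # hard assertion fails
print(check_null_rate(rows, "amount", baseline_rate=0.5))   # within the learned norm
```

The declarative check is unambiguous but someone had to write it; the statistical check needs no configuration but will happily accept a 50% null rate if that is what history looks like — which is why mature teams run both.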

The Category War

The data observability market has been one of the hottest sub-categories in data infrastructure since 2020, and the vendor list is competitive:

  • Monte Carlo — Category creator. Best funded ($236M raised, last valued at $1.6B in 2022). Marketing-led, enterprise sales motion. Strongest brand.
  • Bigeye — Founded by ex-Uber data quality team (the people who built Uber's internal tool, "Data Quality Monitor"). More technical positioning, deeper SQL-based metric definitions.
  • Acceldata — India/US, broader scope (covers compute observability and cost too, not just data quality). Strong in enterprise and Hadoop-legacy shops.
  • Great Expectations — The open-source ancestor. Test-framework first, observability second. Now commercialized as GX Cloud.
  • Soda — Open-source-friendly, YAML-based check definitions, appeals to engineers who hate clicking through UIs.
  • Datafold — Pivoted from diff-based testing to broader observability and data migration.
  • Anomalo — AI-first, no-code, targets less technical buyers.
  • Metaplane — Acquired by dbt Labs in 2024, now folded into dbt's platform play.
  • Sifflet, Lightup, Validio — A long tail of European and US challengers.

The honest take: no vendor has clearly won. Monte Carlo has the best brand but several customers have churned over pricing. Bigeye is technically respected but smaller. Anomalo and Metaplane are growing fast on the lower end of the market. Meanwhile, Snowflake and Databricks have started shipping native observability features (Snowflake Horizon, Unity Catalog data quality monitors), which threatens to commoditize the entire category from below.

Where This Fits in the Stack

Data observability is a horizontal layer, not a node in the pipeline. It does not transform data, it does not store data, it does not move data. It watches everything else, typically via read-only connections to the warehouse, the orchestrator, and the BI layer.

A common architecture mistake is treating observability as optional until the first incident. By then you have also lost trust with your business stakeholders, and that takes a year to rebuild.

How TextQL Works with Data Observability

TextQL Ana is downstream of data observability — when Ana answers a business question, the underlying data needs to be trustworthy. Customers running Monte Carlo, Bigeye, or similar platforms get a quieter experience with Ana because broken pipelines are caught and remediated before users ask questions about them. TextQL does not replace observability; it benefits from it.

See TextQL in action

Data Observability
Category Data quality & monitoring
Coined by Barr Moses, Monte Carlo (2019)
Borrowed from Software observability (Datadog, New Relic, Honeycomb)
Key vendors Monte Carlo, Bigeye, Acceldata, Great Expectations
Five pillars Freshness, volume, schema, distribution, lineage
Typical buyer Head of Data, VP Analytics, Data Platform lead
Monthly mindshare ~60K (data engineers and platform teams)