Starburst | Data Ecosystem Wiki



Starburst

Starburst is the commercial company built around Trino, the open-source distributed SQL query engine. Starburst was founded in 2017, and in 2018 the engineers who originally created Trino (then called Presto) joined it. It sells a managed cloud service (Galaxy) and a self-managed enterprise distribution (Starburst Enterprise). Think Databricks for Spark, but for query engines.

Starburst stands in the same relationship to Trino as Databricks does to Spark, or Confluent does to Kafka: the company employs almost all of the lead committers, contributes the majority of the code, and sells a managed and enhanced commercial distribution of the open-source project. Starburst was founded in 2017 by Justin Borgman (CEO) and Kamil Bajda-Pawlikowski, who had previously sold their Hadoop SQL company Hadapt to Teradata in 2014. In 2018 they were joined by Martin Traverso, Dain Sundstrom, and David Phillips, who had created Presto at Facebook in 2012.

In plain English: if you want to query data that lives across many different places — a Postgres database, a Snowflake warehouse, a folder of Parquet files on S3, a Salesforce export, a stream off Kafka — and you want to do it with one SQL endpoint, the dominant open-source answer is Trino. Starburst is the company you call when you want someone else to run that Trino cluster for you, with a query optimizer that's actually good, with caching, with enterprise security, and with a phone number you can call at 3am.
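To make the "one SQL endpoint" idea concrete: Trino addresses every table as `catalog.schema.table`, and each catalog is a configured connector. This means a single statement can join tables that live in different systems. The sketch below only builds such a statement as a string. The catalog names `postgres` and `lake`, the schemas, tables, and columns, and the helper names `TableRef` and `federated_join` are all hypothetical. On a real cluster, the catalog names are whatever an administrator configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed the way Trino does: catalog.schema.table."""
    catalog: str  # hypothetical catalog name; each catalog maps to one connector
    schema: str
    table: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join(left: TableRef, right: TableRef, key: str, columns: list[str]) -> str:
    """Return one SQL statement joining two tables that may live in different systems."""
    return (
        f"SELECT {', '.join(columns)}\n"
        f"FROM {left.qualified()} AS l\n"
        f"JOIN {right.qualified()} AS r ON l.{key} = r.{key}"
    )


# Hypothetical setup: customers in PostgreSQL, orders in Iceberg tables on S3.
customers = TableRef("postgres", "crm", "customers")
orders = TableRef("lake", "sales", "orders")

sql = federated_join(customers, orders, "customer_id",
                     ["l.customer_name", "r.order_id", "r.amount"])
print(sql)
```

You could send the statement through any Trino client, such as the `trino` Python package's DB-API interface. The coordinator then plans the join, and the workers read from both sources, pushing filters down where the connectors allow it. Neither source system needs to know the other exists.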

Origin Story: From Hadapt to Presto to Starburst

To understand Starburst, you have to start with two parallel histories that converge in 2017.

History one: Presto. In 2012, Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang built Presto inside Facebook. The goal was to give Facebook's data scientists an interactive SQL engine on top of HDFS, because Hive was too slow for interactive queries. Presto was open-sourced in 2013 and became the de facto SQL engine on the data lake at Netflix, Airbnb, Uber, LinkedIn, and most other large tech companies. In 2018, Traverso, Sundstrom, and Phillips left Facebook to continue developing the project outside the company.

History two: Hadapt. Justin Borgman co-founded Hadapt in 2010 around research from Yale (Daniel Abadi's lab) on running SQL on Hadoop. Teradata acquired Hadapt in 2014, and Borgman ran Teradata's Center for Hadoop. While there, his team built a commercial Presto distribution and a customer base around it. When Teradata decided to deprioritize the effort in 2017, Borgman, Bajda-Pawlikowski, and the team spun out as a new company — Starburst — to focus on Presto commercially.

In 2018, Starburst joined forces with the Presto founding engineers (Traverso, Sundstrom, Phillips), who had just left Facebook. The combined team became the home of their branch of the open-source project, known as PrestoSQL. In late 2020, after a long-running trademark dispute with Facebook over the "Presto" name, the team renamed PrestoSQL to Trino, and the community followed. Within roughly two years, Trino had clearly become the actively developed branch. "Presto" (the version that stayed at the Linux Foundation) had become a sleepier, slower-moving project mostly maintained by Meta and IBM.

Starburst raised aggressively through this period. Its Series D, in January 2022, valued the company at $3.35B and brought total funding to roughly $414M. It was one of the last zero-interest-rate megadeals. Since then, like every data infrastructure company funded in that era, Starburst has been operating in a more disciplined growth mode and has not announced a new round at a higher valuation.

Their Products

Starburst has a tight, easy-to-explain product line:

Trino — The open-source distributed SQL query engine. Apache 2.0 licensed. Maintained primarily by Starburst employees, with a healthy outside contributor community. This is the foundation that everything else is built on, and it is genuinely open — you can run it without any commercial relationship with Starburst, and many large tech companies do.
Starburst Enterprise — The self-managed commercial distribution of Trino. Adds enhancements to Trino's statistics-driven cost-based query optimizer (CBO), expanded enterprise connectors (SAP HANA, Teradata, Oracle, etc.), Ranger and OAuth integrations, role-based access control, indexed materialized views, and 24/7 commercial support. Sold as a license to run on your own infrastructure.
Starburst Galaxy — The fully managed SaaS version. You point Galaxy at your data sources, it provisions and manages the Trino clusters for you, and you pay per credit consumed. Galaxy is where Starburst is putting most of its strategic energy in 2024-2026. It includes Warp Speed (smart caching/indexing), the Galaxy console, governance controls, and the Galaxy data products / "data domains" features that lean into the data mesh narrative.
The "Icehouse" architecture — Starburst's branding for the Trino + Apache Iceberg lakehouse pattern. The pitch is that Trino + Iceberg + object storage is a complete, vendor-neutral lakehouse that can replace Snowflake or Databricks for analytical workloads, at a much lower cost and without storage lock-in.
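In the Icehouse pattern, tables are created through Trino's Iceberg connector. They are stored as Iceberg metadata plus data files in object storage. The following minimal sketch renders such a `CREATE TABLE` statement. It assumes a catalog named `iceberg` that is backed by the Iceberg connector. `format` and `partitioning` are table properties documented for that connector, and `day(...)` is an Iceberg partition transform. The schema, table, column names, and the `iceberg_ddl` helper are hypothetical.

```python
def iceberg_ddl(catalog: str, schema: str, table: str,
                columns: dict[str, str], partition_transforms: list[str],
                file_format: str = "PARQUET") -> str:
    """Render a Trino CREATE TABLE statement for an Iceberg table (illustrative)."""
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns.items())
    # Partition transforms such as 'day(event_ts)' are passed as a SQL string array.
    parts = ", ".join(f"'{t}'" for t in partition_transforms)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n{cols}\n)\n"
        f"WITH (\n"
        f"    format = '{file_format}',\n"
        f"    partitioning = ARRAY[{parts}]\n"
        f")"
    )


# Hypothetical page-view table, partitioned by day of the event timestamp.
ddl = iceberg_ddl(
    "iceberg", "analytics", "page_views",
    {"event_ts": "timestamp(6) with time zone", "user_id": "bigint", "url": "varchar"},
    ["day(event_ts)"],
)
print(ddl)
```

Nothing in the generated statement is Starburst-specific. It uses properties of the open-source Trino Iceberg connector, which is why the pattern is pitched as vendor-neutral.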

The Strategy: Open Compute, Open Storage, Anti-Lock-In

Starburst's strategic positioning is the cleanest "anti-lock-in" pitch in the data warehouse / lakehouse market. The argument goes:

Snowflake locks you into proprietary storage (FDN format inside Snowflake-managed S3) and proprietary compute (the Snowflake engine).
Databricks locks you into Delta Lake (which is open-ish but heavily Databricks-flavored) and proprietary compute (Photon).
The hyperscalers lock you into their cloud account.
Starburst, by contrast, runs on fully open storage (Iceberg on S3/GCS/ADLS, in your own buckets, in your own account) and fully open compute (Trino, Apache 2.0). If you ever want to leave Starburst, you can keep your data and run vanilla open-source Trino on it. There is no lock-in.

That's a real, defensible argument. Where it gets harder is the question of how much customers actually care about the anti-lock-in story versus operational simplicity and bundled features. In practice, most enterprise customers will tolerate a fair amount of lock-in if the alternative is operational complexity. Starburst's bet is that as data sizes grow and storage costs become a bigger fraction of total spend, the storage-portability story becomes more and more valuable.
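In concrete terms, "run vanilla open-source Trino on it" means writing a catalog file that points open-source Trino at the same Iceberg catalog and buckets. The sketch below assumes the tables are registered in an Iceberg REST catalog at a hypothetical URI. The property names `connector.name`, `iceberg.catalog.type`, and `iceberg.rest-catalog.uri` follow the open-source Trino Iceberg connector documentation. Object-storage credentials and file-system settings are omitted because they vary by Trino version and cloud. The helper names and the catalog name `lake` are hypothetical.

```python
import tempfile
from pathlib import Path


def iceberg_catalog_properties(rest_uri: str) -> str:
    """Render a minimal Trino catalog file for an Iceberg REST catalog (sketch)."""
    props = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": rest_uri,  # hypothetical endpoint
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_catalog(etc_dir: Path, catalog_name: str, rest_uri: str) -> Path:
    # Trino loads etc/catalog/<name>.properties as a catalog called <name>.
    catalog_dir = etc_dir / "catalog"
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{catalog_name}.properties"
    path.write_text(iceberg_catalog_properties(rest_uri))
    return path


# Demonstrate in a throwaway directory instead of a real Trino installation.
with tempfile.TemporaryDirectory() as tmp:
    path = write_catalog(Path(tmp), "lake", "http://iceberg-rest.example.internal:8181")
    written = path.read_text()
print(path.name)
print(written)
```

Suppose the Iceberg tables are registered in a catalog that open-source Trino can also reach. Then a cluster configured this way sees the same tables and data files that a Starburst deployment did, and no data has to move.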

The other major front for Starburst is federation. Trino's connector ecosystem is unmatched, with 50+ connectors covering databases, warehouses, SaaS tools, and streams, and Starburst's commercial story leans on it. The "single SQL endpoint that reaches everything" pitch is particularly strong for the data mesh and data-product use cases that became fashionable from 2022 onward.

Honest Market Take

Starburst is in a difficult but defensible spot. The difficult part: every major warehouse vendor has noticed the federation and lakehouse story and has started absorbing those capabilities into their own products. Snowflake now supports Iceberg tables backed by customer storage. Databricks Lakehouse Federation lets you query external systems from Databricks. BigQuery has BigLake. Redshift has Spectrum. The "specialized federated query engine" niche is being squeezed from above.

The defensible part: none of those warehouse-vendor federation features are as good as Trino's, none of them are open-source, and none of them solve the core "I don't want to be locked into one warehouse" problem. There is genuine, ongoing demand from large enterprises — especially in regulated industries and in mid-sized European markets — for a serious open-compute lakehouse that doesn't make them pick a hyperscaler. Starburst is the best answer to that demand.

The most useful frame: Starburst is the Databricks of query engines. One company makes most of the OSS commits, sells a managed and enhanced version, and credibly claims that the open-source project would not exist without it. Lock-in is genuinely lower than with Snowflake or Databricks, because Trino is truly open, and the product is good. The remaining question is whether the warehouse vendors can absorb federation fast enough to deny Starburst the standalone-product oxygen it needs to grow.

TextQL Fit

TextQL Ana connects natively to both open-source Trino and Starburst (Enterprise and Galaxy). For organizations running a federated stack — some data in Snowflake, some in Postgres, some in S3 Iceberg tables, some in a mainframe — pointing Ana at a Starburst coordinator gives one SQL endpoint that reaches everything. The federation happens in Trino, which is exactly where it should happen. Ana then operates against that single endpoint as if it were a normal warehouse.


Starburst

Founded: 2017
Founders: Justin Borgman, Martin Traverso, Dain Sundstrom, David Phillips, Kamil Bajda-Pawlikowski
Headquarters: Boston, MA
CEO: Justin Borgman
Total funding: ~$414M (Series D, Jan 2022, $3.35B valuation)
Status: Private
Core OSS project: Trino
Monthly mindshare: ~30K
Customers: ~600
Positioning: the commercial Trino company; smaller than Snowflake/Databricks