Presto | Data Ecosystem Wiki

Presto

Presto is the original distributed SQL query engine, built at Facebook in 2012. Its original creators forked the project after leaving Facebook in 2018 and renamed the fork Trino in December 2020; Trino is now the more actively developed branch. Presto continues at the Linux Foundation, maintained primarily by Meta, IBM, and Uber.

Presto is an open-source distributed SQL query engine originally built at Facebook in 2012 to query data sitting in the Hive data warehouse on Hadoop. It pioneered the idea of a federated, MPP-style SQL engine that could read from many data sources at once — and for most of the 2010s it was the SQL-on-the-data-lake engine.

After leaving Facebook in 2018, the original creators maintained their own fork of Presto, which was renamed Trino in December 2020 and took most of the contributor base and most of the active development with it. Today, "Presto" generally means the version that stayed at the Linux Foundation (sometimes called PrestoDB), maintained primarily by Meta, IBM (which acquired the Presto-focused company Ahana in 2023), and Uber. It is still in active use, especially at Meta itself, but the center of gravity for the broader community has moved to Trino.

Origin Story

Presto was started inside Facebook in 2012 by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang. The motivating problem was straightforward: Facebook had an enormous Hive warehouse, Hive was slow for interactive queries (it compiled SQL into MapReduce jobs that took minutes to start), and Facebook's data scientists wanted answers in seconds, not minutes. Hive was great for big batch jobs and terrible for interactive analysis.

Presto took a fundamentally different architectural approach. Instead of compiling to MapReduce, it ran a long-lived cluster made up of a coordinator and many workers that pipelined data through query stages in memory, streaming results between operators rather than materializing intermediate state to disk. The result was a 10x-100x speedup on interactive queries against the same Hive data.
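
The pipelining idea can be illustrated with a minimal Python sketch. This is not Presto's code: the operator names (`scan`, `filter_rows`, `sum_by_key`) and the sample `ORDERS` data are hypothetical, and Python generators stand in for the streaming operators. Each row flows from the scan through the filter into the aggregation without any intermediate result set being written out.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]

# Hypothetical sample data standing in for a table in the warehouse.
ORDERS = [
    {"country": "US", "amount": 120.0},
    {"country": "DE", "amount": 15.0},
    {"country": "US", "amount": 80.0},
    {"country": "DE", "amount": 200.0},
]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: yields rows one at a time."""
    for row in rows:
        yield row


def filter_rows(rows: Iterator[Row], min_amount: float) -> Iterator[Row]:
    """Filter operator: passes matching rows downstream as they arrive."""
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def sum_by_key(rows: Iterator[Row], key: str, value: str) -> Dict[Any, float]:
    """Aggregation operator: consumes the stream and keeps only running totals."""
    totals: Dict[Any, float] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + float(row[value])
    return totals


def run_query() -> Dict[Any, float]:
    # Roughly: SELECT country, SUM(amount) FROM orders
    #          WHERE amount >= 50 GROUP BY country
    return sum_by_key(filter_rows(scan(ORDERS), 50.0), "country", "amount")


if __name__ == "__main__":
    print(run_query())
```

A MapReduce-style plan would instead write the filtered rows out as a complete intermediate dataset before the aggregation step could start; avoiding that materialization is the design choice the paragraph above describes.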

Facebook open-sourced Presto in 2013. It was an immediate hit. Within a few years, Netflix, Airbnb, Uber, LinkedIn, Twitter, Lyft, Salesforce, and many other Silicon Valley companies were running Presto in production. Presto became the de facto SQL engine for the data lake.

The Schism: Why There Are Two Prestos

This is the most important thing to understand about Presto today, and the history is worth laying out plainly.

In 2018, the original creators (Martin Traverso, Dain Sundstrom, and David Phillips) left Facebook and set up a new foundation to govern the open-source project independently; they later joined Starburst, the company that provides commercial backing for their fork. They argued that Presto needed independent governance to grow beyond Facebook's needs. Facebook disagreed about how that should happen, and the split became formal: Facebook donated its version of Presto to the Linux Foundation under a new entity called the Presto Foundation, while the original creators continued maintaining their version under the name PrestoSQL.

For a couple of years, both versions coexisted under confusingly similar names. Facebook's version was called PrestoDB and lived under the Linux Foundation. The original team's version was called PrestoSQL and lived at prestosql.io. Both claimed to be Presto. Most of the active community contribution was happening on the PrestoSQL side, but Facebook owned the "Presto" trademark.

In December 2020, after a trademark dispute with Facebook, the PrestoSQL team renamed their project to Trino. The community largely followed them. Within roughly a year, "Trino" had become the active branch with the larger contributor base, more frequent releases, and the bulk of new enterprise adoption. PrestoDB — now just "Presto" — continued, but at a slower pace, anchored to the needs of its largest contributors (Meta, IBM, Uber).

The honest summary: Trino effectively won the Presto war for the broader market. If you read a 2024 blog post that says "Presto," check whether the author actually means Trino. Most of the time they do.

What Presto Still Does

Presto-the-project is not dead, and it would be wrong to write it off. There are several places where it remains the active choice:

Meta itself. Meta runs one of the largest Presto deployments in the world: the original use case, scaled to enormous size. Their internal version of Presto is the basis for the open-source PrestoDB project, and they continue to invest substantial engineering in it. In particular, Meta has been the driving force behind Velox, an open-source C++ vectorized execution library that is meant to do for Presto what Photon did for Databricks SQL. Velox is technically interesting and is being adopted beyond just Presto (Spark, for instance, has experimented with it through the Gluten project).

IBM and watsonx.data. IBM acquired Ahana, the small Presto-focused company, in 2023, and integrated PrestoDB into its watsonx.data lakehouse platform. IBM's enterprise customers get Presto as part of that product.

Uber. Uber is one of the largest non-Meta production users of PrestoDB and is a significant contributor to the project.

Amazon Athena. Athena's query engine was originally built on Presto, and its more recent engine version is based on Trino, so the lineage has gotten complicated as both engines evolved. Either way, it owes its architecture to the Presto family: if you've used Athena to query S3 data, you've used a Presto descendant.

Presto vs Trino, Practically

For most new deployments, the answer is straightforward: use Trino unless you have a specific reason to use Presto. Trino has a larger contributor base, more frequent releases, more connectors, broader tooling support, and the commercial backing of Starburst. The case for choosing PrestoDB instead boils down to one of three things: (a) you are specifically aligning with Meta, IBM, or Uber's stack, (b) you want the Velox-based C++ execution engine, or (c) you have an existing PrestoDB deployment and migration is more expensive than maintaining the status quo.

Both engines descend from the same 2012 codebase and both still use the same core architectural ideas: coordinator + workers, in-memory pipelined execution, ANSI SQL, lots of connectors, optimized for federated lake querying. The differences have grown over time, but the two are still recognizable cousins.
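
To make "connectors" and "federated querying" concrete, here is a toy Python sketch of the idea, not either engine's actual connector interface. The `Connector` protocol, `InMemoryConnector`, `federated_join`, and the `hive` and `mysql` catalog contents are all hypothetical. Each catalog exposes the same `scan` method, so the join logic does not care which backing system the rows come from.

```python
from typing import Any, Dict, Iterator, List, Protocol

Row = Dict[str, Any]


class Connector(Protocol):
    """Hypothetical minimal connector interface: stream rows from a table."""

    def scan(self, table: str) -> Iterator[Row]: ...


class InMemoryConnector:
    """Stand-in for a real source such as a Hive warehouse or a MySQL database."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterator[Row]:
        yield from self._tables[table]


def federated_join(
    catalogs: Dict[str, Connector], left: str, right: str, key: str
) -> List[Row]:
    """Hash-join two tables named as 'catalog.table', possibly from different sources."""
    left_catalog, left_table = left.split(".", 1)
    right_catalog, right_table = right.split(".", 1)

    # Build phase: index the right-hand table by the join key.
    build: Dict[Any, List[Row]] = {}
    for row in catalogs[right_catalog].scan(right_table):
        build.setdefault(row[key], []).append(row)

    # Probe phase: stream the left-hand table and emit matching pairs.
    joined: List[Row] = []
    for row in catalogs[left_catalog].scan(left_table):
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    return joined


CATALOGS: Dict[str, Connector] = {
    "hive": InMemoryConnector({"orders": [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 15.0},
        {"customer_id": 3, "amount": 9.0},
    ]}),
    "mysql": InMemoryConnector({"customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Lin"},
    ]}),
}

if __name__ == "__main__":
    for row in federated_join(CATALOGS, "hive.orders", "mysql.customers", "customer_id"):
        print(row)
```

In the real engines the same pattern is expressed in SQL by qualifying tables with a catalog name, and the coordinator distributes the scan and join work across workers rather than running it in one process.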

TextQL Fit

TextQL connects to PrestoDB the same way it connects to Trino — via the standard JDBC client. For organizations that have inherited or chosen to stay on PrestoDB (often because of an existing IBM or Meta-aligned deployment), the integration is straightforward. The semantic and federation properties that make Presto a good backend for natural language queries are essentially the same as Trino's.
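
As a rough illustration of how similar the two connection paths are, the sketch below builds JDBC URLs for both engines. It is not TextQL's configuration code: the `jdbc_url` helper and the host name are hypothetical. The `jdbc:presto://host:port/catalog/schema` and `jdbc:trino://host:port/catalog/schema` URL shapes are the ones used by the respective JDBC drivers.

```python
# Hypothetical helper; only the URL scheme differs between the two engines.
JDBC_SCHEMES = {
    "presto": "jdbc:presto",
    "trino": "jdbc:trino",
}


def jdbc_url(engine: str, host: str, port: int, catalog: str, schema: str) -> str:
    """Build a JDBC URL of the form scheme://host:port/catalog/schema."""
    if engine not in JDBC_SCHEMES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return f"{JDBC_SCHEMES[engine]}://{host}:{port}/{catalog}/{schema}"


if __name__ == "__main__":
    # "presto.example.internal" is a placeholder host, not a real endpoint.
    print(jdbc_url("presto", "presto.example.internal", 8080, "hive", "default"))
    print(jdbc_url("trino", "presto.example.internal", 8080, "hive", "default"))
```

Each engine ships its own JDBC driver, so a client still needs the driver that matches the engine it is talking to.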

Presto

Created: 2012

Origin: Facebook (now Meta)

Original creators: Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang

Open sourced: 2013

License: Apache 2.0

Governance: Presto Foundation (Linux Foundation)

Category: Query Engines

Monthly mindshare: ~50K · ~16K GitHub stars; the original Facebook project; legacy now that Trino is dominant