Trino
Trino is the open-source distributed SQL query engine, renamed from PrestoSQL in 2020 by Presto's original Facebook creators. It's the dominant federated SQL engine for the data lake -- the project most people now mean when they say 'Presto.'
Trino is an open-source distributed SQL query engine designed to query data wherever it lives — in object storage, in databases, in streams, in SaaS tools — and join across all of them in a single query. It is what most people now mean when they say "SQL on the data lake," and it is the engine underneath many modern lakehouse query layers that don't ship a proprietary alternative.
In plain English: Trino's primary job is to make a folder full of Parquet files on S3 feel like a database table, and to do it fast enough that an analyst can run an interactive query against terabytes of data without waiting for a coffee. Its secondary job — and the reason it's so widely adopted — is to do the same trick across dozens of different data sources at once, so a single query can join a Postgres dimension table against a Parquet fact table on S3 against a Salesforce export, and the engine figures out the plan.
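That cross-source join looks like ordinary SQL. A minimal sketch, assuming two configured catalogs — `pg` (a PostgreSQL connector) and `lake` (Iceberg/Parquet on S3) — with invented schema and column names:

```sql
-- Hypothetical catalogs and tables, for illustration only.
SELECT c.region,
       sum(o.amount) AS total_revenue
FROM lake.sales.orders AS o       -- Parquet fact table on S3
JOIN pg.public.customers AS c     -- dimension table living in Postgres
  ON o.customer_id = c.id
GROUP BY c.region
ORDER BY total_revenue DESC;
```

Each catalog maps to one configured connector; the `catalog.schema.table` naming is what lets a single statement span sources, and the coordinator plans which parts of the query push down into each source.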
This page is about the open-source Trino project. For the company that employs most of the Trino maintainers and sells a managed cloud version, see Starburst.
This is one of the most consequential forks in the history of data infrastructure, and it's worth telling honestly.
Presto was created at Facebook in 2012 by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang. They built it because Facebook's existing tool for SQL on Hadoop — Hive — was too slow for the kind of interactive analytics Facebook's data scientists wanted. Presto was open-sourced in 2013 and quickly picked up by Netflix, Airbnb, Uber, LinkedIn, and effectively every other large tech company. By 2017 it was the de facto SQL engine for the data lake.
In 2018, the original creators left Facebook to continue developing Presto independently, and they later joined Starburst to commercialize it. They maintained their branch of the open-source project under the name PrestoSQL, to distinguish it from the version Facebook kept developing internally. Facebook donated its version — now called PrestoDB — to the Linux Foundation under a new Presto Foundation, governed by a board that included Facebook, Uber, and Twitter.
The conflict was about control and trademark. The original creators argued they had built and were maintaining the actively developed branch; Facebook argued the trademark belonged to them. In December 2020, after a long-running trademark dispute with Facebook, the team renamed PrestoSQL to Trino. The community followed. Within a year or two, "Presto" — the version that stayed at the Linux Foundation — had become a sleepier project, while Trino raced ahead with a much larger contributor base, more frequent releases, and almost all the major enterprise users.
The honest summary: Trino won the Presto war. If you read a benchmark, a blog post, or a vendor pitch from 2022 onwards that says "Presto," they almost always mean Trino. Presto-the-project still exists, still has serious backers (Meta, IBM, Uber), and still ships, but it is no longer where the gravity is.
Trino is governed by the Trino Software Foundation, a non-profit the original creators set up for their branch of the project (originally as the Presto Software Foundation) and renamed along with it. The TSF holds the trademark and the copyright on the project. Day-to-day technical leadership rests with a relatively small group of maintainers, the majority of whom work for Starburst, with significant outside contributions from companies like LinkedIn, Bloomberg, Pinterest, and Goldman Sachs.
This is a familiar pattern — it looks a lot like Confluent's relationship to Kafka or Databricks's relationship to Spark before Spark moved fully under the Apache Software Foundation. The trademark is held by the foundation rather than by the company, which gives the project a measure of independence, but the company employs most of the lead committers and pays for most of the development. In practice, if you want to influence the Trino roadmap, you talk to Starburst.
The licensing is genuinely open. Trino is Apache 2.0, anyone can fork it, and several large tech companies do run their own internal builds of Trino without any commercial relationship with Starburst.
Trino uses a classic MPP (massively parallel processing) architecture with two roles: a coordinator, which parses and plans each query and schedules its stages across the cluster, and workers, which execute the plan's tasks and exchange intermediate data directly with one another.
The execution model is pipelined and in-memory: data flows between stages as soon as it's ready, rather than landing on disk between stages the way the old MapReduce / Hive model did. This is the main reason Trino is fast: there is no per-stage I/O overhead and no disk-spill checkpoint, so a well-tuned Trino query can return results in seconds against data that would have taken minutes in Hive.
The cost of this design is that Trino is fundamentally optimized for interactive queries that finish in seconds to minutes, not for long-running fault-tolerant batch jobs. If a worker dies mid-query, by default, the entire query fails. The 2022-2023 "Project Tardigrade" / fault-tolerant execution work added retry capability, which helps for long-running ETL workloads, but Trino is still a worse choice than Spark for hours-long jobs.
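Fault-tolerant execution is opt-in. A sketch of the configuration, assuming property names from recent Trino releases (the bucket name is illustrative; check the release notes for your version):

```properties
# config.properties (coordinator and workers) -- enables the
# fault-tolerant execution mode from Project Tardigrade.
retry-policy=TASK

# exchange-manager.properties -- task-level retries require durable
# intermediate storage for spooled exchange data.
exchange-manager.name=filesystem
exchange.base-directories=s3://example-trino-exchange-spooling
```

With `retry-policy=TASK`, a failed task is re-run from spooled intermediate data instead of failing the whole query; the trade-off is the extra exchange I/O, which is why the default remains the fast, non-fault-tolerant pipeline.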
Federation, first and foremost. Trino has connectors for 50+ data sources — Hive, Iceberg, Delta Lake, Postgres, MySQL, MongoDB, Cassandra, Kafka, Elasticsearch, Snowflake, BigQuery, Redshift, SAP HANA, even Google Sheets. A single query can join across any of them. Nothing else in the open-source world does federation this comprehensively, and the catalog of connectors is the single biggest reason large enterprises adopt Trino.
Lake querying at scale. Trino's columnar reader and vectorized execution let it scan Parquet and ORC files extremely efficiently. It supports the three big open table formats — Apache Iceberg, Delta Lake, and Apache Hudi — as first-class citizens. For many large enterprises with petabyte-scale lakes, Trino is the SQL engine of record.
ANSI SQL compliance. Unlike some of its competitors, Trino aims to be a real, standards-compliant SQL engine. Window functions, CTEs, complex subqueries, lateral joins, recursive queries — all of it. This matters because BI tools generate SQL on the assumption that the engine is a real database, and Trino mostly delivers on that assumption.
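The kind of SQL this implies is the same shape a BI tool would generate. A small illustrative example (table and column names invented) combining a CTE with a window function:

```sql
-- Most recent order per customer; names are hypothetical.
WITH ranked AS (
  SELECT customer_id,
         order_date,
         amount,
         row_number() OVER (PARTITION BY customer_id
                            ORDER BY order_date DESC) AS rn
  FROM lake.sales.orders
)
SELECT customer_id, order_date, amount
FROM ranked
WHERE rn = 1;
```

Queries like this run unmodified on Trino, which is precisely what lets off-the-shelf BI tools treat a data lake as if it were a database.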
Connector extensibility. The connector SPI (service provider interface) is genuinely well-designed. Building a new connector for a new data source is a one-to-three week project for a competent Java engineer, and the community has used this to build connectors for almost every storage system that exists.
Long-running, fault-tolerant batch jobs. As described above. Use Spark for hours-long ETL jobs.
Updates and deletes on operational data. Trino can write to Iceberg and Delta tables, including merge operations, but it is not a transactional database. It is not where you go for OLTP workloads or ACID-heavy operational data.
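For the analytical (not operational) case, the Iceberg connector does support `MERGE`. A hedged sketch with invented table names:

```sql
-- Upsert staged changes into an Iceberg table; illustrative names.
MERGE INTO lake.crm.accounts AS t
USING lake.staging.account_updates AS s
  ON t.account_id = s.account_id
WHEN MATCHED THEN
  UPDATE SET status = s.status, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (account_id, status, updated_at)
  VALUES (s.account_id, s.status, s.updated_at);
```

This is fine for periodic batch upserts into lake tables; it is not a substitute for an OLTP database handling high-frequency single-row writes.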
Caching. Out of the box, vanilla Trino doesn't cache much. Every query re-reads from the underlying storage. There are workarounds (caching connectors, intermediate cache servers, materialized views), but it's not a core strength of the OSS project. Starburst's commercial distribution adds Warp Speed indexing/caching, which is one of the bigger reasons enterprises pay for Starburst rather than running open-source Trino themselves.
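The materialized-view workaround, sketched against a hypothetical Iceberg catalog (names invented; the exact support depends on the connector and version):

```sql
-- Precompute a hot aggregate so repeat queries skip the raw scan.
CREATE MATERIALIZED VIEW lake.analytics.daily_revenue AS
SELECT date_trunc('day', order_date) AS day,
       sum(amount) AS revenue
FROM lake.sales.orders
GROUP BY 1;

-- Re-materialize when the underlying data changes:
REFRESH MATERIALIZED VIEW lake.analytics.daily_revenue;
```

This trades freshness for speed and has to be refreshed explicitly or on a schedule, which is why it's a workaround rather than a cache.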
Cost-based optimization in the OSS. Vanilla Trino has a relatively basic query planner. It will pick the right join order and the right join strategy if you give it good statistics, but the OSS planner has historically lagged behind Starburst's commercial CBO and behind the warehouse vendors' planners. This is improving, but it's still a real gap.
Trino's adoption in production at large tech companies is one of its strongest selling points: the contributors named above — LinkedIn, Bloomberg, Pinterest, Goldman Sachs — run it at scale, as do many of the early Presto adopters.
Outside of the named tech companies, Trino is also the engine inside several other vendor products: Amazon Athena began as a managed Presto and its newer engine versions are built on Trino, Ahana (acquired by IBM in 2023) was a managed Presto vendor, and Google Cloud Dataproc and Azure HDInsight both offer Trino as a managed component. Even where Trino isn't the user-facing brand, it is often the engine doing the work.
| Engine | Best at | Worst at |
|---|---|---|
| Trino | Federated SQL across many sources; lake queries at interactive speed | Long batch jobs; out-of-the-box caching |
| Presto | Same as Trino historically; now mostly Meta-internal | Falling behind Trino on contributor velocity |
| Databricks Photon | High-performance vectorized SQL on Delta Lake inside Databricks | Tightly coupled to Databricks; not portable |
| DuckDB | Single-node analytics; embedded in apps | Distributed workloads |
| Dremio | Iceberg-first lakehouse with reflections / acceleration | Smaller ecosystem than Trino |
| Hive | Stable batch SQL on HDFS for legacy Hadoop shops | Interactive performance |
The core competitive frame in 2026: Trino is the dominant open federated query engine, with the broadest connector ecosystem and the largest production deployments. Photon is faster on Delta Lake inside Databricks but isn't sold separately. Dremio is the closest direct competitor but has a much smaller community. The cloud warehouses (Snowflake, BigQuery, Redshift) are absorbing federation features into their own engines, which is the longest-term competitive pressure on Trino.
TextQL Ana connects natively to Trino (and to Starburst Galaxy and Enterprise). For organizations running a federated stack — some data in Snowflake, some in Postgres, some in S3 Iceberg tables — pointing Ana at a Trino coordinator gives a single SQL endpoint that reaches everything. The federation happens inside Trino, which is exactly where it should happen, and Ana operates against that single endpoint as if it were a normal warehouse.
See TextQL in action