NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →

Trino

Trino is the open-source distributed SQL query engine created at Facebook as Presto and renamed from PrestoSQL in December 2020 by its original creators. It's the dominant federated SQL engine for the data lake, and the project most people now mean when they say "Presto."

Trino is an open-source distributed SQL query engine designed to query data wherever it lives — in object storage, in databases, in streams, in SaaS tools — and join across all of them in a single query. It is what most people now mean when they say "SQL on the data lake," and it powers many modern lakehouse query layers that don't ship a proprietary engine of their own.

In plain English: Trino's primary job is to make a folder full of Parquet files on S3 feel like a database table, and to do it fast enough that an analyst can run an interactive query against terabytes of data without waiting for a coffee. Its secondary job — and the reason it's so widely adopted — is to do the same trick across dozens of different data sources at once, so a single query can join a Postgres dimension table against a Parquet fact table on S3 against a Salesforce export, and the engine figures out the plan.
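That cross-source join is ordinary SQL from the user's point of view; the source just shows up as another catalog. A minimal sketch, with hypothetical catalog, schema, and table names (a `postgresql` catalog for the dimension table, a `hive` catalog pointing at Parquet on S3):

```sql
-- Join a Postgres dimension table against a Parquet fact table on S3.
-- Catalog/table names here are illustrative, not from any real deployment.
SELECT c.region,
       sum(o.amount) AS revenue
FROM postgresql.public.customers AS c
JOIN hive.sales.orders AS o        -- Parquet files on S3
  ON o.customer_id = c.id
GROUP BY c.region
ORDER BY revenue DESC;
```

The analyst writes one query; Trino's planner decides what to push down to Postgres and what to scan from S3.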

This page is about the open-source Trino project. For the company that employs most of the Trino maintainers and sells a managed cloud version, see Starburst.

Origin Story: A Fork Born of a Corporate Dispute

This is one of the most consequential forks in the history of data infrastructure, and it's worth telling honestly.

Presto was created at Facebook in 2012 by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang. They built it because Facebook's existing tool for SQL on Hadoop — Hive — was too slow for the kind of interactive analytics Facebook's data scientists wanted. Presto was open-sourced in 2013 and quickly picked up by Netflix, Airbnb, Uber, LinkedIn, and effectively every other large tech company. By 2017 it was the de facto SQL engine for the data lake.

In 2018, the original creators left Facebook to continue developing Presto independently, and later joined Starburst to commercialize it. They renamed their branch PrestoSQL to distinguish it from the version Facebook had decided to keep developing internally. Facebook donated its version of Presto — now called PrestoDB — to the Linux Foundation, which formed the Presto Foundation around it, governed by a board that included Facebook, Uber, and Twitter.

The conflict was about control and trademark. The original creators argued they had built and were maintaining the actively developed branch; Facebook argued the trademark belonged to them. In December 2020, after a long-running trademark dispute with Facebook, the team renamed PrestoSQL to Trino. The community followed. Within a year or two, "Presto" — the version that stayed at the Linux Foundation — had become a sleepier project, while Trino raced ahead with a much larger contributor base, more frequent releases, and almost all the major enterprise users.

The honest summary: Trino won the Presto war. If you read a benchmark, a blog post, or a vendor pitch from 2022 onwards that says "Presto," they almost always mean Trino. Presto-the-project still exists, still has serious backers (Meta, IBM, Uber), and still ships, but it is no longer where the gravity is.

Governance: The Trino Software Foundation

Trino is governed by the Trino Software Foundation, a non-profit set up by the original creators after the rename. The TSF holds the trademark and the copyright on the project. The day-to-day technical leadership is a relatively small group of maintainers, the majority of whom work for Starburst, with significant outside contributions from companies like LinkedIn, Bloomberg, Pinterest, and Goldman Sachs.

This is a familiar pattern — it looks a lot like Confluent's relationship to Kafka or Databricks's relationship to Spark before Spark moved fully under the Apache Software Foundation. The trademark is held by the foundation rather than by the company, which gives the project a measure of independence, but the company employs most of the lead committers and pays for most of the development. In practice, if you want to influence the Trino roadmap, you talk to Starburst.

The licensing is genuinely open. Trino is Apache 2.0, anyone can fork it, and several large tech companies do run their own internal builds of Trino without any commercial relationship with Starburst.

Architecture: Coordinator + Workers, Pipelined In-Memory Execution

Trino uses a classic MPP (massively parallel processing) architecture with two roles:

  • The coordinator is the brain. It receives the SQL query, parses it, builds a logical plan, optimizes it, breaks it into stages, and assigns those stages to workers. There is exactly one coordinator per cluster.
  • The workers do the actual work. They read data from the underlying source via a connector, execute the assigned operations (filters, joins, aggregations), and stream intermediate results to other workers as needed. You scale a Trino cluster by adding more workers.
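The coordinator/worker split is expressed directly in each node's `etc/config.properties`. A minimal sketch, assuming a two-node-type deployment with hypothetical hostnames (these property names exist in Trino, but real clusters need more tuning):

```properties
# etc/config.properties on the coordinator
coordinator=true
node-scheduler.include-coordinator=false   # don't schedule work on the brain
http-server.http.port=8080
discovery.uri=http://coordinator.example.com:8080

# etc/config.properties on each worker
# coordinator=false
# http-server.http.port=8080
# discovery.uri=http://coordinator.example.com:8080
```

Scaling out is then just starting more worker nodes pointed at the same discovery URI.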

The execution model is pipelined and in-memory: data flows between stages as soon as it's ready, rather than landing on disk between stages the way the old MapReduce / Hive model did. This is the main reason Trino is fast: there is no per-stage I/O overhead and no disk-spill checkpoint, so a well-tuned Trino query can return results in seconds against data that would have taken minutes in Hive.

The cost of this design is that Trino is fundamentally optimized for interactive queries that finish in seconds to minutes, not for long-running fault-tolerant batch jobs. If a worker dies mid-query, by default, the entire query fails. The 2022-2023 "Project Tardigrade" / fault-tolerant execution work added retry capability, which helps for long-running ETL workloads, but Trino is still a worse choice than Spark for hours-long jobs.
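Fault-tolerant execution is opt-in and requires an external exchange location for spooling intermediate data. A hedged sketch of the relevant configuration, with a hypothetical S3 bucket (property names are from the fault-tolerant execution feature; exact values depend on your deployment):

```properties
# etc/config.properties — enable task-level retries
retry-policy=TASK

# etc/exchange-manager.properties — where intermediate data is spooled
exchange-manager.name=filesystem
exchange.base-directories=s3://my-trino-exchange-bucket   # hypothetical bucket
```

With `retry-policy=TASK`, a failed task is retried from spooled data instead of failing the whole query, at the cost of the extra exchange I/O that the pipelined model normally avoids.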

What Trino Is Good At

Federation, first and foremost. Trino has connectors for 50+ data sources — Hive, Iceberg, Delta Lake, Postgres, MySQL, MongoDB, Cassandra, Kafka, Elasticsearch, Snowflake, BigQuery, Redshift, SAP HANA, even Google Sheets. A single query can join across any of them. Nothing else in the open-source world does federation this comprehensively, and the catalog of connectors is the single biggest reason large enterprises adopt Trino.
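Each source is wired up as a catalog: one small properties file per source in `etc/catalog/`. A sketch for a Postgres source, with hypothetical connection details (the `connector.name` and `connection-url` keys are standard for JDBC-style connectors; credentials would normally come from a secrets mechanism):

```properties
# etc/catalog/postgresql.properties — hypothetical example values
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.com:5432/analytics
connection-user=trino
connection-password=changeme
```

Once the file is in place, tables appear under `postgresql.<schema>.<table>` and can be joined against any other catalog.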

Lake querying at scale. Trino's columnar reader and vectorized execution let it scan Parquet and ORC files extremely efficiently. It supports the three big open table formats — Apache Iceberg, Delta Lake, and Apache Hudi — as first-class citizens. For many large enterprises with petabyte-scale lakes, Trino is the SQL engine of record.

ANSI SQL compliance. Unlike some of its competitors, Trino aims to be a real, standards-compliant SQL engine. Window functions, CTEs, complex subqueries, lateral joins, recursive queries — all of it. This matters because BI tools generate SQL on the assumption that the engine is a real database, and Trino mostly delivers on that assumption.
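For example, the kind of CTE-plus-window-function query a BI tool routinely generates runs unmodified. An illustrative query against a hypothetical `orders` table:

```sql
-- Latest order per customer: CTE + window function, standard ANSI SQL.
WITH ranked AS (
  SELECT customer_id,
         order_date,
         amount,
         row_number() OVER (PARTITION BY customer_id
                            ORDER BY order_date DESC) AS rn
  FROM orders
)
SELECT customer_id, order_date, amount
FROM ranked
WHERE rn = 1;
```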

Connector extensibility. The connector SPI (service provider interface) is genuinely well-designed. Building a new connector for a new data source is a one-to-three week project for a competent Java engineer, and the community has used this to build connectors for almost every storage system that exists.

What Trino Is Not Good At

Long-running, fault-tolerant batch jobs. As described above. Use Spark for hours-long ETL jobs.

Updates and deletes on operational data. Trino can write to Iceberg and Delta tables, including merge operations, but it is not a transactional database. It is not where you go for OLTP workloads or ACID-heavy operational data.
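The merge support it does have looks like standard SQL `MERGE`. A sketch against a hypothetical Iceberg catalog and staging table (Trino supports `MERGE` on Iceberg tables; this is batch upsert, not OLTP):

```sql
-- Batch upsert into an Iceberg table; names are illustrative.
MERGE INTO iceberg.warehouse.customers AS t
USING iceberg.warehouse.staging_updates AS s
  ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET email = s.email
WHEN NOT MATCHED THEN
  INSERT (id, email) VALUES (s.id, s.email);
```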

Caching. Out of the box, vanilla Trino doesn't cache much. Every query re-reads from the underlying storage. There are workarounds (caching connectors, intermediate cache servers, materialized views), but it's not a core strength of the OSS project. Starburst's commercial distribution adds Warp Speed indexing/caching, which is one of the bigger reasons enterprises pay for Starburst rather than running open-source Trino themselves.

Cost-based optimization in the OSS. Vanilla Trino has a relatively basic query planner. It will pick the right join order and the right join strategy if you give it good statistics, but the OSS planner has historically lagged behind Starburst's commercial CBO and behind the warehouse vendors' planners. This is improving, but it's still a real gap.
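"Good statistics" is something you have to provide: the planner only picks good join orders if table stats exist. The standard commands, shown against a hypothetical catalog (both are real Trino statements):

```sql
-- Collect statistics so the cost-based optimizer has something to work with.
ANALYZE hive.sales.orders;

-- Inspect what the planner will see (row counts, NDVs, null fractions).
SHOW STATS FOR hive.sales.orders;
```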

Who Uses Trino in Production

Trino's adoption in production at large tech companies is one of its strongest selling points. Public references include:

  • Netflix — one of the original heavy Presto users, still a major Trino user. Used for ad-hoc analytics, data science workloads, and BI on petabyte-scale data lakes.
  • LinkedIn — migrated from PrestoDB to Trino and is one of the largest single Trino deployments in the world.
  • Goldman Sachs — uses Trino as the core query engine for the Goldman data lake and contributes engineering effort upstream.
  • Stripe, Pinterest, Lyft, Bloomberg — all known production Trino users, several of them committers to the project.
  • Comcast, Salesforce, Shopify — enterprise users, often via Starburst commercial.

Outside of the named tech companies, Trino is also the engine inside several other vendor products: Amazon Athena began as a managed Presto fork and its newer engine versions are built on Trino, Ahana (acquired by IBM in 2023) was a managed Presto vendor, and Google's Dataproc and Azure HDInsight both offer Trino as a managed component. Even where Trino isn't the user-facing brand, it is often the engine doing the work.

How Trino Compares to Other Query Engines

| Engine | Best at | Worst at |
| --- | --- | --- |
| Trino | Federated SQL across many sources; lake queries at interactive speed | Long batch jobs; out-of-the-box caching |
| Presto | Same as Trino historically; now mostly Meta-internal | Falling behind Trino on contributor velocity |
| Databricks Photon | High-performance vectorized SQL on Delta Lake inside Databricks | Tightly coupled to Databricks; not portable |
| DuckDB | Single-node analytics; embedded in apps | Distributed workloads |
| Dremio | Iceberg-first lakehouse with reflections / acceleration | Smaller ecosystem than Trino |
| Hive | Stable batch SQL on HDFS for legacy Hadoop shops | Interactive performance |

The core competitive frame in 2026: Trino is the dominant open federated query engine, with the broadest connector ecosystem and the largest production deployments. Photon is faster on Delta Lake inside Databricks but isn't sold separately. Dremio is the closest direct competitor but has a much smaller community. The cloud warehouses (Snowflake, BigQuery, Redshift) are absorbing federation features into their own engines, which is the longest-term competitive pressure on Trino.

How TextQL Works with Trino

TextQL Ana connects natively to Trino (and to Starburst Galaxy and Enterprise). For organizations running a federated stack — some data in Snowflake, some in Postgres, some in S3 Iceberg tables — pointing Ana at a Trino coordinator gives a single SQL endpoint that reaches everything. The federation happens inside Trino, which is exactly where it should happen, and Ana operates against that single endpoint as if it were a normal warehouse.

See TextQL in action

Trino
Project type Open-source distributed SQL query engine
License Apache 2.0
Renamed from PrestoSQL (December 2020)
Original Presto Built at Facebook, 2012
Written in Java
Governance Trino Software Foundation
Commercial sponsor Starburst
Category Query Engines
Monthly mindshare ~80K · ~10K GitHub stars; the Presto fork that won; active community