Vendors
Vendors are the companies that build and sell data tools. Some bundle their data products into a broader cloud platform (AWS, Google, Microsoft); others are pure-plays focused on a single category (Snowflake, Databricks, dbt Labs).
The wiki has a page for almost every important product in the modern data stack: Snowflake, BigQuery, Power BI, Trino, dbt, Tableau, and so on. This section is about the companies behind those products: who founded them, what year, what else they own, and what their actual commercial strategy looks like.
The reason this distinction matters: most of the famous "products" in data are not standalone companies. BigQuery is a feature of Google Cloud. Power BI is a feature of Microsoft 365. Redshift is a feature of AWS. You cannot understand why those products behave the way they do — pricing, packaging, integration, roadmap — without understanding the parent company's broader strategy.
There are basically two business models, and almost every vendor fits one of them.
1. The hyperscaler bundle. AWS, Google Cloud, and Microsoft Azure all sell hundreds of cloud services. Data products are one slice of a much larger menu, and the strategic logic is to make sure no customer ever has a reason to leave the bundle. Each hyperscaler builds (or buys) at least one entry in every data category — a warehouse, an object store, an ETL tool, a BI tool, a streaming platform, an ML platform — not because each individual product is best in class, but because the bundle is the product. The moat is the AWS account, the Azure tenant, the GCP project, and the existing committed spend.
2. The category pure-play. Snowflake only sells a data warehouse (and adjacent things). Databricks only sells a lakehouse platform. dbt Labs only sells transformation tooling. Starburst only sells federated SQL. Confluent only sells Kafka. These companies live or die on whether their single product is best in class, and they have to fight the hyperscalers' "good enough and already in your AWS bill" alternative every quarter. The moat is product excellence and developer love, not the bundle.
Then there's a third, weirder category: the roll-up. Salesforce has acquired its way into being a major data vendor (Tableau in 2019 for $15.7B, MuleSoft in 2018 for $6.5B, Informatica in 2025). IBM did it earlier (Cognos, SPSS, Red Hat). Oracle has been doing it for thirty years. The strategy here isn't "best product" or "best bundle" — it's "buy the leader in each category and extract rent through the existing enterprise relationship."
| Vendor | Type | Key data products in the wiki |
|---|---|---|
| AWS | Hyperscaler | Redshift, S3, Kinesis, QuickSight, SageMaker |
| Google Cloud | Hyperscaler | BigQuery, GCS, Looker |
| Microsoft | Hyperscaler | Power BI, Azure Blob Storage |
| Snowflake | Pure-play | Snowflake |
| Databricks | Pure-play | Databricks, Photon, Databricks ML |
| Starburst | Pure-play | Trino, Starburst Galaxy |
| dbt Labs | Pure-play | dbt Core, dbt Cloud, dbt Semantic Layer |
| Confluent | Pure-play | Confluent Kafka |
| Salesforce | Roll-up | Tableau, Informatica, Data Cloud, MuleSoft |
Each vendor page in this section follows the same shape:
This is the opposite of how vendor websites work. Vendor websites are designed to present each product in the most favorable possible light against an idealized "before" state. The vendor pages here try to give you the same picture a sober analyst would: real history, real numbers, real tradeoffs.
You might reasonably ask: "I just want to know whether to use BigQuery or Snowflake. Why do I care about Google's broader cloud strategy?"
Because the strategy is the product. BigQuery's pricing model (pay per byte scanned), its serverless architecture, and its tight coupling to Google Cloud Storage are all consequences of Google running it as a feature inside GCP rather than as a standalone business. Snowflake's multi-cloud portability, virtual warehouse model, and consumption-based credits are consequences of Snowflake being a venture-backed pure-play that has to win against three hyperscalers simultaneously. The architectural decisions follow from the business model.
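To make the pricing contrast concrete, here is a minimal Python sketch of the two billing shapes described above: paying per byte scanned versus paying for compute time in credits. Every rate below is a hypothetical placeholder rather than a published price, and the class and method names are illustrative only; neither vendor exposes an API like this.

```python
from dataclasses import dataclass

TIB = 1024 ** 4  # bytes in one tebibyte


@dataclass(frozen=True)
class ScanPricing:
    """Per-byte-scanned billing (the BigQuery-style shape)."""
    usd_per_tib: float  # hypothetical rate, not a published price

    def query_cost(self, bytes_scanned: int) -> float:
        # Cost depends only on how much data the query reads.
        return self.usd_per_tib * bytes_scanned / TIB


@dataclass(frozen=True)
class CreditPricing:
    """Compute-time billing in credits (the Snowflake-style shape)."""
    credits_per_hour: float  # hypothetical warehouse size
    usd_per_credit: float    # hypothetical rate, not a published price

    def runtime_cost(self, seconds_running: float) -> float:
        # Cost depends only on how long the warehouse runs,
        # not on how many bytes the queries read.
        return self.credits_per_hour * self.usd_per_credit * seconds_running / 3600


# Placeholder numbers chosen only to keep the arithmetic easy to follow.
scan = ScanPricing(usd_per_tib=5.0)
credits = CreditPricing(credits_per_hour=2.0, usd_per_credit=3.0)

# One query that scans 2 TiB, versus a warehouse that runs for 30 minutes.
scan_cost = scan.query_cost(2 * TIB)
credit_cost = credits.runtime_cost(30 * 60)
```

The point is the shape, not the numbers. In this simplified model the first bill grows with data read and ignores runtime, while the second grows with runtime and ignores data read, which is why the two products reward different habits (pruning scans versus keeping compute from sitting idle).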
Same thing with Power BI. The reason Power BI dominates BI today is not that it has better visualizations than Tableau (it doesn't, mostly). It's that Microsoft sells it as part of E5 licensing, where it shows up at near-zero marginal cost. Once you understand that, every decision Microsoft makes about Power BI — the integration with Excel, the lukewarm Mac story, the heavy push into Fabric — becomes obvious.
Vendor pages try to make those forces visible.
TextQL Ana is a layer that sits on top of whatever data stack a customer already has. We connect to the warehouse, the BI tool, the catalog, and the semantic layer regardless of vendor — Snowflake or BigQuery or Databricks or Redshift, Tableau or Power BI or Looker. Our job is to make the existing stack more useful, not to replace any of it. So we end up with strong opinions about every major vendor, and the pages in this section are how we share them.