
Google Cloud Platform

Google Cloud is the third-largest hyperscaler and home to BigQuery, the most technically impressive cloud data warehouse on the market. Google has world-class data engineering research and a famously underwhelming go-to-market motion. The result is a data portfolio that punches above its market share.

Google Cloud Platform is the third-largest hyperscaler — behind AWS and Microsoft Azure — and the home of BigQuery, which is on most technical merits the best cloud data warehouse on the market. The contradiction at the heart of GCP's data story is that the product is genuinely excellent and the company is famously bad at selling it. If Google were as good at go-to-market as Snowflake, BigQuery would already be the dominant cloud warehouse. It isn't, and that gap is the most interesting thing about Google Cloud.

In plain English: Google Cloud sells the same general menu of cloud services as AWS — compute, storage, networking, databases, ML — but the data products are the part Google is actually proud of. Most of them are commercial versions of papers Google research published years before anyone else figured out how to build them.

Origin Story: Born from Papers, Not from Customers

Google Cloud's data portfolio is descended directly from Google's internal infrastructure, and that infrastructure is descended from a famous sequence of research papers. GFS (the Google File System) was published in 2003. MapReduce in 2004. Bigtable in 2006. Dremel — the underlying engine for BigQuery — was published in 2010. Spanner in 2012. Each of these papers triggered an open-source clone (Hadoop from GFS+MapReduce; HBase from Bigtable; Drill and Impala from Dremel; CockroachDB from Spanner) and each clone became a multibillion-dollar industry.

The strategic decision Google made starting around 2008 was: instead of letting the open-source ecosystem keep eating Google's research, Google would commercialize the originals. App Engine launched in April 2008 as the very first GCP product. Google Cloud Storage launched in 2010. BigQuery launched the same year, initially as a way for Google's ad-tech customers to query their own DoubleClick data, then opened to general availability in 2011-2012. The team that built Dremel inside Google — Andrew Fikes, Sergey Melnik, and others — essentially shipped their internal tool as a public product.

For the next decade, Google Cloud was a curious market presence: the products were unusually good, Google's brand in technical communities was unusually strong, and yet the actual revenue was a fraction of AWS. The conventional explanation is that Google didn't take enterprise sales seriously until Thomas Kurian — formerly president of product development at Oracle — took over Google Cloud as CEO in November 2018. Under Kurian, GCP got a real enterprise sales force, real partner programs, real long-term commitments, and the data products started showing up in big procurements. The biggest single bet of the Kurian era was the 2019 acquisition of Looker for $2.6 billion, which gave Google a credible BI seat at the table for the first time.

Their Data Products

  • Google BigQuery — The crown jewel. A serverless cloud data warehouse based on the Dremel architecture, with full separation of storage and compute, query-level pricing, and a SQL dialect that has gradually become very nearly ANSI-compliant. BigQuery introduced — and arguably still does best — the "no clusters, no warehouses, just SQL" model. It also has the most defensible AI/ML integration story of any cloud warehouse via BigQuery ML. (A minimal query sketch follows this list.)
  • Google Cloud Storage — GCP's object storage. Largely S3-compatible at the API level, with strong consistency, tight integration with BigQuery via BigLake, and the same role as S3 in the AWS world: the substrate everything else is built on.
  • Looker — A modeling-first BI tool with its own semantic layer language (LookML). Acquired by Google in 2019 for $2.6 billion. Looker invented the modern semantic-layer-as-product category in the early 2010s and is one of the few BI tools that's still actually opinionated about how data should be modeled. Under Google, Looker has been somewhat slow to evolve, but it remains the cleanest "warehouse-native" BI architecture.
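To make the "no clusters, no warehouses, just SQL" point concrete, here is a minimal sketch using the official google-cloud-bigquery Python client against one of Google's public datasets. The project ID is a placeholder, and it assumes application-default credentials are configured; it is an illustration of the model, not a prescribed setup.

```python
# Minimal BigQuery sketch: submit SQL, get rows back. No warehouse to resume,
# no cluster to size. Assumes `pip install google-cloud-bigquery` and
# application-default credentials; the project ID below is a placeholder.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
query_job = client.query(sql)        # submits the job to BigQuery
for row in query_job.result():       # waits for completion, then iterates rows
    print(row["name"], row["total"])

# BigQuery ML follows the same pattern: training a model is just another SQL
# statement, e.g. CREATE MODEL `dataset.my_model`
# OPTIONS(model_type='linear_reg', input_label_cols=['label']) AS SELECT ...
```

The point of the sketch is the absence of infrastructure: there is no cluster endpoint or warehouse to resume, only a project and a SQL string.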

Other GCP data products that don't yet have a wiki page but are part of the broader data stack:

  • Dataflow — A managed runner for Apache Beam, doing both streaming and batch under the same programming model (see the pipeline sketch after this list). Beam itself was open-sourced from Google's internal MillWheel and FlumeJava systems. Dataflow is one of GCP's underrated products: it's the cleanest streaming model in the cloud.
  • Dataproc — Managed Spark and Hadoop. The "we have to ship this because customers ask for it" entry, equivalent to AWS EMR.
  • Pub/Sub — Globally distributed messaging. The Kinesis/Kafka equivalent, with a different (and arguably nicer) abstraction.
  • Vertex AI — The ML platform, formed in 2021 by merging the older AI Platform with AutoML. Now also the home of the Gemini model family for cloud customers.
  • Dataform — A SQL-based transformation tool, acquired in late 2020 and re-released as a free part of BigQuery. The dbt competitor that comes in the box, though it has not displaced dbt in practice.
  • Dataplex — The governance / catalog / metadata layer. Google's answer to Lake Formation and Unity Catalog.
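As a rough illustration of the Beam model that Dataflow executes, here is a minimal batch pipeline sketch in Python. The gs:// paths are placeholders, and the same code shape runs on Dataflow by switching the runner and supplying project, region, and temp-location options; treat it as a sketch of the programming model, not a production pipeline.

```python
# Minimal Apache Beam pipeline of the kind Dataflow runs. Assumes
# `pip install apache-beam[gcp]`; the gs:// paths below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# DirectRunner executes locally; pass runner="DataflowRunner" (plus project,
# region, and temp_location options) to run the identical code on Dataflow.
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read"  >> beam.io.ReadFromText("gs://my-bucket/events/*.txt")
        | "Clean" >> beam.Map(str.strip)
        | "Count" >> beam.combiners.Count.Globally()
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/event_count")
    )

# Swapping ReadFromText for beam.io.ReadFromPubSub (plus a windowing step)
# turns the same pipeline shape into a streaming job, which is the point of
# Beam's unified batch/streaming model.
```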

The Strategy: Win on Engineering, Lose on Sales

Google Cloud's data strategy is the opposite of AWS's. Where AWS ships "good enough" products and wins on bundle and integration, Google ships technically excellent products and consistently fails to convert that excellence into market share. There are a few reasons for this, and they're all worth being honest about.

1. Google built BigQuery for itself, not for the market. BigQuery's pricing model (pay per byte scanned) made perfect sense to Google engineers and was completely alien to enterprise procurement teams in 2012. Snowflake — founded the same year — chose a much more familiar consumption-based credit model and absolutely cleaned up in head-to-head deals despite being technically less impressive. Google has since added flat-rate pricing and reservations, but the early reputation damage was permanent.
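The bytes-scanned model is at least easy to inspect: a dry-run query reports how many bytes it would scan without executing or billing anything. A minimal sketch with the Python client follows; the project ID is a placeholder and the per-TiB price is an assumption you should check against current on-demand list pricing.

```python
# Estimating BigQuery on-demand cost before running a query: a dry run returns
# the bytes that would be scanned without executing anything. The project ID
# and the per-TiB price below are placeholders / assumptions to verify.
from google.cloud import bigquery

ON_DEMAND_USD_PER_TIB = 6.25  # assumption: verify against current list price

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

job = client.query(
    "SELECT name, number FROM `bigquery-public-data.usa_names.usa_1910_2013`",
    job_config=job_config,
)

tib = job.total_bytes_processed / 2**40
print(f"Would scan {job.total_bytes_processed:,} bytes "
      f"(~{tib:.4f} TiB, ~${tib * ON_DEMAND_USD_PER_TIB:.4f} on-demand)")
```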

2. Google Cloud's enterprise muscle is recent. Until 2018, Google Cloud was run by engineers, for engineers. The Kurian-era reorganization fixed a lot of this, but enterprise procurement has long memories. Big regulated customers still default to AWS or Azure for the "nobody got fired for buying" reasons, and Google has to buy its way into those deals via aggressive discounts.

3. The Google brand is a double-edged sword in B2B. Google means "the consumer ad company that kills products you depend on." Enterprise CTOs have spent fifteen years watching Google Reader, Google Inbox, Google Code, Stadia, and dozens of other products get cancelled. That memory makes them nervous about staking a seven-year data platform decision on Google Cloud, no matter how good BigQuery is.

The post-2023 wrinkle in this story is AI. Google has the strongest in-house model story of any hyperscaler — Gemini, the TPU stack, DeepMind — and that has given Google Cloud a real second wind. AI workloads tend to need large amounts of object storage, fast vector search, and tight integration with managed model APIs, all of which Google Cloud actually does well. Google's bet for the rest of the 2020s is that the AI tide lifts BigQuery and GCS along with it. That bet looks reasonable.

Honest Market Take

If you were starting a greenfield data platform in 2026 and could pick any vendor on technical merit alone, BigQuery is probably the right answer for most analytical workloads. It's serverless in a way Snowflake still isn't, the query optimizer is excellent, the storage layer is the most cost-efficient, the AI/ML integration is the most credible, and the overall operational complexity is the lowest of any cloud warehouse. That's the part of the story Google is right about.

The part Google is wrong about is everything that happens outside of BigQuery. Looker has stagnated under Google ownership and is no longer the modeling thought leader it was in 2018. Dataform is a dbt competitor that nobody uses. Dataplex is a governance product that nobody mentions. Vertex AI is a perfectly serviceable ML platform that exists in the shadow of Databricks and SageMaker. The overall portfolio feels like "BigQuery and some other stuff," not like a coherent data platform the way Snowflake's or Databricks' do.

The trajectory question is whether Google can reorganize its data stack around BigQuery and the AI primitives convincingly enough to start growing share against Snowflake and Databricks. Recent moves — the Iceberg integration, BigQuery ML's expansion, the Gemini integrations — are pointing in the right direction. But Google has been "about to win cloud data" for ten years now, and the market keeps not cooperating.

How TextQL Works with Google Cloud

TextQL Ana connects natively to BigQuery and Looker, and many TextQL customers run on GCP. The BigQuery + Looker + TextQL stack is one of the cleanest end-to-end architectures we see: BigQuery as the warehouse, LookML as the semantic layer, and Ana as the natural-language interface on top. Because BigQuery is serverless and billed per query by default, it's also one of the easiest backends to point an AI agent at — there's no warehouse to "wake up" and no cluster to mis-size.

See TextQL in action

Google Cloud Platform
Founded: 2008 (App Engine launch)
Parent: Alphabet, Inc. (NASDAQ: GOOGL)
Headquarters: Mountain View, CA
CEO (Google Cloud): Thomas Kurian (since November 2018)
Annual revenue: ~$43B (FY 2024 Google Cloud segment)
Category: Hyperscaler / cloud platform
Crown jewel: BigQuery
Monthly mindshare: ~1M · third-place cloud; strong data engineering reputation