Amazon SageMaker
Amazon SageMaker is AWS's end-to-end machine learning platform, launched at AWS re:Invent in November 2017. It is, by raw customer count, almost certainly the largest ML platform in the world. Its position is the same as every other AWS service: not the best in any single dimension, but the default for any team that already lives in AWS, which turns out to be most enterprises. The default has enormous value in enterprise sales.
If Databricks ML is the data-team-led ML platform, SageMaker is the engineering-team-led ML platform. The buyer is usually a Director of ML Engineering whose team already runs production services on EC2, S3, and Lambda, and who wants ML to live in the same VPC, IAM, and billing system as everything else.
In 2016-2017, AWS noticed that machine learning was about to become a huge category and that AWS had no managed offering for it. Customers were stitching together EC2 instances, custom AMIs, S3 buckets, and DIY orchestration to train models. The internal Amazon ML teams (the ones training models for product recommendations, fulfillment, and Alexa) had built sophisticated internal tooling, but none of it was exposed to customers.
The decision to build SageMaker was made by AWS leadership specifically to commoditize the ML platform layer before any startup could build a defensible category around it. (The same playbook AWS ran against MongoDB with DocumentDB, against Elastic with OpenSearch, etc.) Andy Jassy announced SageMaker on stage in his re:Invent 2017 keynote with a typical AWS pitch: "managed Jupyter, managed training, managed deployment, all integrated with the AWS services you already use."
The first version of SageMaker was crude. Notebooks worked. Training worked. Deployment worked. Almost nothing else did. But the service shipped, AWS sales started selling it, and the product got better fast. By 2019, SageMaker had added Pipelines (for MLOps workflows), Experiments, Model Monitor, and a model registry. By 2021, it was a full end-to-end platform competitive with Databricks ML for AWS-native customers.
In 2024, AWS announced the next-generation SageMaker — a major redesign that reframed SageMaker as a unified data, analytics, and AI platform, integrating with Redshift, EMR, and Bedrock. This is partly a response to Databricks' lakehouse pitch, and partly an admission that the original SageMaker was too narrowly scoped to be the AWS answer to Databricks.
SageMaker is genuinely sprawling. The major components include:
- Studio notebooks — managed Jupyter for development
- Managed training jobs, plus HyperPod for large distributed runs
- Managed deployment — real-time endpoints, serverless endpoints, and batch transform jobs
- Pipelines, for MLOps workflows
- Experiments, for tracking training runs
- Model Monitor, for detecting drift in production
- A model registry, for versioning and approving models
This is a longer feature list than any single competitor. The flip side is that SageMaker has the fragmented-AWS-product-suite feel: the components are powerful but often inconsistent, with overlapping features and confusing names. (There are at least three different ways to deploy a model.)
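That parenthetical is not an exaggeration. As a rough illustration of the deployment sprawl, here is a sketch of three distinct deployment paths for the same trained model, expressed as the request payloads you would hand to boto3's SageMaker client. Every resource name and S3 path below is hypothetical, and nothing is actually sent to AWS:

```python
# Three ways to deploy one trained SageMaker model, as boto3-style request
# payloads. Names and S3 URIs are hypothetical; no AWS calls are made.

MODEL_DATA = "s3://example-bucket/models/propensity/model.tar.gz"  # hypothetical

# 1. Real-time endpoint: a long-lived instance, billed per instance-hour.
realtime_config = {
    "EndpointConfigName": "propensity-realtime",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "propensity-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
}

# 2. Serverless endpoint: scales to zero, billed per request duration.
serverless_config = {
    "EndpointConfigName": "propensity-serverless",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "propensity-model",
        "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
    }],
}

# 3. Batch transform: a one-off job that scores an S3 prefix and exits.
batch_job = {
    "TransformJobName": "propensity-weekly-batch",
    "ModelName": "propensity-model",
    "TransformInput": {"DataSource": {"S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://example-bucket/scoring-input/",  # hypothetical
    }}},
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/scoring-output/"},
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}

deployment_modes = {
    "realtime": realtime_config,
    "serverless": serverless_config,
    "batch": batch_job,
}
print(sorted(deployment_modes))  # → ['batch', 'realtime', 'serverless']
```

Each path has its own console pages, quotas, and pricing meter, which is exactly the kind of overlap newcomers find confusing.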
SageMaker is the inevitable choice for AWS-native organizations and the frustrating choice for everyone else. Its strengths are entirely about distribution and integration: every AWS account already has it, every AWS DevOps team already understands IAM and CloudFormation, and every AWS-native data pipeline can wire into SageMaker without leaving the VPC. For a Fortune 500 already running on AWS, picking anything other than SageMaker for ML is a meaningfully harder political conversation.
The frustrations are also real. The UX has historically been inferior to Databricks, Vertex AI, and most pure-play ML platforms. The component sprawl is overwhelming for newcomers. The classic AWS pricing model — billed by instance-hour for everything, with separate charges for storage, data transfer, and endpoints — is hard to predict and harder to optimize. And SageMaker has long had a reputation for shipping features that look great in keynotes but turn out to be limited in practice (see: the original SageMaker Pipelines).
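To make the pricing point concrete, a back-of-envelope estimate for a single always-on real-time endpoint already involves several independent meters. Every rate in this sketch is a hypothetical placeholder, not an actual AWS price; the point is the number of line items, not the totals:

```python
# Back-of-envelope monthly cost for one always-on SageMaker endpoint.
# All rates below are hypothetical placeholders, not real AWS prices.

HOURS_PER_MONTH = 730

instance_rate = 0.23                 # $/instance-hour (hypothetical)
instance_count = 2                   # two instances behind the endpoint
storage_gb, storage_rate = 50, 0.10  # attached volume, $/GB-month (hypothetical)
egress_gb, egress_rate = 200, 0.09   # data transfer out, $/GB (hypothetical)

compute = instance_rate * instance_count * HOURS_PER_MONTH  # instance-hours
storage = storage_gb * storage_rate                         # separate meter
transfer = egress_gb * egress_rate                          # separate meter

total = compute + storage + transfer
print(f"compute={compute:.2f} storage={storage:.2f} "
      f"transfer={transfer:.2f} total={total:.2f}")
```

Multiply this by dozens of endpoints, training jobs, and notebook instances across teams, and the difficulty of forecasting an actual bill becomes clear.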
The LLM era is mixed for SageMaker. On one hand, AWS Bedrock is a genuinely strong foundation model API and integrates cleanly with SageMaker. On the other hand, the most exciting LLM training and inference workloads are happening on specialized platforms (Together, Anyscale, Modal, MosaicML/Databricks), not on SageMaker. AWS is racing to close the gap with Trainium chips, HyperPod, and the next-gen SageMaker, but the LLM-era story is still being written.
The honest prediction: SageMaker will continue to be the largest ML platform by customer count for the foreseeable future, simply because AWS has the largest cloud customer base. It will not necessarily be the best, but it will be the most inevitable. Databricks will continue to win in data-team-led purchases, and SageMaker will continue to win in engineering-team-led purchases. Both will keep growing.
SageMaker lives inside AWS: there is no standalone product or separate contract, and it is billed, secured, and governed through the same AWS account, IAM policies, and VPC boundaries as everything else.
A typical SageMaker buyer is an ML platform team at an AWS-native enterprise that needs enterprise-grade ML infrastructure with the same governance and security as the rest of their AWS environment.
TextQL Ana connects to AWS data sources — Redshift, Athena, RDS, S3-via-Iceberg — to answer questions in natural language. When customers run SageMaker for classical ML, the outputs of those models (predictions, scores, segments) typically land back in Redshift or S3, and Ana can query those outputs alongside the rest of the warehouse data. A business user can ask "show me customers with the highest propensity-to-buy scores from last week's model run" and get an answer pulled from the table SageMaker wrote. TextQL is complementary to SageMaker: SageMaker builds the models, Ana lets business users query their outputs in plain English.
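The pattern described above reduces to ordinary SQL over the table a SageMaker job wrote. A minimal sketch of that query, using sqlite as a stand-in for Redshift (the table and column names are hypothetical):

```python
import sqlite3

# Simulate a warehouse table that a SageMaker batch scoring job populated,
# then run the SQL a question like "customers with the highest
# propensity-to-buy scores from last week's model run" compiles down to.
# sqlite stands in for Redshift; all names here are hypothetical.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE propensity_scores (
        customer_id    TEXT,
        score          REAL,
        model_run_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO propensity_scores VALUES (?, ?, ?)",
    [
        ("cust-001", 0.91, "2024-05-06"),
        ("cust-002", 0.47, "2024-05-06"),
        ("cust-003", 0.88, "2024-05-06"),
        ("cust-004", 0.95, "2024-04-29"),  # older run, excluded below
    ],
)

top = conn.execute("""
    SELECT customer_id, score
    FROM propensity_scores
    WHERE model_run_date = '2024-05-06'
    ORDER BY score DESC
    LIMIT 2
""").fetchall()
print(top)  # → [('cust-001', 0.91), ('cust-003', 0.88)]
```

The model-building happens in SageMaker; the querying is plain SQL over its outputs, which is the layer a natural-language interface sits on.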
See TextQL in action