Scale AI | TextQL


Scale AI

How Scale AI answers ~1,900 data requests per week across Ops, Finance, Growth & HR

One AI agent serving four business units — every answer governed through a single semantic layer owned by the Data team.

>50%

of Scale’s dbt code reviewed by Ana

1.9T

rows across tables handled in production

+74.9%

organic monthly adoption

"I suggest you try TextQL on your messiest datasets, hook it up to your worst codebase and documents, and ask the most complicated question that actually drives your business."

Heqing Huang, Director of Analytics, Scale AI


About Scale AI

Scale AI is a software company providing data labeling and development tools for artificial intelligence, operating a marketplace of hundreds of thousands of expert contributors across 73 countries.

Industry

Software services

Company Size

~1,200 employees

Headquarters

San Francisco, CA

Pain Point

Four business units with distinct data stacks, all bottlenecked on a shared Data team and a growing ticket queue

Products Used

Data Sources

Snowflake, dbt, Tableau, Redash

Every team self-served. Every query governed.


Scale AI produces billions of data points generated by hundreds of thousands of expert contributors across 73 countries. Running a business that complex means every department (Finance, Growth, Ops, HR) depends on data to operate. As requests piled up, the analytics backlog grew and turnaround slowed. The team that should have been building foundational data infrastructure was stuck fielding ad hoc queries instead.

Rather than hire another ten analysts, Director of Analytics Heqing Huang deployed TextQL’s Ana. Pre-loaded with Scale’s own data sources and governed through a single centralized semantic layer, Ana now resolves roughly 1,900 data requests per week and runs 42 active playbooks in production. Stakeholders across the company get self-serve answers in minutes — not days.

Queries answered per week

Leading fintech company's internal data agent usage: ~350
Scale AI's internal Ana usage: ~1,900 (5.4× throughput)

Source: a leading fintech company recently reported ~1.4K queries answered over 4 weeks (~350/wk).
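The 5.4× figure follows directly from the two weekly rates. A quick check of the arithmetic, using only the numbers quoted in the comparison (the variable names are illustrative):

```python
# Figures quoted in the comparison above.
fintech_queries_4_weeks = 1_400  # "~1.4K queries answered over 4 weeks"
scale_queries_per_week = 1_900   # Scale AI's approximate weekly Ana usage

fintech_per_week = fintech_queries_4_weeks / 4
throughput_ratio = scale_queries_per_week / fintech_per_week

print(f"{fintech_per_week:.0f} queries/week")  # 350 queries/week
print(f"{throughput_ratio:.1f}x throughput")   # 5.4x throughput
```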

[ THE FOUNDATION ]

How the Data team kept control while opening access

Every Ana deployment at Scale runs on the same data engineering foundation: dbt models, a Snowflake warehouse, and Tableau dashboards, all governed through a single semantic layer owned by the Data team. Teams get self-serve access; the Data team keeps ownership of what every metric actually means.

CONTROL

Every query governed. Nothing gets past the semantic layer.

BUSINESS UNITS

Finance: 420 req/wk
Growth: 510 req/wk
Ops: 380 req/wk
HR: 290 req/wk

CENTRAL SEMANTIC LAYER (OWNED BY THE DATA TEAM)

Ontology: metric definitions
Certified: dbt lineage
Access: row/col scoping
Audit: query logs

CONNECTED SYSTEMS

Snowflake, dbt, Tableau, Redash, CRM, Procurement, +7 more

Ana reads dbt model lineage directly, so every answer traces back through the same transformations that power Scale's certified dashboards in Tableau. When a team asks "what was revenue last quarter," Ana resolves it against the same model that finance uses in the board deck. No drift, no second source of truth.
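The "no second source of truth" idea can be pictured as a lookup in which every consumer resolves a metric name to one governed definition. The sketch below is a minimal illustration under stated assumptions, not a description of how Ana or TextQL is implemented: the `MetricDefinition` class, the `SEMANTIC_LAYER` registry, and the model name `fct_revenue_quarterly` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str       # business-facing metric name
    dbt_model: str  # hypothetical certified dbt model that computes it
    owner: str      # team accountable for the definition


# Hypothetical registry; a real semantic layer would hold far more metadata.
SEMANTIC_LAYER = {
    "revenue": MetricDefinition("revenue", "fct_revenue_quarterly", "data-team"),
}


def resolve_metric(metric_name: str) -> MetricDefinition:
    """Return the single governed definition, or refuse to answer."""
    key = metric_name.strip().lower()
    if key not in SEMANTIC_LAYER:
        raise KeyError(f"No governed definition for metric: {metric_name!r}")
    return SEMANTIC_LAYER[key]


# An agent question and a dashboard both resolve to the same model.
agent_answer_source = resolve_metric("Revenue").dbt_model
dashboard_source = resolve_metric("revenue").dbt_model
print(agent_answer_source == dashboard_source)  # True
```

Raising an error for an unknown metric, rather than guessing at a definition, is the design choice that keeps answers from drifting away from the certified models.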

[ TEAM CONFIGURATIONS ]

Same agent. Four different worlds.

The deployment wasn't one-size-fits-all. Each business unit got an Ana pre-loaded with the data sources they actually use — Growth sees CRM and pipeline telemetry, Finance sees procurement and cloud billing, Ops sees the contributor marketplace, HR sees HRIS and performance management data. Underneath, every query resolves against the same governed metric definitions.

Finance

~420 req/wk

Finance pulls from Snowflake billing tables, procurement software for BPO invoices, and AWS/GCP billing for infrastructure costs. These three systems rarely agree with each other, and reconciling them was a recurring time sink.

A question like “what’s our spend efficiency across campaigns?” required joining billing actuals against invoice line items against cloud compute, then defending the number in front of leadership.

Ana handles the cross-system join in a single query. Finance now runs campaign-level spend tracking, budget variance analysis, and cost allocation across programs without filing a ticket. The questions that matter most to the CFO’s office get answered in the meeting where they come up, not a week later.

Procurement, AWS Billing, GCP Billing
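To make the cross-system join concrete, here is a small sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the table and column names are hypothetical, the rows are made up, and "spend efficiency" is defined here as revenue per dollar of combined invoice and cloud spend, which may not match Scale's definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the three Finance sources.
    CREATE TABLE billing_actuals (campaign TEXT, revenue REAL);
    CREATE TABLE bpo_invoices (campaign TEXT, invoice_amount REAL);
    CREATE TABLE cloud_costs (campaign TEXT, compute_cost REAL);

    -- Made-up rows purely for illustration.
    INSERT INTO billing_actuals VALUES ('alpha', 1000.0), ('beta', 500.0);
    INSERT INTO bpo_invoices VALUES ('alpha', 300.0), ('alpha', 100.0), ('beta', 150.0);
    INSERT INTO cloud_costs VALUES ('alpha', 100.0), ('beta', 50.0);
    """
)

# Aggregate each source per campaign BEFORE joining, so that multiple
# invoice lines cannot multiply the revenue or compute rows (join fan-out).
QUERY = """
WITH rev AS (
    SELECT campaign, SUM(revenue) AS revenue
    FROM billing_actuals GROUP BY campaign
),
inv AS (
    SELECT campaign, SUM(invoice_amount) AS invoices
    FROM bpo_invoices GROUP BY campaign
),
cloud AS (
    SELECT campaign, SUM(compute_cost) AS compute
    FROM cloud_costs GROUP BY campaign
)
SELECT rev.campaign,
       rev.revenue,
       COALESCE(inv.invoices, 0) + COALESCE(cloud.compute, 0) AS spend
FROM rev
LEFT JOIN inv ON inv.campaign = rev.campaign
LEFT JOIN cloud ON cloud.campaign = rev.campaign
ORDER BY rev.campaign
"""

efficiency = {
    campaign: revenue / spend
    for campaign, revenue, spend in conn.execute(QUERY)
}
print(efficiency)  # {'alpha': 2.0, 'beta': 2.5}
```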

Growth

~510 req/wk

Growth tracks delivery metrics, free-trial conversion, and marketplace performance across Snowflake ETL pipelines, CRM data, and snapshot views that capture point-in-time funnel states.

The hard questions were never about any one system. They were about correlations across all three: “how do week-over-week delivery metrics relate to pipeline movement in the CRM, and where are free-trial conversions stalling?”

Ana reads the ETL schemas, understands the CRM object model, and reconciles snapshot timing differences automatically. Growth built playbooks that refresh weekly and deliver results into Slack channels. The team went from requesting reports to owning their analytics workflow entirely.

CRM

Ops

~380 req/wk

Supply ops manages contributor availability across a 73-country marketplace. Their data lives in Snowflake ETL tables, task and delivery tables tracking contributor output, and demand forecasting models.

The question is always some version of “do we have enough contributors with the right skills in the right regions for what’s coming in three months?” Answering it means joining availability data with task completion rates with forward-looking demand signals.

Ana runs those joins and produces capacity projections by region, skill type, and program. Ops uses it to flag constraints before they become delivery problems. The dashboards refresh automatically instead of requiring a manual pull every time leadership asks for an update.

Demand forecasting tables, Worker & supply data
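The capacity question described above reduces to comparing projected supply against forecast demand for each region and skill. A minimal sketch, assuming a deliberately simple model (capacity equals contributors times average tasks completed per contributor); the field names and numbers are made up and are not Scale's data or Ana's actual method:

```python
# Made-up segments: availability, completion rate, and forecast demand.
segments = [
    {"region": "EMEA", "skill": "coding", "contributors": 200,
     "tasks_per_contributor": 50, "forecast_tasks": 12_000},
    {"region": "APAC", "skill": "coding", "contributors": 100,
     "tasks_per_contributor": 40, "forecast_tasks": 3_000},
]


def capacity_gaps(rows):
    """Return projected capacity minus forecast demand per (region, skill)."""
    gaps = {}
    for row in rows:
        capacity = row["contributors"] * row["tasks_per_contributor"]
        gaps[(row["region"], row["skill"])] = capacity - row["forecast_tasks"]
    return gaps


gaps = capacity_gaps(segments)
# A negative gap means forecast demand exceeds projected capacity.
constrained = [segment for segment, gap in gaps.items() if gap < 0]
print(gaps)         # {('EMEA', 'coding'): -2000, ('APAC', 'coding'): 1000}
print(constrained)  # [('EMEA', 'coding')]
```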

HR

~290 req/wk

HR operates across their HRIS for headcount and contributor data, a performance management platform for reviews, and a workforce administration system for HR operations. Each system has its own ID schema and its own definition of “active.”

A question like “contributor retention by region” used to mean pulling from all three, reconciling the mismatches, and hoping the numbers held up in review.

Ana maps across the systems and handles the reconciliation. HR now runs retention breakdowns, workforce allocation analysis, and planning queries that inform hiring decisions on demand.

HRIS, Performance Management, Workforce Admin
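Reconciling three ID schemas and three meanings of "active" can be sketched as a crosswalk plus one governed rule. The example below is illustrative only: the system keys, IDs, status values, and the rule itself (a person counts as active only when every system agrees) are assumptions, not Scale's actual definitions.

```python
# Hypothetical crosswalk from each system's ID to one canonical person ID.
ID_CROSSWALK = {
    ("hris", "H-001"): "p1",
    ("perf", "PM-77"): "p1",
    ("admin", "WA-9"): "p1",
    ("hris", "H-002"): "p2",
    ("perf", "PM-78"): "p2",
    ("admin", "WA-10"): "p2",
}

# The status value that each system treats as "active" (made up).
ACTIVE_STATUS = {"hris": "employed", "perf": "in_cycle", "admin": "enabled"}

# Made-up records, one per person per system.
records = [
    {"system": "hris", "id": "H-001", "status": "employed"},
    {"system": "perf", "id": "PM-77", "status": "in_cycle"},
    {"system": "admin", "id": "WA-9", "status": "enabled"},
    {"system": "hris", "id": "H-002", "status": "employed"},
    {"system": "perf", "id": "PM-78", "status": "in_cycle"},
    {"system": "admin", "id": "WA-10", "status": "disabled"},
]


def governed_active(rows):
    """Assumed rule: a person is active only if every system says so."""
    by_person = {}
    for row in rows:
        person = ID_CROSSWALK[(row["system"], row["id"])]
        is_active = row["status"] == ACTIVE_STATUS[row["system"]]
        by_person[person] = by_person.get(person, True) and is_active
    return by_person


active = governed_active(records)
print(active)  # {'p1': True, 'p2': False}
```

Putting the rule in one function means a retention breakdown computed from any of the three systems starts from the same population.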

"The traditional BI players — Tableau, Snowflake — are not innovating as fast as these new tools that add value to our business."

Heqing Huang, Director of Analytics, Scale AI

[ THE RESULTS ]

Data scientists build models now, not reports.

Within nine months, Ana went from a pilot to the default analytics surface at Scale. The numbers tell the story: 28,000+ total messages sent, 11,500+ threads started, and a peak week of ~1,900 messages. Early adoption totaled ~400 messages in the first 30 days; the most recent 30-day window reached ~7,000, roughly a 17× increase, entirely pull-driven. The Data team’s inbound queue dropped dramatically, and they shifted from fulfilling requests to owning the semantic layer underneath all of it.

QUARTERLY IMPACT

From pilot to ~1,900 messages a week

[Chart: Weekly Messages to Ana, shown as average weekly messages per month. 28,000+ total messages across 11,500+ threads.]

Automated weekly reporting

Ana delivers breakdowns on supply capacity, demand forecasting, and contributor performance directly into Slack channels, including project health metrics, contributor retention analysis, and spend efficiency.

Dynamic dashboards in minutes

Any user can build a fully functioning, complex dashboard from scratch in under 45 minutes and schedule it to refresh automatically. What once required days of back-and-forth with the data team now happens in a single sitting.

On-demand business questions

Ana provides instant responses to exploratory analysis, scenario planning, and root-cause investigations, including ad-hoc requests from Scale’s executives on project timelines and marketplace dynamics.

Code-level transparency

Ana traces through dbt models and SQL logic to explain how metrics are constructed, enabling stakeholders to understand the analysis without pinging the data team for a walkthrough.

The expansion from analytics into the rest of the company happened the way the best platform rollouts do: someone used it, their colleague saw what it could do, and the next Slack message was “can I get access?” Finance saw what analytics was doing with Ana and started using it for spend efficiency. Growth picked it up for campaign analysis. Supply ops adopted it for contributor forecasting. HR used it for workforce planning. The 74.9% month-over-month user adoption growth wasn’t a launch metric. It was a trailing indicator of a tool that was already indispensable by the time anyone thought to measure it.

Scale stopped treating analyst bandwidth as the bottleneck for every data question in the company and gave every team the ability to answer their own questions at the speed the business actually moves.

"With TextQL, our analysts can now focus on high leverage tasks and the most challenging problems."

Heqing Huang, Director of Analytics, Scale AI