Data Governance & Security
Data governance and security tools control who can see which data, how it is masked or tokenized, and how compliance obligations are enforced. The category spans access control platforms, data masking, policy engines, and data protection suites.
Data governance and security is the layer of the stack that decides who is allowed to see which data, in what form, under what conditions, and with what audit trail. It is the least glamorous part of the modern data stack and the first one a regulator asks about.
The category is easy to conflate with data catalogs because the buyers overlap (Chief Data Officers, governance committees, privacy officers) and the vendors overlap (Collibra sells both, Atlan has governance features, Unity Catalog is both a catalog and an access control system). The cleanest way to separate the two: a catalog tells you what exists and what it means. A governance and security tool tells the warehouse whether to return a row, and if so, whether to mask the social security number before returning it.
Think of it like a library. The catalog is the index of every book on the shelf. Governance and security is the librarian who checks your ID before handing you a restricted book, crosses out sensitive sections with a marker before giving it to you, and writes down in a ledger that you checked it out. Both are necessary. They are different jobs.
Every vendor in this category does some subset of the following five things:
1. Access control — decide who sees what. The basic primitive is: a user (or service account) runs a query; the tool decides whether to allow it, deny it, or modify it. Traditional databases do this with role-based access control (RBAC): users are assigned roles, roles are granted privileges on tables and columns. Modern platforms also support attribute-based access control (ABAC), where decisions are made against user attributes (department, clearance, region) and data attributes (classification, sensitivity, jurisdiction). ABAC scales better because you write one policy like "users in EU cannot see data tagged pii_us" instead of a thousand role grants.
2. Data masking and tokenization. For a column like ssn or email, the tool returns either the real value, a masked value (XXX-XX-1234), a hashed value, a synthetic substitute, or null — depending on who is asking. This is critical for analytics teams that need to work with production data without ever actually seeing PII. Variations include dynamic masking (applied at query time), static masking (baked into a copy of the data), and format-preserving tokenization (where the masked value still looks like the original format so applications don't break).
3. Row-level security and filtering. Some users can see the full table; others can see only rows where region = 'EU' or customer_id IN (list). Row-level security (RLS) is the single most requested feature in regulated-industry governance deployments because it lets you share a single physical table across geographies and business units without copying it.
4. Policy authoring and centralization. Instead of writing grants against each warehouse separately, you write one policy (in a GUI or as code) and the tool pushes it down to Snowflake, Databricks, BigQuery, Redshift, and your lake. This is the main "platform" pitch: one policy plane across many data stores, so you don't have to duplicate governance logic per warehouse.
5. Audit, logging, and compliance reporting. Every access decision is logged, every query is recorded with the policies applied, and the tool produces reports that say "here is everyone who accessed HIPAA-regulated data in the last 90 days." Compliance officers ask for this constantly, and it is often the official reason the tool got bought.
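The five functions above can be sketched together in a few lines of Python. This is a toy, in-memory illustration and not any vendor's engine: the `User` class, the `COLUMN_TAGS` mapping, the attribute names (`region`, `clearance`), and the policy rules are all hypothetical, chosen only to show an attribute-based decision, dynamic masking, a row filter, and an audit record applied to a single query.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    name: str
    region: str      # user attribute used for row filtering
    clearance: str   # "pii" may see raw values; anything else gets masked output


# Hypothetical data attributes: column name -> classification tag.
COLUMN_TAGS = {"ssn": "pii", "email": "pii"}

# In a real deployment this would be durable, append-only storage.
AUDIT_LOG = []


def mask_ssn(value: str) -> str:
    """Dynamic masking: keep only the last four digits."""
    return "XXX-XX-" + value[-4:]


def hash_value(value: str) -> str:
    """Deterministic hash, so masked values can still be joined or counted."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


MASKERS = {"ssn": mask_ssn, "email": hash_value}


def row_visible(user: User, row: dict) -> bool:
    """Row-level security: users see only rows from their own region."""
    return row["region"] == user.region


def apply_column_policy(user: User, column: str, value):
    """ABAC decision per column: compare the data tag with the user's clearance."""
    if COLUMN_TAGS.get(column) == "pii" and user.clearance != "pii":
        return MASKERS[column](value)
    return value


def query(user: User, table: list[dict]) -> list[dict]:
    """Filter rows, mask columns, and record the access decision."""
    result = [
        {col: apply_column_policy(user, col, val) for col, val in row.items()}
        for row in table
        if row_visible(user, row)
    ]
    masked = sorted(
        col for col, tag in COLUMN_TAGS.items()
        if tag == "pii" and user.clearance != "pii"
    )
    AUDIT_LOG.append({
        "user": user.name,
        "at": datetime.now(timezone.utc).isoformat(),
        "rows_returned": len(result),
        "masked_columns": masked,
    })
    return result


CUSTOMERS = [
    {"customer_id": 1, "region": "EU", "ssn": "123-45-6789", "email": "a@example.com"},
    {"customer_id": 2, "region": "US", "ssn": "987-65-4321", "email": "b@example.com"},
]

analyst = User(name="analyst_eu", region="EU", clearance="none")
rows = query(analyst, CUSTOMERS)
print(rows)
print(AUDIT_LOG[-1])
```

With this sample data the EU analyst gets back only customer 1, with the SSN rendered as XXX-XX-6789 and the email replaced by a short hash. A user with `clearance="pii"` would see the raw values for the rows in their own region. The point of the sketch is that both users query the same physical table; only the policy outcome differs.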
The category splits into three meaningfully different types of product, and confusing them is the most common source of bad RFPs.
Governance suites (Collibra, Informatica, IBM). These started as business-glossary and policy-workflow tools and added execution (masking, RLS) later. They are top-down tools for compliance committees, with heavy process and light enforcement. Collibra's "Protect" module is the execution layer, but the product's center of gravity is policy authoring, stewardship, and reporting. Good fit for organizations where the audit trail matters more than millisecond query latency.
Access control platforms (Immuta, Privacera, Okera). These started as enforcement tools: policy engines that sit in front of or inside the warehouse and decide whether each query is allowed. They are bottom-up tools for data platform engineers and security teams, with deep technical integration and lighter workflow layers. Immuta, in particular, built its reputation around ABAC and fine-grained masking at query time. (Okera was acquired by Databricks in 2023.)
Warehouse-native (Snowflake Horizon, Databricks Unity Catalog, BigQuery). The cloud warehouses have aggressively added their own governance features over the last few years. Snowflake now ships dynamic masking, row access policies, object tagging, and access history. Databricks Unity Catalog includes column-level access control, lineage, and attribute-based policies. BigQuery has DLP, policy tags, and authorized views. For customers committed to a single platform, these are increasingly good enough, and they are killing the stand-alone access control category from below.
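To make the idea of pushing a policy down to a warehouse more concrete, here is a minimal Python sketch that renders one policy definition into Snowflake-style DDL. It only builds strings and never connects to a warehouse. The function names, the policy names (`ssn_mask`, `region_filter`), the table `analytics.customers`, and the role names are all hypothetical. The statements follow the general shape of Snowflake's CREATE MASKING POLICY and CREATE ROW ACCESS POLICY commands as commonly documented, and should be checked against current Snowflake documentation before use.

```python
def masking_policy_ddl(policy: str, allowed_roles: list[str]) -> str:
    """Render a masking policy: allowed roles see the raw value, others the last 4."""
    roles = ", ".join(f"'{r}'" for r in allowed_roles)
    return (
        f"CREATE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val\n"
        f"       ELSE 'XXX-XX-' || RIGHT(val, 4) END;"
    )


def attach_masking_ddl(table: str, column: str, policy: str) -> str:
    """Render the statement that binds a masking policy to a column."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"


def row_access_policy_ddl(policy: str, admin_role: str, region_by_role: dict[str, str]) -> str:
    """Render a row access policy: the admin sees all rows, other roles one region."""
    clauses = [f"CURRENT_ROLE() = '{admin_role}'"]
    clauses += [
        f"(CURRENT_ROLE() = '{role}' AND region = '{region}')"
        for role, region in region_by_role.items()
    ]
    predicate = "\n  OR ".join(clauses)
    return (
        f"CREATE ROW ACCESS POLICY {policy} AS (region STRING) RETURNS BOOLEAN ->\n"
        f"  {predicate};"
    )


def attach_row_access_ddl(table: str, policy: str, column: str) -> str:
    """Render the statement that binds a row access policy to a table."""
    return f"ALTER TABLE {table} ADD ROW ACCESS POLICY {policy} ON ({column});"


# Identifiers here are trusted constants; real tooling would validate or quote them.
statements = [
    masking_policy_ddl("ssn_mask", ["COMPLIANCE_OFFICER"]),
    attach_masking_ddl("analytics.customers", "ssn", "ssn_mask"),
    row_access_policy_ddl(
        "region_filter",
        admin_role="GLOBAL_ADMIN",
        region_by_role={"EU_ANALYST": "EU", "US_ANALYST": "US"},
    ),
    attach_row_access_ddl("analytics.customers", "region_filter", "region"),
]

for stmt in statements:
    print(stmt, end="\n\n")
```

A multi-engine policy platform does conceptually similar work, with one renderer per target engine. A single-platform customer can write the equivalent statements by hand, which is part of why warehouse-native features cover the simpler cases.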
The tension between these three camps is the most important dynamic in governance right now. Warehouse-native tools are eating the simpler use cases; third-party access control platforms (Immuta, Privacera) are being squeezed toward the multi-engine, multi-cloud, highly regulated segment where their platform-agnosticism still matters; and legacy governance suites (Collibra) are leaning harder on workflow and audit while partnering with or acquiring execution layers.
Immuta is the strongest pure-play access control vendor, with the deepest ABAC story and the tightest Snowflake and Databricks integrations. It wins at companies where the data estate spans multiple engines and a unified policy plane is genuinely needed. The long-term risk is warehouse-native convergence: if you only use Snowflake, Snowflake Horizon is catching up fast and is already good enough for many use cases.
Privacera is the commercial offering built on Apache Ranger by the original Ranger team, and it remains the default answer for Hadoop-heritage environments, lakes, and multi-engine architectures. It wins at companies with significant on-prem or lake workloads where Ranger-style policy enforcement is already familiar. It is less visible than Immuta in new cloud-native deals.
Collibra is the compliance-committee answer, not primarily an enforcement tool. Its strength is policy definition, glossary, and audit reporting. When Collibra is evaluated head-to-head against Immuta or Privacera on pure masking and RLS performance, it loses; when it is evaluated on its ability to satisfy a regulator, it wins.
Warehouse-native tools (Snowflake Horizon, Unity Catalog, BigQuery policy tags) are winning the simpler cases and are the default recommendation for single-platform customers. They are not yet adequate for multi-cloud or multi-engine estates, or for governance that must span many data stores, but the gap is closing.
BigID, OneTrust, and other privacy-first tools occupy a parallel category focused on data discovery, classification, and privacy workflows (subject access requests, data retention, consent management). They are worth knowing about but are oriented toward privacy and risk teams more than analytics teams.
Access control and AI analytics have a surprisingly intimate relationship. When a business user asks a natural-language question, the answer must respect all the same row-level, column-level, and masking policies as if they had written the SQL themselves. TextQL Ana executes generated queries under the identity of the asking user, so every query passes through Immuta, Privacera, Snowflake Horizon, or Unity Catalog the same way a human-written query would. The governance layer remains the source of truth for access decisions; Ana is just another consumer.
Two practical implications. First, existing masking and RLS policies automatically apply to AI-generated queries without any additional configuration — a major deployment advantage for regulated industries. Second, for customers who have already invested in a strong governance layer, TextQL inherits that investment rather than asking them to redo it inside an AI vendor's separate policy system.
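The identity-passthrough pattern described above can be illustrated with a small Python sketch. It is not TextQL's implementation: `ToyWarehouse`, `Session`, `generate_sql`, and `answer` are hypothetical stand-ins, and the SQL text is never parsed. The only point it makes is that the generated query runs under the asking user's session, so the warehouse applies the same row filter it would apply to hand-written SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user: str
    region: str  # attribute the warehouse uses for its row-level policy


class ToyWarehouse:
    """Stand-in for a governed warehouse: policy is enforced here, per session."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def execute(self, session: Session, sql: str) -> list[dict]:
        # The SQL text is ignored in this toy. What matters is that the
        # warehouse, not the caller, applies the row filter for the session.
        return [row for row in self._rows if row["region"] == session.region]


def generate_sql(question: str) -> str:
    """Placeholder for a text-to-SQL step; always returns the same query."""
    return "SELECT customer_id, region, revenue FROM sales"


def answer(warehouse: ToyWarehouse, asking_user: Session, question: str) -> list[dict]:
    sql = generate_sql(question)
    # Key design point: execute under the asking user's session rather than a
    # shared service account, so existing policies apply unchanged.
    return warehouse.execute(asking_user, sql)


SALES = [
    {"customer_id": 1, "region": "EU", "revenue": 120},
    {"customer_id": 2, "region": "US", "revenue": 340},
    {"customer_id": 3, "region": "EU", "revenue": 75},
]

warehouse = ToyWarehouse(SALES)
eu_rows = answer(warehouse, Session(user="eu_analyst", region="EU"), "What is revenue by customer?")
us_rows = answer(warehouse, Session(user="us_analyst", region="US"), "What is revenue by customer?")
print(eu_rows)
print(us_rows)
```

Both users ask the same question and the same SQL is generated, but each sees only the rows their own session is entitled to. Running everything through one privileged service account would bypass that filter, which is the failure mode this pattern avoids.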