Upsolver

Upsolver is a streaming ETL platform that lets you write SQL to define pipelines from event sources (Kafka, Kinesis, S3 event notifications, CDC streams) into destinations like S3, Iceberg tables, Snowflake, and Redshift. It is most accurately described not as an event streaming platform itself, but as a SQL-driven layer that sits on top of event streaming platforms to handle the messy job of getting events into queryable, well-structured analytical storage.

It was founded in 2014 in Tel Aviv by Ori Rafael and Yoni Eini, who came out of the Israeli ad-tech scene where the central problem was: "we have billions of events per day flowing through Kafka and Kinesis, and we need them in a queryable form in our lake without hiring a 10-person streaming team to babysit Spark and Flink jobs." Upsolver's pitch from day one has been to make that pipeline declarative rather than operational.

The Streaming ETL Problem Upsolver Solves

The dirty secret of event streaming is that getting data into Kafka is the easy part. Getting it out, in the right shape, to the right place, with no duplicates and no data loss, is where the engineering hours go. A typical "land Kafka events in our warehouse" project involves:

  • Choosing a stream processing framework (Flink, Spark Structured Streaming, Kafka Streams).
  • Writing code to deserialize messages, validate schemas, and handle malformed records.
  • Managing state for deduplication and exactly-once semantics.
  • Compacting small files in object storage so query engines don't choke on millions of 4KB Parquet files.
  • Maintaining indexes, partitions, and table metadata as data grows.
  • Handling schema evolution when the upstream producers change their event format.
  • Backfilling historical data when a new consumer wants to replay the stream.
  • Monitoring lag, throughput, and pipeline health.

This work is hard. It is also unglamorous and repetitive, and it is where data engineering teams burn most of their cycles. Upsolver's bet is that all of it can be expressed declaratively in SQL and managed by a hosted service that handles the operational complexity for you.
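To make the "managing state for deduplication" bullet concrete, here is a minimal, hand-rolled sketch of the bookkeeping a custom consumer has to carry. It is plain Python with in-memory state and a hypothetical class name, purely illustrative; real Flink or Spark jobs persist and checkpoint this state rather than keeping it in a dict.

```python
import time


class DedupState:
    """Tracks recently seen event IDs inside a retention window.

    A simplified stand-in for the keyed state a stream-processing
    job maintains; production pipelines persist and checkpoint it.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.seen = {}  # event_id -> last-seen timestamp

    def is_duplicate(self, event_id, now=None):
        now = time.time() if now is None else now
        # Evict IDs that have fallen out of the dedup window.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.window_seconds}
        if event_id in self.seen:
            return True
        self.seen[event_id] = now
        return False


state = DedupState(window_seconds=3600)
events = ["order-1", "order-2", "order-1", "order-3"]
unique = [e for e in events if not state.is_duplicate(e)]
# unique == ["order-1", "order-2", "order-3"]
```

Even this toy version has to decide on a retention window and an eviction policy; the production version also has to survive restarts, which is exactly the operational burden a managed service takes on.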

How Upsolver Actually Works

You point Upsolver at a source (a Kafka topic, a Kinesis stream, an S3 prefix with new file notifications, a CDC stream from a database) and write a SQL statement that describes the transformation and the destination. Upsolver compiles that SQL into a continuously running pipeline, manages the state, handles file compaction, enforces exactly-once semantics, and maintains the destination table.

A typical pipeline might look like:

CREATE SYNC JOB load_orders_to_iceberg
  RUN_INTERVAL = 1 MINUTE
  AS COPY FROM KAFKA my_kafka_connection TOPIC = 'orders'
  INTO ICEBERG my_catalog.analytics.orders;

That single statement gets you: continuous ingestion from Kafka, schema inference, automatic file compaction, exactly-once semantics, and an Iceberg table that downstream query engines (Snowflake, Trino, Athena, Spark) can read. The same pipeline written in raw Flink or Spark would be hundreds of lines of code plus a Kubernetes deployment plus a monitoring dashboard.

Upsolver also supports stateful transformations: windowed aggregations, joins with reference data, deduplication, late-arrival handling. The SQL syntax is extended to express streaming-specific concepts like watermarks and tumbling/sliding windows.
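The tumbling-window concept behind those SQL extensions is simple to state in code: each event is assigned to a bucket by truncating its timestamp to the window size. A minimal sketch of that mechanic in plain Python (illustrative only, not Upsolver's engine; watermarks and late-arrival handling are omitted):

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per tumbling window.

    events: iterable of (epoch_seconds, key) pairs.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Truncate the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "orders"), (30, "orders"), (65, "orders"), (130, "orders")]
counts = tumbling_counts(events, window_seconds=60)
# counts == {(0, 'orders'): 2, (60, 'orders'): 1, (120, 'orders'): 1}
```

A sliding window differs only in that each event lands in every window it overlaps; watermarks add the policy for deciding when a window is closed despite late events.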

Where Upsolver Fits in the Stack

The category Upsolver sits in is sometimes called streaming ETL or stream-to-lake ingestion. Conceptually, it overlaps with:

  • Apache Flink for stream processing — but Flink is a framework, not a managed service, and writing Flink jobs is real engineering work.
  • Confluent Cloud + ksqlDB — which solves a similar SQL-on-streams problem but is tied to Kafka and the Confluent ecosystem.
  • Snowpipe Streaming / Snowflake Dynamic Tables — Snowflake's first-party answer to "get streaming data into Snowflake with low latency and SQL-defined transforms."
  • Databricks Delta Live Tables / Lakeflow — Databricks's equivalent for Delta Lake.

The honest competitive picture: Upsolver was early and well-positioned in 2018-2020, when neither Snowflake nor Databricks had a credible streaming-ingest story. Both vendors have since closed the gap dramatically. Snowpipe Streaming and Snowflake's Dynamic Tables, plus Databricks's Delta Live Tables, now do much of what Upsolver does, with the substantial advantage of being native to the destination warehouse. This has compressed Upsolver's market.

Where Upsolver still wins:

  • Ingestion into open table formats (Iceberg) on cheap object storage, where you don't want to be locked into Snowflake's or Databricks's compute. Upsolver has invested heavily in being the best Iceberg ingestion path and is one of the cleanest options for "build a lakehouse on Iceberg without writing Spark jobs."
  • Multi-destination pipelines where the same source needs to land in multiple places (a lake for archive, a warehouse for BI, an OLAP database for real-time dashboards).
  • Heavy stream sources (Kafka, Kinesis) at scale where Snowpipe's per-row overhead becomes expensive.

The Honest Vendor Take

Upsolver is a sharp, narrow tool that does one thing well: declarative streaming ingestion from event sources into analytical storage, with operational complexity abstracted away. If you have a Kafka cluster and you need its data in Iceberg or Snowflake without building a Flink team, Upsolver is a legitimate option and often a better choice than trying to roll your own Spark Structured Streaming pipeline.

The strategic risk for Upsolver is the same risk faced by most "middleware between Kafka and the warehouse" vendors: the warehouses themselves keep absorbing the streaming-ingest job. Snowpipe Streaming, Dynamic Tables, Delta Live Tables, and BigQuery's streaming inserts all encroach on Upsolver's core use case. The competitive question is whether Upsolver can stay ahead by being cross-destination, lake-format-native, and lower-cost than the warehouse-native equivalents.

How TextQL Works with Upsolver

Upsolver does not store data that TextQL queries directly. Instead, Upsolver pipelines land data in destinations that TextQL connects to: Snowflake, Redshift, Databricks, or Iceberg tables on S3 queried via Athena/Trino. The role Upsolver plays in a TextQL stack is to make sure events from Kafka or Kinesis arrive in the warehouse fresh and well-structured, so that when a business user asks a question through TextQL Ana, the underlying data is already there in queryable form.

See TextQL in action

Upsolver
Founded: 2014, Tel Aviv, Israel
Founders: Ori Rafael (CEO), Yoni Eini
License: Proprietary, SaaS
Interface: SQL (declarative pipelines)
Primary use: Streaming ETL from Kafka/Kinesis to lakes and warehouses
Category: Event Streaming / Streaming ETL
Monthly mindshare: ~5K · niche streaming ETL; small customer base