Upsolver
Upsolver is a streaming ETL platform that lets you write SQL to define pipelines from event sources (Kafka, Kinesis, S3 event notifications, CDC streams) into destinations like S3, Iceberg tables, Snowflake, and Redshift. It is most accurately described not as an event streaming platform itself, but as a SQL-driven layer that sits on top of event streaming platforms to handle the messy job of getting events into queryable, well-structured analytical storage.
It was founded in 2014 in Tel Aviv by Ori Rafael and Yoni Eini, who came out of the Israeli ad-tech scene where the central problem was: "we have billions of events per day flowing through Kafka and Kinesis, and we need them in a queryable form in our lake without hiring a 10-person streaming team to babysit Spark and Flink jobs." Upsolver's pitch from day one has been to make that pipeline declarative rather than operational.
The dirty secret of event streaming is that getting data into Kafka is the easy part. Getting it out, in the right shape, to the right place, with no duplicates and no data loss, is where the engineering hours go. A typical "land Kafka events in our warehouse" project involves:
- inferring schemas and absorbing schema drift as producers change their payloads
- deduplicating events and guaranteeing exactly-once delivery with no data loss
- compacting streams of small files into query-efficient ones
- handling late-arriving events and managing pipeline state
- deploying, scaling, and monitoring the Spark or Flink jobs that do all of the above
This is hard. It is also unglamorous, repetitive, and where data engineering teams burn most of their cycles. Upsolver's bet is that all of this can be expressed declaratively in SQL and managed by a hosted service that handles the operational complexity for you.
You point Upsolver at a source (a Kafka topic, a Kinesis stream, an S3 prefix with new file notifications, a CDC stream from a database) and write a SQL statement that describes the transformation and the destination. Upsolver compiles that SQL into a continuously running pipeline, manages the state, handles file compaction, enforces exactly-once semantics, and maintains the destination table.
A typical pipeline might look like:
CREATE SYNC JOB load_orders_to_iceberg
AS COPY FROM KAFKA my_kafka_connection TOPIC = 'orders'
INTO ICEBERG my_catalog.analytics.orders
RUN_INTERVAL = 1 MINUTE;
That single statement gets you: continuous ingestion from Kafka, schema inference, automatic file compaction, exactly-once semantics, and an Iceberg table that downstream query engines (Snowflake, Trino, Athena, Spark) can read. The same pipeline written in raw Flink or Spark would be hundreds of lines of code plus a Kubernetes deployment plus a monitoring dashboard.
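Once the pipeline is running, any of those engines reads the result as an ordinary table. A sketch of what that looks like from, say, Trino or Athena (the table name follows the pipeline above; the column names and engine-side catalog mapping are assumptions, not from Upsolver's docs):

```sql
-- Query the Iceberg table landed by the Upsolver job above.
-- order_id and event_time are hypothetical column names; the
-- catalog/schema mapping depends on how your engine is configured.
SELECT order_id, COUNT(*) AS event_count
FROM analytics.orders
WHERE event_time > current_timestamp - INTERVAL '1' HOUR
GROUP BY order_id;
```

The point is that nothing downstream knows or cares that a streaming pipeline produced the table.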
Upsolver also supports stateful transformations: windowed aggregations, joins with reference data, deduplication, late-arrival handling. The SQL syntax is extended to express streaming-specific concepts like watermarks and tumbling/sliding windows.
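An illustrative sketch of what a stateful aggregation job looks like in Upsolver-style SQL. This is not verbatim Upsolver syntax: the window helper and source notation below are hypothetical stand-ins, and the exact keywords for windows and deduplication should be checked against Upsolver's documentation for your version.

```sql
-- Hypothetical sketch of a tumbling-window aggregation job.
-- TUMBLE(...) and the source notation are illustrative, not
-- confirmed Upsolver syntax.
CREATE SYNC JOB orders_per_minute
AS INSERT INTO ICEBERG my_catalog.analytics.orders_per_minute
   SELECT customer_id,
          COUNT(*) AS order_count
   FROM my_kafka_connection TOPIC = 'orders'
   GROUP BY customer_id,
            TUMBLE(event_time, INTERVAL '1' MINUTE);
```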
The category Upsolver sits in is sometimes called streaming ETL or stream-to-lake ingestion. Conceptually, it overlaps with:
- warehouse-native streaming ingestion: Snowpipe Streaming and Dynamic Tables on Snowflake, Delta Live Tables on Databricks, BigQuery's streaming inserts
- hand-built pipelines on Flink or Spark Structured Streaming, which trade a managed service for full control and an operational burden
The honest competitive picture: Upsolver was early and well-positioned in 2018-2020, when neither Snowflake nor Databricks had a credible streaming-ingest story. Both vendors have since closed the gap dramatically. Snowpipe Streaming and Snowflake's Dynamic Tables, plus Databricks's Delta Live Tables, now do much of what Upsolver does, with the substantial advantage of being native to the destination warehouse. This has compressed Upsolver's market.
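For a concrete sense of what "native to the destination warehouse" means, this is roughly what the Snowflake-side equivalent looks like with a Dynamic Table. The DDL shape (`TARGET_LAG`, `WAREHOUSE`) is real Snowflake syntax; the table, warehouse, and column names are illustrative:

```sql
-- Snowflake Dynamic Table: the warehouse itself keeps this table
-- refreshed from upstream data within the stated lag.
-- Object names (raw.orders, transform_wh, etc.) are assumptions.
CREATE OR REPLACE DYNAMIC TABLE analytics.orders_enriched
  TARGET_LAG = '1 minute'
  WAREHOUSE = transform_wh
  AS
  SELECT o.*, c.segment
  FROM raw.orders o
  JOIN raw.customers c ON o.customer_id = c.customer_id;
```

No separate vendor, no separate pipeline runtime: the refresh logic lives inside Snowflake, which is exactly the convenience that compresses Upsolver's market.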
Where Upsolver still wins:
- cross-destination stacks, where the same pipelines feed multiple warehouses and lake engines rather than one vendor's ecosystem
- lake-format-native architectures: Iceberg tables on S3 that Snowflake, Trino, Athena, and Spark can all read
- cost, when warehouse-native ingestion compute is more expensive than a dedicated ingestion layer
- teams that need streaming ingestion without hiring streaming engineers to run Flink or Spark
Upsolver is a sharp, narrow tool that does one thing well: declarative streaming ingestion from event sources into analytical storage, with operational complexity abstracted away. If you have a Kafka cluster and you need its data in Iceberg or Snowflake without building a Flink team, Upsolver is a legitimate option and often a better choice than trying to roll your own Spark Structured Streaming pipeline.
The strategic risk for Upsolver is the same risk faced by most "middleware between Kafka and the warehouse" vendors: the warehouses themselves keep absorbing the streaming-ingest job. Snowpipe Streaming, Dynamic Tables, Delta Live Tables, and BigQuery's streaming inserts all encroach on Upsolver's core use case. The competitive question is whether Upsolver can stay ahead by being cross-destination, lake-format-native, and lower-cost than the warehouse-native equivalents.
Upsolver does not store data that TextQL queries directly. Instead, Upsolver pipelines land data in destinations that TextQL connects to: Snowflake, Redshift, Databricks, or Iceberg tables on S3 queried via Athena/Trino. The role Upsolver plays in a TextQL stack is to make sure events from Kafka or Kinesis arrive in the warehouse fresh and well-structured, so that when a business user asks a question through TextQL Ana, the underlying data is already there in queryable form.