over the next 10 years…
a plain text layer will abstract away the need to organize database tables and write sql. data-driven questions will return the analysis (if present) or ingestion instructions (if absent).
data usage at companies will surge, as professionals in every org can query data without sql. proactive discovery will also drive up usage. the surge in data volume will benefit all companies on the data stack. data companies will switch to usage-based pricing (charging by data volume instead of per seat) to take advantage of the surge.
data analysis becomes fully automated; ai will discover trends and document them in pdfs and powerpoints. as analysis becomes commoditized, finding answers to our questions will be easy. the winners will anticipate the right questions and provide answers before they’re asked.
sql and database management abstracted away
i don't run into “invalid memory access of size 8” errors because i don't use risc-v. i don't run into “segmentation fault (core dumped)” errors because i don't write in c. most programmers don’t see these errors anymore because python has abstracted them away. large language model-powered analysis will do the same thing to sql and db management. data analysis of the future won't include “incorrect syntax near ‘table’. expecting ‘(’ or select” errors either.
writing sql or managing warehouse documentation will be skills only a handful of experts at data companies need to know. instead, custom models built for each warehouse will respond to plaintext with “here’s your data” or “this data isn’t available; here are 3 suggestions on how you can get this analysis…”. anyone with a question will receive the next steps to arrive at their conclusion. even developers will specify their sql in plaintext from within their ide.
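a minimal sketch of that response flow, assuming a tiny keyword-tagged catalog stands in for the custom model. `CATALOG`, `Answer`, and `route_question` are hypothetical names invented for illustration, and a real system would use a language model rather than keyword overlap:

```python
from dataclasses import dataclass, field

# hypothetical catalog: table name -> keywords describing what the table holds
CATALOG = {
    "orders": {"revenue", "orders", "sales"},
    "signups": {"signups", "users", "registrations"},
}


@dataclass
class Answer:
    available: bool
    message: str
    tables: list = field(default_factory=list)


def route_question(question: str, catalog: dict = CATALOG) -> Answer:
    """Return matching tables if the warehouse covers the question, else a fallback."""
    # normalize the question into a set of lowercase words without punctuation
    words = {w.strip("?.,!").lower() for w in question.split()}
    # a table matches when any of its keywords appears in the question
    matches = sorted(t for t, keywords in catalog.items() if words & keywords)
    if matches:
        return Answer(True, "here's your data", matches)
    # stand-in for the ingestion suggestions a real model would generate
    return Answer(False, "this data isn't available; suggest ingesting a new source")


print(route_question("what was revenue last week?"))
print(route_question("how many support tickets were opened?"))
```

the point of the sketch is the two-branch contract: every question gets either data or a next step, never a syntax error.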
analysts will focus on predictive analysis and persistent dashboards instead of ad-hoc queries. models will get so good at extracting trends that data analysts will focus on understanding the analysis they're fed. decision analysis to separate signal from noise will take the spotlight. data engineers will focus on ingesting new data instead of cleaning what's already there. automated data warehouse documentation will let business teams leverage any data that's stored.
smaller companies (50-150 headcount) with a data warehouse and no analyst team will be the first adopters. for them, everyone will make data-driven decisions by querying their warehouse in plain text. as incumbents notice their success, they’ll cross the chasm and follow behind.
easy analysis results in surge in request volume
the trend of data-driven decisions will accelerate as it becomes easier for anyone to ask questions about their data. step-function improvements in nlp-to-sql performance will drive this change. non-technical professionals will spend 20-30 minutes every morning asking 4-5 data-driven questions. this will extend to marketing, customer success, sales, and other business functions, resulting in a surge in data request volume.
incumbents will miss the first wave, as research inertia takes them further into seq2seq nlp models. early openai-powered data companies will build better solutions with new models. these companies will drive higher volumes, making them favored by other data pipelines. incumbents will rush to take advantage of this stack via acquisitions or new products.
data companies will switch to usage-based pricing to take advantage of the surge in usage. older bi incumbents will be slow to move from per-seat pricing models. all data companies will find themselves competing to tax the flow of data. this competition will drive verticalization as companies seek to capture more of the tax for themselves. the verticalization will see the largest players in the space
discovery of relevant analysis becomes priority
llm products will generate research, powerpoints, and summaries from datasets without human input. organizations will no longer have to rely on human analysts to make sense of their data. identifying the right analysis to run will become the next problem.
data products will proactively serve up insights, anticipating customer problems before they arise. by using front-of-mind problems and role descriptions, they can identify what a user wants to know before they know to ask it. customer success teams will see an analysis of user behavior before customer calls. marketing teams will get lists of accounts with x behavior to email, pre-populated with relevant email copy.
proactive recommendations will let products drive higher data usage. clickthrough rates and bounce rates will measure recommendation quality. this will lead to the emergence of data recommendation products.
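a rough sketch of how those two metrics could be computed for a batch of recommendations. the definitions here are assumptions, not a standard from the text: clickthrough rate is clicks over recommendations shown, and bounce rate is the share of clicks where the user left without engaging further. the function names are hypothetical:

```python
def clickthrough_rate(impressions: int, clicks: int) -> float:
    """Share of shown recommendations that were clicked (assumed definition)."""
    return clicks / impressions if impressions else 0.0


def bounce_rate(clicks: int, bounces: int) -> float:
    """Share of clicks where the user left right away (assumed definition)."""
    return bounces / clicks if clicks else 0.0


# example: 200 recommendations shown, 50 clicked, 10 of those abandoned immediately
print(clickthrough_rate(200, 50))
print(bounce_rate(50, 10))
```

a high clickthrough rate with a high bounce rate would suggest recommendations that look relevant but don't answer the question the user actually had.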