
The New TechStack We’ll Need for the GenAI Era

Sep 25, 2024

A new generation of founders is powering the next massive makeover of infrastructure

Generative AI has taken the corporate world by storm, with an impressive 65% of companies saying they are already using the technology in their products or operations, according to McKinsey. The reality, however, is that while the vast majority of those are in a feverish experimentation phase, few have moved large numbers of apps into production, says Shriyash “Yash” Upadhyay, co-founder of Martian, a startup whose technology routes GenAI jobs to the LLM that best suits the task at hand. “One executive recently told me his company has been working on 2,000 different GenAI applications, but they’re just now at the point that they have a couple dozen actually deployed.”

That’s in large part because GenAI is so new. But it’s also because most of the hardware and software infrastructure needed to support our AI-centric future has yet to be built. “Transforming data into insights using AI is far more compute intensive and complex to scale than running traditional cloud applications,” says NEA partner Aaron Jacobson. “For AI to truly change the world, companies will need a new infrastructure stack to power it, with new generations of technology at every layer.”

A burst of entrepreneurial fervor in AI infrastructure & underlying tech

To be sure, many are hard at work building this emerging tech stack, including some well-established companies. Nvidia has soared to a multitrillion-dollar valuation, making it one of the most valuable companies in the world. Databricks (an NEA portfolio company) has emerged as a powerhouse, thanks to lakehouse data storage that helps companies gather and prep data for easy retrieval by LLMs.

Many others are just starting to make their mark across emerging layers of the AI stack:

  • Silicon – To compete with Nvidia’s general-purpose GPUs, more than a dozen well-funded startups are building specialized chips designed from the ground up to handle focused aspects of running AI applications. MatX is developing a chip to maximize training and inference for the world’s largest models. Etched has created a transformer-on-a-chip ASIC designed to run transformer workloads at maximum efficiency, claiming one of its chips can replace 160 Nvidia GPUs.

  • Cloud infrastructure – Vast Data built an all-flash data platform that can store all of a company’s data in one pool so LLMs can quickly find it. DriveNets’ cloud networking software lets AI platforms run GenAI apps as fast as possible by speeding the rate data flows within and between data centers.

  • Data infrastructure – Weaviate has pioneered a “vector database” that stores information in a way that makes it far easier for LLMs to process than conventional databases (a sketch of the idea follows this list). Datafold offers a solution for managing the quality of the data being fed into LLMs, while Granica makes software that can efficiently compress and protect this data.

  • Model infrastructure – Martian has innovated the “model router” to help companies combine proprietary and open-source models to optimize cost-performance. Together AI has built the world’s fastest “AI cloud” for training, fine-tuning and inference at production scale. Genmo and Twelve Labs are among dozens of companies developing new models for specific modalities such as video.
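
To make the “vector database” idea in the data-infrastructure item above concrete, here is a minimal sketch of the underlying mechanism: turning text into vectors and retrieving the closest entries by similarity. It is a toy illustration, not Weaviate’s actual API; the hash-based embed() function is a stand-in for a real embedding model, and a production system would use an approximate-nearest-neighbor index rather than a brute-force scan.

```python
import math
from collections import Counter


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: hash word counts into a fixed-length vector."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    """Keeps (text, vector) pairs and returns the entries closest to a query."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        # Vectors are normalized, so the dot product equals cosine similarity.
        scored = [(text, sum(a * b for a, b in zip(q, vec))) for text, vec in self.items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = ToyVectorStore()
store.add("Quarterly revenue grew 12% on strong cloud demand")
store.add("The warranty covers parts and labor for two years")
store.add("GPU clusters were expanded to support model training")
print(store.search("How quickly is revenue growing?", k=1))
```

In production, the hand-rolled pieces are replaced by a real embedding model and an approximate-nearest-neighbor index, which is precisely the part that purpose-built vector databases optimize.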

Many upstarts already offer compelling proof of the value in focusing on best-in-class infrastructure for AI. They include billion-dollar success stories like CoreWeave, Lambda and Together AI, companies that have invented technology to run GenAI apps far more efficiently and reliably than public cloud giants like Amazon, Microsoft and Google. 

“The big public clouds were built for the old world,” says Renen Hallak from Vast Data, whose customers include CoreWeave and Zoom. “We’re not talking about ten or a hundred or even a thousand times more data to get the most out of AI. It’s so much that we no longer bother to calculate the number. So to enable AI applications and workloads, you need a new kind of data center filled with different types of hardware, with a different software stack on top of it.”

But there are still plenty of gaps in the AI stack that will have to be filled by smart, visionary entrepreneurs, including new hardware beyond chips, such as cooling and networking technology. 

“There hasn’t been much venture investment in the past decade in storage, compute or networking because a lot of the challenges had been solved for running cloud applications,” Jacobson says. “Yet the cost and performance needs of AI apps are completely different, so it’s an opportunity to define an entirely new hardware architecture.” 

Based on our investments in 15 companies across this emerging AI stack, here are the key themes we see.

Paving the way for inference

So far, much of the value created by infrastructure providers has been related to the way LLMs are trained. That’s because ingesting and processing the vast amounts of data needed to develop powerful LLMs like OpenAI’s GPT-4 or Anthropic’s Claude requires huge, centralized clusters of GPUs in supercomputer-like architectures. This area will continue to attract investment. Of the $1.3 trillion that Bloomberg Intelligence expects to be spent on all GenAI technology (infrastructure, as well as apps and ad platforms that use GenAI) by 2032, the largest chunk, or almost $500 billion, will come from training infrastructure.

At NEA, we believe the far bigger market will ultimately be for “inference” infrastructure, which supports the use of those trained LLMs to make predictions or generate content based on new information from the real world (say, to get an answer to a question posed to Perplexity or make an illustration with DALL-E). Unlike the centralized architecture needed for training, the infrastructure for inference applications will have to come in a variety of flavors. Using an LLM to render someone’s idea for a full-length movie might require the resources of a data center, but other applications may have to be deployed at the edge of the network or on IoT devices.

Because of the need to deliver far more capacity at much lower cost, the infrastructure for inference will have to provide fine-grained control so companies can deploy GenAI models with surgical precision. Given the cost of running these demanding apps, anything less could undermine the economics of running the app in the first place. After all, training can be done once or only occasionally, but inference applications can be called upon repeatedly for years, until they are retired. “It’s all about continuous optimization,” says Vipul Ved Prakash, co-founder and CEO of Together AI. “Once a model is trained, the requirements shift to finding the best combination of performance, latency and cost.”
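
Prakash’s point about continuous optimization is easy to see in back-of-the-envelope form. The sketch below is hypothetical (the option names, prices and latencies are invented placeholders): it picks the cheapest serving option that still meets a latency target, then estimates the monthly bill at a given request volume, the kind of calculation that gets redone every time models, prices or traffic change.

```python
from dataclasses import dataclass


@dataclass
class ServingOption:
    name: str
    cost_per_1k_tokens: float  # USD, invented for illustration
    p95_latency_ms: float      # invented for illustration


# Hypothetical menu of ways to serve the same workload; real numbers change constantly.
OPTIONS = [
    ServingOption("large-closed-model", cost_per_1k_tokens=0.030, p95_latency_ms=900),
    ServingOption("medium-open-model", cost_per_1k_tokens=0.006, p95_latency_ms=450),
    ServingOption("small-open-model", cost_per_1k_tokens=0.001, p95_latency_ms=150),
]


def cheapest_within_slo(options: list[ServingOption], latency_budget_ms: float) -> ServingOption:
    """Pick the lowest-cost option that still meets the latency target."""
    eligible = [o for o in options if o.p95_latency_ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("No option meets the latency budget")
    return min(eligible, key=lambda o: o.cost_per_1k_tokens)


def monthly_cost(option: ServingOption, requests_per_day: int, tokens_per_request: int) -> float:
    """Rough monthly spend for a steady request volume."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * option.cost_per_1k_tokens


choice = cheapest_within_slo(OPTIONS, latency_budget_ms=500)
print(choice.name, f"~${monthly_cost(choice, requests_per_day=100_000, tokens_per_request=800):,.0f}/month")
```

Training, by contrast, is a cost paid once or periodically; it is the per-request arithmetic above that compounds for years, which is why inference infrastructure rewards constant re-evaluation.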

Choice and simplicity for B2B AI customers

The need for this level of scale and control leaves huge opportunities for innovation. Prakash and his co-founders launched Together AI in 2022 to compete with AI cloud providers such as CoreWeave, but then developed an Inference Engine that the company now sells to those AI cloud rivals. Prakash says Together AI can run inference apps on the open-source Llama LLM 11 times faster than traditional cloud providers such as Amazon can run them on GPT-4. That could prove critical to many early adopters of GenAI who started with OpenAI’s more expensive managed service and are now looking for less pricey options.

A hallmark of Together AI’s success is that it offers customers the flexibility to take full advantage of the massive wave of innovation happening because of GenAI. Its platform lets customers use custom models they create or fine-tune, or use any of the 700,000 (and counting) open models that have been developed since ChatGPT was announced. “Our investment thesis is that companies will use Together AI so that they don’t have to worry about what’s below,” says Jacobson. 
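
In practice, that flexibility usually shows up as an OpenAI-compatible API in which only the endpoint and the model name change. The snippet below is a hedged illustration using the standard openai Python client; the base URL and model identifier are placeholders rather than confirmed values, so check the provider’s documentation for the real ones.

```python
from openai import OpenAI

# Placeholder endpoint and model name: consult the provider's docs for actual values.
client = OpenAI(
    base_url="https://api.example-ai-cloud.com/v1",  # hypothetical OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="example-org/open-model-70b",  # swapping models is a one-line change
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets in three bullet points."}],
)
print(response.choices[0].message.content)
```

Because the interface stays the same, moving from a closed managed service to an open model, or between open models, does not require rewriting the application.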

Birth of the AI model router

Other entrepreneurs are not just disrupting existing categories, but creating new ones. Martian, for example, invented the “model router.” While attending the University of Pennsylvania in 2020, co-founders Etan Ginsberg and Upadhyay were among the first people trying to build LLM applications. As a result, Upadhyay says, “we were among the first to see why building LLM applications sucked. People were creating all these amazing models, but there was no way to figure out which ones to use.”

Solving the problem was anything but trivial, given the “black box” nature of LLMs. But Upadhyay saw an opportunity. Big players like Google and Microsoft were focused on monetizing their own technologies. And OpenAI, he says, “makes more money by making its models smarter, not by making them more transparent and understandable. So even though understanding how these models work is one of the most important problems we need to tackle, not many companies have a strong incentive to try.”

Martian has taken advantage of this vacuum. Its software evaluates each task put before a GenAI system and decides in near real-time which model can execute it at the optimal combination of cost, performance and latency, per the customer's priorities. Importantly, the technology can even assign the job to a combination of models. This flexibility means developers can avoid getting locked into one model provider. “The best player is always changing,” says Upadhyay. “One day it’s OpenAI, then it’s Anthropic, then it’s someone else.”
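
Here is a minimal sketch of what routing like this can look like in code. It illustrates the concept only and is not Martian’s actual algorithm; the model names, scores and prices are invented, and the weights stand in for a customer’s stated priorities.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    quality: float           # 0 to 1, invented benchmark score
    cost_per_request: float  # USD, invented
    latency_ms: float        # invented


CANDIDATES = [
    Candidate("frontier-model", quality=0.95, cost_per_request=0.040, latency_ms=1200),
    Candidate("mid-tier-model", quality=0.85, cost_per_request=0.008, latency_ms=400),
    Candidate("small-fast-model", quality=0.70, cost_per_request=0.001, latency_ms=120),
]


def route(prompt: str, weights: dict[str, float]) -> Candidate:
    """Pick the candidate with the best weighted score for this request.

    `weights` encodes the customer's priorities, e.g. {"quality": 0.5, "cost": 0.3, "latency": 0.2}.
    A real router would also condition the decision on the prompt itself (task type, length, stakes).
    """
    max_cost = max(c.cost_per_request for c in CANDIDATES)
    max_latency = max(c.latency_ms for c in CANDIDATES)

    def score(c: Candidate) -> float:
        # Normalize each axis so higher is always better, then apply the customer's weights.
        return (weights["quality"] * c.quality
                + weights["cost"] * (1 - c.cost_per_request / max_cost)
                + weights["latency"] * (1 - c.latency_ms / max_latency))

    return max(CANDIDATES, key=score)


best = route("Draft a short apology email to a customer.",
             weights={"quality": 0.3, "cost": 0.5, "latency": 0.2})
print(best.name)  # with cost-heavy weights, the cheap, fast model wins
```

A production router would also handle the job-splitting the paragraph mentions, sending different parts of a workload to different models rather than to a single winner.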

Think different – and very big – about AI infrastructure

Over the long haul, we believe the winners of this AI infrastructure makeover will be some of the most valuable companies in the world. That’s why we are excited to back entrepreneurs like Vast Data’s Hallak, who understands that the need for radical new approaches to computing makes this a great time to think big. Founded in 2015 as a maker of high-performance storage hardware, the company pivoted to create a novel approach to storage management. Rather than storing data in different ways — for example, putting product catalogs on fast but expensive all-flash storage, but archiving old tax filings on cheap hard drives — Vast Data’s single-tier architecture makes all information instantly accessible to AI applications.

Remaking the data layer is a huge ambition, but Hallak always planned to go even further. The company introduced a database product in 2023, and earlier this year added a compute framework — essentially, the makings of an operating system optimized for using GenAI applications. “In all three big technology revolutions — PCs, mobile and the cloud — OS companies were often the most valuable,” he says. “I think there’s a huge vacuum in software, and we intend to fill it.”

Early but accelerated innings

Like all massive transformations, the construction of a new AI stack won’t happen overnight, but the innovation cycle is happening far faster than it did for the cloud, mobile or the internet. Startups have already launched thousands of AI applications for critical business processes like customer support, marketing campaigns and legal work. And the next wave will be enterprises deploying their own internally built AI apps en masse. 

As these AI apps scale, the teams running them will learn that many of today’s options are too limiting and expensive. Some already have. “Once companies spend over $1 million on a closed managed service, they start looking for a way forward that gives them more control over cost, privacy and whatever other concerns they may have,” says Together AI’s Prakash.

Ultimately, hundreds of billions of dollars (if not trillions) will be spent on providing the technology foundation on which the AI economy will rest. The companies that cement themselves as part of that foundation will become generational. “We’re just at the tip of the iceberg” of the GenAI era, says Martian’s Upadhyay. “Those of us who are trying to build the future get to see farther ahead. And increasingly, the world has come to see that our way of doing things is correct.”