
Beyond Beating the SWE-Bench: How AI Will Modernize Enterprise Code

by Madison Faulkner and Maanasi Garg | Feb 27, 2025

Behind every innovator’s dilemma is a legacy code base. 

As codebases become more complex, more engineering time gets sunk into maintaining and troubleshooting old code and less goes to innovating and writing new code. For enterprises struggling to manage these codebases, switching costs provide a buffer to fend off faster-moving startups. With AI-led software development, switching costs are rapidly declining. Incumbent moats suddenly become shallower, and the need to modernize enterprise code bases becomes existential. This presents startups with a huge opportunity.

Code modernization enables software development to continuously improve with the pace of technology. Pre-AI, this modernization often meant massive manual engineering initiatives that scaled by adding more engineers. AI, however, gives us the ability to build liquidity into code bases by modernizing code at scale with systems, rather than human engineers. 

But we’ve only begun to build the AI tech stack for software development. New startups building AI-native software development tools have largely focused on beating the latest SWE benchmark, seeking to fundraise on incremental model improvements. While code generation is a key aspect of how we modernize code bases and optimize the dev experience, AI-native code modernization encompasses far more than accurate AI code generation. Similar to foundation models, we believe there is a race to the bottom in simply providing code generation. 

The great opportunity is not beating the latest benchmark – it is in building the enterprise abstractions that enable modernization of codebases. These abstractions will be AI-native tooling that goes beyond code generation to deliver more automated code migration, continuous end-to-end review and testing, and a collaborative human-agent developer experience.

The NEA Framework for AI Opportunities in the Software Development Lifecycle

Startups are attacking every part of the software development stack with AI-native solutions, but not every part of the stack is equally ripe for disruption. Large incumbents have a deep understanding of the daily developer workflow, large quantities of incremental code change capture, and developers who are already embedded into operating systems and IDEs. In many scenarios, their slower pace of innovation is outweighed by their sizable repos.

For startups, it is all about the ability to redefine the dev experience. Three core factors – the complexity of the codebase, the complexity of the workflow, and the complexity and availability of the data – tend to determine how much pain developers and organizations feel and the opportunity AI-native startups have to capture market share with new software development solutions.

  1. Complexity of the codebase: How painful has it been to modernize the codebase, and to what degree can new AI capabilities make a code base more fluid?

Enterprises with complex, rigid code bases have often had to hire specialized expertise to build around code complexity. AI provides a translation layer and a shared latent space between domain experts that can make codebases more liquid and easier to modernize. 

  2. Complexity of the workflow: Is a startup able to redefine the entire workflow, or just improve steps within it? Can the workflow be served by just AI capabilities, or does it require combining new AI capabilities with traditional workflows?

The more a startup can overhaul how work gets done, the more opportunity there is to completely transform the developer experience. When the entire process is redefined, the opportunity is much bigger for startups to counterposition their product against incumbents. 

For instance, it’s possible to take an LLM and, with the right prompts and a RAG architecture, produce basic code documentation, making it a use case more likely to be served by an AI application within an incumbent product (see the sketch after this list for what that basic flow looks like). But more complex workflows, where companies need deterministic and auditable workflows around their code, require more than just an LLM wrapper, giving startups room to differentiate and defend new offerings.

  3. Complexity & availability of the data: What data is available to improve on existing solutions, and can startups gain an advantage around acquiring data or creating synthetic data?

It takes data for AI solutions to improve the user experience, and data moats have long been the holy grail for AI companies. Once an incumbent has proprietary data access that drives the use case, it can create a moat too big for startups to cross, unless they can acquire or simulate the data necessary to compete.
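To make the commoditized end of this spectrum concrete, here is a minimal sketch of the RAG-style documentation flow described in point 2 above. It assumes a placeholder llm_complete call rather than any particular vendor’s API, and simple keyword overlap stands in for the embedding-based retrieval a real system would use.

```python
from pathlib import Path

def retrieve_context(question: str, repo_root: str, top_k: int = 3) -> list[str]:
    """Toy retriever: rank source files by keyword overlap with the question.
    A production RAG setup would use embeddings and a vector store instead."""
    keywords = set(question.lower().split())
    scored = []
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        score = sum(text.lower().count(k) for k in keywords)
        if score:
            scored.append((score, path.name, text[:2000]))  # truncate long files
    scored.sort(reverse=True)
    return [f"# {name}\n{snippet}" for _, name, snippet in scored[:top_k]]

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to whichever LLM provider you use."""
    raise NotImplementedError("wire up your model of choice here")

def document_codebase(question: str, repo_root: str) -> str:
    """Retrieve relevant code, then prompt the model to write documentation."""
    context = "\n\n".join(retrieve_context(question, repo_root))
    prompt = (
        "You are documenting an internal codebase.\n"
        f"Relevant source excerpts:\n{context}\n\n"
        f"Write concise documentation answering: {question}"
    )
    return llm_complete(prompt)
```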

When we apply this framework to different software development use cases, we come up with three broad categories of software development products – commoditized solutions, mixed opportunities, and high-value use cases.

Bucket 1: Commoditized solutions

Use cases like general code generation and diff reviews are the easiest for incumbents to commoditize because they either lack the complexity to allow for meaningful differentiation or are so deeply integrated into existing enterprise workflows that the cost to switch is a barrier to new entrants.

Code generation

Code generation solutions do not inherently build stickiness into the dev experience unless they are paired with a sticky platform experience. There may be some incremental opportunity in vertical code generation, but incumbents and open-source solutions are still likely to limit what startups can effectively capture. Capital will be a primary driver of the foundation models that continuously rank the highest on the SWE-bench.

Bucket 2: Mixed opportunity

In the middle are use cases with some complexity and opportunity for differentiation, making these categories a toss-up between startups and incumbents. Use cases in this bucket often require sophisticated data retrieval mechanisms to improve and store a company’s context. While incumbents like Atlassian will often have a data edge, these use cases require more complex use of AI, giving startups an opening to redefine workflows and the developer experience. 

Code review and context debt 

Code review has traditionally been a difficult space to disrupt, but the proliferation of AI-generated code escalates the importance of the review process as a collaboration point between humans and agentic systems. Significantly more code is being created with AI, raising the need for tools to manage growing context debt. This goes beyond documentation and is fundamental to effective code generation. An AI tool without the context to understand the entire code base will create erroneous code or generate hundreds of lines of unit tests that already exist, simply because code generation tools don’t have the context and subject matter expertise to integrate new code into the existing code base.
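As a small illustration of what managing that context might involve (our sketch, not any particular product’s approach), a context-aware tool could scan the repo for tests that already exercise a function before asking a model to generate new ones. The function name parse_invoice and the ./src path below are hypothetical.

```python
import ast
from pathlib import Path

def existing_tests_for(function_name: str, repo_root: str) -> list[str]:
    """Find test functions that already reference `function_name`,
    so generated tests don't duplicate existing coverage."""
    hits = []
    for path in Path(repo_root).rglob("test_*.py"):
        try:
            tree = ast.parse(path.read_text(errors="ignore"))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
                if function_name in ast.unparse(node):
                    hits.append(f"{path.name}::{node.name}")
    return hits

# Hypothetical usage: only prompt the model for new tests if coverage is missing.
if not existing_tests_for("parse_invoice", "./src"):
    print("no existing tests found; safe to ask the model for new ones")
```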

Code search and documentation

Relatedly, code search requires mixing static code analysis with LLM semantic understanding for natural language search to work. The complexity required in both code search and code review to mix new AI capabilities with traditional methods makes them potential areas for disruption, if startups can provide greater context around the existing codebase and greatly improve the dev experience.
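A minimal sketch of that hybrid approach, assuming Python source and with plain keyword overlap standing in for the LLM or embedding scoring a real product would use:

```python
import ast
from pathlib import Path

def index_functions(repo_root: str) -> list[dict]:
    """Static-analysis pass: extract each function's name and docstring."""
    index = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(errors="ignore"))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                index.append({
                    "file": str(path),
                    "name": node.name,
                    "doc": ast.get_docstring(node) or "",
                })
    return index

def semantic_score(query: str, entry: dict) -> float:
    """Placeholder for LLM/embedding similarity; keyword overlap stands in here."""
    words = set(query.lower().split())
    text = f"{entry['name']} {entry['doc']}".lower()
    return sum(w in text for w in words) / max(len(words), 1)

def search(query: str, repo_root: str, top_k: int = 5) -> list[dict]:
    """Combine the static index with semantic ranking for natural language search."""
    index = index_functions(repo_root)
    return sorted(index, key=lambda e: semantic_score(query, e), reverse=True)[:top_k]
```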

IDE

IDE integration has traditionally been extremely difficult to disrupt because the cost to switch to a new solution is high and established workflows serve as natural moats. Still, startups can reach feature parity with incumbents, and we have seen that the space can be disrupted: Cursor has reached $100M ARR with a much-loved consumer product. However, crossing from consumer to a true enterprise IDE solution may still prove challenging, while Codeium’s enterprise-first IDE experience has given it a leg up in this segment. The complexity of integration, and the desire to enable plug-and-play development that lets a company keep its IP and make it interoperable across development environments, still favors established players. Innovators here will balance the new dev experience between agents and humans.

Bucket 3: High-value use cases

The biggest opportunity for startups is in complex, high-value, and time-consuming engineering workflows that require combining agentic capabilities with enterprise context. Code migration and testing are complex workflows because they rely on decades of historical context and shared enterprise decisions. Organizations have to identify which chunk of code to test, run complex search and translation across multiple languages, and then securely move code while minimizing disruptions. In many companies, entire systems are untouchable because the internal expertise to change them is gone.

Code migration and modernization

While there are opportunities up and down the software development stack, code migration is one of the most compelling for startups. AI isn’t just changing the dev lifecycle, it’s also changing how codebases appear and the optimal languages to leverage. The semantic library and the ability to translate between languages are an increasingly critical foundation for the next-gen software stack. We are seeing rapid growth in the usage of AI-friendly languages such as Python and TypeScript.

Startups that can deliver a markedly better experience with the ability to move flexibly across languages and libraries and automate testing of the results could unlock huge new markets and greatly improve code bases in the process. AI has the capability of being a universal translator, and while we often hear about that in the context of natural language, programming is also a set of languages. Code migration is effectively the process of translating code from one format to another, be that a new package, language version, or an entirely new language or framework (e.g., Angular to React, JavaScript to TypeScript, or Python 2 to 3).  
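To ground the translation analogy, here is a toy example of the kind of change a Python 2 to 3 migration involves; real migrations compound thousands of such changes with library, framework, and build-system differences.

```python
# The same small routine before and after a Python 2 -> 3 migration.
# Python 2 original (shown as comments because it no longer parses under Python 3):
#   def summarize(counts):
#       total = sum(counts.values())
#       print "Total: %d, average: %d" % (total, total / len(counts))

def summarize(counts):
    total = sum(counts.values())
    # print is now a function, and / became true division, so the old
    # integer-average behavior has to be preserved explicitly with //.
    print("Total: %d, average: %d" % (total, total // len(counts)))

summarize({"a": 3, "b": 4})  # Total: 7, average: 3
```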

Winning code modernization players will:

  1. Identify a high-value workflow as a wedge

There is a wide range of migrations that startups can choose to tackle when entering this space. We think the most promising startups will target migrations with the largest market sizes and clear enterprise urgency. For instance, SAP has set a deadline for customers to migrate from legacy ERP systems to S/4HANA, making that a compelling migration path for startups to tackle. Similarly, a White House report highlighting the vulnerabilities of C and C++ has led to a number of C++ to Rust migration initiatives.

  2. Build end-to-end solutions that combine generative AI, traditional static analysis, and human review

LLMs are by nature probabilistic, meaning the output can vary, and code migration requires a deterministic output of the new code. LLMs can improve efficiency, especially in code translation and generation, but additional tools and processes are needed to get to an operational and auditable codebase post-migration. A successful product here will have a clear vision of how to weave together the strengths of traditional static analysis with generative AI (a minimal sketch of this translate-then-validate pattern follows this list). This vision will require a founding team with deep, hands-on experience with the specific migration workflow that their startup is tackling.

  3. Focus early on enterprise sales

In order to unlock enterprise value, dev tool startups in both vertical and horizontal domains will need to build into sticky workflows. Focusing on enterprise sales early will help startups find the product abstractions that truly unlock value. Code migrations and modernization in particular will require trust-building to manage legacy systems that have been in place for decades.

  4. Delight developers with new and improved UI/UX

While C-suite end buyers are often focused on security as a purchase driver, if developers are the ones actually making and improving the code changes, it will be the quality of the developer experience that makes the product sticky and allows startups to expand.

  5. Expand into additional migrations and recurring testing

The biggest drawback of code migration is that it is one-off revenue by nature. However, if a startup can nail migration, it gives them the code base data to move into recurring testing workflows; in our experience, many code migration customers say that the biggest challenge to using AI for migration is testing.

For instance, NEA portfolio company Datafold has an end-to-end data engineering platform for AI with functionality enabling data diffs and data testing/validation. From this platform position, they have used agentic functionality, starting with a data migration agent, to build recurring migration and modernization capabilities for their customers. Code migration gave them the usage to build a validation flywheel and further improve data migration logic across warehouses.
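The translate-then-validate pattern referenced in point 2 above might look something like the sketch below: a probabilistic model proposes translated code, and deterministic gates decide whether to accept it or escalate to a human. The llm_translate call, the migrated_module.py output path, and the pytest command are placeholders, and the ast.parse gate only applies when the migration target is Python.

```python
import ast
import subprocess

def llm_translate(legacy_source: str, target: str) -> str:
    """Placeholder for a generative model that proposes translated code."""
    raise NotImplementedError("wire up your model of choice here")

def validate(candidate: str, test_cmd: list[str]) -> bool:
    """Deterministic gates: the candidate must parse and the test suite must pass."""
    try:
        ast.parse(candidate)                          # static check (Python targets)
    except SyntaxError:
        return False
    with open("migrated_module.py", "w") as f:        # hypothetical output path
        f.write(candidate)
    return subprocess.run(test_cmd).returncode == 0   # behavioral check

def migrate(legacy_source: str, target: str, attempts: int = 3) -> str | None:
    """Retry the probabilistic step until the deterministic checks pass, or give up."""
    for _ in range(attempts):
        candidate = llm_translate(legacy_source, target)
        if validate(candidate, ["pytest", "-q", "tests/"]):
            return candidate          # accepted: auditable checks passed
    return None                       # escalate to human review
```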

The future: self-healing code bases (and less technical debt)

The code environment is more complex than ever, with more integrations and code to manage, and a critical part of code modernization is end-to-end tests across every part of a code base: the front end, back end, cloud and API integrations. While we are still a long way from this reality, AI-enabled software development that combines effective context with new agentic architectures has the potential to give us proactive, automated testing, in which AI agents run recurring checks, tests, and security evaluations, take actions on your behalf, and surface issues when they need your clarification or sign-off. The result: self-healing codebases with continuous, automated code modernization and less technical debt.
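In its simplest form, the recurring-check half of that vision is already buildable today. The sketch below schedules a few checks a team might run (pytest, mypy, and pip-audit are examples, not prescriptions) and surfaces failures for sign-off rather than acting autonomously; the remediation step, where an agent proposes fixes, is what remains hard.

```python
import subprocess
import time

# Example checks a team might already run; swap in your own commands.
CHECKS = {
    "unit tests": ["pytest", "-q"],
    "type check": ["mypy", "src/"],
    "dependency audit": ["pip-audit"],
}

def run_recurring_checks(interval_seconds: int = 3600) -> None:
    """Run each check on a schedule and surface failures for human sign-off;
    a fuller agent would also propose fixes and wait for approval."""
    while True:
        for name, cmd in CHECKS.items():
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                print(f"[needs sign-off] {name} failed:\n{result.stdout[-500:]}")
        time.sleep(interval_seconds)
```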

We have been so energized by the teams we’ve met building in this space to date, and we believe there are clear opportunities to modernize both across verticalized engineering domains and broadly across the engineering workflow. Value capture will centralize around companies that build sticky enterprise platforms revolving around recurring modernization. If you’re a founder working on disruptive horizontal or verticalized code modernization tooling, we’d love to hear from you. Please reach out: mfaulkner@nea.com and mgarg@nea.com.

A big thank you to the many experts building companies and managing migrations in enterprises for their feedback, review and discussion on this thesis.

About the Authors

Madison Faulkner

Madison joined NEA in 2024 as a Principal on the technology team focused on early-stage data, infrastructure, developer tools, data science, and AI/ML. Previously, she was a Vice President at Costanoa Ventures where she worked closely with Delphina.ai, Probabl.ai, Mindtrip.ai, Noteable.io (acq by Confluent), Rafay.co, and others. Prior to investing, Madison was Head of Data Science and Machine Learning at Thrasio, Head of Data Science at Greycroft, and held several data science positions at Facebook. Madison received a BS in Management Science and Engineering from Stanford University.

Maanasi Garg

Maanasi joined NEA in 2024 as an Associate on the Technology Investing Team focused on early-stage investments across various sectors. Previously, she was an Investment Banking Analyst at Morgan Stanley where she worked with infrastructure, vertical SaaS, fintech, and other enterprise software companies. Maanasi graduated from the University of Pennsylvania with an MS in Computer Science and a BS in Finance from The Wharton School.