Today, I’d like to discuss a set of emerging concepts that link generative AI, enterprise APIs, and new approaches to API security. I am Doug Dooley, the Chief Operating Officer at Data Theorem. The central theme here is a new class of data exploitation happening with enterprise APIs. We’ll discuss three major topics: the changing playground for experimentation, APIs as fueling stations for AI, and the evolving need for security, particularly through attack path visualization.
1. How the Playground for Experimentation Has Changed
In recent years, the cost, complexity, and speed of iterating on data exploration have changed dramatically. Generative AI models, such as GPT-4, have made it significantly easier to gain insights from data exposed through APIs. This isn’t just about writing a prompt and waiting for a response; it’s about bringing together large language models and external API connections to discover patterns, understand data flows, and transform raw information into actionable knowledge.
Sam Altman, who leads OpenAI, has described a future where assistance via APIs will make everything easier. If you’ve ever played with AI assistants like ChatGPT, you know there’s a certain value—even if it varies—when using them to complement your information work. Sometimes the output is extremely high-value, and other times just moderately useful, but overall, these AI assistants are changing how we approach our data.
Now, a key development is the ability to use custom functions and retrieval within these assistants. The head of Developer Operations at OpenAI highlighted how API retrieval and custom function integration make assistants smarter and more targeted. This allows for specialized use cases, from creating math tutors for your kids to building an API hacking assistant for security practitioners. This was tough before because earlier, GPT-3-era versions of ChatGPT had no way to connect to external APIs. Everything was self-contained, which limited practical applications.
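To make this concrete, a custom function is described to the model as a small JSON-Schema document: the model never executes the function itself, it only emits a structured request naming the function and its arguments, which your application then carries out. The sketch below follows the general shape of OpenAI's function-calling tool format; the `probe_api_endpoint` name and its parameters are hypothetical, purely for illustration.

```python
# A minimal sketch of a custom function ("tool") definition in the JSON-Schema
# style used by function calling. The function name and parameters here are
# hypothetical; your application supplies the actual implementation.
probe_endpoint_tool = {
    "type": "function",
    "function": {
        "name": "probe_api_endpoint",  # hypothetical helper
        "description": "Fetch a target API endpoint and summarize the response",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Endpoint URL to call"},
                "method": {"type": "string", "enum": ["GET", "POST"]},
            },
            "required": ["url"],
        },
    },
}

# The model returns a request referencing this name plus JSON arguments;
# the calling application decides whether and how to execute it.
print(probe_endpoint_tool["function"]["name"])
```

The schema doubles as documentation: the `description` fields are what the model reads when deciding whether the function fits the user's request.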
2. APIs as the Fueling Stations for AI
As we move into this new era, data truly becomes the fuel of AI, and APIs are the charging stations that deliver it. Generative AI adoption is accelerating beyond 2024, and as more AI assistants emerge, API data consumption will continue to explode. These assistants will rely heavily on APIs to pull in relevant data and generate insights.
In practice, what does this mean? If you are a security practitioner, or even an ethical hacker, you can now create an “API hacking assistant.” You provide a target, a key, and set the assistant loose. It can map out data, classify information, and identify personally identifiable information (PII)—even credit cards and social security numbers—scattered across various APIs. Enterprise security solutions, like those offered by Data Theorem and other vendors in this space, have been capable of these tasks, but we are now seeing them integrated seamlessly with generative AI assistants.
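The data-classification step such an assistant performs can be sketched in a few lines. This is a deliberately simplified stand-in, not Data Theorem's method: it pairs regular expressions with a Luhn checksum to separate genuine credit card numbers from random digit runs, and would be one of the local functions an assistant invokes against API responses.

```python
import re

def luhn_ok(number: str) -> bool:
    """Luhn checksum used to validate candidate credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_pii(text: str) -> dict:
    """Scan free text (e.g. an API response body) for SSNs and card numbers."""
    findings = {"ssn": PII_PATTERNS["ssn"].findall(text), "credit_card": []}
    for match in PII_PATTERNS["credit_card"].finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_ok(digits):
            findings["credit_card"].append(digits)
    return findings

sample = "User 123-45-6789 paid with 4111 1111 1111 1111."
print(classify_pii(sample))
# → {'ssn': ['123-45-6789'], 'credit_card': ['4111111111111111']}
```

Production classifiers go far beyond this (context windows, entropy checks, issuer prefixes), but the shape—pattern match, then validate—is the same.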
With modern playgrounds such as the OpenAI interface, you can toggle on code interpreters, retrieval, and function calling, all within a matter of minutes. Suddenly, you’re running a sophisticated system that not only chats with you but also executes API calls, retrieves files, annotates maps, and integrates with various components. The assistant can now “take action,” bridging the gap between natural language interfaces and the real functionalities of your digital environment.
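Those playground toggles correspond to a list of enabled tools on the assistant itself. The configuration fragment below follows the shape of OpenAI's beta Assistants API as announced in late 2023; treat the exact field names as an assumption, and the instructions text and `call_api` function as hypothetical.

```python
# A configuration sketch of an assistant with all three capabilities enabled.
# Field names mirror the beta Assistants API of late 2023 (an assumption);
# the instructions and the call_api function are hypothetical examples.
assistant_config = {
    "model": "gpt-4-turbo",
    "instructions": "You are an analyst that maps and classifies API data.",
    "tools": [
        {"type": "code_interpreter"},   # run Python against uploaded data
        {"type": "retrieval"},          # search files attached to the assistant
        {
            "type": "function",         # custom action your app executes
            "function": {
                "name": "call_api",     # hypothetical function name
                "parameters": {
                    "type": "object",
                    "properties": {"url": {"type": "string"}},
                    "required": ["url"],
                },
            },
        },
    ],
}

enabled = [t["type"] for t in assistant_config["tools"]]
print(enabled)
# → ['code_interpreter', 'retrieval', 'function']
```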
3. Security for APIs, Applications, and the Software Supply Chain Must Evolve
With all these advancements, the third and critical concept is security. The current state of API security, applications, and the overall software supply chain is not where it needs to be. As generative AI continues to grow, organizations must evolve their security strategies to protect sensitive data flowing through APIs.
Context is now more necessary than ever. Attack path visualization—the ability to see the end-to-end representation of data flows, application dependencies, and where sensitive information resides—will help practitioners better understand and mitigate API risks and privacy challenges.
To understand why context matters so much, consider the difference between having a simple traffic light system—red, yellow, green—versus having a detailed MRI scan of what’s really happening inside your API ecosystem. Without context, a known vulnerability might appear severe (red), but without understanding which applications, data sources, or external services it connects to, it’s hard to prioritize efforts effectively. Conversely, what might appear as a minor vulnerability (yellow) could, in fact, lead directly to a major data breach if connected to a highly sensitive data store.
With the rapid expansion of APIs in cloud environments and the complexity of modern architectures—think serverless functions, Kubernetes clusters, ephemeral compute—just knowing there is a vulnerability is not enough. You need to know where it lives, how it spreads, and what impact it has on the business and customer data. Attack path visualization provides that holistic, contextual view.
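At its core, attack path visualization is reachability analysis over an asset graph: nodes are components, edges are data flows, and a path from an internet-facing entry point through a vulnerable hop to a sensitive data store is what turns a "yellow" finding into a priority. The toy graph below is hypothetical—the topology, the flagged Lambda, and the labels are invented for illustration—but the breadth-first traversal is the essential mechanic.

```python
from collections import deque

# Toy asset graph: nodes are infrastructure components, directed edges are
# data flows. Topology and severity labels are hypothetical.
edges = {
    "internet":      ["api_gateway"],
    "api_gateway":   ["orders_lambda", "auth_service"],
    "orders_lambda": ["customer_db"],   # serverless function with a finding
    "auth_service":  [],
    "customer_db":   [],                # holds customer PII
}
vulnerable = {"orders_lambda"}          # a seemingly minor "yellow" finding
sensitive = {"customer_db"}

def attack_paths(graph, start, targets):
    """BFS from an entry point; return every path reaching a sensitive asset."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            paths.append(path)
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:         # avoid revisiting nodes (cycles)
                queue.append(path + [nxt])
    return paths

for path in attack_paths(edges, "internet", sensitive):
    hops = vulnerable.intersection(path)
    print(" -> ".join(path), "| vulnerable hop:", hops or "none")
```

Here the traversal surfaces exactly the situation described above: the "minor" finding on `orders_lambda` sits on the one path from the internet to the PII store, which is the context a flat severity score cannot convey.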
Evolving Capabilities in AI Assistants and Functions
Earlier, I mentioned that I had tried building assistants back in the GPT-3 era, but I couldn’t hook up external APIs. That limited their usefulness for tasks like API penetration testing or data classification. Now, with GPT-4 Turbo and newer capabilities, you can integrate external APIs seamlessly. This allows generative AI not only to reason about data but also to execute actions, run code, and call functions with guaranteed JSON output—all in real time.
Function calling is a critical development. It’s not just about getting a static answer; it’s about having the assistant invoke multiple functions at once, annotate a map, retrieve files, and even highlight PII in various data sources. This synergy of large language models, external APIs, and advanced function handling is a game-changer for how we interact with data.
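The application-side half of this synergy is a dispatcher: the model emits one or more function requests as JSON, and your code routes each to a real implementation and feeds the results back. The sketch below simulates that loop locally—`list_endpoints`, `flag_pii`, and the request shapes are hypothetical stand-ins, not an official SDK interface.

```python
import json

# Hypothetical local functions the assistant is allowed to invoke.
def list_endpoints(domain: str) -> list:
    return [f"https://{domain}/v1/users", f"https://{domain}/v1/orders"]

def flag_pii(endpoint: str) -> dict:
    return {"endpoint": endpoint, "pii_suspected": endpoint.endswith("/users")}

DISPATCH = {"list_endpoints": list_endpoints, "flag_pii": flag_pii}

def run_tool_calls(tool_calls):
    """Execute each requested function; results go back to the model as JSON."""
    results = []
    for call in tool_calls:
        fn = DISPATCH[call["name"]]            # allow-list lookup
        args = json.loads(call["arguments"])   # model emits arguments as JSON
        results.append({"name": call["name"], "result": fn(**args)})
    return results

# Simulated request for two functions at once, in the shape of model output.
requested = [
    {"name": "list_endpoints", "arguments": '{"domain": "example.com"}'},
    {"name": "flag_pii",
     "arguments": '{"endpoint": "https://example.com/v1/users"}'},
]
for r in run_tool_calls(requested):
    print(r)
```

The allow-list dictionary is the important design choice: the model can only request functions you have explicitly registered, which is what keeps an "assistant that takes action" inside a controlled boundary.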
API Costs and the Competitive Landscape
The cost factor is also shifting. With GPT-4 Turbo, the overall cost of these operations is two to three times lower than before. As AI becomes more accessible and competition among GPU and hardware providers increases, we can expect even lower costs and higher performance. This will encourage more enterprises to plug their data into AI models, accelerating innovation but also expanding the potential attack surface that must be secured.
Data as Proprietary Fuel
From a business perspective, companies are beginning to see data as a proprietary asset. APIs bring that data to AI models, enabling new insights and breakthroughs. We see examples everywhere: large tech companies leveraging unique data sets to power more intelligent AI, startups emerging to classify and discover data via APIs, and enterprises wary about giving all their sensitive data to big providers. They ask: If we are a bank, why let these massive platform providers glean insights into our business data?
Generative AI, including systems like ChatGPT, has proven itself in terms of business viability. The growth from 2022 to 2023 alone is staggering. While the exact sustainability of that growth remains to be seen, the model of unlocking unique data via APIs to feed powerful AI systems seems well established.
From Large Language Models to Artificial General Intelligence
We’re currently excited by large language models and generative AI, but the industry is looking ahead. Some believe we’re just a few years away from artificial general intelligence (AGI). Timelines vary, but what’s clear is that complexity and capability continue to climb. AI will become increasingly intertwined with how we handle data, decisions, and processes at scale.
The Need for Better API Security and Visualization
As AI evolves, so must our API security strategies. Many enterprises rely on simple vulnerability scanning or traffic-light risk frameworks, which are insufficient for today’s complexity. We need full visibility—an “MRI scanner” for APIs—that shows the intricate web of components, data flows, and application connections.
For example, at API Days Paris, Data Theorem introduced capabilities that help customers visualize their serverless functions and the API gateways connecting them to external services. With these visual insights, businesses can correlate vulnerabilities with specific applications and data sources, enabling more informed decisions.
Industry Participants and Leadership
Several vendors—pure-play API security providers, WAF vendors, API gateways—are contributing to the maturation of this space. While the industry is still evolving, the direction is clear: solutions that offer clearer visualization, better context, and more dynamic defenses against exploitation will lead the way.
In this new era of generative AI and API-driven data architectures, organizations can’t rely on old methods of security or data management. The playground has changed, the complexity has increased, and the stakes are higher. APIs act as charging stations for AI’s data fuel, and as AI becomes more integrated into our business processes, the need for detailed, context-driven API security and attack path visualization will grow. By embracing these new tools and approaches, we can ensure a more secure, efficient, and innovative future.