Jun. 14, 2024 at 9:15 am

API Drift Detection

Anoop Gupta, December 8, 2024

Anoop Gupta is the Director of Software Engineering at Capital One. Over the past 20 years, he has worked on building highly secure, resilient, performant, and scalable enterprise platforms specializing in API Security, API Governance, Cybersecurity, Identity Access Management, Data Protection, High-Performance Distributed Systems, and Privacy. In this article, he delves into the crucial topic of API drift detection and sheds light on its importance for large enterprises.

Background

As businesses grow, so does the need for APIs that serve both internal and external customers. The rise of cloud computing and the widespread adoption of microservices architecture have added complexity and substantially increased the demand for APIs. The result is a rapid proliferation of APIs, a phenomenon known as API Sprawl. API Sprawl often produces numerous redundant and duplicate APIs, which makes it difficult to manage and maintain consistent security policies across the enterprise and ultimately weakens its overall security posture.

Security breaches are a significant concern, particularly unauthorized access to sensitive data through rogue APIs. Such exposure can lead to data breaches and financial losses and can harm the company's reputation. Both rogue and approved APIs can be identified through out-of-band network discovery using the API Gateway, which serves as a centralized entry point for APIs, or through a centralized API Management system that leverages a cataloging solution.
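A rough sketch of the inventory comparison behind rogue API identification follows. It assumes that discovery (for example, from gateway or network traffic) yields `(method, path)` pairs with paths already normalized to their templates, and that the catalog exports the same shape; `classify_endpoints`, `normalize_endpoint`, and the sample endpoints are hypothetical.

```python
# Hypothetical sketch: split discovered endpoints into approved (cataloged)
# and rogue (uncataloged) sets. Paths are assumed to be templated already.


def normalize_endpoint(endpoint):
    """Uppercase the method and drop a trailing slash from the path."""
    method, path = endpoint
    return method.upper(), path.rstrip("/") or "/"


def classify_endpoints(observed, catalog):
    """Return (approved, rogue) sets of (method, path) pairs."""
    observed_set = {normalize_endpoint(e) for e in observed}
    catalog_set = {normalize_endpoint(e) for e in catalog}
    approved = observed_set & catalog_set  # seen in traffic and registered
    rogue = observed_set - catalog_set     # seen in traffic but never registered
    return approved, rogue


# Example with made-up endpoints.
traffic = [("get", "/accounts/"), ("GET", "/accounts/{id}"), ("POST", "/internal/export")]
catalog = [("GET", "/accounts"), ("GET", "/accounts/{id}")]
approved, rogue = classify_endpoints(traffic, catalog)
print(sorted(rogue))  # [('POST', '/internal/export')]
```

Set operations keep the comparison cheap even for large inventories; the hard part in practice is the normalization step, which this sketch deliberately keeps trivial.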

Even so, inconsistencies in API design, documentation standards, data quality, and third-party contracts can still create security vulnerabilities. These inconsistencies are referred to as API Drift.

What is API Drift Detection?

Drift detection is the process of identifying deviations between an API's current operational state and its expected state. These discrepancies can arise from various factors, and it is primarily the responsibility of API producers to comply with the specifications set during the design phase to minimize them.

Types of API Drift

OpenAPI Specification Drift

OpenAPI specifications provide a structured framework for defining APIs, giving engineers comprehensive insight into API functionality. When backed by API Management, an API goes through the producer lifecycle: planning, design, review, development, testing, and deployment. Even with processes and comprehensive reviews in place to align the implementation with its OpenAPI specification, developers may inadvertently deviate from it for various reasons, including:

The delay between documenting specifications and implementing the API can cause a disconnect.
The developer who writes the specification may not be the one who implements it.
Product or technical modifications may occur after the specifications have been registered.

Drift between the design and the runtime implementation can lead to unauthorized access to sensitive data beyond organizational boundaries.
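One way to surface specification drift at runtime is to compare observed response payloads with the response schema registered in the OpenAPI specification. The sketch below is illustrative only: `spec_drift` is a hypothetical helper, it handles only `type`, `properties`, and `required` on object schemas, and it ignores `additionalProperties`, arrays, and `$ref`. A production setup would more likely rely on a full schema validator.

```python
def spec_drift(schema, payload, path=""):
    """Compare a JSON payload with an OpenAPI-style object schema (hypothetical helper).

    Returns (kind, field_path) findings, where kind is "undocumented"
    (sent but not in the spec) or "missing_required" (required but absent).
    """
    findings = []
    if schema.get("type") != "object" or not isinstance(payload, dict):
        return findings
    properties = schema.get("properties", {})
    for name, value in payload.items():
        field_path = f"{path}.{name}" if path else name
        if name not in properties:
            findings.append(("undocumented", field_path))
        else:
            # Recurse so drift inside nested objects is reported too.
            findings.extend(spec_drift(properties[name], value, field_path))
    for name in schema.get("required", []):
        if name not in payload:
            findings.append(("missing_required", f"{path}.{name}" if path else name))
    return findings


# Example: a made-up response schema and an observed response that drifted from it.
schema = {
    "type": "object",
    "required": ["id", "status"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "string"},
        "owner": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}
observed = {"id": "a1", "owner": {"name": "Ana", "ssn": "123-45-6789"}, "internalScore": 0.87}
print(spec_drift(schema, observed))
# [('undocumented', 'owner.ssn'), ('undocumented', 'internalScore'), ('missing_required', 'status')]
```

Undocumented fields are the more security-relevant finding here, since they are exactly the data that reviews of the specification never saw.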

Data Attribute Drift

API Sprawl can lead to a proliferation of competing standards across an organization. In a large enterprise with numerous APIs and multiple versions, both consumers and producers may deviate from established enterprise data standards. This can result in non-compliance with data standards and enterprise security policies, potentially exposing sensitive data.

The lack of consistent attribute naming patterns among APIs can result in conflicting standards for the same underlying attribute. This inconsistency makes it difficult to enforce governance and security measures, such as encryption or tokenization, across the enterprise. As a result, data privacy issues may arise from the difficulty of governing these attributes.
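The naming conflicts described above can be surfaced mechanically by normalizing attribute names and grouping the spellings that collapse to the same key. This is a minimal sketch, assuming each API's attribute names have already been extracted from its specification; the helper names and sample APIs are hypothetical, and the normalization only catches casing and separator differences, not abbreviations such as `cust_id`.

```python
import re
from collections import defaultdict


def normalize(name):
    """Reduce an attribute name to lowercase words joined by '_'."""
    # Insert '_' at lower/digit -> upper boundaries (camelCase, PascalCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Unify '_', '-' and whitespace separators.
    return "_".join(t for t in re.split(r"[_\-\s]+", spaced.lower()) if t)


def conflicting_names(api_attributes):
    """Return {normalized: {spelling: [apis]}} for attributes spelled more than one way."""
    groups = defaultdict(lambda: defaultdict(set))
    for api, attrs in api_attributes.items():
        for attr in attrs:
            groups[normalize(attr)][attr].add(api)
    return {
        key: {spelling: sorted(apis) for spelling, apis in spellings.items()}
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Example with hypothetical APIs.
api_attributes = {
    "accounts-api": ["customerId", "balance"],
    "payments-api": ["customer_id", "amount"],
    "cards-api": ["CustomerID", "cardNumber"],
}
print(conflicting_names(api_attributes))
# {'customer_id': {'customerId': ['accounts-api'], 'customer_id': ['payments-api'], 'CustomerID': ['cards-api']}}
```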

Data attribute drift is closely linked to data quality but is significant enough to be considered a distinct type of drift.

Data Quality Drift

An API comprises multiple endpoints, each exposing various data attributes that consumers can access. These data attributes may have different data validations and types, and it’s common for an API to deviate from the initial design specifications.

One frequent issue is data type drift, in which, for example, integers are sent or interpreted as strings, causing problems when backends consume the data without proper validation.
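A type-drift check can compare each observed value with the `type` declared for that attribute in the specification. The sketch below is a hypothetical helper, not part of any specific tool; it assumes a flat object, maps OpenAPI type names to Python types, and skips `null` values, which would be governed by nullability rules instead.

```python
# Map OpenAPI type names to Python checks. bool is a subclass of int in
# Python, so it is excluded explicitly from "integer" and "number".
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def type_drift(properties, payload):
    """Return {field: (declared_type, observed_python_type)} for mismatches."""
    drift = {}
    for name, value in payload.items():
        declared = properties.get(name, {}).get("type")
        check = TYPE_CHECKS.get(declared)
        # Skip undeclared fields, unknown types, and nulls.
        if check is not None and value is not None and not check(value):
            drift[name] = (declared, type(value).__name__)
    return drift


# Example: an integer attribute arriving as a string.
properties = {"accountAge": {"type": "integer"}, "nickname": {"type": "string"}, "limit": {"type": "number"}}
payload = {"accountAge": "42", "nickname": "Main", "limit": 2500}
print(type_drift(properties, payload))  # {'accountAge': ('integer', 'str')}
```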

Another common problem is inadequate validation of an attribute, such as missing regular expression or range checks. When consumers and producers operate on undocumented specifications, this can result in unauthorized access to the data.
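Validation drift can be checked in two places: at design time, by flagging attributes that declare no constraints at all, and at runtime, by testing observed values against the declared `pattern`, `minimum`, and `maximum` keywords. Both helpers below are hypothetical sketches, and the set of keywords counted as "validation" is an assumption; note also that Python's `re` dialect only approximates the ECMA-262 regular expressions that JSON Schema patterns use.

```python
import re

# Hypothetical choice of keywords that count as "some validation is declared".
CONSTRAINT_KEYS = {"pattern", "format", "enum", "minimum", "maximum", "maxLength"}


def unconstrained_fields(properties):
    """Design-time check: attributes that declare no validation keyword."""
    return sorted(name for name, rules in properties.items()
                  if not CONSTRAINT_KEYS.intersection(rules))


def validation_drift(properties, payload):
    """Runtime check: (field, keyword) pairs for violated constraints."""
    violations = []
    for name, value in payload.items():
        rules = properties.get(name, {})
        # JSON Schema patterns are not implicitly anchored, hence re.search.
        if "pattern" in rules and isinstance(value, str):
            if re.search(rules["pattern"], value) is None:
                violations.append((name, "pattern"))
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in rules and value < rules["minimum"]:
                violations.append((name, "minimum"))
            if "maximum" in rules and value > rules["maximum"]:
                violations.append((name, "maximum"))
    return violations


# Example with made-up attributes.
properties = {
    "zip": {"type": "string", "pattern": "^[0-9]{5}$"},
    "creditLimit": {"type": "integer", "minimum": 0, "maximum": 50000},
    "notes": {"type": "string"},
}
payload = {"zip": "1234A", "creditLimit": 75000, "notes": "n/a"}
print(validation_drift(properties, payload))  # [('zip', 'pattern'), ('creditLimit', 'maximum')]
print(unconstrained_fields(properties))       # ['notes']
```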

Data Contract Drift

Large organizations create APIs not only for internal use but also for external third-party integrations. Because customer data is shared beyond the company’s boundaries, it’s crucial to establish data contracts with external clients. These data contracts specify the attributes shared with clients and outline the governance policies associated with the shared data.

As with the drifts described above, API producers may unintentionally begin transmitting additional information that the contract does not cover, resulting in data leakage.
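Contract drift of this kind can be detected by comparing the attributes actually sent to a client with the attributes its data contract allows. The sketch assumes, hypothetically, that a contract is a list of dotted attribute paths and that allowing an object (for example, `address`) allows everything nested inside it; the helper names and sample data are illustrative.

```python
def leaf_paths(payload, prefix=""):
    """Yield dotted paths to every leaf value in a JSON object."""
    for key, value in payload.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from leaf_paths(value, path)
        else:
            yield path


def contract_drift(contract_attributes, payload):
    """Return leaf attributes sent to a client that its data contract does not allow."""
    allowed = set(contract_attributes)
    leaked = []
    for path in leaf_paths(payload):
        parts = path.split(".")
        # Covered if the leaf itself or any enclosing object is in the contract.
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
        if not ancestors & allowed:
            leaked.append(path)
    return leaked


# Example: a hypothetical third-party contract and an outgoing response.
contract = ["accountId", "status", "address.city"]
response = {
    "accountId": "a1",
    "status": "open",
    "address": {"city": "Richmond", "street": "1 Main St"},
    "dateOfBirth": "1990-01-01",
}
print(contract_drift(contract, response))  # ['address.street', 'dateOfBirth']
```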

Detection

Even with a shift-left approach, in which specifications are registered during the design phase, and with governance and standardization in place, the drifts described above still occur.

Methods

As business demand increases, so does API transaction volume, making it increasingly challenging to scan for and detect drift at that scale. This calls for techniques that are less invasive, decoupled, and scalable, such as:

Out-of-band network scanning similar to API discovery.
Extending the API intermediary (API Gateway) to scan the traffic.
API producers calling centralized drift services.

Every drift detection method can be a loosely coupled microservice that provides detection and remediation of the drift.
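The loosely coupled design above can be sketched as a small service that runs pluggable detectors over sampled transactions, so the same logic could sit behind a gateway extension, an out-of-band scanner, or a producer-facing endpoint. `DriftService`, `Transaction`, and the sampling approach are illustrative assumptions rather than a description of any existing product, and transport (HTTP, queues) is left out.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transaction:
    """A captured API exchange (hypothetical shape)."""
    api: str
    response: dict


# A detector takes a transaction and returns a list of findings (empty if none).
Detector = Callable[[Transaction], List[str]]


@dataclass
class DriftService:
    """Hypothetical centralized drift service with pluggable detectors."""
    sample_rate: float = 1.0
    detectors: Dict[str, Detector] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def register(self, name: str, detector: Detector) -> None:
        self.detectors[name] = detector

    def inspect(self, txn: Transaction) -> Dict[str, List[str]]:
        # Inspect only a sample of traffic so scanning cost stays bounded
        # as transaction volume grows.
        if self.rng.random() >= self.sample_rate:
            return {}
        results = {}
        for name, detect in self.detectors.items():
            findings = detect(txn)
            if findings:
                results[name] = findings
        return results


# Example: a toy contract detector that flags fields outside an allow-list.
service = DriftService(sample_rate=1.0)
service.register("contract", lambda t: sorted(set(t.response) - {"id", "status"}))
print(service.inspect(Transaction("accounts-api", {"id": "a1", "status": "open", "ssn": "x"})))
# {'contract': ['ssn']}
```

Keeping each detector behind the same small interface is what lets detection methods be deployed, scaled, or retired independently, in line with the microservice approach described above.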

Machine Learning

Some of the deviations described earlier can be identified with straightforward rule-based checks. Identifying non-standardized data attributes, however, is where machine learning can be highly advantageous. Given a comprehensive and growing library of data attributes and their properties, along with domain and sub-domain information, machine learning models can recognize when a more suitable standard attribute exists and apply the relevant governance and security policy based on the risk profile.
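A trained model is beyond a short example, but the shape of the workflow can be shown with a simple string-similarity baseline standing in for it: match a non-standard attribute against the enterprise attribute library, restricted to a domain, and return the risk level and governance policy attached to the best match. The library contents, risk levels, and policies below are hypothetical, and `difflib` similarity is a plainly non-ML substitute for the models the text describes.

```python
import difflib

# Hypothetical enterprise attribute library: canonical name -> properties.
ATTRIBUTE_LIBRARY = {
    "customer_id": {"domain": "customer", "risk": "medium", "policy": "tokenize"},
    "social_security_number": {"domain": "customer", "risk": "high", "policy": "encrypt"},
    "account_balance": {"domain": "account", "risk": "medium", "policy": "mask"},
}


def suggest_standard(attribute, domain=None, cutoff=0.6):
    """Suggest (canonical_name, risk, policy) for a non-standard attribute, or None.

    Uses difflib string similarity as a stand-in for an ML model.
    """
    candidates = [name for name, props in ATTRIBUTE_LIBRARY.items()
                  if domain is None or props["domain"] == domain]
    key = attribute.lower().replace("-", "_")
    match = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
    if not match:
        return None
    props = ATTRIBUTE_LIBRARY[match[0]]
    return match[0], props["risk"], props["policy"]


print(suggest_standard("cust_id", domain="customer"))  # ('customer_id', 'medium', 'tokenize')
print(suggest_standard("cust_id", domain="account"))   # None
```

A real model would draw on far richer signals (attribute properties, domain and sub-domain, observed values), but the output contract is the same: a suggested standard attribute plus the policy to enforce.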

Remediation

Addressing drift issues can be just as demanding as detecting them. Remediation involves pinpointing the potential risk posed by the drift, evaluating the severity of the issue, identifying the API producer and consumers, tracking occurrences and metrics to recognize patterns, and establishing a process to address and remediate the vulnerability. A centralized solution can offer an effective approach to resolving these challenges.
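Part of the remediation workflow, tracking occurrences and ranking findings by severity so they can be routed to the responsible producer, can be sketched as follows. The severity weights, the `Finding` fields, and the scoring rule (weight multiplied by occurrence count) are hypothetical choices for illustration; a real process would derive severity from the organization's risk model.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity weights per drift type; a real process would take
# these from the organization's risk model.
SEVERITY = {"contract": 3, "specification": 2, "attribute": 2, "quality": 1}


@dataclass(frozen=True)
class Finding:
    """One detected drift occurrence (frozen so it is hashable and countable)."""
    api: str
    producer: str
    drift_type: str
    attribute: str


def triage(findings):
    """Collapse repeated findings and rank them by severity x occurrences.

    Returns a list of (score, occurrences, finding), highest score first.
    """
    counts = Counter(findings)
    scored = [(SEVERITY.get(f.drift_type, 1) * n, n, f) for f, n in counts.items()]
    # Break ties by API and attribute so the ordering is deterministic.
    scored.sort(key=lambda s: (-s[0], s[2].api, s[2].attribute))
    return scored


leak = Finding("cards-api", "team-cards", "contract", "dateOfBirth")
bad_zip = Finding("accounts-api", "team-accounts", "quality", "zip")
for score, n, f in triage([leak, leak, bad_zip, bad_zip, bad_zip]):
    print(score, n, f.producer, f.drift_type, f.attribute)
# 6 2 team-cards contract dateOfBirth
# 3 3 team-accounts quality zip
```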

Anoop Gupta

Director, Software Engineering at Capital One

Speaker, researcher, innovator, and senior leader with two decades of experience building highly secure, resilient, performant, and scalable enterprise platforms. Leads multiple engineering teams at Capital One, specializing in API Security, Cybersecurity, Identity Access Management, Data Protection, Distributed Systems, and Privacy. Has implemented several Open Authorization (OAuth) and OpenID Connect specifications to integrate clients that drive business goals. Author of a patent and scholarly articles on API security, authentication mechanisms, privacy, and user consent, aimed at helping organizations implement robust solutions that mitigate cyber-attacks. Showcases innovative solutions and pioneering research that emphasize the importance of robust authentication mechanisms in safeguarding organizations. Helps drive API standards, improve the enterprise developer experience, and coach engineering teams to build secure APIs with data standardization and governance policies. Passionately mentors and coaches engineers and is involved in pro bono projects, fostering teamwork and motivation for professionals looking to establish themselves in software engineering and cybersecurity.

APIdays | Events | News | Intelligence