Senad Ameti and Christopher Nickolovski are DevOps Leaders at IAG (Insurance Australia Group). This article details their experiences in streamlining API product delivery at IAG.
API product realization has been a growing trend within the API community, and there are many ways of delivering API products that matter to our customers. We're here to tell the story of our experiences at IAG (Insurance Australia Group). The aspects of streamlining API delivery are multifaceted; we'll take a deeper dive into API observability and the use of DORA metrics, which allowed us to collect a wealth of metadata and extract learnings that were fed back into our continuous delivery lifecycle, leading to more streamlined product delivery.
IAG is one of the largest general insurance companies across Australia and New Zealand and sells insurance under many leading brands like CGU, WFI, AMI, etc. The API direction at IAG is very much oriented towards building APIs that enable multi-brand and multi-channel functionality. This strategy enables IAG to engage its consumers, partners, and brokers more efficiently and fosters growth in the business.
This article will cover three key areas in detail –
- API product mindset and delivery
- API observability and DORA metrics
- Lessons learned along the journey.
API Product Mindset and Delivery
We need to determine what our consumers need and shape APIs to deliver business value. We achieve this through an API product mindset. We have four key areas of importance that all focus on consumer enablement in some form.
- Ease of consumption – APIs need to be discoverable, secure, well documented, and easy to use. Ultimately, we need to make every effort possible to empower our consumers to succeed.
- Consumer engagement and happiness – This is all about delivering on our promise and providing all the support a customer needs to integrate. Here we aim to deliver new features regularly, fix defects rapidly, and provide a stable integration platform.
- Long-term value at scale – This will focus on how an API can be grown and reused across different consumers and how we can have a structure to scale up the pace without impacting quality.
- API management and lifecycle – This will focus on how we manage, maintain, and optimize to keep pace with rapidly evolving consumer needs economically and sustainably.
API Product Delivery
We design and deliver APIs with a consumer-first approach and an outside-in perspective. We also put API products through a CI/CD pipeline from day one – not just the technical aspects, but the API products themselves. This allows us to build, deploy, and test APIs rapidly until customers are satisfied, and it empowers developers, designers, and API product owners to test quickly.

Streamlining API delivery is about finding a suitable delivery model for the API team. Having a set of standards and lightweight processes is extremely important. DevSecOps ways of working have proved quite successful for us. We are massive fans of the "build it, own it, and support it" model; it is quite powerful and allows teams to take accountability and ownership of what they build, maintain, and support. We use a combination of open-source and enterprise tooling: Splunk as an enterprise tool, together with open-source tooling such as Spring Boot, Kubernetes, Prometheus, and Zipkin, hits the sweet spot. Any form of agile – Kanban, Scrum, or anything in between – does just fine when it comes to practices.

The importance of having widely adopted and agreed API guidelines, principles, and standards cannot be overstated. If multiple teams are building APIs for external partners, consistent API documentation and naming standards across API products are critical. External teams shouldn't have to understand the idiosyncrasies of a company's teams, applications, or underlying architecture.
The next is a strong governance structure. It is essential that governance is strong, not heavy. We want to empower like-minded API people to join an API committee, where they collaborate, develop new ways of working and new ideas, and maintain and adhere to the previously mentioned standards, guidelines, and principles.
Last but not least, we want to align ourselves to customer and partner journeys and hear our consumers, especially those outside the company, so that we can participate in a more extensive API ecosystem. All this flows from the widely adopted API strategy within our company that the entire team works towards. This blueprint is paramount to any success in API product delivery.
API Technology Scaling
Technology scaling is an essential element of API product delivery in order to cater to demand. Being able to scale technology through serverless or container-based infrastructure is vital. Security, test automation, logging, metrics, and documentation are also important ingredients to delivering solutions at pace.
- Automated security patching, portable security code that can be easily adopted across all integration points, automated test coverage of security vulnerabilities, automated dependency checking, and static code analysis tools are very important. These tools allow development teams to identify and resolve security issues without straining delivery schedules.
- All API documentation should be generated directly from the codebase using tools like Swagger. Developers shouldn't be creating documentation manually.
Common components like logging, security, health checks, and metrics should be externalized so they can be reused across all API products. API templates can be an effective pattern to achieve this and, as a by-product, can speed up development when creating new APIs. Event logging, tracing, and metrics should be embedded across all APIs in a very consistent way. This is the key to unlocking the wealth of information that passes through APIs.
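As an illustration, the reusable-component idea can be sketched in Python (the handler, field names, and wrapper below are hypothetical; our actual implementation uses standard Spring Boot libraries and templates):

```python
import json
import time
import uuid

def with_observability(handler):
    """Wrap any API handler with the shared logging/metrics component,
    so every API product emits the same structured log line."""
    def wrapped(request):
        start = time.time()
        trace_id = request.get("trace_id") or uuid.uuid4().hex
        response = handler(request)
        latency_ms = round((time.time() - start) * 1000, 2)
        # Identical fields across all APIs let dashboards aggregate across products.
        print(json.dumps({"trace_id": trace_id, "operation": handler.__name__,
                          "status": response["status"], "latency_ms": latency_ms}))
        return response
    return wrapped

@with_observability
def get_policy(request):
    # A hypothetical API operation; only the wrapper is the reusable component.
    return {"status": 200, "policy_id": request["policy_id"]}
```

Because the cross-cutting concerns live in one place, a new API picks them up for free simply by using the template.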
Team Success Factors
Getting the wrong people can be detrimental to the success of any team. With APIs being crucial to the relationship between customer needs and business value, getting the right people on the team becomes a key factor of success. The following are some of the main traits that we found particularly useful when improving and scaling our teams out for delivery.
- Emotional Intelligence – There is a constant change in our environment, and emotional intelligence and resilience are paramount. Being adaptive to change, especially in an API world where we deal with so many different consumers and providers within our company and from the outside, becomes crucial.
- Culture fit – Getting the right person with the right culture can benefit the team, and vice-versa: the wrong fit can erode the team's successes and cause subpar products. We often pick the person with potentially less expertise and knowledge but a great culture fit, and we can teach them the right skills to thrive and succeed at IAG.
- Attitude and talent – You can teach a skill, but you cannot teach attitude and talent. That matches what we found: an attitude geared towards learning, towards improving, and towards making sure that things are done right.
- Trust – Trust is underplayed in some ways. Trust within API delivery teams and trust in the broader API ecosystem within our organization are the bare minimum to succeed towards our end goal, as outlined in our API strategy.
- Expert knowledge – If all the previous traits are present, and we have someone with good expertise or expert knowledge, we start to see what the future looks like. And often it looks very bright.
Observability and DORA metrics
Adopting an API mindset is essential for success. How do development teams know they are on the correct delivery path? During our journey at IAG, our team members would often ask the following questions-
- Are we monitoring infrastructure and meeting SLAs?
- How do we observe individual transactions?
- Are we meeting our API KPIs?
- Are we deploying often enough?
- How long does a requirement take to move to production?
- And lastly, are we making decisions based on data and not just guessing?
Our answer to all this was Observability and DORA metrics.
Observability explains why something is happening and provides actionable insights. Observable systems inherently offer data about their state through instrumentation and are designed for granular trend analysis, insight, context, debugging, and much more. Observability includes monitoring, but it is both an outcome and a culture, similar to DevOps. There are four main pillars of observability – logs, metrics, traces, and events. These provide invaluable insights into what is happening to our API products and are the cornerstones of all future successes of API products.
Structured logging provides more in-depth insight into what's happening, but it can be a bit slow and cumbersome in some ways. Metrics provide trend analysis and can be quicker to gather and maintain. By observing APIs, we can surface many different insights. Because observability involves real user usage data, businesses can drive their revenue growth strategies based on how well their users accept updates and new features. The ability to analyze data on the fly opens a whole new world of possibilities for usage- and customer-driven product development. This data can be used to build valuable insights into how to optimize the product and generate more value for our customers – for example, unique API consumers, which API is used the most, and which customers use which API the most open up a wealth of information. We can troubleshoot API issues using application-level metrics, e.g., how many successes and failures we have, how many requests per minute we're getting, latency, and so on. These become extremely valuable because if we're getting too many errors or too many potential security validation issues, we may look to improve our API products so our customers can use them better.
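As a rough sketch of this kind of application-level aggregation (the record shape and field names are assumptions for illustration, not our production schema):

```python
from statistics import median

def summarize(requests):
    """Aggregate application-level metrics from per-request records.

    Each record is assumed to look like {"status": int, "latency_ms": float}.
    """
    successes = sum(1 for r in requests if r["status"] < 400)
    failures = len(requests) - successes
    return {
        "total": len(requests),
        "successes": successes,
        "failures": failures,
        "error_rate": round(failures / len(requests), 3),
        "median_latency_ms": median(r["latency_ms"] for r in requests),
    }
```

In practice these aggregates are produced by tools like Splunk or Prometheus over a time window, but the underlying idea is the same.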
One of the most important goals of observability is to ensure the reliability of our software systems – that is, their ability to function without failure for a specific period. Observability exposes issues in a system so they can be immediately targeted and debugged before they cause the application to fail. The observability of security aspects is also critical: maintaining security standards and having adequate visibility into infrastructure and applications means potential intrusions, threats, and attempted attacks can be detected before they are even completed.
Examples of some of these insights
Over the last 4-5 years, our team has built over 120 microservices containing approximately 400 API operations. At this scale, we needed a way to review consumer usage and determine future growth areas. Creating dashboards isn't an overly complex task using tools like Splunk, but getting all APIs to produce the same log data is an entirely different story. Luckily, all our APIs are built using a standard set of libraries and templates which enforce structured logging. Structured logging is the practice of implementing a consistent, predetermined message format for API logs that allows them to be treated more like data sets than plain text.
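A minimal illustration of what a structured log line might look like (the field names and values here are illustrative, not our actual schema):

```python
import json

# One log line in the shared, predetermined format that every API emits.
line = json.dumps({
    "timestamp": "2021-06-01T09:30:00Z",
    "api": "motor-repair-bookings",
    "operation": "createBooking",
    "consumer": "brand-web",
    "status": 201,
    "latency_ms": 84,
})

# Because every API uses the same fields, a tool like Splunk can query
# the logs as a data set rather than grepping free text.
record = json.loads(line)
```

Counting unique values of a field like `consumer` across such records is exactly how dashboard figures such as consumer counts and reuse rates are derived.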
Now let's have a look at the dashboard. On the top left, you can see that we execute approximately 450,000 requests over a 24-hour period. These requests are distributed across 20 to 34 different API calls and are used by 72 different consumers, internally and externally across the company. We also show a reuse rate, which is a key metric at IAG. The dashboard also shows a breakdown of API product metrics across brands and channels, and total requests by consumer. The data shown in the image only touches the surface of what is possible with the wealth of information passing through APIs.
It's essential to understand the flow of a single transaction. Imagine having detailed insight into user behavior to understand why so many consumers stopped short of purchasing a product; in API terms, the cause can be as simple as a failing validation rule. Fine-grained insight at this level is achieved via transaction tracing. Tracing is the process of tracking data, such as a request from a user, through an application. It is achieved by applying a unique identifier to each incoming request and propagating this identifier to each subsequent service. Using tools like Spring Sleuth together with Zipkin enables the creation of tracing diagrams that outline the execution path and the time taken at each integration point. The ability to view user interactions at this level allows developers to better understand how consumers use an API, and it provides insight into where we can optimize and which areas are wasteful, in great detail.
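A stripped-down sketch of the propagation idea, using the standard Zipkin B3 header (Spring Sleuth automates this for our Spring Boot services; this Python version is purely illustrative):

```python
import uuid

def incoming_trace_id(headers):
    """Reuse the caller's B3 trace ID if present, otherwise start a new trace."""
    return headers.get("X-B3-TraceId") or uuid.uuid4().hex

def outgoing_headers(trace_id):
    """Attach the same identifier to every downstream call so Zipkin can
    stitch the individual spans into one end-to-end trace."""
    return {"X-B3-TraceId": trace_id}
```

Because every hop carries the same identifier, Zipkin can reassemble the full execution path of a single transaction and show the time spent at each integration point.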
These metrics aim to aggregate data about performance, compliance, and security, and provide invaluable insights into various aspects of an API. API metrics enable you to monitor engineering and business KPIs, performance, and more. The image shows one of our API products, motor repair bookings, where we monitor our Kubernetes pods – memory, network, and CPU – with Spring Boot running in the background. We can take action based on the data if something adverse is happening. For example, if pod memory or CPU starts to run out, we have things in place to automatically scale our Kubernetes clusters to meet customer demand.
This is an example of application-level metrics and events. We monitor how many 200-, 400-, and 500-series responses we are getting, which provides insight into how well our API products are performing. The more 200 responses we see, the more our API product is performing as expected. But suppose we see a run of 400 errors; in that case, we can automatically trigger events that tell us what is happening, and we can quickly turn around and fix the issues before they start affecting the wider customer community.
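The triggering logic can be sketched roughly as follows (the thresholds are illustrative, not IAG's actual values):

```python
def should_alert(status_counts, client_error_threshold=0.05, server_error_threshold=0.01):
    """Raise an event when 400- or 500-series responses exceed a threshold.

    status_counts maps a status series to a count, e.g. {"2xx": 950, "4xx": 40, "5xx": 10}.
    """
    total = sum(status_counts.values())
    if total == 0:
        return False
    return (status_counts.get("4xx", 0) / total > client_error_threshold
            or status_counts.get("5xx", 0) / total > server_error_threshold)
```

In production this kind of rule would typically live in an alerting tool such as Splunk or Prometheus Alertmanager rather than in application code.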
In addition, we track security metrics, which is very important to ensure that there is no security breach.
We use DORA metrics to observe and get insights into how well we are doing in terms of API delivery itself. DORA stands for DevOps Research and Assessment, and there are four key metrics that need to be collected –
- Deployment frequency
- Lead time for change
- Mean time to recover
- Change failure rate
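As a sketch, the four metrics can be computed from simple deployment records (the record shape and field names are assumptions for illustration; real DORA tooling derives them from pipeline and incident data):

```python
def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics from deployment records.

    Each record is assumed to look like:
      {"lead_time_days": float, "failed": bool, "recovery_hours": float or None}
    """
    n = len(deployments)
    failures = [d for d in deployments if d["failed"]]
    return {
        "deployment_frequency_per_week": round(n / period_days * 7, 2),
        "lead_time_for_change_days": round(sum(d["lead_time_days"] for d in deployments) / n, 1),
        "change_failure_rate": round(len(failures) / n, 2),
        "mean_time_to_recover_hours":
            round(sum(d["recovery_hours"] for d in failures) / len(failures), 1)
            if failures else 0.0,
    }
```

Even this simple aggregation makes trends visible: a rising deployment frequency and falling lead time are exactly the signals we watched for after changing our ways of working.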
We had always thought of ourselves as a DevOps team that operated reasonably efficiently, but DORA told a different story. From October 2019 to May 2020, our deployment frequency was low, mainly attributed to long-running projects with limited production release windows. This resulted in large, risky deployment artifacts. After seeing these numbers, we decided to shake things up a little from mid-2020, and today there is a massive positive trend.
The "lead time for change" metric paints a similar picture. We were initially looking at lead times of over 100 days per change. In most cases, changes sat dormant in lower environments waiting on project milestones before making their final push to production; in our scenario, this was due to other dependencies, like the team waiting for contract-related changes. To break out of this cadence, we needed to change something. Our first attempt was to do everything possible to reduce the size of changes and minimize breaking APIs. We saw an instant improvement from 2020 onwards, with lead time for change dropping significantly, and we're still on the lookout for new techniques and technologies to improve it further.
Being able to measure and observe what is happening across the various aspects of API product delivery – the underlying infrastructure, the applications, and so on – is paramount for streamlining API product delivery and future success.
Lessons Learned Along the Journey
- Use the data you collect through API observability and DORA metrics for what it is, and make the tough calls. There's no point in collecting all this data and then maneuvering it to fit your process or a particular narrative about teams, technology, or process.
- The whole point of observing what is happening is to make a relevant change to our customers and consumers of APIs. It is not to get a pat on the back that we’re doing the right thing because, more often than not, you’ll find that you have something to improve, and that is constant.
- Trust your team and the process. With the right people on the ground, anything's possible. It's all about having that vision and direction early on.
- Every team must have a backlog and room for improvement. There's no end game for API delivery; there's no perfect team, and we all have room to improve. Positive change takes time; it is about setting goals, adding them to the backlog, and backing them up with action.
- Foster innovation and thought leadership. Change is a good thing and must be promoted as much as possible. Team members should be encouraged to investigate new technology or even experiment with alternative ways of working.
- Just have fun. The whole point of doing this and being passionate about APIs is to have a bit of fun and enjoy the journey. There is no end in sight, and there's always room to improve, but along the way we must have fun and enjoy ourselves.