Dugald Morrow is a Developer Advocate with the Atlassian Developer Relations team. Part of his role is to advocate within Atlassian for the needs of developers who use its APIs. In this article, he discusses going beyond API design standards.
Recently, we’ve been developing API design standards and other practices collectively referred to as extensibility standards. We use the term extensibility rather than API because we are concerned with more than just APIs: while APIs are important, we also care about how we collaborate with partners and the practices we employ to make changes available to them. Hence, the standards apply at a technical level to our products and dependent apps, as well as to how Atlassian teams and partners work and interact.
In Atlassian’s earlier days, development teams did not consistently adhere to methodologies such as Scrum; internally, it seemed closer to organized chaos. Teams were empowered to choose and create their own processes, resulting in a very organic environment with little standardization. That was quite a while ago, but Atlassian teams still enjoy a lot of autonomy. This culture has many benefits, but it also introduces challenges at the organizational level.
Atlassian is a product company. We have many products with many customers, but this picture is oversimplified because the Atlassian Marketplace is also a huge part of our success. Our products have rich APIs that are used by many apps and partners. Many of these partners are large organizations with many developers. In addition, a huge number of individual developers create apps for their own needs or the needs of the organizations they work for. The Atlassian Marketplace is one of the largest enterprise software marketplaces, with customers installing over 28,000 apps weekly. Despite this success, Atlassian is still a developer-plus company, not a developer-first company: most of the 10,000-plus Atlassian staff think primarily about products and customers, not apps and partners.
To summarize our situation, we have a culture of independent teams and a large ecosystem. As a company, our primary focus is on customers rather than developers who use our APIs. This has resulted in inconsistent APIs in terms of quality and style. Too many incidents have impacted partners, apps, and customers, and there are weaknesses in how we collaborate. We hypothesize that these issues can be addressed by lifting our standards. Our reasoning is based on the success of recent solutions and practices.
To explain this, let’s look at some of the problems that led us here. The first two problem areas I focused on were reducing the rate of incidents and providing better API change logs. At the time, I had no sense that these work streams would lead to a framework such as extensibility standards; they were simply problems I knew we needed to solve. Reducing our rate of incidents involved a lot of analysis of historical incident data and working with R&D teams to agree on how to define the various grades of incidents in the context of the ecosystem. This work resulted in standardized incident severity definitions for our REST APIs. Since impact data was necessary to evaluate incident severity, we created tooling that can instantly calculate the potential number of apps and users impacted by any endpoint regression, based on the billions of events logged every day.
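To give a feel for the shape of that tooling, here is a minimal sketch of the impact calculation, assuming usage events with app and account identifiers have already been extracted from the log store. The event shape and function name are illustrative, not Atlassian’s actual implementation.

```typescript
// Hypothetical sketch of the impact calculation described above.
// The UsageEvent shape and estimateImpact name are illustrative.
interface UsageEvent {
  endpoint: string;   // e.g. "GET /rest/api/3/issue/{issueIdOrKey}"
  appId: string;      // the app that made the call
  accountId: string;  // the user on whose behalf the call was made
}

interface Impact {
  apps: number;
  users: number;
}

// Count the distinct apps and users that touched a given endpoint.
function estimateImpact(events: UsageEvent[], endpoint: string): Impact {
  const apps = new Set<string>();
  const users = new Set<string>();
  for (const event of events) {
    if (event.endpoint === endpoint) {
      apps.add(event.appId);
      users.add(event.accountId);
    }
  }
  return { apps: apps.size, users: users.size };
}
```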
Along with this, we created a methodology for assessing the impact of an API change.

The second stream of work involved improving our API change logs. In 2020, we recorded only five change log entries all year for our biggest API, the Jira Cloud platform; several other APIs had no change logs at all. We now have comprehensive change logs for each product and a central change log into which all API changes roll up. Developers can create subscriptions with custom filters so they are notified about the types of API changes relevant to them. Not only has this benefited our partners, but it has also had a very positive side effect within Atlassian: teams are now more aware of the need to communicate changes and consequently have a much stronger ecosystem DNA. I was initially focused mainly on getting the technical solution in place, but over time, I became increasingly aware of the standards needed to keep teams aligned on creating change log entries: for example, rules around picking the right type of change log entry, formatting rules, and rules relating to amending published change log entries.

To ensure our ecosystem partners are involved early in the development lifecycle, colleagues of mine created practices that allow partners to review and comment on our API change plans, such as new API features. This is our request for comments (RFC) process. It results in Atlassian R&D teams engaging with partners earlier, which leads to improved API design. As with other practices, there is a range of standards associated with RFCs.

Around this time, we recognized the range of practices and rules we had created and started to centralize them to form a suite of extensibility standards. We are continuing to create and refine these standards. Our experience is that Atlassian teams have been happy to embrace our guidance so long as it’s pragmatic and the value is clear; they have especially loved guidance accompanied by tooling that makes their lives easier. Our standards are adopted by teams, not imposed upon them. R&D teams don’t lose their autonomy; the standards provide alignment and consistency, and the teams get peace of mind that they are doing the right thing.
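To illustrate the subscription filtering described above, here is a minimal sketch of how a change log entry might be matched against a subscriber’s custom filters. The field names and types are assumptions for illustration, not Atlassian’s actual schema.

```typescript
// Illustrative change log entry and subscription filter; the shapes
// here are invented, not Atlassian's actual data model.
type ChangeType = "announcement" | "deprecation" | "feature" | "fix";

interface ChangeLogEntry {
  product: string;       // e.g. "jira-cloud-platform"
  type: ChangeType;
  title: string;
  publishedAt: Date;
}

interface Subscription {
  products?: string[];   // undefined means "all products"
  types?: ChangeType[];  // undefined means "all change types"
}

// Decide whether a subscriber should be notified about an entry.
function matches(entry: ChangeLogEntry, sub: Subscription): boolean {
  const productOk = !sub.products || sub.products.includes(entry.product);
  const typeOk = !sub.types || sub.types.includes(entry.type);
  return productOk && typeOk;
}
```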
Our extensibility standards are broadly classified into three groups:
- API design
- API change management
- API collaboration
We have already developed quite a range of practices and tools, but we also have a long way to go. There are some gaps in our API design standards because we’ve been concentrating on where the need is most urgent. In general, there is a relationship between the type of a standard and its testability: API design standards are easier to monitor and test, while collaboration practices and rules are often articulated in terms of soft skills, so conformance with them is harder to measure. An example of a collaboration standard is the rule that change log content should not have a marketing tone.
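To make the testability contrast concrete, here is a minimal sketch of an executable API design rule, written as a Spectral ruleset in JavaScript (Spectral also accepts YAML rulesets). The rule itself is hypothetical, not one of Atlassian’s actual standards.

```typescript
// A hypothetical Spectral ruleset showing how an API design standard
// can be made executable and automatically testable.
import { truthy } from "@stoplight/spectral-functions";

export default {
  rules: {
    "operation-must-have-description": {
      description: "Every operation should explain what it does.",
      message: "Operation is missing a description.",
      severity: "warn",
      // Targets GET operations for brevity; extend to other verbs as needed.
      given: "$.paths[*].get",
      then: {
        field: "description",
        function: truthy,
      },
    },
  },
};
```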
Next, I want to explain how we capture and manage standards. Our solution defines several types of standards: guides, rules, and definitions. Each type has a different set of fields. This allows some standards to be very text-oriented, while others, such as Spectral rules, are executable; even these have a rich text field in which the rationale behind the rule can be provided. Capturing standards in a structured format such as this allows their integrity to be maintained and allows standards to evolve more easily over time. Like most things, our standards have a defined life cycle, and the most interesting phases are from incubation to adolescence. Incubation is the state in which standards are drafted. It would be impossible for everyone within Atlassian to fully agree on all our standards, so we only require a rough consensus between a few experts. We use Confluence to align on standards because reaching that consensus requires a lot of freedom to collaborate. Once a rough consensus is reached, the standard is copied into our standards management solution and enters adolescence.
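As a sketch of what such a structured format might look like, here is a hypothetical data model. The later life cycle states are my assumption, since only incubation and adolescence are described here.

```typescript
// Hypothetical model of a captured standard; field names are
// illustrative of the structure described above, not Atlassian's schema.
type StandardType = "guide" | "rule" | "definition";

// Drafted in incubation, tuned during adolescence; the remaining
// states (mature, retired) are assumed for illustration.
type LifecyclePhase = "incubation" | "adolescence" | "mature" | "retired";

interface Standard {
  id: string;
  type: StandardType;
  phase: LifecyclePhase;
  title: string;
  rationale: string;      // rich text explaining why the standard exists
  spectralRule?: string;  // optional executable rule for testable standards
}
```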
The standard will need some tuning during adolescence as it encounters real-world experience. The other states are fairly straightforward and don’t need much explanation. Of course, an important part of having standards is ensuring they are adhered to. We don’t expect 100% adherence to all standards, but we want the level of adherence to increase over time. Measuring adherence provides us with feedback: the data tells us which standards have the most and least conformance and prompts action to improve the standards or to work with teams to improve compliance. It’s interesting to think about the difference between adoption and adherence. Adoption is the level of awareness of and intention to conform to standards, whereas adherence is the level of actual conformance.
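A minimal sketch of how adherence measurement could work, assuming automated checks report pass/fail results per standard; the shapes here are hypothetical.

```typescript
// Hypothetical result of an automated conformance check.
interface CheckResult {
  standardId: string;
  passed: boolean;
}

// Returns a conformance rate (0..1) per standard, making it easy to see
// which standards have the most and least adherence.
function conformanceByStandard(results: CheckResult[]): Map<string, number> {
  const totals = new Map<string, { passed: number; total: number }>();
  for (const { standardId, passed } of results) {
    const t = totals.get(standardId) ?? { passed: 0, total: 0 };
    t.total += 1;
    if (passed) t.passed += 1;
    totals.set(standardId, t);
  }
  const rates = new Map<string, number>();
  for (const [id, t] of totals) {
    rates.set(id, t.passed / t.total);
  }
  return rates;
}
```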
To be honest, we are still in the early stages of our governance story. We have some automated reports being generated, but we still need to work with teams to set gates and targets to improve compliance.
During all of this, I’ve learned several lessons. First of all, while standards seem like they should be black and white, my experience is that they are often shades of grey. For example, we need to balance the speed with which we deliver value against the impact on Marketplace partners and customers. As a result, our API deprecation periods vary depending on the circumstances. Our extensibility standards include a guide and associated tooling that calculates the appropriate deprecation period based on information about a backwardly incompatible change. There are even shades of grey in evaluating whether a change is backward compatible: even when a change is technically incompatible, for example, a field being omitted from a REST API response, it often requires judgment to estimate the actual impact likely to result.
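Here is a deliberately simplified sketch of what such deprecation-period tooling might look like; the inputs and thresholds are invented for illustration and are not Atlassian’s actual guide.

```typescript
// Hypothetical inputs describing a backwardly incompatible change.
interface IncompatibleChange {
  appsImpacted: number;
  usersImpacted: number;
  securityFix: boolean; // urgent fixes may justify a shorter period
}

// Returns a suggested deprecation period in days; the thresholds
// below are invented to show the shape of the calculation.
function deprecationPeriodDays(change: IncompatibleChange): number {
  if (change.securityFix) return 30;
  if (change.appsImpacted === 0 && change.usersImpacted === 0) return 30;
  if (change.appsImpacted < 10) return 90;
  return 180;
}
```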
The next lesson relates to the challenge of getting adoption of our extensibility standards. Over 1,000 change log entries across 15 products and services have been created with the new solution. R&D teams have created 45 RFCs with over 1,000 responses from partners. Our practices enjoy thousands of page views from numerous R&D teams, and our tools also get thousands of visits. However, some teams still don’t recognize our standards, and every new standard will require an adoption strategy. Our experience has been that anything new must be better than what it is replacing, and even then, adoption follows the typical S-curve pattern, where take-up is initially very slow, then picks up speed, and finally slows down again. You need to start with one or two champions happy to work with you to help iron out any wrinkles. You can then run a program to onboard as many additional adopters as possible, but there will always be some laggards resistant to change. You must be prepared to loop back occasionally to get them over the line.
The next thing I became aware of is the variety of APIs and the fact that the more obscure API types are often the most problematic. An example of an obscure API characteristic is what I call cardinality. Apps can be broken when a significant change in cardinality is introduced, such as suddenly invoking webhooks at a much greater rate or introducing a new object in an API where apps were previously not expecting it. My incident analysis in 2021 indicated that authorization flows and front-end extension points were the most problematic APIs, presumably due to a combination of their complexity and the difficulty of covering them with automated regression tests.
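On the app side, a defensive posture can soften cardinality changes like these. The sketch below, with invented names and thresholds, tolerates unknown object types and flags unexpected spikes in webhook volume rather than failing.

```typescript
// Hypothetical app-side guard against the cardinality changes described
// above; the known types, window, and thresholds are invented.
const KNOWN_TYPES = new Set(["issue", "comment", "attachment"]);
const EXPECTED_PER_MINUTE = 100; // baseline derived from past traffic

let windowStart = Date.now();
let receivedInWindow = 0;

interface WebhookEvent {
  objectType: string;
  payload: unknown;
}

function handleWebhook(event: WebhookEvent): void {
  // Track volume so a sudden spike in webhook rate is surfaced, not fatal.
  const now = Date.now();
  if (now - windowStart > 60_000) {
    windowStart = now;
    receivedInWindow = 0;
  }
  receivedInWindow += 1;
  if (receivedInWindow > EXPECTED_PER_MINUTE * 10) {
    console.warn("Webhook rate far above baseline; possible cardinality change");
  }

  // A new object type appearing here is the other kind of cardinality
  // change described above; skip it rather than crash.
  if (!KNOWN_TYPES.has(event.objectType)) {
    console.warn(`Ignoring unknown object type: ${event.objectType}`);
    return;
  }

  // ...process known event types...
}
```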
The final lesson is that we’ve validated that extensibility standards have helped Atlassian, and presumably, other organizations would also benefit from establishing them. During the period we’ve been working on this, we’ve seen a significant reduction in incidents, and partner satisfaction has increased from 68% to our target of 80%. While I can’t claim the introduction of extensibility standards is the only reason for this, I am confident it was an important factor.
To recap, we recognized that inconsistencies were causing problems and preventing us from delivering value as quickly as we would like. So, we started investing in the creation of standards covering API design, change management, and collaboration with our partners. We’ll continue to invest in these standards and our alignment with them, because we strongly believe extensibility standards are necessary for a healthy ecosystem.