This talk explores the economic value of APIs, the importance of data liquidity (an organisation’s ability to serve up the right data in the right context at the right time) in maximising that value, and the vital role that APIs play in the modern data ecosystem.
What I’m going to share with you today explores that data topic in the context of APIs. In my job as a global leader of API strategy, I work with a lot of big organisations all around the world, and for about the past 18 months I’ve been zeroing in on the intersection of data strategy and API strategy. I started from a place of wondering why these are different, but I’ve arrived at a place of recognising that what we’re really dealing with are two different worlds that go all the way back in the history of computing. This is a talk about the role APIs can play in dealing with the collision of the world of analytics and business intelligence with the world of distributed applications. It’s an abridged version: there’s a whole other version that covers economic concepts, but I’m going to zero right in on some of the more execution-oriented aspects.
Software VS Data
First of all, 10 years ago, Marc Andreessen wrote an essay in the Wall Street Journal that said “Software is eating the world”. That was a time of explosive growth in mobile technology, and second-generation web startups like Uber and Airbnb were showing that software had to be a core capability for companies to thrive in the new digital economy. I think it’s pretty fair to say that now, 10 years on, maybe it’s data that actually drives the digital economy, and software is just there to facilitate it. Look at all the initiatives companies are pursuing to drive greater digital engagement under the umbrella term of “digital transformation”: personalising user experiences, optimising operations through process mining and process automation, launching data products, data-driven decision making. The data is everywhere, and software is really what’s needed to marshal that data. The reality for a lot of organisations is that they’re just not there with the data. We in the IT community spent a lot of time focusing on software practices and software engineering, and then, as we got all these gobs of data, we realised that we run into issues, and they’re like yin-and-yang issues. On one hand, there’s a lack of supply of data: maybe we don’t have all the data that we need. On the other hand, maybe we have so much data that it’s hard to get quality or consistent data, or it’s hard to find the data we already have. We also have the problem of accessing data that might be cryptically hidden, or locked in legacy systems that are hard to break into. And then, once you do have access to data, it’s hard to control access to that data. All of this means we’ve got dichotomous issues going on when it comes to data. Nevertheless, if we really want to follow the path of the digital giants, we have to recognise that they’re ultimately data companies.
We might cheekily think of Google as an advertising company because so much of its revenue comes through advertising (the same goes for Facebook), but what is the secret ingredient of the advertising business and its revenue? It’s the data that’s used to personalise, target and observe. Even for API darlings like Stripe and Uber, the core of the business is collecting data before offering digital services. Uber uses all sorts of APIs, like Google Maps APIs, payment APIs, and so on. But at its core, its key differentiator is its ability to do the matchmaking and to use extremely performant data analytics algorithms to determine optimal ride routes. We may think of software as being the centre of these companies, but I would argue that it is data, and that is a reality that a lot of established organisations are coming to terms with.
Data = Value
The Digital Transformation Playbook: Rethink Your Business for the Digital Age (2016) is a book by David L. Rogers, a Columbia University professor, in which he dedicates a chapter to turning data into assets. He starts by saying: look, you have to shift the way you even think about data if you want to take advantage of it. In IT terms, we have so far looked at data as something that’s expensive. You have to store it, manage it, and focus on data modelling and structuring, and you end up with different silos of data. This makes sense if you’re looking only at the operational application systems, but when you start to think about data as a capital asset, your whole mindset has to change. You want to correlate as much data as you can within your company’s boundaries in order to drive value, but at the same time you want to protect it from leaking outside your borders (except where you are contextualising it for consumption), lest its value diminish.
If we look at things from a more practical standpoint, think about data processing. In the “old world”, we had islands of data processing. On one side, we have OLTP (online transaction processing), the world of user-facing applications and operational data; I’ll just call it the application world. On the other side, we have OLAP, the online analytical processing world: business intelligence and, increasingly, AI, machine learning and data visualisation. There’s another term that I want to add: OLEP, which is online event processing. It’s certainly not the equal of the other two in the history of computing, but it has the same characteristics as the application world, where things happen rapidly and are highly distributed.
While the application world has been breaking down and decentralising, the analytics world has become more centralised. Data warehouses aren’t enough anymore: we need data lakes to dump all the data that we have, because we don’t necessarily know yet what value we can get out of it, except maybe through the experimental approach of putting data together, cranking it through and coming up with insights. In the application world, the world most of us in the API space have probably been living in, we’ve been focusing on decentralisation: microservices, APIs, breaking things apart, distributing, multi-cloud, because external user-facing functionality needs to be optimised for performance. These worlds have intersected in the past, but because they exist independently, we had the latency of ETL/ELT between them. Now, there are forces pushing them together. The first one is big data. Being able to handle massive amounts of data at very high processing speeds means we no longer have the limitation of waiting for batch jobs to finish in order to get the analytics. You can reasonably create feedback loops that process data and act on it in real time, which takes away one of the constraints. Machine learning is a big forcing function here, because what is machine learning? It’s essentially applications derived from data. To put it in a very simplified way, you’re almost coding with data. You are taking insights based on statistical models and creating real-time application models that need to be deployed into the application world. It is the same thing we’re doing in the application space with distribution and cloud computing: we are removing constraints, which allows us to run higher-scale workloads. User expectations have also changed: consumers expect personalised user experiences.
We don’t yet know how far the collision of those worlds will go, but we can look at the trends it has created, and there are a lot of them. In the analytics and big data world, you’ve got cloud data warehouses, machine learning, Apache Spark, data lakes and a whole field of data engineering that has risen in the last 10 years and is evolving daily. ELT has also changed the way we deal with massive data pipelines. In the application world, we’ve got microservices and microservice architectures, personalisation, and expectations of predictive interactions. There’s also this thing called “reverse ETL”, which is basically taking the flow that gets data from the operational systems into the data lake and reversing it to bring the data back out. Finally, in the events world, we’ve got eventual consistency (removing constraints around transactionality), event sourcing as a new way of storing data, and Kafka as a big deal in event streaming.
What about that intersection? What’s the right approach there if these worlds are going to come together? How do we deal with it? I think the way to navigate through this is not necessarily to latch on to all these emerging technologies, because they’re coming out all the time, but to try and rise above the fray. In order to do that, we have to focus on value when it comes to looking at data. If you look at the value chain in the old world, we’d get data from user facing applications, use batch processes to load it into our data warehouse, perform our business intelligence and analytics processing on it, which would drive analytical insights and might lead to a whole development lifecycle. The problem was that there was a lot of latency in the process.
When it comes to the new world, we’re seeing more of a cycle, because we’ve got the ability to plug things together in near real time and we have the opportunity to start at any point. The terminology I’m using for that cycle comes from Alex Osterwalder, the creator of the Business Model Canvas. His definition of a business model is the way an organisation captures, creates and delivers value. In our case, we capture value from the operational systems in the form of data, feed that into analytics processing or machine learning algorithms, whatever the case may be, do that processing, then deliver that value by deploying it into the user-facing applications. It becomes a cycle as we reduce the latency and tighten the feedback loop between learning from the data and deploying the data. One of the amazing things about data is that once you put it out there to be consumed, you’re actually creating more metadata! If I put some information out on the web and somebody uses it, I can capture context about that interaction and feed that back in as more information, which makes the data I had more valuable and adds new valuable data. And we keep spinning around, so we’ve really got a powerful feedback loop on data. If you think about the digital companies that are web native, whether they’re giants like Amazon or startups like Stripe, they treat data as this incredible source of value. In order to make that data value cycle work, the data has to be fluid. What do we mean by fluid? Having the most data doesn’t necessarily mean you have the most data value. The companies that get the most value out of their data are the ones that are able to take the data and present it in the right context at the right time.
I like to use the concept of data liquidity; it’s actually a term that’s been out there in a certain industry context. The most liquid form of capital is cash, right? We can think about data in the same terms. If you look at a company’s finances and fiscal operations, liquidity is important in its capital assets because it is collecting money from consumers and customers, transacting with distributors, and paying for infrastructure, suppliers, contractors and so on. That’s how capital is exchanged on the perimeter of the organisation. In the digital world, if we look at the stakeholders involved in a digital ecosystem, whether it’s user-facing applications, self-service channel applications, connected devices or third parties, it’s APIs that are really the medium for exchanging data value. So I’m arguing that APIs are the most liquid form of data, much like cash is the most liquid form of capital.
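As a rough illustration of the capture, create and deliver cycle described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the event shape, the function names `create_value` and `deliver_value`, and the simple popularity ranking standing in for the analytics step. The point is only that each delivery produces a new interaction record that feeds the next round of analysis.

```python
from collections import Counter
from typing import Dict, List

# Captured data: interaction events from user-facing applications.
# The event shape is hypothetical.
events: List[Dict[str, str]] = [
    {"user": "u1", "item": "maps"},
    {"user": "u2", "item": "payments"},
    {"user": "u3", "item": "maps"},
]


def create_value(captured: List[Dict[str, str]]) -> List[str]:
    """Analytics step: rank items by how often they were used."""
    counts = Counter(event["item"] for event in captured)
    return [item for item, _ in counts.most_common()]


def deliver_value(
    captured: List[Dict[str, str]], ranking: List[str], user: str, chosen: str
) -> str:
    """Delivery step: serve the top-ranked item to a user, then record what
    they actually chose, together with the context of the recommendation."""
    recommended = ranking[0]
    captured.append({"user": user, "item": chosen, "recommended": recommended})
    return recommended


# Two turns of the cycle: each delivery adds data that changes the next ranking.
deliver_value(events, create_value(events), "u4", "payments")
deliver_value(events, create_value(events), "u5", "payments")
latest_ranking = create_value(events)
```

The shorter the gap between calling `create_value` and `deliver_value`, the tighter the feedback loop; in the batch-oriented old world that gap was the ETL latency.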
Composable enterprise and the power of composability
Let’s talk about the composable enterprise and the power of composability. Composability is the ability to break down capabilities in your organisation through APIs so that they can be composed in any context. APIs can do the same thing for data, and they obviously have been doing so. This is fundamental when you think about making your data liquid, able to change shape in order to fit the right context. Composable data is liquid in the way we should be striving for in order to maximise data value. We actually had an experience last year, at the dawn of the pandemic, where a joint Salesforce, Tableau and MuleSoft team jumped into action, asking “how can we help deal with this pandemic?”. That team built a data platform, not even knowing how it might be valuable, but just wanting to make sure that data was available should it ever be needed. There were a few iterations, starting with the thought that PPE and other data might be valuable, but we essentially landed on case data and policy data around reopening plans. Local policies were the data that people wanted the most. We built this platform in a very API-led way, meaning we abstracted all the backend data sources through APIs in order to collect the data. We had a normalised data warehouse offering this information, and a standardised set of entities at the core of the data being shared. There were many different channels: Tableau for data visualisation; partners who were integrating the data into their own systems; Work.com, an application Salesforce launched to help organisations deal with reopening and policies in their regions; partners building applications; and an open public API. Across all of those cases, we found that we needed to offer different views of the data to fit the different contexts.
So we had crude raw data APIs at the back end, the core APIs in the middle, and contextualised APIs for the channels of distribution. We found this to be an extremely powerful architecture, because on the back end we were able to add lots of new data sources (starting with two and reaching around twenty now). We were also able to add channels rapidly and make changes to channels dynamically without having to change the core in the middle. So we generalised this architecture to look at how APIs can help with this intersection.
If we look on the left and think about an OLAP data pipeline of the kind that is typical in the data engineering space, where you’re ingesting, preparing, storing, analysing and deploying data, and then think about the kind of multi-channel distributed applications we have on the right, we can generalise and ask: what are the crude data APIs we need in order to source the data? What are the contextualised data APIs we need in order to fit the needs of each channel system, each user touchpoint, each user experience? It was tempting to just dump the data into those contextualised APIs, but the key to the whole thing is the layer of abstraction, the composable core, that makes it all work. That core can come either pre-analysis or post-analysis. In this model, as we collect more information about how the data is being consumed in the context of user-facing applications, we feed that back in, get more value out of it, and keep increasing the value of the data we already have. I’m calling this API-led data connectivity. The layered approach works very well when it comes to capabilities and supporting process automation. In this case, it’s really about supporting liquid data. We can even see how we capture value through the crude data APIs, create value in the big process that turns them into the core data APIs, and ultimately deliver value through the contextualised data APIs, then keep collecting it and running it through the flywheel again and again.
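To make the three layers concrete, here is a minimal Python sketch of crude data APIs, a core data API and contextualised data APIs. It is not the platform that team built: the `CaseRecord` entity, its fields, the two sample sources and the two channel views are all hypothetical, chosen only to show why a new source or a new channel can be added without touching the core.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CaseRecord:
    """Normalised core entity (hypothetical fields)."""
    region: str
    date: str
    cases: int


# Crude data APIs: one per backend source, each with its own raw shape.
def source_a() -> List[dict]:
    return [{"area": "North", "day": "2020-04-01", "count": "12"}]


def source_b() -> List[dict]:
    return [{"region_name": "South", "reported": "2020-04-01", "new_cases": 7}]


# Each entry pairs a crude API with a mapper into the core entity.
# Adding a data source means adding one entry here and nothing else.
SOURCES: Dict[str, Tuple[Callable[[], List[dict]], Callable[[dict], CaseRecord]]] = {
    "a": (source_a, lambda r: CaseRecord(r["area"], r["day"], int(r["count"]))),
    "b": (source_b, lambda r: CaseRecord(r["region_name"], r["reported"], int(r["new_cases"]))),
}


# Core data API: the composable layer every channel builds on.
def core_cases() -> List[CaseRecord]:
    records: List[CaseRecord] = []
    for fetch, normalise in SOURCES.values():
        records.extend(normalise(row) for row in fetch())
    return sorted(records, key=lambda rec: (rec.date, rec.region))


# Contextualised data APIs: one view per channel, all derived from the core.
def dashboard_view() -> Dict[str, int]:
    """Totals per region, as a visualisation channel might want them."""
    totals: Dict[str, int] = {}
    for rec in core_cases():
        totals[rec.region] = totals.get(rec.region, 0) + rec.cases
    return totals


def public_api_view() -> List[dict]:
    """Flat records, as an open public API might return them."""
    return [{"region": r.region, "date": r.date, "cases": r.cases} for r in core_cases()]
```

Neither `dashboard_view` nor `public_api_view` knows how many sources exist or what their raw formats look like, which is the property that let sources and channels grow independently.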
To sum up
If we get away from worrying about where we are storing the data and think more from the consumer’s standpoint about how they access the data, we can impose a control plane across a data network that’s normalised through APIs, which allows us to handle all these other system-wide functions such as discoverability, deployment and scaling. Focus on data value and not volume, and aim to measure data liquidity as a way of gauging how well your organisation is going to derive value. APIs are really the touchpoints on the perimeter of the organisation, so think about building a data network through APIs and managing it as a network.
Q: In the journey to create data value, what are the most common industry challenges that companies are facing today?
A: I think the big challenge is that there’s a temptation for people to work in big horizontal chunks, getting all the data in one place and then doing all the machine learning. However, I think it is better to focus on value. You take a thinner vertical slice of this whole thing: find the data that you need for a particular use case, run it through the pipeline, feed it into the operational systems, and create a tight feedback loop on a smaller scale, as a way of experimenting and also delivering business value incrementally. In other words, don’t try to sort out every data problem for your organisation at once. Build momentum over time by running small experiments.
Q: We are talking in the industry about APIs as a product and data as a product, but I don’t think those two can exist without the other, what’s your take on this?
A: Data is a big part of the API product. In fact, you could say there are API products out there that are really data products, just packaged through APIs. There are other products that are capability products with a data element, also packaged through APIs. These are all digital products, and APIs are involved in the packaging. However, as I think we all know, it’s not about the API. It’s about what the API enables.