This article is about the analytics side of the game. It covers the history of data architecture and its evolution, then introduces data mesh and its principles.
OLTP and OLAP
Online transaction processing (OLTP) and online analytical processing (OLAP) are the two kinds of systems that data passes through. In online transaction processing, an API performs a mission-critical write, such as a POST operation. In online analytical processing, decisions are taken based on historical data, for example through trend analysis or predictive analysis. A successful data journey needs a combination of both types of processing.
Online transaction processing is more client-facing and mission-critical. Analytical processing focuses on reports and predictive dashboards. The link between these two worlds is ETL (extract, transform, load). This ETL link is what data mesh is trying to challenge.
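As a rough illustration of the ETL link between the two worlds, here is a minimal sketch in Python. The `orders` rows, the field names, and the in-memory `warehouse` dictionary are all assumptions made up for this example; a real pipeline would read from a transactional database and write to an analytical store.

```python
from collections import defaultdict

# Hypothetical rows as they might sit in a transactional (OLTP) store.
orders = [
    {"order_id": 1, "day": "2024-01-01", "amount_cents": 1250},
    {"order_id": 2, "day": "2024-01-01", "amount_cents": 800},
    {"order_id": 3, "day": "2024-01-02", "amount_cents": 4300},
]


def extract(source):
    """Extract: copy rows out of the transactional source."""
    return list(source)


def transform(rows):
    """Transform: aggregate order amounts into one total per day."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["amount_cents"]
    return dict(totals)


def load(totals, warehouse):
    """Load: write the aggregated figures into the analytical store."""
    warehouse.update(totals)
    return warehouse


# The analytical side only ever sees the transformed, loaded result.
daily_revenue = load(transform(extract(orders)), warehouse={})
print(daily_revenue)
```

The point of the sketch is the shape of the flow: the analytical side depends on a separate pipeline that sits between it and the domain that produced the data.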
History of data architectures
Around the 1960s, the concept of data warehousing came into the picture. It pulled all the data from the transactional systems into one big central data warehouse, and reporting was built on top of that warehouse. This architecture struggled because it was totally centralized: a central team in the middle was responsible for managing the data while also trying to understand every one of the diverse businesses behind it. This was challenging.
In 2010, the data lake architecture came into the picture. It was still centralized, but data from the transactional systems was loaded into the lake in raw form, without any up-front transformation.
But the two architectures were essentially the same: a data domain generated data, the data was moved to a central store, and a consumer consumed it from there.
In 2015, as cloud computing kicked in, the same architecture was recreated on the cloud. Cloud computing made it simple to scale storage and compute, pushing aside traditional on-premises Hadoop implementations.
In 2018, Data Mesh architecture came into the picture.
Data mesh is a dense network of nodes containing data. Unlike centralized and monolithic architectures of data warehouses and data lakes, data mesh architecture is highly decentralized.
Instead of a central team trying to address the needs of multiple teams, each domain holds both the business knowledge and the analytical capability for its own data. Domains can interact with each other to share and analyze data.
OLTP and OLAP hand in hand – An architecture that lets the operational and transactional world coexist with the analytical world in the same domain.
Born decentralized – Data mesh differs from other data architectures in that it is decentralized and distributed from the start.
Aims to democratize data – Data mesh allows quicker data exchange between domains, while each domain keeps ownership of its own data.
Four solid principles – Each principle addresses challenges from past architectures like data warehouse and data lake architectures. The four principles are data domain ownership, data as a product, self-service data platform, and federated governance.
Data Mesh Principle – Data Domain Ownership
In classical architectures, domains deliver data to the central data lake or data warehouse, and then they are done with their part. In a data mesh, domains own their data. So, the data produced by a financial system is owned by the financial domain, on both the analytical and the transactional side. Each domain owns its data end to end and is responsible for it. The idea is borrowed from microservices architecture, where each service owns its own data.
From a people perspective, data domain ownership is a big mentality shift, so the organization needs to prepare people to take on this new responsibility. From a data perspective, the data generated in the domain is owned by the entire domain. From a process perspective, organizations need to create the rules that help this ownership concept resonate with people. You can create a data ownership program in your company and then name the data owners and similar roles for each domain. From a technology point of view, we need to look into emerging technologies that can cope with the operational and analytical aspects of the same data in one go.
One of the main tips here is to use the CQRS (Command Query Responsibility Segregation) pattern. CQRS lets the application segregate operational commands from analytical queries. The application commits to its database as usual, but each commit also triggers an event that stores the data in a separate model, which analytics can read without affecting the operational data. This is a big step towards data mesh architecture.
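The CQRS idea can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class names (`OrderCommands`, `RevenueByCustomer`), the `OrderPlaced` event, and the order fields are all hypothetical, and a real system would use a database and a message broker instead of dictionaries and direct callbacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event emitted after a command has been committed."""
    order_id: int
    customer: str
    amount_cents: int


class OrderCommands:
    """Write side: commits orders and publishes an event for each one."""

    def __init__(self):
        self.orders = {}       # stands in for the operational database
        self.subscribers = []  # callables notified of every event

    def place_order(self, order_id, customer, amount_cents):
        if order_id in self.orders:
            raise ValueError(f"order {order_id} already exists")
        self.orders[order_id] = {"customer": customer, "amount_cents": amount_cents}
        event = OrderPlaced(order_id, customer, amount_cents)
        for notify in self.subscribers:
            notify(event)


class RevenueByCustomer:
    """Read side: a separate model shaped for analytical queries."""

    def __init__(self):
        self.totals = {}

    def apply(self, event):
        current = self.totals.get(event.customer, 0)
        self.totals[event.customer] = current + event.amount_cents

    def total_for(self, customer):
        return self.totals.get(customer, 0)


commands = OrderCommands()
read_model = RevenueByCustomer()
commands.subscribers.append(read_model.apply)

commands.place_order(1, "acme", 1250)
commands.place_order(2, "acme", 800)
commands.place_order(3, "globex", 4300)

# Analytics queries the read model and never touches commands.orders.
print(read_model.total_for("acme"))
```

Because the read model is fed by events, the analytical query never reads from the operational store, which is what keeps the two workloads in the same domain without interfering with each other.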
Data Mesh Principle – Data As A Product
You must treat the data generated in a domain as your own product. If you have a product, you want it to be the best product ever: the one everyone wants to buy, the one that brings in revenue, and the one you own and can fix. To treat data as a product, the key aspect is people. It helps to have clear roles and ownership, and product owners need to be well-defined.
Data Mesh Principle – Self-Service Data Platform
To decentralize your data, you must have a scalable platform. Because of the cloud, you can spin up a virtual machine or a service in a few minutes.
There should be one data platform team that produces this platform. This team should come from IT and act as an enabler for the other domains. When a domain requests data, it should be able to get the required data with a single click, without being exposed to the complexity of the platform, infrastructure, connections, or any of the underlying hassle. This is a new level of infrastructure abstraction, and it needs a new category of tools that support that kind of flexibility.
As the platform is self-service, if you or a customer want an environment, you can get it in a few clicks without building anything. You don't have to rebuild logic in the domain, so the domain can focus on adding business value rather than resolving technical challenges. In that sense, we need to create the processes for IT to become an enabler. And, of course, you need enthusiastic enablers who can activate such a product in the organization.
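To make the "single click" idea concrete, here is a small Python sketch of what a self-service platform might expose to the domains. Everything in it is an assumption for illustration: the `DataPlatform` class, its `publish`, `search`, `subscribe`, and `read` methods, and the `finance.invoices` product are hypothetical names, and the in-memory reader stands in for whatever storage and connections the platform team would hide.

```python
class DataPlatform:
    """Hypothetical self-service catalog run by the platform team."""

    def __init__(self):
        self._products = {}   # product name -> (owner domain, reader callable)
        self._grants = set()  # (product name, consumer domain) pairs

    def publish(self, name, owner_domain, reader):
        """A domain registers a data product and a way to read it."""
        self._products[name] = (owner_domain, reader)

    def search(self, keyword):
        """Consumers discover products by keyword."""
        return sorted(name for name in self._products if keyword in name)

    def subscribe(self, name, consumer_domain):
        """The 'single click': one call grants access to a product."""
        if name not in self._products:
            raise KeyError(f"unknown data product: {name}")
        self._grants.add((name, consumer_domain))

    def read(self, name, consumer_domain):
        """Return the data without exposing where or how it is stored."""
        if (name, consumer_domain) not in self._grants:
            raise PermissionError(f"{consumer_domain} is not subscribed to {name}")
        _owner, reader = self._products[name]
        return reader()


platform = DataPlatform()

# The finance domain publishes a product; the reader hides storage details.
platform.publish(
    "finance.invoices",
    owner_domain="finance",
    reader=lambda: [{"invoice_id": 1, "amount_cents": 9900}],
)

# The marketing domain finds it, subscribes, and reads it.
found = platform.search("invoices")
platform.subscribe("finance.invoices", consumer_domain="marketing")
rows = platform.read("finance.invoices", consumer_domain="marketing")
print(found, rows)
```

The consuming domain only deals with product names and one subscribe call; connection details, infrastructure, and storage stay behind the platform's interface.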
Data Mesh Principle – Federated Governance
If you have multiple domains in your organization, each domain could choose to do whatever it wants with its data in terms of formats, interoperability, and so on. But an organization cannot work like this, and this is where federated governance comes into the picture. It means having common governance across all the domains, with certain principles and standards that every domain must follow. Examples include standard data types, standard rules for data interoperability, how automation is done, and how data movement should look.
Nowadays, a business person can log in to a portal and search for desired features. Once they find the most suitable product or service, they can subscribe to it. For them, it is one single journey. But behind the scenes, many systems are integrated, and many API calls are made. We need processes to maintain synergy between all the domains. We need a cooperation mechanism under federated governance that everyone adheres to, with a little flexibility. We need to find the right balance between the decisions you can make locally in your domain and the decisions that need to be global.
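One way to picture the split between global and local decisions is an automated check that every domain's data product must pass. The Python sketch below is only an illustration under assumed rules: the `GLOBAL_STANDARDS` values, the descriptor fields, and the `check_compliance` function are hypothetical, and a real organization would define its own standards.

```python
# Hypothetical organization-wide standards agreed by all domains.
GLOBAL_STANDARDS = {
    "required_fields": {"owner", "format", "schema"},
    "allowed_formats": {"parquet", "avro"},
    "allowed_types": {"string", "int64", "float64", "timestamp"},
}


def check_compliance(descriptor, standards=GLOBAL_STANDARDS):
    """Return a list of violations of the global standards (empty if none)."""
    violations = []

    missing = standards["required_fields"] - set(descriptor)
    for name in sorted(missing):
        violations.append(f"missing field: {name}")

    fmt = descriptor.get("format")
    if fmt is not None and fmt not in standards["allowed_formats"]:
        violations.append(f"format not allowed: {fmt}")

    for column, col_type in sorted(descriptor.get("schema", {}).items()):
        if col_type not in standards["allowed_types"]:
            violations.append(f"column {column}: type not allowed: {col_type}")

    return violations


# Local choices such as "refresh" are left to the domain and not checked.
finance_product = {
    "owner": "finance",
    "format": "parquet",
    "schema": {"invoice_id": "int64", "issued_at": "timestamp"},
    "refresh": "hourly",
}

marketing_product = {
    "format": "csv",
    "schema": {"campaign": "varchar", "clicks": "int64"},
}

print(check_compliance(finance_product))
print(check_compliance(marketing_product))
```

Only the fields named in the global standards are enforced. Anything else in the descriptor, such as how often a product is refreshed, remains a local decision of the owning domain.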
Data Mesh Challenges
Specialization – Data mesh implementation requires specialists because the architecture is complex.
Data Redundancy – There will be data redundancy in the beginning, especially if you don't think about data virtualization or smart ways to consume the data. So keep an eye on how much redundancy you create across the organization.
Adoption Costs – It can be very costly at the beginning because current tools were not built around this concept, so you may need to adapt your technology.
Complexity – It requires buy-in from the entire organization and skilled people to implement it.
There is no right or wrong in data architecture. You can choose the best architecture based on your needs. In my opinion, the best approach is to mix and match, taking the best of each. Keep a close eye on changing technologies. Trust the process and collaborate with others. We will reach a point where the data mesh architecture matures and helps the organization move forward.