Data mesh is an emerging architecture that establishes an alternative, de-centralised pattern to the data warehouse and data lake. It is very much on an upward trajectory, gaining traction across various industries. This article will address questions frequently asked about data mesh, ensuring that our customers are ahead of the game. Specifically, what problems is data mesh trying to solve and how can organisations assess if now is the right time to explore data mesh as a viable option?
What are the problems data mesh aims to solve?
Microservices and domain-driven design have advanced software engineering and led the way in seeking to kill off the monolithic application. However, perhaps the largest of all monoliths in today’s IT estate remains relatively untouched: the data warehouse or data lake.
We recognise the data warehouse as a monolith, but all too often accept it as a necessary evil. This is despite the fact that we experience the same problems we do with all monoliths: scalability issues and ever-increasing complexity. To make matters worse, in a centralised model, an increase in demand on data teams creates bottlenecks in delivery and, ultimately, extends timelines and increases costs.
This is where data mesh comes in. Data mesh seeks to apply to data the same ethos of breaking up the monolith that we have witnessed in software engineering. Rather than taking data feeds from operational systems into a centralised function, in a de-centralised model responsibility sits with the domain, where the focus is on building highly interoperable data products.
Data mesh addresses a number of key issues. For instance, we tend to think of data warehousing as removing silos by bringing together data into one place, but the consequence of doing so is that we unintentionally create silos elsewhere:
- We separate those with the knowledge and expertise in the business domain from those that are delivering analytical insights;
- We build operational applications where all the business logic is rightly within the application. We then make that data, which contains no logic, available to the data warehouse leaving it to those central teams to try and piece together the jigsaw puzzle;
- We recognise the value of multi-disciplinary teams, yet we sit our software engineers and data engineers in separate functions, missing out on the sharing of engineering best practice and data knowledge;
- We design and build ways within the data warehouse of trying to deal with data change from upstream feeds, yet we know that in today’s world of Continuous Delivery, changes with operational data are only going to become more frequent.
And we should challenge ourselves by asking: how much real value is there in this centralised model? How many times have we seen a feed go into a data warehouse alongside dozens of other data sources, to be integrated and curated with all that other information, only for the truly valuable insight to draw on just a fraction of those sources?
Data mesh seeks to address the issues of scalability by building upon what has come before, re-purposing practices from software engineering as opposed to tearing up the rulebook and starting from scratch. As a result, there is a low barrier to entry for many organisations today.
We have increasingly seen analytics become more embedded within operational systems to aid the user in the decision-making process, and we have seen the benefits in the speed with which those teams can deliver. In many ways, this is a natural reaction to an organisation’s desire to make better use of its data assets and a reflection of the great outcomes we can achieve when we seat our data experts within the domain.
Fundamentally, data mesh introduces an architectural framework where governance and standards are key to enable trusted data to be offered up as products for wider use across the enterprise, and not just within that specific domain.
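The idea of a governed, discoverable data product can be made concrete with a small sketch. The Python below is illustrative only: the `DataProduct` fields and the `is_interoperable` check are hypothetical, not part of any data mesh specification, but they show the kind of metadata (owning domain, published schema, agreed SLA) that global standards would mandate so that products from different domains can be composed.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str               # owning business domain, e.g. "orders"
    owner: str                # accountable team or product owner
    schema: dict              # field name -> type, published for consumers
    sla_freshness_hours: int  # agreed non-functional: maximum data age
    tags: tuple = field(default_factory=tuple)  # aids discoverability

def is_interoperable(product: DataProduct, required_fields: set) -> bool:
    """Consumer-side check: does the product expose the fields we need?"""
    return required_fields.issubset(product.schema.keys())

# Example: a curated daily summary offered up by the orders domain
orders = DataProduct(
    name="orders.daily_summary",
    domain="orders",
    owner="orders-analytics",
    schema={"order_id": "string", "order_date": "date", "total": "decimal"},
    sla_freshness_hours=24,
    tags=("sales", "curated"),
)
print(is_interoperable(orders, {"order_id", "order_date"}))  # True
```

The point of the sketch is that trust comes from the metadata being published and enforced enterprise-wide, not from any one domain’s goodwill.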
Is data mesh right for your organisation?
As with all emerging architectures, there are questions still to be answered. In particular, there are uncertainties around adoption and relevant skills, concerns over disruption, and questions about how an architectural approach that requires such broad acceptance works with existing systems, including the current data warehouse or data lake.
With data mesh, there is the benefit that data products can be built out iteratively providing early value – there is not that time delay of building that central warehouse that models the enterprise. As a result, for brownfield IT estates, organisations can begin to adopt data mesh as an architectural approach in line with digital modernisation, extending the domain to incorporate analytical data products, building out the mesh over time, and alleviating the burden on the data warehouse.
As for the existing data warehouse, well, data mesh has an answer for that too. The warehouse simply becomes a node on the mesh, albeit a very important node, consuming from and offering up data products.
However, nothing is that simple. With a de-centralised model, effort is required upfront to establish the governance and common standards that ensure data products are interoperable. This should include, for example, a clear definition of a product and the degree to which a domain offers up a narrowly defined product that suits one or two use cases versus something more generalised, where there is greater onus on the consumer to refine the data further to meet that consumer’s specific need. In the latter case, the product needs to provide clarity on how different consumers transform that data further while still ensuring consistency of interpretation.
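One way to keep interpretation consistent across consumers of a generalised product is for the owning domain to publish the governed derivation alongside the data itself. The sketch below assumes a hypothetical "transactions" product and a made-up `net_revenue` definition; the names are illustrative, but the pattern shows two consumers refining the same product differently while sharing one agreed calculation.

```python
# Generalised product: minimally refined rows offered up by the owning domain
RAW_TRANSACTIONS = [
    {"amount": 100.0, "refunded": 10.0, "tax": 5.0},
    {"amount": 50.0, "refunded": 0.0, "tax": 2.5},
]

def net_revenue(row: dict) -> float:
    """Governed, product-owned definition: one interpretation for all consumers."""
    return row["amount"] - row["refunded"] - row["tax"]

# Consumer A: finance reporting aggregates across the product
finance_total = sum(net_revenue(r) for r in RAW_TRANSACTIONS)

# Consumer B: per-transaction analytics enriches each row,
# reusing the same governed definition rather than reinventing it
enriched = [{**r, "net": net_revenue(r)} for r in RAW_TRANSACTIONS]

print(finance_total)  # 132.5
```

Both consumers refine the product to their own need, but because the derivation is published with the product, their numbers reconcile.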
There are other complexities to work through. An effective governance structure is needed to enable the right motivation, prioritisation and, where necessary, arbitration. This is even more important when there is a separation between multiple data producers and a variety of data consumers, who may in turn be producers of a data product themselves. Now, data mesh calls out the need for global governance and discusses new data domains correlating data from other domains, but the importance of this governance cannot be overstated in a de-centralised model. For example, how do we ensure that domain C gets the product it needs when its data depends solely on data products created by domains A and B, especially when A and B have their own priorities which may not align with C’s?
There also needs to be agreement on, and ongoing review of, the non-functional requirements and how a given data product can best meet the current and future needs of its consumers. The very nature of an interoperable set of data products is that the use of any given product can grow and adapt over time, which makes it all the more important to continuously scrutinise whether the existing set of products is keeping pace with those non-functionals.
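Such ongoing scrutiny can itself be automated. As a minimal sketch, assuming a freshness SLA expressed in hours (the function name and parameters are hypothetical, not a standard API), a recurring check might look like this:

```python
from datetime import datetime, timedelta

def meets_freshness_sla(last_refreshed: datetime, sla_hours: int,
                        now: datetime = None) -> bool:
    """Does the product's last refresh fall within the agreed freshness window?"""
    now = now or datetime.utcnow()
    return now - last_refreshed <= timedelta(hours=sla_hours)

# Example: a product with a 24-hour freshness SLA, checked at a fixed time
check_time = datetime(2024, 6, 1, 12, 0)
print(meets_freshness_sla(datetime(2024, 6, 1, 0, 0), 24, now=check_time))   # True
print(meets_freshness_sla(datetime(2024, 5, 28, 0, 0), 24, now=check_time))  # False
```

Running checks like this against every product on the mesh gives governance teams an evidence base for whether products are keeping pace as consumption grows.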
Finding the right data architecture
We work in an imperfect world where competing priorities, practicalities on the ground and who ultimately foots the bill can easily create ongoing blockers where inter-dependencies between teams accumulate. Centralised, cross-cutting functions such as BI Competency Centres typically have within their governance structure the ability to address such issues. As a result, data mesh should not be seen as a bottom-up approach, but something that requires a level of investment to ensure future interoperability and seamless integration across the enterprise. And that attention to detail has to, arguably, be over and above that of centralised capabilities.
Those firms that are already invested in domain-driven design and microservices, have a product mindset, and have mature and effective governance across their organisation are consequently well placed to start exploring data mesh. When considering the value of data mesh, ask yourself: is the current data warehouse or data lake generating the insight you need at scale? Is there a high degree of complexity across your data estate? Are there bottlenecks to delivery through centralised data capabilities? In answering these questions, you will discover whether data mesh is the right approach for you.