Authors: Kai Kuikkaniemi (MyData), Tobias Guggenberger (Fraunhofer ISST)
What are data mesh, data product, and data space thinking? How can they develop into a more holistic paradigm of data sharing? What are the benefits of using data space technologies for organisations in the future? In this article, the authors delve deep into the world of data products in mesh and space, an essential topic for anyone interested in data spaces in the energy domain. Whether you are a seasoned data professional or just beginning your journey in the energy sector, the insights shared here are invaluable. The article not only serves as an informative resource but also connects to ENERSHARE’s mission to empower data-driven innovation in the energy industry and to enable the transition of current energy systems towards smarter and more decentralised paradigms.
Using and sharing data is essential for any modern enterprise. Value creation from business analysis and data-driven services relies on internal and external data. Yet even today, organisations produce vast amounts of data that go virtually unused. Why does this happen, given that any business activity should generate stakeholder value? A simplified answer is that companies do not manage data as a product. Often, organisations do not align the production and warehousing of data with any stakeholder needs or expectations. Data is produced simply because it is easy, cheap, and might help someday. Such a thoughtless approach to data does not generate any value. For other goods, organisations have well-established internal and external product management practices that help them streamline every activity related to a specific product.[1]
Data product is a fuzzy term with no single measure or definition. Talking about data products is primarily a change of mindset, and the value of this change is measured in practice, not in theory. Product thinking with data means focusing on semantics, data quality, and standardising how data is delivered and described. Despite the fuzziness of the term, the impact of the new mindset should not be underestimated. When data is a product, its usability improves. Productised data can be easily consumed, even by users who initially had no connection to the source of the data. Product thinking with data reflects how data use has expanded to more domains.
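To make the idea of "standardising how data is delivered and described" more concrete, the following is a minimal sketch of what a data product descriptor with a producer-side quality check could look like. It is an illustration only: the `DataProduct` class, its fields, and the meter-reading example are assumptions made for this sketch and do not come from any data mesh or data space specification.

```python
from dataclasses import dataclass, field

# A row is a plain mapping from column name to value (illustrative choice).
Row = dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical descriptor: data plus the metadata a consumer needs."""
    name: str
    owner: str                      # team accountable for quality
    description: str                # human-readable semantics
    schema: dict[str, type]         # column name -> expected Python type
    rows: list[Row] = field(default_factory=list)

    def quality_issues(self) -> list[str]:
        """Return a list of schema violations found in the rows."""
        issues = []
        for i, row in enumerate(self.rows):
            for column, expected in self.schema.items():
                if column not in row:
                    issues.append(f"row {i}: missing '{column}'")
                elif not isinstance(row[column], expected):
                    issues.append(
                        f"row {i}: '{column}' is not {expected.__name__}"
                    )
        return issues

    def publish(self) -> list[Row]:
        """Hand data to consumers only if it passes the quality check."""
        issues = self.quality_issues()
        if issues:
            raise ValueError("; ".join(issues))
        return list(self.rows)


# Assumed example from the energy domain: hourly meter readings.
meter_readings = DataProduct(
    name="hourly_meter_readings",
    owner="metering-team",
    description="Hourly electricity consumption per meter, in kWh.",
    schema={"meter_id": str, "hour": str, "kwh": float},
    rows=[
        {"meter_id": "M-001", "hour": "2024-01-01T00:00", "kwh": 1.2},
        {"meter_id": "M-002", "hour": "2024-01-01T00:00", "kwh": "n/a"},
    ],
)

print(meter_readings.quality_issues())
```

In this sketch the producing team, not the consumer, finds the malformed `kwh` value before the data is shared, which is the shift in responsibility that product thinking implies.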
Typically, data generated in various processes is not designed for reuse or sharing. Poor quality and usability of the data cause massive inefficiencies. In data analytics projects, 48% of the work is data engineering, 32% is analytics modelling and evaluation, and only 20% is deployment.[2] Poor data quality costs organisations more than 600 billion dollars annually and accounts for up to 60% of service organisations’ expenses.[3] With the growing significance of data reuse, we are witnessing the advent of various new tools that support data engineering, whether in the management of semantics, discoverability, or data quality. These tools are helpful but do not solve the root cause. When an organisation adopts data product thinking, data is produced with reuse and sharing in mind from the start, which addresses the root cause of many problems.
Perceiving data as a product reflects how data thinking transforms from IT to business. Through product thinking, data matures from a signal between information systems to an asset or utility that companies develop, share, and sell.
There are apparent differences between physical and data products. Still, businesses can design their processes and architectures to support value creation with data inspired by analogies from physical products. They can apply long-established product management and supply chain management practices to source, produce, and distribute data products through different channels and data factories.
Data mesh is a modern paradigm that has data products at its core. It has four foundations: domain ownership, data as a product, self-serve data platform, and federated (and computational) governance.[4] Zhamak Dehghani coined data mesh in 2019 with Martin Fowler’s help.[5] The concept borrows ideas from domain-driven design and builds on top of the software paradigms that promote agile functional teams with autonomy and responsibility. The concept is technological and agnostic to data types: it does not matter whether data is delivered through APIs, query layers or events. Initially, the data mesh concept focused on managing data for analytical use. Later on, the border between analytical use and operational use has been contemplated (ref), especially when we look at how data mesh connects with event-driven designs.
Data spaces have different origins and framing but also much in common with data mesh. Data spaces focus on enabling data sharing and data reuse across organisations. They build on concepts such as data space connectors, data governance capabilities, identity and access management and data catalogues. If we think about the four foundations of data mesh, they have relatively straightforward counterparts in the data spaces thinking:
| Data mesh principle | Data space counterpart |
| --- | --- |
| Domain driven | Sectorial data spaces |
| Data as product | Semantic management and various data management services |
| Self-serve data platform | Data space infrastructure: data space catalogue, connectors, data marketplaces, and other infrastructural services |
| Federated (computational) governance | Data space governance and various governance instruments (such as clearing house and authorisation) |
Data space thinking resonates well with data mesh thinking, with the exception of data product thinking. Data spaces are still mainly founded on the classical view, in which data is engineered for reuse after the fact instead of being manufactured from the outset as products that can be easily reused and shared. However, the language of data products is entering data space frameworks such as Gaia-X. Data space thinking could learn from and connect to the industry-driven data mesh paradigm, which is gaining significant traction globally. The two approaches are aligned and complementary since the foundations are the same. Still, the scope is different because data mesh focuses on data use within organisations, and data spaces focus on cross-organisational data sharing.
Data mesh and data spaces could converge in the future and create a more holistic paradigm of data sharing for organisations, one that covers both internal and external data users. Data marketplace thinking that covers internal use and simultaneously crosses organisational borders may be a significant driver for data sharing and the adoption of data spaces. The governance, authorisation, and connection capabilities that data spaces offer nicely complement data mesh capabilities. Additionally, organisations can apply data space technologies internally to overcome the boundary between internal and external data management.
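The idea of one catalogue that serves both internal and external data users can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `CatalogueEntry` type, the `allowed_orgs` policy model, and the organisation names are hypothetical and do not represent the API of any data space connector or catalogue product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record for one data product."""
    product: str
    owner_org: str
    # External organisations that have been granted access (assumed model).
    allowed_orgs: frozenset = frozenset()


def visible_products(catalogue, consumer_org):
    """List products a consumer may use, whether internal or external."""
    return [
        entry.product
        for entry in catalogue
        if consumer_org == entry.owner_org or consumer_org in entry.allowed_orgs
    ]


# Assumed example: a grid operator shares one of its two products.
catalogue = [
    CatalogueEntry("hourly_meter_readings", "grid-operator",
                   frozenset({"energy-retailer"})),
    CatalogueEntry("asset_maintenance_log", "grid-operator"),
]

print(visible_products(catalogue, "grid-operator"))
print(visible_products(catalogue, "energy-retailer"))
```

The same lookup answers both the internal and the cross-organisational case; only the access policy differs, which is where data space governance and authorisation capabilities would plug in.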
The most significant gap between data spaces and the data mesh approach is the support for data productisation. Data as a product is one of the fundamental principles in data meshes, but in data space development, the role of data products is just starting to grow. Data productisation is a recent major shift in business thinking related to data. In data products, the team that produces data is responsible for the quality and features of the data. The data product approach can be a new foundation that data spaces build on top of and promote.
Fundamentally, data mesh and data spaces are technology-agnostic approaches. Many IT firms have jumped on board to promote their data mesh capabilities, and consultancy firms promote how they can assist businesses on their data mesh transformation journeys. In data spaces, the commercial offering is just emerging. Companies with data mesh offerings include rising data stars like Confluent (for event-driven data mesh), Starburst (for common query layer), and Snowflake (for federated data platform), as well as many fast-growing data catalogue vendors. The primary value proposition for data mesh is cost reduction in data engineering and analytics within the organisation. This is especially important when data-related skills are rare and expensive. While data mesh is new and its practical implementation is still growing, there is a lot of excitement about it in the industry. When businesses mature in their internal data capabilities, the next step is to focus on cross-organisational data sharing. That would create demand for data spaces in combination with data meshes, both building on top of data products.
References:
[1] Crosby, L., & Schlueter Langdon, C. (2019). Data is a product. Marketing News, American Marketing Association.
[2] Langdon, C. S., & Sikora, R. (2020). Creating a data factory for data products. In Lecture notes in business information processing (pp. 43–55). Springer International Publishing.
[3] Swami, A., Vasudevan, S., & Huyn, J. (2020). Data sentinel: A declarative production-scale data validation platform. 36th International Conference on Data Engineering (ICDE), 1579–1590.
[4] Dehghani, Z. (2022). Data mesh: Delivering data-driven value at scale. O’Reilly Media, Incorporated.
[5] https://martinfowler.com/articles/data-monolith-to-mesh.html