The traditional data infrastructure is built around a single, monolithic source of all enterprise data, be it a data warehouse or, more recently, a data lake.
Organizations are beginning to realize some of the problems in this design:
- The limitations of centralized teams: Centralized data teams cannot possibly understand the data needs of all of the different departments that they serve.
- The inability to serve different departments: One central platform cannot be flexible enough to accommodate the requirements of an organization’s different departments.
- Slow data provisioning: Centralized platforms are inherently rigid, as they are set up to perform standard operations across the entire organization. As a result, data provisioning is slow and can never be real-time or on-demand.
Data mesh is a new, decentralized data architecture that attempts to solve the above problems by replacing the single, centralized data source with multiple data domains, each managed by different departments within the organization.
For data mesh to work as described above, it needs a data delivery system that can accommodate its distributed nature. Traditional replication-based data integration approaches, such as extract, transform, and load (ETL) processes, cannot perform this function, as they are designed to move data from multiple data sources into a single repository.
Data virtualization, in contrast, is a perfect fit for data mesh. Unlike ETL processes, it provides real-time access to data without having to replicate it.
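The difference can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a real data virtualization product: the `Source` and `VirtualLayer` classes and their sample data are invented for this example. The point is that queries are answered by fetching live from each source at request time, never by copying data into a central store.

```python
class Source:
    """Stands in for a live data source, e.g. a departmental database."""
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows  # placeholder for a live connection

    def query(self, predicate):
        # A real system would push the predicate down to the source engine.
        return [row for row in self._rows if predicate(row)]


class VirtualLayer:
    """Federates queries across sources without replicating their data."""
    def __init__(self, sources):
        self.sources = sources

    def query(self, predicate):
        # Each query reaches out to the sources at request time,
        # so results are always current -- no ETL copy to go stale.
        results = []
        for source in self.sources:
            results.extend(source.query(predicate))
        return results


sales = Source("sales", [{"dept": "sales", "amount": 120}])
hr = Source("hr", [{"dept": "hr", "amount": 40}])
layer = VirtualLayer([sales, hr])
large = layer.query(lambda row: row["amount"] > 50)  # live, no copies made
```

An ETL pipeline would instead run the copy step on a schedule, so the central repository lags the sources; here the federated query sees whatever the sources hold at the moment it runs.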
The architecture of data virtualization is extremely powerful in enabling data mesh:
- The only data that data virtualization centralizes is the critical metadata for accessing the different data sources.
- This architecture enables organizations to implement governance and security protocols across all of the different data domains from a single point of control.
- This architecture also enables organizations to implement highly tailored semantic models on top of the individual data sources, which effectively serve as data domains, without changing the underlying data.
- These semantic models can be easily modified, extended, or redesigned, again without changing the underlying data.
- Data virtualization enables full-featured data catalogs that not only list what data is available but also provide ready, real-time access to it, in a self-service manner.
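Two of the points above, semantic models layered over unchanged sources and governance enforced at a single point of control, can be sketched together. This is a hypothetical illustration: the column names, the `SEMANTIC_MODEL` mapping, and the role check are all invented for the example, and a real virtualization platform would handle this declaratively.

```python
# Underlying source data; the virtual layer never modifies it.
raw_orders = [
    {"ord_id": 1, "cust_nm": "Acme", "amt_usd": 99.0},
    {"ord_id": 2, "cust_nm": "Globex", "amt_usd": 250.0},
]

# Semantic model: domain-friendly names mapped onto source columns.
# Redesigning the domain means editing this mapping, not the source.
SEMANTIC_MODEL = {
    "order_id": "ord_id",
    "customer": "cust_nm",
    "amount": "amt_usd",
}

# Governance lives in the virtual layer: one rule covers every
# query against this domain, regardless of which source backs it.
ALLOWED_ROLES = {"analyst", "admin"}


def query_domain(role, source_rows, model):
    """Serve the domain view of the source rows, enforcing access control."""
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"role {role!r} may not access this domain")
    # Re-shape each row on the fly; the source rows stay untouched.
    return [{k: row[v] for k, v in model.items()} for row in source_rows]


domain_rows = query_domain("analyst", raw_orders, SEMANTIC_MODEL)
```

Because the mapping and the access rule are the only centralized pieces, each department can reshape its own semantic model freely while security policy is still applied in one place.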