Several of Data Mesh’s offerings are quite impressive. For example, they have really innovative approaches to data modeling and data architecture. However, when it comes to the technical history of data warehousing, or to conducting seminars on the subject, I diverge from some of their conclusions about how we got here. I also do not fully align with them on future plans with respect to current trends. There is a clear disconnect between mesh architects and data warehouse architects. This could lead Data Mesh down a dangerous path regarding their understanding of the past, present, and future of modern OLTP architectures.
Myth #1 The data warehouse as a place to copy OLTP exhaust data
- determine, through a persistent and consistent planning effort, which data needs to be integrated into the data warehouse;
- migrate (or transform or load) this data into it through software tools; and then
- access it effectively with reporting and business intelligence applications.
Myth #2 Relational database management systems were first used for OLTP
If someone says that relational database management systems were first used with operational applications, doubt everything that person says. The first RDBMS products were used for reporting on data in dimensional database designs. There was good reason for this: none of the implementations even came with a usable audit trail facility. So they were not at all OLTP-friendly at the beginning.
Myth #3 Data Warehousing necessarily means monolithic databases
Another myth is that data warehouses always require giant databases. Many businesses have run data warehouses on many-node clusters with tons of disks and memory and ultra-fast backplanes. The myth overlooks the fact that we have been able to isolate data at various levels of abstraction for a long time, using data storage and compute technology that has been available for decades.
Myth #4 Data Warehousing necessarily means monolithic and siloed teams
This is a problem with the way IT companies and their customers have reframed the idea of data warehousing development and infrastructure. Before IT got its hands dirty with data warehouses, cross-functional, high-performance teams would work closely with the business to build highly flexible solutions that could be modified quickly in response to evolving needs. In those cases, teams often developed entire data marts iteratively and incrementally, starting small and adding capabilities over time. This approach is not limited to data warehousing; look at the way so many of us use a software versioning strategy for software projects or agile processes for product development. The point here is that this approach fits the nature of how business works better than an attempt to fully understand requirements up front, design a solution based on some rigid methodology from the outset, build it all out, and then accommodate change over time as a separate activity.
Myth #5 Data warehouse databases must be fully normalized
Myth #6 Data warehouses get queried directly
Myth #7 Data warehousing is out of date
As Data Mesh proponents point out, distributed computing techniques go back as far as the 1980s. For example, a client-server model in which one server processes requests from many client machines was popular at that time. Many data-mesh advocates have pointed out that data mesh is actually an extension of this distributed computing model. They also argue that distributed computing and data mesh are not mutually exclusive: you can use both simultaneously.
In the end, there are many positive aspects of data mesh; perhaps I see them because I’ve worked with it before. In addition, its proponents tend to be polite and well-informed people, unlike some of the roughnecks in the big data industry. However, what bothers me is how often people who champion data mesh take potshots at how long data warehouses take to build and how long they take to improve once built. We saw that with Big Data and Hadoop; then we saw it with data lakes and lake houses/outhouses; now we’re seeing it again. It’s like when some people make fake news just for attention: irritating, time-wasting, and unnecessary. I don’t expect the champions of data mesh to read this blog post, but if they do, I hope they reconsider their stance on data warehousing and realize it’s not from the Stone Age, but an amazing platform for storing petabytes of critical corporate data that enables enterprise BI, analytics, Big Data projects, and even cutting-edge AI algorithms and machine learning processes.