Darshan Joshi, Chief Technology Officer, CYTRIO
Numbers show that frictionless data access offers significant business benefits. Over half of businesses worldwide believe seamless data access has become even more critical in decision-making in the last couple of years.
Yet practical data management problems are keeping many companies from reaching data sharing nirvana. Moving data from databases and applications to cloud data warehouses and cloud-based data lakes has never been smooth. Many companies end up with data swamps and realize they need a proper data platform, architecture blueprint, and governance to get to that nirvana state.
Enter data fabric and data mesh. The former is a design concept that serves as an integrated layer (fabric) of data and connecting processes, and it leaves architectural choices open for data solution providers. Many existing data integration patterns can be adapted to use data catalogs, knowledge graphs, and dynamic data integration to construct a data fabric. In particular, a data fabric can be implemented on top of data lakes and data warehouses. Data mesh, on the other hand, adopts a more humanized approach, where teams take responsibility for specific data sources, combining domain experience and data engineering know-how to create analytics-ready data sets for the business.
But this is not an article about which data architecture companies should adopt. Instead, as a data privacy practitioner, I would like to point to the gaping hole that everyone is missing: data privacy. If it is not addressed, companies are simply walking into a data-driven pothole.
The data privacy blind spot
Let’s begin with that gaping hole.
What strikes me as highly odd is that even as regulators, governments, and consumers demand privacy and visibility into data usage at the enterprise level, advocates of neither data mesh nor data fabric talk enough about privacy. Yes, security is an inherent part of these architectures, but the deafening silence on data privacy overlooks a major data issue, one that faces every data user.
Despite being well-designed to solve the data access headache and address various enterprise-wide use cases, data mesh and data fabric architectures only focus on what we call “central data.” This data resides in applications, databases, data lakes, and data warehouses.
But this is not the only type of data with which we interact. Critical enterprise data also sits in office documents, which we call “non-central” or “edge data.” It’s one reason why chief privacy officers fret over the PDF files on your OneDrive and office laptops, files that also make data engineers and scientists grumble.
The problem is that most data architectures prescribe how data flows from central to edge and vice versa, but they do not factor in the security and privacy of data on the edge. Instead, they treat all of it as “central” data and leave it to companies to make the distinction through security and privacy frameworks and enforcement. This leaves edge data vulnerable to threats. And since many of these architectures deal with disparate data sources, the security and privacy risk multiplies.
One can argue that data mesh holds on to the concept of data ownership for longer than data fabric. Theoretically, it also addresses the data governance questions. Still, neither approach stops an employee from exporting data from a physical or virtual data warehouse into a PDF report, even though that report may contain sensitive data. Once exported, that data can be shared without any regard for the data governance rules that applied to the same data when it was part of the data lake.
These approaches also don’t address someone exporting data using valid APIs. For example, a sales team may export customer lists containing confidential personally identifiable information (PII) about customers into a spreadsheet. Then, they share this spreadsheet with others without any regard for role-based access control (RBAC) or governance.
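To make the spreadsheet scenario concrete, here is a minimal sketch of an export step that consults role rules before the data leaves the warehouse. Everything in it is an assumption made for illustration: the role names, the column names, the `ROLE_CAN_EXPORT_PII` table, and the `export_customers` function are hypothetical and do not come from any data mesh or data fabric product.

```python
import csv
import io

# Hypothetical policy table: may this role export PII columns in the clear?
ROLE_CAN_EXPORT_PII = {"privacy_officer": True, "sales": False}

# Hypothetical classification of the columns that hold PII.
PII_COLUMNS = {"email", "phone"}


def mask(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1) if value else value


def export_customers(rows: list[dict[str, str]], role: str) -> str:
    """Return CSV text, masking PII columns unless the role may see them."""
    if role not in ROLE_CAN_EXPORT_PII:
        raise PermissionError(f"role {role!r} may not export customer data")
    allow_pii = ROLE_CAN_EXPORT_PII[role]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    for row in rows:
        writer.writerow({
            col: val if allow_pii or col not in PII_COLUMNS else mask(val)
            for col, val in row.items()
        })
    return buffer.getvalue()
```

With this in place, a call such as `export_customers(rows, "sales")` would still produce a spreadsheet, but the email and phone columns would be masked. It does nothing about a file that has already been exported unmasked, which is why edge data needs its own coverage.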
Get your privacy in order first
Security and privacy need to be coded into any data architecture. They must also apply to both “central” and “non-central” data; otherwise, you only end up with a notion of good data governance.
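As a rough illustration of what applying one rule to both kinds of data could look like, the sketch below runs the same classification over a warehouse record and an exported document. It rests on loud assumptions: the `DataAsset` type, the `classify` and `audit` functions, and the single email regex are hypothetical stand-ins, and real PII detection needs far more than one pattern.

```python
import re
from dataclasses import dataclass

# Deliberately simple, hypothetical detector: matches email-like strings only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class DataAsset:
    location: str  # e.g. "warehouse.customers" or "laptop/q3-report.pdf"
    kind: str      # "central" or "edge"
    text: str      # content already extracted as plain text


def classify(asset: DataAsset) -> str:
    """Apply the same rule to every asset, wherever it lives."""
    return "sensitive" if EMAIL_PATTERN.search(asset.text) else "public"


def audit(assets: list[DataAsset]) -> dict[str, str]:
    """Map each asset location to its classification."""
    return {asset.location: classify(asset) for asset in assets}
```

The point of the design is that `classify` never looks at `kind`: a customer email is treated as sensitive whether it sits in a warehouse table or in a PDF on a laptop.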
Businesses also need to look for a consolidated view of their data that goes beyond the use of “central” data. At the same time, they need an organization-wide alignment on their security and privacy postures.
Keep in mind that CXOs — think product, data, and information security, among others — will have data access overlaps. So, it’s critical that your data security and privacy posture recognize these overlaps and establish transparent governance, ownership, and communications.
No matter which architecture you finally choose, you need to:
- Find a sound privacy and security partner who can offer a single solution-led approach that takes into account your organization’s central and edge data
- Align your employees to your privacy and security posture, from the CXO to the person downloading customer data into Excel, to maximize your investment in a privacy and security solution
Have a privacy-first mindset
The bottom line is that without a privacy-first mindset and organization-wide alignment, you will end up with fragmented solutions. Addressing security and privacy piecemeal keeps you reactive, because each new use case or solution may expose new privacy holes.
Without a single, enterprise-wide policy and approach, all data architectures, new and old, are designed to fail. And in today’s world, such failures increasingly come with a price tag that can derail your well-thought-out business ambitions.
This article was originally published on VM Blog on August 31, 2022: