If you’re up on the latest trends, you’ve probably heard about something called a “data fabric.” But for most people, this probably sounds about as esoteric as “the internet” sounded in the early 1990s. A data fabric can be thought of as an integrated virtual layer that connects all of an organization’s data and the processes and software associated with it. A data fabric overlays analytics on metadata assets to enable greater use of data across all divisions, departments, and data repositories, regardless of how siloed they might be. On an ongoing basis, a data fabric leverages machine learning and artificial intelligence to connect data from an organization’s various data sources and uncover relevant relationships among them. A data fabric’s features and functionality not only serve the needs of a business, but can help it take giant leaps forward in the value it derives from its data.
If this definition still sounds as esoteric as “world wide web” did to the 1993 mind, read on.
How a Data Fabric Makes Sense of Big Data
The research and advisory firm Gartner has taken the lead in fleshing out the concept. Gartner points out that to gain a comprehensive view of an organization’s data, the data fabric should be able to monitor and integrate all of the key methods for data delivery, such as streaming, ETL (extract, transform, load), replication, messaging, virtualization, and microservices. But merely integrating these data delivery methods guarantees nothing more than a mess of data sources, unless you have some ability to overlay them with structure and meaning.
Above all else, a data fabric provides context for your organization’s data, and rich context is the key to a successful data fabric design. The ability to provide context to data largely rests on the quality of the “metadata,” or the data that describes the data. You can think of metadata as being somewhat like a book’s bibliographic information, such as the author, title, and table of contents. Just as this information isn’t the book but describes the book, metadata isn’t actually part of the data itself; it’s additional data that describes the data’s contents.
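To make the book analogy concrete, here’s a minimal sketch of what a metadata record for a dataset might look like. The field names and values are invented for illustration, not drawn from any particular catalog product:

```python
# A hypothetical metadata record: it describes the dataset
# (owner, schema, freshness) without containing any of its rows.
customer_list_metadata = {
    "name": "customer_list",
    "owner": "marketing",
    "source_system": "crm_database",
    "columns": ["customer_id", "name", "email", "region"],
    "row_count": 48210,
    "last_updated": "2023-04-01",
}

# The record tells us what the dataset contains and where it lives,
# but none of the actual customer rows are stored here.
print(customer_list_metadata["columns"])
```

Just as a library catalog card lets you find and evaluate a book without opening it, a record like this lets the fabric reason about a dataset without touching the underlying data.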
The system will not only discover and map out relationships between data entities, but will also help human beings to understand them. To aid human comprehension, the data fabric system will create “knowledge graphs,” which show the various data entities and their relationships, overlaid with semantic information that allows non-technical users to quickly visualize and interpret the information. The knowledge graph would look much like a flowchart, and might allow you to visualize, for instance, which datasets across the organization have relationships to your department’s data. For example, if the marketing department wanted to instantly see what information held in other departments related to a particular customer list, they would see that displayed on the knowledge graph.
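The idea behind a knowledge graph can be sketched in a few lines: datasets are nodes, and each edge carries a semantic label describing the relationship. The dataset names and labels below are made up for illustration:

```python
# A toy knowledge graph: nodes are datasets, edges carry a
# human-readable label describing how the two are related.
edges = [
    ("marketing.customer_list", "sales.orders", "shares key: customer_id"),
    ("sales.orders", "procurement.purchases", "shares key: product_id"),
    ("marketing.campaigns", "marketing.customer_list", "targets"),
]

def related_to(dataset, edges):
    """Return every dataset directly linked to `dataset`, with the label."""
    results = []
    for src, dst, label in edges:
        if src == dataset:
            results.append((dst, label))
        elif dst == dataset:
            results.append((src, label))
    return results

# The marketing example from above: what connects to our customer list?
for neighbor, label in related_to("marketing.customer_list", edges):
    print(neighbor, "--", label)
```

A real fabric would render this graph visually and build it automatically, but the underlying structure — labeled edges between data entities — is essentially this.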
Machine Learning, Artificial Intelligence, and the Data Fabric
A data fabric leverages a thoroughly cataloged pool of metadata, continuously analyzing it with machine learning (ML) and artificial intelligence (AI) algorithms that learn over time and churn out predictions about data management and integration. For instance, the AI/ML component would automatically gather the metadata as a data source is added to the fabric, and add that to the existing catalog. Through this process, based on what it learns about the data, it would pinpoint previously existing data sources that have commonalities, such as data from the marketing department that shares a list of customers or products with data from the sales department.
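The simplest version of this commonality detection is just comparing the column metadata of a newly registered source against sources already in the catalog. The sketch below uses invented dataset and column names; a production system would use ML over much richer metadata, but the principle is the same:

```python
# Sketch: when a new source joins the fabric, compare its column
# metadata against already-cataloged sources to surface likely links.
catalog = {
    "sales.orders": {"order_id", "customer_id", "product_id", "amount"},
    "marketing.customer_list": {"customer_id", "name", "email"},
}

def register(name, columns, catalog):
    """Add a source's column metadata; return overlaps with existing sources."""
    overlaps = {
        other: sorted(columns & other_cols)
        for other, other_cols in catalog.items()
        if columns & other_cols
    }
    catalog[name] = columns
    return overlaps

# Registering a procurement dataset immediately reveals a shared field.
found = register(
    "procurement.purchases",
    {"product_id", "supplier_id", "unit_cost"},
    catalog,
)
print(found)  # {'sales.orders': ['product_id']}
```

In practice, the fabric would weigh many more signals than column names (data types, value distributions, lineage), but each new source enriching the catalog and being matched against it is the core loop.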
Machine learning and artificial intelligence algorithms are indispensable components of a data fabric design. In fact, ML and AI are largely responsible for monitoring data pipelines and suggesting the most appropriate integrations and relationships. These algorithms gather information from the data as it is connected to the fabric, and continually canvass all the organization’s data, determining its characteristics and identifying where potentially useful relationships and connections exist.
As the algorithms learn more about your data assets, they can also begin automating time-consuming, repetitive tasks, such as answering business questions that users have put to the system again and again. This type of effort can free up analysts to focus on more challenging problems.
How a Data Fabric Can Help Your Business
A functioning data fabric will provide advantages on a slew of fronts. At its highest level, a data fabric creates a unified data environment that breaks down data silos. This means that anyone within your organization can access the entirety of the organization’s data (provided, of course, that they have been granted the appropriate permissions). For example, if an authorized user in marketing needs to look at data from sales or procurement, they can access that data just as easily as they might access marketing data.
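A toy version of that permission-gated access might look like the following. The user names, dataset names, and wildcard grant syntax are all assumptions made for the sake of the sketch:

```python
# Toy access check: the unified layer serves any dataset the user is
# authorized for, regardless of which department owns it.
permissions = {
    "alice@marketing": {"marketing.*", "sales.orders", "procurement.purchases"},
}

def can_access(user, dataset, permissions):
    """True if the user holds an exact or department-wide grant."""
    grants = permissions.get(user, set())
    department_wildcard = dataset.split(".")[0] + ".*"
    return dataset in grants or department_wildcard in grants

print(can_access("alice@marketing", "sales.orders", permissions))    # True
print(can_access("alice@marketing", "finance.ledger", permissions))  # False
```

The point is that authorization, not physical location, becomes the only gate: a grant is all that separates a marketing analyst from a sales dataset.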
From there we can quickly see how a business would derive numerous benefits. First, the analytics lifecycle is going to be much faster — possibly by orders of magnitude. One reason for this is that currently data analysts and scientists spend a large portion of their time hunting down data sets. Remove this obstacle, and you significantly reduce the time it takes to get an answer to a question.
But there’s more. The data fabric, as we’ve discussed, not only unifies the data, but also makes sense of it by using AI and ML to identify meaningful relationships that might exist between data sets. So going back to our example, the person in the marketing department who was hunting down procurement data — perhaps in order to optimally price an upcoming product sale — would not only be able to access the data, but would have insight into places where marketing data might relate to procurement data. For instance, there might be data on product purchases in a customer list that overlapped with procurement data on the overall demand for those products. This could then be used to optimize advertising efforts to the tastes of specific customers.
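That marketing–procurement overlap boils down to a simple join across the two departments’ datasets. All the records below are invented to illustrate the shape of the query, not taken from any real system:

```python
# Sketch: overlap marketing's customer purchase records with
# procurement's demand figures (all data invented for illustration).
purchases = [  # marketing: which customers bought which products
    {"customer": "C1", "product": "widget"},
    {"customer": "C2", "product": "widget"},
    {"customer": "C3", "product": "gadget"},
]
demand = {"widget": "high", "gadget": "low"}  # procurement's view

# Join the two: which customers bought products now in high demand?
hot_customers = [
    p["customer"] for p in purchases if demand.get(p["product"]) == "high"
]
print(hot_customers)  # ['C1', 'C2']
```

With the fabric surfacing the shared product field, a marketer can run this kind of cross-department query without ever asking procurement to export a file.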
A data fabric also supports risk mitigation. Much of the effort in data compliance involves simply knowing where the data is. A data fabric, by its very nature, provides complete visibility into all of the data. Just as a data analyst doesn’t need to go hunting for data sets, if an executive is called on the carpet by a regulator or is dealing with a lawsuit, the legal team doesn’t need to spend inordinate amounts of time hunting down data, either.
Furthermore, there are numerous cost/resource benefits associated with the fact that data no longer needs to be moved around. Data can be utilized where it resides, so there’s no need to ETL data into a data warehouse, which can be a lengthy and resource-intensive process.
On a related note, this ability to integrate data into a data fabric without having to move or change it in any way makes a given data architecture far more scalable. As data is generated, its repository — whether it be a data lake, database, data warehouse, data mart, or other system — is simply connected to the fabric. At that point its data becomes searchable, and can be connected logically to any other relevant dataset that exists within the organization.
To sum it up, the data fabric — though admittedly somewhat esoteric right now — will follow the path of another formerly esoteric concept, the internet. What may now be a curiosity, just as those “surf the web” cafés were back in the ’90s, will quickly become something that you need simply to stay afloat amidst the competition.