DataOps Is Here to Stay. Here’s Why.

The methodology tames unruly pipelines in order to increase the value of your data — so you can adapt faster to business changes.

Written by Joe Gaska
Published on Feb. 17, 2021

Ask why something has “staying power” in tech and you’ll get a dizzying range of answers. Some things go viral and then fizzle out, while others embed themselves in our daily lives and become indispensable over time. So what separates one from the other? Is there a single, good predictor of whether something has staying power?

One way to answer that question is to look at how relevant something is across data’s lifecycle. The more a technology uses, touches or changes data over time, the higher the chances that it will stick around.

Take data warehouses: They represent a “point-in-time” consumption mechanism for data. By creating a massive crossroads for your data at a specific point in time, they give you a good sense of where you are right now. But they have a harder time helping you see where you’ve been or where you’re going next. They’re hot right now, but they will begin to fade in relevance unless they find ways to stretch across more of data’s lifecycle.

Data lakes are convenient watering holes for multiple teams wanting to consume data emanating from a source, such as a cloud application like Salesforce. Although they are capable of capturing changes in data over time, the manual API integrations that act as “tributaries” feeding data lakes are often limited by their initial design purposes. Decisions about how frequently data is captured, and how broadly that data is available across an organization, limit a lake’s relevance to data over time. As interests evolve and maintenance cycles widen, data lakes sometimes end up looking more like data islands.

The same can be said for master data management (MDM). Think about how static or frozen in time the concept of traditional MDM is. It asks the question: “What is my 360-degree view of this customer right now?” But the answer to this question changes over time. As data sources come and go, the perfect master record evolves. MDM sits atop the ever-changing histories of many different data sources, all of which wax and wane in relevance over time.
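
To see why, consider a toy sketch of how a “golden record” might be assembled: take each field from whichever source was updated most recently. This is not how any particular MDM product works, and the source names and fields below are invented; the point is simply that the master record shifts the moment any source changes.

```python
from datetime import datetime

# Hypothetical per-source views of the same customer, each with a timestamp.
sources = {
    "crm":     {"email": "pat@acme.com", "updated": datetime(2021, 1, 5)},
    "billing": {"email": "pat@acme.co",  "updated": datetime(2021, 2, 1)},
}

def master_record(sources):
    """Naive golden record: take fields from the most recently updated
    source. The answer changes whenever any source changes."""
    latest = max(sources.values(), key=lambda rec: rec["updated"])
    return {"email": latest["email"], "as_of": latest["updated"]}

print(master_record(sources))  # a point-in-time answer, not a permanent one
```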

DataOps is different because it’s a methodology — an act of taming unruly data pipelines in order to turn data into value in an organization. And even though it may sometimes use dated tools or practices, it has tremendous staying power if measured by its relevance to data over time. Long after nature reclaims a snowflake as frozen water falling from the sky, the practice of data operations will continue to move us toward a more perfect union with our data as it changes over time. This is the business of data operations — and it’s here to stay.

So how can organizations use DataOps to their advantage? A good starting point is to map out who needs data in the organization and why. Here, some organizations discover that they are capturing and creating multiple copies of the same data across teams for different purposes. For example, system admins might capture hourly snapshots of data for disaster recovery, while fulfillment teams ingest subsets of the same data into their supply chain systems via the source’s APIs.
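
A first-pass consumption map doesn’t have to be anything fancy. Here is a minimal sketch in Python, with hypothetical team names and access methods, showing how duplicate copies of the same source become visible once you write the map down.

```python
from dataclasses import dataclass

@dataclass
class DataConsumer:
    """One team or system that consumes a given data source."""
    team: str           # who needs the data
    purpose: str        # why they need it
    access_method: str  # how they get it today (snapshot, API, export...)

# Hypothetical first pass, keyed by data source. Note that two teams
# are independently copying the same Salesforce data.
consumption_map = {
    "salesforce": [
        DataConsumer("sysadmin", "disaster recovery", "hourly snapshot"),
        DataConsumer("fulfillment", "supply chain sync", "REST API subset"),
    ],
}

for source, consumers in consumption_map.items():
    print(f"{source}: {len(consumers)} independent copies of the same data")
```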

Once you have established the first version of a data-consumption map across your organization, you’ll want to add the frequency requirements of each data consumer. Some may need the data in their systems within five minutes, while others are OK with daily cycles.
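
Extending the sketch above, a freshness requirement can be captured as one more field per consumer. The five-minute and daily figures below are just the examples from this section, and the tightest requirement ends up dictating how often any shared replica must refresh.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class DataConsumer:
    team: str
    purpose: str
    max_staleness: timedelta  # how old the data may be before it's a problem

consumers = [
    DataConsumer("fulfillment", "supply chain sync", timedelta(minutes=5)),
    DataConsumer("analytics", "daily reporting", timedelta(days=1)),
]

# The tightest requirement sets the refresh cadence for a shared replica.
refresh_interval = min(c.max_staleness for c in consumers)
print(f"Replicate at least every {refresh_interval}")  # -> every 0:05:00
```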

What often emerges next is a data lake strategy: organizations replicate data from a source application into an adjacent data lake and point all data consumers to that replica as their primary consumption point for that source. In essence, this creates a continually refreshed watering hole for each source of data in the organization. Once enough data lakes form around common watering holes, organizations typically evolve a “data fabric” strategy that interconnects the lakes into one cohesive, evolving tapestry of data, tokenizing records and unifying the constantly changing sources into a true 360-degree picture of a target entity. In turn, you can analyze that picture over time to spot patterns and opportunities.
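
Here is what a continually refreshed watering hole might look like in miniature. Everything in this sketch is a stand-in: InMemoryLake replaces a real lake, and fake_source replaces a real API client (a production pipeline would page through, say, Salesforce’s change APIs instead).

```python
import time
from datetime import datetime, timezone

class InMemoryLake:
    """Stand-in for a real data lake: keeps the latest copy of each record."""
    def __init__(self):
        self.records = {}

    def upsert(self, changes):
        for record_id, payload in changes:
            self.records[record_id] = payload

def replicate_once(fetch_changes_since, lake, watermark):
    """One refresh cycle: pull everything modified after the watermark into
    the lake, then advance the watermark. fetch_changes_since stands in for
    a hypothetical source API call."""
    lake.upsert(fetch_changes_since(watermark))
    return datetime.now(timezone.utc)

# Usage sketch: refresh on the cadence the tightest consumer needs.
lake = InMemoryLake()
watermark = datetime.min.replace(tzinfo=timezone.utc)
fake_source = lambda since: [("acct-1", {"name": "Acme", "since": str(since)})]

for _ in range(2):
    watermark = replicate_once(fake_source, lake, watermark)
    time.sleep(0.1)  # in production this might be every five minutes

print(lake.records)  # every consumer reads from here, not from the source
```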

DataOps is a continually evolving process through which an organization becomes more and more attuned to the signals that its various data sources are sending about the velocity and health of the business. And while no single implementation fits every organization, the practice of DataOps is here to stay. Those who engage with it in a meaningful way will establish a competitive advantage and set themselves up to adapt faster to changes happening in their business.

