As data continues to grow in importance for organizations across industries, data literacy — one’s proficiency in the interpretation and communication of data and data-related activities — has become an increasingly crucial skill, even for those who are not data analysts, data engineers, or data scientists.
Developing such proficiency certainly takes time and effort, but a particular mindset for approaching data and data-related problems also makes data literacy easier to develop. Below are five ways anyone, including non-data professionals, can develop (and strengthen) such a data-driven mindset.
Strive to Be an Expert on Your Data
You probably work with at least some data in your day-to-day job. Maybe it is an Excel spreadsheet of customer information, quarterly financial reports, or a list of transactions from a point-of-sale system. Regardless, understanding the data relevant to your domain can position you as a valuable contributor on projects that use that data.
At a minimum, you should know what measures (also known as columns, variables, or fields) the data includes, any filters applied to the data (does a given data set include all relevant data points, or only a subset that satisfies some criteria?), and the mechanisms used to collect the data (is it generated by online activity, questionnaires, in-store purchases, or some other process?). Knowing these facets of your data helps you understand which questions the data can, and cannot, answer.
This knowledge may also place you in a position to provide advice on which measures might be relevant for a particular project and what, if any, records should be included or excluded from the data for a given project — making you an invaluable resource.
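For readers comfortable with a little code, a quick profile of a data set can surface several of these facets at once. The sketch below assumes the data has been loaded as a list of Python dictionaries with hypothetical fields (`customer_id`, `region`, `spend`, `source`); it lists each field, counts how many records are missing a value for it, and tallies the collection channel recorded in the hypothetical `source` field.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative assumptions,
# standing in for something like a CSV export from your own systems.
records = [
    {"customer_id": 1, "region": "West", "spend": 120.0, "source": "online"},
    {"customer_id": 2, "region": "East", "spend": None, "source": "in-store"},
    {"customer_id": 3, "region": None, "spend": 75.5, "source": "online"},
    {"customer_id": 4, "region": "West", "spend": 42.0, "source": "survey"},
]

def profile(rows):
    """Return the fields present and the number of missing values per field."""
    fields = sorted({key for row in rows for key in row})
    missing = {f: sum(1 for row in rows if row.get(f) is None) for f in fields}
    return fields, missing

fields, missing = profile(records)
print("Fields:", fields)
print("Missing values per field:", missing)

# Which collection mechanisms produced these records?
print("Records by source:", Counter(row["source"] for row in records))
```

Even a profile this simple answers basic questions (what is measured, where values are missing, how the data was gathered) before any analysis begins.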
Understand What the Data Is For
It sounds trivial, but small startups and Fortune 500 companies alike often skip ahead to analysis before forming a core understanding of what they are looking for. It’s simply more exciting to talk about the insights you’ll get from that fancy new machine learning tool than to do the hard work of critically asking yourself, “Why does this matter?” More often than not, this approach leads you to spend a great deal of time and resources analyzing all of your data, only to realize that you have nothing to show for it and must start over. If you don’t know what you are looking for, how will you know when you’ve found it?
Before you start, ask yourself the following questions:
- What am I looking for?
- What question am I trying to answer?
- Why does it matter?
- What impact does this information have?
- What information do I need to answer that question?
- Where do I find that information?
If you haven’t asked these questions, stop and start over before you waste more of your time and your company’s money.
Start with a simple question (how do I get more users?), break it down into the elements that drive user growth at your company (such as which channels users sign up through, who makes up a plurality of users, and so on), and then identify the data you need to answer it. By going through this process, you’ve already accomplished two very important parts of setting your data initiative up for success. First, you’ve reduced the amount of data you will eventually need to analyze, since you know what you are looking for and what you do not need. Second, you’ve established an end point, making it easier to deliver insights to your stakeholders on that specific question.
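To make the breakdown concrete, suppose the question has been narrowed to “which channels do users sign up through?” Only two fields are then needed, a user ID and a sign-up channel, and the analysis reduces to a count per channel. The sketch below uses made-up records and hypothetical channel names purely for illustration.

```python
from collections import Counter

# Hypothetical sign-up records: (user_id, signup_channel).
# Channel names and values are invented for illustration.
signups = [
    (101, "organic search"),
    (102, "referral"),
    (103, "paid social"),
    (104, "organic search"),
    (105, "referral"),
    (106, "organic search"),
]

def channel_shares(rows):
    """Return each channel's share of sign-ups, largest first."""
    counts = Counter(channel for _, channel in rows)
    total = sum(counts.values())
    return [(channel, n / total) for channel, n in counts.most_common()]

shares = channel_shares(signups)
for channel, share in shares:
    print(f"{channel}: {share:.0%}")
```

Because the question was defined up front, other fields such as purchase history or support tickets never need to be pulled in, which is the scoping benefit described above.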
Prepare to Get Dirty
At Promotable, we teach a variety of data-related topics, and in our classes we stress to students that much of the data they will work with in their careers will be messy. To illustrate, say you are looking over a spreadsheet showing how long each visitor spends on your company’s website, and you see that some users have negative numbers recorded in the duration column. Or, say you get a data set of responses from a customer satisfaction survey, only to find that a substantial portion of answers to various questions are blank.
Data collection and processing errors can, and do, occur, and missing data is, unfortunately, common in many data sets. Understanding that real-world data is often not clean is important for two reasons. First, awareness that data points may be anomalous or missing is a crucial part of approaching data with a critical eye toward data quality. Remember, results mean little if the underlying data is unreliable: garbage in, garbage out.
Second, recognizing that data cleaning generally has to happen before any meaningful analysis or predictive modeling can begin helps you appreciate the true amount of time and effort required to bring a project to fruition. To help ensure accuracy and completeness, budget time to comb through the data and perform cleaning tasks as needed.
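The two problems from the earlier examples, negative visit durations and blank survey answers, can be surfaced with a few lines of code. This is a minimal sketch on hypothetical data: it flags rows for investigation and reports the share of blank answers per question; what to do with the flagged rows is left as a judgment call.

```python
# Hypothetical website visits: session durations in seconds.
durations = [312, 45, -20, 1280, -5, 98]

# Hypothetical survey responses; None marks a blank answer.
responses = [
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": None, "q2": 4, "q3": None},
    {"q1": 3, "q2": 5, "q3": 4},
    {"q1": 4, "q2": 5, "q3": None},
]

def flag_negative(values):
    """Return the indices of values that cannot be valid durations."""
    return [i for i, v in enumerate(values) if v < 0]

def blank_rate(rows):
    """Return the share of blank answers for each question."""
    questions = sorted({q for row in rows for q in row})
    return {q: sum(row.get(q) is None for row in rows) / len(rows) for q in questions}

bad = set(flag_negative(durations))
print("Negative durations at rows:", sorted(bad))

# Set suspicious rows aside for investigation rather than silently trusting them.
clean = [v for i, v in enumerate(durations) if i not in bad]
print("Durations kept:", clean)
print("Blank rate per question:", blank_rate(responses))
```

Whether a negative duration should be dropped, corrected, or traced back to a logging bug depends on how the data was collected; the sketch only makes the problem visible.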
Apply Healthy Skepticism to Decisions Related to the Data
Attaching the word “data” to another (as in “data-driven”) may lend a certain authority to an outcome or process, but the truth is that data alone does not grant credence to anything.
Rather, data is made useful and valuable only through a series of human decisions. From decisions on filtering (are data points with certain characteristics irrelevant to my analysis?) and descriptive statistics (should I use the mean or median to summarize this variable?), to decisions on modeling (should I use a decision tree or a random forest?) and interpretation (is a model with 95 percent accuracy good enough to put into production?), there are many stages in the life of a data-driven, data-inspired, or data-whatever product where human judgment is necessary.
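The mean-versus-median choice is easy to illustrate. In the hypothetical order values below, a single unusually large order pulls the mean far above what a typical order looks like, while the median stays close to the bulk of the data. Neither number is wrong, but each tells a different story.

```python
import statistics

# Hypothetical order values in dollars; one unusually large order skews the data.
order_values = [20, 22, 25, 27, 30, 31, 35, 900]

mean = statistics.mean(order_values)      # sensitive to the outlier
median = statistics.median(order_values)  # middle of the sorted values
print(f"Mean:   {mean:.2f}")   # 136.25
print(f"Median: {median:.2f}") # 28.50
```

Reporting the mean here would suggest a typical order of over $130, while the median points to roughly $28; picking one is a human decision about what the summary should convey.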
Does this mean that we should automatically suspend trust in products and insights made possible by data? Certainly not. Insofar as data is the end result of processes that measure events, behaviors, and perceptions in the real world, the scientific use of data has the potential to provide predictions and insights derived from something more than the loudest voice in the room or someone’s gut feeling.
That said, it is important to remember that anything with “data” in front of its name is the result of human activity and, thus, subject to errors and inefficiencies. Furthermore, critically examining data collection procedures, data storage practices, statistical analyses, and predictive modeling is likely to motivate others, and yourself, to identify operational blind spots and to evaluate whether an existing approach to a data-based problem is really the best option compared with the available alternatives.
In turn, such critical thought about data and data-based results has the potential to improve data quality, lead to stronger statistical and predictive models, and ensure that data is collected and used in an ethical manner.
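Critical examination can be as simple as comparing a model against a trivial baseline, which bears directly on the question of whether 95 percent accuracy is good enough. The sketch below uses made-up labels in which 95 of 100 cases are negative; a “model” that ignores its input and always predicts the majority class reaches 95 percent accuracy while never catching a single positive case.

```python
# Hypothetical ground-truth labels: 95 negative cases (0) and 5 positive cases (1).
labels = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
predictions = [0] * len(labels)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positive cases the model correctly identifies."""
    hits = sum(t == p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)

print(f"Accuracy: {accuracy(labels, predictions):.0%}")                # 95%
print(f"Recall on positive cases: {recall(labels, predictions):.0%}")  # 0%
```

Whether 95 percent accuracy is impressive therefore depends on what a naive baseline achieves and on which errors matter for the business problem.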
Accept That the Answers You Get Might Not Be the Answers You Want
One aspect of working with data that is simultaneously exciting and terrifying is that you do not know what the results will show. You may have an idea of what a statistical analysis or machine learning model might reveal, sure, but until the data is actually analyzed there is no way to be certain. Sometimes the data will give you unexpected, or even undesired, results.
For example, say you are running an experiment testing whether a new user interface (UI) feature on your company’s mobile app results in higher revenue than the current version of the app, which lacks this feature. After analyzing the results of a carefully designed experiment, you find that the new UI feature makes no meaningful difference to revenue. Does that mean the experiment was a failure? Absolutely not. You were able to reach a data-informed conclusion, and you now have more information than you did before the analysis.
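One common way to check whether an observed difference could plausibly be due to chance is a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one appears. The sketch below is one possible approach, not necessarily the analysis such an experiment would use; the revenue figures are hypothetical, and the samples are far smaller than a real experiment would need.

```python
import random
import statistics

# Hypothetical revenue per user (in dollars) for each arm of the experiment.
control = [12.0, 8.5, 15.2, 9.9, 11.4, 13.1, 10.0, 14.3]
treatment = [11.8, 9.2, 14.9, 10.5, 12.0, 12.7, 10.3, 13.6]

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = a + b  # new list, so shuffling leaves the inputs untouched
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples

diff, p_value = permutation_test(control, treatment)
print(f"Difference in mean revenue: {diff:.3f}")
print(f"Permutation p-value: {p_value:.3f}")
```

A large p-value indicates that the observed gap is well within what random reshuffling produces. Paired with an adequately powered design, that supports a “no meaningful difference” conclusion, which is still a result worth reporting.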
Foster a Data-Driven Culture
You may be wondering why we place so much emphasis on employees throughout an organization having a data-driven mindset. After all, aren’t data professionals paid to handle and think about all things data?
It is true that an organization should not expect its human resources associates to build data pipelines or its accounting department to create machine learning models. Nevertheless, having people in various roles who can interpret and communicate data-related processes is a competitive advantage. When decision-makers can interpret different data visualizations and basic statistics, they can turn the work of data professionals into solid strategies more quickly and easily. For example, when managers have a basic understanding of the steps involved in creating a usable machine learning model, planning improves, expectations become more realistic, and the final product and outcome are better.
In short, organizations that make a data-driven mindset a core part of their culture are better positioned to maximize the value of their data than organizations that treat it as relevant only to data professionals.