Data is the most valuable resource of the 21st century. Across a breadth of industries, the world’s top companies are adapting their business models to collect and analyze more data. Governments and civil institutions are also following suit. United Nations Secretary-General, António Guterres, called accurate data “the lifeblood of good policy and decision-making.”

What Is Data Analysis?

Data analysis is the process of extracting key insights from data sets. In addition to crunching numbers, it involves finding the story in the data, the underlying trends and the hidden message. The best data analysts use existing data to predict what the future holds.

Today, data influences everything from our movie recommendations to our medical treatments, from online advertisements to poverty reduction initiatives. More data has been generated in the last two years than in all of the previous years of human history combined. But here’s a shocking statistic: Only 0.5 percent of the world’s data is estimated to have been analyzed.

The ability to work with data is becoming a crucial skill in the modern world. But what exactly is data analysis, and how can one get started with it? In this article, we’ll explore all the details.

 

What Is Data Analysis?

The meaning of data analysis can vary from company to company, from team to team, and even from project to project. It’s better to think about the purpose of data analysis.

The goal of data analysis is to extract key insights from data sets. A data analyst aims to discover useful information that helps stakeholders make informed decisions. It’s not just about crunching numbers. It also means finding the story in the data: the underlying trend and the hidden message. The best data analysts may even appear to be fortune-tellers, using existing data to predict what the future holds.

It might sound challenging, but data analysis is a lot less complex than it appears. Most of us think like data analysts on a daily basis. For example, we may:

  • Look at our credit card bills to identify our biggest monthly expenses.
  • Glance at our favorite athlete’s past performances to predict how well they’ll play in their next game.
  • Keep track of how many items we complete on a daily to-do list.

In these situations, we’re using our past and current observations to influence our future behavior. Data analysis seeks the same result.


What Types of Data Are There?

There are many different types of data. Some common examples include:

  • Text-based data like a name or an address.
  • Numeric data like age, weight or temperature.
  • Time-based data like the date an event occurred.

Data types fall into two broader categories: quantitative and qualitative. Quantitative data involves a numeric measurement. Qualitative data describes a characteristic or trait.

Imagine we’re analyzing a data set of houses sold in the United States. Examples of quantitative data might include the selling price, the number of bedrooms, or the square footage. Examples of qualitative data might include the house’s address, listing agent or type (e.g., ranch or colonial).
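To make the distinction concrete, here is a minimal sketch in plain Python. The house records, the column names and the classify_columns helper are all made up for illustration. The rule it applies (a column is quantitative if every value is a number) is a simplifying assumption, not a formal definition.

```python
# Hypothetical house-sale records; every value is invented for illustration.
houses = [
    {"address": "12 Oak St", "agent": "Rivera", "style": "ranch",
     "price": 310000, "bedrooms": 3, "sqft": 1450},
    {"address": "48 Elm Ave", "agent": "Chen", "style": "colonial",
     "price": 455000, "bedrooms": 4, "sqft": 2100},
]


def classify_columns(rows):
    """Label a column quantitative if all of its values are numbers."""
    kinds = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        kinds[column] = "quantitative" if is_numeric else "qualitative"
    return kinds


column_kinds = classify_columns(houses)
for column, kind in column_kinds.items():
    print(f"{column}: {kind}")
```

A rule like this is only a starting point. A ZIP code, for instance, is made of digits but describes a place rather than a measurement, so an analyst would still treat it as qualitative.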

 

The Process of Data Analysis

The data analysis process begins with an inquiry. What would you like to figure out? What information would help you or the business unit make a decision? What data needs to be acquired? Do you already have a data collection that holds the answers?

Once data has been collected, the next step is extracting it from its source. A data analyst may query a database to pull out the relevant tables of information. A variety of different file types, including comma separated values (CSV), JavaScript Object Notation (JSON) and Hierarchical Data Format (HDF), are used to store data.
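As a rough illustration of two of these formats, the sketch below reads the same two records from CSV text and from JSON text using only Python's standard library. The sample records are invented, and the text is held in strings rather than files so the example stays self-contained.

```python
import csv
import io
import json

# The same two hypothetical records, stored in two different formats.
csv_text = "address,price,bedrooms\n12 Oak St,310000,3\n48 Elm Ave,455000,4\n"
json_text = (
    '[{"address": "12 Oak St", "price": 310000, "bedrooms": 3},'
    ' {"address": "48 Elm Ave", "price": 455000, "bedrooms": 4}]'
)

# CSV stores everything as text, so numeric columns need converting.
csv_rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
for row in csv_rows:
    row["price"] = int(row["price"])
    row["bedrooms"] = int(row["bedrooms"])

# JSON keeps numbers as numbers, so no conversion is needed.
json_rows = json.loads(json_text)

print(csv_rows == json_rows)
```

With real files, open(path, newline="") would take the place of io.StringIO for the CSV, and json.load would read the JSON file directly.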

Raw data describes data in its original format, or the way that it is stored before it is manipulated. Raw data often arrives with imperfections that the analyst needs to fix before analysis can occur. For example:

  • A person’s name may be in all uppercase letters instead of a proper casing.
  • A piece of text may have extra whitespace on its ends.
  • A row of data may be missing or duplicated.
  • A cell may store multiple values that need to be isolated.

The process of cleaning data is called data wrangling or data munging.
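Several of the imperfections listed above can be fixed in a few lines of code. The following is a minimal sketch in plain Python. The records, the field names and the semicolon separator are assumptions chosen for illustration.

```python
# Hypothetical raw records showing the problems described above.
raw_rows = [
    {"name": "ALICE JOHNSON", "city": "  Denver ", "phones": "555-0101;555-0102"},
    {"name": "BOB LEE", "city": "Austin", "phones": "555-0199"},
    {"name": "BOB LEE", "city": "Austin", "phones": "555-0199"},  # duplicate row
]


def clean(rows):
    """Fix casing, trim whitespace, split multi-value cells, drop duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        record = {
            "name": row["name"].title(),   # "ALICE JOHNSON" -> "Alice Johnson"
            "city": row["city"].strip(),   # remove whitespace on both ends
            # One cell holding several values becomes a tuple of values.
            "phones": tuple(p.strip() for p in row["phones"].split(";")),
        }
        key = tuple(record.items())        # hashable fingerprint of the row
        if key in seen:
            continue                       # skip rows already kept
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned_rows = clean(raw_rows)
for record in cleaned_rows:
    print(record)
```

Two caveats apply. str.title() is a blunt tool that mishandles names such as "McDonald", and no amount of code can restore a row that is missing altogether. That gap has to be filled from the original source.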

Once the data is cleaned, we apply a sequence of manipulations to the data set. The goal with every step is to move closer to an answer for the original business inquiry. Data analysis can feel a bit like detective work. Over time, a skilled analyst builds a dynamic toolkit of data analysis techniques and learns the right ones to apply to solve the problem at hand. Some examples of everyday manipulations include:

  • Sorting column values in ascending or descending order.
  • Extracting a subset of rows that satisfy a certain condition.
  • Joining together columns from multiple tables by a common key.
  • Grouping data and performing an aggregate operation like taking the average.
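The four manipulations above can be sketched with nothing more than Python's standard library. The sales records, the agent lookup table and the 300,000 threshold are hypothetical. A dedicated library would typically express each step more compactly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tables: one of sales, one mapping agent IDs to names.
sales = [
    {"house_id": 1, "agent_id": "a1", "price": 310000},
    {"house_id": 2, "agent_id": "a2", "price": 455000},
    {"house_id": 3, "agent_id": "a1", "price": 275000},
    {"house_id": 4, "agent_id": "a2", "price": 520000},
]
agents = {"a1": "Rivera", "a2": "Chen"}

# Sorting: highest price first.
by_price_desc = sorted(sales, key=lambda sale: sale["price"], reverse=True)

# Filtering: keep only the rows that satisfy a condition.
over_300k = [sale for sale in sales if sale["price"] > 300000]

# Joining: attach the agent's name using the shared agent_id key.
joined = [{**sale, "agent": agents[sale["agent_id"]]} for sale in sales]

# Grouping and aggregating: average sale price per agent.
prices_by_agent = defaultdict(list)
for row in joined:
    prices_by_agent[row["agent"]].append(row["price"])
avg_price_by_agent = {
    agent: mean(prices) for agent, prices in prices_by_agent.items()
}

print(avg_price_by_agent)
```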

Once an analyst has reached a conclusion, the final step is presenting it to stakeholders. The deliverable can vary from assignment to assignment. One stakeholder may ask for a single conclusive number, such as a count or a sum. Another may request a collection of all relevant data points. A third may like to see a written summary.

It is often easier to describe an idea with a picture than with a paragraph of text. Visual representations of data are popular in data analysis reports. Some common visualizations include bar charts, pie charts, scatter plots and histograms.
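As a small illustration of the idea behind a bar chart, the sketch below counts how often each value appears and draws one bar per category in plain text. The list of house styles and the text_bar_chart helper are made up for this example. A real report would normally use a charting tool instead.

```python
from collections import Counter

# Hypothetical house styles from a small data set.
styles = ["ranch", "colonial", "ranch", "ranch", "colonial", "cape cod"]


def text_bar_chart(values):
    """Return one text bar per category, most frequent first."""
    counts = Counter(values)
    width = max(len(label) for label in counts)  # pad labels to align bars
    lines = []
    for label, count in counts.most_common():
        lines.append(f"{label.ljust(width)} | {'#' * count} {count}")
    return "\n".join(lines)


chart = text_bar_chart(styles)
print(chart)
```

Even in this crude form, the longest bar makes the most common category obvious at a glance, which is the point of visualizing the data in the first place.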


Data Analysis Tools

Both free and commercial data analysis tools are available for analysts of all skill levels. Some are optimized for speed and efficiency. Others focus on user experience and convenience. Rarely is there only one technology for a job. Rather, a data analyst learns to choose the best tool for their current assignment.

Microsoft Excel is the most popular data analysis tool available today. It is estimated to be installed on 750 million computers worldwide. Similar spreadsheet software options include Google Sheets and Apple Numbers. Excel is a great option for those getting started with data analysis. Its visual interface gives it a gentler learning curve than many alternatives.

Once data sets start to grow in size and complexity, it makes sense to move to a more heavy-duty solution. Other data analysis tools include:

  • Pandas: an open-source data analysis library for the Python programming language.
  • R: a programming language built for statistical analysis.
  • SAS: a commercial suite of tools for data analytics.

These tools open the doors to more advanced analysis and other benefits including automation. 

The demand for data analysts is forecast to keep growing as companies and institutions collect more and more data. Luckily, the barrier to entry for learning data analysis has never been lower. Many in-depth online courses are available for a wide spectrum of technologies. Data analysis is a worthwhile investment for those looking to pick up a versatile skill that can prove helpful at both home and work.

* * *

This article was originally posted on Udemy’s Blog as What Is Data Analysis: Learn How to Derive Key Insights From Your Data.
