Think of data modeling as the data and systems equivalent of a manufacturing company building a model of an automobile, household appliance or airplane before embarking on building the actual object. Models help us paint a conceptual picture of how different objects and subsystems will relate to one another, and we can work out those details long before making the costly investment of time and resources in the actual object.

What Is Data Modeling?

Data modeling is the process of mapping out how major pieces of data relate to one another, what their characteristics and attributes are, and what business rules apply to the model as a whole as well as to its smaller components. It gives data scientists, data analysts and software developers a way to work with data conceptually before building analytical models, defining data for software or putting data to any other use.

The descriptive content of a data model makes it crystal clear what each piece of data actually means, so teams can rely on that shared understanding before they build analytical models, write data definitions for software or make other uses of organizational data.

In this article, we’ll dig deeper into the fundamentals of data modeling and answer common questions about why it’s an important part of data science.


How Does Data Modeling Work? 

Data modeling helps us begin working with data in a highly conceptual way. It’s intended to mirror the real world much more closely than many databases do. Additionally, modeling translates and maps data along the conceptual-logical-physical life cycle, all the way down to highly tuned data structures designed to support the best online performance and response time.

In other words, data modeling is a discipline that mirrors other areas of technology that begin with conceptual views and end with technical implementations, such as software development and business process modeling.


When Do Data Scientists Use Modeling?

Data science teams are responsible for building complex analytical models to underlie a broad range of predictive and exploratory/discovery analytics. A critical building block for those analytical models is getting the data right. This means accurately understanding exactly what each data element means, the possible ranges or lists of values for its contents, whether data instances must be present or if missing data is permitted (e.g., if NULL values are allowed), how the data relates to other data and much more.
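To make that concrete, here is a minimal sketch in Python of the kind of data dictionary entry a team might keep for a single data element before modeling begins. The element name, range and rules below are purely illustrative assumptions, not taken from any real project:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataElementSpec:
    """One entry in a hypothetical data dictionary: what the element
    means, what values it may take and whether it can be missing."""
    name: str
    meaning: str
    allowed_range: Optional[tuple] = None   # (min, max) for numeric data
    allowed_values: Optional[set] = None    # enumerated set of valid values
    nullable: bool = False                  # are NULL / missing values permitted?

    def is_valid(self, value) -> bool:
        """Check a single observed value against the documented rules."""
        if value is None:
            return self.nullable
        if self.allowed_values is not None and value not in self.allowed_values:
            return False
        if self.allowed_range is not None:
            low, high = self.allowed_range
            return low <= value <= high
        return True

# Illustrative example: a customer's age must be present and fall between 0 and 120.
age_spec = DataElementSpec(
    name="customer_age",
    meaning="Age of the customer in whole years at account creation",
    allowed_range=(0, 120),
    nullable=False,
)

print(age_spec.is_valid(34))    # True
print(age_spec.is_valid(None))  # False: missing ages are not permitted
```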

Data models help the data science teams fully understand the data and identify any possible issues that could impact the validity and accuracy of their models.


What Are the 3 Types of Data Modeling?

Now that we know the basics of how data modeling works, let’s go into the three types of data models, each of which has its own role in a database system.


Conceptual Data Model

A conceptual data model ideally mirrors the real world as closely as possible. In other words, we try to avoid the constraints of a particular database management system, which often will require our data to be structured in very specific ways, sometimes including “artificial” database tables or other structures necessary to overcome those constraints. Instead, we want the conceptual model to represent a specific business problem, or whatever we’re building it for, with the actual objects, characteristics, relationships and business rules.
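As a rough sketch of what that can look like in practice, the fragment below describes a hypothetical ordering scenario purely in terms of real-world objects, their attributes and one business rule, with nothing database-specific about it. The entity names and the rule are illustrative assumptions, not taken from any particular system:

```python
from dataclasses import dataclass, field
from typing import List

# Conceptual model for a hypothetical ordering scenario: real-world
# entities and their relationships, with no database-specific structures.

@dataclass
class Product:
    name: str
    unit_price: float

@dataclass
class Customer:
    name: str
    email: str

@dataclass
class OrderLine:
    product: Product
    quantity: int

@dataclass
class Order:
    customer: Customer                        # every order belongs to one customer
    lines: List[OrderLine] = field(default_factory=list)

    def add_line(self, product: Product, quantity: int) -> None:
        # Business rule (illustrative): an order line must have a positive quantity.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(product, quantity))

alice = Customer("Alice", "alice@example.com")
order = Order(customer=alice)
order.add_line(Product("Notebook", 3.50), quantity=2)
```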


Logical Data Model

A logical data model maps our conceptual model into a database-specific set of structures. For example, in a relational database we have constructs like tables made up of rows and columns. We take the model built at the conceptual level and apply whatever transformations are needed to make it conform to the rules of relational databases, or whatever type of data management system we’re using.
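Continuing the illustrative ordering example from the conceptual sketch above, a relational mapping might look roughly like the following. The table and column names are assumptions made for the example, and SQLite is used here only because it ships with Python; note how the list of order lines inside an order becomes its own table tied back to orders and products by foreign keys:

```python
import sqlite3

# Logical model: the conceptual entities become tables of rows and columns.
ddl = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);

CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    unit_price  REAL NOT NULL
);

CREATE TABLE "order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE order_line (
    order_id    INTEGER NOT NULL REFERENCES "order"(order_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # apply the logical model to an in-memory database
```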


Physical Data Model

At the physical level, we adjust the logical model in two ways. The first is driven by product-specific features and limitations. The second aims to improve performance and response time once the database is actually implemented and filled with data.
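As a small, hypothetical illustration of both kinds of adjustment, the sketch below applies a product-specific setting (SQLite’s page size) and a performance-oriented index to one of the tables from the earlier example. The names and values are illustrative, not recommendations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Product-specific physical adjustment (illustrative): SQLite's page size
# must be set before any data structures are created.
conn.execute("PRAGMA page_size = 8192")

conn.executescript("""
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL
);

-- Performance adjustment: an index so that "all lines for a given product"
-- can be answered without scanning the whole table once it fills with data.
CREATE INDEX idx_order_line_product ON order_line (product_id);
""")
```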

All three levels are critically important, and the data modeling lifecycle ideally includes substantial involvement at each of the three levels.


What Are Common Data Modeling Tools?

A data modeling tool is a specialized piece of software that not only permits us to draw major data modeling constructs — entities, attributes and relationships — but also attaches semantics to those objects. These meanings might be as simple as the data types and sizes of attributes, or as complex as the business rules that govern particular relationships between two entities on a data model. 

The semantics then help us translate and convert highly conceptual models that mirror the real world into database-specific constructs that become the data management platform for one or more applications and systems.

Data modeling tools can range from general-purpose packages such as Microsoft Visio, which has evolved into a full-featured product for building and managing data models, to specialized software such as CA Erwin, ER/Studio Data Architect and many others. Here’s a look at five of the most popular tools:


1. Microsoft Visio 

If you don’t need incredibly specialized software, then Microsoft Visio might be a great place to start. Its diagrams are professional and intuitive, the flowchart templates are modern, and you can collaborate with your team directly in the tool.


2. CA Erwin

If you use Windows as your operating system and are looking for a data modeler to work with structured and unstructured data, Erwin Data Modeler is an award-winning, high-quality modeling tool.


3. ER/Studio

Idera’s ER/Studio is another tool that works best with Windows. Using the enterprise team edition, you can model data from business intelligence (BI), big data, ETL, NoSQL and relational sources, as well as implement naming standards, define glossary terms and save models in the repository for other team members to view.


4. PowerDesigner

The SAP PowerDesigner tool is best used for data modeling, link-and-sync and metadata management. The interface is easy to navigate, and the tool includes a repository so you can easily collaborate with your team. This tool works best with Windows.


5. Navicat

Navicat runs on any of the major operating systems. It’s a more cost-effective way to build high-quality data models than some of the other tools, and you can still perform many of the same tasks. Navicat is best for forward and/or reverse engineering, importing models from other data sources and generating complex SQL/DDL.


Popular Data Modeling Techniques  

There’s more than one approach to data modeling, and the right technique depends on your use case and business requirements. Some data modeling methodologies are database-centric: they are dedicated to graphical depictions of data and the relationships among data objects, with the eventual goal of designing and creating a database. Other techniques are more holistic, addressing data models alongside process modeling and other aspects of systems development.

Most approaches trace their roots back to entity-relationship (E-R) modeling, which began with Peter Chen’s 1976 academic paper and has evolved over the years. Another closely related technique is the so-called “crow’s foot” notation, which follows many of the same rules as classic E-R modeling but uses different symbols to represent data and business rules.

The E-R model, regardless of notation, represents real-world objects. An entity is a thing in the real world; it has a set of properties, or attributes, and those attributes can have values. Entities can also have relationships with one another. E-R modeling, therefore, captures entities along with their attributes and the relationships among them.
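As a rough sketch, the fragment below writes down the building blocks of an E-R model: two entities with attributes and a one-to-many relationship between them (in diagram form, a crow’s foot would sit on the “many” end). The entity and relationship names are invented for the example:

```python
from dataclasses import dataclass, field
from typing import List

# A tiny, illustrative representation of an E-R diagram itself: entities
# with named attributes, and a relationship with a cardinality between them.

@dataclass
class Entity:
    name: str
    attributes: List[str] = field(default_factory=list)

@dataclass
class Relationship:
    name: str
    from_entity: Entity
    to_entity: Entity
    cardinality: str  # e.g. "one-to-many"

customer = Entity("Customer", ["customer_id", "name", "email"])
order = Entity("Order", ["order_id", "order_date"])

places = Relationship(
    name="places",
    from_entity=customer,
    to_entity=order,
    cardinality="one-to-many",  # one customer places many orders
)

print(f"{places.from_entity.name} {places.name} {places.to_entity.name} "
      f"({places.cardinality})")
```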

Now that we’ve covered a lot on data modeling, you’re likely wondering how it fits into your career development. IT and data professionals can either specialize in data modeling as part of the data-centric work they do (data warehousing or big data, for example) or work with data models as part of a larger body of work, such as custom software development, packaged software development or data science and machine learning. In general, data modeling is a critically important skill for anyone who works with data in any capacity: someone could make it the centerpiece of their career, or it could be one part of a larger toolkit of skills.

***

This article was originally posted on Udemy’s blog as What is Data Modeling? Common Tools, Techniques, and Model Types
