19 Data Modeling Tools You Should Know
During a teleconference meeting one day in mid-October, a developer at DoorDash was updating the company’s director of business intelligence, Marta Vovchenko, on a data model he was putting together. But he didn’t share his screen to show, say, a visualization in DataGrip or a diagram in erwin Data Modeler.
He just held up a piece of paper to the camera.
“Any data modeling starts with pen and paper,” Vovchenko told Built In. “If you’re talking about data modeling tools, pen and paper is a quintessential tool.”
Data Modeling Tools to Know
- SQuirreL SQL Client
- MySQL Workbench
- erwin Data Modeler
Of course, tools come into play at the next step of the process, but the low-tech approach speaks to the core essence of data modeling — plotting and illustrating the relationship between various entities, whether that’s inside an application database or within a data warehouse architecture.
The question of which data modeling tools to consider is contextual. “It’s a question of the right tools for the right situation,” Vovchenko said. Key considerations revolve around an organization’s tech stack and data security. “You have to keep in mind that, if you’re connecting different solutions to your databases, that means you might be opening up your databases,” she added.
Also, there can be some pretty stark tech-cultural lines. If you’re on a data team at a startup with no dedicated data architect role, there’s a good chance you’ve never practically encountered, for instance, ER/Studio — an enterprise-friendly, Windows-only tool with a strong foothold in more traditional corporate realms.
We asked Vovchenko to help us survey the full current landscape of data-modeling tools. And we got some additional perspective from Ajay Gupta, a manager of data architecture at Appnovation, and Mang-Git Ng, founder and CEO of Anvil. Here’s a cross-section of notable options:
Think of Draw.io as the bridge between pen and paper or whiteboarding to something a little more formalized. It’s a free, intuitive, browser-based flowchart builder in which users can drag and drop entity shapes (including the ellipses and parallelograms common to data models) onto a canvas, then join them with connector lines. It’s a handy way to get something down quickly, even for someone lacking access to high-powered software.
It might not make sense in contexts that deal with a lot of unstructured, schema-on-read data, like event-based data sets, but it’s a natural for things like finance models, which are of course highly structured, Vovchenko said.
“The very first model that was drawn for finance [at DoorDash] was actually drawn in Draw.io,” she said.
An SQL plugin, which automatically builds entity shapes and attributes from SQL code, is also available.
This diagram builder is similar to Draw.io, but it promises more complex flows and stronger data protection. (Unlike Draw.io, Lucidchart’s unlimited plans carry a cost, beginning at $7.95 per month.) Lucidchart links to SQL, which means users can pipe their canvases into their actual database management system.
It’s part of DoorDash’s toolkit too. “Historically, we’ve had a lot of microservices based on PostgresQL databases,” Vovchenko said. “Using Lucidcharts was helpful because you can connect it directly to those databases, then read the schema and work with the already pre-documented diagram.”
SQuirreL SQL Client
There are three overarching considerations when working with data models, according to Vovchenko: agility, consistency and data integrity. Agility effectively translates to a willingness to accept the fact that no single data-modeling tool is going to suffice across all use cases.
DoorDash has also recently utilized SQuirreL, a free, open-source graphical tool that’s supported by most major relational databases. Described by Vovchenko as easy to install and deploy, it lets users see database structures and draw entity-relationship diagrams from existing schemas, among other features.
Open-source options may be particularly attractive to net-new businesses, since, as Ajay Gupta noted, connecting those applications to older data environments can sometimes require the added step of front-ending an open database connectivity (ODBC) API.
This free, open-source and cross-platform data-modeling GUI layer for MySQL databases has a reputation for intuitiveness and ease of use.
Vovchenko said she previously used Workbench for structured data alongside the MicroStrategy analytics platform. “Having the ability to use Workbench to develop a model, then overlay it with this extensive logical model to use for reporting purposes, was very helpful for us,” she said. In fact, her team formerly had a regularly scheduled exercise where it would re-create the full model with Workbench and then examine how it fit into the environment. “That’s a really good tool,” she added.
Perhaps the most important trend in data in recent years is the rise of data catalogs, spurred in no small part by privacy regulations like GDPR and CCPA. That push has touched data modeling too. Vovchenko said the line between data lineage tools and data modeling tools will only continue to blur.
Exemplifying the trend is Amundsen, the metadata discovery platform developed and made available by Lyft last year.
“Amundsen models metadata entities as a graph, which makes it easy to extend the model when more entities are introduced,” wrote Lyft engineer Tao Fang when the company made the open-source announcement last October. “All of the entities — such as tables, columns and schemas — are connected via edges that represent the relationships between them.”
erwin Data Modeler
erwin runs on MySQL and PostgresQL — the kinds of free databases toward which so much of the tech world has gravitated in recent years. But it also supports the likes of Teradata, Oracle and Netezza — which is why it’s perhaps more associated with enterprise legacy shops, where data models tend to be more thoroughly optimized than in some deliver-features-fast startups. (The standard model costs $2,999 per year.)
That said, organizations that accept the cost tradeoff get some very robust capabilities. Benefits include the ability to both forward and reverse engineer, generate DDL statements and “help you identify diffs, to keep the history of your schema evolution,” Vovchenko said.
Along with erwin, ER/Studio is another powerful, long-running, enterprise-friendly market leader in data modeling. (The ER, as in erwin, stands for entity relationship.) It, too, supports forward and reverse engineering, and also runs the gamut in terms of databases supported. The tool — which is Windows-only — is available in four options, with the top model supporting Jira integration, conceptual mapping and schema generation on database, among other features. Models start at $920 per user.
Both erwin and ER/Studio have weathered ownership changes since arriving in the late ’90s. Those shakeups may have factored into occasional stability issues for both products, Gupta said, but the fact that both can connect to so many different data sources and database platforms should counteract lingering fears of bugginess. Both are quality options for enterprises that want to avoid vendor lock-in, he added.
Other notable entries in the Windows world include SAP’s PowerDesigner, IBM’s InfoSphere Data Architect and Microsoft’s SQL Server Management Studio, which comes standard with MS SQL Server.
The Oracle SQL Developer Data Modeler lacks what many of those directly above bring in terms of database flexibility — it’s pretty much Oracle or the road. But it is free (though the Oracle database certainly is not) and available for both Windows and Mac.
As we mentioned, tech stacks and tech culture can create pretty sharp dividing lines when it comes to who reaches for what data-modeling tools. Mang-Git Ng, for example, is the founder of recently launched startup Anvil, a paperwork automation company, and a Flexport veteran. For him, enterprise tools like erwin and ER/Studio might as well exist on another planet.
“As a startup, you work scrappy and dirty,” he said, “You ask, ‘How can you get what you need the fastest?’ And then go on from there.”
“I didn’t really see a need for a more advanced data modeling tool, if you already understand the data model that you’ve built,” he added.
So what does someone like Ng use? Postico isn’t a data modeling tool per se; it’s a GUI for PostgresQL that lets users see the tables and views in their database in an intuitive way. Ng said it’s a handy way to look under the hood of your database without getting into the command-line weeds.
While Postico has a reputation for being beginner friendly, the similar Navicat is often considered more comprehensive from a feature perspective and is also worth considering for companies with a dedicated data development manager or data architect role.
Similarly, the open-source Metabase is a graphical interface tool with some helpful analytics visualizations rather than an advanced modeling tool. Its main selling point is cross-team data access. But it also lets users join tables and define entity relationships.
Ng said of his experience with the tool: “I essentially could make a copy of our database, hook it up to Metabase, then slice and dice the data however I wanted. I could make SQL queries on top of the data and do table joins, all from within the UI — which is pretty nice.”
Along with the intuitive UI, Ng said he was impressed with Metabase’s ease of deployment. He got it up and running at full speed within half a day.
Other Notable Data Modeling Tools to Know
Toad Data Modeler: This well-established Windows tool is compatible with a diversity of databases, with distinct editions available for different data roles.
dbt: Short for Data Build Tool, this command-line tool, developed by Fishtown Analytics, allows users to visualize data lineage and complete SQL-based data modeling.
Datagrip: Popular with database administrators, this application allows users to draw entity-relationship diagrams by following the visualization pathway.
Ab Initio: An enterprise-popular application with which data modelers can build E-R relationships in ETL contexts.