Data ontology is a way of linking data in various formats based on shared concepts. In the early days of the web, documents were linked to one another by hyperlinks served over HTTP. Nowadays, one can add another layer, an ontology, to define a specific concept, and then automatically link data points that are pertinent to that concept.

If you’ve seen the word recently and thought that ontology is a new thing, rest assured that it’s more ancient than the oldest sweater you own. Aristotle, a Greek philosopher who lived in the fourth century B.C.E., called it the “first philosophy” in his work Metaphysics.

Truth be told, it took a little while for this concept to get popular again. The German rationalist philosopher Christian Wolff eventually got ontology back into mainstream discussions of philosophy in the 18th century. Since then, philosophers have consistently debated the topic, as have patrons in bars when the hour is late and the liquor is flowing. In this article, we’ll take the concept out of the philosophy seminar and the tavern to explore data ontology and how it works in practice.

What Is Ontology? How Does it Relate to Data?

At its core, ontology is the study of what is. It has recently begun to gain traction in computer science in the form of data ontology.

Ontology and Data

In the early 2000s, ontology migrated out of the realm of the philosophers when it struck a chord with computer scientists. Veterans like Tim Berners-Lee started advocating for what they called “linked data.” The idea is that data should not solely exist in the form of hypertext documents and hyperlinks between them. Rather, data should be viewed as what it represents (people, places, events, ideas, activities, and so on) and linked by meaning, in a way that both humans and machines can follow.

This ontological view is quite an advanced way of thinking about data. Perhaps it’s not surprising, then, that the tools needed to put these thoughts into practice weren’t yet available.

Now, however, the internet has matured. The tools are available. And ontology is experiencing a renaissance in computer science.

 

What Is Ontology?

Before we get into the details of how ontology works in the world of data, let’s have a look at what a philosopher would say about it. 

At its core, ontology is the study of what is. To make this a little more concrete, one could also say ontology is the study of what exists or of what is real. “Does God exist?”, “Are my feelings real?” and “What is ‘nothing,’ and does it exist?” are all examples of ontological questions.

Those are questions you might be asking when it’s particularly late at night or when you’ve had an exceptionally tough day. But for philosophers and, increasingly, computer scientists, these are everyday questions thanks to ontology. 

Philosophers like to make assumptions in order to explore such questions further. For example, they might assume that God exists. Then they might ask something like, “What is the relationship of God to humans, animals, plants, the ocean and the sky?” The answers to these questions provide information not only about what exists (e.g., God and humans), but also about the relationship between these things (e.g., God gives kindness to humans). 

 

What Is Data Ontology?

See what I did in that sub-header? It’s an ontological question about ontology and data! 

Mind blown … But let’s get back to business.

The common ground between ontology in philosophy and in computer science is that it’s an attempt to describe everything that is, i.e., entities, ideas, and events, and all the relations between these things. 

For example, if you searched for “Paris” a decade ago, your favorite search engine spat out a list of links that seemed particularly relevant for your query. Their relevance was determined by the number of times the word “Paris” was mentioned, the number of backlinks to these sites, and a bunch of other criteria that SEO experts can explain much better than I can.

Fast-forward to today: If you type in “Paris” now, your search engine recognizes that it is a city, knows what a city is, and will propose data points pertaining to cities, like demographics, districts, and so on. It might also propose train lines that bring you to Paris because trains are things that exist for ontologists, and because your relationship to Paris might be wanting to visit.

This is ontology in action.
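
To make the Paris example a little more tangible, here is a minimal sketch in Python of how such knowledge could be stored as subject-predicate-object statements. It is an illustration only, not how any real search engine works: the predicate names (is_a, has_property, connects_to) and the handful of facts are assumptions made up for this example.

```python
# Hypothetical mini knowledge graph: every fact is a
# (subject, predicate, object) triple. All names are illustrative.
TRIPLES = [
    ("Paris", "is_a", "City"),
    ("Paris", "located_in", "France"),
    ("City", "has_property", "population"),
    ("City", "has_property", "districts"),
    ("Train", "is_a", "TransportMode"),
    ("Train", "connects_to", "City"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def describe(entity):
    """Collect what the graph knows about an entity through its classes."""
    classes = objects(entity, "is_a")
    properties = [p for c in classes for p in objects(c, "has_property")]
    reachable_by = [s for c in classes for s in subjects("connects_to", c)]
    return {
        "classes": classes,
        "properties": properties,
        "reachable_by": reachable_by,
    }


info = describe("Paris")
print(info)
```

Because “Paris” is declared to be a City, the lookup can surface city-level properties and the fact that trains connect to cities, without the word “train” ever appearing next to “Paris” in the data.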

How Ontologies Make Data Easier to Use

There are, of course, many other ways to organize your data. These include vocabularies, taxonomies, thesauri, topic maps, logical models and relational databases. These come with the added benefit that you don’t need to know anything about philosophy to understand them. 

What makes ontologies special is how flexible they are. If you wanted to change a property in a relational database from an integer to a floating-point number, you would typically have to run a schema migration: alter the column, convert every existing value, and update the queries and applications that depend on it. In the worst case, on a large or heavily interlinked data set, you’d end up rebuilding whole tables because the change ripples through everything that references them. It’s a mess!

With an ontology, however, changing a property is as easy as changing the semantic concept that underpins that property. That might sound complicated, but in practice it amounts to updating the definition of that property in one place. The original data set doesn’t get lost, nor do any links or indices that deal with it.

 

Data Ontology Example

To give you a concrete example, let’s say you have a data set of contracts. If you knew nothing about ontology, you might put all data points about your contracts in a table. This table might contain columns like “Contract owner,” “Coverage,” and “Confidentiality.” The problem is that, if you want to change one of these columns or add a new column later on, you’d have to migrate the whole table to make sure that all entries are in the right format. Also, some contracts might need certain columns while others do not. But filling all the columns for all contracts is a huge time-waster.

With an ontology about contracts, on the other hand, you might have classes like “Business contract” or “Tenancy contract,” which each have properties of their own. Think of it more like a tree diagram than a rigid table. If you want to add a new property, it’s as easy as adding a branch in the appropriate place. You might want to add “Student tenancy agreement” as a branch under “Tenancy contract.” It doesn’t make sense, though, to add this as a whole column to all sorts of contracts because something like a business NDA has absolutely nothing to do with a student tenancy agreement.
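
The tree idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real ontology language: the root class “Contract” and the property names monthly_rent and university are hypothetical, and which properties belong to which class is chosen purely for illustration.

```python
# Hypothetical contract ontology: each class names its parent and
# the properties it introduces. Children inherit from their parents.
ONTOLOGY = {
    "Contract": {"parent": None, "properties": ["contract_owner", "confidentiality"]},
    "Business contract": {"parent": "Contract", "properties": ["coverage"]},
    "Tenancy contract": {"parent": "Contract", "properties": ["monthly_rent"]},
}


def add_class(name, parent, properties):
    """Add a new branch under an existing class."""
    if parent not in ONTOLOGY:
        raise KeyError(f"Unknown parent class: {parent}")
    ONTOLOGY[name] = {"parent": parent, "properties": list(properties)}


def properties_of(name):
    """Walk up the tree and collect properties, root class first."""
    chain = []
    while name is not None:
        chain.append(name)
        name = ONTOLOGY[name]["parent"]
    collected = []
    for cls in reversed(chain):
        collected.extend(ONTOLOGY[cls]["properties"])
    return collected


# Adding a branch touches only the new class, nothing else.
add_class("Student tenancy agreement", "Tenancy contract", ["university"])

print(properties_of("Student tenancy agreement"))
print(properties_of("Business contract"))
```

The student tenancy agreement picks up everything a tenancy contract has plus its own property, while the business contract is left exactly as it was.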

Ontologies are also extremely useful for machine learning. It’s tough even for a large language model to understand all of these things: that “Paris” is a city, and as such has certain properties, and that you’re not in this city but might want to be, and that it should, therefore, propose some appropriate train lines to you. All this information is directly fed to the machine learning model if you’re using an ontology. This way, the model can focus its capabilities on proposing the best train lines and tourist venues to you.

 

Ontology Modeling and the Semantic Web

Ontologies are one of the building blocks of the semantic web. This is basically a fancy term for the wish that the web should be readable by machines as well as humans and work with linked data, rather than being a scattered mess of URLs pointing to one another. For example, if you search for “Paris,” you won’t just get a list of links to pages that mention the word “Paris” a lot, but you’ll get pertinent information about the city, its inhabitants, and ways to go there.

What Is the Semantic Web?

The semantic web reflects the idea that the web should be readable by machines as well as humans and work with linked data, rather than being a scattered mess of URLs pointing to one another. Data ontology is a key part of bringing this new, improved web to life.

With the semantic web, distributed and heterogeneous databases can interact with one another because they speak a common language. Distributed, in this context, means that the databases live on many different servers. Heterogeneous means that they might vary in terms of their architecture, data formats, and so on. What these databases have in common, though, is that they all know what a person or a city is, for example. So a database about the largest cities in the world can be matched with the database of the richest people in the world without having to deal with too many technical fiddles. And you’ll know which major city has the most rich people in the blink of an eye if that’s what you queried.
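
Here is a rough Python sketch of that matching step, assuming two made-up sources in different formats. The field names, the placeholder cities and people, and the shared vocabulary (is_a, population, lives_in) are all hypothetical. The point is only that once both sources are mapped onto the same concepts, a single query can span them.

```python
from collections import Counter

# Source 1: dictionaries, as they might come from a JSON API (placeholder data).
CITY_SOURCE = [
    {"city_name": "City A", "inhabitants": 9_000_000},
    {"city_name": "City B", "inhabitants": 4_000_000},
]

# Source 2: tuples, as they might come from rows of a SQL table (placeholder data).
PEOPLE_SOURCE = [
    ("Person 1", "City A"),
    ("Person 2", "City A"),
    ("Person 3", "City B"),
]


def cities_as_triples(rows):
    """Map the city source onto the shared vocabulary."""
    for row in rows:
        yield (row["city_name"], "is_a", "City")
        yield (row["city_name"], "population", row["inhabitants"])


def people_as_triples(rows):
    """Map the people source onto the same shared vocabulary."""
    for name, residence in rows:
        yield (name, "is_a", "Person")
        yield (name, "lives_in", residence)


# One graph, two origins: both now speak the same language.
graph = list(cities_as_triples(CITY_SOURCE)) + list(people_as_triples(PEOPLE_SOURCE))


def city_with_most_residents(triples):
    """Cross-source query: which known City do the most listed people live in?"""
    cities = {s for s, p, o in triples if p == "is_a" and o == "City"}
    counts = Counter(o for s, p, o in triples if p == "lives_in" and o in cities)
    return counts.most_common(1)[0][0]


print(city_with_most_residents(graph))
```

Neither source had to change its own format. Only the two small mapping functions know about the differences, which is roughly the job an ontology does at a much larger scale.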

Put in fancier technical terms, ontologies and the semantic web ensure interoperability, cross-database search and smooth knowledge management. Interoperability means that the databases are able to work with one another. Cross-database search means that you can search for results in several databases at once and infer logical conclusions from them. Finally, smooth knowledge management means that information is stored and used in ways that are straightforward and user-friendly. Think about how your search results and social media feeds might have changed in the last few years. The semantic web is well on its way!

 

Practical Applications of Ontology

We’ve talked about web search a bit, but the scope of ontology goes much further than that. In the pharmaceutical industry, AstraZeneca has used ontology to test its early hypotheses. They built a large data set following ontological principles (i.e., different things like proteins, genes, and diseases exist, and they have certain relationships to one another). This data set was accompanied by a user interface so that researchers at AstraZeneca could explore all things and their relationships before starting any drug development.

In another use case, health records were organized in an ontological way to help people make better food choices. And in a further case, financial data was used to uncover financial crime.

What all these applications have in common is that they’re quite user-centric and based on real-world problems. The internet is becoming less about “Hey, why is this URL broken?” and more about “Hi internet, I have a problem, can you solve it for me?”

The Age of Ontology

Ontology is sometimes marketed as the next big thing in data science. In reality, it’s an age-old discipline. The idea of using it for data is certainly disruptive; however, it’s already being deployed on data wherever you look.

The question you’re asking now should no longer be “What is ontology and why do I need it?” but rather “Why does my company not work with ontology yet?” 

You don’t have to pick up a philosophy book to answer that question. You might want to take a critical look at how your company organizes and links its data today, however, and reflect on how ontology might improve its current business processes.

You shouldn’t be jumping on every buzzword and every fad in tech (remember NFTs?). You should, however, respect age-old principles when they revolutionize the way we tech. 
