Data ontology is a way of linking data in various formats based on shared concepts. In the early days of the web, documents were linked to one another with hyperlinks served over HTTP. Nowadays, one can add another layer, an ontology, to define a specific concept and then automatically link the data points that are pertinent to that concept.
What Is Ontology? How Does it Relate to Data?
Ontology is, put simply, the study of what is. In computer science, data ontology is a method for organizing and structuring data based on the relationships between different entities. For example, the semantic web uses data ontologies to understand what people, cities and other concepts are and how they’re related to each other, so it can provide relevant results for user queries.
Ontology dates back to the fourth century B.C.E., when Greek philosopher Aristotle called it the “first philosophy” in his work Metaphysics. In this article, we’ll take the concept out of the philosophy seminar to explore data ontology and how it works in practice.
What Is Ontology?
At its core, ontology is the study of what is. To make this a little more concrete, one could also say ontology is the study of what exists or what is real. “Does God exist?” “Are my feelings real?” and “What is ‘nothing,’ and does it exist?” are all examples of ontological questions.
Philosophers like to make assumptions in order to explore such questions further. For example, they might assume that God exists. Then they might ask something like, “What is the relationship of God to humans, animals, plants, the ocean and the sky?” The answers to these questions provide information not only about what exists (e.g., God and humans), but also about the relationship between these things (e.g., God gives kindness to humans).
Ontology and Data
In the early 2000s, ontology struck a chord with computer scientists. Veterans like Tim Berners-Lee started advocating for what they called linked data. The idea is that data should not solely exist in the form of hypertext documents and hyperlinks between them. Rather, data should be viewed as what it represents (people, places, events, ideas, activities and so on) and linked in a way that reflects those meanings, so that machines as well as people can interpret the links.
This ontological view is quite an advanced way of thinking about data. Perhaps it’s not surprising, then, that the necessary tools to put these thoughts into practice weren’t available yet at that time.
Now, the internet has matured. The tools are available, and ontology is experiencing a renaissance in computer science.
What Is Data Ontology?
The common ground between ontology in philosophy and computer science is that it’s an attempt to describe everything that is — entities, ideas, events — and all the relations between these things.
For example, if you searched for Paris a decade ago, your favorite search engine returned a list of links that seemed particularly relevant to your query. Their relevance was determined by the number of times the word Paris was mentioned, the number of backlinks to these sites, and a bunch of other criteria that SEO experts can explain much better than I can.
If you type in Paris now, your search engine recognizes that it is a city, knows what a city is, and will propose data points pertaining to cities, like demographics, districts and so on. It might also propose train lines that bring you to Paris, because train lines are entities in the ontology too and because your relationship to Paris might be that you want to visit.
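To make the Paris example concrete, here is a minimal sketch in Python of how such knowledge could be stored as subject-predicate-object facts. The predicate names (is_a, has_district, serves), the helper functions and the handful of facts are assumptions chosen for illustration; this is not how any particular search engine is implemented.

```python
# Hypothetical mini knowledge graph: each fact is a (subject, predicate, object) triple.
TRIPLES = {
    ("Paris", "is_a", "City"),
    ("Paris", "located_in", "France"),
    ("Paris", "has_district", "Le Marais"),
    ("Paris", "has_district", "Montmartre"),
    ("Eurostar", "is_a", "TrainLine"),
    ("Eurostar", "serves", "Paris"),
    ("TGV", "is_a", "TrainLine"),
    ("TGV", "serves", "Paris"),
}


def objects_of(subject, predicate):
    """Return every object linked from `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects_of(predicate, obj):
    """Return every subject linked to `obj` by `predicate`, sorted."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def describe(query):
    """Answer a one-word query using what the graph knows about it."""
    kinds = objects_of(query, "is_a")
    result = {"query": query, "kinds": kinds}
    if "City" in kinds:
        # Because the query is known to be a city, city-specific facts apply.
        result["districts"] = objects_of(query, "has_district")
        result["train_lines"] = [
            s for s in subjects_of("serves", query)
            if "TrainLine" in objects_of(s, "is_a")
        ]
    return result


paris = describe("Paris")
print(paris)
```

The point of the sketch is that the answer is assembled from typed relationships rather than from counting how often the word Paris appears on a page.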
How Ontologies Make Data Easier to Use
There are many other ways to organize your data, including vocabularies, taxonomies, thesauri, topic maps, logical models and relational databases. These come with the added benefit that you don’t need to know anything about philosophy to understand them.
What makes ontologies special is how flexible they are. If you wanted to change a property in a relational database from an integer to a floating-point number, you would have to alter the table's schema. Depending on the database and the size of the table, that can mean rewriting the entire column along with the indices, views and queries that depend on it. In the worst case, you’d have to migrate your whole data set to a new schema.
With an ontology, changing a property is as easy as changing the semantic concept that underpins that property. That might sound complicated, but in practice it comes down to editing the definition of the property in one place. The original data set doesn’t get lost, nor do any links or indices that refer to it.
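As a rough illustration of that idea, the following Python sketch keeps the definition of a property separate from the stored facts. Everything in it is hypothetical: the property name coverage_amount, the contract identifiers and the read helper are invented for the example, and real ontology tooling handles datatypes in more sophisticated ways.

```python
# Hypothetical property definitions, kept separate from the facts themselves.
PROPERTY_RANGE = {"coverage_amount": int}

# Facts are stored as (subject, property, raw value); raw values stay as written.
FACTS = [
    ("contract-1", "coverage_amount", "1000"),
    ("contract-2", "coverage_amount", "2500"),
]


def read(subject, prop):
    """Look up a fact and interpret it using the property's current definition."""
    cast = PROPERTY_RANGE[prop]
    for s, p, raw in FACTS:
        if s == subject and p == prop:
            return cast(raw)
    return None


before = read("contract-1", "coverage_amount")  # interpreted as an integer

# Redefine the property once; the stored facts are not touched.
PROPERTY_RANGE["coverage_amount"] = float

after = read("contract-1", "coverage_amount")   # now interpreted as a float
print(before, after)
```

In this toy setup the change happens in a single place, and the list of facts is exactly the same before and after.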
Data Ontology Example
Let’s say you have a data set of contracts. If you knew nothing about ontology, you might put all data points about your contracts in a table. This table might contain columns like “Contract owner,” “Coverage,” and “Confidentiality.” The problem is that, if you want to change one of these columns or add a new column later on, you’d have to migrate the whole table to make sure that all entries are in the right format. Some contracts might also need certain columns while others do not, and filling in all the columns for all contracts is a huge waste of time.
An ontology about contracts might have classes like “Business contract” or “Tenancy contract,” which each have properties of their own. Think of it more like a tree diagram than a rigid table. Adding a new property is as easy as adding a branch in the appropriate place. You might want to add “Student tenancy agreement” as a branch under “Tenancy contract.” It doesn’t make sense, though, to add this as a whole column to all contracts because something like a business NDA has nothing to do with a student tenancy agreement.
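The tree idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class names follow the example above, while the properties monthly_rent and university and the properties_of helper are hypothetical additions for illustration.

```python
# Hypothetical contract ontology: each class names its parent and its own properties.
CLASSES = {
    "Contract": {"parent": None, "properties": ["contract_owner", "confidentiality"]},
    "BusinessContract": {"parent": "Contract", "properties": ["coverage"]},
    "TenancyContract": {"parent": "Contract", "properties": ["monthly_rent"]},
}


def properties_of(class_name):
    """Collect properties from the class and all of its ancestors, root first."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = CLASSES[class_name]["parent"]
    props = []
    for name in reversed(chain):
        props.extend(CLASSES[name]["properties"])
    return props


# Adding a branch touches only the new class; existing classes are unchanged.
CLASSES["StudentTenancyAgreement"] = {
    "parent": "TenancyContract",
    "properties": ["university"],
}

print(properties_of("StudentTenancyAgreement"))
print(properties_of("BusinessContract"))
```

The student tenancy agreement inherits everything from the tenancy and general contract classes, while business contracts never see the university property.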
Ontologies are also extremely useful for machine learning. It’s tough even for a large language model to understand that Paris is a city and has certain properties; that you’re not in this city but might want to be; and that it should, therefore, propose some appropriate train lines to you. All this information is directly fed to the machine learning model if you’re using an ontology. This way, the model can focus its capabilities on proposing the best train lines and tourist venues to you.
Ontology Modeling and the Semantic Web
Ontologies are one of the building blocks of the semantic web, a vision of the web in which data is linked by meaning and can be interpreted by machines, rather than being a scattered mess of URLs pointing to one another. For example, if you search for Paris, you won’t just get a list of links to pages that mention the word Paris a lot, but you’ll get pertinent information about the city, its inhabitants and ways to go there.
With the semantic web, distributed and heterogeneous databases can interact with one another because they speak a common language. Distributed means that the databases live on many different servers. Heterogeneous means that they might vary in terms of their architecture, data formats and so on. What these databases have in common is that they all know what a person or a city is, for example. If you query for it, a database of the largest cities in the world can be matched with a database of the richest people in the world to reveal, in the blink of an eye, which major city has the most rich people.
Ontologies and the semantic web support interoperability, cross-database search and smooth knowledge management. Interoperability means that the databases are able to work with one another. Cross-database search means that you can search for results in several databases at once and draw logical conclusions from them. Finally, smooth knowledge management means that information is stored and used in ways that are straightforward and user-friendly.
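A small Python sketch can show why a shared concept matters when two sources use different field names. All of the data here is made up (the cities, people and inhabitant counts are placeholders), and the CITY_FIELD mapping is a hypothetical stand-in for a shared vocabulary; real semantic web systems use standards such as RDF and SPARQL rather than plain dictionaries.

```python
from collections import Counter

# Hypothetical source A: largest cities, with its own field names (placeholder data).
cities_db = [
    {"city_name": "Alphaville", "inhabitants": 9_000_000},
    {"city_name": "Betatown", "inhabitants": 12_000_000},
]

# Hypothetical source B: richest people, with different field names (placeholder data).
people_db = [
    {"full_name": "Person One", "residence": "Betatown"},
    {"full_name": "Person Two", "residence": "Alphaville"},
    {"full_name": "Person Three", "residence": "Betatown"},
    {"full_name": "Person Four", "residence": "Gammaburg"},
]

# Shared vocabulary: which field in each source holds the concept "City".
CITY_FIELD = {"cities_db": "city_name", "people_db": "residence"}


def rich_people_per_city(cities, people):
    """Count rich people per city, keeping only cities known to the city source."""
    known = {row[CITY_FIELD["cities_db"]] for row in cities}
    people_field = CITY_FIELD["people_db"]
    return Counter(row[people_field] for row in people if row[people_field] in known)


counts = rich_people_per_city(cities_db, people_db)
top_city, top_count = counts.most_common(1)[0]
print(top_city, top_count)
```

Neither source had to change its own format. The only thing they needed to agree on was which of their fields refers to a city.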
Real-Life Applications of Ontology
We’ve talked about web search a bit, but the scope of ontology goes much further than that.
Testing Drug Development
In the pharmaceutical industry, AstraZeneca has used ontology to test its early hypotheses. It built a large data set following ontological principles (i.e., different things like proteins, genes, and diseases exist, and they have certain relationships to one another). This data set was accompanied by a user interface so that researchers at AstraZeneca could explore all things and their relationships before starting any drug development.
Organizing Health Data
In another use case, the startup Edamam organized health records to create a comprehensive platform containing food data, ranging from health publications to recipes shared through sources like the New York Times. Edamam partnered with Ontotext to normalize, mine and rearrange the data accordingly, developing an online hub where users could go to make informed decisions about the food they eat.
Tracking Financial Criminal Activity
The Basel Institute on Governance collaborated with Ontotext to detect financial crimes by sifting through data. Employing tools like Ontotext’s graph database and semantic annotation, the Institute analyzed news sources, documents and financial risk sources to understand how accounts and transfers are related to different companies and individuals. This allowed the organization to trace stolen assets on an international scale.
Sorting Products on E-Commerce Sites
E-commerce companies can also apply ontologies to design a personalized search engine experience on their sites. For example, an online shoe store might think about what matters most to its customers and organize its search around relevant relationships. Main categories can focus on the type of shoe before being broken down based on characteristics like shoe size, brand and style. This results in a more intuitive site and allows users to quickly find the items they’re looking for, tailoring the search to their needs.
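The shoe store example might look something like the following Python sketch, which combines a small category tree with filters on characteristics. The category names, the brands BrandA and BrandB, the products and the search helper are all hypothetical and only meant to illustrate the structure.

```python
from dataclasses import dataclass

# Hypothetical category tree for shoe types: child -> parent.
SHOE_TYPES = {
    "Shoe": None,
    "Sneaker": "Shoe",
    "Boot": "Shoe",
    "RunningShoe": "Sneaker",
}


@dataclass(frozen=True)
class Product:
    name: str
    shoe_type: str
    brand: str
    size: int


def is_a(shoe_type, ancestor):
    """True if `shoe_type` equals `ancestor` or sits below it in the tree."""
    while shoe_type is not None:
        if shoe_type == ancestor:
            return True
        shoe_type = SHOE_TYPES[shoe_type]
    return False


def search(catalog, shoe_type, **facets):
    """Return product names under `shoe_type` that match every given facet."""
    return [
        p.name
        for p in catalog
        if is_a(p.shoe_type, shoe_type)
        and all(getattr(p, key) == value for key, value in facets.items())
    ]


catalog = [
    Product("Trail Runner", "RunningShoe", "BrandA", 42),
    Product("City Sneaker", "Sneaker", "BrandB", 42),
    Product("Winter Boot", "Boot", "BrandA", 42),
    Product("Court Sneaker", "Sneaker", "BrandA", 40),
]

print(search(catalog, "Sneaker", size=42))
print(search(catalog, "Shoe", brand="BrandA"))
```

A search for sneakers also finds running shoes, because the tree records that a running shoe is a kind of sneaker. The store never has to tag each product with every category it belongs to.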
Planning Marketing Campaigns
SAP’s knowledge graph tool can organize a company’s data and map out the various relationships between that data. Marketers can then leverage these connections to inform their marketing strategies. For example, they could gather insights on the buying habits of customers of a single demographic group living in a particular city. Viewing these relationships enables marketers to sharpen their marketing campaigns for a specific target audience.
Importance of Ontology in Today’s Business Environment
Ontology is sometimes marketed as the next big thing in data science. In reality, it’s an age-old discipline that’s already being deployed in data wherever you look. The question you’re asking now should no longer be, “What is ontology and why do I need it?” but rather, “Why does my company not work with ontology yet?”
You don’t have to pick up a philosophy book to answer that question. But you might want to take a critical look at how your company organizes and links its data today, and reflect on how ontology might improve its current business processes.
Don’t jump on every buzzword and every fad in tech — remember NFTs? — but do respect age-old principles when they revolutionize the way we do tech.
Frequently Asked Questions
What is ontology in simple terms?
Ontology is the study of what is or what exists. In data science, data ontology refers to a system of organizing data based on different characteristics of entities and the relationships that link entities to each other.
What is an example of ontology?
A modern-day search engine is a good example of data ontology at work. When you look up the name of a city, the search engine understands what a city is and its potential relationship to the user. The search engine then compiles relevant information about the city like its population, places to visit and routes to get there. This way, it doesn’t pull up pages that simply mention the name of the city many times.