Bioinformatics is a field that uses computers to store, process and analyze DNA and other biological data. This information is used to help identify the causes of disease, develop new medicines and deepen our understanding of the human body, other organisms and the world around us.
What Is Bioinformatics?
Bioinformatics is an interdisciplinary field combining biology, statistics and computer science. It relies on high-performance computers and complex algorithms to analyze DNA, RNA and protein sequences and identify meaningful patterns. Bioinformatics has many use cases, but it’s particularly effective in understanding, diagnosing and treating medical conditions.
Bioinformatics is often likened to the Rosetta stone, a tablet that helped scholars decode Egyptian hieroglyphics using ancient Greek. Similarly, bioinformatics gives us the ability to translate the hidden language embedded in the molecules of every living organism.
Bioinformatics sits at the intersection of biology, statistics, chemistry and computer science. It uses high-performance computers and algorithms to analyze life sciences data, such as DNA sequences and protein structures, obtained through genomic sequencing. This computational analysis is at the heart of innovation in the biotech industry.
The field of bioinformatics has been around for decades, but it really started to accelerate with the development of the Human Genome Project, an international research project conducted from 1990 to 2003 that mapped all of the genes in the human body. Today, bioinformaticians can search these vast troves of genetic data to find similarities and differences in DNA, RNA or protein sequences, which can be used to identify genetic mutations, understand how genes are expressed into proteins and determine how those proteins function in cellular processes. By comparing the genetic makeup of people with a specific type of cancer against healthy individuals, for example, researchers can identify biological indicators (or biomarkers) that correlate with that cancer. They can then further study that correlation and develop more effective treatments targeting those genes or proteins.
“Once we have the sequencing information — like protein or DNA — we as bioinformaticians try to analyze them and find the patterns,” Jung Lee, an associate professor of chemical and biomolecular engineering at Milwaukee School of Engineering, told Built In. “We are trying to correlate this hidden information with known medical conditions.”
In addition to the human genome, researchers have sequenced the genomes of mice and many other organisms, giving them a better understanding of how those organisms have evolved over time. By comparing the genetic data of humans and other organisms, we can also learn which genes contribute to our commonalities, and how differing genes explain the characteristics that make us human.
Brief History of Bioinformatics
The history of bioinformatics can be traced back to the early 1960s, when Margaret Dayhoff and Robert S. Ledley developed the first computer program to compare protein sequences. In 1965, Dayhoff and Richard Eck created the Atlas of Protein Sequence and Structure, the first biological sequence database.
The term “bioinformatics” wasn’t coined until 1970, when researchers Paulien Hogeweg and Ben Hesper set out to “study the informatic processes in biotic systems.” In 1977, Frederick Sanger developed “Sanger sequencing,” a technique for determining the order of nucleotides in a DNA strand that was later used to obtain the first complete DNA genome. In the 1980s, Applied Biosystems introduced sequencing machines that automated Sanger sequencing.
In 1982, the National Institutes of Health established GenBank, a public database of nucleic acid sequences. As of 2025, the database has grown to more than 34 trillion base pairs from more than 4.7 billion nucleotide sequences for 581,000 formally described species.
The most important event in the history of bioinformatics was the Human Genome Project, which began in 1990. The international effort identified, mapped and sequenced 92 percent of the human genome over a 13-year period. With the advent of new sequencing technologies, researchers were able to reach 100 percent completion in 2022.
In the early 1990s, sequencing databases were published online, allowing researchers to search and compare nucleic acid sequences using the Basic Local Alignment Search Tool (BLAST).
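To give a sense of what searching and comparing sequences involves, here is a minimal Python sketch of local alignment scoring using the Smith-Waterman dynamic programming method. BLAST itself relies on faster heuristics rather than this exhaustive approach, and the function name and scoring values (match, mismatch, gap) are illustrative assumptions, not BLAST's defaults.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score between a and b.

    The scoring values are arbitrary choices for illustration.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two toy sequences that share the stretch "ACGT".
print(local_alignment_score("TTACGTAA", "GGACGTCC"))
```

A higher score means the two sequences share a longer or cleaner matching region, which is the kind of signal a database search ranks its hits by.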
Sequencing technology has evolved over the years, producing genetic data faster, more cheaply and more accurately for bioinformaticians to process and analyze. Today’s machines can process hundreds of thousands of DNA fragments at the same time through a process known as “next-generation sequencing,” typically piecing together DNA fragments of roughly 200 to 400 nucleotide base pairs. And this technology is becoming more precise through long-read sequencing techniques, which can read much longer stretches of DNA (at times more than 10,000 base pairs), reducing the likelihood of valuable genetic information ending up on the cutting room floor.
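The idea of piecing fragments together can be illustrated with a toy Python sketch that joins two reads where the end of one overlaps the start of the other. This is a simplification under stated assumptions: the function names and the minimum overlap of three bases are hypothetical, and real assembly software handles sequencing errors, repeats and millions of reads with far more sophisticated methods.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads at their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None


# Two made-up reads that share the four bases "GTAC".
print(merge_reads("ATGCGTAC", "GTACCTGA"))
```

Longer reads make this kind of puzzle easier, because each fragment shares more sequence with its neighbors.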
Common Use Cases for Bioinformatics
Disease Research
By studying the DNA and protein sequences of people with medical conditions, bioinformaticians can identify gene mutations that may correlate with a disease. Once doctors are aware of the biomarkers associated with known medical conditions, they can assess a patient’s risk of developing a disease and diagnose it earlier in its progression. Biomarkers can also indicate whether a bacterium is resistant to antibiotics, helping doctors make more effective treatment decisions.
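As a rough illustration of how a mutation might be spotted, the Python sketch below compares a sample sequence against a reference and lists the positions where single bases differ. It assumes the two sequences are already aligned and the same length, and the function name and toy sequences are made up for this example; real variant analysis also has to account for insertions, deletions and sequencing errors.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based. Both sequences must already be aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Toy sequences that differ at the fourth base.
print(find_substitutions("ATGGCC", "ATGACC"))
```

Whether a difference found this way actually matters for a disease is a separate question that requires comparing many patients and healthy individuals.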
Drug Discovery
By identifying the genetic processes that cause certain diseases, researchers can develop drugs that target the proteins involved. In addition to identifying biomarkers, bioinformaticians can conduct experiments “in silico,” or on a computer, to simulate how a drug might interact with a protein. This helps predict the drug’s efficacy and potential side effects. According to Lee, this approach can streamline the long and costly process of drug discovery, which can take upwards of 20 years to complete. While bioinformatics will never replace lab work, it enables lab researchers to be more strategic in their experiments.
“Now, instead of 20 years, it may shorten to 10 years or less,” Lee said. “That is a big contribution.”
Management of Infectious Disease Outbreaks
Bioinformatics is used to identify viruses and bacteria, track how they mutate over time and detect whether the pathogen is resistant to antibiotics. This was especially helpful in tracking the spread and mutation of the Covid-19 virus, enabling public health officials to identify the structure of the virus, trace its origin and develop therapies and vaccines. Bioinformatics allowed researchers to move much faster than traditional lab experiments alone. The real-time reporting of variants was particularly helpful in controlling the spread of the virus and saving lives.
Evolutionary Biology Research
Bioinformatics is also helpful in answering questions about how humans and other species evolved, as well as how different species may be related to each other. Through a process called phylogenetics, bioinformaticians aim to trace the lineage of various species back in time to learn about their origins and evolution. This analysis is portrayed on a phylogenetic tree, which is similar to a family genealogy tree.
“We’re trying to put every single organism living on the planet into that tree,” Lee said. “That means we basically know how every individual organism is interrelated to other organisms.”
NASA’s Tree of Life project, for example, aims to understand the origins of life on Earth and the possibility of life beyond Earth using environmental data from outer space.
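One common starting point for building a phylogenetic tree is to measure how different each pair of sequences is and group the most similar ones first. The Python sketch below shows only that first step, under simplifying assumptions: the sequences are already aligned to equal length, the distance is just the fraction of positions that differ, and the names and sequences are invented rather than taken from real organisms.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(sequences: dict[str, str]) -> tuple[str, str]:
    """Return the two names whose sequences have the smallest distance."""
    return min(
        combinations(sequences, 2),
        key=lambda pair: p_distance(sequences[pair[0]], sequences[pair[1]]),
    )


# Hypothetical aligned sequences for three made-up species.
toy = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "TCGAACTA",
}
print(closest_pair(toy))
```

Tree-building methods repeat a step like this, merging the closest groups again and again until every sequence sits on a single tree.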
Benefits of Bioinformatics
Faster, More Powerful Data Analysis
Sequencing technologies have come a long way since the manual process of stitching pieces of paper together. The Human Genome Project, which used specialized computers and software to augment its Sanger sequencing, took 13 years and roughly $2.7 billion to sequence a human genome, with the vast majority of it sequenced in a two-year span. While Sanger sequencing is still widely used, researchers can now sequence a genome in one day for $600 using next-generation sequencing. Long-read sequencing, a type of next-generation sequencing, has sequenced a person’s genome in just five hours.
Next-generation sequencing brings more throughput to genomic sequencing, which creates exponentially more data to process and analyze. Bioinformatics is necessary to process all of this data in a fast, efficient manner. Without bioinformatics, Lee said, researchers never would have been able to discover biomarkers, which play an invaluable role in predicting, diagnosing and treating diseases.
More Collaboration Through Public Databases
Bioinformatics is a field centered around public databases that allow researchers to share their findings with one another. By sharing information in databases like GenBank, the Protein Data Bank and The Cancer Genome Atlas, researchers are able to easily access information and build on each other’s findings. This collaborative spirit has accelerated scientific innovation, leading to research that will advance our collective understanding of the human body, other organisms and our environment overall.
More Precise Medical Diagnostics and Treatment
As genomic sequencing becomes faster and cheaper, we could soon reach a point where a patient’s genetic data is used to diagnose and treat their condition. This practice, known as personalized medicine, is already being used to inform the treatment of patients with rare diseases.
But before we start fine-tuning therapies to an individual’s unique DNA, we need to generate human genome reference sequences that reflect the population’s diversity, Lee said. Seventy percent of the human genome reference sequence came from one person, he explained, which is not a representative sample of the broad spectrum of genetic differences in the world’s population.
And for personalized medicine to scale to the individual level, Lee said we need to address limitations, such as data quality and the lack of trained bioinformaticians to interpret the data.
“Personalized medicine is a really good idea; it may be the ultimate goal of bioinformatics,” Lee said. “But in reality, in terms of implementation of bioinformatics, we have some things to overcome.”
Challenges of Bioinformatics
Data Formatting Challenges
While bioinformaticians have a large amount of data at their fingertips, Lee said there are data quality challenges that hinder the analysis of that data. Most of the data is stored as a text file, he said, and it’s time-consuming for computers to conduct text-based analysis. The data can also be redundant, inconsistent or out of sync with standard naming conventions.
“Non-standard data formats make the progress really slow because we have to clean it up,” Lee said. “It’s a really limiting factor.”
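To show what this cleanup can look like in practice, here is a small Python sketch that reads sequence records from FASTA, a widely used plain-text format in which each record starts with a ">" header line. FASTA is not named in the interview and is used here only as an assumed example; the function name and the normalization rules (trimming whitespace, uppercasing bases, skipping blank lines) are likewise illustrative choices.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {record_name: sequence} mapping.

    Sequence lines are stripped, uppercased and joined; blank lines are ignored.
    """
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {record: "".join(parts) for record, parts in records.items()}


# A made-up file with inconsistent casing and a stray blank line.
messy = ">seq1 toy record\nacgt\nACGT\n\n>seq2\nTTaa\n"
print(parse_fasta(messy))
```

Even this small example has to make decisions about casing, spacing and naming, which hints at why inconsistent files slow down analysis at scale.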
Privacy Concerns
Because it relies on large public databases with personal health information, bioinformatics raises some privacy concerns. This data is almost always anonymized, so it’s not subject to data privacy laws like the Health Insurance Portability and Accountability Act (HIPAA) and Europe’s General Data Protection Regulation (GDPR). Anonymized data doesn’t always remain anonymous, though, as it’s possible to re-identify people by cross-referencing data from genomic databases against other public databases.
Shortage of Interdisciplinary Talent
Bioinformaticians need to have the computer science and data analytics skills to manage and process data, but they also need to know how to make sense of that data. Not many people are trained in both, which limits the field’s potential.
Lee, who teaches classes on bioinformatic systems, said his computer science and biology students have trouble communicating with each other because they aren’t familiar with the fundamentals of each other’s disciplines. Even when looking at the work of early bioinformaticians trained in computer science, Lee said the data is organized in a way that doesn’t always make sense from a biology perspective.
But now that artificial intelligence has made programming easier, Lee said he hopes more people will specialize in bioinformatics.
“We need to train people specifically in bioinformatics,” Lee said. “Then they can analyze the information, pinpoint differences and explain how they are related to the medical conditions of each individual person. That will move us toward personalized care. That’s my hope.”
Bioinformatics and Artificial Intelligence
Bioinformatics is undergoing a major transformation thanks to artificial intelligence. The explosive growth of biological data — fueled by high-throughput sequencing and other technological advancements — is leaving researchers to parse through unprecedented amounts of information. Neural networks, random forests and other machine learning algorithms are now essential tools in solving this challenge, helping researchers to do their work faster and more accurately — whether it’s decoding genetic information to accelerate drug development or improving disease diagnostics.
In short: AI isn’t just enhancing bioinformatics; it’s reshaping the entire field. By integrating automation tools and algorithms into their workflows, bioinformaticians can accelerate their research and sharpen their data analyses, helping them to develop a deeper understanding of biological processes as a whole.
Still, the introduction of this technology raises some important challenges, particularly as it relates to data privacy, algorithmic bias and the potential for error. Ensuring AI is applied responsibly and ethically will be key to realizing its full potential in this space.
Frequently Asked Questions
What is bioinformatics used for?
Bioinformatics is used to identify diseases, develop drugs and tailor medical treatments to individuals. It’s also used in non-medical fields, like developing drought-resistant crops and furthering our understanding of evolutionary biology.
Is bioinformatics the same as computational biology?
Bioinformatics and computational biology are closely related fields at the intersection of biology, mathematics and computer science — but they are not the same thing. Bioinformatics analyzes biological datasets like DNA sequences, whereas computational biology is more theoretical, analyzing biological systems and processes, such as protein folding and neuron signaling.
What is the difference between genomics and bioinformatics?
Genomics is the study of genes, how they interact with each other and how they influence biological processes. Bioinformatics is the use of high-powered computers, algorithms and software tools to analyze genomic data.
