A single person’s genome consists of roughly three billion nucleotides. To find patterns in that data that show evidence of evolutionary pressures, more geneticists are turning to machine learning, specifically deep-learning algorithms, Nature reports.
“Machine learning is automating the ability to make evolutionary inferences,” Andrew Kern, a population geneticist at the University of Oregon in Eugene, told Nature. “There is no question that it is moving things forward.”
For example, ‘DeepSweep,’ developed by researchers at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, is a deep-learning tool that has helped identify 20,000 single nucleotides as simple mutations worth a closer look. Researchers recently reported these findings at the annual meeting of the American Society of Human Genetics in San Diego, California, according to Nature.
Mathematical models have aided genetic research since the 1970s, but deep learning, with its strength in analyzing massive amounts of data, is helping scientists spot patterns in the genome that might otherwise have flown under the radar.
Because these algorithms need labeled examples to learn how to classify, and geneticists do not yet know which parts of the genome bear signs of natural selection, researchers train them on simulated data as a workaround.
“We don’t have ground truth data, so the worry is that we may not be simulating properly,” Sohini Ramachandran, a population geneticist at Brown University in Providence, Rhode Island, told Nature.
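To make the workaround concrete, here is a minimal sketch of the train-on-simulations idea. It is not DeepSweep or any method described in the Nature report. The simulator is a deliberately crude toy, a one-feature threshold rule stands in for a deep network, and every name in it (`simulate_window`, `central_diversity`, `fit_threshold`) is hypothetical.

```python
import random


def simulate_window(sweep, n_samples=20, n_sites=50, rng=random):
    """Toy simulator (an assumption, not a population-genetic model).

    Returns n_samples haplotypes, each a list of 0/1 alleles. When
    `sweep` is True, allele frequencies are scaled down toward the
    centre of the window, crudely mimicking the loss of diversity
    around a selected site.
    """
    centre = n_sites // 2
    freqs = []
    for site in range(n_sites):
        p = rng.uniform(0.1, 0.5)
        if sweep:
            p *= abs(site - centre) / centre
        freqs.append(p)
    return [[1 if rng.random() < p else 0 for p in freqs]
            for _ in range(n_samples)]


def central_diversity(window):
    """Mean heterozygosity 2p(1-p) over the central third of the sites."""
    n_sites = len(window[0])
    lo, hi = n_sites // 3, 2 * n_sites // 3
    total = 0.0
    for s in range(lo, hi):
        p = sum(hap[s] for hap in window) / len(window)
        total += 2 * p * (1 - p)
    return total / (hi - lo)


def fit_threshold(features, labels):
    """Choose the cut-off that best separates the labelled simulations.

    The rule is: call a window a sweep if its feature is below the cut-off.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(features):
        acc = sum((f < t) == y for f, y in zip(features, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


rng = random.Random(0)

# Labels exist only because the data were simulated.
train_labels = [True, False] * 200
train_features = [central_diversity(simulate_window(y, rng=rng))
                  for y in train_labels]
threshold = fit_threshold(train_features, train_labels)

# Held-out simulations give an accuracy estimate, but only relative
# to the simulator itself.
test_labels = [True, False] * 100
test_features = [central_diversity(simulate_window(y, rng=rng))
                 for y in test_labels]
accuracy = sum((f < threshold) == y
               for f, y in zip(test_features, test_labels)) / len(test_labels)
print(f"held-out accuracy on simulated data: {accuracy:.2f}")
```

The accuracy printed here only measures how well the classifier recovers labels the simulator assigned. If real genomes differ from the simulated ones, that number says little about real data, which is the concern Ramachandran raises.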
Deep-learning algorithms are notoriously inscrutable, making it difficult to trace a result back to the reasoning behind it. “If the simulation is wrong, it’s not clear what the response means,” explained Philipp Messer, a population geneticist at Cornell University in Ithaca, New York, to Nature.
Meanwhile, deep-learning algorithms are also being used to find signs of adaptation in genomes, including variations flagged near genes thought to influence metabolism in the Khomani San ethnic group of southern Africa.
“These are incredibly powerful methods for looking for the signals of natural selection,” said Pardis Sabeti, a computational geneticist at the Broad Institute, to Nature. “Some people didn’t think you could pinpoint variants when I started. Some thought it was impossible.”