How Scientists Are Using the Crowd to Solve Problems in Biotech
As the coronavirus pandemic began to force people to stay home, many of us asked the same question: What can we do to help defeat this virus? Hundreds of thousands of people answered that question by joining a crowd-sourced science effort to find drugs against the new coronavirus, part of a project called Folding@home. Folding@home, founded by Stanford University engineering professor Vijay Pande and directed in part by my Washington University colleague Greg Bowman, has used crowdsourcing to build the world's fastest supercomputer. The project uses the CPUs of volunteers' computers all over the world to run resource-intensive simulations of biomolecules, including the protein the coronavirus uses to force its way into human cells.
Folding@home splits up big computational problems into small packages and parcels them out to the ordinary desktop computers of its volunteer network. Through distributed computing, the project's scientists can efficiently run physical simulations of biomolecules that would otherwise take years to finish. This spring, as a result of the pandemic, Folding@home gained thousands of new users, which let the project break the exaFLOP barrier, something not yet achieved by the world's fastest conventional supercomputers.
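The divide-and-distribute pattern behind this is simple to sketch. The toy Python below is not Folding@home's actual protocol; the work-unit splitting, the stand-in "volunteer" computation, and the function names are all invented for illustration. The key property it demonstrates is that each unit can be computed independently, so the coordinator only has to merge partial results:

```python
def make_work_units(task, n_units):
    """Split a big list of inputs into independent work units (round-robin)."""
    return [task[i::n_units] for i in range(n_units)]

def volunteer_compute(unit):
    """Stand-in for one volunteer's CPU: here, a toy sum-of-squares 'simulation'."""
    return sum(x * x for x in unit)

def coordinate(task, n_units):
    """Distribute units, collect partial results, and merge them.

    In a real system each unit would be sent over the network to a
    different machine; here we just loop, since the units are independent.
    """
    units = make_work_units(task, n_units)
    partials = [volunteer_compute(u) for u in units]
    return sum(partials)
```

Because no unit depends on another, the coordinator can hand them out in any order, retry units from volunteers who go offline, and merge results as they trickle back in.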
Folding@home is just one example of crowdsourced science, an approach that’s becoming increasingly popular in the life science and biotech space. In crowdsourced science, volunteers often share not just their CPUs, but also their brains, which are still much better at solving certain kinds of problems than today’s most sophisticated machine learning and artificial intelligence. Humans’ capacity to naturally pick out patterns is powerful, and it is why Google uses CAPTCHAs — we’re much better at picking out fire hydrants and crosswalks than spambots.
Many crowdsourced biotech problems rely on our innate pattern recognition ability. Humans are much better than computers at picking out certain patterns in genome data, at least when the data is represented in a way that seems natural to us. Human geneticists are interested in cataloging genome rearrangements, called structural variants or SVs — portions of the DNA sequence that are copied, flipped, or deleted, and which in some cases cause genetic diseases. Because it takes several kinds of data to reliably label an SV, human expert curators have performed better than the best machine learning algorithms. But there are many potential SVs and not enough expert curators to sort through them all.
A team led out of the U.S. National Institute of Standards and Technology is trying to solve the SV problem by crowdsourcing it. They developed a web platform, called SVCurator, which visualizes the data in a way that’s easy for humans to parse. SVCurator shows a volunteer an example and then asks some specific multiple-choice questions about whether the example looks like an SV. The research team tested their platform with 136 volunteers, and they obtained reliable answers for almost 1,000 SVs.
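One way to turn many volunteers' multiple-choice answers into a reliable label is simple consensus voting. The sketch below is not SVCurator's actual pipeline; the vote format, the agreement threshold, and the function names are assumptions made for illustration. It keeps only candidates where volunteers agree strongly enough:

```python
from collections import Counter

def majority_label(votes):
    """Return the most common answer among volunteer votes and its support fraction."""
    label, n = Counter(votes).most_common(1)[0]
    return label, n / len(votes)

def curate(candidates, min_agreement=0.7):
    """Keep only candidate SVs whose volunteer votes reach a consensus threshold.

    `candidates` maps a candidate SV id to the list of answers volunteers gave;
    low-agreement candidates are left for expert review rather than auto-labeled.
    """
    curated = {}
    for sv_id, votes in candidates.items():
        label, agreement = majority_label(votes)
        if agreement >= min_agreement:
            curated[sv_id] = label
    return curated
```

A threshold like this is also where validation comes in: tuning it against a set of expert-curated examples tells you how much agreement is enough to trust the crowd's answer.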
Something else our brains are naturally good at is solving puzzles, especially puzzles with many parts and relatively few constraints. And thus one effective strategy for crowdsourcing science is to turn a problem into a puzzle or game. Stanford biophysicist Rhiju Das has done this for the problem of RNA design. He’s interested in the problem because RNA molecules have a wide range of biotech applications, from CRISPR gene editing to diagnostic devices and even therapeutics. RNAs are naturally linear molecules, and the challenge is to design their sequence so that it folds into a specific three-dimensional shape that will do the job in the desired application.
Das and his lab turned the problem of RNA design into a puzzle-solving game called EteRNA that runs on your smartphone. Players are given 3D RNA shapes that they need to achieve. They play the game by making a series of moves that change the RNA sequence to make it fold up, according to the rules of the game (which are a stand-in for the laws of physics). As players get better, they’re given more challenging puzzles. To reward the best players, Das tests their RNA designs in the lab, to verify that the real, physical molecule does achieve the right shape.
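The flavor of a player's moves can be sketched as a search over sequences, scored against a target structure. The toy below is emphatically not EteRNA's physics: it uses dot-bracket notation for the target shape and counts a base pair as satisfied if the two positions hold complementary bases (A–U or G–C), then greedily keeps single-letter mutations that don't lower the score. All function names and the scoring rule are invented for illustration:

```python
import random

# Watson-Crick pairs only, for this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def paired_positions(structure):
    """Parse dot-bracket notation, e.g. '((..))', into (i, j) pair indices."""
    stack, pairs = [], []
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs

def score(seq, pairs):
    """Fraction of required pairs satisfied by complementary bases."""
    if not pairs:
        return 1.0
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs) / len(pairs)

def design(structure, steps=2000, seed=0):
    """Greedy single-mutation search: a crude stand-in for a player's moves."""
    rng = random.Random(seed)
    pairs = paired_positions(structure)
    seq = ["A"] * len(structure)
    best = score(seq, pairs)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice("ACGU")  # try one move
        new = score(seq, pairs)
        if new >= best:
            best = new               # keep the move
        else:
            seq[i] = old             # undo it
    return "".join(seq), best
```

Real RNA folding is far harder than this pair-matching toy, which is exactly why human intuition beats naive search: players learn non-obvious move sequences that greedy algorithms like this one miss.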
Remarkably, EteRNA’s human players have come up with entirely new sets of puzzle-solving moves that were missed when Das first tried to solve the RNA design problem with machine learning. This demonstrates the incredible power of human intuition. Das has now flipped the problem and used the human intuition of his thousands of players to build a better machine learning algorithm. The algorithm, called EteRNABrain, was trained on a database of nearly two million human player moves, and it performs better than competing RNA design algorithms. Das’ strategy is similar to that used by the Google DeepMind team that built AlphaGo, the first AI to defeat a professional human player at the game of Go. EteRNABrain shows that crowd-sourcing and machine learning are complementary — rather than alternative — ways to handle big data problems.
Crowdsourcing and Open Innovation in the Biotech Industry
These examples show how effective crowdsourcing can be in biotech and data analysis, and they offer a few lessons for how to crowdsource effectively. Big problems should be broken down into clear, well-defined tasks that non-expert volunteers can complete. Online platforms with natural, engaging visualizations can help participants bring their intuition to bear on problem-solving tasks. Validating the results is also essential, whether by comparing volunteers' answers against a set of expert-curated examples (the approach used with SVCurator) or by testing designs in the lab (as with EteRNA). Resources like the Open Science Framework help take some of the overhead out of crowdsourcing by offering tools to freely share and discover data.
But crowdsourced science relies on some degree of openness. Companies need to worry about intellectual property and contracts — can crowdsourcing work for businesses as well as it works for academic labs?
One solution is to crowdsource problems to your customers, which is the model used by the direct-to-consumer genetics company 23andMe. To conduct genetic studies, the company draws on its customers, all of whom have had their genomes analyzed. Some of 23andMe’s studies rely on customers’ self-reported health histories and dietary habits. But others take a more innovative approach, building patient cohorts that are rigorously examined by physicians. This year, the company, together with the University of Rochester, described a “virtual cohort” of patients to study the genetic underpinnings of Parkinson’s disease. The virtual cohort idea solves a key problem: Genetic variants linked with disease are often rare, which makes it hard to find and recruit people who carry them. 23andMe, however, can search its database to find customers who have those rare variants, and invite them to participate in the study.
In this case, the company reached out to customers who had consented in advance to be contacted for research, and who agreed to learn about their carrier status for the LRRK2 mutation. 23andMe found several hundred people across 33 states who carried the rare G2019S LRRK2 variant, some of whom reported having Parkinson’s. All carriers of the variant were invited to undergo a rigorous physical exam for Parkinson’s symptoms and to participate in future studies of treatments that could prevent or reverse the symptoms of Parkinson’s.
There are other ways for companies to open their innovation process in a way that’s compatible with their need to protect intellectual property and establish clear contracts. A pair of legal scholars recently proposed a classification of different levels of crowdsourcing or, more broadly, so-called “open innovation.” At one level of open innovation, companies disclose a problem they have and invite outside teams to solve it. Intermediary companies have sprung up to manage this process: InnoCentive helps companies pose clear, well-defined challenges to a network of capable solvers who compete for a cash prize. Because InnoCentive negotiates all of the contracts in advance, the host company can start using a solution as soon as it arrives, without worrying about long contract negotiations.
At the other end of the open innovation spectrum are companies that open their tools, resources, and even their science to others. The pharmaceutical company AstraZeneca invites researchers to use what it calls a “preclinical toolbox,” which consists of huge libraries of molecular compounds. Labs with expertise in the biology of particular diseases are invited to use the toolbox in basic studies, while AstraZeneca has the chance to use the results in drug development. LEO Pharma, based in Denmark, takes this one step further, letting those who use its technologies keep their intellectual property rights.
And then there are organizations that open everything, and whose main goal is not commercial development. Open Source Pharma is an international organization that calls itself “Linux for drugs.” The group is trying to establish a completely new model of drug development, one that doesn’t involve producing expensive, patent-protected drugs. Open Source Malaria draws on the expertise of anyone who can contribute to fighting one of the world’s deadliest diseases.
Crowdsourcing may seem like an odd way to succeed in industries that rely heavily on the value of intellectual property. Many companies, however, are making use of the growing suite of tools and platforms out there that make crowdsourcing feasible. And it’s clear that — while machine learning and AI are better than ever — human intuition is still one of the most powerful data-processing tools around, for those who know how to harness it.