Machine Learning’s Future Depends on Privacy. These 11 Startups Are Working on It.
Call it the clinician’s paradox.
Privacy regulations do the good work of keeping patients’ personal medical data secure, but they also hamstring the sharing of data that might otherwise help drive important clinical or genomic research.
A similar dynamic plays out in finance, according to Alon Kaufman, CEO and co-founder of Duality Technologies. Authorities recommend that banks and financial institutions work together to ward off financial crime, but data privacy laws also restrict sharing in this context, “making such collaborative investigations impossible,” he told Built In via email.
Few of us are arguing for less data privacy, to be sure. But the need to securely share and analyze sensitive data has only become greater amid the pandemic, Kaufman added.
VCs are taking notice too. At a recent virtual conference held by privacy-focused AI community OpenMined, panelists noted that venture investment in privacy and security keeps growing year-over-year, with 2019’s total hitting $10 billion.
Factors driving investment include fears of GDPR or CCPA sanctions, concerns over possible data breaches and subsequent brand corrosion, and an increasing awareness of data governance at the C-suite level, said Salesforce Ventures investor Jackson Cummings in the presentation.
Kaufman’s Duality Technologies focuses on a technique called homomorphic encryption, sometimes called the “holy grail” of cryptography. Other notable methods include differential privacy, synthetic data and federated learning. Given that privacy tech is having its close-up, we decided to survey the companies doing notable work within each category.
Here’s how we’ve defined this most hallowed cryptographic technique in previous coverage: “In the simplest terms, homomorphic encryption (HE) allows computation to be performed on encrypted data, including in cloud environments, and produce an encrypted result, which can then be decrypted, with the end result being the same as if you did math on unencrypted data.”
Cryptographers have long grasped the power of fully homomorphic encryption, but its computational overhead was far too high for practical use. Researchers keep chipping away at that overhead, though, and while HE today only makes sense for certain narrow use cases, the potential for wider applicability down the road has investors watching closely.
Duality has made a name for itself in the healthcare and financial industries, demonstrating how HE can enable genomic research on encrypted genetic data (Duality researchers contributed to one such study) and support cooperation on financial crime investigations among different parties and across borders. In July, the Defense Advanced Research Projects Agency (DARPA) also contracted the company to explore how HE could be applied to machine-learning analysis of potential genetic susceptibilities to severe COVID-19 symptoms.
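To make the definition concrete, here is a toy Python sketch of the Paillier cryptosystem. Paillier is only additively homomorphic (a partially homomorphic scheme), not the fully homomorphic encryption discussed here, and the tiny primes make it trivially breakable; it only shows that multiplying two ciphertexts produces a ciphertext of the sum, which decrypts to the same answer as doing the math on plaintext. The function names are our own, not any vendor’s API.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation from two primes (tiny primes: insecure, demo only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n = p * q
    g = n + 1                      # common simple choice of generator
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n using fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    pub, priv = keygen(293, 433)
    total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
    print(decrypt(priv, total))  # 42, computed without decrypting either input
```

Whoever holds only the public key can compute `total`; only the private-key holder can read the result.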
Total funding: $20 million
Founded by a former National Security Agency researcher with expertise in encrypted search, Enveil offers an HE-based product suite dubbed ZeroReveal, which promises to let users search and run machine-learning analyses on data while it remains encrypted. The company has attracted investments from major finance and credit outfits (Mastercard, Capital One Growth Ventures) plus In-Q-Tel, the venture capital wing of the Central Intelligence Agency.
Total funding: $15 million
Put simply, differential privacy involves intentionally injecting noise into a data set or into the results of queries on it. That helps anonymize data, but in a manner that still allows for statistical analysis. It hinges on a parameter called the privacy budget, usually written as ε: the smaller the budget, the stronger the privacy guarantee and the more noise gets added, so more data is needed to keep results accurate. That’s why most of the early commercial deployments of differential privacy were done by the likes of Apple and Google, which have data at enormous scale. But as research matures, non-FAANG firms are commanding more and more attention too.
Bay Area-based LeapYear, founded in 2014, has deployed its platform, which uses differential privacy to allow machine-learning analysis without revealing personally identifiable information, in some 1,000 organizations in healthcare, finance and insurance, according to the company. The company counts Aaron Roth, co-author of The Ethical Algorithm and a leading expert on differential privacy, as an advisor.
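The privacy-budget trade-off can be sketched with the Laplace mechanism, a standard way to answer a counting query with differential privacy: because adding or removing one person changes a count by at most 1, the true count gets noise drawn from a Laplace distribution with scale 1/ε. This minimal Python illustration uses hypothetical data and function names; it is not how any company mentioned here implements differential privacy.

```python
import random
import statistics

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer "how many records match?" with epsilon-differential privacy.

    A count has sensitivity 1 (one person changes it by at most 1), so the
    Laplace mechanism adds noise with scale 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    ages = [rng.randint(18, 90) for _ in range(1_000)]  # hypothetical data
    over_60 = lambda age: age > 60
    exact = sum(1 for a in ages if over_60(a))
    for eps in (1.0, 0.1):
        answers = [private_count(ages, over_60, eps, rng) for _ in range(500)]
        err = statistics.fmean(abs(a - exact) for a in answers)
        print(f"epsilon={eps}: mean absolute error {err:.1f}")
```

Shrinking ε from 1.0 to 0.1 multiplies the typical error by about ten, which is why tight budgets call for large data sets.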
Total funding: $38.2 million
This well-funded U.K. company offers differentially private approaches to regulatory compliance, de-identification and, for financial institutions, anti-money laundering compliance. Investors include HSBC Venture Capital Coverage Group and Citigroup. Accel led a $40 million Series B round last year.
Total funding: $150.5 million
Headed up by the man who spearheaded Spotify’s Discover Weekly feature, this news recommendation app combines machine-learning algorithms with human-in-the-loop editorial curation to push back against echo-chamber media consumption. It also foregrounds privacy: the recommender system is fed by differentially private analysis of collective user behavior, and all data stays on user devices. Things have quieted at Canopy since CNN acquired it in April, but operations will reportedly relaunch after the pandemic subsides.
Total funding: $4.5 million
Synthetic data is basically what it sounds like: when real data is too scarce or too sensitive, build your own and mix it with some of the real stuff to train a model. Limited data access, privacy protections, a lack of quality data and the time and financial burden of data annotation can all make synthetic data an attractive component when building models.
There are a few different ways to produce synthetic data, but it often involves a generative adversarial network (GAN). That’s a pair of neural networks trained against each other: a generator produces synthetic samples while a discriminator tries to tell them apart from real data, and as the discriminator gets better at spotting fakes, the generator learns to produce more convincing ones. The techniques have rapidly improved in recent years across many forms of data (tabular, text and images), and advances seem to be coming by the week.
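Training a GAN is too involved for a short example, so the Python sketch below swaps in a much simpler generator to show the “build your own and mix it with the real stuff” workflow: it fits an independent Gaussian to each numeric column of a small real sample, draws synthetic rows from those Gaussians and mixes them with a few real rows. It ignores correlations between columns and offers no formal privacy guarantee on its own; the data and function names are hypothetical.

```python
import random
import statistics

def fit_columns(rows):
    """Fit an independent Gaussian (mean, stdev) to each numeric column."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*rows)]

def sample_synthetic(params, n_rows, rng):
    """Draw synthetic rows column by column from the fitted Gaussians."""
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n_rows)]

def build_training_set(real_rows, n_synthetic, n_real_kept, rng):
    """Mix many synthetic rows with a handful of real ones, then shuffle."""
    synthetic = sample_synthetic(fit_columns(real_rows), n_synthetic, rng)
    mixed = synthetic + rng.sample(real_rows, n_real_kept)
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "real" records: (age, systolic blood pressure)
    real = [(rng.gauss(55, 12), rng.gauss(130, 15)) for _ in range(30)]
    training = build_training_set(real, n_synthetic=500, n_real_kept=10, rng=rng)
    print(len(training), "rows; mean age", round(statistics.fmean(r[0] for r in training), 1))
```

A GAN replaces `fit_columns` and `sample_synthetic` with a learned generator that can capture far richer structure, but the mixing step stays the same.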
Synthetic data sometimes works hand-in-hand with differential privacy, which essentially describes Hazy’s approach. Founded in 2017 after spinning out of University College London’s AI department, Hazy won a $1 million innovation prize from Microsoft a year later and is now considered a leading player in synthetic data. Notable clients include Accenture, which has used Hazy-generated data to verify and train finance models.
Total funding: $6.8 million
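One simple way to pair synthetic data with differential privacy, shown here as a generic Python sketch and not as a description of Hazy’s product, is to add Laplace noise to a histogram of the real data and then sample synthetic records from the noisy histogram. Adding or removing one person changes a single bin’s count by 1, so the noisy histogram satisfies ε-differential privacy, and anything sampled from it inherits that guarantee because post-processing cannot weaken differential privacy. The category list must be fixed in advance rather than read off the data; names and data are hypothetical.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_synthetic(records, categories, epsilon, n_out, rng):
    """Sample n_out synthetic records from an epsilon-DP noisy histogram.

    `categories` must be a public, fixed list (not derived from `records`).
    """
    counts = Counter(records)
    # One person moves one count by 1, so Laplace noise with scale 1 / epsilon suffices
    noisy = [max(0.0, counts[c] + laplace_noise(1 / epsilon, rng)) for c in categories]
    if sum(noisy) == 0:
        noisy = [1.0] * len(categories)  # fall back to uniform if noise zeroed every bin
    # Sampling from the noisy histogram is post-processing, so it stays epsilon-DP
    return rng.choices(categories, weights=noisy, k=n_out)

if __name__ == "__main__":
    rng = random.Random(7)
    diagnoses = ["flu"] * 600 + ["covid"] * 300 + ["other"] * 100  # hypothetical
    fake = dp_synthetic(diagnoses, ["flu", "covid", "other"], epsilon=0.5, n_out=1_000, rng=rng)
    print(Counter(fake))
```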
This Vienna-based synthetic data startup has worked with noteworthy clients like Erste Group and Microsoft. The company’s blog also provides a newcomer-friendly overview of the challenges and advantages of synthetic data, including a recent five-part series on methods to ensure fairness in synthetic-data generation.
Total funding: $6.1 million
For companies trading in synthetic image data, the setup often pairs artificial intelligence with video game-like, procedurally generated virtual worlds built on platforms like Unreal Engine. That’s effectively how AI Reverie works: it builds environments that produce synthetic image data with annotations generated automatically and, if needed, optimizes the results with neural networks. Use cases range from smart stores to identifying rare planes.
Total funding: $10 million
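Production pipelines render 3D scenes in engines like Unreal Engine, but a toy 2D version in Python shows why procedurally generated images are “self-annotating”: the program that places an object knows exactly where it put it, so a pixel-perfect label comes for free. Everything here (the scene, the “box” class, the function names) is hypothetical and is not AI Reverie’s pipeline.

```python
import random

def render_scene(width, height, rng):
    """Procedurally render one tiny grayscale 'image' plus its automatic label."""
    # Background: random low-intensity noise (values 0-50)
    image = [[rng.randint(0, 50) for _ in range(width)] for _ in range(height)]
    # Randomly size and place a bright rectangular "object" (values 200-255)
    w = rng.randint(2, width // 2)
    h = rng.randint(2, height // 2)
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            image[y][x] = rng.randint(200, 255)
    # The generator knows exactly where it drew the object, so no human labeling is needed
    label = {"class": "box", "bbox": (x0, y0, x0 + w, y0 + h)}  # end coordinates exclusive
    return image, label

def make_dataset(n, width=16, height=16, seed=0):
    rng = random.Random(seed)
    return [render_scene(width, height, rng) for _ in range(n)]

if __name__ == "__main__":
    image, label = make_dataset(1)[0]
    print(label)
    for row in image:
        print("".join("#" if px >= 200 else "." for px in row))
```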
In federated learning, a model is trained locally on multiple decentralized devices or servers, and the resulting updates are aggregated into a more robust shared model, all while the original training data stays where it was collected. (The privacy technique was pioneered by Google in 2016 as a way to let mobile devices collectively learn prediction models without moving data to a centralized server, but its implications extend well beyond the mobile market.) As one analogy goes, think of the model as a grazing sheep and the data as distinct fields of grass: it makes more sense to bring the sheep to the grass than vice versa. As Morning Brew recently noted, research in the field is skyrocketing; the number of papers mentioning FL rose from 180 in 2018 to 965 in 2019, and reached 1,050 in just the first half of 2020.
Much of federated learning’s promise lies in healthcare (no surprise, given HIPAA’s thorough privacy protections), which is where New York- and Paris-headquartered Owkin focuses. The company, founded by a former hematology/oncology professor and an artificial intelligence researcher, has spearheaded federated-learning solutions for medical research that also integrate with NVIDIA’s federated-learning tools. Owkin recently helped complete the first FL run for a consortium of drug development companies.
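A bare-bones Python sketch of federated averaging (FedAvg), the aggregation scheme from Google’s early federated-learning research, makes the sheep-and-grass idea concrete: each hypothetical client fits a tiny linear model y = w·x + b on its own data, and only the model weights travel to the server, which averages them weighted by each client’s number of examples. Real systems train neural networks with minibatch SGD across far more clients; the data, learning rate and round count here are illustrative assumptions.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """A few epochs of full-batch gradient descent on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for the model y = w * x + b
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum((w * x + b - y) for x, y in data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def federated_averaging(clients, rounds=100):
    """Server loop: broadcast weights, collect local updates, average them."""
    weights = (0.0, 0.0)
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Raw (x, y) pairs never leave a client; only the updated (w, b) do
        updates = [(local_update(weights, data), len(data)) for data in clients]
        w = sum(u[0] * n for u, n in updates) / total
        b = sum(u[1] * n for u, n in updates) / total
        weights = (w, b)
    return weights

if __name__ == "__main__":
    # Three hypothetical clients whose data follow y = 3x + 1 but cover different x ranges
    clients = [
        [(x, 3 * x + 1) for x in (-2.0, -1.5, -1.0, -0.5)],
        [(x, 3 * x + 1) for x in (0.5, 1.0, 1.5, 2.0)],
        [(x, 3 * x + 1) for x in (-1.0, 0.0, 1.0)],
    ]
    w, b = federated_averaging(clients)
    print(f"w = {w:.3f}, b = {b:.3f}")  # close to w = 3, b = 1
```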
Total funding: $74.1 million
This Spain-based startup works in several AI lanes, including recommendation engines and voice assistants. (The co-founder of Siri is a strategic advisor.) It also offers a privacy-preserving machine-learning framework built on differential privacy and federated learning. The company’s founder, Xabi Uribe-Etxebarria, was named to MIT Technology Review’s under-35 list and is working on a Hippocratic Oath for AI alongside Rafael Yuste, a veteran of the Obama administration’s BRAIN Initiative.
Total funding: $19 million
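Combining differential privacy with federated learning usually means protecting the model updates rather than the raw data. A common recipe in the spirit of DP-FedAvg, shown here as a generic Python sketch rather than the startup’s own framework, clips each client’s update to a maximum L2 norm and adds Gaussian noise to the sum before averaging. Converting the noise multiplier into a concrete ε requires a privacy accountant, which this sketch leaves out; the names and numbers are hypothetical.

```python
import math
import random

def clip(update, max_norm):
    """Scale a client's update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * factor for u in update]

def dp_aggregate(updates, max_norm, noise_multiplier, rng):
    """Average clipped client updates with Gaussian noise added to the sum."""
    clipped = [clip(u, max_norm) for u in updates]
    n_clients = len(clipped)
    # After clipping, one client can move the sum by at most max_norm,
    # so the noise standard deviation is calibrated to that bound
    sigma = noise_multiplier * max_norm
    summed = [sum(coords) for coords in zip(*clipped)]
    return [(s + rng.gauss(0.0, sigma)) / n_clients for s in summed]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3-parameter model updates from 100 clients
    updates = [[rng.gauss(0.1, 0.5) for _ in range(3)] for _ in range(100)]
    print(dp_aggregate(updates, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Because the noise is added once to the sum and then divided by the number of clients, its effect on the averaged update shrinks as more clients participate.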
Secure AI Labs (SAIL)
Co-founded in 2017 by an MIT computer science professor and an MIT Media Lab veteran through the school’s delta v accelerator, this Cambridge startup has built a federated network that allows research models to be trained with SAIL’s algorithms without moving data out of its storage environment. A white paper drafted last year by a SAIL co-founder and researchers at the pharmaceutical company Novartis showed comparable microbiome test results between traditional, open research environments and SAIL’s platform, which uses federated learning together with differential privacy and secure enclaves.
Total funding: Undisclosed