When you use machine learning, you aren’t just optimizing models and streamlining business. You’re governing.
In effect, ML models embody and implement policies that control access to opportunities and resources, such as credit, employment, housing — and even freedom, when it comes to arrest-prediction models that inform parole and sentencing.
Insurance risk models determine what each policyholder must pay, and targeted marketing determines who gains discounts, exclusive deals and even the awareness of certain financial products.
When ML acts as the gatekeeper to these opportunities, it can perpetuate or magnify social injustice, adversely affecting underprivileged groups by denying them access disproportionately often and without justification. Here are four ways, among others, in which that can happen.
How Does Machine Learning Perpetuate Bias?
- Machine learning models can use race or national origin as input data, so protected class status informs their decisions.
- Models can unfairly deny access or opportunities to one group more than another.
- Lack of representation in training data means models work poorly for underrepresented groups.
- Models can infer sensitive information, which others can then use to discriminate directly against the people affected.
1. Discriminatory Models
Models that take a protected class such as race or national origin as an input so that their decisions are directly based in part on that class. These models discriminate explicitly, doing so more visibly and detectably than a person who discriminates but keeps private the basis for their decisions.
For example, such a model could penalize a Black person for being Black. Although the practice is outlawed in some contexts and remains relatively uncommon, some decorated experts in ML ethics loudly advocate for allowing protected classes as model inputs.
2. Machine Bias
Unequal false-positive rates between groups, which means the model incorrectly denies approval or access to opportunities for one group more often than another. This can and often does occur even if the model is not discriminatory (per above), since a model can employ other, unprotected input variables as proxies for a protected class.
For example, ProPublica famously exposed a rearrest-prediction model whose false positives, which can translate into unwarranted jail time, fall on Black defendants more often than on white defendants.
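To make that definition concrete, here is a minimal sketch, not taken from the book, of how an audit might measure machine bias. It computes each group's false-positive rate, the share of truly negative cases the model wrongly flags, using hypothetical column names (group, y_true, y_pred) and toy data.

```python
# Minimal sketch: measuring false-positive rates per group.
# Column names (group, y_true, y_pred) are hypothetical placeholders;
# y_pred = 1 means the model flags (denies, detains, rejects) the person.
import pandas as pd

def false_positive_rates(df: pd.DataFrame) -> pd.Series:
    """False-positive rate per group: wrongly flagged / all truly negative."""
    negatives = df[df["y_true"] == 0]                    # people who, in truth, posed no risk
    return negatives.groupby("group")["y_pred"].mean()   # mean of 0/1 flags = FPR

# Toy data for illustration only.
scores = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [0,   0,   1,   0,   0,   0,   1,   0],
    "y_pred": [1,   1,   1,   0,   1,   0,   1,   0],
})

fpr = false_positive_rates(scores)
print(fpr)                    # group A: ~0.67, group B: ~0.33
print(fpr.max() / fpr.min())  # disparity ratio of 2.0; 1.0 would mean parity
```

ProPublica's analysis was, at its core, a comparison of this kind: it reported a false-positive rate for Black defendants roughly twice that for white defendants.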
3. The Coded Gaze
When a group is underrepresented in the training data, the resulting model won’t work as well for members of that group. This results in exclusionary experiences, such as when a facial recognition system fails for Black people more often than for people of other races. Also known as representation bias, this phenomenon occurs in speech recognition as well.
4. Inferring Sensitive Attributes
A model’s predictions can reveal sensitive attributes, such as sexual orientation, whether someone is pregnant, whether they’ll quit their job or whether they’re going to die.
Researchers have shown that it is possible to predict race based on Facebook likes, and officials in China use facial recognition to identify and track the Uighurs, a minority ethnic group systematically oppressed by the government. In these cases, sensitive information about an individual is derived from otherwise innocuous data.
Define Standards and Take a Stand
The question to always ask is, “For whom will this fail?” says Cathy O’Neil, author of Weapons of Math Destruction and one of the most visible activists in ML ethics. This fundamental question conjures the four issues above and many others as well. It’s an ardent call to action that reminds us to pursue ethical considerations as an exercise in empathy.
Only proactive leaders can meet these ethical challenges. Most companies using ML are paralyzed by the cosmetic concerns of corporate public relations. When firms call for ML deployment to be “fair, unbiased, accountable and responsible,” it’s often mere posturing.
These are vague platitudes that, on their own, don’t guide concrete action. In declaring them, corporations perform ethics theater, protecting their public image rather than protecting the public. Rarely will you hear a firm come down explicitly on one side or the other of any of the four issues I listed above, for example.
O’Neil has taken on the indifference to these and other issues with another weapon: shame. She advocates for shaming as a means to battle corporations that deploy analytics irresponsibly. Her more recent book, The Shame Machine, takes on “predatory corporations” while criticizing shame that punches down rather than up.
The fear of shame delivers clients for her model-auditing consulting practice. “People hire me to look into their algorithms,” says O’Neil. “Usually, to be honest, the reason they do that is because they got in trouble, because they’re embarrassed . . . or sometimes it’s like, ‘We don’t want to be accused of that and we think that this is high-risk.’”
But I would invite you to also consider a higher ideal: Do good rather than merely avoid bad. Instead of dodging shame, make efforts to improve equality. Treat setting ethical ML standards as a form of social activism.
To this end, define standards that take a stand rather than only conveying vague platitudes. For starters, I advocate for the following standards, which I consider necessary but not sufficient:
- Prohibit discriminatory models.
- Balance the false-positive rates across protected groups.
- Deliver on a person’s right to explanation for algorithmic decisions, at least in the public sector.
- Diversify analytics teams.
Fight Injustice With Machine Learning
Your role is critical. As someone involved in initiatives to deploy ML, you have a powerful, influential voice — one that is quite possibly much more potent than you realize.
You are one of a relatively small number who will mold and set the trajectory for systems that automatically dictate the rights and resources to which great numbers of consumers and citizens gain access.
Allan Sammy, director of data science and audit analytics at Canada Post, put it this way: “A decision made by an organization’s analytic model is a decision made by that entity’s senior management team.”
ML can help rather than hurt. Its widening adoption provides an unprecedented opportunity to actively fight injustice rather than perpetuate it. When a model shows the potential to adversely affect a protected group disproportionately, it puts the issue on the table and under a spotlight by quantifying it.
The analytics then provide quantitative options for tackling injustice by adjusting for it. And the very same operational framework used to automate or support decisions with ML can be leveraged to deploy models adjusted to improve social justice.
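As one illustration of what “adjusting for it” can look like in practice, here is a minimal sketch, an assumption of mine rather than a method prescribed in the book, that picks a separate score cutoff per group so each group lands at roughly the same target false-positive rate. The column names (group, y_true, score) are hypothetical placeholders.

```python
# Minimal sketch: choosing per-group score thresholds that target the same
# false-positive rate, one way to adjust a deployed model's decisions.
# All names (group, y_true, score) are hypothetical placeholders.
import numpy as np
import pandas as pd

def threshold_for_target_fpr(neg_scores: np.ndarray, target_fpr: float) -> float:
    """Pick the cutoff whose flag rate on true negatives is ~target_fpr."""
    # Flagging everyone with score >= cutoff, the (1 - target_fpr) quantile of
    # the negatives' scores yields approximately the desired false-positive rate.
    return float(np.quantile(neg_scores, 1.0 - target_fpr))

def per_group_thresholds(df: pd.DataFrame, target_fpr: float) -> dict:
    negatives = df[df["y_true"] == 0]
    return {
        g: threshold_for_target_fpr(sub["score"].to_numpy(), target_fpr)
        for g, sub in negatives.groupby("group")
    }

# Toy usage: scores from an existing model, equalized at a 10% target FPR.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group":  np.repeat(["A", "B"], 500),
    "y_true": rng.integers(0, 2, 1000),
    "score":  np.concatenate([rng.beta(2, 5, 500), rng.beta(3, 4, 500)]),
})
cutoffs = per_group_thresholds(df, target_fpr=0.10)
df["flagged"] = df["score"] >= df["group"].map(cutoffs)
print(cutoffs)
```

Equalizing false-positive rates is only one of several competing fairness criteria, and it can conflict with others, such as equal calibration across groups, so deciding which to enforce is itself a policy decision.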
As you work to get ML successfully deployed, make sure you’re putting this powerful technology to good use. If you optimize only for a single objective such as improved profit, there will be fallout and dire ramifications.
But if you adopt humanistic objectives as well, science can help you achieve them. O’Neil sees this, too: “Theoretically, we could make things more fair. We could choose values that we aspire to and embed them in code. We could do that. That’s the most exciting thing, I think, about the future of data science.”
Over the last decade, I have spent a considerable portion of my work on ML ethics. For a more in-depth dive, such as a visual explanation of machine bias, a call against models that explicitly discriminate, and more details regarding the standards I propose, see my writing and videos here.
This article is excerpted from the book The AI Playbook: Mastering the Rare Art of Machine Learning Deployment with permission from the publisher, MIT Press.