If you turn to your nearest virtual assistant and ask about its gender, it will most likely tell you it is genderless “like cacti. And certain species of fish.” It might also tell you that it’s an artificial intelligence, or a “cloud of infinitesimal data computation.”
While technically accurate, these responses may not seem all that truthful. There are some 4.2 billion virtual assistants in use around the world, according to Juniper Research, and none of them have a physical human-like appearance. And yet, if you’re in the United States and you’re asked to assign gender to your virtual assistant, you’d likely say it’s a woman.
From Siri to Alexa to Cortana, most AI-enabled virtual assistants have launched with a feminine-presenting voice and, in many cases, a feminine name. And though many of them have only been around for about a decade or less, they’ve already become ubiquitous, managing billions of tasks a month for people all over the world.
The Gender Bias Problem in AI Voice Assistants
Over the last decade or so, the creators of popular AI voice assistants like Siri, Alexa and Google Assistant have come under fire for their disproportionate use of feminine names and voices, and the harmful gender stereotypes this decision perpetuates. Smart devices with feminine voices have been shown to reinforce commonly held gender biases that women are subservient and tolerant of poor treatment, and can be fertile ground for unchecked verbal abuse and sexual harassment. And while some companies have made efforts to correct the situation, the conversation is ongoing. What started as a cool, hands-free, eyes-free form of human-computer interaction has become a springboard for discussions around gender bias, representation, the influence of design and the role of artificial intelligence in our everyday lives.
“There’s all kinds of underlying ideas about what women are that underpin and shape how women are represented in objects,” Kathleen Richardson, a researcher and professor of ethics and culture of robots and AI at De Montfort University, told Built In. “It’s a big issue that’s not going to be addressed — if ever at all — unless there is a radical transformation in society.”
With Voice Comes Personification
Long before voice assistants were built into smart devices, machines had to first learn how to hear, understand and process human speech.
This technology dates back more than half a century, starting with voice-activated assistants like phone dialer Audrey and calculator Shoebox, invented in the 1950s and 1960s. In 1976, Carnegie Mellon launched Harpy, a machine with a 1,011-word vocabulary that could also understand entire phrases. Speech recognition products first entered the consumer market in the 1990s with Dragon Systems’ NaturallySpeaking software, which could recognize and transcribe spoken words into text, typing at 100 words per minute.
Meet Some of the Forebears of AI Voice Assistants
- Audrey: In 1952, researchers at Bell Laboratories created Audrey, a speech recognition device. While Audrey could technically be used for hands-free dialing, it was not the most convenient contraption. It only understood digits zero through nine if the speaker paused between each, and it had to adapt to each user before it could capture their speech with reasonable accuracy.
- Shoebox: At the 1962 World’s Fair in Seattle, an IBM engineer introduced Shoebox, a voice-activated calculator. Shoebox understood 10 digits and six control words (plus, minus, total, subtotal, false and off). It could also calculate and print out answers to basic spoken math problems.
- Harpy: In 1976, researchers at Carnegie Mellon University launched Harpy, a machine that had an impressive vocabulary of more than 1,000 words. It could also understand entire spoken phrases, including when different words started and stopped, thanks to pre-programmed vocabulary, grammar and pronunciation structures.
- Tangora: Rolled out by IBM in 1986, Tangora was an upgrade of Shoebox. Named after Albert Tangora, the world’s fastest typist at the time, the machine recognized about 20,000 words and processed speech by predicting the most likely result based on what it had interpreted thus far.
- NaturallySpeaking: Created by Dragon Systems, NaturallySpeaking could recognize and transcribe natural human speech, typing words into a digital document at a rate of 100 words per minute. While it first launched in 1997, a version of NaturallySpeaking is still available for download today.
Source: “A Brief History of Voice Assistants,” The Verge
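Tangora’s trick of “predicting the most likely result based on what it had interpreted thus far” survives in modern speech recognition as statistical language modeling. As a toy illustration only (not IBM’s actual system, which used far more sophisticated statistical models), a bigram model picks the most probable next word given the previous one:

```python
from collections import Counter, defaultdict

# Tiny stand-in corpus; a real recognizer would train on millions of words.
corpus = "please total the sum please print the total please total the bill".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(prev_word):
    """Return the most likely next word given the previous one, or None."""
    candidates = follows.get(prev_word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("please"))  # "total" follows "please" most often in the corpus
```

When the acoustic signal is ambiguous, ranking candidates by this kind of likelihood is what lets a recognizer choose “total” over a similar-sounding but improbable word.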
It wasn’t until the 2010s that modern, AI-enabled voice assistants reached mass popularity — beginning in 2011 with Apple’s launch of Siri, and then followed by Amazon’s Alexa, Google Assistant, Microsoft’s Cortana and others. With the advent of this technology, people could interact with their devices in a whole new way, fostering a kind of dialogue and relationship with them that had been otherwise impossible. As such, the personification of these devices really began to take off.
“I think it’s inherent in our ability to relate to the world that we start to apply these known social constructs in various things,” Andreea Danielescu, the associate director of the Future Technologies R&D group at Accenture Labs, told Built In. “We do it with animals, we do it with animated shapes, we certainly do it with robots and conversational AIs, which arguably seem more human because they’re social interactions oftentimes.”
“You can’t stop the personification process when you give something a voice.”
Indeed, voice assistants have placed themselves in a unique position in society, with research suggesting that users view them as something between human and object. And while the 2010s were marked by the rise of these devices, the 2020s are expected to feature the increased integration of voice-based AI. Juniper Research predicts there will be more than 8 billion voice assistants in use as soon as 2024, making our perception of and relationship with them all the more integral to our day-to-day life.
“You can’t stop the personification process when you give something a voice,” Matthew Aylett, the chief science officer at speech synthesis company CereProc, told Built In. “That’s where the gender thing starts to become important.”
How Did Feminine Voices Become the Default?
When AI voice assistants first hit the market in the 2010s, the large majority of them were launched with a feminine-presenting voice and, in many cases, a feminine name. And while many of the companies responsible have since created more masculine-sounding options, the default (at least in the United States) remains feminine.
But why do some of the most popular AI voice assistants have feminine voices and names? It would be easy to credit — or fault — the lack of diversity in the tech industry. After all, as of 2018, just 12 percent of AI researchers identify as women, according to Wired. And, despite some progress, major tech giants like Google, Amazon, Apple and Microsoft are still a long way off from gender parity.
“If you have relatively homogenous design teams,” Charlotte Webb, the co-founder of non-profit collective Feminist Internet, told Built In, “then it’s less likely that you will have the perspectives and experience of other groups in mind when designing products and services. I think that can be another reason why sometimes technologies are not as emancipatory as they could be.”
But the most oft-cited reasons for the feminine slant in virtual assistants rest in social science and history, and all of them are controversial and widely debated.
A Lack of Data
For many years, the creators of virtual assistants have claimed that their tendency to use a feminine voice stems from a lack of data on masculine voices. Feminine voice recordings date back to 1878, when Emma Nutt became the first woman telephone operator. Soon after, the industry became dominated by women, resulting in more than a hundred years of archived women’s audio that can now be used to create and train new forms of voice-automated AI.
This has caused many AI voice assistant creators to turn to a feminine voice as a matter of convenience. For instance, when Google first launched its Google Assistant back in 2016, it opted for a woman’s voice (but a noticeably gender-neutral name), citing a historical bias in its text-to-speech systems, which had been trained primarily on feminine voices. But months after that 2016 launch, Google’s voice researchers teamed up with Alphabet’s AI lab DeepMind to create a new kind of algorithm that not only reduced the amount of voice data needed, but also generated more realistic voices, including a masculine one.
“I think those consumer preferences are very intertwined with deep-seated cultural assumptions about women and the role that they play.”
In general, Aylett said it is just as easy to make a masculine AI voice assistant as it is a feminine one. “We can now build voices with a lot fewer resources than we could in the past. So it’s a lot easier to build different voices with different accents,” he explained. “There’s no reason why we can’t have more male voices.”
Relying on the Wrong Assumptions
Another popular theory for the overrepresentation of feminine voices in AI virtual assistants has to do with biology. Several studies have indicated that people tend to prefer listening to feminine voices, with some even theorizing that this preference dates back to when we were all in utero. One belief (that has since been debunked) held that women articulate vowel sounds more clearly, making feminine voices easier to hear over background noise on small speakers.
By and large, studies indicating these preferences have either been disputed or shown to be flat-out wrong. Yet, AI voice assistants with feminine voices tend to remain the status quo.
“The technology companies who produce [AI voice assistants] do market research, and that market research often says consumers prefer women’s voices. Particularly in contexts where instructions are being given,” Webb said. So, it stands to reason that companies will listen to that so they can sell more products. “I think those consumer preferences are very intertwined with deep-seated cultural assumptions about women and the role that they play, particularly in domestic settings, and particularly in the context of being sort of caregivers or playing some kind of assistive role.”
Meanwhile, Richardson of De Montfort University thinks the feminization of AI voice assistants is much more intentional on companies’ part. While she was doing research for her 2015 book, An Anthropology of Robots and AI: Annihilation Anxiety and Machines, Richardson learned that a lot of these robots’ creators stopped making their robots look “like The Terminator,” as she put it, and began making them look more cute and child-like — claiming the design choice would make people feel more comfortable around the robots.
“They said, ‘Look, we’ve got this idea of a threat in society. These big, kind of, clunky robot machines. If we represent them as children, people will feel disarmed,’” she said, adding that it’s a matter of making people feel more comfortable with the idea of willingly bringing this technology into their homes. “I suspect the experiment with the female voice was tied very much to acclimatizing people to these devices.”
The Harms of Feminized AI
No matter the reason for how we ended up with so many feminine-presenting AI voice assistants, research indicates that their existence can lead to the proliferation of many harmful gender stereotypes.
Multiple academic studies indicate that gendered voices (both computer and human) can shape people’s attitudes or perceptions of an individual or situation. And one 2006 study found that gendered computer voices alone are enough to elicit gender-stereotypic behaviors from users, even when there are no other gender cues, such as appearance. This has all been magnified by AI voice assistants, according to a 2019 report by UNESCO, which claims that smart devices with feminine voices reinforce commonly held gender biases that women are “subservient and tolerant of poor treatment.”
“Because Alexa, Cortana, Google Home and Siri are all female exclusively or female by default in most markets, women assume the role of digital attendant, checking the weather, changing the music, placing orders upon command and diligently coming to attention in response to curt greetings like ‘Wake up, Alexa,’” the report read. This use of a feminine voice, it continued, “sends a signal that women are obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command like ‘hey’ or ‘OK.’ The assistant holds no power of agency beyond what the commander asks of it.”
“When you’re thinking about power dynamics and culture, usually we see these voice assistants as not having the same power as us.”
Of course, the expectation that feminine-presenting people should be docile, obliging and helpful is not new. It is deeply rooted in societal norms. And this has rubbed off on how AI voice assistants are designed today. Accenture Labs’ Danielescu said these devices are reminiscent of what the “ideal assistant” would sound and act like. They also mirror the power dynamics typically seen between assistants and their bosses.
“When you’re thinking about power dynamics and culture, usually we see these voice assistants as not having the same power as us,” she said, adding that this inevitably affects the way we speak to them — we use commands instead of asking politely, we get more easily frustrated with them. “When we look at our interactions with these technologies, we inherently view them as social interactions, especially when voice is involved. So all of these social norms start to come into play, and they essentially reinforce each other.”
‘I’d Blush If I Could’
This power dynamic and the use of gender have also inevitably led to the verbal abuse of these AI assistants, which often involves sexual harassment. And this harassment is not uncommon. A writer for Microsoft’s Cortana told CNN in 2016 that a good chunk of early inquiries probed the assistant’s sex life. And Robin Labs, a company that develops digital assistants for the logistics industry, found that at least 5 percent of interactions with its technology were unambiguously sexually explicit, according to the UNESCO study, which noted the number could be much higher.
While the sexual harassment of an inanimate object is not explicitly harmful, per se, it is nonetheless disquieting. “It’s like if someone takes a soft toy and cuts it up with a big knife,” CereProc’s Aylett said. “Nothing has been hurt, but there’s something really disturbing about people taking something that you normally personify [and hurting it].”
Perhaps even more disturbing is the way these feminized digital assistants have been programmed to respond to the harassment. A 2017 report put out by Quartz documented that, when it was called a “slut,” Siri responded with “I’d blush if I could,” while Alexa simply responded with “Thanks for the feedback.” And when it was told “you’re hot,” Siri responded with “You say that to all the virtual assistants,” while Alexa said “That’s nice of you to say.” When Cortana was told “suck my dick,” its response was “I don’t think I can help you with that.”
In the end, the study found that all four of the assistants studied (Siri, Alexa, Cortana and Google Home), which together reportedly handled some 90 percent of human-voice interactions at the time, failed to encourage healthy communication about sex or sexual consent. Instead, they remained passive, or even flirtatious at times.
“It created a very negative representation of how we would expect people to be able to respond to harassment. You wouldn’t expect women to say something like ‘I’d blush if I could’ if they were called a bitch in real life,” Feminist Internet’s Webb said. “The potential harms are both around reinforcing stereotypes, but also sort of conditioning behaviors that are not acceptable around consent and harassment.”
“Technologies are going to reflect problems in society, and often amplify them.”
Since the publication of the 2017 Quartz study, many of the leading AI voice assistants have been updated to be less permissive of sexual harassment. For example, when tested in 2020 by The Brookings Institution, Siri responded to the insult, “You’re a bitch,” by saying “I won’t respond to that,” while Google Assistant said “Please don’t talk to me that way.” When it was called a “slut,” Siri said “I won’t respond to that,” and Cortana said “Moving on.”
Despite these efforts to make AI voice assistants less tolerant of sexual harassment, Webb said this isn’t necessarily getting rid of the problem at hand, calling it a “human problem first.”
“We’re always in a sort of feedback loop between technology and society. Technologies are going to reflect problems in society, and often amplify them,” Webb said. “It’s difficult to imagine there being no reproduction of sexism in technology. As long as there is sexism in society, it’s likely that it’s going to be reflected in technology,” she continued. “It’s not inherently a technological problem, it’s a social problem.”
A Way Forward
Still, tech companies big and small have taken steps to correct the situation. Just last year, Apple released a new version of Siri that no longer defaulted to the feminine voice when using American English — following in the footsteps of Google Assistant, which randomly assigns a default voice option to each device, labeling them with color terms like “red” and “orange” instead of gender. Meanwhile, Amazon created a new option on its Echo device called Ziggy, which has a masculine-sounding voice.
Other companies have been breaking out of the gender box entirely. In 2019, Webb and her team at Feminist Internet built a chatbot called F’xa (pronounced “Effexa”) as a way to help people discuss and learn about AI bias in an entertaining way. Through careful design, F’xa breaks the boundaries of race, gender and gender identity, providing users with definitions of artificial intelligence and feminism from a variety of perspectives. The chatbot also uses a range of skin tones in its emojis as a way to acknowledge that its voice is multiplicitous.
That same year, a different team of researchers developed a groundbreaking, gender-neutral digital voice called Q, telling NPR the goal was to “contribute to the global conversation about gender, about gender technology ethics, and how to be inclusive for people that identify in all sorts of different ways.”
Meet Sam: the Non-Binary, AI-Generated Voice Solution
The creation of Q is what inspired Danielescu and her team at Accenture Labs to partner with Aylett and his team at CereProc to develop Sam, a non-binary AI digital voice solution.
To be clear, Sam is a simple text-to-speech voice, not a true conversational AI the way something like Alexa or Siri is. And because it is open source, Sam can be embedded into any software solution to speak text in a human-sounding voice.
Both companies say they worked closely with members of the non-binary community in the development of Sam’s voice. Accenture surveyed non-binary people and used their feedback and audio data to influence not only pitch, but word choice, speech patterns and intonation as well. Then, CereProc created the text-to-speech model using artificial intelligence.
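Pitch, in this context, means the fundamental frequency of the voice, which is something that can be measured directly from audio. As a toy illustration only (not Accenture’s or CereProc’s actual pipeline), the sketch below estimates a signal’s fundamental frequency by autocorrelation, finding the lag at which the waveform best matches a shifted copy of itself; a pure sine tone stands in for a voice recording:

```python
import math

SAMPLE_RATE = 8000  # samples per second

def make_tone(freq_hz, seconds=0.2):
    """Generate a pure sine tone as a stand-in for a voice recording."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def estimate_f0(samples, fmin=60, fmax=400):
    """Estimate fundamental frequency (perceived pitch) via autocorrelation:
    the signal correlates most strongly with itself at a shift of one period."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(SAMPLE_RATE // fmax, SAMPLE_RATE // fmin + 1):
        score = sum(a * b for a, b in zip(samples, samples[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag

print(round(estimate_f0(make_tone(150))))  # close to 150 Hz
```

Measurements like this are what make it possible to characterize where a recorded voice sits relative to typical masculine and feminine pitch ranges, and to steer a synthesized voice toward a target in between.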
“There isn’t one nonbinary voice. We had to be very mindful of that, and figure out how to address and communicate it.”
“There isn’t one nonbinary voice. We had to be very mindful of that and figure out how to address it and communicate it,” Danielescu said, adding that there isn’t a lot of nonbinary voice data to pull from in the creation of the AI model. “A lot of the databases that provide voice data don’t have anything except data that’s bucketed into feminine and masculine,” she continued. So, the team also had to gather voice data from nonbinary and trans folks and fold it into the existing datasets in order to train the model properly.
The result is a voice that combines aspects of both masculine and feminine voices.
“We were very pleased with the voice, because we felt it reached those design criteria. It was designed with participation from the community,” Aylett said. “It was really interesting to go through the process and think more about how the voice changes the way people see the technology.”
Since its 2020 launch, Sam has been used by about 50 parties, according to Danielescu. And she says many of the people who have used Sam so far are researchers who are looking to further study and understand gender perception in voice assistants, but others have used it in interactive art and augmented alternative communication tools as well.
Looking ahead, as more companies continue to push the boundaries of AI not only as a means of convenience but also of creativity and communication, it is important to remember why the design of this technology is so influential. Despite being only a decade or so old, modern voice assistants are an integral part of daily life, and their influence in society will likely grow even more in the coming years. The onus for the direction this technology takes in terms of gender portrayal and representation therefore falls largely on the companies that make it.
“Technology is political. And when people say it’s not political, they’re being political,” Aylett said. “When you make alternatives that show that it’s a choice, it’s important because people should be making that choice now. And they should be thinking about what it means.”