A new kind of chat room is rising in popularity, in which people speak up instead of type away. Startups like Clubhouse, Rodeo, Chalk and Spoon let users form voice chat rooms where they can talk with friends and strangers, or just drop in and listen to conversations. The resulting vibe is a hybrid between a group call and a live podcast you can participate in.
Despite being relatively new, these companies have raised big money from investors and are generating buzz. It’s not hard to see the logic behind this — voice chat could be the new way of socializing while staying indoors during the pandemic.
But first, these apps will have to clear a hurdle that all social platforms encounter: getting a handle on content moderation.
This challenge was spotlighted again recently, when users of Clubhouse complained about anti-Semitic comments on the app. The company, which raised funds from investors earlier this year at a $100 million valuation, announced new safety features in response.
The episode raises important questions: How can social platforms maintain a safe environment that runs on live audio — when you have no idea what’s going to come out of someone’s mouth? Is it even possible to effectively moderate content that is both invisible and ephemeral?
Whether voice-based chat rooms will be a passing fad or the standard mode by which we connect with friends in the future may depend on the industry’s ability to solve this puzzle.
Voice Is Trickier Than Text
These new audio apps follow in the wake of successful social platforms like Discord, which lets users organize into topic-specific communities to chat via text and voice, and Twitch, where users join channels and interact via text as the host broadcasts a live video stream. Moderating communities like these requires a fundamentally different approach than what you would take moderating platforms like Facebook and Twitter.
For instance: In voice-based communities, as soon as something is said, it’s gone. In text-based chats, moderation is “based on the prerequisite that there’s something to remove,” Aaron Jiang, a Ph.D. candidate at the University of Colorado–Boulder who researches content moderation, told Built In. “Real-time voice changes that assumption, it throws it into the fire.”
In the face of real-time voice chat, many pages in the content moderation playbook are suddenly obsolete. “The whole moderation landscape has been changing so fast, and it will keep changing at least as fast, or even faster than right now,” Jiang added.
The issue of documentation is perhaps the knottiest: it's hard to gather material evidence of a bad actor's harmful comments when there's nothing to screenshot.
Recording the audio (and having some way to identify the different speakers) is one way around this, Jiang said. But that’s not common, and it raises a whole host of issues about consent and data storage.
“How to prove that someone did the bad thing or broke the rule is definitely the million-dollar question,” Jiang said. “It almost entirely depends on community reports.”
In text-based communities, preventing abuse is possible (to an extent). On Twitch, for example, streamers can set up filters so that, when a user types a swear word, the back-end detects it and blocks the comment. (Though users can get awfully creative with how they type things.) When you’re dealing with live audio, however, that’s not something you can control at all.
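The kind of back-end filter described above can be sketched in a few lines. This is a hypothetical illustration, not Twitch's actual implementation; normalizing the message first (lowercasing, mapping common character substitutions back to letters) catches some of the "creative" spellings users try:

```python
# Hypothetical chat-filter sketch -- not Twitch's actual back-end.
# Normalizes a message, then checks each token against a blocklist
# that a streamer might configure.

BLOCKLIST = {"badword", "slur"}  # placeholder terms for illustration

# Undo common character substitutions before matching.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e",
                               "4": "a", "5": "s", "@": "a", "$": "s"})

def should_block(message: str) -> bool:
    """Return True if any normalized token matches the blocklist."""
    normalized = message.lower().translate(SUBSTITUTIONS)
    return any(token.strip(".,!?") in BLOCKLIST
               for token in normalized.split())

print(should_block("that was a B4DWORD move"))  # True, despite the "4"
print(should_block("great stream today"))       # False
```

Even with normalization, determined users will find spellings a static blocklist misses, which is part of why text filtering is only a partial defense; with live audio, even this partial defense is unavailable.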
That’s why moderation in live communities focuses more on what to do after abuse or harassment occurs. Basic moderator tools for chat communities include the ability to mute, boot, block and ban people who break the rules — all ad hoc solutions.
Important steps communities can take, then, are to prepare for how to react when violations happen — and to set clear standards and norms so that “you’re not always trying to put out fires, you’re setting up things so that the fires don’t happen to begin with,” Yvette Wohn, associate professor at the New Jersey Institute of Technology, told Built In.
Wohn’s research focuses on the role of algorithms and social interactions in livestreaming, e-sports and social media. She believes it’s important that social platforms head off as much abuse as possible, by making community guidelines extremely visible and clear. That means not just having a terms of service that users mindlessly sign, but rules that everyone must see and agree to abide by upon joining.
“It sounds very intuitive,” Wohn added. “Even really simple rules. Just make people know what is acceptable and what is not.”
She also recommends that voice-based communities consider adding text-based back channels. Carving out this extra space, outside of the main voice-based forum, may be useful for education, she said. Back channels can help for cases in which users want to pull one another aside to discuss insensitive behavior, for example, or for moderators to explain to a user why their comment was inappropriate.
“People could choose to use Discord or some other social media to have those conversations,” Wohn said. “But I think if you build that into your platform, that would be really advantageous.”
More Moderator Tools
Wohn also recommends that social platforms equip moderators with sophisticated technical tools to make it easier for them to enforce community standards.
She cites Twitch as an example that fledgling voice-based apps may want to follow. The platform, which has 15 million daily active users, has allowed third-party developers to make moderator tools that livestreamers can use. It eventually rolled out a moderation toolkit that incorporated some of the more-popular features offered by the third-party tools, Wohn said.
Letting third-party developers get creative may be advantageous. It's also easier than building every tool in-house from scratch, an approach that could delay wide-scale content moderation efforts.
One helpful tool gives Twitch moderators the ability to click on a user and see their moderation history. That way, moderators can see if the user is a repeat offender, or if this is the first time someone has reported an issue involving them. The added context may help moderators make a more suitable judgment call on whether to outright ban the user, or pull them aside and remind them of the community standards.
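A minimal version of that history lookup could be an append-only log of incidents keyed by user ID. The class and field names below are assumptions for illustration, not Twitch's actual data model:

```python
# Hypothetical per-user moderation history -- names are illustrative.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Incident:
    timestamp: datetime
    reason: str
    action: str  # e.g. "warning", "timeout", "ban"

@dataclass
class ModerationHistory:
    # user_id -> list of Incident records, oldest first
    incidents: dict = field(default_factory=dict)

    def record(self, user_id: str, reason: str, action: str) -> None:
        self.incidents.setdefault(user_id, []).append(
            Incident(datetime.now(), reason, action))

    def summary(self, user_id: str) -> str:
        """The one-line context a moderator sees when clicking a user."""
        past = self.incidents.get(user_id, [])
        if not past:
            return "no prior reports"
        return f"{len(past)} prior incident(s), last action: {past[-1].action}"

history = ModerationHistory()
history.record("user42", "spam", "warning")
history.record("user42", "harassment", "timeout")
print(history.summary("user42"))   # 2 prior incident(s), last action: timeout
print(history.summary("newuser"))  # no prior reports
```

The design point is the `summary` call: first offense and repeat offender lead to different judgment calls, so surfacing the count and the last action at a glance is what makes the tool useful mid-conversation.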
Can Artificial Intelligence Help?
Wohn doesn’t see much of an opportunity, if any, for artificial intelligence to help prevent abuse on audio social platforms. That’s not for lack of ability on AI’s part; there’s simply no feasible way to anticipate what someone is about to say. Where she thinks AI can make an impact, though, is in helping human moderators make judgment calls.
If someone makes a racist remark on an audio platform, for example, the participants typically don’t stop and wait for the moderator to take action; the conversation keeps going. So it’s up to the moderator to quickly make a decision on how to step in.
Based on previous training data, machine learning can identify an offensive verbal comment as such, flag it for the moderator, and give them a real-time update about how comments like that were handled in the past. This can help the moderator make a quick, consistent ruling.
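A real system would run a trained classifier over transcribed audio. As a toy sketch of the second half of that idea, surfacing how similar comments were handled before, here is a nearest-neighbor lookup over a (hypothetical) log of past rulings, using simple token overlap as the similarity measure:

```python
# Toy sketch: suggest a ruling by finding the most similar past comment.
# A production system would use a trained speech/text classifier; the
# past-rulings data below is invented for illustration.

PAST_RULINGS = [
    ("get out of here you idiot", "timeout"),
    ("nobody wants you in this room", "warning"),
    ("I disagree with that take", "no action"),
]

def suggest_action(comment: str) -> str:
    """Return the action taken on the past comment with most shared tokens."""
    tokens = set(comment.lower().split())

    def overlap(past_text: str) -> int:
        return len(tokens & set(past_text.lower().split()))

    _, best_action = max(PAST_RULINGS, key=lambda pair: overlap(pair[0]))
    return best_action

print(suggest_action("you are an idiot, get out"))  # timeout
```

Token overlap is a crude stand-in for real semantic similarity, but the shape of the output is the point: the moderator gets a precedent ("comments like this drew a timeout") rather than a raw alert, which is what supports a quick, consistent ruling.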
That doesn’t erase whatever harm was done by the comment, of course. It’s a small step toward equipping audio communities to enforce their standards rather than being caught unprepared and leaving affected users wondering who’s going to hold offenders accountable.
“We do need those tools to help make it easier, but, in terms of moderation, there will always need to be a human in the loop,” Wohn said. “Because what you’re dealing with is other creative human beings.”