The Unreal Slim Shady — How We Trained an AI to Simulate Eminem’s Style.

Without context, the video is almost believable.

The story could be as simple as this: In the wake of his many musical feuds, from Mariah Carey to Machine Gun Kelly, Eminem decided to aim his sights higher, looking to take tech CEO Mark Zuckerberg down. For three minutes, Eminem skewers the mogul, mocking everything from his past controversies to his “glossy eyes.” The rapper’s classic trademarks are there: his nasal intonations, his throaty yells and a biting sense of humor.

Yet Eminem, though he’s notorious for disses, did not write, record or create this track. The truth is far stranger. There is almost no human element to the song at all. The lyrics were generated by GPT-3, and the vocals were created synthetically.

Creating an AI Eminem

To create our AI Eminem, we asked GPT-3 to generate lyrics in his style. We then sent the lyrics to 30 Hertz, an artist who creates deepfake music, who generated the vocals. Hertz uses the program Tacotron 2, a network that turns text into synthesized speech. It requires users to create their own models of human voices. The program requires some coding experience, which means it isn’t totally accessible to a layperson. In order to synthesize voices, a creator must compile videos of an artist with little background noise and feed them to the program. The process entails much trial and error and is often complex with just middling results.


AI Rap

The idea was a creative experiment. Does AI know enough about Eminem to write lyrics in his voice? On our YouTube channel, Calamity AI, my co-creator Eli Weiss and I aim to explore artificial intelligence’s creative capabilities. This exploration ranges from testing whether it can write a short film to seeing how well it can mimic a Dr. Seuss book. We use Shortly AI, a GPT-3 program developed to cure writer’s block, though we’ve admittedly never used it for that purpose.

The inspiration for the video came after we discovered 30 Hertz’s channel. He creates a wide variety of deepfake music, including recreations of singers’ voices, from Tupac to Billie Eilish. Though the vocals in his videos are machine-generated, he writes the songs himself. What if, we wondered, we added another level of AI to the equation by having it write the lyrics?

We wanted to add humor to the proceedings. Eminem had recently taken over rap headlines for his beef with Machine Gun Kelly. Who would be the most absurd figure for Eminem to target?

So, we wrote, “Eminem’s new song is a diss against Mark Zuckerberg.”

AI Versus Mark Zuckerberg

The AI generated the following: “Yo, I’m coming for the Zuck, the epical CEO who’s on a sack of fail. When he ain’t doing shrooms and having sex in VIP, he’s rage-querying his ex-employees that that he got rid of, claiming that he’s gonna create another Twitter.”

Fascinatingly, GPT-3 seemed able to work real information about Zuckerberg (his controversies with former employees) into an approximation of Eminem’s voice, which blends crude humor with cutting insults.

We continued generating. Though the song never reaches the real Eminem’s lyrical heights, it includes several lines with a touch of cleverness, however rudimentary. For instance, it crafted the put-down, “You’re so stupid, you had to ask the owner of RapGenius to tell you what diss meant.” It also offered a surprisingly cogent reference: “You made your fortune off the back of others’ art.” When it comes to generating relevant insults, GPT-3 is surprisingly capable. With enough human oversight, I believe you could create a compelling song using the AI. If someone used only the best lines, a song could sound convincingly human.
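The prompt-then-continue workflow can be sketched roughly in Python. This is only an illustration, not how Shortly AI works internally: `continue_lyrics`, its `generate` and `keep` parameters, and the `fake_generator` stand-in are all hypothetical names of ours. The optional `keep` callback represents the kind of human oversight described above.

```python
from typing import Callable, List, Optional


def continue_lyrics(
    prompt: str,
    generate: Callable[[str], str],
    rounds: int = 3,
    keep: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Repeatedly ask a text generator to continue the song so far.

    `generate` is any function mapping the text so far to a continuation
    (an assumption; plug in whatever model or service you actually use).
    `keep` is an optional filter standing in for a human picking lines.
    """
    lines: List[str] = []
    for _ in range(rounds):
        # The model always sees the seed sentence plus everything kept so far.
        context = "\n".join([prompt] + lines)
        continuation = generate(context)
        for line in continuation.splitlines():
            line = line.strip()
            if not line:
                continue
            if keep is None or keep(line):
                lines.append(line)
    return lines


def fake_generator(context: str) -> str:
    """Hypothetical stand-in for a real model: returns one numbered line."""
    n = context.count("\n") + 1
    return f"placeholder lyric {n}\n"


if __name__ == "__main__":
    seed = "Eminem's new song is a diss against Mark Zuckerberg."
    for lyric in continue_lyrics(seed, fake_generator, rounds=3):
        print(lyric)
```

Leaving `keep` unset keeps every generated line, clever or not. Passing a filter is where a human editor would step in.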

We did not edit the output here, however. So, for every clever lyric, a nonsense one lives alongside it. Consider the moment when the AI Eminem compares Zuckerberg to a dog: “Your name should be Facebark, because you’re all bark. You got no bite, but the way you bark, it’s kinda hot.” From there, the AI somehow seized on the subject of hair and continued to devolve. “I’m making a photoshopped picture of you tomorrow. I’m gonna make you into a real big hairy steamboat. I’ll call you Mark by the River.”

The oddity of the lyric comes from its lack of context. To my knowledge, the steamboat imagery and the hair are references to nothing, purely random. Later, the AI falls into a spiral, calling Zuckerberg stupid four lines in a row. “You’re so stupid, that you say, ‘I’m feeling a little sick.’ You’re so stupid, and you smile. You’re 26, you’re so stupid.” Still, the AI manages to end on a half-baked pun (“Your face lost … Bye.”).

Making Real Fake Music

From there, we sent the lyrics to 30 Hertz, who generated the vocals. Hertz uses the program Tacotron 2, a network that turns text into synthesized speech. It requires users to create their own models of human voices. 30 Hertz has created many models himself, well showcased on this track, which features the voices of Tupac, Biggie, Jay-Z, Eminem, Big L and DMX, in one madcap Christmas song.

The program requires some coding experience, which means it isn’t totally accessible to a layperson. In order to synthesize voices, a creator must compile videos of an artist with little background noise and feed them to the program. The process entails much trial and error and is often complex with just middling results. A comprehensive description can be found in this tutorial.
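To give a sense of the data preparation involved, here is a minimal Python sketch of how a set of clean clips and their transcripts might be turned into a training file list. It assumes the `audio_path|transcript` line layout used by some open-source Tacotron 2 implementations. The `Clip` class, the `build_filelist` function and the duration thresholds are our own hypothetical choices, not a description of 30 Hertz’s actual workflow.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Clip:
    """One short audio clip of the target voice (hypothetical structure)."""
    path: str         # location of the audio file
    transcript: str   # what is said in the clip
    seconds: float    # clip length, measured beforehand


def build_filelist(
    clips: Iterable[Clip],
    min_seconds: float = 1.0,   # assumed lower bound, tune to taste
    max_seconds: float = 10.0,  # assumed upper bound, tune to taste
) -> List[str]:
    """Return 'path|transcript' rows for clips that look usable."""
    rows: List[str] = []
    for clip in clips:
        # Skip clips that are too short or too long to train on comfortably.
        if not (min_seconds <= clip.seconds <= max_seconds):
            continue
        # Collapse stray whitespace in the transcript.
        text = " ".join(clip.transcript.split())
        # Skip empty transcripts and ones that would break the '|' separator.
        if not text or "|" in text:
            continue
        rows.append(f"{clip.path}|{text}")
    return rows


if __name__ == "__main__":
    demo = [
        Clip("clips/verse_01.wav", "  first   example line ", 3.2),
        Clip("clips/noise.wav", "too short to use", 0.4),
    ]
    print("\n".join(build_filelist(demo)))
```

Filtering by length and cleaning transcripts is only a small part of the trial and error. Removing background noise from the audio itself is a separate step that this sketch does not attempt.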

Because 30 Hertz had already trained a model for Eminem’s voice, the process for this video was simpler. He fed the lines to the program, continuing to tweak and massage those portions that were more uncanny. He then created a beat, which was the only truly human component of the song, and put it underneath the rap.

Successes and Failures

There are still some noticeable design flaws. Although the AI manages to deliver certain lines convincingly (“running from their dreams like you’re the devil” is a particularly effective one), it often stumbles over harder words and phrases like “cryogenic.” The lines sound good individually but often lack consistency from one to the next. The song often sounds like multiple versions of Eminem, jumping in at random times.

Though the video opens with a qualifier (“The following song is written and performed by a computer.”), the opening cue is still uncanny. To a listener without context, it sounds like Eminem’s voice rises out of the speakers, waging an assault on Mark Zuckerberg. There are kinks, yes, but the ultimate result is frighteningly human. Thanks to this strangeness, the video received more attention than any of our previous work, earning coverage on NME and UPROXX. The project is often misunderstood, however, with many people refusing to believe it’s the work of AI. Comments read, “Eminem low-key actually recorded this.” “This was made by people. I cannot believe otherwise.” “If this isn’t Eminem, who is it then?”

Even for its creator, the video carries an element of oddity. I played the song for a friend, an Eminem fan, without telling him it was machine-generated. “Eminem dissed him out of nowhere,” I said. “It’s crazy.” We listened in the car together, and he nodded his head. “Whoa,” he said, enjoying the song and smiling at the insults. It was only 30 seconds later, when enough of the flaws had become evident, that his smile faded: “What is this?”


The Future of AI Art

Although this project was meant to show AI’s current capabilities, it is difficult not to look toward the future. Though the output is still rudimentary, its potential can’t be denied. Here, we connected two disparate elements (AI lyrics and a synthesized voice); it’s not hard to imagine that these will eventually be linked directly. The coding will likely be simplified, with automation that collects the audio samples and builds its own voice models. Will the future be fully customizable? Can we pick our artist, their tone, the subject matter?

Imagine.

In the future, a teen peers out a rainy window. “Write me a sad Taylor Swift song,” they say. “About fall leaves.” In time, Swift’s voice is there. Pauses and pronunciations go unnoticed. The flow is unmatched, smart, concise, specific. Yet the words were never recorded, the lyrics never written. “Make it happier,” the teen says. The tone rises, the voice lilts, the instruments brighten. The song is perfectly calibrated, another artist’s voice molded, just for them.

“Good,” they say. “Repeat.”
