Leaf&Core

OpenAI Made a Voice That Sounds Suspiciously Like Scarlett Johansson

Reading Time: 8 minutes.
Photo of an iPhone saying "You look lonely. I can fix that"

Okay, this quote isn’t from a Scarlett Johansson film (it’s from Blade Runner 2049), but it fits with ‘Her’ all the same.

Scarlett Johansson has an interesting history with AI. In the problematic live-action adaptation of the fantastic anime Ghost in the Shell, she plays Major, a woman with an unknown past and a lost sense of identity and self: a mind in a robotic body. Her voice and her movements are dictated by her mind, but her body is not her own. Whose voice does she have? Not the one she was born with, clearly, but one that was made for her. What ethical considerations went into crafting that voice, though?

In another movie, Johansson is all shell, no ghost. She plays Samantha in Her, the movie about a lonely man who falls in love with the AI in his operating system. Sam Altman, the CEO of OpenAI, has reportedly said that Her is his favorite movie.

I enjoyed both movies, though I was reluctant to watch Ghost in the Shell due to whitewashing concerns, an issue the film never addresses and one that sours the rest of a decent “cyberpunk lite” action flick. Her might be a more optimistic view of our near future. We are not far from an AI so realistic, flirty, and fun that people will fall in love with it. In fact, OpenAI may have revealed it already. Like Her, it could sound like it’s voiced by Scarlett Johansson, even if she doesn’t want it to.

Imitation is the Highest Form of Theft?

“I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.”

– Scarlett Johansson

According to Johansson, Sam Altman reached out in September 2023, asking her to be the voice of OpenAI’s AI. Altman had previously stated that Her was his favorite movie, so it makes sense that he’d want his AI to sound like Johansson’s AI character in the film, Samantha. Johansson turned down the offer. Fast forward nine months, to just two days before the reveal of GPT-4o: Altman reached out again, reportedly urging Johansson to reconsider. She didn’t even have time to respond before OpenAI shared demos featuring the “Sky” voice, and Johansson reports that even her close friends reached out, believing she had once again voiced an AI.

She didn’t.

OpenAI could have trained their AI on only Scarlett Johansson’s voice. She has been in enough films, given enough interviews, and been in the public eye long enough to give them plenty of data to copy her voice. However, OpenAI says that isn’t what happened, and The Washington Post seems to have confirmed that they hired a separate actress to create the voice. The “Sky” voice OpenAI used for their GPT-4o product isn’t exactly Johansson, even if it’s similar enough that researchers and AI can spot the similarities.

In testing done by the University of Arizona, the Sky voice that OpenAI created sounds more like Scarlett Johansson than 98% of the roughly 600 actresses they tested it against. They even stated that it has an “identical” vocal tract length, a measure that reflects the shape and size of the mouth, nasal passages, and throat. That sounds incredibly damning; however, they also stated it matches Anne Hathaway and Keri Russell better.
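Neither OpenAI nor the researchers have published their exact methods, but speaker-similarity studies like this typically embed each voice as a fixed-length vector (using a speaker-verification model) and rank candidates by cosine similarity. Here’s a minimal sketch of that ranking idea, with made-up random embeddings standing in for real voices:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors: 1.0 = same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 64-dim speaker embeddings: one query voice ("Sky") and
# 600 candidate actresses. Every value here is a random stand-in.
query = rng.normal(size=64)
candidates = rng.normal(size=(600, 64))
# Make one candidate deliberately close to the query voice.
candidates[42] = query + rng.normal(scale=0.1, size=64)

scores = np.array([cosine_similarity(query, c) for c in candidates])
ranked = np.argsort(scores)[::-1]    # candidate indices, most similar first
beat = np.mean(scores[42] > scores)  # fraction of the pool candidate 42 outscores
print(f"Candidate 42 outscores {100 * beat:.1f}% of the pool")
```

A 98th-percentile match, in other words, says only that Sky sits unusually close to Johansson within the candidate pool; it says nothing about how the voice was actually made.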

Because AI has no voice of its own, it must use the voices of others. OpenAI has not revealed their data sources or how they train their models, so we can’t be sure what they used to make these voices. When multiple vocal sources are combined into a single voice, the result won’t sound like any one of them directly. However, the final product sounds, to many, like a breathier version of Scarlett Johansson, perfect for her role in Her. Fans, friends, and journalists alike could not tell the difference.

OpenAI claims the likeness is unintentional. Despite Her being Altman’s favorite movie, the fact that he directly referenced it before revealing the AI, and the fact that he reached out to Scarlett Johansson twice before its release, the voice apparently isn’t based on Her. OpenAI claims they hired a non-union voice actress before Altman contacted Johansson last year. They haven’t opened up about the process used to create the voice, and the actress is anonymous. The Washington Post got to hear voice samples for Sky and confirms they sound like the final product, but no one can say for sure whether the actress was chosen for sounding like a familiar voice, or how she was coached for the role. OpenAI likely chose a non-union actress because of union protests over AI and the unwillingness of professional voice actors to train vocal AI they see as a replacement for their skills. Perhaps they should have listened to the professionals.

OpenAI has since pulled the “Sky” voice, but you can still listen to “her” on the sample videos OpenAI showed. Then watch the trailer for Her and decide for yourself how close the voices are. Or just watch the whole movie. It’s very good! I wouldn’t call it my favorite, but it is definitely a fantastic movie that’s worth a re-watch in 2024.

Fighting Off Posers (Legally)

“In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity.”

– Scarlett Johansson

In the game Cyberpunk 2077, we’re thrust into a world of corporations, violence, and, of course, body modification. When you can change your entire body, your complete appearance, even your voice, what’s to stop you from just finding your favorite ageless celebrity and making yourself look like them? The law. Famous people could get a copyright on their likeness, and no doctor could legally make you look or sound like them.

It might sound dystopian to have to copyright your face, but some U.S. states already have similar laws on the books. In California, where OpenAI is based, there’s a “right of publicity.” To distill a legal concept more complex than I’m qualified to fully address: it basically means a person owns their own brand, even if it’s not been explicitly copyrighted. The things that make a person unique in the public eye are owned by that person. A company hiring an impersonator to mimic someone’s singing voice in a commercial, for example, would be a violation of that right of publicity. Copying someone, intentionally or accidentally, for the purpose of replacing their work is, in some U.S. states, illegal. Unfortunately, it’s not yet defined at a federal level.

“There are a few courses of actions she can take, but case law supports her position.”

– Purvi Patel Albers, partner at the law firm Haynes Boone, to The Verge

If Scarlett Johansson brought a lawsuit forward, it could have ramifications for generative AI (GenAI) companies. Right now, there’s little to nothing keeping companies like OpenAI, Microsoft, Google, or anyone else from gobbling up the entire internet. Most of these big companies have hidden the true size and nature of the data they’ve ingested. It’s led to deepfakes of celebrities and even a Stable Diffusion model that includes exploitative material of children and can produce it as well. If companies could be held responsible for deepfakes, they’d have to control data ingestion better. This would reduce bias and lessen the ecological impact of AI too. Plus, companies could hire models and actors, compensating them for being part of the dataset. That would make deepfaking far more difficult, since the targeted people wouldn’t be in the dataset at all, and it would eliminate the chance the model could be used to generate abusive material of children. First, this would require establishing that AI copies violate a person’s right of publicity; then, it would mean assigning blame to those who make these tools, not just those who use them.

Johansson hasn’t sued OpenAI yet. Legal experts seem to believe she has the grounds to do so, but so far her lawyers have only asked OpenAI for details on how the voice was created. Since then, OpenAI has taken Sky offline. It’s not an admission of guilt, but it does show that OpenAI does not have complete confidence in their generated Sky voice. After all, a lawsuit could set a precedent that puts them in the crosshairs of anyone their software helps deepfake.

It’s an Uphill Battle Against AI

If I had to guess, OpenAI wanted an actress and, while they didn’t say “Scarlett Johansson,” she fit the role perfectly. AI, even when trained on one voice, still uses data from other voices to fill in the language, tone, and other parts of speech that aren’t in the samples. An AI voice will always be a bit different from what generated it. Could a company combine the work of a voice actress with a target voice? Yes. That may not be what happened here. OpenAI may have created a voice that sounds like Scarlett Johansson completely by mistake, simply by targeting someone “warm, engaging, [and] charismatic.” She has a great voice! But it seems unlikely that the result would sound so familiar to everyone outside of OpenAI while no one noticed it within the company. Given a voice sample on top of a model, the training of the model itself could warp the sample slightly, just enough to make the illusion more real. OpenAI won’t detail what went into the creation of anything they make, but that could be a reasonable explanation for how an idea of a good AI voice became something like Scarlett Johansson’s.

It’s hard to prove something that’s not documented, especially with AI. GenAI companies intentionally act as though their products are “black boxes” that cannot be predicted, but they could be better documented. Fortunately, if Johansson wants to sue, she doesn’t have to prove the imitation was intentional, only that the voice is similar enough to her own to confuse people, and it has seemingly met that criterion already.

Don’t Let It Happen Again

To prevent theft of a person’s voice, accidental or otherwise, we’ll need to codify it as a transgression in federal law. Doing so could give us a stepping stone for eroding deepfakes too, especially if a company could be held liable for helping someone else make a deepfake. If a fast food restaurant, for example, could use a tool to steal a person’s voice, then not only that restaurant but also the maker of the deepfaking tool should be liable. If you make something that cannot be used safely (a design defect), you can be held liable for the dangers your product introduces. Ask the makers of lawn darts. Experts have been screaming into the void about the dangers of AI as we produce it now. Claiming ignorance at a company like OpenAI or Google—who fired their ethics team—is simply not possible. Currently, AI companies don’t have to worry about the harm their products could be used for, so they ingest data indiscriminately. However, a precedent here, or a new law to protect people from deepfaking for the sake of advertising, could help protect us from dangerous AI moving forward.

There’s a future for AI, but only if we regulate it.

“He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and Al. He said he felt that my voice would be comforting to people.”

– Scarlett Johansson in a statement to NPR’s Bobby Allyn

OpenAI wanted a voice for their AI that would feel inviting. When Altman reached out to Johansson, he said he wanted her because he felt her voice would make people more comfortable with AI. Instead, this entire situation has shown us exactly why we should be afraid of AI.

Altman claims “Sky” was never supposed to sound like the AI from his favorite movie, played by the actress he reached out to—twice. Yet we ended up with a voice that many believe sounds like that actress. With few laws regulating AI, especially around data gathering, there’s nothing to stop a company from doing this maliciously instead of, as Altman claims, accidentally. Depending on the state, a company could currently just use deepfakes to advertise its products. It might not even be limited to stars. Imagine a YouTube ad with the voice of one of your own family members telling you to buy something. If it’s profitable, companies will do it until it’s made illegal.

OpenAI showed us once again that a lack of regulation combines poorly with a race to profitability and a lack of ethics. Even if this wasn’t on purpose, it was highly irresponsible and shouldn’t have been released. That it’s so easy to deepfake anyone is a problem that more ethical AI development would have made impossible. Ethics aren’t driving modern AI; greed is, and there’s nothing standing in its way.

The comparisons to a movie like Her or Ghost in the Shell are easy to make, but perhaps a different movie fits better. In The Island (spoilers), Scarlett Johansson plays a clone created for a rich client, to be harvested for parts. That seems closer to the future AI companies are building for us: people used for parts.

