- OpenAI has decided to remove a newly introduced ChatGPT voice after users noted that it sounds a lot like Scarlett Johansson in the movie “Her”.
- Sam Altman had previously reached out to the actress to seek permission to use her voice, a request she declined.
- OpenAI ran auditions and reviewed 400+ submissions before finalizing the 5 voices. The entire process took about 5 months.
OpenAI has decided to remove a ChatGPT voice called “Sky” after some users noted that it sounds a lot like Scarlett Johansson’s voice in “Her”, a movie about artificial intelligence.
For those who haven’t used ChatGPT, these voices are used to read out the responses that ChatGPT offers.
‘We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.’ – OpenAI
The controversy started last week, when the company launched a set of new products, including new AI voices, GPT-4o, and a desktop version of ChatGPT.
‘Sky’ Is Flirting with Users
The likeness of the voice is not the only concern. “Sky” is also designed to be a little flirtatious, which did not go down well with some users.
It seemed to laugh and giggle at everything. For instance, it said things like, “Wow, that’s quite an outfit you have got on” or something along the lines of “Stop, you are making me blush.”
Some users pointed out on X that it sounded like a “woman written by a man”. Another said that there’s no need for the voice to be this flirtatious and obsequious.
Well, OpenAI has an answer for this too. It said that when looking for the right voice, it was looking for something that sounded approachable, charismatic, and timeless. All the voice actors were informed of the firm’s vision before the project was finalized. So they all knew what they were signing up for.
Is It Really Scarlett Johansson’s Voice?
Although users seem to feel there’s a resemblance, OpenAI says it is not actually her voice. All five voices that were introduced recently (Juniper, Ember, Cove, Breeze, and Sky) were recorded by professional voice artists.
Over 400 submissions were received initially, from which a group of 14 was shortlisted, and an internal committee then selected the final 5. The entire process lasted 5 months.
The company has also added that it cannot reveal the names of the actual artists to protect their privacy.
However, there’s a lot more to the story. According to the actress, Altman had approached her nine months ago to convince her to license her voice to OpenAI. Johansson declined the offer for ‘personal reasons’.
Two days before the launch, Altman contacted Johansson’s team again, asking her to reconsider her decision. Before any reply could come through, however, OpenAI went ahead with its launch event.
And sure enough, soon after the live demonstration, users who watched it began posting online that it sounded a lot like Scarlett Johansson. Adding fuel to the fire is a recent X post by Sam Altman on 13th May where he simply posted the word “her”.
The post suggests that Altman was well aware of the resemblance and, critics argue, wanted to play up the comparison even though Johansson had not given her permission.
OpenAI could now land in legal trouble over this. Johansson’s legal team has sent two letters to OpenAI asking it to disclose how it built an AI voice so similar to the actress’s.
‘I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference’ – Scarlett Johansson
Johansson also condemned OpenAI’s move as coming at a time when deepfakes and AI have made it difficult for people to protect their likeness, work, and identities. In January, for example, sexually explicit deepfake images of singer Taylor Swift went viral, drawing the attention of US lawmakers.
When Is the Voice Mode Coming Out?
Despite the controversy surrounding the voice resemblance, OpenAI plans to launch the Voice Mode in the coming weeks. According to a post made by the company, paid users will get early access to the feature. More new voices will also be added.
Speaking of its capabilities, the Voice Mode can perform a number of tasks, such as:
- Read out a bedtime story
- Sing the story
- Help calm a person before a public speech
- Analyze a person’s facial expressions to comment on their emotional state
The company also added that the voices will reply quickly to your queries. In its own words, a response will take “as little as 232 milliseconds”, with an average of 320 milliseconds, which is close to human response time in a conversation.