Meta introduces text-to-speech generative AI model Voicebox

Meta is developing a new tool that makes use of generative AI, the technology underlying the popular chatbot ChatGPT. The Voicebox tool lets people create speech using simple text inputs and audio samples. Additionally, Meta claims that Voicebox can remove unwanted background noise from audio samples.

However, Voicebox is not yet available to testers and may remain restricted for some time, unlike other generative AI tools such as ChatGPT and Bard or AI image generators such as DALL-E and Midjourney. That’s because, according to Meta, Voicebox can be misused and poses significant risks.

Voicebox is a voice-generation AI that supports several operations: reading input sentences in a natural voice, reading input sentences in another person’s recorded voice, reading them in that voice with a specified intonation, and editing part of a recording of another person’s voice. A demonstration video shared on Twitter shows these capabilities.

How does Meta Voicebox operate?

In its announcement, Meta claimed that Voicebox significantly outperforms its competitors. For example, it says Voicebox can generate speech up to 20 times faster than the competing model VALL-E, with a word error rate of 1.9% compared with VALL-E’s 5.9%.
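
To make the error-rate figures more concrete, here is a minimal sketch of how a word error rate is commonly computed, assuming the quoted figure is a word error rate: the number of word substitutions, insertions, and deletions needed to turn a transcript of the generated audio into the intended text, divided by the number of words in the intended text. The function name and the example sentences are hypothetical and are not taken from Meta's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, chosen only for illustration.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # prints "WER: 22.2%"
```

Two of the nine reference words are wrong in this example, which gives 2/9, or about 22.2%. A lower figure means the generated speech is transcribed back into the intended text more accurately.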

According to Meta, Voicebox is built on a method known as “Flow Matching.” Meta says this approach lets the model learn the complex, non-deterministic relationship between text and speech. That, in turn, allows Voicebox to train on a larger and more diverse set of data, increasing its flexibility and power.
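
As a rough illustration of the idea behind Flow Matching, and not Meta's implementation, the sketch below builds one training example in the style of the conditional flow matching objective from the research literature: a random noise sample is moved along a straight path toward a data sample, and a model would be trained to predict the velocity along that path. The function names, the `sigma_min` value, and the three-number "audio frame" are assumptions made for illustration; Voicebox itself works on audio features conditioned on text.

```python
import random


def flow_matching_example(x1, sigma_min=1e-5, rng=random):
    """Return (t, x_t, target_velocity) for one data sample x1.

    x_t lies on a straight path from Gaussian noise (t = 0) to a point
    very close to the data sample x1 (t = 1).
    """
    t = rng.random()                         # random time in [0, 1)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * n + t * d for n, d in zip(x0, x1)]
    # Velocity of the path at time t; this is what a model would learn to predict.
    target = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return t, x_t, target


def mean_squared_error(predicted, target):
    """Training loss: average squared gap between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Hypothetical "audio frame" of three feature values.
frame = [0.2, -0.5, 1.0]
t, x_t, target = flow_matching_example(frame, rng=random.Random(0))

# Stand-in for a neural network: always predicts zero velocity.
prediction = [0.0 for _ in x_t]
print("loss for the placeholder model:", mean_squared_error(prediction, target))
```

In a real system the placeholder prediction would come from a neural network that receives `x_t`, `t`, and the text, and the loss would be minimized over many samples. At generation time, the learned velocities would be followed step by step from noise to speech features.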

Currently, Voicebox can generate speech in English, French, German, Spanish, Polish, and Portuguese. Meta states the technology is “exciting” as it can help people communicate in natural and authentic ways “even if they don’t speak the same languages.”

Voicebox can also be used for audio editing. In a demonstration, Meta shows the program removing the sound of a barking dog from an audio sample. Zoom and Google Meet already offer similar noise-filtering options.

Potential Risks of Misuse

Meta says the company is not “making the Voicebox model or code publicly available at this time” because of the “potential risks of misuse.” It adds, “While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it’s also necessary to strike the right balance between openness with responsibility. With these considerations, today we are sharing audio samples and a research paper detailing the approach and results we have achieved.”

The decision may sit well with some critics, although it could also indicate that Meta is still working on Voicebox and that the tool is not yet complete. Earlier this year, text-to-image AI generators were used to create images of Elon Musk, Barack Obama, and Donald Trump in various settings and attire. In India, where the fight against fake news on WhatsApp continues, an AI-generated voice sample could likewise be a nightmare for politicians. Altered or AI-produced audio samples could be misused in the same way.

Meta adds that it plans to “investigate proactive methods for training the generative model such that the synthetic speech can be more easily detected, such as embedding artificial fingerprints.”
