Artificial intelligence (AI) is having a moment right now, and it has even more wind in its sails with the news that Microsoft is working on an AI that can imitate any voice after being fed a three-second sample.
Dubbed VALL-E, the new tool was trained on around 60,000 hours of English-language speech data, which Microsoft says is “100 times larger than existing systems.” Because of that scale, its developers claim, it needs only a smattering of speech input to learn how to replicate a user’s voice.
Even more impressive is that VALL-E can reproduce the emotions, vocal tones, and acoustic environment found in each sample, something other speech AI programs have struggled to do. This gives it a more realistic aura and brings its results closer to what could pass for real human speech.
Compared to other text-to-speech (TTS) competitors, Microsoft says VALL-E “clearly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker likeness.” In other words, VALL-E sounds a lot more like real humans than competing AIs do when they encounter audio inputs they weren’t trained to handle.
Microsoft has published a small library of samples created with VALL-E on GitHub. For the most part, the results are very impressive, with many samples reproducing the intonation and accent of the speakers’ voices. Some of the examples are less convincing, suggesting that VALL-E is probably not yet a finished product, but overall the performance is compelling.
In a paper introducing VALL-E, Microsoft explains that VALL-E “may pose potential risks of misusing the model, such as spoofing speech recognition or the identity of a specific speaker”. Such a powerful tool for generating realistic-sounding speech raises the specter of increasingly convincing deepfakes that could mimic anyone from a former romantic partner to an international celebrity.
To mitigate this threat, Microsoft says “it’s possible to build a detection model to distinguish if an audio clip was synthesized by VALL-E.” The company says it will also apply its own AI principles as it develops this work. These principles cover areas such as fairness, security, privacy and accountability.
VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working to integrate ChatGPT with Bing, use AI to summarize your team meetings, and integrate advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion in ChatGPT maker OpenAI, a company in which it has already invested significant funds.
Despite the obvious risks, tools like VALL-E could be particularly useful in medicine, for example to help people regain their voice after an accident. The ability to replicate speech with such a small input set could hold immense promise in these situations, provided it’s done right. Either way, with all the money being spent on AI, both by Microsoft and others, it’s clear the technology is not going away any time soon.