Voice AI: the key to success in the metaverse


Today, more and more people are talking about the Metaverse! Everything we see online sounds exciting but, at the same time, very far from the world we live in now.

Having high-quality, sensible conversations with virtual people is key to the success of the metaverse.

That’s why, in this article on voice AI in the metaverse, I’ll walk you through:

  • The fundamentals of voice AI and the Metaverse
  • What stage we are at now
  • What needs development
  • Predictions from industry experts 
  • My own prediction (the most important one😜)

Let’s dive right in!

The fundamentals

The Metaverse and voice AI are both broad topics, so we will go through the key fundamentals.

Don’t worry, I’ll make it as simple as possible!

What is the metaverse?

The Metaverse is a 3D virtual world where users can interact and connect with each other. 

There, people can perform several activities like playing games, conducting meetings, or talking to their friends. On the business side of things, companies can promote their products and services in the Metaverse as they do on social media.


The Metaverse is still a vague concept for most people, but some companies are working hard every day to make this virtual world a reality.

What is voice AI?

Voice AI is an umbrella term for all voice-related AI technologies. These technologies rely on machine learning to understand spoken requests, answer them, and complete tasks by voice.

It relies on sophisticated deep learning models for speech-to-text, natural language processing (NLP), and text-to-speech.

Here is how conversational voice AI works, step by step:

  1. The user speaks to the AI (Siri, Alexa, or a voice assistant on a website).
  2. The AI converts the voice message into text using speech-to-text technology.
  3. The AI analyzes the resulting text to determine the user’s request.
  4. The AI then finds the most suitable answer to that request.
  5. Finally, it converts the answer’s text back to speech and replies to the user out loud, using text-to-speech technology (e.g. Acapela or ReadSpeaker).
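To make the five steps concrete, here is a minimal Python sketch of one conversational turn. It is a toy, not how Siri, Alexa, or Vocads work internally: the `Audio` class, the keyword-based intent detection, and the product answers (including the price and dimensions) are hypothetical stand-ins for real speech-to-text, NLP, and text-to-speech models.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Hypothetical audio container. A real pipeline would hold waveform samples;
    here we carry the spoken words so that speech-to-text can be faked."""
    spoken_words: str


def speech_to_text(audio: Audio) -> str:
    # Step 2: a real speech-to-text model would decode the waveform into text.
    return audio.spoken_words.lower().strip()


def detect_intent(text: str) -> str:
    # Step 3: naive keyword matching stands in for an NLP model.
    if "price" in text or "cost" in text:
        return "ask_price"
    if "size" in text or "dimension" in text:
        return "ask_size"
    return "unknown"


# Step 4: hypothetical answers for one product in a virtual store.
ANSWERS = {
    "ask_price": "This table costs 149 euros.",
    "ask_size": "This table is 120 cm long and 75 cm wide.",
    "unknown": "Sorry, could you rephrase that?",
}


def choose_answer(intent: str) -> str:
    return ANSWERS[intent]


def text_to_speech(text: str) -> bytes:
    # Step 5: a real text-to-speech engine would return synthesized audio;
    # this stub just encodes the text so the output can be inspected.
    return text.encode("utf-8")


def handle_turn(audio: Audio) -> bytes:
    """Run one user turn (step 1 is the user speaking) through steps 2 to 5."""
    text = speech_to_text(audio)
    intent = detect_intent(text)
    answer = choose_answer(intent)
    return text_to_speech(answer)


if __name__ == "__main__":
    reply = handle_turn(Audio("How much does this table cost?"))
    print(reply.decode("utf-8"))  # This table costs 149 euros.
```

Keeping each step in its own function mirrors the pipeline: in a real system you could swap the keyword matcher for a trained NLP model without touching the speech-to-text or text-to-speech stages.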

What is the place of voice AI in the metaverse?

The metaverse will need to combine several cutting-edge technologies to work properly, and high-performing voice AI is one of the key pillars for reaching this goal.

Let me explain:

Voice AI won’t change human-to-human conversations in the metaverse; those will work as they do in online games today.

But when it comes to virtual robots, it’s a different story:

IKEA won’t double its number of employees just to have virtual humans staffing its metaverse store day and night, right?

I mean, it could, but having people in the store 24/7 would be extremely expensive.

The obvious solution is to have robots perform the tasks that human employees perform in the physical store.

We have all dealt with an employee who wasn’t very good at their job or was having a bad day; we are human, and we can’t be perfect.

People may be less patient with robots, though, so a virtual agent must be at the top of its conversational game.

Let’s take an example: 

Say you enter an IKEA store in the Metaverse to buy a table for your living room (in the metaverse or in the real world).

You walk through the doors and browse the products as you make your way down aisles lined with virtual tables.

After walking for 10 minutes, one table suddenly catches your attention and you want more information about it.

Yes, there’s a book with all the details, but that’s a lot of reading just to answer the few questions you have.

You would also like some extra information, but there is no one to ask.

Voice AI would let brands like IKEA have virtual agents on hand to solve your problem and smooth out your customer experience.

With voice AI in place, you just call out to the robot, which holds a sensible conversation with you and addresses all your concerns.

Once all your questions are answered, you’re excited to buy that cute little table, and you can do so 100% with your voice.

Does this sound far-fetched to you? Well, get ready because the technology is very close!

Startups like Vocads have made massive progress toward making all this possible, but there is still a lot of hard work ahead.

This is the place Voice AI should have in the metaverse!

What stage are we at now?

Now that we have the fundamentals, let’s take a look at what the landscape looks like today.

Voice and web 3 (the third generation of the internet; for example, social media belongs to web 2 and NFTs to web 3) have been combined on a few occasions, but a full conversation between a human and a robot inside the Metaverse hasn’t happened yet.

Voice and audio in NFTs

In the NFT space, a few projects that include audio have been created.

One example is the Voiceverse Origins collection, which created NFTs that each say a quote. The idea is to give owners a unique voice in the metaverse.

Check their collection here 

Have a look at their roadmap too; it’s pretty interesting.

Voiceverse NFTs contain a single sentence, but some artists have gone further: Booba, a French rapper, sold 2 songs as NFTs.

According to the French outlet Mouv, the first one, “TN”, sold 25,000 copies for more than $600,000. A second one, “GDC”, was released at the end of December.

Don Diablo has sold an entire concert as an NFT for $1,265,000.

Artists are increasingly turning to this technology because it involves fewer intermediaries, so they keep a higher share of the profits than they would through traditional channels.

Audius, a fully decentralized crypto application, enables artists to manage their music by themselves. Projects like this are promising, as they are a step toward a more decentralized world.

Have a conversation with an NFT

So far, we have been able to put a voice in an NFT, but what about having an actual conversation with one?

This might sound silly, but bear with me…

This isn’t possible yet, as it would require a lot of development, but here is an example of what a conversation with an NFT could look like using Vocads technology.

The NFT comes from the Otaku Origin project, which is a pretty good project; you can check out their website here.

Facial expressions that match the voice

Another important requirement for the Metaverse is for avatars to have facial expressions and lip movements that match what they are saying.

Companies such as NVIDIA are working on generating facial expressions from an audio source using deep learning.

The NVIDIA Omniverse Audio2Face project already has a partnership with Marvel, and that might only be the beginning for them.
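Audio2Face uses trained deep learning models to animate a whole face. As a much simpler illustration of the underlying idea (driving a face from sound), here is a classic amplitude-based "jaw flap" sketch in Python: it maps the loudness of each short audio frame to how wide an avatar’s mouth opens. This is not NVIDIA’s method, and the `mouth_openness` function, the frame size, and the gain value are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def mouth_openness(samples: Sequence[float], frame_size: int = 4,
                   gain: float = 2.0) -> list[float]:
    """Return one mouth-openness value in [0, 1] per frame of audio samples.

    Louder frames (higher RMS energy) open the mouth wider; `gain` scales
    loudness into openness and the result is clamped at 1.0.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    openness = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        openness.append(min(1.0, rms * gain))
    return openness


if __name__ == "__main__":
    # Hypothetical samples in [-1, 1]: silence, a loud syllable, a quiet one.
    audio = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.1, -0.1, 0.1, -0.1]
    print(mouth_openness(audio))  # roughly [0.0, 1.0, 0.2]
```

In practice a frame would usually span a few tens of milliseconds of audio rather than four samples; the tiny frame size just keeps the example readable. Loudness alone only moves the jaw, which is why learned models are needed for realistic lip shapes and expressions.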

What is ongoing and needs to be improved

So there is promising work in progress, but many areas still need improvement.

Let’s take a look at what needs to be improved for a good voice experience in the metaverse!

Improvements in voice biometrics

When we talk to robots in the metaverse, security will be the main concern, and an accurate voice biometrics system will be the way to address it!

Without voice biometrics, anyone could pretend to be you and make purchases or decisions using your avatar in the metaverse!

Let’s say you’re just looking at a phone in a Samsung metaverse store, and the robot asks, “Do you want to purchase this item?”

And your 9-month-old son, whom you just taught to say yes and no, shouts a big “YES MOMMY YES!”

You hear the robot saying “your purchase has been confirmed”.

And you end up with a $900 phone purchase!

According to the Mirror, this has already happened with Amazon Alexa: two kids bought £500 worth of toys without their mum knowing.

These events could multiply in the Metaverse without an accurate voice biometrics system!
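A common way for voice biometrics systems to verify a speaker is to turn a voice sample into a numeric "voiceprint" (an embedding) and compare it, for example with cosine similarity, against the voiceprint recorded when the account owner enrolled. The Python sketch below gates the purchase from the Samsung story on that check. It assumes the embeddings have already been computed by some speaker-recognition model; `confirm_purchase`, the three-number vectors, and the 0.8 threshold are hypothetical values for illustration, and real systems use much longer embeddings and carefully tuned thresholds.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical values: a real voiceprint comes from a speaker-recognition
# model and often has hundreds of dimensions; the threshold is tuned on data.
OWNER_VOICEPRINT = (0.9, 0.1, 0.4)
THRESHOLD = 0.8


def confirm_purchase(said_yes: bool, sample: Sequence[float],
                     enrolled: Sequence[float] = OWNER_VOICEPRINT,
                     threshold: float = THRESHOLD) -> str:
    """Confirm only if the speaker said yes AND their voice matches the owner."""
    if not said_yes:
        return "cancelled"
    if cosine_similarity(enrolled, sample) < threshold:
        return "verification_failed"
    return "confirmed"


if __name__ == "__main__":
    parent_sample = [0.85, 0.15, 0.42]  # hypothetical embedding of the parent
    child_sample = [0.1, 0.9, 0.2]      # hypothetical embedding of the child
    print(confirm_purchase(True, parent_sample))  # confirmed
    print(confirm_purchase(True, child_sample))   # verification_failed
```

With these made-up values, the parent’s sample scores about 0.998 against the enrolled voiceprint and the child’s about 0.28, so the child’s enthusiastic “YES” would not be enough to confirm the purchase.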

To learn more about voice biometrics, check out our guide on the topic here.

Voice payments to facilitate the user experience

We already mentioned voice payments in my little IKEA story earlier, but let’s talk about them in more detail.

Brands will have their own space in the metaverse, and they will sell items there.

Let’s take an example:

Say HP is trying to sell a computer in the metaverse. You might receive not only the physical computer at home but also a digital version in your virtual home.

But it won’t be convenient to leave the metaverse to fetch your credit card.

Even if you know your card details by heart, filling them in from inside the metaverse will take a long time.

We are not yet sure which payment method will be the most convenient in the metaverse, but it’s an area that needs improvement.

Voice payment combined with voice biometrics could be a way to pay in the metaverse, but a lot of work remains to be done in that area.

If you want to know more about voice payment, have a look at our guide here.

The Sandbox and Decentraland still need improvement

Recently, metaverse platforms such as The Sandbox and Decentraland have made headlines as the closest thing we have to a metaverse.

It’s a good start, but they both need considerable improvement for people to have real conversations with virtual bots.

Meta (formerly Facebook) is also working on its own metaverse where you will be able to have virtual meetings.

Let’s say a robot secretary manages who is allowed to have a meeting in the metaverse, and when and where. You need to be able to have a sensible conversation with that secretary.

Here again, voice AI is needed.

New metaverse projects, such as Metaverse GT, are popping up every day!

For the moment, the focus is mainly on VR, but competition will be tough, and whoever uses the best voice AI will have an advantage over the others!

Improve the conversation with Vocads

Nothing is certain yet about what voice AI in the metaverse will look like!

But before even thinking about web 3, what did the best conversation you have had with an AI in web 2 look like?

The first things that come to your mind might be Amazon Alexa or Google Home.

If so, you haven’t tried Vocads, the AI-powered no-code platform for creating voice conversations in any environment.

Try it free here.

Get help from a voice agency

As in many industries, not all voice agencies are ready to move into web 3.

Fortunately, some of them are!

For example, Skilled Creative, a New York-based voice agency, and its CEO Brandon Kaplan are already preparing for the transition from web 2 to web 3.

They already work with global brands in web 2 and might be one of the agencies that stand out when it comes to bringing the voice of global brands into the metaverse.

Read a great Forbes article to learn more here.

Voice might not be the first thing that comes to mind for brands willing to enter the metaverse (they will think about land and visuals first) but, at some point, they will realize that it is a true cornerstone of a good user experience.

We have also listed the best voice agencies that can set up voice technologies for your company; if you’d like to take a look, click here.

What do experts say about the Metaverse?

Jamie Beaumont, COO at SoapBox Labs

“We can’t yet predict exactly how the metaverse will develop, but what we can say with absolute certainty is that it will come (in fact in some ways, it is already here). In doing so, the future of human/computer action will evolve to become more natural and frictionless, with less reliance on a mouse, control pad, keyboard, or other physical input devices.

This inevitably means voice will become the default way that humans interact with technology. The first adopters will naturally be children and we can already see kids today growing up as “metaverse natives” in the same way as Millennials and Gen Z have grown up as digital natives. Children are at ease inhabiting these multi-faceted, play/social/experience-based platforms and embracing and adopting new technologies.

It is therefore critical that they be provided with the tools to maximize these experiences and ensure that they can do so safely and in an age-appropriate way. Voice AI can provide a range of critical services and will ultimately be a core part of the infrastructure of the metaverse.”

Elise Pinto, CEO of Vocads

“The new world on the web will not be what we imagine today, the technology is not completed and the adoption is not global. I see a first version similar to video games in the next 2-3 years. This will prepare everyone for the second one, which will come later and will be more immersive with augmented reality. This second version has a brake which is the headset. There is a lot of improvement revolving around the headset and understanding the metaverse as a whole that needs to be done for users to have a seamless experience.

For the immersion to be total, you need voice. It is the last step of the interaction that has never taken place. Voice is a natural part of any exchange and it will be the same on the next version of the web. Today, with the considerable improvement of microphones and AIs, everything is grouped together so that the voice experience will be at the center of the user experience in the next few years.”

Alexis Rocco, Marketer at Vocads (author):

“In my opinion, the metaverse is still at a very early stage, and not much will happen for the general public in 2022. That said, lots of companies are investing heavily and working hard to make the metaverse possible, and they are showing good results.

For a metaverse to be successful, technologies in various areas need to improve (avatars, voice AI, internet, VR…). Without very good conversational AI technology, having a natural conversation with any virtual being (with no human speaking live behind it) will be impossible, which will drastically decrease the quality of the experience and the opportunities users could have in the metaverse.”


Good progress has been made so far in the development of voice AI and the metaverse, but there is still a lot to improve to ensure a quality experience.

On top of that, experts don’t agree on what the metaverse will look like, or even on whether it will happen at all!

In a few years, the metaverse will likely look different from what we imagine right now, but I think the developments ahead are pretty exciting!

Now, what about you? Do you think the metaverse will happen one day?

Do you believe that voice AI will be good enough to ensure good interactions in the metaverse?

Leave us a comment with your opinion, and if you think this article could be useful to others, share it on social media!
