Voice Biometrics: The Definitive Guide (2022)

Introduction

This article is about authentication, the fight against crime, and wolves.

In this complete tour of voice biometrics, we cover everything from basic notions to more advanced topics.

Why do we need voice biometrics?

From more secure authentication to crime investigation, by way of digital transaction signatures, you will discover here why and how voice biometrics will find a place in many domains!

In this article, we cover:

The problem with current authentication systems
The basics of voice biometrics
The 3 main types of voice biometric systems
4 different use cases for biometrics
The limits of current systems and how they can be beaten

If you want to see how this new technology will impact our future, you’re in the right place.

Let’s dive in!

The problem with usual authentication processes

Let me walk you through a story we’ve all already experienced.

Summer has just started, you’ve been working very hard for the last few months and start looking at options to release the pressure during the summer break.

As you’re skimming through all the unread emails in your inbox, you notice a message from your favourite band. You subscribed to their newsletter a few weeks back, don’t you remember?

The email’s subject reads “We just added new dates to our tour, book them now!”. An outdoor concert from your favourite band: that would be a great reward for the effort you’ve put in over the last few months, wouldn’t it?

However, you notice that the email dates back to a few days ago, and you know how popular the band is, so it’s unlikely that there are still tickets left …

Your curiosity and hope push you to open that email and check the booking link for tickets on sale. You end up on a page that says “only 2 places left”: you have to book fast!

You already imagine yourself in the show, singing along to your favourite songs. You click on “Get my ticket now” and … a page shows up asking you for your email and password to complete your order.

You want to create an account but the page says that there already is one in your name. You remember that you created one 2 years ago to book tickets for another show; but what was the password again?

Hoping to save time, you go to the login page and try your classics: “Password123”, “password123”, “PASSWORD123” (nobody remembers where those caps are), “P@$$w0rd123”, …

After 2 minutes of looking for the right key, you realise there’s no way you will get through authentication like that and ask to reset your password.

This will take a few minutes but at least you know that you’re almost done.

While you’re waiting for that password reset email, your vision of being sat in the concert hall starts drifting away, escaping your fingers like a pile of sand.

The fear of having had a false hope rises as time ticks and someone else may take your spot.

In the end, you still go through an authentication and get your ticket, after resetting your password to a new one you have already forgotten.

This story perfectly illustrates the weaknesses of the email/password identification scheme: you never remember your passwords because they have to be more and more complicated, and you end up resetting them every time you want to log in to a service you don’t use very often.

This is one of the core areas where voice biometrics can help.

What is voice biometrics?

If you’re unsure about what voice biometrics refers to, the next few lines will make it crystal clear for you. You may already have heard of the word biometrics in movies, but what is it exactly?

Biometric identification uses statistical methods on biological characteristics to identify a person or a group. The most commonly used biometric features for identification are fingerprint, face, voice and typing cadence (the speed at which you type).

Have a look at Apple: TouchID and FaceID are two examples of biometric identification.

Voice biometrics is the subset of biometric identification that relies on voice!

Also known as voice verification or speaker identification, voice biometrics allows fast, seamless and secure authentication in applications ranging from websites and mobile apps to call centres and voice assistants.

Will we someday see a VoiceID on our Apple devices? Let’s discuss that further down.

How does voice biometrics work?

At this point, you may be wondering: “Ok, I’ve got the fact that it’s all about identifying someone from their voice, but how does it actually work?”.

Well, just as finger biometrics relies on fingerprints, voice biometrics relies on … voiceprints!

I’m sure you know that fingerprints are unique to every individual as the police often rely on them to identify criminals from the small finger traces they leave behind them.

This is no surprise when you know that fingerprints are shaped by the movements a baby makes and the surfaces it touches while still in the womb.

The randomness in all those movements makes every person’s finger pattern unique.

You may now wonder: “But what’s the link with voice?”. The answer to this becomes very clear once you know that there are more than 70 of your body parts (mouth, tongue, throat muscles, …) involved in shaping the sound of your voice.

Each of these body parts’ shape, size and harshness are unique to you, making your voice unique as well.

Voice biometrics authentication creates your voice signature from a recording of your voice and then uses it to identify you later. Whether or not it relies on you saying a specific phrase depends on the system; more on that below.

Greatly improved by recent advances in AI, voice biometrics works in two steps: first your voiceprint is created from a recording of your voice, then it is used to recognise you.

Once your voiceprint has been created, the system is good to go and you can authenticate using your voice, isn’t that amazing?

To recognise you, the voice biometrics engine computes a voiceprint from your new recording and checks that it matches the one linked to you in its database. If it does, you’re in!
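To make the two steps more concrete, here is a minimal sketch in Python. It assumes that some trained model (not shown here) has already turned each recording into a vector of numbers, often called an embedding. The function names, the toy 3-dimensional vectors and the 0.8 threshold are all illustrative assumptions, not the method of any particular vendor.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enrol(embeddings):
    """Build a voiceprint by averaging the embeddings of several enrolment recordings."""
    count = len(embeddings)
    return [sum(values) / count for values in zip(*embeddings)]

def verify(voiceprint, embedding, threshold=0.8):
    """Accept the speaker if the new embedding is close enough to the stored voiceprint."""
    return cosine_similarity(voiceprint, embedding) >= threshold

# Toy embeddings (assumption): real systems use much longer vectors computed from audio.
alice_print = enrol([[0.9, 0.1, 0.2], [1.0, 0.0, 0.3]])
same_speaker = verify(alice_print, [0.95, 0.05, 0.25])   # close to the voiceprint
other_speaker = verify(alice_print, [0.1, 0.9, 0.4])     # far from the voiceprint
```

Averaging a few enrolment recordings is only one simple way to build a voiceprint; the comparison step is the part that matters here, since it turns "does this sound like you?" into a score checked against a threshold.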

Modern voice biometrics algorithms are advanced enough to work with language-independent voiceprints. In other words, these technologies can be used with any language in any country.

Passive VS Active Voice Biometrics: What’s The Difference?

Now that you’re more familiar with the basics, let’s dive into the different families of voice biometrics.

Voice biometrics systems can be divided into 3 main categories:

Text-dependent active voice biometrics
Text-independent active voice biometrics
Text-independent passive voice biometrics

Let’s dive in and understand together what those terms mean.

First, text-dependent methods ask the speaker to say a specific short phrase in order to be recognised.

For instance, HSBC asks their customers to pronounce “My voice is my password” to log in to their call support lines. This means that you have to explicitly say a specific phrase for the system to be able to identify you.

In contrast, text-independent systems aim to identify speakers no matter what they say. In this paradigm, you don’t have to say anything specific for the system to recognise you: you can say anything you want.

Now that we know the difference between text-dependent and text-independent identification, what differs between active and passive voice biometrics?

It all comes down to whether you have to actively do something to be identified, or whether it can be done passively without you doing anything specific.

In the case of active authentication, the system tells you something like “To authenticate, please say ‘My voice is my password’” (text-dependent) or “To authenticate, please say something” (text-independent). This is used, for instance, in Amazon call centres.
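Here is a rough Python sketch of how the two active modes could differ in practice. It assumes two inputs that are not covered in this article: a transcript of what the user said (from some speech recognition step) and a voice similarity score between 0 and 1 (from some speaker model). The names active_check and normalise and the 0.8 threshold are hypothetical.

```python
def normalise(text):
    """Lower-case the text and drop punctuation so transcripts compare fairly."""
    kept = [c for c in text.lower() if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).split())

def active_check(transcript, voice_score, passphrase=None, threshold=0.8):
    """Decide an active authentication attempt.

    passphrase=None means text-independent: only the voice score counts.
    Otherwise (text-dependent) the spoken words must also match the passphrase.
    """
    if passphrase is not None and normalise(transcript) != normalise(passphrase):
        return False
    return voice_score >= threshold

phrase = "my voice is my password"
text_dependent_ok = active_check("My voice is my password.", 0.91, passphrase=phrase)
wrong_phrase = active_check("Open sesame", 0.91, passphrase=phrase)
text_independent_ok = active_check("Anything at all", 0.91)
```

In the text-dependent case the right voice saying the wrong words is rejected, while the text-independent case only looks at the voice.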

But then, what is passive voice biometrics? This is when the identification system recognises you in the background, without you having to do anything particular.

For this reason, passive voice biometrics is mostly text-independent as we don’t want to impose anything on the speaker.

Passive voice biometrics systems can come in pretty handy for smooth user authentication or crime investigation, as we will see below.

Hopefully, now the difference between text-dependent and text-independent as well as between passive and active voice biometrics systems is clear to you.

Let’s now dive into the different places where this technology is being used today.

Where is voice biometrics used today?

Voice authentication

The natural first thought when it comes to voice biometrics applications is voice authentication aka logging into an account with your voice.

This application revolutionises the process of authentication by removing any dependency on knowledge (“Do you remember your password?”) or access to a device (“Do you have the phone to which we just sent your verification code?”).

Instead, it all now relies on you being yourself, making the whole flow much smoother.

Voice authentication can now be seen in many applications such as:

Authentication on online websites and apps;
Telephone and online transactions;
Interactive Voice Response (IVR)-based banking and healthcare systems;
or even audio signatures to digital documents.

Voice authentication would be a great fit for giving customers a full online voice commerce experience. Take them through your store with a Vocads campaign, then authenticate them with their voice to validate the purchase.

Crime investigation

Voice identification is not restricted to authentication. Indeed, think of criminal cases where the police have an audio recording of a suspect they would like to identify.

Using voice biometrics, they can trace the recording back to the suspect’s identity and take major steps towards solving the case.

What if no suspect matches? Can voice signals still help?

If no particular person matches the recording, investigators can still get an estimate of the speaker’s demographics, including age, gender and origin, all precious clues for potential high-stakes cases.

Voice biometrics reportedly made a key difference in the Rebecca Zahau case. Her death was first classified as suicide in 2011 but reclassified as homicide in 2016 after analysing “her” call to 911 with voice biometrics to determine that it wasn’t her speaking.

Zoology

On a lighter note, speaking of identifying individuals from the sounds they make, a group of international researchers applied voice biometrics to a group of wolves in Yellowstone National Park and, from their “howl prints”, could determine which individual a sound came from. How amazing is that!

Healthcare and diagnosis

Lastly, other researchers recently focused on extracting voice-based disease biomarkers using AI and huge patient databases.

The objective here is to get as much information about a patient’s health state from an audio sample.

For example, doctors try to answer questions like: “Based on this recording, how likely is this patient to have disease A?” “At which stage of disease B is this patient?”.

See for example this article from MIT News describing a system that detects asymptomatic Covid-19 cases from cellphone-recorded coughs.

In a nutshell …

From the well-known voice authentication to the lesser-known yet very helpful use cases in criminology, healthcare and even zoology, voice biometrics has many applications today.

The most recent advances in AI research promise a bright future for the discipline with many great products to come!

Voice biometrics providers

Now that we understand why voice biometrics is important, how it works and where it can be applied, you may wonder: “But how can I set voice biometrics up in my application?”

Don’t worry, we’ve got your back.

Here is a selection of the top voice biometrics companies to date.

Whispeak

Whispeak provides a solution for faster, easier and more secure authentication using voice biometrics.

Their software makes it easy to add a voice biometric factor in authentication processes.

We will see below that voice has a nice place in multi-factor authentication and Whispeak can help here.

Their solution can be deployed on call centres, websites or even mobile and embedded devices.

Want to learn more? Go check the demo on their website here!

ID R&D

ID R&D is a leading company in biometric authentication.

At the forefront of technology, they offer a wide range of products to add biometric authentication to your business.

Their main products perform face, document and voice liveness checks as well as voice biometrics.

NICE

As part of their contact centre automation range, NICE offers a voice authentication product.

Their solution uses passive voice authentication to seamlessly register users during their journey on the contact centre.

This, in addition, makes the user’s journey smoother, faster and safer while lowering your costs.

Phonexia

If there’s one company in the list that may be used by secret services, it’s this one.

Phonexia is a Voice Verification and speech recognition software company with over 15 years of experience in the market and projects in over 60 countries.

Their product range includes 4 main technologies all leveraging the power of voice biometrics:

Voice Verify authenticates users in seconds using their voice
Orbis Fraud Detection detects call centre fraudsters from their voice
Orbis Investigator is an all-in-one audio investigation tool for detailed audio analysis
Speech Platform empowers you to find speakers and other information in large amounts of audio data

Phonexia has free online demos on their website. Go check them out if you’re interested!

Omilia

Omilia offers a wide range of voice-related technologies. From conversational AI to voice biometrics, it’s all there!

Their promise is to provide businesses with a tool to make their customer experience smoother and more flexible, while reducing costs.

DiaManT® Anti-Fraud, the shiniest of their products, uses voice biometrics for proactive end-to-end fraud prevention.

From voice to device type, by way of geolocation, they check everything to keep the baddies out of your way.

Veridas

With over 50 million end users in just 4 years, Veridas is all about biometric identification.

Their solutions cover both face and voice biometrics.

Their voice biometrics products promise to authenticate users fast and securely.

Use their voice biometric authentication to increase sales funnel conversion, smoothen your customer experience and reduce cost.

They also have a solution to protect you from fraudsters and all their products comply with the latest regulation.

Nuance

And last but definitely not least: Nuance!

Nuance is a major player in AI-powered voice technologies.

Recently acquired by Microsoft for an astonishing sum close to $20 billion, Nuance demonstrates the power of voice technologies in many sectors from retail to healthcare.

In the context of this blog post on voice biometrics, three of Nuance’s products are particularly interesting:

Voice authentication authenticates users from their voice, leading to a smoother and safer experience for them and lower operational costs for you;
Fraud prevention stops fraud at the source by identifying and incapacitating fraudsters at the root;
Nuance Gatekeeper puts it all together in a cloud-native solution.

This company is a real leader in the industry of voice technologies, go check them out!

Can voice biometrics be beaten?

In this post, we’ve seen what voice biometric systems are, how they work, what the problems with current authentication methods are and how voice biometrics can help with that.

Then, one natural question arises: “How accurate is voice biometrics? Can it be beaten?”.

This is an essential question to ask. Indeed, if we are to rely on these procedures for banking and healthcare services in the future, they had better be rock solid!

Before going deeper into the topic, it’s key to define more precisely what is meant by “accurate”.

When you think about it, there are 2 types of errors an authentication system can make:

let the wrong person in (false accept);
reject the right person (false reject).

Both are quite bad: a false accept gives access to somebody who shouldn’t get it, leading to potentially massive privacy and security issues, while a false reject prevents you from accessing something you should be able to reach.

As you can guess, there is a balance between the 2 types of errors above. Decreasing the number of false rejects by letting more people in will increase the number of false accepts.

Similarly, decreasing the false accept rate by letting fewer people in will increase the false reject rate.

Voice biometrics systems usually come with an adjustable threshold for users to define whether they want to favour one or the other.
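To see this trade-off with numbers, here is a small Python sketch. It assumes the system outputs a similarity score for each attempt and accepts any score at or above the threshold; the score lists below are made up for illustration and do not come from any real product.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false accept rate, false reject rate) for a given threshold.

    A genuine attempt scoring below the threshold is a false reject;
    an impostor attempt scoring at or above it is a false accept.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Made-up similarity scores (assumption): higher means "sounds more like the claimed user".
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.30, 0.55, 0.72, 0.41, 0.60]

lenient = error_rates(genuine, impostor, threshold=0.5)  # lets more people in
strict = error_rates(genuine, impostor, threshold=0.8)   # lets fewer people in
```

With these toy scores, the lenient threshold rejects no genuine user but accepts 3 of the 5 impostors, while the strict threshold accepts no impostor but rejects 2 of the 5 genuine users.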

Somewhere in this balance is a point called the Equal Error Rate (EER), where the proportions of false rejects and false accepts are equal. This number is used as a metric for system performance.

Let’s clarify the above sentences with an example. If a voice authentication system has an EER of 10%, then with the threshold set at that point, 10 out of 100 legitimate users will be rejected despite being themselves and 10 out of 100 fraudsters will mistakenly be let in under someone else’s name.
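As an illustration, the sketch below estimates the EER from two lists of similarity scores by trying every observed score as a threshold and keeping the one where the false accept and false reject rates are closest. The scores are invented so that the result lands on the 10% figure used above; production evaluations typically use far more trials and interpolate between thresholds.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (EER estimate, threshold) where false accept and false reject rates are closest."""
    best_gap, best_eer, best_threshold = None, None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # Impostors scoring at or above the threshold are falsely accepted.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # Genuine users scoring below the threshold are falsely rejected.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, threshold
    return best_eer, best_threshold

# Invented scores (assumption): 10 genuine attempts and 10 impostor attempts.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.59, 0.93, 0.81, 0.90]
impostor = [0.30, 0.55, 0.72, 0.41, 0.60, 0.25, 0.48, 0.35, 0.52, 0.19]

eer, threshold = equal_error_rate(genuine, impostor)
```

Here one genuine attempt out of ten falls below the chosen threshold and one impostor attempt out of ten reaches it, so both error rates are 10%.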

If you have followed me this far, you are probably thinking that a system with an EER of 0 would be perfect, and you’re absolutely right!

However, such a perfect performance cannot be reached in practice and companies are happy with a slightly higher EER as voice authentication carries many benefits.

To get a better idea of the benefits, you can check Nuance’s voice authentication cost savings calculator.

Most industry-leading voice biometrics companies advertise an EER of around 10-20% in real-life scenarios, which they appear to be satisfied with.

However, the current state of the art in research shows EERs down to around 2%, and I have no doubt that these new algorithms will soon reach the market.

On a different note, where standard passwords are threatened by data leaks, voice authentication systems are under increasing pressure from voice spoofing.

Indeed, an attacker could record their target’s voice and replay it to get access to whatever is protected by voice authentication.

Text-independent (see the definition above) authentication systems are particularly vulnerable to those attacks as they don’t require users to say anything specific, which makes text-dependent voice authentication more robust.

Requiring users to say something specific is however not a perfect defence, considering the massive leaps made in deepfakes and voice synthesis in the last few years.

In a world where anyone can synthesise a target’s voice from a few seconds of recorded speech, an attacker could break that target’s voice authentication.

Researchers, engineers and companies like ID R&D who work on voice biometrics systems are aware of those threats and are developing countermeasures to limit their potential impact, if not eliminate them.

In the end, this is the good old cat-and-mouse game, where some build increasingly robust security walls while others find smarter and smarter ways to get through without permission.

To build secure systems, a good compromise, for now, is to use voice biometrics as part of multi-factor authentication. That way, to be granted access, you need the correct password, your phone and your voice.
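A minimal Python sketch of such a three-factor check could look like this. The LoginAttempt fields, the grant_access function and the 0.8 voice threshold are hypothetical names and values chosen for illustration; each factor is assumed to have been verified by its own component beforehand.

```python
from dataclasses import dataclass

@dataclass
class LoginAttempt:
    password_ok: bool      # something you know: the password matched
    device_code_ok: bool   # something you have: the code sent to your phone matched
    voice_score: float     # something you are: similarity to your voiceprint, 0 to 1

def grant_access(attempt, voice_threshold=0.8):
    """Grant access only when all three factors pass."""
    return (attempt.password_ok
            and attempt.device_code_ok
            and attempt.voice_score >= voice_threshold)

legit_user = grant_access(LoginAttempt(True, True, 0.92))
# A replayed or synthesised voice alone is not enough without the phone.
spoofed_voice_only = grant_access(LoginAttempt(True, False, 0.95))
# A leaked password and stolen phone are not enough without the voice.
stolen_credentials = grant_access(LoginAttempt(True, True, 0.35))
```

The point of the design is that an attacker now has to defeat all three factors at once, so a leaked password or a spoofed voice on its own no longer opens the door.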

Companies like banks already implement variants of this pipeline to make their customers’ journey ever more secure and smooth.

Conclusion

I hope you enjoyed this article on voice biometrics.

Now I want to hear from YOU!

Would you agree to use your voice as the key to your online banking system? What do you think about using voice biometrics for crime investigation?

And, most importantly, were you surprised by the use case on wolves?

Share your opinion in the comments!

If you know someone interested in this type of topic, send them a link to this post.

If you’ve appreciated our work, the best you could do to spread the word would be to share this article on your social media!
