What is voice print adaptation and how does it work?

What is it?

Voice print adaptation is the process of updating a voice print to reflect changes in user behavior and environment. It is designed to give the user the best possible voice login / verification experience, irrespective of their location or environment. In doing so, it minimizes the chance of a genuine user being falsely rejected by the voice biometric system.

Why is it needed?

When a user registers, their voice print is based on the 3 or 4 speech samples they provide. These samples are collected at a specific time and in a specific environment / location.

Subsequent login / verification attempts might take place in different locations, at different times of the day, or when the user is in a different mood. All of these factors affect how quickly and easily the user can verify their identity by voice.

In practical terms, adaptation is the process whereby additional samples of the user’s voice are used to update the voice print. This is an ongoing process that keeps the voice print current and reflective of user behavior. For example, if a user registered their voice at home in the morning but typically verifies in the afternoon at the office, the voice print will adapt over time to provide a better user experience at the office.

The end result of adaptation is that the advertised false reject rates (false negatives) are maintained without compromising security.

How does it work?

There are two types of adaptation, and both are always used. One happens automatically; the other requires a third factor to be incorporated into your app or solution.

While including a third factor such as a swipe or PIN is mandatory for a ViGo deployment, what that factor is and how it is implemented are at your discretion. The PIN or swipe in the ViGo demo app is just an example of how it could be done and how it might look. You can use any mechanism and implement it in any way you like.

‘Automatic’ adaptation, as the name suggests, is built into the voice biometric system and happens without further intervention. It is triggered if (a) the user’s login / verification attempt is successful and (b) the system decides that their ‘score’ lies within a certain range below the ‘strong accept’ threshold level. The levels at which it is triggered are not configurable.

‘Supervised’ adaptation requires the use of a third factor in your app or solution. It is triggered if the user’s ‘score’ lies within a certain range above the ‘strong reject’ threshold level (a score below this threshold is rejected outright as coming from an impostor). At this trigger point, the system has in effect rejected the user, but re-establishing ground truth determines whether the user is who they claim to be or is an impostor.

Ground truth can be re-established in any number of ways, and the mechanism used will be defined by the security requirements of your application. It could be anything from entering a user-defined PIN or swipe (as is the case in the ViGo demo app) to having the user speak to a call center agent and answer some knowledge-based questions.

If ground truth is re-established, the speech sample in question is used to adapt the voice print. The voice print then contains speech representative of the new environment, so the next time the user verifies in that environment they will score much higher.
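
The two trigger conditions described above can be summarized as a small decision routine. The following Python sketch is illustrative only and is not ViGo code or a ViGo API. The names (Outcome, Thresholds, classify, should_adapt), the numeric threshold values, and the existence of a separate ‘accept’ boundary between the two adaptation ranges are all assumptions made for the example; the real trigger levels are internal to the voice biometric system and are not configurable.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REJECT = "reject"                      # below 'strong reject': treated as an impostor
    SUPERVISED = "supervised"              # rejected by voice alone; a third factor may confirm
    ACCEPT_AND_ADAPT = "accept_and_adapt"  # accepted; automatic adaptation is triggered
    ACCEPT = "accept"                      # at or above 'strong accept'; no adaptation


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values chosen for illustration; the real levels are not published.
    strong_reject: float = 0.30
    accept: float = 0.60         # assumed boundary between the two adaptation ranges
    strong_accept: float = 0.85


def classify(score: float, t: Thresholds = Thresholds()) -> Outcome:
    """Map a verification score to one of the four outcomes."""
    if score < t.strong_reject:
        return Outcome.REJECT
    if score < t.accept:
        return Outcome.SUPERVISED
    if score < t.strong_accept:
        return Outcome.ACCEPT_AND_ADAPT
    return Outcome.ACCEPT


def should_adapt(score: float,
                 ground_truth_confirmed: bool = False,
                 t: Thresholds = Thresholds()) -> bool:
    """Return True if this speech sample would be used to update the voice print."""
    outcome = classify(score, t)
    if outcome is Outcome.ACCEPT_AND_ADAPT:
        return True                      # automatic adaptation
    if outcome is Outcome.SUPERVISED:
        return ground_truth_confirmed    # supervised: only after the third factor succeeds
    return False                         # outright reject, or strong accept
```

Under these assumed thresholds, a score of 0.45 leads to adaptation only if the user then passes the third factor (for example, a PIN), while a score of 0.10 never leads to adaptation, whatever the third factor says.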

For more information, get the ViGo Adaptation Processes document.
