One-Shot Speaker Identification using a CNN-based Generic Verifier for a Service Robot

This page provides a CNN-based speaker identification system that does not require re-training when new speakers are encountered. This makes it well suited to service robotics applications, such as a robot acting as a waiter in a restaurant or a butler in a domestic setting.
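As a rough illustration of why no re-training is needed, the sketch below enrolls a new speaker by storing a single reference sample and identifies a probe by scoring it against every stored reference with a generic verifier. It is a minimal sketch under stated assumptions, not the released TensorFlow code: `stand_in_verifier`, `OneShotIdentifier`, the feature vectors, and the threshold value are all hypothetical, and cosine similarity stands in for the CNN verifier.

```python
import numpy as np


def stand_in_verifier(ref, probe):
    """Hypothetical placeholder for the CNN verifier.

    Assumption: the real system scores a pair of utterances with a CNN.
    Here cosine similarity, rescaled to [0, 1], is used for illustration only.
    """
    cos = float(np.dot(ref, probe) / (np.linalg.norm(ref) * np.linalg.norm(probe)))
    return (cos + 1.0) / 2.0


class OneShotIdentifier:
    """Identifies speakers from one stored reference each, with no re-training."""

    def __init__(self, verifier, threshold=0.9):
        self.verifier = verifier    # callable (reference, probe) -> score
        self.threshold = threshold  # assumed acceptance threshold
        self.references = {}        # speaker name -> single reference sample

    def enroll(self, name, sample):
        # Enrolling a new speaker only stores data; the verifier is unchanged.
        self.references[name] = sample

    def identify(self, probe):
        # Return the best-scoring enrolled speaker, or None if no score
        # reaches the threshold (the speaker is treated as unknown).
        best_name, best_score = None, self.threshold
        for name, ref in self.references.items():
            score = self.verifier(ref, probe)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


identifier = OneShotIdentifier(stand_in_verifier)
identifier.enroll("alice", np.array([1.0, 0.0, 0.0]))
identifier.enroll("bob", np.array([0.0, 1.0, 0.0]))
known = identifier.identify(np.array([0.9, 0.1, 0.0]))
unknown = identifier.identify(np.array([0.0, 0.0, 1.0]))
```

In this sketch, a probe that matches no stored reference returns `None`, at which point a robot could ask for the person's name and enroll them on the spot.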

The following video demonstrates the system with two users who are initially unknown to the system and who speak two different languages.

You can download the TensorFlow source code here.

The final model was trained using the VoxCeleb2 corpus, which you can obtain from the VoxCeleb website.

Preliminary models were trained using the LibriSpeech corpus, which you can obtain from the OpenSLR website.

You can download the LibriSpeechReal evaluation corpus here.