Close your eyes. What do you hear? Perhaps the fan of your computer, or sounds drifting in through a window. Your phone rings; you answer it and listen: it's a friend asking about plans to go to the movies. Your roommate calls out from the living room to tell you that your favorite TV show is starting.
You distinguished between noise (the fan, the ambient sounds) and relevant information (the phone, your friend, your roommate). With your eyes closed, you located both your phone and your roommate. In addition, you recognized what, or who, was producing each sound.
This is what Golem, our service robot, is learning to do. I am designing an audio module that simulates the act of human listening. This task is known as Auditory Scene Analysis, and carrying it out on a robot is the focus of the field of Robot Audition.
Golem used to carry out speech recognition through a microphone worn very close to the user (a headset). The audio module is now being migrated to a set of microphones mounted directly on the robot. In effect, I am building ears for Golem.
I teach a postgraduate course on this topic at UNAM; information about it can be found here (only available in Spanish). I can also supervise projects at various academic levels; information about those can be found here. Some highlights of this work so far:
- Integration of multiple direction-of-arrival estimates into a Human-Robot Interaction scheme.
- Tracking of several speakers in a real environment using only three microphones.
- Real-time estimation of the direction of arrival of sounds relative to the robot, over a full 360° range, with moderately high reliability in rooms with medium reverberation (a sketch of one common estimation technique appears after this list).
- Removal of reverberation from audio signals to improve speech recognition (see the dereverberation sketch below).
- Use of the reverberant residue to estimate characteristics of the environment.
- Separation of sound sources in audio signals according to their direction of arrival relative to the robot (see the beamforming sketch below).
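
To make the direction-of-arrival work concrete, here is a minimal sketch of one standard technique: GCC-PHAT applied to a single microphone pair. This is an illustration, not the exact method running on Golem; the sample rate, microphone spacing, and function names are all assumptions.

```python
import numpy as np

FS = 48000            # sample rate in Hz (illustrative)
MIC_DISTANCE = 0.2    # spacing of the microphone pair in metres (illustrative)
SPEED_OF_SOUND = 343.0

def gcc_phat(sig_a, sig_b):
    """Time difference of arrival between two channels, in samples."""
    n = len(sig_a) + len(sig_b)
    A = np.fft.rfft(sig_a, n=n)
    B = np.fft.rfft(sig_b, n=n)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return np.argmax(np.abs(cc)) - max_shift

def doa_degrees(sig_a, sig_b):
    """Source angle relative to the pair's broadside axis."""
    tdoa = gcc_phat(sig_a, sig_b) / FS
    # Clip to the physically possible range before taking the arcsine.
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```

A single pair only resolves angles in a half-plane (−90° to 90°); covering the full 360° around the robot requires combining the estimates of several pairs, which is the kind of integration the first item above refers to.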
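
For the dereverberation item, a heavily simplified sketch of one classical idea: subtract an estimate of the late-reverberation power, predicted from earlier spectrogram frames scaled by an exponential decay derived from an assumed reverberation time T60 (in the spirit of Lebart et al.'s spectral-subtraction method). All parameter values here are illustrative, not those used on Golem.

```python
import numpy as np
from scipy.signal import stft, istft

def suppress_late_reverb(x, fs, t60=0.5, late_delay=0.05, floor=0.1, nperseg=512):
    """Spectral subtraction of late reverberation under Polack's decay model."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    power = np.abs(X) ** 2
    hop = (nperseg // 2) / fs             # default hop of scipy's STFT, in seconds
    shift = max(1, int(round(late_delay / hop)))
    # Energy decays as exp(-2*delta*t) with delta = 3*ln(10)/T60.
    decay = np.exp(-2.0 * (3.0 * np.log(10) / t60) * late_delay)
    # Predicted late-reverb power: earlier frames scaled by the decay model.
    reverb = np.zeros_like(power)
    reverb[:, shift:] = decay * power[:, :-shift]
    gain = np.maximum(1.0 - np.sqrt(reverb / (power + 1e-12)), floor)
    _, y = istft(gain * X, fs=fs, nperseg=nperseg)
    return y
```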
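
Finally, for separation by direction of arrival, the simplest starting point is a delay-and-sum beamformer: each channel of a (hypothetical) linear array is advanced by its expected arrival delay so that sound from the target direction adds coherently while other directions partially cancel. Again, this is a sketch under assumed geometry, not Golem's actual pipeline.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_deg, fs, c=343.0):
    """signals: array of shape (n_mics, n_samples); mic_positions: positions
    in metres along a linear array. Steers toward angle_deg (broadside = 0)."""
    n_mics, n_samples = signals.shape
    taus = mic_positions * np.sin(np.radians(angle_deg)) / c   # arrival delays
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for sig, tau in zip(signals, taus):
        # Advance each channel by its arrival delay (a fractional shift,
        # applied in the frequency domain) so the target direction aligns.
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau),
                            n=n_samples)
    return out / n_mics
```

Delay-and-sum only attenuates interfering directions moderately; more selective spatial filters exist, but it illustrates the principle behind separating sources by their direction of arrival.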