Dr. Caleb Antonio Rascón Estebané

Current Work

Close your eyes. What do you hear? Perhaps the fan of your computer, or the sounds coming in through a window. Your phone rings, you answer it and listen: it's a friend asking about plans to go to the movies. Your roommate, from the living room, tells you that your favorite TV show is starting.

You distinguished between noise (fan, ambient sounds) and relevant information (phone, friend and roommate). With your eyes closed, you located your phone and your roommate. In addition, you recognized what or who was producing each sound.

This is what I would like a service robot to be able to do, such as the one from the Golem Group, with which I collaborate. I am designing an audio module to simulate the act of human listening. This is also known as Auditory Scene Analysis, which is a generalization of the Robot Audition project.

Robots typically either perform speech recognition using a microphone in close proximity to the user (a headset) or suffer from poor performance. The idea is for the software to collect audio information from an array of microphones mounted directly on the robot, which together act as its auditory system. I'm literally building ears for robots.

I teach a postgraduate-level course on this topic at UNAM; information about it can be found here (only available in Spanish). I can also supervise projects at various academic levels; information about them can be found here.

Recent Progress

  • Integration of multiple direction-of-arrival estimation into a Human-Robot Interaction scheme.


  • Tracking several speakers in a real environment, using only three microphones.


  • Real-time estimation of the direction-of-arrival of sounds relative to the robot, over a range of 360°, with moderately high reliability in rooms with medium reverberation.
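
To give a feel for what direction-of-arrival estimation involves, here is a minimal sketch in Python. It is not the method used on the robot: it assumes only two microphones, a single far-field source, no reverberation, and a plain cross-correlation to find the delay between channels. The function names, the speed-of-sound constant, and the sign convention are all choices made for this illustration. With two microphones the angle is only resolved within ±90°; covering 360° requires at least a third microphone.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) by which `right` trails `left`.

    Searches lags in [-max_lag, max_lag] and keeps the one that
    maximizes the cross-correlation between the two channels.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(left, right, sample_rate, mic_distance):
    """Estimate the source angle in degrees from two microphone signals.

    0 means straight ahead (broadside to the microphone pair); positive
    angles mean the sound reached the left microphone first.
    """
    # No physical delay can exceed the travel time between the microphones.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    lag = estimate_delay(left, right, max_lag)
    delay = lag / sample_rate  # seconds
    # Far-field model: delay = mic_distance * sin(angle) / speed of sound.
    sin_theta = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_distance))
    return math.degrees(math.asin(sin_theta))
```

For example, calling `direction_of_arrival(left, right, 16000, 0.2)` on two equal-length lists of samples recorded at 16 kHz by microphones 20 cm apart returns an angle in degrees. Real rooms add reverberation and noise, which is why weighted correlations and more microphones are normally needed in practice.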


Pending Work

  • Removal of reverberation from audio signals for the benefit of speech recognition.
  • Use of reverberation residue to estimate characteristics of the environment.
  • Separation of sound sources in audio signals according to their direction-of-arrival in relation to the robot.