Entry Date: January 20, 2017

Understanding Real-World Auditory Scene Analysis

Principal Investigator Josh McDermott

Project Start Date April 2015

Project End Date March 2020


A fundamental question in auditory science concerns how people can recognize speech and other sounds in the presence of competing sound sources, as when conversing with a dinner partner at a crowded restaurant. The process of hearing a sound of interest when it is embedded in a mixture of other sounds is known as "sound segregation", and human listeners vastly outperform machine systems at segregating sounds. However, the process is frequently effortful and is highly vulnerable to hearing impairment, including the hearing impairment that typically accompanies normal aging. Understanding the basis of sound segregation in human listeners, and the factors that limit human segregation abilities, would enhance efforts to develop assistive listening devices and machine systems for robust speech and sound recognition. The project will be complemented by an educational effort to stimulate interest in audition among the general public and middle- and high-school students through a series of publicly available online video presentations that describe auditory research and include accompanying sound demonstrations.

This CAREER award aims to enrich the understanding of human auditory perception by exploring the basis of sound segregation for natural sounds. The experiments will leverage recent advances in speech analysis and synthesis methods to 1) manipulate grouping cues in natural speech and test their effect on sound segregation in human listeners; 2) manipulate voice and speech structure to probe their role in segregation; and 3) test the ability of human listeners to attend to and track target sound sources. The long-term goals are to inspire signal-processing algorithms that facilitate segregation by human listeners, and to replicate human segregation abilities in machine systems.