Gaurav Naithani: Tackling the cocktail party problem with neural networks for augmented hearing devices

Photo: Aparna Gangadharan

In his doctoral dissertation, M.Sc. Gaurav Naithani investigated deep neural network-based methods for solving the classic cocktail party problem, i.e., separating individual sound sources from a mixture of multiple sound sources. He focused on applications such as hearing aids where the latency requirements are very stringent. The investigated algorithms benefit hearing-impaired listeners and are applicable to augmented hearing devices as well.

Our world is an orchestra of overlapping sounds. From bustling city streets to crowds and gatherings, deciphering specific audio streams can be a challenge. This is where deep neural networks (DNNs) have emerged as a revolutionary tool for navigating our sonic landscape. However, they have not yet been studied much in latency-critical applications such as hearing aids. With his doctoral research, Gaurav Naithani seeks to fill this gap in knowledge. In his research, latency refers to the time delay between capturing a sound in the environment and delivering it to the user’s ear after processing.

In his doctoral dissertation, Naithani focused on latency constraints and investigated different neural network architectures and training methodologies, such as different training objectives and input features. He collaborated with Eriksholm Research Centre (ERH), part of Oticon in Denmark, to conduct listening tests, which showed that the algorithmic improvements translate into benefits for hearing-impaired listeners.

According to the World Health Organization (WHO), around 5 % of the world’s population requires some kind of rehabilitation due to hearing loss. Naithani’s work is therefore an important step towards integrating deep learning-based methods into conventional hearing aids and improving the lives of hearing-impaired listeners.

As a major hearing aid manufacturer, Oticon is an ideal partner for bringing the research results into real-life use.

The research was conducted at the Audio and Speech Processing Research Group under the Faculty of Information Technology and Communication Sciences at Tampere University. The listener evaluations were conducted at ERH, Denmark.

Gaurav Naithani was born in India and earned his bachelor’s degree there. He moved to Finland to pursue his master’s degree at Tampere University.

Public defence on Tuesday 21 May

The doctoral dissertation of M.Sc. (Tech) Gaurav Naithani in the field of Computing and Electrical Engineering titled Low-Latency Single-Channel Speech Separation with Deep Neural Networks will be publicly examined at the Faculty of Information Technology and Communication Sciences at Tampere University at 12:00 on Tuesday 21 May 2024 at Hervanta Campus in the auditorium SA203 S2 of the Sähkötalo building (Korkeakoulunkatu 3, Tampere).

The Opponents will be Professor Sharon Gannot from Bar-Ilan University and Associate Professor Tom Bäckström from Aalto University. The Custos will be Professor Tuomas Virtanen from Tampere University.

The doctoral dissertation is available online.

The public defence can be followed via remote connection.
