Nalmpantis, C., Vrysis, L., Vlachava, D., Papageorgiou, L. & Vrakas, D., "Noise invariant feature pooling for the internet of audio things". Multimed Tools Appl 81, 32057–32072 (2022). https://doi.org/10.1007/s11042-022-12931-y
This manuscript examines the noise robustness of deep learning models on two audio classification tasks. The first is speaker recognition, where the goal is to identify five different speakers. The second is speech command identification, where the goal is to classify ten voice commands. Both tasks are important for making communication between humans and smart devices as smooth and natural as possible. The emergence of smart home devices such as personal assistants, and the deployment of audio-based applications in noisy environments, raise new challenges and reveal the weaknesses of existing speech recognition systems. Despite the advances of deep learning in audio tasks, most of the proposed architectures are computationally inefficient and very sensitive to noise. This research addresses these problems by proposing two neural architectures that incorporate a novel pooling operation, named entropy pooling, which is based on the principle of maximum entropy. The networks follow two designs: convolutional and residual. A detailed ablation study evaluates entropy pooling against the classic max and average pooling layers. The study shows that entropy-based feature pooling improves the robustness of these architectures in the presence of noise.
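To make the pooling comparison concrete, the sketch below applies max pooling and average pooling to a toy 1-D feature vector, alongside a softmax-weighted (Gibbs) average. The abstract does not give the formula for entropy pooling, so the third reducer is only an assumed stand-in, not the paper's operation. It is included because Gibbs weights are the maximum-entropy distribution under a fixed expected activation, and because the result always lies between the average and the max. The function names, the `beta` parameter, and the feature values are all hypothetical.

```python
import math


def pool(values, size, reduce_fn):
    """Apply reduce_fn to consecutive non-overlapping windows of length `size`."""
    return [
        reduce_fn(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


def max_reduce(window):
    # Classic max pooling: keep the largest activation in the window.
    return max(window)


def avg_reduce(window):
    # Classic average pooling: arithmetic mean of the window.
    return sum(window) / len(window)


def gibbs_reduce(window, beta=1.0):
    # ASSUMPTION: illustrative stand-in, not the paper's entropy pooling.
    # Weights p_i proportional to exp(beta * x_i) form the maximum-entropy
    # distribution for a fixed expected activation. beta = 0 reduces to
    # average pooling; large beta approaches max pooling.
    peak = max(window)  # subtract the peak for numerical stability
    weights = [math.exp(beta * (x - peak)) for x in window]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, window)) / total


# Toy feature vector with one outlier activation in the first window.
features = [0.1, 0.2, 0.15, 3.0, 0.3, 0.25, 0.2, 0.35]

max_out = pool(features, 4, max_reduce)
avg_out = pool(features, 4, avg_reduce)
soft_out = pool(features, 4, gibbs_reduce)

for name, out in [("max", max_out), ("avg", avg_out), ("gibbs", soft_out)]:
    print(name, [round(v, 4) for v in out])
```

In the first window the outlier alone determines the max-pooled value, while average pooling dilutes it; the weighted reducer falls in between, with `beta` controlling how far. Whether the paper's entropy pooling behaves this way cannot be determined from the abstract.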