Masked Conditional Neural Networks for Sound Recognition

Abstract

Sound recognition has been studied for decades with the aim of giving machines a human-like hearing ability. Advances in this field support a range of applications, from industrial ones such as fault detection in machines and noise monitoring to household ones such as surveillance and hearing aids. Like any pattern recognition task, sound recognition depends on the reliability of both the extracted features and the recognition model. For decades the problem has been approached using handcrafted features together with models based on neural networks or statistical models such as Gaussian mixture models and hidden Markov models. Neural networks are now being considered as a way to automate the feature extraction stage in addition to their established role in recognition, and the performance of such models is approaching that achieved with handcrafted features. However, current neural network based models are not primarily designed for the nature of the sound signal, so they may not optimally harness its distinctive properties. This thesis proposes neural network models that exploit the nature of the time-frequency representation of the sound signal. We propose the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN). The CLNN is designed to account for the temporal dimension of a signal and serves as the framework for the MCLNN. The MCLNN allows a filterbank-like behaviour to be embedded within the network using a specially designed binary mask. The masking subdivides the frequency range of a signal into bands and allows different feature combinations to be considered concurrently, analogous to manually handcrafting the optimum set of features for a recognition task. The proposed models have been evaluated through an extensive set of experiments using a range of publicly available datasets of music genres and environmental sounds, where they surpass state-of-the-art Convolutional Neural Networks and several approaches based on handcrafted features.
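
The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's exact mask construction: the function names `band_mask` and `masked_weights`, the parameters `bandwidth` and `overlap`, and the wrap-around rule for the band position are all assumptions chosen for illustration. The sketch only shows how a binary mask, multiplied element-wise with a weight matrix, could restrict each hidden unit to a band of neighbouring frequency bins, in the spirit of a filterbank.

```python
def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Return an n_features x n_hidden binary mask as a list of lists.

    Hypothetical construction: column j holds ones on a run of up to
    `bandwidth` consecutive feature rows. Successive columns shift the
    run by (bandwidth - overlap) rows, and the start wraps back to
    row 0 once it passes the last row.
    """
    step = bandwidth - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than bandwidth")
    mask = [[0] * n_hidden for _ in range(n_features)]
    for j in range(n_hidden):
        start = (j * step) % n_features
        # The band is truncated if it would run past the last feature row.
        for i in range(start, min(start + bandwidth, n_features)):
            mask[i][j] = 1
    return mask


def masked_weights(weights, mask):
    """Element-wise product: connections outside a band become zero."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


# Six frequency bins feeding four hidden units, bands of three bins
# that overlap their neighbour by one bin.
mask = band_mask(n_features=6, n_hidden=4, bandwidth=3, overlap=1)
weights = [[2.0] * 4 for _ in range(6)]  # placeholder weights
masked = masked_weights(weights, mask)
for row in mask:
    print(row)
```

In this sketch each column of the mask plays the role of one band-limited filter, and columns that wrap around to the same band position give several hidden units a view of the same frequency region, which loosely mirrors the "different feature combinations" mentioned in the abstract.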

Metadata

Supervisors:	Chesmore, David and Robinson, John
Related URLs:	International Conference on Artificial Neural Networks and Machine Learning, ICANN, 2017 (Related publication); IEEE International Conference on Data Mining, ICDM, 2017 (Related publication); International Conference on Neural Information Processing, ICONIP, 2017 (Related publication); IEEE International Conference on Machine Learning and Applications, ICMLA, 2017 (Related publication); Advanced Data Mining and Applications, ADMA, 2017 (Related publication); Artificial Intelligence XXXIV, SGAI, 2017 (Related publication); IEEE International Conference on Data Science and Advanced Analytics, DSAA, 2017 (Related publication); YorNoise sound dataset (Research data); Code (Research data)
Awarding institution:	University of York
Academic Units:	The University of York > School of Physics, Engineering and Technology (York)
Academic unit:	Electronic Engineering
Identification Number/EthosID:	uk.bl.ethos.759903
Depositing User:	Fady Medhat
Date Deposited:	04 Dec 2018 11:03
Last Modified:	21 Mar 2024 15:12
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:21594
