Barahona-Ríos, Adrián ORCID: https://orcid.org/0000-0001-7538-2793 (2023) Deep Learning for the Synthesis of Sound Effects. PhD thesis, University of York.
Abstract
In media production, the sound design process often involves the use of pre-recorded sound samples as the source of the audio assets. However, the increasing size and complexity of interactive media, such as video games, may render this process very time-consuming and memory-demanding.
In contrast, the use of sound synthesis for sound effects can improve the sound palette of media, tackling the challenges derived from current workflows. These synthesised sound effects are usually generated using digital signal processing (DSP) methods. Nonetheless, creating sound effects using DSP methods may be challenging, and can produce unsatisfactory results, which hampers their adoption among audio professionals.
Recent data-driven approaches propose an alternative to these DSP methods for the synthesis of audio, surpassing them and establishing the state of the art in sound generation. This thesis explores the suitability of DSP systems, generative deep learning architectures, and a combination of both for the synthesis of sound effects, with an especial focus on game audio.
The results show that some DSP methods, with constraints, can be perceptually effective for this task. Furthermore, it is shown: how generative deep learning methods, not necessarily bound by those constraints, are not far from achieving a plausibility comparable to pre-recorded samples; how they can also be trained in data-scarce scenarios outperforming DSP approaches in plausibility and variation of the synthesised sounds; and how a combination of deep learning and DSP processes can be used to build expressive models, linking human-interpretable controls to the output audio.
The implications of the proposed work suggest that both generative deep learning methods and their combination with DSP approaches contribute to addressing the challenges that hamper the adoption of synthesised sound effects. This work could lead to the establishment of novel data-driven workflows tailored to the preferences of audio professionals, in line with current industry demands.
Metadata
| Supervisors: | Pauletto, Sandra and Wells, Jez and Collins, Tom |
|---|---|
| Keywords: | deep learning, sound synthesis, procedural audio, digital signal processing, neural audio synthesis, game audio, sound design |
| Awarding institution: | University of York |
| Academic Units: | The University of York > Computer Science (York) |
| Depositing User: | Dr Adrián Barahona-Ríos |
| Date Deposited: | 23 May 2024 09:19 |
| Last Modified: | 23 May 2024 09:19 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:34929 |
Download
Examined Thesis (PDF)
Filename: Barahona_Rios_Thesis.pdf
Description: PhD Thesis
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.