Vickers, Peter George Jarvis (2024) Navigating Multimodal Complexity: Advances in Model Design, Dataset Creation, and Evaluation Techniques. PhD thesis, University of Sheffield.
Abstract
Ibn Sina, a philosopher of 11th-century Persia, wrote of a 'Floating Man'. This man floats through a void, without the use of his sight, his touch, or any of the other senses which make us human. Yet because he has a human brain, this man, according to Ibn Sina, is capable of imagining and reasoning with the capabilities of any other person. With the development of Large Language Models, the field of Artificial Intelligence has come close to making a 'Floating Man', or at least a 'Floating Man' with memories of more books than exist in the wildest dreams of the librarians of Alexandria or Oxford. In this thesis, we ask whether the 'Floating Man' of AI could benefit from more of his senses, reasoning that a great deal of our experience as humans is multimodal. Our research aims to address the limitations of current NLP models that rely heavily on textual information, often at the expense of multimodal cues. Text alone can lead to biased, harmful, or simply incorrect outcomes, such as mistaking a metal table for one made of wood due to textual biases. Such errors highlight the critical need for multimodal approaches in many applications, of which we study Visual Question Answering, Citation Recommendation, and Eye-Tracking Prediction. Through our research, we aim to show the potential of multimodality to enrich the capabilities of Artificial Intelligence.
Metadata
Supervisors: | Aletras, Nikos and Barrault, Loïc |
---|---|
Keywords: | Artificial Intelligence, Multimodal, Natural Language Processing, Computer Vision, Knowledge Graphs, AI, NLP, CV, KG |
Awarding institution: | University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
Depositing User: | Mr Peter George Jarvis Vickers |
Date Deposited: | 16 Sep 2024 09:10 |
Last Modified: | 16 Sep 2024 09:10 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:35513 |
Download
Final eThesis - complete (pdf)
Filename: Peter_Vickers_PhD_Thesis__Final_.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License