Toumpa, Alexia ORCID: https://orcid.org/0000-0003-4438-6809 (2023) Categorization of Affordances and Prediction of Future Object Interactions using Qualitative Spatial Relations. PhD thesis, University of Leeds.
Abstract
The application of deep neural networks on robotic platforms has successfully advanced robot perception in tasks related to human-robot collaboration scenarios. Tasks such as scene understanding, object categorization, affordance detection, and interaction anticipation are facilitated by acquiring knowledge about the object interactions taking place in the scene.
The contributions of this thesis are two-fold:
1) it shows how representations of object interactions, learned in an unsupervised way, can be used to predict categories of objects according to their affordances;
2) it shows how frame-independent future interactions can be predicted in a self-supervised way by exploiting high-level graph representations of the object interactions.
The aim of this research is to create representations and perform predictions of interactions which abstract from the image space and generalize across various scenes and objects. Interactions can be static, e.g. holding a bottle, or dynamic, e.g. playing with a ball, where the temporal ordering of several static interactions is what makes the dynamic interaction distinguishable. Moreover, occlusion of objects in the 2D domain must be handled to avoid false-positive interaction detections. Thus, RGB-D video data is exploited for these tasks.
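As an illustration of why depth matters for occlusion handling, the sketch below classifies a pair of detections into a qualitative relation using 2D bounding-box overlap plus a depth check. The `Detection` structure, the relation names, and the depth tolerance are assumptions made for this example, not the relation set defined in the thesis.

```python
# A minimal sketch (not the thesis' actual relation set) of a depth-informed
# qualitative spatial relation between two detected objects in an RGB-D frame.

from dataclasses import dataclass


@dataclass
class Detection:
    """2D bounding box (pixels) plus median depth (metres) of an object."""
    x1: float
    y1: float
    x2: float
    y2: float
    depth: float


def boxes_overlap(a: Detection, b: Detection) -> bool:
    """True if the 2D bounding boxes intersect in the image plane."""
    return not (a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1)


def qsr_label(a: Detection, b: Detection, depth_tol: float = 0.05) -> str:
    """Assign a qualitative label to the pair (a, b).

    A pure 2D overlap is not enough to claim contact: if the median depths
    differ by more than `depth_tol`, one object merely occludes the other.
    """
    if boxes_overlap(a, b):
        if abs(a.depth - b.depth) <= depth_tol:
            return "touching"        # overlap in 2D and at a similar depth
        return "occluding"           # 2D overlap only: likely occlusion
    return "disconnected"            # no spatial interaction in this frame


if __name__ == "__main__":
    hand = Detection(100, 100, 160, 180, depth=0.62)
    bottle = Detection(140, 120, 200, 220, depth=0.64)
    print(qsr_label(hand, bottle))   # -> "touching"
```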
As humans tend to use objects in many different ways depending on the scene and the objects' availability, learning object affordances in everyday-life scenarios is a challenging task, particularly in the presence of an open set of interactions and class-agnostic objects. In order to abstract from the continuous representation of spatio-temporal interactions in video data, a novel set of high-level, qualitative, depth-informed spatial relations is presented. Learning similarities via an unsupervised method that exploits graph representations of object interactions induces a hierarchy of clusters of objects with similar affordances. The proposed method handles object occlusions by effectively capturing possible interactions, without imposing any object or scene constraints.
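A minimal sketch of the clustering idea, under simplifying assumptions: each object is summarized by the qualitative relations it participates in over a video, and agglomerative clustering over pairwise distances induces a hierarchy of affordance-like groups. The histogram features, toy counts, and average linkage below are illustrative stand-ins for the thesis' learned graph similarities.

```python
# Hedged sketch: objects described by relation-label histograms, clustered
# hierarchically to induce affordance-like groups. Counts are toy data.

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

RELATIONS = ["touching", "occluding", "disconnected"]

# Toy interaction histories: counts of each relation observed per object.
objects = {
    "cup":   {"touching": 14, "occluding": 3, "disconnected": 40},
    "mug":   {"touching": 12, "occluding": 5, "disconnected": 38},
    "plate": {"touching": 2,  "occluding": 9, "disconnected": 60},
    "bowl":  {"touching": 3,  "occluding": 8, "disconnected": 55},
}

names = list(objects)
features = np.array(
    [[objects[n].get(r, 0) for r in RELATIONS] for n in names], dtype=float
)
features /= features.sum(axis=1, keepdims=True)   # normalise per object

# Agglomerative clustering over pairwise distances induces the hierarchy;
# a dendrogram can be drawn with scipy.cluster.hierarchy.dendrogram(Z, labels=names).
Z = linkage(pdist(features, metric="euclidean"), method="average")
print(Z)   # each row records one merge of two clusters
```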
Moreover, interaction and action anticipation remains a challenging problem, especially given the limited generalizability of models trained on visual data or on visual video embeddings. State-of-the-art methods allow predictions only up to approximately three seconds into the future. Hence, most everyday-life activities, which consist of actions lasting more than five seconds, are not predictable. This thesis presents a novel approach to anticipating interactions between objects in a video scene by utilizing high-level, qualitative, frame-number-independent spatial graphs to represent object interactions. A deep recurrent neural network learns in a self-supervised way to predict graph structures of future object interactions, whilst being decoupled from the visual information, the underlying activity, and the duration of each interaction taking place.
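The following PyTorch sketch conveys the anticipation idea under assumptions: interaction graphs are flattened into fixed-size edge-label vectors, and a recurrent network is trained, with the next graph in the sequence as its own supervisory signal, to predict the relations of the future graph. The network sizes, the encoding, the fixed number of edges, and the random stand-in data are all illustrative choices, not the thesis' architecture.

```python
# Minimal self-supervised next-graph prediction sketch (illustrative only).

import torch
import torch.nn as nn

N_EDGES = 6        # object pairs tracked in the scene (assumed fixed here)
N_RELATIONS = 3    # e.g. touching / occluding / disconnected
SEQ_LEN = 10
BATCH = 4


class GraphSequencePredictor(nn.Module):
    """GRU over per-frame graph encodings; predicts the next graph's edge labels."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(N_EDGES * N_RELATIONS, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_EDGES * N_RELATIONS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, N_EDGES * N_RELATIONS) one-hot edge labels per frame
        out, _ = self.gru(x)
        logits = self.head(out[:, -1])                 # predict from last state
        return logits.view(-1, N_EDGES, N_RELATIONS)   # per-edge relation scores


# Self-supervision: the target is simply the graph at the next time step,
# so no manual labels are needed. Random data stands in for real sequences.
labels = torch.randint(0, N_RELATIONS, (BATCH, SEQ_LEN + 1, N_EDGES))
onehot = torch.nn.functional.one_hot(labels, N_RELATIONS).float()

model = GraphSequencePredictor()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(5):
    inputs = onehot[:, :SEQ_LEN].reshape(BATCH, SEQ_LEN, -1)
    target = labels[:, SEQ_LEN]                        # next-frame edge labels
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, N_RELATIONS), target.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
    print(f"step {step}: loss {loss.item():.3f}")
```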
Finally, the proposed methods are evaluated on RGB-D video datasets capturing everyday-life activities of human agents, and are compared against closely related and state-of-the-art methods.
Metadata
Supervisors: Cohn, Anthony
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Miss Alexia Toumpa
Date Deposited: 11 Oct 2023 14:51
Last Modified: 16 Jan 2025 10:45
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:33571
Download
Final eThesis - complete (pdf)
Filename: Alexia_Toumpa_PhD_Thesis_WhiteRose.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License