Muller, Bruce ORCID: https://orcid.org/0000-0003-3682-9032 (2022) Semantics and Planar Geometry for self-supervised Road Scene Understanding. PhD thesis, University of York.
Abstract
In this thesis we leverage domain knowledge, specifically of road scenes, to provide a self-supervision signal, reduce the labelling requirements, improve the convergence of training and introduce interpretable parameters based on vastly simplified models. Specifically, we chose to research the value of applying domain knowledge to the popular tasks of semantic segmentation and relative pose estimation towards better understanding road scenes. In particular we leverage semantic and geometric scene understanding separately in the first two contributions and then seek to combine them in the third contribution.
Firstly, we show that hierarchical structure in class labels for training networks for tasks such as semantic segmentation can be useful for boosting performance and accelerating training. Moreover, we present a hierarchical loss implementation which differentiates between minor and serious errors, and evaluate our method on the Vistas road scene dataset.
Secondly, for the task of self-supervised monocular relative pose estimation, we propose a ground-relative formulation for network output which roots our problem in a locally planar geometry. Current self-supervised methods generally require over-parameterised training of both a pose and depth network, and our method entirely replaces the need for depth estimation, while obtaining competitive results on the KITTI visual odometry dataset, dramatically simplifying the problem.
Thirdly, we combine semantics with our geometric formulation by extracting the road plane with semantic segmentation and robustly fitting homographies to fine-scale correspondences between coarsely aligned image pairs. We show that with aid from our geometric knowledge and a known analytical method, we can decompose these homographies into camera-relative pose, providing a self-supervision signal that significantly improves our visual odometry performance at both training and test time. In particular, we form a non-differentiable module which computes real-time pseudo-labels, avoiding training complexity, and additionally allowing for test-time performance boosting, helping tackle bias present with deep learning methods.
Metadata
Supervisors: | Smith, William |
---|---|
Keywords: | Deep Learning,Computer Vision,Visual Odometry,Convolutional Neural Network,CNN,Neural Network,Hierarchical,Homography,machine learning |
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Identification Number/EthosID: | uk.bl.ethos.871151 |
Depositing User: | Mr Bruce Robert Muller |
Date Deposited: | 26 Jan 2023 12:56 |
Last Modified: | 21 Feb 2023 10:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32144 |
Download
Examined Thesis (PDF)
Filename: Bruce_Muller_Thesis.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial NoDerivatives 4.0 International License
Export
Statistics
You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.