Koizumi, Tatsuro ORCID: https://orcid.org/0000-0003-0201-4687 (2023) Model-based Self-supervision for Dense Face Alignment and 3D Reconstruction. PhD thesis, University of York.
Abstract
In the field of monocular 3D reconstruction, self-supervision based on differentiable rendering and a statistical 3D model has been proposed to alleviate the need for datasets with ground truth. In theory, this enables neural networks to be trained using only unannotated images. However, training through self-supervision tends to be unstable, and surrogate supervision such as landmarks is required in practice. Moreover, reaching convergence in self-supervised 3D reconstruction is slow or unachievable due to the weak and discontinuous supervisory signal provided by a differentiable renderer. Our research starts from the aim of addressing these problems in differentiable-renderer-based self-supervision.
Firstly, we combine differentiable linear least-squares fitting of a 3D morphable model (3DMM), pose, and lighting with self-supervision. We propose linear least-squares solutions for geometric and photometric parameters, including a novel inverse spherical harmonic lighting model. This ensures optimal fitting of the photometric components given the estimated geometric parameters and improves the fidelity of the reconstructed appearance. This concept also makes it possible to combine 3DMM fitting with image-to-image networks, leading to stable training without requiring landmark supervision.
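To give a flavour of the closed-form photometric fitting described above, the sketch below estimates nine second-order spherical harmonic lighting coefficients by linear least squares under a conventional Lambertian forward model, given per-pixel albedo and surface normals. This is a generic numpy illustration, not the thesis's inverse spherical harmonic model, and all function names are hypothetical.

```python
import numpy as np

def sh_basis(normals):
    """Evaluate the 9 real second-order spherical harmonic basis
    functions at unit surface normals of shape (N, 3)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z ** 2 - 1.0),
        1.092548 * x * z,
        0.546274 * (x ** 2 - y ** 2),
    ], axis=1)  # (N, 9)

def solve_sh_lighting(intensity, albedo, normals):
    """Least-squares estimate of 9 SH lighting coefficients from
    per-pixel intensities, albedos and normals (Lambertian assumption)."""
    A = albedo[:, None] * sh_basis(normals)          # (N, 9) design matrix
    coeffs, *_ = np.linalg.lstsq(A, intensity, rcond=None)
    return coeffs                                    # (9,)
```

Because the model is linear in the lighting coefficients, the photometric parameters are globally optimal for the given geometry, which is the property the abstract refers to.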
Secondly, we propose supervision based on semantic segmentation. In contrast to landmarks, this form of supervision is dense and always well defined. However, it is not one-to-one, meaning more complex loss functions are required to exploit it. We propose two novel cohesive measures for semantic segmentation supervision. First, we show how precomputed distance maps in the 3DMM UV space can be used to supervise pixel-wise estimates of image-model correspondence. Second, we derive a novel differentiable vertex-to-pixel cohesive measure based on the geometric Renyi divergence. Using this loss, we show that pure shape-from-semantic-segmentation is possible via analysis-by-synthesis.
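As a rough illustration of the distance-map idea, the following numpy/scipy sketch precomputes, for each semantic class, a Euclidean distance transform over the 3DMM UV space and penalises predicted per-pixel UV correspondences by their distance to the region of the pixel's ground-truth class. It is a simplified, non-differentiable version (nearest-texel lookup rather than interpolation), and the function names and UV normalisation convention are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def build_uv_distance_maps(uv_class_map, num_classes):
    """For each semantic class, compute a UV-space map of distances to
    the nearest texel belonging to that class."""
    return np.stack([
        distance_transform_edt(uv_class_map != c)    # zeros where class c
        for c in range(num_classes)
    ])  # (C, H, W)

def segmentation_correspondence_loss(pred_uv, pixel_labels, dist_maps):
    """Mean UV-space distance from each pixel's predicted correspondence
    to the region of that pixel's ground-truth semantic class."""
    C, H, W = dist_maps.shape
    u = np.clip((pred_uv[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    v = np.clip((pred_uv[:, 1] * (H - 1)).round().astype(int), 0, H - 1)
    return dist_maps[pixel_labels, v, u].mean()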
Lastly, we combine both techniques and propose a self-supervised architecture for 3D face reconstruction that does not require a differentiable renderer.
Metadata
| Supervisors: | Smith, William |
|---|---|
| Related URLs: | |
| Keywords: | 3D reconstruction, face reconstruction, face alignment, 3D morphable model, differentiable renderer, least squares, geometric Renyi divergence, semantic segmentation, self-supervision, deep learning, neural network |
| Awarding institution: | University of York |
| Academic Units: | The University of York > Computer Science (York) |
| Depositing User: | Tatsuro Koizumi |
| Date Deposited: | 12 Jun 2024 09:59 |
| Last Modified: | 12 Jun 2024 09:59 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:35048 |
Download
Examined Thesis (PDF)
Embargoed until: 12 June 2025
Filename: TatsuroKoizumi_Thesis.pdf