Yang, Jingbo ORCID: https://orcid.org/0000-0002-8962-8800
(2024)
Long video generation using the VAE-GAN method.
PhD thesis, University of York.
Abstract
Video generation has emerged as a critical area in machine learning, with applications spanning entertainment, virtual reality, and surveillance. However, generating realistic and temporally coherent videos, especially for long-term sequences, remains challenging. This thesis addresses these challenges through novel hybrid models and transformer-based architectures, improving video quality, efficiency, and duration.
The thesis first analyses the limitations of existing generative models. GANs produce sharp videos but suffer from computational expense and mode collapse, while VAEs are more efficient but yield blurry outputs. We propose hybrid VAE-GAN models that combine these strengths, pairing the inference ability of the VAE with the generative properties of the GAN: a VAE encoder feeds GAN generators to enhance video consistency and continuity.
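As an illustration of this coupling, the sketch below wires a VAE encoder to a GAN generator in PyTorch. The layer sizes, the 64x64 frame resolution, the unit loss weights, and the discriminator (assumed but not shown) are illustrative choices, not the actual EncGAN3 design from the thesis.

```python
# Minimal VAE-GAN sketch (hypothetical sizes; not the thesis's EncGAN3).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """VAE encoder: frame -> parameters of a latent Gaussian."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten())
        self.mu = nn.Linear(128 * 16 * 16, latent_dim)
        self.logvar = nn.Linear(128 * 16 * 16, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):
    """GAN generator doubling as the VAE decoder: latent -> frame."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),  # 16x16 -> 32x32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())    # 32x32 -> 64x64

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 128, 16, 16))

def generator_loss(x, x_rec, mu, logvar, d_fake):
    """VAE terms (reconstruction + KL) plus the GAN adversarial term."""
    rec = F.l1_loss(x_rec, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return rec + kl + adv  # unit weights are an arbitrary assumption

enc, gen = Encoder(), Generator()
x = torch.randn(4, 3, 64, 64)                         # a batch of frames
mu, logvar = enc(x)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
x_rec = gen(z)            # a discriminator (not shown) would score this
```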
Focusing on temporal modelling, we address the critical challenge of long-duration video generation under computational constraints. Emphasising GPU memory efficiency, we develop a novel recall mechanism that decomposes videos into temporally coherent sub-sequences with Markovian dependencies, enabling efficient long-term modelling with fixed memory requirements. Further refinement through auto-regressive modelling enhances temporal consistency, while adopting the Generative Pre-trained Transformer (GPT) architecture provides a global temporal perspective through sequence modelling in the latent space.
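The recall idea can be made concrete with a small sketch: the video is produced segment by segment, and each segment depends only on a fixed-size state handed over by its predecessor (the Markovian dependency), so GPU memory does not grow with video length. The GRU cell and toy linear frame decoder below are hypothetical stand-ins, assuming PyTorch; they are not the thesis's actual REncGAN components.

```python
# Fixed-memory long-video generation via a Markovian recall state (sketch).
import torch
import torch.nn as nn

class SegmentGenerator(nn.Module):
    """Maps (noise, recall state) -> (segment of frames, next recall state)."""
    def __init__(self, latent_dim=128, state_dim=128, frames=8):
        super().__init__()
        self.frames = frames
        self.rnn = nn.GRUCell(latent_dim, state_dim)
        self.to_frame = nn.Linear(state_dim, 3 * 32 * 32)  # toy frame decoder

    def forward(self, z, state):
        out = []
        for _ in range(self.frames):
            state = self.rnn(z, state)  # state evolves within the segment
            out.append(self.to_frame(state).view(-1, 3, 32, 32))
        return torch.stack(out, dim=1), state

@torch.no_grad()
def generate_long_video(gen, n_segments, latent_dim=128, state_dim=128):
    state = torch.zeros(1, state_dim)    # initial recall state
    video = []
    for _ in range(n_segments):          # per-iteration memory is constant:
        z = torch.randn(1, latent_dim)   # only the fixed-size state carries over
        segment, state = gen(z, state)
        video.append(segment.cpu())      # offload finished segments from the GPU
    return torch.cat(video, dim=1)       # (1, n_segments * frames, 3, 32, 32)

video = generate_long_video(SegmentGenerator(), n_segments=10)
```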
The thesis provides several key contributions: (1) Encoding GAN3 (EncGAN3), integrating VAE and GAN for high-quality short-term videos; (2) Recall Encoding GAN3 (REncGAN) with a recall mechanism for efficient long-duration generation, developed through iterative architectural improvements; (3) Auto-Regressive R2 (AR2) with auto-regressive recall; and (4) GPT R2 (R3) leveraging transformer architectures. These models achieve minimal, fixed GPU memory increments regardless of video length.
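The transformer contribution can likewise be sketched as autoregressive next-latent prediction: a GPT-style causal model attends over the entire latent sequence at once, which is what gives it the global temporal perspective mentioned above. The decoder-only layout, dimensions, and mask below are generic assumptions rather than R3's actual architecture.

```python
# GPT-style autoregressive modelling over frame latents (generic sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentGPT(nn.Module):
    """Predicts each frame latent from all previous latents (causal attention)."""
    def __init__(self, latent_dim=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, latent_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(latent_dim, latent_dim)

    def forward(self, z_seq):                       # z_seq: (B, T, latent_dim)
        T = z_seq.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(T)  # causal mask
        h = self.blocks(z_seq + self.pos[:, :T], mask=mask)
        return self.head(h)                         # next-latent predictions

# Training pairs each latent with its successor; at generation time the model
# is rolled out step by step in latent space, then decoded back to frames.
model = LatentGPT()
z = torch.randn(2, 16, 128)             # a batch of 16-step latent sequences
loss = F.mse_loss(model(z[:, :-1]), z[:, 1:])
```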
Experimental results demonstrate significant improvements in both short- and long-term video generation across multiple benchmarks, showing superior performance in quality, coherence, and computational efficiency. This thesis advances video generation techniques, particularly for applications requiring high-quality extended sequences.
Metadata
Supervisors: Bors, Adrian
Keywords: VAE-GAN, video generation, recall mechanism
Awarding institution: University of York
Academic Units: The University of York > Computer Science (York)
Depositing User: Mr Jingbo Yang
Date Deposited: 25 Apr 2025 15:22
Last Modified: 25 Apr 2025 15:22
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:36662
Download
Examined Thesis (PDF)
Filename: Yang_207058919_Thesis_revised.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.