Ringer, Charles (2022) Multi-Modal Livestream Highlight Detection from Audio, Visual, and Language Data. PhD thesis, University of York.
Abstract
Livestreaming is live broadcasting over the internet, and it allows viewers to interact with the host through a text-based chat system. Video game livestreaming is particularly prevalent: streamers host individual play sessions or present esports competitions. Although livestreaming is an emerging entertainment medium, it is already popular. For example, about 1,900 hours of footage are livestreamed every minute on Twitch.tv, currently the most popular video game livestreaming platform.
It can be challenging for viewers to find the content they are most likely to enjoy. One solution is ‘highlight videos’, which can entertain users who did not watch a broadcast, e.g. because they were unaware of it, unavailable, or unwilling. Furthermore, livestream content creators can grow their audiences by using highlights to advertise their streams and engage casual followers. However, producing these videos by hand is laborious, so there is great value in developing automatic highlight detection methods.
Video game streaming provides the viewer with a rich set of audio-visual data, conveying information about the game through game footage and about the streamer’s emotional state through webcam footage. Analysing both the game and the behaviour of broadcast personnel is crucial for modelling the exciting aspects of livestreams. Furthermore, livestreaming offers a unique opportunity to understand the viewing experience through the text-based chat system. However, livestream data poses significant challenges, e.g. how to fuse multimodal data captured by different sources under uncontrolled, noisy conditions. Deep learning models that can leverage such complex data are therefore appealing for highlight detection.
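The abstract names multimodal fusion as a core challenge but does not state how the thesis performs it. As a generic illustration only, the sketch below shows late fusion: each modality (here, hypothetically, game footage, webcam footage and chat) is assumed to have already produced a highlight score per time window, and those scores are combined by a weighted average. The function name, modality names, scores and weights are all invented for illustration; the thesis develops deep learning models, and this is not its method.

```python
from typing import Dict, List


def late_fusion(scores: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine per-modality highlight scores with a weighted average per time window.

    Hypothetical inputs (illustration only):
      scores  - modality name -> one score in [0, 1] per time window
      weights - modality name -> non-negative weight, e.g. tuned on validation data
    """
    modalities = list(scores)
    n = len(scores[modalities[0]])
    # Every modality must describe the same sequence of time windows.
    if any(len(scores[m]) != n for m in modalities):
        raise ValueError("all modalities must cover the same time windows")
    total = sum(weights[m] for m in modalities)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalising by the total weight keeps fused scores in [0, 1].
    return [sum(weights[m] * scores[m][i] for m in modalities) / total for i in range(n)]


# Invented example: three windows, with chat weighted twice as heavily.
fused = late_fusion(
    {"game": [0.2, 0.9, 0.4], "webcam": [0.1, 0.7, 0.8], "chat": [0.3, 0.8, 0.2]},
    {"game": 1.0, "webcam": 1.0, "chat": 2.0},
)
best_window = max(range(len(fused)), key=lambda i: fused[i])  # window 1 scores highest
```

Late fusion is the simplest option because each modality can be modelled independently; learned models instead typically fuse intermediate features, which can capture interactions between modalities that a fixed weighted average cannot.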
This thesis explores the application of deep learning highlight detection models to the domain of livestreaming. Multimodal highlight detection methods are developed for personality-driven livestreams and esports broadcasts. The unique nature of livestream audience chat language is explored, and audience-based highlight detection methods are proposed. Finally, a single model capable of modelling all of these modalities is presented.
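Audience chat is a natural signal for highlight detection. A common simple baseline, shown here as an assumption rather than as the thesis's audience-based method, flags time windows in which chat activity spikes: message timestamps are binned into fixed-length windows, and a window is flagged when its message count exceeds the mean count by more than k standard deviations. The function name, input format and default k are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def chat_spike_windows(timestamps: List[float], window: float, k: float = 2.0) -> List[int]:
    """Return indices of windows whose chat message count exceeds mean + k * std.

    Hypothetical inputs (illustration only):
      timestamps - message times in seconds from the start of the stream
      window     - window length in seconds
      k          - threshold in population standard deviations
    """
    if not timestamps:
        return []
    # Bin every message into a fixed-length window.
    n_windows = int(max(timestamps) // window) + 1
    counts = [0] * n_windows
    for t in timestamps:
        counts[int(t // window)] += 1
    mu, sigma = mean(counts), pstdev(counts)
    return [i for i, c in enumerate(counts) if c > mu + k * sigma]


# Invented example: one message per 10 s window, except a burst of ten in window 4.
ts = [5, 15, 25, 35] + [40 + i for i in range(10)] + [55]
spikes = chat_spike_windows(ts, window=10.0)  # → [4]
```

Raw message rate ignores what viewers actually write, which is one reason chat language itself is worth modelling.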
Metadata
Supervisors: | James Walker and Mihalis Nicolaou |
---|---|
Awarding institution: | University of York |
Academic Units: | The University of York > Computer Science (York) |
Identification Number/EthosID: | uk.bl.ethos.875105 |
Depositing User: | Dr Charles Ringer |
Date Deposited: | 02 Mar 2023 10:38 |
Last Modified: | 21 Apr 2023 09:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:32406 |
Download
Examined Thesis (PDF)
Filename: PhD_Thesis__Revised_.pdf
Licence: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License