Context-Aware Pedestrian Intention Prediction with Evidence Accumulation for Autonomous Vehicles

Abstract

This thesis tackles a central challenge in autonomous driving: determining early and accurately whether a pedestrian is about to cross the road in complex, real-world environments. It advances the field of pedestrian intention prediction (PIP) by developing a perception pipeline that combines multiple levels of information: pedestrian visual appearance, motion patterns, scene-level semantic cues, and wider traffic context. Through extensive multimodal analysis, the work demonstrates that combining close-range pedestrian features with broader environmental information yields more accurate and generalisable predictions across diverse urban scenarios. The research further quantifies the relative importance of visual, kinematic, and contextual cues under varying conditions, supporting principled feature selection and reducing prediction bias. The proposed framework extends both spatial and temporal perception by leveraging multi-camera viewpoints and predicting up to four seconds ahead, showing that broader contextual awareness improves prediction confidence and temporal stability. To improve robustness when data are limited or the domain shifts, the thesis employs synthetic-domain pretraining and domain randomisation to strengthen low-level perception modules such as pose estimation and semantic segmentation. The study also explores the use of vision-language models guided by structured prompts to enhance scene understanding and behavioural inference. Finally, an evidence accumulation mechanism inspired by drift–diffusion models is introduced to represent decision-making as a continuous temporal process, enabling the system to infer crossing intention earlier and with increasing confidence over time.
Experiments across benchmark and real-world datasets confirm that (i) integrating pedestrian-centric and scene-level cues substantially improves accuracy and generalisation, (ii) synthetic-domain pretraining and robust perception modules mitigate domain bias, and (iii) modelling intention prediction as an evidence accumulation process produces earlier, stable, and interpretable inferences essential for safe autonomous operation. Overall, the thesis contributes a comprehensive framework that advances the reliability and robustness of pedestrian intention prediction for autonomous vehicles.
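
The evidence accumulation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it is a deterministic, leaky accumulator in the spirit of drift-diffusion models, and the function name `accumulate_evidence`, the per-frame input `crossing_probs`, and the `threshold` and `leak` values are all assumptions chosen for illustration.

```python
from typing import Iterable, Optional, Tuple


def accumulate_evidence(
    crossing_probs: Iterable[float],
    threshold: float = 2.0,  # hypothetical decision bound
    leak: float = 0.1,       # hypothetical forgetting rate
) -> Tuple[Optional[int], Optional[str], float]:
    """Integrate per-frame evidence until a decision bound is reached.

    Returns (frame index of the decision, label, accumulator value).
    Index and label are None if neither bound is hit.
    """
    x = 0.0
    for t, p in enumerate(crossing_probs):
        # Map a probability in [0, 1] to signed evidence in [-1, 1].
        drift = 2.0 * p - 1.0
        # Leaky integration: older evidence decays, new evidence is added.
        x = (1.0 - leak) * x + drift
        if x >= threshold:
            return t, "crossing", x
        if x <= -threshold:
            return t, "not_crossing", x
    return None, None, x


if __name__ == "__main__":
    # Consistent evidence for crossing commits after a few frames.
    print(accumulate_evidence([0.9] * 10))
    # Ambiguous evidence never reaches either bound.
    print(accumulate_evidence([0.5, 0.6, 0.4]))
```

With consistent per-frame evidence the accumulator commits to a decision after a few frames, while ambiguous evidence leaves it between the bounds. That is the behaviour the abstract describes as inferring intention earlier and with increasing confidence over time.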

Metadata

Supervisors:	Rezaei, Mahdi and Wang, He
Related URLs:	PIP-Net: Pedestrian Intention Prediction in the Wild (Related publication); Pedestrian Intention Prediction via Vision-Language Foundation Models (Related publication); Enriched Pedestrian Crossing Prediction Using Carla Synthetic Data (Related publication); Local and Global Contextual Features Fusion for Pedestrian Intention Prediction (Related publication)
Keywords:	Computer Vision; Machine Learning; Autonomous Vehicles; Intelligent Transportation Systems; Human Behaviour Modelling; Pedestrian Crossing Prediction; Deep Neural Networks
Awarding institution:	University of Leeds
Academic Units:	The University of Leeds > Faculty of Environment (Leeds) > Institute for Transport Studies (Leeds)
Date Deposited:	08 Apr 2026 10:14
Last Modified:	08 Apr 2026 10:14
Open Archives Initiative ID (OAI ID):	oai:etheses.whiterose.ac.uk:38368

Download

Final eThesis - complete (pdf)

Embargoed until: 1 April 2027

Filename: 2026-03-09 - Mohsen_s_Thesis__University_of_Leeds___Final_.pdf

