Xu, Zisong (2025) Physics-Based Multi-Object Tracking for Non-Prehensile Robotic Manipulation. PhD thesis, University of Leeds.
Abstract
This thesis presents an object-tracking algorithm designed for cluttered environments. Its key property is that the robot can keep tracking target objects even when they are heavily occluded. One application is retrieving an object from a fridge or a crowded shelf, where the robot must decide how to move obstacles aside and then locate and pick up the target item. Planning these motions requires the 6D pose of every object in the scene, but obtaining this information in real-world environments is extremely difficult. Traditional motion planning usually assumes access to full 6D poses, often obtained with tools such as OptiTrack[1] and reflective markers attached to the objects. Placing markers on every object and setting up an OptiTrack system is not feasible in practical situations. Another option is RGB-based 6D pose estimation, but such systems fail when objects are heavily occluded, which is very common in real-world cluttered scenes.
This work addresses these challenges with a tracking algorithm that combines physics predictions with vision information in a particle filter. The system uses a physics simulation and the robot’s joint states as input to the motion model, and camera images as input to the observation model. This setup helps recover an object’s pose even when the object is temporarily occluded. Initially, the method used only RGB images to track a single object. Results showed that the physics-based approach outperformed baseline methods in tracking accuracy. However, RGB-only input struggled with occlusions and made it hard to handle multiple objects.
To overcome this, I proposed a new approach that uses depth images and a visibility score indicator. This enhanced method performs better than baseline systems when tracking multiple objects and dealing with heavy occlusions, while still maintaining good tracking accuracy.
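The predict/update structure described in the abstract can be illustrated with a toy particle filter. The sketch below is not the thesis implementation: it tracks a single 1D position rather than a 6D pose, `physics_step` is a hypothetical stand-in for a physics engine, the pusher position stands in for the robot's joint states, and the Gaussian likelihood and all noise values are assumptions chosen for illustration. When the object is occluded, no observation arrives and the filter relies on the motion model alone.

```python
import math
import random


def physics_step(obj_x, pusher_x, noise_std, rng):
    """Toy stand-in for a physics engine (assumption): a pusher moving
    in +x shoves the object ahead of it once they are in contact."""
    if pusher_x > obj_x:
        obj_x = pusher_x
    return obj_x + rng.gauss(0.0, noise_std)


def likelihood(particle_x, observed_x, obs_std):
    """Unnormalised Gaussian likelihood of an observation given a particle."""
    err = particle_x - observed_x
    return math.exp(-0.5 * (err / obs_std) ** 2)


def particle_filter_step(particles, pusher_x, observation, rng,
                         motion_std=0.005, obs_std=0.02):
    """One predict/update cycle. `observation` is None while occluded."""
    # Predict: propagate every particle through the (toy) physics model.
    predicted = [physics_step(x, pusher_x, motion_std, rng) for x in particles]
    if observation is None:
        # Occluded: keep the physics prediction, skip the vision update.
        return predicted
    # Update: weight particles by how well they explain the observation.
    weights = [likelihood(x, observation, obs_std) for x in predicted]
    if sum(weights) == 0.0:
        return predicted
    # Resample in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate: mean of the particle set."""
    return sum(particles) / len(particles)


def run_demo(seed=0):
    rng = random.Random(seed)
    true_x = 0.30
    particles = [rng.uniform(0.0, 0.6) for _ in range(500)]
    for step in range(40):
        pusher_x = 0.01 * step + 0.1
        true_x = max(true_x, pusher_x)   # ground truth follows the same contact rule
        occluded = 15 <= step < 30       # no camera observation in this window
        observation = None if occluded else true_x + rng.gauss(0.0, 0.02)
        particles = particle_filter_step(particles, pusher_x, observation, rng)
    return true_x, estimate(particles)


if __name__ == "__main__":
    truth, est = run_demo()
    print(f"true position: {truth:.3f}, estimate: {est:.3f}")
```

In this toy setting the pusher makes contact with the object during the occluded window, so the estimate can only follow the object because the motion model encodes the push. That is the intuition behind using physics predictions as the motion model.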
Metadata
| Supervisors: | Dogar, Mehmet and Wang, He |
|---|---|
| Awarding institution: | University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 01 Oct 2025 08:48 |
| Last Modified: | 01 Oct 2025 08:48 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:37339 |
Download
Final eThesis - complete (pdf)
Filename: zisong_final_thesis_new_verision_update_0819.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License