Li, Sijia ORCID: https://orcid.org/0000-0002-0302-9541 (2022) Bayesian Networks' Reliability and Multiway Networks for Gaussian and Non-Gaussian Distributed Data. PhD thesis, University of Leeds.
Abstract
This thesis focuses on two topics in the area of statistical modelling applications.
The first topic concerns the evaluation of the reliability of Bayesian Hierarchical Models for scRNAseq data. Bayesian Hierarchical Models (BHM) are used in various application fields such as biology, social science and engineering for identification of confounding factors, thus enabling the extraction of the information of interest. BHMs are typically formulated by specifying the data model, the parameters model and the prior distributions. The posterior inference of a BHM depends on both the model specification and the computational algorithm used. We use the term "reliability" to indicate a methodology's ability to recover the "ground truth" or the underlying distribution embedded in the data. Testing the reliability of a BHM is an open question. The most straightforward way to test the reliability of a BHM inference is to compare the posterior distributions with the ground truth value of the model parameters, when available. However, when dealing with experimental data, the true value of the underlying parameters is typically unknown. In these situations, numerical experiments based on synthetic datasets generated from the model itself offer a natural approach to check model performance and posterior estimates. In this thesis, we show how to test the reliability of a BHM. We introduce a change in the model assumptions to allow for prior contamination, and develop a simulation-based evaluation framework to assess the reliability of the inference of a given BHM. We illustrate our approach on a specific BHM used for Bayesian analysis of scRNAseq Data (BASiCS).
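The simulation-based evaluation described above can be illustrated on a toy model. The sketch below is not BASiCS and not the framework developed in the thesis: it assumes a conjugate Normal-Normal model with known noise, chosen only because its posterior is available in closed form, and the function name `sbc_ranks` and all its parameters are hypothetical. A parameter is drawn from the prior, synthetic data are generated from the model, and the rank of the true value among the posterior draws is recorded. If the inference is reliable in this sense, the ranks should be approximately uniform.

```python
import numpy as np


def sbc_ranks(n_sims=500, n_obs=10, n_draws=99, prior_sd=1.0, noise_sd=1.0, seed=0):
    """Rank of the true parameter among posterior draws, one rank per synthetic dataset.

    Toy model (an assumption for illustration only):
        theta ~ Normal(0, prior_sd^2),  y_i | theta ~ Normal(theta, noise_sd^2).
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_sims, dtype=int)
    for s in range(n_sims):
        # 1. Draw the "ground truth" from the prior.
        theta = rng.normal(0.0, prior_sd)
        # 2. Generate a synthetic dataset from the model itself.
        y = rng.normal(theta, noise_sd, size=n_obs)
        # 3. Exact conjugate posterior for the mean (prior mean is zero).
        post_prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        post_mean = (y.sum() / noise_sd**2) / post_prec
        post_sd = post_prec ** -0.5
        draws = rng.normal(post_mean, post_sd, size=n_draws)
        # 4. Rank statistic: number of posterior draws below the true value.
        ranks[s] = int(np.sum(draws < theta))
    return ranks


ranks = sbc_ranks()
# Under a correctly specified model the ranks are uniform on {0, ..., n_draws}.
hist, _ = np.histogram(ranks, bins=10, range=(0, 100))
print(hist)
```

Replacing the exact posterior in step 3 with output from an approximate sampler, or generating `theta` from a prior other than the one assumed in the posterior, would be one way to see how misspecification distorts the rank histogram.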
The second topic considers the problem of efficient multi-way network inference for non-Gaussian data. Classically, statistical datasets have more data points than features (n > p). The standard model of classical statistics caters for the case where data points are considered conditionally independent given the parameters. However, for n ≈ p or p > n data such models are poorly determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, a method for two-way network inference in both samples and features. Greenewald et al. (2019) introduced an algorithm for the inference of the multi-way version. Both methods estimate sparse precision matrices based on the Cartesian product of Gaussian Markov random field graphs. However, to the best of our knowledge, the theoretical foundation of such models has some gaps in the previous literature. In this thesis we formally state and prove a theorem that serves as the theoretical foundation of multi-way graphical models. Moreover, the original Bigraphical Lasso algorithm is not applicable when p and n are large, due to its memory requirements. In this thesis we present Scalable Bigraphical Lasso, a novel version of the algorithm which exploits the eigenvalue decomposition of the Cartesian product graph, together with matrix algebra, to reduce the memory requirements from O(n^2p^2) to O(n^2 + p^2) and to improve the computational efficiency. We also present the Scalable K-graphical Lasso method for multi-way network inference, which leverages eigenvalue decomposition to simultaneously infer hidden structures in tensor-valued data. Finally, many datasets in different application fields, such as biology, medicine and social science, come as non-Gaussian data, for which Gaussian-based models such as the original Bigraphical model and its multi-way version are not applicable. Thus, we extend our multi-way network inference approach so that it can be used for non-Gaussian data.
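The role of the eigenvalue decomposition can be seen in a small numerical check. The sketch below is not the Scalable Bigraphical Lasso algorithm itself. It assumes the Kronecker-sum form of the precision matrix associated with a Cartesian product graph, Ψ ⊕ Θ = Ψ ⊗ I_p + I_n ⊗ Θ, and uses the standard fact that the eigenvalues of a Kronecker sum are all pairwise sums of the eigenvalues of its factors. The helper names (`random_spd`, `kron_sum`, `logdet_kron_sum`) are hypothetical. The log-determinant is computed from two small decompositions, without forming the np × np matrix.

```python
import numpy as np


def random_spd(d, rng):
    """Random symmetric positive definite d x d matrix (illustrative input only)."""
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)


def kron_sum(psi, theta):
    """Dense Kronecker sum psi (+) theta; needs O(n^2 p^2) memory."""
    n, p = psi.shape[0], theta.shape[0]
    return np.kron(psi, np.eye(p)) + np.kron(np.eye(n), theta)


def logdet_kron_sum(psi, theta):
    """log det(psi (+) theta) from the eigenvalues of the two small factors.

    The eigenvalues of the Kronecker sum are lam_i + mu_j for every pair (i, j),
    so the large matrix never has to be built.
    """
    lam = np.linalg.eigvalsh(psi)    # n eigenvalues
    mu = np.linalg.eigvalsh(theta)   # p eigenvalues
    return float(np.sum(np.log(lam[:, None] + mu[None, :])))


rng = np.random.default_rng(0)
psi, theta = random_spd(4, rng), random_spd(3, rng)

dense = np.linalg.slogdet(kron_sum(psi, theta))[1]  # reference via the 12 x 12 matrix
fast = logdet_kron_sum(psi, theta)                  # via 4 x 4 and 3 x 3 factors
print(np.isclose(dense, fast))
```

The factorised version stores the two factor matrices and an n × p grid of eigenvalue sums, which is what makes this kind of shortcut attractive when n and p are large.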
In summary, our methodology accounts for the dependencies across different directions in datasets, reduces the computational complexity for high dimensional data and enables us to deal with both discrete and continuous data. Numerical studies on both synthetic and real datasets are presented to showcase the performance of our methods.
Metadata
Supervisors: | Cutillo, Luisa and López-García, Martín |
---|---|
Keywords: | Bayesian Hierarchical Model, Single-cell Sequencing Data, Parameter Calibration, Simulation Based Calibration, Network Inference, Gaussian Copula, Dimension Reduction, Efficient Inference, Tensor Valued Data |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Depositing User: | Sijia Li |
Date Deposited: | 09 Jan 2023 10:40 |
Last Modified: | 01 Jan 2024 01:06 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:31927 |
Download
Final eThesis - complete (pdf)
Filename: Li_S_Mathematics_PhD_2022.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License