Fusi, Nicolo (2015) Probabilistic Latent Variable Models in Statistical Genomics. PhD thesis, University of Sheffield.
Abstract
In this thesis, we propose several probabilistic latent variable models to identify and capture the hidden structure present in commonly studied genomics datasets. We begin by investigating how to correct for unwanted correlations due to hidden confounding factors in gene expression data. This is particularly important in expression quantitative trait loci (eQTL) studies, where the goal is to identify associations between genetic variants and gene expression levels. We start with a naïve approach, which estimates the latent factors from the gene expression data alone, ignoring the genetics, and we show that it leads to a loss of signal in the data. We then highlight how, because our model is formulated as a probabilistic model, it is straightforward to modify it to take into account the specific properties of the data. In particular, we show that in the naïve approach the latent variables "explain away" the genetic signal, and that this problem can be avoided by jointly inferring these latent variables while taking the genetic information into account. We then extend this model, so far additive, to also detect interactions between the latent variables and the genetic markers. We show that this leads to a better reconstruction of the latent space and that it helps to separate latent variables capturing general confounding factors (such as batch effects) from those capturing environmental factors involved in genotype-by-environment interactions. Finally, we investigate the effects of misspecifying the noise model in genetic studies, showing how the probabilistic framework presented so far can be easily extended to automatically infer non-linear monotonic transformations of the data such that the common assumption of Gaussian-distributed residuals is respected.
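The "explaining away" effect described in the abstract can be illustrated with a small toy simulation. The sketch below is not the thesis's model: it assumes plain PCA as a stand-in for the latent variable model, a single simulated variant that affects every gene, and a crude two-step procedure (fit the genotype first, then estimate factors from the residuals) as a stand-in for joint inference. All names, sizes and effect values are hypothetical.

```python
import numpy as np

def top_factors(Y, k):
    """Top-k left singular vectors of column-centred Y (an N x k matrix)."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]

def regress_out(F, Y):
    """Column-centre Y and remove the part explained by the columns of F."""
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(F, Yc, rcond=None)
    return Yc - F @ coef

def ols_effect(s, Y):
    """Per-gene least-squares slope of expression on one genotype vector."""
    sc = s - s.mean()
    return sc @ (Y - Y.mean(axis=0)) / (sc @ sc)

# Simulated data (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n_samples, n_genes = 300, 40
s = rng.integers(0, 3, size=n_samples).astype(float)  # genotype coded 0/1/2
x = rng.normal(size=n_samples)                        # hidden confounder
beta = np.full(n_genes, 0.8)                          # variant affects every gene
w = rng.normal(size=n_genes)                          # confounder loadings
noise = 0.5 * rng.normal(size=(n_samples, n_genes))
Y = np.outer(s, beta) + np.outer(x, w) + noise

# Naive approach: factors estimated from expression alone, then removed.
F_naive = top_factors(Y, 2)
beta_naive = ols_effect(s, regress_out(F_naive, Y))

# Genotype-aware stand-in: remove the genotype fit before estimating factors,
# so the estimated factor cannot absorb the genetic signal.
sc = (s - s.mean())[:, None]
F_aware = top_factors(regress_out(sc, Y), 1)
beta_aware = ols_effect(s, regress_out(F_aware, Y))

print("mean |effect|, naive:         ", np.abs(beta_naive).mean())
print("mean |effect|, genotype-aware:", np.abs(beta_aware).mean())
```

In this toy setting the broad genetic effect looks like a dominant axis of variation, so the factors estimated from expression alone absorb it and the naive effect estimates shrink towards zero, while the genotype-aware estimates stay near the simulated value of 0.8.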
Metadata
| Supervisors: | Lawrence, Neil |
|---|---|
| Awarding institution: | University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield) |
| Identification Number/EthosID: | uk.bl.ethos.638984 |
| Depositing User: | Mr Nicolo Fusi |
| Date Deposited: | 13 Mar 2015 09:45 |
| Last Modified: | 03 Oct 2016 12:09 |
| Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:8326 |
Download
Filename: final_version_thesis.pdf
Licence:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License