A better understanding of the complex molecular pathways
underlying cancer phenotypes is essential to
uncovering the dynamic processes of cancer development. As part of
this, linking quantified, experimentally defined gene expression
signatures with known biological pathway gene sets is a key challenge.
This dissertation presents a novel Bayesian statistical approach to
this pathway annotation problem.
In my approach, a formal probabilistic model
delivers probabilities over pathways for an experimental signature,
thus allowing a quantitative assessment and ranking of pathways
putatively linked to the experimental phenotype. The fundamental
advantage of this approach is formal modeling of the uncertainty in the
pathway analysis. Biological knowledge and understanding of the data
are incorporated in the model. In addition, coherent inference on
uncertain gene-pathway membership is a key benefit of
this model-based approach.
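
As a rough illustration of how probabilities over pathways support ranking, the sketch below applies Bayes' rule to a set of log model-evidence values, one per candidate pathway. It assumes the pathways are treated as competing models with a uniform prior unless one is supplied; the pathway names, the numbers, and the function `pathway_posteriors` are hypothetical and are not taken from the dissertation, whose model may be structured differently.

```python
import math


def pathway_posteriors(log_evidence, log_prior=None):
    """Turn per-pathway log evidence into normalized posterior probabilities.

    log_evidence: dict mapping pathway name -> log p(signature | pathway).
    log_prior:    optional dict of log prior probabilities (uniform if omitted).
    """
    names = list(log_evidence)
    if log_prior is None:
        # Assumption: every pathway is equally likely a priori.
        log_prior = {name: 0.0 for name in names}
    scores = [log_evidence[name] + log_prior[name] for name in names]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores)
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return {name: weight / total for name, weight in zip(names, weights)}


# Hypothetical log-evidence values for three made-up pathways.
example_log_evidence = {
    "pathway_A": -1050.2,
    "pathway_B": -1047.9,
    "pathway_C": -1060.0,
}
probs = pathway_posteriors(example_log_evidence)
ranking = sorted(probs, key=probs.get, reverse=True)
```

Working on the log scale matters here because evidence values for high-dimensional expression data are typically far too small to exponentiate directly.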
Technically, this research involves advanced
statistical modeling and high-dimensional computation. Analysis of the
models uses Markov chain Monte Carlo techniques and variational methods
for statistical computation. To evaluate model evidence, a critical
component of pathway analysis, I propose an innovative Monte Carlo
variational method that provides optimal upper and lower bounds on
model evidence. This method, motivated by and developed for genomic pathway
analysis, is in fact general and represents an advance in statistical
model-based computation of much broader utility.
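
To make the idea of bounding model evidence concrete, here is a minimal sketch on a toy conjugate normal model where the exact answer is known. It uses two standard identities: averaging log p(x, theta) - log q(theta) over draws from an approximating density q gives a lower bound on the log evidence, and averaging the same quantity over draws from the posterior gives an upper bound. This is a generic construction chosen for illustration, not necessarily the one proposed in the dissertation; the toy model, the choice of q, and all function names are assumptions.

```python
import math
import random


def log_normal_pdf(value, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)


def log_joint(x, theta):
    """Toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1)."""
    return log_normal_pdf(theta, 0.0, 1.0) + log_normal_pdf(x, theta, 1.0)


def evidence_bounds(x, q_mean, q_var, n_samples=20000, seed=0):
    """Monte Carlo estimates of a lower and an upper bound on log p(x)."""
    rng = random.Random(seed)

    def log_ratio(theta):
        return log_joint(x, theta) - log_normal_pdf(theta, q_mean, q_var)

    # Lower bound: expectation under q equals log p(x) - KL(q || posterior).
    q_draws = [rng.gauss(q_mean, math.sqrt(q_var)) for _ in range(n_samples)]
    lower = sum(log_ratio(t) for t in q_draws) / n_samples

    # Upper bound: expectation under the posterior equals
    # log p(x) + KL(posterior || q). In this toy model the posterior is
    # N(x / 2, 1 / 2) and is sampled directly; in practice the draws
    # would come from MCMC.
    post_draws = [rng.gauss(x / 2, math.sqrt(0.5)) for _ in range(n_samples)]
    upper = sum(log_ratio(t) for t in post_draws) / n_samples
    return lower, upper


x_obs = 1.3
exact_log_evidence = log_normal_pdf(x_obs, 0.0, 2.0)  # marginally x ~ N(0, 2)
lower, upper = evidence_bounds(x_obs, q_mean=0.3, q_var=1.0)
```

The gap between the two estimates shrinks as q approaches the posterior, so the bounds also indicate how good the approximation is.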
The effectiveness and robustness of my approach are
tested through simulation studies as well as analyses of real data
sets, including “proof-of-principle” pathway annotation for breast
tumor estrogen-receptor and ErbB2 phenotypes. A study of pathway
activities underlying the cellular response to a lactic acidosis
microenvironment in breast tumors involves analyses of both in
vitro and in vivo data, and demonstrates the application of the method
in decomposing the complexity of gene-expression-based predictions
about interacting pathway activation in this cancer context.