On the effects of large-scale transcriptomics datasets on gene functional analyses

Bhat, Prajwal

(2012)

Bhat, Prajwal (2012) On the effects of large-scale transcriptomics datasets on gene functional analyses.


Full text access: Open

Full text file - 3.86 MB

Abstract

The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied in functional analyses that use large, heterogeneous collections of transcriptomics data. In this thesis we show that using such large collections can hamper GBA functional analysis for genes whose expression is condition-specific. In these cases a smaller set of condition-related experiments should be used instead, but identifying such functionally relevant experiments in large collections from literature knowledge alone is impractical.

The study begins by discussing the basic principles underlying the definition of gene function and the use of large microarray collections for GBA-based gene function analyses. We examine the effects of condition-specific gene expression on GBA analyses from both a mathematical and a biological perspective. We show that calculating correlation over large microarray collections can mask the effectiveness of the GBA principle, and we suggest that using only the experiments relevant to the biological function under analysis can significantly improve GBA-based gene function analyses.

We then present a semi-supervised algorithm that selects functionally relevant experiments from large collections of transcriptomics experiments. The algorithm can select experiments relevant to a given GO term, MIPS FunCat term or even a KEGG pathway. We test it extensively on large dataset collections for yeast and Arabidopsis and demonstrate that: (i) using the selected experiments yields a statistically significant improvement both in the correlation between genes in the functional category of interest and in GBA-based function predictions; (ii) the effectiveness of the selected experiments increases with annotation specificity; (iii) our algorithm can be successfully applied to GBA-based pathway reconstruction.

We conclude by discussing potential applications of our technique and outline several developments that could be implemented in the future to improve the efficiency of the experiment selection procedure.
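The masking effect described in the abstract can be illustrated with a toy example. This is a hypothetical sketch: the expression values, gene names (`gene_a`, `gene_b`, `gene_c`) and the mean-correlation score `gba_score` are assumptions made for illustration, not data or methods from the thesis, and the "relevant" experiments are fixed by hand rather than chosen by the thesis's semi-supervised selection algorithm. Two genes co-vary across five experiments from one condition and vary independently across eight unrelated experiments. Their Pearson correlation over all thirteen experiments is roughly 0.23, while over the five condition-related experiments it is about 0.996.

```python
from math import sqrt

# Hypothetical toy data: 13 experiments, of which the first 5 come from the
# single condition in which the function of interest is active.
RELEVANT = range(5)
ALL = range(13)

# gene_a and gene_c are (hypothetically) already annotated with the function;
# gene_b is the unannotated candidate. They co-vary only in experiments 0-4.
gene_a = [1, 2, 3, 4, 5, 1, 5, 1, 5, 5, 1, 5, 1]
gene_c = [x + 1 for x in gene_a]
gene_b = [1.2, 1.9, 3.1, 3.8, 5.0, 1, 1, 5, 5, 5, 5, 1, 1]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subset(profile, experiments):
    """Restrict an expression profile to the given experiment indices."""
    return [profile[i] for i in experiments]


def gba_score(candidate, annotated, experiments):
    """Mean correlation between a candidate gene and the annotated genes,
    computed only over the chosen experiments (a simple GBA-style score)."""
    cand = subset(candidate, experiments)
    corrs = [pearson(cand, subset(g, experiments)) for g in annotated]
    return sum(corrs) / len(corrs)


r_all = pearson(subset(gene_b, ALL), subset(gene_a, ALL))
r_rel = pearson(subset(gene_b, RELEVANT), subset(gene_a, RELEVANT))
score_all = gba_score(gene_b, [gene_a, gene_c], ALL)
score_rel = gba_score(gene_b, [gene_a, gene_c], RELEVANT)

print(f"correlation, all experiments:      {r_all:.3f}")
print(f"correlation, relevant experiments: {r_rel:.3f}")
print(f"GBA score, all vs relevant: {score_all:.3f} vs {score_rel:.3f}")
```

Restricting the calculation to the condition-related experiments recovers the co-expression that the unrelated experiments dilute. Choosing that subset automatically from a large collection is the problem the thesis's experiment selection algorithm addresses; this sketch sidesteps it by fixing the subset by hand.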

Information about this Version

This is an approved version.
This version's date is: 2012
This item is not peer reviewed

Link to this Version

https://repository.royalholloway.ac.uk/items/f63c7106-2a4f-6437-210a-2f8829b7f61a/9/

Item Type: Thesis (Doctoral)
Title: On the effects of large-scale transcriptomics datasets on gene functional analyses
Authors: Bhat, Prajwal
Departments: Faculty of Science\Biological Science


Deposited by Research Information System (atira) on 18-Nov-2014 in Royal Holloway Research Online. Last modified on 07-Feb-2017.

