Journal Article

A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide

Jonathan D. Wren

in Bioinformatics

Volume 25, issue 13, pages 1694-1701
Published in print July 2009 | ISSN: 1367-4803
Published online May 2009 | e-ISSN: 1460-2059 | DOI:


Motivation: Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons that are unclear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence.

Results: A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up- and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched the known GO categories exactly for ∼5000 genes analyzed, versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE.
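The parallel/anti-parallel split and the top 20 ranking can be illustrated with a minimal sketch. It is not the scoring used in the paper, which the abstract does not specify. It assumes each experiment is a mapping from gene name to log2 ratio, that a gene counts as differentially expressed when its absolute log2 ratio reaches a hypothetical cutoff (`THRESHOLD`), and that partners are ranked by simple co-occurrence counts. All function names, gene names, and values are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical cutoff on |log2 ratio| for calling a gene differentially expressed.
THRESHOLD = 1.0


def call_direction(log_ratio, threshold=THRESHOLD):
    """Return +1 (up), -1 (down), or 0 (not differentially expressed)."""
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


def count_coexpression(experiments):
    """Count parallel and anti-parallel co-regulation for every gene pair.

    `experiments` is a list of dicts mapping gene name -> log2 ratio.
    Pairs are keyed as (gene_a, gene_b) with gene_a < gene_b.
    """
    parallel = defaultdict(int)
    anti_parallel = defaultdict(int)
    for exp in experiments:
        directions = {g: call_direction(v) for g, v in exp.items()}
        changed = {g: d for g, d in directions.items() if d != 0}
        for a, b in combinations(sorted(changed), 2):
            if changed[a] == changed[b]:
                parallel[(a, b)] += 1       # both up or both down
            else:
                anti_parallel[(a, b)] += 1  # one up, the other down
    return parallel, anti_parallel


def top_partners(query, pair_counts, k=20):
    """Return up to k (partner, count) tuples for `query`, highest count first."""
    partners = []
    for (a, b), n in pair_counts.items():
        if a == query:
            partners.append((b, n))
        elif b == query:
            partners.append((a, n))
    partners.sort(key=lambda item: (-item[1], item[0]))
    return partners[:k]


# Toy data: three experiments, four made-up genes.
experiments = [
    {"GENE_A": 2.1, "GENE_B": 1.5, "GENE_C": -1.8, "GENE_D": 0.2},
    {"GENE_A": -1.4, "GENE_B": -2.0, "GENE_C": 1.1, "GENE_D": 1.3},
    {"GENE_A": 1.2, "GENE_B": 0.1, "GENE_C": -1.6, "GENE_D": -0.3},
]
parallel, anti_parallel = count_coexpression(experiments)
parallel_partners = top_partners("GENE_A", parallel)
anti_parallel_partners = top_partners("GENE_A", anti_parallel)
```

Keeping the two counters separate mirrors the distinction drawn above, so each list of partners can be analyzed on its own.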

Conclusions: In a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 human genes with unknown function were differentially expressed in at least 30 experiments.
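A GO enrichment analysis of a gene's co-expressed partners can be sketched as follows. The abstract does not state which statistical test was used; this example assumes a one-sided hypergeometric test, a common choice for enrichment, and applies no multiple-testing correction. The function names, the `alpha` cutoff, the gene names, and the term labels are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, annotated, sample):
    """P(X >= k) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enriched_terms(partners, annotations, alpha=0.05):
    """Return (term, p_value) pairs enriched among `partners`.

    `annotations` maps gene name -> set of term labels and defines the
    background population. Partners without annotations are ignored.
    """
    population = len(annotations)
    sample = [g for g in partners if g in annotations]
    terms = set()
    for g in sample:
        terms |= annotations[g]
    results = []
    for term in sorted(terms):
        annotated = sum(1 for labels in annotations.values() if term in labels)
        k = sum(1 for g in sample if term in annotations[g])
        p = hypergeom_tail(k, population, annotated, len(sample))
        if p <= alpha:
            results.append((term, p))
    return results


# Toy background: 20 made-up genes, 4 of which carry TERM_X.
annotations = {
    f"G{i}": ({"TERM_X"} if i < 4 else {"TERM_Y"}) for i in range(20)
}
partners = ["G0", "G1", "G2", "G3"]  # e.g. parallel co-expressed partners
enrichment = enriched_terms(partners, annotations)
```

Running the same function separately on parallel and anti-parallel partner lists keeps the two groups apart, in line with the conclusion that they are best analyzed separately.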

Availability: Data matrix available upon request.


Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  6351 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
