Journal Article

Mixture-model based estimation of gene expression variance from public database improves identification of differentially expressed genes in small sized microarray data

Mingoo Kim, Sung Bum Cho and Ju Han Kim

in Bioinformatics

Volume 26, issue 4, pages 486-492
Published in print February 2010 | ISSN: 1367-4803
Published online December 2009 | e-ISSN: 1460-2059 | DOI:



Motivation: The small number of samples in many microarray experiments makes it difficult to identify differentially expressed genes (DEGs) correctly with conventional statistical methods. Information from public microarray databases can help identify DEGs more efficiently. To model the various experimental conditions represented in a public microarray database, we applied a Gaussian mixture model and extracted bi- or tri-modal distributions of gene expression. From these, we estimated the prior variance of Baldi's Bayesian framework for the analysis of small sample-sized datasets.
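The mixture-modelling step can be illustrated with a minimal sketch. It assumes plain expectation-maximization (EM) for a one-dimensional Gaussian mixture fitted to one gene's expression values across many database arrays, and it chooses between one, two or three components with the Bayesian information criterion (BIC). The abstract does not state how the mixtures were fitted or how the number of components was selected, so both choices, as well as the function names fit_gmm_1d and select_gmm, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation of a mixture component: (weight, mean, variance).
Component = Tuple[float, float, float]


def _normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def _log_likelihood(xs: Sequence[float], comps: List[Component]) -> float:
    # Sum over observations of log(sum_k w_k * N(x | mean_k, var_k)).
    return sum(math.log(sum(w * _normal_pdf(x, m, v) for w, m, v in comps) or 1e-300)
               for x in xs)


def fit_gmm_1d(xs: Sequence[float], k: int, n_iter: int = 200,
               tol: float = 1e-8, min_var: float = 1e-6) -> Tuple[List[Component], float]:
    """Fit a k-component 1-D Gaussian mixture by EM; return (components, log-likelihood)."""
    n = len(xs)
    srt = sorted(xs)
    mu = sum(xs) / n
    var_all = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    # Deterministic start: equal weights, means spread over the data quantiles.
    comps = [(1.0 / k, srt[int((j + 0.5) * n / k)], var_all) for j in range(k)]
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * _normal_pdf(x, m, v) for w, m, v in comps]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate the weight, mean and variance of every component.
        new_comps = []
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            mj = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, xs)) / nj
            new_comps.append((nj / n, mj, max(vj, min_var)))  # variance floor
        comps = new_comps
        cur = _log_likelihood(xs, comps)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return comps, _log_likelihood(xs, comps)


def select_gmm(xs: Sequence[float], max_k: int = 3) -> Tuple[int, List[Component]]:
    """Choose 1..max_k components by BIC = -2 log L + (3k - 1) ln n (assumed criterion)."""
    best = None
    for k in range(1, max_k + 1):
        comps, ll = fit_gmm_1d(xs, k)
        bic = -2.0 * ll + (3 * k - 1) * math.log(len(xs))
        if best is None or bic < best[0]:
            best = (bic, k, comps)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic expression of one gene across 200 arrays drawn from two conditions.
    xs = [rng.gauss(2.0, 0.3) for _ in range(100)] + [rng.gauss(8.0, 0.3) for _ in range(100)]
    k, comps = select_gmm(xs)
    print(k, [(round(w, 2), round(m, 2), round(v, 3)) for w, m, v in comps])
```

On a synthetic gene whose expression clusters around two levels, select_gmm should prefer a multi-component model over a single Gaussian; the variance of each fitted component is then a candidate input for the variance-pooling step.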

Results: First, we estimated the prior variance of each gene's expression by pooling the variances obtained from mixture modeling of large samples in the public microarray database. Then, using this prior variance, we identified DEGs in small sample-sized test datasets with Baldi's framework. For a benchmark study, we generated test datasets of several samples each from relatively large datasets. Our proposed method outperformed the other benchmark methods in detecting gold-standard DEGs in the test datasets. These results may provide evidence supporting the use of public microarray databases in microarray data analysis.
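The two steps described in the Results, pooling a database-derived prior variance and plugging it into a Bayesian regularized t-test, can be sketched as follows. The regularized variance (nu0 * prior_var + (n - 1) * s^2) / (nu0 + n - 2) is our reading of Baldi's framework, where nu0 acts like a number of prior pseudo-observations. The weight-averaged pooling of mixture-component variances, the default nu0 = 10, and all function names are illustrative assumptions, not the authors' exact procedure; p-values are omitted.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

# Hypothetical mixture component from the database model: (weight, mean, variance).
Component = Tuple[float, float, float]


def pooled_prior_variance(components: Sequence[Component]) -> float:
    """Assumed pooling rule: weight-averaged within-component variance."""
    total_w = sum(w for w, _, _ in components)
    return sum(w * v for w, _, v in components) / total_w


def regularized_variance(xs: Sequence[float], prior_var: float, nu0: float) -> float:
    """Posterior variance (nu0*prior_var + (n-1)*s^2) / (nu0 + n - 2); needs nu0 + n > 2."""
    n = len(xs)
    s2 = variance(xs) if n > 1 else 0.0  # unbiased sample variance (n - 1 denominator)
    return (nu0 * prior_var + (n - 1) * s2) / (nu0 + n - 2)


def regularized_t(g1: Sequence[float], g2: Sequence[float],
                  prior_var: float, nu0: float = 10.0) -> float:
    """Two-sample t-statistic using the regularized variances of both groups."""
    v1 = regularized_variance(g1, prior_var, nu0)
    v2 = regularized_variance(g2, prior_var, nu0)
    return (mean(g1) - mean(g2)) / math.sqrt(v1 / len(g1) + v2 / len(g2))


if __name__ == "__main__":
    prior = pooled_prior_variance([(0.6, 5.0, 0.04), (0.4, 7.0, 0.09)])
    control, treated = [5.1, 5.3, 5.2], [6.0, 6.4, 6.2]
    print(prior, regularized_t(control, treated, prior))
```

With only three arrays per group, the prior pulls each group's variance toward the database estimate, so a gene whose few samples happen to agree closely no longer receives an inflated t-statistic.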

Availability: Supplementary data are available at


Journal Article. 4517 words. Illustrated.

Subjects: Bioinformatics and Computational Biology
