Journal Article

Estimating population haplotype frequencies from pooled SNP data using incomplete database information

Matti Pirinen

in Bioinformatics

Volume 25, issue 24, pages 3296-3302
Published in print December 2009 | ISSN: 1367-4803
Published online October 2009 | e-ISSN: 1460-2059 | DOI:


Motivation: Information about haplotype structures gives a more detailed picture of genetic variation between individuals than single-locus analyses. Databases that contain the most frequent haplotypes of certain populations are developing rapidly (e.g. the HapMap database for single-nucleotide polymorphisms in humans). Using such prior information about the prevailing haplotype structures makes it possible to estimate haplotype frequencies even from large DNA pools. When genetic material from dozens of individuals is pooled and analysed in a single genotyping run, the overall number of genotypings and the cost of genetic studies are reduced.

Results: A Bayesian model for estimating the haplotypes and their frequencies from pooled allelic observations is introduced. The model combines an idea of using database information for haplotype estimation with a computationally efficient multinormal approximation. In addition, the model treats the number and structures of the unknown haplotypes as random variables whose joint posterior distribution is estimated. The results on real human data from the HapMap database show that the proposed method provides significant improvements over the existing methods.
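
To make the idea of a multinormal approximation concrete, the following is a minimal sketch and not the model implemented in Hippo. It assumes that the chromosomes in a pool are independent draws from the population haplotype distribution, so that the per-SNP allele counts are a sum of independent draws and can be approximated by a multivariate normal with the mean and covariance computed below. The function name `pooled_allele_moments`, the 0/1 allele coding and the example frequencies are illustrative assumptions, not taken from the article.

```python
def pooled_allele_moments(haplotypes, freqs, n_chromosomes):
    """Mean vector and covariance matrix of per-SNP allele counts in a pool.

    haplotypes    -- list of tuples of 0/1 alleles, one tuple per haplotype
    freqs         -- population frequency of each haplotype (should sum to 1)
    n_chromosomes -- chromosomes in the pool (2 x individuals for diploids)

    Assumption: pooled chromosomes are independent draws from `freqs`.
    """
    n_snps = len(haplotypes[0])
    pairs = list(zip(haplotypes, freqs))

    # Marginal frequency of allele 1 at each SNP: p_j = sum_k freq_k * h_kj
    p = [sum(f * h[j] for h, f in pairs) for j in range(n_snps)]

    # Joint frequency of allele 1 at SNP i and SNP j on the same chromosome
    q = [[sum(f * h[i] * h[j] for h, f in pairs) for j in range(n_snps)]
         for i in range(n_snps)]

    # Moments of a sum of n independent draws: n * E[x], n * Cov[x]
    mean = [n_chromosomes * p[j] for j in range(n_snps)]
    cov = [[n_chromosomes * (q[i][j] - p[i] * p[j]) for j in range(n_snps)]
           for i in range(n_snps)]
    return mean, cov


# Hypothetical example: three haplotypes over two SNPs, pool of 20 diploids.
example_haplotypes = [(0, 0), (0, 1), (1, 1)]
example_freqs = [0.5, 0.3, 0.2]
mean, cov = pooled_allele_moments(example_haplotypes, example_freqs, 40)
```

Under these assumptions the off-diagonal covariance terms carry the linkage between SNPs, which is the information that single-locus allele frequencies alone would discard.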

Availability: A reversible-jump Markov chain Monte Carlo algorithm for analysing the model is implemented in a program called Hippo (Haplotype estimation under incomplete prior information using pooled observations). For comparison, an approximate expectation-maximization algorithm (EM algorithm) that utilizes database information about the existing haplotypes is implemented in a program called AEML. The source code, written in C (using the GNU Scientific Library), is available at∼mpirinen.


Journal Article.  6189 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
