Journal Article

Experimental Analysis of Sources of Error in Evolutionary Studies Based on Roche/454 Pyrosequencing of Viral Genomes

Ericka A. Becker, Charles M. Burns, Enrique J. León, Saravanan Rajabojan, Robert Friedman, Thomas C. Friedrich, Shelby L. O'Connor and Austin L. Hughes

in Genome Biology and Evolution

Published on behalf of Society for Molecular Biology and Evolution

Volume 4, issue 4, pages 457-465
Published in print January 2012
Published online March 2012 | e-ISSN: 1759-6653 | DOI:



Factors affecting the reliability of Roche/454 pyrosequencing for analyzing sequence polymorphism in within-host viral populations were assessed by two experiments: 1) sequencing four clonal simian immunodeficiency virus (SIV) stocks and 2) sequencing mixtures, in different proportions, of two SIV strains with known fixed nucleotide differences. Observed nucleotide diversity and the frequency of undetermined nucleotides were increased at sites in homopolymer runs of four or more identical nucleotides, particularly at AT sites. However, in the mixed-strain experiments, the effects of such errors on estimated nucleotide diversity were small in comparison to known strain differences. The results suggest that biologically meaningful variants present at a frequency of around 10%, and possibly much lower, are easily distinguished from artifacts of the sequencing process. Analysis of the clonal stocks revealed numerous rare variants that showed the signature of purifying selection, and showed that eliminating variants at frequencies of less than 1% reduced estimates of nucleotide diversity by about an order of magnitude. Thus, using a 1% frequency cutoff for accepting a variant as real represents a conservative standard, which may be useful in studies that are focused on the discovery of specific mutations (such as those conferring immune escape or drug resistance). On the other hand, if the goal is to estimate nucleotide diversity, an optimal strategy might be to include all observed variants (even those at less than 1% frequency), while masking out homopolymer runs of four or more nucleotides.
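The two strategies contrasted in the abstract (a 1% frequency cutoff versus keeping all variants while masking homopolymer runs of four or more nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not describe. The function names, the input layout (a reference string plus one dictionary of base counts per site), and the per-site estimator (the standard unbiased heterozygosity, n/(n-1) times one minus the sum of squared allele frequencies) are all assumptions made for illustration.

```python
import re


def homopolymer_mask(reference, min_run=4):
    """Return a list of booleans, True where a site lies inside a run of
    min_run or more identical nucleotides (hypothetical helper)."""
    masked = [False] * len(reference)
    # (.)\1{min_run-1,} matches one character followed by at least
    # min_run - 1 repeats of that same character.
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    for match in re.finditer(pattern, reference):
        for i in range(match.start(), match.end()):
            masked[i] = True
    return masked


def site_diversity(counts, min_freq=0.0):
    """Per-site nucleotide diversity from a {base: read_count} dict.

    Alleles whose frequency is below min_freq are discarded first, which
    mimics a frequency cutoff such as 1% (assumed handling, not taken
    from the article)."""
    total = sum(counts.values())
    if total < 2:
        return 0.0
    kept = {b: c for b, c in counts.items() if c / total >= min_freq}
    n = sum(kept.values())
    if n < 2:
        return 0.0
    homozygosity = sum((c / n) ** 2 for c in kept.values())
    return (n / (n - 1)) * (1.0 - homozygosity)


def mean_diversity(reference, site_counts, min_run=4, min_freq=0.0):
    """Average per-site diversity over sites outside homopolymer runs."""
    masked = homopolymer_mask(reference, min_run)
    values = [
        site_diversity(counts, min_freq)
        for counts, is_masked in zip(site_counts, masked)
        if not is_masked
    ]
    return sum(values) / len(values) if values else 0.0
```

Under these assumptions, calling `mean_diversity(reference, site_counts)` with the defaults corresponds to the "include all variants, mask homopolymers" strategy, while `mean_diversity(reference, site_counts, min_run=len(reference) + 1, min_freq=0.01)` disables masking and applies the conservative 1% cutoff instead.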

Keywords: pyrosequencing; natural selection; simian immunodeficiency virus; homopolymer

Journal Article.  5726 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology ; Evolutionary Biology ; Genetics and Genomics
