Journal Article

Identification of pseudogenes in the <i>Drosophila melanogaster</i> genome

Paul M. Harrison, Duncan Milburn, Zhaolei Zhang, Paul Bertone and Mark Gerstein

in Nucleic Acids Research

Volume 31, issue 3, pages 1033-1037
Published in print February 2003 | ISSN: 0305-1048
Published online February 2003 | e-ISSN: 1362-4962 | DOI:

Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, caused by frameshifts and premature stop codons. They are classed as either processed pseudogenes (made by reverse transcription from an mRNA) or duplicated pseudogenes (arising from duplication in the genomic DNA and subsequent disablement). Historically, there has been anecdotal evidence that the fruit fly (Drosophila melanogaster) has few pseudogenes. Investigators have linked this to a high deletion rate of genomic DNA, for which there is evidence from genetic experiments on genome size. Here, we apply to the fruit fly a homology‐based pipeline that was developed previously to identify pseudogenes in other eukaryotic genomes, so as to derive the first complete survey of its pseudogene population. We find approximately 100 pseudogenes, of which at least a sixth are candidate processed pseudogenes. This gives a much lower proportion of pseudogenes (relative to the size of the proteome) than in the genomes of other eukaryotes for which data are available (human, nematode and budding yeast). The closest matching proteins to Drosophila pseudogenes are significantly longer than the average protein in its proteome (up to ∼60% more than the average protein’s length), in contrast to the situation in the three other eukaryotic genomes. This may be due to the persistence of fragments of longer genes. In the fly pseudogene population, we found the most pseudogenes for serine proteases (which are more abundant in the Drosophila lineage than in the other eukaryotes), immunoglobulin‐motif‐containing proteins and cytochromes P450.
Data on the sequences and positions of the putative pseudogenes are available at: The detection of a small number of pseudogenes in the Drosophila genome and the higher mean length for the closest matching proteins to pseudogenes (possibly because remnants of genes encoding longer proteins are more likely to persist) are further evidence for a high deletion rate of genomic DNA in the fruit fly. The data are useful for studies of molecular evolution in Drosophila.
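
As an illustration of the disruption criteria named in the abstract (frameshifts and premature stop codons), the following minimal Python sketch flags both in a candidate coding sequence. It is not the authors' homology‐based pipeline, which works from matches to known proteins. The function name `find_disablements`, the assumption that the sequence is read in frame from its first base, and the use of a length check as a stand-in for frameshift detection are all hypothetical simplifications.

```python
# Illustrative sketch only; NOT the pipeline described in the article.
# Assumptions: the input is a DNA string read in frame 0 from its first
# base, and an intact sequence ends with its own terminal stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def find_disablements(cds: str) -> dict:
    """Report simple disablements in a candidate coding sequence."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    codons = [seq[i * 3:i * 3 + 3] for i in range(n_codons)]

    # A stop codon anywhere before the final complete codon is premature.
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

    # A length that is not a multiple of three hints at a frameshift
    # (a net insertion or deletion of one or two bases). Real pipelines
    # locate frameshifts from alignments rather than from length alone.
    length_off = len(seq) % 3 != 0

    return {
        "premature_stop_codon_indices": premature,
        "length_not_multiple_of_three": length_off,
        "disabled": bool(premature) or length_off,
    }


if __name__ == "__main__":
    for name, s in [("intact", "ATGGCTGAATAA"),
                    ("premature stop", "ATGTAAGAATAA"),
                    ("length shifted", "ATGGCTGAATA")]:
        print(name, find_disablements(s))
```

A sequence is reported as disabled if either check fires. In practice a frameshift usually also introduces downstream stop codons in the shifted frame, so the two signals often appear together.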

Journal Article.  3420 words.  Illustrated.

Subjects: Chemistry ; Biochemistry ; Bioinformatics and Computational Biology ; Genetics and Genomics ; Molecular and Cell Biology
