Journal Article

Novel Phylogenetic Studies of Genomic Sequence Fragments Derived from Uncultured Microbe Mixtures in Environmental and Clinical Samples

Takashi Abe, Hideaki Sugawara, Makoto Kinouchi, Shigehiko Kanaya and Toshimichi Ikemura

in DNA Research

Published on behalf of Kazusa DNA Research Institute

Volume 12, issue 5, pages 281-290
Published in print January 2006 | ISSN: 1340-2838
Published online January 2005 | e-ISSN: 1756-1663 | DOI: http://dx.doi.org/10.1093/dnares/dsi015

A self-organizing map (SOM) was developed as a novel bioinformatics strategy for phylogenetic classification of sequence fragments obtained from pooled genome samples of uncultured microbes in environmental and clinical samples. This phylogenetic classification was possible without either orthologous sequence sets or sequence alignments. We first constructed SOMs for tetranucleotide frequencies in 210 000 sequence fragments of 5 kb each, obtained from 1502 prokaryotes for which at least 10 kb of genomic sequence has been deposited in public DNA databases. The sequences could be classified primarily according to phylogenetic group without any information regarding the species. We then used the SOM method to classify sequence fragments derived from environmental samples of the Sargasso Sea and of an acidophilic biofilm growing in acid mine drainage. The phylogenetic diversity of the environmental sequences was effectively visualized on a single map. Sequences that were derived from a single genome but cloned independently could be reassociated in silico. G + C% has long been used as a fundamental parameter for phylogenetic classification of microbes, but it is apparently too simple a parameter to differentiate the wide variety of known species. Oligonucleotide frequencies, by contrast, vary significantly among genomes and can therefore be used to distinguish species.
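The input representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the non-overlapping fragmentation, the single-strand counting, and the handling of ambiguous bases are all assumptions made for illustration. The sketch cuts a sequence into 5 kb fragments, computes a 256-dimensional tetranucleotide frequency vector for each fragment (the kind of vector a SOM would take as input), and computes G + C% for comparison.

```python
from itertools import product

# All 4**4 = 256 tetranucleotides in a fixed order (assumed ordering).
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def split_fragments(seq, size=5000):
    """Cut seq into non-overlapping fragments of `size` bases.

    A trailing piece shorter than `size` is discarded (an assumption).
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def tetra_freq(fragment):
    """Return the 256 tetranucleotide frequencies of one fragment.

    Overlapping windows are counted on the given strand only; windows
    containing characters other than A, C, G, T are skipped.
    """
    frag = fragment.upper()
    counts = [0] * len(TETRAMERS)
    total = 0
    for i in range(len(frag) - 3):
        idx = INDEX.get(frag[i:i + 4])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


def gc_percent(fragment):
    """Return G + C% over the unambiguous bases of a fragment."""
    frag = fragment.upper()
    acgt = sum(frag.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (frag.count("G") + frag.count("C")) / acgt


# Two toy sequences with identical G + C% but different tetranucleotide usage.
seq_a = "ACGT" * 10
seq_b = "AGCT" * 10
same_gc = gc_percent(seq_a) == gc_percent(seq_b)
different_tetra = tetra_freq(seq_a) != tetra_freq(seq_b)
```

In this toy case both sequences have the same G + C%, yet their tetranucleotide frequency vectors differ, which is the sense in which oligonucleotide frequencies carry more discriminating information than G + C% alone. Training a SOM on such vectors is a separate step that is not shown here.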

Keywords: self-organizing map; environmental samples; metagenome; phylogenetic classification; SOM

Journal Article.  5398 words.  Illustrated.

Subjects: Genetics and Genomics