Journal Article

A visual framework for sequence analysis using n-grams and spectral rearrangement

Stefan R. Maetschke, Karin S. Kassahn, Jasmyn A. Dunn, Siew-Ping Han, Eva Z. Curley, Katryn J. Stacey and Mark A. Ragan

in Bioinformatics

Volume 26, issue 6, pages 737-744
Published in print March 2010 | ISSN: 1367-4803
Published online February 2010 | e-ISSN: 1460-2059 | DOI: http://dx.doi.org/10.1093/bioinformatics/btq042

Motivation: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.

Results: We have developed an alignment-free, visual approach to analyzing sequence relationships. We use the number of n-grams shared between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix using a spectral technique. Heat maps of the affinity matrix are used to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework for analyzing related but highly divergent protein sequences. It is particularly useful as a fast way to study sequence relationships before much more time-consuming multiple sequence alignment and phylogenetic analysis.
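
The two core steps named in the Results (counting shared n-grams to build an affinity matrix, then rearranging that matrix spectrally) can be illustrated with a minimal Python sketch. This is not the MOSAIC implementation. The abstract does not say which spectral technique is used, so ordering by the Fiedler vector of the unnormalized graph Laplacian is an assumption made here for illustration, as are the function names, the choice n = 3, and the toy sequences.

```python
import numpy as np


def ngrams(seq, n=3):
    """Return the set of distinct overlapping substrings of length n."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def affinity_matrix(seqs, n=3):
    """Symmetric matrix whose (i, j) entry counts the distinct
    n-grams shared by sequences i and j."""
    grams = [ngrams(s, n) for s in seqs]
    k = len(seqs)
    A = np.zeros((k, k))
    for i in range(k):
        for j in range(i, k):
            A[i, j] = A[j, i] = len(grams[i] & grams[j])
    return A


def spectral_order(A):
    """Assumed spectral technique: sort sequences by the Fiedler vector
    (eigenvector of the second-smallest eigenvalue) of the unnormalized
    graph Laplacian L = D - W, with self-affinities removed."""
    W = A.copy()
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return np.argsort(vecs[:, 1])


# Hypothetical toy sequences: two families, interleaved on purpose.
# The third member of the first family ends in "GSH", the prefix of the
# second family, which keeps the affinity graph connected.
seqs = [
    "MKTAYIAKQR",   # family 1
    "GSHMLEDPVD",   # family 2
    "MKTAYIAKQL",   # family 1
    "GSHMLEDPAD",   # family 2
    "MKTAYIAKGSH",  # family 1
    "GSHMLEEPVD",   # family 2
]

A = affinity_matrix(seqs, n=3)
order = spectral_order(A)

# Rearranged matrix: this is what would be drawn as a heat map.
A_sorted = A[np.ix_(order, order)]
print(order)
print(A_sorted)
```

After reordering, members of the same toy family occupy adjacent rows and columns, so related sequences appear as bright blocks on the diagonal of a heat map. The Fiedler vector is only defined up to sign, so which family comes first may vary between runs or platforms.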

Availability: A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/

Contact: m.ragan@uq.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  5273 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
