Preview
Motivation: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.
Results: Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis.
Availability: A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/
Contact: m.ragan@uq.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
Journal Article. 5273 words. Illustrated.
Subjects: Bioinformatics and Computational Biology
Go to Oxford Journals » abstract
Full text: subscription required
How to subscribe Recommend to my Librarian
Users without a subscription are not able to see the full content. Please, subscribe or login to access all content.