Journal Article

A knowledge-based approach to predict intragenic deletions or duplications

Krishna R. Kalari, Thomas L. Casavant and Todd E. Scheetz

in Bioinformatics

Volume 24, issue 18, pages 1975-1979
Published in print September 2008 | ISSN: 1367-4803
Published online July 2008 | e-ISSN: 1460-2059 | DOI:


Motivation: Despite recent improvements in high-throughput or classic molecular biology approaches, it remains challenging to identify intermediate-resolution genomic variations (50 bp to 50 kb). Although array-based technologies can detect copy number variations in the human genome, they are biased toward detecting only the largest deletions or duplications. Several studies have identified deletions or duplications within a gene that directly cause or predispose to disease. We have developed a novel computational system, SPeeDD (system to prioritize deletions or duplications), that uses machine learning techniques to predict likely candidate regions that delete or duplicate exon(s) within a gene.

Results: Data mining and machine learning methods were applied to identify sequence features that were predictive of homologous recombination events. The logistic model tree (LMT) method yielded the best results. Sensitivity varied from 20% to 71.6% depending on the specific machine learning model used, but specificity exceeded 90% for all methods evaluated. In addition, the SPeeDD system successfully predicted and prioritized a recently published novel BRCA1 mutation.
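
For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are conventionally computed from a classifier's confusion-matrix counts. It is not taken from SPeeDD; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true events that are correctly flagged: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-events that are correctly rejected: TN / (TN + FP)."""
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative examples")
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts, NOT results from the paper.
    tp, fn, tn, fp = 30, 20, 950, 50
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tn, fp):.3f}")
```

A model with high specificity but modest sensitivity, as described above, produces few false candidates but may miss some true events, which suits a tool intended to prioritize regions for follow-up screening.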

Conclusions: The results suggest that the SPeeDD system is effective at prioritizing candidate deletions and duplications within a gene. Using SPeeDD enables more focused screening, which reduces the labor and associated costs of the molecular assays. It may also lead to the targeted design of new array-based screens that focus on candidate areas, accelerating the process of mutation discovery.


Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  3666 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
