Journal Article

Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations

Gerard Wong, Christopher Leckie, Kylie L. Gorringe, Izhak Haviv, Ian G. Campbell and Adam Kowalczyk

in Bioinformatics

Volume 26, issue 8, pages 1007-1014
Published in print April 2010 | ISSN: 1367-4803
Published online February 2010 | e-ISSN: 1460-2059 | DOI:


Motivation: High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost-effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. To detect narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed.
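As a rough illustration of what "sequence similarity" means for a probe, the following minimal Python sketch counts exact matches of a short probe on both strands of a set of reference sequences and flags the probe as multi-matching when it hits more than one locus. The function names and toy sequences are hypothetical, and exact string matching is an assumption made for brevity; it is not the survey procedure used in the article, which this abstract does not describe.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_occurrences(text, pattern):
    """Count possibly overlapping exact occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_matches(probe, reference):
    """Count exact hits of a probe on both strands of each reference sequence."""
    rc = reverse_complement(probe)
    hits = {}
    for name, seq in reference.items():
        n = count_occurrences(seq, probe)
        if rc != probe:  # avoid double-counting reverse-palindromic probes
            n += count_occurrences(seq, rc)
        hits[name] = n
    return hits


# Toy data (hypothetical): far shorter than real probes or chromosomes.
reference = {
    "chr1": "TTGACCGTAAGGCTTACG",
    "chrX": "CCGACCGTAAGGAATTCC",
}
hits = count_matches("GACCGTAAGG", reference)

# A probe matching more than one locus is a candidate for filtering.
multi_matching = sum(hits.values()) > 1
```

In this toy case the autosomal probe also matches a locus on chrX, which is the kind of cross-chromosome match that could shift copy number readings between sexes.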

Results: We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing the sequence similarity results, we identified a collection of fine-scaled putative CNVs between genders from autosomal probesets whose sequences match various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior to, and more stable than, that of the t-test in detecting CNVs. By applying DRECS to the HapMap population datasets with multi-matching probesets filtered out, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them.
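To give a feel for why a rank-order statistic can be more stable than the t-test, the sketch below compares Welch's t statistic with the standard Mann-Whitney U statistic on made-up intensity values for a single probeset. This is not the DRECS algorithm, whose details are not given in this abstract; the Mann-Whitney U is used here only as a familiar stand-in for rank-based testing, and all names and numbers are hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error


def rank_sum_u(a, b):
    """Mann-Whitney U for sample a: count of pairs with a > b, ties as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical log2 intensity ratios for one probeset in two groups.
group_a = [0.31, 0.28, 0.35, 0.30, 0.33]
group_b = [0.02, -0.05, 0.04, 0.01, -0.03]

# Same data with one extreme outlier replacing the last value in group_b.
group_b_outlier = group_b[:-1] + [5.0]

t_clean = welch_t(group_a, group_b)
t_outlier = welch_t(group_a, group_b_outlier)

u_clean = rank_sum_u(group_a, group_b)            # all 25 pairs favour group_a
u_outlier = rank_sum_u(group_a, group_b_outlier)  # 20 of 25 pairs still do
```

With the clean data both statistics indicate that group_a is higher. After a single outlier is introduced, the t statistic changes sign because the outlier dominates the mean of group_b, whereas U only drops from 25 to 20, since ranks depend on ordering rather than magnitude.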

Availability: The MATLAB implementation of DRECS is available at ~gwong/DRECS/index.html


Supplementary information: Supplementary information is available at Bioinformatics online.

Journal Article.  6044 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
