Journal Article

Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest

Usman Roshan, Satish Chikkagoudar, Zhi Wei, Kai Wang and Hakon Hakonarson

in Nucleic Acids Research

Volume 39, issue 9, pages e62-e62
Published in print May 2011 | ISSN: 0305-1048
Published online February 2011 | e-ISSN: 1362-4962 | DOI: https://dx.doi.org/10.1093/nar/gkr064

We study the number of causal variants and associated regions identified by the top SNPs in rankings given by the popular 1 df chi-squared statistic, the support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-squared-ranked SNPs, where r is the number of SNPs with P-values that pass the Bonferroni-corrected threshold, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as the stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications, we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies given by the top-ranked SNPs from the three methods. Software and a web server are available at http://svmsnps.njit.edu.
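The pre-selection step described in the abstract (rank SNPs by the 1 df chi-squared statistic, count the r SNPs that pass Bonferroni, keep the top 2r) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' software: the allelic 2x2 test, the 0/1/2 genotype coding, the significance level alpha = 0.05 and all function names are hypothetical choices made here. The subsequent SVM and RF re-ranking of the selected SNPs is not shown.

```python
import math


def allelic_chi2(genotypes, labels):
    """1-df chi-squared statistic from a 2x2 allele-count table (assumed test).

    genotypes: minor-allele counts (0, 1 or 2) per individual.
    labels: 1 for case, 0 for control.
    """
    case_alt = sum(g for g, y in zip(genotypes, labels) if y == 1)
    ctrl_alt = sum(g for g, y in zip(genotypes, labels) if y == 0)
    n_case = 2 * sum(1 for y in labels if y == 1)  # alleles among cases
    n_ctrl = 2 * sum(1 for y in labels if y == 0)  # alleles among controls
    table = [[case_alt, n_case - case_alt], [ctrl_alt, n_ctrl - ctrl_alt]]
    total = n_case + n_ctrl
    stat = 0.0
    for i, row_total in enumerate((n_case, n_ctrl)):
        for j in range(2):
            col_total = table[0][j] + table[1][j]
            expected = row_total * col_total / total
            if expected > 0:  # skip empty columns (monomorphic SNPs)
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi2_pvalue_1df(stat):
    """Upper-tail P-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def select_top_candidates(snp_columns, labels, alpha=0.05, multiplier=2):
    """Return (indices of the top multiplier*r SNPs by chi-squared rank, r).

    snp_columns: one list of genotypes per SNP.
    r: number of SNPs with P-value below the Bonferroni threshold alpha / m.
    """
    m = len(snp_columns)
    stats = [allelic_chi2(col, labels) for col in snp_columns]
    pvals = [chi2_pvalue_1df(s) for s in stats]
    ranking = sorted(range(m), key=lambda k: stats[k], reverse=True)
    r = sum(1 for p in pvals if p < alpha / m)
    return ranking[: multiplier * r], r
```

Setting `multiplier` to 5 or 10 gives the larger 5r and 10r cutoffs mentioned in the abstract; the returned indices would then be the input features for the SVM or RF ranking.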

Journal Article.  6178 words.  Illustrated.

Subjects: Chemistry ; Biochemistry ; Bioinformatics and Computational Biology ; Genetics and Genomics ; Molecular and Cell Biology
