Journal Article

Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation

Theodore Alexandrov, Jens Decker, Bart Mertens, Andre M. Deelder, Rob A. E. M. Tollenaar, Peter Maass and Herbert Thiele

in Bioinformatics

Volume 25, issue 5, pages 643-649
Published in print March 2009 | ISSN: 1367-4803
Published online January 2009 | e-ISSN: 1460-2059 | DOI:


Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure for biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test; and (iii) building and evaluating a support vector machine classifier by double cross-validation, with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of the spectra that discriminate between cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls.
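Steps (i) and (ii) of the procedure can be illustrated with a minimal sketch. The abstract does not say which wavelet family, decomposition depth or statistical test was used, so the Haar wavelet, the Welch t-statistic and the names `haar_dwt`, `welch_t` and `select_coefficients` below are assumptions chosen for brevity, not the authors' implementation. The toy spectra are made up for illustration only.

```python
import math
from statistics import mean, variance


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT (an assumed wavelet choice).

    Returns one flat list: approximation coefficients first, then detail
    coefficients ordered from the coarsest level to the finest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend so that coarser levels end up before finer ones.
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs = approx
    for level_details in details:
        coeffs = coeffs + level_details
    return coeffs


def welch_t(x, y):
    """Welch two-sample t-statistic; 0.0 if both groups have zero variance."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def select_coefficients(cancer, control, k, levels=1):
    """Indices of the k wavelet coefficients with the largest |t|."""
    cancer_w = [haar_dwt(s, levels) for s in cancer]
    control_w = [haar_dwt(s, levels) for s in control]
    scores = []
    for j in range(len(cancer_w[0])):
        t = welch_t([w[j] for w in cancer_w], [w[j] for w in control_w])
        scores.append((abs(t), j))
    # Sort by decreasing |t|, breaking ties by coefficient index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:k]]


# Toy spectra (hypothetical values): a peak at positions 4-5 that is
# higher in the "cancer" group than in the "control" group.
cancer = [[0, 0, 0, 0, p, p, 0, 0] for p in (5.0, 5.2, 4.8)]
control = [[0, 0, 0, 0, p, p, 0, 0] for p in (1.0, 1.1, 0.9)]
print(select_coefficients(cancer, control, k=1))
```

The selected coefficient indices would then serve as input features for the classifier in step (iii); the support vector machine and the double cross-validation loop are omitted from this sketch.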

Results: Our procedure provided a high recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly represent the peaks expressing mean differences between the cancer and control spectra. However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparison of the mean spectra. The obtained classifiers have high generalization power, as measured by the number of support vectors. This generalization power guards against overfitting and contributes to the reproducibility of the results, which is required to find biomarkers differentiating cancer patients from healthy individuals.
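For readers unfamiliar with the three reported measures, the following sketch shows how they are conventionally computed from true and predicted labels, taking cancer as the positive class. The function name `evaluate` and the toy labels are hypothetical; the toy labels are not the study's data and do not reproduce the figures above.

```python
def evaluate(y_true, y_pred, positive="cancer"):
    """Total recognition rate, sensitivity and specificity.

    Sensitivity is the fraction of positive (cancer) cases recognized;
    specificity is the fraction of negative (control) cases recognized.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "recognition_rate": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (hypothetical): one cancer case is misclassified as control.
y_true = ["cancer"] * 4 + ["control"] * 4
y_pred = ["cancer"] * 3 + ["control"] * 5
print(evaluate(y_true, y_pred))
```

In a double cross-validation setting, these measures would be computed only on the outer test folds, which the classifier never sees during feature selection or parameter tuning.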

Availability: The data and scripts used in this study are available at


Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  5709 words.  Illustrated.

Subjects: Bioinformatics and Computational Biology
