Journal Article

SOLpro: accurate sequence-based prediction of protein solubility

Christophe N. Magnan, Arlo Randall and Pierre Baldi

in Bioinformatics

Volume 25, issue 17, pages 2200-2207
Published in print September 2009 | ISSN: 1367-4803
Published online June 2009 | e-ISSN: 1460-2059 | DOI:


Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based method that accurately predicts the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.

Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to standard evaluation metrics, with an overall accuracy of over 74% estimated using multiple runs of 10-fold cross-validation.
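As an illustration of the kind of pipeline described above, the following is a minimal sketch, not the SOLpro implementation. It assumes amino-acid composition as one example of a feature computed directly from the primary sequence (the abstract does not enumerate the 23 feature groups), and it shows how a dataset might be partitioned for 10-fold cross-validation and scored by overall accuracy. The function names composition, k_fold_indices and accuracy are hypothetical, and the SVM stages themselves are not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino-acid frequencies of a sequence.

    Assumption: composition is used here only as an illustrative
    sequence-derived feature group; it is not taken from the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10):
    """Partition range(n_samples) into k disjoint test folds of near-equal size."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    features = composition("MKTAYIAKQR")  # hypothetical toy sequence
    print(len(features))                  # one value per standard residue
    folds = k_fold_indices(25, k=10)
    print([len(f) for f in folds])        # test-fold sizes
```

In each cross-validation round, one fold would serve as the test set and the remaining nine as the training set; averaging the accuracy over rounds, and over repeated runs, gives an estimate of the kind quoted above.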

Availability: SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at:


Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article.  6887 words. 

Subjects: Bioinformatics and Computational Biology
