A better way to find biomarkers in infectious disease

09 May 2013

NIMR scientists have developed a new statistical technique for exploring infectious diseases. The research is published in PLOS Computational Biology.

Biomarkers can help in the challenge of understanding infectious diseases such as tuberculosis and malaria. A biomarker is a biological measurement that indicates the presence or progress of a disease. Biomarkers are valuable both for scientists trying to understand the biochemical basis of a disease and for doctors making a diagnosis. Researchers can identify new biomarkers by finding small subsets of relevant variables in ‘omics data that correlate with the clinical syndromes of interest. However, most clinical phenotypes (e.g. diseases) are characterised by a complex set of clinical parameters with a variable degree of overlap, and current computational approaches do not take this multivariate nature of the phenotypes into account.

To overcome this limitation, Delmiro Fernandez-Reyes, from NIMR’s Division of Parasitology, has proposed a new method for biomarker discovery. It works by finding canonical correlations between two sets of data: the plasma proteomic profiles, and a set of clinical data composed of patient history, signs, symptoms and clinical laboratory measurements of the individuals with the syndromes of interest.

This approach, based on asymmetrical sparse canonical correlation analysis (SCCA), finds multivariate correlations between the ‘omics measurements and the clinical data. The researchers correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. They discovered relevant ‘omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5–3% of all ‘omic variables.
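To give a feel for how sparsity on the ‘omics side can arise, here is a minimal Python sketch. It is not the researchers' asymmetrical SCCA formulation: it swaps in a simpler, generic scheme (alternating power iterations on the cross-covariance matrix, with soft-thresholding applied only to the ‘omics weights) and works with covariance rather than fully normalised correlation. The function names, the `sparsity` parameter and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def center(col):
    """Subtract the mean from one feature column."""
    mean = sum(col) / len(col)
    return [x - mean for x in col]


def normalise(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def soft_threshold(vec, t):
    """Shrink every entry towards zero by t; small entries become exactly zero."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in vec]


def sparse_cca(omics, clinical, sparsity=0.5, n_iter=50):
    """Return (u, v): sparse 'omics weights and dense clinical weights.

    Both inputs are lists of feature columns measured on the same patients.
    `sparsity` (hypothetical knob, between 0 and 1) sets the threshold as a
    fraction of the largest unthresholded 'omics weight.
    """
    xs = [center(c) for c in omics]
    ys = [center(c) for c in clinical]
    # Cross-covariance (up to a constant): one row per 'omics feature.
    C = [[sum(a * b for a, b in zip(xj, yk)) for yk in ys] for xj in xs]
    q = len(ys)
    v = normalise([1.0] * q)
    u = [0.0] * len(xs)
    for _ in range(n_iter):
        a = [sum(c * vk for c, vk in zip(row, v)) for row in C]
        t = sparsity * max(abs(x) for x in a)
        u = normalise(soft_threshold(a, t))          # sparse 'omics side
        v = normalise([sum(C[j][k] * u[j] for j in range(len(C)))
                       for k in range(q)])           # dense clinical side
    return u, v


# Synthetic data only: 200 patients, 40 'omics features, 5 clinical features.
rng = random.Random(0)
n, p, q = 200, 40, 5
z = [rng.gauss(0, 1) for _ in range(n)]  # hidden factor shared by both views
omics = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
clinical = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(q)]
for j in (0, 1):  # only the first two features of each view carry the signal
    omics[j] = [zi + 0.3 * e for zi, e in zip(z, omics[j])]
    clinical[j] = [zi + 0.3 * e for zi, e in zip(z, clinical[j])]

u, v = sparse_cca(omics, clinical)
selected = [j for j, w in enumerate(u) if w != 0.0]
print("selected 'omics features:", selected)
```

In this toy setting the thresholding step is what keeps the ‘omics weight vector short, while the clinical weights stay dense so that the whole clinical profile contributes to the correlation.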

In this scenario the (expensive) proteomics data is only required for setting up the models. Predictions can then be made in the clinical setting using the previously learned biomarker model. The researchers believe that this is a realistic setup considering real-world deployment of decision support systems into resource-poor health care centres.
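The deployment side of this setup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the class name `ClinicalOnlyModel`, the two unnamed clinical measurements, and every number below are made up. The point is only that what is stored after training (clinical means, clinical weights and a decision threshold) is enough to score a new patient without any proteomics measurement.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClinicalOnlyModel:
    """What is kept after training; no proteomics is needed to use it."""
    means: Sequence[float]    # training-set mean of each clinical measurement
    weights: Sequence[float]  # clinical weights learned with proteomics at training time
    threshold: float          # decision cut-off chosen on training data

    def score(self, patient: Sequence[float]) -> float:
        """Project a patient's centred clinical measurements onto the learned direction."""
        return sum(w * (x - m)
                   for w, x, m in zip(self.weights, patient, self.means))

    def flag(self, patient: Sequence[float]) -> bool:
        """True if the patient's score exceeds the decision threshold."""
        return self.score(patient) > self.threshold


# Hypothetical values for two clinical measurements (illustration only).
model = ClinicalOnlyModel(means=[37.0, 10.0], weights=[0.6, 0.8], threshold=1.0)
new_patient = [39.0, 12.5]
score = model.score(new_patient)
flagged = model.flag(new_patient)
print(f"score={score:.2f} flagged={flagged}")
```

Keeping the stored model this small is a deliberate choice in the sketch: it needs nothing beyond routine clinical measurements and basic arithmetic at the point of care.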

“We have shown that via canonical correlation analysis, it is possible to make use of proteomic data in order to improve on the diagnostic classification, even if no proteomics data is available at the time of prediction, only at the time of training the model. This is a close match to a real-world scenario of deploying a diagnostic tool to health care centres without expensive ‘omics measurement capabilities. The analysis can also be used in the opposite way, using clinical data to extract more predictive biomarkers from proteomics data, and thus enhance the understanding of the systems biology underlying the complex phenotypes.”

Delmiro Fernandez-Reyes

Original article

Rousu J, Agranoff DD, Sodeinde O, Shawe-Taylor J, Fernandez-Reyes D (2013)

Biomarker discovery by sparse canonical correlation analysis of complex clinical phenotypes of tuberculosis and malaria.

PLOS Computational Biology 9(4): e1003018.

© MRC National Institute for Medical Research
The Ridgeway, Mill Hill, London NW7 1AA