Practical proteomic biomarker discovery: taking a step back to leap forward
by Jennifer Listgarten and Andrew Emili
Synopsis: Proteomic screening methods for clinical diagnosis had not, as of this writing (December 2005), proven robust enough for clinical deployment. The authors point to the need for an airtight, carefully designed, well-executed experiment that can serve as a proof of concept. Other points they raise:
- these are inherently observational studies, not experiments.
- dominant signals distinguishing affected versus control samples are the most likely to be recognized, yet may not be clinically informative
- diagnostically useful samples are by definition hard to come by (e.g. people with early-stage cancers exhibiting little to no symptoms don’t usually volunteer to give you a serum sample)
- more attention needs to be paid to ensuring reproducible results
- a collective target or benchmark / generic experiment like CASP (for protein structure prediction) or CAPRI (for protein-protein interaction prediction) would be helpful, especially some sort of modular suite of benchmark experiments
They cite 45 different papers, and it’s a good and thoroughly readable introduction to the exciting field of biomarker discovery. This was back in ’05; I wonder if any progress has been made towards benchmarking since then?
Statistical and Computational Methods for Comparative Proteomic Profiling Using Liquid Chromatography-Tandem Mass Spectrometry
by Jennifer Listgarten and Andrew Emili.
Synopsis: This extensive review of LC-MS/MS leads the reader from the beginning of an LC-MS/MS experiment to the end. At various stages in between, they identify different approaches and comment on their relative merits. They also cover MALDI and SELDI mass spec instruments. Though many methods are evaluated in the paper, they can be broadly grouped into three levels:
- low-level processing, such as forming a data matrix, filtering, baseline estimation and subtraction
- mid-level processing, such as data normalization, alignment in the time domain, peak detection, peak quantification, peak matching, and the most difficult peak interpretation problem of all: twin (overlapping) peaks.
- high-level processing, such as sample classification/characterization via feature selection, significance testing/multiple testing.
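To make the low- and mid-level steps concrete, here is a toy sketch of my own (not from the paper): baseline estimation via a moving-minimum filter, baseline subtraction, and naive local-maximum peak detection on a 1-D spectrum. The window size, threshold, and the moving-minimum approach itself are illustrative stand-ins for the more sophisticated methods the review surveys.

```python
# Toy low/mid-level processing sketch (illustrative, not the paper's method).

def estimate_baseline(spectrum, window=5):
    """Moving-minimum baseline estimate: for each point, take the minimum
    intensity in a window around it (a crude stand-in for the smarter
    baseline-estimation methods surveyed in the review)."""
    n = len(spectrum)
    baseline = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        baseline.append(min(spectrum[lo:hi]))
    return baseline

def detect_peaks(spectrum, threshold=1.0):
    """Return indices of local maxima whose intensity exceeds `threshold`."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > spectrum[i - 1]
            and spectrum[i] >= spectrum[i + 1]
            and spectrum[i] > threshold]

# A made-up spectrum with two peaks riding on a slowly varying baseline.
signal = [1.0, 1.1, 1.2, 5.0, 1.3, 1.2, 1.4, 7.5, 1.5, 1.3]
corrected = [s - b for s, b in zip(signal, estimate_baseline(signal, window=2))]
peaks = detect_peaks(corrected, threshold=1.0)  # -> [3, 7]
```

Real pipelines do each of these steps far more carefully (and then still face alignment and peak matching across runs), which is part of why so many methods exist for each stage.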
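For the high-level step, one standard answer to the multiple-testing problem (my illustration, not necessarily the method the paper settles on) is Benjamini-Hochberg false-discovery-rate control, applied when thousands of peaks are each tested for case-versus-control differences. The p-values below are invented for the example.

```python
# Illustrative sketch: Benjamini-Hochberg FDR control over per-peak p-values.

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of p-values declared significant at FDR level
    `alpha`: find the largest rank k with p_(k) <= alpha * k / m, then
    reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose sorted p-value clears the BH line
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

# Invented p-values for eight hypothetical peaks.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
hits = benjamini_hochberg(pvals, alpha=0.05)  # -> [0, 1]
```

With a plain Bonferroni cutoff of 0.05/8 = 0.00625 only the first peak would survive, which is exactly the conservatism that motivates FDR-style procedures in high-dimensional screens.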
Another paper by this pair, and a good primer if you’re new to problems in MS (as I am). Having spoken to the second author recently, it seems that most, if not all, of the processing steps contain open problems.