Transactions on Mass-Data Analysis of Images and Signals (ISSN:1868-6451)
Volume 1 - Number 2 - September 2009 - Pages 178-197
Identification of Protein Biomarkers using Geostatistics and Block-Error Matching Strategy
T. D. Pham, D. Beck and M. Brandl
Bioinformatics Research Group, School of Engineering and Information Technology, University of New South Wales, ADFA, Canberra, ACT 2600, Australia
The discovery of protein biomarkers has been performed predominantly with serum or plasma, for early prediction of diseases and new drug discovery. Our objective is to develop a novel pattern classification strategy for protein biomarker identification and early disease prediction using mass spectrometry data. We applied the theory of geostatistics to extract the statistical spatial features of the mass spectrometry (MS) peak signals of the control and patient populations in a linear prediction fashion. We then used the mathematical framework of signal error matching to estimate the dissimilarity between the MS peaks, represented as vectorized spatial prediction coefficients. Finally, we applied the minimum decision rule and the majority vote to classify the samples based on the best match between the unknown and known MS peaks. We used high-throughput SELDI-TOF MS data to obtain the protein profiles of the patient and control populations. The minimum decision rule outperformed two other current methods, and the majority voting rule gave the best performance among six other techniques. We found that the combination of geostatistics-based variance error derivation, linear prediction, and a signal distortion measure for identifying useful MS peaks provides an average classification rate better than that of some benchmark methods. The proposed approach is promising as a general computational bioinformatic model for proteomic-pattern-based biomarker discovery.
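The classification pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: the empirical semivariogram of a peak signal is used directly as the feature vector (standing in for the vectorized spatial prediction coefficients), squared Euclidean distance stands in for the signal distortion measure, and the function names `semivariogram`, `nearest_label`, and `majority_vote` are hypothetical.

```python
from collections import Counter

import numpy as np


def semivariogram(signal, max_lag):
    """Empirical semivariogram: gamma(h) = mean((z[i+h] - z[i])^2) / 2 for h = 1..max_lag."""
    z = np.asarray(signal, dtype=float)
    return np.array(
        [0.5 * np.mean((z[h:] - z[:-h]) ** 2) for h in range(1, max_lag + 1)]
    )


def nearest_label(feature, references):
    """Minimum decision rule: return the label of the closest reference.

    `references` is a list of (label, feature_vector) pairs. Squared Euclidean
    distance is an assumed stand-in for the paper's distortion measure.
    """
    feature = np.asarray(feature, dtype=float)
    distances = [
        np.sum((feature - np.asarray(ref, dtype=float)) ** 2) for _, ref in references
    ]
    return references[int(np.argmin(distances))][0]


def majority_vote(labels):
    """Majority voting rule: return the most frequent label among per-peak decisions."""
    return Counter(labels).most_common(1)[0][0]
```

In this sketch, each MS peak of an unknown sample would be labeled with `nearest_label` against reference features from the control and patient populations, and `majority_vote` would combine the per-peak labels into a single decision for the sample.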