The goal of this paper is to review existing methods for the analysis of protein mass spectrometry data, and to present a new methodology for the automatic extraction of significant peaks (biomarkers). For the pre-processing steps required for MALDI-TOF or SELDI-TOF spectra, we use a purely nonparametric approach that combines the translation-invariant wavelet transform for noise removal with penalized spline quantile regression for baseline correction. We then present a multi-scale spectra alignment technique based on the identification of statistically significant peaks in a set of spectra. This method finds the peaks common to a set of spectra, which can subsequently be mapped to individual proteins. These peaks may serve as useful biomarkers in medical applications, or as features for further multidimensional statistical analysis. MALDI-TOF spectra obtained from serum samples are used throughout the paper to illustrate the methodology.
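As an illustration of the denoising step mentioned above, the following is a minimal sketch of translation-invariant wavelet denoising by cycle spinning. It is not the authors' implementation. The Haar wavelet, soft thresholding with the universal threshold, the MAD-based noise estimate, and the function names `haar_forward`, `haar_inverse` and `ti_denoise` are all assumptions chosen to keep the example short and self-contained.

```python
import numpy as np

def haar_forward(x, levels):
    """Decimated orthonormal Haar transform.
    Returns the coarse approximation and the detail arrays (finest first)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward (details are given finest first)."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def soft_threshold(c, t):
    """Shrink coefficients towards zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def ti_denoise(y, levels=4, n_shifts=None):
    """Cycle-spinning denoising: average the thresholded reconstructions
    obtained from circularly shifted copies of the signal."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    if n_shifts is None:
        # Shifts by 2**levels only permute the coefficients, so this
        # many shifts covers every distinct case (assumed default).
        n_shifts = 2 ** levels
    # Noise level estimated from the finest-scale details via the MAD.
    _, finest = haar_forward(y, 1)
    sigma = np.median(np.abs(finest[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(n))  # universal threshold
    total = np.zeros(n)
    for h in range(n_shifts):
        approx, details = haar_forward(np.roll(y, h), levels)
        details = [soft_threshold(d, threshold) for d in details]
        total += np.roll(haar_inverse(approx, details), -h)
    return total / n_shifts

# Usage on a synthetic two-peak "spectrum" (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(1024)
clean = 5 * np.exp(-0.5 * ((t - 300) / 30) ** 2) + 3 * np.exp(-0.5 * ((t - 700) / 40) ** 2)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = ti_denoise(noisy, levels=4)
```

Averaging over shifts removes the dependence of the estimate on the alignment between the signal and the dyadic grid, which is what makes the procedure translation invariant. A smoother wavelet than Haar would normally be preferred for real spectra.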
Keywords: nonparametric regression, wavelets, quantile regression, peak detection, curve alignment, biomarker identification
@article{JSFS_2010__151_1_17_0,
  author = {Antoniadis, Anestis and Bigot, J\'er\'emie and Lambert-Lacroix, Sophie},
  title = {Peaks detection and alignment for mass spectrometry data},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {17--37},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {151},
  number = {1},
  year = {2010},
  mrnumber = {2652788},
  zbl = {1316.62153},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2010__151_1_17_0/}
}
Antoniadis, Anestis; Bigot, Jérémie; Lambert-Lacroix, Sophie. Peaks detection and alignment for mass spectrometry data. Journal de la société française de statistique, Tome 151 (2010) no. 1, pp. 17-37. http://www.numdam.org/item/JSFS_2010__151_1_17_0/