Inferring genetic networks from gene expression data is one of the most challenging tasks of the post-genomic era, partly because of the vast space of possible networks and the relatively small amount of data available. In this field, Gaussian Graphical Models (GGMs) provide a convenient framework for the discovery of biological networks.
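In a GGM, two genes $i$ and $j$ are linked when they are dependent conditionally on all the other genes, which happens exactly when the $(i,j)$ entry of the precision matrix $K = \Sigma^{-1}$ is nonzero; the corresponding partial correlation is $\rho_{ij} = -K_{ij}/\sqrt{K_{ii}K_{jj}}$. The minimal sketch below (standard library only) inverts a hypothetical three-gene covariance matrix and keeps the pairs with nonzero partial correlation. It only illustrates the framework: with few samples relative to the number of genes, $\Sigma$ has to be estimated and regularized rather than inverted directly, and the helper names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """rho_ij = -K_ij / sqrt(K_ii * K_jj), with K the precision matrix."""
    k = invert(cov)
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / sqrt(k[i][i] * k[j][j])
             for j in range(n)] for i in range(n)]


def ggm_edges(cov, tol=1e-8):
    """Edges of the GGM: gene pairs whose partial correlation is nonzero."""
    rho = partial_correlations(cov)
    return {(i, j) for i, j in combinations(range(len(cov)), 2)
            if abs(rho[i][j]) > tol}


# Hypothetical covariance of a 3-gene chain g0 - g1 - g2
# (the inverse of the precision matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]).
sigma = [[0.75, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 0.75]]
print(sorted(ggm_edges(sigma)))  # [(0, 1), (1, 2)]
```

Genes 0 and 2 are marginally correlated ($\Sigma_{02} = 0.25$) but conditionally independent given gene 1, so only the two chain edges are kept.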
In this paper, we propose an original approach for inferring gene regulation networks using a robust biological prior on their structure in order to limit the set of candidate networks.
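One common way to let a structural prior shrink the set of candidate networks is to penalize edges unequally in an $\ell_1$-penalized regression: gene pairs supported by prior knowledge (for instance, genes sharing a curated pathway) receive a smaller penalty weight. The sketch below uses Meinshausen and Bühlmann neighborhood selection with a weighted Lasso solved by coordinate descent. This is a swapped-in technique chosen for illustration, not necessarily the estimator used by the authors; the function names, `prior_edges`, `prior_weight` and the toy data are all hypothetical.

```python
from math import sqrt


def standardize(values):
    """Center to mean 0 and scale so that (1/n) * sum of squares = 1."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def soft_threshold(x, t):
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def weighted_lasso(y, xs, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||y - sum_j b_j x_j||^2 + lam * sum_j w_j |b_j|.

    Assumes every column in xs is standardized, so each coordinate update
    reduces to a soft-thresholding step.
    """
    n = len(y)
    beta = [0.0] * len(xs)
    resid = list(y)
    for _ in range(n_iter):
        for j, x in enumerate(xs):
            # Inner product of x_j with the residual that leaves out x_j's own term.
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, resid)) / n
            new = soft_threshold(rho, lam * weights[j])
            resid = [ri - xi * (new - beta[j]) for xi, ri in zip(x, resid)]
            beta[j] = new
    return beta


def neighborhood_selection(data, lam, prior_edges=frozenset(), prior_weight=0.5):
    """Regress each gene on all others; prior edges get a smaller l1 weight.

    data: one list of expression values per gene (same samples in each list).
    prior_edges: set of frozenset({i, j}) pairs (hypothetical pathway prior).
    Returns undirected edges (i, j) with i < j, combined with the OR rule.
    """
    genes = [standardize(g) for g in data]
    edges = set()
    for i, y in enumerate(genes):
        others = [j for j in range(len(genes)) if j != i]
        weights = [prior_weight if frozenset((i, j)) in prior_edges else 1.0
                   for j in others]
        beta = weighted_lasso(y, [genes[j] for j in others], lam, weights)
        edges.update((min(i, j), max(i, j))
                     for j, b in zip(others, beta) if b != 0.0)
    return edges


# Hypothetical toy data: 4 samples, 3 genes built from orthogonal patterns.
u, v, w = [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]
g0 = u
g1 = [0.5 * a + sqrt(0.75) * b for a, b in zip(u, v)]  # corr(g0, g1) = 0.5
g2 = w  # uncorrelated with g0 and g1
data = [g0, g1, g2]

print(neighborhood_selection(data, lam=0.6))  # set()
print(neighborhood_selection(data, lam=0.6, prior_edges={frozenset((0, 1))}))  # {(0, 1)}
```

With $\lambda = 0.6$ the link between g0 and g1 (correlation 0.5) is pruned, whereas halving its penalty through the prior lets it survive; the uncorrelated gene g2 stays isolated in both cases. The OR rule used to symmetrize the neighborhoods is one of the two usual choices, the other being the AND rule.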
Pathways, which represent biological knowledge about regulatory networks, will be used as informative prior knowledge to drive network inference. This approach is based on the selection of a relevant set of genes, called the “molecular signature”, associated with a condition of interest (for instance, the genes involved in disease development). In this context, differential expression analysis is a well-established strategy. However, outcome signatures are often inconsistent and show little overlap between studies, which calls the robustness of such approaches into question. We will therefore dedicate the first part of our work to improving the standard process of biomarker identification, in order to guarantee the robustness and reproducibility of the molecular signature.
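The signature-selection step and the overlap problem can be illustrated in a few lines: genes are ranked by a t-statistic between two conditions, the top $k$ genes form the signature, and the Jaccard index measures how much two signatures overlap. This is a hedged sketch with hypothetical gene names and toy values. Welch's t is used as a simple stand-in (moderated statistics, such as those of limma, are usually preferred with few replicates), and the sketch does not reproduce the authors' improved selection procedure.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic comparing one gene's values in condition a vs. condition b."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def signature(expr_a, expr_b, k):
    """Top-k genes by |t|; expr_a and expr_b map gene name -> list of values."""
    t = {g: welch_t(expr_a[g], expr_b[g]) for g in expr_a}
    ranked = sorted(t, key=lambda g: (-abs(t[g]), g))  # ties broken by gene name
    return set(ranked[:k])


def jaccard(s1, s2):
    """Overlap between two signatures: |intersection| / |union|."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


# Hypothetical toy data: two small "studies", 3 samples per condition.
study1_a = {"G1": [5, 6, 7], "G2": [2, 3, 4], "G3": [4, 5, 6], "G4": [3, 5, 7]}
study1_b = {"G1": [1, 2, 3], "G2": [2, 3, 4], "G3": [2, 3, 4], "G4": [2, 3, 4]}
study2_a = {"G1": [5, 6, 7], "G2": [2, 3, 4], "G3": [3, 4, 5], "G4": [5, 6, 7]}
study2_b = {"G1": [1, 2, 3], "G2": [2, 3, 4], "G3": [2, 3, 4], "G4": [2, 3, 4]}

sig1 = signature(study1_a, study1_b, k=2)
sig2 = signature(study2_a, study2_b, k=2)
print(sorted(sig1), sorted(sig2), round(jaccard(sig1, sig2), 3))
# ['G1', 'G3'] ['G1', 'G4'] 0.333
```

G1 is selected in both toy studies, but shifts in G3 and G4 are enough to swap the second member of the signature, leaving a Jaccard index of only 1/3.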
Our approach makes it possible to compare the networks inferred under two conditions of interest (for instance, case and control networks) and facilitates the biological interpretation of the results. It thus allows the identification of differential regulations between these conditions. We illustrate the proposed approach by applying it to a study of the response to treatment in breast cancer.
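Once a network has been inferred for each condition, candidate differential regulations can be read off as edges present in one network but not in the other. The sketch below performs only this bookkeeping on undirected edge lists; the node names and edges are hypothetical, and a real analysis would typically also account for estimation uncertainty (for instance, by inferring both networks jointly) rather than comparing two point estimates.

```python
def as_undirected(edges):
    """Represent each edge as a frozenset so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def differential_edges(case_edges, control_edges):
    """Split edges into shared ones and those specific to each condition."""
    case, control = as_undirected(case_edges), as_undirected(control_edges)
    return {
        "shared": case & control,
        "case_only": case - control,
        "control_only": control - case,
    }


# Hypothetical edge lists inferred separately in each condition.
case_net = [("A", "B"), ("B", "C"), ("C", "D")]
control_net = [("B", "A"), ("B", "D")]

diff = differential_edges(case_net, control_net)
for label, edges in diff.items():
    print(label, sorted(tuple(sorted(e)) for e in edges))
# shared [('A', 'B')]
# case_only [('B', 'C'), ('C', 'D')]
# control_only [('B', 'D')]
```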
Keywords: Network inference, Gaussian graphical model, $\ell_1$ penalization, Prior information, Pathway analysis
@article{JSFS_2011__152_2_97_0,
  author = {Jeanmougin, Marine and Guedj, Mickael and Ambroise, Christophe},
  title = {Defining a robust biological prior from {Pathway} {Analysis} to drive {Network} {Inference}},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {97--110},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {152},
  number = {2},
  year = {2011},
  mrnumber = {2821224},
  zbl = {1316.92050},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2011__152_2_97_0/}
}