Modeling the role of genetic factors in characterizing extra-intestinal manifestations in Crohn's disease patients: does this improve outcome predictions?

GIACHINO, Daniela Francesca;DE MARCHI, Mario;GREGORI, Dario
2007-01-01

Abstract

OBJECTIVE: To evaluate to what extent an inefficient statistical model affects the study of genetic factors in extra-intestinal manifestations of Crohn's disease (CD), and how clinical predictions can be improved using more adequate techniques.

MATERIALS: Extra-intestinal manifestations were studied in 152 CD patients. Three sets of variables were considered: (1) disease characteristics (presentation, behavior, location); (2) generic risk factors (age, gender, smoking, and family history); and (3) genetic polymorphisms of the NOD2, CD14, TNF, IL12B, and IL1RN genes, whose involvement in CD is known or suspected.

METHODS: Six statistical classifiers and data mining models were applied: (1) logistic regression as a benchmark; (2) generalized additive model; (3) projection pursuit regression; (4) linear discriminant analysis; (5) quadratic discriminant analysis; (6) one-layer feed-forward artificial neural networks. Models were selected using the Akaike information criterion, and their accuracy was compared using several indexes.

RESULTS: Extra-intestinal manifestations occurred in 75 patients. The model with clinical variables only selected family history, gender, presentation, and behavior as significantly associated with extra-intestinal manifestations. When the genetic factors were also included, family history was no longer significant and was replaced by the NOD2, TNF, and IL12B single nucleotide polymorphisms. Projection pursuit regression performed best in predicting individual outcomes (kappa statistic 0.078 [SE 0.09] without and 0.108 [SE 0.075] with genetic information). One-layer artificial neural networks did not show any particular improvement in model accuracy over nonlinear techniques.

CONCLUSIONS: The correct identification of factors associated with extra-intestinal symptoms in CD, in particular the genetic ones, is highly dependent on the model chosen for the analysis. By using the most sophisticated statistical models, the accuracy of prediction can be improved by 10-64% compared with linear regression.
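
As a reading aid for the kappa values quoted in the results, the sketch below shows how Cohen's kappa compares observed agreement between predicted and actual outcomes with the agreement expected by chance. It is a minimal illustration and not the authors' code: the function name `cohen_kappa` and the toy labels are hypothetical and do not reproduce the study data.

```python
def cohen_kappa(observed, predicted):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    labels = set(observed) | set(predicted)
    # Observed agreement: share of cases where prediction matches outcome.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    p_e = sum(
        (list(observed).count(lab) / n) * (list(predicted).count(lab) / n)
        for lab in labels
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (made-up labels): 1 = manifestation present, 0 = absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
guess = [1, 1, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(actual, guess)
print(round(kappa, 3))
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which puts the reported values of 0.078 and 0.108 in context.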
Year: 2007
Volume: 23
Pages: 1657-1665
http://www.tandfonline.com/doi/pdf/10.1185/030079907X210471?needAccess=true
D. GIACHINO; S. REGAZZONI; M. BARDESSONO; M. DE MARCHI; D. GREGORI
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/37025
Citations
  • PubMed Central 4
  • Scopus 6
  • Web of Science (ISI) 6