Machine Learning Classification of Inflammatory Bowel Disease in Children Based on a Large Real-World Pediatric Cohort CEDATA-GPGE® Registry

Front Med (Lausanne). 2021 May 24;8:666190. doi: 10.3389/fmed.2021.666190. eCollection 2021.

Nicolas Schneider 1, Keywan Sohrabi 2, Henning Schneider 1, Klaus-Peter Zimmer 3, Patrick Fischer 1, Jan de Laffolie 3, CEDATA-GPGE Study Group


Author information

  • 1 Institute of Medical Informatics, Justus-Liebig-University Giessen, Gießen, Germany.
  • 2 Faculty of Health, Technical University of Applied Sciences Mittelhessen, Gießen, Germany.
  • 3 Department of Pediatrics, Justus-Liebig-University Giessen, Gießen, Germany.


Introduction: The rising incidence of pediatric inflammatory bowel diseases (PIBD) creates a need for new methods to reduce diagnostic latency and to improve quality of care and documentation. Machine learning models have been shown to be applicable to classifying PIBD when using histological data or extensive serology. This study aims to evaluate the performance of algorithms based on promptly available data, which are better suited to clinical applications.

Methods: Data on the locations of bowel inflammation from initial and follow-up visits are extracted from the CEDATA-GPGE registry, and two follow-up sets are split off containing only entries from 2017 and 2018. Pre-processing excludes patients in remission and encodes the categorical data numerically. To classify the PIBD diagnosis, a support vector machine (SVM), a random forest (RF), extreme gradient boosting (XGBoost), a dense neural network (DNN) and a convolutional neural network (CNN) are employed. As the best performer, the CNN is further improved using grid optimization.

Results: The accuracy of the optimized neural network reaches up to 90.57% on data entered into the registry in 2018. Less performant methods range from 88.78% for the DNN down to 83.94% for XGBoost. Prediction accuracy on the 2018 follow-up dataset is higher than on the older datasets. The neural networks show a higher standard deviation, with 3.45 for the CNN compared to 0.83-0.86 for the support vector machine and ensemble methods.

Discussion: The accuracy achieved by the convolutional neural network demonstrates the viability of machine learning classification in PIBD diagnostics using only promptly available data.
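
The pre-processing and evaluation steps named in the Methods and Results (excluding patients in remission, encoding categorical data numerically, and reporting accuracy with its standard deviation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the severity scale, the two diagnosis labels and the example records are all hypothetical assumptions, since the abstract does not describe the registry schema or the class labels.

```python
from statistics import mean, stdev

# Hypothetical records; field names and category values are illustrative
# assumptions, not the actual CEDATA-GPGE registry schema.
records = [
    {"diagnosis": "CD", "in_remission": False, "ileum": "severe", "colon": "none"},
    {"diagnosis": "UC", "in_remission": False, "ileum": "none", "colon": "moderate"},
    {"diagnosis": "CD", "in_remission": True, "ileum": "none", "colon": "none"},
    {"diagnosis": "UC", "in_remission": False, "ileum": "none", "colon": "severe"},
]

# Assumed ordinal encoding of inflammation per bowel location.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
LOCATIONS = ["ileum", "colon"]
# Assumed label set for the classification target.
LABELS = {"CD": 0, "UC": 1}


def preprocess(rows):
    """Drop patients in remission and encode categorical fields as integers."""
    features, labels = [], []
    for row in rows:
        if row["in_remission"]:
            continue  # excluded during pre-processing
        features.append([SEVERITY[row[loc]] for loc in LOCATIONS])
        labels.append(LABELS[row["diagnosis"]])
    return features, labels


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the recorded diagnosis."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def summarize(accuracies):
    """Mean and sample standard deviation of accuracy over repeated runs."""
    return mean(accuracies), stdev(accuracies)


X, y = preprocess(records)
```

The resulting `X` and `y` lists could then be passed to any of the classifiers listed in the Methods, and `summarize` applied to the per-run accuracies to obtain figures comparable in form to those reported in the Results.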

