DIABETOLOGY / RESEARCH PAPER
 
ABSTRACT
Introduction:
This study aimed to develop and validate a machine learning-based diabetes prediction model for high-risk populations.

Material and methods:
A total of 2,355 samples from the National Health and Nutrition Examination Survey (NHANES) database, covering three cycles from 2013 to 2018, were included. The data were divided into training and testing sets in a 7:3 ratio. Nineteen risk prediction factors were selected as feature variables, covering demographic baseline data, measurement data, medical history, and psychological health. Five machine learning models, namely decision tree, random forest (RF), multilayer perceptron (MLP), AdaBoost, and XGBoost, were established.
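The 7:3 training/testing split can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified, so keeping the diabetes/non-diabetes proportion equal in both sets is an assumption here, as are the hypothetical function name `stratified_split` and the seed value. Only the class counts (260 and 2,095) come from the abstract.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int],
    train_parts: int = 7,
    total_parts: int = 10,
    seed: int = 42,  # hypothetical seed, chosen only for reproducibility
) -> Tuple[List[int], List[int]]:
    """Split sample indices into training/testing sets in a
    train_parts:(total_parts - train_parts) ratio, separately within each
    class so both sets keep the same class proportions (an assumption)."""
    rng = random.Random(seed)
    # Group sample indices by class label
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train: List[int] = []
    test: List[int] = []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Integer arithmetic avoids floating-point rounding in the 70% cut
        n_train = len(idx) * train_parts // total_parts
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Class counts reported in the abstract: 260 with diabetes, 2,095 without
labels = [1] * 260 + [0] * 2095
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1648 707
print(sum(labels[i] for i in train_idx), sum(labels[i] for i in test_idx))  # 182 78
```

With scikit-learn, a comparable split is usually written as `train_test_split(X, y, test_size=0.3, stratify=y)`. Whichever tool is used, stratification is worth considering here because only about 11% of the samples (260 of 2,355) have diabetes.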

Results:
The final analysis included 2,355 individuals at high risk of diabetes: 260 with diabetes and 2,095 without. Among the five machine learning models established in this study, the RF and XGBoost models showed better overall performance than the others. In the test set, the RF model achieved an AUC of 0.896, accuracy of 0.784, sensitivity of 0.739, specificity of 0.849, and Matthews correlation coefficient (MCC) of 0.418. The XGBoost model achieved an AUC of 0.903, accuracy of 0.815, sensitivity of 0.962, and MCC of 0.443. According to the feature importance analysis of these two optimal models, waist circumference, age, BMI, gender
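The test-set metrics above follow standard definitions based on the confusion matrix (TP, TN, FP, FN): sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), MCC = (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The sketch below computes them in plain Python. The function names, the toy labels and scores, and the 0.5 decision threshold are hypothetical; the abstract does not state which threshold the authors used, and this code does not reproduce their results.

```python
import math
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        # MCC is conventionally reported as 0 when any marginal total is zero
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of positive/negative
    pairs in which the positive case gets the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities (not study data)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
# Assumed 0.5 decision threshold for turning probabilities into class labels
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]

print(binary_metrics(y_true, y_pred))
print(round(auc_rank(y_true, scores), 4))  # 0.8667
```

In practice, `sklearn.metrics` provides `confusion_matrix`, `matthews_corrcoef`, and `roc_auc_score` for the same quantities. The pairwise AUC loop here scales as n_pos × n_neg and is meant only to make the definition explicit. Because sensitivity, specificity, accuracy, and MCC depend on the chosen threshold while AUC does not, MCC is a useful complement on an imbalanced dataset like this one.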

Conclusions:
The RF and XGBoost machine learning models performed well in predicting the occurrence of diabetes in high-risk populations. They can aid in developing more precise interventions and personalized treatment plans, helping to effectively reduce the incidence of diabetes and related risks in this population.
eISSN:1896-9151
ISSN:1734-1922