GASTROENTEROLOGY / RESEARCH PAPER
Prediction of Prognosis and Survival of Patients with Gastric Cancer by Weighted Improved Random Forest Model
1 College of Computer Science and Technology, Huaibei Normal University, China
2 School of Higher Vocational Education, Nanjing University of the Arts, China
3 College of Wenzheng, Suzhou University, China
CORRESPONDING AUTHOR
Jing Wang
College of Computer Science and Technology, Huaibei Normal University, No. 100 Dongshan Road, Huaibei City, Anhui Province, 235000, Huaibei, China
Submission date: 2021-01-28
Final revision date: 2021-04-07
Acceptance date: 2021-04-07
Online publication date: 2021-04-10
Arch Med Sci 2022;18(5)
ABSTRACT
Introduction:
Predicting patients' survival status based on their prognosis is important, as it can assist physicians in evaluating treatment decisions. Random Forest is a strong machine learning algorithm even without modification. We propose a new Random Forest weighting method, apply it to gastric cancer patient data from the Surveillance, Epidemiology, and End Results (SEER) program, and then evaluate the generalization ability of the weighted Random Forest algorithm on 10 public medical datasets. Furthermore, for the same weighting scheme, we explore the difference between using out-of-bag (OOB) data and using the entire training set as the weighting basis.
Material and methods:
A total of 110,697 cases of gastric cancer diagnosed between 1975 and 2016, obtained from the SEER database, were included in the experiment. In addition, 10 public medical datasets were used to evaluate the generalization ability of the weighted Random Forest algorithm.
Results:
In experiments on the SEER gastric cancer patient data, the weighted Random Forest algorithm improved accuracy by 0.79% compared with the original Random Forest. In terms of AUC, the macro-average increased by 2.32% and the micro-average by 0.51% on average. Among the 10 public datasets, the Random Forest weighted by accuracy performed best on 6 datasets, with an average increase of 1.44% in accuracy and 1.2% in AUC.
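The two AUC averages reported above can be illustrated with a minimal sketch. The abstract does not say how the multi-class AUC was computed, so the one-vs-rest treatment below is an assumption, and the function names `binary_auc` and `macro_micro_auc` are hypothetical. The macro-average is the mean of the per-class AUCs, while the micro-average pools every (sample, class) pair into a single binary problem.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_micro_auc(probabilities, labels, classes):
    """Return (macro, micro) one-vs-rest AUC (assumed scheme, not from the paper).

    probabilities[i][k] is the predicted probability that sample i
    belongs to classes[k].
    """
    per_class = []
    pooled_scores, pooled_targets = [], []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in probabilities]
        targets = [1 if y == cls else 0 for y in labels]
        per_class.append(binary_auc(scores, targets))
        # Micro-averaging pools all classes before computing a single AUC.
        pooled_scores.extend(scores)
        pooled_targets.extend(targets)
    macro = sum(per_class) / len(per_class)
    micro = binary_auc(pooled_scores, pooled_targets)
    return macro, micro
```

Because the macro-average gives every class equal weight regardless of its size, it tends to react more strongly than the micro-average to changes on rare classes.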
Conclusions:
Compared with the original Random Forest, the weighted Random Forest model shows a significant improvement in performance, and using all training data as the weighting basis works better than using OOB data.
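The difference between the two weighting bases can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the weighting formula, so the sketch assumes each tree's weight is simply its accuracy, and the names `tree_accuracy`, `tree_weights`, and `weighted_vote` are hypothetical. Each tree is represented only by its predictions on the training samples and by the set of sample indices in its bootstrap bag.

```python
from collections import defaultdict


def tree_accuracy(predictions, labels, sample_indices):
    """Fraction of the selected samples that one tree classifies correctly."""
    if not sample_indices:
        return 0.0
    correct = sum(1 for i in sample_indices if predictions[i] == labels[i])
    return correct / len(sample_indices)


def tree_weights(train_predictions, labels, in_bag, basis="all"):
    """One accuracy-based weight per tree (assumed weighting rule).

    basis="all": score each tree on every training sample.
    basis="oob": score each tree only on samples outside its bootstrap bag.
    """
    n = len(labels)
    weights = []
    for preds, bag in zip(train_predictions, in_bag):
        if basis == "oob":
            idx = [i for i in range(n) if i not in bag]
        elif basis == "all":
            idx = list(range(n))
        else:
            raise ValueError("basis must be 'all' or 'oob'")
        weights.append(tree_accuracy(preds, labels, idx))
    return weights


def weighted_vote(votes, weights):
    """Return the class with the largest total weight among the tree votes."""
    totals = defaultdict(float)
    for vote, weight in zip(votes, weights):
        totals[vote] += weight
    return max(totals, key=totals.get)


# Toy data: 4 training samples, 3 trees.
labels = [0, 1, 1, 0]
train_predictions = [[0, 1, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0]]
in_bag = [{0, 1}, {0, 1}, {2, 3}]   # bootstrap sample indices per tree
votes = [0, 1, 1]                   # each tree's vote for one new sample

for basis in ("all", "oob"):
    w = tree_weights(train_predictions, labels, in_bag, basis=basis)
    print(basis, w, weighted_vote(votes, w))
```

In this toy example the two bases yield different weights and different final predictions for the same votes, which is the kind of effect the OOB versus full-training comparison examines. Scoring on all training data includes samples the tree was fitted on, so its weights are generally more optimistic than OOB weights.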