Dublin Core
Title
Abstract
Diabetes is a growing global health issue, and early prediction is key to preventing its complications. This thesis develops predictive models for diabetes using six machine learning methods: Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), Random Forest, Support Vector Machine (SVM), and XGBoost. The models are trained on the Diabetes Health Indicators dataset, which covers clinical, lifestyle, and demographic factors. Feature selection identifies the most important predictors of diabetes. Model performance is evaluated using accuracy, precision, recall, and F1-score, each reported as macro and weighted averages, together with error metrics (MSE and RMSE), to give a thorough picture of performance across the classes. SVM and Random Forest performed best overall, each with an accuracy of 0.86. Both also scored well on the macro-average and weighted-average measures, with overall recall and F1-scores of 0.86. SVM achieved the highest precision at 0.88, followed by Random Forest at 0.87. Their balanced handling of both classes makes these two models dependable choices for diabetes prediction. Decision Tree, KNN, XGBoost, and Logistic Regression produced weaker results across the same metrics.
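As an illustration of how the macro-average, weighted-average, and error metrics named above relate to one another, the following is a minimal sketch in plain Python. It is not the thesis's code: the function names `per_class_scores` and `evaluate` are hypothetical, and the labels are invented toy data rather than values from the Diabetes Health Indicators dataset. It assumes a binary problem with classes encoded as 0 and 1.

```python
import math
from collections import Counter


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(y_true, y_pred):
    """Accuracy, macro/weighted precision-recall-F1, MSE, and RMSE (hypothetical helper)."""
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    scores = {c: per_class_scores(y_true, y_pred, c) for c in labels}
    names = ("precision", "recall", "f1")
    # Macro average: every class counts equally, regardless of its size.
    macro = {
        name: sum(scores[c][i] for c in labels) / len(labels)
        for i, name in enumerate(names)
    }
    # Weighted average: each class is weighted by its share of the true labels.
    weighted = {
        name: sum(scores[c][i] * support[c] for c in labels) / n
        for i, name in enumerate(names)
    }
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # MSE on 0/1 labels is the fraction of misclassified samples.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return {
        "accuracy": accuracy,
        "macro": macro,
        "weighted": weighted,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Invented toy labels: six samples of class 0 and four of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
result = evaluate(y_true, y_pred)
print(result)
```

In this toy case the classes are imbalanced, so the macro and weighted averages differ slightly; reporting both shows whether a model's score is driven mainly by the majority class. Note also that with 0/1 labels the MSE equals one minus the accuracy, so on a binary task the error metrics carry the same information as accuracy.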
