PREDICTIVE MODELING OF SURVIVAL, TREATMENT DELAY, AND RISK STRATIFICATION IN CANCER CARE: A MACHINE LEARNING APPROACH USING SEER DATA
Karthik S, Prof. Sowmya D S
RV Institute of Management, Bengaluru, Karnataka
Abstract
Objective: This study aims to predict survival outcomes and treatment delays in cancer patients using machine learning models applied to demographic and clinical data from the SEER database, while identifying key prognostic factors and patient subgroups through advanced clustering techniques.
Methods: Clinical data from 622,345 patients were analyzed using regression models (Linear, Ridge, Lasso), tree-based algorithms (Random Forest, Gradient Boosting, XGBoost, LightGBM), and a Cox Proportional Hazards (CoxPH) model to evaluate survival prediction. Treatment delay was predicted with a classifier adjusted for class imbalance, which reached 80% accuracy. Feature importance analysis was used to identify key predictors, and patients were stratified by clustering autoencoder-derived latent features (10 dimensions) with KMeans (3 clusters). Performance metrics included R², MAE, MSE, C-index, and Silhouette scores.
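To make the C-index metric concrete, the following is a minimal sketch of Harrell's concordance index in plain Python. The function name `concordance_index`, the toy inputs, and the choice to skip pairs with tied survival times are illustrative assumptions; they are not taken from the study's code or data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch, not the study's code).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: tied times are skipped in this sketch
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.3, 0.4, 0.5, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable patient pairs, which is the scale on which a reported C-index should be read.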
Results: LightGBM achieved the highest survival prediction accuracy (R² = 0.5278), while CoxPH demonstrated strong discriminative power (C-index = 0.89), identifying advanced SEER stage (HR = 1.68), surgery (HR = 0.27), and marital status as significant predictors. The delay model showed high recall (98%) for delayed cases but poor overall explanatory power (R² = 0.15). Clustering revealed three distinct groups: low-risk early-stage patients (Cluster 0: no chemotherapy, all surgery), intermediate-risk cases (Cluster 1: mixed treatments), and high-risk patients (Cluster 2: aggressive therapies), supported by a moderate Silhouette score (0.42). Non-linear interactions (e.g., involving chemotherapy and income) significantly influenced predictions. Survival and delay outcomes were negatively correlated (-0.46), suggesting that longer delays are associated with shorter survival.
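As a reading aid for the reported hazard ratios, the block below assumes the standard Cox proportional hazards parameterization, in which each hazard ratio is the exponential of a model coefficient. The abstract does not state the reference categories or the coefficients themselves, so the values of the coefficients shown are simply back-calculated from the reported HRs.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp\!\left(\beta^{\top} x\right), &
\mathrm{HR}_k &= \exp(\beta_k) \\
\mathrm{HR}_{\text{surgery}} &= 0.27 &
\Rightarrow\quad \beta_{\text{surgery}} &= \ln 0.27 \approx -1.31 \\
\mathrm{HR}_{\text{advanced stage}} &= 1.68 &
\Rightarrow\quad \beta_{\text{advanced stage}} &= \ln 1.68 \approx 0.52
\end{align*}
```

Under this assumption, and holding the other covariates fixed, surgery corresponds to a hazard about 73% lower (1 - 0.27) than the reference group, and advanced SEER stage to a hazard about 68% higher (1.68 - 1).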
Conclusion: The study integrates diverse machine learning approaches to predict cancer outcomes and offers actionable risk stratification. The LightGBM and CoxPH models provided robust survival insights, while clustering highlighted treatment patterns. Challenges remain in addressing data imbalance and refining delay prediction. This framework enhances prognostic modeling in oncology and underscores the need for tailored clinical interventions and further model optimization to improve predictive accuracy in real-world healthcare settings.
Keywords: Machine Learning, Predictive Models, Deep Learning, SEER Database
Journal Name :
EPRA International Journal of Multidisciplinary Research (IJMR)
Published on : 2025-06-10
Vol : 11
Issue : 6
Month : June
Year : 2025