Predicting incident dementia in cerebral small vessel disease: comparison of machine learning and traditional statistical models.


BACKGROUND: Cerebral small vessel disease (SVD) contributes to 45% of dementia cases worldwide, yet we lack a reliable model for predicting dementia in SVD. Past attempts largely relied on traditional statistical approaches. Here, we investigated whether machine learning (ML) methods improve prediction of incident dementia in SVD from baseline SVD-related features, compared with traditional statistical methods.

METHODS: We included three cohorts with varying SVD severity (RUN DMC, n = 503; SCANS, n = 121; HARMONISATION, n = 265). Baseline demographics, vascular risk factors, cognitive scores, and magnetic resonance imaging (MRI) features of SVD were used for prediction. We conducted both a survival analysis and a classification analysis predicting 3-year dementia risk. For each analysis, several ML methods were evaluated against standard Cox or logistic regression. Finally, we compared the feature importance rankings produced by the different models.

RESULTS: We included 789 participants without missing data in the survival analysis, amongst whom 108 (13.7%) developed dementia during a median follow-up of 5.4 years. After excluding those censored before three years, we included 750 participants in the classification analysis, amongst whom 48 (6.4%) developed dementia by year 3. Of the ML models, only regularised Cox and regularised logistic regression outperformed their standard statistical counterparts overall, and the improvement was not significant in the survival analysis. Baseline cognition was highly predictive, and global cognition was the most important feature.

CONCLUSIONS: When using baseline SVD-related features to predict dementia in SVD, the ML survival and classification models we evaluated brought little improvement over traditional statistical approaches. The benefits of ML should be evaluated with caution, especially given limited sample size and features.
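The classification cohort described above is derived from the survival data by excluding participants censored before three years. A minimal sketch of that labelling step might look as follows. The function name, the example records, and the handling of the exact 3-year boundary are illustrative assumptions, not the authors' code.

```python
from typing import List, Optional, Tuple

HORIZON_YEARS = 3.0  # prediction horizon stated in the abstract


def three_year_label(
    time_years: float, dementia: bool, horizon: float = HORIZON_YEARS
) -> Optional[int]:
    """Turn one survival record into a binary label at the horizon.

    Returns 1 if dementia was observed by the horizon, 0 if the participant
    was followed to the horizon without dementia by then, and None if the
    participant was censored earlier (outcome at the horizon unknown).
    Treating an event at exactly the horizon as positive is an assumption.
    """
    if dementia and time_years <= horizon:
        return 1
    if time_years >= horizon:
        # Includes participants who developed dementia only after the horizon.
        return 0
    return None  # censored before the horizon: excluded from classification


# Hypothetical records: (follow-up time in years, dementia observed)
records: List[Tuple[float, bool]] = [
    (1.2, True),   # dementia within 3 years
    (2.0, False),  # censored before 3 years
    (5.4, False),  # dementia-free throughout follow-up
    (4.1, True),   # dementia, but after the 3-year horizon
    (3.0, False),  # followed exactly to the horizon
]

labels = [three_year_label(t, d) for t, d in records]
kept = [y for y in labels if y is not None]
print(f"{len(kept)} kept, {sum(kept)} positive")
```

Under this rule a participant who develops dementia after year 3 counts as a negative case in the classification analysis but still contributes an event to the survival analysis, which is one reason the two analyses report different event counts (48 versus 108).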