Validating supervised learning approaches to the prediction of disease status in neuroimaging

Bibliographic Details
Main Author: Mendelson, A. F.
Published: University College London (University of London) 2017
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.746719
Description
Summary: Alzheimer’s disease (AD) is a serious global health problem with growing human and monetary costs. Neuroimaging data offers a rich source of information about pathological changes in the brain related to AD, but its high dimensionality makes it difficult to fully exploit using conventional methods. Automated neuroimage assessment (ANA) uses supervised learning to model the relationships between imaging signatures and measures of disease. ANA methods are assessed on the basis of their predictive performance, which is measured using cross validation (CV). Despite its ubiquity, CV is not always well understood, and there is a lack of guidance as to best practice. This thesis is concerned with the practice of validation in ANA. It introduces several key challenges and considers potential solutions, including several novel contributions. Part I of this thesis reviews the field and introduces key theoretical concepts related to CV. Part II is concerned with bias due to selective reporting of performance results. It describes an empirical investigation to assess the likely level of this bias in the ANA literature and the relative importance of several contributory factors. Mitigation strategies are then discussed. Part III is concerned with the optimal selection of CV strategy with respect to bias, variance and computational cost. Part IV is concerned with the statistical analysis of CV performance results. It discusses the failure of conventional statistical procedures, reviews previous alternative approaches, and demonstrates a new heuristic solution that fares well in preliminary investigations. Though the focus of this thesis is AD ANA, the issues it addresses are of great importance to all applied machine learning fields where samples are limited and predictive performance is critical.
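
For readers unfamiliar with the procedure the summary refers to, the following is a minimal sketch of k-fold cross validation. It is illustrative only and is not taken from the thesis. The synthetic two-class data, the nearest-centroid classifier, and all function names (`k_fold_indices`, `fit_nearest_centroid`, `predict`, `cross_validate`) are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in model: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # The model is fitted on the training folds only.
        model = fit_nearest_centroid(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic data (an assumption for illustration): two Gaussian classes.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_validate(X, y, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean(scores))
```

Each sample is held out exactly once, so the mean of the per-fold accuracies estimates performance on unseen data. The spread across folds gives a rough sense of the variance that the summary lists alongside bias and computational cost as a consideration when choosing a CV strategy.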