Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci

Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS – the ability to detect genetic association by linkage disequilibrium (LD) – is also its limitation. Whilst the ever-increasing study s...

Bibliographic Details
Main Authors: Hannah L. Nicholls, Christopher R. John, David S. Watson, Patricia B. Munroe, Michael R. Barnes, Claudia P. Cabrera
Format: Article
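Language: English
Published: Frontiers Media S.A. 2020-04-01
Series: Frontiers in Genetics
Subjects: machine learning, artificial intelligence, genome-wide association study, genomics, candidate gene, clinical translation
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.00350/full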
id doaj-d25406a98a014738a499143e68bbf8d2
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Hannah L. Nicholls
Christopher R. John
David S. Watson
Patricia B. Munroe
Michael R. Barnes
Claudia P. Cabrera
title Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2020-04-01
description Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS – the ability to detect genetic association by linkage disequilibrium (LD) – is also its limitation. Whilst the ever-increasing study size and improved design have augmented the power of GWAS to detect effects, differentiation of causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on prioritizations of complex disease associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.
topic machine learning
artificial intelligence
genome-wide association study
genomics
candidate gene
clinical translation
url https://www.frontiersin.org/article/10.3389/fgene.2020.00350/full
spelling doaj-d25406a98a014738a499143e68bbf8d2
2020-11-25T02:06:36Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-04-01
11
10.3389/fgene.2020.00350
521712
Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci
Hannah L. Nicholls (0, 1)
Christopher R. John (2, 3)
David S. Watson (4, 5)
Patricia B. Munroe (6, 7)
Michael R. Barnes (8, 9, 10, 11)
Claudia P. Cabrera (12, 13, 14)
0 Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
1 Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
2 Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
3 Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
4 Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
5 Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
6 Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
7 NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
8 Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
9 Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
10 NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
11 The Alan Turing Institute, British Library, London, United Kingdom
12 Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
13 Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
14 NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
https://www.frontiersin.org/article/10.3389/fgene.2020.00350/full
machine learning
artificial intelligence
genome-wide association study
genomics
candidate gene
clinical translation