Fine mapping of causal HLA variants using penalised regression

Bibliographic Details
Main Author: Vignal, Charlotte
Other Authors: Balding, David ; Bansal, Aruna
Published: Imperial College London 2010
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.526298
Description
Summary: The identification of risk loci in the Human Leukocyte Antigen (HLA) region using single-SNP association tests has been hampered by the extent of linkage disequilibrium (LD). Penalised regression via the Least Absolute Shrinkage and Selection Operator (LASSO) can be used to select variables in multi-SNP analysis and to deal with multi-collinearity among predictors. The method applies a penalty that shrinks the estimates of the regression coefficients towards zero. In a Bayesian framework, this is equivalent to placing a double exponential (DE) prior distribution with a mode at zero on the coefficients, reflecting the prior belief that most of the effects are negligible. Parameter inference is based on the posterior mode, with non-zero values indicating marker-disease associations. Single-SNP analysis, stepwise regression and the LASSO approach were applied to case-control studies of rheumatoid arthritis, a disease which has been associated with markers from the HLA region. A generalisation of the LASSO called the HyperLasso (HLASSO), which uses the normal-exponential-gamma prior in place of the DE, was also investigated. These approaches were applied to data from the Genetics of Rheumatoid Arthritis (GoRA) study. Genotype imputation was used to jointly analyse the GoRA and the Wellcome Trust Case Control Consortium (WTCCC) HLA SNPs. The North American Rheumatoid Arthritis Consortium (NARAC) study was used to validate the findings. After controlling for type-I error, the penalised approaches greatly reduced the number of positive signals compared with single-SNP analysis, suggesting that correlation among SNP loci was better handled. The HLASSO results were sparser than, but similar to, the LASSO results. One SNP in HLA-DPB1 was replicated in the NARAC study. For both models, the robustness of the retained variables was verified by bootstrapping.
The results suggest that SNP-selection using LASSO or HLASSO shows a substantial benefit in identifying risk loci in regions of high LD.
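
The shrinkage-and-selection behaviour described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation. For brevity it assumes a squared-error loss rather than the logistic model a case-control analysis would normally use, and it fits the LASSO by plain coordinate descent. The function names, the synthetic genotypes and the penalty value are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200, tol=1e-10):
    """Minimise (1/(2n)) * ||y - X b||^2 + penalty * sum(|b_j|).

    Assumption: squared-error loss, used only to keep the sketch short.
    X is a list of rows, y a list of responses; returns the coefficient list.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        largest_change = 0.0
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, penalty) / col_scale[j]
            delta = updated - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = updated
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return beta


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


if __name__ == "__main__":
    # Hypothetical synthetic data: snp_b is in high LD with the causal snp_a,
    # snp_c is independent of both.
    rng = random.Random(0)
    snp_a = [rng.choice([0, 1, 2]) for _ in range(200)]
    snp_b = [a if rng.random() < 0.95 else rng.choice([0, 1, 2]) for a in snp_a]
    snp_c = [rng.choice([0, 1, 2]) for _ in range(200)]
    trait = [0.8 * a + rng.gauss(0.0, 1.0) for a in snp_a]

    columns = [centre(snp_a), centre(snp_b), centre(snp_c)]
    X = [list(row) for row in zip(*columns)]
    y = centre(trait)
    print(lasso_coordinate_descent(X, y, penalty=0.1))
```

Coefficients whose covariance with the current residual falls inside the penalty band are set exactly to zero. This matches the posterior-mode interpretation in the summary, where only non-zero estimates are read as marker-disease associations.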