Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset.

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes the mutability of arginine very much higher than that of other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.

Bibliographic Details
Main Authors: Tjaart A P de Beer, Roman A Laskowski, Sarah L Parks, Botond Sipos, Nick Goldman, Janet M Thornton
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2013-01-01
Series: PLoS Computational Biology
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1003382
Online Access: http://europepmc.org/articles/PMC3861039?pdf=render
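The abstract attributes the unusually high mutability of arginine mainly to the elevated mutation rate at CpG dinucleotides. The short Python sketch below uses the standard genetic code to show why. It is an illustration, not the authors' method.

The sketch does two things:
- It counts how many of each amino acid's codons contain a CpG.
- It lists which amino acids a CpG transition can produce, either C to T or G to A on the coding strand.

The function names are hypothetical. As a simplification, the sketch considers only CpGs that lie within a single codon and ignores CpGs that span two adjacent codons.

```python
from collections import defaultdict
from fractions import Fraction

# Standard genetic code (NCBI translation table 1); codons enumerated with the
# bases in TCAG order at each position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def cpg_codon_fraction():
    """Hypothetical helper: fraction of each amino acid's codons that contain
    a CpG inside the codon (CpGs spanning two codons are ignored)."""
    total = defaultdict(int)
    with_cpg = defaultdict(int)
    for codon, aa in CODON_TABLE.items():
        total[aa] += 1
        if "CG" in codon:
            with_cpg[aa] += 1
    return {aa: Fraction(with_cpg[aa], total[aa]) for aa in total if with_cpg[aa]}


def cpg_transition_outcomes(codon):
    """Hypothetical helper: amino acids reachable from `codon` by a transition
    at a within-codon CpG.

    C->T on the coding strand and G->A (i.e. C->T on the template strand) are
    the two CpG transitions considered here.
    """
    outcomes = set()
    for i in range(2):
        if codon[i:i + 2] == "CG":
            c_to_t = codon[:i] + "T" + codon[i + 1:]
            g_to_a = codon[:i + 1] + "A" + codon[i + 2:]
            outcomes.add(CODON_TABLE[c_to_t])
            outcomes.add(CODON_TABLE[g_to_a])
    return outcomes


if __name__ == "__main__":
    fractions = cpg_codon_fraction()
    for aa, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
        print(aa, frac)
    arg_codons = [c for c, aa in CODON_TABLE.items() if aa == "R"]
    arg_outcomes = set().union(*(cpg_transition_outcomes(c) for c in arg_codons))
    print("Arg CpG transitions ->", sorted(arg_outcomes))
    # The reverse change, e.g. Trp (TGG) -> Arg (CGG), needs T->C, which is not
    # a CpG transition, so it gains nothing from CpG hypermutability here.
    print("Trp CpG transitions ->", sorted(cpg_transition_outcomes("TGG")))
```

Run as a script, the sketch reports the following CpG fractions:
- arginine: 2/3 (all four CGN codons out of six)
- proline, threonine and alanine: 1/4 each
- serine: 1/6

Arginine's CpG transitions lead to Cys, His, Gln, Trp or a stop codon. The reverse change, Trp (TGG) to Arg (CGG), needs a T to C change that involves no CpG. Under this simplified model, that is one way to see why directed exchange counts such as R to W and W to R need not be symmetric.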