MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes

Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC...


Bibliographic Details
Main Authors: Anna E. Letiagina, Evgeniya S. Omelina, Anton V. Ivankin, Alexey V. Pindyurin
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-05-01
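Series: Frontiers in Genetics
Subjects: massively parallel reporter assay; MPRA; reporter constructs; region of interest; barcodes; next-generation sequencing
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.618189/full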
id doaj-6c73479fa67f42488787ec7a95ad19b4
record_format Article
spelling doaj-6c73479fa67f42488787ec7a95ad19b4 2021-05-11T17:06:27Z eng Frontiers Media S.A. Frontiers in Genetics 1664-8021 2021-05-01 12 10.3389/fgene.2021.618189 618189
MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
Anna E. Letiagina 0; Anna E. Letiagina 1; Evgeniya S. Omelina 2; Anton V. Ivankin 3; Alexey V. Pindyurin 4
Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia; Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
https://www.frontiersin.org/articles/10.3389/fgene.2021.618189/full
massively parallel reporter assay; MPRA; reporter constructs; region of interest; barcodes; next-generation sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Anna E. Letiagina
Evgeniya S. Omelina
Anton V. Ivankin
Alexey V. Pindyurin
title MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2021-05-01
description Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
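The calculation outlined in the description (keep only unambiguous BC–ROI associations, normalize each BC's transcript abundance to its abundance in the plasmid DNA sample, then average per ROI) can be illustrated with a minimal Python sketch. This is not the MPRAdecoder code: the function names, the rule that a barcode is "genuine" when it is seen with exactly one ROI, and the use of read-count fractions for normalization are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def genuine_associations(mapping_reads):
    """Keep barcodes observed with exactly one ROI (assumed rule, hypothetical helper)."""
    rois_per_bc = defaultdict(set)
    for bc, roi in mapping_reads:
        rois_per_bc[bc].add(roi)
    # Barcodes linked to several ROIs are ambiguous and are dropped.
    return {bc: next(iter(rois)) for bc, rois in rois_per_bc.items() if len(rois) == 1}


def normalized_expression(cdna_counts, plasmid_counts):
    """Ratio of a barcode's read fraction in cDNA to its read fraction in plasmid DNA."""
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    if cdna_total == 0 or plasmid_total == 0:
        raise ValueError("both samples must contain at least one read")
    result = {}
    for bc, plasmid in plasmid_counts.items():
        if plasmid == 0:
            continue  # cannot normalize a barcode absent from the plasmid sample
        cdna_fraction = cdna_counts.get(bc, 0) / cdna_total
        plasmid_fraction = plasmid / plasmid_total
        result[bc] = cdna_fraction / plasmid_fraction
    return result


def mean_per_roi(expression, bc_to_roi):
    """Average the normalized expression of all genuine barcodes of each ROI."""
    by_roi = defaultdict(list)
    for bc, value in expression.items():
        roi = bc_to_roi.get(bc)
        if roi is not None:
            by_roi[roi].append(value)
    return {roi: mean(values) for roi, values in by_roi.items()}


# Toy data (invented): barcode TTT maps to two ROIs and is therefore excluded.
mapping_reads = [("AAA", "roi1"), ("CCC", "roi1"), ("GGG", "roi2"),
                 ("TTT", "roi1"), ("TTT", "roi2")]
bc_to_roi = genuine_associations(mapping_reads)
expression = normalized_expression(
    cdna_counts={"AAA": 20, "CCC": 10, "GGG": 10},
    plasmid_counts={"AAA": 10, "CCC": 10, "GGG": 20},
)
roi_means = mean_per_roi(expression, bc_to_roi)
print(roi_means)
```

With the toy counts above, roi1 averages 1.5 and roi2 averages 0.5. A real analysis would also need to handle sequencing errors and mutant variants, which this sketch ignores.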
topic massively parallel reporter assay
MPRA
reporter constructs
region of interest
barcodes
next-generation sequencing
url https://www.frontiersin.org/articles/10.3389/fgene.2021.618189/full