Exploiting orthology and de novo transcriptome assembly to refine target sequence information

Bibliographic Details
Main Authors:	Julia F. Söllner, Germán Leparc, Matthias Zwick, Tanja Schönberger, Tobias Hildebrandt, Kay Nieselt, Eric Simon
Format:	Article
Language:	English
Published:	BMC 2019-05-01
Series:	BMC Medical Genomics
Subjects:	RNA-Seq, de novo transcriptome assembly, Orthology, Sequence refinement, Comparative genomics
Online Access:	http://link.springer.com/article/10.1186/s12920-019-0524-5
ISSN:	1755-8794
Affiliations:	Computational Biology & Genomics, Boehringer Ingelheim Pharma GmbH & Co. KG (Söllner, Zwick, Hildebrandt, Simon); Transl. Medicine + Clin. Pharmacology, Boehringer Ingelheim Pharma GmbH & Co. KG (Leparc); Drug Discovery Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG (Schönberger); Integrative Transcriptomics, Center for Bioinformatics, University of Tübingen (Nieselt)

Abstract

Background
The ability to generate recombinant drug target proteins is important for drug discovery research because it facilitates the investigation of drug-target interactions in vitro. To accomplish this, the target's exact protein sequence is required. Public databases such as Ensembl, UniProt and RefSeq are extensive protein and nucleotide sequence repositories. However, many sequences for non-human organisms are predicted by computational pipelines and may therefore be incomplete or incorrect. Gaps or errors in orthologous drug target sequences could lead to misinterpreted experimental outcomes. Transcriptome analysis by RNA-Seq has become a standard method for gene expression analysis. Apart from this common application, paired-end RNA-Seq data can also be used to obtain full-coverage cDNA sequences via de novo transcriptome assembly.

Methods
To assess whether de novo transcriptome assemblies can be used to determine a protein's sequence by searching the assembly for a known orthologous sequence, we generated 3 × 6 = 18 tissue-specific assemblies (three organs: brain, kidney and liver; six species: human, mouse, rat, dog, pig and cynomolgus monkey). These assemblies and the manually curated human protein sequences from UniProtKB/Swiss-Prot were used in a reciprocal BLAST search to identify best matching hits. We automated and generalised our approach and present the a&o-tool, a workflow that exploits de novo assemblies of paired-end RNA-Seq data and orthology information for target sequence validation and refinement across related species. The a&o-tool extracts the sequences of the best hits from the reciprocal BLAST search, translates them into protein sequences, computes a multiple sequence alignment and quantifies the refinement.

Results
For the three human assemblies we observed a hit rate greater than 60% with 100% sequence coverage and identity. For assemblies from the other species we observed similar hit rates and coverage, with the highest identities for cynomolgus monkey.

Conclusions
In summary, we show how to refine protein sequences using RNA-Seq data and sequence information from closely related species. With the a&o-tool we provide a fully automated pipeline for this refinement, including cDNA translation and a multiple sequence alignment for visual inspection. The major prerequisite for applying the a&o-tool is high-quality sequencing data.
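As an illustration of the reciprocal BLAST step described under Methods, the following minimal Python sketch pairs reference proteins with assembled transcripts only when each is the other's top hit. It is not the a&o-tool's code: it assumes BLAST+ tabular output in the default 12-column `-outfmt 6` layout (for example, a protein-versus-assembly search in one direction and an assembly-versus-protein search in the other) and ranks hits by bit score alone. The function names, the tie-breaking rule and the identifiers `protA`, `contig1` and so on are hypothetical.

```python
"""Minimal reciprocal best-hit (RBH) sketch over BLAST+ tabular output.

Assumptions (not taken from the a&o-tool): input lines follow the default
12-column -outfmt 6 layout and hits are ranked by bit score only.
"""
from typing import Dict, Iterable, List, Tuple

# Index of the bit score in the default -outfmt 6 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
BITSCORE_COL = 11


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its highest-scoring subject and that bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        score = float(cols[BITSCORE_COL])
        # Keep the first hit seen on ties; replace only on a strictly higher score.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def reciprocal_best_hits(forward: Iterable[str],
                         reverse: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)  # e.g. reference proteins -> assembled transcripts
    rev = best_hits(reverse)  # e.g. assembled transcripts -> reference proteins
    pairs = [
        (query, subject)
        for query, (subject, _) in fwd.items()
        if subject in rev and rev[subject][0] == query
    ]
    return sorted(pairs)


# Hypothetical toy data: two reference proteins, two assembled contigs.
EXAMPLE_FORWARD = [
    "protA\tcontig1\t100.0\t300\t0\t0\t1\t300\t1\t900\t1e-150\t600.0",
    "protA\tcontig2\t80.0\t300\t60\t0\t1\t300\t1\t900\t1e-90\t400.0",
    "protB\tcontig2\t95.0\t200\t10\t0\t1\t200\t1\t600\t1e-100\t380.0",
]
EXAMPLE_REVERSE = [
    "contig1\tprotA\t100.0\t300\t0\t0\t1\t900\t1\t300\t1e-150\t600.0",
    "contig2\tprotA\t80.0\t300\t60\t0\t1\t900\t1\t300\t1e-90\t400.0",
    "contig2\tprotB\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-100\t380.0",
]

if __name__ == "__main__":
    print(reciprocal_best_hits(EXAMPLE_FORWARD, EXAMPLE_REVERSE))
    # prints [('protA', 'contig1')]
```

In the toy data, `protB` gets no partner: its best forward hit is `contig2`, but `contig2` scores higher against `protA` in the reverse search, so the pair is not reciprocal. A real run would typically also filter on e-value, identity or coverage before accepting a hit; those thresholds are not given in this record.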