Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

Bibliographic Details
Main Authors: Miri Michaeli, Hila Noga, Hilla Tabibian-Keissar, Iris Barshack, Ramit Mehr
Format: Article
Language: English
Published: Frontiers Media S.A., 2012-12-01
Series: Frontiers in Immunology
Online Access: http://journal.frontiersin.org/Journal/10.3389/fimmu.2012.00386/full
DOI: 10.3389/fimmu.2012.00386
Volume: 3
Affiliations: Bar-Ilan University (Miri Michaeli, Hila Noga, Hilla Tabibian-Keissar, Ramit Mehr); Sheba Medical Center (Hilla Tabibian-Keissar, Iris Barshack); Tel Aviv University (Iris Barshack)
ISSN: 1664-3224

Abstract

High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully handles the pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion–Deletion Identifier), a program for distinguishing legitimate insertions and/or deletions (indels) from artifactual ones. Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier are implemented in Java and packaged as executable JAR files that run on Linux and MS Windows. The programs have no special requirements other than correctly constructed input files, as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

Subjects: high-throughput sequencing; B cell receptor; Computer programs; Immunoglobulin (Ig) genes; insertions and deletions (indels)