SparkRA: Enabling Big Data Scalability for the GATK RNA-seq Pipeline with Apache Spark

The rapid proliferation of low-cost RNA-seq data has resulted in a growing interest in RNA analysis techniques for various applications, ranging from identifying genotype-phenotype relationships to validating discoveries of other analysis results. However, many practical applications in th...

Bibliographic Details
Main Authors:	Zaid Al-Ars, Saiyi Wang, Hamid Mushtaq
Format:	Article
Language:	English
Published:	MDPI AG 2020-01-01
Series:	Genes
Subjects:	gatk; variant calling; rna-seq; apache spark; scalability; computation time
Online Access:	https://www.mdpi.com/2073-4425/11/1/53

Similar Items

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
by: Tanveer Ahmad, et al.
Published: (2020-11-01)

GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark
Published: (2019)

Implementing Apache Spark jobs execution and Apache Spark cluster creation for Openstack Sahara
by: A. Aleksiyants, et al.
Published: (2018-10-01)

Mining Formal Concepts in Large Binary Datasets using Apache Spark
by: Rayabarapu, Varun Raj
Published: (2021)

Recommendations for performance optimizations when using GATK3.8 and GATK4
by: Jacob R Heldenbrand, et al.
Published: (2019-11-01)

Performance assessment of Apache Spark applications
by: AL Jorani, Salam
Published: (2019)

Using Apache Spark on genome assembly for scalable overlap-graph reduction
by: Alexander J. Paul, et al.
Published: (2019-10-01)

BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data
by: Seokjun Soe, et al.
Published: (2018-12-01)

pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP
by: Weiya Chen, et al.
Published: (2020-09-01)

Scalable Collaborative Filtering Recommendation Algorithms on Apache Spark
by: Casey, Walker Evan
Published: (2014)

Distributed graph decomposition algorithms on Apache Spark
by: Mandal, Aritra
Published: (2018)

Time series analysis with apache spark and its applications to energy informatics
by: Cornelia Krome, et al.
Published: (2018-10-01)

StreamAligner: a streaming based sequence aligner on Apache Spark
by: Sanjay Rathee, et al.
Published: (2018-02-01)

Enumerating k-cliques in a large network using Apache Spark
by: Dheekonda, Raja Sekhar Rao
Published: (2017)

Efficient iterative virtual screening with Apache Spark and conformal prediction
by: Laeeq Ahmed, et al.
Published: (2018-03-01)

The GATK joint genotyping workflow is appropriate for calling variants in RNA-seq experiments
by: Jean-Simon Brouard, et al.
Published: (2019-06-01)

Implementing a Deep Learning Model for Intrusion Detection on Apache Spark Platform
by: Mohamed Haggag, et al.
Published: (2020-01-01)

Parallel and Distributed Implementation of Sine Cosine Algorithm on Apache Spark Platform
by: Mohammad Gh. Alfailakawi, et al.
Published: (2021-01-01)

OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
by: Jochen Bathke, et al.
Published: (2021-08-01)

SANJYOT – WE SAVE LIFE Using Big Data - Apache Spark
by: Nipun Tyagi, et al.
Published: (2020-12-01)

Matrix Multiplications on Apache Spark through GPUs
by: Safari, Arash
Published: (2017)

SparkBLAST: scalable BLAST processing using in-memory operations
by: Marcelo Rodrigo de Castro, et al.
Published: (2017-06-01)

Leveraging resource management for efficient performance of Apache Spark
by: Khadija Aziz, et al.
Published: (2019-08-01)

Nodule Detection with Convolutional Neural Network Using Apache Spark and GPU Frameworks
by: Nikitha Johnsirani Venkatesan, et al.
Published: (2021-03-01)

GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
by: Shanshan Ren, et al.
Published: (2019-04-01)

Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures
by: Jinbae Lee, et al.
Published: (2019-01-01)

Deploying Apache Spark virtual clusters in cloud environments using orchestration technologies
by: O. Borisenko, et al.
Published: (2018-10-01)

Parallelization of Hybrid Content Based and Collaborative Filtering Method in Recommendation System with Apache Spark
by: Rakhmad Ikhsanudin, et al.
Published: (2019-04-01)

Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark
by: Ameema Zainab, et al.
Published: (2021-01-01)

Performance study of an Apache Spark cluster on the Azure platform for machine learning methods
by: С.В. Мінухін
Published: (2020-04-01)

A Parallel Community Detection in Multi-Modal Social Network With Apache Spark
by: Yoon-Sik Cho
Published: (2019-01-01)

Distributed multi-label learning on Apache Spark
by: Gonzalez Lopez, Jorge
Published: (2019)

Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java
by: Hoger Khayrolla Omar, et al.
Published: (2019-05-01)

Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow
by: Madeti, Preetham
Published: (2016)

Profile, Monitor, and Introspect Spark Jobs Using OSU INAM
by: Kedia, Mansa
Published: (2020)

Design of a Distributed Computing System and Big Data Processing Architecture Based on YARN, Storm, and Spark
by: 曾柏崴, et al.

Distributed Computing System and Big Data Real-time Processing Structure --Based on YARN, Storm and Spark
by: 曾柏崴 Po-Wei Tseng, et al.
Published: (2016-10-01)

Parallel Processing of Probabilistic Models-Based Power Supply Unit Mid-Term Load Forecasting With Apache Spark
by: Wei Jiang, et al.
Published: (2019-01-01)

Cloud deployment of game theoretic categorical clustering using apache spark: An application to car recommendation
by: Srimanta Kundu, et al.
Published: (2021-12-01)

Two-Step Classification with SVD Preprocessing of Distributed Massive Datasets in Apache Spark
by: Athanasios Alexopoulos, et al.
Published: (2020-03-01)