LAK: Lasso and K-Means Based Single-Cell RNA-Seq Data Clustering Analysis

Bibliographic Details
Main Authors: Jiao Hua, Hongkun Liu, Boyang Zhang, Shuilin Jin
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9143102/
Description
Summary: Single-cell RNA sequencing provides a way to obtain marker genes of different cells, which lays the foundation for discovering new cell types. The general strategy for achieving this goal is to build a clustering pipeline and derive differentially expressed genes, followed by cell type enrichment analysis and driving force analysis. Throughout the analysis, choosing a clustering model and an appropriate dimension reduction method are two vital and challenging tasks. In this study, we present LAK, a computational pipeline for single-cell RNA-seq data clustering analysis that uses a Lasso and K-means based feature selection method to select candidate genes. To deal with sparse, high-dimensional data, we integrated a Lasso penalty into the clustering method as the feature selection step, which extracts the genes that actually affect the clustering. We also improved the parameter selection algorithm so that it searches for appropriate parameters automatically by binary search, according to the size of the data. Compared with other computational approaches, LAK performs better in reliability, stability, convenience, and accuracy on real datasets, simulated data, and datasets with a large number of dropout events.
ISSN: 2169-3536
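
The summary describes combining a Lasso penalty with K-means for gene selection and tuning a parameter by binary search, but this record gives no algorithmic detail. The following is therefore only a minimal sketch of a generic L1-constrained (sparse) K-means, not the LAK implementation: per-gene weights are obtained by soft-thresholding the between-cluster sum of squares, and the threshold is found by binary search so that the weights satisfy an L1 bound `s`. All function names, the bound `s`, the farthest-point initialization, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def l1_bounded_weights(a, s, tol=1e-8, max_iter=100):
    """Return w = S(a, d) / ||S(a, d)||_2 with ||w||_1 <= s.
    The soft-threshold level d is located by binary search (illustrative)."""
    a = np.maximum(a, 0.0)
    norm = np.linalg.norm(a)
    if norm == 0:                      # no signal at all: fall back to uniform weights
        return np.full(a.shape, 1.0 / np.sqrt(a.size))
    w = a / norm
    if w.sum() <= s:                   # bound already satisfied, no thresholding needed
        return w
    lo, hi, w_best = 0.0, a.max(), None
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        w = np.maximum(a - mid, 0.0)   # soft-thresholding (Lasso-type shrinkage)
        w = w / np.linalg.norm(w)
        if w.sum() > s:
            lo = mid                   # still too dense: raise the threshold
        else:
            hi, w_best = mid, w        # feasible: try a smaller threshold
        if hi - lo < tol:
            break
    return w_best if w_best is not None else w

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares for each gene (total minus within)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate between clustering on weighted genes and updating the weights."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)          # weighted squared distances
        w_new = l1_bounded_weights(bcss_per_feature(X, labels), s)
        converged = np.abs(w_new - w).sum() < 1e-4
        w = w_new
        if converged:
            break
    return labels, w

# Synthetic example (assumed data): 60 cells, 20 genes, only the first 3 genes
# separate the two groups of cells.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
X[30:, :3] += 8.0
labels, w = sparse_kmeans(X, k=2, s=1.5)
selected_genes = np.flatnonzero(w > 0)   # genes kept by the L1 constraint
```

In this sketch a smaller `s` keeps fewer genes. How LAK chooses its parameters from the size of the data is not stated in this record, so `s` is simply fixed by hand here.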