Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
Main Authors: | Thanh M. Nguyen, Samuel Bharti, Zongliang Yue, Christopher D. Willey, Jake Y. Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-09-01 |
Series: | Frontiers in Big Data |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/fdata.2021.725276/full |
id | doaj-f51a8838d56f47ff861e6f335e441693 |
---|---|
record_format | Article |
spelling | doaj-f51a8838d56f47ff861e6f335e441693; 2021-09-16T04:18:18Z; eng; Frontiers Media S.A.; Frontiers in Big Data; ISSN 2624-909X; 2021-09-01; vol. 4; DOI 10.3389/fdata.2021.725276; article 725276; Thanh M. Nguyen (Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States); Samuel Bharti (Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, India); Zongliang Yue (Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States); Christopher D. Willey (Department of Radiation Oncology, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States); Jake Y. Chen (Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States) |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Thanh M. Nguyen, Samuel Bharti, Zongliang Yue, Christopher D. Willey, Jake Y. Chen |
title | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
publisher | Frontiers Media S.A. |
series | Frontiers in Big Data |
issn | 2624-909X |
publishDate | 2021-09-01 |
description | Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples from high-dimensional biomedical data. However, extracting the clinical data or sample metadata shared among biomedical samples of a given biological condition remains a major challenge. Here, we describe an analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power from its focus on sample sets, i.e., groups of biological samples constructed for various purposes, such as manually curated samples sharing specific characteristics or automated clusters generated by embedding sample omic profiles from a multi-dimensional omics space. The samples in a sample set share common clinical measurements, which we refer to as “clinotypes,” such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing combined data from The Cancer Genome Atlas (TCGA) and patient-derived xenografts (PDX), SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, a problem that other tools have not solved. These results suggest that SEAS may support clinical decision-making. SEAS is freely available as a software package at https://aimed-lab.shinyapps.io/SEAS/. |
topic | sample enrichment analysis; clinotype; SEAS; glioblastoma multiforme; patient-derived xenograft |
url | https://www.frontiersin.org/articles/10.3389/fdata.2021.725276/full |
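The description field of this record explains that SEAS tests sample sets (for example, clusters from an embedding) for shared clinotypes, but it does not state which statistic SEAS applies. The following is therefore only a generic illustration of the idea: a minimal Python sketch, assuming a one-sided hypergeometric over-representation test for a categorical clinotype such as treatment status or gender. The function names `enrichment_pvalue` and `clinotype_enrichment`, the dictionary layout of the metadata, and the toy cohort are hypothetical and are not taken from the SEAS package or the article.

```python
from collections import Counter
from math import comb
from typing import Dict, Hashable, Iterable, Tuple


def enrichment_pvalue(n_total: int, n_with_value: int, n_in_set: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X counts how many of the n_in_set samples, drawn without replacement from
    n_total samples, carry a clinotype value held by n_with_value samples.
    (Generic over-representation test; assumed, not SEAS's documented statistic.)
    """
    if not (0 <= n_with_value <= n_total and 0 <= n_in_set <= n_total):
        raise ValueError("counts must satisfy 0 <= K <= N and 0 <= n <= N")
    upper = min(n_with_value, n_in_set)
    # Sum exact hypergeometric probabilities over the upper tail; math.comb
    # returns 0 for impossible draws, so out-of-range terms vanish.
    tail = sum(
        comb(n_with_value, k) * comb(n_total - n_with_value, n_in_set - k)
        for k in range(max(n_overlap, 0), upper + 1)
    )
    return tail / comb(n_total, n_in_set)


def clinotype_enrichment(
    metadata: Dict[str, Dict[str, Hashable]],
    sample_set: Iterable[str],
    field: str,
) -> Dict[Hashable, Tuple[int, float, float]]:
    """Test every value of one categorical clinotype for over-representation.

    metadata: hypothetical layout {sample_id: {clinotype_name: value}}.
    Returns {value: (observed count in set, expected count, p-value)}.
    """
    members = set(sample_set)
    missing = members.difference(metadata)
    if missing:
        raise KeyError(f"samples without metadata: {sorted(missing)}")
    # Background frequencies over all samples vs. frequencies inside the set.
    background = Counter(fields[field] for fields in metadata.values())
    in_set = Counter(metadata[s][field] for s in members)
    n_total, n_in_set = len(metadata), len(members)
    results = {}
    for value, n_with in background.items():
        observed = in_set.get(value, 0)
        expected = n_in_set * n_with / n_total
        results[value] = (observed, expected, enrichment_pvalue(n_total, n_with, n_in_set, observed))
    return results


if __name__ == "__main__":
    # Hypothetical toy cohort: 10 samples, 4 radiotherapy-treated ("RT").
    toy = {f"s{i}": {"treatment": "RT" if i < 4 else "none"} for i in range(10)}
    cluster = ["s0", "s1", "s2", "s3"]
    for value, (obs, exp, p) in clinotype_enrichment(toy, cluster, "treatment").items():
        print(value, obs, round(exp, 2), f"{p:.4g}")
    # RT 4 1.6 0.004762
    # none 0 2.4 1
```

Because every value of every clinotype would be tested, a multiple-testing correction (for example, Benjamini-Hochberg) would normally be applied to the resulting p-values. Continuous clinotypes such as survival days would need a different test, for instance a rank-based comparison against the background samples, which this sketch does not cover.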