Cross-modal predictive modeling of multi-omic data in 3D airway organ tissue equivalents during viral infection

Bibliographic Details
Published in: Frontiers in Genetics
Main Authors: Mostafa Rezapour, Patrick M. McNutt, David A. Ornelles, Stephen J. Walker, Sean V. Murphy, Anthony Atala, Metin Nafi Gurcan
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-09-01
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2025.1658577/full
Description
Summary:
Introduction: Developing robust predictive models from multi-omics data is challenging because sample sizes are typically small (often fewer than 100) while the feature space is vast (over 20,000 molecular features such as genes, transcripts, and proteins), which increases the risk of overfitting and limits generalizability. To address this challenge, this study introduces the Magnitude-Altitude Score Analysis for Tracking Infection and Time-Dependent Genes (MASIT), a novel method designed to filter out irrelevant features/genes and focus on important ones.
Methods: MASIT was applied to a 3D airway organ tissue equivalent model that mimics human airway physiology, using both RNA-Seq and NanoString technologies for a comprehensive analysis. RNA-Seq offered a transcriptomic overview of 19,671 protein-coding genes, whereas NanoString targeted 773 specific genes. We used MASIT to analyze gene expression changes in the airway tissue equivalent after exposure to Influenza A virus, Human metapneumovirus, and Parainfluenza virus type 3 at 24 and 72 hours post-infection. MASIT was trained and validated on NanoString data, tested on the held-out RNA-Seq test set, and benchmarked against widely used feature selection approaches, including Fisher score, minimum Redundancy Maximum Relevance, embedded Lasso regression, and Boruta feature importance.
Results: MASIT achieved 92% accuracy in differentiating eight groups of infected samples. Models built on MASIT-selected genes outperformed models using the full gene set, notably with algorithms such as Random Forest, XGBoost, and AdaBoost. Selected genes such as IFIT1, IFIT2, IFIT3, OASL, IFI44, and OAS3 were particularly effective in categorizing samples by viral type and infection stage. Benchmarking further demonstrated that MASIT not only exceeded the performance of existing feature selection methods within NanoString data but was also the only method to maintain high accuracy and stability when applied to held-out RNA-Seq data.
Discussion: These results provide insights into the host’s molecular response to viral infections and highlight MASIT as a robust tool for analyzing high-dimensional, small-sample multi-omics datasets.
ISSN:1664-8021