Speaker
Description
The reliability of single-cell RNA sequencing (scRNA-seq) analysis is often undermined by subjective parameter tuning and stochastic inconsistencies, both of which make large-scale studies difficult to reproduce. To overcome these limitations and establish a rigorous foundation for single-cell foundation models, we propose a fully data-driven analysis framework grounded in mathematical and physical theory. At its core is scLENS, which uses Random Matrix Theory (RMT) to distinguish biological signal from random noise. By combining RMT-based noise filtering with a signal robustness test, scLENS determines the number of signal dimensions objectively, without manual intervention, and supports high-fidelity feature extraction even in sparse datasets. Complementing this, scICE evaluates clustering consistency through the Inconsistency Coefficient (IC), providing a scalable and efficient way to identify stable cellular identities across tens of thousands of cells. By replacing subjective heuristics with theory-based signal detection and evaluation, this integrated approach provides the high-quality, standardized data processing required to train and deploy reliable, large-scale single-cell foundation models.
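To make the RMT idea concrete, the following is a minimal sketch of eigenvalue thresholding against the Marchenko-Pastur bulk edge. It is not the scLENS implementation: the function names, the `margin` safety factor, and the synthetic data are all assumptions introduced for illustration, and the signal robustness test mentioned above is not reproduced. The sketch assumes the matrix has been normalized so that pure noise has i.i.d. entries with variance `sigma2`.

```python
import numpy as np


def mp_upper_edge(n_cells, n_genes, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for eigenvalues of X @ X.T / n_genes,
    where X is (n_cells x n_genes) with i.i.d. zero-mean entries of variance sigma2."""
    gamma = n_cells / n_genes
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def count_signal_dimensions(X, sigma2=1.0, margin=0.1):
    """Count eigenvalues of the cell-by-cell covariance that exceed the noise bulk.

    `margin` is a hypothetical finite-size safety factor (an assumption of this
    sketch): the threshold is the Marchenko-Pastur edge scaled by (1 + margin).
    """
    n_cells, n_genes = X.shape
    cov = X @ X.T / n_genes                    # (n_cells x n_cells), symmetric
    eigvals = np.linalg.eigvalsh(cov)          # real eigenvalues, ascending order
    threshold = mp_upper_edge(n_cells, n_genes, sigma2) * (1.0 + margin)
    return int(np.sum(eigvals > threshold))


def make_synthetic(n_cells=200, n_genes=1000, singular_values=(), seed=0):
    """Unit-variance Gaussian noise plus an optional planted low-rank signal."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_cells, n_genes))
    k = len(singular_values)
    if k > 0:
        # Orthonormal left/right factors for the planted signal.
        U, _ = np.linalg.qr(rng.standard_normal((n_cells, k)))
        V, _ = np.linalg.qr(rng.standard_normal((n_genes, k)))
        X = X + U @ np.diag(singular_values) @ V.T
    return X


if __name__ == "__main__":
    noise_only = make_synthetic()
    with_signal = make_synthetic(singular_values=(300.0, 250.0, 200.0))
    print("noise only :", count_signal_dimensions(noise_only))
    print("with signal:", count_signal_dimensions(with_signal))
```

On the synthetic matrices, the count should match the rank of the planted signal, since the noise eigenvalues stay inside the bulk while the planted components sit far above it. Real scRNA-seq counts would first need normalization for the unit-variance noise assumption to be reasonable.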