Description
Single-cell technologies have revealed extensive heterogeneity in gene expression across cell populations. Understanding these cell-to-cell differences is essential for determining how DNA sequence specifies cellular function and drives phenotypic diversity. Recent advances in machine learning and AI have enabled the development of DNA sequence-to-expression prediction models. These models are typically trained on bulk expression data, so how well they predict single-cell gene expression remains unclear. Here, we develop a joint machine learning and inference framework that parameterises stochastic models of transcription using biological features extracted from sequence-to-expression models and regresses these features against statistics derived from single-cell RNA-seq data. We further integrate epigenetic measurements, including DNA methylation and chromatin accessibility, to assess whether combining sequence and epigenetic information improves prediction of single-cell variability. We also investigate the effects of technical sequencing noise and extrinsic biological variability on model performance, evaluating how well these approaches explain the observed heterogeneity in single-cell RNA expression. This framework enables investigation of how mutations in DNA sequence influence regulatory dynamics in single-cell populations.
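To make the idea of regressing sequence-derived features against single-cell statistics concrete, the sketch below shows one common way such statistics can be turned into parameters of a stochastic transcription model. It is an illustration under stated assumptions, not the framework described in the talk. It assumes the bursty limit of the two-state (telegraph) model, in which steady-state mRNA counts are negative binomial with mean equal to burst frequency times burst size and Fano factor equal to one plus burst size. It ignores technical noise and extrinsic variability. The function names `burst_parameters` and `fit_line`, and the idea of a single scalar feature per gene, are hypothetical.

```python
from statistics import fmean


def burst_parameters(counts):
    """Moment-based burst size and burst frequency for one gene.

    Assumption: bursty limit of the telegraph model, where
    mean = burst_frequency * burst_size and Fano factor = 1 + burst_size.
    Burst frequency is in units of the mRNA degradation rate.
    Returns None when the counts are not overdispersed (Fano <= 1),
    because the bursty model is then not identifiable from moments.
    """
    if len(counts) == 0:
        raise ValueError("counts must not be empty")
    mean = fmean(counts)
    if mean <= 0:
        return None
    # Population variance across cells.
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    fano = variance / mean
    if fano <= 1.0:
        return None
    burst_size = fano - 1.0
    burst_frequency = mean / burst_size
    return {
        "mean": mean,
        "fano": fano,
        "burst_size": burst_size,
        "burst_frequency": burst_frequency,
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Stands in for the regression step: xs would be a (hypothetical)
    sequence-derived feature per gene, ys an inferred kinetic parameter.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy counts for one gene across ten cells (made-up numbers).
    params = burst_parameters([0, 0, 1, 2, 5, 10, 3, 0, 7, 2])
    print(params)
```

In this toy setting, `burst_parameters` would be applied gene by gene to a count matrix, and `fit_line` (or any richer regression model) would relate the resulting parameters to per-gene features. Correcting for capture efficiency or cell-size effects would need additional modelling that is not shown here.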