Speaker
Description
Regulation of gene expression is essential for organisms to survive and reproduce in fluctuating environments. Machine learning has increasingly been used to understand relationships between non-coding sequences and gene expression. However, most studies rely on data from a limited set of model organisms under controlled laboratory conditions, restricting our understanding of gene regulation in natural environments. Here, we applied machine learning-based cis-regulatory motif detection to two species of Fagaceae, a family of dominant forest trees, using field transcriptome data collected across a two-year seasonal cycle in leaf and bud tissues. First, we trained a convolutional neural network (CNN) to predict the presence or absence of gene expression from 2-kb flanking sequences. The model achieved high predictive accuracy (ROC-AUC > 0.8) for most genes, except those related to photosynthesis, suggesting distinct regulatory mechanisms in these pathways. We then extracted DNA motifs based on importance scores and identified motifs corresponding to several known transcription factors, including the BBR/BPC family, whose members act as transcriptional repressors in maize and Arabidopsis. These results suggest that binding sites of transcriptional repressors known across a wide range of plants may also contribute to the regulation of gene expression in forest trees, and they provide candidates for future experimental validation.
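As a rough illustration of two ingredients mentioned above, the sketch below shows how a flanking sequence might be one-hot encoded as CNN input and how a ROC-AUC value can be computed from predicted scores. It is not the authors' pipeline: the function names `one_hot` and `roc_auc`, the handling of ambiguous bases, and the toy labels and scores are all assumptions made for this example.

```python
from typing import List, Sequence

BASES = "ACGT"  # column order of the encoding (an assumption for this sketch)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix.

    Bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores
    a random negative, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = expressed, 0 = not expressed (hypothetical values).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In this framing, a threshold such as ROC-AUC > 0.8 means that an expressed gene receives a higher score than a non-expressed gene in more than 80% of such pairs.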