Speaker
Description
Drug resistance remains a primary obstacle in oncology, turning clinical management into a complex sequential decision-making problem. Reinforcement Learning (RL) has shown promise in optimizing adaptive dosing for single agents, but its application to large polytherapeutic panels, where clinicians choose among many drugs with overlapping resistance profiles, remains underexplored. The gap is compounded by a lack of mechanistic models that capture how overlapping cell subpopulations cause drug sensitivity to be shared across agents.
In this talk, I will present a deep RL framework for optimizing treatment strategies across a broad therapeutic panel. Inspired by the cyclic response-and-relapse dynamics of multiple myeloma, the framework builds on a mathematical model of tumor evolution that tracks the competition between sensitive and resistant subpopulations. Crucially, it integrates a statistical model that encodes correlations between drug sensitivity profiles, a conceptual way to account for cross-resistance. With these models as training environments, a Proximal Policy Optimization (PPO) agent is trained to navigate the high-dimensional decision space. The agent learns to select drug sequences that respond adaptively to the evolving virtual tumor, with the aim of minimizing long-term tumor burden. Results on the virtual tumor show that RL can identify non-intuitive sequencing strategies that outperform standard clinical heuristics, a promising proof of concept for AI-driven personalized medicine.
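To make the setup concrete, the following is a minimal sketch of what such a training environment might look like. It is not the model from the talk. Every name and parameter here (`ToyTumorEnv`, `sample_sensitivities`, `rho`, `growth`, `kill`, `capacity`) is hypothetical, and three simplifications are assumed: sensitivities are correlated through a single shared latent factor per clone, growth is logistic-like with a shared carrying capacity, and the reward is the negative total tumor burden.

```python
import math
import random


def sample_sensitivities(n_clones, n_drugs, rho, rng):
    """Return a table of sensitivities in (0, 1): one row per clone, one column per drug.

    Assumption: a latent factor shared by all drugs within a clone gives any
    two drugs correlation `rho` (0 <= rho <= 1) on the latent scale, a crude
    stand-in for cross-resistance.
    """
    table = []
    for _ in range(n_clones):
        shared = rng.gauss(0.0, 1.0)
        row = []
        for _ in range(n_drugs):
            noise = rng.gauss(0.0, 1.0)
            latent = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
            row.append(1.0 / (1.0 + math.exp(-latent)))  # squash to (0, 1)
        table.append(row)
    return table


class ToyTumorEnv:
    """Hypothetical virtual tumor: competing clones, one drug chosen per step."""

    def __init__(self, n_clones=4, n_drugs=6, rho=0.5,
                 growth=0.3, kill=0.9, capacity=1.0, seed=0):
        rng = random.Random(seed)
        self.n_clones = n_clones
        self.n_drugs = n_drugs
        self.growth = growth
        self.kill = kill
        self.capacity = capacity
        self.sens = sample_sensitivities(n_clones, n_drugs, rho, rng)
        self.reset()

    def reset(self):
        # Start with a small tumor split evenly across clones.
        self.sizes = [0.1 / self.n_clones] * self.n_clones
        return list(self.sizes)

    def step(self, drug):
        # Clones share one carrying capacity, so they compete for room.
        crowding = 1.0 - sum(self.sizes) / self.capacity
        self.sizes = [
            n * math.exp(self.growth * crowding - self.kill * self.sens[k][drug])
            for k, n in enumerate(self.sizes)
        ]
        reward = -sum(self.sizes)  # lower burden means higher reward
        return list(self.sizes), reward


def rollout(env, policy, horizon):
    """Run `policy(observation, t) -> drug index` and return the summed reward."""
    obs = env.reset()
    total = 0.0
    for t in range(horizon):
        obs, reward = env.step(policy(obs, t))
        total += reward
    return total
```

A fixed rotation such as `rollout(ToyTumorEnv(), lambda obs, t: t % 6, 20)` gives a simple heuristic baseline. In a setup like the one described above, a trained PPO policy would take the place of the lambda, and its summed reward would be compared against such baselines.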