Speaker
Description
Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion.
We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that models each modality's features independently before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms.
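The late-fusion idea behind the framework can be sketched in a few lines. This is a minimal illustration, not the MSB implementation: ordinary ridge regressors stand in for the survival base learners, the synthetic blocks `X_a`/`X_b` and the zero-filling of unavailable meta-features are assumptions made for the example, and in the real setting the base models would be survival learners scored by C-index.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 200

# Two synthetic modalities ("blocks"); block B is missing for ~half the patients.
X_a = rng.normal(size=(n, 5))
X_b = rng.normal(size=(n, 3))
has_b = rng.random(n) < 0.5
y = X_a[:, 0] + 0.5 * np.where(has_b, X_b[:, 0], 0.0) + rng.normal(scale=0.1, size=n)

blocks = {"A": (X_a, np.ones(n, dtype=bool)), "B": (X_b, has_b)}

# Stage 1: per-block base models, each fitted only on patients for whom
# that block is observed; out-of-fold predictions become meta-features.
meta = np.zeros((n, len(blocks)))
for j, (name, (X, obs)) in enumerate(blocks.items()):
    idx = np.flatnonzero(obs)
    oof = np.zeros(len(idx))
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(idx):
        m = Ridge().fit(X[idx[tr]], y[idx[tr]])
        oof[te] = m.predict(X[idx[te]])
    meta[idx, j] = oof  # meta-feature stays 0 where the block is missing

# Stage 2: cross-validated stacking meta-learner on the out-of-fold predictions.
stacker = Ridge().fit(meta, y)
print(round(stacker.score(meta, y), 3))
```

Because each base model only ever sees patients with its block observed, no patient is excluded and no block-level imputation is needed; the meta-learner handles the combination.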
Improvements varied by baseline strength: linear models showed a 15.9% increase (Wilcoxon signed-rank test, p < 0.001, consistent across all 15 cross-validation folds), random survival forests gained 5.4% (p = 0.002), and gradient boosting methods improved by 2.1% (p = 0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference over 5-fold cross-validation repeated 3 times: 0.055 vs. 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as the primary predictive drivers. Missing-block indicators showed negligible importance, suggesting the model learned from biomarker values rather than from data-availability patterns.
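The importance check behind that last claim can be illustrated on toy data. This is an assumed, simplified setup (one synthetic biomarker, a zero-fill for missing values, and a ridge model instead of a survival learner): when the outcome depends on the biomarker's value and not on its availability, permutation importance should rank the value far above the availability indicator.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 300
biomarker = rng.normal(size=n)
observed = rng.random(n) < 0.7                 # block-availability indicator
x = np.where(observed, biomarker, 0.0)         # zero-filled when block missing
y = 2.0 * x + rng.normal(scale=0.2, size=n)    # outcome depends on the value only

X = np.column_stack([x, observed.astype(float)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
print(imp.importances_mean)  # biomarker value >> availability indicator
```

A large importance for the indicator would instead have signalled that the model exploits data-availability patterns, which is what the analysis above rules out.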
MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation.
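The fold-wise statistical validation referred to above amounts to a paired Wilcoxon signed-rank test on per-fold C-indices. A minimal sketch, with purely illustrative numbers (not the PIONeeR results):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold C-indices for a baseline and the stacked model
# (15 folds, as in 5-fold CV repeated 3 times); values are illustrative only.
rng = np.random.default_rng(2)
baseline = rng.uniform(0.55, 0.62, size=15)
stacked = baseline + rng.uniform(0.01, 0.05, size=15)

# Paired two-sided Wilcoxon signed-rank test on the fold-wise differences
stat, p = wilcoxon(stacked, baseline)
print(f"median gain: {np.median(stacked - baseline):.3f}, p = {p:.4f}")
```

Pairing by fold accounts for the fact that both models are evaluated on the same cross-validation splits.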
Bibliography
@book{van_buuren_flexible_2018,
location = {Boca Raton, FL},
title = {Flexible Imputation of Missing Data},
edition = {2},
publisher = {{CRC} Press},
author = {van Buuren, S.},
date = {2018},}
@article{krones_review_2025,
title = {Review of multimodal machine learning approaches in healthcare},
volume = {114},
issn = {1566-2535},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524004688},
doi = {10.1016/j.inffus.2024.102690},
abstract = {Machine learning methods in healthcare have traditionally focused on using data from a single modality, limiting their ability to effectively replicate the clinical practice of integrating multiple sources of information for improved decision making. Clinicians typically rely on a variety of data sources including patients’ demographic information, laboratory data, vital signs and various imaging data modalities to make informed decisions and contextualise their findings. Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data, resulting in applications that better represent the clinician’s approach. Here, we provide an overview of multimodal machine learning approaches in healthcare, encompassing various data modalities commonly used in clinical diagnoses, such as imaging, text, time series and tabular data. We discuss key stages of model development, including pre-training, fine-tuning and evaluation. Additionally, we explore common data fusion approaches used in modelling, highlighting their advantages and performance challenges. An overview is provided of 17 multimodal clinical datasets with detailed description of the specific data modalities used in each dataset. Over 50 studies have been reviewed, with a predominant focus on the integration of imaging and tabular data. While multimodal techniques have shown potential in improving predictive accuracy across many healthcare areas, our review highlights that the effectiveness of a method is contingent upon the specific data and task at hand.},
pages = {102690},
journaltitle = {Information Fusion},
shortjournal = {Information Fusion},
author = {Krones, Felix and Marikkar, Umar and Parsons, Guy and Szmul, Adam and Mahdi, Adam},
date = {2025-02-01},
keywords = {Data fusion, Deep learning, Healthcare, Multimodal machine learning},}
@article{van_loon_imputation_2024,
title = {Imputation of missing values in multi-view data},
volume = {111},
issn = {1566-2535},
url = {https://www.sciencedirect.com/science/article/pii/S1566253524003026},
doi = {10.1016/j.inffus.2024.102524},
abstract = {Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression ({StaPLR}) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as {missForest} and predictive mean matching possible in settings where they would otherwise be computationally infeasible.},
pages = {102524},
journaltitle = {Information Fusion},
shortjournal = {Information Fusion},
author = {van Loon, Wouter and Fokkema, Marjolein and de Vos, Frank and Koini, Marisa and Schmidt, Reinhold and de Rooij, Mark},
date = {2024-11-01},
keywords = {Feature selection, Imputation, Missing data, Multi-view learning, Stacked generalization},}
@article{polsterl_scikit-survival_2020,
title = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
volume = {21},
issn = {1533-7928},
url = {http://jmlr.org/papers/v21/20-729.html},
shorttitle = {scikit-survival},
abstract = {scikit-survival is an open-source Python package for time-to-event analysis fully compatible with scikit-learn. It provides implementations of many popular machine learning techniques for time-to-event analysis, including penalized Cox model, Random Survival Forest, and Survival Support Vector Machine. In addition, the library includes tools to evaluate model performance on censored time-to-event data. The documentation contains installation instructions, interactive notebooks, and a full description of the {API}. scikit-survival is distributed under the {GPL}-3 license with the source code and detailed instructions available at https://github.com/sebp/scikit-survival},
pages = {1--6},
number = {212},
journaltitle = {Journal of Machine Learning Research},
author = {Pölsterl, Sebastian},
date = {2020},}
@article{wolpert_stacked_1992,
title = {Stacked generalization},
volume = {5},
issn = {0893-6080},
url = {https://www.sciencedirect.com/science/article/pii/S0893608005800231},
doi = {10.1016/S0893-6080(05)80023-1},
abstract = {This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the {NETtalk} task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.},
pages = {241--259},
number = {2},
journaltitle = {Neural Networks},
shortjournal = {Neural Networks},
author = {Wolpert, David H.},
date = {1992-01-01},
keywords = {Combining generalizers, cross-validation, Error estimation and correction, Generalization and induction, Learning set preprocessing},}