WP 2: Data and modeling

The overall objective of WP2 is to develop algorithms that predict outcomes for individual patients under each relevant treatment option for 3 major cancers. The algorithms will be translated into decision support tools and included in the MetroMap developed in WP4.

Work package 2 tasks


Creating analysable datasets

We have secured access to a unique set of data from sources across Europe, covering 3 major cancers, and its use in this project has been confirmed. These high-quality, granular data sources include patient characteristics, treatment characteristics, clinical outcomes and patient-reported outcomes for large numbers of individual patients. We will translate local data to a Common Data Model (CDM) so that the data can be analysed and are Findable, Accessible, Interoperable and Reusable (FAIR) for the research community. Open-source software will be used throughout.


Develop algorithms predicting outcome trajectories for different treatments

To predict outcomes we will apply state-of-the-art methods, including advanced penalized regression, machine learning and/or AI techniques, with extensive cross-validation and ‘leave-one-study-out’ validation. We will model the clinical outcomes and patient-reported outcome measures (PROMs) that are relevant to the decision-making process for each specific cancer. To inform treatment decisions, the goal is to quantify the expected benefits and harms of specific treatments, which differ by patient. We will use the PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement as guidance, starting with a robust risk-modelling approach, followed by a more data-driven effect-modelling approach.
Estimates of treatment effect will be taken from RCTs and meta-analyses and combined with causal inference analyses of observational data, in which propensity score and instrumental variable analyses will be used to mitigate the risk of bias.
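
To illustrate what ‘leave-one-study-out’ validation means in practice, the sketch below holds out each study in turn, fits a model on the remaining studies and scores it on the held-out one. It is a minimal illustration only: the toy records, the field names `study` and `event`, and the placeholder "model" (the event rate in the training data) are assumptions made for this example and are not the project's data or algorithms.

```python
from collections import defaultdict


def leave_one_study_out(records):
    """Yield (held_out_study, train, test), holding out each study once."""
    by_study = defaultdict(list)
    for rec in records:
        by_study[rec["study"]].append(rec)
    for held_out in sorted(by_study):
        train = [r for s, rows in by_study.items() if s != held_out for r in rows]
        yield held_out, train, by_study[held_out]


def fit_event_rate(train):
    """Placeholder model: predicted risk is the event rate in the training data."""
    return sum(r["event"] for r in train) / len(train)


def brier_score(predicted_risk, test):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((predicted_risk - r["event"]) ** 2 for r in test) / len(test)


# Hypothetical toy data: three studies with a binary outcome.
records = [
    {"study": "A", "event": 1}, {"study": "A", "event": 0},
    {"study": "B", "event": 0}, {"study": "B", "event": 0},
    {"study": "C", "event": 1}, {"study": "C", "event": 1},
]

results = {}
for study, train, test in leave_one_study_out(records):
    # The held-out study never contributes to model fitting.
    risk = fit_event_rate(train)
    results[study] = brier_score(risk, test)

for study, score in results.items():
    print(f"held-out study {study}: Brier score {score:.4f}")
```

Because each score comes from a study the model never saw, the spread of scores across held-out studies gives a first impression of how well predictions transfer between settings. In a real analysis the placeholder would be replaced by the actual prediction model.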


Quantify uncertainty

The uncertainty in the individual patient trajectories will be broken down into its constituent sources, including the statistical basis (effective sample size for predictions, for treatment effects, and their combination), the freedom in modelling approaches (model degrees of freedom), and the anticipated transportability to different care settings and hospitals in different countries (generalizability). Each aspect will be quantified using advanced data analysis, including random-effects modelling, scenario analysis and sensitivity analysis. Uncertainty will be represented in confidence intervals and prediction intervals, with extensions to novel ‘uncertainty intervals’ that capture the full spectrum of sources of variation in predictions.
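
The difference between a confidence interval and a prediction interval can be made concrete with a small random-effects calculation. The sketch below pools hypothetical study-level estimates with the DerSimonian-Laird method, chosen here only as a familiar example and not necessarily the approach WP2 will use. The input numbers are invented, and a normal quantile is used for both intervals to keep the example dependency-free; prediction intervals are more commonly based on a t-distribution with k − 2 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def random_effects_summary(estimates, variances, level=0.95):
    """DerSimonian-Laird pooling with a confidence and a prediction interval."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect weights and Cochran's Q statistic.
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, estimates))

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights, pooled mean and its standard error.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = sqrt(1.0 / sum(w_star))

    # Normal quantile used for simplicity (an assumption of this sketch).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (mean - z * se, mean + z * se)

    # The prediction interval adds between-study variation to the uncertainty.
    pi_sd = sqrt(tau2 + se ** 2)
    pi = (mean - z * pi_sd, mean + z * pi_sd)
    return {"mean": mean, "tau2": tau2, "ci": ci, "pi": pi}


# Hypothetical effect estimates and variances from three settings.
summary = random_effects_summary([0.10, 0.30, 0.50], [0.01, 0.01, 0.01])
print(summary)
```

The confidence interval describes uncertainty about the average effect across settings, whereas the prediction interval describes where the effect in a new setting is expected to fall. When settings differ from one another, the prediction interval is the wider of the two.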


Translate into decision-support tools

The predictions will be presented in advanced risk calculators, as is done for the PREDICT algorithm, and embedded in the MetroMapping methodology. The presentation will clearly separate what patients, clinicians and policymakers can expect without a specific treatment versus with a specific treatment.
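
As a simple illustration of separating "without treatment" from "with treatment", the sketch below applies a relative treatment effect to a patient's predicted baseline risk. It assumes proportional hazards, so that survival with treatment equals survival without treatment raised to the power of the hazard ratio. The function names, the hazard ratio of 0.75 and the baseline risks are hypothetical; this is not the PREDICT algorithm or the project's calculator.

```python
def risk_with_treatment(baseline_risk, hazard_ratio):
    """Risk under treatment, assuming proportional hazards: S1 = S0 ** HR."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be between 0 and 1")
    if hazard_ratio <= 0:
        raise ValueError("hazard_ratio must be positive")
    return 1.0 - (1.0 - baseline_risk) ** hazard_ratio


def absolute_benefit(baseline_risk, hazard_ratio):
    """Absolute risk reduction: risk without treatment minus risk with it."""
    return baseline_risk - risk_with_treatment(baseline_risk, hazard_ratio)


# Hypothetical patients with different predicted risks without treatment.
hazard_ratio = 0.75  # assumed relative treatment effect, for illustration only
patients = {"low-risk patient": 0.05, "high-risk patient": 0.40}

for label, risk in patients.items():
    treated = risk_with_treatment(risk, hazard_ratio)
    benefit = absolute_benefit(risk, hazard_ratio)
    print(f"{label}: without {risk:.1%}, with {treated:.1%}, benefit {benefit:.1%}")
```

The same relative effect yields a larger absolute benefit for the patient with the higher baseline risk, which is why a risk-modelling approach can show that expected benefits differ by patient.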


Hester Lingsma

Work Package 2 Lead

