Treatment Effect Estimation (TEE) is essential in healthcare for evaluating the impact of treatment strategies on patient outcomes. Randomized Clinical Trials (RCTs) are the gold standard, but they are costly and time-consuming. Observational data, such as medical claims, offer a complementary source of evidence. Neural Network (NN) methods, especially Transformer-based models, have shown promise in TEE from observational data, outperforming traditional methods such as regression trees and random forests. Existing NN methods nevertheless have limitations: they tend to be task- or data-specific and require large labelled datasets. Transformers, widely used in domains such as natural language processing and computer vision, offer a pre-training and fine-tuning paradigm that could ease these limitations. Applying this paradigm to TEE, however, raises three challenges: how to encode structured patient data, the lack of large pre-training datasets, and the lack of real-world benchmark tasks.

 

Source: Patterns Journal

 

A Novel Framework for Estimating Treatment Effects from Real-World Medical Data

An article recently published in the journal Patterns introduces a novel framework for estimating treatment effects, called Causal Treatment Effect Estimation (CURE). The authors extract large-scale structured patient data from real-world medical claims databases and encode the data as sequential input, obtaining around 3 million unlabelled patient sequences for pre-training. Downstream datasets with labelled treatments and outcomes are derived from specific TEE tasks taken from established RCTs. Four downstream tasks, each containing 10,000–20,000 patient samples, are created to evaluate the comparative effectiveness of two treatments in reducing stroke risk for patients with coronary artery disease (CAD). A Transformer-based model is pre-trained on the unlabelled data using unsupervised learning to generate contextualised patient representations. To address issues such as the complex hierarchical structure and irregularity of the data, a comprehensive embedding method is proposed that incorporates structure and time information. Finally, the pre-trained model is fine-tuned on the downstream TEE tasks.
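As a rough illustration of what "encoding the data into sequential input" can mean, the sketch below flattens a handful of claim events into parallel sequences of codes, covariate types, and time offsets. The event tuples, the `encode_patient` helper, and the use of an index date are assumptions made for this example; the article does not describe the authors' actual preprocessing in this detail.

```python
from datetime import date

# Hypothetical claim events for one patient: (service date, covariate type, code).
events = [
    (date(2019, 3, 2), "medication", "atorvastatin"),
    (date(2019, 1, 15), "diagnosis", "I25.10"),
    (date(2019, 3, 2), "diagnosis", "I10"),
]

def encode_patient(events, index_date):
    """Flatten claim events into parallel token, type, and time sequences.

    Events are ordered chronologically (ties broken by type, then code), and
    each one is tagged with the number of days between it and the index date
    (assumed here to be the date treatment starts).
    """
    ordered = sorted(events, key=lambda e: (e[0], e[1], e[2]))
    tokens = [code for _, _, code in ordered]
    types = [ctype for _, ctype, _ in ordered]
    days_before_index = [(index_date - day).days for day, _, _ in ordered]
    return tokens, types, days_before_index

tokens, types, days_before_index = encode_patient(events, date(2019, 4, 1))
print(tokens)             # ['I25.10', 'I10', 'atorvastatin']
print(types)              # ['diagnosis', 'diagnosis', 'medication']
print(days_before_index)  # [76, 30, 30]
```

Keeping the three sequences aligned position by position lets a downstream model look up a separate embedding for each and combine them per event.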

 

New Patient Data Encoding Method for Enhanced Treatment Effect Estimation

The authors propose CURE, a Transformer-based framework for TEE. They introduce a novel patient data encoding method for structured observational patient data that incorporates covariate type and time into the patient embeddings. Large-scale patient data from real-world medical claims are pre-processed for pre-training, and four downstream TEE tasks are derived from established RCTs for evaluation. In extensive experiments, CURE outperforms state-of-the-art methods, significantly improving both outcome prediction and the estimation of heterogeneous effects. The estimated treatment effects are verified against the conclusions of the corresponding RCTs. Ablation studies additionally examine the contribution of the proposed patient embedding, the impact of pre-training data size, and generalisability to low-resource fine-tuning data.
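One common way to fold covariate type and time into a Transformer input is to add a type vector and a time vector to each code vector. The sketch below shows that idea only; the embedding width, the random lookup tables, the sinusoidal time encoding, and every function name are assumptions for illustration and are not confirmed as the authors' design.

```python
import math
import random

DIM = 8  # illustrative embedding width, not taken from the article

def make_table(keys, dim=DIM, seed=0):
    """Randomly initialised lookup table (these weights would be learned in practice)."""
    rng = random.Random(seed)
    return {k: [rng.gauss(0.0, 0.02) for _ in range(dim)] for k in keys}

def time_encoding(days, dim=DIM):
    """Sinusoidal encoding of elapsed days, so irregular gaps between events stay visible."""
    vec = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        vec.extend([math.sin(days * freq), math.cos(days * freq)])
    return vec[:dim]

def embed(tokens, types, days, code_table, type_table):
    """Sum the code, covariate-type, and time vectors element-wise for each event."""
    return [
        [c + t + p for c, t, p in zip(code_table[tok], type_table[typ], time_encoding(d))]
        for tok, typ, d in zip(tokens, types, days)
    ]

# Hypothetical encoded patient sequence (codes, covariate types, days since first event).
tokens = ["I25.10", "atorvastatin", "I10"]
types = ["diagnosis", "medication", "diagnosis"]
days = [0, 0, 45]

code_table = make_table(sorted(set(tokens)), seed=1)
type_table = make_table(sorted(set(types)), seed=2)
vectors = embed(tokens, types, days, code_table, type_table)
print(len(vectors), len(vectors[0]))  # one vector of width DIM per event
```

Summing rather than concatenating keeps the input width fixed regardless of how many kinds of side information are attached to each event.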

 

Advancing Treatment Effect Estimation with Significant Performance Enhancements

CURE delivers significant performance improvements over existing methods across the four downstream tasks. For outcome prediction, it achieves an average 4% improvement in area under the receiver operating characteristic curve (AUC) and a 7% improvement in area under the precision-recall curve (AUPR). For heterogeneous effects, it achieves an 8% absolute improvement over the best baseline across the four tasks in influence function-based precision of estimating heterogeneous effects (IF-PEHE).
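For readers less familiar with the two outcome-prediction metrics, the following minimal sketch computes AUC and an average-precision estimate of AUPR from scratch. The labels and scores are toy values invented for illustration, not data from the study, and IF-PEHE is not covered here.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Average precision: mean of the precision values at the rank of each positive.

    This is a common step-wise estimate of AUPR; tied scores are kept in input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy example: 1 = outcome occurred, 0 = it did not.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, scores))                      # 0.75
print(round(average_precision(y_true, scores), 4))  # 0.8333
```

A classifier that scores at random has an expected AUC of 0.5 regardless of prevalence, while its expected AUPR is roughly the fraction of positives, which is why AUPR is the more informative of the two when the outcome is rare.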

 

Validated Accuracy and Scalability

Validation against published RCTs confirms the accuracy of CURE's treatment effect estimates. The framework also scales well, handling large-scale unlabelled patient data for pre-training and smaller labelled datasets for fine-tuning, which demonstrates its potential in real-world applications. CURE's novel patient data encoding method outperforms existing approaches by capturing the complex hierarchical relationships and temporal irregularities inherent in patient data more effectively. The framework generalises well to datasets similar to MarketScan, the medical claims database used in the study, which broadens its applicability across healthcare datasets. CURE addresses potential heterogeneity in care protocols by analysing a comprehensive array of covariates that serve as proxies for unavailable institution-level data.

 

Despite moderate class imbalance in the downstream datasets, CURE consistently outperforms existing models. This is particularly evident in the AUPR improvement, which highlights the framework's robustness when classes are imbalanced. Overall, the results underscore CURE's efficacy in addressing key challenges in TEE from observational data, indicating its potential as a valuable tool in healthcare research and decision-making.

 


 



