Cid Royo A, Choi J, Plana E, Garcia Esteban R, Pajouheshnia R, Ryan O. A flexible open-source analytic pipeline for self-controlled analysis of real world data. Poster presented at the 2024 ISPE Annual Meeting; August 28, 2024. Berlin, Germany.


BACKGROUND: Self-Controlled Case Series (SCCS) and Self-Controlled Risk Interval (SCRI) analyses are popular tools for monitoring the post-marketing safety of vaccines. Conducting these analyses with real-world data (RWD) first requires the researcher to perform many study-specific data-engineering steps, a process that can be time-consuming, error-prone, and difficult to replicate. Software such as the SCCS package helps researchers fit appropriate statistical models for this design, but its data preparation functionality does not support all of the steps necessary when dealing with RWD. The SelfControlledCaseSeries package does aid in data preparation, but it requires that the data first be transformed to the OMOP CDM. A generalized package to support data preparation for SCCS/SCRI analysis using RWD is currently lacking.

OBJECTIVES: We develop a flexible analytic pipeline for SCCS and SCRI analyses of RWD. Using a set of easy-to-construct user inputs, this pipeline allows researchers to perform a variety of self-controlled analyses using a single standardized process, avoiding the need for the researcher to design and perform computationally expensive, time-consuming, and difficult-to-replicate data pre-processing steps.

METHODS: The pipeline is designed to take as input a data frame containing the study population and the dates of the exposure(s) and outcome(s) of interest, together with a set of metadata files which specify the design of the study. The pipeline then performs a series of standard transformations, including: the creation of individual-specific time windows of different types and lengths; identification of relevant cases within each window; merging and trimming of overlapping windows; censoring; and handling of time-varying covariates. At each step the pipeline creates outputs which researchers can use to check the consequences of their design choices. Finally, the pipeline outputs an analytic dataset, along with descriptives such as attrition tables, which can be used to directly estimate the effects of interest.
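As an illustration only, two of the transformations listed above (creating individual-specific windows and trimming them at censoring) can be sketched in a few lines. This is not the authors' R implementation: the sketch is written in Python, and the `Window` class, the `DESIGN` metadata layout, the window lengths, and the function names are all hypothetical choices made for the example. Merging of overlapping windows and time-varying covariates are not covered.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Window:
    label: str
    start: date  # inclusive
    end: date    # inclusive


# Hypothetical design metadata: (label, first day, last day),
# with days counted relative to the exposure date (day 0).
DESIGN = [
    ("control", -59, -30),
    ("washout", -29, -1),
    ("risk", 0, 27),
]


def build_windows(exposure: date, censor: date, design=DESIGN) -> List[Window]:
    """Create one person's windows and trim each at the censoring date."""
    windows = []
    for label, first, last in design:
        start = exposure + timedelta(days=first)
        end = min(exposure + timedelta(days=last), censor)
        if start <= end:  # drop windows that begin after censoring
            windows.append(Window(label, start, end))
    return windows


def classify_event(event: date, windows: List[Window]) -> Optional[str]:
    """Return the label of the window containing the event, if any."""
    for w in windows:
        if w.start <= event <= w.end:
            return w.label
    return None
```

For a person exposed on 1 March 2021 and censored on 15 March 2021, `build_windows` would cut the 28-day risk window short at the censoring date, and `classify_event` would then assign an outcome date to the risk window, the control window, or neither.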

RESULTS: The pipeline, along with metadata models and flexible functionality, is implemented in R and will be made available open-source on GitHub. We will demonstrate the flexibility and operation of the pipeline with the aid of two simulated examples using different SCRI designs, inspired by real-world post-authorization vaccine safety studies.

CONCLUSIONS: The pipeline simplifies and standardizes the complex data preparation process required for performing self-controlled analyses with real-world data. This will help researchers use self-controlled designs to estimate the effects of any transient exposure on an outcome of interest and, crucially, make the results of these analyses reliable and reproducible.
