Tools for Real World Evidence Using Diverse Data Sources

Romin Pajouheshnia, PhD
Research Epidemiologist
RTI Health Solutions

Transparency and Reproducibility in Real-World Evidence (RWE) Generation

The FDA website defines real-world data as “data relating to patient health status and/or the delivery of healthcare routinely collected from a variety of sources.” It is the variety that creates both challenges and opportunities for the generation of RWE. The impact on reproducibility is most evident when we find differences across studies that use different data sources. Even in multidatabase studies, where we can use approaches to standardize our analyses across different real-world data sources, we often still observe differences in the incidence and prevalence of health outcomes, medication use, and even measures of association. These differences can only be explained by differences in the real-world data sources themselves. RWE is extremely difficult to understand and use without a complete picture of where real-world data come from, how they are generated, and what they capture.

As part of the DIVERSE initiative, a collaboration among members of the ISPE Databases Special Interest Group, we helped conduct a scoping review to identify methods, tools, and recommendations for describing real-world data sources. The result was the DIVERSE Framework for describing real-world data sources (Gini et al., 2024). The DIVERSE Framework consists of 9 key dimensions of data source diversity, which, if properly described, help us fully understand a data source.

The DIVERSE Framework 

  • The organization accessing the data source – Who makes the data sources accessible for research and under what conditions?
  • The data originator – Who collects the data and for what purpose?
  • The prompts for record creation – What are the processes or events that cause data to be recorded in the data source?
  • Inclusion in the population – What events cause individuals to enter or exit the underlying population of the data source?
  • Content – What kinds of information does the data source capture? 
  • Data dictionary – How is the data source structured and coded (the data dictionary or model), and is free text recorded?
  • Time spans – Over which time periods are the data available?
  • Healthcare system and culture – How do the underlying healthcare system and culture affect what is or is not captured in the data source?
  • Data quality – What do we know about the accuracy and completeness of information in the data source and any changes over time?
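One way to put the nine dimensions to work is to record them as a structured description of each data source, so that gaps in documentation are easy to spot. The sketch below is only an illustration: the class name, field names, and the example data source are hypothetical and are not part of the DIVERSE Framework or any published tool.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataSourceDescription:
    """Hypothetical record with one free-text field per DIVERSE dimension."""
    accessing_organization: Optional[str] = None
    data_originator: Optional[str] = None
    prompts_for_record_creation: Optional[str] = None
    inclusion_in_population: Optional[str] = None
    content: Optional[str] = None
    data_dictionary: Optional[str] = None
    time_spans: Optional[str] = None
    healthcare_system_and_culture: Optional[str] = None
    data_quality: Optional[str] = None

    def undocumented_dimensions(self) -> List[str]:
        """Return the names of dimensions that have no description yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example: a partially described primary care database.
example = DataSourceDescription(
    data_originator="General practitioners, for clinical care",
    prompts_for_record_creation="Patient visits and prescriptions issued",
    time_spans="2005 onward",
)
missing = example.undocumented_dimensions()
print(f"{len(missing)} of 9 dimensions still undocumented: {missing}")
```

Keeping the description as plain fields, rather than a scoring scheme, reflects that the dimensions are descriptive prompts and not quality grades.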

Tools for Implementing Studies in Diverse Real-World Data Environments

In addition to tools to better understand and describe real-world data, we also need tools to help us effectively implement studies while accounting for data diversity. In a project led by partners at the Real-World Evidence group at the University Medical Center Utrecht, within the VAC4EU network, we collaborated to develop a flexible analytic pipeline to enable the uniform implementation of self-controlled study designs across a federated network of data sources. The self-controlled case series and self-controlled risk interval designs are part of the arsenal of tools that epidemiologists use to assess the postmarketing safety of vaccines. This work involved developing software to implement self-controlled designs in studies using a common data model and common analytic framework, allowing research teams across the VAC4EU network to efficiently conduct self-controlled studies. 
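To give a sense of the logic behind a self-controlled risk interval design, the sketch below compares event rates in a post-vaccination risk window with rates in a later control window among the same vaccinated cases. It is a simplified illustration, not the VAC4EU pipeline: the function name, the window definitions (days 1 to 28 and days 29 to 56), and the example data are assumptions, and the ratio shown is only appropriate when every case is observed for both full windows. Real analyses typically use conditional Poisson regression and adjust for time-varying confounders.

```python
from typing import Iterable, Tuple


def scri_relative_incidence(
    event_days: Iterable[int],
    risk_window: Tuple[int, int] = (1, 28),      # assumed window, inclusive days
    control_window: Tuple[int, int] = (29, 56),  # assumed window, inclusive days
) -> float:
    """Crude relative incidence: event rate in risk window / rate in control window.

    event_days holds, for each case, the number of days from vaccination
    to the outcome event. Events outside both windows are ignored.
    """
    days = list(event_days)
    n_risk = sum(risk_window[0] <= d <= risk_window[1] for d in days)
    n_control = sum(control_window[0] <= d <= control_window[1] for d in days)
    if n_control == 0:
        raise ValueError("No events in the control window; ratio is undefined.")
    len_risk = risk_window[1] - risk_window[0] + 1
    len_control = control_window[1] - control_window[0] + 1
    return (n_risk / len_risk) / (n_control / len_control)


# Invented example: days from vaccination to event for seven cases.
ratio = scri_relative_incidence([3, 10, 20, 27, 35, 50, 60])
print(f"Crude relative incidence: {ratio:.2f}")
```

Because each person serves as their own control, characteristics that do not change over the observation period cancel out of the comparison, which is part of what makes the design attractive across heterogeneous data sources.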

New approaches to analyze diverse, real-world data sources are making RWE easier to understand and use. These tools will help us to standardize the way we implement analyses across data sources, while highlighting what makes them unique.

 

Describing diversity of real world data sources in pharmacoepidemiologic studies: The DIVERSE scoping review. Gini R, Pajouheshnia R, Gardarsdottir H, Bennett D, Li L, Gulea C, Wientzek-Fleischmann A, Bazelier MT, Burcu M, Dodd C, Durán CE, Kaplan S, Lanes S, Marinier K, Roberto G, Soman K, Zhou X, Platt R, Setoguchi S, Hall GC. Pharmacoepidemiol Drug Saf. 2024 May;33(5):e5787. doi: 10.1002/pds.5787. PMID: 38724471.

Check out this newsletter drafted by Romin and other colleagues from the ISPE Databases SIG and published on the ISPE website. The authors provided perspectives on RWD sources, diversity, and quality – detailing key findings from the DIVERSE initiative work. 
Embracing the Diversity of Real-World Data Sources - International Society for Pharmacoepidemiology
 
