OBJECTIVES: Giving participants a choice in how their data are collected may lead to greater participation, less missing data, improved data quality, and, in some cases, lower data collection costs. To facilitate combining data collected with multiple versions of an instrument, this study aimed to provide recommended steps for assessing measurement comparability, using a crossover study design and a case-finding questionnaire, the Lung Function Questionnaire (LFQ), as an example.
METHODS: The LFQ was administered to participants on paper, via the Web, via an interactive voice response system, and by interview. A randomized crossover design was used to gather data across the multiple administration types. In addition to the LFQ, participants completed demographic and health questions and a short questionnaire about their administration preference. Four recommended evaluation steps are described and illustrated using data from the crossover study: 1) comparison of item-level responses and agreement; 2) comparison of mean scale scores; 3) comparison of score classifications; and 4) questions designed to assess usability and administration preference.
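Steps 1 and 3 can be illustrated with a short sketch. It assumes that responses are stored as parallel lists, one entry per participant, for the paper version and one alternate version, and it uses unweighted Cohen's kappa for item-level agreement. The abstract does not say which kappa variant was used, and a weighted kappa may suit ordinal response options better. The function names, the hypothetical cut-off, and the rule that scores at or below the cut-off are flagged are illustrative assumptions, not the study's scoring procedure.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(paper: Sequence[Hashable], alternate: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for one item answered in two versions (hypothetical helper)."""
    if len(paper) != len(alternate) or not paper:
        raise ValueError("need two non-empty response lists of equal length")
    n = len(paper)
    # Observed agreement: share of participants giving the same response in both versions.
    p_o = sum(x == y for x, y in zip(paper, alternate)) / n
    # Chance agreement, computed from each version's marginal response distribution.
    count_p, count_a = Counter(paper), Counter(alternate)
    p_e = sum(count_p[c] * count_a[c] for c in count_p) / (n * n)
    if p_e == 1.0:
        # Everyone gave one identical response in both versions; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def disagreement_rate(
    paper_scores: Sequence[float], alternate_scores: Sequence[float], cutoff: float
) -> float:
    """Share of participants classified differently by the two versions.

    Assumption for illustration: a score <= cutoff is flagged; the cutoff is hypothetical.
    """
    if len(paper_scores) != len(alternate_scores) or not paper_scores:
        raise ValueError("need two non-empty score lists of equal length")
    flagged_paper = [s <= cutoff for s in paper_scores]
    flagged_alt = [s <= cutoff for s in alternate_scores]
    return sum(p != a for p, a in zip(flagged_paper, flagged_alt)) / len(paper_scores)
```

For example, cohens_kappa([1, 2, 1, 2], [1, 1, 2, 2]) returns 0.0, because the observed agreement (0.5) equals the agreement expected by chance from the marginals.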
RESULTS: In this example, item-level kappa statistics between the paper and the alternate versions ranged from good to excellent, intraclass correlation coefficients for mean scores were above 0.70, and the rate of classification disagreement ranged from 2% to 14%. In addition, although participants expressed administration preferences, they reported few difficulties with the versions to which they were assigned.
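The intraclass correlation for mean scores can be computed from a participants-by-versions score matrix. The abstract does not state which ICC form was used; the sketch below assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), which penalizes systematic shifts between versions as well as random disagreement. The function name and input layout are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Hypothetical layout: one row per participant, one column per administration version.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix of at least 2 participants x 2 versions")
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]
    # Two-way ANOVA sums of squares: participants (rows), versions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator
```

With hypothetical scores [[1, 2], [2, 3], [3, 4]], where the alternate version is always one point higher, icc_2_1 returns 2/3 (about 0.67): participants are ranked identically, but the constant offset lowers absolute agreement. A consistency-type ICC would ignore that offset, which is why the choice of form should be reported.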
CONCLUSIONS: The steps described provide a guide for evaluating, under a crossover design, whether scores can be combined across administration versions to simplify analysis and interpretation. The guide recommends investigating item-level responses, summary scores, and participant usability and preference when comparing versions. Each step provides unique information, supporting a comprehensive evaluation and an informed decision about whether to combine data. In this study, the results of every evaluation step supported the use of multiple modes of the LFQ.