Measuring implementation fidelity in a cluster-randomized pragmatic trial: development and use of a quantitative multi-component approach

Miranda B. Olson, Ellen M. McCreedy, Rosa R. Baier, Renée R. Shield, Esme E. Zediker, Rebecca Uth, Kali S. Thomas, Vincent Mor, Roee Gutman & James L. Rudolph

Abstract

Background

In pragmatic trials, on-site partners, rather than researchers, lead intervention delivery, which may result in implementation variation. There is a need to quantitatively measure this variation. Applying the Framework for Implementation Fidelity (FIF), we develop an approach for measuring variability in site-level implementation fidelity. This approach is then applied to measure site-level fidelity in a cluster-randomized pragmatic trial of Music & MemorySM (M&M), a personalized music intervention targeting agitated behaviors in residents living with dementia, in US nursing homes (NHs).

Methods

Intervention NHs (N = 27) implemented M&M using a standardized manual, utilizing provided staff trainings and iPods for participating residents. Quantitative implementation data, including iPod metadata (i.e., song title, duration, number of plays), were collected during baseline, 4-month, and 8-month site visits. Three researchers developed four FIF adherence dimension scores. For Details of Content, we independently reviewed the implementation manual and reached consensus on six core M&M components. Coverage was the total number of residents exposed to the music at each NH. Frequency was the percent of participating residents in each NH exposed to M&M at least weekly. Duration was the median minutes of music received per resident day exposed. Data elements were scaled and summed to generate dimension-level NH scores, which were then summed to create a Composite adherence score. NHs were grouped by tercile (low-, medium-, high-fidelity).

Results

The 27 NHs differed in size, resident composition, and publicly reported quality rating. The Composite score demonstrated significant variation across NHs, ranging from 4.0 to 12.0 [8.0, standard deviation (SD) 2.1]. Scaled dimension scores were significantly correlated with the Composite score. However, dimension scores were not highly correlated with each other; for example, the correlation of the Details of Content score with Coverage was τb = 0.11 (p = 0.59) and with Duration was τb = − 0.05 (p = 0.78). The Composite score correlated with CMS quality star rating and presence of an Alzheimer’s unit, suggesting face validity.

Conclusions

Guided by the FIF, we developed and used an approach to quantitatively measure overall site-level fidelity in a multi-site pragmatic trial. Future pragmatic trials, particularly in the long-term care environment, may benefit from this approach.