Assessing Intervention Fidelity in RCTs: Concepts and Methods
Panelists: David S. Cordray, PhD; Chris Hulleman, PhD; Joy Lesnick, PhD
Vanderbilt University
Presentation for the IES Research Conference, Washington, DC
June 12, 2008
Overview
• Session planned as an integrated set of presentations
• We’ll begin with:
   – Definitions and distinctions;
   – Conceptual foundation for assessing fidelity in RCTs, a special case.
• Two examples of assessing implementation fidelity:
   – Chris Hulleman will illustrate an assessment for an intervention with a single core component
   – Joy Lesnick illustrates additional considerations when fidelity assessment is applied to intervention models with multiple program components.
• Issues for the future
• Questions and discussion
Dimensions of Intervention Fidelity
• Little consensus exists on what is meant by the term “intervention fidelity”.
• But Dane & Schneider (1998) identify 5 aspects:
   – Adherence/compliance – program components are delivered/used/received as prescribed;
   – Exposure – amount of program content delivered/received by participants;
   – Quality of the delivery – theory-based ideal in terms of processes and content;
   – Participant responsiveness – engagement of the participants; and
   – Program differentiation – unique features of the intervention are distinguishable from other programs (including the counterfactual).
Distinguishing Implementation Assessment from Implementation Fidelity Assessment
• Two models of intervention implementation, based on:
   – A purely descriptive model
      • Answers the question: “What transpired as the intervention was put in place (implemented)?”
   – An a priori intervention model, with explicit expectations about implementation of core program components.
      • Fidelity is the extent to which the realized intervention (tTx) is “faithful” to the pre-stated intervention model (TTx)
      • Fidelity is indexed by the discrepancy TTx – tTx (zero discrepancy = perfect fidelity)
• We emphasize the a priori model
What to Measure?
• Adherence to the intervention model:
   – (1) Essential or core components (activities, processes);
   – (2) Necessary, but not unique to the theory/model, activities, processes and structures (supporting the essential components of T); and
   – (3) Ordinary features of the setting (shared with the counterfactual group (C)).
• Essential/core and necessary components are the priority parts of fidelity assessment.
An Example of Core Components: Bransford’s HPL Model of Learning and Instruction
• John Bransford et al. (1999) postulate that a strong learning environment entails a combination of:
   – Knowledge-centered;
   – Learner-centered;
   – Assessment-centered; and
   – Community-centered components.
• Alene Harris developed an observation system (the VOS) that registered novel (the components above) and traditional pedagogy in classes.
• The next slide shows the prevalence of Bransford’s recommended pedagogy.
Challenge-based Instruction in “Treatment” and Control Courses: The VaNTH Observation System (VOS)
[Figure: Percentage of course time (0–35%) using challenge-based instructional strategies in Treatment vs. Control courses across Years 2–4. Adapted from Cox & Cordray, in press.]
Implications
• Fidelity can be assessed even when there is no known benchmark (unlike, say, the 10 Commandments, where the benchmark is fully specified)
   – In practice, interventions can be a mixture of components with strong, weak, or no benchmarks
• Control conditions can include core intervention components due to:
   – Contamination
   – Business as usual (BAU) containing shared components at different levels
   – Similar theories or models of action
• But to index “fidelity”, we need to measure components within the control condition as well
Linking Intervention Fidelity Assessment to Contemporary Models of Causality
• Rubin’s Causal Model:
   – The true causal effect for unit i is (Yi^Tx – Yi^C)
   – RCT methodology is the best approximation to the true effect
• Fidelity assessment within RCT-based causal analysis entails examining the difference between causal components in the intervention and counterfactual conditions.
• Differencing causal conditions can be characterized as the “achieved relative strength” of the contrast.
   – Achieved Relative Strength (ARS) = tTx – tC
   – ARS is a default index of fidelity
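The ARS differencing can be sketched in code. This is a minimal illustration, not part of the presentation: the function standardizes the difference between realized fidelity in the two arms by the pooled within-groups standard deviation, and the scores in the example call are made-up numbers.

```python
import statistics

def achieved_relative_strength(tx_scores, c_scores):
    """Standardized difference between the fidelity realized in the
    treatment arm (tTx) and in the control arm (tC)."""
    n_tx, n_c = len(tx_scores), len(c_scores)
    mean_tx = statistics.fmean(tx_scores)
    mean_c = statistics.fmean(c_scores)
    # Pooled within-groups variance, then standardize the mean difference
    pooled_var = ((n_tx - 1) * statistics.variance(tx_scores)
                  + (n_c - 1) * statistics.variance(c_scores)) / (n_tx + n_c - 2)
    return (mean_tx - mean_c) / pooled_var ** 0.5

# Hypothetical fidelity ratings for the two conditions
ars = achieved_relative_strength([2, 3, 2, 3, 1], [1, 1, 2, 0, 1])
```

The clustering- and small-sample-adjusted version of this index is given later in the talk.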
[Figure: Hypothetical treatment–control contrast. On the outcome scale, the realized condition means are tTx = 85 and tC = 70, so (85) – (70) = 15; on the treatment-strength scale this corresponds to an achieved relative strength of .15, versus an expected relative strength of .25 implied by the benchmarks TTx and TC. The gaps between each benchmark and its realized value are labeled “infidelity.” Standardized, d = (Ȳt – Ȳc)/sd_pooled.]
In Practice….
• Identify core components in both groups
   – e.g., via a Model of Change
• Establish benchmarks for TTx and TC
• Measure core components to derive tTx and tC
   – e.g., via a “logic model” based on the Model of Change
• With multiple components and multiple methods of assessment, achieved relative strength needs to be:
   – Standardized, and
   – Combined across:
      • Multiple indicators
      • Multiple components
      • Multiple levels (HLM-wise)
• We turn to our examples….
Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a
Motivation Intervention
Chris S. Hulleman
Vanderbilt University
The Theory of Change

[Diagram: MANIPULATED RELEVANCE → PERCEIVED UTILITY VALUE → INTEREST and PERFORMANCE]

Adapted from: Hulleman (2008); Hulleman, Godes, Hendricks, & Harackiewicz (2008); Hulleman & Harackiewicz (2008); Hulleman, Hendricks, & Harackiewicz (2007); Eccles et al. (1983); Wigfield & Eccles (2002)
Methods

Sample:
   Laboratory: N = 107 undergraduates
   Classroom: N = 182 ninth-graders in 13 classes, 8 teachers, 3 high schools
Task:
   Laboratory: mental multiplication technique
   Classroom: biology, physical science, physics
Treatment manipulation:
   Laboratory: write about how the mental math technique is relevant to your life.
   Classroom: pick a topic from science class and write about how it relates to your life.
Control manipulation:
   Laboratory: write a description of a picture from the learning notebook.
   Classroom: pick a topic from science class and write a summary of what you have learned.
Number of manipulations: 1 (laboratory); 2–8 (classroom)
Length of study: 1 hour (laboratory); 1 semester (classroom)
Dependent variable: perceived utility value
Motivational Outcome
[Figure: Perceived utility value (scale 3–6) by condition. Laboratory: Control 4.78 vs. Treatment 5.28, g = 0.45 (p = .03). Classroom: Control 3.56 vs. Treatment 3.62, g = 0.05 (p = .67).]
Fidelity Measurement and Achieved Relative Strength
• Simple intervention – one core component
• Intervention fidelity:
   – Defined as “quality of participant responsiveness”
   – Rated on a scale from 0 (none) to 3 (high)
   – 2 independent raters, 88% agreement
Quality of            Laboratory                Classroom
Responsiveness        C           Tx            C           Tx
Rating                N     %     N     %       N     %     N     %
0                     47    100   7     11      86    96    38    41
1                     0     0     15    24      4     4     40    43
2                     0     0     29    46      0     0     14    15
3                     0     0     12    19      0     0     0     0
Total                 47    100   63    100     90    100   92    100
Mean                  0.00        1.73          0.04        0.74
SD                    0.00        0.90          0.21        0.71
Indexing Fidelity
• Absolute – compare observed fidelity (tTx) to the absolute or maximum level of fidelity (TTx)
• Average – mean level of observed fidelity (tTx)
• Binary – yes/no treatment receipt based on fidelity scores
   – Requires selection of a cut-off value
Fidelity Indices

Conceptually: Absolute = (X̄t / T) × 100; Average = X̄t; Binary = (n at or above cut-off) / n.

Index      Group   Laboratory                 Classroom
Absolute   Tx      (1.73/3.00) × 100 = 58%    (0.74/3.00) × 100 = 25%
           C       (0.00/3.00) × 100 = 0%     (0.04/3.00) × 100 = 1%
Average    Tx      1.73                       0.74
           C       0.00                       0.04
Binary     Tx      41/63 = 0.65               14/92 = 0.15
           C       0/47 = 0.00                0/90 = 0.00
Indexing Fidelity as Achieved Relative Strength

Intervention Strength = Treatment – Control

Achieved Relative Strength (ARS) Index
• Standardized difference in a fidelity index across Tx and C
• Based on Hedges’ g (Hedges, 2007)
• Corrected for clustering in the classroom (ICCs from .01 to .08)

ARS Index = (tTx – tC) / ST
Average ARS Index

g = [(X̄1 – X̄2) / ST] × [1 – 3/(4(nTx + nC) – 9)] × sqrt(1 – 2(n̄ – 1)ρ/(N – 2))

     group difference × sample-size adjustment × clustering adjustment

Where,
X̄1 = mean for group 1 (tTx)
X̄2 = mean for group 2 (tC)
ST = pooled within-groups standard deviation
nTx = treatment sample size
nC = control sample size
n̄ = average cluster size
ρ = intra-class correlation (ICC)
N = total sample size
Absolute and Binary ARS Indices

g = [2·arcsin(√pTx) – 2·arcsin(√pC)] × [1 – 3/(4(nTx + nC) – 9)] × sqrt(1 – 2(n̄ – 1)ρ/(N – 2))

     group difference × sample-size adjustment × clustering adjustment

Where,
pTx = proportion for the treatment group (tTx)
pC = proportion for the control group (tC)
nTx = treatment sample size
nC = control sample size
n̄ = average cluster size
ρ = intra-class correlation (ICC)
N = total sample size
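Both ARS formulas can be implemented directly from summary statistics. A sketch under the Hedges (2007) adjustments described above; the ICC of .01 in the example call is one value from the reported .01–.08 range, and the average cluster size of 14 assumes the 182 classroom students were spread over the 13 classes.

```python
import math

def ars_average(mean_tx, mean_c, sd_tx, sd_c, n_tx, n_c, n_bar=1, icc=0.0):
    """Average-index ARS: Hedges' g with small-sample and clustering
    adjustments."""
    N = n_tx + n_c
    # Pooled within-groups standard deviation
    s_t = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_c - 1) * sd_c ** 2) / (N - 2))
    g = (mean_tx - mean_c) / s_t                          # group difference
    g *= 1 - 3 / (4 * N - 9)                              # sample-size adjustment
    g *= math.sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))   # clustering adjustment
    return g

def ars_proportion(p_tx, p_c, n_tx, n_c, n_bar=1, icc=0.0):
    """Absolute/binary ARS via the arcsine transform of proportions."""
    N = n_tx + n_c
    g = 2 * math.asin(math.sqrt(p_tx)) - 2 * math.asin(math.sqrt(p_c))
    g *= 1 - 3 / (4 * N - 9)
    g *= math.sqrt(1 - 2 * (n_bar - 1) * icc / (N - 2))
    return g

# Classroom quality-of-responsiveness summary statistics from the talk
g_class = ars_average(0.74, 0.04, 0.71, 0.21, 92, 90, n_bar=14, icc=0.01)  # ≈ 1.32
```

With these inputs the average-index ARS reproduces the classroom value of about 1.32 reported on the next slide.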
Achieved Relative Strength = 1.32

[Figure: Classroom contrast for the average fidelity index on the 0–3 treatment-strength scale. The benchmarks TTx and TC sit above the realized values tTx = 0.74 and tC = 0.04; the gaps between benchmarks and realized values are labeled “infidelity.” Average ARS index: (0.74) – (0.04) = 0.70; standardized, ARS = g = (X̄t – X̄c)/sd_pooled = (0.74 – 0.04)/0.53 = 1.32.]
Achieved Relative Strength Indices
Observed Fidelity: Lab vs. Class Contrasts

Index       Group    Lab      Class    Lab – Class
Absolute    Tx       0.58     0.25
            C        0.00     0.01
            g        1.72     0.80     0.92
Average     Tx       1.73     0.74
            C        0.00     0.04
            g        2.52     1.32     1.20
Binary      Tx       0.65     0.15
            C        0.00     0.00
            g        1.88     0.80     1.08
Sources of Infidelity in the Classroom
Student behaviors were nested within teacher behaviors:
• Teacher dosage
• Frequency of responsiveness

Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness).
Sources of Infidelity: Multi-level Analyses
Part I: Baseline Analyses
• Identified the amount of residual variability in fidelity due to students and teachers.
   – Due to missing data, we estimated a 2-level model (153 students, 6 teachers)

Student: Yij = b0j + b1j(TREATMENT)ij + rij
Teacher: b0j = γ00 + u0j
         b1j = γ10 + u1j
Sources of Infidelity: Multi-level Analyses
Part II: Explanatory Analyses
• Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage

Student: Yij = b0j + b1j(TREATMENT)ij + b2j(RESPONSE FREQUENCY)ij + rij
Teacher: b0j = γ00 + u0j
         b1j = γ10 + b10(TEACHER DOSAGE)j + u1j
         b2j = γ20 + b20(TEACHER DOSAGE)j + u2j
Sources of Infidelity: Multi-level Analyses
                     Baseline Model              Explanatory Model
Variance Component   Residual      % of Total    Residual      % Reduction
                     Variance      Variance      Variance
Level 1 (Student)    0.15437*      52            0.15346*      < 1
Level 2 (Teacher)    0.13971*      48            0.04924       65
Total                0.29408                     0.20270
* p < .001.
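The percentage columns follow directly from the variance components. A small sketch using the numbers on the slide:

```python
def variance_shares(baseline, explanatory):
    """% of total variance at each level in the baseline model, and the
    % reduction per component achieved by the explanatory model."""
    total = sum(baseline.values())
    share = {level: 100 * v / total for level, v in baseline.items()}
    reduction = {level: 100 * (baseline[level] - explanatory[level]) / baseline[level]
                 for level in baseline}
    return share, reduction

baseline = {"student": 0.15437, "teacher": 0.13971}      # residual variances
explanatory = {"student": 0.15346, "teacher": 0.04924}
share, reduction = variance_shares(baseline, explanatory)
# share ≈ {student: 52%, teacher: 48%}; reduction ≈ {student: <1%, teacher: 65%}
```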
Case Summary
• The motivational intervention was more effective in the lab (g = 0.45) than in the field (g = 0.05).
• Using 3 indices of fidelity and, in turn, achieved relative treatment strength revealed that:
   – Classroom fidelity < lab fidelity
   – Achieved relative strength was about 1 SD less in the classroom than in the laboratory
• Differences in achieved relative strength paralleled differences in the motivational outcome, especially in the lab.
• Sources of infidelity: teacher (not student) factors
Joy Lesnick
Assessing Fidelity of Interventions with Multiple Components: A Case of Assessing Preschool Interventions
What Do We Mean By Multiple Components in Preschool Literacy Programs?
• How do you define preschool instruction?
   – Academic content, materials, student-teacher interactions, student-student interactions, physical development, schedules & routines, assessment, family involvement, etc.
• How would you measure implementation? Preschool interventions:
   – Are made up of components (e.g., sets of activities and processes) that can be thought of as constructs;
   – These constructs vary in meaning across actors (e.g., developers, implementers, researchers);
   – They are of varying levels of importance within the intervention; and
   – These constructs are made up of smaller parts that need to be assessed.
• Multiple components make assessing fidelity more challenging.
Overview
• Four areas of consideration when assessing fidelity of programs with multiple components:
   1. Specifying multiple components
   2. Major variations in program components
   3. The ABCs of item and scale construction
   4. Aggregating indices
• One caveat: very unusual circumstances
• Goal of this work:
   – To build on the extensive evaluation work that had already been completed, and to use the case study to provide a framework for future efforts to measure fidelity of implementation.
1. Specifying Multiple Components

Our process:
• Extensive review of program materials
• Potentially hundreds of components
• How many indicators do we need to assess fidelity?
1. Specifying Multiple Components

[Diagram: hierarchy running Constructs → Sub-Constructs → Facets → Elements → Indicators. Constructs: interactions between teacher and child; physical environment; routines and classroom management; instruction; assessment; family involvement. Instruction breaks into materials, content, and processes; content into social & personal development, healthful living, scientific thinking, social studies, creative arts, physical development, technology, math, and literacy; literacy into structured lessons, structured units, letter and word recognition, book and print awareness, phonemic awareness, language/comprehension/response to text, oral language, and writing, each with its own numbered indicators.]
Grain Size Is Important
• Conceptual differences between programs may occur at micro levels
• Empirical differences between program implementations may occur at more macro levels
• Theoretically expected differences vs. empirically observed differences:
   – We must identify conceptual differences between programs at the smallest grain size at the outset, although we may be able to detect empirical differences, once implemented, only at higher macro levels
2. Major Variations in Program Components
• One program often has some combination of these different types of components:
   – Scripted (highly structured) activities
   – Unscripted (unstructured) activities
• Nesting of activities:
   – Micro-level (discrete) activities
   – Macro-level (extended) activities

What you’re trying to measure will influence how to measure it – and how often it needs to be measured.
2. Major Variations in Program Components

Scripted (highly structured) activities
   Example from the case study: in the first treatment condition, four scripted literacy circles are required. There are known criteria for assessing fidelity.
   Implication: fidelity is the difference between the expected and observed values (TTx – tTx).
   Applicable indices – Abs: Yes; Avg: Yes; Bin: ?; ARS: Yes.

Unscripted (unstructured) activities
   Example from the case study: in the second treatment condition, literacy circles are required, but the specific content of those group meetings is not specified. Criteria for assessing fidelity are unknown; we can only record what was done (tTx), or compare it to the control condition.
   Applicable indices – Abs: No?; Avg: Yes?; Bin: ?; ARS: Yes.

Abs – “Absolute fidelity” index: what happened compared to what should have happened; the highest standard.
Avg – Magnitude or exposure level: indicates what happened, but is not very meaningful on its own – how do we know if a level is good or bad?
Bin – Binary complier: can we set a benchmark to determine whether or not a program component was successfully implemented? More than 30%, for example? Is that realistic? Meaningful?
ARS – Difference in magnitude between Tx and C (relative strength): is there enough difference to warrant a treatment effect?
We must measure the trees… and also the forest…
• Micro-level (discrete) activities
   – Depending on the condition, daily activities (e.g., whole-group time, small-group time, center activities) may be scripted or unscripted, and take place within the larger structure of the theme under study.
• Macro-level (extended) activities
   – A month-long thematic unit (structured in the treatment condition, unstructured in the control) is the underlying extended structure within which scripted or unscripted micro activities take place.
• In multi-component programs, many activities are nested within larger activity structures. This nesting has implications for fidelity analysis – what to measure and how to measure it.
3. The ABCs of Item and Scale Construction
• Aim for one-to-one correspondence of indicators to component of interest
• Balance items across components
• Coverage and quality are more important than the quantity of items
3. Aim for one-to-one correspondence
• Example of more than one component being assessed in one item:
   – [Does the teacher] Talk with children throughout the day, modeling correct grammar, teaching new vocabulary, and asking questions to encourage children to express their ideas in words? (Yes/No)
• Example of one component being measured per item:
   – Teacher provides an environment wherein students can talk about what they are doing.
   – Teacher listens attentively to students’ discussions and responses.
   – Teacher models and/or encourages students to ask questions during class discussions.
Difference between T and C (Oral Language)*:
   T: 1.80 (0.32); C: 1.36 (0.32); ARS ES: 1.38
   T: 3.45 (0.87); C: 2.26 (0.57); ARS ES: 1.62

*Data for the case study come from an evaluation conducted by Dale Farran, Mark Lipsey, Carol Bilbrey, et al.
3. Balance items across components
• How many items are needed for each scale?
• Oral language is over-represented
• Scales with α < 0.80 were treated as insufficiently reliable

Literacy Content                                 # items    α
Oral language                                    20         0.95
Language, comprehension, and response to text    7          0.70
Book and print awareness                         2          0.80
Phonemic awareness                               3          0.68
Letter and word recognition                      7          0.76
Writing                                          6          0.67
Literacy Processes:
Thematic studies                                 4          0.62
Structured literacy circles                      2          0.62
3. Coverage and quality more important than quantity

(Scale table repeated from the previous slide.)

• Two scales each have 2 items, but very different levels of reliability (α = 0.80 vs. 0.62)
• How many items are needed for each scale?
• Oral language has 20 items. Randomly selecting items and recalculating alpha:
   – 10 items: α = 0.92
   – 8 items: α = 0.90
   – 6 items: α = 0.88
   – 5 items: α = 0.82
   – 4 items: α = 0.73
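The reliabilities above come from standard scale analysis. For reference, a minimal Cronbach’s alpha computation (a generic sketch, not the case study’s actual code):

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists,
    each holding one item's scores across the same respondents."""
    k = len(items)
    item_var_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))
```

Dropping items and recomputing alpha, as in the oral-language example above, amounts to repeated calls on randomly chosen item subsets.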
4. Aggregating Indices

To weight or not to weight? How do we decide?
• Possibilities:
   – Theory
   – Consensus
   – $$ spent
   – Time spent
• Case study example – 2 levels of aggregation, within and between:
   – Unit-weight within facet: “Instruction – Content – Literacy”
   – Hypothetical weight across sub-construct: “Instruction – Content”
YOU ARE HERE….
[Diagram: the construct hierarchy shown earlier, annotated with weighting decisions – a unit weight within the literacy facet, a theory weight across the content sub-constructs, and open “how to weight?” questions at the remaining levels.]
4. Aggregating Indices

Unit-weight within facet: Instruction – Content – Literacy**

                                  Average Fidelity    “Absolute Fidelity”    Achieved Relative Strength
Literacy Content                  Index               Index                  Fidelity Index
                                  Tx       C          Tx       C             Average     Absolute
Oral language                     1.82     1.40       91%      70%           1.36        0.53
Language, comprehension,
  and response to text            1.74     1.37       87%      69%           1.45        0.44
Book and print awareness          1.91     1.39       96%      70%           1.38        0.73
Phonemic awareness                1.73     1.48       87%      74%           0.74        0.32
Letter and word recognition       1.75     1.36       88%      68%           1.91        0.50
Writing                           1.68     1.37       84%      69%           1.22        0.34
Average – unit weighting          1.77     1.38       89%      75%           1.34        0.48
**clustering is ignored
4. Aggregating Indices

Theory-weight across sub-construct (hypothetical)

Instruction – Content              Treatment    Control    Hypothetical Weight
Literacy                           1.77         1.38       40%
Math                               1.51         1.80       5%
Social and personal development    1.79         1.58       35%
Scientific thinking                1.57         1.71       5%
Social studies                     1.84         1.41       5%
Creative arts                      1.66         1.32       5%
Physical development               1.45         1.50       3%
Technology                         1.45         1.57       2%
                                                           100%
Unweighted average                 1.63         1.53
Weighted average                   1.74         1.49
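The bottom rows of the preceding table are reproducible from the sub-construct means and weights. A sketch, with the treatment-group values copied from the slide and the weights being the hypothetical ones shown there:

```python
# Treatment-group fidelity means per sub-construct (from the slide)
scores = {"Literacy": 1.77, "Math": 1.51, "Social and Personal Development": 1.79,
          "Scientific Thinking": 1.57, "Social Studies": 1.84, "Creative Arts": 1.66,
          "Physical Development": 1.45, "Technology": 1.45}
# Hypothetical theory weights (must sum to 1.0)
weights = {"Literacy": 0.40, "Math": 0.05, "Social and Personal Development": 0.35,
           "Scientific Thinking": 0.05, "Social Studies": 0.05, "Creative Arts": 0.05,
           "Physical Development": 0.03, "Technology": 0.02}

unweighted = sum(scores.values()) / len(scores)                  # unit weighting ≈ 1.63
weighted = sum(weights[name] * v for name, v in scores.items())  # theory weighting ≈ 1.74
```

Running the same two lines on the control-group means yields the other pair of averages in the table.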
YOU ARE HERE …
[Diagram repeated: the construct hierarchy with the same weighting annotations.]
Key Points and Future Issues
• Fidelity measurement should, at a minimum, cover model-based core and necessary components;
• Collaboration among researchers, developers, and implementers is essential for specifying:
   – Intervention models;
   – Core and essential components;
   – Benchmarks for TTx (e.g., an educationally meaningful dose; what level of X is needed to instigate change?); and
   – Tolerable adaptation
Points and Issues
• Fidelity assessment serves two roles:
   – Estimating the average causal difference between conditions; and
   – Using fidelity measures to assess the effects of variation in implementation on outcomes.
• We should minimize “infidelity” and weak ARS:
   – Pre-experimental assessment of TTx in the counterfactual condition… is TTx > TC?
   – Build operational models with positive implementation drivers
• Post-experimental (re)specification of the intervention, for example:
   – MAPARS = .3(planned prof. development) + .6(planned use of data for differentiated instruction)