1 Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing.


1 Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing

2 Evidence-Based Practice
  Requires accurate diagnosis, treatment placement, and outcomes monitoring
  Assessment over a wide range of domains
  The cost of evidence-based assessment is:
    Time
    Respondent burden
    Increased staff resources (including training)

3 Improving Efficiency
  The use of screeners and short-form instruments has significantly improved the efficiency of the assessment process
    Can help determine whether a full assessment is warranted
    But not a substitute for a full assessment
  Lack of precision
  Floor and ceiling effects
  Limited content validity

4 Computerized Adaptive Testing
  Selects items from a large bank of items based on the responses made to previous items.
  Continues to select and administer items until sufficient measurement precision is obtained.
  Combines the precision and comprehensiveness of a full assessment with the efficiency of a screener.

5 CAT Process
  [Figure: typical pattern of responses, starting at middle difficulty, with increased difficulty after a correct response and decreased difficulty after an incorrect response. The score is calculated (+/- 1 Std. Error) and the next best item is selected based on item difficulty.]
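
The select, score, and repeat loop described above can be sketched roughly as follows. This is a minimal illustration and not the GAIN CAT implementation: it assumes a dichotomous Rasch model, a weak normal prior on the score so that early estimates stay finite, a fixed standard-error target as the stop rule, and a made-up item bank. The names `rasch_prob`, `estimate_theta`, and `run_cat` are hypothetical.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_theta(responses, difficulties, prior_var=4.0, iterations=25):
    """Estimate the score and its standard error by Newton-Raphson.

    The normal prior (variance prior_var) is an assumption added for this
    sketch; it keeps the estimate finite when all responses are 0 or all 1.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs)) - theta / prior_var
        information = sum(p * (1.0 - p) for p in probs) + 1.0 / prior_var
        # Clamp the step so a single update never moves more than one logit.
        theta += max(-1.0, min(1.0, gradient / information))
    probs = [rasch_prob(theta, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs) + 1.0 / prior_var
    return theta, 1.0 / math.sqrt(information)


def run_cat(bank, answer, se_target=0.6, max_items=20):
    """Administer items until the standard error target or item limit is reached.

    bank is a list of item difficulties; answer(i) returns 0 or 1 for item i.
    """
    administered, responses = [], []
    theta, se = 0.0, float("inf")
    while se > se_target and len(administered) < min(max_items, len(bank)):
        remaining = [i for i in range(len(bank)) if i not in administered]
        # Rasch item information p(1 - p) is largest when difficulty is
        # closest to the current score, so pick the nearest unused item.
        next_item = min(remaining, key=lambda i: abs(bank[i] - theta))
        administered.append(next_item)
        responses.append(answer(next_item))
        theta, se = estimate_theta(responses, [bank[i] for i in administered])
    return theta, se, administered


# Hypothetical bank of 25 items spaced evenly from -3 to +3 logits, and a
# respondent who endorses every item with difficulty below 1.0.
bank = [-3.0 + 0.25 * k for k in range(25)]
theta, se, administered = run_cat(bank, lambda i: 1 if bank[i] < 1.0 else 0)
print(round(theta, 2), round(se, 2), len(administered))
```

The first item administered is the one of middle difficulty, and each later item is chosen to sit as close as possible to the running score.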

6 CAT in Clinical Assessment

7 CAT in Clinical Assessment: Issues
  Triage of individuals to support clinical decision making
  Measurement of multiple clinical dimensions and subdimensions
  Persons with atypical presentation of symptoms
  Generalizability of assessment to various groups

8 Clinical Decision Making
  How severe are the symptoms?
  What type of treatment is most appropriate?
  Can CAT be used to answer these questions more efficiently?

9 Strategy
  Use CAT to place persons into low, moderate and high levels of substance abuse and dependency.
  Starting Rules
    Using screener measures to set the initial measure and select the first item
  Variable Stop Rules
    Tight precision around cut points
    Less precision away from cut points
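
One way a variable stop rule of this kind could work is sketched below. This is an assumption-laden illustration, not the rule used in the study: the cut points, the two standard-error thresholds, and the 95 percent interval are invented values, and `should_stop` and `classify` are hypothetical names.

```python
import bisect


def should_stop(theta, se, cut_points, tight_se=0.3, loose_se=0.6, z=1.96):
    """Decide whether testing can stop under a variable stop rule.

    Stops early when the score is estimated loosely but its interval clears
    every cut point; otherwise keeps testing until the tight target is met.
    """
    if se <= tight_se:
        return True   # Precise enough anywhere on the scale.
    if se > loose_se:
        return False  # Too imprecise to stop anywhere on the scale.
    lower, upper = theta - z * se, theta + z * se
    # Between the two thresholds, stop only if no cut point falls inside
    # the interval, so the classification would not change.
    return not any(lower <= cut <= upper for cut in cut_points)


def classify(theta, cut_points, labels=("low", "moderate", "high")):
    """Place a score into the group bounded by the sorted cut points."""
    return labels[bisect.bisect_right(cut_points, theta)]


cuts = [-0.5, 0.5]  # Hypothetical cut points on the logit scale.
print(should_stop(2.5, 0.5, cuts), classify(2.5, cuts))
print(should_stop(0.4, 0.5, cuts), classify(0.4, cuts))
```

A person far from both cut points is classified after fewer items, while a person near a cut point continues until the tighter precision target is reached.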

10 CAT Standard Error
  Middle range where decisions are made and precision is controlled
  High & Low ranges where there is little impact on clinical decisions and precision is allowed to vary more

11 Results
  CAT to full-measure correlations ranged from .87 to .99
  Agreement between CAT and full-measure classification of persons into treatment groups (kappa coefficients) ranged from .66 to .71
  Screener starting rule improved CAT efficiency by 7 percent
  Variable stop rules improved efficiency by 15-38 percent
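
For readers unfamiliar with the kappa coefficient, the sketch below shows how Cohen's kappa measures agreement between two sets of classifications beyond what chance alone would produce. The data are invented for illustration and are not from the study; `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two classifications."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each method's marginal proportions.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both methods put everyone in the same single group.
    return (observed - expected) / (1.0 - expected)


# Invented example: groups assigned by the CAT and by the full measure.
cat_groups = ["low", "low", "moderate", "moderate", "high", "high"]
full_groups = ["low", "low", "moderate", "high", "high", "high"]
print(cohen_kappa(cat_groups, full_groups))
```

In this made-up example five of six persons are classified identically, chance agreement is one third, and kappa is 0.75.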

12 Measuring Multiple Dimensions

13 Assessment on Multiple Dimensions
  Instruments often measure multiple domains
  In CAT, treating a multi-domain measure as measuring one domain is problematic:
    Some subdimensions may not be adequately measured

14 Strategy: Content Balancing
  Set an item quota for each subscale
    Maximum number of subscale items to administer during the CAT
  An item is selected if:
    Its subscale quota has not been met
    It provides maximum information
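
The quota rule above can be sketched as a small item-selection function. This is an illustrative sketch only, assuming Rasch item information and an invented four-item bank; `item_information` and `select_item` are hypothetical names, and the subscale labels and difficulties are made up.

```python
import math
from collections import Counter


def item_information(theta, difficulty):
    """Rasch item information p(1 - p) at the current score."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def select_item(theta, items, administered, quotas):
    """Pick the most informative unused item whose subscale quota is not met.

    items is a list of (item_id, subscale, difficulty) tuples, administered
    is a set of item ids, and quotas maps subscale to its maximum item count.
    Returns None when no eligible item remains.
    """
    used = Counter(sub for item_id, sub, _ in items if item_id in administered)
    eligible = [
        item for item in items
        if item[0] not in administered and used[item[1]] < quotas.get(item[1], 0)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda item: item_information(theta, item[2]))


# Invented bank: two depression items, one anxiety item, one suicidal item.
items = [
    ("dep1", "depression", 0.0),
    ("dep2", "depression", 0.1),
    ("anx1", "anxiety", 1.5),
    ("hs1", "suicidal", 2.5),
]
quotas = {"depression": 1, "anxiety": 1, "suicidal": 1}
print(select_item(0.0, items, set(), quotas))
print(select_item(0.0, items, {"dep1"}, quotas))
```

Without the quota, the second depression item would be chosen next because it is closest to the current score; with the quota, selection moves on to another subscale.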

15 Content Balancing Procedures
  Method     Screener   Content Balanced
  None       No         No
  Screener   Yes        No
  Mixed      Yes        Yes
  Full       No         Yes

16 Percentage of Items Administered by Subscale
  Columns: IMDS Scale, N Items, None, Screener, Mixed, Full
  Depression 199100 37977100
  Homicidal/Suicidal 121100 388100
  Anxiety 1100 3100
  Trauma 1100 3100

17 Content Balancing: CAT to Full IMDS Correlations
  Columns: IMDS Scales, None, Screener, Mixed, Full
  IMDS 0.98 0.97
  Depression 0.96 0.94 0.96
  Homicidal/Suicidal 0.60 0.83 0.96 0.95
  Anxiety 0.96 0.95 0.96
  Trauma 0.97
  Average r 0.89 0.93 0.97 0.96

18 Identifying Persons with Atypical Presentation of Symptoms

19 Overview
  Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments.
    Statistics that can detect atypical presentation of symptoms have important clinical implications.
  Strategy: Identify fit statistics sensitive to atypical presentation in a CAT context

20 Rasch Fit Statistics
  Fit statistics are used to test particular hypotheses.
  Atypicalness: Used to detect unexpected outlying, off-target responses. Outlier sensitive.
    Example: A person with a high level on the measured trait misses an easy item.
  Randomness: Used to detect unexpected inlying, targeted responses.
  Both infit and outfit are chi-square statistics. An infit or outfit value of 1.0 indicates perfect fit to the Rasch model.
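
The two fit statistics can be computed as mean-squares from Rasch residuals, as in the sketch below. It assumes the usual Rasch definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version) and assumes that atypicalness corresponds to outfit and randomness to infit. The item difficulties and response strings are invented, and `fit_statistics` is a hypothetical name.

```python
import math


def fit_statistics(responses, difficulties, theta):
    """Return (infit, outfit) mean-squares for one person under the Rasch model."""
    squared_residuals, variances = [], []
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: plain average of squared standardized residuals, so one very
    # unexpected response on an off-target item dominates the statistic.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: residuals weighted by item information, so responses to items
    # near the person's level count most.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # Invented items, mild to severe.
expected_pattern = [1, 1, 1, 0, 0]          # Endorses the milder items only.
atypical_pattern = [1, 1, 0, 0, 1]          # Also endorses the most severe item.
print(fit_statistics(expected_pattern, difficulties, theta=0.0))
print(fit_statistics(atypical_pattern, difficulties, theta=0.0))
```

In this invented case the unexpected endorsement of the most severe item raises outfit more than infit, which is the sense in which outfit is outlier sensitive.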

21 Problems with Fit
  Responses by Severity (Low to High)        Randomness   Atypicalness
  111111111000000000                         0.3          0.5
  111101011000100000                         0.6          1.0
  111111010100000000                         1.0          1.0
  111 00001110000 0000                       0.9          1.3
  011 111111100000000                        3.8          1.0
  11111111100000 0001                        3.8          1.0
  10101010101010 1010101010101010            4.0          2.3
  000 00000000011 111112.64.3

22 Clinical Implications of Misfit
  Our analyses indicate that there are subgroups who endorse severe symptoms without endorsement of milder symptoms.
  Examples:
    Atypical suicide
    Substance use withdrawal without dependence

23 Atypicalness by Number of Items
                    Atypicalness Categories
  Number of Items   Uber Typical   Typical   Atypical
  16                30.2           48.1      21.7
  12                34.3           51.1      14.6
  8                 38.4           53.2      8.4
  4                 58.2           40.0      1.8

24 Content Balancing and Atypicalness
  Atypicalness Category   None   Screener   Mixed   Full   Full IMDS
  Proto Typical           26.7   34.6       48.3    50.5   49.2
  Typical                 69.0   58.7       40.8    38.9   38.4
  Atypical                4.3    6.5        10.9    10.6   12.4
  Kappa                   .27    .32        .48     .50    --

25 Future Research
  Identify alternative fit statistics that are more sensitive to atypical presentation of symptoms
  Determine when it is likely that someone may present with atypical symptoms, and if so, select items to confirm atypicalness.

26 Generalizability of CAT to Various Groups

27 Overview
  Persons at the same severity level may differ in their endorsement of specific items.
  This is called differential item functioning (DIF)
  On the GAIN, DIF has been detected by:
    Age (adolescent vs. adult)
    Gender
    Ethnicity/Race
    Drug of choice

28 DIF By GAIN Scale
  Columns: Scale, Total, Age, Gender, Race, Prim. Drug
  Internal Mental Distress 431351026
  Crime & Violence 3111142227
  Behavioral Complexity 3312 1722
  Substance Problems 16859

29 DIF and CAT
  The presence of DIF can limit our ability to generalize measurement findings across different groups.
  Controlling for DIF becomes complicated as the number of DIF items and groups/factors increases.
  Currently exploring a number of methods for controlling DIF in CAT.

30 Potential of CAT in Clinical Practice
  Reduce respondent burden
  Reduce staff resources
  Reduce data fragmentation
  Streamline complex assessment procedures
  Assist in clinical decision making
  Identify persons with atypical profiles
  Improve measurement generalizability

31 Future Research
  How do we put it all together?
  Much of the research in the area of CAT has used computer simulation. There is a need to test working CAT systems in clinical practice.

32 Contact Information
  A copy of this presentation will be at: www.chestnut.org/li/posters
  For more information, please contact Barth Riley at bbriley@chestnut.org
