2017 Views of Climate and Learning (VOCAL) Validity Study
2017 MCAS Questionnaire
April 2018
Massachusetts Department of Elementary and Secondary Education
75 Pleasant Street, Malden, MA 02148-4906
Phone 781-338-3000  TTY: N.E.T. Relay 800-439-2370
www.doe.mass.edu
This document was prepared by the Massachusetts Department of Elementary and Secondary Education
Jeff Wulfson
Acting Commissioner
The Massachusetts Department of Elementary and Secondary Education, an affirmative action employer, is committed to ensuring that all of its programs and facilities are accessible to all members of the public.
We do not discriminate on the basis of age, color, disability, national origin, race, religion, sex, gender identity, or sexual orientation. Inquiries regarding the Department’s compliance with Title IX and other civil rights laws may be directed to the
Human Resources Director, 75 Pleasant St., Malden, MA 02148-4906. Phone: 781-338-6105.
© 2017 Massachusetts Department of Elementary and Secondary Education
Permission is hereby granted to copy any or all parts of this document for non-commercial educational purposes. Please credit the
“Massachusetts Department of Elementary and Secondary Education.”
This document printed on recycled paper
Table of Contents
1. Purpose of this Report
2. Survey Design, Survey Administration, and Profile of Respondents
   2.1. School climate construct
   2.2. Survey design principles
   2.3. Item and measure development
   2.4. Stakeholder engagement
   2.5. Form building
   2.6. Form linking and anchoring process
   2.7. Administration of forms
   2.8. Profile of respondents
3. Data Analyses Procedures
   3.1. Rasch methodology
4. Validity Framework
   4.1. Validity framework
5. Validity Evidence for VOCAL scales and sub-scales
   5.1. Content validity
   5.2. Substantive validity
   5.3. Generalizability
      5.3.1. Reliability evidence
      5.3.2. Differential item functioning (DIF) analyses
   5.4. Structural validity
      5.4.1. Residual analyses
      5.4.2. Sub-scale correlations
   5.5. External validity
      5.5.1. Student-level responsiveness
      5.5.2. School-level responsiveness and score reporting
      5.5.3. Criterion validity
6. Conclusion
7. References
8. Appendices
   A. VOCAL test specification
   B. MCAS student questionnaires (VOCAL forms)
   C. Rasch model and logit unit of measurement
   D. Guide to evaluating Rasch model validity data
   E. Technical quality of 70-item VOCAL scale
   F. Measure order of 70-item VOCAL scale
   G. Item prompts for engagement, safety, and environment domains
   H. Person reliability of VOCAL scale, grade-level scales and sub-scales
   I. Subgroup DIF plots
   J. Residual analyses output
   K. Transformation of logit measures
1. Purpose of this Report
This report offers reliability and validity evidence to support the use of DESE’s Views of
Climate and Learning (VOCAL) school climate survey. DESE sought to develop a school
climate instrument that would (1) differentiate levels of school climate within and between
schools, and (2) provide schools and districts with concrete, actionable information about school
climate in order to support continuous improvement. A positive school environment is associated
with healthier social and emotional well-being, reduced substance abuse, and decreased
behavioral problems of students in school (Thapa, Cohen, Guffey and Higgins-D’Alessandro,
2013), and is positively related to students’ academic success (Berkowitz, Moore, Astor, and
Benbenishty, 2017; Hough, Kalogrides, and Loeb, 2017). This technical report provides
information on the survey development process used to develop three forms of the school climate
survey that measure students’ views in grade 5, grade 8, and grade 10, respectively. The report
makes available the results of the reliability and validity analyses performed to justify the use
and reporting of VOCAL scores to schools and districts.
This report is intended for readers with knowledge of survey development and validation,
psychometrics and educational measurement. Familiarity with Wolfe and Smith’s (2007a,
2007b) and Messick’s (1995) construct validity frameworks for instrument development is
helpful. Evidence from six aspects of survey validity (content, substantive, structural,
generalizability, external, and consequential) combine to support the use of VOCAL scores.
2. Survey Design, Survey Administration and Profile of Respondents
Instrument development relied on a three-pronged strategy: (1) defining the school climate
construct; (2) incorporating stakeholder feedback to support item and instrument development;
and (3) using Rasch theory to inform and guide item development and validity analyses. VOCAL
instrument development and validity activities are summarized in Figure 1.
2.1. School Climate Construct
DESE used the United States Department of Education’s (USED, 2017) conceptual framework
for the school climate construct, with survey items designed to measure student perceptions of
three dimensions of school climate: engagement, safety and environment. Each dimension is
further divided into three domains/topics. For example, the engagement dimension consists of
items measuring cultural and linguistic competence, teacher/adult-on-student relationships and
student-on-student relationships, and participation in school life. The conceptual framework with
construct domain definitions is outlined in Table 1. Items from publicly available school climate
instruments were evaluated for inclusion, and a review of school climate research articles was
conducted to inform new item development. DESE leveraged work done during the
development of its educator evaluation student feedback surveys (SFS), with several SFS items
adapted for potential inclusion in the school climate surveys.
Table 1
VOCAL’s conceptual framework
Dimension | Domain (label) | Definition
Engagement (ENG) | Cultural and Linguistic Competency (CLC) | The extent students feel the school/staff value diversity, manage dynamics of differences, avoid stereotypes and acquire cultural knowledge.
Engagement (ENG) | Relationships (REL) | The extent students feel there is a social connection and respect between staff/teachers and students, and between students and their peers.
Engagement (ENG) | Participation (PAR) | The extent students feel they or their parents are engaged in school events.
Safety (SAF) | Emotional Safety (EMO) | The extent students feel a bond to the school, and students/teachers/adults support the emotional needs of others.
Safety (SAF) | Physical Safety (PSF) | The extent that students feel physically safe within the school environment and know how to respond to threats to themselves or the school.
Safety (SAF) | Bullying/Cyber-bullying (BUL) | The extent that students report different types of bullying behaviors that occur in the school and the extent that school/staff/students try to counteract bullying.
Environment (ENV) | Instructional (INS) | The extent that students feel the instructional environment is engaging, challenging and supportive of learning.
Environment (ENV) | Mental Health (MEN) | The extent that students learn to self-manage their feelings and get support if needed.
Environment (ENV) | Discipline (DIS) | The extent that discipline is fair, applied consistently and evenly, and a shared responsibility.
2.2. Survey Design Principles
It was important that the surveys were designed with the rigor expected of cognitive tests. When
developing measures in the Rasch framework, best test design (Wright & Stone, 1979) involves:
- Items that are evenly spaced from easiest to hardest;
- The average item difficulty (usually set to zero) centered at the mean of the target or student distribution;
- Survey items that are sufficiently dispersed to cover the target distribution;
- Items from different dimensions/domains (topics of school climate) overlapping each other on the item-person continuum; and
- A test of appropriate length to provide the responsiveness required to differentiate performance.
These psychometric criteria were adopted and used to guide the selection of items for the school
climate survey. However, it is important to stress that the stakeholder engagement and feedback
discussed in the next section was the key driver for item selection.
2.3. Item and Measure Development
DESE developed items using a hierarchical perspective. DESE first identified the behaviors,
practices, or systems that create the foundation for a positive school climate; students are more
likely to respond affirmatively to these foundational items. DESE then identified
behaviors, practices, or systems that represent exemplary school climates. These
behaviors/practices/systems, by their nature, are more difficult to enact within schools and
students are likely to have greater difficulty responding affirmatively to items designed to
measure them. Once these behaviors/practices/systems were identified, items were developed or
acquired from publicly available surveys to measure and anchor the two ends of the school climate
continuum. The next step in the item development process was to develop or obtain publicly
available items to fill in the continuum. Therefore, the rating scale (always true to never true)
combined with the hypothesized distribution of item difficulties is designed to stretch the item
calibrations and person distribution along the school climate continuum for each dimension and
provide meaningful differentiation of student perceptions. This process was also used to develop
items for each domain and helped ensure that item and measure development conformed to best
practice.
Items for the grade 5 form were simplified to ensure students could read and understand the
content. For example, the item, “Adults working at this school treat all students respectfully,
regardless of students’ race, culture, family background, sex, or sexual orientation” was
administered in grade 8 and grade 10; the corresponding item in grade 5 was, "Adults working at
this school treat all students respectfully". Items were also developed for the specific school
climate context. For example, the item, “At our school, a teacher or other adult is available for
students if they need help because of sexual assault or dating violence” was only administered on
the grade 10 survey. Similarly, items related to cyber-bullying were placed on the grade 8 form
to account for the predominance of this type of bullying in middle-school grades. Once items
were selected or developed, they were reviewed by different stakeholder groups.
2.4. Stakeholder Engagement
Multiple stakeholder groups (agency experts, student advisory councils, principal and teacher
advisory councils, and special interest groups) met to review items. The item review process also
prompted new item development. Three to four times the number of items needed for the final
surveys were developed or selected, and students and other stakeholders were asked to rate them.
The item review process was designed to ensure item representativeness (did each item measure
the concept it was designed to measure?), accessibility (would students understand it?),
actionability (would schools be able to use the information?), and responsiveness (would the
items measure a continuum of student perceptions that differentiate strong school climates from
relatively weak ones?). Stakeholders worked in groups to review, revise, and reject items.
To further ensure items placed on the grade 5 form met these inclusion criteria, cognitive
interviews were undertaken with a small, but diverse group of fifth-graders. The purpose of these
interviews was to elicit and probe whether students understood the item content in accordance
with the item developer’s intent. Participants in the cognitive interviews reported that most of the
items were easy to understand. The interviews, however, did result in DESE simplifying the
content and readability of some items. Through a deliberative process, the items that survived the
review process were placed on the three forms of the school climate survey; each grade-level
form was designed to meet the best survey design criteria highlighted previously.
2.5. Form Building
DESE administered three parallel forms of the VOCAL survey in the spring of 2017; the number
of items on each form was:
- 24 items for grade 5 students,
- 34 items for grade 8 students, and
- 29 items for grade 10 students.
Each survey measured the breadth of the school climate construct and included common items
that were used to place all student responses onto the same scale metric; common items
represented over 30% of the total number of items on each form. The number and types of items
on each form are shown in Figure 2, with a detailed “test” specification found in Appendix A.
Figure 2. Form building for VOCAL surveys
This methodology allowed DESE to try out as many items as possible without over-burdening
respondents; seventy-one items were tested in total. However, post-administration, one grade 8
item was removed from consideration and not included in the validity analyses. A further
criterion for common-item selection was that the mean and standard deviation of the common items
approximate the mean and standard deviation of the whole set of items used. Common items
should represent the breadth of the school climate construct and approximate the average item
difficulty and variance of all 70 items (Engelhard, 2013). The 70 items had an average item
difficulty of 0.00 logits and a standard deviation of 0.75 logits; the corresponding average item
difficulty for the 9 common items was 0.07, with a standard deviation of 0.99.
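The common-item selection criterion described above can be checked directly. The sketch below compares the mean and standard deviation of a set of common-item difficulties against those of the full item pool; the tolerance values are assumptions for illustration (the report does not state formal cut-offs):

```python
from statistics import mean, pstdev

def summarize(difficulties):
    """Mean and SD (in logits) of a set of item difficulty estimates."""
    return mean(difficulties), pstdev(difficulties)

def common_items_representative(all_items, common_items,
                                mean_tol=0.25, sd_ratio_tol=0.5):
    """Check that the common items approximate the mean and spread of
    the full item pool (tolerances are illustrative, not DESE's)."""
    m_all, s_all = summarize(all_items)
    m_com, s_com = summarize(common_items)
    return (abs(m_com - m_all) <= mean_tol
            and abs(s_com - s_all) / s_all <= sd_ratio_tol)
```

For the VOCAL data, the full pool was centered at 0.00 logits (SD 0.75) and the nine common items at 0.07 (SD 0.99), which would pass any reasonable tolerance of this kind.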
A Likert scale with four response options was used to rate students’ perceptions of school
climate; coding for all items dictated that a response of “0” (untrue) indicated the lowest level of
school climate, with a “3” (always true) denoting the most positive school climate. Response
scoring categories “1” and “2” corresponded to mostly untrue and mostly true, respectively.
Note, seven bullying-behavior and three physical safety items were reverse-scored. The three
forms as they appeared for students are provided in Appendix B.
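The scoring scheme, including reverse scoring of the negatively worded items, can be sketched as follows. The item codes in REVERSE_SCORED are hypothetical placeholders, not the actual VOCAL item identifiers:

```python
# Sketch of VOCAL response scoring. Raw categories: 0 = untrue,
# 1 = mostly untrue, 2 = mostly true, 3 = always true. Negatively
# worded items (seven bullying-behavior and three physical safety
# items) are reverse-scored so that a higher score always indicates
# a more positive school climate.

MAX_SCORE = 3
REVERSE_SCORED = {"BUL_1", "BUL_2", "PSF_1"}  # placeholder codes

def score_response(item_code: str, raw: int) -> int:
    """Return the scored response, reversing negatively worded items."""
    if raw not in (0, 1, 2, 3):
        raise ValueError(f"invalid raw response: {raw}")
    return MAX_SCORE - raw if item_code in REVERSE_SCORED else raw
```

For example, a student answering "always true" (3) to a bullying-behavior item receives a scored value of 0, the lowest climate rating.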
2.6. Form Linking and Anchoring Process
Each grade form was first calibrated separately to assess the invariance of the common items.
The Pearson product-moment correlation (henceforth Pearson) of the common item difficulties
was 0.9 or above for each paired comparison. Common item invariance allowed DESE to
concurrently calibrate all 70 items onto the same scale metric. Figure 3 illustrates this process.
Figure 3. Concurrent calibration process of grade 5, grade 8, and grade 10 surveys
[Schematic omitted: each grade form (G5, G8, G10) contains its unique items plus the common items that link the forms, allowing all items to be calibrated onto one scale. Figure template taken from Linacre (2017).]
Figure 4 graphically shows the relationship between the seven items common across the three
grade forms. The item difficulties of the two additional items linking the grade 8 and grade 10
forms were also highly correlated (data not shown).
Figure 4. Relationship of common items across grade forms
Anchoring Process. The seven items common to each form were simultaneously calibrated,
along with the 2 additional items common to the grade 8 and grade 10 forms. To reduce
positioning effects, common items were placed in the same fixed position on each of the three
forms. Once the common items were placed in their item slots, the remaining unique items were
randomly assigned positions on each form. This process allowed placement of all 70 pilot items
onto the same scale metric. The ensuing validity analyses (and review of the item-variable map
for the relative difficulty, ordering and spacing of items), revealed that 55 of the 70 items were
well-fitting and could be anchored. Anchoring of polytomous items requires fixing the items’
difficulty parameters, and fixing the item Andrich step threshold parameters of the rating scale
(Linacre, 2017). This anchoring process was applied to the 55 items of the VOCAL scale and
helps ensure the comparability of VOCAL scores when reported out to schools and districts.
2.7. Administration of Forms
In grades 5 and 8, the forms were administered as part of the Massachusetts Comprehensive
Assessment System (MCAS) Science achievement test. In grade 10, the form was attached to
the mathematics MCAS test. The forms were paper-based and attached to the end of the last test
session of the science or mathematics assessments, respectively. Students marked their responses
in their student answer booklets.
2.8. Profile of Respondents
The sampling frame included students in grades 5, 8 and 10. Students who participated in
MCAS-Alternative were not included in the sampling frame, so a full census was not feasible. In
addition, participation in the survey was optional for districts, schools and students. As a result,
73% of fifth graders, 70% of eighth graders, and 64% of tenth graders participated in the
surveys. Eighty-nine percent of districts and fifty-six percent of schools in
Massachusetts received VOCAL scores. When comparing the sample of students to
the state student profile, the profile of the sample is reasonably representative of the state at each
grade level. The sample profile with state comparison is provided in Table 2. Students with
disabilities (SWD), English Language Learners (ELL), and economically disadvantaged students
are under-represented in grade 8 and grade 10; black and Hispanic students are also under-
represented in grade 10.
Table 2
Participating students’ profile
Subgroup (% of students) Grade 5 Grade 8 Grade 10 State
Female 49.3 49.7 50.2 48.7
Male 50.7 50.3 49.7 51.3
Asian 6.4 6.0 8.8 6.7
Black 9.0 8.4 6.2 8.9
Hispanic 20.7 17.7 16.2 19.4
Mixed-race 3.4 2.8 2.7 3.4
Native American 0.2 0.2 0.2 0.2
Pacific Islander 0.1 0.1 0.1 0.1
White 60.2 64.7 65.7 61.3
Students with disabilities 17.8 15.8 14.8 17.4
English Language Learners 8.3 6.5 4.9 9.5
Economically disadvantaged 30.8 26.1 25.0 30.2
3. Data Analyses Procedures
3.1. Rasch Methodology
Analyses using the Rasch measurement model (Rasch, 1960) and validity framework (Wolfe &
Smith, 2007a, 2007b) are the primary source of reliability and validity data for the VOCAL
survey measures. The Rasch model, which uses an exponential transformation to place ordinal
Likert responses onto an equal-interval logit scale, was used to analyze student responses.
Winsteps software developed by Linacre (2017) was used to perform Rating Scale model
analyses of the data (Andrich, 1978a, 1978b). Technical details explaining the Rasch model are
provided in Appendix C. In the Rasch framework, the scale metric has the desirable
structural properties of a Rasch measure: it is linear, unidimensional (measures only one
construct), and hierarchical (items are ordered according to how difficult they are to endorse),
and it measures a continuum of items and persons. The evaluation criteria used to perform a Rasch-based
reliability and validity assessment for each construct validity aspect (content, substantive,
generalizability, structural and external) are summarized in the next section.
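For readers who want the mechanics without consulting Appendix C: the Andrich Rating Scale Model expresses the probability of each score category as a function of the person measure, the item difficulty, and a shared set of step thresholds. A minimal sketch, with the threshold values in the usage line taken from the rating scale structure reported in Figure 5:

```python
from math import exp

def category_probs(theta, delta, thresholds):
    """Andrich Rating Scale Model: P(X = k) for k = 0..K, for a person
    with measure theta (logits) on an item with difficulty delta.
    thresholds holds the shared Andrich step thresholds tau_1..tau_K."""
    numerators = [1.0]  # empty sum in the exponent for category 0
    s = 0.0
    for tau in thresholds:
        s += theta - delta - tau
        numerators.append(exp(s))
    total = sum(numerators)
    return [n / total for n in numerators]

# Usage with the Andrich thresholds reported in Figure 5:
probs = category_probs(theta=0.5, delta=0.0,
                       thresholds=[-1.32, -0.44, 1.76])
```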
4. Validity Framework and Validity Evidence
4.1. Validity Framework
Messick’s (1980, 1995) unified concept of construct validity guided the validity analyses for the
school climate construct. Messick (1995, p. 741) defines validity as “an evaluative judgment of
the degree to which empirical evidence and theoretical rationales support the adequacy and
appropriateness of interpretations and actions on the basis of test scores or other modes of
assessment.” Evidence from six aspects of test validity (content, substantive, generalizability,
structural, external and consequential) combine to provide test developers with the justification
to claim that the meaning or interpretability of the test scores is trustworthy and appropriate for
the test’s intended use. More recently, Wolfe and Smith (2007a, 2007b, p. 205) used Messick’s
validity conceptualization to detail instrument development activities and evidence that are
needed to support the use of scores from instruments based on the Rasch measurement
framework. Table 3 outlines the specific validity aspects addressed in this technical report. This
report primarily focuses on internal validity with more limited external validity evidence
provided for the school climate construct. Section 5 explains each aspect of construct validity
outlined in Table 3.
Table 3
Rasch-Based Instrument Validity Framework and Evidence Collected for VOCAL survey1

Validity Aspect | Evidence
Content | Instrument Purpose; Test Specification; Expert Reviews; Item Technical Quality
Substantive | Rating Scale Functioning; Item Difficulty Hierarchy
Generalizability | Differential Item Functioning; Person Separation Reliability; Item Invariance
Structural | Rasch Dimensionality Analyses; Sub-scale Correlations
External | Responsiveness; Relationship between VOCAL scaled scores and scores from similar/dissimilar constructs
Consequential2 | Standard Setting; Score Use

1 Based on Messick's (1995) and Wolfe and Smith's (2007b) conceptualization and representation.
2 Standard setting was not a focus of this pilot study.
5. Validity Evidence for VOCAL scale and sub-scales
The majority of this report is dedicated to the validity evidence needed to support VOCAL score
use. DESE will present data for five aspects of construct validity: content, substantive,
generalizability, structural and external. Consequential validity was beyond the scope of this
pilot administration. Appendix D provides a guide to the validity criteria used in this study for
each aspect of construct validity.
5.1. Content Validity
Content validity examines the “content relevance, representativeness and technical quality”
(Messick, 1995, p.745) of the items used as indicators of the construct. Stakeholder engagement
activities (Figure 1) ensured that the items were relevant and representative and, more
importantly, had the potential to provide schools with diagnostic and actionable information. The
content validity evidence reported here predominantly focuses on the technical quality of the
VOCAL survey items. Item technical quality was assessed using point-to-measure (PTM)
correlations and item fit statistics. The PTM correlations and item fit statistics are shown in
Appendix E. Fit statistics above 1.5 indicate that the items may not measure the construct of
interest; these items have additional source(s) of variance and can degrade measurement
(Appendix D). Twelve of the seventy pilot items had outfit mean square error (MNSQE) statistics
greater than 1.5. Only five of the twelve misfitting items, however, had PTMs below 0.3, which
suggests these five items are poorly related to the school climate construct.
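The screening rule just described (outfit MNSQE above 1.5, and, among those items, PTM below 0.3) can be expressed as a simple filter. The item tuples in the usage example are illustrative, not actual VOCAL statistics:

```python
def screen_items(items, outfit_cut=1.5, ptm_cut=0.3):
    """Flag items with outfit MNSQE above outfit_cut; among those,
    identify items whose PTM correlation falls below ptm_cut.
    Each element of items is a tuple (item_code, outfit_mnsqe, ptm)."""
    misfitting, poorly_related = [], []
    for code, outfit, ptm in items:
        if outfit > outfit_cut:
            misfitting.append(code)
            if ptm < ptm_cut:
                poorly_related.append(code)
    return misfitting, poorly_related

# Illustrative data: item "C" both misfits and is poorly related.
flagged = screen_items([("A", 1.2, 0.5), ("B", 1.8, 0.4), ("C", 1.9, 0.2)])
```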
In terms of content, nine of the twelve misfitting items were from the safety dimension and seven
of the nine required reverse scoring (six of these, in turn, were related to bullying behaviors).
The bullying-behaviors items were also structured differently from other items within the
surveys, which may have contributed to their separation from the primary dimension (e.g., I have
been threatened by other students more than once on social media.). Conceptually, however, the
presence or absence of bullying-behaviors is integral to accurately measuring students’
perceptions of safety within the school environment and these items were retained for further
validity analyses. The remaining fifty-eight items fit the model well, with outfit MNSQE ranging
from 0.72–1.49 and PTMs ranging from 0.32 to 0.64. Fifty-five of the fifty-eight items were used
to anchor the scale.
5.2. Substantive Validity
Substantive validity assesses whether the responses to the items are consistent with the
theoretical framework used to develop the items. Two pieces of evidence support the substantive
validity aspect of construct validity: these are (1) rating scale use (Figure 5) and (2) item
difficulty hierarchy (Figure 6). For each threshold of the rating scale, the mean square error fit
statistics should be between 0.7 and 1.3 and, on a four-point scale, the distance between
thresholds should be at least 0.8 logits (Appendix D; Wolfe & Smith, 2007b). The rating scale
for the 70-item VOCAL functioned relatively well, with adjacent category thresholds near or
greater than 1.0 logit apart. Except for the little-used score category of zero (never true), the
category threshold fit statistics have excellent MNSQE (Figure 5).
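A minimal sketch of these two rating-scale checks, using the Andrich thresholds reported in Figure 5 and illustrative category fit values:

```python
def rating_scale_checks(thresholds, threshold_fit,
                        min_gap=0.8, fit_range=(0.7, 1.3)):
    """Return (gaps_ok, fit_ok): adjacent Andrich thresholds at least
    min_gap logits apart, and threshold fit MNSQE within fit_range."""
    gaps_ok = all(b - a >= min_gap
                  for a, b in zip(thresholds, thresholds[1:]))
    fit_ok = all(fit_range[0] <= f <= fit_range[1] for f in threshold_fit)
    return gaps_ok, fit_ok

# VOCAL thresholds from Figure 5; the fit values are illustrative.
gaps_ok, fit_ok = rating_scale_checks([-1.32, -0.44, 1.76],
                                      [1.02, 0.95, 1.08])
```

With the Figure 5 thresholds, the adjacent gaps are 0.88 and 2.20 logits, both above the 0.8-logit criterion.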
A qualitative assessment of how well the item hierarchy corresponds to the instrument
developer’s a priori theoretical expectations provides substantive validity evidence. The overall
item hierarchy across the scale met our a priori expectations in terms of relative difficulty of
each dimension (Figure 6) and in terms of individual items within each dimension. The ordered
pattern of item difficulties also conforms to best test design principles.
Figure 6 shows that items of each dimension span the breadth of the continuum, ranging from
items measuring relatively weak school climates to items measuring relatively strong school climates. In addition, items
from each dimension overlap as you move from low to high on the continuum. The ordered
pattern of relative item difficulty indicates that safety items (particularly the reverse scored
bullying-behavior and physical safety items) were very easy to endorse compared to items from
the other dimensions. Students feel relatively safe in school, with a comparatively low level of
bullying reported. Safety is a foundation of a positive school environment and it was expected that
these items would be among the easiest to endorse. Similarly, within the Engagement dimension,
items related to student-on-student relationships were, as expected, harder for students to affirm
than items related to teacher-on-student relationships. This is consistent with past research
(Thomas, 2004; Peoples, O’Dwyer, Wang, Brown, & Rosca, 2014).
Figure 5. Rating scale structure for VOCAL instrument
Summary of category structure (Rating Scale model):

Category (label) | Count (%) | Obs. avg (expected) | Infit MNSQ | Outfit MNSQ | Andrich threshold | Category measure
0 (untrue) | 238,047 (6%) | -0.41 (-0.72) | 1.42 | 1.91 | none | (-2.65)
1 (mostly untrue) | 575,328 (14%) | 0.20 (0.17) | 1.02 | 1.07 | -1.32 | -0.94
2 (mostly true) | 1,588,387 (39%) | 1.09 (1.24) | 0.95 | 0.85 | -0.44 | 0.77
3 (always true) | 1,690,182 (41%) | 2.29 (2.20) | 1.08 | 1.05 | 1.76 | (2.94)
Missing | 6,325,736 (61%) | 1.51 | | | |
[Category probability curves omitted: probability of each response category (0 to 3) plotted against the person-minus-item measure from -3 to +3 logits; each category is modal over a distinct region of the continuum, with adjacent curves crossing near the Andrich thresholds.]
Items belonging to the three domains of the Environment dimension were relatively harder to
endorse, especially items related to student autonomy and students taking responsibility for their
actions. This ordering of environment items was expected; past classroom environment research
has shown that student autonomy is hard to engender within classrooms and schools, but
important to engaging students (Hafen et al., 2012; Peoples, Abbott, & Flanagan, 2015a, 2015b).
Table 4 below is a specific example of the item hierarchy from the discipline environment
domain. Foundational to a positive discipline environment is the perception that school staff are
fair, supportive and consistent in applying school rules related to discipline. Items such as,
“School rules are fair for all students” that measure this basic environment were, as expected,
easier for students to endorse than items that give students a voice in school rules
(e.g., “Students have a voice in deciding school rules”) or hold them responsible for their
actions (e.g., “Teachers give students a chance to explain their behavior when they do something
wrong”). This ordered pattern of discipline item difficulties confirms the developers’ a priori
expectations, thereby supporting the domain’s substantive validity.
Table 4.
Item hierarchy of Discipline Environment items

Item code | Grade administered | Item prompt | Item difficulty (logits)
DIS_1 | 5, 8, 10 | Students have a voice in deciding school rules | 2.20
DIS_7 | 10 | Teachers give students a chance to explain their behavior when they do something wrong | 1.38
DIS_5 | 8 | In school, students learn how to control their behavior. | 0.80
DIS_6 | 10 | The consequences for inappropriate behavior are enforced fairly | 0.46
DIS_2 | 5 | School rules are fair for all students | 0.23
DIS_4 | 8 | School staff are consistent when enforcing rules in school | -0.23
DIS_3 | 5 | Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well. | -0.67
Appendix F provides the hierarchy for all 70 items; item prompts broken out by dimension are
provided in Appendix G. The well-functioning rating scale combined with the theoretically
grounded 70-item item hierarchy provides the evidence needed to support the substantive
validity aspect of the school climate construct.
5.3. Generalizability
A measure is considered generalizable when the score meaning and properties function similarly
across multiple contexts (e.g., stakeholder groups, forms) or time points. Reliability analyses and
differential item functioning (DIF) analyses are used to assess the generalizability of the
measures. Like Cronbach’s alpha, person separation reliability (PSR) assesses the stability
(internal consistency) of the measures across forms and scoring structures (Schumacker and
Smith, 2007). Reliability indices represent the ratio of true variance to observed variance; in the
Rasch model, the internal consistency reliability coefficient, the person separation reliability, is
the ratio of the variance in the latent person measures to the variance in the estimated person
measures.
Standard errors are estimated for each person and each item and are used to provide an estimate
of error variance (Schumacker and Smith, 2007). DESE used DIF analyses to empirically test for
item invariance across several subgroups; item invariance ensures comparability of score
interpretation.
5.3.1. Reliability evidence: The mean difficulty of the 70-item scale was +1.33 logits with a
standard deviation of 1.16 logits (Appendix H). The items are reasonably well targeted to the
student distribution, resulting in a real person separation reliability (PSR) of 0.88 and a person
separation index of 2.7 (Figure 6; Appendix H). Best test design principles (Wright, 1979) call
for alignment of the mean of the item distribution to the mean of the person distribution. Notable
in Figure 6 is the relative rarity of bullying behaviors when compared to other indicators
assessed; these off-target items likely contributed to the misalignment of the person and item
distributions. However, theoretically, bullying is a critical facet in determining students’
perceptions of the safety and supportive nature of schools; these items were retained for
reporting out scores. The real person separation reliabilities ranged from 0.86 for the 24-item
grade 5 form and the 29-item grade 10 form to 0.88 for the 33-item grade 8 form (Table 5). The
replication of reliabilities across forms provides evidence for the reproducibility and stability of
the school climate construct. Reliabilities above 0.8 are acceptable for the current use of the
surveys (Appendix D), namely to provide schools and districts with formative data to use for
continuous improvement. New items will be tried out in the 2018 VOCAL administration with
the goal of improving the reliability of each grade-level survey.
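The separation index and reliability quoted above are linked by a standard Rasch relation (cf. Schumacker and Smith, 2007); a minimal sketch:

```python
def psr_from_separation(g: float) -> float:
    """Person separation reliability (PSR) from the separation index G,
    using the standard Rasch relation PSR = G**2 / (1 + G**2)."""
    return g * g / (1.0 + g * g)


def separation_from_psr(psr: float) -> float:
    """Inverse relation: G = sqrt(PSR / (1 - PSR))."""
    return (psr / (1.0 - psr)) ** 0.5


# The report's person separation index of 2.7 is consistent with its PSR of 0.88.
print(round(psr_from_separation(2.7), 2))  # 0.88
```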
The bottom of Table 5 shows the reliability of each sub-scale across the three grades. The real
person separation reliability of the engagement, safety, and environment scores was 0.69, 0.68,
and 0.76, respectively. These reliabilities are likely attenuated due to the design of the test forms
(Schwartz, Ayers, and Wilson, 2017). Students across the three grades responded to only a small
sub-set of items for each dimension (the common items); students in each grade otherwise
responded to a set of unique items, creating a large amount of “missing data” when the three
grades’ data were combined to assess the reliability of each dimension. As a result, the true
reliabilities of the dimension scores are underestimated (Schwartz, Ayers, and Wilson, 2017).
School-level reliability. The unit of interest for school climate is not the student, but the
school. In reporting out school climate scores to schools, it is important to ensure that schools
receive reliable data. For a school to receive a report, 10 or more students had to participate in
the survey, and the reliability of each index score had to be at least 0.7. Figure 7 shows
the distribution of the overall school climate index reliabilities across schools within the sample.
The average reliability of the 1,365 schools in the sample was 0.78 and ranged from 0 to 0.96.
Most schools with reliabilities below 0.7 had only one or very few students respond to the
surveys. Because some schools are kindergarten (K) through grade eight (G8) schools or K–12
schools and contain multiple survey grades, there were over 1,600 potential reports. However,
given DESE’s minimum reporting criteria, 1,345 schools received reports; of these, only 545
were provided a full complement of index scores (an overall school climate index score, an
engagement index score, a safety index score, and an environment index score).
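DESE’s minimum reporting rule can be expressed as a simple predicate; a sketch (function and variable names are illustrative, not DESE’s):

```python
MIN_STUDENTS = 10
MIN_RELIABILITY = 0.7


def reportable(n_students: int, index_reliability: float) -> bool:
    """Apply the report's minimum reporting criteria: at least 10
    respondents and an index-score reliability of at least 0.7."""
    return n_students >= MIN_STUDENTS and index_reliability >= MIN_RELIABILITY


print(reportable(11, 0.83))  # True
print(reportable(8, 0.95))   # False (too few respondents)
```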
Figure 7. Distribution of overall school-level school climate score reliabilities
5.3.2. Differential Item Functioning (DIF) Analyses: To support the claim that the school
climate instrument is generalizable, the items should have the same meaning for different
subgroups of respondents (e.g., gender, ELL status); that is, respondents at the same ability
(endorsement) level should have the same probability of affirming an item irrespective of the
subgroup they belong to. The item deltas did not differ significantly (over 90% of items differed
by less than 0.3 logits) across the following subgroups: gender, homelessness, and economically
disadvantaged status. Two items exhibited DIF when comparing students with disabilities to
students without disabilities (CLC4 and PSF5, both administered in grade 10).
DIF was most pronounced for English language learners, with seven items showing DIF of
greater than 1 logit. Six of these seven items (BUL10, BUL11, EMO6, PSF5, DIS7, INS70)
were on the grade 10 form. A further five items exhibited mild to moderate DIF (0.5–0.67 logits)
between ELL and non-ELL students; these items (PAR1, PSF2, BUL7, DIS1, CLC4) cut across
grades. DESE’s surveys were not translated for English learners, so the DIF evident most likely
resulted from language difficulties in reading the items administered. Six of the seven items that
displayed severe DIF across ELL groups also exhibited DIF across some race groups (BUL6,
BUL7, BUL11, PSF2, CLC4, EMO6); only two of these items, BUL6 and BUL11, displayed
severe DIF. The remaining items exhibited mild to moderate DIF. Language barriers likely
explain the DIF present across certain race/ethnicity subgroups, with students unable to properly
access the survey content. Figure 8 and Figure 9 show DIF plots for gender and race/ethnicity,
respectively. DIF plots for the remaining subgroup comparisons are found in Appendix I. Note
that when estimating ELL subgroup school climate scores, the six items with moderate to severe
DIF were not included in the calibration. Similarly, when estimating race subgroup school climate
Figure 8. Differential item functioning plot by gender
[Plot: DIF measure (logits), -2.5 to +2.5, on the vertical axis against the 70 items, comparing female and male respondents.]
Figure 9. Differential item functioning plot by race/ethnicity
[Plot: DIF measure (logits), -2.5 to +2.5, on the vertical axis against the 70 items, comparing Asian, African American, Hispanic, and White respondents.]
scores, the items with moderate to severe DIF were removed from the calibration. Because no
DIF was apparent in the other subgroup comparisons (gender, economically disadvantaged,
homelessness, students with disabilities), these items were retained when reporting out those
subgroup scores. DIF items will be revised or removed for the 2018 VOCAL surveys.
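The DIF screen described above can be expressed as a simple classification of subgroup difficulty contrasts; a sketch using the thresholds cited in the text (0.5 and 1 logit), with the category labels assumed:

```python
def dif_contrast(delta_group_a: float, delta_group_b: float) -> float:
    """DIF contrast: the difference in an item's difficulty (logits) when
    it is calibrated separately for two subgroups."""
    return delta_group_a - delta_group_b


def classify_dif(contrast: float) -> str:
    """Bucket a DIF contrast using the thresholds cited in the report:
    below 0.5 logits negligible, 0.5-1 mild/moderate, above 1 severe."""
    size = abs(contrast)
    if size < 0.5:
        return "negligible"
    if size <= 1.0:
        return "mild/moderate"
    return "severe"


print(classify_dif(dif_contrast(1.38, 1.18)))  # negligible (0.20 logits)
print(classify_dif(0.67))                      # mild/moderate
print(classify_dif(-1.2))                      # severe
```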
5.4. Structural Validity
Structural validity evaluates the alignment of the scoring structure with the hypothesized
structure of the construct. The fundamental assumption of the Rasch model is that it measures
one latent construct (in this study, the school climate construct). If the data meet this
requirement, the measures are linear, invariant, and additive; equal differences on the scale
translate into equal differences in the log-odds of endorsing an item, no matter where on the
scale an item is located. In this validity study, the unidimensionality of the data was assessed by
conducting (1) an analysis of the standardized residuals, (2) correlational analyses of the freely
calibrated dimensions, and (3) an assessment of additional dimensionality data provided by the
Rasch Winsteps software (Linacre, 2017).
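For reference, the rating scale form of the Rasch model used here (Andrich, 1978a, 1978b; Linacre, 2017) can be written as follows; the notation is the conventional one rather than anything reproduced from this report:

```latex
% Log-odds of person n choosing category k over category k-1 of item i:
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \delta_i - \tau_k
```

Here θ_n is the person’s endorsement level, δ_i the item difficulty, and τ_k the threshold between adjacent rating categories, all in logits; equal logit differences therefore correspond to equal log-odds differences anywhere on the scale.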
5.4.1. Residuals Analyses. If the data fit the model and the variance in responses is explained
by one latent trait (the school climate construct), the unexplained (residual) variance should be
random (i.e., there should be no relationship among the residuals). A principal component
analysis of the residuals (Smith, 2002), using Linacre’s criteria (2017; Appendix D) for
unidimensionality, found that the variance explained by the 70-item measure was 40.3%, with no
substantial second dimension evident (Table 6). The first contrast’s residual variance was less
than 5% of the total item variance. The variance explained by the items of the first dimension
(the school climate construct) was 8 times the variance explained by the first contrast (residual),
twice Linacre’s (2017) recommended criterion of 4 times. Two of the three items in the first
contrast were related to social media and cyber-bullying (BUL5 and BUL7); the loadings on the
first contrast were 0.5 or less. Cyber-bullying is a central facet of feeling safe in school, making
it important to measure, and the items were retained. The residual analyses results are shown in
Table 6 and Appendix J. These results were replicated when each grade-level form was analyzed
separately (data not shown).
Table 6
Residual analyses of VOCAL data (grades 5, 8, and 10 combined)

Variance component | Eigenvalue | Observed (%)
Total raw variance in observations | 117.2 | 100.0
Raw variance explained by measures | 47.2 | 40.3
Raw variance explained by persons | 26.5 | 22.6
Raw variance explained by items | 20.7 | 17.7
Unexplained variance in 1st contrast | 2.6 | 2.2
Item variance to 1st contrast multiple | 8x
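Two of the figures in Table 6 follow directly from the eigenvalues; a quick arithmetic check:

```python
total_variance = 117.2    # total raw variance in observations (eigenvalue units)
measure_variance = 47.2   # raw variance explained by the Rasch measures
item_variance = 20.7      # raw variance explained by the items
first_contrast = 2.6      # unexplained variance in the first contrast

# Percentage of total variance explained by the measures (reported as 40.3%).
print(round(100 * measure_variance / total_variance, 1))  # 40.3

# Item variance relative to the first contrast (reported as ~8x; Linacre's
# recommended minimum is 4x).
print(round(item_variance / first_contrast))  # 8
```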
5.4.2. Sub-scale correlations. DESE evaluated the Pearson correlation between subscale scores
for the three freely calibrated dimensions of school climate (engagement, safety, and
environment). The correlations should be positive and of sufficient magnitude (greater than 0.5
but less than 0.9) to indicate that the three sub-scales are measuring distinct but related
dimensions of the school climate construct.
27
Table 7 illustrates that subscale correlations range from 0.66 (safety and environment) to 0.74
(engagement and environment). This magnitude and pattern of correlations was also evident
when examined for each grade separately (data not shown). The lowest correlation (0.58) was
between safety and environment scores in grade 10, with the highest correlation (0.76) between
engagement and safety scores in grade 8. The overarching school climate construct explains the
moderate-to-strong relationships among the three dimensions highlighted in Table 7. After
accounting for measurement error, the sub-scale correlations are close to one.
Table 7
Pearson correlations between the three dimensions of the school climate construct1

Scale | Overall (N = 148,824) | Engagement (N = 148,338) | Safety (N = 148,380) | Environment (N = 148,364)
Overall | 1 | --- | --- | ---
Engagement | 0.89 | 1 | 0.99 | 0.99
Safety | 0.87 | 0.70 | 1 | 0.89
Environment | 0.91 | 0.74 | 0.66 | 1

1Observed Pearson correlations are shown below the diagonal; disattenuated correlations are shown above the diagonal.
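The disattenuated correlations above the diagonal of Table 7 follow the classical correction for attenuation. A minimal sketch; the inputs below are illustrative, not the reliabilities DESE used:

```python
def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: divide the observed correlation
    by the geometric mean of the two scales' reliabilities."""
    return r_xy / (rel_x * rel_y) ** 0.5


# Illustrative only: an observed correlation of 0.6 between two sub-scales
# with reliabilities of 0.9 and 0.8 disattenuates to ~0.71.
print(round(disattenuate(0.6, 0.9, 0.8), 2))  # 0.71
```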
Overall, the evidence from the residual analyses and the subscale correlational analyses supports
the structural validity aspect of the school climate construct. The dimension extracted by the
Rasch model meets the model’s unidimensionality assumption, thereby supporting the use of
scores for the intended purpose. The signal-to-noise ratio of the subscales and, more importantly,
the theoretical conceptual framework support the reporting of subscale scores. Replication of the
results of the residual analyses and subscale correlations across the three grade-level forms
provides further evidence supporting the internal structure of the school climate construct: it is
made up of three dimensions (engagement, safety, and environment) whose relationships to each
other are explained by the overarching school climate construct.
5.5. External Validity
This aspect of construct validity relates to the responsiveness of an instrument and the
relationship of its scores to the scores of external measures (criterion validity). The
responsiveness of an instrument refers to “the degree to which an instrument is capable of
detecting changes in person measures following an intervention that is assumed to impact the
target construct” (Wolfe & Smith, 2007b, p. 222). If an instrument is responsive, it can be
applied appropriately to measure expected group differences or individual/group change. The
first section (5.5.1) examines the instrument’s responsiveness at the student level; the second
section (5.5.2) assesses responsiveness at the school level and its impact on reportable scores.
Criterion validity is the strongest form of external validity; it determines how well scores from
an instrument predict scores on a criterion measure (e.g., how well school climate scores predict
achievement). There are two forms of criterion validity, namely concurrent and predictive. This
section reports data to support the concurrent criterion validity of the VOCAL survey scores.
Because the unit of interest is the school, the external validity analyses focus on examining the
relationship between school-level aggregate school climate scores and school-level aggregate
scores on the following criteria: student achievement, attendance, chronic absence, discipline
rates, suspension rates, and retention rates. Concurrent criterion validity is discussed in section
5.5.3.
5.5.1. Student-level Responsiveness. The responsiveness of an instrument is gauged by the
person strata index, H, the number of statistically distinct ability (endorsement) groups whose
score-distribution centers are separated by at least three standard errors of measurement within
the sample. Applying the formula H = (4G + 1)/3 (Wright and Masters, 2002, p. 888) with a
person separation index (PSI; G) of 2.7 (Table 5), the 70-item VOCAL instrument yields almost
3.9 distinct person strata. The number of person strata ranged from 3.6 in grade 5 to 4.0 in grade
8. These results provide evidence that the VOCAL instrument produces reliable, reproducible
measures that are responsive (the instrument can divide the sample into three to four statistically
distinct score groups).
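The strata calculation above is easy to verify; a quick check using the report’s formula:

```python
def person_strata(g: float) -> float:
    """Number of statistically distinct person strata, H = (4G + 1) / 3
    (Wright and Masters), where G is the person separation index."""
    return (4.0 * g + 1.0) / 3.0


# G = 2.7 for the combined 70-item instrument -> almost 3.9 strata.
print(round(person_strata(2.7), 1))  # 3.9
```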
5.5.2. School-level Responsiveness and Score Reporting. The greater the number of person
strata at the individual-level, the more likely the instrument will be able to meaningfully
differentiate schools. At the school-level, the average scaled score was 1.32 logits with a
standard deviation of 0.85 logits (Table 8). After removing schools whose data did not meet the
minimum reporting requirements (an N of 10 and a school-level reliability of at least 0.7),
reportable school measures ranged from -0.03 logits to 3.02 logits, indicating variability in
school-level scores. The relatively high degree of responsiveness of the instrument at the student
level appears to pick up the variation within and between schools.
Table 8
Variability of reportable school-level VOCAL scores

School | Number of students | Person separation reliability (PSR)1 | Mean ± SD2
Weakest school 1 | 11 | 0.83 | -0.03 ± 0.64
Weakest school 2 | 21 | 0.95 | 0.12 ± 1.38
Weakest school 3 | 98 | 0.92 | 0.16 ± 0.94
Average school | -- | -- | 1.32 ± 0.85
Strongest school 3 | 40 | 0.76 | 2.96 ± 1.38
Strongest school 2 | 36 | 0.77 | 3.00 ± 1.34
Strongest school 1 | 50 | 0.79 | 3.02 ± 1.46

1A PSR of at least 0.7 and an N of 10 or more students were set as the minimum reporting requirements. 2SD: standard deviation.
Score reporting. Logit scores can be difficult for educators to interpret, so DESE linearly
transformed them. The logit measures were standardized and transformed at the student level to
have a mean of 50 and a standard deviation of 20 (see Appendix K for details on how scores
were transformed). The individual scores were aggregated up to the school level; aggregate
school-level scores were then truncated and placed on a scale of 1–99 (±2.5 standard deviations)
with a mean of 50.05 and a standard deviation of 12.8. To help schools interpret their data,
schools were separated into three “performance” levels: schools with relatively weak school
climates had scores that fell 1 or more standard deviations below the mean; schools with
relatively strong school climates had scores that fell 1 or more standard deviations above the
mean. Based on the median student within these three “performance” groups, a profile or picture
of the school climate was constructed using the item threshold file in Winsteps (Table 9).
Twenty-two percent of the schools with reportable data fell either within the top or bottom
“performance” level, with the vast majority of schools with reliable data falling within the
Table 9
Massachusetts School Climate Profile
Stronger
Schools whose average index score is greater than or equal to one standard deviation above the mean (≥63 points; ~12% of schools).
The average student within these schools responds “always true” to a majority of items and “mostly true” to the remaining items.
1. Student-on-student interactions are mostly respectful, caring, and collaborative within the classroom. Students have a say in school rules and perceive school rules as fair and consistently enforced.
2. Adults actively address safety issues. Students feel safe with few, if any, bullying behaviors reported.
3. Teacher/adult-on-student relationships are respectful, caring, and inclusive. For the most part, adults encourage student autonomy and feedback. Adults/teachers promote responsibility and teach positive behaviors.
4. Support systems are accessible, and teachers/adults actively engage with students to help them emotionally. Most students feel comfortable seeking help.
5. The classroom is a safe and supportive learning environment. Teachers encourage effort, and set high academic expectations. Teachers actively promote and support individual students’ academic success.
6. Students report a strong sense of belonging to the school.

Average
Schools whose average index score is between one standard deviation below and one standard deviation above the mean (38 to 62 points; ~78% of all schools).
The average student within these schools responds “mostly true” to a majority of items and “always true” to the remaining items.
1. Student-on-student interactions are mostly respectful and caring, and generally collaborative within the classroom. Students have little say in school rules but perceive school rules as mostly fair and consistently enforced.
2. Adults actively address safety issues. Students feel safe, with few bullying behaviors reported.
3. Teacher/adult-on-student relationships are caring, and mostly respectful and inclusive. For the most part, adults encourage student autonomy and feedback. Adults/teachers promote responsibility and teach respectful behavior. To a lesser degree, adults teach students behavior management.
4. Support systems are available, and teachers/adults engage with students to help them emotionally. However, not all students feel comfortable seeking help.
5. The classroom is a relatively safe and supportive learning environment. Teachers encourage effort, and set high academic expectations. Teachers promote and support individual students’ academic success.
6. Students report a moderately strong sense of belonging to the school.

Weaker
Schools whose average index score is equal to or less than one standard deviation below the mean (≤37 points; ~10% of schools).
The average student within these schools responds “mostly untrue” or “mostly true” to all but two items, to which the average student responded “untrue”.
1. Student-on-student interactions generally lack respect, with students offering limited mutual emotional or academic support. Students have no say in school rules and perceive school rules as relatively unfair and somewhat inconsistently enforced.
2. Adults address safety issues. Students feel safe though some bullying behaviors are reported.
3. Teacher/adult-on-student relationships are somewhat caring, respectful and inclusive. For the most part, adults do not encourage student autonomy or feedback. Promotion of student responsibility and teaching of positive behaviors is relatively low.
4. Support systems are available, but adults do not, for the most part, help students emotionally. Students generally do not feel comfortable seeking help.
5. The classroom is a somewhat safe and supportive learning environment. Teachers mostly encourage effort and have high expectations. Teachers generally encourage and support individual students’ academic success.
6. Students report a relatively weak sense of belonging to the school.
“average” category. The VOCAL survey meaningfully differentiated schools both quantitatively
and qualitatively. This profile tool was designed to help schools assess their climates. For
schools that fall within the “weaker” category, the profile provides a path and the information
needed to improve. For example, students in schools with relatively weak school climates report
that students are not respectful or caring; in contrast, students in schools with relatively strong
school climates report that student-on-student relationships are respectful, caring, and
collaborative.
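The logit-to-index transformation described in the score-reporting section (standardize to mean 50, SD 20, truncate to 1–99) can be sketched as follows; the calibration constants below are placeholders, not DESE’s actual values:

```python
def to_index(logit: float, mean_logit: float, sd_logit: float) -> float:
    """Rescale a logit measure to the reporting metric: mean 50, SD 20,
    truncated to the 1-99 range (roughly +/- 2.5 SD)."""
    score = 50.0 + 20.0 * (logit - mean_logit) / sd_logit
    return max(1.0, min(99.0, score))


# Placeholder student-level calibration values (illustrative only).
MEAN, SD = 1.3, 0.9

print(to_index(1.3, MEAN, SD))  # 50.0 (a student at the mean)
print(to_index(9.0, MEAN, SD))  # 99.0 (truncated at the ceiling)
```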
Concurrent Validity. Preliminary evidence of concurrent validity at the school level indicates a
positive relationship between schools’ VOCAL scaled scores and their Massachusetts
Comprehensive Assessment System (MCAS) English Language Arts (ELA) and mathematics
achievement (Table 10), with Pearson correlations of 0.41 and 0.42, respectively. VOCAL
scores were also positively related to students’ growth scores in ELA (0.17) and mathematics
(0.26), although these correlations were of a smaller magnitude than those for static achievement
scores. VOCAL scores are also associated with other school-level indicators, namely attendance
rate (0.25), chronic absence rate (-0.34), disciplinary rate (-0.48), in-school suspension rate
(-0.29), out-of-school suspension rate (-0.34), retention rate (-0.19), and graduation (0.17) and
dropout rates (-0.26); these data are summarized in Table 11.
The relationships with achievement and other indicators are similar in magnitude to those
reported previously for non-cognitive indicators (Peoples, 2016; Hough, Kalogrides, and Loeb,
2017; Peoples, Flanagan, and Foster, 2017) and are in the expected direction for all indicators.
This pattern of associations was replicated across the three grades, providing initial evidence of
external validity (Table 10 and Table 11).
Table 10
Correlations of 2017 achievement scores and overall VOCAL scores, by school level1,2

Measure | All schools (N = 1,137) | Grade 5 (N = 667)9 | Grade 8 (N = 394)10 | Grade 10 (N = 294)11
English Language Arts scaled score | 0.41 | 0.42 | 0.26 | 0.15
English Language Arts student growth percentile | 0.17 | 0.23 | 0.08 | 0.13
Mathematics scaled score | 0.42 | 0.45 | 0.25 | 0.16
Mathematics student growth percentile | 0.26 | 0.24 | 0.18 | 0.12

1Data are based on schools with 10 or more students contributing to both the aggregate achievement score and the aggregate VOCAL score, and with a school-level VOCAL reliability of at least 0.7. 2Grade 5 and grade 8 MCAS tests reflect DESE’s next-generation assessments; the grade 10 test is based on the legacy tests. 9All grade 5 correlations are statistically significant (p < 0.01). 10All grade 8 correlations are statistically significant (p < 0.01) with the exception of the ELA student growth percentile (eSGP). 11All grade 10 correlations are statistically significant (p < 0.05) with the exception of the ELA and mathematics student growth percentiles (eSGP, mSGP).
Overall, the external validity evidence supports the conclusion that the school climate survey is
responsive (at both the individual and school levels) and should be able to measure change on
the variable. Although the pattern of correlations provides preliminary evidence to support
VOCAL’s external validity, the correlational cross-sectional data do not support the
interpretation that more positive school climates lead to (cause) improved student achievement.
In addition, these simple correlations do not account for the nested nature of educational data,
with students nested within schools, which are, in turn, nested within districts. Future validity
work will focus on providing external validity evidence using hierarchical linear models that
take into account the nested structure of education data.
Table 11
Correlations of 2017 school-level indicators and VOCAL scores, by school level1

Indicator | All schools (N = 1,137) | Grade 5 (N = 667)10 | Grade 8 (N = 394)11 | Grade 10 (N = 294)12
Attendance rate2 | 0.25 | 0.31 | 0.23 | 0.13
Chronically absent (10% or more)3 | -0.34 | -0.33 | -0.26 | -0.17
Discipline rate4 | -0.48 | -0.38 | -0.35 | -0.32
In-school suspension (ISS)5 | -0.29 | -0.12 | -0.16 | -0.19
Out-of-school suspension (OSS)6 | -0.34 | -0.33 | -0.37 | -0.35
Retention rate7 | -0.19 | -0.21 | -0.13 | -0.04
Graduation rate8 | NA | NA | NA | 0.17
Drop-out rate9 | NA | NA | NA | -0.26

1Data are based on schools with 10 or more students contributing to both the aggregate achievement score and the aggregate VOCAL score, and with a school-level reliability of 0.7 for VOCAL scores. 2Attendance rate: the average percentage of days in attendance for students enrolled in grades PK–12. 3Chronically absent (10% or more): the percentage of students who were absent 10% or more of their total number of student days of membership in a school. 4Discipline rate: the number of disciplinary incidents divided by school enrollment. 5In-school suspension rate: the percentage of enrolled students in grades 1–SP who received one or more in-school suspensions. 6Out-of-school suspension rate: the percentage of enrolled students in grades 1–SP who received one or more out-of-school suspensions. 7Retention rate: the percentage of enrolled students in grades 1–12 who were repeating the grade in which they were enrolled the previous year. 8Graduation rate: the percentage of students who enroll in high school and graduate within 4 years (N = 268). 9Drop-out rate: the percentage of students in grades 9–12 who dropped out of school between July 1 and June 30 prior to the listed year and who did not return to school by the following October 1 (N = 268). 10All grade 5 correlations are statistically significant (p < 0.01) with the exception of ISS. 11All grade 8 correlations are statistically significant (p < 0.01). 12All grade 10 correlations are statistically significant (p < 0.05), with the exception of attendance rate and retention rate.
Conclusion
The purpose of this research was to use Rasch theory and its validity framework to develop and
pilot an instrument for measuring students’ perceptions of school climate on a large scale. The
psychometric properties of the VOCAL instrument, for the most part, met the assumptions of the
Rasch model: the items are well fitting and invariant, and they form a unidimensional scale.
Most importantly, the scale proved reasonably reliable and responsive. With forthcoming
improvements to the instrument (revising behavioral bullying items, increasing the number of
items in each dimension, expanding construct representation), the VOCAL measure shows
promise in providing schools with reliable data that they can use for continuous improvement
purposes.
References
Andrich, D. (1978a). Application of a psychometric rating model to ordered categories which
are scored with successive integers. Applied Psychological Measurement, 2(4), 581-594.
Andrich, D. (1978b). A rating formulation for ordered response categories. Psychometrika,
43(4), 561-573.
Boone, W. J., and Scantlebury, K. (2006). The role of Rasch analysis when conducting science
education research utilizing multiple-choice tests. Science Education, 90, 253-269.
Boone, W. J., Townsend, J. S., and Staver, J. (2011). Using Rasch theory to guide the practice of
survey development and survey data analysis in science education and to inform science
reform efforts: An exemplar utilizing STEBI self-efficacy data. Science Education, 95,
258-280.
Boone, W. J., Staver, J. R., and Yale, M. S. (2014). Rasch analysis in the human sciences, New
York: Springer.
Berkowitz, R., Moore, H., Astor, R. A., & Benbenishty, R. (2017). A research synthesis of the
associations between socioeconomic background, inequality, school climate, and academic
achievement. Review of Educational Research, 87(2), 425–469.
Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American
Psychologist 32 (7), 513–531.
Engelhard, G. (2013). Invariant measurement: Using Rasch models in the social, behavioral and
health sciences. Routledge Taylor & Francis Group, New York, New York.
Gable, R.K., Ludlow, L.H. and Wolf, M.B. (1990). The Use of Classical and Rasch Latent Trait
Models to Enhance the Validity of Affective Measures. Educational and Psychological
Measurement, 50 (4), 869-878.
Hambleton, R. K. & Jones, R. W. (1993). Comparison of classical test theory and item response
theory and their applications to test development. Educational Measurement: Issues and
Practice, Fall, 38-47.
Hafen, C.A., Allen, J. P., Mikami, A. Y., Gregory, A., Hamre, B. & Pianta, R. C. (2012). The
pivotal role of adolescent autonomy in secondary school classrooms. Journal of Youth
Adolescence, 41 (3), 245-255.
Hough, H., Kalogrides, D., & Loeb, S. (2017). Using surveys of students’ social and emotional
learning and school climate for accountability and continuous improvement. Policy
Analysis for California Education, downloaded from http://edpolicyinca.org.
Linacre, J. M. (2017). A user’s guide to Winsteps, Ministep Rasch-model computer programs:
program manual 4.0.0, Chicago, US: MESA Press.
Ludlow, L. H. & Haley, S. M. (1995). Rasch model logits: Interpretation, use and
transformation. Educational and Psychological Measurement, 55 (6), 967-975.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–
1027.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’
responses and performances as scientific inquiry into score meaning. American
Psychologist, 50 (9), 741–749.
Peoples, S. M., O’Dwyer, L. M., Wang, Y., Brown, J., & Rosca, C. V. (2014). Development and
application of the Elementary School Science Classroom Environment Scale (ESSCES):
Measuring student perceptions of constructivism within the science classroom. Learning
Environments Research, 17(1), 49-73.
Peoples, S.M., Abbott, C., and Flanagan, K. (2015a). Developing student feedback surveys for
educator evaluation: Combining stakeholder engagement and psychometric analyses in their
development. Paper presented to the April, 2015 annual meeting of the American Educational
Research Association, Chicago, IL, US.
Peoples, S.M., Abbott, C., and Flanagan, K. (2015b). Developing student feedback surveys for
educator evaluation: Validating student feedback surveys for educator evaluation using
Rasch survey development tools and the Rasch construct validity framework. Paper presented
at the April, 2015 annual meeting of the American Educational Research Association,
Chicago, IL, US.
Peoples, S. (2016). College and Career Readiness Mathematical Practice Scale CCRMS:
Assessing middle and high school students’ mathematics self-efficacy. Paper presented at
American Educational Research Association Conference, Washington, DC, 2016, District
of Columbia.
Peoples, S., Flanagan, K., & Foster, B. (2017). Measuring students’ college and career
readiness in English Language Arts using a Rasch-based self-efficacy scale. Paper
presented at American Educational Research Association Conference, San Antonio,
Texas, 2017.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen:
Danish Institute for Educational Research. (Expanded edition, 1980. Chicago: University
of Chicago Press).
Smith, E. V. Jr. (2000). Metric Development and Score Reporting in Rasch Measurement.
Journal of Applied Measurement, 1(3), 303-326.
Smith, E. V. (2002). Detecting and evaluating the impact of multidimensionality using item fit
statistics and principal component analysis of residuals. Journal of Applied
Measurement, 3, 205-231.
Schumacker, R. E. & Smith, E. V. (2007). Reliability: A Rasch perspective. Educational and
Psychological Measurement, 67 (3), 394-409.
Schwartz, R., Ayers, E., & Wilson, M. (2017). Mapping a data modeling and statistical reasoning
learning progression using unidimensional and multidimensional item response models.
Journal of Applied Measurement, 18(3), 268–298.
Sinnema, C. E. L. and Ludlow, L. H. (2013). A Rasch approach to the measurement of
responsive curriculum practice in the context of curricula reform. The International
Journal of Educational and Psychological Assessment, 12 (2), 33-55.
Thapa, A., Cohen, J., Guffey, S., & Higgins-D’Alessandro, A. (2013). A review of school
climate research, Review of Educational Research, 83 (3), 357–385.
Thomas, G. P. (2004). Dimensionality and construct validity of an instrument designed to
measure the metacognitive orientation of science classroom learning environments.
Journal of Applied Measurement, 5(4), 367-384.
United States Department of Education. (2017). National Center on Safe Supportive Learning
Environments, ED School Climate Surveys (EDSCLS),
https://safesupportivelearning.ed.gov/edscls/measures
Wolfe, E. W., & Smith, E. V. Jr. (2007a). Instrument development tools and activities for
measure validation using Rasch models: Part I – Instrument development tools. Journal
of Applied Measurement, 8 (1), 97–123.
Wolfe, E. W. & Smith Jr., E. V. (2007b). Instrument development tools and activities for
measure validation using Rasch models: Part II – Validation activities. Journal of
Applied Measurement, 8 (2), 204–234.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
Wright B. D., and Masters, G. N. (2002). Number of Person or Item Strata. Rasch Measurement
Transactions, 16 (3), 888.
APPENDICES
Appendix A: VOCAL 2017 Test Specification¹
Dimension Domain/Topic G5 Items G8 Items G10 Items Total
Engagement (ENG)
Cultural and Linguistic Competence (CLC) 2 2 2 4
Relationships (REL) 4 7 6 14
School Participation (PAR) 1 1 1 3
Subtotal 7 10 9 21
Safety (SAF)
Emotional Safety (EMO) 3 3 3 7
Physical Safety (PSAF) 2 2 2 6
Bullying/cyber-bullying (BUL) 3 6 4 11
Subtotal 8 11 9 24
Environment (ENV)
Instructional Environment (INS) 4 8 6 13
Mental Health (MEN) 2 2 2 6
Discipline (DIS) 3 3 3 7
Subtotal 9 13 11 26
TOTAL 24 34 29 71
¹Common items that appear on each grade-level survey are counted only once in the Total column.
Appendix B1: Student MCAS Questionnaire - Grade 5 VOCAL form
Spring 2017
STUDENT QUESTIONNAIRE
Grade 5

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. Please ask your test administrator for help if you are not sure where or how to mark your answers to these questions.

This questionnaire asks about what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.
Think of the last 30 days in school. (A = Always true, B = Mostly true, C = Mostly untrue, D = Never true)
1. Teachers support (help) students who come to class upset. A B C D
2. School rules are fair for all students. A B C D
3. I am happy to be at our school. A B C D
4. My teachers care about me as a person. A B C D
5. In school, I learn how to control my feelings when I am angry or upset. A B C D
6. Teachers at this school accept me for who I am. A B C D
7. I get the chance to take part in school events (for example, science fairs, art, or music shows). A B C D
8. Students respect one another. A B C D
9. Teachers don’t let students pick on other students in class or in the hallways. A B C D
10. My teachers are proud of me when I work hard in school. A B C D
11. At our school, students learn to care about other students’ feelings. A B C D
12. If I heard about a threat to our school or to my classmates, I would report it to an adult. A B C D
13. I feel safe at our school. A B C D
14. Adults working at this school treat all students respectfully. A B C D
15. Students help each other learn without having to be asked by the teacher. A B C D
16. I feel comfortable talking to my teachers about something that is bothering me. A B C D
17. If I tell a teacher or other adult at school that someone is being bullied, the teacher/adult will do something to help. A B C D
18. My teachers help me succeed with my schoolwork when I need help. A B C D
19. Students have a voice in deciding school rules. A B C D
20. Students will help other students, even if they are not close friends. A B C D
21. My teachers use my ideas to help my classmates learn. A B C D
22. Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well. A B C D
23. I have been punched or shoved by other students more than once in the school or on the playground. A B C D
24. Students at our school get along well with each other. A B C D
Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.
Appendix B2: Student MCAS Questionnaire - Grade 8 VOCAL form
Spring 2017
STUDENT QUESTIONNAIRE
Grade 8

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. If you do not see one best answer for a question, leave that question blank in your answer booklet and go to the next question. Please ask your test administrator for help if you are not sure how to answer any of these questions.
1. How does using a computer compare with working by hand when you are completing school assignments such as reports or essays?
A. It is a lot easier to write on a computer than by hand.
B. It is somewhat easier to write on a computer than by hand.
C. It doesn’t make any difference whether I write on a computer or by hand.
D. It is somewhat harder to write on a computer than by hand.
E. It is a lot harder to write on a computer than by hand.
2. What types of tests have you taken on a computer? Choose all that apply.
A. multiple-choice
B. essay
C. combination of multiple-choice questions and written responses
D. I have never taken a test on a computer.
E. I don’t know.
3. In general, how much time do you spend on homework each week?
A. less than 3 hours each week
B. about 3 to 6 hours each week
C. about 7 to 9 hours each week
D. about 10 to 12 hours each week
E. about 13 to 15 hours each week
F. more than 15 hours each week
The next set of questions asks what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your individual answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.
PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.
Think of the last 30 days in school. (A = Always true, B = Mostly true, C = Mostly untrue, D = Never true)
4. Teachers support students who come to class upset. A B C D
5. My schoolwork is appropriately challenging. A B C D
6. School staff are consistent when enforcing rules in school. A B C D
7. I have a choice in how I show my learning (e.g., write a paper; prepare a presentation; make a video). A B C D
8. Teachers are available when I need to talk with them. A B C D
9. My teachers inspire confidence in my ability to do well in class. A B C D
10. Students at this school try to stop bullying when they see it happening. A B C D
11. Students respect one another. A B C D
12. My teachers care about my academic success. A B C D
13. My teachers are proud of me when I work hard in school. A B C D
14. I have seen students with weapons at our school. A B C D
15. I am not scared to make mistakes in my teachers’ classes. A B C D
16. I have been teased or picked on more than once because of my religion. A B C D
17. Adults working at this school treat all students respectfully, regardless of a student’s race, culture, family background, sex, or sexual orientation. A B C D
18. Students help each other learn without having to be asked by the teacher. A B C D
19. If I need help with my emotions (feelings), help is available at our school. A B C D
20. If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help. A B C D
21. I have been teased or picked on more than once because of my physical or mental disability. A B C D
22. Students have a voice in deciding school rules. A B C D
23. If I am absent from school, a teacher or other adult will notice that I was not in class. A B C D
24. I feel comfortable reaching out to teachers/counselors for emotional support if I need it. A B C D
25. My teachers set high expectations for my work. A B C D
26. Students at this school damage and/or steal other students’ property. A B C D
27. Teachers encourage students to respect different points of view when expressed in class. A B C D
28. My parents/guardians feel respected when they participate at our school (e.g., at open houses or conferences with teachers). A B C D
29. Teachers and adults are interested in my well-being beyond just my class work. A B C D
30. My textbooks or class materials include people and examples that reflect my race, cultural background, and/or identity. A B C D
31. Students have spread rumors or lies about me more than once on social media. A B C D
32. Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation. A B C D
33. My teachers believe that all students can do well in their learning. A B C D
34. In school, students learn how to control their behavior. A B C D
35. I have been threatened by other students more than once on social media. A B C D
36. My teachers give me individual help with my schoolwork when I need help. A B C D
37. Our school offers guidance to students on how to mediate (settle) conflicts by themselves. A B C D
Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.
Appendix B3: Student MCAS Questionnaire - Grade 10 VOCAL form
Spring 2017
STUDENT QUESTIONNAIRE
Grade 10

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. If you do not see one best answer for a question, leave that question blank in your answer booklet and go to the next question. Please ask your test administrator for help if you are not sure how to answer any of these questions.
1. How does using a computer compare with working by hand when you are completing school assignments such as reports or essays?
A. It is a lot easier to write on a computer than by hand.
B. It is somewhat easier to write on a computer than by hand.
C. It doesn’t make any difference whether I write on a computer or by hand.
D. It is somewhat harder to write on a computer than by hand.
E. It is a lot harder to write on a computer than by hand.
2. What types of tests have you taken on a computer? Choose all that apply.
A. multiple-choice
B. essay
C. combination of multiple-choice questions and written responses
D. I have never taken a test on a computer.
E. I don’t know.
3. What are your plans after high school?
A. attend a four-year college
B. attend a two-year college
C. join the military
D. work full-time
E. other
F. I don’t know.

4. If you are not planning to attend a two- or four-year college, which of the following best describes your plans for future job training? (If you are planning to attend a two- or four-year college, skip this question.)
A. attend college sometime in the future for vocational training or credentialing
B. attend a post-secondary vocational school for more advanced training
C. on-the-job training
D. I do not plan to seek future job training.
E. I don’t know.
The next set of questions asks what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your individual answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.
Think of the last 30 days in school. (A = Always true, B = Mostly true, C = Mostly untrue, D = Never true)
5. Teachers support students who come to class upset. A B C D
6. Teachers ask students for feedback on their classroom instruction. A B C D
7. My teachers are approachable if I am having problems with my class work. A B C D
8. I am encouraged to take upper-level courses (honors, AP). A B C D
9. I have been teased or picked on more than once because of my race or ethnicity. A B C D
10. Teachers give students a chance to explain their behavior when they do something wrong. A B C D
11. My teachers support me even when my work is not my best. A B C D
12. Students respect one another. A B C D
13. I feel welcome to participate in extra-curricular activities offered through our school, such as school clubs or organizations, musical groups, sports teams, or student council. A B C D
14. My teachers are proud of me when I work hard in school. A B C D
15. I have access to help at school if I am struggling emotionally or mentally. A B C D
16. Students know what to do if there is an emergency at school. A B C D
17. I feel as though I belong to our school community. A B C D
18. Adults working at this school treat all students respectfully, regardless of a student’s race, culture, family background, sex, or sexual orientation. A B C D
19. Students help each other learn without having to be asked by the teacher. A B C D
20. The consequences for inappropriate behavior are enforced fairly. A B C D
21. If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help. A B C D
22. The things I am learning in school are relevant (important) to me. A B C D
23. Students have a voice in deciding school rules. A B C D
24. Adults at our school are respectful of student ideas, even if the ideas expressed are different from their own. A B C D
25. Students try to work out their problems with other students in a respectful way. A B C D
26. My teachers set high expectations for my work. A B C D
27. I have been teased or picked on more than once because of my real or perceived sexual orientation. A B C D
28. Teachers, students, and the principal work together in our school to prevent bullying. A B C D
29. My teachers promote respect among students. A B C D
30. I sometimes stay home because I don’t feel safe at our school. A B C D
31. At our school, a teacher or other adult is available to help students who have experienced sexual assault or dating violence. A B C D
32. I have at least one friend whom I can count on to support me. A B C D
33. Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation. A B C D
The last five questions relate to your mathematics instruction. Please think of your current or most recent mathematics class when responding to the statements.
Think of your current or most recent math class. (A = Always true, B = Mostly true, C = Mostly untrue, D = Never true)
34. I am challenged to support my answers or reasoning in this class. A B C D
35. My teacher helps us identify our strengths and shows us how to use them to help us learn. A B C D
36. During our lessons, I am asked to apply what I know to new types of challenging problems or tasks. A B C D
37. My teacher checks to make sure we understand what he or she is teaching us. A B C D
38. I have to use my critical thinking skills, and not just memorize facts, to do my work in this teacher’s class. A B C D
Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.
Appendix C1: The Rasch Model

The Rasch model uses an exponential transformation to place ordinal Likert responses onto an equal-interval logit scale (Rasch, 1960). This transformation ensures that stakeholder perceptions are measured appropriately and that the data meet the assumptions of parametric testing (Ludlow and Haley, 1995; Boone, Staver, and Yale, 2014). In addition, the sample-independence features of the Rasch model overcome the fundamental drawbacks of classical test theory (CTT) analyses (Smith, 2000). In CTT, the difficulty of a test is sample dependent, making it problematic to measure change on a variable (Smith, 2000; Boone & Scantlebury, 2006). In contrast, the Rasch property of item invariance implies that the relative endorsements and locations of the items do not change (within measurement error) and are independent of the sample responding; in kind, the relative item endorsements should behave as expected across different samples (Smith, 2002; Engelhard, 2013). When items are invariant, the Rasch model is particularly discerning in differentiating between high and low scorers on a measurement scale (Gable, Ludlow, and Wolf, 1990; Sinnema & Ludlow, 2013) because it places persons and items on a common scale metric (Hambleton and Jones, 1993; Engelhard, 2013).
The Rasch rating scale model provides a mathematical model for the probabilistic relationship between a person’s ability (β_n) and the difficulty of items (δ_i) on a test or survey. Andrich’s (1978a, 1978b) rating scale model (RSM) used in this study is defined in Equation 1:

φ_nij = exp[Σ_{k=1}^{j} (β_n − δ_i − τ_k)] / (1 + Σ_{l=1}^{m_i} exp[Σ_{k=1}^{l} (β_n − δ_i − τ_k)]),   j = 1, 2, …, m_i.   (1)
where φ_nij is the “conditional probability of person n responding in category j to item i.” Tau (τ_j) is the estimate of the location of the jth step for each item relative to that item’s scale value (δ_i). The number of response categories is equal to m_i + 1, where m_i is the number of thresholds. In the RSM, moving from one threshold to the next contiguous threshold is assumed to have the same mean difference across all items of the survey. The natural log transformation of person responses yields separate person ability and item difficulty estimates in units called logits (Ludlow & Haley, 1995). The persons and items are placed on a common continuum (the scale metric axis of the variable map), and as such, persons can be characterized by their location on the continuum through the types and levels of items with which they are associated. By taking the natural log of the odds ratio, stable, replicable information about the relative strengths of persons and items is derived, with equal differences in logits translating into equal differences in the probability of endorsing an item no matter where on the scale metric the item is located; this interval-level unit of measurement is a fundamental assumption of parametric tests (Boone, Townsend, and Staver, 2011). By default in WINSTEPS, the item mean summed across the thresholds equals zero; the person and item measures are generated and reported on the logit scale. In the context of this study, a respondent with a positive logit value feels relatively more positive about the school climate than a respondent with a negative logit value.
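The mechanics of Equation 1 can be illustrated with a short computational sketch. This example is illustrative only and not part of the VOCAL analyses; the function name and the person, item, and threshold values are hypothetical.

```python
import math

def rsm_category_probs(beta, delta, taus):
    """Rating scale model category probabilities for one person-item pair.

    beta  : person ability in logits
    delta : item difficulty in logits
    taus  : shared threshold parameters tau_1..tau_m in logits

    Returns probabilities for response categories 0..m, which sum to 1.
    """
    # Cumulative sums of (beta - delta - tau_k); the sum for category 0 is 0.
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + (beta - delta - tau))
    numerators = [math.exp(s) for s in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# A person 1 logit above an item's difficulty, with three thresholds:
# the higher response categories receive most of the probability.
probs = rsm_category_probs(beta=1.0, delta=0.0, taus=[-1.0, 0.0, 1.0])
assert abs(sum(probs) - 1.0) < 1e-9
assert probs[3] > probs[0]
```

Because the thresholds are shared across all items, only δ_i shifts from item to item, which is what distinguishes the RSM from a partial credit model with per-item thresholds.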
Appendix C2: Logit Unit of Measurement
The unit of measurement resulting from the natural log transformation of person responses results in separate ability and item difficulty estimates called logits (Ludlow & Haley, 1995); this transformation expands the theoretical ability (endorsement) range from negative infinity to plus infinity with most estimates falling in the range of -4 to +4 logits (Ludlow & Haley, 1995). Items can be similarly interpreted in logits with a theoretical range of negative infinity to positive infinity; items with a positive logit are, on average, more difficult to endorse than items with negative logits (Ludlow & Haley, 1995). The persons and items are placed on a common continuum (the scale metric axis of the variable map) and as such, the persons can be characterized by their location on the continuum by the types and level of items of which they are associated. Person expected responses can be compared to their observed responses to determine if “the logit estimate of ability (affirmation) corresponding to an original raw data summary score is consistent or inconsistent with the pattern expected for that estimate of ability (affirmation)” (Ludlow & Haley, 1995). By taking the natural log of the odds ratio, stable replicable information about the relative strengths of persons and items is derived with equal differences in logits translating into equal differences in the probability of endorsing an item no matter where on the scale metric an item is located; this interval-level unit of measurement is a fundamental assumption of parametric tests (Ludlow and Haley, 1995; Boone, Townsend, and Staver, 2011).
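The logit transformation described above is simply the natural log of the odds of endorsement. A minimal sketch (ours, for illustration only; the function name is not from the report):

```python
import math

def logit(p):
    """Natural log of the odds ratio p / (1 - p), expressed in logits."""
    return math.log(p / (1.0 - p))

assert abs(logit(0.5)) < 1e-9                 # 50% endorsement sits at 0 logits
assert logit(0.73) > 0 and logit(0.27) < 0    # above vs. below the scale center
assert abs(logit(0.73) + logit(0.27)) < 1e-9  # symmetric about 0
```

Endorsement proportions between roughly 2% and 98% map into the -4 to +4 logit range noted above, which is why most person and item estimates fall within it.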
Appendix D: Guide for evaluating Rasch model validity data

Content. Statistic/data: point-to-measure correlation. Criterion: positive and > 0.3. Comment: analog to the CTT item-total correlation.

Content & Structural. Statistic/data: infit and outfit mean-square fit statistics (MNSQ). Criterion: 0.5 – 1.5; disruption of pattern in magnitude of misfit; mean-square errors should have a mean of one, i.e., observed = expected.

Substantive. Statistic/data: rating scale functioning. Criterion: minimum of 10 responses per category; categories are unimodal; observed score averages and item threshold parameters increase monotonically; un-weighted MNSQ < 2.0 for each category. Comment: the rating scale is used according to the intent of the instrument developers, supporting score use and inferences.

Statistic/data: item difficulty hierarchy. Criterion: ordering of item deltas corresponds to theoretical expectations (item/person variable maps). Comment: qualitative assessment of items in the construct and/or dimensions/domains.

Generalizability. Statistic/data: item invariance and differential item functioning (DIF). Criterion: within standard error, items should retain the same item difficulty (deltas) across administrations and survey forms (correlation of 0.9 or greater); for DIF, recommended criteria vary: a delta difference of 0.3 – 0.64 logits (0.5 used in this study). Comment: DIF flags items that need further review; items may need revision to eliminate bias, or removal when estimating scores if the bias is significant.

Statistic/data: person separation reliability (PSR). Criterion: typical ~ 0.8; high stakes > 0.9 (here, 0.9 for the construct, 0.8 for dimensions, 0.7 for school-level scores). Comment: PSR is similar to Cronbach’s α and ranges from 0 to 1.

Structural. Statistic/data: sub-scale correlations. Criterion: positive and substantial (> 0.5 but < 0.9). Comment: the items that form a second dimension should be reviewed qualitatively to determine their commonality and whether their co-variation is meaningful.

Statistic/data: standardized residuals. Criterion: no correlation between residuals from separate calibrations of two item subsets. Comment: evaluated in Winsteps software via principal component analysis (PCA) of residuals. Total variance explained: > 40% very good, > 50% excellent. Second dimension: < 5% of total variance; eigenvalue < 3; first-contrast item variance four times the variance of the second item contrast. Cluster correlations: > 0.82 suggests only one latent trait; > 0.71 suggests more dependency than independence.

External. Statistic/data: responsiveness. Criterion: typical ~ 3 person strata (low, medium, high); H = (4G + 1)/3, where H is the number of person strata and G is the person separation index. Comment: instruments that are responsive can better differentiate high and low scorers by reliably separating individuals into a greater number of performance levels, thereby facilitating the measurement of change in respondent views on a construct.
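The person-strata formula in the responsiveness criterion can be checked in a few lines. The sketch below is ours (the function name is hypothetical); it applies Wright and Masters’ (2002) formula directly:

```python
def person_strata(G):
    """Number of statistically distinct person strata, H = (4G + 1) / 3,
    where G is the person separation index (Wright & Masters, 2002)."""
    return (4 * G + 1) / 3

# A separation index of 2 yields the typical three strata (low, medium, high);
# higher separation supports finer distinctions among respondents.
assert person_strata(2) == 3
assert person_strata(3.5) == 5
```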
Appendix E: Technical quality (mean-square error) of 70-item VOCAL scale|ENTRY TOTAL TOTAL MODEL| INFIT | OUTFIT |PTMEASUR-AL|EXACT MATCH| | ||NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR. EXP.| OBS% EXP%|DISPLACE| ITEM ||------------------------------------+----------+----------+-----------+-----------+--------+---------|| 30 115122 43443 -1.66 .01|2.41 9.9|2.88 9.9|A .22 .41| 69.6 70.5| .00| SAFBUL10|| 31 110085 43416 -1.23 .01|2.27 9.9|2.73 9.9|B .24 .45| 56.7 64.5| .00| SAFBUL11|| 43 117086 43970 -1.71 .01|2.05 9.9|2.57 9.9|C .29 .41| 73.0 71.3| .00| SAFPSF5 || 23 109404 50974 .52 .01|2.42 9.9|2.51 9.9|D .38 .58| 34.8 56.1| .00| SAFBUL2 || 27 121136 44965 -1.92 .01|2.31 9.9|2.41 9.9|E .23 .39| 73.8 73.3| .00| SAFBUL6 || 26 103394 46560 -.36 .01|2.15 9.9|2.10 9.9|F .39 .51| 38.7 56.8| .00| SAFBUL5 || 42 114381 45244 -1.26 .01|2.06 9.9|2.15 9.9|G .32 .45| 58.4 64.2| .00| SAFPSF4 || 28 120598 47104 -1.36 .01|2.11 9.9|1.98 9.9|H .36 .44| 62.8 65.7| .00| SAFBUL7 || 3 78101 46844 .82 .01|1.93 9.9|2.03 9.9|I .28 .56| 37.5 50.2| .00| ENGCLC3 || 4 89363 43661 .11 .01|1.73 9.9|1.78 9.9|J .41 .53| 40.7 54.8| .00| ENGCLC4 || 37 117366 44577 -1.58 .01|1.70 9.9|1.57 9.9|K .37 .42| 68.1 69.5| .00| SAFEMO6 || 5 129736 51223 -.59 .01|1.66 9.9|1.57 9.9|L .38 .50| 59.7 66.3| .00| ENGPAR1 || 61 89389 47342 .39 .01|1.49 9.9|1.49 9.9|M .37 .55| 45.8 52.8| .00| ENVINS10|| 45 155552 142488 2.20A .00|1.37 9.9|1.46 9.9|N .50 .62| 45.2 48.9| -.02| ENVDIS1 || 40 135990 51182 -1.08 .01|1.44 9.9|1.24 9.9|O .42 .46| 71.0 72.6| .00| SAFPSF2 || 41 82833 46675 .62A .01|1.32 9.9|1.38 9.9|P .43 .56| 47.3 51.5| -.01| SAFPSF3 || 24 129741 50861 -.67A .01|1.33 9.9|1.23 9.9|Q .49 .50| 67.1 67.4| .02| SAFBUL3 || 65 114368 51234 .30A .01|1.31 9.9|1.25 9.9|R .52 .57| 53.1 57.7| .00| ENVMEN1 || 14 110606 47723 -.60A .01|1.29 9.9|1.24 9.9|S .47 .49| 54.7 58.4| .00| ENGREL7 || 35 80034 48203 .85A .01|1.29 9.9|1.28 9.9|T .59 .57| 43.0 50.2| -.01| SAFEMO4 || 66 100694 51216 .93A .01|1.28 
9.9|1.24 9.9|U .56 .60| 49.5 54.1| .00| ENVMEN2 |
| 67 99984 48292 .01A .01|1.25 9.9|1.21 9.9|V .55 .53| 50.7 54.9| -.01| ENVMEN3 |
| 19 96983 43999 -.25A .01|1.23 9.9|1.20 9.9|W .52 .51| 55.9 56.9| .00| ENGREL12|
| 68 107825 48385 -.37A .01|1.22 9.9|1.13 9.9|X .59 .51| 54.3 57.0| .00| ENVMEN4 |
| 7 105042 45209 -.57A .01|1.21 9.9|1.16 9.9|Y .52 .49| 58.1 58.7| .01| ENGPAR3 |
| 47 130390 51137 -.67A .01|1.19 9.9|1.07 7.6|Z .51 .50| 67.8 67.4| .02| ENVDIS3 |
| 55 99203 51188 1.00A .01|1.19 9.9|1.17 9.9| .55 .60| 50.9 53.5| .00| ENVINS4 |
| 48 102344 47094 -.23A .01|1.17 9.9|1.17 9.9| .43 .52| 53.1 56.2| .00| ENVDIS4 |
| 6 113998 48278 -.72A .01|1.15 9.9|1.07 9.0| .54 .48| 62.1 59.4| .00| ENGPAR2 |
| 58 80180 47245 .78A .01|1.12 9.9|1.15 9.9| .42 .56| 48.5 50.5| -.01| ENVINS7 |
| 70 104016 45057 -.52A .01|1.14 9.9|1.07 9.4| .56 .50| 59.2 58.5| .00| ENVMEN6 |
| 1 343464 143914 -.56A .00|1.13 9.9|1.05 9.9| .55 .51| 61.1 61.1| .00| ENGCLC1 |
| 2 137670 51269 -1.23A .01|1.11 9.9| .92 -7.0| .52 .45| 76.8 74.6| .02| ENGCLC2 |
| 34 121223 51192 -.08A .01|1.10 9.9|1.04 5.1| .57 .54| 60.7 60.4| .01| SAFEMO3 |
| 39 129842 51172 -.62A .01|1.08 9.9| .93 -7.7| .56 .50| 69.1 66.6| .01| SAFPSF1 |
| 46 115656 51260 .23A .01|1.08 9.9|1.08 9.9| .52 .56| 57.4 58.0| .01| ENVDIS2 |
| 44 105521 45198 -.60A .01|1.07 9.5|1.05 7.2| .44 .49| 58.3 59.0| .01| SAFPSF6 |
| 64 72836 43688 .90A .01|1.05 7.7|1.07 9.9| .49 .57| 49.8 50.3| -.01| ENVINS13|
| 10 136865 51134 -1.19A .01|1.06 7.0| .87 -9.9| .53 .45| 76.8 73.9| .02| ENGREL3 |
| 60 98973 48279 .05A .01| .99 -2.0|1.06 8.6| .32 .53| 58.3 54.7| .00| ENVINS9 |
| 53 356836 144258 -.84A .00|1.05 9.9| .93 -9.9| .56 .49| 66.1 63.5| .01| ENVINS2 |
| 56 214029 92697 -.56A .01|1.04 8.2|1.03 6.1| .45 .50| 59.5 58.4| .00| ENVINS5 |
| 57 108341 48331 -.41A .01|1.02 3.5| .98 -3.4| .58 .51| 59.9 57.2| .01| ENVINS6 |
| 54 135430 51147 -1.06A .01|1.01 1.4| .86 -9.9| .54 .47| 75.0 72.4| .02| ENVINS3 |
| 22 354361 144350 -.77A .00|1.00 1.1| .90 -9.9|z .59 .49| 67.7 62.9| .00| SAFBUL1 |
| 25 73025 47560 1.08A .01| .99 -1.3|1.00 .1|y .56 .57| 50.5 48.7| -.01| SAFBUL4 |
| 59 119775 48331 -1.08A .01|1.00 .2| .88 -9.9|x .58 .46| 67.8 62.6| .01| ENVINS8 |
| 12 205745 92101 -.36A .01| .99 -2.3| .97 -5.2|w .49 .51| 60.6 57.2| .00| ENGREL5 |
| 63 75936 44277 .80A .01| .95 -8.0| .97 -5.2|v .54 .57| 53.1 50.8| .00| ENVINS12|
| 38 95837 45394 -.03A .01| .96 -6.4| .93 -9.8|u .59 .53| 58.9 55.7| -.01| SAFEMO7 |
| 51 62805 44602 1.38A .01| .93 -9.9| .96 -6.5|t .55 .58| 51.3 47.9| -.01| ENVDIS7 |
| 15 113456 48652 -.65A .01| .92 -9.9| .89 -9.9|s .56 .49| 63.8 58.7| .01| ENGREL8 |
| 17 103010 48419 -.13A .01| .92 -9.9| .89 -9.9|r .61 .52| 58.2 55.6| .00| ENGREL10|
| 36 97968 48451 .11A .01| .92 -9.9| .89 -9.9|q .64 .54| 56.9 54.4| .00| SAFEMO5 |
| 50 84402 44804 .46A .01| .92 -9.9| .92 -9.9|p .57 .55| 57.2 53.1| .00| ENVDIS6 |
| 52 286343 144678 .45A .00| .90 -9.9| .92 -9.9|o .50 .57| 58.0 54.2| .00| ENVINS1 |
| 18 100415 43715 -.49A .01| .91 -9.9| .90 -9.9|n .55 .50| 63.1 58.2| .00| ENGREL11|
| 9 98906 51151 1.01A .01| .90 -9.9| .88 -9.9|m .62 .60| 58.3 53.5| .00| ENGREL2 |
| 16 119107 48619 -.99A .01| .90 -9.9| .82 -9.9|l .61 .47| 69.7 61.5| .01| ENGREL9 |
| 33 110177 51207 .50A .01| .89 -9.9| .86 -9.9|k .62 .58| 60.8 56.1| .00| SAFEMO2 |
| 32 301912 143502 .17A .00| .87 -9.9| .88 -9.9|j .59 .56| 61.7 55.7| .00| SAFEMO1 |
| 29 92720 44527 .03A .01| .87 -9.9| .85 -9.9|i .63 .53| 60.6 55.4| .00| SAFBUL9 |
| 13 106798 48351 -.32A .01| .85 -9.9| .85 -9.9|h .57 .51| 61.8 56.7| .00| ENGREL6 |
| 21 104130 44633 -.59A .01| .85 -9.9| .80 -9.9|g .61 .49| 67.8 58.8| .00| ENGREL14|
| 49 79639 47284 .80A .01| .85 -9.9| .85 -9.9|f .58 .57| 56.6 50.4| -.01| ENVDIS5 |
| 69 74062 44616 .91A .01| .84 -9.9| .85 -9.9|e .55 .57| 56.3 50.1| -.01| ENVMEN5 |
| 8 252700 143591 .91A .00| .76 -9.9| .79 -9.9|d .56 .60| 60.5 51.4| .00| ENGREL1 |
| 11 92802 50987 1.25 .01| .76 -9.9| .79 -9.9|c .58 .61| 63.0 52.4| .00| ENGREL4 |
| 62 87148 44798 .34A .01| .78 -9.9| .79 -9.9|b .58 .55| 61.6 53.8| -.01| ENVINS11|
| 20 87839 45016 .32A .01| .73 -9.9| .72 -9.9|a .63 .55| 63.6 53.9| .00| ENGREL13|
Appendix F: Measure order of 70-item VOCAL scale

|ENTRY TOTAL TOTAL MODEL| INFIT | OUTFIT |PTMEASUR-AL|EXACT MATCH| | |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR. EXP.| OBS% EXP%|DISPLACE| ITEM |
|------------------------------------+----------+----------+-----------+-----------+--------+---------|
| 45 155552 142488 2.20A .00|1.37 9.9|1.46 9.9| .50 .62| 45.2 48.9| -.02| ENVDIS1 |
| 51 62805 44602 1.38A .01| .93 -9.9| .96 -6.5| .55 .58| 51.3 47.9| -.01| ENVDIS7 |
| 11 92802 50987 1.25 .01| .76 -9.9| .79 -9.9| .58 .61| 63.0 52.4| .00| ENGREL4 |
| 25 73025 47560 1.08A .01| .99 -1.3|1.00 .1| .56 .57| 50.5 48.7| -.01| SAFBUL4 |
| 9 98906 51151 1.01A .01| .90 -9.9| .88 -9.9| .62 .60| 58.3 53.5| .00| ENGREL2 |
| 55 99203 51188 1.00A .01|1.19 9.9|1.17 9.9| .55 .60| 50.9 53.5| .00| ENVINS4 |
| 66 100694 51216 .93A .01|1.28 9.9|1.24 9.9| .56 .60| 49.5 54.1| .00| ENVMEN2 |
| 8 252700 143591 .91A .00| .76 -9.9| .79 -9.9| .56 .60| 60.5 51.4| .00| ENGREL1 |
| 69 74062 44616 .91A .01| .84 -9.9| .85 -9.9| .55 .57| 56.3 50.1| -.01| ENVMEN5 |
| 64 72836 43688 .90A .01|1.05 7.7|1.07 9.9| .49 .57| 49.8 50.3| -.01| ENVINS13|
| 35 80034 48203 .85A .01|1.29 9.9|1.28 9.9| .59 .57| 43.0 50.2| -.01| SAFEMO4 |
| 3 78101 46844 .82 .01|1.93 9.9|2.03 9.9| .28 .56| 37.5 50.2| .00| ENGCLC3 |
| 49 79639 47284 .80A .01| .85 -9.9| .85 -9.9| .58 .57| 56.6 50.4| -.01| ENVDIS5 |
| 63 75936 44277 .80A .01| .95 -8.0| .97 -5.2| .54 .57| 53.1 50.8| .00| ENVINS12|
| 58 80180 47245 .78A .01|1.12 9.9|1.15 9.9| .42 .56| 48.5 50.5| -.01| ENVINS7 |
| 41 82833 46675 .62A .01|1.32 9.9|1.38 9.9| .43 .56| 47.3 51.5| -.01| SAFPSF3 |
| 23 109404 50974 .52 .01|2.42 9.9|2.51 9.9| .38 .58| 34.8 56.1| .00| SAFBUL2 |
| 33 110177 51207 .50A .01| .89 -9.9| .86 -9.9| .62 .58| 60.8 56.1| .00| SAFEMO2 |
| 50 84402 44804 .46A .01| .92 -9.9| .92 -9.9| .57 .55| 57.2 53.1| .00| ENVDIS6 |
| 52 286343 144678 .45A .00| .90 -9.9| .92 -9.9| .50 .57| 58.0 54.2| .00| ENVINS1 |
| 61 89389 47342 .39 .01|1.49 9.9|1.49 9.9| .37 .55| 45.8 52.8| .00| ENVINS10|
| 62 87148 44798 .34A .01| .78 -9.9| .79 -9.9| .58 .55| 61.6 53.8| -.01| ENVINS11|
| 20 87839 45016 .32A .01| .73 -9.9| .72 -9.9| .63 .55| 63.6 53.9| .00| ENGREL13|
| 65 114368 51234 .30A .01|1.31 9.9|1.25 9.9| .52 .57| 53.1 57.7| .00| ENVMEN1 |
| 46 115656 51260 .23A .01|1.08 9.9|1.08 9.9| .52 .56| 57.4 58.0| .01| ENVDIS2 |
| 32 301912 143502 .17A .00| .87 -9.9| .88 -9.9| .59 .56| 61.7 55.7| .00| SAFEMO1 |
| 36 97968 48451 .11A .01| .92 -9.9| .89 -9.9| .64 .54| 56.9 54.4| .00| SAFEMO5 |
| 4 89363 43661 .11 .01|1.73 9.9|1.78 9.9| .41 .53| 40.7 54.8| .00| ENGCLC4 |
| 60 98973 48279 .05A .01| .99 -2.0|1.06 8.6| .32 .53| 58.3 54.7| .00| ENVINS9 |
| 29 92720 44527 .03A .01| .87 -9.9| .85 -9.9| .63 .53| 60.6 55.4| .00| SAFBUL9 |
| 67 99984 48292 .01A .01|1.25 9.9|1.21 9.9| .55 .53| 50.7 54.9| -.01| ENVMEN3 |
| 38 95837 45394 -.03A .01| .96 -6.4| .93 -9.8| .59 .53| 58.9 55.7| -.01| SAFEMO7 |
| 34 121223 51192 -.08A .01|1.10 9.9|1.04 5.1| .57 .54| 60.7 60.4| .01| SAFEMO3 |
| 17 103010 48419 -.13A .01| .92 -9.9| .89 -9.9| .61 .52| 58.2 55.6| .00| ENGREL10|
| 48 102344 47094 -.23A .01|1.17 9.9|1.17 9.9| .43 .52| 53.1 56.2| .00| ENVDIS4 |
| 19 96983 43999 -.25A .01|1.23 9.9|1.20 9.9| .52 .51| 55.9 56.9| .00| ENGREL12|
| 13 106798 48351 -.32A .01| .85 -9.9| .85 -9.9| .57 .51| 61.8 56.7| .00| ENGREL6 |
| 12 205745 92101 -.36A .01| .99 -2.3| .97 -5.2| .49 .51| 60.6 57.2| .00| ENGREL5 |
| 26 103394 46560 -.36 .01|2.15 9.9|2.10 9.9| .39 .51| 38.7 56.8| .00| SAFBUL5 |
| 68 107825 48385 -.37A .01|1.22 9.9|1.13 9.9| .59 .51| 54.3 57.0| .00| ENVMEN4 |
| 57 108341 48331 -.41A .01|1.02 3.5| .98 -3.4| .58 .51| 59.9 57.2| .01| ENVINS6 |
| 18 100415 43715 -.49A .01| .91 -9.9| .90 -9.9| .55 .50| 63.1 58.2| .00| ENGREL11|
| 70 104016 45057 -.52A .01|1.14 9.9|1.07 9.4| .56 .50| 59.2 58.5| .00| ENVMEN6 |
| 1 343464 143914 -.56A .00|1.13 9.9|1.05 9.9| .55 .51| 61.1 61.1| .00| ENGCLC1 |
| 56 214029 92697 -.56A .01|1.04 8.2|1.03 6.2| .45 .50| 59.5 58.4| .00| ENVINS5 |
| 7 105042 45209 -.57A .01|1.21 9.9|1.16 9.9| .52 .49| 58.1 58.7| .01| ENGPAR3 |
| 21 104130 44633 -.59A .01| .85 -9.9| .80 -9.9| .61 .49| 67.8 58.8| .00| ENGREL14|
| 5 129736 51223 -.59 .01|1.66 9.9|1.57 9.9| .38 .50| 59.7 66.3| .00| ENGPAR1 |
| 14 110606 47723 -.60A .01|1.29 9.9|1.24 9.9| .47 .49| 54.7 58.4| .00| ENGREL7 |
| 44 105521 45198 -.60A .01|1.07 9.5|1.05 7.2| .44 .49| 58.3 59.0| .01| SAFPSF6 |
| 39 129842 51172 -.62A .01|1.08 9.9| .93 -7.7| .56 .50| 69.1 66.6| .01| SAFPSF1 |
| 15 113456 48652 -.65A .01| .92 -9.9| .89 -9.9| .56 .49| 63.8 58.7| .01| ENGREL8 |
| 24 129741 50861 -.67A .01|1.33 9.9|1.23 9.9| .49 .50| 67.1 67.4| .02| SAFBUL3 |
| 47 130390 51137 -.67A .01|1.19 9.9|1.07 7.6| .51 .50| 67.8 67.4| .02| ENVDIS3 |
| 6 113998 48278 -.72A .01|1.15 9.9|1.07 9.0| .54 .48| 62.1 59.4| .00| ENGPAR2 |
| 22 354361 144350 -.77A .00|1.00 1.2| .90 -9.9| .59 .49| 67.7 62.9| .00| SAFBUL1 |
| 53 356836 144258 -.84A .00|1.05 9.9| .93 -9.9| .56 .49| 66.1 63.5| .01| ENVINS2 |
| 16 119107 48619 -.99A .01| .90 -9.9| .82 -9.9| .61 .47| 69.7 61.5| .01| ENGREL9 |
| 54 135430 51147 -1.06A .01|1.01 1.4| .86 -9.9| .54 .47| 75.0 72.4| .02| ENVINS3 |
| 59 119775 48331 -1.08A .01|1.00 .2| .88 -9.9| .58 .46| 67.8 62.6| .01| ENVINS8 |
| 40 135990 51182 -1.08 .01|1.44 9.9|1.24 9.9| .42 .46| 71.0 72.6| .00| SAFPSF2 |
| 10 136865 51134 -1.19A .01|1.06 7.0| .87 -9.9| .53 .45| 76.8 73.9| .02| ENGREL3 |
| 31 110085 43416 -1.23 .01|2.27 9.9|2.73 9.9| .24 .45| 56.7 64.5| .00| SAFBUL11|
| 2 137670 51269 -1.23A .01|1.11 9.9| .92 -7.0| .52 .45| 76.8 74.6| .02| ENGCLC2 |
| 42 114381 45244 -1.26 .01|2.06 9.9|2.15 9.9| .32 .45| 58.4 64.2| .00| SAFPSF4 |
| 28 120598 47104 -1.36 .01|2.11 9.9|1.98 9.9| .36 .44| 62.8 65.7| .00| SAFBUL7 |
| 37 117366 44577 -1.58 .01|1.70 9.9|1.57 9.9| .37 .42| 68.1 69.5| .00| SAFEMO6 |
| 30 115122 43443 -1.66 .01|2.41 9.9|2.88 9.9| .22 .41| 69.6 70.5| .00| SAFBUL10|
| 43 117086 43970 -1.71 .01|2.05 9.9|2.57 9.9| .29 .41| 73.0 71.3| .00| SAFPSF5 |
| 27 121136 44965 -1.92 .01|2.31 9.9|2.41 9.9| .23 .39| 73.8 73.3| .00| SAFBUL6 |
Appendix G1: Engagement Items
Indicator Grade Item code Item prompt
Cultural and Linguistic Competence
5, 8, 10 ENGCLC1 Adults working at this school treat all students respectfully.
5 ENGCLC2 Teachers at this school accept me for who I am.
8 ENGCLC3 My textbooks or class materials include people and examples that reflect my race, cultural background and/or identity.
10 ENGCLC4 I am encouraged to take upper level courses (honors, AP).
Relationships
5, 8, 10 ENGREL1 Students respect one another.
5 ENGREL2 Students will help other students, even if they are not close friends.
5 ENGREL3 My teachers care about me as a person.
5 ENGREL4 Students at my school get along well with each other.
8, 10 ENGREL5 Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation.
8 ENGREL6 Teachers are available when I need to talk with them.
8 ENGREL7 If I am absent from school, there is a teacher or other adult that will notice I was not in class.
8 ENGREL8 Teachers encourage students to respect different points of view when expressed in class.
8 ENGREL9 My teachers care about my academic success.
8 ENGREL10 My teachers inspire confidence in my ability to do well in school.
10 ENGREL11 My teachers are approachable if I am having problems with my class work.
10 ENGREL12 At our school, a teacher or some other adult is available to help students who have experienced sexual assault or dating violence.
10 ENGREL13 Adults at our school are respectful to student ideas even if the ideas expressed are different from their own.
10 ENGREL14 My teachers promote respect among students.
Participation
5 ENGPAR1 I get the chance to take part in school events (for example, science fairs, art or music shows).
8 ENGPAR2 My parents feel respected when they participate at our school (e.g., at parent-teacher conferences, open houses).
10 ENGPAR3 I feel welcome to participate in extra-curricular activities offered through our school, such as school clubs or organizations, musical groups, sports teams, or student council.
Appendix G2: Safety Items
Domain Grade Item code Item prompt
Emotional
5, 8, 10 SAFEMO1 Teachers support (help) students who come to class upset.
5 SAFEMO2 At our school, students learn to care about other students' feelings.
5 SAFEMO3 I am happy to be at our school.
8 SAFEMO4 I feel comfortable reaching out to teachers/counselors for emotional support if I need it.
8 SAFEMO5 Teachers and adults are interested in my well-being beyond just my class work.
10 SAFEMO6 I have at least one friend who I can count on to support me.
10 SAFEMO7 I feel as though I belong to my school community.
Physical
5 SAFPSF1 I feel safe at our school.
5 SAFPSF2 If I heard about a threat to our school or to my classmates, I would report it to an adult.
8 SAFPSF3 Students at this school damage and/or steal other students' property.
8 SAFPSF4 I have seen students with weapons at our school.
10 SAFPSF5 I sometimes stay home because I don’t feel safe at our school.
10 SAFPSF6 Students know what to do if there is an emergency at school.
Bullying/cyber bullying
5, 8, 10 SAFBUL1 If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help.
5 SAFBUL2 I have been punched or shoved by other students more than once in the school or in the playground.
5 SAFBUL3 Teachers don't let students pick on other students in class or in the hallways.
8 SAFBUL4 Students at this school try to stop bullying when they see it happening.
8 SAFBUL5 Students have spread rumors or lies about me more than once on social media.
8 SAFBUL6 I have been teased or picked on more than once because of my religion.
8 SAFBUL7 I have been threatened by other students more than once on social media.
10 SAFBUL9 Teachers, students, and the principal work together in our school to prevent bullying.
10 SAFBUL10 I have been teased or picked on more than once because of my real or perceived sexual orientation.
10 SAFBUL11 I have been teased or picked on more than once because of my race or ethnicity.
Appendix G3: Environment Items
Domain Grade Item code Item prompt
Instructional
5, 8, 10 ENVINS1 Students help each other learn without having to be asked by the teacher.
5, 8, 10 ENVINS2 My teachers are proud of me when I work hard in school.
5 ENVINS3 My teachers help me succeed with my school work when I need help.
5 ENVINS4 My teachers use my ideas to help my classmates learn.
8, 10 ENVINS5 My teachers set high expectations for my work.
8 ENVINS6 My teachers give me individual help with my school work when I need help.
8 ENVINS7 I have a choice in how I show my learning (e.g., write a paper, prepare a presentation, make a video).
8 ENVINS8 My teachers believe that all students can do well in their learning.
8 ENVINS9 My school work is appropriately challenging.
10 ENVINS10 I am not scared to make mistakes in my teachers' classes.
10 ENVINS11 My teachers support me even when my work is not my best.
10 ENVINS12 The things I am learning in school are relevant (important) to me.
10 ENVINS13 Teachers ask students for feedback on their classroom instruction.
Mental health
5 ENVMEN1 In school, I learn how to control my feelings when I am angry or upset.
5 ENVMEN2 I feel comfortable talking to my teacher(s) about something that is bothering me.
8 ENVMEN3 Our school offers guidance to students on how to mediate (settle) conflicts by themselves.
8 ENVMEN4 If I need help with my emotions (feelings), help is available at my school.
10 ENVMEN5 Students at this school try to work out their problems with other students in a respectful way.
10 ENVMEN6 I have access to help at school if I am struggling emotionally or mentally.
Discipline
5, 8, 10 ENVDIS1 Students have a voice in deciding school rules.
5 ENVDIS2 School rules are fair for all students.
5 ENVDIS3 Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well.
8 ENVDIS4 School staff are consistent when enforcing rules in school.
8 ENVDIS5 In school, students learn how to control their behavior.
10 ENVDIS6 The consequences for inappropriate behavior are enforced fairly.
10 ENVDIS7 Teachers give students a chance to explain their behavior when they do something wrong.
Appendix H: Person Reliability of VOCAL scale, grade-level VOCAL scales, and dimension sub-scales

Ranges are Real–Model². PSR = Person Separation Reliability; PSI = Person Separation Index; H = Person Strata.

Scale (Persons; items)¹                 PSR          PSI (G)      Strata (H)   Mean ± SD³
Overall School Climate (148,824; 70)    0.88–0.90    2.67–2.98    3.9–4.3      1.33 ± 1.16
Grade 5 items (51,384; 24)              0.86–0.88    2.47–2.74    3.6–4.0      1.81 ± 1.22
Grade 8 items (50,334; 33)              0.88–0.91    2.74–3.14    4.0–4.5      1.05 ± 1.01
Grade 10 items (47,106; 29)             0.86–0.88    2.45–2.71    3.6–3.9      1.10 ± 1.09
Engagement items (148,338; 21)          0.69–0.73    1.50–1.65    2.3–2.5      1.36 ± 1.30
Safety items (148,338; 23)              0.68–0.73    1.46–1.65    2.3–2.5      1.43 ± 1.47
Environment items (148,364; 26)         0.76–0.80    1.79–2.00    2.7–3.0      1.35 ± 1.24

¹7 common items across grades 5, 8, and 10; 2 common items across grades 8 and 10; the 62 remaining items are distributed across the three grades. ²Real PSR is the lower bound of reliability; Model PSR is the upper bound. ³SD = standard deviation.
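The separation statistics reported above are linked by the standard Rasch relationships PSR = G² / (1 + G²) and Strata H = (4G + 1) / 3. As an illustrative cross-check (assuming Winsteps's usual definitions, which the report does not restate), the "real" overall-scale separation index reproduces the tabled PSR and strata values:

```python
def psr_from_g(g: float) -> float:
    """Person Separation Reliability from the separation index G: PSR = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def strata_from_g(g: float) -> float:
    """Number of statistically distinct person strata: H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

# Overall School Climate scale, "real" (lower-bound) estimate: G = 2.67
print(round(psr_from_g(2.67), 2))     # 0.88, matching the tabled Real PSR
print(round(strata_from_g(2.67), 1))  # 3.9, matching the tabled Real strata
```

The same formulas reproduce the model (upper-bound) column from G = 2.98.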
Appendix I1: DIF Plot: Economically Disadvantaged (ECODIS)
[Figure: DIF plot (DIF=@ECODIS). Item DIF measures (difficulty, in logits; axis from -2.5 to 2.5) for each of the 70 items, plotted separately for economically disadvantaged and non-economically disadvantaged students.]
Appendix I2: DIF Plot: Students with Disabilities (SWD)
[Figure: DIF plot (DIF=@SWD). Item DIF measures (difficulty, in logits; axis from -2.5 to 2.5) for each of the 70 items, plotted separately for students with and without disabilities.]
Appendix I3: DIF Plot: English Language Learner (ELL)
[Figure: DIF plot (DIF=@ELL). Item DIF measures (difficulty, in logits; axis from -2.5 to 2.5) for each of the 70 items, plotted separately for English learners and non-English learners.]
Appendix J: Winsteps Residual Analyses Output
Table of STANDARDIZED RESIDUAL variance in Eigenvalue units = ITEM information units

                                       Eigenvalue   Observed            Expected
Total raw variance in observations  =    117.2517     100.0%              100.0%
  Raw variance explained by measures =    47.2517      40.3%               44.5%
    Raw variance explained by persons =   26.5307      22.6%               25.0%
    Raw variance explained by items  =    20.7210      17.7%               19.5%
  Raw unexplained variance (total)   =    70.0000      59.7%    100.0%     55.5%
    Unexplned variance in 1st contrast =    2.5826       2.2%      3.7%
    Unexplned variance in 2nd contrast =    2.2237       1.9%      3.2%
    Unexplned variance in 3rd contrast =    2.0287       1.7%      2.9%
    Unexplned variance in 4th contrast =    1.7088       1.5%      2.4%
    Unexplned variance in 5th contrast =    1.5612       1.3%      2.2%
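Each "Observed" percentage in the Winsteps table is simply that component's eigenvalue divided by the total raw variance. A quick recomputation of the main rows:

```python
# Eigenvalues from the standardized-residual variance table above
total = 117.2517
components = {
    "measures": 47.2517,
    "persons": 26.5307,
    "items": 20.7210,
    "unexplained (total)": 70.0000,
    "1st contrast": 2.5826,
}

# Observed % = 100 * eigenvalue / total raw variance
for name, eigenvalue in components.items():
    print(f"{name:22s} {100 * eigenvalue / total:5.1f}%")
# measures -> 40.3%, persons -> 22.6%, items -> 17.7%,
# unexplained -> 59.7%, 1st contrast -> 2.2%, matching the table
```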
Appendix K: Transformation of Logit Scores
To transform student-level person measures into interpretable school-level scores, the following steps were taken:

1. The school climate person measures were exported from Winsteps based on the joint calibration of all students (students from all three grades included).

2. Each person's logit measure was standardized by subtracting the mean of the overall school climate measure from each student's score and dividing by the standard deviation of the overall school climate measure:

   sclstd = (person school climate measure − mean of school climate measure) / (standard deviation of school climate measure)

   where sclstd is the person's standardized school climate measure.

3. The standardized estimates were then multiplied by 20, and 50 was added to each individual score.

As a result of this process, student scores were centered at 50 with a standard deviation of 20. Upon aggregation to the school level, scores were truncated to range from 1 to 99. School-level scores had a mean of 50.05 and a standard deviation of 12.83. A similar process was used for each dimension score.
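The three-step transformation can be sketched in Python. This is a minimal illustration, not the Department's actual implementation: it assumes the population standard deviation (the report does not state whether a sample or population SD was used) and assumes school aggregation is a simple mean of student scores.

```python
import statistics

def to_scaled_scores(logit_measures):
    """Standardize person logit measures, then rescale to mean 50, SD 20 (steps 2-3)."""
    mean = statistics.mean(logit_measures)
    sd = statistics.pstdev(logit_measures)  # assumption: population SD
    return [50 + 20 * (m - mean) / sd for m in logit_measures]

def school_score(student_scores):
    """Aggregate student scaled scores to a school score, truncated to the 1-99 range."""
    return min(max(statistics.mean(student_scores), 1), 99)
```

By construction, the student scores returned by `to_scaled_scores` have a mean of 50 and a standard deviation of 20; truncation to 1–99 is applied only after aggregation to the school level, as described above.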