2017 Views of Climate and Learning (VOCAL) Validity Study 2017 MCAS Questionnaire April 2018

Massachusetts Department of Elementary and Secondary Education
75 Pleasant Street, Malden, MA 02148-4906
Phone 781-338-3000  TTY: N.E.T. Relay 800-439-2370
www.doe.mass.edu

This document was prepared by the Massachusetts Department of Elementary and Secondary Education

Jeff Wulfson, Acting Commissioner

The Massachusetts Department of Elementary and Secondary Education, an affirmative action employer, is committed to ensuring that all of its programs and facilities are accessible to all members of the public.

We do not discriminate on the basis of age, color, disability, national origin, race, religion, sex, gender identity, or sexual orientation. Inquiries regarding the Department’s compliance with Title IX and other civil rights laws may be directed to the

Human Resources Director, 75 Pleasant St., Malden, MA 02148-4906. Phone: 781-338-6105.

© 2017 Massachusetts Department of Elementary and Secondary Education

Permission is hereby granted to copy any or all parts of this document for non-commercial educational purposes. Please credit the "Massachusetts Department of Elementary and Secondary Education."

This document printed on recycled paper


Table of Contents

1. Purpose of this Report
2. Survey Design, Survey Administration, and Profile of Respondents
   2.1. School climate construct
   2.2. Survey design principles
   2.3. Item and measure development
   2.4. Stakeholder engagement
   2.5. Form building
   2.6. Form linking and anchoring process
   2.7. Administration of forms
   2.8. Profile of respondents
3. Data Analyses Procedures
   3.1. Rasch methodology
4. Validity Framework
   4.1. Validity framework
5. Validity Evidence for VOCAL scales and sub-scales
   5.1. Content validity
   5.2. Substantive validity
   5.3. Generalizability
      5.3.1. Reliability evidence
      5.3.2. Differential item functioning (DIF) analyses
   5.4. Structural validity
      5.4.1. Residual analyses
      5.4.2. Sub-scale correlations
   5.5. External validity
      5.5.1. Student-level responsiveness
      5.5.2. School-level responsiveness and score reporting
      5.5.3. Criterion validity
6. Conclusion
7. References
8. Appendices
   A. VOCAL test specification
   B. MCAS student questionnaires (VOCAL forms)
   C. Rasch model and logit unit of measurement
   D. Guide to evaluating Rasch model validity data
   E. Technical quality of 70-item VOCAL scale
   F. Measure order of 70-item VOCAL scale
   G. Item prompts for engagement, safety, and environment domains
   H. Person reliability of VOCAL scale, grade-level scales and sub-scales
   I. Subgroup DIF plots
   J. Residual analyses output
   K. Transformation of logit measures

1. Purpose of this Report

This report offers reliability and validity evidence to support the use of DESE’s Views of

Climate and Learning (VOCAL) school climate survey. DESE sought to develop a school

climate instrument that would (1) differentiate levels of school climate within and between

schools, and (2) provide schools and districts with concrete, actionable information about school

climate in order to support continuous improvement. A positive school environment is associated

with healthier social and emotional well-being, reduced substance abuse, and decreased

behavioral problems of students in school (Thapa, Cohen, Guffey and Higgins-D’Alessandro,

2013), and is positively related to students’ academic success (Berkowitz, Moore, Astor, and

Benbenishty, 2017; Hough, Kalogrides, and Loeb, 2017). This technical report provides

information on the survey development process used to develop three forms of the school climate

survey that measure students’ views in grade 5, grade 8, and grade 10, respectively. The report

makes available the results of the reliability and validity analyses performed to justify the use

and reporting of VOCAL scores to schools and districts.

This report is intended for readers with knowledge of survey development and validation,

psychometrics and educational measurement. Familiarity with Wolfe and Smith’s (2007a,

2007b) and Messick’s (1995) construct validity frameworks for instrument development is

helpful. Evidence from six aspects of survey validity (content, substantive, structural,

generalizability, external, and consequential) combine to support the use of VOCAL scores.

2. Survey Design, Survey Administration and Profile of Respondents

Instrument development relied on a three-pronged strategy: (1) defining the school climate

construct, (2) incorporating stakeholder feedback to support item and instrument development;

and (3) using Rasch theory to ideate and guide item development and validity analyses. VOCAL

instrument development and validity activities are summarized in Figure 1.

2.1. School Climate Construct

DESE used the United States Department of Education’s (USED, 2017) conceptual framework

for the school climate construct, with survey items designed to measure student perceptions of

three dimensions of school climate: engagement, safety and environment. Each dimension is

further divided into three domains/topics. For example, the engagement dimension consists of

items measuring cultural and linguistic competence, teacher/adult-on-student relationships and

student-on-student relationships, and participation in school life. The conceptual framework with

construct domain definitions is outlined in Table 1. Items from publicly available school climate

instruments were evaluated for inclusion, with a review of school climate research articles

conducted to help ideate new item development. DESE leveraged work done during the

development of its educator evaluation student feedback surveys (SFS), with several SFS items

adapted for potential inclusion in the school climate surveys.

Figure 1: VOCAL scale development process

Table 1

VOCAL’s conceptual framework

Dimension: Engagement (ENG)
  Cultural and Linguistic Competency (CLC): The extent students feel the school/staff value diversity, manage dynamics of differences, avoid stereotypes, and acquire cultural knowledge.
  Relationships (REL): The extent students feel there is a social connection and respect between staff/teachers and students, and between students and their peers.
  Participation (PAR): The extent students feel they or their parents are engaged in school events.

Dimension: Safety (SAF)
  Emotional Safety (EMO): The extent students feel a bond to the school, and students/teachers/adults support the emotional needs of others.
  Physical Safety (PSF): The extent that students feel physically safe within the school environment and know how to respond to threats to themselves or the school.
  Bullying/Cyber-bullying (BUL): The extent that students report different types of bullying behaviors that occur in the school and the extent that school/staff/students try to counteract bullying.

Dimension: Environment (ENV)
  Instructional (INS): The extent that students feel the instructional environment is engaging, challenging and supportive of learning.
  Mental Health (MEN): The extent that students learn to self-manage their feelings and get support if needed.
  Discipline (DIS): The extent that discipline is fair, applied consistently and evenly, and a shared responsibility.

2.2. Survey Design Principles

It was important that the surveys were designed with the rigor expected of cognitive tests. When

developing measures in the Rasch framework, best test design (Wright & Stone, 1979) involves:

- Items that are evenly spaced from easiest to hardest;
- The average item difficulty (usually set to zero) centered at the mean of the target or student distribution;
- Survey items that are sufficiently dispersed to cover the target distribution;
- Items from different dimensions/domains (topics of school climate) overlapping each other on the item-person continuum; and
- A test of appropriate length to provide the responsiveness required to differentiate performance.

These psychometric criteria were adopted and used to guide the selection of items for the school

climate survey. However, it is important to stress that the stakeholder engagement and feedback

discussed in the next section was the key driver for item selection.

2.3. Item and Measure Development

DESE developed items using a hierarchical perspective. DESE first identified what behaviors,

practices, or systems create the foundation for a positive school climate; students are more likely to respond affirmatively to these foundational items. DESE then identified

behaviors, practices, or systems that represent exemplary school climates. These

behaviors/practices/systems, by their nature, are more difficult to enact within schools and

students are likely to have greater difficulty responding affirmatively to items designed to

measure them. Once these behaviors/practices/systems were identified, items were developed or

acquired from publicly available surveys to measure and anchor the two ends of the school climate

continuum. The next step in the item development process was to develop or obtain publicly

available items to fill in the continuum. Therefore, the rating scale (always true to never true)

combined with the hypothesized distribution of item difficulties is designed to stretch the item

calibrations and person distribution along the school climate continuum for each dimension and

provide meaningful differentiation of student perceptions. This process was also used to develop

items for each domain and helped ensure that item and measure development conformed to best

practice.

Items for the grade 5 form were simplified to ensure students could read and understand the

content. For example, the item, “Adults working at this school treat all students respectfully,

regardless of students’ race, culture, family background, sex, or sexual orientation” was

administered in grade 8 and grade 10; the corresponding item in grade 5 was, "Adults working at

this school treat all students respectfully". Items were also developed for the specific school

climate context. For example, the item, “At our school, a teacher or other adult is available for

students if they need help because of sexual assault or dating violence” was only administered on

the grade 10 survey. Similarly, items related to cyber-bullying were placed on the grade 8 form

to account for the predominance of this type of bullying in middle-school grades. Once items

were selected or developed, they were reviewed by different stakeholder groups.

2.4. Stakeholder Engagement

Multiple stakeholder groups (agency experts, student advisory councils, principal and teacher

advisory councils, and special interest groups) met to review items. The item review process also

prompted new item development. Three to four times the number of items needed for the final

surveys were developed or selected, and students and other stakeholders were asked to rate them.

The item review process was designed to ensure item representativeness (did each item measure the concept it was designed to measure?), accessibility (would students understand it?),

actionability (would schools be able to use the information?), and responsiveness (would the

items measure a continuum of student perceptions that differentiate strong school climates from

relatively weak ones?). Stakeholders worked in groups to review, revise, and reject items.

To further ensure items placed on the grade 5 form met these inclusion criteria, cognitive

interviews were undertaken with a small, but diverse group of fifth-graders. The purpose of these

interviews was to elicit and probe whether students understood the item content in accordance

with the item developer’s intent. Participants in the cognitive interviews reported that most of the

items were easy to understand. The interviews, however, did result in DESE simplifying the

content and readability of some items. Through a deliberative process, the items that survived the

review process were placed on the three forms of the school climate survey; each grade-level

form was designed to meet the best survey design criteria highlighted previously.

2.5. Form Building

DESE administered three parallel forms of the VOCAL survey in the spring of 2017; the number

of items on each form was:

- 24 items for grade 5 students,
- 34 items for grade 8 students, and
- 29 items for grade 10 students.

Each survey measured the breadth of the school climate construct and included common items

that were used to place all student responses onto the same scale metric; common items

represented over 30% of the total number of items on each form. The number and types of items

on each form are shown in Figure 2, with a detailed “test” specification found in Appendix A.

Figure 2. Form building for VOCAL surveys

This methodology allowed DESE to try out as many items as possible without over-burdening

respondents; seventy-one items were tested in total. However, after administration, one grade 8 item was removed from consideration and not included in the validity analyses. A further linking criterion was to ensure that the mean and standard deviation of the common items

approximate the mean and standard deviation of the whole set of items used. Common items

should represent the breadth of the school climate construct and approximate the average item

difficulty and variance of all 70 items (Engelhard, 2013). The 70 items had an average item

difficulty of 0.00 logits and a standard deviation of 0.75 logits; the corresponding average item

difficulty for the 9 common items was 0.07, with a standard deviation of 0.99.
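This linking criterion can be checked directly by comparing summary statistics of the anchor items against the full item set. A minimal sketch in Python; the nine difficulty values below are hypothetical, chosen only to mirror the reported summary statistics (mean 0.07, SD 0.99), and the tolerances are our assumptions:

```python
import statistics

def summarize(difficulties):
    """Mean and (population) standard deviation of item difficulties, in logits."""
    return statistics.mean(difficulties), statistics.pstdev(difficulties)

# Hypothetical difficulties for nine common items (logits); illustrative only.
common = [-1.4, -0.9, -0.5, -0.1, 0.1, 0.4, 0.8, 1.3, 1.9]
mean, sd = summarize(common)

# The full 70-item set was centered at 0.00 logits with SD 0.75; the common
# items should approximate those values (tolerances here are assumptions).
assert abs(mean - 0.00) < 0.30
assert abs(sd - 0.75) < 0.50
```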

A Likert scale with four response options was used to rate students’ perceptions of school

climate; coding for all items dictated that a response of "0" (never true) indicated the lowest level of school climate, with a "3" (always true) denoting the most positive school climate. Response

scoring categories “1” and “2” corresponded to mostly untrue and mostly true, respectively.

Note, seven bullying-behavior and three physical safety items were reverse-scored. The three

forms as they appeared for students are provided in Appendix B.
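The 0-3 coding and reverse scoring described above can be sketched as follows (a minimal illustration; the function name and label handling are ours, not DESE's operational scoring code):

```python
# Map the four Likert labels to 0-3 scores; reverse-score negatively worded
# items (e.g., bullying-behavior items) so that a 3 always denotes the most
# positive school climate.
SCORES = {"never true": 0, "mostly untrue": 1, "mostly true": 2, "always true": 3}
MAX_SCORE = 3

def score_response(label, reverse=False):
    """Convert a Likert label to a 0-3 score, flipping reverse-keyed items."""
    raw = SCORES[label.lower()]
    return MAX_SCORE - raw if reverse else raw

# Positively worded item: "always true" signals the most positive climate.
assert score_response("Always true") == 3
# Bullying-behavior item: "never true" signals the most positive climate.
assert score_response("Never true", reverse=True) == 3
```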

2.6. Form Linking and Anchoring Process

Each grade form was first calibrated separately to assess the invariance of the common items.

The Pearson product-moment correlation (henceforth Pearson) of the common item difficulties

was 0.9 or above for each paired comparison. Common item invariance allowed DESE to

concurrently calibrate all 70 items on to the same scale metric. Figure 3 illustrates this process.

Figure 3. Concurrent calibration process of grade 5, grade 8, and grade 10 surveys

[Schematic: the grade 5, grade 8, and grade 10 forms each contain a block of unique items plus a shared block of common items; the common block links all three forms onto one scale for concurrent calibration. Figure template taken from Linacre (2017).]
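The invariance check described above (a Pearson correlation of at least 0.9 between each pair of separately calibrated common-item difficulty sets) can be sketched in Python; the difficulty values below are hypothetical, not the actual calibrations:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical difficulties (logits) of the seven cross-form common items,
# as calibrated separately on two grade forms; values are illustrative only.
g5_difficulties = [-0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 1.4]
g8_difficulties = [-0.7, -0.4, 0.1, 0.3, 0.4, 1.0, 1.3]

r = pearson(g5_difficulties, g8_difficulties)
assert r > 0.9  # the invariance criterion applied in the report
```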

Figure 4 graphically shows the relationship between the seven items common across the three

grade forms. The item difficulties of the two additional items linking the grade 8 and grade 10

forms were also highly correlated (data not shown).

Figure 4. Relationship of common items across grade forms

Anchoring Process. The seven items common to each form were simultaneously calibrated,

along with the two additional items common to the grade 8 and grade 10 forms. To reduce

positioning effects, common items were placed in the same fixed position on each of the three

forms. Once the common items were placed in their item slots, the remaining unique items were

randomly assigned positions on each form. This process allowed placement of all 70 pilot items

on to the same scale metric. The ensuing validity analyses (and review of the item-variable map

for the relative difficulty, ordering and spacing of items), revealed that 55 of the 70 items were

well-fitting and could be anchored. Anchoring of polytomous items requires fixing the items’

difficulty parameters, and fixing the item Andrich step threshold parameters of the rating scale

(Linacre, 2017). This anchoring process was applied to the 55 items of the VOCAL scale and

helps ensure the comparability of VOCAL scores when reported out to schools and districts.

2.7. Administration of Forms

In grades 5 and 8, the forms were administered as part of the Massachusetts Comprehensive

Assessment System (MCAS) science achievement test. In grade 10, the form was attached to

the mathematics MCAS test. The forms were paper-based and attached to the end of the last test

session of the science or mathematics assessments, respectively. Students marked their responses

in their student answer booklets.

2.8. Profile of Respondents

The sampling frame included students in grades 5, 8 and 10. Students who participated in

MCAS-Alternative were not included in the sampling frame, so a census was not feasible. In

addition, participation in the survey was optional for districts, schools and students. As a result,

73% of fifth graders, 70% of eighth graders, and 64% of tenth graders participated in the

surveys. Eighty-nine percent of districts and fifty-six percent of schools in Massachusetts received VOCAL scores. When comparing the sample of students to

the state student profile, the profile of the sample is reasonably representative of the state at each

grade level. The sample profile with state comparison is provided in Table 2. Students with

disabilities (SWD), English Language Learners (ELL), and economically disadvantaged students

are under-represented in grade 8 and grade 10; black and Hispanic students are also under-

represented in grade 10.

Table 2

Participating students’ profile (all values are percentages)

Subgroup Grade 5 Grade 8 Grade 10 State

Female 49.3 49.7 50.2 48.7

Male 50.7 50.3 49.7 51.3

Asian 6.4 6.0 8.8 6.7

Black 9.0 8.4 6.2 8.9

Hispanic 20.7 17.7 16.2 19.4

Mixed-race 3.4 2.8 2.7 3.4

Native American 0.2 0.2 0.2 0.2

Pacific Islander 0.1 0.1 0.1 0.1

White 60.2 64.7 65.7 61.3

Students with disabilities 17.8 15.8 14.8 17.4

English Language Learners 8.3 6.5 4.9 9.5

Economically disadvantaged 30.8 26.1 25.0 30.2

3. Data Analyses Procedures

3.1. Rasch Methodology

Analyses using the Rasch measurement model (Rasch, 1960) and validity framework (Wolfe &

Smith, 2007a, 2007b) are the primary source of reliability and validity data for the VOCAL

survey measures. The Rasch model, which uses a log-odds (logit) transformation to place ordinal Likert responses onto an equal-interval logit scale, was used to analyze student responses.

Winsteps software developed by Linacre (2017) was used to perform Rating Scale model

analyses of the data (Andrich, 1978a, 1978b). Technical details explaining the Rasch model are

provided in Appendix C. In the Rasch framework, the scale metric has the desirable structural properties of a Rasch scale: it is linear, unidimensional (measuring only one construct), and hierarchical (items are ordered according to their difficulty to endorse), and it spans a continuum of items and persons. The evaluation criteria used to perform a Rasch-based

reliability and validity assessment for each construct validity aspect (content, substantive,

generalizability, structural and external) are summarized in the next section.
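The rating scale form of the Rasch model used in these analyses can be sketched as follows. Under the Andrich rating scale model, the probability of each response category depends only on the person measure theta, the item difficulty delta, and step thresholds tau shared by all items. This is a minimal illustration, not the operational Winsteps implementation; the thresholds shown are those later reported for the VOCAL scale (Figure 5):

```python
import math

def rsm_probs(theta, delta, taus):
    """Andrich rating scale model: returns P(X = k) for k = 0..len(taus),
    where psi_0 = 0 and psi_k = sum_{j<=k} (theta - delta - tau_j)."""
    psis = [0.0]
    for tau in taus:
        psis.append(psis[-1] + theta - delta - tau)
    exps = [math.exp(p) for p in psis]
    total = sum(exps)
    return [e / total for e in exps]

# Andrich thresholds reported for the VOCAL rating scale (Figure 5).
taus = [-1.32, -0.44, 1.76]

p_low = rsm_probs(-2.0, 0.0, taus)   # person well below the item's difficulty
p_high = rsm_probs(2.5, 0.0, taus)   # person well above the item's difficulty

assert abs(sum(p_low) - 1.0) < 1e-9  # category probabilities sum to one
assert p_low[0] == max(p_low)        # low measure: "never true" most likely
assert p_high[3] == max(p_high)      # high measure: "always true" most likely
```

The hierarchical ordering on the logit scale falls out of this model: the higher a person's measure relative to an item's difficulty, the more probable the higher response categories become.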

4. Validity Framework and Validity Evidence

4.1. Validity Framework

Messick’s (1980, 1995) unified concept of construct validity guided the validity analyses for the

school climate construct. Messick (1995, p. 741) defines validity as “an evaluative judgment of

the degree to which empirical evidence and theoretical rationales support the adequacy and

appropriateness of interpretations and actions on the basis of test scores or other modes of

assessment.” Evidence from six aspects of test validity (content, substantive, generalizability,

structural, external and consequential) combine to provide test developers with the justification

to claim that the meaning or interpretability of the test scores is trustworthy and appropriate for

the test’s intended use. More recently, Wolfe and Smith (2007a, 2007b, p. 205) used Messick’s

validity conceptualization to detail instrument development activities and evidence that are

needed to support the use of scores from instruments based on the Rasch measurement

framework. Table 3 outlines the specific validity aspects addressed in this technical report. This

report primarily focuses on internal validity with more limited external validity evidence

provided for the school climate construct. Section 5 explains each aspect of construct validity outlined in Table 3.

Table 3

Rasch-Based Instrument Validity Framework and Evidence Collected for VOCAL survey1

Validity Aspect: Evidence

Content: Instrument Purpose; Test Specification; Expert Reviews2; Item Technical Quality
Substantive: Rating Scale Functioning; Item Difficulty Hierarchy
Generalizability: Differential Item Functioning; Person Separation Reliability; Item Invariance
Structural: Rasch Dimensionality Analyses; Sub-scale Correlations
External: Responsiveness; Relationship between VOCAL scaled scores and scores from similar/dissimilar constructs
Consequential2: Standard Setting; Score Use

1 Based on Messick (1995) and Wolfe and Smith (2007b) conceptualization and representation.
2 Standard setting was not a focus of this pilot study.

5. Validity Evidence for VOCAL scale and sub-scales

The majority of this report is dedicated to the validity evidence needed to support VOCAL score

use. DESE will present data for five aspects of construct validity: content, substantive,

generalizability, structural and external. Consequential validity was beyond the scope of this

pilot administration. Appendix D provides a guide to the validity criteria used in this study for

each aspect of construct validity.

5.1. Content Validity

Content validity examines the “content relevance, representativeness and technical quality”

(Messick, 1995, p.745) of the items used as indicators of the construct. Stakeholder engagement

activities (Figure 1) ensured that the items were relevant and representative and, more

importantly, had the potential to provide schools with diagnostic and actionable information. The

content validity evidence reported here predominantly focuses on the technical quality of the

VOCAL survey items. Item technical quality was assessed using point-to-measure (PTM)

correlations and item fit statistics. The PTM correlations and item fit statistics are shown in

Appendix E. Fit statistics above 1.5 indicate that the items may not measure the construct of

interest; these items have additional source(s) of variance and can degrade measurement

(Appendix D). Twelve of the seventy pilot items had outfit Mean Square Errors (MNSQE) of

greater than 1.5. Only five of the twelve misfitting items, however, had PTMs below 0.3, which suggests these five items are poorly related to the school climate construct.
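The screening criteria just described (outfit MNSQE above 1.5, PTM below 0.3) can be applied mechanically; a sketch with hypothetical item statistics (the function name and values are ours, not the actual VOCAL output):

```python
def flag_items(item_stats, outfit_max=1.5, ptm_min=0.3):
    """Apply the report's screening criteria: outfit MNSQE above `outfit_max`
    flags possible off-construct variance; PTM below `ptm_min` flags a weak
    relationship to the construct. `item_stats` maps item code -> (outfit, ptm)."""
    misfitting = {code for code, (outfit, _) in item_stats.items() if outfit > outfit_max}
    weak = {code for code, (_, ptm) in item_stats.items() if ptm < ptm_min}
    return misfitting, weak

# Hypothetical item statistics (illustrative values only).
stats = {
    "BUL_1": (1.91, 0.25),   # misfitting and weakly related
    "REL_2": (0.90, 0.55),   # well-fitting
    "SAF_3": (1.62, 0.41),   # misfitting but still construct-related
}
misfitting, weak = flag_items(stats)
assert misfitting == {"BUL_1", "SAF_3"}
assert weak == {"BUL_1"}
```

Note that, as in the report, the two flags are evaluated separately: an item can misfit (extra variance) while still correlating acceptably with the construct.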

In terms of content, nine of the twelve misfitting items were from the safety dimension and seven

of the nine required reverse scoring (six of these, in turn, were related to bullying behaviors).

The bullying-behaviors items were also structured differently from other items within the

surveys, which may have contributed to their separation from the primary dimension (e.g., I have

been threatened by other students more than once on social media.). Conceptually, however, the

presence or absence of bullying-behaviors is integral to accurately measuring students’

perceptions of safety within the school environment and these items were retained for further

validity analyses. The remaining fifty-eight items fit the model well, with outfit MNSQE ranging

from 0.72–1.49 and PTMs ranging from 0.32 to 0.64. Fifty-five of the fifty-eight items were used

to anchor the scale.

5.2. Substantive Validity

Substantive validity assesses whether the responses to the items are consistent with the

theoretical framework used to develop the items. Two pieces of evidence support the substantive

validity aspect of construct validity: these are (1) rating scale use (Figure 5) and (2) item

difficulty hierarchy (Figure 6). For each threshold of the rating scale, the mean square error fit

statistics should be between 0.7 and 1.3 and, on a four-point scale, the distance between

thresholds should be at least 0.8 logits (Appendix D; Wolfe & Smith, 2007b). The rating scale

for the 70-item VOCAL functioned relatively well, with adjacent category thresholds near or greater than 1.0 logit apart. Except for the little-used score category of zero (never true), the

category threshold fit statistics have excellent MNSQE (Figure 5).
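These criteria can be checked directly against the reported category statistics; a sketch (the helper function is ours; the threshold and fit values are those shown in Figure 5):

```python
def check_rating_scale(thresholds, fit_stats, min_gap=0.8, fit_lo=0.7, fit_hi=1.3):
    """Check the Wolfe & Smith (2007b) rating scale criteria: Andrich
    thresholds ascending and at least `min_gap` logits apart, and category
    fit statistics within [fit_lo, fit_hi]."""
    gaps_ok = all(b - a >= min_gap for a, b in zip(thresholds, thresholds[1:]))
    fits_ok = all(fit_lo <= f <= fit_hi for f in fit_stats)
    return gaps_ok and fits_ok

# Andrich thresholds and outfit MNSQE for categories 1-3, from Figure 5.
thresholds = [-1.32, -0.44, 1.76]
assert check_rating_scale(thresholds, [1.07, 0.85, 1.05])

# Category 0's outfit (1.91) falls outside the 0.7-1.3 range, as the text notes.
assert not check_rating_scale(thresholds, [1.91, 1.07, 0.85, 1.05])
```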

A qualitative assessment of how well the item hierarchy corresponds to the instrument

developer’s a priori theoretical expectations provides substantive validity evidence. The overall

item hierarchy across the scale met our a priori expectations in terms of relative difficulty of

each dimension (Figure 6) and in terms of individual items within each dimension. The ordered

pattern of item difficulties also conforms to best test design principles.

Figure 6 shows that items from each dimension span the breadth of the continuum, ranging from items measuring relatively weak school climates to items measuring relatively strong ones. In addition, items

from each dimension overlap as you move from low to high on the continuum. The ordered

pattern of relative item difficulty indicates that safety items (particularly the reverse scored

bullying-behavior and physical safety items) were very easy to endorse compared to items from

the other dimensions. Students feel relatively safe in school, with a comparatively low level of

bullying reported. Safety is a harbinger of a positive school environment and it was expected that

these items would be among the easiest to endorse. Similarly, within the Engagement dimension,

items related to student-on-student relationships were, as expected, harder for students to affirm

than items related to teacher-on-student relationships. This is consistent with past research

(Thomas, 2004; Peoples, O’Dwyer, Wang, Brown, & Rosca, 2014).

Figure 5. Rating scale structure for VOCAL instrument

SUMMARY OF CATEGORY STRUCTURE (Model = "R")

Category (label)      Observed count (%)   Obsvd avge   Sample expect   Infit MNSQ   Outfit MNSQ   Andrich threshold   Category measure
0 (never true)          238,047   (6%)       -0.41        -0.72           1.42          1.91           none               (-2.65)
1 (mostly untrue)       575,328  (14%)        0.20         0.17           1.02          1.07          -1.32                -0.94
2 (mostly true)       1,588,387  (39%)        1.09         1.24           0.95          0.85          -0.44                 0.77
3 (always true)       1,690,182  (41%)        2.29         2.20           1.08          1.05           1.76                (2.94)
Missing               6,325,736  (61%)        1.51

[Category probability curves: the probability of each response category (0-3) plotted against the person-minus-item measure from -3 to +3 logits; each category is modal over a distinct interval of the continuum.]

Items belonging to the three domains of the Environment dimension were relatively harder to

endorse, especially items related to student autonomy and students taking responsibility for their

actions. This ordering of environment items was expected; past classroom environment research

Figure 6. Item-Variable Map for VOCAL

has shown that student autonomy is hard to engender within classrooms and schools, but

important to engaging students (Hafen et al., 2012; Peoples, Abbott, & Flanagan, 2015a, 2015b).

Table 4 below is a specific example of the item hierarchy from the discipline environment

domain. Foundational to a positive discipline environment is the perception that school staff are

fair, supportive and consistent in applying school rules related to discipline. Items such as,

“School rules are fair for all students” that measure this basic environment were, as expected,

easier for students to endorse than items that provide students with a voice in school rules

(e.g., “Students have a voice in deciding school rules.”) or items holding them responsible for their

actions (e.g., “Teachers give students a chance to explain their behavior when they do something

wrong.”) This ordered pattern of discipline item difficulties confirms developers’ a priori

expectations thereby supporting the domain’s substantive validity.

Table 4

Item hierarchy of Discipline Environment items

Item code | Grade administered | Item prompt | Item difficulty (logits)
DIS_1 | 5, 8, 10 | Students have a voice in deciding school rules. | 2.20
DIS_7 | 10 | Teachers give students a chance to explain their behavior when they do something wrong. | 1.38
DIS_5 | 8 | In school, students learn how to control their behavior. | 0.80
DIS_6 | 10 | The consequences for inappropriate behavior are enforced fairly. | 0.46
DIS_2 | 5 | School rules are fair for all students. | 0.23
DIS_4 | 8 | School staff are consistent when enforcing rules in school. | -0.23
DIS_3 | 5 | Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well. | -0.67
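The a priori ordering claim for this domain can be verified against the Table 4 difficulties: every foundational fairness/consistency item should be easier to endorse (lower logits) than the student voice/responsibility items. A quick check (the grouping of items is our interpretive assumption):

```python
# Item difficulties (logits) from Table 4.
difficulties = {
    "DIS_3": -0.67, "DIS_4": -0.23, "DIS_2": 0.23, "DIS_6": 0.46,
    "DIS_5": 0.80, "DIS_7": 1.38, "DIS_1": 2.20,
}

# Grouping assumed for the check: foundational fairness/consistency items
# versus items giving students voice or responsibility for their actions.
foundational = ["DIS_2", "DIS_3", "DIS_4"]
voice_autonomy = ["DIS_1", "DIS_7"]

hardest_foundational = max(difficulties[i] for i in foundational)
easiest_autonomy = min(difficulties[i] for i in voice_autonomy)

# Every foundational item sits below every voice/autonomy item on the scale.
assert hardest_foundational < easiest_autonomy
```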

Appendix F provides the hierarchy for all 70 items; item prompts broken out by dimension are

provided in Appendix G. The well-functioning rating scale, combined with the theoretically grounded 70-item hierarchy, provides the evidence needed to support the substantive validity aspect of the school climate construct.

5.3. Generalizability

A measure is considered generalizable when the score meaning and properties function similarly

across multiple contexts (e.g., stakeholder groups, forms) or time points. Reliability analyses and

differential item functioning (DIF) analyses are used to assess the generalizability of the

measures. Similar to Cronbach’s alpha, person separation reliability (PSR) looks at the stability

(internal consistency) of the measures across the forms (Schumacker and Smith, 2007) and

scoring structures. Reliability indices depict the ratio of true variance to observed variance; in the Rasch model, the internal consistency reliability coefficient, the person separation reliability, is the ratio of the variance of the latent (true) person measures to the variance of the estimated (observed) person measures.

Standard errors are estimated for each person and each item and are used to provide an estimate

of error variance (Schumacker and Smith, 2007). DESE used DIF analyses to empirically test for

item invariance across several subgroups; item invariance ensures comparability of score

interpretation.
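As an illustration of these definitions, the "real" person separation reliability and the separation index can be computed from estimated person measures and their standard errors. This is a minimal sketch with hypothetical function names; Winsteps reports these statistics directly.

```python
import numpy as np

def person_separation_reliability(measures, std_errors):
    """'Real' PSR: (observed variance - mean error variance) / observed variance.
    measures: estimated person measures (logits); std_errors: their standard errors."""
    observed_var = np.var(measures, ddof=1)
    error_var = np.mean(np.square(std_errors))   # mean error variance across persons
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var

def person_separation_index(psr):
    """Separation index G = sqrt(PSR / (1 - PSR)); e.g., a PSR of 0.88 gives G of about 2.7."""
    return (psr / (1.0 - psr)) ** 0.5
```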

5.3.1. Reliability evidence: The mean difficulty of the 70-item scale was +1.33 logits with a

standard deviation of 1.16 logits (Appendix H). The items are reasonably well targeted to the student distribution, resulting in a real person separation reliability (PSR) of 0.88 and a person

separation index of 2.7 (Figure 6; Appendix H). Best test design principles (Wright, 1979) call


for alignment of the mean of the item distribution to the mean of the person distribution. Notable

in Figure 6 is the relative rarity of bullying behaviors when compared to other indicators

assessed; these off-target items likely contributed to the misalignment of the person and item

distributions. However, theoretically, bullying is a critical facet in determining students’

perceptions of the safety and supportive nature of schools; these items were retained for

reporting out scores. The real person separation reliabilities ranged from 0.86 for the 24-item

grade 5 form and the 29-item grade 10 form to 0.88 for the 33-item grade 8 form (Table 5). The

replication of reliabilities across forms provides evidence for the reproducibility and stability of

the school climate construct. Reliabilities above 0.8 are acceptable for the current use of the

surveys (Appendix D), namely to provide schools and districts with formative data to use for

continuous improvement. New items will be tried out in the 2018 VOCAL administration with

the goal of improving the reliability of each grade-level survey.

The bottom of Table 5 shows the reliability of each sub-scale across the three grades. The real

person separation reliability of the engagement, safety, and environment scores was 0.69, 0.68,

and 0.76, respectively. These reliabilities are likely attenuated due to the design of the test forms

(Schwartz, Ayers, and Wilson, 2017). Students across the three grades only responded to a small

sub-set of items for each dimension (the common items). Students from each grade largely

responded to a set of unique items, thereby creating a large amount of "missing data" when the

three grades’ data were combined to assess the reliability of each dimension. As a result, the true

reliabilities of the dimension scores are underestimated (Schwartz, Ayers, and Wilson, 2017).

School-level reliability. The unit of interest for school climate is not the student, but the

school. In reporting out school climate scores to schools, it is important to ensure that schools

receive reliable data. For a school to receive a report, 10 or more students had to participate in


the survey and the reliability of each index score had to be above or equal to 0.7. Figure 7 shows

the distribution of the overall school climate index reliabilities across schools within the sample.

The average reliability of the 1,365 schools in the sample was 0.78 and ranged from 0 to 0.96.

Most schools with reliabilities below 0.7 had only one or very few students respond to the

surveys. Because some schools are kindergarten (K) through grade eight (G8) schools or K–12

schools and contain multiple survey grades, there were over 1,600 potential reports. However,

given DESE’s minimum reporting criteria, 1,345 schools received reports; of these, only 545

were provided a full complement of index scores (an overall school climate index score, an

engagement index score, a safety index score, and an environment index score).
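The minimum reporting rule described above can be sketched as a simple check (function name hypothetical):

```python
def reportable(n_students, reliability, min_n=10, min_reliability=0.7):
    """Return True when a school's index score meets the minimum reporting
    criteria described in the text: at least 10 participating students and
    an index-score reliability of at least 0.7."""
    return n_students >= min_n and reliability >= min_reliability
```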

Figure 7. Distribution of overall school-level school climate score reliabilities


5.3.2. Differential Item Functioning (DIF) Analyses: To support the claim that the school

climate instrument is generalizable, the items should have the same meaning for different

subgroups of respondents (e.g., gender, ELL); that is, respondents at the same ability (endorsement) level should have the same probability of affirming an item irrespective of the subgroup to which they belong. The item deltas did not differ significantly (over 90% of items differed by less than

0.3 logits) across the following subgroups: gender, homelessness, and economically

disadvantaged. Two items exhibited DIF when comparing students with disabilities to students

without disabilities (CLC4 and PSF5, both administered in G10).

DIF was most pronounced for English language learners, with seven items having DIF of greater than

1 logit. Six of these seven items (BUL10, BUL11, EMO6, PSF5, DIS7, INS70) were on the

grade 10 form. A further five items exhibited mild to moderate DIF (0.5 – 0.67) between ELL

and non-ELL students; these items (PAR1, PSF2, BUL7, DIS1, CLC4) cut across grades.

DESE’s surveys were not translated for English learners, so the DIF evident most likely resulted

from language difficulties in reading the items administered. Six of the seven items that

displayed severe DIF across ELL groups also exhibited DIF across some race groups (BUL6,

BUL7, BUL11, PSF2, CLC4, EMO6); only two of these items, BUL6 and BUL11, displayed

severe DIF. The remaining items exhibited mild to moderate DIF. Language barriers likely

explain the DIF present across certain race/ethnicity subgroups with students unable to properly

access the survey content. Figure 8 and Figure 9 show DIF plots for gender and race status,

respectively. DIF plots for the remaining subgroup comparisons are found in Appendix I. Note,

when estimating ELL subgroup school climate scores, the six items with moderate to severe DIF

were not included in the calibration. Similarly, when estimating race subgroup school climate


Figure 8. Differential item function plot by gender

[Line plot of the DIF measure (logits, -2.5 to +2.5) for each of the 70 items, comparing female and male students.]

Figure 9. Differential item function plot by race/ethnicity

[Line plot of the DIF measure (logits, -2.5 to +2.5) for each of the 70 items, comparing Asian, African American, Hispanic, and White students.]

scores, the items with moderate to severe DIF were removed from the calibration. Because no

DIF was apparent in other subgroup comparisons (gender, economically disadvantaged,

homelessness, students with disabilities), these items were retained when reporting out these subgroup scores. DIF items will be revised or removed for the 2018 VOCAL surveys.
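The DIF screening described in this section can be sketched as follows: calibrate item difficulties separately for two subgroups, center each set so the comparison is on a common scale, and flag items whose difficulty contrast exceeds the thresholds used in the text (roughly 0.5 logits for mild-to-moderate DIF and 1 logit for severe DIF). The function name and exact centering are illustrative assumptions, not the Winsteps procedure itself.

```python
import numpy as np

def dif_contrasts(deltas_group_a, deltas_group_b, moderate=0.5, severe=1.0):
    """Difference in item difficulties (logits) between two subgroups,
    each calibrated separately and mean-centered; flag each item's
    absolute contrast against moderate/severe thresholds."""
    a = np.asarray(deltas_group_a, dtype=float)
    b = np.asarray(deltas_group_b, dtype=float)
    contrast = (a - a.mean()) - (b - b.mean())
    size = np.abs(contrast)
    flags = np.where(size >= severe, "severe",
                     np.where(size >= moderate, "moderate", "none"))
    return contrast, flags
```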

5.4. Structural Validity

Structural validity evaluates the alignment of the scoring structure to the hypothesized structure

of the construct. The fundamental assumption of the Rasch model is that it is used to measure

one latent construct (in this study, the school climate construct). If the data meet this

requirement, the measures are linear, invariant, and additive; equal distances on the scale translate into equal differences in the log-odds of endorsing an item, no matter where on the scale an item is located. In this validity study, the unidimensionality of the data was assessed by

conducting (1) an analysis of the standardized residuals, (2) correlational analyses of the freely

calibrated dimensions, and (3) an assessment of additional dimensionality data provided by the

Rasch Winsteps software (Linacre, 2017).
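As background for these analyses, the Andrich rating scale model gives the probability of each response category as a function of the person measure, the item difficulty, and a set of thresholds shared across items. The following is a minimal sketch with a hypothetical function name:

```python
import numpy as np

def rating_scale_probs(theta, delta, thresholds):
    """Andrich rating scale model: P(category k) is proportional to
    exp(sum over j <= k of (theta - delta - tau_j)), with category 0 as baseline.
    theta: person measure; delta: item difficulty; thresholds: tau_1..tau_m (logits)."""
    steps = np.concatenate(([0.0], np.cumsum(theta - delta - np.asarray(thresholds))))
    numer = np.exp(steps - steps.max())  # subtract max for numerical stability
    return numer / numer.sum()
```

For a person whose measure equals the item difficulty and a single threshold at zero, the two categories are equally likely, which is the expected behavior at a category boundary.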

5.4.1. Residuals Analyses. If the data fit the model and the variance in responses is explained

by one latent trait (school climate construct), the unexplained or residual variance should be

random (i.e., there is no relationship among the residuals). Results from a principal components analysis of the residuals (Smith, 2002) using Linacre's criteria (2017; Appendix D) for

unidimensionality found that the variance explained by the 70-item measure was 40.3% with no

substantial second dimension evident (Table 6). The first contrast’s residual variance was less


than 5% of the total item variance. The variance explained by the items of the first dimension

(school climate construct) was 8 times the variance explained by the first contrast (residual), twice the criterion of 4 times recommended by Linacre (2017). Two of the three items in the first contrast were related to social media and cyber-bullying (BUL5 and BUL7); the loadings on the first contrast were 0.5 or less. Cyber-bullying is a central facet of feeling safe in school, making it important to measure, and the items were therefore retained. The residual analyses results are shown in

Table 6 and Appendix J. These results were replicated when each grade-level form was analyzed

separately (data not shown).
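The principal components analysis of residuals can be sketched as follows: correlate the standardized residuals across items and inspect the eigenvalues, where a first-contrast eigenvalue near the chance level indicates no substantial secondary dimension. The function name is hypothetical and this is not the Winsteps implementation.

```python
import numpy as np

def residual_contrast_eigenvalues(std_residuals):
    """Eigenvalues (descending) of the item-by-item correlation matrix of
    standardized Rasch residuals (rows = persons, columns = items).
    A large first eigenvalue flags a cluster of items sharing unexplained variance."""
    corr = np.corrcoef(std_residuals, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]
```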

Table 6
Residual analyses of VOCAL data (Grades 5, 8, and 10 combined)

Variance component | Eigenvalue | Observed (%)
Total raw variance in observations | 117.2 | 100.0
Raw variance explained by measures | 47.2 | 40.3
Raw variance explained by persons | 26.5 | 22.6
Raw variance explained by items | 20.7 | 17.7
Unexplained variance in 1st contrast | 2.6 | 2.2
Item variance to 1st contrast multiple | 8x |

5.4.2. Sub-scale correlations. DESE evaluated the Pearson correlation between subscale scores

for the three freely calibrated dimensions of school climate (engagement, safety, and

environment). The correlations should be positive and of sufficient magnitude (greater than 0.5

but less than 0.9) to indicate that the three sub-scales are measuring distinct but related

dimensions of the school climate construct.


Table 7 illustrates that subscale correlations range from 0.66 (safety and environment) to 0.74

(engagement and environment). This magnitude and pattern of correlations was also evident

when examined for each grade separately (data not shown). The lowest correlation (0.58) was

between safety and environment scores in grade 10, with the highest correlation (0.76) between

engagement and safety scores in grade 8. The overarching unifying construct of school climate

explains the moderate-to-strong relationship between the three dimensions highlighted in Table

7. After accounting for measurement error, the sub-scale correlations are close to one.

Table 7
Pearson correlations between the three dimensions of the school climate construct1

Scale | Overall (N = 148,824) | Engagement (N = 148,338) | Safety (N = 148,380) | Environment (N = 148,364)
Overall | 1 | --- | --- | ---
Engagement | 0.89 | 1 | 0.99 | 0.99
Safety | 0.87 | 0.70 | 1 | 0.89
Environment | 0.91 | 0.74 | 0.66 | 1

1Observed Pearson correlations are shown below the diagonal; disattenuated correlations are shown above the diagonal.
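The disattenuated values above the diagonal reflect the classical (Spearman) correction for attenuation, which divides an observed correlation by the geometric mean of the two scales' reliabilities. A minimal sketch, assuming this standard formula (the report does not show DESE's exact computation):

```python
def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Correct an observed correlation for measurement error:
    r_true = r_observed / sqrt(rel_x * rel_y)."""
    return observed_r / (reliability_x * reliability_y) ** 0.5
```

With modest subscale reliabilities, observed correlations around 0.7 are consistent with near-unit true correlations, as noted in the text.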

Overall, the evidence from residual analyses and the subscale correlational analyses supports the

structural validity aspect of the school climate construct. The data meet the unidimensionality assumption of the Rasch model, thereby supporting the use of scores for the intended purpose. The signal-to-noise ratio of the subscales and, more

importantly, the theoretical conceptual framework supports the reporting of subscale scores.

Replication of the results of the residual analyses and subscale correlations across the three


grade-level forms provides further evidence supporting the internal structure of the school

climate construct, that is, it is made up of three dimensions (engagement, safety and

environment) whose relationship to each other is explained by the overarching school climate

construct.

5.5. External Validity

This aspect of construct validity relates to the responsiveness of an instrument and the

relationship of its scores to the scores of external measures (criterion validity). The

responsiveness of an instrument refers to “the degree to which an instrument is capable of

detecting changes in person measures following an intervention that is assumed to impact the

target construct” (Wolfe & Smith, 2007b, p. 222). If an instrument is responsive, it can be

applied appropriately to measure expected group differences or individual/group change. The first section (5.5.1) examines the instrument's responsiveness at the student level; the second section (5.5.2) assesses responsiveness at the school level and its impact on reportable scores.

Criterion validity is the strongest form of external validity; it determines how well scores from

an instrument predict future scores on a criterion measure (e.g., how well do school climate

scores predict achievement). There are two forms of criterion validity, namely concurrent and

predictive. This section reports data to support the concurrent criterion validity of the VOCAL

survey scores. Because the unit of interest is the school, the external validity analyses focus on

examining the relationship between school-level aggregate school climate scores and school-

level aggregate scores of the following criteria: student achievement, attendance, chronic


absence, discipline rates, suspension rates, and retention rates. Concurrent criterion validity is

discussed in section 5.5.3.

5.5.1. Student-level Responsiveness. The responsiveness of an instrument is measured by the

person strata index, H, which provides the number of statistically distinct ability or endorsement

groups whose centers of score distributions are separated by at least three standard errors of

measurement within the sample. According to the formula H = (4G + 1)/3, given by Wright and Masters (2002, p. 888), and with a person separation index (PSI; G) of 2.7 (Table 5), the 70-item VOCAL instrument yields almost 3.9 distinct person strata. The number of person strata ranged from 3.6 in grade 5 to 4.0 in grade 8. These

results provide evidence that the VOCAL instrument produces reliable, reproducible measures

which are responsive (the instrument can divide the sample into three to four statistically distinct

score groups).
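The strata computation is a direct application of the formula cited above:

```python
def person_strata(separation_index):
    """Wright & Masters person strata: H = (4G + 1) / 3, the number of
    statistically distinct groups whose score distributions are separated
    by at least three standard errors of measurement."""
    return (4 * separation_index + 1) / 3
```

With G = 2.7 (Table 5), H is approximately 3.9, matching the value reported in the text.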

5.5.2. School-level Responsiveness and Score Reporting. The greater the number of person

strata at the individual-level, the more likely the instrument will be able to meaningfully

differentiate schools. At the school-level, the average scaled score was 1.32 logits with a

standard deviation of 0.85 logits (Table 8). After removing schools whose data did not meet our

minimum reporting requirements (N of 10 and school-level reliability of at least 0.7), reportable

school measures ranged from -0.03 logits to 3.02 logits indicating variability in school-level

scores. The relatively high degree of responsiveness of the instrument at the student level appears to pick up the variation within and between schools.


Table 8
Variability of reportable school-level VOCAL scores

School | Number of students | Person separation reliability (PSR)1 | Mean ± SD2
Weakest school 1 | 11 | 0.83 | -0.03 ± 0.64
Weakest school 2 | 21 | 0.95 | 0.12 ± 1.38
Weakest school 3 | 98 | 0.92 | 0.16 ± 0.94
Average school | -- | -- | 1.32 ± 0.85
Strongest school 3 | 40 | 0.76 | 2.96 ± 1.38
Strongest school 2 | 36 | 0.77 | 3.00 ± 1.34
Strongest school 1 | 50 | 0.79 | 3.02 ± 1.46

1A PSR of 0.7 and an N of 10 or more students were set as the minimum reporting requirements. 2SD: standard deviation.

Score reporting. Logit scores are confusing to educators, so DESE linearly transformed

them to make them more interpretable. The logit measures were standardized and transformed at

the student level to have a mean of 50 and a standard deviation of 20 (see Appendix K for details

on how scores were transformed). The individual scores were aggregated up to the school level;

aggregate school-level scores were then truncated and placed on a scale of 1 – 99 (± 2.5 standard

deviations) with a mean of 50.05 and a standard deviation of 12.8. To help schools interpret their

data, schools were separated into three “performance” levels: schools with relatively weak

school climates had scores that fell 1 or more standard deviations below the mean; schools with

relatively strong school climates had scores that fell 1 or more standard deviations above the

mean. Based on the median student within these three "performance" groups, a profile or picture

of the school climate was constructed using the item threshold file in Winsteps (Table 9).
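The transformation described above can be sketched as follows; the function names are hypothetical and the full procedure is documented in Appendix K.

```python
import numpy as np

def to_scaled_scores(student_logits, target_mean=50.0, target_sd=20.0):
    """Standardize student logit measures across the sample, then rescale
    to the reporting metric (mean 50, SD 20)."""
    logits = np.asarray(student_logits, dtype=float)
    z = (logits - logits.mean()) / logits.std(ddof=1)
    return target_mean + target_sd * z

def school_index_score(scaled_scores):
    """Aggregate student scaled scores to a school index, truncated to the
    1-99 reporting band (roughly +/- 2.5 student-level standard deviations)."""
    return float(np.clip(np.mean(scaled_scores), 1.0, 99.0))
```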

Twenty-two percent of the schools with reportable data fell either within the top or bottom "performance" level, with the vast majority of schools with reliable data falling within the "average" category.

Table 9
Massachusetts School Climate Profile

Stronger: Schools whose average index score is greater than or equal to one standard deviation above the mean (≥63 points; ~12% of schools). The average student within these schools responds "always true" to a majority of items and "mostly true" to the remaining items.
1. Student-on-student interactions are mostly respectful, caring, and collaborative within the classroom. Students have a say in school rules and perceive school rules as fair and consistently enforced.
2. Adults actively address safety issues. Students feel safe, with few, if any, bullying behaviors reported.
3. Teacher/adult-on-student relationships are respectful, caring, and inclusive. For the most part, adults encourage student autonomy and feedback. Adults/teachers promote responsibility and teach positive behaviors.
4. Support systems are accessible, and teachers/adults actively engage with students to help them emotionally. Most students feel comfortable seeking help.
5. The classroom is a safe and supportive learning environment. Teachers encourage effort and set high academic expectations. Teachers actively promote and support individual students' academic success.
6. Students report a strong sense of belonging to the school.

Average: Schools whose average index score is between one standard deviation below and one standard deviation above the mean (38 to 62 points; ~78% of all schools). The average student within these schools responds "mostly true" to a majority of items and "always true" to the remaining items.
1. Student-on-student interactions are mostly respectful and caring, and generally collaborative within the classroom. Students have little say in school rules but perceive school rules as mostly fair and consistently enforced.
2. Adults actively address safety issues. Students feel safe, with few bullying behaviors reported.
3. Teacher/adult-on-student relationships are caring, and mostly respectful and inclusive. For the most part, adults encourage student autonomy and feedback. Adults/teachers promote responsibility and teach respectful behavior. To a lesser degree, adults teach students behavior management.
4. Support systems are available, and teachers/adults engage with students to help them emotionally. However, not all students feel comfortable seeking help.
5. The classroom is a relatively safe and supportive learning environment. Teachers encourage effort and set high academic expectations. Teachers promote and support individual students' academic success.
6. Students report a moderately strong sense of belonging to the school.

Weaker: Schools whose average index score is equal to or less than one standard deviation below the mean (≤37 points; ~10% of schools). The average student within these schools responds "mostly untrue" or "mostly true" to all but two items, where the average student responded "untrue".
1. Student-on-student interactions generally lack respect, with students offering limited mutual emotional or academic support. Students have no say in school rules and perceive school rules as relatively unfair and somewhat inconsistently enforced.
2. Adults address safety issues. Students feel safe, though some bullying behaviors are reported.
3. Teacher/adult-on-student relationships are somewhat caring, respectful, and inclusive. For the most part, adults do not encourage student autonomy or feedback. Promotion of student responsibility and teaching of positive behaviors is relatively low.
4. Support systems are available, but adults do not, for the most part, help students emotionally. Students generally do not feel comfortable seeking help.
5. The classroom is a somewhat safe and supportive learning environment. Teachers mostly encourage effort and have high expectations. Teachers generally encourage and support individual students' academic success.
6. Students report a relatively weak sense of belonging to the school.

The VOCAL survey meaningfully differentiated schools both quantitatively

and qualitatively. This profile tool was designed to help schools assess their climates. For

schools that fall within the “weaker” category, the profile provides them with a path and the

information needed to improve. For example, students in schools with relatively weak school

climates report that students are not respectful or caring; in contrast, students within schools with relatively strong school climates report that student-on-student relationships are respectful, caring, and collaborative.

5.5.3. Concurrent Validity. Preliminary evidence of concurrent validity at the school level indicates a

positive relationship between students’ VOCAL scaled scores and their Massachusetts

Comprehensive Assessment System (MCAS) English Language Arts (ELA) and mathematics

achievement (Table 10) with Pearson correlations of 0.41 and 0.42, respectively. VOCAL scores

were also positively related to students’ growth scores in ELA (0.17) and mathematics (0.26),

although correlations were of a smaller magnitude when compared to static achievement scores.

VOCAL scores are also associated with other school-level indicators, namely, attendance rate

(0.25), chronic absence rate (-0.34), disciplinary rate (-0.48), in-school suspension rate (-0.29),

out-of-school suspension rate (-0.34), retention rate (-0.19), and graduation (0.17) and dropout

rates (-0.26); these data are summarized in Table 11.

The relationships with achievement and other indicators are similar in magnitude to those reported previously for non-cognitive indicators (Peoples, 2016; Hough, Kalogrides, and Loeb, 2017; Peoples, Flanagan, and Foster, 2017) and are in the expected direction for all indicators. This pattern of associations was replicated across the three grades, providing initial evidence of external validity (Table 10 and Table 11).


Table 10
Correlations of 2017 achievement scores and overall VOCAL scores, by school level1,2

Measure | All Schools (N = 1,137) | Grade 5 (N = 667)9 | Grade 8 (N = 394)10 | Grade 10 (N = 294)11
English Language Arts scaled score | 0.41 | 0.42 | 0.26 | 0.15
English Language Arts student growth percentile | 0.17 | 0.23 | 0.08 | 0.13
Mathematics scaled score | 0.42 | 0.45 | 0.25 | 0.16
Mathematics student growth percentile | 0.26 | 0.24 | 0.18 | 0.12

1Data based on schools with 10 or more students contributing to both the aggregate achievement and aggregate VOCAL score, and with a school-level VOCAL reliability of at least 0.7. 2Grade 5 and grade 8 MCAS tests reflect DESE's new generation assessments; the grade 10 test is based on the old legacy tests. 9All grade 5 correlations are statistically significant (p < 0.01). 10All grade 8 correlations are statistically significant (p < 0.01) with the exception of eSGP. 11All grade 10 correlations are statistically significant (p < 0.05) with the exception of eSGP and mSGP.

Overall, the external validity evidence supports the conclusion that the school climate survey is responsive (at the individual and school levels) and should be able to measure change on the variable. Although the pattern of correlations provides preliminary evidence to support VOCAL's

external validity, the correlational cross-sectional data do not support the interpretation that more

positive school climates lead to (cause) improved student achievement. In addition, these simple

correlations do not account for the nested nature of educational data with students nested within

schools, which are, in turn, nested within districts. Future validity work will focus on providing

external validity evidence using hierarchical linear models that take into account the nested

structure of education data.


Table 11
Correlations of 2017 school-level indicators and VOCAL scores, by school level1

Indicator | All Schools (N = 1,137) | Grade 5 (N = 667)10 | Grade 8 (N = 394)11 | Grade 10 (N = 294)12
Attendance rate2 | 0.25 | 0.31 | 0.23 | 0.13
Chronically absent (10% or more)3 | -0.34 | -0.33 | -0.26 | -0.17
Discipline rate4 | -0.48 | -0.38 | -0.35 | -0.32
In-school suspension (ISS)5 | -0.29 | -0.12 | -0.16 | -0.19
Out-of-school suspension (OSS)6 | -0.34 | -0.33 | -0.37 | -0.35
Retention rate7 | -0.19 | -0.21 | -0.13 | -0.04
Graduation rate8 | NA | NA | NA | 0.17
Drop-out rate9 | NA | NA | NA | -0.26

1Data based on schools with 10 or more students contributing to both the aggregate achievement and aggregate VOCAL score, and with a school-level reliability of 0.7 for VOCAL scores. 2Attendance rate: the average percentage of days in attendance for students enrolled in grades PK–12. 3Chronically absent (10% or more): the percentage of students who were absent 10% or more of their total number of student days of membership in a school. 4Discipline rate: the number of disciplinary incidents divided by school enrollment. 5In-school suspension rate: the percentage of enrolled students in grades 1–SP who received one or more in-school suspensions. 6Out-of-school suspension rate: the percentage of enrolled students in grades 1–SP who received one or more out-of-school suspensions. 7Retention rate: the percentage of enrolled students in grades 1–12 who were repeating the grade in which they were enrolled the previous year. 8Graduation rate: the percentage of students who enroll in high school and graduate within 4 years (N = 268). 9Drop-out rate: the percentage of students in grades 9–12 who dropped out of school between July 1 and June 30 prior to the listed year and who did not return to school by the following October 1 (N = 268). 10All grade 5 correlations are statistically significant (p < 0.01) with the exception of ISS. 11All grade 8 correlations are statistically significant (p < 0.01). 12All grade 10 correlations are statistically significant (p < 0.05), with the exception of attendance rate and retention rate.

Conclusion

The purpose of this research was to use Rasch theory and its validity framework to develop and

pilot an instrument for measuring students’ perceptions of school climate on a large scale. The

psychometric properties of the VOCAL instrument, for the most part, met the assumptions of the

Rasch model, namely that the items are well fitting, invariant, and form a unidimensional scale. Most


importantly, the scale proved reasonably reliable and responsive. With forthcoming

improvements to the instrument (revising behavioral bullying items, increasing the number of

items in each dimension, expanding construct representation), the VOCAL measure shows

promise in providing schools with reliable data, which they can use for continuous improvement

purposes.


References

Andrich, D. (1978a). Application of a psychometric rating model to ordered categories which are

scored with successive integers. Applied Psychological Measurement, 2 (4), 581-594.

Andrich, D. (1978b). A rating formulation for ordered response categories. Psychometrika, 43 (4),

561-573.

Boone, W. J., and Scantlebury, K. (2006). The role of Rasch analysis when conducting science

education research utilizing multiple-choice tests. Science Education, 90, 253-269.

Boone, W. J., Townsend, J. S., and Staver, J. (2011). Using Rasch theory to guide the practice of

survey development and survey data analysis in science education and to inform science

reform efforts: An exemplar utilizing STEBI self-efficacy data. Science Education, 95,

258-280.

Boone, W. J., Staver, J. R., and Yale, M. S. (2014). Rasch analysis in the human sciences, New

York: Springer.

Berkowitz, R., Moore, H., Astor, R.A., & Benbenishty, R. (2017). A research synthesis of the

associations between socioeconomic background, inequality, school climate and academic

achievement. Review of Educational Research, 87 (2), 425–469.

Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American

Psychologist 32 (7), 513–531.

Engelhard, G. (2013). Invariant measurement: Using Rasch models in the social, behavioral and

health sciences. Routledge Taylor & Francis Group, New York, New York.

Gable, R.K., Ludlow, L.H. and Wolf, M.B. (1990). The Use of Classical and Rasch Latent Trait

Models to Enhance the Validity of Affective Measures. Educational and Psychological

Measurement, 50 (4), 869-878.


Hambleton, R. K. & Jones, R. W. (1993). Comparison of classical test theory and item response

theory and their applications to test development. Educational Measurement: Issues and

Practice, Fall, 38-47.

Hafen, C.A., Allen, J. P., Mikami, A. Y., Gregory, A., Hamre, B. & Pianta, R. C. (2012). The

pivotal role of adolescent autonomy in secondary school classrooms. Journal of Youth

Adolescence, 41 (3), 245-255.

Hough, H., Kalogrides, D., & Loeb, S. (2017). Using student surveys of students’ social and

emotional learning and school climate for accountability and continuous improvement.

Policy Analysis for California Education, downloaded from http://edpolicyinca.org.

Linacre, J. M. (2017). A user’s guide to Winsteps, Ministep Rasch-model computer programs:

program manual 4.0.0, Chicago, US: MESA Press.

Ludlow, L. H. & Haley, S. M. (1995). Rasch model logits: Interpretation, use and

transformation. Educational and Psychological Measurement, 55 (6), 967-975.

Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–

1027.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’

responses and performances as scientific inquiry into score meaning. American

Psychologist, 50 (9), 741–749.

Peoples, S. M., O’Dwyer, L. M., Wang, Y., Brown, J. & Rosca, C. V. (2014) Development and

Application of the Elementary School Science Classroom Environment Scale (ESSCES):

Measuring Student Perceptions of Constructivism within the Science Classroom,

Learning Environments Research Journal, 17, (1), 49-73.

Peoples, S.M., Abbott, C., and Flanagan, K. (2015a). Developing student feedback surveys for

educator evaluation: Combining stakeholder engagement and psychometric analyses in their

38

development. Paper presented to the April, 2015 annual meeting of the American Educational

Research Association, Chicago, IL, US.

Peoples, S.M., Abbott, C., and Flanagan, K. (2015b). Developing student feedback surveys for

educator evaluation: Validating student feedback surveys for educator evaluation using

Rasch survey development tools and the Rasch construct validity framework. Paper presented

at the April, 2015 annual meeting of the American Educational Research Association,

Chicago, IL, US.

Peoples, S. (2016). College and Career Readiness Mathematical Practice Scale CCRMS:

Assessing middle and high school students’ mathematics self-efficacy. Paper presented at

American Educational Research Association Conference, Washington, DC, 2016, District

of Columbia.

Peoples, S., Flanagan, K., & Foster, B. (2017). Measuring students’ college and career

readiness in English Language Arts using a Rasch-based self-efficacy scale. Paper

presented at American Educational Research Association Conference, San Antonio,

Texas, 2017.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen:

Danish Institute for Educational Research. (Expanded edition, 1980. Chicago: University

of Chicago Press).

Smith, E. V. Jr. (2000). Metric Development and Score Reporting in Rasch Measurement.

Journal of Applied Measurement, 1(3), 303-326.

Smith, E. V. (2002). Detecting and evaluating the impact of multidimensionality using item fit

statistics and principal component analysis of residuals. Journal of Applied

Measurement, 3, 205-231.

39

Schumacker, R. E. & Smith, E. V. (2007). Reliability: A Rasch perspective. Educational and

Psychological Measurement, 67 (3), 394-409.

Schwartz, R., Ayers, E., & Wilson, M. (2017). Mapping a data modeling and statistical reasoning

learning progression using unidimensional and multidimensional item response models.

Journal of Applied Measurement, 18(3), 268–298.

Sinnema, C. E. L. and Ludlow, L. H. (2013). A Rasch approach to the measurement of

responsive curriculum practice in the context of curricula reform. The International

Journal of Educational and Psychological Assessment, 12 (2), 33-55.

Thapa, A., Cohen, J., Guffey, S., & Higgins-D’Alessandro, A. (2013). A review of school

climate research, Review of Educational Research, 83 (3), 357–385.

Thomas, G. P. (2004). Dimensionality and construct validity of an instrument designed to

measure the metacognitive orientation of science classroom learning environments.

Journal of Applied Measurement, 5(4), 367-384.

United States Department of Education. (2017). National Center on Safe Supportive Learning

Environments, ED School Climate Surveys (EDSCLS),

https://safesupportivelearning.ed.gov/edscls/measures

Wolfe, E. W., & Smith, E. V. Jr. (2007a). Instrument development tools and activities for

measure validation using Rasch models: Part I – Instrument development tools. Journal

of Applied Measurement, 8 (1), 97–123.

Wolfe, E. W. & Smith Jr., E. V. (2007b). Instrument development tools and activities for

measure validation using Rasch models: Part II – Validation activities. Journal of

Applied Measurement, 8 (2), 204–234.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press

40

Wright B. D., and Masters, G. N. (2002). Number of Person or Item Strata. Rasch Measurement

Transactions, 16 (3), 888.


APPENDICES

Appendix A: VOCAL 2017 Test Specification¹

Dimension          Domain/Topic                               G5 Items  G8 Items  G10 Items  Total
Engagement (ENG)   Cultural and Linguistic Competence (CLC)       2         2         2         4
                   Relationships (REL)                            4         7         6        14
                   School Participation (PAR)                     1         1         1         3
                   Subtotal                                       7        10         9        21
Safety (SAF)       Emotional Safety (EMO)                         3         3         3         7
                   Physical Safety (PSAF)                         2         2         2         6
                   Bullying/cyber-bullying (BUL)                  3         6         4        11
                   Subtotal                                       8        11         9        24
Environment (ENV)  Instructional Environment (INS)                4         8         6        13
                   Mental Health (MEN)                            2         2         2         6
                   Discipline (DIS)                               3         3         3         7
                   Subtotal                                       9        13        11        26
TOTAL                                                            24        34        29        71

¹Common items that appear on each grade-level survey are only counted once in the Total column.


Appendix B1: Student MCAS Questionnaire - Grade 5 VOCAL form

Spring 2017 STUDENT QUESTIONNAIRE

Grade 5

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. Please ask your test administrator for help if you are not sure where or how to mark your answers to these questions.

This questionnaire asks about what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

1. Teachers support (help) students who come to class upset. A B C D

2. School rules are fair for all students. A B C D

3. I am happy to be at our school. A B C D

4. My teachers care about me as a person. A B C D

5. In school, I learn how to control my feelings when I am angry or upset. A B C D


Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Nevertrue

6. Teachers at this school accept me for who I am. A B C D

7. I get the chance to take part in school events (for example, science fairs, art, or music shows). A B C D

8. Students respect one another. A B C D

9. Teachers don’t let students pick on other students in class or in the hallways. A B C D

PLEASE PROCEED TO THE NEXT PAGE

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

10. My teachers are proud of me when I work hard in school. A B C D

11. At our school, students learn to care about other students’ feelings. A B C D

12. If I heard about a threat to our school or to my classmates, I would report it to an adult. A B C D

13. I feel safe at our school. A B C D

14. Adults working at this school treat all students respectfully. A B C D

15. Students help each other learn without having to be asked by the teacher. A B C D

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

16. I feel comfortable talking to my teachers about something that is bothering me. A B C D

17. If I tell a teacher or other adult at school that someone is being bullied, the teacher/adult will do something to help. A B C D

18. My teachers help me succeed with my schoolwork when I need help. A B C D

19. Students have a voice in deciding school rules. A B C D

20. Students will help other students, even if they are not close friends. A B C D

21. My teachers use my ideas to help my classmates learn. A B C D

22. Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well. A B C D

PLEASE PROCEED TO THE NEXT PAGE


PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

23. I have been punched or shoved by other students more than once in the school or on the playground. A B C D

24. Students at our school get along well with each other. A B C D

Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.


Appendix B2: Student MCAS Questionnaire - Grade 8 VOCAL form

Spring 2017 STUDENT QUESTIONNAIRE

Grade 8

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. If you do not see one best answer for a question, leave that question blank in your answer booklet and go to the next question. Please ask your test administrator for help if you are not sure how to answer any of these questions.

1. How does using a computer compare with working by hand when you are completing school assignments such as reports or essays?
A. It is a lot easier to write on a computer than by hand.
B. It is somewhat easier to write on a computer than by hand.
C. It doesn’t make any difference whether I write on a computer or by hand.
D. It is somewhat harder to write on a computer than by hand.
E. It is a lot harder to write on a computer than by hand.

2. What types of tests have you taken on a computer? Choose all that apply.
A. multiple-choice
B. essay
C. combination of multiple-choice questions and written responses
D. I have never taken a test on a computer.
E. I don’t know.

3. In general, how much time do you spend on homework each week?
A. less than 3 hours each week
B. about 3 to 6 hours each week
C. about 7 to 9 hours each week
D. about 10 to 12 hours each week
E. about 13 to 15 hours each week
F. more than 15 hours each week

PLEASE PROCEED TO THE NEXT PAGE

The next set of questions asks what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your individual answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

4. Teachers support students who come to class upset. A B C D

5. My schoolwork is appropriately challenging. A B C D

6. School staff are consistent when enforcing rules in school. A B C D

7. I have a choice in how I show my learning (e.g., write a paper; prepare a presentation; make a video). A B C D

8. Teachers are available when I need to talk with them. A B C D

9. My teachers inspire confidence in my ability to do well in class. A B C D

10. Students at this school try to stop bullying when they see it happening. A B C D

11. Students respect one another. A B C D

12. My teachers care about my academic success. A B C D

13. My teachers are proud of me when I work hard in school. A B C D

14. I have seen students with weapons at our school. A B C D

15. I am not scared to make mistakes in my teachers’ classes. A B C D

PLEASE PROCEED TO THE NEXT PAGE

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

16. I have been teased or picked on more than once because of my religion. A B C D

17. Adults working at this school treat all students respectfully, regardless of a student’s race, culture, family background, sex, or sexual orientation. A B C D

18. Students help each other learn without having to be asked by the teacher. A B C D

19. If I need help with my emotions (feelings), help is available at our school. A B C D

20. If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help. A B C D

21. I have been teased or picked on more than once because of my physical or mental disability. A B C D

22. Students have a voice in deciding school rules. A B C D

23. If I am absent from school, a teacher or other adult will notice that I was not in class. A B C D

24. I feel comfortable reaching out to teachers/counselors for emotional support if I need it. A B C D

25. My teachers set high expectations for my work. A B C D

26. Students at this school damage and/or steal other students’ property. A B C D

27. Teachers encourage students to respect different points of view when expressed in class. A B C D

28. My parents/guardians feel respected when they participate at our school (e.g., at open houses or conferences with teachers). A B C D

29. Teachers and adults are interested in my well-being beyond just my class work. A B C D

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

30. My textbooks or class materials include people and examples that reflect my race, cultural background, and/or identity. A B C D

31. Students have spread rumors or lies about me more than once on social media. A B C D

32. Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation. A B C D

33. My teachers believe that all students can do well in their learning. A B C D

34. In school, students learn how to control their behavior. A B C D

35. I have been threatened by other students more than once on social media. A B C D

36. My teachers give me individual help with my schoolwork when I need help. A B C D

37. Our school offers guidance to students on how to mediate (settle) conflicts by themselves. A B C D

Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.


Appendix B3: Student MCAS Questionnaire - Grade 10 VOCAL form

Spring 2017 STUDENT QUESTIONNAIRE

Grade 10

DIRECTIONS

Mark your answers to the following questions in the box labeled Student Questionnaire on the inside back cover of your Student Answer Booklet. If you do not see one best answer for a question, leave that question blank in your answer booklet and go to the next question. Please ask your test administrator for help if you are not sure how to answer any of these questions.

1. How does using a computer compare with working by hand when you are completing school assignments such as reports or essays?
A. It is a lot easier to write on a computer than by hand.
B. It is somewhat easier to write on a computer than by hand.
C. It doesn’t make any difference whether I write on a computer or by hand.
D. It is somewhat harder to write on a computer than by hand.
E. It is a lot harder to write on a computer than by hand.

2. What types of tests have you taken on a computer? Choose all that apply.
A. multiple-choice
B. essay
C. combination of multiple-choice questions and written responses
D. I have never taken a test on a computer.
E. I don’t know.

3. What are your plans after high school?
A. attend a four-year college
B. attend a two-year college
C. join the military
D. work full-time
E. other
F. I don’t know.

4. If you are not planning to attend a two- or four-year college, which of the following best describes your plans for future job training? (If you are planning to attend a two- or four-year college, skip this question.)
A. attend college sometime in the future for vocational training or credentialing
B. attend a post-secondary vocational school for more advanced training
C. on-the-job training
D. I do not plan to seek future job training.
E. I don’t know.

PLEASE PROCEED TO THE NEXT PAGE

The next set of questions asks what it’s like to be a student in your school. There are no right or wrong answers. Your teachers and principal will not see your individual answers; your answers will be combined with those of your classmates. Your school will use these combined answers to better understand what school life is like for students. When you read each statement, think about the last 30 days in your school. Please answer honestly so your school knows how you really feel about the school.

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

5. Teachers support students who come to class upset. A B C D

6. Teachers ask students for feedback on their classroom instruction. A B C D

7. My teachers are approachable if I am having problems with my class work. A B C D

8. I am encouraged to take upper-level courses (honors, AP). A B C D

9. I have been teased or picked on more than once because of my race or ethnicity. A B C D

10. Teachers give students a chance to explain their behavior when they do something wrong. A B C D

11. My teachers support me even when my work is not my best. A B C D

12. Students respect one another. A B C D

13. I feel welcome to participate in extra-curricular activities offered through our school, such as school clubs or organizations, musical groups, sports teams, or student council. A B C D

14. My teachers are proud of me when I work hard in school. A B C D

15. I have access to help at school if I am struggling emotionally or mentally. A B C D

16. Students know what to do if there is an emergency at school. A B C D

PLEASE PROCEED TO THE NEXT PAGE

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.

Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

17. I feel as though I belong to our school community. A B C D

18. Adults working at this school treat all students respectfully, regardless of a student’s race, culture, family background, sex, or sexual orientation. A B C D

19. Students help each other learn without having to be asked by the teacher. A B C D

20. The consequences for inappropriate behavior are enforced fairly. A B C D

21. If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help. A B C D

22. The things I am learning in school are relevant (important) to me. A B C D

23. Students have a voice in deciding school rules. A B C D

24. Adults at our school are respectful of student ideas, even if the ideas expressed are different from their own. A B C D

25. Students try to work out their problems with other students in a respectful way. A B C D

26. My teachers set high expectations for my work. A B C D

27. I have been teased or picked on more than once because of my real or perceived sexual orientation. A B C D

28. Teachers, students, and the principal work together in our school to prevent bullying. A B C D

29. My teachers promote respect among students. A B C D

30. I sometimes stay home because I don’t feel safe at our school. A B C D

PLEASE PROCEED TO THE NEXT PAGE

PLEASE MARK YOUR RESPONSE TO EACH STATEMENT IN YOUR STUDENT ANSWER BOOKLET.


Think of the last 30 days in school.

Always true

Mostly true

Mostly untrue

Never true

31. At our school, a teacher or other adult is available to help students who have experienced sexual assault or dating violence. A B C D

32. I have at least one friend whom I can count on to support me. A B C D

33. Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation. A B C D

The last five questions relate to your mathematics instruction. Please think of your current or most recent mathematics class when responding to the statements.

Think of your current or most recent math class.

Always true

Mostly true

Mostly untrue

Never true

34. I am challenged to support my answers or reasoning in this class. A B C D

35. My teacher helps us identify our strengths and shows us how to use them to help us learn. A B C D

36. During our lessons, I am asked to apply what I know to new types of challenging problems or tasks. A B C D

37. My teacher checks to make sure we understand what he or she is teaching us. A B C D

38. I have to use my critical thinking skills, and not just memorize facts, to do my work in this teacher’s class. A B C D

Thank you for sharing your experiences and opinions through this student questionnaire. The information you provided can help inform your school’s efforts to create safe and supportive learning environments for all students. If you would like to speak with someone about the topics on this questionnaire, we encourage you to reach out to a family member and/or guidance counselor, teacher, principal, or other adult in the school.


Appendix C1: The Rasch Model

The Rasch model uses an exponential transformation to place ordinal Likert responses onto an equal-interval logit scale (Rasch, 1960). This transformation ensures that stakeholder perceptions are measured appropriately and that the data meet the assumptions of parametric testing (Ludlow & Haley, 1995; Boone, Staver, & Yale, 2014). In addition, the sample-independence features of the Rasch model overcome the fundamental drawbacks of classical test theory (CTT) analyses (Smith, 2000). In CTT, the difficulty of a test is sample dependent, making it problematic to measure change on a variable (Smith, 2000; Boone & Scantlebury, 2006). In contrast, the Rasch property of item invariance implies that the relative endorsements and locations of the items do not change (within measurement error) and are independent of the sample responding; in kind, the relative item endorsements should behave as expected across different samples (Smith, 2002; Engelhard, 2013). When items are invariant, the Rasch model is particularly discerning in differentiating between high and low scorers on a measurement scale (Gable, Ludlow, & Wolf, 1990; Sinnema & Ludlow, 2013), as it places persons and items on a common scale metric (Hambleton & Jones, 1993; Engelhard, 2013).

The Rasch rating scale model provides a mathematical model for the probabilistic relationship between a person’s ability (β_n) and the difficulty of items (δ_i) on a test or survey. Andrich’s (1978a, 1978b) rating scale model (RSM) used in this study is defined in Equation 1:

∅_nij = exp[ Σ_{k=0…j} (β_n − (δ_i + τ_k)) ] / Σ_{h=0…m_i} exp[ Σ_{k=0…h} (β_n − (δ_i + τ_k)) ],   j = 1, 2, …, m_i   (1)

Where ∅_nij is the conditional probability of person n responding in category j to item i, and tau (τ_j) is the estimate of the location of the jth step for each item relative to that item’s scale value (δ_i), with τ_0 fixed at zero. The number of response categories is equal to m_i + 1, where m_i is the number of thresholds. In the RSM, moving from one threshold to the next contiguous threshold is assumed to have the same mean difference across all items of the survey. The natural log transformation of person responses yields separate person ability and item difficulty estimates expressed in logits (Ludlow & Haley, 1995). Persons and items are placed on a common continuum (the scale metric axis of the variable map), so persons can be characterized by the types and levels of the items with which their locations are associated. By taking the natural log of the odds ratio, stable, replicable information about the relative strengths of persons and items is derived, with equal differences in logits translating into equal differences in the log-odds of endorsing an item no matter where on the scale metric the item is located; this interval-level unit of measurement is a fundamental assumption of parametric tests (Boone, Townsend, & Staver, 2011). By default in WINSTEPS, the item mean summed across the thresholds equals zero; the person and item measures are generated and reported on the logit scale. In the context of this study, a respondent with a positive logit value on a VOCAL scale feels relatively more positive about that aspect of school climate than a respondent with a negative logit value.
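As a concrete illustration of Equation 1, the short sketch below computes RSM category probabilities for a single four-category Likert item. The person measure, item scale value, and threshold values are invented for illustration only, not VOCAL estimates; in the study itself, estimation is performed by the Winsteps software.

```python
import math

def rsm_probabilities(beta, delta, taus):
    """Category probabilities under Andrich's rating scale model (Equation 1).

    beta  : person measure, in logits
    delta : item scale value, in logits
    taus  : threshold parameters tau_1..tau_m, in logits (tau_0 is fixed at 0)
    Returns a list of m+1 probabilities, one per response category.
    """
    taus = [0.0] + list(taus)  # tau_0 = 0 by convention
    # Numerator for category j is exp of the cumulative sum
    # sum_{k=0..j} (beta - (delta + tau_k)); the denominator is
    # the sum of the numerators over all categories.
    numerators = []
    running = 0.0
    for tau in taus:
        running += beta - (delta + tau)
        numerators.append(math.exp(running))
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical values: a person 0.5 logits above an item of average
# difficulty, with plausible thresholds for a 4-category scale.
probs = rsm_probabilities(beta=0.5, delta=0.0, taus=[-1.5, 0.2, 1.3])
print([round(p, 3) for p in probs])  # four probabilities summing to 1
```

With these invented values, the third category ("Mostly true" on a Never/Mostly untrue/Mostly true/Always scale) receives the highest probability, consistent with a person located slightly above the item.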


Appendix C2: Logit Unit of Measurement

The natural log transformation of person responses yields separate person ability and item difficulty estimates called logits (Ludlow & Haley, 1995); this transformation expands the theoretical ability (endorsement) range from negative infinity to positive infinity, with most estimates falling in the range of -4 to +4 logits (Ludlow & Haley, 1995). Items can be similarly interpreted in logits, with a theoretical range of negative infinity to positive infinity; items with a positive logit are, on average, more difficult to endorse than items with negative logits (Ludlow & Haley, 1995). Persons and items are placed on a common continuum (the scale metric axis of the variable map), so persons can be characterized by the types and levels of the items with which their locations are associated. Person expected responses can be compared to their observed responses to determine if “the logit estimate of ability (affirmation) corresponding to an original raw data summary score is consistent or inconsistent with the pattern expected for that estimate of ability (affirmation)” (Ludlow & Haley, 1995). By taking the natural log of the odds ratio, stable, replicable information about the relative strengths of persons and items is derived, with equal differences in logits translating into equal differences in the log-odds of endorsing an item no matter where on the scale metric the item is located; this interval-level unit of measurement is a fundamental assumption of parametric tests (Ludlow & Haley, 1995; Boone, Townsend, & Staver, 2011).
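The logit metric can be illustrated numerically. A minimal sketch with invented person-item gaps (not study estimates): in the dichotomous case, the probability that a person at β endorses an item at δ is exp(β − δ) / (1 + exp(β − δ)), so equal logit gaps correspond to equal differences in log-odds.

```python
import math

def endorsement_probability(beta, delta):
    """Rasch probability that a person located at beta (logits) endorses
    an item located at delta (logits), in the dichotomous case."""
    return math.exp(beta - delta) / (1 + math.exp(beta - delta))

# Equal logit gaps translate into equal log-odds differences:
# a person 1 logit above an item endorses it with P ~ 0.73;
# a person 1 logit below endorses it with P ~ 0.27.
for gap in (-2, -1, 0, 1, 2):  # person measure minus item measure, in logits
    p = endorsement_probability(gap, 0.0)
    print(f"gap {gap:+d} logits -> P(endorse) = {p:.2f}")
```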


Appendix D: Guide for evaluating Rasch model validity data

Validity aspect: Content
  Statistic/Data: Point-to-measure correlation
  Cutoff criteria or typical standard: Positive and > 0.3.
  Comment: Analog to the CTT item-total correlation.

Validity aspect: Content & Structural
  Statistic/Data: Infit and outfit mean-square fit statistics (MNSQ)
  Cutoff criteria or typical standard: 0.5 - 1.5.
  Comment: Disruption of pattern shows in the magnitude of misfit; mean-square errors should have a mean of one (observed = expected).

Validity aspect: Substantive
  Statistic/Data: Rating scale functioning
  Cutoff criteria or typical standard: Minimum of 10 responses per category; categories are unimodal; observed score averages and item threshold parameters increase monotonically; un-weighted MNSQ < 2.0 for each category.
  Comment: The rating scale is used according to the intent of the instrument developers, which supports score use and inferences.

Validity aspect: Substantive
  Statistic/Data: Item difficulty hierarchy
  Cutoff criteria or typical standard: Ordering of item deltas corresponds to theoretical expectations; item/person variable maps.
  Comment: Qualitative assessment of items in the construct and/or dimensions/domains.

Validity aspect: Generalizability
  Statistic/Data: Item invariance and differential item functioning (DIF)
  Cutoff criteria or typical standard: Within standard error, items should retain the same item difficulty (deltas) across administrations and survey forms (correlation ≥ 0.9). For DIF, recommended criteria vary: a delta difference of 0.3 - 0.64 logits (0.5 used in this study).
  Comment: DIF flags items that need further review. Items may need revision to eliminate bias, or removal when estimating scores if bias is significant.

Validity aspect: Generalizability
  Statistic/Data: Person separation reliability (PSR)
  Cutoff criteria or typical standard: Typical ~ 0.8; high stakes > 0.9. This study: 0.9 construct; 0.8 dimensions; 0.7 school-level scores.
  Comment: PSR is similar to Cronbach’s α and ranges from 0 to 1.

Validity aspect: Structural
  Statistic/Data: Sub-scale correlations
  Cutoff criteria or typical standard: Positive and substantial (> 0.5 but < 0.9).
  Comment: Items that form a 2nd dimension should be reviewed qualitatively to determine their commonality and whether their co-variation is meaningful.

Validity aspect: Structural
  Statistic/Data: Standardized residuals (Winsteps PCA: principal component analysis of residuals)
  Cutoff criteria or typical standard: No correlation between residuals from separate calibrations of two item subsets. Total variance explained: > 40% very good, > 50% excellent. 2nd dimension: < 5% of total variance; eigenvalue < 3; 1st contrast item variance 4x the variance of the 2nd item contrast. Cluster correlations: > 0.82 suggests only one latent trait; > 0.71 suggests more dependency than independence.
  Comment: Evaluated with the Winsteps software.

Validity aspect: External
  Statistic/Data: Responsiveness
  Cutoff criteria or typical standard: Typical ~ 3 person strata (low, medium, high). H = (4G + 1)/3, where H is the number of person strata and G is the person separation index.
  Comment: Instruments that are responsive can better differentiate high and low scorers by reliably separating individuals into a greater number of performance levels, thereby facilitating the measurement of change in respondent views on a construct.


Appendix E: Technical quality (mean-square error) of 70-item VOCAL scale

|ENTRY TOTAL TOTAL MODEL| INFIT | OUTFIT |PTMEASUR-AL|EXACT MATCH| | |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR. EXP.| OBS% EXP%|DISPLACE| ITEM |
|------------------------------------+----------+----------+-----------+-----------+--------+---------|
| 30 115122 43443 -1.66 .01|2.41 9.9|2.88 9.9|A .22 .41| 69.6 70.5| .00| SAFBUL10|
| 31 110085 43416 -1.23 .01|2.27 9.9|2.73 9.9|B .24 .45| 56.7 64.5| .00| SAFBUL11|
| 43 117086 43970 -1.71 .01|2.05 9.9|2.57 9.9|C .29 .41| 73.0 71.3| .00| SAFPSF5 |
| 23 109404 50974 .52 .01|2.42 9.9|2.51 9.9|D .38 .58| 34.8 56.1| .00| SAFBUL2 |
| 27 121136 44965 -1.92 .01|2.31 9.9|2.41 9.9|E .23 .39| 73.8 73.3| .00| SAFBUL6 |
| 26 103394 46560 -.36 .01|2.15 9.9|2.10 9.9|F .39 .51| 38.7 56.8| .00| SAFBUL5 |
| 42 114381 45244 -1.26 .01|2.06 9.9|2.15 9.9|G .32 .45| 58.4 64.2| .00| SAFPSF4 |
| 28 120598 47104 -1.36 .01|2.11 9.9|1.98 9.9|H .36 .44| 62.8 65.7| .00| SAFBUL7 |
| 3 78101 46844 .82 .01|1.93 9.9|2.03 9.9|I .28 .56| 37.5 50.2| .00| ENGCLC3 |
| 4 89363 43661 .11 .01|1.73 9.9|1.78 9.9|J .41 .53| 40.7 54.8| .00| ENGCLC4 |
| 37 117366 44577 -1.58 .01|1.70 9.9|1.57 9.9|K .37 .42| 68.1 69.5| .00| SAFEMO6 |
| 5 129736 51223 -.59 .01|1.66 9.9|1.57 9.9|L .38 .50| 59.7 66.3| .00| ENGPAR1 |
| 61 89389 47342 .39 .01|1.49 9.9|1.49 9.9|M .37 .55| 45.8 52.8| .00| ENVINS10|
| 45 155552 142488 2.20A .00|1.37 9.9|1.46 9.9|N .50 .62| 45.2 48.9| -.02| ENVDIS1 |
| 40 135990 51182 -1.08 .01|1.44 9.9|1.24 9.9|O .42 .46| 71.0 72.6| .00| SAFPSF2 |
| 41 82833 46675 .62A .01|1.32 9.9|1.38 9.9|P .43 .56| 47.3 51.5| -.01| SAFPSF3 |
| 24 129741 50861 -.67A .01|1.33 9.9|1.23 9.9|Q .49 .50| 67.1 67.4| .02| SAFBUL3 |
| 65 114368 51234 .30A .01|1.31 9.9|1.25 9.9|R .52 .57| 53.1 57.7| .00| ENVMEN1 |
| 14 110606 47723 -.60A .01|1.29 9.9|1.24 9.9|S .47 .49| 54.7 58.4| .00| ENGREL7 |
| 35 80034 48203 .85A .01|1.29 9.9|1.28 9.9|T .59 .57| 43.0 50.2| -.01| SAFEMO4 |
| 66 100694 51216 .93A .01|1.28 9.9|1.24 9.9|U .56 .60| 49.5 54.1| .00| ENVMEN2 |
| 67 99984 48292 .01A .01|1.25 9.9|1.21 9.9|V .55 .53| 50.7 54.9| -.01| ENVMEN3 |
| 19 96983 43999 -.25A .01|1.23 9.9|1.20 9.9|W .52 .51| 55.9 56.9| .00| ENGREL12|
| 68 107825 48385 -.37A .01|1.22 9.9|1.13 9.9|X .59 .51| 54.3 57.0| .00| ENVMEN4 |
| 7 105042 45209 -.57A .01|1.21 9.9|1.16 9.9|Y .52 .49| 58.1 58.7| .01| ENGPAR3 |
| 47 130390 51137 -.67A .01|1.19 9.9|1.07 7.6|Z .51 .50| 67.8 67.4| .02| ENVDIS3 |
| 55 99203 51188 1.00A .01|1.19 9.9|1.17 9.9| .55 .60| 50.9 53.5| .00| ENVINS4 |
| 48 102344 47094 -.23A .01|1.17 9.9|1.17 9.9| .43 .52| 53.1 56.2| .00| ENVDIS4 |
| 6 113998 48278 -.72A .01|1.15 9.9|1.07 9.0| .54 .48| 62.1 59.4| .00| ENGPAR2 |
| 58 80180 47245 .78A .01|1.12 9.9|1.15 9.9| .42 .56| 48.5 50.5| -.01| ENVINS7 |
| 70 104016 45057 -.52A .01|1.14 9.9|1.07 9.4| .56 .50| 59.2 58.5| .00| ENVMEN6 |
| 1 343464 143914 -.56A .00|1.13 9.9|1.05 9.9| .55 .51| 61.1 61.1| .00| ENGCLC1 |

61

| 2 137670 51269 -1.23A .01|1.11 9.9| .92 -7.0| .52 .45| 76.8 74.6| .02| ENGCLC2 || 34 121223 51192 -.08A .01|1.10 9.9|1.04 5.1| .57 .54| 60.7 60.4| .01| SAFEMO3 || 39 129842 51172 -.62A .01|1.08 9.9| .93 -7.7| .56 .50| 69.1 66.6| .01| SAFPSF1 || 46 115656 51260 .23A .01|1.08 9.9|1.08 9.9| .52 .56| 57.4 58.0| .01| ENVDIS2 || 44 105521 45198 -.60A .01|1.07 9.5|1.05 7.2| .44 .49| 58.3 59.0| .01| SAFPSF6 || 64 72836 43688 .90A .01|1.05 7.7|1.07 9.9| .49 .57| 49.8 50.3| -.01| ENVINS13|| 10 136865 51134 -1.19A .01|1.06 7.0| .87 -9.9| .53 .45| 76.8 73.9| .02| ENGREL3 || 60 98973 48279 .05A .01| .99 -2.0|1.06 8.6| .32 .53| 58.3 54.7| .00| ENVINS9 || 53 356836 144258 -.84A .00|1.05 9.9| .93 -9.9| .56 .49| 66.1 63.5| .01| ENVINS2 || 56 214029 92697 -.56A .01|1.04 8.2|1.03 6.1| .45 .50| 59.5 58.4| .00| ENVINS5 || 57 108341 48331 -.41A .01|1.02 3.5| .98 -3.4| .58 .51| 59.9 57.2| .01| ENVINS6 || 54 135430 51147 -1.06A .01|1.01 1.4| .86 -9.9| .54 .47| 75.0 72.4| .02| ENVINS3 || 22 354361 144350 -.77A .00|1.00 1.1| .90 -9.9|z .59 .49| 67.7 62.9| .00| SAFBUL1 || 25 73025 47560 1.08A .01| .99 -1.3|1.00 .1|y .56 .57| 50.5 48.7| -.01| SAFBUL4 || 59 119775 48331 -1.08A .01|1.00 .2| .88 -9.9|x .58 .46| 67.8 62.6| .01| ENVINS8 || 12 205745 92101 -.36A .01| .99 -2.3| .97 -5.2|w .49 .51| 60.6 57.2| .00| ENGREL5 || 63 75936 44277 .80A .01| .95 -8.0| .97 -5.2|v .54 .57| 53.1 50.8| .00| ENVINS12|| 38 95837 45394 -.03A .01| .96 -6.4| .93 -9.8|u .59 .53| 58.9 55.7| -.01| SAFEMO7 || 51 62805 44602 1.38A .01| .93 -9.9| .96 -6.5|t .55 .58| 51.3 47.9| -.01| ENVDIS7 || 15 113456 48652 -.65A .01| .92 -9.9| .89 -9.9|s .56 .49| 63.8 58.7| .01| ENGREL8 || 17 103010 48419 -.13A .01| .92 -9.9| .89 -9.9|r .61 .52| 58.2 55.6| .00| ENGREL10|| 36 97968 48451 .11A .01| .92 -9.9| .89 -9.9|q .64 .54| 56.9 54.4| .00| SAFEMO5 || 50 84402 44804 .46A .01| .92 -9.9| .92 -9.9|p .57 .55| 57.2 53.1| .00| ENVDIS6 || 52 286343 144678 .45A .00| .90 -9.9| .92 -9.9|o .50 .57| 58.0 54.2| .00| ENVINS1 || 18 100415 43715 
-.49A .01| .91 -9.9| .90 -9.9|n .55 .50| 63.1 58.2| .00| ENGREL11|| 9 98906 51151 1.01A .01| .90 -9.9| .88 -9.9|m .62 .60| 58.3 53.5| .00| ENGREL2 || 16 119107 48619 -.99A .01| .90 -9.9| .82 -9.9|l .61 .47| 69.7 61.5| .01| ENGREL9 || 33 110177 51207 .50A .01| .89 -9.9| .86 -9.9|k .62 .58| 60.8 56.1| .00| SAFEMO2 || 32 301912 143502 .17A .00| .87 -9.9| .88 -9.9|j .59 .56| 61.7 55.7| .00| SAFEMO1 || 29 92720 44527 .03A .01| .87 -9.9| .85 -9.9|i .63 .53| 60.6 55.4| .00| SAFBUL9 || 13 106798 48351 -.32A .01| .85 -9.9| .85 -9.9|h .57 .51| 61.8 56.7| .00| ENGREL6 || 21 104130 44633 -.59A .01| .85 -9.9| .80 -9.9|g .61 .49| 67.8 58.8| .00| ENGREL14|| 49 79639 47284 .80A .01| .85 -9.9| .85 -9.9|f .58 .57| 56.6 50.4| -.01| ENVDIS5 || 69 74062 44616 .91A .01| .84 -9.9| .85 -9.9|e .55 .57| 56.3 50.1| -.01| ENVMEN5 || 8 252700 143591 .91A .00| .76 -9.9| .79 -9.9|d .56 .60| 60.5 51.4| .00| ENGREL1 || 11 92802 50987 1.25 .01| .76 -9.9| .79 -9.9|c .58 .61| 63.0 52.4| .00| ENGREL4 |

62

| 62 87148 44798 .34A .01| .78 -9.9| .79 -9.9|b .58 .55| 61.6 53.8| -.01| ENVINS11|| 20 87839 45016 .32A .01| .73 -9.9| .72 -9.9|a .63 .55| 63.6 53.9| .00| ENGREL13|


Appendix F: Measure order of 70-item VOCAL scale

|ENTRY TOTAL TOTAL MODEL| INFIT | OUTFIT |PTMEASUR-AL|EXACT MATCH| | |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR. EXP.| OBS% EXP%|DISPLACE| ITEM |
|------------------------------------+----------+----------+-----------+-----------+--------+---------|
| 45 155552 142488 2.20A .00|1.37 9.9|1.46 9.9| .50 .62| 45.2 48.9| -.02| ENVDIS1 |
| 51 62805 44602 1.38A .01| .93 -9.9| .96 -6.5| .55 .58| 51.3 47.9| -.01| ENVDIS7 |
| 11 92802 50987 1.25 .01| .76 -9.9| .79 -9.9| .58 .61| 63.0 52.4| .00| ENGREL4 |
| 25 73025 47560 1.08A .01| .99 -1.3|1.00 .1| .56 .57| 50.5 48.7| -.01| SAFBUL4 |
| 9 98906 51151 1.01A .01| .90 -9.9| .88 -9.9| .62 .60| 58.3 53.5| .00| ENGREL2 |
| 55 99203 51188 1.00A .01|1.19 9.9|1.17 9.9| .55 .60| 50.9 53.5| .00| ENVINS4 |
| 66 100694 51216 .93A .01|1.28 9.9|1.24 9.9| .56 .60| 49.5 54.1| .00| ENVMEN2 |
| 8 252700 143591 .91A .00| .76 -9.9| .79 -9.9| .56 .60| 60.5 51.4| .00| ENGREL1 |
| 69 74062 44616 .91A .01| .84 -9.9| .85 -9.9| .55 .57| 56.3 50.1| -.01| ENVMEN5 |
| 64 72836 43688 .90A .01|1.05 7.7|1.07 9.9| .49 .57| 49.8 50.3| -.01| ENVINS13|
| 35 80034 48203 .85A .01|1.29 9.9|1.28 9.9| .59 .57| 43.0 50.2| -.01| SAFEMO4 |
| 3 78101 46844 .82 .01|1.93 9.9|2.03 9.9| .28 .56| 37.5 50.2| .00| ENGCLC3 |
| 49 79639 47284 .80A .01| .85 -9.9| .85 -9.9| .58 .57| 56.6 50.4| -.01| ENVDIS5 |
| 63 75936 44277 .80A .01| .95 -8.0| .97 -5.2| .54 .57| 53.1 50.8| .00| ENVINS12|
| 58 80180 47245 .78A .01|1.12 9.9|1.15 9.9| .42 .56| 48.5 50.5| -.01| ENVINS7 |
| 41 82833 46675 .62A .01|1.32 9.9|1.38 9.9| .43 .56| 47.3 51.5| -.01| SAFPSF3 |
| 23 109404 50974 .52 .01|2.42 9.9|2.51 9.9| .38 .58| 34.8 56.1| .00| SAFBUL2 |
| 33 110177 51207 .50A .01| .89 -9.9| .86 -9.9| .62 .58| 60.8 56.1| .00| SAFEMO2 |
| 50 84402 44804 .46A .01| .92 -9.9| .92 -9.9| .57 .55| 57.2 53.1| .00| ENVDIS6 |
| 52 286343 144678 .45A .00| .90 -9.9| .92 -9.9| .50 .57| 58.0 54.2| .00| ENVINS1 |
| 61 89389 47342 .39 .01|1.49 9.9|1.49 9.9| .37 .55| 45.8 52.8| .00| ENVINS10|
| 62 87148 44798 .34A .01| .78 -9.9| .79 -9.9| .58 .55| 61.6 53.8| -.01| ENVINS11|
| 20 87839 45016 .32A .01| .73 -9.9| .72 -9.9| .63 .55| 63.6 53.9| .00| ENGREL13|
| 65 114368 51234 .30A .01|1.31 9.9|1.25 9.9| .52 .57| 53.1 57.7| .00| ENVMEN1 |
| 46 115656 51260 .23A .01|1.08 9.9|1.08 9.9| .52 .56| 57.4 58.0| .01| ENVDIS2 |
| 32 301912 143502 .17A .00| .87 -9.9| .88 -9.9| .59 .56| 61.7 55.7| .00| SAFEMO1 |
| 36 97968 48451 .11A .01| .92 -9.9| .89 -9.9| .64 .54| 56.9 54.4| .00| SAFEMO5 |
| 4 89363 43661 .11 .01|1.73 9.9|1.78 9.9| .41 .53| 40.7 54.8| .00| ENGCLC4 |
| 60 98973 48279 .05A .01| .99 -2.0|1.06 8.6| .32 .53| 58.3 54.7| .00| ENVINS9 |
| 29 92720 44527 .03A .01| .87 -9.9| .85 -9.9| .63 .53| 60.6 55.4| .00| SAFBUL9 |
| 67 99984 48292 .01A .01|1.25 9.9|1.21 9.9| .55 .53| 50.7 54.9| -.01| ENVMEN3 |
| 38 95837 45394 -.03A .01| .96 -6.4| .93 -9.8| .59 .53| 58.9 55.7| -.01| SAFEMO7 |
| 34 121223 51192 -.08A .01|1.10 9.9|1.04 5.1| .57 .54| 60.7 60.4| .01| SAFEMO3 |
| 17 103010 48419 -.13A .01| .92 -9.9| .89 -9.9| .61 .52| 58.2 55.6| .00| ENGREL10|
| 48 102344 47094 -.23A .01|1.17 9.9|1.17 9.9| .43 .52| 53.1 56.2| .00| ENVDIS4 |
| 19 96983 43999 -.25A .01|1.23 9.9|1.20 9.9| .52 .51| 55.9 56.9| .00| ENGREL12|
| 13 106798 48351 -.32A .01| .85 -9.9| .85 -9.9| .57 .51| 61.8 56.7| .00| ENGREL6 |
| 12 205745 92101 -.36A .01| .99 -2.3| .97 -5.2| .49 .51| 60.6 57.2| .00| ENGREL5 |
| 26 103394 46560 -.36 .01|2.15 9.9|2.10 9.9| .39 .51| 38.7 56.8| .00| SAFBUL5 |
| 68 107825 48385 -.37A .01|1.22 9.9|1.13 9.9| .59 .51| 54.3 57.0| .00| ENVMEN4 |
| 57 108341 48331 -.41A .01|1.02 3.5| .98 -3.4| .58 .51| 59.9 57.2| .01| ENVINS6 |
| 18 100415 43715 -.49A .01| .91 -9.9| .90 -9.9| .55 .50| 63.1 58.2| .00| ENGREL11|
| 70 104016 45057 -.52A .01|1.14 9.9|1.07 9.4| .56 .50| 59.2 58.5| .00| ENVMEN6 |
| 1 343464 143914 -.56A .00|1.13 9.9|1.05 9.9| .55 .51| 61.1 61.1| .00| ENGCLC1 |
| 56 214029 92697 -.56A .01|1.04 8.2|1.03 6.2| .45 .50| 59.5 58.4| .00| ENVINS5 |
| 7 105042 45209 -.57A .01|1.21 9.9|1.16 9.9| .52 .49| 58.1 58.7| .01| ENGPAR3 |
| 21 104130 44633 -.59A .01| .85 -9.9| .80 -9.9| .61 .49| 67.8 58.8| .00| ENGREL14|
| 5 129736 51223 -.59 .01|1.66 9.9|1.57 9.9| .38 .50| 59.7 66.3| .00| ENGPAR1 |
| 14 110606 47723 -.60A .01|1.29 9.9|1.24 9.9| .47 .49| 54.7 58.4| .00| ENGREL7 |
| 44 105521 45198 -.60A .01|1.07 9.5|1.05 7.2| .44 .49| 58.3 59.0| .01| SAFPSF6 |
| 39 129842 51172 -.62A .01|1.08 9.9| .93 -7.7| .56 .50| 69.1 66.6| .01| SAFPSF1 |
| 15 113456 48652 -.65A .01| .92 -9.9| .89 -9.9| .56 .49| 63.8 58.7| .01| ENGREL8 |
| 24 129741 50861 -.67A .01|1.33 9.9|1.23 9.9| .49 .50| 67.1 67.4| .02| SAFBUL3 |
| 47 130390 51137 -.67A .01|1.19 9.9|1.07 7.6| .51 .50| 67.8 67.4| .02| ENVDIS3 |
| 6 113998 48278 -.72A .01|1.15 9.9|1.07 9.0| .54 .48| 62.1 59.4| .00| ENGPAR2 |
| 22 354361 144350 -.77A .00|1.00 1.2| .90 -9.9| .59 .49| 67.7 62.9| .00| SAFBUL1 |
| 53 356836 144258 -.84A .00|1.05 9.9| .93 -9.9| .56 .49| 66.1 63.5| .01| ENVINS2 |
| 16 119107 48619 -.99A .01| .90 -9.9| .82 -9.9| .61 .47| 69.7 61.5| .01| ENGREL9 |
| 54 135430 51147 -1.06A .01|1.01 1.4| .86 -9.9| .54 .47| 75.0 72.4| .02| ENVINS3 |
| 59 119775 48331 -1.08A .01|1.00 .2| .88 -9.9| .58 .46| 67.8 62.6| .01| ENVINS8 |
| 40 135990 51182 -1.08 .01|1.44 9.9|1.24 9.9| .42 .46| 71.0 72.6| .00| SAFPSF2 |
| 10 136865 51134 -1.19A .01|1.06 7.0| .87 -9.9| .53 .45| 76.8 73.9| .02| ENGREL3 |
| 31 110085 43416 -1.23 .01|2.27 9.9|2.73 9.9| .24 .45| 56.7 64.5| .00| SAFBUL11|
| 2 137670 51269 -1.23A .01|1.11 9.9| .92 -7.0| .52 .45| 76.8 74.6| .02| ENGCLC2 |
| 42 114381 45244 -1.26 .01|2.06 9.9|2.15 9.9| .32 .45| 58.4 64.2| .00| SAFPSF4 |
| 28 120598 47104 -1.36 .01|2.11 9.9|1.98 9.9| .36 .44| 62.8 65.7| .00| SAFBUL7 |
| 37 117366 44577 -1.58 .01|1.70 9.9|1.57 9.9| .37 .42| 68.1 69.5| .00| SAFEMO6 |
| 30 115122 43443 -1.66 .01|2.41 9.9|2.88 9.9| .22 .41| 69.6 70.5| .00| SAFBUL10|
| 43 117086 43970 -1.71 .01|2.05 9.9|2.57 9.9| .29 .41| 73.0 71.3| .00| SAFPSF5 |
| 27 121136 44965 -1.92 .01|2.31 9.9|2.41 9.9| .23 .39| 73.8 73.3| .00| SAFBUL6 |
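The INFIT and OUTFIT mean-square columns in Appendices E and F summarize how well each item's responses fit the Rasch model (values near 1.0 indicate good fit; values above about 2.0, as for SAFBUL10 or SAFPSF5, signal noisy responses). As a minimal sketch of the standard definitions, not the exact Winsteps implementation, both statistics can be computed from observed responses, model-expected responses, and model variances:

```python
import numpy as np

def fit_mean_squares(x, e, w):
    """Standard Rasch fit mean-squares for one item.

    x: observed responses (one per person), e: model-expected responses,
    w: model variance of each response. Returns (infit_mnsq, outfit_mnsq).
    OUTFIT is the unweighted mean of squared standardized residuals;
    INFIT is the information-weighted mean-square (weights = variances).
    """
    x, e, w = map(np.asarray, (x, e, w))
    z2 = (x - e) ** 2 / w                 # squared standardized residuals
    outfit = z2.mean()
    infit = ((x - e) ** 2).sum() / w.sum()
    return infit, outfit

# Hypothetical toy data: responses close to expectation give small MNSQ.
x = np.array([3.0, 2.0, 1.0, 3.0])
e = np.array([2.8, 2.1, 1.2, 2.9])
w = np.array([0.5, 0.6, 0.4, 0.5])
infit, outfit = fit_mean_squares(x, e, w)
```

Because INFIT down-weights responses far from a person's measure, an item can show acceptable INFIT but inflated OUTFIT when misfit is concentrated in unexpected extreme responses, which is the pattern visible for several SAF items above.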


Appendix G1: Engagement Items

Indicator Grade Item code Item prompt

Cultural and Linguistic Competence

5, 8, 10 ENGCLC1 Adults working at this school treat all students respectfully.

5 ENGCLC2 Teachers at this school accept me for who I am.

8 ENGCLC3 My textbooks or class materials include people and examples that reflect my race, cultural background and/or identity.

10 ENGCLC4 I am encouraged to take upper level courses (honors, AP).

Relationships

5, 8, 10 ENGREL1 Students respect one another.

5 ENGREL2 Students will help other students, even if they are not close friends.

5 ENGREL3 My teachers care about me as a person.

5 ENGREL4 Students at my school get along well with each other.

8, 10 ENGREL5 Students from different backgrounds get along well with each other in our school, regardless of their race, culture, family background, sex, or sexual orientation.

8 ENGREL6 Teachers are available when I need to talk with them.

8 ENGREL7 If I am absent from school, there is a teacher or other adult that will notice I was not in class.

8 ENGREL8 Teachers encourage students to respect different points of view when expressed in class.

8 ENGREL9 My teachers care about my academic success.

8 ENGREL10 My teachers inspire confidence in my ability to do well in school.

10 ENGREL11 My teachers are approachable if I am having problems with my class work.

10 ENGREL12 At our school, a teacher or some other adult is available to help students who have experienced sexual assault or dating violence.

10 ENGREL13 Adults at our school are respectful to student ideas even if the ideas expressed are different from their own.

10 ENGREL14 My teachers promote respect among students.

Participation

5 ENGPAR1 I get the chance to take part in school events (for example, science fairs, art or music shows).

8 ENGPAR2 My parents feel respected when they participate at our school (e.g., at parent-teacher conferences, open houses).

10 ENGPAR3 I feel welcome to participate in extra-curricular activities offered through our school, such as, school clubs or organizations, musical groups, sports teams, student council.


Appendix G2: Safety Items

Domain Grade Item code Item prompt

Emotional

5, 8, 10 SAFEMO1 Teachers support (help) students who come to class upset.

5 SAFEMO2 At our school, students learn to care about other students' feelings.

5 SAFEMO3 I am happy to be at our school.

8 SAFEMO4 I feel comfortable reaching out to teachers/counselors for emotional support if I need it.

8 SAFEMO5 Teachers and adults are interested in my well-being beyond just my class work.

10 SAFEMO6 I have at least one friend who I can count on to support me.

10 SAFEMO7 I feel as though I belong to my school community.

Physical

5 SAFPSF1 I feel safe at our school.

5 SAFPSF2 If I heard about a threat to our school or to my classmates, I would report it to an adult.

8 SAFPSF3 Students at this school damage and/or steal other students' property.

8 SAFPSF4 I have seen students with weapons at our school.

10 SAFPSF5 I sometimes stay home because I don’t feel safe at our school.

10 SAFPSF6 Students know what to do if there is an emergency at school.

Bullying/cyber bullying

5, 8, 10 SAFBUL1 If I tell a teacher or other adult that someone is being bullied, the teacher/adult will do something to help.

5 SAFBUL2 I have been punched or shoved by other students more than once in the school or in the playground.

5 SAFBUL3 Teachers don't let students pick on other students in class or in the hallways.

8 SAFBUL4 Students at this school try to stop bullying when they see it happening.

8 SAFBUL5 Students have spread rumors or lies about me more than once on social media.

8 SAFBUL6 I have been teased or picked on more than once because of my religion.

8 SAFBUL7 I have been threatened by other students more than once on social media.

10 SAFBUL9 Teachers, students, and the principal work together in our school to prevent bullying.

10 SAFBUL10 I have been teased or picked on more than once because of my real or perceived sexual orientation.


10 SAFBUL11 I have been teased or picked on more than once because of my race or ethnicity.

Appendix G3: Environment Items

Domain Grade Item code Item prompt

Instructional

5, 8, 10 ENVINS1 Students help each other learn without having to be asked by the teacher.

5, 8, 10 ENVINS2 My teachers are proud of me when I work hard in school.

5 ENVINS3 My teachers help me succeed with my school work when I need help.

5 ENVINS4 My teachers use my ideas to help my classmates learn.

8, 10 ENVINS5 My teachers set high expectations for my work.

8 ENVINS6 My teachers give me individual help with my school work when I need help.

8 ENVINS7 I have a choice in how I show my learning (e.g., write a paper, prepare a presentation, make a video).

8 ENVINS8 My teachers believe that all students can do well in their learning.

8 ENVINS9 My school work is appropriately challenging.

10 ENVINS10 I am not scared to make mistakes in my teachers' classes.

10 ENVINS11 My teachers support me even when my work is not my best.

10 ENVINS12 The things I am learning in school are relevant (important) to me.

10 ENVINS13 Teachers ask students for feedback on their classroom instruction.

Mental health

5 ENVMEN1 In school, I learn how to control my feelings when I am angry or upset.

5 ENVMEN2 I feel comfortable talking to my teacher(s) about something that is bothering me.

8 ENVMEN3 Our school offers guidance to students on how to mediate (settle) conflicts by themselves.

8 ENVMEN4 If I need help with my emotions (feelings), help is available at my school.

10 ENVMEN5 Students at this school try to work out their problems with other students in a respectful way.

10 ENVMEN6 I have access to help at school if I am struggling emotionally or mentally.

Discipline

5, 8, 10 ENVDIS1 Students have a voice in deciding school rules.

5 ENVDIS2 School rules are fair for all students.

5 ENVDIS3 Adults at my school (for example, my school nurse, my teachers, or my principal) talk with students to help us know how to behave well.

8 ENVDIS4 School staff are consistent when enforcing rules in school.


8 ENVDIS5 In school, students learn how to control their behavior.

10 ENVDIS6 The consequences for inappropriate behavior are enforced fairly.

10 ENVDIS7 Teachers give students a chance to explain their behavior when they do something wrong.


Appendix H: Person Reliability of the VOCAL scale, grade-level VOCAL scales, and dimension sub-scales

Scale (Real–Model)²                                    PSR²        PSI (G)     Strata (H)   Mean ± SD³
Overall School Climate (148,824 persons; 70 items)¹    0.88–0.90   2.67–2.98   3.9–4.3      1.33 ± 1.16
Grade 5 items (51,384 persons; 24 items)¹              0.86–0.88   2.47–2.74   3.6–4.0      1.81 ± 1.22
Grade 8 items (50,334 persons; 33 items)¹              0.88–0.91   2.74–3.14   4.0–4.5      1.05 ± 1.01
Grade 10 items (47,106 persons; 29 items)¹             0.86–0.88   2.45–2.71   3.6–3.9      1.10 ± 1.09
Engagement items (148,338 persons; 21 items)¹          0.69–0.73   1.50–1.65   2.3–2.5      1.36 ± 1.30
Safety items (148,338 persons; 23 items)¹              0.68–0.73   1.46–1.65   2.3–2.5      1.43 ± 1.47
Environment items (148,364 persons; 26 items)¹         0.76–0.80   1.79–2.00   2.7–3.0      1.35 ± 1.24

PSR = Person Separation Reliability; PSI = Person Separation Index. ¹7 items are common to grades 5, 8, and 10; 2 items are common to grades 8 and 10; the remaining 61 items each appear in a single grade. ²Each range runs from the Real PSR (lower bound of reliability) to the Model PSR (upper bound). ³SD = standard deviation.
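The three reliability indices reported above are linked by standard Rasch formulas: the separation index is G = sqrt(R / (1 − R)) for reliability R, and the number of statistically distinct person strata is H = (4G + 1) / 3. A short sketch (values agree with the table up to rounding of the reported reliabilities):

```python
import math

def separation_index(reliability):
    """Person separation index G implied by separation reliability R (0 <= R < 1)."""
    return math.sqrt(reliability / (1.0 - reliability))

def person_strata(g):
    """Number of statistically distinct person strata, H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

# e.g. the Model PSR of 0.90 for the overall scale implies:
g = separation_index(0.90)   # G = 3.0
h = person_strata(g)         # H ≈ 4.33, i.e. about 4 distinct strata
```

This is why the Engagement and Safety sub-scales, with PSR below 0.75, separate students into only two to three strata, while the full 70-item scale separates them into about four.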


Appendix I1: DIF Plot: Economically Disadvantaged (ECODIS)

[Figure: DIF plot (DIF=@ECODIS) for the 70 VOCAL items. Vertical axis: DIF Measure (diff.), from −2.5 to 2.5 logits; horizontal axis: item (ENGCLC1 through ENVMEN6). Two series are plotted: Non-Economically Disadvantaged and Economically Disadvantaged students.]

Appendix I2: DIF Plot: Students with Disabilities (SWD)

[Figure: DIF plot (DIF=@SWD) for the 70 VOCAL items. Vertical axis: DIF Measure (diff.), from −2.5 to 2.5 logits; horizontal axis: item (ENGCLC1 through ENVMEN6). Two series are plotted: students without disabilities and students with disabilities.]

Appendix I3: DIF Plot: English Language Learner (ELL)

[Figure: DIF plot (DIF=@ELL) for the 70 VOCAL items. Vertical axis: DIF Measure (diff.), from −2.5 to 2.5 logits; horizontal axis: item (ENGCLC1 through ENVMEN6). Two series are plotted: non-English learners and English learners.]

Appendix I4: DIF Plot: HOMELESS
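The DIF plots in Appendices I1 through I4 compare item difficulty estimates computed separately for paired student groups; an item shows DIF when the two estimates diverge. As a rough, hypothetical sketch of the quantity being plotted (not the Winsteps DIF procedure, which anchors person measures and estimates difficulties within the Rasch model; for simplicity this sketch also treats a dichotomous rather than Likert item):

```python
import numpy as np

def dif_contrast(responses, focal, item):
    """Crude DIF contrast for one dichotomous item: the difference in
    log fail/success odds (a PROX-style difficulty) between groups.

    responses: (persons x items) 0/1 matrix; focal: boolean array
    (True = focal group); item: column index. Positive values mean the
    item is harder for the focal group.
    """
    def difficulty(x):
        p = x.mean()                   # proportion endorsing the item
        return np.log((1.0 - p) / p)   # logit difficulty (up to a constant)

    col = responses[:, item]
    return difficulty(col[focal]) - difficulty(col[~focal])

# Synthetic check: an item endorsed by 75% of the reference group but
# only 25% of the focal group shows a positive DIF contrast.
resp = np.array([[1], [1], [1], [0], [1], [0], [0], [0]])
focal = np.array([False, False, False, False, True, True, True, True])
contrast = dif_contrast(resp, focal, 0)
```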


Appendix J: Winsteps Residual Analyses Output

Table of STANDARDIZED RESIDUAL variance in Eigenvalue units = ITEM information units
                                        Eigenvalue   Observed          Expected
Total raw variance in observations   =    117.2517     100.0%            100.0%
Raw variance explained by measures   =     47.2517      40.3%             44.5%
Raw variance explained by persons    =     26.5307      22.6%             25.0%
Raw variance explained by items      =     20.7210      17.7%             19.5%
Raw unexplained variance (total)     =     70.0000      59.7%  100.0%     55.5%
Unexplned variance in 1st contrast   =      2.5826       2.2%    3.7%
Unexplned variance in 2nd contrast   =      2.2237       1.9%    3.2%
Unexplned variance in 3rd contrast   =      2.0287       1.7%    2.9%
Unexplned variance in 4th contrast   =      1.7088       1.5%    2.4%
Unexplned variance in 5th contrast   =      1.5612       1.3%    2.2%
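The contrasts in the table come from a principal-components decomposition of the standardized Rasch residuals: after the Rasch measures are removed, the eigenvalues of the item-by-item residual correlation matrix show whether any secondary dimension remains. A minimal sketch of that decomposition (assuming a residual matrix has already been computed; Winsteps' own standardization may differ in detail):

```python
import numpy as np

def residual_contrasts(std_residuals):
    """Eigenvalues of the item correlation matrix of standardized residuals.

    std_residuals: (persons x items) array of standardized Rasch residuals.
    Returns eigenvalues sorted descending, in item-information units;
    they sum to the number of items, and the largest is the size of the
    first contrast.
    """
    r = np.corrcoef(std_residuals, rowvar=False)  # items x items
    eig = np.linalg.eigvalsh(r)                   # ascending
    return eig[::-1]                              # descending

# With purely random residuals (no secondary dimension), the eigenvalues
# sum to the item count and no single contrast is large.
rng = np.random.default_rng(0)
eig = residual_contrasts(rng.standard_normal((5000, 70)))
```

A first contrast of 2.58 eigenvalue units out of 70 items, as reported above, is close to what random residuals would produce, supporting essential unidimensionality.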


Appendix K: Transformation of Logit Scores

To transform student-level person measures into interpretable school-level scores, the following steps were taken:

1. The school climate person measures were exported from Winsteps based on the joint calibration of all students (all students from across the three grades included).

2. Each person's logit measure was standardized by subtracting the mean of the overall school climate measure from each student's score and dividing by the standard deviation of the overall school climate measure:

scl_std = (person school climate measure − mean of school climate measure) / (standard deviation of school climate measure)

where scl_std is the person's standardized school climate measure.

3. The standardized estimates were then multiplied by 20, and 50 was added to each individual score.

As a result of this process, student scores were centered at 50 with a standard deviation of 20. Upon aggregation to the school level, scores were truncated to the range 1–99. School-level scores had a mean of 50.05 and a standard deviation of 12.83. A similar process was used for each dimension score.
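The three steps above can be sketched as follows. This is a minimal illustration, not the Department's production code; the scaling constants and truncation bounds are taken from the text, while the function and variable names are illustrative only:

```python
import numpy as np

def to_school_scores(person_measures, school_ids):
    """Rescale Winsteps logit measures and aggregate to school-level scores.

    1) Standardize each person measure against the overall mean and SD,
    2) rescale to mean 50 and SD 20,
    3) average by school and truncate to the 1-99 range.
    """
    m = np.asarray(person_measures, dtype=float)
    scl_std = (m - m.mean()) / m.std()       # step 2: standardize
    scaled = scl_std * 20.0 + 50.0           # step 3: mean 50, SD 20

    schools = {}
    for sid, score in zip(school_ids, scaled):
        schools.setdefault(sid, []).append(score)
    return {sid: float(np.clip(np.mean(s), 1.0, 99.0))
            for sid, s in schools.items()}

# Hypothetical logit measures for six students in three schools.
measures = np.array([-1.2, 0.3, 1.4, 0.5, -0.8, 2.0])
schools = ["A", "A", "B", "B", "C", "C"]
school_scores = to_school_scores(measures, schools)
```

By construction the student-level scores are centered at 50 with a standard deviation of 20; the school-level standard deviation (12.83 in the report) is smaller because averaging within schools removes within-school variation.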


