
Computers & Education 77 (2014) 1–12


Measuring Information and Communication Technology Literacy using a performance assessment: Validation of the Student Tool for Technology Literacy (ST2L)

Anne Corinne Huggins, Albert D. Ritzhaupt*, Kara Dawson

University of Florida, USA

Article info

Article history:
Received 19 November 2013
Accepted 2 April 2014
Available online 16 April 2014

Keywords:
Technology literacy
Information and Communication Technology Literacy
NETS*S
Validation
Reliability

* Corresponding author. School of Teaching and Learning, College of Education, University of Florida, 2423 Norman Hall, PO Box 117048, Gainesville, FL 32611, USA. Tel.: +1 352 273 4180; fax: +1 352 392 9193.

E-mail address: [email protected] (A.D. Ritzhaupt).

http://dx.doi.org/10.1016/j.compedu.2014.04.005
0360-1315/© 2014 Elsevier Ltd. All rights reserved.

Abstract

This paper reports on the validation of scores from the Student Tool for Technology Literacy (ST2L), a performance-based assessment based on the National Educational Technology Standards for Students (NETS*S) used to measure middle grade students' Information and Communication Technology (ICT) literacy. Middle grade students (N = 5884) from school districts across the state of Florida were recruited for this study. This paper first provides an overview of various methods to measure ICT literacy and related constructs, and documents evidence of score reliability and validity. Following sound procedures based on prior research, this paper provides validity and reliability evidence for the ST2L scores using both item response theory and testlet response theory. This paper examines both the internal and external validity of the instrument. The ST2L, with minimal revision, was found to be a sound measure of ICT literacy for low-stakes assessment purposes. A discussion of the results is provided with emphasis on the psychometric properties of the tool and some practical insights on the populations with whom the tool should be used in future research and practice.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

A series of recent workshops convened by the National Research Council (NRC) and co-sponsored by the National Science Foundation (NSF) and the National Institutes of Health highlighted the importance of teaching and assessing 21st century skills in K-12 education (NRC, 2011). Information and Communication Technology (ICT) literacy, or the ability to use technologies to support problem solving, critical thinking, communication, collaboration and decision-making, is a critical 21st century skill (NRC, 2011; P21, 2011). The National Educational Technology Plan (USDOE, 2010) also highlights the importance of ICT literacy for student success across all content areas, for developing skills to support lifelong learning, and for providing authentic learning opportunities that prepare students to succeed in a globally competitive workforce. It is clear that students who are ICT literate are at a distinct advantage in terms of learning in increasingly digital classrooms (NSF, 2006; USDOE, 2010), competing in an increasingly digital job market (NRC, 2008) and participating in an increasingly digital democracy (Jenkins, 2006; P21, 2011). Hence, it is critical that educators have access to measures that display evidence of validity and reliability in scores representing this construct in order to use the measures, for example, to guide instruction and address student needs in this area.

The International Society for Technology in Education (ISTE) has developed a set of national standards for ICT literacy known as the National Educational Technology Standards for Students (ISTE, 2007). These standards are designed to consider the breadth and depth of ICT literacy and to be flexible enough to adapt as new technologies emerge. The current standards are a revision of the 1998 version of the standards. NETS*S strands include knowledge and dispositions related to Creativity and Innovation, Communication and Collaboration, Research and Information Fluency, Critical Thinking, Problem Solving and Decision Making, Digital Citizenship, and Technology Operations and Concepts. NETS*S have been widely acclaimed and adopted in the U.S. and many countries around the world, and are being used by schools for curriculum development, technology planning and school improvement plans.



Yet, measuring ICT literacy is a major challenge for educators and researchers. This point is reinforced by two chapters of the most recent Handbook of Educational Communications and Technology that highlight research and methods on measuring the phenomenon (Christensen & Knezek, 2014; Tristán-López & Ylizaliturri-Salcedo, 2014). Though there is disagreement on the language used to describe the construct (e.g., digital literacy, media literacy, technological literacy, technology readiness, etc.), several agree on the key facets that make up the construct, including knowledge of computer hardware and peripherals, navigation of operating systems, folders and file management, word processing, spreadsheets, databases, e-mail, web searching and much more (Tristán-López & Ylizaliturri-Salcedo, 2014). Such skills are essential for individuals in K-12, post-secondary and workplace environments.

For-profit companies have attempted to measure ICT literacy to meet the No Child Left Behind (USDOE, 2001) mandate of every child being technologically literate by 8th grade. States have employed different methods to address this mandate, with many relying on private companies. Many of these tools, such as the TechLiteracy Assessment (Learning, 2012), claim alignment with NETS*S. However, most for-profit companies provide little evidence of a rigorous design, development and validation process. The PISA (Program for International Student Assessment) indirectly measures some ICT-related items such as frequency of use and self-efficacy via self-report, but ICT literacy is not a focus of the assessment. Instead, the PISA measures reading literacy, mathematics literacy, and science literacy of 15-year-old high school students (PISA, 2012).

Many states have adopted the TAGLIT (Taking a Good Look at Instructional Technology) (Christensen & Knezek, 2014) to meet the NCLB reporting requirements. This tool includes a suite of online assessments for students, teachers, and administrators, and claims to be connected to the NETS*S for the student assessment. The questions of the assessment were originally developed by the University of North Carolina Center for School Leadership Development. This is a traditional online assessment that includes a wide range of questions focusing on the knowledge, skills, and dispositions related to ICT literacy. The utility includes a reporting function for schools to use for reporting and planning purposes. However, very little research has been published on the design, development, and validation of this suite of tools for public inspection.

A promising new initiative is the first-ever National Assessment of Educational Progress (NAEP) Technology and Engineering Literacy (TEL) assessment, which is currently under development (NAEP, 2014). TEL is designed to complement other NAEP assessments in mathematics and science by focusing specifically on technology and engineering constructs. Unlike the other NAEP instruments, the TEL is completely computer-based and includes interactive scenario-based tasks in simulated software environments. The TEL was scheduled for pilot testing with 8th grade students in the fall of 2013 and is slated for wider public release sometime in 2014. However, a careful reading of the framework for this instrument reveals that it is not designed to purely measure ICT literacy. Rather, the instrument focuses on three interrelated constructs: Design and Systems, Technology and Society, and Information and Communication Technology (NAEP, 2014).

This paper focuses on a performance-based instrument known as the Student Tool for Technology Literacy (ST2L), designed to measure the ICT literacy skills of middle grades students in Florida using the 2007 National Educational Technology Standards for Students (NETS*S). This is the second iteration of the ST2L, with the first iteration aligned with the original 1998 NETS*S (Hohlfeld, Ritzhaupt, & Barron, 2010). Specifically, this paper provides validity and reliability evidence for the scores using both item response theory and testlet response theory (Wainer, Bradlow, & Wang, 2007).

2. Measuring ICT literacy

The definition, description, and measurement of ICT literacy has been a topic under investigation primarily since the advent of the World Wide Web in the early nineties. Several scholars, practitioners, and reputable organizations have attempted to carefully define ICT literacy with associated frameworks, and have attempted to design, develop, and validate reliable measures of this multidimensional construct. For instance, Europe created the European Computer Driving License Foundation (ECDLF), which provides a framework and comprehensive assessment of ICT literacy skills used to certify professionals working in the information technology industry. This particular certificate has been adopted by 148 countries around the world in 41 different languages (Christensen & Knezek, 2014). We attempt to review some of the published measures of ICT literacy and related constructs in this short literature review. We do not claim to cover all instruments of ICT literacy; rather, we cover instruments that were published and provided evidence of both validity and reliability.

Compeau and Higgins (1995) provide one of the earlier and more popular measures of computer self-efficacy and discuss its implications for the acceptance of technology systems in the context of knowledge workers, the population with whom the measure is intended to be used. Building on the work of Bandura (1986), computer self-efficacy is defined as "a judgment of one's capability to use a computer" (Compeau & Higgins, 1995, p. 192). Their study involved more than 1000 knowledge workers in Canada and several related measurement systems, including computer affect, anxiety, and use. They designed and tested a complex path model to examine computer self-efficacy and its relationship with the other constructs. Unsurprisingly, computer self-efficacy was significantly and negatively correlated with computer anxiety. Also, computer use had a significant positive correlation with computer self-efficacy. This scale has been widely adopted, and the article has been cited more than 2900 times according to Google Scholar.

Parasuraman (2000) provides a comprehensive overview of the Technology Readiness Index (TRI), which is a multi-item scale designed to measure technology readiness, a construct similar to ICT literacy. Parasuraman (2000) defines technology readiness as "people's propensity to embrace and use new technologies for accomplishing goals in home life and at work" (p. 308). This measure is intended to be used by adults in marketing and business contexts. The development process included dozens of technology-related focus groups to generate the initial item pool, followed by an intensive study on the psychometric properties of the scale (including factor analysis and internal consistency reliability). Though the TRI has been mostly used in business and marketing literature, it demonstrates that other disciplines are also struggling with this complex phenomenon.

Bunz (2004) validated an instrument to assess people's fluency with the computer, e-mail, and the Web (CEW fluency). The instrument was developed based on extensive research on information and communication technology literacies. The research was conducted in two phases. First, the instrument was tested on 284 research participants, and a principal component factor analysis with varimax rotation resulted in 21 items in four constructs: computer fluency (α = .85), e-mail fluency (α = .89), Web navigation (α = .84), and Web editing (α = .82). The 4-factor solution accounted for more than 67% of the total variance. In the second phase, Bunz's (2004) 143 participants completed the CEW scale and several other scales to demonstrate convergent validity. The correlations were strong and significant. The measure was used with students in higher education contexts. Overall, preliminary support for the scale's reliability and validity was found.

Katz and Macklin (2007) provided a comprehensive study of the ETS ICT Literacy Assessment (renamed iSkills) with more than 4000 college students from more than 30 college campuses in the U.S. The ETS assessment of ICT literacy focuses on several dimensions of ICT literacy that are measured in a simulated software environment, including defining, accessing, managing, integrating, evaluating, creating and communicating using digital tools, communications tools, and/or networks (Katz & Macklin, 2007). They systematically investigated the relationship among scores on the ETS assessment and self-report measures of ICT literacy, self-sufficiency, and academic performance as measured by the cumulative grade point average. The ETS assessment was found to have small to moderate statistically significant correlations with other measurements of ICT literacy, which provides evidence of convergent validity of the measurement system. ETS continues to administer the iSkills assessment to college students at select universities, and provides comprehensive reporting.

Schmidt et al. (2009) developed a measure of Technological Pedagogical Content Knowledge (TPACK) for pre-service teachers based on Mishra and Koehler's (2006) discussion of the Technological Pedagogical Content Knowledge framework. Though not a pure measure of ICT literacy, the instrument includes several technology-related items that attempt to measure a pre-service teacher's knowledge, skills, and dispositions towards technology. The development of the instrument was based on an extensive review of literature surrounding teacher use of technology and an expert review panel appraising the items generated by the research team for relevance. The researchers then conducted a principal component analysis and internal consistency reliability analysis of the associated structure of the instrument. The instrument has been widely adopted (e.g., Abitt, 2011; Chai, Ling Koh, Tsai, & Wee Tan, 2011; Koh & Divaharan, 2011).

Hohlfeld et al. (2010) reported on the Student Tool for Technology Literacy (ST2L) development and validation process, and provided evidence that the ST2L produces valid and reliable ICT literacy scores for middle grade students in Florida based on the 1998 NETS*S. The ST2L includes more than 100 items, most of which are performance assessment items in which the learner responds to tasks in a software environment (e.g., a spreadsheet) that simulates real-world application of ICT literacy skills. The strategy for developing the technology tool was as follows: 1) technology standards were identified; 2) grade-level expectations/benchmarks for these standards were developed; 3) indicators for the benchmarks were outlined; and 4) specific knowledge assessment items were written and specific performance or skill assessment items were designed and programmed. Using a merger of design-based research and classical test theory, Hohlfeld et al. (2010) demonstrated the tool to be a sound assessment tool for the intended purpose of low-stakes assessment of ICT literacy. Worth noting, the ST2L has been used by more than 100,000 middle grade students in the state of Florida since its formal production release (ST2L, 2013).

Across these various studies that all address the complex topic of measuring ICT literacy, we can make a few observations. First, there is no consensus on the language used to describe this construct. Computer self-efficacy, CEW fluency, ICT literacy, technology readiness, or technology proficiency are all terms that can be used to describe a similar phenomenon. Second, each article presented here built on a conceptual framework to explain ICT literacy (e.g., social cognitive theory, NETS*S, TPACK, etc.) and used sound development and validation procedures. We feel this is an important aspect of the work on ICT literacy and that it must be guided by frameworks and theories to inform our research base. Third, the instruments were developed for various populations, including pre-service teachers, middle grade students, knowledge workers, college students, and more. Special attention must be paid to the population the ICT literacy measurement is designed for. Finally, there are several different methods to measure this complex phenomenon, ranging from traditional paper/pencil instruments to online assessments to fully computer-based simulated software environments. The authors feel that the future of measuring ICT literacy should embrace objective performance-based assessment in which the learners respond to tasks in a simulated software environment, like the ST2L.

3. Purpose

Following the recommendations of Hohlfeld et al. (2010), this paper presents a validation of scores on the Student Tool for Technology Literacy (ST2L), a performance assessment originally based on the 1998 NETS*S and recently revised to align with the 2007 NETS*S. This tool was developed through Enhancing Education Through Technology (EETT) funding as an instrument to assess the ICT literacy of middle grade students in Florida (Hohlfeld et al., 2010). This paper provides the validity and reliability evidence for scores on the modified instrument according to these new standards (ISTE, 2007), with methodology operating under both item response theory and testlet response theory (Wainer et al., 2007). Specifically, this paper addresses the following research questions: (a) Do scores on the ST2L display evidence of internal structure validity?, and (b) Do scores on the ST2L display evidence of external structure validity?

Table 1
Demographic statistics.

Variable             Groups     Frequency (n)   Percentage (%)
Grade                5          5               .09
                     6          1234            20.97
                     7          1598            27.16
                     8          3035            51.58
                     9          9               .15
                     10         1               .02
                     11         2               .03
Gender               Male       2934            49.86
                     Female     2950            50.14
Race                 Asian      125             2.12
                     Black      1114            18.93
                     Hispanic   682             11.59
                     White      3626            61.62
                     Other      337             5.73
Free/Reduced lunch   Yes        3569            60.66
                     No         2315            39.34
English with family  Yes        5508            93.61
                     No         376             6.39

4. Method

4.1. Participants

Middle school teachers from 13 Florida school districts were recruited from the EETT grant program. Teachers were provided an overview of the ST2L, how to administer the tool, and how to interpret the scores. Teachers then administered the ST2L within their classes during the fall 2010 semester. Table 1 details demographic information for the sample of N = 5884 examinees. The bulk of students (i.e., n = 5867) are in grades 6 through 8, with a wide range of diversity in gender, race and free/reduced lunch status. A small percentage (i.e., 7%) of examinees was from families that did not speak English in the home.

4.2. Measures

ST2L: The ST2L is a performance-based assessment designed to measure middle school students' ICT literacy across relevant domains based on the 2007 NETS*S: Technology Operations and Concepts, Constructing and Demonstrating Knowledge, Communication and Collaboration, Independent Learning, and Digital Citizenship. These standards are designed to consider the breadth and depth of ICT literacy and to be flexible enough to adapt as new technologies emerge. NETS*S have been widely acclaimed and adopted in the U.S. and in many countries around the world. They are being used by schools for curriculum development, technology planning and school improvement plans.

The ST2L includes 66 performance-based tasks and 40 selected-response items, for a total of 106 items. The selected-response item types include text-based multiple-choice and true/false items, as well as multiple-choice items with graphics and image map selections (see Fig. 1 for an example). The performance-based items require the examinee to complete multiple tasks nested within simulated software environments, and these sets of performance-based items were treated as testlets (i.e., groups of related items) in the analysis (see Fig. 2 for an example). The testlets reduce burden on the examinee, as multiple items are associated with each prompt, and they are also more applicable to the technological performance of examinees outside of the assessment environment. The original version of the ST2L was previously pilot tested on N = 1513 8th grade students (Hohlfeld et al., 2010). The purpose of the pilot test was to provide a preliminary demonstration of the overall assessment quality by considering classical test theory (CTT) item analyses, reliability, and validity. Pilot analysis results indicated that the original version of the ST2L was a sound low-stakes assessment tool. Differences between the piloted tool and the current tool reflect changes in national standards. In the current dataset for this study, Cronbach's alpha as a measure of internal consistency of the ST2L items was estimated as α = .96.
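As a point of reference for the internal consistency figures reported here and for the PISA scales below, Cronbach's alpha can be computed from an examinee-by-item score matrix as in the minimal sketch below; the tiny response matrix is invented for illustration and does not reproduce the α = .96 reported for the ST2L.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an examinee-by-item matrix of scored responses."""
    k = scores.shape[1]                           # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Made-up 0/1 responses for six examinees on four items (illustration only).
demo = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 1],
                 [0, 1, 0, 0],
                 [1, 0, 1, 1],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]])
print(round(cronbach_alpha(demo), 2))
```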

Fig. 1. Example multiple-selection item.

Fig. 2. Example performance-based task item.

For the ST2L assessment used in this study, there were fourteen sections of items. These fourteen sections map onto the NETS*S domains as defined and described by ISTE. The first consisted of fifteen selected-response items measuring the construct of technology concepts, which was shortened to techConcepts in the remaining text and tables. The second consisted of four performance-based items that measured the examinee's ability to manipulate a file, which was shortened to techConceptsFileManip in the remaining text and tables. The third and fourth sections consisted of ten and three performance-based items, respectively, that measured the examinee's ability to perform research in a word processor, which was shortened to researchWP. The fifth section measured the examinee's ability to perform research with a flowchart with five performance-based items (i.e., researchFlowchart). The sixth, seventh, and eighth sections measured examinees' creative ability with technology, each with four performance-based items that focused on the use of graphics, presentations, and videos, respectively (i.e., creativityGraphics, creativityPresent, creativityVideo). The ninth, tenth, and eleventh sections consisted of eight, six, and four performance-based items, respectively, which measured examinee ability in applying technological communication through browsers (i.e., communicationBrowser) and email (i.e., communicationEmail). The twelfth and thirteenth sections measured critical thinking skills in technology with five and nine performance-based items, respectively (i.e., criticalThink). Finally, the fourteenth section measured digital citizenship of examinees with twenty-five selected-response items, which was shortened to digitalCit.
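For reference, the section structure just described can be summarized as a simple blueprint; the labels follow the text and tables, and the counts sum to the 106 items reported above.

```python
# Section labels and item counts as described in the text (the testlet sections
# are performance-based; techConcepts and digitalCit are selected-response).
ST2L_BLUEPRINT = {
    "techConcepts": 15,
    "techConceptsFileManip": 4,
    "researchWP1": 10,
    "researchWP2": 3,
    "researchFlowchart": 5,
    "creativityGraphics": 4,
    "creativityPresent": 4,
    "creativityVideo": 4,
    "communicationBrowser": 8,
    "communicationEmail1": 6,
    "communicationEmail2": 4,
    "criticalThink1": 5,
    "criticalThink2": 9,
    "digitalCit": 25,
}
assert sum(ST2L_BLUEPRINT.values()) == 106  # 66 performance-based + 40 selected-response
```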

PISA: The PISA questionnaire was included in this study as a criterion measure for assessing external validity of ST2L scores. It has been rigorously analyzed to demonstrate both reliability and validity across diverse international populations (OECD, 2003). Students were asked to provide information related to their Comfort with Technology, Attitudes towards Technology, and Frequency of Use of Technology. The three constructs employed different scales, for which internal consistency in this study's dataset was α = .78, α = .89, and α = .54, respectively. The low internal consistency of the attitudes toward technology scale is expected due to the shortness of the scale (i.e., five items).

4.3. Procedures

Data were collected in the fall semester of 2010. Middle school teachers from the 13 Florida school districts were recruited from the EETT grant program. Teachers were provided an overview of the ST2L, how to administer the tool, and how to interpret the scores. Teachers then administered the ST2L within their classes during the fall 2010 semester. Teachers also had the opportunity to report any problems with the administration process.

4.4. Data analysis: internal structure validity

The testlet nature of the items corresponds with a multidimensional data structure. Each testlet item is expected to contribute to the ICT literacy dimension as well as to a second dimension representing the effect of the testlet in which the item is nested. Dimensionality assumptions of the testlet response model were assessed via confirmatory factor analysis (CFA). Fit of the model to the item data was assessed with the S-X² index (Orlando & Thissen, 2000, 2003).

Data from the selected-response (i.e., multiple-choice/true-false) non-testlet items were then fit to a three-parameter logistic model (3PL; Birnbaum, 1968). The 3PL is defined as

$$P_{si}(Y_i = 1 \mid \theta_s, a_i, b_i, c_i) = c_i + (1 - c_i)\left[\frac{e^{a_i(\theta_s - b_i)}}{1 + e^{a_i(\theta_s - b_i)}}\right], \qquad (1)$$

where i refers to items, s refers to examinees, Y is an item response, θ is ability (i.e., ICT literacy), a is item discrimination, b is item difficulty and c is the item lower asymptote.
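As a concrete illustration of Eq. (1), the short sketch below evaluates the 3PL response probability; the parameter values are invented for illustration and are not taken from Table 3.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model in Eq. (1)."""
    logit = a * (theta - b)
    return c + (1.0 - c) * (math.exp(logit) / (1.0 + math.exp(logit)))

# Illustrative values only: an examinee of average ability (theta = 0) on a
# moderately discriminating, slightly easy item with some guessing.
print(round(p_3pl(theta=0.0, a=1.5, b=-0.5, c=0.20), 3))
```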

Data from the performance-based (i.e., open-response) testlet items were fit to a two-parameter logistic testlet model (2PL; Bradlow, Wainer, & Wang, 1999). The 2PL testlet model is defined as

$$P_{si}(Y_i = 1 \mid \theta_s, a_i, b_i, \gamma_{sd(i)}) = \frac{e^{a_i(\theta_s - b_i - \gamma_{sd(i)})}}{1 + e^{a_i(\theta_s - b_i - \gamma_{sd(i)})}}, \qquad (2)$$

where γ_sd(i) represents a testlet (d) effect for each examinee. The testlet component γ_sd(i) is a random effect, allowing for a variance estimate of γ_sd(i) for each testlet. A 2PL testlet model was selected over a 3PL because the item formats did not lend themselves to meaningful chances of successful guessing.
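A companion sketch for Eq. (2): the examinee-specific testlet effect γ_sd(i) simply shifts the effective difficulty of every item in that testlet for that examinee, and setting it to zero recovers an ordinary 2PL. The values are illustrative only.

```python
import math

def p_2pl_testlet(theta, a, b, gamma):
    """Probability of a correct response under the 2PL testlet model in Eq. (2).

    gamma is the examinee-by-testlet effect gamma_sd(i); gamma = 0 reduces the
    model to the ordinary 2PL.
    """
    logit = a * (theta - b - gamma)
    return math.exp(logit) / (1.0 + math.exp(logit))

# Illustrative only: the same item for an examinee with a positive testlet
# effect (the testlet is effectively harder for this person) versus no effect.
print(round(p_2pl_testlet(theta=0.0, a=1.5, b=-0.5, gamma=0.4), 3))
print(round(p_2pl_testlet(theta=0.0, a=1.5, b=-0.5, gamma=0.0), 3))
```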

The item calibration was completed with Bayesian estimation using Markov chain Monte Carlo methods in the SCORIGHT statistical package (Wainer, Bradlow, & Wang, 2010; Wang, Bradlow, & Wainer, 2004). Item fit, item parameter estimates, testlet effect variance components, standard errors of measurement, information, reliability, and differential item functioning (DIF) were examined for internal structure validity evidence.

4.5. Data analysis: external structure validity

ICT literacy latent ability estimates were correlated with the three PISA measures (i.e., use of technology, general knowledge of technology, and attitudes toward technology) to assess external structure validity evidence. Positive, small to moderate correlations were expected with all three external criteria, with a literature-based hypothesis that comfort with technology would yield the strongest relative correlation with technology literacy (Hohlfeld et al., 2010; Katz & Macklin, 2007).
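In code, this external-validity check amounts to a set of Pearson correlations between the latent ability estimates and the PISA summated scores. A minimal sketch follows; the arrays are placeholders rather than the study's data, and the variable names (theta_hat, comfort_total) are hypothetical.

```python
import numpy as np
from scipy import stats

# Placeholder arrays standing in for matched examinee records:
# IRT ability estimates and one PISA summated scale score.
rng = np.random.default_rng(0)
theta_hat = rng.normal(size=100)
comfort_total = 0.3 * theta_hat + rng.normal(size=100)

r, p = stats.pearsonr(theta_hat, comfort_total)
print(f"r = {r:.3f}, p = {p:.3g}")
```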

5. Results

Prior to addressing the research questions on internal and external structure validity evidence, items with constant response vectors and persons with missing data had to be addressed. Two items on the assessment had constant response vectors in this study's sample (i.e., researchWP21 was answered incorrectly by all examinees and communicationEmail24 was answered correctly by all examinees), and were therefore not included in the analysis. The first stage of analysis was focused on determining the nature of the missing data on the remaining 104 test items used in the analysis.

Basing our missing data analysis process on Wainer et al. (2007), we began by coding the missing responses as omitted responses and calibrated the testlet model. We then coded the missing responses as wrong and recalibrated the testlet model. The theta estimates from these two calibrations correlated at r = .615 (p < .001). This indicated that the choice of how to handle our missing data was non-negligible. We then identified a clear group of examinees for whom ability estimates were extremely different between the two coding methods. Specifically, they had enough missing data to result in very low ability estimates when missing responses were coded as wrong and average ability estimates with very high standard errors when missing responses were coded as omitted. We then correlated the a and b item parameter estimates from the calibration with missing responses coded as omitted with the a and b item parameter estimates from the calibration with missing responses coded as wrong, respectively. Discrimination (a) parameters were mostly larger when missing data were treated as omitted, and the correlation indicated that the differences between the calibrations were non-negligible (r = .629, p < .001). Difficulty (b) parameters were more similar across the calibrations, with a correlation of r = .931 (p < .001). The lack of overall similarity in results shown by these correlations indicated that coding the missing data as wrong was not a viable solution. In addition, it was clear that some individuals (n = 109) with large standard errors of ability estimates when missing data were coded as omitted had to be removed from the data set. Ultimately, they did not answer enough items to allow for accurate ability estimation, and their inclusion would therefore compromise future analyses, such as the correlations between ability and external criteria. The majority of these 109 individuals answered only one of the 106 test items.

The remaining data set of N = 5884 examinees (i.e., those discussed in the above Participants section) was examined for the nature of missingness according to Enders (2010). For each item, we coded missing data as 1 and present data as 0. We treated these groups as independent variables in t-tests in which the dependent variable was either the total score on frequency of technology use items or the total score on attitudes toward technology items. All t-tests for all items were non-significant, indicating that the missingness was not related to frequency of technology use or attitudes toward technology. We were unable to perform t-tests on the total score of the self-efficacy with technology use items due to severe violations of distributional assumptions (i.e., a large portion of persons scored the maximum score for self-efficacy), and hence a simple mean comparison was utilized. The self-efficacy total scores ranged from 0 to 76, and the self-efficacy means of the group with missing data differed from the means of the group with non-missing data by less than three points on all items. We concluded that these differences were small and, therefore, that missingness was not related to this variable. Based on these analyses, we proceeded under the assumption that missing data were ignorable (MAR) for the 3PL and testlet model analysis.
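A hedged sketch of the per-item missingness check described above: for each item, an independent-samples t-test compares an external criterion total between examinees with and without a missing response. The function and variable names are placeholders, and a Welch test is used here for robustness; the paper does not specify which t-test variant was run.

```python
import numpy as np
from scipy import stats

def missingness_t_tests(responses: np.ndarray, criterion: np.ndarray):
    """For each item, t-test the criterion total for missing vs. present responses.

    responses: examinee-by-item array with np.nan marking missing responses.
    criterion: one external total score per examinee (e.g., frequency of use).
    Returns a list of (item_index, t, p); items with no missing data are skipped.
    """
    results = []
    for j in range(responses.shape[1]):
        missing = np.isnan(responses[:, j])
        if missing.any() and (~missing).any():
            t, p = stats.ttest_ind(criterion[missing], criterion[~missing],
                                   equal_var=False)
            results.append((j, t, p))
    return results
```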

5.1. Internal structure validity evidence

Before fitting the 3PL and testlet models, we checked the assumption of model fit through CFA analysis with a hypothesis that each item would load onto the overall ability factor (theta) as well as a testlet factor associated with the testlet in which the item was nested. Fig. 3 shows an abbreviated diagram of the CFA model fit in Mplus version 7 (Muthén & Muthén, 2012), with weighted least squares estimation with adjusted means and variances. All latent factors and item residuals were forced to an uncorrelated structure. The model fit the data to an acceptable degree, as indicated by the root mean square error of approximation (RMSEA = .068), the comparative fit index (CFI = .941) and the Tucker–Lewis fit index (TLI = .938). While the fit could have been improved slightly, these results were deemed acceptable for meeting the dimensionality assumptions of the item/testlet response models.

Fig. 3. Diagram of Confirmatory Factor Analysis used for Model Fit Testing.

We then assessed item fit in IRTPro (Cai, Thissen, & du Toit, 2011) to determine if each multiple-choice/true-false item fit the 3PL model and if each open-response item fit the 2PL testlet model. This statistical package was used for item fit because the calculation of fit indices is built into the program; however, it was not used for final parameter estimation as it lacks the preferred Bayesian estimation approaches used in this study. The S-X² item fit index (Orlando & Thissen, 2000, 2003) was used to assess fit of the model to the item data, and a significance level of α = .001 was used due to the sensitivity of the chi-square test to large sample sizes. A total of four (i.e., <4%) of the items were deemed as displaying problematic misfit of the model to the data. One was a techConcepts item in which examinees scoring below a summated score of 82 on the test often displayed a frequency of observed correct responses that was below the expected frequency of correct responses for the item. For examinees scoring above a summated score of 82 on the test, the opposite pattern was often observed. Another item with misfit concerns was a communicationEmail item in which there was a variety of both over-predicted and under-predicted expected correct responses across the range of total test scores. An additional communicationEmail item had misfit concerns at the middle range of total summated scores, associated with an expected number of correct responses that was lower than the observed correct responses, and the opposite pattern for more extreme total summated scores. A final item with misfit concerns was a digitalCit item in which there was a consistent pattern of expected correct responses being higher than observed correct responses, except within total summated scores above 85. To determine if the misfit associated with these items was problematic for ability estimation, two sets of ability estimates were calculated in a separate set of analyses in IRTPro: one analysis in which all items were included and another in which the misfitting items were removed from the analysis. The correlation between the two sets of estimated ability parameters (using expected a posteriori estimation) was r = .99 (p < .001), indicating that the inclusion of the four misfit items in the assessment analysis was not problematic for ability estimation.

The final Bayesian testlet model was then estimated within the SCORIGHT package with five MC chains, 20,000 iterations, and 3000 discarded draws within each iteration, based on recommendations from Sinharay (2004) and Wang et al. (2010). Acceptable model convergence was reached as indicated by confidence interval shrink statistics near one (Wang et al., 2004). The variance estimates of the twelve testlet effects (γ_d) are shown in Table 2. All variances are significantly different from 0, indicating that their effect on item responses is non-negligible and the testlet model must be retained. In other words, the tool measures some types of abilities that are associated more with particular testlets than with general ICT literacy, and the use of the testlet model separates these components, providing for a more accurate estimate of ICT literacy for each student.
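The "shrink" convergence diagnostic reported by SCORIGHT is closely related to the Gelman–Rubin potential scale reduction factor. The sketch below computes that standard statistic for a single parameter from multiple chains; it illustrates the general idea rather than SCORIGHT's exact computation.

```python
import numpy as np

def gelman_rubin(chains: np.ndarray) -> float:
    """Potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_draws) of post-burn-in MCMC draws.
    Values near 1 indicate that the chains have mixed (converged).
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)          # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = ((n - 1) / n) * w + b / n      # pooled posterior variance estimate
    return float(np.sqrt(var_hat / w))
```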

Fig. 4 is a plot of the standard error of measurement (sem) of the θ estimates (i.e., ICT literacy estimates) for each examinee. Approximately 82% of the sample had estimates with sem ≤ .30, and 95.53% of the sample had estimates with sem ≤ .40. Several of the 4.47% of examinees with larger sem estimates were further examined. For example, individuals ID = 3286 and ID = 5421 (see Fig. 4) answered fewer than six items on the assessment.

Using the sem estimates, we estimated the reliability/information of the theta estimates in two ways. Under item response theory, information can be calculated from the sem and can be used as an indicator of reliability. The relationship between sem and test-level information is defined as

$$I(\theta) = \left(\frac{1}{sem}\right)^2, \qquad (3)$$

where I(θ) represents the level of test information at a particular value of theta. Therefore, having 95.53% of the sample with sem ≤ .40 indicates that 95.53% of the sample has a test information level of I(θ) ≥ 6.25. For 82% of the sample, the test information level is I(θ) ≥ 11.11.

Table 2
Estimated variance of testlet effects.

Testlet                 Testlet #   Estimated variance of γ_d   se(Var[γ_d])
techConceptsFileManip   1           .67                         .04
researchWP1             2           .32                         .02
researchWP2             3           .23                         .03
researchFlowChart       4           .37                         .03
creativityGraphics      5           .52                         .04
creativityPresent       6           .25                         .03
creativityVideo         7           .23                         .04
communicationBrowser    8           .31                         .02
communicationEmail1     9           .35                         .02
communicationEmail2     10          .47                         .04
criticalThink1          11          .33                         .03
criticalThink2          12          .23                         .01

Fig. 4. Standard error of measurement of ICT literacy estimates (θ_s).


For a reliability coefficient that aligns with CTT methodology, we can compute the test-level reliability if we assume that all individuals have the same sem. Under CTT,

$$\rho = 1 - \left(\frac{sem}{\sigma_\theta}\right)^2, \qquad (4)$$

where ρ represents the CTT reliability coefficient and σ_θ represents the standard deviation of latent ability scores. For the sample in this study, σ_θ = .97. If all examinees had sem = .40, then ρ = .83; if all examinees had sem = .30, then ρ = .90. Therefore, the CTT reliability estimate for the scores in this sample is between ρ = .83 and ρ = .90 for 95.53% of examinees.
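Equations (3) and (4) can be verified with a few lines of arithmetic; the values .97, .40, and .30 are those reported in the text.

```python
def test_information(sem: float) -> float:
    """Eq. (3): test information implied by a standard error of measurement."""
    return (1.0 / sem) ** 2

def ctt_reliability(sem: float, sd_theta: float) -> float:
    """Eq. (4): CTT-style reliability if all examinees shared the same sem."""
    return 1.0 - (sem / sd_theta) ** 2

print(round(test_information(0.40), 2))        # 6.25
print(round(test_information(0.30), 2))        # 11.11
print(round(ctt_reliability(0.40, 0.97), 2))   # 0.83
print(round(ctt_reliability(0.30, 0.97), 2))   # 0.90
```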

Item parameter estimates are presented in Table 3. All discrimination parameter estimates were above a_i = .39, indicating positive, moderate to large relationships between item responses and ICT literacy latent scores. Difficulty parameter estimates were distributed normally with M_b = −.39 and SD_b = 1.29, with two outlying items of extreme difficulty. Specifically, one of the criticalThink items was extremely difficult (i.e., estimated b = 4.89) and one of the researchWP items was also extremely difficult (i.e., estimated b = 4.66). Four items displayed lower asymptote estimates of c_i ≥ .49, indicating large amounts of guessing on those selected-response items. All four were digitalCit items.

DIF was examined with non-parametric tests that allow for ease of examination of DIF across a large number of items. Because of the large number of items, it was expected that some items would display DIF, so we began with an examination of differential test functioning (DTF) to first determine any grouping variables that had DIF in items that aggregated to a problematic amount of test-level differences, or DTF. Weighted σ² variance estimates of DTF (Camilli & Penfield, 1997) were estimated in the DIFAS package (Penfield, 2012) and showed small, negligible DIF across groups defined by gender, race (collapsed), free/reduced lunch, and English spoken in the home. Grade (collapsed into 6th, 7th, and 8th) showed a relatively larger DTF variance estimate, specifically when comparing 6th grade to 8th grade examinees. We then estimated DIF across 8th and 6th graders in the DIFAS package (Penfield, 2012) and used Educational Testing Service's classification of A, B, and C items (Zieky, 1993) to flag items with small, moderate, and large DIF. We located three items with large DIF (researchWP13, digitalCit15, and digitalCit25) and thirteen items with moderate DIF (techConcepts8, researchWP15, researchFlowchart2, creativityPresent4, creativityVideo4, communicationBrowser5, communicationEmail11, communicationEmail21, criticalThinkSS26, criticalThinkSS28, digitalCit4, digitalCit5, and digitalCit10). We reran the DIF analysis by estimating proficiency only on the items that were not flagged as having large or moderate DIF, and found that three of the above items were no longer problematic (i.e., displayed small DIF), but the remainder were flagged as showing either moderate or large DIF across groups defined by 6th grade and 8th grade classification.
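DIFAS implements several non-parametric DIF indices. As one concrete illustration (not necessarily the exact index used in this study), the sketch below computes the Mantel–Haenszel delta for a single item, stratifying on total score, and applies the ETS A/B/C magnitude bands to its absolute value; the full ETS rules also involve significance tests, which are omitted here.

```python
import numpy as np

def mh_delta(item: np.ndarray, total: np.ndarray, focal: np.ndarray) -> float:
    """Mantel-Haenszel delta (-2.35 * ln of the common odds ratio) for one item.

    item:  0/1 scores on the studied item.
    total: matching variable (e.g., total test score) used to form strata.
    focal: boolean array, True for focal-group members (e.g., 6th graders).
    Assumes every stratum contains members of both groups.
    """
    num = den = 0.0
    for k in np.unique(total):
        s = total == k
        a = np.sum(s & ~focal & (item == 1))   # reference group, correct
        b = np.sum(s & ~focal & (item == 0))   # reference group, incorrect
        c = np.sum(s & focal & (item == 1))    # focal group, correct
        d = np.sum(s & focal & (item == 0))    # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return -2.35 * np.log(num / den)

def ets_category(delta: float) -> str:
    """ETS magnitude bands only: A (negligible), B (moderate), C (large)."""
    if abs(delta) < 1.0:
        return "A"
    return "C" if abs(delta) >= 1.5 else "B"
```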

5.2. External structure validity evidence

The technology literacy estimates were then correlated with three outside criterion measures from the PISA. Pearson's correlations with each of the summated scores from the three criteria are presented in Table 4. The correlations are all positive, small to moderate, and statistically significant. The strongest correlation is with the Comfort with Technology scores, followed by Attitudes towards Technology scores and Frequency of Use of Technology scores.

6. Discussion

The results of this study must be interpreted with an understanding of the limitations and delimitations of this research. This study was limited to middle grade students (N = 5884) from school districts in Florida during the fall of 2010. While the ST2L is intended to be software and operating system independent, students may not find the interface similar enough to the specific software suites that they are accustomed to using in their schools and homes. Thus, the ST2L may not adequately measure the knowledge and skills of these students. The external validation process included correlating the scores of the ST2L to perceived technology ability levels (Comfort with Technology, Attitudes towards Technology, and Frequency of Use of Technology), which were based on self-report measures. Thus, students completing these self-assessments might have provided what they perceived as socially acceptable responses.

Table 3
Item parameter estimates from Bayesian testlet model estimation with MCMC methods.

Item                     Testlet#  a     se(a)  b      se(b)  c    se(c)
techConcepts1            NA        1.71  .10    .34    .04    .15  .02
techConcepts2            NA        1.31  .10    .48    .07    .24  .02
techConcepts3            NA        .87   .06    −.95   .17    .21  .05
techConcepts4            NA        1.41  .13    .97    .06    .24  .02
techConcepts5            NA        1.54  .10    −.72   .09    .32  .04
techConcepts6            NA        .39   .08    −1.32  .90    .47  .09
techConcepts7            NA        2.66  1.39   2.24   .56    .36  .13
techConcepts8            NA        1.06  .08    .49    .08    .18  .03
techConcepts9            NA        1.53  .20    2.04   .11    .18  .01
techConcepts10           NA        1.81  .12    −.98   .09    .38  .04
techConcepts11           NA        1.46  .11    −1.10  .14    .41  .05
techConcepts12           NA        1.45  .09    −1.56  .14    .31  .06
techConcepts13           NA        1.02  .10    −.15   .16    .29  .05
techConcepts14           NA        1.69  .10    −.02   .06    .22  .02
techConcepts15           NA        1.06  .07    −1.03  .16    .27  .05

techConceptsFileManip1   1         1.43  .06    −.51   .03    –    –
techConceptsFileManip2   1         1.46  .06    −.41   .03    –    –
techConceptsFileManip3   1         1.66  .07    −1.49  .05    –    –
techConceptsFileManip4   1         1.21  .05    −.55   .04    –    –
researchWP11             2         .44   .04    4.66   .40    –    –
researchWP12             2         1.60  .06    −.87   .03    –    –
researchWP13             2         1.72  .08    1.37   .04    –    –
researchWP14             2         2.13  .08    −.93   .03    –    –
researchWP15             2         3.00  .12    −.44   .02    –    –
researchWP16             2         1.80  .06    −.94   .03    –    –
researchWP17             2         3.47  .15    −.48   .02    –    –
researchWP18             2         2.68  .11    −1.25  .03    –    –
researchWP19             2         1.63  .06    −.08   .03    –    –
researchWP110            2         2.46  .09    .18    .02    –    –
researchWP22             3         2.37  .12    .50    .02    –    –
researchWP23             3         1.62  .07    1.37   .05    –    –
researchFlowchart1       4         1.16  .05    −1.37  .06    –    –
researchFlowchart2       4         1.16  .05    −.63   .04    –    –
researchFlowchart3       4         1.14  .05    .11    .03    –    –
researchFlowchart4       4         2.28  .11    −.01   .02    –    –
researchFlowchart5       4         1.65  .07    .34    .03    –    –
creativityGraphics1      5         1.89  .09    −.16   .03    –    –
creativityGraphics2      5         .66   .04    2.55   .15    –    –
creativityGraphics3      5         1.27  .06    .73    .03    –    –
creativityGraphics4      5         1.21  .05    −.60   .04    –    –
creativityPresent1       6         2.05  .09    −1.63  .05    –    –
creativityPresent2       6         1.16  .05    −1.03  .05    –    –
creativityPresent3       6         2.44  .11    −.53   .03    –    –
creativityPresent4       6         2.04  .11    −2.40  .07    –    –
creativityVideo1         7         .97   .04    −1.72  .07    –    –
creativityVideo2         7         1.00  .05    1.66   .07    –    –
creativityVideo3         7         1.24  .07    1.78   .07    –    –
creativityVideo4         7         1.57  .07    1.39   .05    –    –
communicationBrowser1    8         .97   .04    −1.38  .06    –    –
communicationBrowser2    8         1.85  .08    −1.74  .05    –    –
communicationBrowser3    8         4.73  .41    −1.99  .05    –    –
communicationBrowser4    8         4.44  .31    −1.62  .04    –    –
communicationBrowser5    8         3.03  .16    −1.71  .05    –    –
communicationBrowser6    8         1.77  .07    .20    .03    –    –
communicationBrowser7    8         1.36  .05    −.88   .04    –    –
communicationBrowser8    8         2.24  .10    −1.44  .04    –    –
communicationEmail11     9         1.13  .05    −.56   .04    –    –
communicationEmail12     9         1.52  .06    −1.70  .05    –    –
communicationEmail13     9         1.68  .07    −1.50  .05    –    –
communicationEmail14     9         2.20  .10    −1.86  .05    –    –
communicationEmail15     9         2.23  .10    −.93   .03    –    –
communicationEmail16     9         1.97  .10    −2.42  .07    –    –


communicationEmail21     10        1.08  .05    −1.08  .05    –    –
communicationEmail22     10        1.51  .07    −1.09  .04    –    –
communicationEmail23     10        1.97  .09    −1.23  .04    –    –
criticalThinkSS11        11        1.52  .07    −1.78  .06    –    –
criticalThinkSS12        11        1.90  .09    −1.92  .06    –    –
criticalThinkSS13        11        1.01  .05    .70    .04    –    –
criticalThinkSS14        11        1.91  .08    −.77   .03    –    –
criticalThinkSS15        11        1.12  .05    −.97   .05    –    –
criticalThinkSS21        12        1.88  .07    −.77   .03    –    –
criticalThinkSS22        12        1.50  .23    4.89   .55    –    –
criticalThinkSS23        12        1.75  .07    −.03   .03    –    –
criticalThinkSS24        12        1.91  .07    −.44   .03    –    –
criticalThinkSS25        12        1.85  .08    −1.50  .05    –    –
criticalThinkSS26        12        2.04  .08    −.83   .03    –    –
criticalThinkSS27        12        1.42  .05    −.63   .03    –    –
criticalThinkSS28        12        2.10  .08    −.67   .03    –    –
criticalThinkSS29        12        1.68  .09    1.95   .07    –    –
digitalCit1              NA        .40   .06    −2.05  .92    .54  .09
digitalCit2              NA        1.85  .13    −.05   .06    .34  .03
digitalCit3              NA        .61   .12    .70    .35    .29  .07
digitalCit4              NA        2.65  .16    −.60   .05    .32  .03
digitalCit5              NA        2.57  .15    −.78   .05    .30  .03
digitalCit6              NA        2.52  .14    −.52   .05    .29  .02
digitalCit7              NA        .50   .06    −1.83  .56    .39  .09
digitalCit8              NA        1.34  .10    −1.11  .16    .42  .05
digitalCit9              NA        1.55  .13    −1.39  .17    .49  .05
digitalCit10             NA        1.52  .11    −.92   .11    .37  .04
digitalCit11             NA        1.96  .11    −.20   .05    .26  .02
digitalCit12             NA        2.22  .19    −1.05  .11    .53  .04
digitalCit13             NA        1.99  .12    −.20   .06    .31  .02
digitalCit14             NA        1.25  .10    −1.47  .19    .39  .06
digitalCit15             NA        2.56  .19    −.99   .08    .42  .04
digitalCit16             NA        .82   .14    2.19   .16    .16  .03
digitalCit17             NA        2.13  .23    −.79   .14    .62  .04
digitalCit18             NA        2.25  .23    .21    .07    .55  .02
digitalCit19             NA        2.50  .17    −.78   .07    .37  .03
digitalCit20             NA        .82   .08    .19    .16    .22  .04
digitalCit21             NA        1.54  .18    1.06   .06    .30  .02
digitalCit22             NA        2.83  .18    −.45   .05    .29  .02
digitalCit23             NA        1.80  .10    −.58   .06    .23  .03
digitalCit24             NA        2.14  .12    −.14   .04    .20  .02
digitalCit25             NA        2.61  .15    −.60   .05    .28  .03



In light of these constraints, the overall analysis indicated that the ST2L tool was able to produce scores for the examinees that had sufficient reliability, sound psychometric properties providing evidence of internal validity, and evidence of external validity in relationship to related constructs. Specifically, CTT reliability coefficients, item response theory information indicators, and Cronbach's alpha of internal consistency were all of sufficient magnitude to utilize the test for low-stakes purposes. Tests with stakes for students require reliability coefficients greater than .80, with a preference for coefficients of .85–.90 (Haertel, 2013), which were met by the data in this study even though the stakes associated with the test are low.

For internal validity, the items fit the theorized testlet and item response theory models to an acceptable degree; ICT literacy estimates had sufficiently low standard errors of measurement; item difficulty was largely aligned with the desired property of developing a test that covers a wide range of technology literacy levels; item discriminations were sufficiently high to indicate that all item scores were correlated with ICT literacy scores; and the vast majority of subgroups of examinees in the population (e.g., racial subgroups) displayed invariant measurement properties in the items of the ST2L tool. All of these internal psychometric properties indicate that the tool is operating as designed and can produce reliable test scores without wasting examinee time on a test that is too difficult/easy for the population or that lacks the power to discriminate between persons with different levels of ICT literacy.

Table 4
Correlation of ICT literacy estimates and external criteria scores.

External criteria              Correlation with ICT literacy (r)   Statistical significance of correlation (p)
Frequency of Technology Use    .131                                <.001
Comfort with Technology        .333                                <.001
Attitude toward Technology     .212                                <.001


With respect to external validity, the low-stakes ST2L is only useful if it displays the expected relationships with other constructs related to ICT literacy. The results of the external criteria analysis showed that the ICT literacy scores on the ST2L have the small to moderate, positive relationships that were expected with the constructs of frequency of technology use, comfort with technology, and attitudes toward technology. In addition, comfort with technology displayed the strongest relationship with technology literacy, supporting the research hypothesis.

The ST2L tool was originally developed for low-stakes test purposes. For low-stakes applications, the ST2L is more than satisfactory, as indicated by the internal and external structure analyses. Beginning with the framework of the NETS*S, an extensive, thorough process was followed for defining indicators. The assessment items were mapped to these indicators and provide measurement of the indicators in innovative, relevant, performance-based ways. Test quality criteria demonstrate reasonable item analysis, reliability, and validity results for a relatively short, criterion-referenced test. The tool may be beneficial as districts report aggregated data for NCLB purposes and teachers target technology-related curricular needs.

Examining external structure validity evidence is always difficult when both internal and external criteria measures produce imperfect test scores. In this study, three PISA measures were used to determine the external structure of the ST2L, but the correlation estimates are most likely attenuated due to factors such as measurement error in the test scores, the small number of items per PISA subscale, and the use of observed scores for the PISA measures. Future studies may want to utilize different and more numerous external criteria.

Four conclusions drawn from the analysis are indicative of some of the limitations of the study as well as minor revisions that are needed to the ST2L before future administration. First, two of the researchWP items (i.e., using word processors) may need revision before future administration of the assessment due to extreme difficulty. One was answered incorrectly by all respondents and the other had an estimated difficulty parameter that indicated it was most appropriate for examinees who are more than four standard deviation units above the mean of ICT literacy for this population. Similarly, one of the critical thinking items was also very difficult and may need revision or removal.

Second, while the majority of examinees completed the exam in its entirety, there was a non-negligible group of examinees who quit the assessment after several items. It was not possible to obtain accurate ability estimates for these examinees, and more incentive to complete the assessment may be needed for this small group of examinees who seemed to be less motivated to complete the exam. Third, some of the selected-response items had a larger amount of guessing than desired. It is expected that some guessing will occur on multiple-choice/true-false items, but some item removal or revision may be called for on those with particularly large amounts of guessing. Finally, it was clear from the DIF analysis that some items measured different constructs for sixth grade students as compared to eighth grade students. It was a relatively small percentage of the total items, yet it may deserve further consideration as to whether or not the ST2L tool is best used for a more homogeneous population in terms of grade.

From the practical perspective, the ST2L is a tool available to middle grade educators throughout the state of Florida to meet the NCLB requirements of demonstrating the ICT literacy of 8th grade students in their respective school districts. Other states aside from Florida may also elect to use this tool for reporting requirements within their states by making arrangements with the Florida Department of Education (FLDOE). This tool can be used as a low-stakes assessment to provide data related to the technology literacy of middle grade students for district reporting, curriculum design, and student self-assessment. The tool, in its present form, is not suitable for use in high-stakes applications such as computing school grades or evaluating individual student performance for promotion/retention.

There is also something to be said about the development procedures for the ST2L and the associated NETS*S. As previously described, the ST2L development team followed sound development procedures for item writing and review and conducted usability analyses to ensure that the user interface and the simulated performance-based tasks were as clear and intuitive as possible (Hohlfeld et al., 2010). The development team included content teachers (e.g., mathematics), computer teachers, educational technology specialists, media specialists, programmers, college professors, and several other key stakeholders to help operationalize the NETS*S at age-appropriate benchmarks. Had these rigorous procedures not been followed, the psychometric properties of the tool would likely not have been acceptable.

The ST2L has the potential to be used in several research applications in the field of educational technology beyond simple reporting purposes. For instance, Holmes (2012) used the ST2L as a measure of 21st century skills in her dissertation focusing on the effects of project-based learning experiences on middle grade students. Another example comes from Ritzhaupt, Liu, Dawson, and Barron (2013), who used the ST2L to demonstrate a technology knowledge and skill gap (digital divide) between students of high and low socio-economic status, white and non-white students, and female and male students. Further, Hohlfeld, Ritzhaupt, and Barron (2013) used the tool in a comprehensive study of the relationship between gender and ICT literacy. Future researchers can use the ST2L to expand understanding of technology-enhanced teaching and learning in the 21st century while accounting for several important variables (e.g., socio-economic status) in their models.

As we have shown in this paper, there are many measurement systems related to ICT literacy. While each of these tools contributes to our understanding of the measurement of ICT literacy in various populations, not all of these measurement systems use innovative items (e.g., simulated software tasks) to measure this multifaceted construct. In fact, most instruments reviewed here used traditional measures of ICT literacy based on paper-and-pencil or online self-report assessments (e.g., Bunz, 2004; Compeau & Higgins, 1995; Parasuraman, 2000; Schmidt et al., 2009). The authors believe that future instruments must use technology in their design frameworks to measure this complex construct. That is, the measurements themselves should use the types of innovative items found on the ST2L, the iSkills assessment, or the in-progress NAEP TEL assessment. Doing so provides a more objective and authentic measure of ICT literacy than simple self-report and increases the generalizability of the measure's scores to real-life applications of ICT literacy skills.

This paper reports a systematic and rigorous process for the validation of a measure of ICT literacy based on modern test theory techniques (i.e., item and testlet response theory). As the body of knowledge grows in this realm, we must periodically revise and refine our measures. As noted by Hohlfeld et al. (2010):

“Validation of measurement instruments is an ongoing process. This is especially true when dealing with the measurement of technology literacy while using technology, because technology is perpetually changing. The capabilities of the hardware and software continue to improve and new innovations are introduced. As a result, developing valid and reliable instruments and assessment tools to measure the construct is difficult and an ongoing process” (p. 384).


The authors believe that measuring ICT literacy is a vital 21st century undertaking, as evidenced by the calls from NSF, NIH, and others. Because technology evolves so quickly, we must periodically update our measurement systems to reflect the newest innovations. This paper contributes to this charge by re-aligning the tool with the 2007 NETS*S.

References

Abitt, J. T. (2011). An investigation of the relationship between self-efficacy beliefs about technology integration and technological pedagogical content knowledge (TPACK) among preservice teachers. Journal of Digital Learning in Teacher Education, 27(4), 134–143.
Bandura, A. (1986). Social foundations of thought and action. Englewood Cliffs, NJ: Prentice Hall.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–460). Reading, MA: Addison-Wesley.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Bunz, U. (2004). The computer-email-web (CEW) fluency scale: development and validation. International Journal of Human–Computer Interaction, 17(4), 479–506.
Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO for Windows [Computer software]. Lincolnwood, IL: Scientific Software International.
Camilli, G., & Penfield, D. A. (1997). Variance estimation for differential test functioning based on Mantel–Haenszel statistics. Journal of Educational Measurement, 34, 123–139.
Chai, C. S., Ling Koh, J. H., Tsai, C., & Wee Tan, L. L. (2011). Modeling primary school pre-service teachers' technological pedagogical content knowledge (TPACK) for meaningful learning with information and communication technology (ICT). Computers & Education, 57, 1184–1193.
Christensen, R., & Knezek, G. A. (2014). Measuring technology readiness and skills. In Spector, Merrill, Elen, & Bishop (Eds.), Handbook of research on educational communications and technology (pp. 829–840). New York: Springer.
Compeau, D. R., & Higgins, C. A. (1995). Computer self-efficacy: development of a measure and initial test. MIS Quarterly, 19(2), 189–211.
Enders, C. K. (2010). Applied missing data analysis. NY: The Guilford Press.
Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student test scores (ETS Memorial Lecture Series Reports). Princeton, NJ: Educational Testing Service.
Hohlfeld, T. N., Ritzhaupt, A. D., & Barron, A. E. (2010). Development and validation of the Student Tool for Technology Literacy (ST2L). Journal of Research on Technology in Education, 42(4), 361–389.
Hohlfeld, T., Ritzhaupt, A. D., & Barron, A. E. (2013). Are gender differences in perceived and demonstrated technology literacy significant? It depends on the model. Educational Technology Research and Development, 61(4), 639–663.
Holmes, L. M. (2012). The effects of project-based learning on 21st century skills and no child left behind accountability standards. Doctoral dissertation, University of Florida.
International Society for Technology in Education. (2007). National Educational Technology Standards for Students. Retrieved from http://www.iste.org/standards/standards-for-students/nets-student-standards-2007.
Jenkins, H. (2006). Convergence culture: Where old and new media collide. New York: New York University Press.
Katz, I. R., & Macklin, A. S. (2007). Information and communication technology (ICT) literacy: integration and assessment in higher education. Journal of Systemics, Cybernetics and Informatics, 5(4), 50–55.
Koh, J. H., & Divaharan, S. (2011). Developing pre-service teachers' technology integration expertise through the TPACK-developing instructional model. Journal of Educational Computing Research, 44(1), 35–58.
Learning.com. (2012). TechLiteracy Assessment. Available at http://www.learning.com/techliteracy-assessment/.
Mishra, P., & Koehler, M. (2006). Technological pedagogical content knowledge: a framework for teacher knowledge. The Teachers College Record, 108(6), 1017–1054.
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
National Assessment of Educational Progress. (2014). Technology and Engineering Literacy Assessment. Retrieved from https://nces.ed.gov/nationsreportcard/tel/.
National Research Council. (2011). Assessing 21st century skills: Summary of a workshop. Washington, DC: The National Academies Press.
National Research Council. (2008). Research on future skill demands. Washington, DC: National Academies Press.
NSF. (2006). New formulas for America's Workforce 2: Girls in science and engineering. Washington, DC.
Orlando, M., & Thissen, D. (2000). Likelihood-based item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S–X2: an item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27, 289–298.
Parasuraman, A. (2000). Technology Readiness Index (TRI): a multiple-item scale to measure readiness to embrace new technologies. Journal of Service Research, 2(4), 307–320.
Partnership for 21st Century Skills. (2011). Framework for 21st century learning. Washington, DC. Retrieved from http://www.p21.org/tools-and-resources/publications/1017-educators#defining.
Penfield, R. D. (2012). DIFAS 5.0: Differential item functioning analysis system user's manual. Penfield.
PISA. (2012). Program for International Student Assessment (PISA). Available at http://nces.ed.gov/surveys/pisa/.
Ritzhaupt, A. D., Liu, F., Dawson, K., & Barron, A. E. (2013). Differences in student information and communication technology literacy based on socio-economic status, ethnicity, and gender: evidence of a digital divide in Florida schools. Journal of Research on Technology in Education, 45(4), 291–307.
Schmidt, D. A., Baran, E., Thompson, A. D., Mishra, P., Koehler, M. J., & Shin, T. S. (2009). Technological pedagogical content knowledge (TPACK): the development and validation of an assessment instrument for preservice teachers. Journal of Research on Technology in Education, 42(2), 123.
Sinharay, S. (2004). Experiences with MCMC convergence assessment in two psychometric examples. Journal of Educational and Behavioral Statistics, 29, 461–488.
ST2L. (2013). Student Tool for Technology Literacy (ST2L). Retrieved from http://st2l.flinnovates.org/index.aspx.
Tristán-López, A., & Ylizaliturri-Salcedo, M. A. (2014). Evaluation of ICT competencies. In Spector, Merrill, Elen, & Bishop (Eds.), Handbook of research on educational communications and technology (pp. 323–336). New York: Springer.
U.S. Department of Education. (2001). No child left behind: enhancing education through technology act of 2001. Retrieved from http://www.ed.gov/policy/elsec/leg/esea02/pg34.html.
U.S. Department of Education. (2010). National Educational Technology Plan 2010. Retrieved from http://www.ed.gov/technology/netp-2010.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. NY: Cambridge University Press.
Wainer, H., Bradlow, E. T., & Wang, X. (2010). Detecting DIF: many paths to salvation. Journal of Educational and Behavioral Statistics, 35(4), 489–493.
Wang, X., Baldwin, S., Wainer, H., Bradlow, E. T., Reeve, B. B., Smith, A. W., et al. (2010). Using testlet response theory to analyze data from a survey of attitude change among breast cancer survivors. Statistics in Medicine, 29, 2028–2044.
Wang, X., Bradlow, E. T., & Wainer, H. (2004). A user's guide for SCORIGHT (Version 3.0): A computer program built for scoring tests built of testlets including a module for covariate analysis (ETS Research Report RR 04–49). Princeton, NJ: Educational Testing Service.
Zieky, M. (1993). DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Hillsdale, NJ: Erlbaum.

