Posted on 24-Dec-2015
Automated Essay Grading
Introduction
Types of Assessment Procedures
Classification based on functional role in classroom instruction:
• Placement assessment: administered at the beginning of instruction
• Formative assessment: monitors learning progress during instruction
• Diagnostic assessment: diagnoses learning difficulties during instruction
• Summative assessment: assesses achievement at the end of instruction
Types of Assessment Procedures
Classification based on how test and assessment results are interpreted:
• Norm-referenced: performance is described in terms of relative position in a known group
• Criterion-referenced: performance is described against specific performance criteria (e.g., type 40 words/min without error)
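The two interpretations can be contrasted on one typing score. The Python sketch below reads the same speed both ways: as a percentile rank within a group (norm-referenced) and against the fixed 40 words/min, zero-error criterion (criterion-referenced). The group scores and function names are illustrative assumptions, and percentile rank is only one of several norm-referenced statistics.

```python
from bisect import bisect_left


def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Norm-referenced view: percentage of the reference group scoring below `score`."""
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)  # number of group scores strictly below `score`
    return 100.0 * below / len(ordered)


def meets_criterion(words_per_min: float, errors: int, target_wpm: float = 40.0) -> bool:
    """Criterion-referenced view: pass if the fixed standard is met, regardless of peers."""
    return words_per_min >= target_wpm and errors == 0


# Hypothetical typing speeds (words/min) for one class.
group = [28, 35, 38, 42, 45, 50, 55, 60]

print(percentile_rank(42, group))     # 37.5: three of eight classmates type slower
print(meets_criterion(42, errors=0))  # True
print(meets_criterion(42, errors=2))  # False
```

The same 42 words/min is unremarkable relative to this group but passes the criterion when error-free. This shows that the two readings answer different questions.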
Types of Assessment Procedures
Fixed-Choice vs. Complex-Performance Assessment
Along the continuum: fixed-choice items → short answer → essay → complex-performance tasks
Fixed-choice:
• Factual knowledge
• Low-level skills (recall)
• Objective assessment
• Highly reliable
Complex-performance:
• Critical thinking skills
• May extend beyond the classroom
• Inferential skills
• Subjective assessment
Measuring Complex Achievement
Essay-type questions
• Freedom of response
  ▪ Students are free to construct, relate and present ideas in their own words
• Assess higher-order skills
  ▪ Critical thinking
• This freedom comes at the cost of
  ▪ Reliability in scoring
  ▪ Time required for evaluation
Anatomy of an Essay
• Prompt of an essay
  ▪ A topic around which you start jotting down ideas
  ▪ May be a single word, a short phrase, a complete paragraph or even a picture
• Trait of an essay
  ▪ A characteristic of the essay on which it is evaluated
  ▪ Scoring rubrics depend on traits
Scoring Criteria or Rubrics
• Ideas or content
• Organization
• Voice
• Word choice
• Sentence fluency
Automated Essay Evaluation (AEE)
• “The process of evaluating and scoring written prose via computer programs”
• NLP has helped AEE go beyond numeric scoring to qualitative feedback
• Multidisciplinary field
AEE/AES systems
• PEG
• E-rater
• Intelligent Essay Assessor
• C-rater
E-rater Grading Engine
• Commercial AES system by Educational Testing Service (ETS), 1999
• Employed in high-stakes assessment such as the Graduate Management Admission Test (GMAT)
• Shown to agree with expert raters
• Scoring depends on measurable markers related to writing constructs:
  ▪ Organization and development of ideas
  ▪ Variation in syntactic constructions
  ▪ Vocabulary usage
  ▪ Technical correctness in terms of grammar, usage and mechanics
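Agreement between an automated scorer and expert raters is commonly reported as exact agreement (identical scores) and adjacent agreement (scores within one point). The sketch below computes both. The score lists are made up for illustration and are not e-rater results.

```python
def agreement_rates(machine: list[int], human: list[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement between two equal-length lists of integer scores."""
    if len(machine) != len(human) or not machine:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(machine)
    exact = sum(m == h for m, h in zip(machine, human)) / n
    # Adjacent agreement counts scores within one point, so it includes exact matches.
    adjacent = sum(abs(m - h) <= 1 for m, h in zip(machine, human)) / n
    return exact, adjacent


machine = [4, 3, 5, 2, 4, 6]  # hypothetical automated scores on a 1-6 scale
human = [4, 4, 5, 2, 2, 6]    # hypothetical expert scores for the same essays

print(agreement_rates(machine, human))  # exact = 4/6, adjacent = 5/6
```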
E-rater Features
• Grammatical errors
  ▪ Automatic grammatical error detection
  ▪ Article and preposition errors
• Discourse structure and organization
  ▪ Features motivated by Rhetorical Structure Theory
• Topic-relevant word usage
  ▪ Content Vector Analysis (CVA)
• Style-related word usage
  ▪ Overly repetitious word usage
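A very rough way to approximate "overly repetitious word usage" is to measure how much of an essay's content vocabulary a single word takes up. The sketch below is a simple frequency heuristic and is not e-rater's actual method. The stop-word list and the `max_share` and `min_count` thresholds are hypothetical values chosen for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "that", "for"}


def repetitious_words(essay: str, max_share: float = 0.08, min_count: int = 3) -> dict[str, float]:
    """Flag content words that occur at least min_count times and exceed max_share of content tokens."""
    tokens = [t for t in re.findall(r"[a-z']+", essay.lower()) if t not in STOP_WORDS]
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()
            if c >= min_count and c / total > max_share}


essay = ("Technology is important. Technology changes schools, and technology "
         "changes how students learn. Teachers use technology every day.")
print(repetitious_words(essay))  # flags 'technology' (4 of 15 content words)
```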
E-rater Features with NLP Approaches
• Grammatical error detection
  ▪ Rule-based approach: rules are defined over the syntactic parse
  ▪ Statistical approach: word n-grams and POS n-grams
• Discourse analysis
  ▪ Linear representation of essay sentences
  ▪ Segments the essay into introductory material, thesis statement, main ideas, supporting ideas and conclusion
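The statistical approach assumes that word sequences rarely or never seen in well-formed text are likely to contain errors. The sketch below reduces this idea to an unseen-bigram check over a tiny hypothetical reference corpus. A real detector would use very large corpora, smoothed probabilities and POS n-grams as well, so treat this only as an illustration of the idea.

```python
from collections import Counter


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    return list(zip(tokens, tokens[1:]))


def train_bigram_counts(corpus: list[str]) -> Counter:
    """Count word bigrams in a (hypothetical) corpus of well-formed sentences."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def flag_unusual_bigrams(sentence: str, counts: Counter, min_count: int = 1) -> list[tuple[str, str]]:
    """Return bigrams seen fewer than min_count times in the reference corpus."""
    return [bg for bg in bigrams(sentence.lower().split()) if counts[bg] < min_count]


reference = [
    "she is interested in music",
    "he is good at math",
    "they arrived at the station",
    "we are interested in science",
]
counts = train_bigram_counts(reference)
print(flag_unusual_bigrams("she is interested on music", counts))
# [('interested', 'on'), ('on', 'music')]
```

The flagged pairs surround the likely preposition error "interested on". With a toy corpus, however, many correct but unseen bigrams would also be flagged.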
E-rater Features with NLP Approaches
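Content Vector Analysis (CVA)
[Figure: the essay to be graded is compared with higher-quality and lower-quality essays; similarity (≈) to higher-quality essays leads to a higher grade, similarity to lower-quality essays to a lower grade]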
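CVA represents each essay as a vector of word frequencies (a vector space model). It assigns the score point whose training essays are most similar by cosine. The sketch below assumes a small hypothetical set of essays grouped by score point and uses raw counts. Production systems typically weight terms (for example, with tf-idf), so this is a simplified illustration.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector of raw word counts (a simplification of weighted CVA vectors)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cva_score(essay: str, essays_by_score: dict[int, list[str]]) -> int:
    """Return the score point whose pooled training essays are most similar to `essay`."""
    if not essays_by_score:
        raise ValueError("need at least one score point with training essays")
    target = term_vector(essay)
    pooled = {score: sum((term_vector(t) for t in texts), Counter())
              for score, texts in essays_by_score.items()}
    return max(pooled, key=lambda score: cosine(target, pooled[score]))


# Hypothetical human-scored training essays, grouped by score point.
training = {
    6: ["renewable energy reduces pollution and creates sustainable jobs",
        "solar and wind energy reduce carbon emissions and pollution"],
    2: ["i like the sun it is nice",
        "wind is cold and i do not like it"],
}

print(cva_score("wind and solar energy cut pollution and carbon emissions", training))  # 6
```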
E-rater Features with NLP Approaches
Collocation detection
• Tests the proper usage of words whose correct use depends on neighboring words
• Collocation patterns
  ▪ Noun of noun (swarm of bees)
  ▪ Adjective + noun (strong tea)
  ▪ Noun + noun (house arrest)
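One common way to test whether a word pair is an established collocation is to measure its statistical association in a reference corpus. The sketch below uses pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), over adjacent word pairs as that measure. This is not necessarily the statistic e-rater uses, and the toy corpus and `threshold` value are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens: list[str]) -> dict[tuple[str, str], float]:
    """PMI(x, y) = log2(p(x, y) / (p(x) p(y))) for every adjacent word pair."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_tokens, n_pairs = len(tokens), len(tokens) - 1
    return {
        (x, y): math.log2((count / n_pairs) /
                          ((unigrams[x] / n_tokens) * (unigrams[y] / n_tokens)))
        for (x, y), count in pairs.items()
    }


def is_collocation(pair: tuple[str, str], scores: dict[tuple[str, str], float],
                   threshold: float = 1.0) -> bool:
    """Treat a pair as acceptable only if it was seen and its PMI reaches `threshold`."""
    return scores.get(pair, float("-inf")) >= threshold


# Hypothetical toy corpus; a real system would use millions of words.
tokens = "he drank strong tea . she drank strong tea . the strong man drank water .".split()
scores = pmi_scores(tokens)

print(round(scores[("strong", "tea")], 2))         # 2.51
print(is_collocation(("strong", "tea"), scores))    # True
print(is_collocation(("powerful", "tea"), scores))  # False (never seen)
```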
Model Building
• The model is trained on human-scored essays
• Training
  ▪ Convert each essay into a vector of linguistic features
  ▪ Learn feature weights through regression
• Different models
  ▪ Topic-specific model: trained on human-scored essays written on a given topic
  ▪ Generic model: topic-agnostic
  ▪ Hybrid model: some feature weights are trained on generic essays while others come from prompt-specific essays
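The training step can be sketched as ordinary least-squares regression from feature vectors to human scores. The three features and all numbers below are hypothetical, and e-rater's real feature set and regression setup are more elaborate. Under this sketch, a topic-specific model would be fit on essays from one prompt, and a generic model on essays pooled across prompts.

```python
import numpy as np


def fit_weights(features: np.ndarray, human_scores: np.ndarray) -> np.ndarray:
    """Ordinary least squares: returns [intercept, w1, ..., wk] mapping features to scores."""
    X = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
    return weights


def predict(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    X = np.column_stack([np.ones(len(features)), features])
    return X @ weights


# Hypothetical features per essay:
# [grammar errors per 100 words, discourse elements found, CVA similarity]
features = np.array([
    [5.0, 2.0, 0.30],
    [3.0, 4.0, 0.50],
    [1.0, 5.0, 0.80],
    [4.0, 3.0, 0.40],
    [2.0, 5.0, 0.70],
    [3.5, 3.0, 0.45],
])
human_scores = np.array([2.0, 4.0, 6.0, 3.0, 5.0, 3.0])  # hypothetical human scores

weights = fit_weights(features, human_scores)
print(np.round(predict(weights, features), 2))
```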
Intelligent Essay Assessor
• Commercial AES system by Pearson Knowledge Technologies, 1998
• Features
  ▪ Automated scoring of, and feedback on, paragraphs
  ▪ Grading of summary writing to improve reading comprehension
  ▪ Performance task scoring
  ▪ Short answer scoring for students
IEA Scoring Features
Essay score components:
• Mechanics: spelling, capitalization, punctuation
• Content: LSA similarity, vector length
• Lexical sophistication: word maturity, word variety, confusable words
• Style, organization and development: inter-sentence coherence, essay coherence, topic development
• Grammar: n-gram features, grammatical errors
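IEA's content features rely on Latent Semantic Analysis (LSA). LSA builds a reduced "semantic space" from a term-by-document matrix using truncated singular value decomposition, so two texts can be similar even when they share few words. The sketch below shows the core steps on a hypothetical four-document corpus: truncated SVD, folding a new text into the space and cosine similarity. It also prints the folded vector's length as one plausible reading of the "vector length" feature. IEA's actual corpus, dimensionality and scoring model are not described here, so this is illustrative only.

```python
import numpy as np


def lsa_space(term_doc: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Truncated SVD of a terms-by-documents matrix; keeps the top-k dimensions."""
    U, S, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], S[:k]


def fold_in(term_counts: np.ndarray, U_k: np.ndarray, S_k: np.ndarray) -> np.ndarray:
    """Project a new text's term-count vector into LSA space: d^T U_k S_k^{-1}."""
    return (term_counts @ U_k) / S_k


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def vec(*words: str) -> np.ndarray:
    """Term-count vector over the toy vocabulary."""
    return np.array([float(w in words) for w in vocab])


vocab = ["car", "automobile", "engine", "flower", "petal"]
# Hypothetical training corpus as term counts; columns are four short documents.
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)
U_k, S_k = lsa_space(A, k=2)

student, reference = vec("car"), vec("automobile")
print(cosine(student, reference))  # 0.0: no shared words
print(round(cosine(fold_in(student, U_k, S_k), fold_in(reference, U_k, S_k)), 3))  # 1.0
print(np.linalg.norm(fold_in(student, U_k, S_k)))  # vector length in LSA space
```

"car" and "automobile" never co-occur, but both co-occur with "engine". They therefore land on the same latent dimension.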
Automated Short Answer Scoring
• Short answers are not short essays
  ▪ Evaluation of essays focuses on traits such as grammar, style, vocabulary and organization: computational syntax and stylistics
  ▪ Evaluation of short answers emphasizes content: computational semantics
• Short answers are harder to evaluate
  ▪ They contain less exploitable information
Automated Short Answer Scoring
• C-rater by ETS
  ▪ Grades free-text responses ranging in length from a single word or phrase to 4-5 sentences
  ▪ Supports both summative and formative assessment
  ▪ Performs well on tests that solicit specific information from students
  ▪ Performs poorly on open-ended tasks
C-rater
• The model of the correct answer is provided by a content expert
• C-rater's goal: map each student response onto the model of the correct answer
• Building the model is manual, but the mapping is automatic
• The difficulty
  ▪ The question is designed to elicit from students one or more concepts that constitute the correct answer
  ▪ There are many ways in which a concept can be realized in natural language
• The solution
  ▪ Correct responses are treated as paraphrases of the model answer
C-rater
• Tries to model human graders by normalizing for:
  ▪ Syntactic variation
  ▪ Pronoun reference
  ▪ Morphological variation
  ▪ Synonymous words
  ▪ Typographical and spelling errors
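The normalization idea can be illustrated with a small bag-of-words concept matcher. The sketch below handles only three of the listed variations: spelling errors via `difflib` fuzzy matching, morphology via crude suffix stripping and synonyms via a lookup table. It ignores pronoun reference and sidesteps syntactic variation by discarding word order. The vocabulary, synonym table and concept definitions are hypothetical and are not c-rater's actual resources or algorithm.

```python
import difflib
import re

# Hypothetical resources for one science item; not c-rater's actual lexicons or rules.
VOCABULARY = ["plant", "plants", "use", "uses", "the", "to", "sunlight",
              "make", "makes", "food", "energy"]
SYNONYMS = {"sun": "sunlight", "light": "sunlight", "produce": "make",
            "create": "make", "sugar": "food"}
# Each concept from the (hypothetical) model answer is a set of normalized words.
CONCEPTS = {"uses sunlight": {"sunlight"}, "makes food": {"make", "food"}}


def stem(word: str) -> str:
    """Crude morphological normalization: strip a final plural / third-person 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(text: str) -> set[str]:
    normalized = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in VOCABULARY and word not in SYNONYMS:
            # Spelling/typo correction: snap to the closest known word, if close enough.
            match = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
            if match:
                word = match[0]
        word = SYNONYMS.get(word, word)  # synonym normalization
        normalized.add(stem(word))       # morphological normalization
    return normalized


def concepts_present(response: str) -> dict[str, bool]:
    """Bag-of-words matching: word order is ignored and pronouns are not resolved."""
    words = normalize(response)
    return {name: required <= words for name, required in CONCEPTS.items()}


print(concepts_present("Plantz use the sun to produce sugar"))  # both concepts True
print(concepts_present("Plants grow in soil"))                  # both concepts False
```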
Summary of NLP Techniques
• Content assessment
  ▪ Content Vector Analysis: vector space model
• Semantics-based assessment
  ▪ Latent Semantic Analysis
• Meaning/concept assessment
  ▪ Paraphrasing and textual entailment
• Organizational assessment
  ▪ Argument structure mining
  ▪ Discourse structure analysis