Page 1: Comprehensive Exam Review

Comprehensive Exam Review


Page 2: Comprehensive Exam Review

Appraisal: Part 2


Page 3: Comprehensive Exam Review

Statistical Concepts for Appraisal

Page 4: Comprehensive Exam Review

A frequency distribution is a tabulation of scores in numerical order showing the number of persons who obtain each score or group of scores.

A frequency distribution is usually described in terms of its measures of central tendency (i.e., mean, median, and mode), range, and standard deviation.

Page 5: Comprehensive Exam Review

The (arithmetic) mean is the sum of a set of scores divided by the number of scores.

The median is the middle score or point above or below which an equal number of ranked scores lie; it corresponds to the 50th percentile.

The mode is the most frequently occurring score or value in a distribution of scores.

The range is the arithmetic difference between the lowest and the highest scores obtained on a test by a given group.
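These four descriptive measures can be sketched in a few lines of Python, using a small, hypothetical set of test scores:

```python
from statistics import mean, median, mode

# A small, hypothetical frequency distribution of test scores.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]

print(mean(scores))               # arithmetic mean: sum divided by count
print(median(scores))             # middle ranked score (50th percentile): 5
print(mode(scores))               # most frequently occurring score: 5
print(max(scores) - min(scores))  # range: highest minus lowest: 5
```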

Page 6: Comprehensive Exam Review

The standard deviation is a measure of the variability in a set of scores (i.e., frequency distribution).

The standard deviation is the square root of the average of the squared deviations around the mean (i.e., the square root of the variance for the set of scores).

Variability is the dispersion or spread of a set of scores; it is usually discussed in terms of standard deviations.
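The variance and standard deviation definitions above translate directly into code. This minimal sketch uses hypothetical scores and the population (divide-by-N) form of the variance:

```python
from math import sqrt

def population_variance(scores):
    """Average of the squared deviations around the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def population_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(scores))

scores = [4, 5, 5, 6, 10]   # hypothetical; mean is 6
print(population_variance(scores))  # 4.4
print(population_sd(scores))        # ≈ 2.0976
```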

Page 7: Comprehensive Exam Review

The normal distribution curve is a bell-shaped curve derived from the assumption that variations from the mean are by chance, as determined through repeated occurrences in the frequency distributions of sets of measurements of human characteristics in the behavioral sciences.

Scores are symmetrically distributed above and below the mean, with the percentage of scores decreasing as the scores progress away from the mean in standard deviation units.

Page 8: Comprehensive Exam Review

Skewness is the degree to which a distribution curve with one mode departs horizontally from symmetry, resulting in a positively or negatively skewed curve.

A positive skew is when the “tail” of the curve is on the right and the “hump” is on the left.

A negative skew is when the “tail” of the curve is on the left and the “hump” is on the right.

Page 9: Comprehensive Exam Review

Kurtosis is the degree to which a distribution curve with one mode departs vertically from the normal curve (i.e., is more peaked or flatter than normal).

A leptokurtic distribution is one that is more “peaked” than the normal distribution.

A platykurtic distribution is one that is “flatter” than the normal distribution.

Page 10: Comprehensive Exam Review

Percentiles result from dividing the (normal) distribution into one hundred parts, each containing an equal number of scores.

A percentile rank is the percentage of scores that fall below a particular score.

Two different percentiles may represent vastly different numbers of people in the normal distribution, depending on where the percentiles are in the distribution.
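A percentile rank, as defined above, can be computed with a one-line count. The scores here are hypothetical:

```python
def percentile_rank(score, scores):
    """Percentage of scores in the distribution that fall below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [10, 12, 12, 15, 18, 20, 21, 25, 25, 30]
print(percentile_rank(20, scores))  # 50.0 (5 of 10 scores fall below 20)
print(percentile_rank(25, scores))  # 70.0
```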

Page 11: Comprehensive Exam Review

Standardization, sometimes called “normalizing,” is the conversion of a distribution of scores so that the mean equals zero and the standard deviation equals 1.0 for a particular sample or population.

“Normalizing” a distribution is appropriate when the sample size is large and the actual distribution is not grossly different from a normal distribution.

Page 12: Comprehensive Exam Review

Standardization, or normalizing, is an intermediate step in the derivation of standardized scores, such as T scores, SAT scores, or Deviation IQs.

Stanines are a system for assigning a score of one through nine for any particular score. Stanines are derived from a distribution having a mean of five and a standard deviation of two.
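The standard-score families mentioned above all rest on the z score. This sketch converts a hypothetical raw score (from a distribution with mean 100 and SD 15, deviation-IQ style) to a z score, a T score, and a stanine; note that the linear stanine conversion shown here is an approximation, since stanines are properly assigned by fixed percentage bands:

```python
def z_score(raw, mean, sd):
    """Standardized score: mean 0, standard deviation 1."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T score: mean 50, standard deviation 10."""
    return 50 + 10 * z_score(raw, mean, sd)

def stanine(raw, mean, sd):
    """Stanine: mean 5, standard deviation 2, clipped to 1..9 (approximate)."""
    return max(1, min(9, round(5 + 2 * z_score(raw, mean, sd))))

print(z_score(115, 100, 15))   # 1.0
print(t_score(115, 100, 15))   # 60.0
print(stanine(115, 100, 15))   # 7
```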

Page 13: Comprehensive Exam Review

A correlation coefficient is a measure of relationship between two or more variables or attributes that ranges in value from -1.00 (perfect negative relationship) through 0.00 (no relationship) to +1.00 (perfect positive relationship).

A regression coefficient is a weight expressing the linear relationship between a dependent variable and an independent variable in a regression equation.

Page 14: Comprehensive Exam Review

The coefficient of determination is the square of a correlation coefficient. It is used in the interpretation of the percentage of shared variance between two sets of test scores.
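A minimal Python sketch, using hypothetical paired scores, of computing a Pearson correlation coefficient and squaring it to obtain the coefficient of determination:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two sets of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # perfectly linear with xs
r = pearson_r(xs, ys)
print(r)       # ≈ 1.0 (perfect positive relationship)
print(r ** 2)  # coefficient of determination: ≈ 100% shared variance
```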

The probability (also known as the alpha) level is the likelihood that a particular statistical result occurred by chance alone.

Page 15: Comprehensive Exam Review

Error of measurement is the discrepancy between the value of an observed score and the value of the corresponding theoretical true score.

The standard error of measurement is an indicator of how closely an observed score compares with the true score. This statistic is derived by computing the standard deviation of the distribution of errors for the given set of scores.
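One common computational form of the standard error of measurement, assuming the test's standard deviation and reliability coefficient are known, is SEM = SD × √(1 − r). A short sketch with hypothetical values:

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability coefficient = .91
sem = standard_error_of_measurement(15, 0.91)
print(sem)  # ≈ 4.5
# Roughly 68% of the time, the true score lies within +/- 1 SEM of the
# observed score; e.g., observed 100 -> band of about 95.5 to 104.5.
```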

Page 16: Comprehensive Exam Review

Measurement error variance is the portion of the observed score variance that is attributed to one or more sources of measurement error (i.e., the square of the standard error of measurement).

Random error is an error associated with statistical analyses that is unsystematic, often indirectly observed, and appears to be unrelated to any measurement variables.

Page 17: Comprehensive Exam Review

Differential item functioning is a statistical property of a test item in which, conditional upon total test score or equivalent measure, different groups of test takers have different rates of correct item response.

The item difficulty index is the percentage of a specified group that answers a test item correctly.

Page 18: Comprehensive Exam Review

The item discrimination index is a statistic that indicates the extent to which a test item differentiates between high and low scorers.
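The two item statistics above can be sketched directly, using hypothetical 0/1 (incorrect/correct) responses and the simple upper-group-minus-lower-group form of the discrimination index:

```python
def item_difficulty(responses):
    """Proportion (often reported as a percentage) answering correctly."""
    return sum(responses) / len(responses)

def item_discrimination(upper, lower):
    """Difference in item difficulty between high and low scorers."""
    return item_difficulty(upper) - item_difficulty(lower)

# Hypothetical item: 1 = correct, 0 = incorrect
upper = [1, 1, 1, 1, 0]   # top scorers on the whole test
lower = [1, 0, 0, 0, 0]   # bottom scorers on the whole test
print(item_difficulty(upper + lower))     # 0.5 (half answered correctly)
print(item_discrimination(upper, lower))  # ≈ 0.6 (item favors high scorers)
```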

Extrapolation is the process of estimating values of a function beyond the range of the available data.

Page 19: Comprehensive Exam Review

A confidence interval is the interval between two points on a scale within which a score of interest lies, based on a certain level of probability.

The error of estimate (standard or probable) is the degree to which criterion scores predicted from test scores correspond with the actual scores.

Page 20: Comprehensive Exam Review

The regression effect is the tendency of a predicted score to lie nearer to the mean of its distribution than the score from which it was predicted.

A factor is a hypothetical dimension underlying a psychological construct that is used to describe the construct and intercorrelations associated with it.

Page 21: Comprehensive Exam Review

Factor analysis is a statistical procedure for analyzing intercorrelations among a group of variables, such as test scores, by identifying a set of underlying hypothetical factors and determining the amount of variation in the variables that can be accounted for by the different factors.

The factorial structure is the set of factors resulting from a factor analysis.

Page 22: Comprehensive Exam Review

Reliability

Page 23: Comprehensive Exam Review

The reliability coefficient is an index that indicates the extent to which scores are free from measurement error. It is an approximation of the ratio of true variance to observed score variance for a particular population of test takers.

Reliability is the degree to which an individual would obtain the same score on a test if the test were re-administered to the individual with no intervening learning or practice effects.

Page 24: Comprehensive Exam Review

The coefficient of equivalence is a correlation between scores for two forms of a test given at essentially the same time; also referred to as alternate-form reliability, a measure of the extent to which two equivalent or parallel forms of a test are consistent in what they measure.

The coefficient of stability is a correlation between scores on two administrations of a test, such as test administration and retest with some intervening time period.

Page 25: Comprehensive Exam Review

The coefficient of internal consistency is a reliability index based on interrelationships of item responses or of scores on sections of a test obtained during a single administration. The most common examples include the Kuder-Richardson and split-half.

Coefficient Alpha is a coefficient of internal consistency for a measure in which there are more than dichotomous response choices, such as in the use of a Likert scale.
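Coefficient alpha can be computed as (k / (k − 1)) × (1 − Σ item variances / total-score variance). A self-contained sketch, using hypothetical Likert responses and population (divide-by-N) variances:

```python
def variance(xs):
    """Population variance: average squared deviation around the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def coefficient_alpha(item_scores):
    """Cronbach's alpha; item_scores[i] holds every person's score on item i."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical Likert-scale responses: 3 items, 5 respondents
items = [
    [3, 4, 5, 2, 4],
    [3, 5, 5, 1, 4],
    [2, 4, 4, 2, 5],
]
print(round(coefficient_alpha(items), 3))  # 0.922
```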

Page 26: Comprehensive Exam Review

The split-half reliability coefficient is a reliability coefficient that estimates the internal consistency of a power test by correlating the scores of two halves of the test (usually the even-numbered items and the odd-numbered items, if their representative means and variances are equal).

The Spearman-Brown Prophecy Formula projects the reliability of a lengthened (or shortened) test from the calculated reliability of the existing test. It is a “correction” appropriate for use only with a split-half reliability coefficient, projecting full-test reliability from the half-test correlation.
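For the split-half case, the Spearman-Brown correction doubles the test length, giving r_full = 2r / (1 + r). A one-function sketch with a hypothetical half-test correlation:

```python
def spearman_brown(r_half):
    """Full-test reliability projected from a split-half correlation."""
    return 2 * r_half / (1 + r_half)

# A split-half correlation of .60 between odd- and even-item scores
# projects to a full-length reliability of .75:
print(spearman_brown(0.60))  # ≈ 0.75
```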

Page 27: Comprehensive Exam Review

Interrater reliability is an index of the consistency of two or more independent raters’ judgments in an assessment situation.

Intrarater reliability is an index of the consistency of each independent rater’s judgments in an assessment situation.

Page 28: Comprehensive Exam Review

Validity

Page 29: Comprehensive Exam Review

Validity is the extent to which a given test measures or predicts what it purports to measure or predict.

The two basic approaches to the determination of validity include logical analysis, which applies to content validity and item structure, and empirical analysis, which applies to predictive validity and concurrent validity. Construct validity falls under both logical and empirical analyses.

Page 30: Comprehensive Exam Review

Validation is the process by which the validity of an instrument is established.

Validity is application specific, not a generalized concept. That is, a test is not in and of itself valid, but rather is valid for use for a specific purpose for a specific group of people in a specific situation.

Face validity is a measure of the acceptability of a given test and test situation by the examinee or user, in terms of the apparent uses of the test.

Page 31: Comprehensive Exam Review

Concurrent validity is a measure of how well a test score matches a measure of criterion performance.

Example applications include comparing a distribution of scores for men in a given occupation with those for men in general, correlating a personality test score with an estimate of adjustment made in a counseling interview, and correlating an end-of-course achievement or ability test score with a grade-point average.

Page 32: Comprehensive Exam Review

Content validity is a measure of how well the content of a given test represents the subject matter (domain or universe) or situation about which conclusions are to be drawn.

A construct is a grouping of variables or behaviors considered to vary across people. A construct is not directly observable but rather is derived from theory.

Construct validity is a measure of how well a test score yields results in line with theoretical implications associated with the construct label.

Page 33: Comprehensive Exam Review

Predictive validity is a measure of how well predictions made from a given test are confirmed by data collected at a later time.

Example applications of predictive validity include correlating intelligence test scores with course grades or correlating test scores obtained at the beginning of the year with grades earned at the end of the year.

Page 34: Comprehensive Exam Review

Factorial validity is a measure of how well the factor structure resulting from a factor analysis of the test matches the theoretical framework for the test.

Cross-validation is the process of determining whether a decision resulting from one set of data is truly effective when used with another relevant and independent data set.

Page 35: Comprehensive Exam Review

Convergent evidence is validity evidence derived from correlations between test scores and other types of measures of the same construct and in which the relationships are in predicted directions.

Discriminant evidence is validity evidence derived from correlations between test scores and other forms of assessment for different constructs, in which the relationships are in predicted directions.

Page 36: Comprehensive Exam Review

Appraisal of Intelligence

Page 37: Comprehensive Exam Review

A very general definition of intelligence is that it is a person’s global or general level of mental (or cognitive) ability.

However, there is considerable debate as to what intelligence is, and a corresponding amount of debate about how it should be measured.

Page 38: Comprehensive Exam Review

Perhaps the biggest debate in the assessment of intelligence is how to use intelligence tests effectively.

Given that intelligence is a “global” construct, what are the implications of intelligence test results for relatively specific circumstances and/or sets of behaviors?

In general, intelligence test results have been most useful for interpretation in contexts calling for use of mental abilities, such as in educational processes.

Page 39: Comprehensive Exam Review

Another argument concerns whether intelligence is “a (single) thing,” which is reflected in unifactor theories of intelligence, or a unique combination of things, which is reflected in multifactor theories of intelligence.

The measurement implications from this debate result in some intelligence tests attempting to measure a single construct and some attempting to measure a unique set of interrelated constructs.

Page 40: Comprehensive Exam Review

Another debate centers on what proportion of intelligence is genetic or inherited and what proportion is environmentally determined. This is the so-called “nature-nurture” controversy.

So-called “fluid” intelligence (theoretically a person’s inherent capacity to learn and solve problems) is largely nonverbal and is a relatively culture-reduced form of mental efficiency.

Page 41: Comprehensive Exam Review

The nature-nurture concern has significant implications for how intelligence is assessed (e.g., what types of items and/or tasks are included), but there has not been full or consensual resolution of the debate.

So-called “crystallized” intelligence (theoretically) represents what a person has already learned, is most useful in circumstances calling for learned or habitual responses, and is heavily culturally laden.

Page 42: Comprehensive Exam Review

A fourth major debate concerns the extent to which intelligence tests are racially, culturally, or otherwise biased.

Although evidence of such biases was found in some “early” intelligence tests, improvements in psychometrics have done much to alleviate such biases, at least in regard to the resultant psychometric properties of “newer” intelligence tests.

Page 43: Comprehensive Exam Review

In light of these and other considerations, the primary focus for the assessment of intelligence is on the construct validity of intelligence tests.

In general, individually administered intelligence tests have achieved the greatest credibility.

Individual intelligence tests typically are highly verbal in nature, i.e., necessitate command of language for effective performance.

Page 44: Comprehensive Exam Review

Individual intelligence tests typically include both verbal (e.g., response selection or item completion) and performance (e.g., manipulation task) subsets of items.

However, nonverbal and nonlanguage intelligence tests have been developed.

Group administered intelligence tests, such as those commonly used in schools, are typically highly verbal and non-performance in nature.

Page 45: Comprehensive Exam Review

Appraisal of Aptitudes

Page 46: Comprehensive Exam Review

An aptitude is a relatively clearly defined cognitive or behavioral ability.

An aptitude is a much more focused ability than general intelligence, and the measurement of aptitudes also has been more focused.

Literally hundreds of aptitude tests have been developed and are available for a substantial number of rather disparate human abilities.

Page 47: Comprehensive Exam Review

Theoretically, aptitude tests are intended to measure “innate” abilities (or capacities) rather than learned behaviors or skills.

There remains considerable debate as to whether this theoretical premise is actually achieved in practice.

However, this debate is lessened in importance IF the relationship between a current aptitude test result and a future performance indicator is meaningful and useful.

Page 48: Comprehensive Exam Review

Aptitude tests are used primarily for prediction of future behavior, particularly in regard to the application of specific abilities in specific contexts.

Predictive validity is usually the foremost concern in aptitude appraisal and is usually established by determining the correlation between test results and some future behavioral criterion.

Page 49: Comprehensive Exam Review

Although there are many individual aptitude tests, aptitude appraisal is much more commonly achieved through use of multiple-aptitude test batteries.

There are two primary advantages to the use of multiple-aptitude batteries (as opposed to a collection of individual aptitude tests from different sources):

Page 50: Comprehensive Exam Review

First, the subsections of multiple-aptitude test batteries are designed to be used as a collection; therefore, there is usually a common item and response format, greater uniformity in score reporting, and generally better understanding of subsection and overall results.

Second, the norms for the various subtests are from a common population; therefore, comparison of results across subtests is facilitated.

Page 51: Comprehensive Exam Review

Perhaps the most widely recognized use of aptitude tests is for educational purposes, e.g., Scholastic Assessment Test (formerly the Scholastic Aptitude Test; SAT), American College Testing Program (ACT), and Graduate Record Examination (GRE).

However, aptitude tests used specifically for vocational purposes (e.g., General Aptitude Test Battery; GATB) or armed services purposes (e.g., Armed Services Vocational Aptitude Battery; ASVAB) also are very widely used.

Page 52: Comprehensive Exam Review

Appraisal of Achievement

Page 53: Comprehensive Exam Review

Achievement tests are measures of success, mastery, accomplishment, or learning in a subject matter or training area.

The greatest use by far of achievement tests is in school or educational systems to determine student accomplishment levels in academic subject areas.

The vast majority of achievement tests are group tests.

Page 54: Comprehensive Exam Review

Most achievement tests also are actually multiple-achievement test batteries because they typically have subtests for several different subject matter areas.

However, there are achievement tests available that measure across several different subject matter areas but that are designed for individual administration.

Individual achievement tests are used most commonly in processes to diagnose learning disabilities.

Page 55: Comprehensive Exam Review

Most achievement tests are norm-referenced to facilitate comparisons within and between components of educational systems.

However, increasingly, criterion-referenced achievement tests are being used in the attempt to determine with greater specificity the particular skills and/or knowledge students are mastering at various educational levels.

Page 56: Comprehensive Exam Review

Appraisal of Interests

Page 57: Comprehensive Exam Review

The primary goal of interest assessment is to help individuals differentiate preferred activities from among possible activities.

Presumably, the information derived from interest assessment will enable the respondent to achieve greater vocational productivity, success, and/or life satisfaction.

Page 58: Comprehensive Exam Review

Most interest inventories are used in the context of vocational counseling (i.e., to help individuals determine preferences in various aspects of the world of work).

However, increasingly, interest inventories are being developed and used to assess preferences in other aspects of life, such as leisure.

Page 59: Comprehensive Exam Review

Some interest (and some personality) inventories are ipsative measures, which means that the sum (and hence the average) of the subscale scores is the same for all respondents.

Ipsative measures usually have a forced-choice format, which means that a respondent cannot have all high scores or all low scores across subscales.

Page 60: Comprehensive Exam Review

Interest inventories are most commonly used by and developed for young adults, such as late high school or college students.

However, interest inventories suitable and valid for use with persons at any age are available.

The major problem with interest inventories is the tendency for respondents to interpret them as measures of ability or probable satisfaction, neither of which is necessarily directly related to any particular preference.

Page 61: Comprehensive Exam Review

Appraisal of Personality

Page 62: Comprehensive Exam Review

Personality is a vague, difficult-to-define construct. People tend to think of it as “the way a person is.” However, there are at least two points of agreement about personality:

First, each person is consistent to some extent (i.e., has coherent traits and action patterns that are repeated).

Second, each person is distinct or unique to some extent (i.e., has traits and behaviors different from others).

Page 63: Comprehensive Exam Review

It is exactly this strange set of conflicting conditions that makes the assessment of personality so complex.

“Normality” is a relativistic term used to describe how some identifiable group of people (should) behave most of the time.

The assessment of personality thus involves determining the extent to which a person’s traits and/or behaviors fit normality (i.e., are compared to average behavior in some reference group).

Page 64: Comprehensive Exam Review

Projective techniques and self-report inventories are the two primary methods of appraisal of personality.

Projective techniques involve respondents constructing their own responses to vague and ambiguous stimuli.

The projective hypothesis is that personal interpretation of ambiguous stimuli reflects unconscious needs, motives, and/or conflicts.

Page 65: Comprehensive Exam Review

Generally, five types of projective assessment techniques are discussed:

Association techniques, such as the Rorschach or Holtzman Inkblot techniques, ask the respondent to “explain” what is seen in the stimulus.

Construction techniques, such as the Thematic Apperception Test or the Children’s Apperception Test, ask the respondent to “tell a story” about what is represented by the stimulus, usually a vague picture.

Page 66: Comprehensive Exam Review

Expression techniques, such as the Draw-A-Person Test or the House-Tree-Person Test, ask the respondent to create a figure or drawing in response to some instruction.

Arrangement techniques ask the respondent to place in order the elements of a set (usually) of pictures and then to “explain” the sequence.

Completion techniques ask the respondent to make a complete sentence from a sentence stem.

Page 67: Comprehensive Exam Review

Historically, the results of projective techniques have exhibited poor psychometric properties.

However, the use of projective techniques remains quite popular, primarily because respondents often do disclose information, particularly “themes” of information, not easily obtainable through other methods.

Page 68: Comprehensive Exam Review

Generally, three types of self-report personality inventories are discussed in the professional literature:

Theory-based inventories, such as the Myers-Briggs Type Indicator, State-Trait Anxiety Inventory, or Personality Research Form, assess traits and/or behaviors in accord with the constructs upon which the inventory is based.

Page 69: Comprehensive Exam Review

Factor-analytic inventories, such as the Sixteen Personality Factor Questionnaire or the NEO Personality Inventory-Revised, assess personality dynamics outside the context of any particular theory of personality.

Items in these types of instruments are selected from the results of factor analyses of large samples of items and generally have very good psychometric properties.

Page 70: Comprehensive Exam Review

Criterion-keyed inventories, such as the Minnesota Multiphasic Personality Inventory-2 or Millon Clinical Multiaxial Inventory-III, contain subscale items that discriminate between a criterion group (e.g., schizoid or narcissistic) and a relevant control group (e.g., “normals”).

These types of inventories usually are used to assist in making clinical diagnoses.

Page 71: Comprehensive Exam Review

Self-report personality inventories generally have much better psychometric properties than do projective techniques.

However, clinical diagnoses should never be made solely on the basis of personality instrument results; clinical judgments should be used in combination with assessment results.

Page 72: Comprehensive Exam Review

Computers and Appraisal

Page 73: Comprehensive Exam Review

Clearly the most prominent trend in appraisal today is toward “computerization” of testing.

In computer-based testing, instruments or techniques that are or could be in other formats (e.g., “paper-and-pencil”) are converted to a situation in which they are presented on and responded to through use of a computer.

Page 74: Comprehensive Exam Review

Adaptive testing is when each item presented to a respondent is selected based on the nature or correctness of the response to the preceding item.

Adaptive testing is facilitated through the use of computers due to the capability to handle large numbers of contingencies and choices efficiently and accurately.
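The branching logic of adaptive testing can be illustrated with a deliberately simplified sketch. The difficulty levels and step rule here are hypothetical; operational adaptive tests select items using item response theory rather than a fixed step:

```python
# Minimal, hypothetical sketch: step to a harder item after a correct
# response and an easier item after an incorrect one, within bounds 1..7.

def next_difficulty(current, last_correct):
    """Choose the next item's difficulty from the last response."""
    step = 1 if last_correct else -1
    return max(1, min(7, current + step))

level = 4                          # start near the middle of the range
for correct in [True, True, False]:
    level = next_difficulty(level, correct)
print(level)  # path: 4 -> 5 -> 6 -> 5
```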

Page 75: Comprehensive Exam Review

Computer-generated interpretive reports also are increasing in frequency of use.

A computer’s capability to analyze complex data sets and intricate patterns in data is the primary reason for the increasing use of computer-generated interpretive reports.

However, computer-generated interpretive reports are only as good as the programming underlying them, and never as good as when used in conjunction with sound clinical judgment.

Page 76: Comprehensive Exam Review

This concludes Part 2 of the presentation on

APPRAISAL

