Date post: | 28-Mar-2015 |
Upload: | aiden-stack |
SADC Course in Statistics
Assessing data critically
Module B1 Session 17
Objectives
At the end of this session the students will be able to:
•Apply basic techniques for error detection
•Ask relevant questions that allow for the explanation or correction of discrepancies
Detecting errors in primary data
Checks to detect errors in primary data should be made at various stages:
Immediately after data collection (and during data entry)
After data computerisation
During exploratory data analysis
Checking for errors after data collection
Have all questions been answered? If not, are the reasons for non-response clear?
Are recorded values within their expected range?
Do all questions or items have meaningful entries? Are they internally consistent?
Are any zero entries genuinely zeros?
Are IDs unique?
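Several of these questions can be asked of the data automatically. The following is a minimal Python sketch, not a prescribed procedure: the records, the field names (id, age, hh_size) and the expected ranges are all invented for illustration.

```python
from collections import Counter

# Fictitious records; field names and values are assumptions for illustration.
records = [
    {"id": 1, "age": 34, "hh_size": 5},
    {"id": 2, "age": 210, "hh_size": 3},    # age outside the expected range
    {"id": 2, "age": 41, "hh_size": None},  # repeated ID and a missing answer
]

# Assumed plausible ranges; a real survey would take these from its codebook.
EXPECTED_RANGE = {"age": (0, 110), "hh_size": (1, 30)}

def check_records(rows, ranges):
    """Return (id, field, problem) tuples that need manual follow-up."""
    problems = []
    id_counts = Counter(row["id"] for row in rows)
    for row in rows:
        if id_counts[row["id"]] > 1:
            problems.append((row["id"], "id", "duplicate ID"))
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is None:
                problems.append((row["id"], field, "missing"))
            elif not low <= value <= high:
                problems.append((row["id"], field, "out of range"))
    return problems

problems = check_records(records, EXPECTED_RANGE)
for problem in problems:
    print(problem)
```

A flagged record is a prompt to go back to the questionnaire, not proof of an error.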
Checking for errors after data entry
Compute new (temporary) variables to check if:
Rates recorded per 1000 of population are less than 1000
Percentages expected to be less than 100% are indeed so
There is internal consistency amongst variables, and between tables – for example,
• date of interviewing should be earlier than the date when the supervisor checked the questionnaire
• totals are consistent across different tables, and sub-totals add to overall totals.
Codes for missing values have been identified correctly according to their reason for missing and have been set as missing in the database to be used for analysis.
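The checks on this slide might be sketched in Python as follows. Everything specific here is an assumption made for illustration: the field names, the missing-value codes (-9, -8) and the two fictitious records. Same-day supervisor checks and percentages of exactly 100 are treated as acceptable, which may not suit every survey.

```python
from datetime import date

# Assumed missing-value codes; a real survey defines its own.
MISSING_CODES = {-9: "refused", -8: "not applicable"}

# Fictitious records with hypothetical field names.
rows = [
    {"id": "A1", "rate_per_1000": 42.0, "pct_literate": 88.0,
     "interviewed": date(2000, 3, 1), "checked": date(2000, 3, 4),
     "males": 3, "females": 2, "hh_total": 5},
    {"id": "A2", "rate_per_1000": 1250.0, "pct_literate": 104.0,
     "interviewed": date(2000, 3, 9), "checked": date(2000, 3, 2),
     "males": 4, "females": -9, "hh_total": 6},
]

def clean(value):
    """Turn a missing-value code into None so it is kept out of arithmetic."""
    return None if value in MISSING_CODES else value

def entry_flags(row):
    """Return descriptions of the consistency checks that this record fails."""
    flags = []
    if not 0 <= row["rate_per_1000"] < 1000:
        flags.append("rate per 1000 outside 0-1000")
    if not 0 <= row["pct_literate"] <= 100:
        flags.append("percentage outside 0-100")
    if row["interviewed"] > row["checked"]:
        flags.append("interview dated after supervisor check")
    parts = [clean(row["males"]), clean(row["females"])]
    total = clean(row["hh_total"])
    # Temporary comparison: sub-totals should add to the overall total.
    if None not in parts and total is not None and sum(parts) != total:
        flags.append("sub-totals do not add to total")
    return flags

for row in rows:
    print(row["id"], entry_flags(row))
```

The sub-total check is skipped when a component carries a missing-value code, so that a code such as -9 is never added up as if it were a count.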
Tips for error detection
• Look for counts or categories that do not make sense
• If you have a series of data in chronological order, look for jumps in the data. They may be errors
• Always check your totals
  – Make sure they add to the expected total (e.g. 100%).
  – When looking at multiple tables in a single study, the sample size should be consistent in all tables
• What is expected to tally should tally!
• Don’t just look at the numbers, look at the definitions that the numbers represent
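As an illustration of "what is expected to tally should tally", the short Python sketch below checks that percentages add to 100 (within a rounding tolerance) and that two tables from the same study report the same sample size. The figures, the table names and the 0.5 tolerance are assumptions made for the example.

```python
import math

def totals_tally(parts, expected, tol=0.5):
    """True if the parts add to the expected total, allowing for rounding."""
    return math.isclose(sum(parts), expected, abs_tol=tol)

# Fictitious percentage breakdowns.
good_percentages = [41.2, 35.5, 23.3]
bad_percentages = [41.2, 35.5, 32.3]   # 23.3 mistyped as 32.3

# Fictitious counts from two tables that should describe the same sample.
tables = {"by_sex": [230, 233], "by_region": [120, 180, 160]}
sizes = {name: sum(counts) for name, counts in tables.items()}
sizes_consistent = len(set(sizes.values())) == 1

print(totals_tally(good_percentages, 100))
print(totals_tally(bad_percentages, 100))
print(sizes, sizes_consistent)
```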
Checks during Exploratory Data Analysis
Simple one-way or two-way tables can help identify errors.
(a) Results are from a socio-economic survey in Uganda. Are these results reasonable?
Average number of meals taken by HH in past week:
Meals    Frequency
0                6
1              699
2             5547
3             3285
4              113
5                1
7                1
Total         9652
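One way to scan a one-way table like this is to confirm that the frequencies add to the stated total and then list the categories that hold only a tiny share of the sample. The Python sketch below does this for the table above; the 0.1% threshold is an arbitrary assumption, and a rare value is a prompt to check the questionnaire, not necessarily an error.

```python
# Frequencies copied from the table above (value -> frequency).
meals = {0: 6, 1: 699, 2: 5547, 3: 3285, 4: 113, 5: 1, 7: 1}
stated_total = 9652

# Check that the frequencies add to the stated total.
total = sum(meals.values())
total_matches = total == stated_total

# Assumed threshold: categories holding under 0.1% of the sample.
RARE_SHARE = 0.001
rare_values = sorted(v for v, n in meals.items() if n / total < RARE_SHARE)

print(total_matches)
print(rare_values)
```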
Checks during Exploratory Data Analysis
(b) A second example from the British Crime Survey, 2000
Number of times something was stolen from respondent’s hands, pockets, bag or case since 1 Jan 99:
Times    Frequency
0              413
1               39
2                4
3                2
5                1
10               1
15               1
36               1
97               1
Total          463
Can the last figure be correct?
Checks during Exploratory Data Analysis
(c) Detection rate of property crimes in one police force. (Data are fictitious)
Property Crime        Jan   Feb   Mar
Vandalism              10    13    14
Burglary               14    19    16
Vehicle thefts         15    81    17
Bicycle thefts          4     3     3
Thefts from person      3     2     5
Other thefts            7     9    11
Checks during Exploratory Data Analysis
Consistency checks across related variables
The following examples show:
(i) Current number of cars at household versus whether respondent was worried about having car stolen.
(ii) Current number of cars at household versus whether respondent was worried about having things stolen from car.
(iii) Distance to reach any type of formal court versus distance from nearest Magistrate’s Court.
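A consistency check of kind (i) can be sketched in Python by cross-tabulating the two variables and picking out the cells that should be empty. The responses below are invented, not BCS data, and the rule used (a household has no car exactly when the answer is "not applicable") is an assumption about how such a question might be coded.

```python
from collections import Counter

# Fictitious (number of cars, worry about car theft) pairs.
responses = [
    (0, "not applicable"),
    (0, "very worried"),
    (1, "not very worried"),
    (2, "fairly worried"),
    (0, "not applicable"),
    (1, "not applicable"),
]

# Cross-tabulation: each (cars, answer) cell with its count.
crosstab = Counter(responses)

# Assumed rule: "not applicable" should occur exactly when cars == 0.
inconsistent = {
    cell: count
    for cell, count in crosstab.items()
    if (cell[0] == 0) != (cell[1] == "not applicable")
}

for cell, count in sorted(inconsistent.items()):
    print(cell, count)
```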
Use of cross-tabulations
• Table 1. Cross-tabulation of current number of cars at household versus extent to which respondent is worried about having car stolen (Source: BCS, 2000)
Use of cross-tabulations
• Table 2. Cross-tabulation of current number of cars at household versus extent to which respondent is worried about having things stolen from the car (Source: BCS, 2000)
Detecting errors in secondary data
Procedures similar to the above can be undertaken, but in addition:
• Ask questions about the source from which the data arose, e.g. to assess competence, adequacy of funding, motivation for study, etc.
• Ask about the data collection procedure and associated documentation. In particular seek answers to what, who, why, when, where, and how.
• It is important to follow the whole data chain.