Post on 14-Jan-2016
Geog 458: Map Sources and Errors
Uncertainty
January 23, 2006
Outline
1. Defining uncertainty
2. How to calculate uncertainty?
   1) Nominal case: Confusion matrix
   2) Interval/ratio case: RMSE
3. How to validate uncertainty?
   1) Internal validation: MAUP
   2) External validation: Conflation
1. Defining uncertainty
• Definition of uncertainty
  • Discrepancy between reality and its representation
• Different kinds of uncertainty
  • Vagueness: the representation does not fit the essence of reality well (e.g. representing cities as a point layer, soil as a crisp boundary) → better human conceptualization needed
  • Ambiguity: the representation is not universally agreed upon by users (e.g. placenames, occupation classification, indicator of environmental health) → standardization needed
• Accuracy vs. precision
  • Accuracy: difference between true values and those in the DB
  • Precision: amount of detail present in the data
Questions
• What is your diagnosis among {uncertainty, precision, positional accuracy, attribute accuracy, vagueness, ambiguity}, and what is your prescription?
  • Longitude values in decimal degrees are stored as integers
  • Contour lines derived from a DEM do not line up well with the DRG
  • The map indicates this road is bidirectional, but it turns out to be one-way
  • Implementing an intelligent geocoding system based on English prepositions (e.g. across, at, over) for international users
  • Is the boundary of Mt. Everest well delineated? Is this polygon boundary a good representation of Mt. Everest?
• Which is broadest? How would you communicate these errors in your data quality report?
2. Calculating accuracy
• Nominal case
  • Confusion matrix (a.k.a. misclassification matrix)
• Interval/ratio case
  • Root Mean Square Error (RMSE)
• Confusion matrix is widely used to report on attribute accuracy when measured at a nominal scale
• RMSE is widely used to report on positional accuracy when measured at a numeric scale (e.g. x, y coordinates are metric)
Confusion Matrix
• Table 6.2 (p. 138): evaluating classification of land parcels; there are five land use codes, A to E
• Rows and columns in the misclassification matrix
  • Row corresponds to the class as recorded in the database
  • Column corresponds to the class as recorded in the field
• Correctly classified vs. incorrectly classified
  • Diagonal entries represent agreement between database and field
  • Off-diagonal entries represent disagreement between database and field
• So how accurate would you say this data is?
  • Since 206 cases (the sum of diagonal entries) out of 304 are correctly classified, overall accuracy would be 206/304 ≈ 67.8%
Confusion matrix: exercise
• Let's say you decide to write a test report on attribute accuracy of a land use map
• 100 reference points are selected to represent three classes: 49 points from natural, 28 points from agricultural, and 23 points from urban land use in your data
• Field checks resulted in 41 points confirmed to be natural, 21 points confirmed to be agricultural, and 19 points confirmed to be urban
• What is the overall accuracy of your data?
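One way to work the exercise is sketched below in Python. The counts come straight from the exercise; the function and variable names are illustrative assumptions, not part of the course material. Overall accuracy is the number of confirmed points divided by the number of reference points, i.e. the sum of the diagonal over the matrix total.

```python
# Exercise data: reference points sampled per class in the database,
# and how many of them the field check confirmed.
sampled = {"natural": 49, "agricultural": 28, "urban": 23}
confirmed = {"natural": 41, "agricultural": 21, "urban": 19}


def overall_accuracy(sampled, confirmed):
    """Share of all reference points whose database class matched the field."""
    return sum(confirmed.values()) / sum(sampled.values())


def per_class_accuracy(sampled, confirmed):
    """Share of each database class that the field check confirmed."""
    return {cls: confirmed[cls] / sampled[cls] for cls in sampled}


print(f"overall: {overall_accuracy(sampled, confirmed):.1%}")
for cls, acc in per_class_accuracy(sampled, confirmed).items():
    print(f"{cls}: {acc:.1%}")
```

With these counts, 41 + 21 + 19 = 81 of 100 points are confirmed, so overall accuracy is 81%. The per-class breakdown shows that the agricultural class (21 of 28) is the weakest of the three.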
Root Mean Square Error
• $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - a_i)^2}$
  • where c_i is the observed value, a_i is the true value, and n is the number of observations
• RMSE is the square root of the mean of the squared differences between each observed value (c_i) and its corresponding true value (a_i)
• Indicates how much observed values deviate from true values
• In the case of positional accuracy, a_i will be derived from a data source of higher accuracy
RMSE: exercise
• Let's say you decide to write a test report on positional accuracy of NHPN data
• You obtain data from sources with a higher positional accuracy, such as geodetic points
• 7 points (intersections) are selected to be compared to 7 corresponding control points
• Distances for the 7 pairs are calculated as follows
• What is the RMSE?
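The calculation can be sketched in Python as below. The seven distances are hypothetical placeholders, since the exercise's actual values are not reproduced in this transcript, and the function name `rmse` is an assumption for illustration. Each distance between a test point and its control point plays the role of the difference c_i - a_i.

```python
import math


def rmse(errors):
    """Root mean square error from a list of per-point errors (c_i - a_i).

    For positional accuracy, each error can be the distance between a
    test point and its corresponding control point.
    """
    if not errors:
        raise ValueError("need at least one error value")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# HYPOTHETICAL distances in meters for 7 point pairs; substitute the
# values from the exercise table.
distances_m = [3.0, 4.0, 0.0, 5.0, 12.0, 0.0, 0.0]
print(f"RMSE = {rmse(distances_m):.2f} m")
```

Because the errors are squared before averaging, a single large offset (12 m in the placeholder data) pulls the RMSE up more than it would pull up a plain mean of the distances.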
3. Validating accuracy
• Internal validation
  • Examines likely impacts of uncertainty upon operation results within GIS
  • What would be the effects of different data aggregation schemes on operation results?: MAUP
• External validation
  • Validates accuracy of test data in reference to external data sources
  • How accurate is this data set relative to reference data?: Conflation
Modifiable Areal Unit Problem
• Quite simply, different aggregations yield different results
  • From Openshaw
• Because sometimes geography does not have a natural unit of analysis
  • Population, vegetation
• Remember that a census unit is an artificial boundary for the purpose of enumeration
  • Space is used as a sampling scheme
• Question of optimal unit of analysis
  • Urban center boundary for analyzing urban activities
  • Metropolitan area for analyzing spatial labor market
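A toy illustration of "different aggregations yield different results" is sketched below in Python. All numbers are hypothetical: eight blocks in a row, each with a population and a count of cases, are grouped into zones under two made-up zoning schemes. The names `zone_rates`, `zoning_a`, and `zoning_b` are assumptions for illustration only.

```python
from collections import defaultdict


def zone_rates(population, cases, zoning):
    """Aggregate block-level counts into zones and return each zone's rate."""
    pop_sum = defaultdict(int)
    case_sum = defaultdict(int)
    for pop, n, zone in zip(population, cases, zoning):
        pop_sum[zone] += pop
        case_sum[zone] += n
    return {zone: case_sum[zone] / pop_sum[zone] for zone in sorted(pop_sum)}


# HYPOTHETICAL block-level data: 8 blocks along a street.
population = [100] * 8
cases = [10, 30, 30, 10, 10, 30, 30, 10]

# Two ways of drawing zone boundaries over the same blocks.
zoning_a = [0, 0, 1, 1, 2, 2, 3, 3]  # blocks paired from the left edge
zoning_b = [0, 1, 1, 2, 2, 3, 3, 4]  # same pairing, shifted by one block

print("scheme A:", zone_rates(population, cases, zoning_a))
print("scheme B:", zone_rates(population, cases, zoning_b))
```

Under scheme A every zone has a rate of 0.2, so the area looks uniform. Under scheme B the same blocks produce zone rates alternating between 0.1 and 0.3. Nothing in the underlying data changed; only the boundaries did.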
Conflation
• Describes the range of functions that attempt to overcome differences between datasets or merge their contents, as with rubber-sheeting
• Visual inspection of spatial overlay of TIGER file over GPS measurements
• Lab2: working with data from different sources, conflating test data with data from an independent source (higher accuracy), visual inspection of positional accuracy, summarizing positional accuracy of test data with RMSE
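The workflow named in the lab can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the lab procedure: test points are paired with their nearest control points, positional accuracy is summarized with RMSE, and a single global shift (the mean offset) is removed. Rubber-sheeting proper adjusts the data locally rather than with one rigid shift. The coordinates are hypothetical and the function names are assumptions; `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_control(point, controls):
    """Return the control point closest to `point` (Euclidean distance)."""
    return min(controls, key=lambda c: math.dist(point, c))


def rmse_of_pairs(pairs):
    """RMSE of the distances between paired (test, control) points."""
    return math.sqrt(sum(math.dist(t, c) ** 2 for t, c in pairs) / len(pairs))


# HYPOTHETICAL coordinates in meters: control points (higher accuracy)
# and test points that are all offset by the same amount.
controls = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
test_pts = [(3.0, 4.0), (103.0, 4.0), (103.0, 104.0), (3.0, 104.0)]

# Step 1: pair each test point with its nearest control point.
pairs = [(t, nearest_control(t, controls)) for t in test_pts]
before = rmse_of_pairs(pairs)

# Step 2: remove the mean offset (one rigid shift for the whole layer).
dx = sum(t[0] - c[0] for t, c in pairs) / len(pairs)
dy = sum(t[1] - c[1] for t, c in pairs) / len(pairs)
shifted = [((t[0] - dx, t[1] - dy), c) for t, c in pairs]
after = rmse_of_pairs(shifted)

print(f"RMSE before shift: {before:.2f} m, after shift: {after:.2f} m")
```

In this contrived case the error is purely systematic, so the shift removes all of it. Real test data usually mixes a systematic offset with local distortions, which is why local adjustments such as rubber-sheeting exist and why the RMSE after adjustment is still worth reporting.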