Accuracy and Uncertainty
[Figure: uncertainty divided into variability (temporal, individual heterogeneity, spatial) and incomplete knowledge (model, parameter, decision).]
Accuracy and Uncertainty
Why is this an issue?
What is meant by accuracy and uncertainty (data vs rule)?
How things have changed in a digital world.
Spatial data quality issues (Metadata).
Why an Issue?
Imperfect or uncertain reconciliation
[science >< practice]
[concepts >< application]
[analytical capability >< social context]
It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable
Changing (temporal) patterns (e.g., census)
Fractals (e.g., coastlines)
Why an Issue?
GIGO: garbage in, garbage out (?)
Often little is known of the input data quality, and far too much is assumed about the output quality
GIS fosters data merging
Single municipal database
Taxation
Engineering
Planning
Accuracy and Uncertainty
Why is this an issue?
What is meant by accuracy and uncertainty (data vs rule)?
How things have changed in a digital world.
Spatial data quality issues (Metadata).
Accounting for uncertainty.
Sources of Error
Measurement error: different observers, measuring instruments, etc.
[Figure: contributors to measurement error: environment, procedure, setup, observer, software, equipment.]
Sources of Error
Specification error: omitted variables (regression discussion)
Ambiguity, vagueness and the quality of a GIS representation
Blunders or spurious errors
Uncertainty has become a catch-all for ‘incomplete’ representations or a ‘quality’ measure. It can be characterized and managed but not eliminated.
Uncertainty
Uncertainty: a reflection of our imperfect and inexact knowledge of the world.
Data uncertainty: our observations or measurements encompass ambiguity / inherent variability (e.g., standard deviation).
Rule uncertainty: how we reason with observations; we are unsure of the conclusions we can draw from even perfect data (e.g., linear versus non-linear models).
Uncertainty: Data to Rule
[Figure: Real World, Conception, Measurement & Representation, Analysis.]
Data Uncertainty
Spatial uncertainty
Natural geographic units?
Multivariate extensions?
Census vs samples
Vagueness
Statistical, cartographic (i.e., generalization), cognitive
Ambiguity
Values, language (semantics, ontologies)
http://xkcd.com/776/
Data Uncertainty (& MAUP)
Regions
Uniformity (one or all aspects?)
# of regions    r
48              .2189
24              .2963
12              .5757
6               .7649
3               .9902
Relations typically grow stronger when based on larger geographic units (statistical uncertainty appears to decrease while our ability to assign characteristics to individuals decreases)
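A minimal sketch of this scale effect, using entirely synthetic values rather than the data behind the correlations listed above. It assumes 48 hypothetical units that share a gentle trend, with each variable carrying its own local fluctuation that is deliberately built to cancel within blocks of two or four neighbouring units. The names `pearson_r` and `aggregate` are illustrative helpers, not part of any GIS package.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def aggregate(values, size):
    """Average consecutive runs of `size` units into larger regions."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


N_UNITS = 48  # hypothetical number of base units
# Shared signal present in both variables (synthetic).
trend = [0.05 * i for i in range(N_UNITS)]
# Local fluctuations (synthetic): x alternates every unit, y every two units.
local_x = [1.0 if i % 2 == 0 else -1.0 for i in range(N_UNITS)]
local_y = [1.0 if i % 4 < 2 else -1.0 for i in range(N_UNITS)]
x = [t + e for t, e in zip(trend, local_x)]
y = [t + e for t, e in zip(trend, local_y)]

# Correlation between x and y at three levels of aggregation.
r_by_region_count = {
    N_UNITS // size: pearson_r(aggregate(x, size), aggregate(y, size))
    for size in (1, 2, 4)
}
for n_regions, r in r_by_region_count.items():
    print(f"{n_regions:2d} regions: r = {r:.3f}")
```

Averaging removes the unit-level fluctuation but keeps the shared trend, so the correlation rises as regions get larger even though nothing about the underlying units has changed.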
Dealing with Data Uncertainty
In fuzzy set theory, it is possible to have partial membership in a set
membership can vary, e.g., from 0 to 1
this adds a third option to classification: yes, no, and possibly
Fuzzy approaches have been applied to the mapping of soils, vegetation cover, and land use
Fuzzy Membership Functions
[Figure: membership plotted against height for the fuzzy sets Short, Medium, and Tall.]
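Membership functions for Short, Medium, and Tall can be sketched with simple linear ramps. The breakpoints below (155, 170, and 185 cm) are hypothetical values chosen for illustration; they are not taken from the slide, and other function shapes are equally valid.

```python
def ramp_up(value, low, high):
    """Linear ramp: 0 at or below `low`, 1 at or above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


# Hypothetical breakpoints in centimetres (assumptions for illustration only).
SHORT_LIMIT, MEDIUM_PEAK, TALL_LIMIT = 155.0, 170.0, 185.0


def membership_short(height_cm):
    """Full membership below SHORT_LIMIT, fading to 0 at MEDIUM_PEAK."""
    return 1.0 - ramp_up(height_cm, SHORT_LIMIT, MEDIUM_PEAK)


def membership_medium(height_cm):
    """Triangular membership peaking at MEDIUM_PEAK."""
    rising = ramp_up(height_cm, SHORT_LIMIT, MEDIUM_PEAK)
    falling = 1.0 - ramp_up(height_cm, MEDIUM_PEAK, TALL_LIMIT)
    return min(rising, falling)


def membership_tall(height_cm):
    """Zero at or below MEDIUM_PEAK, full membership above TALL_LIMIT."""
    return ramp_up(height_cm, MEDIUM_PEAK, TALL_LIMIT)


for h in (150.0, 162.5, 170.0, 177.5, 190.0):
    print(h, membership_short(h), membership_medium(h), membership_tall(h))
```

A height of 162.5 cm is then "possibly" short and "possibly" medium (membership 0.5 in each) rather than being forced into one crisp class.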
Measurement/Representation
Representational models filter reality differently
Vector: crisp
Raster: can be fuzzy
[Figure: a crisp vector map with Forest and Savannah classes beside a raster of forest membership with legend classes 0.9 – 1.0, 0.5 – 0.9, 0.1 – 0.5, 0.0 – 0.1.]
Measures of Uncertainty
Given that uncertainty exists in every layer in a GIS, and that decisions are made based on analyses of that data, how can we quantify the ‘certainty’ we have in our data?
The measures of uncertainty we can use are dependent on the type of data we are working with (NOIR: nominal, ordinal, interval, ratio).
Measures of central tendency
Measures of dispersion
Measures of Uncertainty
How to measure the accuracy of nominal attributes?
e.g., a vegetation cover map
The confusion or misclassification matrix
compares recorded classes (the observations) with classes obtained by a more accurate process, or from a more accurate source (the reference)
Collecting the Reference Data
Examining every parcel (or pixel) may not be practical
Rarer classes should be sampled more often in order to reliably assess accuracy
sampling is often stratified by class (recall notes on spatial sampling)
Misclassification Matrix
The diagonal numbers (bold on the original slide) reflect correct classification (i.e., where the land use in the database equaled the land use observed in the field). The off-diagonal numbers reflect incorrect land use records in the database.
Rows: land use in the database. Columns: land use in the field.
Class     A     B     C     D     E Total
A        80     4     0    15     7   106
B         2    17     0     9     2    30
C        12     5     9     4     8    38
D         7     8     0    65     0    80
E         3     2     1     6    38    50
Total   104    36    10    99    55   304
Transition Matrix
The diagonal numbers (bold on the original slide) reflect areas where the land use has not changed. The off-diagonal numbers reflect areas where the land use has changed over time. It looks similar to a confusion matrix, but this time the off-diagonal entries reflect a real change.
Rows: land use at time A. Columns: land use at time B.
Class     A     B     C     D     E Total
A        80     4     0    15     7   106
B         2    17     0     9     2    30
C        12     5     9     4     8    38
D         7     8     0    65     0    80
E         3     2     1     6    38    50
Total   104    36    10    99    55   304
Markov models
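One way a transition matrix feeds a Markov model is sketched below, under the assumption of a first-order chain whose transition probabilities stay constant between periods (the slides do not say which Markov formulation is intended). Each row of counts is divided by its row total to give the probability of moving from a class at time A to each class at time B. The function names are illustrative.

```python
CLASSES = ["A", "B", "C", "D", "E"]
# Transition counts from the matrix above: rows = time A, columns = time B.
COUNTS = [
    [80, 4, 0, 15, 7],
    [2, 17, 0, 9, 2],
    [12, 5, 9, 4, 8],
    [7, 8, 0, 65, 0],
    [3, 2, 1, 6, 38],
]


def transition_probabilities(counts):
    """Row-normalise counts so that each row sums to 1."""
    return [[c / sum(row) for c in row] for row in counts]


def step(shares, probs):
    """Advance class shares by one period: shares_next = shares x probs."""
    n = len(probs)
    return [sum(shares[i] * probs[i][j] for i in range(n)) for j in range(n)]


P = transition_probabilities(COUNTS)
grand_total = sum(sum(row) for row in COUNTS)
shares_time_a = [sum(row) / grand_total for row in COUNTS]
shares_time_b = step(shares_time_a, P)  # reproduces the column totals
shares_time_c = step(shares_time_b, P)  # projection; assumes unchanged probabilities

for name, a, b, c in zip(CLASSES, shares_time_a, shares_time_b, shares_time_c):
    print(f"{name}: {a:.3f} -> {b:.3f} -> {c:.3f}")
```

The projection to a third date is only as good as the assumption that the same transition probabilities continue to hold.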
Misclassification Statistics
Percent correctly classified
total of diagonal entries divided by the grand total, times 100
209/304*100 = 68.8%
but chance would give a score of better than 0
Kappa statistic (more)
normalized to range from 0 (chance) to 100
evaluates to 58.3%
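Both statistics can be checked against the misclassification matrix. The sketch below assumes the kappa statistic meant here is Cohen's kappa, where chance agreement is estimated from the row and column totals; the function names are illustrative.

```python
# Misclassification matrix: rows = database class, columns = field class.
MATRIX = [
    [80, 4, 0, 15, 7],
    [2, 17, 0, 9, 2],
    [12, 5, 9, 4, 8],
    [7, 8, 0, 65, 0],
    [3, 2, 1, 6, 38],
]


def percent_correct(matrix):
    """Diagonal total divided by grand total, times 100."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * diagonal / total


def kappa(matrix):
    """Cohen's kappa: agreement beyond chance, scaled so that chance = 0."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected if database and field classes were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


print(f"Percent correct: {percent_correct(MATRIX):.1f}%")
print(f"Kappa: {100.0 * kappa(MATRIX):.1f}%")
```

Chance agreement here is about 25%, which is why kappa is noticeably lower than the raw percent correct.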
Per-Polygon and Per-Pixel Assessment
Error can occur in the attributes of polygons, but it can also occur in the positions of the boundaries
better to conceive of the map as a field, and to sample points
this also reflects how the data are likely to be used (to query the class at a point)
Assessment
An example of a vegetation cover map. Two strategies for accuracy assessment are available: to check by area (polygon), or to check by point. In the former case a strategy would be devised for field checking each area, to determine the area's correct class. In the latter, points would be sampled across the state and the correct class determined at each point.
It will be much easier to sample using points.
Measures of Uncertainty
Errors typically distort interval / ratio measurements by small amounts (unless a blunder or outlier)
Accuracy refers to the amount of deviation from the ‘true’ value
Precision refers to the variation among repeated measurements, and also to the amount of detail in the reporting of a measurement
The term precision is often used to refer to the repeatability of measurements. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher (no apparent bias).
Precision
Reporting Measurements
The amount of detail in a reported measurement [precision] (e.g., output from a GIS) should reflect its accuracy
“14.4 m” implies an accuracy of ±0.05 m
“14 m” implies an accuracy of ±0.5 m
Excess precision should be removed by rounding
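The convention above (a reported value implies an accuracy of half a unit in its last digit) can be sketched with Python's `decimal` module. The helper name `implied_accuracy` and the example reading 14.3872 are hypothetical.

```python
from decimal import Decimal


def implied_accuracy(reported: str) -> Decimal:
    """Half a unit in the last reported digit, e.g. '14.4' -> 0.05."""
    exponent = Decimal(reported).as_tuple().exponent  # -1 for '14.4', 0 for '14'
    return Decimal(5).scaleb(exponent - 1)


print(implied_accuracy("14.4"))  # half of 0.1
print(implied_accuracy("14"))    # half of 1

# Removing excess precision: round a hypothetical reading to the nearest 0.1 m.
raw_reading = Decimal("14.3872")
reported = raw_reading.quantize(Decimal("0.1"))
print(reported)
```

Passing the value as a string matters: a float such as 14.4 does not record how many digits were actually reported.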
Measuring Accuracy
Root Mean Square Error (RMSE) is the square root of the average squared error or deviation
the primary measure of accuracy in map accuracy standards and GIS databases
e.g., elevations in a digital elevation model might have an RMSE of 6.1 m (TRIM Specs)
the abundances of errors of different magnitudes often closely follow a Gaussian or normal distribution
[Figure: Gaussian distribution of errors; horizontal axis in standard deviations.]
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum (y_i - y_j)^2}$$
where y_i is an elevation from the DEM, y_j is the "true" known or measured elevation of the same test point, and N is the number of sample points.
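A minimal sketch of the RMSE calculation as described above (square root of the average squared error). The elevations are hypothetical test points, not values from TRIM or any real DEM.

```python
import math


def rmse(measured, reference):
    """Root mean square error between paired measured and reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical elevations in metres at four test points.
dem_elevations = [401.0, 293.0, 355.0, 505.0]     # read from the DEM
survey_elevations = [400.0, 300.0, 350.0, 500.0]  # treated as "true"

print(rmse(dem_elevations, survey_elevations))
```

Because the errors are squared before averaging, the single 7 m error contributes almost half of the total, so RMSE is sensitive to blunders.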
Uncertainty based on an assumed RMSE of 7 m. The Gaussian distribution with a mean of 350 m and a standard deviation of 7 m gives a 95% probability that the true location of the 350 m contour lies in the colored area, and a 5% probability that it lies outside.
Visualizing Uncertainty
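The 95% band in this example can be computed directly, assuming (as the caption does) that elevation error is Gaussian with a standard deviation equal to the RMSE. The sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name is illustrative.

```python
from statistics import NormalDist


def elevation_band(contour_m, rmse_m, coverage=0.95):
    """Central interval of a Gaussian error model around a contour value."""
    error_model = NormalDist(mu=contour_m, sigma=rmse_m)
    tail = (1.0 - coverage) / 2.0
    return error_model.inv_cdf(tail), error_model.inv_cdf(1.0 - tail)


lower, upper = elevation_band(350.0, 7.0)
print(f"95% band: {lower:.1f} m to {upper:.1f} m")
```

Cells whose DEM elevation falls inside this band are the ones that could plausibly contain the true 350 m contour.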
A Useful Rule of Thumb
Positional accuracy of features on a paper map is roughly 0.5mm on the map
e.g., 0.5mm on a map at scale 1:20,000 gives a positional accuracy of 10m
this is approximately the U.S. National Map Accuracy Standard, used by BC TRIM
and also allows for digitizing error, stretching of the paper, and other common sources of positional error
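The rule of thumb is a one-line conversion from map distance to ground distance. A minimal sketch, with a hypothetical function name and the 0.5 mm figure as the default:

```python
def ground_accuracy_m(scale_denominator, map_error_mm=0.5):
    """Ground distance in metres that corresponds to `map_error_mm` on the map."""
    return map_error_mm * scale_denominator / 1000.0


for denominator in (20_000, 50_000, 250_000):
    print(f"1:{denominator:,} -> {ground_accuracy_m(denominator):.0f} m")
```

Read in reverse, the same relation suggests the scale a digital dataset is fit for when only its positional accuracy is known.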
Correlation of Errors
Absolute positional errors may be high
reflecting the technical difficulty of measuring distances from the Equator and the Greenwich Meridian, or elevations from mean sea level or the geoid
Relative positional errors over short distances may be much lower
positional errors tend to be strongly correlated over short distances
As a result, positional errors can largely cancel out in the calculation of properties such as distance or area
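The cancellation can be made concrete for a single coordinate axis. Assuming two points whose errors have the same standard deviation sigma and correlation rho, the variance of the difference between the coordinates is 2·sigma²·(1 − rho). The numbers below are hypothetical.

```python
import math


def sd_of_coordinate_difference(sigma, rho):
    """Standard deviation of (error at point 1 - error at point 2) on one axis.

    Assumes both errors have standard deviation `sigma` and correlation `rho`.
    """
    return sigma * math.sqrt(2.0 * (1.0 - rho))


sigma_m = 10.0  # hypothetical absolute positional error, metres
for rho in (0.0, 0.5, 0.98, 1.0):
    print(f"rho = {rho:.2f}: {sd_of_coordinate_difference(sigma_m, rho):.2f} m")
```

With independent errors the relative error is larger than the absolute error, but with strongly correlated errors it shrinks towards zero, which is why distances and areas can be far more accurate than the absolute positions they are computed from.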
Rule Uncertainty
Measures of central tendency: mean (arithmetic, geometric, harmonic), median or mode?
The form of the relation? (linear, nonlinear, multiple regression models)
Alternative approaches to many operations such as slope, spatial interpolation.
More than just incomplete knowledge.
Interpolation methods
What is the slope?
Use a 2x2 window or a 3x3 window?
Maximum difference(2262 - 2017)
Best fitting surface
Several other options
Slope Uncertainty
SAGA GIS has nine different slope derivation methods
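To illustrate how much the rule matters, the sketch below applies three common slope definitions to the same 3x3 window: the maximum difference from the centre cell, a finite difference using the four nearest neighbours, and Horn's weighted 3x3 method. These are not claimed to be the specific options in SAGA GIS, and the elevations and 10 m cell size are hypothetical.

```python
import math


def slope_max_difference(w, cell):
    """Steepest gradient between the centre cell and any of its 8 neighbours."""
    centre = w[1][1]
    steepest = 0.0
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            distance = cell * math.hypot(r - 1, c - 1)  # diagonals are longer
            steepest = max(steepest, abs(centre - w[r][c]) / distance)
    return math.degrees(math.atan(steepest))


def slope_four_neighbours(w, cell):
    """Central finite differences using the four nearest neighbours."""
    dz_dx = (w[1][2] - w[1][0]) / (2 * cell)
    dz_dy = (w[2][1] - w[0][1]) / (2 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def slope_horn(w, cell):
    """Horn's method: weighted differences across all eight neighbours."""
    dz_dx = ((w[0][2] + 2 * w[1][2] + w[2][2])
             - (w[0][0] + 2 * w[1][0] + w[2][0])) / (8 * cell)
    dz_dy = ((w[2][0] + 2 * w[2][1] + w[2][2])
             - (w[0][0] + 2 * w[0][1] + w[0][2])) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical elevations (metres) and cell size; not taken from the slide.
WINDOW = [
    [120.0, 118.0, 115.0],
    [117.0, 112.0, 108.0],
    [113.0, 109.0, 100.0],
]
CELL_SIZE = 10.0

slopes = {
    "maximum difference": slope_max_difference(WINDOW, CELL_SIZE),
    "four neighbours": slope_four_neighbours(WINDOW, CELL_SIZE),
    "Horn 3x3": slope_horn(WINDOW, CELL_SIZE),
}
for method, degrees in slopes.items():
    print(f"{method}: {degrees:.1f} degrees")
```

On a perfectly planar window aligned with the grid the three methods agree; on real, irregular terrain they generally do not, and that disagreement is rule uncertainty rather than data error.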
The Digital Divide: Paper Maps
Standards, such as the National Map Accuracy Standard (NMAS), defined positional accuracy but little else.
Most maps were made by governmental mapping agencies with typically high standards.
Scale, accuracy and resolution are linked on paper maps.
The Digital Divide: Digital Data
Digital data can come from anyone (unknown standards)
Data entry can create problems
Digitizing errors
Conflation of data layers
Datum differences, scale differences
Georegistration (rubber sheeting)
Mixed pixels
An OpenStreetMap.org trace
[Figure: diagram relating Mandates, Filter, Software, Data, and GIS.]
Trust in the data provider?
How the ‘big three’ of online maps compare to a more precise source.
Geocoding results
                         Bing Maps   Google Maps   Yahoo! Maps
Threshold Quantile 95%       67.08         15.36         42.62
Threshold Outlier           111.76         40.81         68.41
Digitizing errors
Data Entry
Map conflation problems
Data entry
A common source of error: working directly off an aerial photograph (ignoring relief displacement)
But even orthophotos aren’t perfect.
Georegistration error?
Rubber sheet transformation
Analysis: Error Propagation
Addresses the effects of errors and uncertainty on the results of GIS analysis
Typical mathematical or statistical approaches are not appropriate.
Almost every input to a GIS is subject to error and uncertainty
In principle, every output should have confidence limits or some other expression of uncertainty
Showing your [uncertainty] colours
Geostatistical interpolation (kriging)
[Figure panels: Zinc data: Predicted values; Zinc data: Estimated standard deviation.]
Accounting for Uncertainty
Traditional statistics vs spatial statistics
spatial autocorrelation is assumed when using spatial data, but it is the antithesis of aspatial statistics, which assume independent and identically distributed (IID) observations
Analytical error propagation is not possible in such cases.
Vision Vancouver vs NPA in the 2008 elections
Accounting for Uncertainty
Monte Carlo Simulation (MCS) or sensitivity analysis
Elevation
Add random value
Determine slope
Repeat 100 times
How likely is the slope we determine the actual slope value, given that we know the elevation values have an underlying uncertainty?
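The four steps above can be sketched for a single 3x3 window. This is a simplified illustration: it perturbs each cell independently with Gaussian noise, which ignores any spatial autocorrelation between errors, and it uses one particular slope rule (central differences on the four nearest neighbours). The window, cell size, 2 m elevation standard deviation, and seed are hypothetical.

```python
import math
import random
import statistics


def slope_degrees(w, cell):
    """Slope from central differences on the four nearest neighbours."""
    dz_dx = (w[1][2] - w[1][0]) / (2 * cell)
    dz_dy = (w[2][1] - w[0][1]) / (2 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def monte_carlo_slope(window, cell, elevation_sd, runs=100, seed=0):
    """Add random error to every elevation, recompute slope, repeat `runs` times."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    slopes = []
    for _ in range(runs):
        noisy = [[z + rng.gauss(0.0, elevation_sd) for z in row] for row in window]
        slopes.append(slope_degrees(noisy, cell))
    return slopes


# Hypothetical elevations (metres), cell size, and elevation uncertainty.
WINDOW = [
    [120.0, 118.0, 115.0],
    [117.0, 112.0, 108.0],
    [113.0, 109.0, 100.0],
]
CELL_SIZE = 10.0
ELEVATION_SD = 2.0

simulated = monte_carlo_slope(WINDOW, CELL_SIZE, ELEVATION_SD, runs=100)
print(f"slope from the unperturbed data: {slope_degrees(WINDOW, CELL_SIZE):.1f}")
print(f"mean of simulated slopes: {statistics.mean(simulated):.1f}")
print(f"standard deviation of simulated slopes: {statistics.stdev(simulated):.1f}")
```

The spread of the simulated slopes is the answer to the question on the slide: it shows how far the computed slope could plausibly be from the actual one given the stated elevation uncertainty.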
Iterators in ArcMap Toolbox
Generate a random raster
Examples of Iterators
Three realizations of a model simulating the effects of error on a digital elevation model. The three data sets differ only to a degree consistent with the known error. Error has been simulated using a model designed to replicate the known error properties of this data set – the distribution of error magnitude, and the spatial autocorrelation between errors.
Error Modelling: Monte Carlo Simulation
Error in the measurement of the area of a square 100 m on a side. Each of the four corner points has been surveyed; the errors are subject to bivariate Gaussian distributions with standard deviations in x and y of 1 m (dashed circles). The red polygon shows one possible surveyed square (one realization of the error model).
In this case the measurement of area is subject to a standard deviation of 200 sq m; a result such as 10,014.603 is quite likely, though the true area is 10,000 sq m. In principle, the result of 10,014.603 should be rounded to the known accuracy and reported as 10,000 sq m.
Error Modelling
Living with Uncertainty
It is easy to see the importance of uncertainty in GIS
but much more difficult to deal with it effectively
but we may have no option, especially in disputes that are likely to involve litigation
Some Basic Principles
Uncertainty is inevitable in GIS
Data obtained from others should never be taken as truth
efforts should be made to determine quality
Effects on GIS outputs are often much greater than expected
there is an automatic tendency to regard outputs from a computer as the truth
Some Basic Principles
Use as many sources of data as possible
and cross-check them for accuracy
Be honest and informative in reporting results
add relevant caveats and cautions
Interactive Map Disclaimer
Data presented should be considered in conjunction with the report. Product demand projections detailed in the report will vary depending on the amount of retrofit activity the 1200 Buildings program stimulates and the extent of retrofit that is implemented in buildings. As there are inherent uncertainties in predicting demand from future activity the product demand projections and findings may not be accurate. Specific advice should be sought and further detailed analysis undertaken before acting or relying on the product demand projections.
Metadata
Definition: Metadata are "data about data"
They describe the content, quality, condition, and other characteristics of data. Metadata help a person to locate and understand data.
In epistemology the word meta means "about (its own category)". Thus metadata is "data about the data".
Metadata
Lineage (author, manipulations)
Positional accuracy (∆x, ∆y, ∆z)
Attribute accuracy (misclassification)
Completeness (how thorough?)
Logical consistency (turn tables)
Semantic accuracy (one-to-many)
Temporal information (date obtained, entered into system)
[Figure: timeline. Air photo taken; event happens; air photo entered into GIS. Why event not recorded?]
Examples of Metadata
Identification
Title? Area covered? Themes? Currentness? Restrictions?
Data Quality
Accuracy? Completeness? Logical Consistency? Lineage?
Spatial Data Organization
Vector? Raster? Type of elements? Number?
Spatial Reference
Projection? Grid system? Datum? Coordinate system?
Examples of Metadata
Entity and Attribute Information
Features? Attributes? Attribute values?
Distribution
Distributor? Formats? Media? Online? Price?
Metadata Reference
Metadata currentness? Responsible party?
A classification of uncertainty
Conclusion
Uncertainty is ever-present in GIS analyses, from the source data through to the analyses and to the final output.
It is important to be conscious of uncertainty in your data / analyses / presentation and, if necessary, determine the impact of it on your results. (MCS)
Knowledge of uncertainty allows us to make estimates of the confidence limits of the results.
Metadata!