LAND COVER, POPULATION ESTIMATES, AND STATE BOUNDARIES: A COMPARISON OF UNCERTAINTY AMONG DATASETS
Kim Boggio
March 2014
There are various techniques used in GIS to visualize land coverage, population estimates,
and state boundaries. This data is accessed through different sources; each dataset has its
advantages and disadvantages. This document evaluates the uncertainty associated with the
various methods used to present land cover, population, and state boundaries in a GIS
format.
This study is conducted as an outgrowth of work done for the Open Space Institute (OSI) in an
effort to identify forested lands adjacent to urban and suburban areas that would be
appropriate for acquisition.
Some background on the Open Space Institute:
• The Open Space Institute (OSI) protects scenic, natural, and historic landscapes to ensure public enjoyment, conserve habitats, and sustain community character.
• OSI achieves its goals through land acquisition, conservation easements, regional loan programs, fiscal sponsorship, creative partnerships, and analytical research. Through its New York land program, OSI has protected more than 100,000 acres in the State of New York by direct acquisition and conservation easements.
• Through the Conservation Finance Program, which provides low-cost bridge loans, OSI has assisted in the protection of an additional 1.6 million acres across the East Coast.
• The Research Program influences land use policy and practice through research, communication and training.
METHODS
The task associated with the OSI project was to identify land adjacent to urban and suburban
areas for open space acquisition. The solution was to use land cover, impervious cover,
federal lands, wilderness areas, and LANDSCAN datasets to identify those areas. After using
various raster and vector data it became obvious that there were large differences in scale,
coordinate systems, classification of data, and underlying attribute tables. This introduced
uncertainty, depending on the dataset used. The data used in the OSI project is analyzed
below to see how closely the raster and vector layers align.
The land cover rasters used in the study were the 30m grid from the National Land Cover
Database (NLCD) and the 200m grid land cover from National Atlas (see DATA SOURCES for the
respective websites). Population raster layers used were the 30m Impervious layer from NLCD and
LANDSCAN (0.00833 X 0.00833 decimal degree cells, approximately 1010m X 488m) from
Oak Ridge National Laboratory. State boundary shapefiles were downloaded from National
Atlas, Tiger Data (US Census Bureau) and ESRI.
Land cover data was compared for similarities in classification, and visual clarity at different
map scales on an equal extent basis. Land cover was also compared using 500 random
points in a relatively small area at the head of the Chesapeake Bay. Population data was
analyzed for classification methods, and visual clarity at different map scales on an equal
extent basis. Finally, the state boundary shapefiles were checked for how closely they
follow the Delmarva Peninsula coastline, which also reveals how closely the boundary files
match each other.
LANDCOVER RESULTS
The NLCD classifications used to analyze land cover are shown in Figure 1.
Figure 1: NLCD Land cover classifications used to analyze the data
LAND COVER AREA BY CLASS ANALYSIS
Table 1 shows the land cover classes, the cell counts, and the area by class for 30m and 200m
NLCD data in National Land Cover boundary zone 13.
Cell counts are much lower for 200m data than for 30m data, as expected given that a 200m
cell covers 4 ha and a 30m cell covers 0.09 ha. Table 1 also shows that the 30m and 200m
data represent the land cover classes differently. For instance, class 43 (mixed forest) covers
about 164,000,000 m² (16,400 ha) in the 30m data and 1,940,280,000 m² (194,028 ha) in the
200m data. There are statistically significant differences between 30m and 200m data for the
urban, forested, and herbaceous wetland cover classes, as shown in Tables 2 and 3.
Table 1: 200m & 30m land cover area analysis

| CLASS | DESCRIPTION | 30m COUNT | 30m AREA (m²) | 200m COUNT | 200m AREA (m²) | AREA DIFFERENCE % |
|---|---|---|---|---|---|---|
| 11 | OPEN WATER | 1,014,628 | 865,176,000 | 24,611 | 984,440,000 | 13.8% |
| 21 | LOW INTENSITY RESIDENTIAL | 2,043,829 | 1,742,780,000 | 49,179 | 1,967,160,000 | 12.9% |
| 22 | HIGH INTENSITY RESIDENTIAL | 1,859,436 | 1,585,550,000 | 10,389 | 415,560,000 | 73.8% |
| 23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 921,654 | 785,897,000 | 15,954 | 638,160,000 | 18.8% |
| 24 | HIGH INTENSITY URBAN | 431,715 | 368,124,000 | - | - | N/A |
| 31 | BARE ROCK/ SAND/ CLAY | 349,813 | 298,286,000 | - | - | N/A |
| 32 | QUARRIES/ STRIP MINES | - | - | 2,455 | 98,200,000 | N/A |
| 33 | TRANSITIONAL | - | - | 1,890 | 75,600,000 | N/A |
| 41 | DECIDUOUS FOREST | 7,470,356 | 6,369,990,000 | 160,357 | 6,414,280,000 | 0.7% |
| 42 | EVERGREEN FOREST | 1,235,803 | 1,053,770,000 | 24,979 | 999,160,000 | 5.2% |
| 43 | MIXED FOREST | 192,967 | 164,543,000 | 48,507 | 1,940,280,000 | 1079.2% |
| 51 | SHRUBLAND | - | - | - | - | 100.0% |
| 61 | ORCHARDS/ VINEYARDS | - | - | - | - | N/A |
| 71 | GRASSLAND/ HERBACEOUS | - | - | - | - | N/A |
| 81 | PASTURE/ HAY | 6,504,092 | 5,546,050,000 | 186,453 | 7,458,120,000 | 34.5% |
| 82 | ROW CROPS | 5,075,837 | 4,328,180,000 | 59,044 | 2,361,760,000 | 45.4% |
| 83 | SMALL GRAINS | - | - | - | - | N/A |
| 85 | URBAN/ RECREATIONAL GRASSES | - | - | 4,666 | 186,640,000 | N/A |
| 90 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 1,343,873 | 1,145,920,000 | - | - | N/A |
| 91 | WOODY WETLANDS | - | - | 18,548 | 741,920,000 | N/A |
| 92 | EMERGENT HERBACEOUS WETLANDS | - | - | 9,062 | 362,480,000 | N/A |
| 95 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 455,573 | 388,468,000 | - | - | N/A |
| | TOTALS | 28,899,576 | 24,642,734,000 | 616,094 | 24,643,760,000 | 0.0% |
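The cell areas and the AREA DIFFERENCE % column can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the percentage is the absolute area difference divided by the 30m area, which is consistent with the tabulated values but is not stated in the source.

```python
def cell_area_ha(cell_size_m: float) -> float:
    """Area of a square raster cell in hectares (1 ha = 10,000 m²)."""
    return cell_size_m ** 2 / 10_000


def percent_difference(area_30m: float, area_200m: float) -> float:
    """Absolute area difference as a percentage of the 30m area (assumed convention)."""
    return abs(area_200m - area_30m) / area_30m * 100


print(cell_area_ha(200))  # 4.0 ha per 200m cell
print(cell_area_ha(30))   # 0.09 ha per 30m cell

# 200m open water: 24,611 cells x 40,000 m² per cell
print(24_611 * 200 ** 2)  # 984440000

# Class 43 (mixed forest), areas in m² from Table 1
print(round(percent_difference(164_543_000, 1_940_280_000), 1))  # 1079.2
```

Only the 200m areas are reproduced as count times nominal cell area here; the tabulated 30m areas are taken as given.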
Table 2: General land cover class totals

| LAND CLASS | 30m CELL COUNT | 30m AREA (m²) | 200m CELL COUNT | 200m AREA (m²) | % DIFFERENCE |
|---|---|---|---|---|---|
| FOREST LAND TOTAL | 8,899,126 | 7,588,303,000 | 233,843 | 9,353,720,000 | 23.3% |
| RESIDENTIAL TOTAL | 5,256,634 | 4,482,351,000 | 75,522 | 3,020,880,000 | 32.6% |
| FARMLAND TOTAL | 11,579,929 | 9,874,230,000 | 245,497 | 9,819,880,000 | 0.6% |
| EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 1,799,446 | 1,534,388,000 | 27,610 | 1,104,400,000 | 28.0% |
Table 3: General land cover class statistical analysis
Hypotheses: H0: p1 = p2 vs. HA: p1 ≠ p2
Test statistic: Z = (p1 − p2) / √(p1(1 − p1)/n)

| LAND CLASS | p1 | p2 | SE | p1 − p2 | Z | SIGNIFICANCE |
|---|---|---|---|---|---|---|
| FOREST LAND (CLASSES 41 – 43) | 0.308 | 0.380 | 0.017 | -0.072 | -4.26 | P < .01 |
| RESIDENTIAL (CLASSES 21 – 24) | 0.182 | 0.123 | 0.008 | 0.059 | 7.72 | P < .01 |
| FARMLAND (CLASSES 61 – 83) | 0.401 | 0.398 | 0.017 | 0.002 | 0.13 | N.S. |
| EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS (CLASSES 90 – 95) | 0.062 | 0.045 | 0.003 | 0.017 | 5.71 | P < .01 |
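The proportions in Table 3 appear to be each general class's share of the total mapped area at each resolution. The sketch below recomputes only those proportions and their difference from the Table 2 areas and the Table 1 totals. It does not recompute SE or Z, because the source does not state the sample size n. The dictionary and function names are hypothetical.

```python
TOTAL_30M = 24_642_734_000   # m², Table 1 total area for 30m data
TOTAL_200M = 24_643_760_000  # m², Table 1 total area for 200m data

# (30m area, 200m area) in m², from Table 2
GENERAL_CLASSES = {
    "FOREST LAND": (7_588_303_000, 9_353_720_000),
    "RESIDENTIAL": (4_482_351_000, 3_020_880_000),
    "FARMLAND": (9_874_230_000, 9_819_880_000),
    "WETLANDS": (1_534_388_000, 1_104_400_000),
}


def proportions(name):
    """Return (p1, p2): the class's share of total area at 30m and at 200m."""
    area_30m, area_200m = GENERAL_CLASSES[name]
    return area_30m / TOTAL_30M, area_200m / TOTAL_200M


for name in GENERAL_CLASSES:
    p1, p2 = proportions(name)
    print(f"{name}: p1={p1:.3f} p2={p2:.3f} p1-p2={p1 - p2:+.3f}")
```

Under this assumption the printed values agree with the p1, p2, and p1 − p2 columns to three decimal places.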
LAND COVER RANDOM POINT ANALYSIS
The random point analysis was performed on a relatively small area at the head of the
Chesapeake Bay. ArcMap generated 500 random points, and the same geographic
coordinates were sampled in both the 30m and 200m land cover, as depicted in Figure 2 below.
The results of the random point analysis were similar to the NLCD area by class analysis in that
there are statistically significant differences between 30m and 200m data for the urban,
forested, and herbaceous wetland cover classes, as shown in Tables 5 and 6. The random point
analysis provides a cell to cell comparison of the two resolutions of NLCD land cover data.
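The idea of sampling the same coordinate in two grids of different cell size can be sketched in plain Python. This is not the ArcMap tool used in the study. It is a minimal illustration that assumes north-up grids sharing one upper-left corner, and the extent, seed, and function names are hypothetical.

```python
import random


def cell_index(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the cell containing map coordinate (x, y).

    Assumes a north-up grid whose upper-left corner is (x_min, y_max).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def sample_points(n, x_min, y_min, x_max, y_max, seed=0):
    """Draw n uniformly distributed random points inside the extent."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            for _ in range(n)]


# Hypothetical 6 km x 6 km extent, coordinates in metres
points = sample_points(500, 0, 0, 6000, 6000)

# Each point maps to one 30m cell and one 200m cell
pairs = [(cell_index(x, y, 0, 6000, 30), cell_index(x, y, 0, 6000, 200))
         for x, y in points]
print(pairs[0])
```

Reading the class value at each pair of indices from the two rasters would give the paired classifications that are tallied in the tables.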
Figure 2: 200M & 30M land cover with random points
Table 4: 200M & 30M random point NLCD classification analysis

| CLASS | DESCRIPTION | 30m POINT COUNT | 200m POINT COUNT |
|---|---|---|---|
| 11 | OPEN WATER | 12 | 12 |
| 21 | LOW INTENSITY RESIDENTIAL | 31 | 48 |
| 22 | HIGH INTENSITY RESIDENTIAL | 38 | 8 |
| 23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 19 | 8 |
| 24 | HIGH INTENSITY URBAN | 2 | |
| 31 | BARE ROCK/ SAND/ CLAY | 4 | |
| 32 | QUARRIES/ STRIP MINES | | 3 |
| 33 | TRANSITIONAL | | 1 |
| 41 | DECIDUOUS FOREST | 127 | 116 |
| 42 | EVERGREEN FOREST | 23 | 15 |
| 43 | MIXED FOREST | 2 | 53 |
| 51 | SHRUBLAND | | |
| 61 | ORCHARDS/ VINEYARDS | | |
| 71 | GRASSLAND/ HERBACEOUS | | |
| 81 | PASTURE/ HAY | 125 | 153 |
| 82 | ROW CROPS | 91 | 62 |
| 83 | SMALL GRAINS | | |
| 85 | URBAN/ RECREATIONAL GRASSES | | 1 |
| 90 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 22 | |
| 91 | WOODY WETLANDS | | 14 |
| 92 | EMERGENT HERBACEOUS WETLANDS | | 5 |
| 95 | EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 4 | |
| | TOTALS | 500 | 499 |
Table 5: General land cover class random point totals

| LAND CLASS | 30m POINT COUNT | 200m POINT COUNT | % DIFFERENCE |
|---|---|---|---|
| FOREST LAND TOTAL | 152 | 184 | 17.4% |
| RESIDENTIAL TOTAL | 90 | 64 | 40.6% |
| FARMLAND TOTAL | 216 | 215 | 0.5% |
| EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS | 26 | 19 | 36.8% |
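The general class totals can be reproduced by summing the per-class point counts over the class ranges named in the statistical tables. In the sketch below, rows that list a single count are assigned to the resolution that reproduces the 500 and 499 column totals, which is an inference from the data. The percentage convention (difference divided by the 200m count) is likewise inferred from the tabulated values, and all names are hypothetical.

```python
# class code -> (30m points, 200m points); 0 where the table lists no points
POINT_COUNTS = {
    11: (12, 12), 21: (31, 48), 22: (38, 8), 23: (19, 8), 24: (2, 0),
    31: (4, 0), 32: (0, 3), 33: (0, 1), 41: (127, 116), 42: (23, 15),
    43: (2, 53), 81: (125, 153), 82: (91, 62), 85: (0, 1), 90: (22, 0),
    91: (0, 14), 92: (0, 5), 95: (4, 0),
}

# General classes as code ranges (end of range is exclusive)
GROUPS = {
    "FOREST LAND": range(41, 44),
    "RESIDENTIAL": range(21, 25),
    "FARMLAND": range(61, 84),
    "WETLANDS": range(90, 96),
}


def group_totals(counts, groups):
    """Sum 30m and 200m point counts over each general class."""
    totals = {}
    for name, codes in groups.items():
        t30 = sum(c30 for code, (c30, _) in counts.items() if code in codes)
        t200 = sum(c200 for code, (_, c200) in counts.items() if code in codes)
        totals[name] = (t30, t200)
    return totals


for name, (t30, t200) in group_totals(POINT_COUNTS, GROUPS).items():
    pct = abs(t30 - t200) / t200 * 100  # assumed convention
    print(f"{name}: 30m={t30} 200m={t200} diff={pct:.1f}%")
```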
Table 6: General land cover class random point statistical analysis
Test statistic: Z = (p1 − p2) / √(p1(1 − p1)/n)

| LAND CLASS | p1 | p2 | SE | p1 − p2 | Z | SIGNIFICANCE |
|---|---|---|---|---|---|---|
| FOREST LAND (CLASSES 41 - 43) | 0.304 | 0.369 | 0.017 | -0.065 | -3.89 | P < .01 |
| RESIDENTIAL (CLASSES 21 - 24) | 0.180 | 0.128 | 0.008 | 0.052 | 6.48 | P < .01 |
| FARMLAND (CLASSES 61 - 83) | 0.432 | 0.431 | 0.018 | 0.001 | 0.06 | N.S. |
| EMERGENT HERBACEOUS WETLANDS/ WOODY WETLANDS (CLASSES 90 - 95) | 0.052 | 0.038 | 0.003 | 0.014 | 5.32 | P < .01 |
LAND COVER RESOLUTION ANALYSIS
Figures 3 and 4 show the same area at 1:75,000 scale. The 30m map still provides sharp
delineations of land cover and boundaries; the 200m map shows a blur of
cells. The 200m map may be appropriate for macro view analysis, but it is inappropriate for
analysis of small areas. The 200m map falls apart at scales more detailed than 1:1,000,000,
whereas the 30m map is still useful at 1:50,000.
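One way to see why the coarser grid blurs sooner is to compute how large a single cell is drawn at a given map scale. The sketch below is plain arithmetic with a hypothetical function name. It makes no claim about the threshold at which a map stops being legible.

```python
def cell_size_on_map_mm(cell_size_m, scale_denominator):
    """Drawn width of one raster cell, in millimetres, at 1:scale_denominator."""
    return cell_size_m * 1000 / scale_denominator


for cell, scale in [(30, 75_000), (200, 75_000), (30, 50_000), (200, 1_000_000)]:
    size = cell_size_on_map_mm(cell, scale)
    print(f"{cell}m cell at 1:{scale:,} -> {size:.2f} mm")
# 30m cell at 1:75,000 -> 0.40 mm
# 200m cell at 1:75,000 -> 2.67 mm
# 30m cell at 1:50,000 -> 0.60 mm
# 200m cell at 1:1,000,000 -> 0.20 mm
```

At 1:75,000 a 200m cell is drawn almost 3 mm wide, so individual cells are plainly visible, while a 30m cell stays well under 1 mm.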
Figure 3: 30M LAND COVER 1:75,000
Figure 4: 200M LANDCOVER 1:75,000
LANDSCAN RESULTS
LANDSCAN CLASSES
The LANDSCAN database provides population values for cells with an area of 54 ha at 40˚
latitude. The screen shot below shows that LANDSCAN symbology has population class
breaks at 5, 25, 50, 100, 500, 2,500, 5,000 and 130,000 people per cell. So each cell has an
actual population number associated with it (see Table 8). The NLCD Impervious dataset has
only a relative scale of population with no values appearing in the attribute table. However the
Impervious cells are 30m and may be useful for cursory analysis.
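Assigning a cell's population to one of the symbology classes is a simple lookup against the class breaks. The sketch below assumes each break is an inclusive upper bound, which is how ArcMap class breaks are commonly read but is not confirmed by the source. The function name is hypothetical.

```python
from bisect import bisect_left

# Upper class breaks (people per cell) from the LANDSCAN symbology
BREAKS = [5, 25, 50, 100, 500, 2_500, 5_000, 130_000]


def landscan_class(population):
    """Return the 0-based index of the class whose upper break first covers the value.

    Assumes breaks are inclusive upper bounds (an assumption, see lead-in).
    """
    return bisect_left(BREAKS, population)


print(landscan_class(5))      # 0 -> class "up to 5"
print(landscan_class(6))      # 1 -> class "up to 25"
print(landscan_class(1_171))  # 5 -> class "up to 2,500"
```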
Figure 5: LANDSCAN Symbology
Table 8: LANDSCAN zonal statistics by NLCD class

| CLASS | DESCRIPTION | COUNT | AREA (m²) | MEAN | STD | SUM | NO. PEOPLE/HA |
|---|---|---|---|---|---|---|---|
| 11 | OPEN WATER | 732 | 747,490,000 | 83 | 361 | 60,810 | 0.8 |
| 21 | LOW INTENSITY RESIDENTIAL | 1,386 | 1,415,330,000 | 491 | 646 | 680,973 | 4.8 |
| 22 | HIGH INTENSITY RESIDENTIAL | 1,299 | 1,326,490,000 | 662 | 1,052 | 860,134 | 6.5 |
| 23 | COMMERCIAL/ INDUSTRIAL/ TRANSPORTATION | 659 | 672,945,000 | 1,171 | 1,532 | 771,940 | 11.5 |
| 24 | DEVELOPED/ HIGH INTENSITY | 326 | 332,899,000 | 1,658 | 2,114 | 540,591 | 16.2 |
| 31 | BARE ROCK/ SAND/ CLAY | 201 | 205,253,000 | 141 | 246 | 28,388 | 1.4 |
| 41 | DECIDUOUS FOREST | 4,384 | 4,476,770,000 | 121 | 296 | 532,410 | 1.2 |
| 42 | EVERGREEN FOREST | 229 | 233,846,000 | 133 | 258 | 30,446 | 1.3 |
| 43 | MIXED FOREST | 48 | 49,015,700 | 60 | 180 | 2,862 | 0.6 |
| 81 | PASTURE/ HAY | 4,703 | 4,802,520,000 | 113 | 263 | 530,065 | 1.1 |
| 82 | ROW CROPS | 3,390 | 3,461,740,000 | 118 | 320 | 398,785 | 1.2 |
| 90 | EMERGENT HERBACEOUS/ WOODY WETLANDS | 463 | 472,798,000 | 230 | 559 | 106,560 | 2.3 |
| 95 | EMERGENT HERBACEOUS/ WOODY WETLANDS | 276 | 281,841,000 | 120 | 709 | 33,092 | 1.2 |
| | TOTALS | 18,096 | 18,478,937,700 | 5,102 | 8,535 | 4,577,056 | |
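The people per hectare column follows from the zonal SUM and AREA values: divide the population sum by the area converted from square metres to hectares. A minimal check, with a hypothetical function name:

```python
def people_per_ha(total_people, area_m2):
    """Population density in people per hectare (1 ha = 10,000 m²)."""
    return total_people / (area_m2 / 10_000)


# (SUM, AREA in m²) for three rows of the zonal statistics table
print(round(people_per_ha(60_810, 747_490_000), 1))   # 0.8  (open water)
print(round(people_per_ha(771_940, 672_945_000), 1))  # 11.5 (commercial/industrial/transportation)
print(round(people_per_ha(540_591, 332_899_000), 1))  # 16.2 (developed/high intensity)
```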
POPULATION ESTIMATE RESOLUTION ANALYSIS
As with the land cover datasets, the large-cell LANDSCAN map falls apart at scales more
detailed than 1:1,000,000. The NLCD Impervious layer aligns closely with the LANDSCAN layer
(Figure 6); however, only LANDSCAN provides actual population estimates.
Figure 6: LANDSCAN & NLCD IMPERVIOUS 1:1,500,000
Figure 7: LANDSCAN & NLCD IMPERVIOUS 1:50,000
STATE BOUNDARY ANALYSIS
Three state boundary shapefiles were used in this analysis: National Atlas, Tiger Data and
ESRI. All three shapefiles were compared to the 30m land cover layer. The National Atlas
data seemed to provide the most accurate boundaries, followed by Tiger data and ESRI (see
Figure 8). ESRI boundaries were all encompassing; on a small scale they provided little detail.
However, the ESRI attribute table provided detailed information for each state, such as area,
population, and number of households, that would be useful for demographic studies.
National Atlas and Tiger Data attribute tables only provide information regarding the state
boundary polygon shapes.
Figure 8: State boundaries
STATE BOUNDARIES – NATIONAL ATLAS, TIGER DATA, ESRI SHAPEFILES 1:75,000
RED BOUNDARY – NATIONAL ATLAS STATE SHAPEFILE
BLUE BOUNDARY – TIGER STATE SHAPEFILE
MAGENTA BOUNDARY – ESRI STATE SHAPEFILE
CONCLUSIONS
The following recommendations are made in order to reduce uncertainty in the Open Space
Institute project:
LAND COVER
• The 30m NLCD land cover rasters provide much more visual detail at small scales.
• There are statistically significant differences in how the NLCD classes are represented in 30m and 200m NLCD data.
• 30m land cover data is appropriate for most analyses; 200m data may be used for a “macro” analysis.
POPULATION ESTIMATES
• The NLCD Impervious layer may be used for cursory analysis; LANDSCAN is appropriate for detailed analysis because it includes actual population values for each cell.
STATE BOUNDARIES
• The National Atlas state boundary shapefile is more accurate than Tiger Data and ESRI, and should be used for most analyses.
• The ESRI state boundary shapefile is appropriate for demographic studies.
DATA SOURCES
http://www.mrlc.gov/nlcd_multizone_map.php
Multi-Resolution Land Characteristics Consortium (MRLC) includes: National Land Cover Database (NLCD) multi-zone download site. NLCD 2001 includes 21 classes of Land Cover, Percent Tree Canopy, and Urban Imperviousness at 30m cell resolution.
The Urban Imperviousness layer aligns nicely with the LANDSCAN data.
http://www.epa.gov/mrlc/nlcd-2001.html
The EPA site for NLCD data.
http://www.ornl.gov/sci/landscan/
The LANDSCAN Dataset comprises a worldwide population database compiled on a 30" X 30" latitude/longitude grid. Census counts are distributed to cells based on proximity to roads, slope, land cover, nighttime lights, and other information.
http://eros.usgs.gov/products/elevation.html
The USGS site provides Digital Elevation Models (DEM) at 30m resolution. Seamless 7.5-minute quads can be downloaded from this site, which also gives access to the National Map Seamless Server.
http://www.nationalatlas.gov/
Access to National Atlas Seamless server. The OSI map includes state and county boundaries and Federal land locations downloaded from this site. National Atlas also has 200m resolution land cover maps.
http://www.census.gov/geo/www/tiger/
The Census Bureau is home to the Tiger data shapefiles.