G-141/3.36.06
Environmental Systems Research Institute, Inc.
380 New York Street, Redlands, CA 92373
Telephone (909) 793-2853, Fax (909) 793-5953
Telex 910 332 1317
Scanning Data Entry Solutions for ARC/INFO GIS
An ESRI White Paper
Contents
Executive Summary
Evaluating Scanning Data Entry
ESRI Scanning Data Entry Solutions: A GIS Focus
Glossary
Copyright 1995 Environmental Systems Research Institute, Inc. All rights reserved. Printed in the United States of America.
The information contained in this document is the exclusive property of Environmental Systems Research Institute, Inc. This work is protected under United States copyright law and other international copyright treaties and conventions. ESRI grants the recipient of the ESRI information contained herein the right to freely reproduce, redistribute, rebroadcast, and/or retransmit this information for personal, noncommercial purposes including, teaching, classroom use, scholarship, and/or research, subject to the fair use rights enumerated in Section 107 and 108 of the Copyright Act (Title 17 of the United States Code). No part of this work may be reproduced or transmitted for commercial purposes in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, except as expressly permitted in writing by Environmental Systems Research Institute, Inc. All requests should be sent to Environmental Systems Research Institute, Inc., 380 New York Street, Redlands, CA 92373 USA, Attention: Contracts Manager.
The information contained in this document is subject to change without notice.
RESTRICTED RIGHTS LEGEND
Use, duplication, and disclosure by the government are subject to restrictions as set forth in FAR 52.227-14 Alternate III (g)(3) (JUN 1987), FAR 52.227-19 (JUN 1987), or DFARS 252.227-7013 (c)(1)(ii) (OCT 1988), as applicable. Contractor/Manufacturer is Environmental Systems Research Institute, Inc., 380 New York Street, Redlands, CA 92373 USA.
ESRI, ARC/INFO, PC ARC/INFO, ArcView, and ArcCAD are registered trademarks; ARC COGO, ARC NETWORK, ARC TIN, ARC GRID, ARC/INFO LIBRARIAN, ARCPLOT, ARCEDIT, TABLES, Application Development Framework (ADF), ARC Macro Language (AML), Avenue, FormEdit, ArcSdl, ArcBrowser, ArcDoc, ARCLine, ARCSHELL, IMAGE INTEGRATOR, DATABASE INTEGRATOR, DBI Kit, WorkStation ARC/INFO, ArcTools, ArcStorm, ArcScan, ArcExpress, ArcPress, Mapplets, SPATIAL DATABASE ENGINE (SDE), PC ARCEDIT, PC ARCPLOT, PC DATA CONVERSION, PC NETWORK, PC OVERLAY, PC STARTER KIT, PC ARCSHELL, Simple Macro Language (SML), ArcUSA, ArcWorld, ArcScene, ArcCensus, ArcCity, the ESRI corporate logo, the ESRI globe logo, the ARC/INFO logo, the ARC COGO logo, the ARC NETWORK logo, the ARC TIN logo, the ARC GRID logo, the ARCPLOT logo, the ARCEDIT logo, the Avenue logo, the ArcTools logo, the ArcStorm logo, the ArcScan logo, the ArcExpress logo, the PC ARC/INFO logo, the ArcView logo, the ArcCAD logo, the ArcData logo, ARCware, ARC News, ArcSchool, ESRI - Team GIS, ESRI - The GIS People, GIS by ESRI, ARC/INFO - The World's GIS, Geographic User Interface (GUI), Geographic User System (GUS), Your Personal Geographic Information System, and Geographic Table of Contents (GTC) are trademarks; and the ArcData Publishing Program, ARCMAIL, ArcQuest, ArcWeb, and Rent-a-Tech are service marks of Environmental Systems Research Institute, Inc.
The names of other companies and products herein are trademarks or registered trademarks of their respective trademark owners.
Executive Summary
March 1994
[Figure: Overview of Scanning Data Entry Pathways. Air photos and maps are scanned into a raster database, where they can be georeferenced, edited, tiled, or converted raster-to-raster. Raster data are vectorized by raster-to-vector conversion or heads-up digitizing into an ARC/INFO coverage database, which can be merged with existing vector data, COGO data, and CAD data.]
Costs and Benefits of Scanned Data Entry
Cost analysis of GIS projects shows that database automation often
accounts for more than 75 percent of the total project expense.
Scanning data entry is a viable cost-reduction alternative for this most
expensive GIS component: data automation.
The cost of scanning data entry has decreased dramatically since 1990.
Even agencies with limited budgets and relatively small data automation
projects are discovering that a scanning data entry strategy makes
good economic sense. Database automation strategies based on
scanning are cost-competitive with, and in some cases can be
significantly less expensive than, other methods.
Scanning technology is no longer the data capture solution of the
distant future, but is quickly becoming the preferred method of data
capture for GIS. Data automation methods within ESRI's Database
Development Group have shifted dramatically in recent years to
scanning data entry as the preferred solution.
Is Scanning Data Entry Appropriate for Your Project?
Scanning data entry provides many advantages to GIS users.
Scanning is generally faster and a great deal more accurate than table
digitizing. Scanned data that are subsequently vectorized have more
consistent coordinate placement than data entered through manual
digitizing. Scanning data entry methods can be easily learned and used by existing staff. Scanning data entry can be used to generate
application-specific data that are not commercially available. Recent
technical advances in hardware and software have tailored scanning
data entry capabilities specifically for GIS requirements.
GIS users should carefully evaluate scanning data entry in the context
of project requirements, which can vary greatly. Factors to consider
include data sources and their availability, map quality, update
frequency, data volume, accuracy requirements, and system capacity.
As you consider various data automation options, the specific requirements of your project will guide your analysis of the alternatives. For
example, if your GIS application uses street centerline data with
address ranges, you may find standard "off-the-shelf" data from a
commercial data vendor a good solution. But commercial data may
not exist for all of your organization's needs. For example, a data
layer such as property boundaries (i.e., land parcels) may not be
commercially available.
When you prefer to do data automation in-house, scanning data entry
takes no more time than table digitizing and offers better coordinate
accuracy and consistency without the random errors often associated
with manual methods. If you have an existing digital database and
need to perform only a low volume of intermittent updates, table
digitizing these updates may be appropriate. Even so, incremental
updates from scanned documents are a feasible alternative.
The available data sources will also influence the feasibility of
scanning data entry. The scanning data entry alternative is most
appropriate when the data do not already exist in digital form but do
exist in document form. Scanning data entry does require a source
document of some kind. If these documents are of poor quality,
scanning data entry can be effective but will require more operator data
cleanup. Scanning data entry is most useful and cost-effective when a
high-quality data source (e.g., maps or air photos) is available.
Feature layers on separate documents reduce processing requirements.
Some applications may be able to use raster data obtained without
document scanning (for example, satellite or airborne scanner data provided on digital media). Scanning data entry technology offers
software tools that take advantage of commercially available raster
data.
When planning scanning data entry projects, it is very important to
spend time assessing your needs before implementing a solution.
Accuracy requirements and the characteristics of your source
documents will determine the most appropriate hardware and software
package. For example, the need for raster integration may require
additional disk storage or a more powerful CPU. If you work mainly
with air photos and do not have stringent accuracy constraints, you should consider heads-up digitizing as a scanning data entry option.
If you have good digital geodetic control and a complete series of
plats, raster-to-vector conversion using that control can be an effective
strategy. Used for appropriate applications and implemented
correctly, scanning can be the most cost-effective and efficient method
for capturing your data.
ESRI Solutions for Scanning Data Entry
ESRI has used scanning data entry technology for many years. The
ESRI Database Automation Group has adopted scanning data entry as
a preferred methodology. ESRI software has supported raster data
sets for many years, and a new ESRI software product called
ArcScan focuses specifically on providing scanning data entry
software tools. ArcScan is closely integrated with the rest of
ARC/INFO in a single software environment. Thus, all the
functionality of ARC/INFO can be combined with specialized
software for scanning data entry. ESRI can provide turnkey scanning
data entry systems through reseller agreements with industry-leading
hardware vendors. Many other companies have joined ESRI's open
systems approach and offer complementary capabilities that work with
the ARC/INFO data structures and user interface. ESRI scanning
data entry solutions are affordable, easy to use, and are integrated with
ESRI's advanced GIS data management technology. ESRI scanning
data entry solutions provide a clear and effective alternative for data
automation.
About This White Paper
Scanning data entry technology offers a variety of tools for manipulating raster data and for converting raster data to vector data. A thorough evaluation of project needs and available data sources will
determine the tool or tools that work best for you. The next section
presents information to help you evaluate scanning data entry. ESRI's
rich toolset for scanning data entry and integrated data editing/data
management technology is described in the last section. A glossary
provides definitions for many of the specialized terms used in
scanning data entry.
Evaluating Scanning Data Entry
This section is designed to help you evaluate scanning data entry. Careful evaluation is a key factor in ensuring your
success. If you take the time to think it through and
evaluate the options and trade-offs carefully, your project will benefit greatly. Evaluation considerations should
include the data available, the hardware and software, and
the methods or procedures.
Evaluating Data Sources for Scanning Data Entry
Understanding data sources is crucial to understanding how scanning
data entry can be used in your GIS. Scanning solutions are as diverse
as the data used in the GIS applications they support, because
scanning and vectorization requirements are determined by the data.
Two main categories of data are used with scanned data entry projects.
The first is paper or Mylar maps, containing line art, that are scanned
into bi-tonal (black-and-white) raster data sets, or images. Scanned
maps in raster format are usually converted to vectors using raster-to-
vector conversion programs. The second category of scanned
documents in wide use is aerial photographs, typically black and
white, that are scanned into a grayscale image. Scanned photos are
not conducive to automated raster-to-vector conversion techniques and
are often used in raster format. Scanned photos are often used in
vector conversions as visual background for heads-up digitizing.
While both types of documents lend themselves to scanning data entry, each has different processing requirements. Scanned data entry
can build a GIS database using any or all of the following data
sources:
Line Work Maps
Low-quality 36 x 44 inch (E format) maps. Ranging from antique linen maps, to CAD plotter output, to blue lines, to as-builts,
the overwhelming majority of hard-copy maps are low
quality. Quality, in the scanning sense, refers to the quality of the
media itself and the problems the media presents to the scanning
and vectorization process, not to the informational quality of the
data on the document. This is the most common data source
category for scanned data entry applications.
High-quality E format maps. Typically, these are Mylar
maps with multiple data layers. They are more frequently found in
larger, rather than smaller, organizations and public agencies.
Media quality is high; for example, the lines on the media are
clear and crisp and the media will not have extraneous marks or
"noise." This type of data can have clutter, in the form of
annotation or unwanted data layers, which can complicate the
vectorization process.
High-quality, single-layer E format maps. Mylar
separates, such as topographic contours, are often available in this
form. Soil maps, separates from a production map series, and
specially prepared Mylars can fall in this category. Cadastral maps
on Mylar fit in this category if they have only one data layer (i.e., only parcels). This type of bi-tonal, line art map data source is the
most amenable to scanning data entry because its high quality
requires the least pre- or post-processing. Hallmarks of high-quality
documents for scanning are a single data layer, absence of
clutter, presence of registration tics (georeference marks), and
clean, high-contrast media.
11 x 17 inch (B format) plat maps. Plat maps in this format are
found throughout the United States. Map quality will vary.
[Figure caption: Line work can contain gaps. In this case, the gaps are caused by the line symbology used to represent intermittent streams. Gaps can also be caused by low-quality data, resulting in noisy scanned output. The ArcScan tracing tool can "jump" gaps to create continuous vector output. Gap jumping parameters can be modified to suit data requirements.]
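The idea of gap jumping can be illustrated with a minimal sketch. It is not ArcScan's tracing algorithm, which this paper does not describe. It assumes a single row of bi-tonal pixels (1 for line work, 0 for background) and a hypothetical `max_gap` parameter, and it fills any interior gap no longer than `max_gap` pixels.

```python
def jump_gaps(pixels, max_gap):
    """Fill background runs of at most max_gap pixels that lie between
    two foreground pixels. Illustrative only; names are hypothetical."""
    result = list(pixels)
    last_on = None  # index of the most recent foreground pixel
    for i, value in enumerate(pixels):
        if value:
            if last_on is not None:
                gap = i - last_on - 1
                if 0 < gap <= max_gap:
                    # Bridge the gap so the line becomes continuous.
                    for j in range(last_on + 1, i):
                        result[j] = 1
            last_on = i
    return result


if __name__ == "__main__":
    dashed = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
    print(jump_gaps(dashed, max_gap=2))
    print(jump_gaps(dashed, max_gap=4))
```

A small `max_gap` closes the short breaks of a dashed symbol while leaving wider breaks, which may separate distinct features, untouched.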
Photographs and Digital Imagery
9 x 9 inch aerial photos. This source data category is widely used
with scanning data entry. Aerial photos are most often black and
white, with some in color when budget allows. The most common
use of scanned photography is to serve as a visual backdrop or
"reality check" to other data. Heads-up digitizing uses scanned photographic images to guide operator coordinate capture by
"tracing" features from a display screen. Air photos are often the
most up-to-date data source. You can usually scan air photos at a
much lower resolution than line art. This can help compensate for
the greater storage requirements of grayscale images. Visual
display has less stringent resolution requirements than raster-to-
vector conversion.
Much photography being scanned today is uncorrected.
Uncorrected photography, while relatively inexpensive and useful in
many applications, should be used with care. Vector data derived from uncorrected photography will overlay properly on
their source image, but they are accurate only relative to that source.
Vector data converted from uncorrected photo images are
likely to misregister with data from other sources. In addition,
uncorrected photographic images may not merge well with adjacent
images, and measurements made on uncorrected images will be
incorrect.
Large format photography. Large format photos (e.g.,
24 x 30 inch or 30 x 30 inch) are available in a variety of scales. 1:24,000
orthophoto quads, usually black and white, have been produced
for large areas (e.g., statewide coverage). Orthocorrection can be
applied photographically or digitally. That is, orthophotos can be
scanned directly and used without need of coordinate correction,
or uncorrected stereophotos can be scanned and then
orthocorrected in digital form. The output of digital orthophotos
can be at any scale appropriate for the resolution of the image.
Other image data from airborne scanners or satellites.
LANDSAT and SPOT images are commercially available (e.g.,
through the ArcData program) and typically offer ten- to thirty-meter
resolution. Airborne scanners linked with GPS receivers can
produce images with much greater resolution (pixel resolution of
less than one foot, for example).
Digital Data Concepts
The raster data format is a cellular data format well suited for storing images or maps. A raster data set is like a carpet of cells overlaying the map, where each cell has a value representing the corresponding
value beneath it in the map. For example, a raster data set of a
scanned map will have pixel values that correspond to the brightness
of the light reflected from the map.
[Figure: Vector and raster representations. This figure illustrates how a polygon, line, and point feature would be stored in an x,y coordinate system (vector) and a row, column system (raster).]
A raster data set can be bi-tonal, grayscale, or color (often satellite
images are displayed as false color images). Raster data can be directly useful in a GIS. For example, a scanned air photo can be
used as a backdrop to other infrastructure data, such as roads or
sewers, using ARC/INFO IMAGE INTEGRATOR capabilities.
The ARC/INFO GRID extension uses the raster data format for
complex spatial analysis. Raster data sets tend to be large (in the
multi-megabyte range) and have special data processing needs.
[Figure: Raster data file layout. A header record is followed by the pixel data, which runs to the end of the file.]
Raster data can be organized in a number of ways depending upon the
particular raster format. Typically, the raster data file contains a
header record that stores information about the data such as the
number of rows and columns, the number of bits per pixel, the color requirements, and the georeferencing information. Following the
raster header is the actual pixel data for the image. The internal
organization of the raster data is dependent upon the raster format.
Some formats contain only a single band of data, while others contain
multiple bands.
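The header-then-pixels layout described above can be sketched as a small data structure. This is a generic illustration, not the layout of RLC, TIFF, or any other real format: the field names, the georeferencing fields, and the assumption that each row is padded to a whole number of bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RasterHeader:
    """Hypothetical header record for an uncompressed raster file."""
    rows: int
    columns: int
    bits_per_pixel: int            # 1 = bi-tonal, 8 = grayscale, 24 = color
    bands: int = 1                 # some formats store several bands
    # Assumed georeferencing: map coordinates of the upper-left corner
    # and the ground size of one cell.
    origin_xy: Optional[Tuple[float, float]] = None
    cell_size: Optional[float] = None

    def pixel_data_bytes(self) -> int:
        """Bytes of pixel data that follow the header, assuming each
        row is padded to a whole number of bytes."""
        row_bits = self.columns * self.bits_per_pixel * self.bands
        row_bytes = (row_bits + 7) // 8
        return self.rows * row_bytes


if __name__ == "__main__":
    photo = RasterHeader(rows=1800, columns=1800, bits_per_pixel=8)
    print(photo.pixel_data_bytes())
```

A reader of such a file would parse the header first, then use the row, column, and bit-depth values to locate any pixel in the data that follows.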
When planning your scanning data entry project, you should give
special attention to input resolution. Input resolution is the number of
pixels per inch in both x and y dimensions of the digital snapshot.
Most scanners allow some control over input resolution. In general,
you should try to reduce data storage requirements by choosing the lowest resolution that will cleanly capture your data. Some
experimentation will be required as you "fine tune" your scanning data
entry methods.
The input resolutions shown in Table 1 are typical, but not absolute
resolutions. Resolution is expressed in dots per inch (dpi). Doubling
resolution (e.g., from 400 dpi to 800 dpi) can have the effect of
quadrupling data set size (compressed formats will show less increase
in size). Scanning data at a resolution greater than that required by the
source document will only increase data storage requirements with no
appreciable improvement in data quality. Unneeded input resolution can even create processing problems by exaggerating errors in poor-quality
data (e.g., additional white noise).
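The relationship between input resolution and data set size follows from simple arithmetic: the pixel count is the product of the document dimensions and the square of the resolution. The sketch below computes uncompressed sizes only; the function name is an assumption for illustration, and compressed sizes such as those in Table 1 will be smaller.

```python
def raster_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size for a scanned document,
    ignoring any file header."""
    columns = round(width_in * dpi)
    rows = round(height_in * dpi)
    total_bits = rows * columns * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Bi-tonal E format map at increasing resolutions.
    for dpi in (200, 400, 800):
        size = raster_size_bytes(36, 44, dpi, 1)
        print(f"{dpi} dpi: {size / 1e6:.1f} MB uncompressed")
    # 30 x 30 inch grayscale photo at 200 dpi.
    print(raster_size_bytes(30, 30, 200, 8) / 1e6, "MB")
```

Doubling the resolution doubles both the row and column counts, which is why the size quadruples.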
TABLE 1
Raster Data Sources

| Type of Source Data | Typical Data Example | Typical Use in Scanning Data Entry | Typical Scanner (Input) Resolution | Raster Data Format | Typical Raster Data Size |
|---|---|---|---|---|---|
| Low-quality 36 x 44 inch (E format) map | As-builts, blue lines | Raster-to-vector conversion using interactive techniques such as raster cleanup and line following | 400 dpi | RLC or GRID bi-tonal (compressed) | 6 megabytes |
| High-quality E format map | Contours on Mylar separates | Raster-to-vector conversion using interactive (multiple data layers or clutter) or batch (single data layer) techniques | 400 to 800 dpi (depending on data type and quality) | RLC or GRID bi-tonal (compressed) | 6 to 30 megabytes |
| 11 x 17 inch (B format) map | Plat map | Raster-to-vector conversion | 400 to 500 dpi (resolution varies by source data quality) | RLC or GRID bi-tonal (compressed) | 1 to 4 megabytes |
| 9 x 9 inch aerial photo (black and white) | Standard air photo | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 4 megabytes |
| 9 x 9 inch aerial photo (color) | Standard air photo (color) | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 4 megabytes (8-bit color), 12 megabytes (24-bit color) |
| Large format aerial photo (30 x 30 inch) | Orthophoto | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 36 megabytes |
| Three-band satellite image | EOSAT or SPOT Image data | Visual backdrop for heads-up digitizing of major roads | Purchased in raster format | Band interleaved, etc. (uncompressed) | Varies with type of data and area of coverage, 2 to 600 megabytes |
Raster data can be compressed. That is, you can use a data storage
scheme to reduce the amount of disk space required to store the data.
Bi-tonal raster data can be compressed to a greater degree than
grayscale or color data because the cell values of bi-tonal data can be
represented with a single bit: either black or white, on or off, data or
no data. Grayscale and color raster data can also be compressed, but
with lesser compression ratios and at a higher processing cost to
support decompression. When raster data are compressed for storage,
they must be decompressed for display and other operations.
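A minimal sketch of run-length encoding shows why bi-tonal data compress so well: long runs of identical pixels collapse into a value and a count. This is a generic illustration and should not be read as the actual RLC or CCITT file encodings; the function names are assumptions.

```python
def rle_encode(row):
    """Encode a sequence of 0/1 pixels as (value, run_length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            # Extend the current run.
            runs[-1] = (pixel, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((pixel, 1))
    return runs


def rle_decode(runs):
    """Expand (value, run_length) pairs back into pixels."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


if __name__ == "__main__":
    # One scan line: mostly background with a three-pixel-wide line.
    scan_line = [0] * 20 + [1] * 3 + [0] * 17
    encoded = rle_encode(scan_line)
    print(encoded)
    assert rle_decode(encoded) == scan_line
```

The 40-pixel scan line reduces to three runs. Grayscale and color images rarely contain such long runs of identical values, which is consistent with the lower compression ratios noted above.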
Vector data are in a format that represents map features with the x,y
coordinates of the features. Where a raster data set would represent a
feature by tagging all the cells that overlay the feature, a vector data set
would represent the feature by listing the coordinates of points along
it. Many GIS applications, such as parcel maintenance, demographic
analysis, or vehicle routing, require data in a vector format.
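The contrast between the two representations can be made concrete with a short sketch. A vector line is stored as its endpoint coordinates, while the raster version tags every cell the line passes over. The sampling approach and the function name below are illustrative assumptions, and rows are numbered in the direction of increasing y for simplicity.

```python
import math


def cells_along_segment(x0, y0, x1, y1, cell_size=1.0):
    """Return the (row, column) cells touched by a line segment,
    found by sampling the segment about twice per cell width."""
    length = math.hypot(x1 - x0, y1 - y0)
    steps = max(1, math.ceil(2 * length / cell_size))
    cells = []
    for i in range(steps + 1):
        x = x0 + (x1 - x0) * i / steps
        y = y0 + (y1 - y0) * i / steps
        cell = (int(y // cell_size), int(x // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


if __name__ == "__main__":
    vector_line = [(0.25, 0.25), (3.25, 0.25)]  # two coordinate pairs
    (xa, ya), (xb, yb) = vector_line
    print("vector:", vector_line)
    print("raster:", cells_along_segment(xa, ya, xb, yb))
```

The vector form needs two coordinate pairs regardless of the line's length, while the raster form grows with the number of cells the line crosses.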
Raster data are unsuitable for these applications because, although the
raster and vector data may look the same displayed on a screen, raster
data have very different characteristics. Scanned raster data are simply
a "digital snapshot" of the source document: a scanned map has
pictorial information and limited connectivity to other data. ARC/INFO
georelational vector data, on the other hand, maintain the internal
spatial relationships of the features they represent and have far more information than a simple picture. In addition, georelational vector data
have strong connections to other related data, such as tables stored in a
relational database management system (RDBMS).
[Figure: The Georelational Model]
Raster data can also be used to yield a vector representation of the
same map data. This process is called raster-to-vector conversion and
is a main topic of this white paper. Only recently, however, has
raster-to-vector conversion technology become affordable, reliable,
and widely available. These advances in both hardware and software
have made scanning data entry a feasible alternative.
Evaluating Computer Hardware for Scanning Data Entry
Scanning data entry has special hardware requirements. These
hardware capabilities support raster data characteristics such as large
data set size and cellular format. A scanning data entry system can
include several hardware components:
The Scanner
Scanners to input hard copy paper maps or photos. Scanners take a "digital snapshot" of the source material and store this raster data on
disk. Advanced scanners offer on-screen graphical user interface
(GUI) control software to enhance ease of use. Scanners are available
at a variety of output resolutions, support a variety of media sizes, and
can output black-and-white, grayscale, and color images. Scanners
can output scanned images directly to workstation secondary storage
via a high bandwidth interface (e.g., SCSI). Scanners output data in standard raster data formats such as RLC (for bi-tonal data) or TIFF
(for grayscale data).
A scanner is a device with a mechanical document feed that is set
above a row of cameras. The document feed can either be continuous,
as in a drum scanner, or direct feed, where the document feeds in the
front and comes out the back of the device. A light source and a glass
window are between the cameras and the document. The light source
is angled in such a way as to shine through the window and reflect off
the document. There is usually a white background behind the
document in the event that the media is transparent. The reflected light
enters the cameras that focus the image onto a charge-coupled device
(CCD). The CCD is a ceramic board with an imbedded array that
translates the presence or absence of light into digital form. The
output from the CCD is a value (usually from 0 to 255) that indicates a
level of gray. Usually, a grayscale value of zero indicates total
absence of light, while a value of 255 indicates complete light
saturation.
[Figure: Scanner Basics. Scanners sense reflected light values and store the reflectance values as a digital image. Labeled components: media feed mechanism, map or photo, direction of media movement, white background, light sources, glass window, camera, CCD, and digital image output.]
Scanner resolution is important. Optical resolution is the ability of
cameras in the scanner to discern data. As a rule of thumb, the optical
resolution of a scanner is expressed in this formula: optical resolution
in dpi = (number of cameras + 1) * 100. Thus, a scanner with three
cameras can offer an optical resolution of 400 dpi. Scanners can also
offer interpolated resolution, in which the data from the CCD are
resampled into smaller pixels. Thus, a scanner with optical resolution
of 400 dpi can also offer interpolated resolution of 800 dpi. This method can produce an output image with higher resolution, but not
necessarily with greater accuracy. In general, you should evaluate
scanners for GIS applications using optical resolution because GIS
data are too complicated to be adequately captured with interpolated
resolution data.
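The rule of thumb for optical resolution given above can be written as a pair of small functions. The formula comes from the text; the function names and the inverse calculation are illustrative assumptions, and real scanners should be judged by their published specifications.

```python
import math


def optical_resolution_dpi(num_cameras):
    """Rule of thumb from the text: (number of cameras + 1) * 100."""
    if num_cameras < 1:
        raise ValueError("a scanner needs at least one camera")
    return (num_cameras + 1) * 100


def cameras_needed(required_dpi):
    """Smallest camera count whose rule-of-thumb optical resolution
    meets the requirement (inverse of the formula above)."""
    return max(1, math.ceil(required_dpi / 100) - 1)


if __name__ == "__main__":
    print(optical_resolution_dpi(3))   # the three-camera example in the text
    print(cameras_needed(800))         # tight contours need 800 dpi
```

Under this rule, the 200 dpi adequate for photo backdrops corresponds to a single camera, while 800 dpi for tight contours corresponds to seven.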
Positional accuracy is also an important consideration in GIS
applications that require accurate spatial data placement. Positional
inaccuracy can result from media slippage in the scanner document
feed mechanism or from miscalibrated cameras.
Decide on the minimum resolution and accuracy required by your
application. Then find the least expensive scanner that will meet those
needs. A less expensive scanner that provides 200 dpi optical
resolution is adequate for scanning photographs used as visual
backdrops. On the other hand, if your application must scan closely
drawn "tight" contours, you will need a scanner capable of at least 800
dpi optical resolution. And, as when buying any piece of equipment,
you should consider reliability, repair costs, maintenance agreements,
user support, and so on.
The Computer
High-powered computer workstations capable of displaying and manipulating raster data. Raster data require a bitmapped graphics
monitor for display. Even with bi-tonal data, a color monitor is useful
to support color overlay of vector data. If you are working with
grayscale data, you need at least an 8-bit color display in order to
support a color space that includes both the grayscale image and the rest of the graphical user interface.
The workstation needs to have the CPU power required to manipulate
and display large amounts of data rapidly. This consideration should
not be underestimated, as slow response in handling large data sets
can adversely affect project productivity. One workstation can be
dedicated as a scanner and/or plotter server if warranted by high
usage, or the workstation can perform other functions.
A Plotter
Plotters capable of creating hard-copy output of raster data. Plotter technologies that support output of raster data include electrostatic, ink
jet, and thermal transfer. Plotters can be color or black and white
(black-and-white plotters usually support grayscale output). Pen
plotters are not appropriate for raster data output. For optimum utility,
plotter software should be capable of combining raster and vector
data.
Data Storage
Secondary storage devices, such as high-speed magnetic disk drives or high-volume optical disk drives, capable of storing large raster data
files. If you envision an on-line raster database, you should
assess data storage needs carefully. To estimate your raster data
storage requirements, multiply the number of documents you need to have available on your system at any one time by the raster data set
size for that document type in Table 1. Many applications need to
have only a limited amount of raster data on-line; other applications
want to develop a library of raster data for ad hoc access. Estimation
of your total storage requirements should include the storage needs of
system software, application software, and other types of data (e.g.,
vector data and editing copies of raster data).
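The storage estimate described above is a sum of document counts multiplied by typical data set sizes. The sketch below uses sizes taken from Table 1; the dictionary keys, the function name, and the choice of the upper end of the plat map range are assumptions made for illustration.

```python
# Typical raster data set sizes in megabytes, taken from Table 1.
TYPICAL_SIZE_MB = {
    "low_quality_e_map": 6,
    "b_format_plat": 4,         # upper end of the 1 to 4 megabyte range
    "aerial_photo_9x9_bw": 4,
    "large_format_photo": 36,
}


def online_storage_mb(document_counts, other_mb=0):
    """Estimate on-line storage: documents kept on the system at one
    time multiplied by typical size, plus other needs (software,
    vector data, editing copies) given in megabytes."""
    total = other_mb
    for doc_type, count in document_counts.items():
        total += count * TYPICAL_SIZE_MB[doc_type]
    return total


if __name__ == "__main__":
    on_line = {"low_quality_e_map": 50, "aerial_photo_9x9_bw": 100}
    print(online_storage_mb(on_line, other_mb=300), "MB")
```

In this example, 50 scanned E format maps and 100 air photos account for 700 megabytes before software and vector data are added.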
A variety of technologies are available to ease the data storage burden.
First, tape archival can be used to simply take unneeded data off the
system. Some sites use a two-tiered approach to data storage in which more frequently accessed raster data are kept on high-speed magnetic
disks, and less frequently used data are migrated to optical media.
Optical systems can be purchased with software to perform data
migration automatically at off-peak times. Optical systems and
magnetic systems should be transparently usable as mountable file
systems, usually accessible through an open network access standard
such as a Network File System (NFS).
Networks
High-speed local area networks (LANs) capable of transferring large amounts of data at high rates of data throughput. LAN configurations
for distributed processing support scanning data entry by connecting
specialized computing machinery on a high-speed data pathway.
Today's LAN configurations can isolate high bandwidth raster data
traffic from other functions by creating subnets using network
bridges. LAN-based systems are modular and scalable.
Evaluating Software for Scanning Data Entry
Scanning data entry has special software requirements. Software
tools are used for managing, manipulating, and displaying raster data,
converting raster data to vector data, and providing a graphical user
interface to make scanning data entry easy to do. Some of these tools
operate in batch mode or "behind the scenes" with little user interaction.
Their benefit is increased savings of people's time through
increased automation. Others require a higher level of user
interaction; their benefit is the intelligent combination of machine and
human capabilities to attain higher productivity. Software functions
needed to support scanned data entry projects include the following:
Raster data management. The data produced by scanning
data entry must be organized in an orderly way. Georeferenced
data should be organized geographically. Data management
software can optimize storage and retrieval of raster data even
when data volumes are large. For integration with other systems
and scanners, the software should provide raster-to-raster data
conversion. Raster data management software can extract, edit,
and merge a raster data set from a raster database.
Software data compression. Software data compression can
minimize raster data storage requirements. A variety of industry-standard
data compression formats are available, including RLC and CCITT
Group 3 and Group 4. The CCITT compression standards are
implemented in the TIFF raster data standard. An eight- to tenfold
reduction in data size can be achieved with bi-tonal data. The amount
of actual data reduction will depend on the compression algorithm
used and the complexity of the data. Typically, denser more
Figure: Example Scanning Data Entry Configurations. Simpler configuration: UNIX workstation and magnetic disk with SCSI connections. More complex configuration: two scanners, a raster plotter, a multipurpose UNIX workstation, a dedicated UNIX workstation, a UNIX server, and an optical storage device ("jukebox") on a local area network.
ArcScan georeferencing menu supports multiwindow visual interaction.
Raster data display. Raster data can be displayed with full
control over display symbology and graphic overlay of vector
data. Background values in bi-tonal raster data can be displayed
transparently, thus allowing concurrent display of multiple raster
data sets. Software can alter the display characteristics of
grayscale and color images to suit the needs of the application.
The software provides direct output of raster data to raster-capable
plotters, thus enhancing output speed and reducing hard-copy
processing requirements. The software can merge raster and
vector data in the same hard-copy plot.
ArcScan raster editing menu.
Raster data editing. The software provides the capability to clean
up raster data with tools that work directly on the raster data
format. Raster editing is a common pre-processing step to raster-
to-vector conversion. For example, raster editing software can
remove speckling from raster data, and cleaner raster data are
converted to vector data with less post-processing.
Raster-to-vector data conversion. Software can provide an
array of tools for converting raster data to vector data. The tools
offer the flexibility to adapt to a wide variety of raster data. For
high-quality media, batch vector conversion software may be a
good choice. When source documents are lower quality or have
much clutter, interactive line-following software is often
preferable. When photos are scanned, heads-up digitizing can be
appropriate. Maps that are scanned to capture coordinate
information often have a wealth of feature attribute information as
well. This attribute data can be interactively captured during
scanning data entry procedures if the scanning software tools are
well integrated with other editing functions.
The reason that raster-to-vector conversion requires special
algorithms to overcome problems posed by noise and clutter is that
the raster data format is simply a pictorial representation of the
data. That is, when the conversion software examines the raster
data, it can see only pixels of black and white; it cannot see what
the raster "picture" is supposed to represent. Thus, conversion
software will attempt to make vector lines out of all data it
encounters, even if the "lines" are really the pen strokes that make
up annotation. Conversion software deals with this kind of
problem in many different ways, but human intervention is often
necessary.
GUI and software integration. Ease of use is
greatly enhanced through a Graphical User Interface (GUI) that
provides point-and-click control of software functionality. For
highest efficiency, the conversion software should be integrated
with other raster and vector editing functions in a common
software environment. By integrating scanning data entry
technology in a common software environment, more software
functions are available and access to more types of data is possible
at each processing step. It is easier to learn an integrated software
system that has a consistent and attractive "look and feel."
Evaluating Scanning Data Entry Methodologies
Scanning data entry methods vary as widely as the applications in
which they are used. Some of these methods are unique to scanning
data entry; others take advantage of data and software integration to
bring additional functionality to scanning data entry projects. Noise
removal and clutter removal are pre-processing methods. Pre-
processing prepares the raster data for the raster-to-vector conversion
step.
Noise removal. Scanned maps will have a certain amount of
noise. The lower the map quality, the higher the noise content.
Noise is data that do not have informational content. For example,
a common type of noise is tiny spots called speckles that are an
artifact of the scanning process. The speckles are unwanted in the
final vector output. Various methods can be used to remove noise
from the image so that vectorization can proceed. These methods
are collectively called noise removal.
Clutter removal. Even a high-quality map may have unwanted
data on it, such as annotation. Often maps show more than one
data type or layer. For example, a map might show parcel
boundaries, easements, and street names. When only the parcels
are to be vectorized, the easements and street names are clutter.
Scanning data entry has developed methods for clutter removal,
such as raster editing tools that "white out" indicated areas.
Software filters that remove data below a given threshold size can
be useful for removing the relatively small lines that make up
lettering (annotation).
Post-processing. Once the raster-to-vector conversion has
been performed, the output vector data can be post-processed.
Post-processing is usually accomplished with vector editing
software to correct output vector data. The more closely the post-
processing software is integrated with the vectorization software,
the more efficient the entire vectorization process becomes.
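The size-threshold filtering mentioned above for speckles and small lettering can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular product: the grid is a list of lists with 1 for black and 0 for white, components are 8-connected, and the function name and `min_size` parameter are hypothetical.

```python
from collections import deque


def remove_small_components(grid, min_size):
    """Return a copy of a bi-tonal grid (1 = black, 0 = white) in which every
    8-connected black component smaller than min_size pixels is set to white."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    out = [list(row) for row in grid]          # leave the input untouched
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0
    return out


scan = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],   # a scanned line, six pixels long
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],   # an isolated speckle
    [0, 0, 0, 0, 0, 0],
]
cleaned = remove_small_components(scan, min_size=3)
```

A small threshold removes speckles only; a larger one also removes short strokes such as lettering, at the risk of deleting short but legitimate line segments, so the threshold should be tested on representative data.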
uncluttered data. Batch vectorization will require post-processing.
Since batch vectorization proceeds without operator intervention
and vectorizes all data input, low-quality, cluttered data can reduce
its efficiency by requiring high levels of post-processing.
Typically, the user tests a variety of vectorization parameters to
find the combination of parameters best suited to the data, and then
initiates batch vectorization using those parameters. The major
advantage of batch vectorization is that once parameters are set,
vectorization can proceed unattended and produce an output vector
data set in much less time than other methods. Since all the maps
of a single map series tend to be alike and can use the same
parameters, batch vectorization is well suited to projects that scan a
large number of similar maps.
Both interactive vectorizers and batch vectorizers can deal with a
variety of cartographic problems such as dotted line symbolism.
The choice of which vectorizer to use is mostly data dependent.
Having both methods available broadens the array of available
tools. Vectorizers of both kinds work only with bi-tonal raster
data; they cannot presently be used with grayscale or color raster
data.
Feature attribute capture. Line followers and batch
vectorizers both output lines in vector format. These methods are
good for automating the line symbology on maps. Maps can also
contain other data such as point symbols and annotation. The
symbols and annotation are often attribute data associated with a
point, line, or area feature on the map and may be of value to the
GIS database. The informational content of the point symbology
and annotation can be captured by methods that combine heads-up
digitizing with data editing capabilities. Advanced scanning data
entry software will allow the user to point at symbology or
annotation and attach its information to a feature. In a single
integrated software environment, the user can take advantage of
imaging capabilities, vector editing capabilities, and forms entry
capabilities, all within an application tailored to capturing data
from a specific map series. Operator intervention and some key
entry are required, but this method can be an efficient way to
capture attribute information along with coordinate information in
one handling of the source document.
Heads-up digitizing. Here, the user captures coordinate
information by tracing features directly from data displayed on the
screen. A digitizing table is not used at all, hence the term
"heads-up," an allusion to the heads-up instrument display
technology developed for aircraft pilots.
Heads-up digitizing is a scanning data entry method that is
commonly used when coordinate accuracy need not be at
engineering levels. As the operator digitizes from the screen,
output accuracy is determined by the accuracy of the source
document, the resolution at which it was scanned, the resolution at
which it is displayed, the resolution of the screen itself, and the
skill of the operator. Usually greater accuracy can be attained by
other methods. Heads-up digitizing is usually performed on
georeferenced raster data sets.
Heads-up digitizing is most commonly performed using scanned
images and thus can be a way to quickly capture the most current
information. Vectorizing methods are the most common way to
capture information from scanned maps, although heads-up
digitizing can also be used with scanned maps.
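One of the accuracy factors listed above, the resolution at which the source document was scanned, can be put in ground units with simple arithmetic. The sketch below assumes a paper map of known scale scanned at a known number of dots per inch; the function name and the 1:24,000 example are illustrative assumptions, not figures from this paper.

```python
METERS_PER_INCH = 0.0254


def ground_pixel_size_m(scale_denominator, scan_dpi):
    """Ground distance covered by one scanned pixel, in meters.

    One pixel spans (1 / scan_dpi) inches on paper; multiplying by the map
    scale denominator converts paper distance to ground distance.
    """
    return scale_denominator * METERS_PER_INCH / scan_dpi


for dpi in (200, 400, 800):
    size = ground_pixel_size_m(24_000, dpi)
    print(f"1:24,000 map at {dpi} dpi: {size:.3f} m per pixel")
```

The pixel size sets only a floor on positional error. Source document accuracy, display resolution, and operator skill all add to it, so the achieved accuracy is normally coarser than this figure.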
Orthophoto production. Scanned stereopair images can be
digitally orthocorrected to produce a digital orthophoto. Optically
corrected orthophotos can be scanned directly into digital
orthophotos. A digital orthophoto can be plotted at virtually any
scale. For example, a series of 30 x 30 orthophotos can be
scanned, merged into a common database, and reproduced on a
plotter without reference to the original size, coverage, and format
of the hard-copy photos. While output scale can be changed
freely, care should be taken not to "blow up" the image too much
or the output image will appear blocky. Orthophotos can also be
plotted with overlays of vector data.
Combining scanned data with other data. The data
produced through scanning data entry may not be the only data of
interest. For example, scanned and vectorized data can be fit into
a geodetic control network created by coordinate geometry
(COGO) data entry. This technique can be used to localize, or
bound, error in scanned plats. Vectorized data can be combined
with purchased or table-digitized vector data. Accurate
georeferencing is very important when using scanned data with
other data.
Scanned data also have attributes. Adding attributes to scanned
data is usually necessary. For example, scanned contours need to
be tagged with their elevation values for use in digital terrain
projects. If scanning data entry software is integrated with other
GIS software tools, any or all of those tools can be made part of
the scanning data entry and data automation process.
Incremental data automation. Some organizations have
adopted an incremental approach to database creation. Incremental
methods include in-house staff digitizing on a time-available basis,
addition of digital data from other sources, such as CAD data from
land developers, or conversion of other digital data such as legal
descriptions. Incremental database generation approaches can
work. But, since any GIS must have data in order to be effective,
it can be wise to input some data immediately so as to demonstrate
immediate GIS benefits.
Scanning data entry supports incremental database development.
With scanning data entry, a raster database can be produced
quickly by scanning maps or air photos and georeferencing the
scanned images to real-world coordinates. The georeferenced
raster data can provide immediate benefit as a visual backdrop to
other data and as a data source for vector conversion. The raster
database can provide complete seamless coverage for an agency's
entire area of responsibility (e.g., a city or a county), and vector
conversion can proceed incrementally, on an as-needed or highest-
need basis.
ESRI Scanning Data Entry Procedures
ESRI's experience as a user of scanning data entry technology enables
us to share a very practical viewpoint. The ESRI Database Development
Group has used scanning data entry for many projects. The
Digital Chart of the World project, performed by ESRI as the prime
contractor to the Defense Mapping Agency, used scanning data entry
to develop a 2.8 gigabyte vector database from over 2,800 source
documents. The ESRI Database Development Group has wide
experience with scanning data entry, and much of this experience is
reflected in this chapter.
The group adapts scanning data entry technology and methods for
each project, as determined by the project goals and available source
documents. Even so, a standard processing sequence has evolved.
This processing sequence is shown in the following flowchart. Note
that some sort of pre- and/or post-processing is an assumed
requirement. Scanning data entry can reduce data automation time
requirements, but it will not eliminate them.
Figure: Scanning Data Entry Procedures (flowchart). Boxes shown: evaluate project goals; evaluate source documents; determine scan resolution; determine proposed processing flow; hand prep source documents; find or draft georeference marks (tics) on source documents; test and set scanner parameters; scan documents; branches labeled Maps and Air photos; pre-process (raster edit) raster data; vectorize; heads-up digitize; post-process (vector cleanup) coverage data.
ESRI Scanning Data Entry Solutions: A GIS Focus
ESRI's experience as a manufacturer and user of GIS
technology enables us to view scanning data entry from
the special perspective of GIS needs and requirements.
GIS requirements are different from CAD and engineering
requirements; ESRI solutions are based on GIS
requirements. For example, GIS data typically depict
irregular features from the real world, not the right
angles and straight lines of CAD drawings.
ESRI provides a wide range of products and services that can support
GIS scanning data entry projects. ESRI brings its experience in GIS
and scanning data entry together in a software package called
ArcScan. ArcScan is specifically designed to support GIS scanning
data entry projects. ArcScan is a fully integrated extension to
ARC/INFO, and takes advantage of ARC/INFO's complete GIS
functionality.
ARC/INFO itself provides much ancillary capability to scanning data
entry projects, including vector data editing and management.
ARC/INFO provides additional raster data support through its bundled
IMAGE INTEGRATOR capabilities. ARC/INFO software extensions
for surface modeling, raster data modeling, network modeling, and
coordinate geometry can all play a part in scanning data entry projects
because these capabilities are available in the common ARC/INFO
software environment.
ArcScan, ARC/INFO, ArcView, ArcCAD, third-party software
integrated with ESRI software, the ArcData program, ESRI services,
and the ESRI hardware reseller program can be flexibly matched to the
exact requirements of your scanning data entry project and your GIS
implementation.
ArcScan
ArcScan is a set of software tools that support data automation using
scan digitizing. These tools permit GIS applications to automate
vector databases using scanned raster data sets as input. ArcScan is
an extension to ARC/INFO and is fully integrated with the ARC/INFO
software environment. The ArcTools ArcScan menu system
provides an easy-to-use interface for raster editing and interactive
vectorization. ArcScan includes a Users Guide and Command
Reference that describe ArcScan capabilities to users.
ArcScan Capabilities
ArcScan users can create ARC/INFO coverages by extracting line
features from scanned monochrome document images. This is done
using interactive, automated line-following software within
ARCEDIT. Examples of linear features that can be extracted and
added to a coverage include street lines, utility lines, contour lines,
parcel boundaries, and soil polygon boundaries. This technique for
digitizing features is simpler, more accurate, and often faster than
traditional manual and heads-up digitizing.
ArcScan provides tools for editing monochrome, grayscale, and
pseudo-color single-band imagery within ARCEDIT.
ArcScan provides a powerful and efficient set of tools in ARC/INFO
for importing, correcting, editing, plotting, and exporting scanned
raster images. ArcScan supports industry-standard raster data formats
and can accept data from many types of scanners.
ArcScan Components
ArcScan functional components include raster database construction
tools, raster pre-processing tools, integrated raster-to-vector editing
tools, and an interactive raster-to-vector conversion tool. ArcScan
soft-copy and hard-copy raster display is provided using standard
ARC/INFO display functionality. ArcScan tools work with the highly
efficient ARC/INFO grid raster data format.
Raster Database Construction Tools
Scanned raster images in a variety of standard formats (e.g., TIFF,
RLC, SunRASTER) and compressions (e.g., Run Length
Compressed, CCITT Group III, CCITT Group IV) can be converted
to and from compressed grids using the IMAGEGRID and
GRIDIMAGE conversion tools. The GRIDMERGE tool can be used
to build a raster database from georeferenced input grid raster data
sets. The ArcScan conversion tools transfer runs of data directly,
rapidly converting large scanned monochrome documents.
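The idea of storing and transferring "runs" of data can be illustrated with a minimal run-length coding sketch. It shows only the general principle for one row of bi-tonal pixels; it is not the RLC file format, the CCITT Group 3 or Group 4 schemes, or the internal layout of an ARC/INFO grid, and the function names are assumptions made for this example.

```python
def encode_runs(row):
    """Encode a bi-tonal row (0 = white, 1 = black) as [value, length] runs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs


def decode_runs(runs):
    """Expand [value, length] runs back into a row of pixels."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# A 100-pixel row containing one three-pixel-wide line.
row = [0] * 40 + [1] * 3 + [0] * 57
runs = encode_runs(row)
print(len(row), "pixels stored as", len(runs), "runs:", runs)
```

Scanned line maps are mostly white space, so a row usually collapses to a handful of runs. Dense or noisy documents produce many short runs and compress less well.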
Geometric Correction and Noise Removal Tools
The ArcScan raster pre-processing tools prepare raster data sets for
additional processing. A set of geometric correction tools can be used
to correct orientation errors during scanning, distortions in the source
document, and georeferencing. These tools perform the following
operations:
Rotate a grid by a multiple of 90 degrees. Commonly used when
documents are scanned sideways.
Flip the contents of a grid from top to bottom. Commonly used
when documents are scanned upside down.
Mirror the contents of a grid from left to right. Commonly used
with translucent documents that are scanned wrong side up.
Correct a skewed document by converting a user-specified
parallelogram on the input document to a rectangle on the output
document. A common distortion encountered in scan digitizing is
a skew caused by paper feed that is not perfectly aligned.
Apply a warping transformation to georeference a grid to real-world coordinates.
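The first three operations in this list are simple rearrangements of grid cells, as the following sketch shows. It treats a grid as a plain Python list of rows; the function names are hypothetical and do not correspond to ArcScan commands. The skew and warping corrections resample the grid and are not shown.

```python
def rotate90_clockwise(grid):
    """Rotate a grid 90 degrees clockwise (rows become columns)."""
    return [list(col) for col in zip(*grid[::-1])]


def flip_top_to_bottom(grid):
    """Reverse the order of the rows."""
    return [list(row) for row in grid[::-1]]


def mirror_left_to_right(grid):
    """Reverse the order of the cells within each row."""
    return [list(row[::-1]) for row in grid]


grid = [[1, 2, 3],
        [4, 5, 6]]

print(rotate90_clockwise(grid))
print(flip_top_to_bottom(grid))
print(mirror_left_to_right(grid))
```

Other multiples of 90 degrees follow by applying the rotation repeatedly; four applications return the original grid.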
The following noise cleanup tools can be applied to either the entire
image or a selected image area:
Remove specks of black noise from a scanned image. Most
scanned documents will show speckling to varying degrees.
Apply a majority rule filter to a scanned image. Commonly used
to correct dropout in noisy scanned lines.
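A majority rule filter can be sketched as follows. The 3 x 3 window, the threshold of five black cells out of nine, and the treatment of cells beyond the grid edge as white are all assumptions chosen for illustration; the source does not state the window size or rule used by the ArcScan tool.

```python
def majority_filter(grid):
    """Apply a 3 x 3 majority filter to a bi-tonal grid (1 = black).

    Each output cell becomes black when at least five of the nine cells in
    its neighborhood are black. Cells outside the grid count as white.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            black = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    y, x = r + dy, c + dx
                    if 0 <= y < rows and 0 <= x < cols:
                        black += grid[y][x]
            out[r][c] = 1 if black >= 5 else 0
    return out


# A three-pixel-thick line with a one-pixel dropout in the middle.
line = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
filtered = majority_filter(line)
```

In this example the dropout cell is surrounded by eight black neighbors and is filled in. The same rule erases isolated black cells, and it also changes cells along line edges, which is one reason to apply such a filter to a selected image area rather than to the whole image when lines are thin.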
ArcScan raster data management menu.
Integrated Raster-Vector Editing Tools
Grids can be edited in conjunction with coverages. The ARC/INFO
software environment supports editing monochrome, grayscale, and
pseudo-color raster data. Users can edit multiple grids during an
ARCEDIT session. ARCEDIT supports full pan and zoom display in
map coordinate space of both raster and vector data. A multilevel
undo capability provides the user with a "safety net" during raster
editing. The user can concurrently edit both raster and vector data.
The editing tools include
Filling Tools: Fill the interiors of user-defined boxes, circles,
and polygons. Fill a connected entity of pixels by pointing to any
pixel in the entity.
Drawing Tools: Rasterize the boundaries of user-defined
boxes, circles, and polygons.
Pixel Editing Tools: Set and query the value of individual
pixels by pointing to the screen.
Brush Tool: Change the value of pixels in the grid by dragging
a brush on the screen.
Rasterization Tools: Rasterize the selected set of arcs into the
edit grid.
Selection Tools: Select all cells within a box, circle, or
polygon.
Geometric Operations: Move, rotate, flip, and mirror the
selected region.
Filtering Operations: Despeckle, smooth, or enhance the
selected region.
Georeferencing Tools: These tools allow a user to
interactively position and rescale the edit grid in map coordinate
space, deskew the edit grid, and warp the edit grid using a link
coverage created in ARCEDIT. Georeferencing is accomplished
by identifying common points in the raster data set and in real-
world coordinates.
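Georeferencing from common points can be illustrated with an affine transformation fitted to three links, each pairing a pixel position with a real-world coordinate. The affine model is an assumption chosen for simplicity; the source does not specify the transformation ArcScan applies, and a production warp would normally use more links with a least-squares fit. All names and coordinates below are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve a 3 x 3 linear system with Cramer's rule."""
    d = det3(matrix)
    if d == 0:
        raise ValueError("links are collinear; cannot fit a transform")
    solution = []
    for i in range(3):
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][i] = rhs[r]       # replace column i with the right-hand side
        solution.append(det3(m) / d)
    return solution


def fit_affine(links):
    """links: three ((col, row), (x, y)) pairs. Returns (a, b, c, d, e, f)
    such that x = a*col + b*row + c and y = d*col + e*row + f."""
    matrix = [[col, row, 1.0] for (col, row), _world in links]
    xs = [world[0] for _pixel, world in links]
    ys = [world[1] for _pixel, world in links]
    return tuple(solve3(matrix, xs) + solve3(matrix, ys))


def apply_affine(coeffs, col, row):
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical links: three pixel positions and their map coordinates.
links = [
    ((0, 0), (500000.0, 4100000.0)),
    ((1000, 0), (501000.0, 4100000.0)),
    ((0, 1000), (500000.0, 4099000.0)),
]
coeffs = fit_affine(links)
x, y = apply_affine(coeffs, 500, 500)
```

With these links each pixel is one map unit wide, and row numbers increase downward while map y decreases, so the fitted coefficient `e` is negative.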
The mouse is used to register the image to real-world coordinates.
Interactive Vectorization Tools
With ArcScan you can extract the centerlines of linear features from a
raster document with optimized user intervention. Interactive raster-
to-vector conversion using an automated line-following, or line-
tracing, tool is especially useful for selective raster-to-vector
conversion from a raster data set with multiple data layers. The
ArcScan line tracer, because of its high degree of user control, can
also vectorize complex and difficult data. With the line tracer tool you
can efficiently produce high-quality vector output. The trace tool
performs automatic intersection straightening and automatic line
generalization based on user parameters.
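Line generalization driven by a user tolerance can be illustrated with the Douglas-Peucker algorithm, a widely used technique. The source does not say which algorithm the trace tool uses, so this is a swapped-in example of the general idea, not a description of ArcScan; the function names and the tolerance value are assumptions.

```python
import math


def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def generalize(points, tolerance):
    """Drop vertices that lie within tolerance of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_to_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify each side separately.
    left = generalize(points[: index + 1], tolerance)
    right = generalize(points[index:], tolerance)
    return left[:-1] + right


# A traced line with small wobbles along two straight stretches.
traced = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 3), (5, 3.1), (6, 3)]
simplified = generalize(traced, tolerance=0.5)
print(simplified)
```

A larger tolerance removes more vertices and produces smaller output files, but it can also flatten genuine bends, so the tolerance should be chosen with the accuracy requirements of the target database in mind.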
The ArcScan Tracing Tool works in the ARCEDIT environment.
The trace tool snaps to the center of a raster line. Tracing begins at
that point, stopping at junctions to obtain user input. The user
interacts with the tracer using the mouse or keyboard, and controls the
direction taken by the tracer at the junction. Tracer features include the
ability to jump gaps, the ability to snap to the center of a heavy raster
line, and smart retrace. Built-in junction memory prevents retrace of
explored paths, offering improved line tracing efficiency. Line
following can be done with user interaction; alternatively, in fully
automatic mode, all connected line work within a defined area can be
vectorized without operator intervention. The line tracer can work from bi-tonal
ARC/INFO grid or RLC raster data.
ArcScan provides automatic cleanup of line intersections guided by user-set parameters.
Because the line tracer is built into ARCEDIT, you can interleave
manual digitizing with line following, enabling more productive
processing of noisy data. You can use a menu interface for heads-up
digitizing to tag features with attribute data shown in the raster
document. The line trace tool works with the multiple windowing
capability of ARCEDIT. ArcScan automatically moves a close-up
view of the tracing activity, following the tracing activity even as it
moves out of view.
ARC/INFO
ARC/INFO is a full-feature GIS capable of meeting the complex
requirements of a wide variety of GIS applications. All the
capabilities of ARC/INFO can be applied to scanning data entry
projects. One of the most important features of ARC/INFO for
scanning data entry projects is ARC/INFO software's user interface
environment, ArcTools. ArcTools organizes ARC/INFO software's
thousands of GIS tools and provides a single look and feel to
ARC/INFO functionality, including the ArcScan extension.
ARC/INFO capabilities can be fully customized using the ARC Macro
Language; processing procedures for scanning data entry can be
tailored to the specific needs of the GIS application.
The integrated ARC/INFO software environment makes all
ARC/INFO functionality immediately available. This offers a
scanning data entry project the ability to take advantage of ARCEDIT
vector and attribute editing capability. This ARC/INFO data
automation functionality can be added to the techniques specific to
scanning data entry. The final result of scanning data entry is a
topologically correct ARC/INFO georelational database that can
support GIS analysis and advanced display.
An important feature of ARC/INFO software's integrated environment
is that all editing and data entry functions can use the ArcStorm
database manager. ArcStorm (ARC/INFO STORage Manager)
provides feature-level access to seamless spatial databases. ArcStorm
supports production-level data entry by multiple users.
Because scanning data entry is implemented in the ARCEDIT
environment, advanced spatial editing features can be utilized. These
advanced features include editing of complex and user-defined
features and interactive topology creation. In short, once the edit
session is over, no further processing is required.
The IMAGE INTEGRATOR functionality included with ARC/INFO
provides image conversion, management, and display capabilities for
a wide variety of image data (see Table 2), including industry-standard
formats used by most scanner vendors. IMAGE INTEGRATOR
brings image handling capability to scanning data entry projects and
provides such benefits as concurrent raster and vector display for
heads-up digitizing and raster plotter support for hard-copy
production. These images can be easily converted into the user-
selected format of preference, geo-referenced and kept in an
ARC/INFO Image Catalog as a seamless raster database.
Importantly, the IMAGE INTEGRATOR can also display images that
do not have the inherent geographic component that maps and satellite
images do. This type of image can also be scanned from input
sources such as photographs, textual documents, and video input.
This type of image cannot be georeferenced and is commonly used as
a pictorial attribute of a coverage feature. A DBMS capable of storing
Binary Large Objects (BLOBs) can be used to manage this type of
image. ARC/INFO can access BLOB data in a DBMS.
Video image attribute, stored as a BLOB in an external DBMS table, can be displayed with the IMAGE INTEGRATOR command IMAGEVIEW.
Scanned document information can also be stored as a BLOB and displayed using the IMAGEVIEW capability.
grabbers can also be displayed during an ArcCAD session. Scanning,
raster-to-vector conversion, and raster editing are also available.
ArcView
ArcView, ESRI's desktop GIS display and query software, is capable
of viewing all the image formats supported by ARC/INFO. ArcView
can play a role in a scanning data entry project as a "quick look" tool
to view and verify scanned data. ArcView can display raster and
vector data simultaneously and can provide quick output of raster
graphics in industry-standard graphic formats such as PostScript.
ESRI Services
ESRI provides a full range of services including training, database
automation, application development, and on-site technology transfer.
Since 1969, ESRI has supported hundreds of organizations
throughout the world in the design, development, and implementation
of GIS. ESRI's services support the complete GIS life cycle,
including implementation planning, system integration, database
development, application development, and system operation. ESRI
is unique within the GIS industry in its ability to provide such
comprehensive services in combination with a complete set of leading
GIS software.
Working with the leading hardware vendors, ESRI has successfully
provided turnkey geographic information systems to hundreds of
clients.
ESRI offers new ArcScan users an on-site ArcScan Start-up Support
Package. This package is two days of consulting and technical
training support by an ESRI technical analyst to help the new ArcScan
user implement this technology. The goal of the support package is to
transfer the skills necessary to begin using the ArcScan software in a
production environment. The subjects covered are document
preparation, system initialization, ArcScan software usage, quality
assurance techniques, and data structure considerations for scanning.
The ArcScan Start-up Support Package is provided at your site using
your equipment.
ArcData
ArcData is ESRI's program for providing published spatial and
geographically related digital data in ARC/INFO supported formats.
Existing off-the-shelf vector and raster databases can complement
scanning projects. The ArcData program includes satellite and other
imagery. Through the ArcData program, leading data vendors such as
EOSAT, Spot Image, and Hughes STX can provide imagery for
locations worldwide that can be used in scanning data entry projects.
Supported Devices As mentioned previously, ARC/INFO supports industry-standard raster data formats. For a scanner to be usable with ARC/INFO, it
must output raster data in one of these standard formats, preferably
TIFF or RLC. Other factors, such as direct output to the UNIX file
system and UNIX-based controller software, also bear on scanner ease of use. ArcScan has been tested with scanners from leading
manufacturers. The section on evaluating scanning data entry
provides more information on scanner features that are important.
ESRI's machine-independent philosophy allows ARC/INFO users to
take advantage of new hardware developments as they occur.
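Because a scanner must deliver one of the supported raster formats, a quick check of the scanner's output file before loading it can save time. The sketch below is not part of ARC/INFO or ArcScan; it is a hypothetical Python helper (the name looks_like_tiff is an assumption) that tests only the 4-byte header defined by the TIFF specification: a byte-order mark of "II" or "MM" followed by the number 42. It does not cover RLC or validate the rest of the file.

```python
import struct


def looks_like_tiff(header: bytes) -> bool:
    """Return True if the first four bytes match a classic TIFF header."""
    if len(header) < 4:
        return False
    order = header[:2]
    if order == b"II":       # little-endian byte order
        fmt = "<H"
    elif order == b"MM":     # big-endian byte order
        fmt = ">H"
    else:
        return False
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42       # TIFF identification number


# Typical use: read the first four bytes of the scanner's output file.
# with open("sheet01.tif", "rb") as f:   # hypothetical file name
#     print(looks_like_tiff(f.read(4)))
```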
ESRI defines the level of support for peripheral devices, such as scanners
and plotters, using a numerical classification system. The ARC/INFO
Users Guide, Supported Devices, UNIX Workstations (Rev. 7.0)
provides classifications for specific scanners, plotters, and other
devices. The classification categories are outlined below.
Classes of Supported Devices
There are five classes of support for supported devices:
Class 1: Fully Supported, In-House at ESRI
Class 1 devices have been tested at ESRI, run successfully with
ARC/INFO, and have an interface (driver, interface file, etc.)
provided with ARC/INFO Rev. 6.1.2. Any problems that occur with
these devices can be tested on-site because the device must remain on
ESRI premises for it to remain a Class 1 supported device. This is the
highest level of support.
Class 5: Unknown The status or ability to interface this device is unknown at the time of
this publishing. New devices as well as devices not included in this
document that have not yet been tested fall into this category.
Class 6: Not Supported
These devices will not work with ARC/INFO and therefore cannot be
supported. In most cases, unsuccessful attempts have been made to
interface these devices.
Citations
Litton, Adrien L. "Automated Data Capture: The Scanning Solution," Proceedings, 1993 ARC/INFO User Conference
ARC/INFO: GIS Today and Tomorrow, ESRI White Paper Series, September 1992
ArcCAD, The Integration of CAD and GIS, ESRI White Paper Series, April 1993
Supported Devices Guide, UNIX Workstations (ARC/INFO Rev. 7.0)
WorkStation ARC/INFO Technical Guide to Hardware Options
ARC/INFO Users Guide, Cell-based Modeling with GRID
A Wide Range of High-Quality Support for All Your GIS Needs, ESRI Brochure
Glossary
Definitions of key terms that will help you to understand the concepts discussed in this document.
ARCEDIT ARCEDIT is the ARC/INFO environment for editing coverage coordinate data and descriptive data. Its sophisticated graphic and editing capabilities provide the tools necessary for accurate data entry
and manipulation. These capabilities are important for creating and
maintaining geographic databases.
ARCPLOT ARCPLOT is the ARC/INFO environment for providing cartographic tools for all of your ARC/INFO mapping needs, including full
cartographic design, display, and production capabilities.
ArcScan ArcScan is the ARC/INFO extension that provides capabilities to support scanning data entry. ArcScan is closely integrated with other
ARC/INFO functionality, particularly IMAGE INTEGRATOR and
ARCEDIT.
ArcTools ArcTools is a graphical user interface (GUI) environment provided with ARC/INFO. ArcTools provides a consistent menu interface to
ARC/INFO and its subsystems and extensions.
attribute An attribute is a characteristic of a map feature described by numbers or characters, typically stored in tabular format, and linked to the
feature by a user-assigned identifier. For example, attributes of a
well, represented by a point, might include depth, pump type, and
owner. Feature attribute information is often present in source
documents as symbology or annotation. For example, a plat map may
show Parcel Identification Numbers (PINs).
attribute table Attribute tables are tabular, flat, or relational files directly associated with the spatial data and form the "relational" half of the georelational data
structure. ARC/INFO functions maintain the integration of spatial and
attribute data in feature attribute tables. Additional attribute
information may be kept in external attribute tables, perhaps
maintained as tables in an RDBMS.
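The link between a feature and its attributes through a user-assigned identifier can be sketched in a few lines. This is an illustration only, not ARC/INFO's storage format; the identifiers, coordinates, and attribute values are hypothetical, and the attribute names follow the well example given under attribute.

```python
# Spatial half: point features keyed by a user-assigned identifier.
# All values are made up for illustration.
wells = {
    101: (512340.0, 3765120.0),
    102: (512910.0, 3764480.0),
}

# Relational half: one attribute record per feature, keyed by the same ID.
well_attributes = {
    101: {"depth_ft": 240, "pump_type": "submersible", "owner": "Smith"},
    102: {"depth_ft": 185, "pump_type": "jet", "owner": "Jones"},
}


def describe(feature_id):
    """Join a feature's coordinates with its attribute record by ID."""
    x, y = wells[feature_id]
    record = well_attributes[feature_id]
    return {"id": feature_id, "x": x, "y": y, **record}


print(describe(101))
```

Keeping the two halves separate is what allows the attribute side to live in an external table while the identifier preserves the association.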
bandwidth Bandwidth is a way of expressing how much data can be transmitted across a communications medium at any one time. The higher the
bandwidth, the more activity the communications channel can support.
For example, local area networks have higher bandwidth than serial
communications links. Bandwidth is often measured in megabytes
per second.
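To see why bandwidth matters for scanned data, it helps to estimate how long one scanned sheet takes to move across a link. The sketch below is a rough calculation under stated assumptions: the sheet size, sampling rate, and link speed are illustrative, the image is uncompressed, one megabyte is taken as 1,000,000 bytes, and protocol overhead is ignored. The function names are hypothetical.

```python
def uncompressed_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Size of an uncompressed scan in megabytes (1 MB = 1,000,000 bytes)."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


def transfer_seconds(size_mb, bandwidth_mb_per_s):
    """Idealized transfer time, ignoring protocol overhead."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb / bandwidth_mb_per_s


# A 24 x 36 inch sheet scanned at 400 dpi as bi-tonal (1 bit per pixel)
# and as 8-bit grayscale, sent over a link sustaining 1 MB per second.
for bits in (1, 8):
    size = uncompressed_megabytes(24, 36, 400, bits)
    print(bits, "bit(s) per pixel:", size, "MB,",
          transfer_seconds(size, 1.0), "seconds")
```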
bit The smallest unit of information that can be stored and processed in a computer. A bit has two possible values, 0 or 1, which can be
interpreted as BLACK/WHITE or ON/OFF. Bi-tonal data can be
compressed into images that represent cell values with a single bit.
bi-tonal Bi-tonal, as applied to raster data sets, means the raster data have only two possible values. Tonal quality is the brightness value, therefore
bi-tonal data have only two values, black and white.
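Storing each bi-tonal cell in a single bit can be sketched as follows. This is plain bit packing (eight cells per byte, most significant bit first), chosen here for illustration; it is an assumption and not the encoding of TIFF, RLC, or any other specific format. The function names are hypothetical.

```python
def pack_bits(cells):
    """Pack a sequence of 0/1 cell values into bytes, 8 cells per byte."""
    packed = bytearray()
    for start in range(0, len(cells), 8):
        chunk = cells[start:start + 8]
        byte = 0
        for value in chunk:
            byte = (byte << 1) | (1 if value else 0)
        byte <<= 8 - len(chunk)   # pad a short final chunk with zero bits
        packed.append(byte)
    return bytes(packed)


def unpack_bits(packed, count):
    """Recover the first `count` cell values from packed bytes."""
    cells = []
    for byte in packed:
        for shift in range(7, -1, -1):
            cells.append((byte >> shift) & 1)
    return cells[:count]


row = [0, 1, 0, 0, 0, 0, 1, 1, 1, 0]      # ten bi-tonal cells
packed = pack_bits(row)                   # two bytes instead of ten values
assert unpack_bits(packed, len(row)) == row
```

Compared with one byte per cell, this representation is eight times smaller before any further compression is applied.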
black noise Black noise is pixels with black values where the original information
content had white values. It often has the appearance of speckling:
tiny black spots on a white background. Noise is data in a
communication channel that is random or has no informational content.
Noise is usually caused by low data quality and is unwanted because
extra pre- or post-processing can be required to remove it. Black
noise is the addition of data where none should exist. See white
noise.
[Figure: Black noise, or addition of data.]
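The kind of processing needed to remove speckle can be illustrated with a very simple filter. The sketch below is not ArcScan's method; it is an assumed, minimal approach that turns a black pixel white when none of its eight neighbors is black. A filter this crude would also erase legitimate single-pixel features, which is one reason noise removal needs care.

```python
def remove_isolated_black(image):
    """Return a copy in which black pixels (1) with no black
    8-neighbors are set to white (0)."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_black_neighbor = False
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == 1:
                        has_black_neighbor = True
            if not has_black_neighbor:
                cleaned[r][c] = 0
    return cleaned


scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],   # isolated speck
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],   # part of a line, kept
    [0, 0, 0, 0, 0],
]
cleaned = remove_isolated_black(scan)
```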
BLOB Binary Large Object. A term often used with database management systems. Any large data set handled as binary data; often BLOBs are
raster image data.
categorical data Categorical data consist of values representing discrete categories, such as soil or vegetation type. Also referred to as nominal data.
CCD Charge-coupled device. A CCD is the electronic instrument used in scanners to sense brightness values. CCDs are usually capable of
distinguishing and outputting grayscale data that have a maximum of
256 levels of gray.
cell The basic element of spatial information in a grid data set. Cells are always square. A group of cells forms a grid.
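[Figure: a grid of cells arranged in rows and columns, with the X-axis, Y-axis, origin (0,0), upper left corner, and cell size labeled, and a value attribute table with the columns Value (1 through 7), Count, and Cover-Type (W Pine, D Fir, Mixed, Grass, Water, Paved, Agriculture).]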
cell based See raster.
clutter Unwanted data on a scanned map. Clutter, unlike noise, may have
informational content, but not the information sought by the data
entry process. Clutter, like noise, can require extra pre- or
post-processing in order to remove it. Line-following raster-to-vector
converters are efficient at dealing with clutter because they utilize
human capabilities to discern clutter from desired data. Annotation
that overlays line work is a common type of clutter.
[Figure: a line overlaid by the annotation "129". In this example, the annotation "129" is clutter. It overlays the line and will interfere with vectorizing.]
COGO 1. Coordinate geometry. Software that uses legal descriptions and survey information to create spatial vector data.
2. An ARC/INFO software extension.
continuous data Continuous data consist of values representing samples from a continuous surface, such as elevation values. Also referred to as
ordinal or ratio data.
control point A control point is a location on the image or map having known real-world coordinates. Control points are also called registration marks,
or tics.
coordinates An expression of location in space by the provision of pairs of numbers that indicate offset from a known starting point. X,y
coordinates are an expression of position in Cartesian space. A
common coordinate, or georeferencing system, is a requirement for the concurrent use of different types of data.
corrected photo See orthophoto.
coverage A digital analog of a single map sheet forming the basic unit of vector data storage in ARC/INFO software. In a coverage, map features are
stored as primary features, such as arcs, nodes, polygons, and label
points; and secondary features, such as tics, extent, links, and
annotation. Map feature attributes are described and stored
independently in feature attribute tables.
data automation The process of converting analog data, such as maps, to a digital representation of the same information.
data model A data model is a formal method for arranging data to represent the behavior of real-world entities. Fully developed data models describe
data types, integrity rules for the data types, and operations on the data
types. ARC/INFO software uses a georelational data model, a hybrid
data model that combines spatial data (in coverages and grids) and
attribute data (in tables). ARC/INFO's integrated data model allows
easy conversion between, and concurrent use of, raster and vector
data.
data quality In the context of scanning data entry, data quality refers to the quality of the source document, that is, the media itself. Data quality does not refer to the informational veracity, accuracy, or precision of the data on
the media. Thus, a well-used, folded, wrinkled, and stained third-
generation blue-line map has less data quality than a new Mylar
overlay map having crisp, high-contrast line work.
DBMS Database management system; often a relational database management system. A DBMS is the collection of software required for using and
manipulating a tabular database, and presenting multiple, different
views of the data. A DBMS can also manage Binary Large Objects. See BLOB.
dpi Dots per inch. Dpi is a common measure of resolution in scanners. The more dots per inch (sampling rate) a scanner has, the greater the
resolution.
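A sampling rate becomes easier to judge when it is converted to the ground distance covered by one pixel. The calculation below is a worked sketch: the function name and the example scales and sampling rates are illustrative assumptions, and the only fixed fact used is that one inch equals 0.0254 meters.

```python
METERS_PER_INCH = 0.0254  # exact by definition


def pixel_ground_size_m(dpi, scale_denominator):
    """Ground distance in meters covered by one scanned pixel.

    One pixel spans 1/dpi inches on the map sheet; multiplying by the
    scale denominator converts map inches to ground inches.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return scale_denominator / dpi * METERS_PER_INCH


# Example: a 1:24,000 map sheet scanned at three sampling rates.
for dpi in (200, 400, 800):
    print(dpi, "dpi:", pixel_ground_size_m(dpi, 24000), "m per pixel")
```

At 400 dpi a pixel on a 1:24,000 sheet covers 60 ground inches, or about 1.5 meters; doubling the sampling rate halves that distance and quadruples the number of pixels.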
dropout Dropout is an artifact of the scanning process that results in the loss of data where they should exist, such as pixel thinning in line work. See
white noise.
georeference To georeference is to establish the relationship between an image (row, column) coordinate system and a map (x,y) coordinate system.
Georeferencing is accomplished by establishing control points that can be identified in both coordinate systems, then creating the displacement
vectors, or links, between the control points. For example, once a
raster data set is georeferenced to a vector coverage, the raster and
vector data should overlay or register.
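One common way to turn control point links into a usable relationship is to fit an affine transformation from image (column, row) to map (x, y). The sketch below assumes that model, which this document does not specify, and solves it exactly from three non-collinear links using Cramer's rule. The function names and control point values are hypothetical; production software typically uses more links and a least-squares fit.

```python
def _det3(a):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3 x 3 system m * p = v with Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("control points are collinear")
    solution = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]
        solution.append(_det3(mi) / d)
    return solution


def fit_affine(links):
    """links: three ((col, row), (x, y)) pairs. Returns (a, b, c, d, e, f)
    such that x = a*col + b*row + c and y = d*col + e*row + f."""
    m = [[col, row, 1.0] for (col, row), _ in links]
    a, b, c = _solve3(m, [world[0] for _, world in links])
    d, e, f = _solve3(m, [world[1] for _, world in links])
    return a, b, c, d, e, f


def image_to_world(params, col, row):
    a, b, c, d, e, f = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical links: 2-meter pixels, rows increasing downward.
links = [((0, 0), (1000.0, 5000.0)),
         ((100, 0), (1200.0, 5000.0)),
         ((0, 50), (1000.0, 4900.0))]
params = fit_affine(links)
print(image_to_world(params, 10, 20))
```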
georegister See georeference.
georelational data model A hybrid data model used to represent spatial features. The
georelational data model encompasses coordinate, topological (geo), and feature attribute (relational) information.
GIS A geographic information system (GIS) is an organized collection of computer hardware, software, geographic data, personnel, and
procedures designed to efficiently capture, store, update, manipulate,
analyze, and display all forms of geographically referenced
information. Complex spatial analysis is possible with a GIS that
would be difficult, time-consuming, or impracticable otherwise.
GPS Global positioning system. A system of orbiting satellites, ground receivers, and associated software that provides an
electronically instrumented means of determining position on the
earth.
grid 1. A raster geographic data set for use with ARC/INFO software. Each grid cell is referenced by its geographic x,y location. Cells
store values. ArcScan functionality operates on the ARC/INFO
grid data structure for most operations.
2. One of many data structures commonly used to represent map
features. A raster-based data structure composed of cells of equal
size arranged in columns and rows. The value of each cell, or
group of cells, represents the feature value. (Also called
"Raster.")
[Figure: three panels, "Point features," "Line features," and "Area features," each showing map features in coordinate form beside the same features as a 6 x 6 grid of cell values (for example, 0 for empty cells and 1, 2, or 3 for the cells each feature occupies).]
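The grid data structure described above can be sketched as a list of rows, with a helper that converts a cell's row and column to the x,y location of its center. This is a minimal illustration, not the ARC/INFO grid format: the cell values are a small 6 x 6 area-feature example, rows are assumed to be counted from the upper left corner, and the origin, extent, and cell size are made-up numbers.

```python
from collections import Counter

# Area features as cell values (1, 2, and 3 are arbitrary category codes).
grid = [
    [1, 1, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 3, 3, 3, 3],
    [1, 1, 3, 3, 3, 3],
    [1, 1, 3, 3, 3, 3],
]


def cell_center(row, col, x_min, y_max, cell_size):
    """x,y of a cell's center; row 0 is the top row, col 0 the left column."""
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y


def value_counts(cells):
    """Number of cells holding each value, as in a Value/Count table."""
    return Counter(value for row in cells for value in row)


# Assumed georeferencing: lower left corner at (0, 0), 10-unit cells,
# so the top edge of the 6-row grid is at y = 60.
print(cell_center(0, 0, x_min=0.0, y_max=60.0, cell_size=10.0))
print(value_counts(grid))
```

Because every cell has the same size, only the origin and cell size need to be stored to locate any cell; the coordinates themselves are implied by position in the array.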
GRID An ARC/INFO software product that provides a fully integrated raster- or cell-based geoprocessing system for use with ARC/INFO.
GRID supports a map-algebra spatial language allowing sophisticated
spatial modeling and analysis.
grid cell A discretely uniform unit that represents a portion of the earth, such as a square meter or square mile. Each grid cell has a value that
corresponds to the feature or characteristic at that site, such as a soil
type, census tract, or vegetation class. See pixel.
GUI Graphical user interface. A highly visual and interactive method for supporting human-computer interaction.
heads-up digitizing The process of using a high-resolution, bit-mapped display and mouse to automate vector data by tracing features shown as an image on the
screen.
image A graphic representation or description of an object that is typically produced by an optical or electronic device. Common examples
include remotely sensed data such as satellite data, scanned data, and
photographs. An image is stored as a raster data set of binary or
integer values representing the intensity of reflected light, heat, or
another range of values on the electromagnetic spectrum. See raster.
image catalog An image catalog is an organized set of spatially referenced, possibly overlapping, images that can be accessed as one logical image.
ARC/INFO IMAGE INTEGRATOR can use image catalogs for raster
data in formats such as TIFF or RLC. The ARC/INFO GRID data
structure does not use image catalogs.
IMAGE INTEGRATOR
A collection of image management and display tools in ARC/INFO
that allows vector and raster data to be displayed concurrently. Image
integrator commands are used to georeference and rectify images to
real-world coordinates, display images, and manage image catalogs.
image-to-world transformation
Image-to-world transformation is the transformation between image
locations and real-world or map coordinates.
interpolated resolution
A method employed by scanner vendors to increase output resolution
by use of software; that is, each input pixel is interpolated to produce
more output pixels. Interpolating pixel values will not improve the
informational content of the original scanned data and is usually not an effective method for GIS applications. See optical resolution.
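The point that interpolation adds pixels but not information can be demonstrated with the simplest interpolation scheme, pixel replication. This is an assumed example; scanner vendors may use other interpolation methods, and the function names are hypothetical. Doubling the resolution in software and then sampling every second pixel returns exactly the original image, so nothing new was captured.

```python
def upsample_nearest(image, factor):
    """Replicate each pixel `factor` times in both directions."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out


def downsample(image, factor):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in image[::factor]]


optical = [[0, 1],
           [1, 0]]                        # what the sensor actually recorded
interpolated = upsample_nearest(optical, 2)

# Four times as many pixels, but the original is fully recoverable from
# them, which shows the extra pixels carry no additional detail.
assert downsample(interpolated, 2) == optical
```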
LAN 1. Local area network. Computer data communications technology that connects computers at the same site. When computers are on
a LAN, they can share data and other computer resources, such as
printers and plotters. LANs are composed of cabling and special
data communications hardware and software.
2. An ERDAS image processing system file type