Collection and Preservation of At-Risk Digital Geospatial Data: the North Carolina NDIIPP Project
Partners:
NCSU Libraries
  Project Lead: Steve Morris
NC Center for Geographic Information & Analysis
  Project Lead: Zsolt Nagy
LCFS Database Group, May 30, 2005
Note: Percentages based on the actual number of respondents to each question 2
Project Context
Partnership between university library (NCSU) and state agency (NCCGIA)
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information
Objective: engage existing state/federal geospatial data infrastructures in preservation
Targeted Content
Resource Types
  GIS “vector” (point/line/polygon) data
  Digital orthophotography
  Digital maps
  Tabular data (e.g. assessment data)
Content Producers
  Mostly state, local, regional agencies
  Some university, not-for-profit, commercial
  Selected local federal projects
NC Local GIS Landscape
100 counties, 92 with GIS
80 counties with high resolution orthophotography
65+ counties with unique map servers
Growing number of municipal systems
Value: $162 million plus investment
NC OneMap Initial Data Layers Produced by Cities and Counties
[Chart: percentage of respondents by data layer, axis 0% to 80%; bar values not recoverable. Layers shown: Ortho, Cadastral, Roads, Municipal Bnd., County Bnd., ETJs, Surface Waters, Elevation, Land Use, Airports, Schools, Universities, Hospitals, Storm Surge, Police Stations, Fire Stations, Landfills, Watersheds, Wetlands, Hazardous Disposal Sites, Building Footprints, Future Land Use, Water Lines, Sewer Lines]
Vector data (scale, accuracy, currency, etc.)
Time series – vector data: Parcel Boundary Changes 2001-2004, North Raleigh, NC
Aerial imagery (image resolution, etc.)
Time series – Ortho imagery: Vicinity of Raleigh-Durham International Airport 1993-2002
Tabular data (combined with vector data)
Today’s geospatial data as tomorrow’s cultural heritage
Risks to Digital Geospatial Data
Producer focus on current data
  Time-versioned content generally not archived
Future support of data formats in question
  Vast range of data formats in use, many of them complex
Shift to “streaming data” for access
  Archives have been a by-product of providing access
Preservation metadata requirements
  Descriptive, administrative, technical, DRM
Geodatabases
  Complex functionality
GIS Software Used – Local Agencies
[Chart: percentage of respondents by GIS software, axis 0% to 70%; bar values not recoverable. Categories shown: ArcGIS (ESRI), ArcInfo (ESRI), ArcView 8.x (ESRI), ArcView 3.x (ESRI), ArcIMS (ESRI), GenaMap, IMAGINE, Intergraph, MapInfo, Understanding Systems, Other, Not Sure]
Source: NC OneMap Data Inventory 2004
Earlier NCSU Acquisition Efforts
NCSU University Extension project 2000-2001
Target: County/city data in eastern NC
“Digital rescue” not “digital preservation”
Project learning outcomes
  Confirmed concerns about long term access
  Need for efficient inventory/acquisition
  Wide range in rights/licensing
  Need to work within statewide infrastructure
  Acquired experience; unanticipated collaboration
Exploring Approaches to Sharing Data: County and City GIS Directories
Processing Ingested Data
e.g. Testing for data gaps in county orthophoto sets
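One way such a gap test might look, as a minimal sketch: compare the tile IDs listed in a county's tile index against the files actually delivered. The tile naming scheme ("r1c2"), the file names, and the function name are hypothetical and not taken from the project.

```python
from pathlib import PurePath


def find_missing_tiles(expected_ids, delivered_files):
    """Return the expected tile IDs that have no delivered file, sorted."""
    # Compare on the file stem, case-insensitively, so "R1C1.tif" matches "r1c1".
    delivered = {PurePath(name).stem.lower() for name in delivered_files}
    return sorted(t for t in expected_ids if t.lower() not in delivered)


# Hypothetical tile index: a 3 x 3 grid with IDs such as "r1c2".
expected = [f"r{r}c{c}" for r in range(1, 4) for c in range(1, 4)]

# Hypothetical delivery with one tile absent.
delivered = [
    "R1C1.tif", "r1c2.tif", "r1c3.tif",
    "r2c1.tif", "r2c3.tif",
    "r3c1.tif", "r3c2.tif", "r3c3.tif",
]

missing = find_missing_tiles(expected, delivered)
print("Missing tiles:", missing)
```

A name-based check like this only finds absent files; tiles that are present but blank or corrupt would need an image-level test.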
Content Identification and Selection
Work from NC OneMap Data Inventory
Combine with inventory information from various state agencies and from previous NCSU efforts
Develop methodology for selecting from among “early,” “middle,” and “late” stage products
Develop criteria for time series development
Investigate use of emerging Open Geospatial Consortium technologies in data identification
Content Acquisition
Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”)
Secure individual agreements (the “quilt”)
Investigate use of OGC technologies in capture
Explore use of METS as a metadata wrapper
Ingest FGDC metadata; crosswalk to MODS? PREMIS?
Maybe METS DRM short term; GeoDRM long term
Consider links to services; version management
Get the geospatial community to tackle the content packaging problem (maybe MPEG 21?)
Partnership Building
Work within context of the NC OneMap initiative
  State, local, federal partnership
  State expression of the National Map
  Defined characteristic: “Historic and temporal data will be maintained and available”
Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees
Seek external partners
  National States Geographic Information Council
  FGDC Historical Data Committee
  … more
Content Retention and Transfer
Ingest into DSpace
  Explore how geospatial content interacts with existing digital repository software environments
Investigate re-ingest into a second platform
  Challenge: keep the collection repository-agnostic
Start to define format migration paths
  Special problem: geodatabases
Pursue long term solution
  Roles of data producing agencies, state agencies; NC OneMap; NCSU
Rights Issues
Various interpretations of public records law
  53.9% of local NC agencies charge for data
  43.7% of local NC agencies restrict redistribution
Desire for downstream control of data
  Disclaimer clickthrough; liability concerns
  Filtered locations/individuals; post 9/11 issues
  Restrictions on redistribution; commercial resale
Web services area in “Wild West” stage
  Both content and technical agreements
  GeoDRM initiative in the works
Big Challenges
Format migration paths
Management of data versions over time
Preservation metadata
Harnessing geospatial web services
Preserving cartographic representation
Keeping content repository-agnostic
Preserving geodatabases
More …
Vector Data Format Issues
Vector data much more complicated than image data
‘Archiving’ vs. ‘Permanent access’
  An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access
  Piles of XML need to be widely understood piles
  GML: need widely accepted application schemas (like OSMM?)
The Geodatabase conundrum
  Export feature classes, and lose topology, annotation, relationships, etc.
  … or use the Geodatabase as the primary archival platform (some are now thinking this way)
Vector Data Format Options
Option A: use an open format and have a really unfortunate transformation and limited vendor support for the output object
Option B: use closed format but retain the original content and count on short- and medium-term vendor support.
Option C: do both to buy time and look for an open, ASCII solution. (watch GML activity)
No sweet spot, just an evolving and changing mix of flawed options that are used in combination.
Managing Time-versioned Content
Many local agency data layers continuously updated
E.g., some county cadastral data updated daily—older versions not generally available
Individual versioned datasets will wander off from the archive
How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?
How do we certify concurrency and agreement between the metadata and the data?
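One partial answer to the agreement question is to record a checksum of the data in its metadata record and recompute it whenever the two are reunited. The sketch below assumes SHA-256 and a plain dictionary as the record; the field names and sample bytes are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the dataset's bytes."""
    return hashlib.sha256(data).hexdigest()


def make_record(version_label: str, data: bytes) -> dict:
    """Build a minimal (hypothetical) metadata record that fingerprints the data."""
    return {"version": version_label, "size": len(data), "sha256": sha256_hex(data)}


def agrees(record: dict, data: bytes) -> bool:
    """True if the data in hand is the exact byte stream the record describes."""
    return record["size"] == len(data) and record["sha256"] == sha256_hex(data)


# Hypothetical stand-ins for two versions of a parcel layer.
v1 = b"parcel layer, version of 2004-06-01"
v2 = b"parcel layer, version of 2004-06-02"

record = make_record("2004-06-01", v1)
print(agrees(record, v1))  # data matches its record
print(agrees(record, v2))  # a later version does not
```

A matching checksum shows only that the bytes are the ones the record was written for; it says nothing about whether the descriptive content of the metadata is accurate or current.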
Managing Time-versioned Content
Can we manage the relationship loosely using a persistent identifier link to a parent object?
[Diagram: several “version” objects, each linked through a Persistent ID Resolver to a Parent Object Manager]
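The loose relationship in the diagram could be modeled roughly as follows: each version carries only the persistent ID of its parent, and a resolver maps that ID to a parent object that knows every version. This is a sketch under stated assumptions; the class names, the identifier syntax, and the locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParentObject:
    """Tracks every known version of one dataset (hypothetical structure)."""
    persistent_id: str
    versions: dict = field(default_factory=dict)  # ISO date -> storage location

    def add_version(self, iso_date: str, location: str) -> None:
        self.versions[iso_date] = location

    def current(self):
        """Return (date, location) of the newest version."""
        latest = max(self.versions)  # ISO 8601 dates sort correctly as strings
        return latest, self.versions[latest]


class Resolver:
    """Maps a persistent ID to its parent object."""

    def __init__(self):
        self._parents = {}

    def register(self, parent: ParentObject) -> None:
        self._parents[parent.persistent_id] = parent

    def resolve(self, persistent_id: str) -> ParentObject:
        return self._parents[persistent_id]


# Hypothetical identifier and locations.
parcels = ParentObject("example-id:nc/county/parcels")
parcels.add_version("2003-01-15", "archive/parcels_2003-01-15.zip")
parcels.add_version("2004-06-01", "archive/parcels_2004-06-01.zip")

resolver = Resolver()
resolver.register(parcels)

# A copy found "in the wild" needs only the parent ID to reach current information.
print(resolver.resolve("example-id:nc/county/parcels").current())
```

The design choice here is that versions never point at each other, so a stray copy stays useful as long as the parent ID it carries can still be resolved.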
Preservation Metadata Issues
FGDC Metadata
  Many flavors, incoming metadata needs processing
  Cross-walk elements to PREMIS, MODS?
Metadata wrapper
  METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
  Need a geospatial industry solution for the ‘METS-like problem’
  GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)
Harnessing Geospatial Web Services
Automated content identification
  ‘capabilities files,’ registries, catalog services
WMS (Web Map Service) for batch extraction of image atlases
  last ditch capture option
  preserve cartographic representation
  retain records of decision-making process
  … feature services (WFS) later.
Rights issues in the web services space are ambiguous
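Batch extraction of an image atlas from a WMS server could start by splitting an area of interest into a grid and building one GetMap request per cell. The sketch below only builds the request URLs, using the standard WMS 1.1.1 GetMap parameters. The endpoint, layer name, and extent are hypothetical, and any real harvest would first need the rights questions above settled with the data producer.

```python
from urllib.parse import urlencode


def getmap_urls(base_url, layers, bbox, rows, cols, width_px=1024, srs="EPSG:4326"):
    """Build one WMS 1.1.1 GetMap URL per cell of a rows x cols grid over bbox.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    # Keep the image aspect ratio equal to the cell's to avoid stretching.
    height_px = max(1, round(width_px * dy / dx))
    urls = []
    for r in range(rows):
        for c in range(cols):
            cell = (minx + c * dx, miny + r * dy,
                    minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": ",".join(layers),
                "STYLES": "",  # empty value requests the server's default styles
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in cell),
                "WIDTH": width_px,
                "HEIGHT": height_px,
                "FORMAT": "image/png",
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Hypothetical endpoint, layer, and extent.
urls = getmap_urls("http://example.org/wms", ["parcels"],
                   (-79.0, 35.5, -78.0, 36.5), rows=2, cols=2)
for url in urls:
    print(url)
```

Because the images come back with the server's own styling, this route captures the cartographic representation as published, which is why it works as a capture option of last resort.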
Preserving Cartographic Representation
The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:
Intellectual choices about symbolization, layer combinations
Data models, analysis, annotations
Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration
Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem
Preserving Cartographic Representation
Image-based approaches
  Generate images using Map Book or similar tools
  Harvest existing atlas images
  Capture atlases from WMS servers
  Export ‘layouts’ or ‘maps’ to image
Vector-based approaches
  Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2)
  Archive and upward-migrate existing files .avl, .apr, .lyr, .mxd, etc.
  SVG, VML or other XML approaches
Other?
Repository Architecture Issues
Interest in how geospatial content interacts with widely available digital repository software
Focus on salient, domain-specific issues
Challenge: remain repository agnostic
  Avoid “imprinting” on repository software environment
  Preservation package should not be the same as the ingest object of the first environment
  Tension between exploiting repository software features vs. becoming software dependent
Preserving Geodatabases
Spatial databases in general vs. ESRI Geodatabase “format”
Not just data layers and attributes—also topology, annotation, relationships, behaviors
ESRI Geodatabase archival issues
  XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
Growing use of geodatabases by municipal, county agencies
Some looking to Geodatabase as archival platform (in addition to feature class export)
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU Libraries