Geographic Text Search
Corporate Proprietary, Copyright 1999-2003, MetaCarta, Inc.

On building a high performance gazetteer database

Amittai Axelrod, MetaCarta Inc.

Thanks to

Keith Baker

Kenneth Baker

Michael Bukatin

András Kornai

Plan of the talk

• Database background

• Relating geographic names and features

• Handling ambiguities and inconsistencies in geographic names

• Classification and storage system for geographic features

Databases

• No DB (faking it with flat files) -- clumsy

• Record-oriented -- still runs the world

• Relational -- making headway

• Object-oriented -- still very academic

• For MetaCarta GazDB, the relational approach made the most sense:
  • Overlapping records (McKinley/Denali)
  • Need for frequent updates of subparts of records
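Both reasons can be seen in a minimal relational sketch. This is not the GazDB schema, which the slides do not show: the tables `feature` and `name`, the helpers `build_db`, `locate` and `move_feature`, the use of SQLite, and the rounded summit coordinates are all assumptions for illustration. Two overlapping names share one feature row, and changing the coordinates updates one row without rewriting either name.

```python
import sqlite3

# Hypothetical minimal schema: one row per feature, one row per name.
# Overlapping records such as "Denali" and "Mount McKinley" become two
# name rows that share a single feature_id.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    lat        REAL NOT NULL,  -- decimal degrees
    lon        REAL NOT NULL
);
CREATE TABLE name (
    name_id    INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    name       TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Rounded summit coordinates, for illustration only.
    conn.execute("INSERT INTO feature VALUES (1, 63.07, -151.01)")
    conn.executemany(
        "INSERT INTO name (feature_id, name) VALUES (?, ?)",
        [(1, "Denali"), (1, "Mount McKinley")],
    )
    conn.commit()
    return conn


def locate(conn: sqlite3.Connection, name: str):
    # Join from a name to the feature it labels.
    return conn.execute(
        "SELECT f.lat, f.lon FROM name n JOIN feature f USING (feature_id) "
        "WHERE n.name = ?",
        (name,),
    ).fetchone()


def move_feature(conn: sqlite3.Connection, feature_id: int, lat: float, lon: float) -> None:
    # A partial update: only the feature row changes; no name row is touched.
    with conn:
        conn.execute(
            "UPDATE feature SET lat = ?, lon = ? WHERE feature_id = ?",
            (lat, lon, feature_id),
        )


if __name__ == "__main__":
    conn = build_db()
    move_feature(conn, 1, 63.069, -151.007)
    print(locate(conn, "Denali"), locate(conn, "Mount McKinley"))
```

In a record-oriented layout, the same correction would have to be repeated in every record that carries one of the names.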

Gazetteer production process

Conversion scripts

• Enforce uniform structure on the data

• Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …)

• Configuration required once per source

• Load data into GazDB

• Combination of Perl and SQL
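The lat/lon normalization step can be illustrated with a small converter. The slides use Perl for this; the sketch below is in Python and assumes a hypothetical source that writes coordinates as space-separated degrees, minutes, seconds and a hemisphere letter (e.g. `42 21 30 N`). The pattern `DMS` and the function `dms_to_decimal` are invented for this example; each real source would need its own pattern, which is the once-per-source configuration mentioned above.

```python
import re

# Hypothetical source convention: "DD MM SS H", e.g. "42 21 30 N".
DMS = re.compile(r"^\s*(\d{1,3})\s+(\d{1,2})\s+(\d{1,2}(?:\.\d+)?)\s*([NSEW])\s*$")


def dms_to_decimal(text: str) -> float:
    match = DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    deg, minutes, seconds, hemisphere = match.groups()
    if int(minutes) >= 60 or float(seconds) >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if hemisphere in ("S", "W") else value
```

For example, `dms_to_decimal("71 3 36 W")` gives approximately -71.06, since 3/60 + 36/3600 = 0.06.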

Relating features and names

Other tables used in GazDB

• Population

• Elevation

• Language

• Feature type

• Source/versioning info

• Temporal extent

• Hierarchical information

• Confidence

• Comments

• Change logs (full auditing)
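The slides do not say how full auditing is implemented. One common way to get a change log that the application cannot forget to write is a database trigger; the sketch below assumes SQLite, and the table names `population` and `change_log`, the trigger `population_audit`, and the helper `demo` are hypothetical. `population` stands in for any of the attribute tables listed above.

```python
import sqlite3

# Hypothetical attribute table plus a log that records every update to it.
SCHEMA = """
CREATE TABLE population (
    feature_id INTEGER PRIMARY KEY,
    value      INTEGER NOT NULL
);
CREATE TABLE change_log (
    log_id     INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    feature_id INTEGER NOT NULL,
    old_value  TEXT,
    new_value  TEXT,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Fires after every UPDATE and stores the before/after values.
CREATE TRIGGER population_audit AFTER UPDATE ON population
BEGIN
    INSERT INTO change_log (table_name, feature_id, old_value, new_value)
    VALUES ('population', OLD.feature_id, OLD.value, NEW.value);
END;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO population VALUES (1, 1000)")
    conn.execute("UPDATE population SET value = 1200 WHERE feature_id = 1")
    # The TEXT columns store the logged integers in text form.
    return conn.execute(
        "SELECT table_name, feature_id, old_value, new_value FROM change_log"
    ).fetchall()
```

Keeping old and new values in one generic log table means a single query can reconstruct the history of any feature, at the cost of storing values as text.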

Geographic names

• Internationalization
  • Full Unicode (UTF-8) support
  • Maintain detailed language information (SIL)

• Name resolution
  • Canonical form (16-bit)
  • Display form (8-bit)
  • Search form (6-bit)

• Authoritativeness

• Explicitness
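The three name forms can be sketched as successively narrower encodings of one canonical name. The slides give only the bit widths, so the exact mappings here are assumptions: the canonical form is taken to be the full Unicode string (the slide's "16-bit" suggests a 16-bit encoding such as UCS-2), the display form keeps what fits in Latin-1 as one example of an 8-bit code page, and the search form reduces the name to uppercase ASCII letters, digits and space (37 symbols, which fit in 6 bits). The functions `strip_marks`, `display_form` and `search_form` are hypothetical.

```python
import unicodedata


def strip_marks(text: str) -> str:
    # Decompose (NFKD) and drop combining marks: "São" -> "Sao".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def display_form(canonical: str) -> str:
    # Keep characters that exist in Latin-1 (8-bit); strip accents from the
    # rest, and replace anything still unencodable with "?".
    out = []
    for ch in unicodedata.normalize("NFC", canonical):
        try:
            ch.encode("latin-1")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(strip_marks(ch).encode("latin-1", "replace").decode("latin-1"))
    return "".join(out)


SEARCH_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")


def search_form(canonical: str) -> str:
    # Uppercase, accent-free, with punctuation turned into single spaces.
    base = strip_marks(canonical).upper()
    kept = "".join(ch if ch in SEARCH_ALPHABET else " " for ch in base)
    return " ".join(kept.split())
```

For example, `display_form("Hà Nội")` keeps the Latin-1 "à" but reduces "ộ" to "o", while `search_form("Saint-Étienne")` yields `SAINT ETIENNE`, so differently punctuated or accented spellings meet on one search key.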

Updating a name in the GazDB
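The diagram for this slide did not survive extraction, so the actual GazDB procedure is unknown. As a hedged illustration only, one way to update a name while keeping its history (and the "Authoritativeness" flag from the names slide) is to demote the old name and insert the new one inside a single transaction. The table `name`, its `authoritative` column, and the functions `rename_feature` and `names` are hypothetical, and SQLite is an assumption.

```python
import sqlite3

# Hypothetical name table: each feature keeps every name it has had,
# and the current official name is flagged as authoritative.
SCHEMA = """
CREATE TABLE name (
    name_id       INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL,
    name          TEXT NOT NULL,
    authoritative INTEGER NOT NULL DEFAULT 0
);
"""


def rename_feature(conn: sqlite3.Connection, feature_id: int, new_name: str) -> None:
    # "with conn" commits both statements together or rolls both back.
    with conn:
        conn.execute(
            "UPDATE name SET authoritative = 0 WHERE feature_id = ?", (feature_id,)
        )
        conn.execute(
            "INSERT INTO name (feature_id, name, authoritative) VALUES (?, ?, 1)",
            (feature_id, new_name),
        )


def names(conn: sqlite3.Connection, feature_id: int):
    return conn.execute(
        "SELECT name, authoritative FROM name WHERE feature_id = ? ORDER BY name_id",
        (feature_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO name (feature_id, name, authoritative) "
        "VALUES (1, 'Mount McKinley', 1)"
    )
    conn.commit()
    rename_feature(conn, 1, "Denali")
    print(names(conn, 1))
```

The old name stays searchable as a variant instead of being deleted. A fuller version would promote an existing variant row rather than inserting a duplicate when the new name is already on file.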

Geographic features

• Spatial representations
  • Point, line, area, …

• Functional classes
  • Building, field, campus, city, …

• Administrative types
  • Nation, province, county, international org, …
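The three classification axes above are independent: a city may be stored as a point or an area, and may or may not carry an administrative type. A hedged sketch that keeps them as separate attributes, with a parent link standing in for the "Hierarchical information" table, follows. The names `Spatial`, `ADMIN_TYPES`, `Feature` and `path`, and the "region" functional class in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Spatial(Enum):
    POINT = "point"
    LINE = "line"
    AREA = "area"


# Hypothetical controlled vocabulary for the administrative axis.
ADMIN_TYPES = {"nation", "province", "county", "international org"}


@dataclass
class Feature:
    name: str
    spatial: Spatial                    # how the feature is represented
    functional_class: str               # what it is: "building", "city", ...
    admin_type: Optional[str] = None    # its administrative role, if any
    parent: Optional["Feature"] = None  # containing feature (hierarchy)

    def __post_init__(self):
        if self.admin_type is not None and self.admin_type not in ADMIN_TYPES:
            raise ValueError(f"unknown administrative type: {self.admin_type!r}")

    def path(self) -> List[str]:
        # Follow parent links to the root, then return outermost first.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))
```

Usage: with `us = Feature("United States", Spatial.AREA, "region", "nation")`, `ma = Feature("Massachusetts", Spatial.AREA, "region", "province", parent=us)` and `cam = Feature("Cambridge", Spatial.POINT, "city", parent=ma)`, the call `cam.path()` returns `["United States", "Massachusetts", "Cambridge"]`.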

Export scripts

• Read GazDB

• Select which fields to include in custom output

• Create .gbdm (MetaCarta format) binaries

• Combination of Perl and SQL

• Not yet general across binary output formats
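The .gbdm layout is not described in the slides, so the sketch below does not attempt it. It shows only the general idea of a field-selectable binary export, using a made-up layout: little-endian, unpadded numeric fields from a hypothetical `FIELD_CODES` table, plus a 2-byte length-prefixed UTF-8 name. The functions `pack_row` and `export_rows` are hypothetical.

```python
import struct

# Hypothetical layout, NOT the .gbdm format: little-endian, no padding,
# fixed-size numeric fields plus a length-prefixed UTF-8 name.
FIELD_CODES = {"feature_id": "i", "lat": "d", "lon": "d", "population": "q"}


def pack_row(row, fields) -> bytes:
    parts = []
    for field in fields:
        if field == "name":
            data = row[field].encode("utf-8")
            # 2-byte length prefix, so names are limited to 65535 bytes.
            parts.append(struct.pack("<H", len(data)) + data)
        else:
            parts.append(struct.pack("<" + FIELD_CODES[field], row[field]))
    return b"".join(parts)


def export_rows(rows, fields) -> bytes:
    # "fields" is the per-output selection of which columns to include.
    return b"".join(pack_row(row, fields) for row in rows)
```

The rows can be any mapping, including `sqlite3.Row` objects from a query with `conn.row_factory = sqlite3.Row`. Selecting `["feature_id", "lat", "lon"]` gives 4 + 8 + 8 = 20 bytes per record; supporting another output format would mean swapping `pack_row`, which is roughly the generality the last bullet says is still missing.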

Conclusions

• Accept multiple sources (only configure once per source)

• Fast loading of large datasets (1 million entries per hour on a Linux desktop)

• Simple update procedure

• Output large custom binary gazetteers for different purposes at extreme speeds (1 million entries per minute)
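As a worked check on the two rates above (plain arithmetic on the slide's figures), the per-second throughputs are:

$$\frac{10^6\ \text{entries}}{3600\ \text{s}} \approx 278\ \text{entries/s}, \qquad \frac{10^6\ \text{entries}}{60\ \text{s}} \approx 16{,}667\ \text{entries/s}$$

so, on these numbers, export runs about 60 times faster than loading. The slide names the hardware only for the loading figure, so the ratio is indicative.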

