Geocoding with ArcView for H2ONet

Date post: 06-Mar-2015
Upload: julian-santa-rita

Files required to get started:

1. Usage Spreadsheet/Database from the Utility Provider
   - Has customer usage
   - Has customer address (the more detailed the better)
   - Has customer type (Commercial, Industrial, Residential)

2. Utility Network Model
   - Digital drawing of the utility delivery network
   - Points (nodes) at junctions of interest with unique identifiers

3. Street Centerline File (the more comprehensive the better)
   - Has street names and preferably address ranges, zip codes, city names, etc.

Geocoding and preparation of the geocoded information for modeling is composed of several steps and can be broken down into two phases: Mapping and Database Manipulation. You will often need to go back and forth between both phases to comprehensively geocode.

First things first:

Get an understanding of the information you are handling by looking at the tables and determining which fields are important. Often when you first receive a database of customer usage there may be multiple billing records for the same address, or addresses that are too colloquial to be accurately mapped. The first thing you should do is make a backup copy of the original unmodified file before you scrub the data and refine it down.

To scrub the data, use MS Access to reduce the database to a single record per address. Create a query that finds records with duplicate addresses and combines them into a single record with a summation of the usages. Below is an example of a house that has had many residents in the course of the year.

This: Flippin Usage 2007

Acct_No   Name               Service_Addr        Acct_Type  Meter_Size  Reading_Type  Sewer_Code  Area_Code
10045100  YORK, SHERYL       100 NO. 5TH STREET  R          7           G             1           I
10045101  GLASSCOCK, REGINA  100 NO. 5TH STREET  R          7           G             1           I
10045102  DAUGHERTY, JOHN    100 NO. 5TH STREET  R          7           G             1           I

Monthly usage for the same accounts:

Acct_No   Oct2006  Nov2006  Dec2006  Jan2007  Feb2007  Mar2007  Apr2007  May2007  Jun2007  Jul2007  Aug2007  Sep2007
10045100     6650     4980     2890        0        0     3550     5730     4750     5740     5900     6060     6380
10045101        0        0     8110      410        0        0        0        0        0        0        0        0
10045102        0        0        0        0        0     2020        0        0        0        0        0        0

Turns into this: Service Addr Summed

FirstOfAcct_No  FirstOfName   Service_Addr        FirstOfAcct_Type  FirstOfMeter_Size  FirstOfReading_Type  FirstOfSewer_Code  FirstOfArea_Code
10045100        YORK, SHERYL  100 NO. 5TH STREET  R                 7                  G                    1                  I

Monthly sums for the same record:

FirstOfAcct_No  SumOfOct2006  SumOfNov2006  SumOfDec2006  SumOfJan2007  SumOfFeb2007  SumOfMar2007  SumOfApr2007  SumOfMay2007  SumOfJun2007  SumOfJul2007  SumOfAug2007  SumOfSep2007
10045100                6650          4980         11000           410             0          5570          5730          4750          5740          5900          6060          6380

Do this for every set of records corresponding to the same address. That will reduce the total number of records and give you the bigger picture.
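If you want to check the summing logic outside of Access, the query can be sketched as follows. This is only a minimal sketch, not the actual Access query: it uses Python's built-in sqlite3 module as a stand-in for MS Access, loads the three example records with just three of the twelve month columns, and groups them by Service_Addr. The table name `usage` is an assumption, and MIN() stands in for Access's First() aggregate, which SQLite does not have.

```python
import sqlite3

# The three example records for 100 NO. 5TH STREET.
# Only Dec2006, Jan2007 and Mar2007 are included to keep the sketch short.
ROWS = [
    (10045100, "YORK, SHERYL", "100 NO. 5TH STREET", 2890, 0, 3550),
    (10045101, "GLASSCOCK, REGINA", "100 NO. 5TH STREET", 8110, 410, 0),
    (10045102, "DAUGHERTY, JOHN", "100 NO. 5TH STREET", 0, 0, 2020),
]


def sum_by_address(rows):
    """Collapse billing records that share an address into one summed record."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE usage (Acct_No INTEGER, Name TEXT, Service_Addr TEXT, "
        "Dec2006 INTEGER, Jan2007 INTEGER, Mar2007 INTEGER)"
    )
    con.executemany("INSERT INTO usage VALUES (?, ?, ?, ?, ?, ?)", rows)
    # SQLite has no First(). With MIN(Acct_No), SQLite takes the bare Name
    # column from the row that holds the lowest account number.
    result = con.execute(
        "SELECT MIN(Acct_No), Name, Service_Addr, "
        "SUM(Dec2006), SUM(Jan2007), SUM(Mar2007) "
        "FROM usage GROUP BY Service_Addr"
    ).fetchall()
    con.close()
    return result


summed = sum_by_address(ROWS)
print(summed)
```

The three sums agree with the December, January and March columns of the summed record above (11000, 410 and 5570).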

R_Unmappables

Acct      Name           Address       Type  FirstOfMet  FirstOfRea  FirstOfSew  FirstOfAre  Dec   Jun
30250100  TOTT, CHARLEY  P.O. BOX 104  1     0           G           0           I           2220  2400

Further ‘cleaning’ can still be done during this discovery phase. Records like the one above have inadequate geographic information, and looking up Charles Ott in the phone book may still turn up only the P.O. Box. Search for records without numbered addresses, P.O. Boxes and the other obvious unmappables. These records will be impossible to map accurately, but they are not discarded. If you are able to track down the real addresses, that is the most accurate option. Otherwise, collect them into a separate database, remove them from your primary database, and set them aside for use later in the process.
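The search for unmappables can also be scripted. The sketch below is one possible way to do it, assuming the records have been exported as dictionaries with a `Service_Addr` field. The two patterns (a P.O. Box test and a "starts with a house number" test) are assumptions about what counts as unmappable, and you will likely need to extend them for your own data.

```python
import re

# Matches "P.O. BOX", "PO BOX", "P O BOX" and similar (case-insensitive).
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)
# Assumption: a mappable street address starts with a house number.
HOUSE_NUMBER = re.compile(r"^\s*\d+")


def is_unmappable(address):
    """True for blank addresses, P.O. Boxes, and addresses with no leading number."""
    if not address or not address.strip():
        return True
    if PO_BOX.search(address):
        return True
    return HOUSE_NUMBER.match(address) is None


def split_records(records, address_field="Service_Addr"):
    """Split records into (mappable, unmappable) lists. Nothing is discarded."""
    mappable, unmappable = [], []
    for rec in records:
        target = unmappable if is_unmappable(rec.get(address_field, "")) else mappable
        target.append(rec)
    return mappable, unmappable


records = [
    {"Acct_No": 10045100, "Service_Addr": "100 NO. 5TH STREET"},
    {"Acct_No": 30250100, "Service_Addr": "P.O. BOX 104"},
]
mappable, unmappable = split_records(records)
print(len(mappable), "mappable,", len(unmappable), "unmappable")
```

Keeping the unmappable list rather than deleting those records matters because their usage is dispersed across the network nodes later on.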

Your database should now be tidy enough for Geocoding. At this point you may wish to separate the database by grouping usage types into their own tables. It is common to map Residential use separately from Industrial and Commercial usage. You might also have already identified specific parts of the database that are of interest, like a specific set of months, and may remove portions of the database that you are not going to use.

Providing the Framework

You should now have a tidy set of records which you would like to place into the real world, but you do not yet have a reference table to map them against. You will need to create an address locator from the street centerline file I mentioned you would need. The address locator can be made in either ArcMap or ArcCatalog. You will need to look at the records of the street centerline file to determine how it is formatted. It will be either file-based or a geodatabase, and it may include information about how each side of the street is numbered, zip codes, prefixes, and so on, all of which is very useful if you have it.

From within ArcCatalog there is a category entitled Address Locators. Easy enough. When you click it, it will display a list of available locators as well as an option to create a new address locator. You will most likely be creating a new one. Selecting that option brings up a screen where you select the style of address locator to create. This will depend on the information in the street centerline file and whether the centerline information is a geodatabase or file-based. Properly completing this step relies on knowing your source data well enough to correlate it to the necessary fields in the address locator setup screen that follows. Referencing the centerline file and accurately linking fields is all that is necessary, but you will find it helpful to use this panel to modify the output fields and set some basic matching options. Spelling sensitivity and percentages for candidate matching are set up here and can be adjusted later if necessary.

Once you are satisfied that it is properly filled out, continue and confirm creation of the Locator.

Geocoding

You have enough to start geocoding in ArcView now! You will create a new map with aerial photography, street centerline file, and the utility network model. City limits or other boundaries may also be included. You might have to do some spatial adjustment or transformation to get everything lining up, but once you do it should look something like this.

You will now start geocoding the addresses from your usage tables.

Press “Tools: Geocoding: Geocode Addresses…”

You will be prompted to select an address locator. Navigate to the one you’ve prepared and continue.

You will now be prompted to Select the Usage Table and define the field that contains the address. You will also need to tell ArcView where to save the Shapefile that will be created as you geocode.

Really Geocoding

Once you’ve pressed OK, the first geocoding match will be done. Any addresses that match between the two sets of data will be mapped and given coordinates. Some will tie, and some will be unmatched. In cases where the utility database or centerline file has not been prepared with geocoding in mind there may be a very low number of automatic matches initially. You should experiment with the spelling/candidate/match percentage requirements until you get more matches, but be careful not to become too lax with the percentages or you will allow nodes to be placed incorrectly. Educated decision-making combined with trial and error is the key here.
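To see why loosening the percentages produces both more matches and more ties, it can help to play with a toy scorer. The sketch below is not ArcView's matching algorithm. It uses Python's difflib similarity ratio as a stand-in score, and the `min_score` parameter plays the role of the minimum match percentage. The street names are taken from the examples in this document.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Similarity between two address strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_status(address, candidates, min_score):
    """Return (status, best_candidates), where status is M, T or U."""
    scored = [(score(address, c), c) for c in candidates]
    passing = [(s, c) for s, c in scored if s >= min_score]
    if not passing:
        return "U", []  # nothing reaches the threshold: unmatched
    best = max(s for s, _ in passing)
    winners = [c for s, c in passing if s == best]
    # One clear winner is a match; several equally good winners are a tie.
    return ("M" if len(winners) == 1 else "T"), winners


centerlines = ["ALFORD ST", "ALFORD DR", "MAIN ST"]
for threshold in (0.9, 0.75):
    print(threshold,
          match_status("ALFORD", centerlines, threshold),
          match_status("MAIN STREET", centerlines, threshold))
```

With the strict threshold neither address matches. With the looser one, "MAIN STREET" finds "MAIN ST", but "ALFORD" ties between the Street and the Drive, which is exactly the kind of record you then have to resolve interactively.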

Initial (left) and lowered-sensitivity (right) results. You will probably not get them all to match initially. In my example I only had two matches until I changed the spelling sensitivity. The usage database’s address field is poorly formatted and inconsistent, so this is a very low percentage of initial matches.

Some of them will also tie, and many will be unmatched. You will need to match interactively to make up the difference. Start with the tied matches. These will be the easiest to complete. You will choose a record and then select its best fit. This example has a simple decision between Alford Street and Alford Drive. The usage table makes no note of which it is in this example, although usually it will offer the information you need. So where can we look next?

Often the Centerline File’s records will be critical in deciding what goes where. Below you can see that the centerline file lists an Alford Street in Flippin (the city we’re coding) but that Alford Drive is in Bull Shoals. Since we’re mapping Flippin, not Bull Shoals, the usage database must mean Alford Street.

Unmatched Records in the usage database are often hindered from matching because they simply lack a suffix, or are otherwise only minimally different from the corresponding address in the centerline file.

Sometimes, however, it will take more legwork to locate where the addresses are supposed to go because the two tables are not reconciling. In the table above you can see that there are several similar addresses that will not even bring up candidates. Lowering the candidate score may bring up some potential matches. In most cases you will have to manually match the troublesome usage records by changing some information in the record or hand coding. If you can match all of your usage with this automatic process, more power to you. Many large cities with well-managed utilities will have a great number of matches and a small enough percentage of unmatched records that the unmatched can be dealt with by dispersal (more on that later).

If that is not the case, you’re stuck hand coding. To hand code you will need to find out where the address is by sleuthing. In the example above there are several addresses that cannot find a match even though the centerline file shows Main Street in Flippin with the proper range. In this case you will need to locate Main Street in ArcView and write a set of coordinates down.

This excerpt from the centerline file lists the destination we need, but won’t automatically match our usage record. We will need to locate an X and Y coordinate for the section of the street we are matching to. If we select the record we are trying to match and close the table, that area on our display map will be highlighted.

Mousing over the street’s location will display its coordinates in the lower right corner of the screen. Write these down, and proceed to start an editing session and open the usage table.

Here you can see in the status field whether each record is matched (M = matched, U = unmatched, T = tied), as well as the score it was matched with, X, Y, and a few other fields. We are most interested in the Address, Status, X, and Y fields. The X and Y fields are where we can put the coordinates we wrote down for the address we looked up. The status field needs to be changed from U to M when you’ve matched it up. Generalizing may be done at this point, i.e. using the same coordinate for all of the addresses listed in the 1000 block of Main Street (or a larger range of addresses, depending on your pipe network and its relation to the layout of the streets). Save edits often.
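If you have many records in the same block, the generalizing step can be sketched in code rather than typed by hand. This is only an illustration of the idea, assuming the geocoding result table has been read into dictionaries with `Address`, `Status`, `X` and `Y` keys. The function name, the sample records, and the coordinates are all hypothetical placeholders, not values from the Flippin data.

```python
import re


def hand_code_block(records, street, block_start, block_end, x, y):
    """Give every unmatched record on `street` within the block the same X, Y."""
    updated = 0
    for rec in records:
        parsed = re.match(r"\s*(\d+)\s+(.*)", rec["Address"])
        if rec["Status"] != "U" or parsed is None:
            continue  # leave matched, tied, and unnumbered records alone
        number = int(parsed.group(1))
        rest = parsed.group(2).strip().upper()
        if block_start <= number <= block_end and rest == street.upper():
            rec["X"], rec["Y"] = x, y
            rec["Status"] = "M"  # change U to M once coordinates are filled in
            updated += 1
    return updated


# Hypothetical records and placeholder coordinates, for illustration only.
table = [
    {"Address": "1004 MAIN STREET", "Status": "U", "X": None, "Y": None},
    {"Address": "1020 MAIN STREET", "Status": "U", "X": None, "Y": None},
    {"Address": "2200 MAIN STREET", "Status": "U", "X": None, "Y": None},
]
count = hand_code_block(table, "MAIN STREET", 1000, 1099, 100.0, 200.0)
print(count, [rec["Status"] for rec in table])
```

The record at 2200 Main Street stays unmatched because it falls outside the block, so it still needs its own coordinates or ends up with the unmatchable stragglers.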

Coding in this way will not create the spatially correct point (that IS generated by the automatic geocoding process we started with) but it will allow us to do so when we’re satisfied that the table has been filled out to the best of our ability. Inevitably you will still have some unmatchable stragglers. Now it is time to re-clean the data and go ahead with mapping the hand-input address/coordinate matches.

Save edits and close your editing session.

Re-open the geocoding result table. Sort the records by “Status” and select all of the records that are unmatched. Next use “Options: Export… : Selected records” to export the unused records to a new table. We will accommodate these unmatched records later, so for now bundle them with all of the unmatchable records from your initial data scrubbing. OpenOffice can handle the conversion from dBase to .xls format, and .xls files can be imported into Access.

Deselect the Unmatched and use “Options: Export… : All records” to export a raw version of your geocoding to be used as a backup.

Restart the editing session and delete the unmatched records from your table. Export this table as a new table and add the result of the export to your map. It will show up under the source tab as a table. This table has our coded points but these points are not yet visible. Right click the table name and choose “Display XY Data” from the contextual menu.

This will create a shapefile with your table as a backend and a single point for every address you’ve located. In this image, residential and commercial nodes are mapped, and there is a pipe network with nodes that is about to come in very handy.

Building a Relationship based on Proximities

You are in very good shape now. You have the usage nodes mapped and you have a map of where the utility network is. Now you just have to get them in bed together. The utility network has nodes placed along its route. Our goal is to group each usage node with the network node closest to it.

Create a shapefile from the utility network’s node layer (right click, “Export…”) and add it back to your map. You will now have a bundle of points, but you need to be able to group the nodes that are closest together, and the best way to do so quickly is to generate Thiessen polygons.

You will create your polygons from the network nodes since it is the network’s coverage we are investigating. The best way to create polygons is with the add-in from ET GeoWizards, or some other open-source, freely distributed polygon tool like the one from Terrace GIS. I will not go over finding or installing these tools. They are available with a Google search or on ESRI’s user forums.

The Terrace GIS version will create polygons effectively and write them to a shapefile which we will add to our map.

As you can see, each network node has been given the area that is closer to it than to any other network node. If you like, you may adjust these polygons to adhere more strictly to your pipe network, examine broader areas or define smaller areas. The usage nodes fall within these areas and we will need to link them through a Spatial Join in order to summarize the utility usage by network node.
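A point falls inside a node's Thiessen polygon exactly when that node is the closest one to it, so the polygon-plus-spatial-join result can be sanity-checked with a nearest-node calculation. The sketch below is a rough equivalent under that assumption, not what ArcMap does internally, and it ignores any manual adjustments you make to the polygons. The node IDs, coordinates, and usage values are hypothetical.

```python
import math


def nearest_node(point, nodes):
    """Return the ID of the network node closest to point (x, y)."""
    px, py = point
    return min(
        nodes,
        key=lambda node_id: math.hypot(px - nodes[node_id][0], py - nodes[node_id][1]),
    )


def usage_by_node(usage_points, nodes):
    """Sum usage per network node. Every node appears, even with zero usage."""
    totals = {node_id: 0 for node_id in nodes}
    for x, y, usage in usage_points:
        totals[nearest_node((x, y), nodes)] += usage
    return totals


# Hypothetical node IDs, coordinates and usage values, for illustration only.
nodes = {"N1": (0.0, 0.0), "N2": (10.0, 0.0)}
usage_points = [(1.0, 1.0, 500), (2.0, -1.0, 250), (9.0, 2.0, 300)]
node_totals = usage_by_node(usage_points, nodes)
print(node_totals)
```

Starting the totals at zero for every node means that nodes with no customers nearby still show up in the output.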

Right click on the polygon layer and select Join.

You will then select the layer you wish to join to, the treatment of the data being joined and the resulting shapefile’s name.

That resulting shapefile will house your usages organized by node and will account for every network node we have decided to examine.

We can export this table to a dBase file and then convert it to .xls in OpenOffice. This .xls table will have the usage for that usage type summed by node. You will need to repeat this for each usage type you have coded separately (residential, commercial, industrial).

Post-Processing

Now you have your usage by node as a summation of the usages in their directly surrounding area. Close, but not complete. Your tables will need to be updated so that the AutoCAD identifier and the H2ONet identifier are both in your polygon-joined usage tables (add a field and use an update query with the exported H2ONet table). Combining Commercial and Residential usage into a single table may also be done in Access if necessary.
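The add-a-field-then-update step might look something like the sketch below. It uses Python's sqlite3 module as a stand-in for Access, and every table name, field name, and identifier in it is hypothetical. In Access the update query would normally be written with a join between the two tables rather than the subquery shown here, but the idea is the same: look up each row's H2ONet identifier by its CAD identifier.

```python
import sqlite3


def add_h2onet_ids(usage_rows, id_rows):
    """Attach an H2ONet ID to each usage row by looking up its CAD ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE usage_by_node (cad_id TEXT, total_usage INTEGER)")
    con.execute("CREATE TABLE h2onet_nodes (cad_id TEXT, h2onet_id TEXT)")
    con.executemany("INSERT INTO usage_by_node VALUES (?, ?)", usage_rows)
    con.executemany("INSERT INTO h2onet_nodes VALUES (?, ?)", id_rows)
    # Add the new field, then fill it with an update query.
    con.execute("ALTER TABLE usage_by_node ADD COLUMN h2onet_id TEXT")
    con.execute(
        "UPDATE usage_by_node SET h2onet_id = ("
        "SELECT h2onet_nodes.h2onet_id FROM h2onet_nodes "
        "WHERE h2onet_nodes.cad_id = usage_by_node.cad_id)"
    )
    rows = con.execute(
        "SELECT cad_id, h2onet_id, total_usage FROM usage_by_node ORDER BY cad_id"
    ).fetchall()
    con.close()
    return rows


# Hypothetical identifiers and usage totals, for illustration only.
usage_rows = [("CAD-1", 750), ("CAD-2", 300)]
id_rows = [("CAD-1", "J-10"), ("CAD-2", "J-20")]
joined = add_h2onet_ids(usage_rows, id_rows)
print(joined)
```

Any usage row whose CAD identifier is missing from the exported table ends up with an empty H2ONet identifier, which is a quick way to spot nodes that did not reconcile.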

You will want to clean up your tables again, since ArcMap adds fields you may not need and names them things you might not want them to be named. You should have a nice tidy table that looks like this:

You will also need to sum up the usage records that were unmappable from the data scrubbing and evenly disperse that usage across the nodes (total unmapped usage / number of nodes). This is most easily done in Access or Excel.
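The dispersal arithmetic is simple enough to check by hand, but here is a minimal sketch of it for reference. The function name, node IDs, and usage values are hypothetical. The only thing taken from the text is the formula: each node receives total unmapped usage divided by the number of nodes.

```python
def disperse_unmapped(node_usage, unmapped_usages):
    """Add an equal share of the unmapped usage to every node's total."""
    if not node_usage:
        raise ValueError("need at least one node to disperse usage across")
    share = sum(unmapped_usages) / len(node_usage)
    return {node_id: usage + share for node_id, usage in node_usage.items()}


# Hypothetical mapped totals per node and unmapped usage values.
node_usage = {"N1": 750, "N2": 300}
adjusted = disperse_unmapped(node_usage, [400, 600])
print(adjusted)
```

Here the unmapped total is 1000 and there are two nodes, so each node gains 500. The overall usage in the model then equals mapped plus unmapped usage, which is the point of dispersing rather than discarding.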

In the end you will have tables with the original usage records, usage records summed by address, mappable usage by type, unmappable usage by type, polygonal summation per usage node by type, and a table with the relationship between the CAD map identifier and the H2ONet identifier. You may also have some tables with combined usages, etc.

