Getting Better Geocoding Results: Secrets of the Trade

Molly Vogt, GIS Specialist, City of Gresham, Oregon
2004 President of Women in GIS

Geocoding is one of the best tricks you can do with GIS. What's not exciting about watching a file of sterile tabular address data suddenly take shape and plot itself on a city map? But mention to a colleague that you're trying your hand at geocoding, and you'll likely see a quick flinch, maybe even a shudder. I've seen veteran ESRI Tech Support staff – people who laugh in the face of projection problems and license server administration – shy away from the topic. Why the drama? As with many subjects, the more you know about geocoding, the more success you'll have with it.

Geocoding is the process of converting a locational reference, such as an address, into absolute spatial coordinates. We all benefit from geocoding, whether we're searching for ATMs within walking distance of our office, seeking a convenient place for a group to meet by geocoding members’ home addresses, or identifying disease "hotspots" by geocoding case data and performing cluster analysis.

Some software vendors will try to convince you that geocoding is just a matter of a few mouse clicks. More honest GIS users will add that successful geocoding requires good input address data, which is hard to find. While helpful, that advice becomes painfully obvious the moment you take the ol' geocoding wheel. The real key to improving your geocoding results lies in understanding the variety of pieces and players involved in the outcome.

Take the classic geocoding scenario: You import a table of addresses into your GIS and let the software match each one to an address range in a street centerline file. The software returns interpolated point locations which you map and further analyze.
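
The interpolation step in this scenario can be sketched in a few lines of Python. This is only a minimal illustration, assuming a straight street segment, a single address range, evenly spaced house numbers, and planar coordinates. The function name `interpolate_address` and the sample numbers are hypothetical; real geocoders also handle odd and even sides, multi-vertex lines, and offsets.

```python
def interpolate_address(house_number, range_from, range_to, start_xy, end_xy):
    """Estimate a point for house_number along a straight street segment.

    Assumes the address range runs from range_from at start_xy to
    range_to at end_xy and that house numbers are evenly spaced.
    """
    low, high = sorted((range_from, range_to))
    if not low <= house_number <= high:
        raise ValueError("house number falls outside the segment's address range")
    if range_to == range_from:
        fraction = 0.0  # single-address segment: use the start point
    else:
        fraction = (house_number - range_from) / (range_to - range_from)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# Hypothetical segment: addresses 100-198 along a 400-unit east-west block.
point = interpolate_address(149, 100, 198, (0.0, 0.0), (400.0, 0.0))
print(point)  # (200.0, 0.0), the midpoint of the block
```

The point is an estimate: house number 149 lands at the middle of the block whether or not the building actually sits there, which is why the choice of reference data matters so much.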

Input address quality is not the only variable affecting your results! You have just relied on at least one reference data set as well as a slew of algorithms, some of which include user-defined parameters, embedded in your GIS software that identify the “best” address match.

It stands to reason, then, that there are many ways to improve not only your match rate but the match quality and your overall satisfaction with the process. Here are a few places to start:

1) Determine your target match rate before you begin. If you simply assume that it must be as close to 100% as possible, you run the risk of spending an inordinate amount of time salvaging a few records by hand-correcting your address data. And hand-correcting data gives geocoding a bad reputation; it should be avoided at all costs.

2) Use spreadsheet tools to clean your input address data. Your geocoding friends are right. Complete, accurate address data does wonders for your match rate; bad data sinks you from the start. According to the US Postal Service, a third of the addresses it processes contain significant errors or omissions. Take a close look at your address field contents. You’ll want to make sure that all “addresses” are really addresses. Correct common street misspellings. Clever programming jocks may write custom text parsing code; mere mortals will find that the spreadsheet’s Search and Replace tool is a new best friend.

3) Scrub your address data. “Scrubbing” refers to an address verification and quality assurance step in which you submit your data to a third-party database that automatically corrects certain erroneous addresses and provides four-digit postal carrier codes. Many of these services are certified by the US Postal Service, e.g., Finalist, MailSTAR, ZP4. If you have many or large datasets to geocode, the added cost may be quickly justified by time savings and improved results.

4) Identify appropriate reference data sets. What are your accuracy needs? Street centerlines are not your only option for reference data. Does your city or regional government offer taxlot point data? If so, you may be able to geocode your addresses to taxlots, achieving greater precision and accuracy than interpolating along a street centerline. You might also geocode to carrier route (ZIP+4) or 5-digit ZIP code centroids if you lack complete address data. Best of all, some GIS packages allow you to geocode to multiple datasets, stepping through them in descending order of accuracy until a match is found. ESRI’s ArcGIS 9.0 calls this a Composite Address Locator.

5) Adjust geocoding parameters in your software away from the defaults. Experiment with spelling sensitivity – set it too high and those who choose to spell Chautauqua St. or Schuyler St. phonetically will never make the cut; too low, and your address on 116th Ave might just end up on 16th Ave! If you’re geocoding to streets, consider the side offset options that automatically place your point to the right or left of the street centerline. It may save the day when you try to summarize those points by census area – if they lie directly on the census area boundary (often coincident with street centerlines), you’ve only got a 50% chance of assigning each point to the correct area!

6) Use alternate street name tables if applicable. A street in Gresham, Oregon, is called 257th Ave. by the County and Kane Rd. by the City. Either name may show up in the address data, but the street centerline reference file uses only one. Create a master list of alternate street names and, in packages like ArcGIS, you can point the geocoding engine to this list to check for alternate street names.

7) Use place name alias tables if applicable. Some GIS software will be able to translate “City Hall” to its street address, provided you have a separate reference table of place names and their addresses.

8) Take some time to evaluate the output. Do you have tied candidates, and, in these cases, did the software arbitrarily match addresses to the first candidate in the list? Are there patterns to the unmatched addresses, i.e., might there be systematic bias in your results? Are there any recurring street names in the unmatched addresses that either (a) do not exist in your reference data or (b) seem to inexplicably stump the software? Out-of-the-box ArcGIS is known to hiccup on some perfectly valid street names, especially those containing “St” (as an abbreviation for “Saint”) or “HWY”. While these problems can be resolved, the solutions are explicitly ArcGIS-focused and require a willingness to get intimate with the program beyond what can be described here.
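
For those who would rather script than search-and-replace, tips 2, 6, and 7 can be combined into one small cleaning function. The Python sketch below is an illustration only. The lookup tables are hypothetical (the City Hall address is made up, and the reference file is assumed to use the County name, 257TH AVE), and the suffix list would need to be much longer in practice.

```python
import re

# Hypothetical lookup tables; build real ones from local knowledge.
STREET_ALIASES = {"KANE RD": "257TH AVE"}     # alternate name -> reference name
PLACE_ALIASES = {"CITY HALL": "100 MAIN ST"}  # place name -> made-up address
SUFFIXES = {"AVENUE": "AVE", "STREET": "ST", "ROAD": "RD"}


def clean_address(raw):
    """Return a normalized address string that is easier to match."""
    text = re.sub(r"[.,]", "", raw.upper())   # uppercase, drop periods and commas
    text = " ".join(text.split())             # collapse stray whitespace
    text = PLACE_ALIASES.get(text, text)      # swap a place name for its address
    text = " ".join(SUFFIXES.get(word, word) for word in text.split())
    for alias, official in STREET_ALIASES.items():
        # Replace an alternate street name with the one the reference data uses.
        text = re.sub(rf"\b{re.escape(alias)}\b", official, text)
    return text


print(clean_address("1234 se Kane Road."))  # 1234 SE 257TH AVE
print(clean_address("  City   Hall "))      # 100 MAIN ST
```

Keeping the tables separate from the code means a colleague who knows the local street names can maintain them without touching the script.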
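
The multiple-dataset approach in tip 4 amounts to trying each reference data set in turn and stopping at the first hit. The following Python sketch shows the idea under simplifying assumptions: exact-match lookups stand in for real locators, and the tables, coordinates, and the name `geocode_composite` are all hypothetical. It is not how ArcGIS implements its Composite Address Locator.

```python
def geocode_composite(record, locators):
    """Try each (name, lookup) pair in order and return the first hit.

    List the locators from most to least accurate so that the result
    records which level of accuracy each address achieved.
    """
    for name, lookup in locators:
        location = lookup(record)
        if location is not None:
            return name, location
    return None, None


# Hypothetical reference tables with made-up coordinates.
taxlot_points = {"1234 SE 257TH AVE": (10.0, 20.0)}
zip_centroids = {"97030": (50.0, 50.0)}

LOCATORS = [
    ("taxlot", lambda rec: taxlot_points.get(rec["street"])),
    ("zip5", lambda rec: zip_centroids.get(rec["zip"])),
]

print(geocode_composite({"street": "1234 SE 257TH AVE", "zip": "97030"}, LOCATORS))
print(geocode_composite({"street": "99 UNKNOWN LN", "zip": "97030"}, LOCATORS))
```

Returning the locator name alongside the point is a deliberate choice: it lets you map or filter results by how they were matched, so that ZIP centroid matches are not mistaken for taxlot-level precision.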
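
The spelling sensitivity trade-off in tip 5 is easy to see with a generic string-similarity measure. The sketch below uses `get_close_matches` from Python's standard `difflib` module as a stand-in; commercial geocoders use their own scoring methods and scales, so the 0-to-1 cutoff here is an assumption for illustration, and the reference street list and the name `match_street` are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference street names; note that 116TH AVE is missing.
REFERENCE_STREETS = ["SCHUYLER ST", "CHAUTAUQUA ST", "16TH AVE"]


def match_street(name, sensitivity):
    """Return the most similar reference street at or above the cutoff.

    sensitivity is a similarity cutoff between 0 and 1; higher values
    demand a closer spelling. Returns None when nothing qualifies.
    """
    hits = get_close_matches(name, REFERENCE_STREETS, n=1, cutoff=sensitivity)
    return hits[0] if hits else None


# A phonetic spelling is rejected at a strict setting and accepted at a looser one.
print(match_street("SKYLER ST", 0.9))   # None
print(match_street("SKYLER ST", 0.75))  # SCHUYLER ST

# The strict setting still lets 116TH AVE slide onto 16TH AVE.
print(match_street("116TH AVE", 0.9))   # 16TH AVE
```

With this particular measure, "116TH AVE" scores closer to "16TH AVE" than "SKYLER ST" does to "SCHUYLER ST", so no single cutoff accepts the phonetic spelling while rejecting the wrong avenue. That is a good argument for experimenting with the setting and reviewing the matches it produces rather than trusting any one value.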
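
The questions in tips 1 and 8 can be answered with a short summary of the geocoding output. The Python sketch below assumes each result has been exported as a record with a street name and a status of "matched", "tied", or "unmatched". That layout, the sample records, and the name `summarize_results` are hypothetical, not the output format of any particular package.

```python
from collections import Counter


def summarize_results(results, target_rate):
    """Summarize match rate, ties, and recurring unmatched street names."""
    total = len(results)
    statuses = Counter(rec["status"] for rec in results)
    matched = statuses["matched"] + statuses["tied"]  # ties still received a point
    match_rate = matched / total if total else 0.0
    unmatched = Counter(rec["street"] for rec in results if rec["status"] == "unmatched")
    return {
        "match_rate": match_rate,
        "meets_target": match_rate >= target_rate,
        "tied": statuses["tied"],                 # candidates worth a manual look
        "top_unmatched": unmatched.most_common(3),
    }


# Hypothetical sample output.
sample = [
    {"street": "MAIN ST", "status": "matched"},
    {"street": "ST HELENS RD", "status": "unmatched"},
    {"street": "ST HELENS RD", "status": "unmatched"},
    {"street": "POWELL BLVD", "status": "tied"},
    {"street": "HWY 26", "status": "unmatched"},
]
print(summarize_results(sample, target_rate=0.9))
```

A street name that keeps turning up in the unmatched list points to a systematic problem, such as a missing alias or a name the software parses badly, which is usually cheaper to fix once at the source than record by record.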

