Geocoding Issues with OpenStreetMap/Nominatim

I have a website which needs to obtain the latitude and longitude for the address entered by the customer.
Google/Bing/Yahoo are too expensive for us, so we went with OpenStreetMap/Nominatim.
Unfortunately, while it worked OK during testing, it's failing to find about 50% of the addresses entered, which is a big issue.
There are 3 things I am interested in knowing:
1. What is the best way to deal with the situation where the customer really does enter an incorrect address? Send them an email and ask them to correct it? Use segments of the address until something is found?
2. What is the best way to handle the situation where the address is fine but I can't find it with OpenStreetMap? Or am I doing something wrong with my query to Nominatim?
3. Does anyone know of a free/cheap alternative if OpenStreetMap isn't up to the task? I know it's an open-source collaboration and therefore not complete, but I thought it had pretty good coverage, and that it would return a nearby location if it didn't have the exact location. Maybe it does and I'm using it wrong.
Here is an example:
182 livington ave,albany,New York,12210,US
Google maps finds that easily.
Nominatim finds nothing: http://nominatim.openstreetmap.org/search?format=xml&addressdetails=0&q=182%20livington%20ave,albany,New%20York,12210,US
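The "use segments of the address until something is found" idea from the first question can be sketched as below. It is a minimal sketch, assuming Python 3 and a caller-supplied `search` function (hypothetical) that fetches the URL built by `nominatim_url` and returns the first hit or `None`; real requests to the public Nominatim server should also send an identifying User-Agent, as its usage policy asks. Each fallback step is coarser, so the function returns the query that finally matched, letting you record how precise the coordinates really are.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_url(query: str) -> str:
    """Free-form Nominatim search URL asking for at most one JSON result."""
    return NOMINATIM_SEARCH + "?" + urlencode({"q": query, "format": "json", "limit": 1})


def relaxed_queries(address: str, min_segments: int = 2) -> List[str]:
    """Most specific query first, then drop leading comma-separated segments.

    Stops once only `min_segments` segments remain (by default a bare country
    code is never sent on its own), but always yields the full address.
    """
    parts = [p.strip() for p in address.split(",") if p.strip()]
    steps = max(1, len(parts) - min_segments + 1)
    return [", ".join(parts[i:]) for i in range(steps)]


def geocode_with_fallback(
    address: str, search: Callable[[str], Optional[dict]]
) -> Optional[Tuple[str, dict]]:
    """Return (query_that_matched, result) for the first query with a hit."""
    for query in relaxed_queries(address):
        result = search(query)
        if result is not None:
            return query, result
    return None
```

With the example address, the queries run from "182 livington ave, albany, New York, 12210, US" down to "12210, US", so a failed street lookup degrades to a ZIP-level point instead of nothing. Whether a ZIP-level point is acceptable is a business decision, which is why the matched query is returned.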

I think what you're looking for is address verification. Google, Nominatim, and others only perform address approximation, which is good for finding addresses when you aren't sure what they are, but the results are only a best guess.
I helped develop an API called LiveAddress, which verifies and geocodes addresses according to stringent CASS™ requirements. I ran your sample address through Google, Nominatim, and the LiveAddress API, and these are the results:
Google found the address despite the typo in "Livingston" but could not guarantee its validity, saying "Address is approximate." Then again, it says that for just about every address you try.
Nominatim does not find it because of the typo. Perhaps a drawback to using Nominatim is that it doesn't try to compensate for typos, verify the accuracy or completeness of addresses, etc. Fixing the typo returned some information but it was anyone's guess what had to be fixed, and why the query failed anyway.
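Nominatim's lack of typo compensation can be partly worked around on your side if you have a list of known street names for the area. The sketch below swaps in fuzzy matching from Python's standard-library `difflib`; it is not how Nominatim or LiveAddress work, and `known_streets` is a hypothetical list you would have to source yourself (for example, from an OSM extract). It also shows why an exact hit is dangerous: if a street literally named "Livington" exists, it wins outright and "Livingston" is never suggested.

```python
import difflib
from typing import List


def correct_street_name(name: str, known_streets: List[str], cutoff: float = 0.8) -> List[str]:
    """Return known street names closest to `name`, best match first.

    An exact (case-insensitive) match short-circuits the fuzzy search, so a
    real street called "Livington" hides the intended "Livingston".
    """
    lowered = {s.lower(): s for s in known_streets}
    if name.lower() in lowered:
        return [lowered[name.lower()]]
    # get_close_matches sorts candidates by similarity ratio, highest first.
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[m] for m in matches]
```

Returning several candidates rather than silently picking one leaves room to ask the user when the match is ambiguous.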
LiveAddress doesn't recognize the address as entered because of the typo. Missing the "s" in "Livingston" is dramatic because there are streets named "Livington", leaving the query ambiguous, and the results were too much of a mismatch to return according to CASS™ specs. Changing the name with a different typo, "Livingstn", however, produced a valid result; Nominatim didn't accept that typo either:
[
  {
    "input_index": 0,
    "candidate_index": 0,
    "delivery_line_1": "182 Livingston Ave",
    "last_line": "Albany NY 12210-2512",
    "delivery_point_barcode": "122102512824",
    "components": {
      "primary_number": "182",
      "street_name": "Livingston",
      "street_suffix": "Ave",
      "city_name": "Albany",
      "state_abbreviation": "NY",
      "zipcode": "12210",
      "plus4_code": "2512",
      "delivery_point": "82",
      "delivery_point_check_digit": "4"
    },
    "metadata": {
      "record_type": "S",
      "county_fips": "36001",
      "county_name": "Albany",
      "carrier_route": "C011",
      "congressional_district": "21",
      "rdi": "Residential",
      "latitude": 42.66033,
      "longitude": -73.75285,
      "precision": "Zip9"
    },
    "analysis": {
      "dpv_match_code": "Y",
      "dpv_footnotes": "AABB",
      "dpv_cmra": "N",
      "dpv_vacant": "N",
      "active": "Y",
      "ews_match": false,
      "footnotes": "M#"
    }
  }
]
The analysis footnote "M#" indicates that a match was achieved by fixing the spelling of the street name. The DPV footnotes "AABB" indicate that the entire address matched a street + city/state on the national ZIP+4 file. Also note the Zip9 precision, which is (currently) the most precise level of geocoding, accurate to the block level or closer.
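If you do call LiveAddress, the fields worth keeping from a response like the one above are the coordinates, the precision, and the analysis codes. A minimal sketch, assuming the response body is a JSON array shaped like that sample (only field names visible in it are used); treating an empty array as "no verified candidate" is our assumption, and `summarize_candidate` is a hypothetical helper.

```python
import json
from typing import Optional


def summarize_candidate(response_text: str) -> Optional[dict]:
    """Pull coordinates and match quality from the first candidate, if any."""
    candidates = json.loads(response_text)
    if not candidates:  # assumed: nothing could be verified
        return None
    first = candidates[0]
    metadata, analysis = first["metadata"], first["analysis"]
    return {
        "lat": metadata["latitude"],
        "lon": metadata["longitude"],
        "precision": metadata["precision"],        # e.g. "Zip9"
        "dpv_match_code": analysis["dpv_match_code"],
        "footnotes": analysis["footnotes"],        # e.g. "M#" = spelling fixed
    }
```

Storing the precision and footnotes next to the coordinates makes it easy to later find addresses that were only matched after a correction.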
So, in answer to your questions:
1. That depends. Are your customers entering an address on a website form? If so, tell them right away, before they continue, that the address isn't valid. We were working on a jQuery plugin to make this cut-and-paste easy for everybody (our checkout form showed the concept, and it implements a pretty slick system), and it is now available: SmartyStreets has a jQuery plugin which verifies addresses on website forms (just copy and paste). When an address is typed, it is automatically verified. If it is wrong, a notification slides up asking the user if they'd like to fix it. Sometimes the address is ambiguous and verification returns a few valid results (try "100, new york, ny"); the plugin then shows a few suggestions and the user can pick one. The form does not submit until the user provides a valid address or says "Use mine anyway; I guarantee it's right." If the address is correct, the plugin puts the standardized results in the address fields and displays a green notice: "Address verified!"
2. I think I discussed this above: your query is fine; it seems to be a shortcoming in Nominatim.
3. As suggested, you could try LiveAddress. Try it with a large set of your addresses to get a better idea (comparing on one address alone is, I'll admit, a weak indication), but so far it seems that, for your needs, LiveAddress sits somewhere between Google Maps and Nominatim.
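The form behaviour described in the first answer boils down to a small decision on how many candidates the verifier returns. This is a sketch of that decision logic only, not the SmartyStreets plugin; `decide` and `FormAction` are hypothetical names, and `user_insists` stands for the "Use mine anyway" choice.

```python
from enum import Enum
from typing import List, Tuple


class FormAction(Enum):
    ASK_TO_FIX = "ask the user to fix the address"
    ACCEPT_AS_ENTERED = "user insists; keep what they typed"
    FILL_STANDARDIZED = "replace the fields with the standardized address"
    OFFER_CHOICES = "show the candidates and let the user pick"


def decide(candidates: List[dict], user_insists: bool = False) -> Tuple[FormAction, List[dict]]:
    """Map verification candidates to what the checkout form should do next."""
    if not candidates:
        action = FormAction.ACCEPT_AS_ENTERED if user_insists else FormAction.ASK_TO_FIX
        return action, []
    if len(candidates) == 1:
        return FormAction.FILL_STANDARDIZED, candidates
    return FormAction.OFFER_CHOICES, candidates  # ambiguous, e.g. "100, new york, ny"
```

Addresses accepted through the override are worth flagging, since they are exactly the ones that may fail to geocode later.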
Answer to question in comments
I ran out of room in the comments.
Q:
Here is another address causing us issues: "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US". Even "7580 E Big Cannon Drive,California,92808,US" didn't seem to work with your site.
A:
I did some research on the USPS site and with some other service providers as well. None returned any valid results or suggestions. But I found out what the issues are with the address as you submitted it:
Misspelled street name. No biggie; LiveAddress corrected this to Big Canyon.
Bad primary number. There's not much hope here if the primary number is incorrect. There's generally no way for a computer or human to infer what you really meant. In these cases, the address will fail verification and the user must supply something valid to go on. I found a valid primary number at 7584.
Master-planned community, not city/county. "Anaheim Hills" is the name of a master-planned community. Google found it in its business listings, but that has nothing to do with the address.
"Anaheim Hills" twice. It's confusing the parser. Unfortunately, with extra unnecessary information (esp. in a single-line address), it's nearly impossible to tell what part of it is dubious. That second "Anaheim Hills" has to go, but the first one can stay and it will be fine.
Country information. Most of the services I tried your address on got confused by the country and put it in the "Company/Firm Name" field. We deal with US addresses, so you can omit the country; it'll reduce the size of your request too.
LiveAddress was actually able to verify the address in these forms, both as a single-line address and split into components:
7584 E Big Cannon Drive anaheim hills ca 92808
7584 bg cannon 92808
7584 big cannon ave aneheim hills ca
The most significant help was finding a valid primary number. If no valid addresses come back, you should alert the user and suggest fixing the primary number and making sure the city/state (if given) align with the ZIP code (because if those two conflict, it's also impossible to tell what you meant).
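Two of the problems listed above, the duplicated "Anaheim Hills" and the trailing country, are mechanical enough to strip before sending a single-line address. A minimal sketch, assuming US-only input with comma-separated segments; `tidy_single_line` is a hypothetical helper, and it cannot repair a misspelled street or a bad primary number, which still need verification.

```python
from typing import Iterable


def tidy_single_line(address: str, country_tokens: Iterable[str] = ("US", "USA")) -> str:
    """Drop a trailing country token and repeated segments from a one-line address."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    tokens = {t.upper() for t in country_tokens}
    if parts and parts[-1].upper() in tokens:
        parts.pop()  # US-only service: the country adds nothing
    seen, kept = set(), []
    for part in parts:
        key = part.casefold()
        if key not in seen:  # skips e.g. the second "Anaheim Hills"
            seen.add(key)
            kept.append(part)
    return ", ".join(kept)
```

Applied to the address from the comment, this yields "7580 E Big Cannon Drive, Anaheim Hills, California, 92808", which still needs the street and primary-number fixes described above.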

Related

GCP Data Loss Prevention - not detecting local types

I am working with GCP's DLP API, and I have issues detecting country-specific types. On the other hand, I have no issues with global types (here you can find the list of types). Does anyone have suggestions on how to fix this? In case it helps, I'm working from outside the US.
This is a copy of my config file:
# Map each infoType to the placeholder that replaces it in the output.
info_types_rep_names = {
    "PHONE_NUMBER": "[PHONE]",
    "EMAIL_ADDRESS": "[EMAIL]",
    "US_PASSPORT": "[PASSPORT]",
}

# infoTypes the inspection step should look for.
info_types = [{"name": key} for key in info_types_rep_names]

# One replace transformation per infoType.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [{"name": key}],
                "primitive_transformation": {
                    "replace_config": {
                        "new_value": {"string_value": value}
                    }
                },
            }
            for key, value in info_types_rep_names.items()
        ]
    }
}
Location might matter in some scenarios. Refer to this document for country-specific values.
Looking at your code, I think there is a chance that the sample value you provided is wrong.
I tested this on our end with the Python sample code from the documentation, using your info_types_rep_names and changing only the project ID, with input_str = 'Please call me. My phone number is. My email, just in case, is adfsfasfs#gmail.com. Take a note of US passport number: C03005988. Or maybe C03004786'. It works for US_PASSPORT, but the sample number must be a valid one for it to be detected. It's also possible that only one country-specific type fails, but when I tested with another country-specific value, it worked. All the results I got were also consistent with the demo detection.
Also make sure the right infoTypes are in use in the right section: follow the link, and in the Options tab you can view and adjust the infoTypes.
So my first question is: are you using the correct infoTypes?
If so, then either our detection fails with those specific samples or the samples are not valid.
If the samples are not valid, then it is a matter of either having the wrong sample or a bug in the code.
Please check and get back to us if there are any issues.
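To rule out the misalignment asked about above, it helps to derive the inspection and de-identification configs from the same mapping, so the two cannot drift apart. A sketch assuming the request-dict style of the `google-cloud-dlp` Python client; `build_deidentify_request` and the project ID "my-project" are hypothetical, and the function only builds the request without calling the API.

```python
from typing import Dict


def build_deidentify_request(project_id: str, text: str,
                             rep_names: Dict[str, str],
                             location: str = "global") -> dict:
    """Build a deidentify_content request whose inspect and transform infoTypes match."""
    info_types = [{"name": name} for name in rep_names]
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        # Detection: which infoTypes to look for.
        "inspect_config": {"info_types": info_types},
        # Transformation: replace each finding with its placeholder.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [{"name": name}],
                        "primitive_transformation": {
                            "replace_config": {"new_value": {"string_value": placeholder}}
                        },
                    }
                    for name, placeholder in rep_names.items()
                ]
            }
        },
        "item": {"value": text},
    }
```

With the client installed and credentials configured, the request would be sent with `dlp_v2.DlpServiceClient().deidentify_content(request=req)`, and the redacted text read from `response.item.value`.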

How to get suggested sublocality or locality

Sometimes the Google Places API will suggest the locality, e.g. '3524 51 Street Northwest' will suggest 'Edmonton' even though a sublocality called 'Mills Woods' is returned in the address_components.
Other times the API will suggest the sublocality, e.g. '77 Finch Avenue East' will suggest 'North York' even though 'Toronto' is returned as the locality.
How does the Places API know which one to use in the dropdown, and how can I use the one that the API suggests?
Thank you for contacting the Google Maps Platform Technical Support. My name is Thomas and I’ll be happy to help you with your question today.
The address displayed as the suggested address by the JavaScript Autocomplete widget is the 'name' field of the object returned by autocomplete.getPlace(). The object also has a 'formatted_address' field, which is the exact full address of the autocompleted place.
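Following that answer, a server that receives the place object (for example, posted from the page after `getPlace()`) can keep both the label the user saw and the structured components. A sketch assuming the place result arrives as a plain dict with the `name`, `formatted_address`, and `address_components` (`long_name`, `types`) fields; `describe_place` is a hypothetical helper.

```python
from typing import Optional


def describe_place(place: dict) -> dict:
    """Pick the dropdown label, the full address and (sub)locality from a place result."""
    components = place.get("address_components", [])

    def first_of_type(kind: str) -> Optional[str]:
        # A component can carry several types, e.g. ["sublocality_level_1", "sublocality", "political"].
        for component in components:
            if kind in component.get("types", []):
                return component.get("long_name")
        return None

    return {
        "suggested_label": place.get("name"),
        "full_address": place.get("formatted_address"),
        "locality": first_of_type("locality"),
        "sublocality": first_of_type("sublocality"),
    }
```

Storing `suggested_label` preserves whatever the widget showed, whether that was the locality or the sublocality, without having to reproduce Google's choice.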

Geocoding multi-line street addresses (Address1 / Address2)

Our app's postal-address entry UI is a two-line Address1/Address2 field like this (borrowing screenshot from Amazon.com).
But real users' data entry is always messy. Some users will ignore our directions and will sometimes put the street address in Address1 and sometimes put it in Address2. Other users will import lists of addresses from external sources (like an existing mailing list), which will also likely cause some cases where the street address is unpredictably in Address1 or Address2.
When it comes time to geocode the address, what's a good algorithm to maximize the chance of successful geocoding if we're not sure whether the street address is in Address1 or Address2? A naive approach could be to try Address1, and if it fails then try Address2. But I'm sure I'm not the first person to try geo-coding real-world messy data entry... how is this problem usually solved?
We're using the Google Maps Geocoding API, if it matters.
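One way to go beyond "try Address1, then Address2" is to geocode every plausible street line and keep the most precise answer, because a geocoder may return a vague approximate match instead of failing outright. A minimal sketch, assuming a caller-supplied `geocode` function (hypothetical) that returns the first Geocoding API result as a dict, or `None`; it ranks by that result's `geometry.location_type` and `partial_match` fields, and the ranking order is our assumption, not a Google recommendation. Note that it costs up to three requests per address.

```python
from typing import Callable, List, Optional, Tuple

# Lower is better; our ordering of the Geocoding API's location_type values.
PRECISION_RANK = {"ROOFTOP": 0, "RANGE_INTERPOLATED": 1,
                  "GEOMETRIC_CENTER": 2, "APPROXIMATE": 3}


def candidate_queries(address1: str, address2: str, tail: str) -> List[str]:
    """Street-line candidates: each line alone, then both together (deduplicated)."""
    queries: List[str] = []
    for street in (address1, address2, f"{address1} {address2}"):
        street = street.strip()
        query = f"{street}, {tail}" if tail else street
        if street and query not in queries:
            queries.append(query)
    return queries


def best_geocode(address1: str, address2: str, tail: str,
                 geocode: Callable[[str], Optional[dict]]) -> Optional[Tuple[str, dict]]:
    """Geocode every candidate and keep the most precise, non-partial result."""
    best = None
    for query in candidate_queries(address1, address2, tail):
        result = geocode(query)
        if result is None:
            continue
        key = (PRECISION_RANK.get(result["geometry"]["location_type"], len(PRECISION_RANK)),
               bool(result.get("partial_match")))
        if best is None or key < best[0]:
            best = (key, query, result)
    return None if best is None else (best[1], best[2])
```

Here `tail` is the rest of the address (city, state, postal code), which is assumed to be entered reliably in separate fields.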
I believe Google recommends using the autocomplete widget.
Have a look at the best practices document:
https://developers.google.com/maps/documentation/geocoding/best-practices
It says
Respond, in real time, to user input (includes ambiguous, incomplete, poorly formatted, or misspelled addresses entered by a user)
Use the Places API Place Autocomplete service to obtain a place ID, then the Geocoding API to geocode the place ID into a latlng.
Apartment, suite, unit, etc. are typically not present in Google's database. So you can bind the autocomplete to the first input, where the user selects the address and you get the corresponding place ID; the user can enter the rest of the information in the second field, which is not relevant to the Google Geocoding API.
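The server side of that flow, turning the place ID into a latlng, is a single Geocoding API request. A sketch assuming you fetch the URL yourself and decode the JSON response; `PLACE_ID` and `API_KEY` are placeholders, and the Address2 text (suite, unit) would simply be stored next to the coordinates rather than sent to Google.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def place_id_url(place_id: str, api_key: str) -> str:
    """Geocoding API request that resolves a Place Autocomplete place ID."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"place_id": place_id, "key": api_key})


def latlng(payload: dict) -> Optional[Tuple[float, float]]:
    """Extract the first result's coordinates from a decoded JSON response."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```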
There are several examples of place autocomplete in the official documentation.
https://developers.google.com/maps/documentation/javascript/examples/places-autocomplete
https://developers.google.com/maps/documentation/javascript/examples/places-autocomplete-addressform
https://developers.google.com/maps/documentation/javascript/examples/places-placeid-geocoder
I hope this helps!

Parameter not supported by web service

I want to validate an opinion with you.
I have to design a web service that searches into a database of restaurants affiliated to a discount program in a specific country around a given address.
The REST call to such a webservice will look like http://server/search?country=<countryCode>&language=<languageCode>&address=<address>&zipcode=<zipcode>
The problem is that some countries do not have zipcodes or do not have them in the entire country.
Now, what would you do if the user passes such a parameter for a country that does not have zipcodes, but he/she passes a valid address?
1. Return 400 Bad Request.
2. Simply ignore the zipcode parameter and return results based on the valid address.
3. Return an error message in a specific format (e.g. JSON) stating that zipcodes are not supported for that country.
Some colleagues are also favoring the following option
4. Simply return no results. And state in the documentation that the zipcode parameter is not supported. Also we have to create a webservice method which returns what fields should be displayed in the user interface.
What option do you think is best and why?
Thanks!
Well, the OpenStreetMap Nominatim server returns results even if you don't know the ZIP code, and you can look at the results anyway. What if the user doesn't know the zip code but wants to find his object?
I would try to search for that specific object anyway, especially because you said that some countries have zip codes partially.
If you simply return nothing, the user doesn't know what went wrong and won't know what to do.
That would depend on the use case. How easy is it for a user of the API to trigger that case? Is it a severe error which the user really should know how to avoid? Or is it something that is not entirely clear, where a user may know (or think he knows) a zipcode where officially there shouldn't be one? Does it come down to trial and error for the user how to retrieve correct results from your API? Is it a bad enough error that the user needs to be informed about it and that he needs to handle this on his side?
If you place this restriction in your API, consider that it will have to be clearly documented when this case is triggered, every user of the API will have to read and understand that documentation, it needs to be clear how to avoid the problem, it needs to be possible for the user to avoid the problem and every user will have to correctly implement extra code on his side to avoid this problem. Is it possible for the user to easily know which areas have zipcodes and which don't?
I think the mantra of "be flexible in what you accept, strict in what you output" applies...
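Applied to the question's API, "be flexible in what you accept" roughly means option 2 plus a machine-readable hint in the response. A minimal sketch, assuming you maintain the set of country codes without zipcodes yourself (`countries_without_zip` and the "ZZ" code in the test are hypothetical); the search itself then runs on the remaining parameters.

```python
from typing import Dict, List, Set, Tuple


def normalize_search_params(params: Dict[str, str],
                            countries_without_zip: Set[str]) -> Tuple[Dict[str, str], List[str]]:
    """Drop a zipcode the country can't use instead of rejecting the request.

    Returns the cleaned parameters plus warnings to echo back in the
    response, so callers learn why the zipcode was ignored.
    """
    cleaned = dict(params)
    warnings: List[str] = []
    country = cleaned.get("country", "").upper()
    if cleaned.get("zipcode") and country in countries_without_zip:
        del cleaned["zipcode"]
        warnings.append(f"zipcode ignored: not used in country {country}")
    return cleaned, warnings
```

This keeps the request succeeding while still telling the client what happened, which addresses the "user doesn't know what went wrong" concern with option 4.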

Google Geocoding UK address does not get listed

I followed the examples on the following site:
http://code.google.com/apis/maps/documentation/geocoding/
So I expected the following URL to give me UK addresses, but it is still giving me US addresses. Any ideas?
http://maps.googleapis.com/maps/api/geocode/json?address=baker&sensor=false&region=gb
The "region" parameter only makes the region a preference; it does not lock out all other results.
In this case it seems the address "Baker" doesn't even show up on Google Maps as a known location - only as businesses.
http://maps.google.co.uk/maps?q=baker&hl=en&sll=53.800651,-4.064941&sspn=18.336241,46.362305&t=h&z=5
I also Googled "Baker village", "baker town", etc., but with no luck. I'm guessing that location is particularly obscure, so Google is returning what it considers the more likely results, in the US.
If you try another example like "Birmingham" which is in both the US and UK you'll notice it favours the UK due to the region tag setting:
http://maps.googleapis.com/maps/api/geocode/json?address=birmingham&sensor=false&region=uk
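To see the difference between biasing and restricting, the request can be built both ways. A sketch with a placeholder `API_KEY`; as we understand the current Geocoding API documentation, `region` takes a ccTLD code ("uk" rather than the ISO "gb" used in the question), a `components=country:GB` filter restricts rather than biases, and the old `sensor` parameter is no longer needed. Check these points against the docs before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str, region: Optional[str] = None,
                country: Optional[str] = None) -> str:
    """Geocoding request with an optional region bias or country restriction."""
    params = {"address": address, "key": api_key}
    if region:
        params["region"] = region  # bias only: other countries can still win
    if country:
        params["components"] = f"country:{country}"  # filter results to one country
    return GEOCODE_ENDPOINT + "?" + urlencode(params)
```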