What's the easiest way to do a one-time mass geocode? (580,000 addresses)

What's the easiest way to do a one-time mass geocode? (580,000 addresses) - geocoding

I am working on a civics related project and I need to be able to display all the properties in the City of Philadelphia on a map, so I'll need to get the latitude & longitude for all 580,000 properties. (Only once)
Most APIs like Google/Yahoo have limits of 5,000 per day, and even BatchGeo has a similar limit.
Is there a way I can do a one-time geocoding of all these addresses?

You can find a list of free and paid geocoding services at USC site.
Also check Microsoft's Geocode Dataflow API, it allows up to 200,000 entries / 300 Mb and takes up to 14 days.
Another possibility to combine several services at once: use 4 services that allow 5,000 entries a day and you'll finish your task in a month.

You can use Map Quest of Cloud Made.
I have created a small utility to help compare these API's.
The utility is hosted at below url:
http://ankit-zalani.appspot.com/GeoCode/index.jsp

Tobias, I work for an address verification (and recently, geocoding) company called SmartyStreets.
Many services have usage restrictions based on volume and license agreements which prevent users from storing the results of geocoding queries. There are some vendors, however, which don't have limits or restrictions like that.
I would recommend something like LiveAddress which will not only geocode the addresses but also perform CASS-Certified verification to make sure your addresses are correct before giving you potentially faulty coordinates. You can run 580,000 or even millions at a time in a few minutes, and we allow you to store your results.
Hope this helps. If you have any more questions about addresses, I'll personally assist.

This thread is pretty old by now, but there have been some developments in recent years making bulk geocoding very cheap. My favorite option is to just obtain a geocoding server on AWS ( google: geocoding on aws), many options there, some free some with low hourly rates (total cost depends on the server you choose, of course.)

Related

How to do large-scale, batch reverse geo-coding?

I have a very large list of lat/lon coordinate pairs (>50 million). I want to attach address information to each one. Most geo/revgeo services have strict call limits. Assuming computing power isn't the issue, how can I accomplish this? Also note that time/speed are not the primary concern.
One place to start might be the

You can get one of the dedicated AWS geocoders for unlimited volumme processing: https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=geocoder

Intro
I have experience working with SmartyStreets's batch processing tool. They don't have call limits (paid version). But, they also don't have a Reverse Geocode API (yet!). Their batch processing is strictly for flexibility and ease-of-use in addition to normal calls. But, I am aware of a couple services that do Reverse Geocoding, and they mention batch processing on their website.
How they work
Batch processing services generally allow you to upload your data, even arbitrarily large files. You probably want to put your data in a CSV file (type of spreadsheet) as latitude and longitude pairs. Then, their servers will process the data and alert you when you can download. It's common practice to charge money for this download, but maybe TAMU's is free?
Suggestions on who to use
Texas A&M Geoservices
MapLarge
Both of these services have demos and developer portals to guide you along if there is something you want to research before using them.
(Full disclosure: I have worked for SmartyStreets.)

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?

Heard some good things about Woopra and they offer 1.2m page views for the same price as Relic.
https://www.woopra.com/pricing/
If that's too expensive then it's live loading your logs and using an elastic search service to read them to get he data you want but you will need access to your logs whilst they are being written to.
A service like Loggly might suit you which would enable you to "live tail" your logs (view whilst being written) but again there is a cost to that.
Failing that you could do something yourself or get someone on freelancer to knock something up for you enabling logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm

If the metrics that you need to track are just limited to the ones that you have listed (Page Views, Unique Users, Referrers) you may think of collecting the logs of your web servers and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
Hope this helps!

Google Analytics offers real time data viewing now, if that's what you want?
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API is now released as we are now looking at incorporating this!

If you have access to web server logs then you can actually set up Elastic Search as a search engine and along with log parser as Logstash and Kibana as Front end tool for analyzing the data.
For more information: please go through the elastic search link.
Elasticsearch weblink

Looking for United States Address Validation Web Service

I'm looking for a United States Address Validation web service, as the title says. Also:
I don't need maps
I don't need Geo coding
I do need:
Validation that an address is real
Address parsing
Google's Maps / Bing Maps seemed good, but won't work for me because of these:
Prohibits use if not plotting points on a map image
Low request limits (100,000 / day) for premium account. I need more like 1,000,000 / day
Does Geocoding, which I don't need, which is resource intensive, which means it's slow
Any suggestions?

Maybe USPS?
https://www.usps.com/business/address-management-products.htm

use FedEx's API. They have an API to validate addresses.
Also:
https://webgis.usc.edu/Services/AddressValidation/Default.aspx

You can try Pitney Bowes “IdentifyAddress” Api available at - https://identify.pitneybowes.com/
The service analyses and compares the input addresses against the known address databases around the world to output a standardized detail. It corrects addresses, adds missing postal information and formats it using the format preferred by the applicable postal authority. I also uses additional address databases so it can provide enhanced detail, including address quality, type of address, transliteration (such as from Chinese Kanji to Latin characters) and whether an address is validated to the premise/house number, street, or city level of reference information.
You will find a lot of samples and sdk available on the site and i found it extremely easy to integrate.

You could, in theory, run desktop software and plug into any kind of API it provides, but then you become responsible for things like uptime, data updates, and associated overhead. You may also run into issues with the software threading model--is it multi-threaded or single-threaded software? You don't want to find that out in production.
There are a handful of web services out there that can verify US-based addresses, including the USPS official web service. The USPS one is very limited in the fields that it returns. For example, if you're looking for the "delivery point" which is used to make a full barcode, the USPS API doesn't return that information. I believe the USPS web service also limits the number of queries that you can perform, although I don't remember the exact limit.
A few things that you'll want to look for in a web service include the price (obviously) as well as geo-distribution of their servers. If a company has all of their servers in one location and that data center goes offline (which can and does happen), you're left out in the cold. If they have multiple physical locations, it can help to prevent unnecessary outages. Also, you'll want to make sure that the service call returns all necessary fields as per your requirements--like delivery point code, barcode, and DPV code (which tells you how deliverable an address is).
Lastly, you'll want to determine how you feel about interacting with the company. When you call them on the phone, are they responsive and concerned about your needs? Or are you talking to some front-line person that can't answer questions and is only able to gather information about your company size and revenue so they can evaluate how big of a fish you are and determine which salesman gets to call you back. Can you talk to the engineers that wrote the web service on the phone or via email?
There are a few choices out there and you'll have to choose the one that best fits your requirements and unique situation. Do a Google search to find a list of companies. In the interest of full disclosure, I'm the co-founder of SmartyStreets. We have an address verification web service API called LiveAddress. You're more than welcome to contact me directly with questions on my personal Twitter account or the company Twitter account.

How do sites count other sites' visitors and "value", and how can they tell users' location?

Hi actually this is a simple question but just came up out of the curiosity...
I have seen a web evaluation online tool recently called teqpad.com.I have lots of queries on it
How do they do it?? eg:page views daily visitors etc. without mapping real website??...
Website worth...is this getting any near to any site??
I don't know how do they got daily revenue??
I like traffic by country..it has seen same like in Google analytic s..how they got that info??
another one is ISP info and Google map location of server..
is there any one here done similar scripts?? if so what is your opinion??

They may be tracking user browser stats like Alexa does. (More info on Wikipedia.) A group of users installs a plug-in that reports which sites each user visits, like TV ratings work in most (all?) countries. This method is obviously not very reliable, and often nowhere near the actual numbers of visitors.
This is usually based on bullshit pseudo-scientific calculations and never a viable basis for evaluating the "value" of a web site, even though it may be possible to guesstimate the approximate ad revenues a site yields (see 3) But that is only one revenue stream - it says nothing about how expensive the site's daily maintenance is - servers, staff, content creation....
It should be possible to very roughly estimate daily revenue by taking the guesses on daily visitors/page views, count the frequency with which ads are shown, and look at what those ads usually yield per page view. It is probably pretty easy to get some rough numbers on what an ad view is worth on a big site if you're in the market.
and 5. It is possible to track down most IP addresses down to the visitor's country and sometimes even city. See the Geo targeting article on Wikipedia

How do you bill your web services?

In developing a new web service I haven't been able to find very much information on how companies bill for their web services.
Do you bill by request or only certain requests ie) GET or POST?
-would these be tracked at the application or server level?
Do you bill by bandwidth?
-again how would this be tracked on a per user basis
Do you charge a subscription to simply have access?
-this is assuming that they are only granted an api key after payment has been made.
A combination of the above or other options?
Thanks for your help.

As all things in a market economy, the price, but also the inconvenience (or convenience) and risk associated with the actual payment (irrespective of the amount) is a function of how unique and cool and valued your service or product is.
It is therefore impossible to answer the question but in very generic terms, i.e. in the form of suggestions. You actual invoicing model may base on one or several of the following
bill for a one-time setup fee
bill on a subscription basis (i.e. for a defined period, with explicitly defined maximum amounts of usage)
bill for maintenance
bill by the act, i.e. a certain amount (possibly on a decreasing unit price schedule). Such acts should be counted at the server level, (The client-side may include some audit/monitoring/log of sorts, but the server-side should be the authoritative source of info)
bill by volume (for example number of MBytes transfered etc.), this is applicable to services where there is a big variation in the volume of info produced for each "act".
In general, the price and the modality of accounting should seem fair, to both parties, particularly to the buyer, and typically, the simpler the better. The price should not necessarily be low, provided you can make the case that the service provided is effectively valuable, and that you either invested and took risk to introduce the service, or the on-going expenses associated with running the service are evident.

I guess It Depends™ on what the service does. Broadly, I'd say you should bill when you provide some intrinsic value; how you determine what that billing criteria is is quite domain-specific. There may be some property of the service provided which allows you to determine how much to bill.
For example, suppose you've a web service that performs a calculation. You might decide that for every successful computation you do, you're going to charge a fixed fee, say $0.01, but let users off if there's a validation problem, such as an invalid request. Alternatively, if those computations are vaguely long-running, you might have a charging model that's based on some sort of CPU-time metric.
Your point about subscriptions is a good one, and this is an area where you might potentially benefit from allowing a couple of commercial models; one to cater for the users who might perform a lot of requests per month, in which case a fixed subscription might make sense, and one to cater for users who make a few ad-hoc requests. In the latter case, of course, if you only attract those customers, then you're not going to make a good return on investment. Some kind of middle ground, whereby you have a small subscription, but then allow customers to buy a "block" or "bundle" of requests on top without incurring additional processing costs, might work.

Most webservices I know of charge for two things:
Volume of "usage". Generally giving low volumes "free" access (i.e., less than X hits/hour from a given IP address account combination). This is similar to say, twitter which gives you 150 hits/hour to its service from either your username, or unique IP or combination of the two (so you dont abuse it by changing IPs frequently). If you want a higher volume you pay for that access and its usually assigned by account (in twitters case you can get a dev account [for free] which gives you 20K or more hits an hour)
Depth of Details, Access to features. Again free accounts get a minimum amount of access, but dont get access to more data or to more advanced features (filtering, etc). Lots of google services work like this, were base access is given to everyone but if you want more refined abilities (greater search, more data, faster results) you have to buy an account code with the corresponding functionality.
I havent really seen or participated in any projects with pay-for-performance, or pay-per-hit/access models as they get very difficult to reliably bill for and very hard to account for to customers, even if you use tiered or banded ranges. How do you tell your customers how many hits they have used, especially in a distributed system, with redundant fail-over, etc. If I had to pay $0.01 cents per access I would want to know exactly how its measured, and what the company had in place to control access, and how accurate their monitoring was, etc.
Its not impossible, and definitely can be done, and may work well in large bulk scenarios.

Many of the ones I have seen bill by time, such as on a monthly or yearly basis. Some allow you to pay by the month, some require some (or all) of the fee up front. Access might be restricted by issuing a security certificate for the web service that expires when the customer's account expires, or possibly by having them send a client ID and letting the server check if that client ID is allowed to have an answer (but that's open to people stealing someone else's client ID ;) ).
I suppose if you have a service that sends and receives very large amounts of data, it might make sense to bill per service request, but the billing for that could get trickier. Are clients likely to make dozens of requests per day, or just a few? How much to bill per transaction? $100? $0.01? That all would depend on the nature of the service. If you want to go that route, you would probably need to be able to ensure that clients only get billed for requests that are successfully answered (I'd hate to get billed even though my client app failed to receive the entire web service message from your server).

Per request or as a subscription, and yes, bandwidth can be a variable that is used to set the fee. Depends of the value of binding the customer close or having a myriad of loosely coupled customers using it. There is no correct answer to the question that fits all or even most cases.
If I look at the services I have made in the past, the subscription model would be the best model to use. Sometime a tick of $ per request seems like the best approach but I have never had a service configured that way yet.

I agree with what has been said by Rob and Des. One thing to remember is that a subscription is a really simple concept that everyone is used to and comfortable with (if you price it right). If you want to cover a wide audience look at how the payment providers do - they have slightly different methods of payment depending on how many transactions you do per year. There'll be a fixed subscription plus a per-transaction charge and they both vary with the number of transactions. This is the most flexible, but it depends if it makes sense for your business.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js