I need to know if a country belongs to the European Union.
I could create a list of the current member states, but if membership changes I would have to update every program, because the list is static.
I would like to find a web service that returns this information given, for example, the country's ISO code.
However, I have not been able to find such a service.
Does anyone know whether one already exists?
Thanks to everyone.
Note: I'm looking for a list of EU member countries, not a list of all the countries on the European continent.
You might be able to find it on the EU website; maybe start from this page. For example, I know for sure they offer a free web service to check whether a VAT number exists. You might also get lucky by asking on their forum (you can find it on that same page).
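Until such a service turns up, one fallback is to keep the static list in a single shared module, so that a membership change means one update rather than one per program. The following is only a minimal sketch: the names `EU_MEMBER_ISO2` and `is_eu_member` are made up for illustration, and the member set reflects my understanding of the 27 member states as of 2020, so verify it against an official source before relying on it.

```python
# Assumption: the 27 EU member states as of 2020, as ISO 3166-1 alpha-2 codes.
# Verify against an official source; this list is static and can go stale.
# Note: Greece is "GR" in ISO 3166-1, although EU VAT numbers use the prefix "EL".
EU_MEMBER_ISO2 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
    "FR", "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU",
    "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def is_eu_member(iso2: str) -> bool:
    """Return True if the ISO alpha-2 code is in the static member set."""
    return iso2.strip().upper() in EU_MEMBER_ISO2
```

Programs would then call `is_eu_member("IT")` instead of embedding their own copy of the list.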
I am working on a project that could use a list of every city, region, country in the world. Countless websites use the same data so I think there must be some data packages containing such lists (of sports teams, of politicians etc.) somewhere on the internet.
Could you tell me what they are called? Are there any websites/sources that you would recommend? What about pictures, can I find similar resources for pictures or icons?
Some of what you may need can be found on http://data.okfn.org/data/ - e.g. list of countries, cities, regions etc.
The autocomplete API allows us to retrieve lists of all countries, regions, and locales by leaving out the query string and setting the result limit to a large number, but this feature isn't available at the city level.
Is there a way that we can retrieve a full list of all targetable cities and their IDs? If not, can we cache the autocomplete data for cities to build up such a list?
That functionality is probably not supported because of the massive amount of data that fetching all the cities in the world would return, even with paging. Limiting the response by country (by using country_list=["ca"]) and then fetching all cities doesn't sound too far-fetched, but that is not implemented either.
To me, it sounds like you have two options.
Create a bug report using our bug tool to request a wishlist feature (this doesn't guarantee anything, but it lets us track the request if we choose to implement it, and it serves as a way to gauge interest in the feature).
IANAL, but part 2 of section 2 of the FB Platform Policies states:
You may cache data you receive through use of the Facebook API in order to improve your application’s user experience, but you should try to keep the data up to date. This permission does not give you any rights to such data.
That sounds like you can cache the autocomplete data, since doing so improves the UX of your app; just remember that you do not have the rights to the data. I would be cautious about this, as it would really suck if you worked hard to build all the caching functionality only to have FB say that it's not allowed. I would consult some experts before pursuing this path.
I'm really hoping there's an existing service for something like this. I have a location (could be GPS coordinates or a street address, I can use geocoding or reverse geocoding services to switch between them) and I want to find a business that's listed as being approximately at that place.
If this service doesn't already exist, I'm thinking the best way to do what I want is to get a list of businesses close to a location, go through those and single out the closest one to the point I want, and say I'm "in" it if the distance is less than such and such.
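The approach described above (fetch nearby businesses, pick the closest, and accept it only under a distance cutoff) can be sketched roughly as follows. This is an illustration under assumptions: the list of nearby businesses is taken as already fetched from whatever local-search service is used, the `(name, lat, lon)` tuple shape and the 50 m default threshold are made up, and distance is computed with the haversine formula on a spherical Earth.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation

Business = Tuple[str, float, float]  # hypothetical shape: (name, lat, lon)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def business_at(point: Tuple[float, float],
                businesses: List[Business],
                max_distance_m: float = 50.0) -> Optional[str]:
    """Return the closest business name if it lies within the cutoff, else None."""
    if not businesses:
        return None
    lat, lon = point
    name, blat, blon = min(
        businesses, key=lambda b: haversine_m(lat, lon, b[1], b[2]))
    if haversine_m(lat, lon, blat, blon) <= max_distance_m:
        return name
    return None
```

The cutoff is the part that needs tuning: geocoded business positions are often offset from the storefront, so a threshold that is too tight will miss real matches.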
If you have some pointers for which services I should look into (for either pinpointing one business or getting a list proximate to a location) or you think my methodology is stupid, please let me know!
Edit: it looks like Yahoo Local Search can do pretty much what I want. I'm going to start tinkering with that.
Google Maps doesn't offer this yet. It does reverse geocoding from a lat/long to an address, but not to a business or point of interest.
I'm looking this up myself to see who offers this, but the two I know of so far are GeoAPI (recently purchased by Twitter) and SimpleGeo.
What you're looking for is Google Places, which also lets you specify the business type.
This is just a hunch, but have you checked out the Google Maps API?
I've got a rather large database of location addresses (500k+) from around the world, though many of the addresses are duplicates or near duplicates.
Whenever a new address is entered, I check whether it is already in the database, and if so, I take the existing lat/long and apply it to the new entry.
The reason I don't link to a separate table is that the addresses are not used as a group to search on, and there are often enough differences between addresses that I want to keep them distinct.
If I have a complete match on the address, I apply that lat/long. If not, I go to city level and apply that. If I can't get a match there, I have a separate process to run.
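The lookup cascade described above could be sketched like this. It is only an illustration: the in-memory dictionaries stand in for the real database tables, and `normalize`, `resolve_coords`, and the `fallback` callable (the "separate process") are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (lat, lon)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def resolve_coords(address: str,
                   city: str,
                   by_address: Dict[str, Coord],
                   by_city: Dict[str, Coord],
                   fallback: Callable[[str, str], Optional[Coord]]) -> Optional[Coord]:
    """Try an exact address match, then a city-level match, then the fallback."""
    key = normalize(address)
    if key in by_address:
        return by_address[key]
    city_key = normalize(city)
    if city_key in by_city:
        return by_city[city_key]
    # Neither level matched: hand off to the separate process.
    return fallback(address, city)
```

Recording which of the three levels produced each coordinate makes later clean-up easier, because city-level matches are the ones most likely to be wrong when two cities share a name.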
Now that you have the extensive background, the problem. Occasionally I end up with a lat/long that is far outside of the normal acceptable range of error. However, strangely, it is normally just one or two of these lat/longs that fall outside the range, while the rest of the data exists in the database with the correct city name.
How would you recommend cleaning up the data? I've got the geonames database, so theoretically I have the correct data. What I'm struggling with is the routine to run to get this done.
If someone could point me toward some (low-level) data scrubbing techniques, that would be great.
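One low-level routine for this kind of clean-up is to compare each stored lat/long against a reference coordinate for its city (for example, one taken from the geonames data) and flag rows whose distance exceeds a tolerance. The sketch below assumes records are dictionaries with `city`, `lat`, and `lon` keys and that the reference is a plain mapping from city name to coordinates; those shapes, the function names, and the 100 km default are illustrative assumptions, not part of any real schema.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def find_outliers(records: List[dict],
                  reference: Dict[str, Tuple[float, float]],
                  max_km: float = 100.0) -> List[Tuple[dict, float]]:
    """Return (record, distance_km) for rows too far from their city's reference."""
    outliers = []
    for rec in records:
        ref = reference.get(rec["city"])
        if ref is None:
            continue  # no reference point for this city; cannot judge it
        dist = haversine_km(rec["lat"], rec["lon"], ref[0], ref[1])
        if dist > max_km:
            outliers.append((rec, dist))
    return outliers
```

Keying the reference on city name alone is fragile because the same name can occur in several countries, so in practice the key would probably need to include country or region as well.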
This is an old question, but true principles never die, right?
I work in the address verification industry for a company called SmartyStreets. When you have a large list of addresses that needs to be "cleaned up", polished to official standards, and then relied on for any aspect of your operations, you'd best look into CASS-Certified software (US only; countries vary widely, and many don't offer such a service officially).
The USPS licenses CASS-Certified vendors to "scrub" or "clean up" (meaning: standardize and verify) address data. I would suggest that you look into a service such as SmartyStreets' LiveAddress to verify addresses or process a list all at once. There are other options, but I think this is the most flexible and affordable for you. You can scrub your initial list then use the API to validate new addresses as you receive them.
Update: I see you're using JSON for various things (I love JSON, by the way, it's so easy to use). There aren't many providers of the services you need which offer it, but SmartyStreets does. Further, you'll be able to educate yourself on the topic of address validation by reading some of the resources/articles on that site.
I'm working on building intelligence around link propagation, and because I need to deal with many short URL services where a reverse-lookup from an exact URL address is required, I need to be able to resolve multiple approximate versions of the same URL.
An example would be a URL like http://www.example.com?ref=affil&hl=en&ct=0
Of course, changing GET params in certain circumstances can refer to a completely different page, especially if the GET params in question refer to a profile or content ID.
But a quick parse of the pages would determine how similar they are to each other. Using a bit of machine learning, it could quickly become clear which GET params don't affect the content of the pages returned for a given site.
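Once the set of GET params that don't affect content is known for a site, near-duplicate URLs can be grouped by reducing each one to a canonical form. The sketch below uses only Python's standard `urllib.parse`; the `canonicalize` function and the choice of which params are ignorable are assumptions for illustration (in the scenario above, that set would come from the learning step).

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str, ignorable: Iterable[str]) -> str:
    """Drop ignorable GET params, sort the rest, and lowercase the host."""
    ignore = set(ignorable)
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignore
    )
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


# Hypothetical: suppose "ref" and "ct" were found not to affect the page.
a = canonicalize("http://www.example.com?ref=affil&hl=en&ct=0", {"ref", "ct"})
b = canonicalize("http://WWW.example.com?hl=en&ref=other", {"ref", "ct"})
same_page = (a == b)
```

URLs that share a canonical form can then be treated as one cluster for reverse lookups.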
I'm assuming a service to send a URL and get a list of very similar URLs could only be offered by the likes of Google or Yahoo (or Twitter), but they don't seem to offer this feature, and I haven't found any other services that do.
If you know of any services that do cluster together groups of almost identical URLs in the aforementioned way, please let me know.
My bounty is a hug.
Every URL is akin to an "address" for a location of data on the internet. The "host" part of the URL (in your example, "www.example.com") is a web-server, or a set of web-servers somewhere in the world. If we think of a URL as an "address", then the host could be a "country".
The country itself might keep track of every piece of mail that enters it. Some do, some don't. I'm talking about web-servers! Of course real countries don't make note of every piece of mail you get! :-)
But even if that "country" keeps track of every piece of mail - I really doubt they have any mechanism in place to send that list to you.
As for organizations that might do that harvesting themselves, I think the best bet would be Google, but even there the situation is rather grim. You see, because Google isn't the owner of every web-server ("country") in the world, they cannot know of every URL that is used to access that web-server.
But they can do the reverse. Since they can index every page they encounter, they can get a pretty good idea of every URL that appears in public HTML pages on the web. Of course, this won't include URLs people send to each other in chats, SMSs, or e-mails. But still, they can get a pretty good idea of what URLs exist.
I guess what I'm trying to say is that what you're looking for doesn't really exist. The only way you can get all the URLs used to access a single website is to be the owner of that website.
Sorry, mate.
It sounds like you need to create some sort of discrete similarity rank between pages. This could be done by counting the words two pages have in common, normalizing that count to a bounded range, and then mapping portions of the range to different similarity ranks.
For each pair that you compare, you would also need to know which GET parameters they had in common, or how close their values were. This information would become the attributes that define each of your instances (stored alongside the rank mentioned above). After you have amassed a few hundred pairs of comparisons, you could perhaps do some feature subset selection to identify the GET parameters that best indicate how similar two pages are.
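A discrete similarity rank along these lines might look like the following. This sketch swaps in the Jaccard index (shared words divided by total distinct words) as the normalization to a 0..1 range; the tokenizer, the thresholds, and the rank labels are all arbitrary assumptions chosen for illustration.

```python
import re
from typing import Sequence, Set


def words(text: str) -> Set[str]:
    """Crude tokenizer: lowercase alphanumeric runs, de-duplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(page_a: str, page_b: str) -> float:
    """Shared words over all distinct words, bounded to the range 0..1."""
    wa, wb = words(page_a), words(page_b)
    if not wa and not wb:
        return 1.0  # two empty pages are treated as identical
    return len(wa & wb) / len(wa | wb)


def similarity_rank(score: float,
                    thresholds: Sequence[float] = (0.9, 0.6, 0.3)) -> str:
    """Map a 0..1 score onto one of four discrete ranks."""
    labels = ("identical", "similar", "related", "different")
    for threshold, label in zip(thresholds, labels):
        if score >= threshold:
            return label
    return labels[-1]
```

Real pages would need their markup and shared navigation stripped first, otherwise every page on a site looks similar to every other.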
Of course, this could end up not finding anything useful at all as this dataset is likely to contain a great deal of noise.
If you are interested in this approach, you should look into information gain (InfoGain) and feature subset selection in general. This is a link to my professor's lecture notes, which may come in handy: http://stuff.ttoy.net/cs591o/FSS.html
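For a rough idea of what information gain measures here, the sketch below scores one attribute (for example, "do the two URLs share the same value for a given GET param?") against the similarity rank of each compared pair. It uses the standard entropy-based definition; the function names and the toy data are made up for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# Toy data: one entry per compared pair of pages (hypothetical values).
ranks = ["similar", "similar", "different", "different"]
same_id_param = [True, True, False, False]   # tracks the rank exactly
same_ref_param = [True, False, True, False]  # unrelated to the rank
gain_id = information_gain(same_id_param, ranks)
gain_ref = information_gain(same_ref_param, ranks)
```

A param whose gain is near zero, like the second one in the toy data, is a candidate for being ignored when grouping URLs, while a high-gain param probably selects different content.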