Data packages? (cities, sports teams etc) - list

I am working on a project that could use a list of every city, region, and country in the world. Countless websites use the same data, so I think there must be data packages containing such lists (of sports teams, of politicians, etc.) somewhere on the internet.
Could you tell me what they are called? Are there any websites/sources that you would recommend? And what about pictures: can I find similar resources for pictures or icons?

Some of what you may need can be found on http://data.okfn.org/data/ - e.g. lists of countries, cities, regions, etc.
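Those datasets are published as "Data Packages": each one ships a datapackage.json descriptor that lists its CSV resources, so you can load any of them generically. Below is a minimal Python sketch; the package URL follows how the core datasets were laid out at the time and is an assumption to verify, as is the exact descriptor field layout.

import csv
import io
import json
import urllib.request

# Assumed path of the "country-list" core Data Package; verify it.
PACKAGE_URL = "http://data.okfn.org/data/core/country-list/datapackage.json"

def load_first_csv_resource(package_url):
    # The descriptor lists the package's files under "resources";
    # resource paths are relative to the descriptor's directory.
    with urllib.request.urlopen(package_url) as resp:
        descriptor = json.load(resp)
    resource = next(r for r in descriptor["resources"]
                    if r.get("path", "").endswith(".csv"))
    base = package_url.rsplit("/", 1)[0]
    with urllib.request.urlopen(base + "/" + resource["path"]) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

countries = load_first_csv_resource(PACKAGE_URL)
print(countries[:3])  # rows with name/code columns for each country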


How did these big companies start from these three main points?

So this is something that I've been thinking about lately, and it basically is: how did big music web apps or websites like Spotify, YouTube, or Anghami (if you know that one) start? I was actually thinking about three things:
1. How did they get these huge music libraries?
2. Did each of those big companies need to buy special servers to hold the website data and music library? And if yes, how much does such a server cost?
3. How did they solve the copyrights with all of these creators or authors or publishers or whatever they're called, the copyright owners in this case?
1. They are uploaded by the artists/creators. I'd imagine pre-release Spotify would have had a library already put together by working with the artists.
2. Yes. They cost a lot. There are hundreds of millions of users and terabytes upon terabytes of data, spread around the world. Server costs will be in the millions. Starting out the upfront cost to set up infrastructure would be very high too.
3. This is definitely not the place to ask this kind of question. I would Google information on how copyright deals with artists usually work.

List of EU countries from webservice

I need to know if a country belongs to the European Union.
I could create a list of member nations now, but the list is static, so if membership changes I would have to update every program that uses it.
I would like to find a web service that gives me this data, keyed (for example) by ISO country code.
But I am not able to find any such service.
Does anyone know if there is already this service?
Thanks to everyone.
Note: I'm looking for a list of EU member countries, not a list of all the countries on the European continent.
You might be able to find it on the EU website; maybe start from this page. For example, I know for sure they offer a free web service to check whether a VAT number exists. You might also get lucky by asking on their forum (find it on that same page).
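Until you find such a service, one pragmatic stopgap that addresses the "update all the programs" problem is to move the static list out of the programs and into a single file that each program fetches at startup: a membership change then means editing one hosted file rather than redeploying everything. A minimal Python sketch; the URL is a placeholder for wherever you would host the list.

import json
import urllib.request

# Placeholder URL: host one JSON array of ISO codes that you maintain,
# e.g. ["AT", "BE", "BG", ...]; all your programs read it at startup.
EU_LIST_URL = "https://example.com/eu-members.json"

def fetch_eu_members():
    with urllib.request.urlopen(EU_LIST_URL) as resp:
        return set(json.load(resp))

EU_MEMBERS = fetch_eu_members()  # fetched once at startup

def is_eu(iso_code):
    return iso_code.upper() in EU_MEMBERS

print(is_eu("IT"))  # True while Italy is a member
print(is_eu("CH"))  # False: Switzerland is in Europe but not the EU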

Is there a way to retrieve all targetable cities in the Ads API?

The autocomplete API allows us to retrieve lists of all countries, regions, and locales by leaving out the query string and setting the result limit to a large number, but this feature isn't available at the city level.
Is there a way that we can retrieve a full list of all targetable cities and their IDs? If not, can we cache the autocomplete data for cities to build up such a list?
That functionality is probably not supported because of the massive amount of data that fetching every city in the world would return, even with paging. Limiting the response by country (using country_list=["ca"]) and then fetching all of that country's cities doesn't sound too far-fetched, but it is not implemented either.
To me, it sounds like you have two options.
1. Create a bug report using our bug tool to request a wishlist feature. That doesn't guarantee anything, but at least we can track it if we choose to implement it, and it can serve as a way to gauge interest in the feature.
2. Cache the autocomplete data. IANAL, but part 2 of section 2 of the FB Platform Policies states:
You may cache data you receive through use of the Facebook API in order to improve your application’s user experience, but you should try to keep the data up to date. This permission does not give you any rights to such data.
That sounds like you can cache the autocomplete data, since doing so improves the UX of your app; just remember that you do not have the rights to the data. I would be cautious about this, as it would really suck to build all the caching functionality only to have FB say it's not allowed. I would consult some experts before pursuing this path; a rough sketch of what the caching could look like follows.
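Concretely, the caching could look something like this: query the targeting search country by country and store results with a timestamp, refreshing stale entries to keep the data up to date as the policy asks. This is a minimal Python sketch; the endpoint path, type=adgeolocation, and the parameter names are my reading of the Marketing API's targeting search and should be verified against the current docs.

import json
import time
import urllib.parse
import urllib.request

ACCESS_TOKEN = "YOUR_TOKEN"  # placeholder
MAX_AGE = 7 * 24 * 3600      # refresh cached entries after a week

_cache = {}  # country code -> (timestamp, list of city dicts)

def cities_for_country(country_code):
    # Return cached cities when fresh; otherwise hit the targeting
    # search endpoint (endpoint/params assumed from the Marketing API
    # docs - verify before use) and cache the response.
    entry = _cache.get(country_code)
    if entry and time.time() - entry[0] < MAX_AGE:
        return entry[1]
    params = urllib.parse.urlencode({
        "type": "adgeolocation",
        "location_types": json.dumps(["city"]),
        "country_code": country_code,
        "q": "",
        "limit": 1000,
        "access_token": ACCESS_TOKEN,
    })
    url = "https://graph.facebook.com/search?" + params
    with urllib.request.urlopen(url) as resp:
        cities = json.load(resp).get("data", [])
    _cache[country_code] = (time.time(), cities)
    return cities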

How exactly does sharkscope or PTR data mine all those hands?

I'm very curious to know how this process works. These sites (http://www.sharkscope.com and http://www.pokertableratings.com) data mine thousands of hands per day from secure poker networks, such as PokerStars and Full Tilt.
Do they have a farm of servers running applications that open hundreds of tables (windows) and then somehow spider/datamine the hands that are being played?
How does this work, programming wise?
There are a few options. I've been researching this since I wanted to implement some of this functionality in a web app I'm working on. I'll use PokerStars as an example, since they have, by far, the best security of any online poker site.
First, realize that there is no way for a developer to rip real-time information from the PokerStars application itself. You can't access the API. You can, though, do the following:
Screen Scraping/OCR
PokerStars does its best to sabotage screen/text scraping of their application (by doing simple things like pixel level color fluctuations) but with enough motivation you can easily get around this. Google AutoHotkey combined with ImageSearch.
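To give a flavour of the screen-scraping route (sketched in Python rather than AutoHotkey): grab the region of the screen you care about and run OCR over it. This assumes the Pillow and pytesseract packages plus a local Tesseract install, and the coordinates are placeholders; against PokerStars' deliberately noisy rendering, plain OCR like this would need heavy preprocessing, or the glyph-matching approach described further down.

from PIL import ImageGrab  # Pillow; needs extra setup on some platforms
import pytesseract         # wrapper around a local Tesseract install

# Coordinates of the chat-log region are placeholders - you would
# locate the poker client's window and chat box yourself.
CHAT_BOX = (100, 600, 500, 760)  # left, upper, right, lower

def read_chat_text():
    # Grab just the chat region and let Tesseract attempt to read it.
    img = ImageGrab.grab(bbox=CHAT_BOX)
    # Upscaling and grayscaling help OCR cope with small, noisy text.
    img = img.convert("L").resize((img.width * 3, img.height * 3))
    return pytesseract.image_to_string(img)

print(read_chat_text())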
API Access and XML Feeds
PokerStars doesn't offer public access to its API. But it does offer an XML feed to developers who are pre-approved. This XML feed offers:
- PokerStars Site Summary - shows player, table, and tournament counts
- PokerStars Current Tournament data - files with information about upcoming and active tournaments. The data is provided in two files:
  - PokerStars Static Tournament Data - provides tournament information that does not change frequently, and
  - PokerStars Dynamic Tournament Data - provides frequently changing tournament information
- PokerStars Tournament Results - provides information about completed tournaments. The data is provided in two files:
  - PokerStars Tournament Results - provides basic information about completed tournaments, and
  - PokerStars Tournament Expanded Results - provides expanded information about completed tournaments.
- PokerStars Tournament Leaders Board - provides information about top PokerStars players ranked using the PokerStars Tournament Ranking System
- PokerStars Tournament Leaders Board BOP - provides information about top PokerStars players ranked using the PokerStars Battle Of Planets Ranking System
- Team PokerStars - provides information about Team PokerStars players and their online activity
It's highly unlikely that these sites have access to the XML feed (or an improved one which would provide all the functionality they need) since PokerStars isn't exactly on good terms with most of these sites.
This leaves two options: scraping the network connection for said data, which I suspect is borderline impossible (I don't have experience with it, but I've heard the traffic is heavily encrypted and not easy to tinker with), and, as mentioned above, screen scraping/OCR.
Option #2 is easy enough to implement and, with some work, can avoid detection. From what I've been able to gather, this is the only way they could be doing such massive data mining of PokerStars (I haven't looked into other sites but I've heard security on anything besides PokerStars/Full Tilt is quite horrendous).
[edit]
Rereading your question, I realized I didn't unambiguously answer it.
Yes, they likely have a massive number of servers running, watching all currently running tables, tournaments, etc. Realize that there is a decent amount of money in what they're doing.
This, for instance, could be how they do it (speculation):
Said bot applications watch the tables and data mine all information that gets "posted" to the chat log. They do this by already having a table of images that correspond to, for example, all letters of the alphabet (since PokerStars doesn't post its text as... text; all text in its software is actually an image). So the bot rips an image of the chat log, matches it against the stored table, converts the data to a format it can work with, and throws it in a database. Done.
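A toy version of that glyph-table matching, sketched in Python with OpenCV: the glyph images are something you would have to capture from the client yourself, the file names are placeholders, and the match threshold is a guess that would need tuning against the pixel fluctuations mentioned earlier.

import cv2
import numpy as np

# Glyph images captured beforehand from the client, one per character
# (placeholder file names - you would build this table yourself).
GLYPHS = {ch: cv2.imread(f"glyphs/{ch}.png", cv2.IMREAD_GRAYSCALE)
          for ch in "abcdefghijklmnopqrstuvwxyz0123456789"}

THRESHOLD = 0.9  # guess; tune against the client's pixel noise

def read_line(chat_line_img):
    # Find every glyph occurrence, then read the hits back left to right.
    # (Real code would also de-duplicate overlapping matches.)
    hits = []
    for ch, glyph in GLYPHS.items():
        scores = cv2.matchTemplate(chat_line_img, glyph, cv2.TM_CCOEFF_NORMED)
        for y, x in zip(*np.where(scores >= THRESHOLD)):
            hits.append((x, ch))
    return "".join(ch for x, ch in sorted(hits))

line = cv2.imread("chat_line.png", cv2.IMREAD_GRAYSCALE)
print(read_line(line))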
[edit]
No, the data isn't sold to them by the poker sites themselves. That would be a PR nightmare if it ever got out, which it would. And it wouldn't account for the functionality of these sites (OPR, Sharkscope, etc.), which appears to be instantaneous. There are, without a doubt, applications running that rip the data in real time from the poker software, likely using the methods I listed.
Maybe I can help.
I play poker, run a HUD, look at the stats and am a software developer.
I've seen a few posts on this suggesting it's done by OCR software grabbing the screen. That's really difficult and processor-hungry, so a programmer wouldn't choose it unless there were no other options.
Also, because you can open multiple windows, the poker window can be hidden or partially obscured by other things on the screen, so you couldn't guarantee being able to capture it.
In short, they read the log files that are output by the poker software.
When you install a HUD like Sharkscope or Jivaro, it runs client software on your PC. That software reads the log files and updates its own servers with every hand you play.
Most poker software is similar, but let's start with PokerStars, as that's where I play. The PokerStars software writes local log files recording every action you or it makes. It shows your cards, any opponents' cards that you see, plus what you do (e.g. which button you pressed, how much you/they bet). It posts these updates in near real time and timestamps the log file.
You can look at your own files to see this in action.
On a PC do this (not sure what you do on a Mac, but will be similar)
1. Load File Explorer
2. Select VIEW from the menu
3. Select HIDDEN ITEMS so that you can see the hidden data files
4. Go to C:\Users\Dave\AppData\Local\PokerStars.UK (you may not be called DAVE...)
5. Open the PokerStars.log.0 file in NOTEPAD
6. In Notepad, SEARCH for updateMyCard
7. It will show your cards numerically:
3c for the 3 of Clubs
14d for the Ace of Diamonds
You can see your opponents' cards only where you saw them at the table.
Here are a few example lines from the log file.
OnTableData() round -2
:::TableViewImpl::updateMyCard() 8s (0) [2A0498]
:::TableViewImpl::updateMyCard() 13h (1) [2A0498]
:::TableViewImpl::updatePlayerCard() 7s (0) [2A0498]
:::TableViewImpl::updatePlayerCard() 14s (1) [2A0498]
[2015/12/13 12:19:34]
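To give a flavour of how a HUD client might consume those lines, here is a minimal Python sketch that parses the card updates out of the format shown above; the regex and the rank mapping (11-14 for Jack through Ace) are assumptions based only on these sample lines.

import re

# Matches lines like ":::TableViewImpl::updateMyCard() 13h (1) [2A0498]"
CARD_RE = re.compile(r"update(My|Player)Card\(\)\s+(\d+)([cdhs])")

RANKS = {11: "Jack", 12: "Queen", 13: "King", 14: "Ace"}
SUITS = {"c": "Clubs", "d": "Diamonds", "h": "Hearts", "s": "Spades"}

def parse_card_lines(log_text):
    for who, rank, suit in CARD_RE.findall(log_text):
        name = RANKS.get(int(rank), rank)
        owner = "hero" if who == "My" else "opponent"
        yield owner, f"{name} of {SUITS[suit]}"

sample = """
:::TableViewImpl::updateMyCard() 8s (0) [2A0498]
:::TableViewImpl::updateMyCard() 13h (1) [2A0498]
:::TableViewImpl::updatePlayerCard() 7s (0) [2A0498]
:::TableViewImpl::updatePlayerCard() 14s (1) [2A0498]
"""
for owner, card in parse_card_lines(sample):
    print(owner, card)  # e.g. "hero King of Hearts"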
cheers, hope this helps
Dave
I've thought about this, and have two theories. The "sniffer" sites have every table open, and either:
1. pull the hand data from the network stream, or
2. obtain the hand data from the GUI (screen scraping, pulling stuff out via the GUI API).
Alternately, they may have developed/modified clients to log everything for them, but I think one of the above solutions is likely simpler.
Well, they have two choices:
1. They spider/grab the data without consent. Then they risk being shut down at any time: the poker site can easily detect monitoring at this scale and block it, and they even risk a lawsuit for breach of the terms of service, which probably disallow the use of robots.
2. They pay to get the data directly. This saves a lot of bandwidth (e.g. no loading of full pages, no extraction, no keeping up with HTML changes) and makes their business much less risky, both legally and technically.
Guess which one they more likely chose; at least if the site has been around for some time without being shut down every now and then.
I'm not sure how it works, but I have an application ID and a key, which you get as a gold or silver subscriber. Sign up for a month and send them an email, and you will get access and the API documentation.

How can I start with data mining for a small grocery shop

My company got a project to build a simple website for a grocery shop: a catalogue only, with no shopping cart. A few days ago I read something about data mining from here
I found that it is possible to do some predictive modelling, like:
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer.
I told them this example, and they said they would be happy if I could do something like that.
Now I don't know how or where to start. I know MySQL and can write complex queries as well, but I don't know how I can get that type of data, like the beer and diapers.
I have 3-4 months left. Can anyone guide me on how I can start?
I also don't know what kind of customer shopping data I can get from the shop; maybe Excel files.
But I want to start.
Judging from your question, you don't seem to know much, if anything, about data mining. That being said, you can get something usable running in 4 months, especially in a very restricted domain like a web shop, where all you are probably after, for a start, is buying patterns.
Please understand that you cannot expect some out-of-the-box solution that can be posted here in 10 lines of code, so I suggest you start by reading a decent book on the subject. I'd recommend:
Programming Collective Intelligence: Building Smart Web 2.0 Applications
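To make the beer-and-diapers example concrete: the technique behind it is association rule mining over transactions, i.e. sets of items bought together, and the core idea fits in a short script. Below is a minimal Python sketch, assuming you can export the shop's purchases as lists of item names (the sample data here is made up); for real work you would use a proper algorithm such as Apriori, which the book above walks through.

from collections import Counter
from itertools import combinations

# Each transaction is the set of items in one purchase; in practice
# you would load these from the shop's exported Excel/CSV files.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

n = len(transactions)
for (a, b), together in pair_counts.most_common():
    support = together / n                  # how often the pair occurs
    confidence = together / item_counts[a]  # P(b in basket | a in basket)
    if support >= 0.2 and confidence >= 0.5:
        print(f"{a} -> {b}: support {support:.2f}, confidence {confidence:.2f}")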