MaxMind's GeoIPCity for a single country only? - geoip

Recently I have stumbled upon a problem - MaxMind's GeoIPCity file is way too big for our needs and contains A LOT of data we don't need and won't need.
The question is: is there a way to limit the City database to a single country, say Canadian cities only?

You cannot download a database that contains only Canadian cities, but you can certainly prune the full database once you have downloaded and loaded it. This is true whether you use the MaxMind DB format or the CSV format: just trim out the rows that do not match Canada's country code or geoname_id (depending on whether you use v1 or v2 of the dataset).
If you identify your specific coding environment and language, I'm certain someone can help you write a few lines of code that chop out all the fat.
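As a rough illustration of the CSV route, here is a minimal Python sketch. It assumes the v2 (GeoLite2/GeoIP2) CSV layout, where a locations file carries `geoname_id` and `country_iso_code` columns and a blocks file carries `network` and `geoname_id` columns; check the headers of your own download before relying on these names. The function names are made up for this example.

```python
import csv


def geoname_ids_for_country(locations_file, country_code="CA"):
    """Collect the geoname_id of every location row in the given country.

    Assumes the locations CSV has 'geoname_id' and 'country_iso_code' columns.
    """
    reader = csv.DictReader(locations_file)
    return {
        row["geoname_id"]
        for row in reader
        if row["country_iso_code"] == country_code
    }


def prune_blocks(blocks_file, out_file, keep_ids):
    """Copy only the network rows whose geoname_id is in keep_ids.

    Returns the number of rows kept. Rows with an empty geoname_id are dropped.
    """
    reader = csv.DictReader(blocks_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row["geoname_id"] in keep_ids:
            writer.writerow(row)
            kept += 1
    return kept


# Possible usage (file names are assumptions about the extracted archive):
#
# with open("GeoLite2-City-Locations-en.csv", newline="", encoding="utf-8") as f:
#     ids = geoname_ids_for_country(f, "CA")
# with open("GeoLite2-City-Blocks-IPv4.csv", newline="", encoding="utf-8") as src, \
#      open("blocks-ca.csv", "w", newline="", encoding="utf-8") as dst:
#     print(prune_blocks(src, dst, ids), "Canadian networks kept")
```

The locations file can be filtered the same way if you also want to drop the non-Canadian city names.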

Related

Comparing data of payments

At my work we have two systems: one that automatically collects customers’ payments every month, and one that manages those customers’ memberships. Sadly, our outdated systems don’t communicate with each other, so we don’t know whether a customer actually paid for their membership without manually auditing them.
I’ve been put in charge of this process and boy does it take a while to do.
I have limited knowledge of C++ and was looking into maybe writing a program to do the comparisons for me.
I have two ideas on how to implement this and was wondering what you guys thought: whether these would be best, whether it’s even possible, or whether there’s a better solution.
Current setup: we have a list of all members in Excel, with how much each should be paying. We then go through the actual money collected and check that everyone’s payment went through and was processed, not declined.
Option 1: have a multi-dimensional array of strings. Read the Excel file into this array; it would have three columns: first name, last name, and the amount they should be paying. This would be put in alphabetical order to help with searching. I would then export the transactions in CSV format and read them one line at a time. For each line, the program would search the array for the same first and last name. Once found, it would take the amount paid, confirm the transaction says processed and not declined, and if so subtract it from the amount that customer should be paying. In the end, if every customer’s amount due is equal to 0, then everyone paid.
Option 2: similar to option 1, but instead of using a multi-dimensional array it would use two CSV files and not load the items into an array at the start.
Thoughts? Is this a smart way to combat this problem? I’m a newbie programmer so I’m just looking for suggestions/advice.
Your solutions would work, but they are only suited to small datasets. I don't know what your constraints are, but I think a more elegant solution would be to set up a database on the first system (instead of the Excel file).
Are you allowed to create a database? How many customers are in the Excel file?
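If a database is not an option, Option 1 from the question can be sketched in a few lines. The sketch below is in Python rather than C++ to keep it short, and it swaps the sorted multi-dimensional array for a dictionary keyed by (first name, last name), which makes each lookup direct. The column names (`first_name`, `last_name`, `amount_due`, `amount`, `status`) are assumptions about how the two exports would look, not something taken from the actual systems.

```python
import csv
from decimal import Decimal


def _name_key(row):
    # Normalise names so "Ann Lee" and " ann lee " match.
    return (row["first_name"].strip().lower(), row["last_name"].strip().lower())


def outstanding_balances(members_file, transactions_file):
    """Subtract processed payments from what each member owes.

    Returns (owing, unmatched): members whose balance is not zero, and
    processed payments that match no member.
    """
    balances = {}
    for row in csv.DictReader(members_file):
        balances[_name_key(row)] = Decimal(row["amount_due"])

    unmatched = []
    for row in csv.DictReader(transactions_file):
        if row["status"].strip().lower() != "processed":
            continue  # declined payments do not reduce the balance
        key = _name_key(row)
        if key in balances:
            balances[key] -= Decimal(row["amount"])
        else:
            unmatched.append(key)

    owing = {key: due for key, due in balances.items() if due != 0}
    return owing, unmatched
```

Matching on names alone breaks as soon as two members share a name, so a member ID column, if either system can export one, would be a safer key.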

I need help in designing my C++ Console application

I have a task to complete.
There are 4000+ CSV files of two types, which are related to each other.
The 2 types are:
1. Country2.csv
2. Security_Name.csv
Contents of Country2.csv:
Company Name;Security Name;;;;Final NOS;Final FFR
Contents of Security_Name.csv:
Date;Close Price;Volume
There are multiple countries, and for each country there are multiple security files.
Now I need to READ them, do some CALCULATION and then WRITE the output to other files.
READ
Read both files, Country 2.csv and Security.csv, and extract all the data from them.
For example :
Read France 2.csv, extract Security_Name, Final NOS, Final FFR
Then Read Security.csv(which matches the Security_Name) and extract Date, Close Price, Volume
Calculation
Calculations are basically finding Median of the values extracted which is quite simple.
For Example:
Monthly Median Traded Values
Daily Traded Value of a Security ... and so on
Write
Based on the month, I need to write the output to a file in one of two formats:
If Month % 3 = 0
Save It as MONTH_NAME.csv in following format:
Security name; 12-month indicator; 3-month indicator; FOT
Else
Save It as MONTH_NAME.csv in following format:
Security Name; Monthly Median Traded Value Ratio; Number of days Volume > 0
My question is how do I design my application in such a way that it is maintainable and the flow of data throughout the execution is seamless?
So first thing. Based on the kind of data you are looking to generate, I would probably be looking at moving this data to a SQL db if at all possible. This is "one SQL query" kind of stuff. And far more maintainable than C++ that generates CSV files from CSV files.
Barring that, I would probably look at using datamash and/or perl. On a Windows platform, you could do this through Cygwin or WSL. Probably less maintainable, but so much easier it's not too much of an issue.
That said, if you're looking for something moderately maintainable, C++ could work. The first thing I would do is design my input classes. Data-centric, but it can work. It sounds like you could have a Country class, a Security class, and a SecurityClose class...or something along those lines. You can think about whether a Security class should contain a collection of SecurityClose objects (data), or whether the data should just be "loose" and reference the Security it belongs to. Same with the Country->Security relationship.
Once you've decided how all that's going to look, you want something (likely a function) that can tokenize a CSV line, so that "1,2,3" gets turned into a vector<string> with the contents "1" "2" "3". Then, each of your input classes should have a constructor or initializer that takes a vector<string> and populates itself. You might need to pass higher-level data along too, like the filename if you want the security data to know which security it belongs to.
That's basically most of the battle there. Once you've pulled your data into sensibly organized classes, the rest should come more easily. And if you run into bumps, hopefully you can ask specific design or implementation questions from there.
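To make the shape of that design concrete, here is a small sketch of the tokenizer, one input class, and one of the median calculations. It is written in Python for brevity; the same structure carries over to C++ with a vector<string> tokenizer and a constructor per class. Several things here are assumptions: the `;` separator (taken from the sample headers in the question), ISO `YYYY-MM-DD` dates, and "daily traded value" meaning close price times volume.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


def tokenize(line, sep=";"):
    """Split one CSV line into trimmed fields."""
    return [field.strip() for field in line.rstrip("\n").split(sep)]


@dataclass
class SecurityClose:
    security: str  # passed in from outside, e.g. derived from the file name
    day: date
    close_price: float
    volume: int

    @classmethod
    def from_fields(cls, security, fields):
        """Build a record from the tokens of a 'Date;Close Price;Volume' line."""
        day, close_price, volume = fields
        return cls(security, date.fromisoformat(day), float(close_price), int(volume))

    @property
    def traded_value(self):
        # Assumption: daily traded value = close price * volume.
        return self.close_price * self.volume


def monthly_median_traded_value(closes):
    """Group records by (year, month) and take the median traded value."""
    by_month = defaultdict(list)
    for close in closes:
        by_month[(close.day.year, close.day.month)].append(close.traded_value)
    return {month: median(values) for month, values in by_month.items()}
```

A reader loop then reduces to skipping the header line and calling `SecurityClose.from_fields(name, tokenize(line))` for every remaining line, which keeps parsing separate from the calculations.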

AWS Machine Learning issue

I use AWS Machine Learning to predict if a tweet message is positive or negative.
I have a CSV file with about 1000 tweets (2 columns: "message" TEXT and "is_positive" BINARY).
If the message contains some words that I've defined, "is_positive" is set to 0 (else 1).
My issue is that evaluations always return 1 (even if I try a message with a "bad" word).
How can I have more relevant results?
Thanks for your help!
Navigate to your datasource and select your ML model. Clicking on the attributes will give you an idea of how "statistically relevant" the columns in your training data are. Your result is most probably due to your training data. Since the entire tweet message is in one column, the model is most likely looking for a correlation across all the words in the sample tweets taken together. A better approach may be to use a "sentiment" library, of which there are publicly available versions; that would shift your model to look at each word in the tweet rather than at the tweet as a whole, as yours currently does.
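To illustrate what "looking at each word" means, here is a minimal Python sketch of the labelling rule described in the question, plus a quick check of how balanced the resulting labels are. The word list is a made-up placeholder, and this is not how AWS Machine Learning processes text internally; it only shows the word-level view and gives a way to spot training data where almost every row is labelled 1.

```python
import re
from collections import Counter

# Hypothetical word list; replace with the words you actually defined.
NEGATIVE_WORDS = {"bad", "awful", "hate"}


def is_positive(message, negative_words=NEGATIVE_WORDS):
    """Return 0 if any word of the message is in the negative list, else 1."""
    words = re.findall(r"[a-z']+", message.lower())
    return 0 if any(word in negative_words for word in words) else 1


def label_balance(messages):
    """Count how many messages fall into each label."""
    return Counter(is_positive(message) for message in messages)
```

If `label_balance` over the 1000 tweets shows only a handful of 0 labels, a model that always predicts 1 already looks accurate on that data, which would match the behaviour described in the question.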

Getting stocks by industry via Yahoo Finance

I want to list all available industries (like: http://biz.yahoo.com/p/ ) and show all corresponding stocks.
Until now I'm using YAHOO.Finance.SymbolSuggest.ssCallback for the symbol suggestion and http://finance.yahoo.com/d/quotes.csv?s=... for getting the stock's data.
Does anyone have any idea how to get all industries and corresponding stocks?
Is there another hidden Yahoo API?
The lists of available industries are called GICS Sectors for Standard and Poor's (the S&P 500 uses that) and ICB for Dow Jones and FTSE. Hence ICB is used by Nasdaq, NYSE and other markets.
It seems that Yahoo uses a third industry classification, by Morning Star, but since I'm not quite sure, I will give each way of retrieving the data.
Morning Star
I don't know if Yahoo really sticks to this classification, but some names were really close so let's see it:
You need to go to their Index Data, click on each sector, and then click View complete index holdings at the bottom.
It's not as precise as in Yahoo industry list, but it's all you can do with Morning Star. Not very convincing, I know...
GICS Sectors
GICS Sectors are now a trademark of Standard and Poor's, so the data has to be looked up on S&P's website.
Short answer: take a look at this page, you will need to be registered (it's free and easy) and you can download spreadsheets (xls) with stocks and corresponding sectors. Nevertheless, things aren't always easy, and you will have to do a bit of a search to retrieve all stocks with their corresponding industries. For example, the file INDICATED_RATE_CHANGE.xls will give you some companies and their sectors in each month of 2012. Using that and SP500_DividendAristocrats_2012.xls you should be able to retrieve at least a large part of S&P 500 companies.
ICB
ICB is used by NYSE, NASDAQ, etc., so it's a lot simpler than S&P and MorningStar. Here is your answer. BOOM! Direct link!
Link is dead :(
Finally
I strongly advise you to use the simplest and most-used industry classification index: the ICB. It will always be available and publicly displayed, since millions of investors rely on it every day, without having to use S&P financial services or MorningStar brokerage services...
EDIT
You can look at nasdaq.com to retrieve all companies and their corresponding sector: here for Nasdaq and here for Nyse
Get all industry-IDs from here:
http://biz.yahoo.com/ic/ind_index.html
(look at the links)
Then use YQL ( https://developer.yahoo.com/yql/console/ )
with a query like this:
select * from yahoo.finance.industry where id=912

How can I Implement Yelp's Map Search and Search by zip code or by city name

How can I implement Yelp-like search?
There are 2 types of searches on Yelp.
1. Simple search using the zip code, city and state in the U.S.
I'm using PostgreSQL and wonder if there is a good dataset that I can use that has city, state and zip code. I was hoping to find a good geo shapefile and use GeoDjango, where I can just use, say, Store.objects.filter(coordinates__in=cityNameORZipCode).
There seem to be some zip code databases that I can use, but I really don't know where I can find good city and state data. The last option is to create my own city name and state tables and link them to Stores, but I'm not sure if this is a smart thing to do.
2. Map search.
If you zoom the Google map in or out, it searches for local businesses within the map area you are viewing. I think this is amazing. How can I do this?
It's looking dark right now. Please shed some light.
You're asking a very broad and unanswerable question, but a good place to start for data in the U.S. is at the Census Bureau. For example:
State and State Equivalent Areas
County and County Equivalent Areas
The full list:
http://www.census.gov/geo/www/cob/bdy_files.html
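For the map search part of the question, a common approach is for the page to send the corners of the visible map area to the server whenever the user pans or zooms, and for the server to return only the stores inside that rectangle. The sketch below shows that filter in plain Python with an in-memory list; the `Store` class and function name are made up for illustration, and with PostgreSQL you would normally push the same bounding-box test into a spatial query rather than loop in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    lat: float
    lon: float


def stores_in_viewport(stores, south, west, north, east):
    """Return the stores whose coordinates fall inside the visible map area.

    south/north are latitudes and west/east are longitudes of the viewport
    corners. A viewport that crosses the 180 degree meridian has west > east.
    """
    def lon_in_range(lon):
        if west <= east:
            return west <= lon <= east
        return lon >= west or lon <= east  # viewport wraps around the antimeridian

    return [
        store
        for store in stores
        if south <= store.lat <= north and lon_in_range(store.lon)
    ]
```

Search by zip code or city name can reuse the same function once each zip code or city has been resolved to a bounding box, for example from boundary files like the Census ones above.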