How can I compile a GeoLite CSV file into MMDB again? - geoip

I have made a few corrections to location names in a GeoLite2 CSV file.
My site only retrieves locations from the MMDB file, so how can I compile the changed CSV file back into the MMDB binary?
I searched everywhere for a solution but can't find it.
Thanks for any tip.
Carlos

Currently there are only 2 open source MMDB file writers:
MaxMind::DB::Writer (Perl language)
Go MaxMind DB Writer (Go language)
The second one unfortunately has only a subset of the features available in the Perl one, but it should be enough for writing a program that reads the CSV file line by line, builds the corresponding mmdbtype values, and writes them out as an MMDB file.
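As a rough sketch of the CSV-reading half of such a program (written in Python for brevity; the MMDB itself would still have to be written with one of the two writers above), the snippet below joins a blocks file to a locations file on geoname_id. The column names network, geoname_id, city_name and country_name are assumed to match the GeoLite2 City CSV layout, and the function names and sample rows are made up for illustration.

```python
import csv
import io
import ipaddress


def load_locations(locations_csv):
    """Map geoname_id -> location row (a dict of column name -> string)."""
    return {row["geoname_id"]: row for row in csv.DictReader(locations_csv)}


def iter_records(blocks_csv, locations):
    """Yield (network, record) pairs, one per line of the blocks file."""
    for row in csv.DictReader(blocks_csv):
        # ip_network() validates the CIDR string and rejects malformed rows.
        network = ipaddress.ip_network(row["network"])
        location = locations.get(row["geoname_id"], {})
        yield network, {
            "city": location.get("city_name", ""),
            "country": location.get("country_name", ""),
        }


# Tiny in-memory stand-ins for the two CSV files (hypothetical sample data).
locations_file = io.StringIO(
    "geoname_id,country_name,city_name\n"
    "2643743,United Kingdom,London\n"
)
blocks_file = io.StringIO(
    "network,geoname_id\n"
    "192.0.2.0/24,2643743\n"
)

records = list(iter_records(blocks_file, load_locations(locations_file)))
```

Each (network, record) pair is what you would hand to the writer's insert call; with real files, pass open file objects instead of the StringIO stand-ins.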

You can check out our mmdbctl utility tool.
To convert a CSV file to an MMDB file use the import command:
$ mmdbctl import --in data.csv --out data.mmdb
Instructions, features, and documentation are available here: github.com/ipinfo/mmdbctl.
Right now it only supports string data types, and not nested data types. See this issue for more information.

Related

SAS Enterprise Guide - How can I download a ZIP file from a website, extract its contents and run it?

I would like to have a task to download a continuously updated .ZIP data file from a specific website, extract its contents and run the file inside.
I am looking for a program code that performs these tasks and, so to speak, if the data on the website is updated, then my data file is also updated with it.
How can I do it?
Please help!
SAS Enterprise Guide 8.2
I didn't find a solution for it.
Read Chris's method for reading a zip file with the FILENAME statement here: https://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/
Chris is the community director at SAS (something like that). A zip file will be locked on an update. Check the file date and store the date somewhere so you can see if it changed.
There are lots of ways to approach your problem and I don't know your constraints. Start with Chris's post and work from there.
You could download the zip using PROC HTTP.
This macro will let you unzip it: https://core.sasjs.io/mp__unzip_8sas.html
And this macro will give you the recursive directory contents: https://core.sasjs.io/mp__tree_8sas.html

Python reading file without file format software

I've written some code in Python that opens and reads a file saved in .hdf5 format. If I don't have an HDF viewer installed on my computer, can this code still run?
I'm not using it to open the file so I can look at it; I'm trying to read the file and extract data before manipulating it in Python. Probably a silly question, but I'm very new to this coding thing. Would my code be unable to open a file type that cannot be opened on my computer?
Cheers,
Claire
An hdf5 viewer is a program that knows how to interpret the contents of an hdf5 file — much like the program you are trying to write. So your program would at least need to include code (most likely in the form of a module) that knows how to do that.
I'm not very familiar with hdf5. But I think you can't manipulate the file directly if it is binary, since it's not human readable; you need something, such as a viewer or a library, to decode the binary data.
According to this though, I think the answer is you don't have to have the hdf5 viewer to run your code.
So the python code has h5py but the code itself does not need the computer I'm working on to have the hdfviewer software.
It is still capable of running the code without opening the file with the hdfviewer.

Google Cloud Dataflow (Python): function to read from and write to a .csv file?

I am not able to figure out the precise functions in GCP Dataflow Python SDK that read from and write to csv files (or any non-txt files for that matter). For BigQuery, I have figured out the following functions:
beam.io.Read(beam.io.BigQuerySource('%Table_ID%'))
beam.io.Write(beam.io.BigQuerySink('%Table_ID%'))
For reading and writing text files, I know about the ReadFromText and WriteToText functions.
However, I am not able to find any examples for GCP Dataflow Python SDK in which data is written to or read from csv files. Please could you provide the GCP Dataflow Python SDK functions for reading from and writing to csv files in the same manner as I have done for the functions relating to BigQuery above?
There is a CsvFileSource in the beam_utils PyPi package repository, that reads .csv files, deals with file headers, and can set custom delimiters. More information on how to use this source in this answer. Hope that helps!
CSV files are text files. The simplest (though somewhat inelegant) way of reading them would be to do a ReadFromText, and then split the lines read on the commas (e.g. beam.Map(lambda x: x.split(','))).
For the more elegant option, check out this question, or simply use the beam_utils pip repository and use the beam_utils.sources.CsvFileSource source to read from.
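One caveat with a plain x.split(',') is that it breaks on quoted fields that contain commas. A minimal sketch of a per-line parser built on the standard csv module is below; parse_csv_line is a made-up helper name, and it assumes every record fits on a single line (no newlines inside quoted fields).

```python
import csv


def parse_csv_line(line, delimiter=","):
    """Parse one line of CSV text into a list of fields, honouring quotes."""
    # csv.reader accepts any iterable of lines, so wrap the single line in a list.
    return next(csv.reader([line], delimiter=delimiter))


fields = parse_csv_line('1,"Smith, Jane",London')
```

In a pipeline this function could take the place of the lambda, i.e. beam.Map(parse_csv_line) applied to the output of ReadFromText.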

Best way to parse a complex log file?

I need to parse a log file that consists of many screenshots of the stdout of a real-time OS.
In particular, every section of my log_file.txt is a text version of what appears on screen. This machine has no monitor, so stdout is written to a downloadable log_file.txt.
The aim is to create a .csv from this file for data mining purposes, but I'm still wondering what the best method to process this file would be.
I would like the first line of the CSV file to hold the description (string) of each value, and every line from the second onward to hold the respective values (int).
I was thinking about a parser generator (JavaCC, ANTLR, etc..) but before starting with them I would get some opinions.
Thank you.
P.S.
I put a short version of my log at the following link: pastebin.com/r9t3PEgb

How to open a .dat file (ASCII)?

I tried to open a .dat file using Stata, and it actually opened, but the data set was a complete mess. I took the file from NBER (CPS data)...
click on the A icon of the year 1964 March.
I tried the regular Stata procedure for .dat files: File->Import->ASCII data created by spreadsheet (delimiter " "), as recommended in the Stata manual for .dat files.
But it is still not working. Are there any other ways to open .dat file? Can I convert it to .csv somehow?
(All the data files are ASCII files compressed with the Unix compress command.)
There is a Java app, DataFerrett, that lets you get CPS and other data sets, but it is not very efficient.
I can show you an example how to open one of them yourself (you can use it for any years in the interval 1989 till 2012).
1. Download the .dat file
2. Save it in a Desktop folder (C:\Users\Owner...)
3. Download the corresponding .do and .dct files from here
4. Save them in the same folder
5. Open the .dat file in Stata the same way you did in your question
6. Save it as a Stata .dta file in the same folder (C:\Users\Owner...)
7. Open the .do file (using Notepad++) that is in your (C:\Users\Owner...) folder
8. At the very beginning you will see that the author defines local variables for the paths of the .dta, .dat and .dct files. Change the paths so that they point to the saved .dta, .dat and .dct files in your folder (C:\Users\Owner...) on your Desktop
9. Reopen Stata, and run the .do file from your folder (C:\Users\Owner...)
10. Done! Save the .dta file
Now, for the years 1962 to 1988, you can follow the same procedure (10 steps) as I explained above, but unfortunately NBER does not provide the .do and .dct files. That means you have to write them yourself. Take the .do and .dct files from any of the available years (1989 - 2012) as a template, and write your own. You will have to adjust them so that the new .do and .dct files are consistent with the corresponding .pdf documentation for each year. I know it is very tedious, but this is the only way you can handle it.
We need more information.
".dat" is not an extension that is special so far as Stata is concerned. Perhaps you meant .dta.
Even if so, what file was it, what command did you use and what was wrong?
The page you linked to leads to numerous files. We have not a hope of guessing which you mean.
Spelling is "Stata".
This might not save you from spending days digging into that data, but here are some ideas:
The file contains 2 completely different kinds of lines. This might be the reason why you can't import them. You can see this by opening the unzipped file in a text editor. You have to find out what that means.
What do you want to obtain from this file? According to the pdf it contains 85 different values per record. Do you need them all? If you're only interested in a few values, you could extract them in a Unix shell.
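If a shell is not at hand, the same extraction can be sketched in a few lines of Python. The layout below is hypothetical and not taken from the CPS documentation; the real column positions have to come from the pdf for the year in question, and the two kinds of lines would each need their own layout.

```python
def extract_fields(line, layout):
    """Slice a fixed-width line into named fields.

    layout maps a field name to (start, end), using 1-based inclusive
    columns, which is how codebooks usually number them.
    """
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in layout.items()
    }


# Hypothetical layout for illustration only.
LAYOUT = {"age": (1, 2), "state": (3, 4), "income": (5, 10)}

record = extract_fields("3406 45000", LAYOUT)
```

Applying extract_fields to each line and writing the resulting dicts with csv.DictWriter would give the .csv asked about in the question.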