Couldn't find geoname_id for North America in GeoLite2-City.csv / GeoLite2-Country.csv - geoip

We are using GeoLite2-City.mmdb for parsing IP addresses. After parsing the IP address "8.8.8.8", the result contained:
{"continent":{"code":"NA","names":{"de":"Nordamerika","ru":"Северная Америка","pt-BR":"América do Norte","ja":"北アメリカ","en":"North America","fr":"Amérique du Nord","zh-CN":"北美洲","es":"Norteamérica"},
"geoname_id":6255149}}
Unfortunately, I couldn't find this continent by geoname_id; I was searching in:
1) GeoLite2-Country-Locations-en.csv
2) GeoLite2-City-Locations-en.csv
All files were recently downloaded. So could there be data in the .mmdb that is missing from the CSV files?

On the CSV from 2020/01/14, I got geoname_id = 6252001 for 8.8.8.8, which is (according to GeoLite2-Country-Locations-en.csv):
6252001,en,NA,"North America",US,"United States",0
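In case it helps, here's a minimal sketch of that lookup with Python's csv module (the helper name and file path are mine). Note that in the locations CSVs the geoname_id column holds country/city ids, while the continent is carried only in the continent_code/continent_name columns, which would explain why 6255149 never shows up there:
import csv

# Search a GeoLite2 locations CSV for a given geoname_id.
# Assumes the standard GeoLite2 header row with a 'geoname_id' column.
def find_geoname(csv_path, geoname_id):
    with open(csv_path, encoding='utf-8') as f:
        for row in csv.DictReader(f):
            if row['geoname_id'] == str(geoname_id):
                return row
    return None

print(find_geoname('GeoLite2-Country-Locations-en.csv', 6255149))  # None: continent-only id
print(find_geoname('GeoLite2-Country-Locations-en.csv', 6252001))  # the United States row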

Finding string around pattern of Strings

Regular expression (maybe) to find a word/string surrounded by other words.
For example, I have the sentences below:
1. I’m setting up a new server, The key is ABC and want to support UTF-8 fully in my web application. Where do I need to set the encoding/charsets?”
2. XYZ is the key for the new server I am setting and it is located at address 111 abc
3. key as of the date is WWW for the new server I am setting at 111, ABC London
4. The key for server is LMN and it is being setup at location 111, abc London.
key will be finite and will only have around 10 values, though the value of a key itself can take any form. I have used ABC, XYZ, WWW, LMN as examples above.
I should be able to identify that a key exists in the sentence and extract its value (ABC, XYZ, WWW, LMN) from all the above examples.
I have basically tried finding it using if/then/else, which is very cumbersome, and I don't have very good code to show yet, but I will update when I can.
Another option could be to use spaCy with dependency parsing.
Any help will be greatly appreciated.
This expression is likely to return the desired output, though I'm not certain:
^(?=.*\b(ABC|XYZ|WWW|LMN)\b).*$
Test
import re
regex = r"^(?=.*\b(ABC|XYZ|WWW|LMN)\b).*$"
test_str = """
1.I’m setting up a new server, The key is ABC and want to support UTF-8 fully in my web application. Where do I need to set the encoding/charsets?”
2.XYZ is the key for the new server I am setting and it is located at address 111 abc
3.Key as of the date is WWW for the new server I am setting at 111, ABC London
4.The key for server is LMN and it is being setup at location 111, abc London.
"""
print(re.findall(regex, test_str, re.M))
Output
['ABC', 'XYZ', 'ABC', 'LMN']
The expression is explained on the top right panel of regex101.com, if you wish to explore, simplify, or modify it, and the site also shows step by step how it matches against sample inputs.
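One caveat: the greedy .* inside the lookahead captures the last keyword on a line, which is why the third result above is 'ABC' (from "ABC London") rather than 'WWW'. If the first occurrence is what's wanted, a lazy variant should do it (a quick sketch, reusing test_str from above):
import re

# Lazy .*? makes the lookahead stop at the first keyword on each line,
# so line 3 captures 'WWW' instead of the trailing 'ABC'.
regex = r"^(?=.*?\b(ABC|XYZ|WWW|LMN)\b).*$"
print(re.findall(regex, test_str, re.M))
# ['ABC', 'XYZ', 'WWW', 'LMN']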

How to load specific columns with varying location from a text file in python?

I'm trying to read the discharge data of 346 US rivers stored online in textfiles. The files are more or less in this format:
Measurement_number Date Gage_height Discharge_value
1 2017-01-01 10 1000
2 2017-01-20 15 2000
# etc.
I only want to read the gage height and discharge value columns.
The problem is that in most files additional metadata columns are added in front of the 'Gage_height' column, so I cannot simply read the 3rd and 4th columns, because their index varies.
I'm trying to find a way to say "read the columns named 'Gage_height' and 'Discharge_value'", but I haven't succeeded yet.
I hope someone can help. I'm currently loading the text files with numpy.genfromtxt, so it would be great to find a solution with that package, but other solutions are also more than welcome.
This is my code so far
data_url = urllib2.urlopen(url)  # the url of this specific site
data = np.genfromtxt(data_url, skip_header=1, comments='#', usecols=[2, 3])
You can use the names=True option to genfromtxt, and then use the column names to select which columns you want to read with usecols.
For example, to read 'Gage_height' and 'Discharge_value' from your data file:
data = np.genfromtxt(filename, names=True, usecols=['Gage_height', 'Discharge_value'])
Note that you don't need to set skip_header=1 if you use names=True.
You can then access the columns using their names:
gage_height = data['Gage_height'] # == array([ 10., 15.])
discharge_value = data['Discharge_value'] # == array([ 1000., 2000.])
See the numpy.genfromtxt docs for more information.
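For a quick self-contained check, here's a sketch using io.StringIO with made-up data shaped like the sample above, plus an extra metadata column ('Agency', my invention) in front of 'Gage_height' to show that selecting by name still works:
import io
import numpy as np

text = """Measurement_number Date Agency Gage_height Discharge_value
1 2017-01-01 USGS 10 1000
2 2017-01-20 USGS 15 2000
"""
data = np.genfromtxt(io.StringIO(text), names=True,
                     usecols=['Gage_height', 'Discharge_value'])
print(data['Gage_height'])       # [10. 15.]
print(data['Discharge_value'])   # [1000. 2000.]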

BadDataError when editing a .dbf file using dbf package

I have recently produced several thousand shapefile outputs and accompanying .dbf files from an atmospheric model (HYSPLIT) on a unix system. The converter txt2dbf is used to convert shapefile attribute tables (text file) to a .dbf.
Unfortunately, something has gone wrong (probably a separator/field length error) because there are 2 problems with the output .dbf files, as follows:
Some fields of the dbf contain data that should not be there. This data has "spilled over" from neighbouring fields.
An additional field has been added that should not be there (it actually comes from a section of the first record of the text file, "1000 201").
This is an example of the first record in the output dbf (retrieved using dbview unix package):
Trajnum : 1001 2
Yyyymmdd : 0111231 2
Time : 300
Level : 0.
1000 201:
Here's what I expected:
Trajnum : 1000
Yyyymmdd : 20111231
Time : 2300
Level : 0.
Separately, I'm looking at how to prevent this from happening again, but ideally I'd like to be able to repair the existing .dbf files. Unfortunately the text files are removed for each model run, so "fixing" the .dbf files is the only option.
My approaches to the above problems are:
Extract the information from the fields that do exist into new fields using dbf.add_fields and dbf.write (Python dbf package), then delete the old incorrect fields using dbf.delete_fields.
Delete the unwanted additional field.
This is what I've tried:
with dbf.Table(db) as db:
    db.add_fields("TRAJNUMc C(4)")  # create new fields
    db.add_fields("YYYYMMDDc C(8)")
    db.add_fields("TIMEc C(4)")
    for record in db:  # extract data from the old fields
        dbf.write(record, TRAJNUMc=int(str(record.Trajnum)[:4]))
        dbf.write(record, YYYYMMDDc=int(str(record.Trajnum)[-1:] + str(record.Yyyymmdd)[:7]))
        dbf.write(record, TIMEc=record.Yyyymmdd[-1:] + record.Time[:])
    db.delete_fields('Trajnum')  # delete the incorrect fields
    db.delete_fields('Yyyymmdd')
    db.delete_fields('Time')
    db.delete_fields('1000 201')  # delete the unwanted extra field
    db.pack()
But this produces the following error:
dbf.ver_2.BadDataError: record data is not the correct length (should be 31, not 30)
Given the apparent problem that there has been with the txt2dbf conversion, I'm not surprised to find an error in the record data length. However, does this mean that the file is completely corrupted and that I can't extract the information that I need (frustrating because I can see that it exists)?
EDIT:
Rather than attempting to edit the 'bad' .dbf files, it seems a better approach is to 1. extract the required data from the bad files to a text file and then 2. write it to a new dbf. (See Ethan Furman's comments/answer below.)
EDIT:
An example of a faulty .dbf file that I need to fix/recover data from can be found here:
https://www.dropbox.com/s/9y92f7m88a8g5y4/p0001120110.dbf?dl=0
An example .txt file from which the faulty dbf files were created can be found here:
https://www.dropbox.com/s/d0f2c0zehsyy8ab/attTEST.txt?dl=0
To fix the data and recreate the original text file, this snippet should help:
import dbf

table = dbf.Table('/path/to/scramble/table.dbf')
with table:
    fixed_data = []
    for record in table:
        # convert to str/bytes while skipping the delete flag
        data = record._data[1:].tostring()
        trajnum = data[:4]
        ymd = data[4:12]
        time = data[12:16]
        level = data[16:].strip()
        # keep the four fields together as one row
        fixed_data.append((trajnum, ymd, time, level))

new_file = open('repaired_data.txt', 'w')
for line in fixed_data:
    new_file.write(','.join(line) + '\n')
new_file.close()
Assuming all your data files look like your sample (the big IF being that the data has no embedded commas), then this rough code should help translate your text files into dbfs:
raw_data = open('some_text_file.txt').read().split('\n')
final_table = dbf.Table(
    'dest_table.dbf',
    'trajnum C(4); yyyymmdd C(8); time C(4); level C(9)',
)
with final_table:
    for line in raw_data:
        fields = line.split(',')
        final_table.append(tuple(fields))
# table has been populated and closed
Of course, you could get fancier and use actual date and number fields if you want to:
# the dbf field spec becomes
'trajnum N; yyyymmdd D; time C(4); level N'
# and the appending loop becomes
for line in raw_data:
    trajnum, ymd, time, level = line.split(',')
    trajnum = int(trajnum)
    ymd = dbf.Date(ymd[:4], ymd[4:6], ymd[6:])
    level = int(level)
    final_table.append((trajnum, ymd, time, level))
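For what it's worth, a rough driver to apply the repair across many files might look like this (a sketch only, assuming every file shares the fixed-width layout above; the paths and the '_fixed' suffix are mine):
import glob
import dbf

for bad_path in glob.glob('/path/to/dbfs/*.dbf'):
    fixed_rows = []
    with dbf.Table(bad_path) as table:
        for record in table:
            data = record._data[1:].tostring()  # skip the delete flag, as above
            fixed_rows.append((data[:4], data[4:12], data[12:16], data[16:].strip()))
    new_table = dbf.Table(
        bad_path[:-4] + '_fixed.dbf',
        'trajnum C(4); yyyymmdd C(8); time C(4); level C(9)',
    )
    with new_table:
        for row in fixed_rows:
            new_table.append(row)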

GeoIP database that will geo-target by US state

I am looking for a GeoIP database (similar to MaxMind GeoLite2 Country and City) that will allow me to identify the US state that the user is coming from, in order to target specific content to that user.
Does anyone know how or where I could find such a database/service or solution?
Don't expect high accuracy unless you're satisfied with country/city precision. It is, after all, IP-based geolocation data, and the accuracy varies: it depends on the ISPs and the companies that manage the databases (commercial databases may have higher accuracy). Look at an IP location info webtool (like http://geoipinfo.org/) and you'll see approximately where it finds you, along with percentage accuracy figures at the city and country levels. That tool uses the ip2location database for lookups and its precision data.
This thread is old, but as of today I used http://api.ipstack.com and it's working perfectly. They have very extensive help examples on their site, but basically you make the call, parse the data, and take what you want.
First, be sure you have any/all includes you need (System.Xml, System.Net, etc.).
Second, be sure not to test with a private network IP (192.168.x.x or 10.x.x.x), because that will always return blank/empty fields and you'll think something is coded wrong.
Third, you will need an Access Key from ipstack.com. You can set up a FREE account (10,000 requests a month, I think) and get your access code to enter into the string below for the API call. I filled out the form and was up and running in 10 minutes for free.
This worked for me to track visitors to any page:
string IP = "";
string strHostName = "";
string strHostInfo = "";
string strMyAccessKeyForIPStack = "THEYGIVEYOUTHISWHENYOUSETUPFREEACCOUNT";
strHostName = System.Net.Dns.GetHostName();
IPHostEntry ipEntry = System.Net.Dns.GetHostEntry(strHostName);
IPAddress[] addr = ipEntry.AddressList;
IP = addr[2].ToString();
XmlDocument doc = new XmlDocument();
string strMyIPToLocate = "http://api.ipstack.com/" + IP + "?access_key=" + strMyAccessKeyForIPStack + "&output=xml";
doc.Load(strMyIPToLocate);
XmlNodeList nodeLstCity = doc.GetElementsByTagName("city");
XmlNodeList nodeLstState = doc.GetElementsByTagName("region_name");
XmlNodeList nodeLstZIP = doc.GetElementsByTagName("zip");
XmlNodeList nodeLstLAT = doc.GetElementsByTagName("latitude");
XmlNodeList nodeLstLON = doc.GetElementsByTagName("longitude");
strHostInfo = "IP is from " + nodeLstCity[0].InnerText + ", " + nodeLstState[0].InnerText + " (" + nodeLstZIP[0].InnerText + ")";
// Then I do what you want with strHostInfo, I put it in a DB myself, but whatever.
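If you'd rather stay with the MaxMind data mentioned in the question, note that GeoLite2-City includes subdivision (US state) records. A minimal sketch with the geoip2 Python package (the database path is a placeholder; the example IP is the one from MaxMind's docs):
import geoip2.database

# GeoLite2-City carries subdivision (state/region) data per IP.
with geoip2.database.Reader('/path/to/GeoLite2-City.mmdb') as reader:
    response = reader.city('128.101.101.101')
    state = response.subdivisions.most_specific
    print(state.name)      # e.g. 'Minnesota'
    print(state.iso_code)  # e.g. 'MN'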

Regexp: Grouping in a group

I am trying to match the output of an Nmap command using a regexp. There can be two different formats.
1st format (when nmap can find hostname)
Nmap scan report for 2u4n32t-n4 (192.168.2.168)
and 2nd format (without hostname)
Nmap scan report for 192.168.2.1
I want to capture both the hostname and the IP address, and if there is no hostname, just use the IP as the hostname, as in the 2nd format.
What I have tried so far with regexp in Go is:
Nmap scan report for\\s+([^[:space:]]+)(\\s+\\(([^[:space:]]+)\\))?
But what I got as a result in Go:
For the 1st format (it also gave me (192.168.2.168), which I don't want):
[Nmap scan report for 2u4n32t-n4 (192.168.2.168), 2u4n32t-n4 , (192.168.2.168) , 192.168.2.168]
and for the 2nd format (it gave me an empty capture, which I don't want):
[Nmap scan report for 192.168.2.1, 192.168.2.1, ]
What is the correct way to do this?
How about this:
Nmap scan report for\s+(?P<hostname>.*?)[\s,]
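The unwanted third element in your first result comes from the outer group being a capturing group; making it non-capturing with (?: ... ) removes it. Here's that idea sketched in Python for illustration (the same pattern should work in Go's regexp, which uses compatible syntax), with the fallback to the IP when no hostname is present:
import re

# Optional, non-capturing group around the parenthesised IP:
# group 1 = hostname (or bare IP), group 2 = IP if present.
pattern = re.compile(r"Nmap scan report for (\S+)(?:\s+\((\S+)\))?")

for line in ("Nmap scan report for 2u4n32t-n4 (192.168.2.168)",
             "Nmap scan report for 192.168.2.1"):
    m = pattern.search(line)
    hostname = m.group(1)
    ip = m.group(2) or m.group(1)  # no hostname: the IP fills both roles
    print(hostname, ip)
# 2u4n32t-n4 192.168.2.168
# 192.168.2.1 192.168.2.1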