I have two files. file1.txt contains hundreds of IP addresses, one per line, and my second file (file2.txt) contains a placeholder entry ip_address that needs to be replaced by the actual IP addresses from file1.txt. How can I do this in Python?
Your help is much appreciated.
Eg:
less File1.txt
10.10.10.1
10.10.20.1
10.20.10.10 etc
less File2.txt
[/tmp/test/ip_address]
whitelist = *
I am looking for my output to be like this:
[/tmp/test/10.10.10.1]
whitelist = *
[/tmp/test/10.10.20.1]
whitelist = *
[/tmp/test/10.20.10.10]
whitelist = *
etc.
Using a simple iteration.
Ex:
with open("File1.txt") as infile, open("File2.txt", "w") as outfile:
    for line in infile:  # iterate over each line of File1.txt
        outfile.write("[/tmp/test/{}]\nwhitelist = *\n\n".format(line.strip()))  # write one section per IP
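If you would rather not hardcode the section layout in the script, a small sketch like this reads File2.txt as a template and substitutes the literal placeholder ip_address. The function name expand_template and the output filename are my own invention for illustration:

```python
def expand_template(template, ips):
    """Return one copy of `template` per IP, with the literal
    placeholder 'ip_address' replaced by each address."""
    sections = [template.replace("ip_address", ip.strip())
                for ip in ips if ip.strip()]  # skip blank lines
    return "\n".join(sections)

# Hypothetical usage with the files from the question:
# with open("File2.txt") as f:
#     template = f.read()
# with open("File1.txt") as ips, open("output.txt", "w") as out:
#     out.write(expand_template(template, ips))
```

This keeps the section layout in File2.txt itself, so changing the template does not require touching the script.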
I'm working with some legacy code that I can't change (for reasons).
It uses fnmatch.fnmatch to filter a list of paths, like so (simplified):
import fnmatch
paths = ['a/x.txt', 'b/y.txt']
for path in paths:
    if fnmatch.fnmatch(path, '*.txt'):
        print 'do things'
Via configuration I am able to change the pattern used to match the files. I need to exclude everything in b/, is that possible?
From reading the docs (https://docs.python.org/2/library/fnmatch.html) it does not appear to be, but I thought asking was worth a try.
From the fnmatch.fnmatch documentation:
Patterns are Unix shell style:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any char not in seq
When I run:
for path in paths:
    if fnmatch.fnmatch(path, '[!b]*'):
        print path
I get:
a/x.txt
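A quick demonstration of why this works, and when it would not: [!b] consumes exactly one character, so the pattern only constrains the first character of the path. The extra 'backup' path below is mine, added to show the pitfall:

```python
import fnmatch

# '[!b]*' matches any path whose FIRST character is not 'b'; it excludes
# 'b/...' here only because the directory name is a single character.
paths = ['a/x.txt', 'b/y.txt', 'backup/z.txt']
kept = [p for p in paths if fnmatch.fnmatch(p, '[!b]*')]
print(kept)  # ['a/x.txt'] -- note 'backup/z.txt' is excluded as well
```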
Note that [!seq] matches only a single character, so the pattern can only constrain the one character right after the '!'; it cannot exclude a multi-character prefix. For example, in my case, from the list col_names:
['# Spec No', 'Name', 'Date (DD/MM/YYYY)', 'Time (hh:mm:ss)', 'Year',
'Fractional day', 'Fractional time', 'Scans', 'Tint', 'SZA',
'NO2_UV.RMS', 'NO2_UV.RefZm', 'NO2_UV.RefNumber', 'NO2_UV.SlCol(bro)',
'NO2_UV.SlErr(bro)', 'NO2_UV.SlCol(ring)', 'NO2_UV.SlErr(ring)',
'NO2_UV.SlCol(HCHO)', 'NO2_UV.SlErr(HCHO)', 'NO2_UV.SlCol(O4)',
'NO2_UV.SlErr(O4)', 'NO2_UV.SlCol(O3a)', 'NO2_UV.SlErr(O3a)',
'NO2_UV.SlCol(O3223k)', 'NO2_UV.SlErr(O3223k)', 'NO2_UV.SlCol(NO2)',
'NO2_UV.SlErr(NO2)', 'NO2_UV.SlCol(no2a)', 'NO2_UV.SlErr(no2a)',
'NO2_UV.Offset (Constant)', 'NO2_UV.Err(Offset (Constant))',
'NO2_UV.Offset (Order 1)', 'NO2_UV.Err(Offset (Order 1))',
'NO2_UV.Shift(Spectrum)', 'NO2_UV.Stretch(Spectrum)1',
'NO2_UV.Stretch(Spectrum)2', 'HCHO.RMS', 'HCHO.RefZm', 'HCHO.RefNumber',
'HCHO.SlCol(bro)', 'HCHO.SlErr(bro)', 'HCHO.SlCol(ring)',
'HCHO.SlErr(ring)', 'HCHO.SlCol(HCHO)', 'HCHO.SlErr(HCHO)',
'HCHO.SlCol(O4)', 'HCHO.SlErr(O4)', 'HCHO.SlCol(O3a)',
'HCHO.SlErr(O3a)', 'HCHO.SlCol(O3223k)', 'HCHO.SlErr(O3223k)',
'HCHO.SlCol(NO2)', 'HCHO.SlErr(NO2)', 'HCHO.Offset (Constant)',
'HCHO.Err(Offset (Constant))', 'HCHO.Offset (Order 1)',
'HCHO.Err(Offset (Order 1))', 'HCHO.Shift(Spectrum)',
'HCHO.Stretch(Spectrum)1', 'HCHO.Stretch(Spectrum)2', 'Fluxes 318',
'Fluxes 330', 'Fluxes 390', 'Fluxes 440']
I wanted to search all the names that did not contain NO2_UV.
If I do
header_hcho = fnmatch.filter(col_names, '[!NO2_UV.]*');
it excludes the second element, 'Name', because it starts with 'N'. The result is the same as if I do
header_hcho = fnmatch.filter(col_names, '[!N]*');
So, I went by rather an old-school method
header_hcho = []
for name in col_names:
    if name.find("NO2_UV") == -1:  # keep names that do not contain NO2_UV
        header_hcho.append(name)
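For what it's worth, a plain substring test expresses "does not contain NO2_UV" directly, with no fnmatch involved (the list is abbreviated here for illustration):

```python
# Substring membership is the natural tool for "does not contain NO2_UV".
col_names = ['# Spec No', 'Name', 'NO2_UV.RMS', 'HCHO.RMS']  # abbreviated
header_hcho = [name for name in col_names if 'NO2_UV' not in name]
print(header_hcho)  # ['# Spec No', 'Name', 'HCHO.RMS']
```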
I need to parse a file that contains flat text and extract both valid ip addresses and obfuscated ip addresses.
(ie. 192.168.1[.]1 or 192.168.1(.)1 or 192.168.1[dot]1 or 192.168.1(dot)1 or 192 . 168 . 1 . 1)
Once the data is extracted I need to convert them all to valid format and remove duplicates.
My current code places the IP addresses into a string, but should it be a dict? I know I need to use some kind of recursion to set the key values, but I feel there is a more efficient and modular way to complete the task.
import json, ordereddict, re

# define the pattern of valid and obfuscated ips
pattern = r"((([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])[ (\[]?(\.|dot)[ )\]]?){3}([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))"

# open data file that contains ip addresses and other text
with open("sample.txt", "r") as myfile:
    text = myfile.read().replace('\n', '')

# put non-normalized ip addresses in a dictionary
ips = {"data": [{"key1": match[0] for match in re.findall(pattern, text)}]}

# normalize ip addresses
for name, datalist in ips.iteritems():
    for datadict in datalist:
        for key, value in datadict.items():
            if value in ("(dot)", "[dot]", " . ", " .", ". "):
                datadict[key] = "."

# write valid ip addresses to json file
with open('test.json', 'w') as outfile:
    json.dump(ips, outfile)
Sample data file
These are valid ip addresses 192.168.1.1, 8.8.8.8
These are obfuscated 192.168.2[.]1 or 192.168.3(.)1 or 192.168.1[dot]1
192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. or 192 . 168 . 1 . 1
This is what an invalid ip address looks like, they should be excluded 256.1.1.1 or 500.1.500.1 or 192.168.4.0
Expected result
192.168.1.1, 192.168.2.1, 192.168.3.1, 8.8.8.8
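One way to avoid the dictionary fix-ups entirely is to capture the four octets separately and rebuild the dotted form directly. The sketch below is my own take under that assumption; note that the sample also treats 192.168.4.0 as invalid (presumably as a network address), which would still need an extra check on top of this:

```python
import re

# One alternation per octet (0-255); the separator accepts '.', '[.]', '(.)',
# '[dot]', '(dot)', and any of these padded with spaces.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"
SEP = r"\s*[\[\(]?\s*(?:\.|dot)\s*[\]\)]?\s*"
PATTERN = re.compile(r"\b({o}){s}({o}){s}({o}){s}({o})\b".format(o=OCTET, s=SEP))

def extract_ips(text):
    """Extract valid and obfuscated IPs, normalized and de-duplicated,
    preserving first-seen order."""
    seen, result = set(), []
    for match in PATTERN.finditer(text):
        ip = ".".join(match.groups())
        if ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result
```

Because each octet is its own capture group, normalization is just joining the groups with dots, and de-duplication falls out of the set.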
I am writing a script to print all IPs in CIDR notation, but I do not want to print the first and last IPs as they are not usable.
from netaddr import IPNetwork

ipc = raw_input('Enter The IP Range ')
n = 0
for ip in IPNetwork(ipc):
    n = n + 1
    print '%s' % ip
print 'Total No of IPs are ' + str(n)
This means that if I give 12.110.34.224/27 I should get 30 IPs as the result, removing the first (network) and last (broadcast) addresses, since a /27 contains 32 IPs.
That should do it. Slice off the network and broadcast addresses:
for ip in list(IPNetwork(ipc))[1:-1]:
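As an aside, if the Python 3 standard library is an option, ipaddress.ip_network(...).hosts() yields only the usable host addresses, so no slicing is needed (this uses the stdlib ipaddress module rather than netaddr):

```python
import ipaddress

# hosts() excludes the network and broadcast addresses for any prefix
# shorter than /31, so a /27 yields exactly 30 usable addresses.
hosts = [str(ip) for ip in ipaddress.ip_network('12.110.34.224/27').hosts()]
print(len(hosts))            # 30
print(hosts[0], hosts[-1])   # 12.110.34.225 12.110.34.254
```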
I have file which has data in lines as follows:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour', 'The Smashing Pumpkins', 'Warner Bros. Entertainment', 'This is a good Beer']
['Voices Inside', 'Expressivista', 'The Kentucky Fried Movie', 'The Bridges of Madison County']
and so on. I want to re-write the data into a file whose lines keep only the tokens with fewer than 3 (or some other number of) words, e.g.:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour']
['Voices Inside', 'Expressivista']
this is what I have tried so far:
for line in open(file):
    line = line.strip()
    line = line.rstrip()
    prog = re.compile("([a-z0-9]){32}")
    if line:
        line = line.replace('"', '')
        line = line.split(",")
        if re.match(prog, line[0]) and len(line) > 2:
            wo = []
            for words in line:
                word = words.split()
                if len(word) < 3:
                    print word.append(word)
But the output says None. Any thoughts where I am making a mistake?
A better way to do what you're doing is to use ast.literal_eval, which automagically converts string representations of Python objects (e.g. lists) into actual Python objects.
import ast
# raw data
data = """
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour', 'The Smashing Pumpkins', 'Warner Bros. Entertainment','This is a good Beer']
['Voices Inside', 'Expressivista', 'The Kentucky Fried Movie', 'The Bridges of Madison County']
"""
# set threshold number of tokens
threshold = 3
# split into lines
lines = data.split('\n')
# parse non-blank lines into python lists
lists = [ast.literal_eval(line) for line in lines if line]
# for each list, keep only those tokens with fewer than `threshold` words
result = [[item for item in lst if len(item.split()) < threshold]
          for lst in lists]

# show result
for line in result:
    print(line)
Result:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour']
['Voices Inside', 'Expressivista']
I think the reason your code isn't working is that you're trying to match line[0] against your regex prog, but line[0] isn't a 32-character [a-z0-9] string for either of your lines, so your regex won't match and the inner print is never reached. Note also that list.append mutates the list in place and returns None, so print word.append(word) would print None even when it does run.
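Both failure modes can be reproduced in isolation, which makes them easier to see:

```python
import re

# Pitfall 1: the 32-character [a-z0-9] pattern cannot match the first token
# of either line, so re.match returns None and the if-branch is skipped.
prog = re.compile(r"([a-z0-9]){32}")
print(re.match(prog, "['Marilyn Manson'"))  # None

# Pitfall 2: list.append mutates in place and returns None, so
# `print word.append(...)` prints None even when it does run.
word = "Skydera Inc.".split()
print(word.append("x"))  # None
print(word)              # ['Skydera', 'Inc.', 'x']
```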