Python Save rest of line after string in doc - python-2.7

I have a word doc named a.doc formatted:
Name - Bob
Hair color - Red
Age - 28
...
I'd like to save the information after "Name - " "Hair color - " ... into a variable for access later in the script. Would the easiest way be to create a list:
keywords = ('Name', 'Hair color', 'Age')
fileopen = open('a.doc')
filecontent = fileopen.readlines()
for line in filecontent:
This is where I get stuck. I'm thinking I can add a statement allowing to grab after the " - " in each line.
EDIT:
To be more precise in my explanation of what I am looking to do:
I would like to grab the information in each line separately after the ' - ' and store it in a variable. For example, Name - Bob will be stored in name equaling 'Bob'.
I have made some progress here since my previous update. I just know the way I am doing it does not allow for easily repeating.
I have successfully pulled the information utilizing:
filename = raw_input("choose your file: ")
print "you chose: %r" % filename
with open(filename) as fo:
    for line in fo:
        if "Name" in line: name = line.split(" - ", 1)[1]
    print name
fo.close()
I know that I can continue to make a new 'if' statement for each of my strings I'd like to pull, but obviously that isn't the fastest way.
My REAL question:
How to make that if statement into a loop that will check for multiple strings and assign them to separate variables?
In the end I am really just looking to use these variables and reorder the way they are printed out which is why I need them separated. I attempted to use the 'keywords' but am not sure how to allow that to dynamically define each to a variable that I would like. Should I add them to a list or a tuple and subsequently call upon them in that manner? The variable name obviously has no meaning outside the program so if I called it from a tuple as in [0], that might work as well.

This code asks for the name, hair color, and age of the person, then prints the matching line from the file and stores it in the global variable Filecontent, where it stays until you close the shell:
def namesearch(Name, Hair, Age):
    Keywords = ('Name - ' + Name + ', Hair Color - ' + Hair \
                + ', Age - ' + Age)
    Fileopen = open('a.doc', 'r')
    for line in Fileopen:
        if Keywords in line:
            global Filecontent
            Filecontent = line
            print line
Name = raw_input('Enter the person\'s name: ')
Hair = raw_input('Enter the person\'s hair color: ')
Age = raw_input('Enter the person\'s age: ')
namesearch(Name, Hair, Age)
This code returns the information in this format:
Name - (Name), Hair Color - (Hair Color), Age - (Age).
Note: This code can only search for names, not add them
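For the question's actual goal (one loop that handles several keys), a dictionary is one option instead of a separate variable per key. The following is only a minimal sketch: parse_fields is a hypothetical helper name, the in-memory sample stands in for the real file, and it assumes the data is available as plain text with one "key - value" pair per line (a binary Word .doc would have to be exported to text first).

```python
import io

def parse_fields(lines, separator=" - "):
    """Map the text before the separator to the text after it."""
    fields = {}
    for line in lines:
        # Skip lines that do not contain the separator at all.
        if separator not in line:
            continue
        key, value = line.split(separator, 1)
        fields[key.strip()] = value.strip()
    return fields

# Stand-in for the real file (assumption: plain-text content).
sample = io.StringIO(u"Name - Bob\nHair color - Red\nAge - 28\n")
info = parse_fields(sample)

# The values can now be printed in any order.
for key in ("Age", "Name", "Hair color"):
    print("%s: %s" % (key, info[key]))
```

Looking values up by key (info["Name"]) avoids both the chain of if statements and the need to create variable names dynamically.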

Related

Python v3 inconsistent regex match returns

I'm writing a small Python script which takes a log file, matches strings within it, and saves them along with another custom string, "goal ", to another text file. Then I take some values from the second file and add them to a list. The problem is that, depending on the length of the custom string (e.g. "goalgoalgoal "), the list of values varies in length. Currently I'm working with a log file which includes 1031 matches of the string "goal ", but the length of the list varies between ~980 and 1029.
Here is the code:
for line in inputfile:
    if "Started---" in line:
        startTime = line[11:23]
        testfile.write("\n"+"Start"+"\n"+"goal "+ startTime+"\n")
        counterLines +=1
    elif "done!" in line:
        testfile.write("\n"+find_between(line, "| ", "done!")+"\n")
    elif "Errors:" in line:
        testfile.write("\n"+"Errors:"+line.split("Errors:",1)[1]+"\n")
    elif "Warnings:" in line:
        testfile.write("\n"+"Warnings:"+line.split("Warnings:",1)[1]+"\n")
    elif "Successes:" in line:
        testfile.write("\n"+"Successes:"+line.split("Successes:",1)[1]+"\n")
    elif "END---" in line:
        endTime = line[11:23]
        testfile.write("\n"+"End"+"\n"+"endTime "+ endTime+"\n")
    else:
        print("nothing found")
testfileread = open(filePath+"\\testFile.txt", "r")
startTimesList = []
endTimesList = []
for line in testfileread:
    matchObj = re.match(r'goal', line)
    if matchObj:
        startTimesList.append(line)
print(len(startTimesList))
Do you have ideas why the code doesn't work as expected?
Thank you in advance!
Most probably it's because you don't flush testFile.txt after writing is complete, so an unpredictable amount of data is in the file when you start reading it. Calling testfile.flush() (or testfile.close()) before reopening the file should fix the problem. Alternatively, wrap the writing logic in a with block.
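A rough sketch of the with-block variant follows. The temporary directory and the generated "goal" lines are assumptions made so the example is self-contained; they stand in for filePath and the real log data.

```python
import os
import re
import tempfile

# Hypothetical location; the question builds the path from filePath instead.
path = os.path.join(tempfile.mkdtemp(), "testFile.txt")

# Leaving the with block closes the file, which also flushes
# any data still buffered in memory.
with open(path, "w") as testfile:
    for n in range(1031):
        testfile.write("\nStart\ngoal %04d\n" % n)

# Only now is the file reopened for reading.
start_times = []
with open(path, "r") as testfileread:
    for line in testfileread:
        if re.match(r"goal", line):
            start_times.append(line)

print(len(start_times))
```

Because the write handle is closed before the read handle is opened, the reader sees every line that was written, regardless of how long the custom string is.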

I need to change the name of a text document on each run

I need this code to create text documents, but the name has to change every time a new document is created. For example, the first time the code is executed it should create a document named "aventura-1.txt", the next time "aventura-2.txt", and so on up to "aventura-n.txt". How can I do that?
Sorry for my bad English, by the way. This is the code I have:
import os
def adventure_path(nombre_aventura):
    if not os.path.isdir("aventuras"):
        os.mkdir("aventuras")
    return "aventuras/%s" % nombre_aventura
archivo = open(adventure_path("aventura-n.txt"), "w")
print "hi"
print "bye"
archivo.close()
You may use this code that I wrote once and tweaked a little for you...
It will:
check each file in a specific folder
split the names of these files at "." and "-", so we catch only the number part of each .txt name (I'm assuming you have a specific folder that contains ONLY files named like "aventura-n.txt")
add this number to a list
get the max value of the numbers list
add +1 to get the next version number
Code:
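import os
nums = []
for item in os.listdir('path_to_folder'):
    a = item.split(".")[0]  # drop the extension, e.g. "aventura-3"
    b = a.split("-")[1]     # keep the number part, e.g. "3"
    c = int(b)
    nums.append(c)
# max() raises ValueError on an empty list, so start at 1 for an empty folder
next_num = max(nums) + 1 if nums else 1
print next_num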
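The steps above can be combined with the question's adventure_path idea roughly as follows. This is a sketch under assumptions: next_adventure_path is a hypothetical helper, a temporary directory replaces the real "aventuras" folder, and a regular expression is used instead of split so that unrelated files in the folder are ignored.

```python
import os
import re
import tempfile

def next_adventure_path(folder, prefix="aventura-", suffix=".txt"):
    """Return the path of the next unused aventura-N.txt in folder."""
    if not os.path.isdir(folder):
        os.makedirs(folder)
    pattern = re.compile(r"^%s(\d+)%s$" % (re.escape(prefix), re.escape(suffix)))
    nums = []
    for item in os.listdir(folder):
        match = pattern.match(item)
        if match:  # ignore files that do not follow the naming scheme
            nums.append(int(match.group(1)))
    next_num = max(nums) + 1 if nums else 1
    return os.path.join(folder, "%s%d%s" % (prefix, next_num, suffix))

# Demo in a throwaway directory (assumption; use "aventuras" in practice).
folder = os.path.join(tempfile.mkdtemp(), "aventuras")
for _ in range(3):
    with open(next_adventure_path(folder), "w") as archivo:
        archivo.write("hi\nbye\n")

print(sorted(os.listdir(folder)))
```

Each call looks at what already exists on disk, so the numbering continues correctly across separate runs of the script.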

Python 2.7 csv read, modify then write with dict?

OK, I acknowledge that my question might duplicate this one, but I am going to ask anyway because, although the ultimate goals are similar, the Python code in use seems quite different.
I often have a list of students to create user accounts for. For this, I need to generate UserId's of the format
`Lastname[0:6].capitalize() + Firstname[0].capitalize()`
that is, the first six characters of the last name plus the first initial. I'd like to automate this with a Python script that reads firstname / lastname from one .csv file and writes firstname, lastname, userid to a different .csv file.
Here is my code, which almost works but I am having difficulty with the write rows to .csv part at the end:
import csv
input_file = csv.DictReader(open("cl.csv"))
index=0
fldnms=input_file.fieldnames
fldnms.append('UserName')
print fldnms
for row in input_file:
    index+=1
    UserID=(row["Last"][0:6].capitalize() + row["First"][0].capitalize())
    row['UserName'] = UserID
    print index, row["Last"], row["First"], row["UserName"]
with open("users.csv",'wb') as out_csv:
    dw = csv.DictWriter(out_csv, delimiter=',', fieldnames=fldnms)
    dw.writerow(dict((fn,fn) for fn in fldnms))
    for row in input_file:
        dw.writerow(row)
Advice / thoughts welcomed.
Thanks,
Brian H.
I went back to this after a good night's sleep and, for what it's worth, here is the working version:
'''
Reads cl.csv (client list) as firstname lastname list with header
Writes users.csv as lastname firstname userid list w/o header
'''
import csv
INfile=open("..\cl_old.csv")
input_file = csv.DictReader(INfile, delimiter=' ')
fldnms={}
#print type (fldnms)
fldnms= input_file.fieldnames
fldnms.append('UserName')
#print type (fldnms)
#print (fldnms["Last"],fldnms["First"],fldnms["UserName"])
index =0
OUTfile=open("users.csv",'wb')
dw = csv.DictWriter(OUTfile, delimiter=',', fieldnames=fldnms)
dw.writerow(dict((fn,fn) for fn in fldnms))
for row in input_file:
    index+=1
    UserID=(row["Last"][0:6].capitalize() + row["First"][0].capitalize())
    row['UserName'] = UserID
    print index, row["Last"], row["First"], row["UserName"]
    dw.writerow(row)
INfile.close()
OUTfile.close()
cl.csv contains a list of first, last name pairs. Results are stored in users.csv as names, userid.
I did this as an exercise in Python, as Excel will do this with a single formula:
=CONCATENATE(LEFT(A2,6),LEFT(B2,1))
Hope this is of interest.
BJH
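The difference between the two versions is that the working one writes each row inside the same loop that reads it; in the first version the second "for row in input_file" finds the reader already exhausted, so nothing is written. A compact sketch of the same idea follows. It is written for Python 3's csv module, uses in-memory streams in place of cl.csv and users.csv, and the sample names are made up; add_user_ids is a hypothetical helper name.

```python
import csv
import io

def add_user_ids(in_stream, out_stream):
    """Copy rows from in_stream to out_stream, adding a UserName column."""
    reader = csv.DictReader(in_stream)
    fieldnames = list(reader.fieldnames) + ["UserName"]
    writer = csv.DictWriter(out_stream, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Up to six characters of the last name plus the first initial.
        row["UserName"] = (row["Last"][0:6].capitalize()
                           + row["First"][0].capitalize())
        # Write immediately: a reader can only be iterated once.
        writer.writerow(row)

# Made-up sample data standing in for cl.csv.
source = io.StringIO("First,Last\nbrian,harrington\nann,lee\n")
target = io.StringIO()
add_user_ids(source, target)
print(target.getvalue())
```

With real files in Python 3, both would typically be opened with newline='' as the csv documentation recommends; in Python 2.7 the 'wb' mode used in the post plays that role for the output file.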

Python 2.7.3: Search/Count txt file for string, return full line with final occurrence of that string

I'm trying to create a WiFi Log Scanner. Currently we go through logs manually using CTRL+F and our keywords. I just want to automate that process. i.e. bang in a .txt file and receive an output.
I've got the bones of the code and can work on making it pretty later, but I'm running into a small issue. I want the scanner to search the file (done), count instances of that string (done), and output the number of occurrences (done), followed by the full line where that string occurred last, including the line number (the line number is not essential; it just makes it easier to guesstimate which is the more recent issue if there are several).
Currently I'm getting an output of every line with the string in it. I know why this is happening, I just can't think of a way to specify just output the last line.
Here is my code:
import os
from Tkinter import Tk
from tkFileDialog import askopenfilename
def file_len(filename):
    #Count Number of Lines in File and Output Result
    with open(filename) as f:
        for i, l in enumerate(f):
            pass
    print('There are ' + str(i+1) + ' lines in ' + os.path.basename(filename))
def file_scan(filename):
    #All Issues to Scan will go here
    print ("DHCP was found " + str(filename.count('No lease, failing')) + " time(s).")
    for line in filename:
        if 'No lease, failing' in line:
            print line.strip()
    DNS= (filename.count('Host name lookup failure:res_nquery failed') + filename.count('HTTP query failed'))/2
    print ("DNS Failure was found " + str(DNS) + " time(s).")
    for line in filename:
        if 'Host name lookup failure:res_nquery failed' or 'HTTP query failed' in line:
            print line.strip()
    print ("PSK= was found " + str(testr.count('psk=')) + " time(s).")
    for line in ln:
        if 'psk=' in line:
            print 'The length(s) of the PSK used is ' + str(line.count('*'))
Tk().withdraw()
filename=askopenfilename()
abspath = os.path.abspath(filename) #So that doesn't matter if File in Python Dir
dname = os.path.dirname(abspath) #So that doesn't matter if File in Python Dir
os.chdir(dname) #So that doesn't matter if File in Python Dir
print ('Report for ' + os.path.basename(filename))
file_len(filename)
file_scan(filename)
That is pretty much going to be my working code (I just have to add a few more issue searches). I also have a version that searches a string instead of a text file, which outputs the following:
Total Number of Lines: 38
DHCP was found 2 time(s).
dhcp
dhcp
PSK= was found 2 time(s).
The length(s) of the PSK used is 14
The length(s) of the PSK used is 8
I only have general stuff there, modified for it being a string rather than txt file, but the string I'm scanning from will be what's in the txt files.
Don't worry too much about PSK, I want all examples of that listed, I'll see If I can tidy them up into one line at a later stage.
As a side note, a lot of this is jumbled together from doing previous searches, so I have a good idea that there are probably neater ways of doing this. This is not my current concern, but if you do have a suggestion on this side of things, please provide an explanation/link to explanation as to why your way is better. I'm fairly new to python, so I'm mainly dealing with stuff I currently understand. :)
Thanks in advance for any help, if you need any further info, please let me know.
Joe
To search for and count occurrences of the string, I solved it in the following way:
'''---------------------Function--------------------'''
#Counting the "string" occurrence in a file
def count_string_occurrence():
    string = "test"
    f = open("result_file.txt")
    contents = f.read()
    f.close()
    #we are searching for the "test" string in the file "result_file.txt"
    print "Number of '" + string + "' in file", contents.count(string)
I can't comment on questions yet, but I think I could answer more specifically with some more information: which line do you want only one of?
For example, you can do something like:
search_str = 'find me'
count = 0
last_line = None  # stays None if the string never occurs
for line in file:  # file: an already opened file object
    if search_str in line:
        last_line = line
        count += 1
print '{0} occurrences of this line:\n{1}'.format(count, last_line)
I notice that in file_scan you iterate through the file twice. You can surely condense it into one iteration :).
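A single pass that tracks several issues at once might look like the sketch below. The names scan_lines and issues and the sample log lines are made up for illustration; the search strings come from the question. Two things are worth knowing when adapting it: it counts matching lines rather than every occurrence within a line, and it tests each search string separately, because an expression such as 'a' or 'b' in line is always true (the non-empty literal 'a' is truthy on its own).

```python
def scan_lines(lines, issues):
    """Count matching lines per issue and remember the last match."""
    results = dict((name, {"count": 0, "last": None}) for name in issues)
    for lineno, line in enumerate(lines, 1):
        for name, needles in issues.items():
            # any() checks every search string against the line.
            if any(needle in line for needle in needles):
                results[name]["count"] += 1
                results[name]["last"] = (lineno, line.strip())
    return results

issues = {
    "DHCP": ["No lease, failing"],
    "DNS": ["Host name lookup failure:res_nquery failed",
            "HTTP query failed"],
}

# Made-up log lines; with a real log, pass the open file object instead.
log = [
    "12:00 udhcpc: No lease, failing\n",
    "12:01 wget: HTTP query failed\n",
    "12:05 udhcpc: No lease, failing\n",
]

report = scan_lines(log, issues)
for name in sorted(report):
    print("%s was found %d time(s); last: %r"
          % (name, report[name]["count"], report[name]["last"]))
```

Since only the most recent match is stored per issue, the output contains one line per issue instead of every matching line.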

Attribute Error for strings created from lists

I'm trying to create a data-scraping file for a class, and the data I have to scrape requires that I use while loops to get the right data into separate arrays-- i.e. for states, and SAT averages, etc.
However, once I set up the while loops, my regex that cleared the majority of the html tags from the data broke, and I am getting an error that reads:
AttributeError: 'NoneType' object has no attribute 'groups'
My Code is:
import re, util
from BeautifulSoup import BeautifulStoneSoup
# create a comma-delineated file
delim = ", "
#base url for sat data
base = "http://www.usatoday.com/news/education/2007-08-28-sat-table_N.htm"
#get webpage object for site
soup = util.mysoupopen(base)
#get column headings
colCols = soup.findAll("td", {"class":"vaTextBold"})
#get data
dataCols = soup.findAll("td", {"class":"vaText"})
#append data to cols
for i in range(len(dataCols)):
    colCols.append(dataCols[i])
#open a csv file to write the data to
fob=open("sat.csv", 'a')
#initiate the 5 arrays
states = []
participate = []
math = []
read = []
write = []
#split into 5 lists for each row
for i in range(len(colCols)):
    if i%5 == 0:
        states.append(colCols[i])
i=1
while i<=250:
    participate.append(colCols[i])
    i = i+5
i=2
while i<=250:
    math.append(colCols[i])
    i = i+5
i=3
while i<=250:
    read.append(colCols[i])
    i = i+5
i=4
while i<=250:
    write.append(colCols[i])
    i = i+5
#write data to the file
for i in range(len(states)):
    states = str(states[i])
    participate = str(participate[i])
    math = str(math[i])
    read = str(read[i])
    write = str(write[i])
    #regex to remove html from data scraped
    #remove <td> tags
    line = re.search(">(.*)<", states).groups()[0] + delim + re.search(">(.*)<", participate).groups()[0]+ delim + re.search(">(.*)<", math).groups()[0] + delim + re.search(">(.*)<", read).groups()[0] + delim + re.search(">(.*)<", write).groups()[0]
    #append data point to the file
    fob.write(line)
Any ideas regarding why this error suddenly appeared? The regex was working fine until I tried to split the data into different lists. I have already tried printing the various strings inside the final "for" loop to see if any of them were "None" for the first i value (0), but they were all the string that they were supposed to be.
Any help would be greatly appreciated!
It looks like the regex search is failing on (one of) the strings, so it returns None instead of a MatchObject.
Try the following instead of the very long #remove <td> tags line:
import sys  # needed for sys.exit() below
out_list = []
for item in (states, participate, math, read, write):
    try:
        out_list.append(re.search(">(.*)<", item).groups()[0])
    except AttributeError:
        print "Regex match failed on", item
        sys.exit()
line = delim.join(out_list)
That way, you can find out where your regex is failing.
Also, I suggest you use .group(1) instead of .groups()[0]. The former is more explicit.
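One likely cause is visible in the question's final loop: states = str(states[i]) rebinds the name states from the list to a string, so on the second pass states[1] is a single character and the regex no longer matches. This would also explain why everything looked right for i = 0. The sketch below uses separate names for the per-row values and checks the match explicitly; cell_text is a hypothetical helper and the two short lists are made-up stand-ins for the scraped cells.

```python
import re

TAG_TEXT = re.compile(r">(.*)<")

def cell_text(cell):
    """Return the text between the tags of a cell, or raise if absent."""
    match = TAG_TEXT.search(str(cell))
    if match is None:
        raise ValueError("no tag text found in %r" % (cell,))
    return match.group(1)

# Made-up stand-ins for the scraped <td> elements.
states = ["<td>Alabama</td>", "<td>Alaska</td>"]
math = ["<td>556</td>", "<td>517</td>"]

rows = []
for i in range(len(states)):
    # New names for the row values keep the lists themselves intact.
    state_cell = states[i]
    math_cell = math[i]
    rows.append(", ".join([cell_text(state_cell), cell_text(math_cell)]))

print(rows)
```

Raising a ValueError that includes the offending value makes a failed match easier to trace than the AttributeError on None.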