I am currently trying to open a file and use shlex.split to segment the lines. Here are two example lines from the text file.
set group address "Untrust" "This is a test group"
set group address "Untrust" "This is a test group" add "Test-address"
When I run my code it says "IndexError: list index out of range". I realize this is because it doesn't recognize my linetoken[5]. Since both lines begin almost identically, how would I get the code to move beyond the first line and just go to the second? My current code is below. The user input and count are for entering zones and then having it loop through using the input zones; however, I erased most of the code in an attempt to fix this issue first.
import shlex
import sys

def main():
    zone = raw_input('enter zones: ')
    zone = shlex.split(zone)
    count = 0
    configfile = open('convert.txt', 'r')
    for configline in configfile:
        # 'with' needs 'as <name>:' so converted.write below works
        with open('converted.txt', 'a') as converted:
            linetoken = shlex.split(configline)
            if linetoken[0] == 'set' and linetoken[1] == 'group' and linetoken[5] == 'add':
                converted.write(linetoken[0] + ' ' + linetoken[1] + ' ' + linetoken[2] + ' ' + linetoken[3] + ' ' + linetoken[4] + ' ' + linetoken[5])
                break

main()
I figured out the answer to my problem, so I figured I would post the solution. To bypass the first line, I had it check that the word 'add' wasn't in linetoken. If that was true, it hit a pass statement and continued on to the elif. Below is my new code.
import shlex
import sys

def main():
    zone = raw_input('enter zones: ')
    zone = shlex.split(zone)
    count = 0
    configfile = open('convert.txt', 'r')
    for configline in configfile:
        with open('converted.txt', 'a') as converted:
            linetoken = shlex.split(configline)
            if linetoken[0] == 'set' and linetoken[1] == 'group' and 'add' not in linetoken:
                pass
            elif linetoken[0] == 'set' and linetoken[1] == 'group' and linetoken[5] == 'add':
                converted.write(linetoken[0] + ' ' + linetoken[1] + ' ' + linetoken[2] + ' ' + linetoken[3] + ' ' + linetoken[4] + ' ' + linetoken[5])
                break

main()
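A more defensive variant, sketched below, would avoid the IndexError entirely by checking the token count before indexing; the sample lines are the ones from the question, and the function name is my own invention.

```python
import shlex

def convert_line(line):
    """Return the first six tokens joined, or None if the line has no 'add' clause."""
    tokens = shlex.split(line)
    # Guard on len(tokens) so tokens[5] never raises IndexError on short lines
    if len(tokens) > 5 and tokens[0] == 'set' and tokens[1] == 'group' and tokens[5] == 'add':
        return ' '.join(tokens[:6])
    return None

print(convert_line('set group address "Untrust" "This is a test group" add "Test-address"'))
```

This collapses the pass/elif pair into a single condition, since a line without enough tokens fails the length check and is skipped.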
Related
How do I delete the common words from two documents extracted from two websites? I have already extracted the news from the two sites; now I want to delete the words common to both documents. I used the following code to extract the news:
from __future__ import unicode_literals
import feedparser
import re

d = feedparser.parse('http://feeds.bbci.co.uk./news/rss.xml')
i = 0
for post in d.entries:
    titl = post.title
    desc = post.description
    titl2 = titl.replace('\\', " ")
    desc1 = desc.replace('/', " ")
    print(str(i) + ' ' + titl2)
    i = i + 1

print "indian Express"
g = feedparser.parse('http://www.rssmicro.com/rss.web?q=Android')
i = 0
for post in g.entries:
    tit = post.title
    #desc = post.description
    tit4 = tit.replace('\\', " ")
    print(str(i) + ' ' + tit4)
    i = i + 1
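The deletion step the question asks about is not shown above; a minimal sketch using set intersection could look like the following, where the two strings are stand-ins for the scraped titles:

```python
# Stand-in documents; in the real script these would be the scraped titles
doc1 = "android phone news update india"
doc2 = "android tablet news review"

# Words appearing in both documents
common = set(doc1.split()) & set(doc2.split())

# Keep only the words unique to each document, preserving order
unique1 = [w for w in doc1.split() if w not in common]
unique2 = [w for w in doc2.split() if w not in common]
print(unique1, unique2)
```

Sets make the membership test cheap, while the list comprehensions keep the original word order of each document.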
I have a very basic question in Python. I want to split the items in the following list and print them to a text file.
import pandas as pd

s = ['"9-6": 1', ' "15-4": 1', ' "12-3": 1', ' "8-4": 1', ' "8-5": 1', ' "8-1": 1']
print type(s)
for i in s:
    j = i.split(',')
    with open("out.txt", "w") as text_file:
        text_file.write("{}".format(j))
However, my code only prints the last value; the earlier writes inside the for loop block are lost. Can anyone point out where I am going wrong? Thanks!
You are not appending the values; you are rewriting the file every time. Try this:
with open("out.txt", "a+") as text_file:
Here, I replaced "w" with "a+".
Full code:
import pandas as pd

s = ['"9-6": 1', ' "15-4": 1', ' "12-3": 1', ' "8-4": 1', ' "8-5": 1', ' "8-1": 1']
print type(s)
for i in s:
    j = i.split(',')
    with open("out.txt", "a+") as text_file:
        text_file.write("{}".format(j))
Every time you open out.txt with the 'w' option, it erases the file completely before you even write anything. You should put the with statement before the start of the for loop, so that the file is only opened once.
On every iteration of your for loop, you're truncating your file's contents, i.e. "emptying the file". This is because, when using the open mode 'w', Python implicitly truncates the file you already created on the previous iteration. This behavior is documented in Python 2.7:
[..]'w' [is] for writing [to files] (truncating the file if it already exists)[..]
Use the append mode ('a') instead, which appends to the file. The Python 2.7 documentation also notes this:
[..] [use] 'a' for appending [..]
Which means that this:
...open('out.txt', 'w')...
should be:
...open('out.txt', 'a')...
I have a Word doc named a.doc, formatted:
Name - Bob
Hair color - Red
Age - 28
...
I'd like to save the information after "Name - ", "Hair color - ", etc. into variables for access later in the script. Would the easiest way be to create a list?
Keywords = ('Name', 'Hair color', 'Age')
Fileopen = open('a.doc')
Filecontent = Fileopen.readlines()
for keyword in Keywords:
This is where I get stuck. I'm thinking I can add a statement allowing me to grab the text after the " - " in each line.
EDIT:
To be more precise in my explanation of what I am looking to do:
I would like to grab the information in each line separately after the ' - ' and store it in a variable. For example, Name - Bob will be stored with name equaling 'Bob'.
I have made some progress since my previous update. I just know that the way I am doing it does not allow for easy repetition.
I have successfully pulled the information utilizing:
filename = raw_input("choose your file: ")
print "you chose: %r" % filename

with open(filename) as fo:
    for line in fo:
        if "Name" in line:
            name = line.split(" - ", 1)[1]
            print name
# no fo.close() needed: the with statement closes the file
I know that I can continue to write a new 'if' statement for each of the strings I'd like to pull, but obviously that isn't the fastest way.
My REAL question:
How do I make that if statement into a loop that will check for multiple strings and assign them to separate variables?
In the end I am really just looking to use these variables and reorder the way they are printed out, which is why I need them separated. I attempted to use the 'keywords' approach but am not sure how to have it dynamically define each variable I would like. Should I add them to a list or a tuple and subsequently call upon them in that manner? The variable name obviously has no meaning outside the program, so calling one from a tuple, as in [0], might work as well.
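One common way to handle several fields without a separate if per field is to loop over the keywords and collect matches into a dict rather than separate variables. A sketch, with the question's sample lines hard-coded in place of the file:

```python
# Field names and sample lines taken from the question; in the real
# script the lines would come from the opened file
keywords = ('Name', 'Hair color', 'Age')
lines = ['Name - Bob', 'Hair color - Red', 'Age - 28']

info = {}
for line in lines:
    for key in keywords:
        if line.startswith(key):
            # Split once on ' - ' and keep everything after it
            info[key] = line.split(' - ', 1)[1].strip()

print(info['Name'])
```

Because the values live in one dict, they can be printed back out in any order, which addresses the reordering goal without dynamically creating variable names.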
This code asks for the person's name, age, and hair color, then returns the person's information, storing it in the variable Filecontent until you close the shell:
def namesearch(Name, Hair, Age):
    Keywords = ('Name - ' + Name + ', Hair Color - ' + Hair
                + ', Age - ' + Age)
    Fileopen = open('a.doc', 'r')
    for line in Fileopen:
        if Keywords in line:
            global Filecontent
            Filecontent = line
            print line

Name = raw_input('Enter the person\'s name: ')
Hair = raw_input('Enter the person\'s hair color: ')
Age = raw_input('Enter the person\'s age: ')
namesearch(Name, Hair, Age)
This code returns the information in this format:
Name - (Name), Hair Color - (Hair Color), Age - (Age).
Note: This code can only search for names, not add them
from googlefinance import getQuotes
import json
import time as t
import re

List = ["A", "AA", "AAB"]
Time = t.localtime()  # retrieve date/time info
Date2 = ('%d-%d-%d %dh:%dm:%dsec' % (Time[0], Time[1], Time[2], Time[3], Time[4], Time[5]))  # formats time stamp

while True:
    for i in List:
        try:  # allows elements to be called and, on an error, moves to the next step
            Data = json.dumps(getQuotes(i.lower()), indent=1)  # retrieves data from Google Finance
            regex = '"LastTradePrice": "(.+?)",'  # sets the parse pattern
            pattern = re.compile(regex)  # compiles the pattern
            price = re.findall(pattern, Data)  # applies the pattern
            print(i)
            print(price)
        except:  # error handling
            Error = (i + ' Failed to load on: ' + Date2)
            print(Error)
It displays the quote as: ['(number)']. I would like it to display only the number, which means removing the brackets and quotes. Any help would be great.
Changing:
print(price)
into:
print(price[0])
prints this:
A
42.14
AA
10.13
AAB
0.110
Try using the type() function to learn the datatype; in your case, type(price). If the datatype is list, use print(price[0]) and you will get the output (number). For the brackets, you need to check the Google data and your regex.
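To see why the brackets appear, note that re.findall always returns a list of matches; a small sketch, with a hard-coded stand-in for the JSON string (the field name matches the question's regex):

```python
import re

# Stand-in for the JSON string returned by json.dumps(getQuotes(...))
Data = '{\n "LastTradePrice": "42.14",\n}'

# findall returns a list of captured groups, even when there is one match
price = re.findall('"LastTradePrice": "(.+?)",', Data)
print(price)     # the whole list, with brackets and quotes
print(price[0])  # just the number string
```

Indexing with [0] is safe here only when a match is guaranteed; on no match, findall returns an empty list and price[0] raises IndexError.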
I'm trying to create a data-scraping file for a class, and the data I have to scrape requires that I use while loops to get the right data into separate arrays, i.e. for states, SAT averages, etc.
However, once I set up the while loops, the regex that cleared the majority of the HTML tags from the data broke, and I am getting an error that reads:
AttributeError: 'NoneType' object has no attribute 'groups'
My code is:
import re, util
from BeautifulSoup import BeautifulStoneSoup

# create a comma-delineated file
delim = ", "

# base url for sat data
base = "http://www.usatoday.com/news/education/2007-08-28-sat-table_N.htm"

# get webpage object for site
soup = util.mysoupopen(base)

# get column headings
colCols = soup.findAll("td", {"class": "vaTextBold"})

# get data
dataCols = soup.findAll("td", {"class": "vaText"})

# append data to cols
for i in range(len(dataCols)):
    colCols.append(dataCols[i])

# open a csv file to write the data to
fob = open("sat.csv", 'a')

# initiate the 5 arrays
states = []
participate = []
math = []
read = []
write = []

# split into 5 lists for each row
for i in range(len(colCols)):
    if i % 5 == 0:
        states.append(colCols[i])

i = 1
while i <= 250:
    participate.append(colCols[i])
    i = i + 5

i = 2
while i <= 250:
    math.append(colCols[i])
    i = i + 5

i = 3
while i <= 250:
    read.append(colCols[i])
    i = i + 5

i = 4
while i <= 250:
    write.append(colCols[i])
    i = i + 5

# write data to the file
for i in range(len(states)):
    states = str(states[i])
    participate = str(participate[i])
    math = str(math[i])
    read = str(read[i])
    write = str(write[i])

    # regex to remove html from data scraped
    # remove <td> tags
    line = (re.search(">(.*)<", states).groups()[0] + delim +
            re.search(">(.*)<", participate).groups()[0] + delim +
            re.search(">(.*)<", math).groups()[0] + delim +
            re.search(">(.*)<", read).groups()[0] + delim +
            re.search(">(.*)<", write).groups()[0])

    # append data point to the file
    fob.write(line)
Any ideas why this error suddenly appeared? The regex was working fine until I tried to split the data into different lists. I have already tried printing the various strings inside the final for loop to see if any of them were None for the first i value (0), but they were all the strings they were supposed to be.
Any help would be greatly appreciated!
It looks like the regex search is failing on (one of) the strings, so it returns None instead of a MatchObject.
Try the following instead of the very long #remove <td> tags line (note that it needs import sys):
import sys

out_list = []
for item in (states, participate, math, read, write):
    try:
        out_list.append(re.search(">(.*)<", item).groups()[0])
    except AttributeError:
        print "Regex match failed on", item
        sys.exit()

line = delim.join(out_list)
That way, you can find out where your regex is failing.
Also, I suggest you use .group(1) instead of .groups()[0]. The former is more explicit.
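A small illustration of that last suggestion, run on a sample <td> cell: .group(1) and .groups()[0] return the same capture, but .group(1) states the intent directly.

```python
import re

# Both calls below return the first captured group of the match
m = re.search(">(.*)<", "<td>Alabama</td>")
print(m.group(1))      # the text between the tags
print(m.groups()[0])   # same value, via the tuple of all groups
```

When the search fails, m is None and either call raises the AttributeError from the question, which is why the try/except above is worth keeping.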