I have a batch file that looks something like this:
.....
set ARGS=%ARGS% /startDate:2015-07-15T15:20:00.000
set ARGS=%ARGS% /endDate:2015-07-15T17:30:00.000
set ARGS=%ARGS% /IDs:250
set ARGS=%ARGS% /values:10000,20000
.....
Now I want to read it and overwrite it with new dates (one day after the current start and end date). My code below works fine if I write to a new file, but it doesn't work if I try to overwrite the original. Any idea how to fix it?
from datetime import datetime, timedelta

WANTED = 19  # or however many characters you want after the dates

with open('myfile.bat') as searchfile, open('mynewfile.bat', 'w') as outfile:
    for line in searchfile:
        left, sep, right = line.partition('startDate:')
        if sep:  # True iff 'startDate:' in line
            startdatestr = right[:WANTED]
            startdate = datetime.strptime(startdatestr, "%Y-%m-%dT%H:%M:%S")
            newstartdate = startdate + timedelta(days=1)
            newstartdatestr = newstartdate.strftime("%Y-%m-%dT%H:%M:%S")
            line = line.replace(startdatestr, newstartdatestr)
        left, sep, right = line.partition('endDate:')
        if sep:  # True iff 'endDate:' in line
            enddatestr = right[:WANTED]
            enddate = datetime.strptime(enddatestr, "%Y-%m-%dT%H:%M:%S")
            newenddate = enddate + timedelta(days=1)
            newenddatestr = newenddate.strftime("%Y-%m-%dT%H:%M:%S")
            line = line.replace(enddatestr, newenddatestr)
        outfile.write(line)
While you are inside the 'with open()' block, your file is still open, so you cannot overwrite it in place. Store the content and your modifications in variables, and leave the 'with open()' block so that the file handle closes.
Then open the same file for writing and write your data back to it.
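For example, a minimal sketch of that approach (the shift_dates helper is just illustrative, wrapping the same startDate/endDate logic from your code):

from datetime import datetime, timedelta

def shift_dates(line):
    # hypothetical helper: apply the same startDate/endDate +1 day logic as above
    for marker in ('startDate:', 'endDate:'):
        left, sep, right = line.partition(marker)
        if sep:
            old = right[:19]
            new = (datetime.strptime(old, "%Y-%m-%dT%H:%M:%S")
                   + timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%S")
            line = line.replace(old, new)
    return line

# Read everything first; the file is closed when this block ends.
with open('myfile.bat') as f:
    new_lines = [shift_dates(line) for line in f]

# Now reopen the same file for writing and overwrite it.
with open('myfile.bat', 'w') as f:
    f.writelines(new_lines)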
So I wrote this code:
import csv

data = []
filename = "S:\Doc\Python\Data\Dekomp\Hth.txt"
with open(filename) as f:
    lines = f.readlines()
    for line in lines:
        if line.startswith('%'):
            data.append(line.split('+')[0].strip())
        if line.endswith('%'):
            break

with open('S:\Doc\Python\Data\Dekomp\Test.csv', 'w') as f:
    writer = csv.writer(f, delimiter=' ')
    for line in data:
        writer.writerow(line.split())
And my data looks like this:
Each header line starts with "%th=number", where number goes from 2 to 180 in steps of 2 (so 2, 4, 6, ... up to 180).
Between those header lines I have three columns of data which I would like to append to a CSV file. With my current code I save only the header lines (%th=2, %th=4, ..., %th=180). Do you have any idea how to change my code so it reads a header line, appends the data below it to a .txt or .csv file, and then starts the loop again when it "sees" another header line, saving the next segment to another file, and so on up to "%th=180"?
UPDATE:
Input:
Expected output:
The program should append all the data below a "%th=number" line to a separate file, and when the next segment appears it should save that one to another file, continuing until the end of the input file.
In other words, each segment starts with an even number (2, 4, 6, 8, ..., 180), so I should get 90 files, one for each segment.
UPDATE 2:
So I have changed my code:
with open("S:\Doc\Python\Data\Dekomp\Hth.txt", 'r') as f:
with open("S:\Doc\Python\Data\Dekomp\Hth2.txt", 'w') as g:
for line in f:
if line.startswith("%"):
g.write(line)
if line.endswith("%"):
break
But right now the problem is that with startswith and endswith in place Python saves only the header lines, and if I delete them the obvious thing happens: it saves everything from the input file.
data = []
filename = "S:\Doc\Python\Data\Dekomp\Hth.txt"
with open(filename) as f:
    lines = f.readlines()  # Reading file

def _get_all_starting_index(data):  # Indices of all lines starting with %
    return [i for i, line in enumerate(data) if line.startswith("%")]

indices = _get_all_starting_index(lines)

data_info_to_write_in_file = {}  # for storing data to write in each individual file
for i in range(len(indices)):  # looping over the header indices
    key = lines[indices[i]]  # key value for the start of a segment
    end_point = indices[i+1] if len(indices) > i+1 else len(lines)  # end of this segment (end of file for the last one)
    lines_to_get = lines[indices[i]+1 : end_point]  # lines in between, stored in the dictionary
    data_info_to_write_in_file[key] = lines_to_get

for key in data_info_to_write_in_file.keys():  # writing info to each individual file
    filename = "S:\Doc\Python\Data\Dekomp\{}.txt".format(key.strip().split("=")[-1])
    with open(filename, 'w') as f:
        for line in data_info_to_write_in_file[key]:
            f.write(line)
Hope this helps.
Feel free to ask for more info.
I'm trying to find out how to stop an os.walk after it has walked through a particular file.
I have a directory of log files organized by date. I'm trying to replace grep searches, allowing a user to find IP addresses stored in a date range they specify.
The program will take the following arguments:
-i IPv4 or IPv6 address with subnet
-s start date, e.g. 2013/12/20 (matches the file structure)
-e end date
I'm assuming that because of the topdown option there is logic that should allow me to declare an endpoint. What is the best way to do this? I'm thinking a while loop.
I apologize in advance if something is off with my question. Just checked blood sugar, it's low 56, gd type one.
Additional information
The file structure will be situated in flows/index_border as such
2013
--01
--02
----01
----...
----29
2014
Hope this is clear: a year folder contains month folders, which contain day folders, which contain hourly files. Dates increase downwards.
The end date will need to be inclusive (I didn't focus too much on it because I can just add code to move one day up).
I have been trying to make a date range function; I was surprised I didn't see one in the datetime docs, since it seems like it would be useful.
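For reference, the kind of helper I had in mind looks roughly like this (just a sketch built on datetime.timedelta, not part of my script below):

import datetime

def date_range(start, end):
    """Yield each date from start to end inclusive (both datetime.date objects)."""
    current = start
    while current <= end:
        yield current
        current += datetime.timedelta(days=1)

# e.g. for day in date_range(datetime.date(2013, 12, 20), datetime.date(2013, 12, 22)): ...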
import os, gzip, netaddr, datetime, argparse

startDir = '.'

def sdate_format(s):
    try:
        return (datetime.datetime.strptime(s, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

def edate_format(e):
    try:
        return (datetime.datetime.strptime(e, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date', help='The first date in range of interest.')
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date', help='The last date in range of interest.')
parser.add_argument('-i', action='store', dest='net', help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)
results = parser.parse_args()

start = results.start_date
end = results.end_date
target_ip = results.net

startDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(start.year, start.month, start.day)

print('searching...')
for root, dirs, files in os.walk(startDir):
    for contents in files:
        if contents.endswith('.gz'):
            f = gzip.open(os.path.join(root, contents), 'r')
        else:
            f = open(os.path.join(root, contents), 'r')
        text = f.readlines()
        f.close()
        for line in text:
            for address_item in netaddr.IPNetwork(target_ip):
                if str(address_item) in line:
                    print line,
You need to describe what works or does not work. The argparse part of your code looks fine, though I haven't done any testing. The use of type is refreshingly correct. :) (Posters often misuse that parameter.)
But as for the stopping, I'm guessing you could do:
endDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(end.year, end.month, end.day)
for root, dirs, files in os.walk(startDir):
    for contents in files:
        ....
    if endDir in <something based on dirs and files>:
        break
I don't know enough about your file structure to be more specific. It's also been some time since I worked with os.walk. In any case, I think a conditional break is the way to stop the walk early.
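As an illustration only, a minimal sketch of that idea, assuming the day directories hold the hourly files directly and that a sorted walk visits them in date order:

import os, datetime

startDir = '/flows/index_border'                     # top of the tree to walk
end = datetime.date(2013, 12, 22)                    # parsed end date
endDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(end.year, end.month, end.day)

for root, dirs, files in os.walk(startDir, topdown=True):
    dirs.sort()          # with topdown=True, sorting dirs keeps the walk in date order
    for name in files:
        pass             # ... open and search the file as before ...
    if root == endDir:
        break            # stop once the end-date directory has been processed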
#!/usr/bin/env python
import os, gzip, netaddr, datetime, argparse, sys

searchDir = '.'
searchItems = []

def sdate_format(s):
    try:
        return (datetime.datetime.strptime(s, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

def edate_format(e):
    try:
        return (datetime.datetime.strptime(e, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date',
                    help='The first date in range of interest.', required=True)
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date',
                    help='The last date in range of interest.', required=True)
parser.add_argument('-i', action='store', dest='net',
                    help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)
results = parser.parse_args()

start = results.start_date
end = results.end_date + datetime.timedelta(days=1)
target_IP = results.net
dateRange = end - start

for addressOfInterest in (netaddr.IPNetwork(target_IP)):
    searchItems.append(str(addressOfInterest))

print('searching...')
for eachDay in range(dateRange.days):
    period = start + datetime.timedelta(days=eachDay)
    searchDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(period.year, period.month, period.day)
    for contents in os.listdir(searchDir):
        if contents.endswith('.gz'):
            f = gzip.open(os.path.join(searchDir, contents), 'rb')
            text = f.readlines()
            f.close()
        else:
            f = open(os.path.join(searchDir, contents), 'r')
            text = f.readlines()
            f.close()
        for addressOfInterest in searchItems:
            for line in text:
                if addressOfInterest in line:
                    print contents
                    print line,
I was banging my head because I thought I was printing a duplicate; it turns out the file I was given to test contains duplicates. I ended up removing os.walk because of the predictable structure of the file system, but @hpaulj did provide a correct solution. Much appreciated!
When running this simple script, the "output_file.csv" remains open. I am unsure about how to close the file in this scenario.
I have looked at other examples where the open() function is assigned to a variable such as 'f', and the object is closed using f.close(). Because of the with ... as csv_file here, I am unclear where the file object actually is. Would anyone mind conceptually explaining where the disconnect is? Ideally I would like to know:
how to check namespace for all open file objects
how to determine the proper method for closing these objects
The simple script below reads columns of data; where the mapping in column 1 is blank, it fills down from the row above:
import csv

output_file = csv.writer(open('output_file.csv', 'w'))

csv.register_dialect('mydialect', doublequote=False, quotechar="'")

def csv_writer(data):
    with open('output_file.csv', "ab") as csv_file:
        writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
        writer.writerow(data)

D = [[]]
for line in open('inventory_skus.csv'):
    clean_line = line.strip()
    data_points = clean_line.split(',')
    print data_points
    D.append([line.strip().split(',')[0], line.strip().split(',')[1]])

D2 = D
for i in range(1, len(D)):
    nr = D[i]
    if D[i][0] == '':
        D2[i][0] = D[i-1][0]
    else:
        D2[i] = D[i]

for line in range(1, len(D2)):
    csv_writer(D2[line])
    print D2[line]
Actually, you are creating two file objects (in two different ways). First one:
output_file = csv.writer(open('output_file.csv', 'w'))
This file object is hidden inside the csv.writer and not exposed by it. However,
you don't use that writer at all, and you never close it, so it remains open until it is garbage collected.
In
with open('output_file.csv',"ab") as csv_file:
you get the file object in csv_file. The context block takes care of closing the object, so no need to close it manually (file objects are context managers).
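If you want to see that for yourself, you can check a file object's closed attribute (just an illustration, not something your script needs):

with open('output_file.csv', 'ab') as csv_file:
    print csv_file.closed   # False: the file is open inside the with block

print csv_file.closed       # True: the with block closed it automatically on exit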
Manually indexing over D2 is unnecessary. Also, why are you opening the CSV file in binary mode?
def write_data_row(csv_writer, data):
    csv_writer.writerow(data)

with open('output_file.csv', "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
    for line in D2[1:]:
        write_data_row(writer, line)
        print line
I am a Python newbie. I can print the Twitter search results, but when I save them to a .txt file, I only get one result. How do I add all the results to my .txt file?
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)

tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500

for i in range(0, MAX_ATTEMPTS):
    if (COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break
    if (0 == i):
        results = t.search(q="#twitter", count='100')
    else:
        results = t.search(q="#twitter", include_entities='true', max_id=next_max_id)
    for result in results['statuses']:
        tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
        tweets.append(tweet_text)
        print tweet_text
        text_file = open("Output.txt", "w")
        text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
        text_file.close()
You just need to rearrange your code to open the file BEFORE you do the loop:
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)

tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500

with open("Output.txt", "w") as text_file:
    for i in range(0, MAX_ATTEMPTS):
        if (COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
            break
        if (0 == i):
            results = t.search(q="#twitter", count='100')
        else:
            results = t.search(q="#twitter", include_entities='true', max_id=next_max_id)
        for result in results['statuses']:
            tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
            tweets.append(tweet_text)
            print tweet_text
            text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
            text_file.write('\n')
I use Python's with statement here to open a context manager. The context manager will handle closing the file when you drop out of the loop. I also added another write call that writes out a newline so that each record ends up on its own line.
You could also open the file in append mode ('a' instead of 'w'), which would allow you to remove the 2nd write command.
There are two general solutions to your issue. Which is best may depend on more details of your program.
The simplest solution is just to open the file once at the top of your program (before the loop) and then keep reusing the same file object over and over in the later code. Only when the whole loop is done should the file be closed.
with open("Output.txt", "w") as text_file:
for i in range(0,MAX_ATTEMPTS):
# ...
for result in results['statuses']:
# ...
text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'],
result['user']['followers_count'],
result['text'],
result['created_at'],
result['source']))
Another solution would be to open the file several times, but to use the "a" append mode when you do so. Append mode does not truncate the file like "w" write mode does, and it seeks to the end automatically, so you don't overwrite the file's existing contents. This approach would be most appropriate if you were writing to several different files. If you're just writing to the one, I'd stick with the first solution.
for i in range(0, MAX_ATTEMPTS):
    # ...
    for result in results['statuses']:
        # ...
        with open("Output.txt", "a") as text_file:
            text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'],
                                                 result['user']['followers_count'],
                                                 result['text'],
                                                 result['created_at'],
                                                 result['source']))
One last point: it looks like you're writing out comma-separated data. You may want to use the csv module rather than writing the file manually; it can take care of quoting or escaping any commas that appear in the data for you.
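For instance, a rough sketch with csv.writer, keeping the same loop structure as above; the field order is just assumed to match your format string:

import csv

with open("Output.txt", "wb") as text_file:   # binary mode, as the Python 2 csv docs recommend
    writer = csv.writer(text_file)
    for result in results['statuses']:
        # csv.writer handles quoting/escaping of commas inside the tweet text
        writer.writerow(['#' + result['user']['screen_name'],
                         result['user']['followers_count'],
                         result['text'].encode('utf-8'),   # csv in Python 2 wants byte strings
                         result['created_at'],
                         result['source']])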
I'm trying to create a data-scraping file for a class, and the data I have to scrape requires that I use while loops to get the right data into separate arrays-- i.e. for states, and SAT averages, etc.
However, once I set up the while loops, the regex that stripped most of the HTML tags from the data broke, and I am getting an error that reads:
AttributeError: 'NoneType' object has no attribute 'groups'
My Code is:
import re, util
from BeautifulSoup import BeautifulStoneSoup

# create a comma-delineated file
delim = ", "

#base url for sat data
base = "http://www.usatoday.com/news/education/2007-08-28-sat-table_N.htm"

#get webpage object for site
soup = util.mysoupopen(base)

#get column headings
colCols = soup.findAll("td", {"class":"vaTextBold"})

#get data
dataCols = soup.findAll("td", {"class":"vaText"})

#append data to cols
for i in range(len(dataCols)):
    colCols.append(dataCols[i])

#open a csv file to write the data to
fob = open("sat.csv", 'a')

#initiate the 5 arrays
states = []
participate = []
math = []
read = []
write = []

#split into 5 lists for each row
for i in range(len(colCols)):
    if i%5 == 0:
        states.append(colCols[i])

i = 1
while i <= 250:
    participate.append(colCols[i])
    i = i+5
i = 2
while i <= 250:
    math.append(colCols[i])
    i = i+5
i = 3
while i <= 250:
    read.append(colCols[i])
    i = i+5
i = 4
while i <= 250:
    write.append(colCols[i])
    i = i+5

#write data to the file
for i in range(len(states)):
    states = str(states[i])
    participate = str(participate[i])
    math = str(math[i])
    read = str(read[i])
    write = str(write[i])
    #regex to remove html from data scraped
    #remove <td> tags
    line = re.search(">(.*)<", states).groups()[0] + delim + re.search(">(.*)<", participate).groups()[0] + delim + re.search(">(.*)<", math).groups()[0] + delim + re.search(">(.*)<", read).groups()[0] + delim + re.search(">(.*)<", write).groups()[0]
    #append data point to the file
    fob.write(line)
Any ideas why this error suddenly appeared? The regex was working fine until I tried to split the data into different lists. I have already tried printing the various strings inside the final for loop to see whether any of them were None for the first i value (0), but they were all the strings they were supposed to be.
Any help would be greatly appreciated!
It looks like the regex search is failing on (one of) the strings, so it returns None instead of a MatchObject.
Try the following instead of the very long #remove <td> tags line:
import sys

out_list = []
for item in (states, participate, math, read, write):
    try:
        out_list.append(re.search(">(.*)<", item).groups()[0])
    except AttributeError:
        print "Regex match failed on", item
        sys.exit()

line = delim.join(out_list)
That way, you can find out where your regex is failing.
Also, I suggest you use .group(1) instead of .groups()[0]. The former is more explicit.
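For example (a small illustration, not part of the script above):

import re

m = re.search(">(.*)<", "<td>Alabama</td>")
if m:                    # re.search returns None when nothing matches
    print m.group(1)     # 'Alabama' -- the same value as m.groups()[0]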