Python script cannot find file object to close - python-2.7

When I run this simple script, "output_file.csv" remains open. I am unsure how to close the file in this scenario.
I have looked at other examples where the result of open() is assigned to a variable such as 'f', and the object is closed using f.close(). Because of the with ... as csv_file construct, I am unclear as to where the file object actually is. Would anyone mind explaining conceptually where the disconnect is here? Ideally, I would like to know:
how to check namespace for all open file objects
how to determine the proper method for closing these objects
# simple script to read columns of data; where the mapping in column 1 is blank, fill down
import csv
output_file = csv.writer(open('output_file.csv', 'w'))
csv.register_dialect('mydialect', doublequote=False, quotechar="'")
def csv_writer(data):
    with open('output_file.csv',"ab") as csv_file:
        writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
        writer.writerow(data)
D = [[]]
for line in open('inventory_skus.csv'):
    clean_line = line.strip()
    data_points = clean_line.split(',')
    print data_points
    D.append([line.strip().split(',')[0], line.strip().split(',')[1]])
D2 = D
for i in range(1, len(D)):
    nr = D[i]
    if D[i][0] == '':
        D2[i][0] = D[i-1][0]
    else:
        D2[i] = D[i]
for line in range(1, len(D2)):
    csv_writer(D2[line])
    print D2[line]

Actually, you are creating two file objects (in two different ways). First one:
output_file = csv.writer(open('output_file.csv', 'w'))
This file object is created inline inside csv.writer, and the writer does not expose it. However, you don't use that writer at all, and you never close the file, so it remains open until it is garbage collected.
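To make the file object reachable, give it a name before handing it to csv.writer; the writer itself has no close() method. A minimal sketch written for Python 3 (under Python 2.7 you would open the file with 'wb' instead of passing newline=''); the temporary path is an assumption so that the example runs anywhere:

```python
import csv
import os
import tempfile

# Hypothetical output location, so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), 'output_file.csv')

# Keep a name for the file object instead of creating it inline inside csv.writer(...).
out = open(path, 'w', newline='')  # Python 3: csv files are opened with newline=''
writer = csv.writer(out)
writer.writerow(['sku', 'mapping'])
out.close()  # close the file object you passed to the writer

print(out.closed)  # True
```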
In
with open('output_file.csv',"ab") as csv_file:
you get the file object in csv_file. The context block takes care of closing the object, so no need to close it manually (file objects are context managers).
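A quick way to convince yourself of this is to check the file's closed attribute inside and after the block. A small Python 3 sketch, using a throwaway temporary path:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'output_file.csv')

with open(path, 'w') as csv_file:
    inside = csv_file.closed  # False: the file is open inside the block
    csv_file.write('a,b\n')

# Leaving the block called csv_file.close(); the name still refers to the object.
print(inside, csv_file.closed)  # False True
```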
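Manually indexing over D2 is unnecessary, and so is reopening the file in append mode for every row: open it once and pass the writer around. Keep the binary mode, though; in Python 2.7 the csv module documentation asks for files opened with the 'b' flag on platforms where it makes a difference, such as Windows.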
def write_data_row(writer, data):
    writer.writerow(data)
with open('output_file.csv', "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',', lineterminator='\r\n', dialect='mydialect')
    for line in D2[1:]:
        write_data_row(writer, line)
        print line
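The fill-down step itself also works without index arithmetic. A minimal sketch, assuming each row is a [mapping, sku] list like the rows in D (the function name fill_down and the sample rows are hypothetical). It builds a new list rather than writing D2 = D, which only creates a second name for the same list, not a copy:

```python
def fill_down(rows):
    """Return new rows where a blank first column repeats the previous row's value."""
    filled = []
    last = ''
    for row in rows:
        first = row[0] if row[0] != '' else last
        filled.append([first] + list(row[1:]))
        last = first
    return filled


rows = [['A', 'sku1'], ['', 'sku2'], ['B', 'sku3'], ['', 'sku4']]
print(fill_down(rows))
# [['A', 'sku1'], ['A', 'sku2'], ['B', 'sku3'], ['B', 'sku4']]
```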

Related

Saving line after line from a txt file

So I wrote this code:
import csv
data = []
filename = "S:\Doc\Python\Data\Dekomp\Hth.txt"
with open(filename) as f:
    lines = f.readlines()
    for line in lines:
        if line.startswith('%'):
            data.append(line.split('+')[0].strip())
        if line.endswith('%'):
            break
with open('S:\Doc\Python\Data\Dekomp\Test.csv', 'w') as f:
    writer = csv.writer(f, delimiter=' ')
    for line in data:
        writer.writerow(line.split())
And my data looks like this:
Each headline starts with "%th=number", where the number goes from 2 to 180 in steps of 2 (2, 4, 6, ... up to 180).
Between those headlines I have three columns of data, which I would like to write to a csv file. With my code I save only the headlines (%th=2, %th=4, ... %th=180). Do you have any idea how to change my code so that it reads a headline, writes the data below it to a .txt or .csv file, and then, when it "sees" the next headline, starts again and saves that segment to another file, continuing up to "%th=180"?
UPDATE:
Input:
Expected output:
The program should write all the data below "%th=number" to one file, and when the following segment appears, save it to another file, continuing until the end of the input file.
In other words, each segment starts with an even number (2, 4, 6, 8 ... 180), so I should get 90 files, one for each segment.
UPDATE 2:
So I have changed my code:
with open("S:\Doc\Python\Data\Dekomp\Hth.txt", 'r') as f:
    with open("S:\Doc\Python\Data\Dekomp\Hth2.txt", 'w') as g:
        for line in f:
            if line.startswith("%"):
                g.write(line)
            if line.endswith("%"):
                break
But now the problem is that with the startswith and endswith checks in place, Python saves only the headlines; if I delete them, the obvious thing happens and it saves everything from the input file.
data = []
filename = r"S:\Doc\Python\Data\Dekomp\Hth.txt"
with open(filename) as f:
    lines = f.readlines()  # Reading file
def _get_all_starting_index(data):  # Calculating index of all lines starting with %
    # enumerate gives the real position even if the same line text appears twice
    return [i for i, line in enumerate(data) if line.startswith("%")]
indices = _get_all_starting_index(lines)
data_info_to_write_in_file = {}  # for storing data to write in each individual file
for i in range(len(indices)):  # looping over number of indices
    key = lines[indices[i]]  # key value for starting of a segment.
    end_point = indices[i+1] if len(indices) > i+1 else len(lines)  # next headline, or end of file
    lines_to_get = lines[indices[i]+1 : end_point]  # getting lines in between and storing it in dictionary
    data_info_to_write_in_file[key] = lines_to_get
for key in data_info_to_write_in_file.keys():  # writing info in each individual file
    filename = r"S:\Doc\Python\Data\Dekomp\{}.txt".format(key.strip().split("=")[-1])
    with open(filename, 'w') as f:
        for line in data_info_to_write_in_file[key]:
            f.write(line)
Hope it helps.
Feel free to ask if you need more info.
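The same split can also be done in a single pass over the input, starting a new segment whenever a headline appears, which avoids the index bookkeeping. A sketch written for Python 3, under the assumption that every headline starts with '%' and ends in '=number' as in the question; split_segments and write_segments are hypothetical helper names, and the sample lines are made up. Because iterating over an open file yields its lines, you could pass the file object from open(filename) straight to split_segments:

```python
import os
import tempfile


def split_segments(lines):
    """Group lines into (number, body_lines) pairs, one per '%th=number' headline.

    Lines before the first headline are ignored.
    """
    segments = []
    for line in lines:
        if line.startswith('%'):
            # Headline such as '%th=2': start a new segment named after the number.
            number = line.strip().split('=')[-1]
            segments.append((number, []))
        elif segments:
            segments[-1][1].append(line)
    return segments


def write_segments(segments, out_dir):
    """Write each segment to its own '<number>.txt' file in out_dir."""
    for number, body in segments:
        with open(os.path.join(out_dir, number + '.txt'), 'w') as f:
            f.writelines(body)


if __name__ == '__main__':
    sample = ['%th=2\n', '1 2 3\n', '4 5 6\n', '%th=4\n', '7 8 9\n']
    out_dir = tempfile.mkdtemp()
    write_segments(split_segments(sample), out_dir)
    print(sorted(os.listdir(out_dir)))  # ['2.txt', '4.txt']
```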

Python: Reading in .csv data as dictionary and printing out data as dictionary to .csv file?

I'm writing an executable Python script that does the following:
I want to gather information from a .csv file and read it into Python as a dictionary. This .csv file contains several columns of information with headings, and I only want to extract particular columns (those with specific headings) and write those columns out to another .csv file. I am using DictReader and DictWriter.
I am reading in the .csv file as a dictionary (with the headings being the keys and the column values being the items), and writing the information out as a dictionary to another .csv file.
After I read it in, I print out the items under the particular headings (so I can double-check what I have read in). I then open a new .csv file and want to write the data (which I have just read in) as a dictionary. I can write the keys (column headings), but for some reason my code doesn't write any of the item values. The headings that I want in this case are 'Name' and 'DOB'.
Here is my code:
#!/usr/bin/python
import os
import os.path
import re
import sys
import pdb
import csv
csv_file = csv.DictReader(open(sys.argv[1],'rU'),delimiter = ',')
for line in csv_file:
    print line['Name'] + ',' + line['DOB']
fieldnames = ['Name','DOB']
test_file = open('test2.csv','wr')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
Any ideas where I'm going wrong? I want to write the item values under their corresponding column headers in the output file.
I am using python 2.7.11 on a Mac machine. I am also printing values to the terminal.
You're unfortunately tricked by your own testing, that is, the printing of the individual rows. By looping through csv_file initially, you've exhausted the iterator and are at the end. Further iterations, as done at the bottom of your code, yield nothing, so no rows are written.
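The exhaustion is easy to reproduce. A minimal Python 3 sketch, with io.StringIO and made-up rows standing in for the input file:

```python
import csv
import io

# In-memory stand-in for the input file (hypothetical sample data).
infile = io.StringIO('Name,DOB,City\nAnn,1990-01-01,Oslo\nBob,1985-05-05,Rome\n')
reader = csv.DictReader(infile)

first_pass = [row['Name'] for row in reader]
second_pass = [row['Name'] for row in reader]  # the iterator is already exhausted

print(first_pass)   # ['Ann', 'Bob']
print(second_pass)  # []
```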
Your question is essentially a duplicate of various other questions, such as how to read from a CSV file repeatedly. The difference is in how the issue comes up: you didn't realise what the problem was, while those questions know the cause but not the solution.
Answers to those questions tell you to simply reset the file pointer of the input file. Unfortunately, in your current code the input file object is created inline inside the DictReader call, so you have no name through which to call seek() on it.
Thus, something like this should work:
infile = open(sys.argv[1], 'rU')
csv_file = csv.DictReader(infile, delimiter=',')
<all other code>
infile.seek(0)
next(csv_file)  # skip the header line; after seek(0) it would come back as a data row
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
infile.close()
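The rewind trick has one catch that is easy to miss: once DictReader has read the header, seek(0) makes it hand the header line back as an ordinary row, which is why a next(csv_file) call after seek(0) is needed to skip it. A small Python 3 sketch with in-memory data (io.StringIO and the sample rows stand in for the real file) shows the effect:

```python
import csv
import io

infile = io.StringIO('Name,DOB,City\nAnn,1990-01-01,Oslo\n')
reader = csv.DictReader(infile)
first_names = [row['Name'] for row in reader]
print(first_names)  # ['Ann']

infile.seek(0)                  # rewind the underlying file object
header_as_row = next(reader)    # the header line comes back as a data row
print(header_as_row['Name'])    # Name
rest = [row['Name'] for row in reader]
print(rest)                     # ['Ann']
```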
As an aside, just use the with statement when opening files:
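with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'wb') as outfile:
    csv_file = csv.DictReader(infile, delimiter=',')
    for line in csv_file:
        print line['Name'] + ',' + line['DOB']
    fieldnames = ['Name', 'DOB']
    # extrasaction='ignore' drops the input columns that are not in fieldnames
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames, extrasaction='ignore')
    csvwriter.writeheader()
    infile.seek(0)
    next(csv_file)  # skip the header line, which would otherwise be re-read as data
    for row in csv_file:
        csvwriter.writerow(row)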
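Note: DictWriter does not write the header row on its own, but since Python 2.7 you can call csvwriter.writeheader() instead of building the header dict yourself. Also, because your input has more columns than 'Name' and 'DOB', DictWriter raises a ValueError for the extra keys unless you pass extrasaction='ignore'.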
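Alternatively, you can avoid rewinding entirely by reading the rows into a list once. A minimal Python 3 sketch, with io.StringIO and made-up rows standing in for the real files (with real files in Python 3 you would open both with newline=''); select_columns is a hypothetical helper name. It uses DictWriter.writeheader() for the header and extrasaction='ignore' so that input columns other than the requested ones are dropped rather than raising ValueError:

```python
import csv
import io


def select_columns(infile, outfile, fieldnames):
    """Copy only the columns named in fieldnames from infile to outfile."""
    rows = list(csv.DictReader(infile))  # read once, keep the rows in memory
    for row in rows:
        print(','.join(row[name] for name in fieldnames))  # echo for checking
    writer = csv.DictWriter(outfile, fieldnames=fieldnames,
                            extrasaction='ignore')  # drop columns not in fieldnames
    writer.writeheader()  # the header is not written automatically
    writer.writerows(rows)


src = io.StringIO('Name,DOB,City\nAnn,1990-01-01,Oslo\nBob,1985-05-05,Rome\n')
dst = io.StringIO()
select_columns(src, dst, ['Name', 'DOB'])
print(repr(dst.getvalue()))
# 'Name,DOB\r\nAnn,1990-01-01\r\nBob,1985-05-05\r\n'
```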

how to read and overwrite part of batch file in Python

I have a batch file that looks something like this:
.....
set ARGS=%ARGS% /startDate:2015-07-15T15:20:00.000
set ARGS=%ARGS% /endDate:2015-07-15T17:30:00.000
set ARGS=%ARGS% /IDs:250
set ARGS=%ARGS% /values:10000,20000
.....
Now I want to read it and overwrite it with new dates (one day after the current start and end dates). My code below works fine if I write to a new file, but it doesn't work if I try to overwrite the original file. Any idea how to fix it?
from datetime import datetime, timedelta
WANTED = 19 #or however many characters you want after dates
with open('myfile.bat') as searchfile, open('mynewfile.bat', 'w') as outfile:
    for line in searchfile:
        left,sep,right = line.partition('startDate:')
        if sep: # True iff 'startDate:' in line
            startdatestr = (right[:WANTED])
            startdate = datetime.strptime(startdatestr, "%Y-%m-%dT%H:%M:%S")
            newstartdate = startdate + timedelta(days=1)
            newstartdatestr = newstartdate.strftime("%Y-%m-%dT%H:%M:%S")
            line = line.replace(startdatestr, newstartdatestr)
        left,sep,right = line.partition('endDate:')
        if sep: # True iff 'endDate:' in line
            enddatestr = (right[:WANTED])
            enddate = datetime.strptime(enddatestr, "%Y-%m-%dT%H:%M:%S")
            newenddate = enddate + timedelta(days=1)
            newenddatestr = newenddate.strftime("%Y-%m-%dT%H:%M:%S")
            line = line.replace(enddatestr, newenddatestr)
        outfile.write(line)
If you open the same file for writing ('w') in the same 'with open()' statement, it is truncated right away, before you have read anything from it, so there is nothing left to process. Instead, store the content and your modifications in variables, and leave the 'with open()' part so that the file handle closes.
Then open the file for writing and output your data to it.
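A minimal sketch of that read-then-write pattern, written for Python 3 and assuming the fixed 19-character timestamp format from the question; shift_date_after and shift_dates_in_place are hypothetical helper names, and the demo content is made up. Unlike line.replace(), it rebuilds the line around the marker, so only the date right after 'startDate:' or 'endDate:' changes:

```python
import os
import tempfile
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"
WANTED = 19  # len("2015-07-15T15:20:00"); the ".000" after it is kept as-is


def shift_date_after(line, marker, days=1):
    """Return line with the timestamp right after marker moved by `days` days."""
    left, sep, right = line.partition(marker)
    if not sep:
        return line
    old = right[:WANTED]
    new = (datetime.strptime(old, FMT) + timedelta(days=days)).strftime(FMT)
    return left + sep + new + right[WANTED:]


def shift_dates_in_place(path, days=1):
    # 1) Read everything; the read handle is closed when this block ends.
    with open(path) as f:
        lines = f.readlines()
    # 2) Modify the lines in memory.
    new_lines = [shift_date_after(shift_date_after(line, 'startDate:', days),
                                  'endDate:', days)
                 for line in lines]
    # 3) Only now reopen for writing; 'w' truncates the file, which is safe here.
    with open(path, 'w') as f:
        f.writelines(new_lines)


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.bat')
    os.close(fd)
    with open(path, 'w') as f:
        f.write('set ARGS=%ARGS% /startDate:2015-07-15T15:20:00.000\n'
                'set ARGS=%ARGS% /endDate:2015-07-15T17:30:00.000\n')
    shift_dates_in_place(path)
    with open(path) as f:
        print(f.read())
    os.remove(path)
```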

Multiple output files created but empty

I am trying to split one file with two articles in it into two separate files with one article each, for subsequent analysis of the articles. Each article in the initial file starts with an ID, and I want to use a regular expression (RE) that matches the ID to separate the articles into files.
Below is the initial input file, with ID number:
166068619 #### "Epilepsy: let's end our ignorance of this neglected condition
Helen Stephens is a young woman with epilepsy [...]."
106899978 #### "Great British Payoff shows that BBC governance is broken
If it was a television series, they'd probably call it [...]."
However, when I run my code, I do get two separate files as an output but they are empty.
This is my code:
def file_split(path_to_file):
    """Function splits bigger file into N smaller ones, based on a certain RE
    match, that is used to break the bigger file into smaller ones"""
    def pattern_extract(path_to_file):
        """Function identifies the number of RE occurences in a file,
        No. can be used in further analysis as range No."""
        import re
        x = []
        with open(path_to_file) as f:
            for line in f:
                match = re.search(r'^\d+?\t####\t', line)
                if match:
                    a = match.group()
                    x.append(a)
        return len(x)
    y = pattern_extract(path_to_file)
    m = y + 1
    files = [open('filename%i.txt' %i, 'w') for i in range(1,m)]
    with open(path_to_file) as f:
        for line in f:
            match = re.search(r'^\d+?\t####\t', line)
            if match:
                a = match.group()
                #files = [open('filename%i.txt' %i, 'w') for i in range(1, m)]
                files[i-1].write(a)
    for f in files:
        f.close()
    return files
Output result is as follows:
file_split(path)
Out[19]:
[<open file 'filename1.txt', mode 'w' at 0x7fe121b130c0>,
<open file 'filename2.txt', mode 'w' at 0x7fe121b131e0>]
I am new to Python and I am not quite sure where the problem lies. I checked some other answers that addressed the multiple file outputs but cannot figure out the solution. Help would be very much appreciated.
There are two problems with your code:
you write only the line matching the ID (actually, just the match itself), not the rest
you are always writing to the last file, as you use i, the loop variable "left over" from the list comprehension
To fix it, you could change the lower portion of your code to this:
y = pattern_extract(path_to_file)
files = [open('filename%i.txt' %i, 'w') for i in range(y)]
n = -1
with open(path_to_file) as f:
    for line in f:
        if re.search(r'^\d+\s+####\s+', line):
            n += 1
        files[n].write(line)
for f in files:
    f.close()
But you do not have to read the file twice just to count the matches. Instead, open a new file whenever a line matches an ID line, always write to the last file in the list, and close all the files at the end:
import re
open_files = []
with open(path_to_file) as f:
    for line in f:
        if re.search(r'^\d+\s+####\s+', line):
            open_files.append(open('filename%d.txt' % len(open_files), 'w'))
        open_files[-1].write(line)
for f in open_files:
    f.close()
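A sketch of that single-pass approach, written for Python 3, that keeps the grouping separate from the file writing so it can be checked on a plain list of lines; split_articles and write_articles are hypothetical names, and the pattern assumes the ID and #### are separated by whitespace (spaces or tabs). Lines before the first ID line are skipped, where the version above would raise an IndexError on open_files[-1]:

```python
import os
import re

ID_LINE = re.compile(r'^\d+\s+####\s+')  # e.g. '166068619 #### "Epilepsy: ...'


def split_articles(lines):
    """Group lines into articles; each article starts at a line matching ID_LINE."""
    articles = []
    for line in lines:
        if ID_LINE.match(line):
            articles.append([])  # an ID line starts a new article
        if articles:
            articles[-1].append(line)
    return articles


def write_articles(articles, out_dir):
    """Write article n to out_dir/filename<n>.txt, numbering from 1."""
    for n, article in enumerate(articles, 1):
        # each file is closed as soon as its article has been written
        with open(os.path.join(out_dir, 'filename%d.txt' % n), 'w') as out:
            out.writelines(article)
```

Usage would be along the lines of opening path_to_file, passing the file object to split_articles, and handing the result to write_articles with the directory of your choice.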

Save multiple lines of text in .txt

I am a Python newbie. I can print the Twitter search results, but when I save them to a .txt file, I only get one result. How do I add all the results to my .txt file?
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)
tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500
for i in range(0,MAX_ATTEMPTS):
    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break
    if(0 == i):
        results = t.search(q="#twitter",count='100')
    else:
        results = t.search(q="#twitter",include_entities='true',max_id=next_max_id)
    for result in results['statuses']:
        tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
        tweets.append(tweet_text)
        print tweet_text
        text_file = open("Output.txt", "w")
        text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
        text_file.close()
You just need to rearrange your code to open the file BEFORE you do the loop:
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)
tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500
with open("Output.txt", "w") as text_file:
    for i in range(0,MAX_ATTEMPTS):
        if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
            break
        if(0 == i):
            results = t.search(q="#twitter",count='100')
        else:
            results = t.search(q="#twitter",include_entities='true',max_id=next_max_id)
        for result in results['statuses']:
            tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
            tweets.append(tweet_text)
            print tweet_text
            text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
            text_file.write('\n')
I use Python's with statement here; a file object is a context manager, so the file is closed automatically when the with block exits, after the loop has finished. I also added another write command that writes out a newline ('\n') so that each line of data ends up on its own line.
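You could also open the file in append mode ('a' instead of 'w'). That only changes whether the existing contents of Output.txt are kept instead of being truncated; you would still need the newline write to put each tweet on its own line.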
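Since the Twython call needs credentials, here is a runnable Python 3 sketch of the same open-once pattern with a hypothetical statuses list standing in for results['statuses'] (only the keys the question uses are filled in, with made-up values):

```python
import os
import tempfile

# Hypothetical stand-in for the Twython search results.
statuses = [
    {'user': {'screen_name': 'alice', 'followers_count': 10},
     'text': 'first tweet', 'created_at': 'Mon', 'source': 'web'},
    {'user': {'screen_name': 'bob', 'followers_count': 20},
     'text': 'second tweet', 'created_at': 'Tue', 'source': 'app'},
]

path = os.path.join(tempfile.mkdtemp(), 'Output.txt')
with open(path, 'w') as text_file:  # opened once, before the loop
    for result in statuses:
        text_file.write("#%s,%s,%s,%s,%s\n" % (result['user']['screen_name'],
                                               result['user']['followers_count'],
                                               result['text'],
                                               result['created_at'],
                                               result['source']))

with open(path) as f:
    print(f.read().splitlines())
# ['#alice,10,first tweet,Mon,web', '#bob,20,second tweet,Tue,app']
```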
There are two general solutions to your issue. Which is best may depend on more details of your program.
The simplest solution is just to open the file once at the top of your program (before the loop) and then keep reusing the same file object over and over in the later code. Only when the whole loop is done should the file be closed.
with open("Output.txt", "w") as text_file:
    for i in range(0,MAX_ATTEMPTS):
        # ...
        for result in results['statuses']:
            # ...
            text_file.write("#%s,%s,%s,%s,%s\n" % (result['user']['screen_name'],
                                                   result['user']['followers_count'],
                                                   result['text'],
                                                   result['created_at'],
                                                   result['source']))
Another solution would be to open the file several times, but to use the "a" append mode when you do so. Append mode does not truncate the file like "w" write mode does, and it seeks to the end automatically, so you don't overwrite the file's existing contents. This approach would be most appropriate if you were writing to several different files. If you're just writing to the one, I'd stick with the first solution.
for i in range(0,MAX_ATTEMPTS):
    # ...
    for result in results['statuses']:
        # ...
        with open("Output.txt", "a") as text_file:
            text_file.write("#%s,%s,%s,%s,%s\n" % (result['user']['screen_name'],
                                                   result['user']['followers_count'],
                                                   result['text'],
                                                   result['created_at'],
                                                   result['source']))
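The difference between the two modes is easy to check: open a file twice with 'w' and only the second write survives; do the same with 'a' and both remain. A small Python 3 sketch using a temporary file (write_twice is a hypothetical helper):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'Output.txt')


def write_twice(mode):
    """Start from no file, then open/write/close it twice using the given mode."""
    if os.path.exists(path):
        os.remove(path)
    for text in ('one\n', 'two\n'):
        with open(path, mode) as f:
            f.write(text)
    with open(path) as f:
        return f.read()


print(repr(write_twice('w')))  # 'two\n'  (each open truncated the file)
print(repr(write_twice('a')))  # 'one\ntwo\n'  (each open appended at the end)
```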
One last point: it looks like you're writing out comma-separated data. You may want to use the csv module rather than writing your file manually. It can take care of things like quoting or escaping any commas that appear in the data for you.
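A minimal Python 3 sketch of that, writing to an in-memory buffer (with a real file you would use open('Output.csv', 'w', newline='') in Python 3, or 'wb' mode in Python 2.7); the row values are made up:

```python
import csv
import io

rows = [('alice', 10, 'hello, world', 'Mon', 'web')]  # hypothetical tweet fields

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
print(repr(buf.getvalue()))  # 'alice,10,"hello, world",Mon,web\r\n'

# Reading it back restores the field that contained a comma.
print(next(csv.reader(io.StringIO(buf.getvalue()))))
# ['alice', '10', 'hello, world', 'Mon', 'web']
```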