Python - Is it recommended to always open file with 'b' mode? - python-2.7

So I have this simple python function:
def ReadFile(FilePath):
    with open(FilePath, 'r') as f:
        FileContent = f.readlines()
    return FileContent
This function is generic and is used to open all sorts of files. However, when the file being opened is binary, the function does not behave as expected. Changing the open() call to:
with open(FilePath, 'rb') as f:
solves the issue for binary files (and seems to remain valid for text files as well).
Question:
Is it safe and recommended to always use rb mode for reading a file?
If not, what are the cases where it is harmful?
If not, How do you know which mode to use if you don't know what type of file you're working with?
Update
FilePath = r'f1.txt'
def ReadFileT(FilePath):
    with open(FilePath, 'r') as f:
        FileContent = f.readlines()
    return FileContent
def ReadFileB(FilePath):
    with open(FilePath, 'rb') as f:
        FileContent = f.readlines()
    return FileContent
with open("Read_r_Write_w", 'w') as f:
    f.writelines(ReadFileT(FilePath))
with open("Read_r_Write_wb", 'wb') as f:
    f.writelines(ReadFileT(FilePath))
with open("Read_b_Write_w", 'w') as f:
    f.writelines(ReadFileB(FilePath))
with open("Read_b_Write_wb", 'wb') as f:
    f.writelines(ReadFileB(FilePath))
where f1.txt is:
line1
line3
Files Read_b_Write_wb, Read_r_Write_wb & Read_r_Write_w are equal to the source f1.txt.
File Read_b_Write_w is:
line1
line3

In the Python 2.7 Tutorial:
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
My takeaway from that is that using 'rb' seems to be the best practice, and it looks like you ran into the problem they warn about - opening a binary file with 'r' on Windows.
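To make the difference concrete, here is a minimal sketch that writes a few bytes with Windows-style line endings and reads them back in both modes. It assumes Python 3, where text mode translates '\r\n' to '\n' on every platform by default (in Python 2, as the tutorial quoted above says, the translation happens on Windows only). The helper name read_both_ways is made up for this illustration.

```python
import os
import tempfile

def read_both_ways(raw_bytes):
    """Write raw_bytes to a temporary file, then read it back in binary and in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw_bytes)
        # Binary mode: the bytes come back exactly as they were written.
        with open(path, 'rb') as f:
            as_binary = f.read()
        # Text mode (Python 3 defaults): line endings are normalized to '\n'.
        with open(path, 'r', encoding='ascii') as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_binary, as_text

as_binary, as_text = read_both_ways(b"line1\r\nline3\r\n")
print(repr(as_binary))  # the '\r\n' pairs are preserved
print(repr(as_text))    # the '\r\n' pairs have become '\n'
```

For a text file this normalization is usually what you want; for a JPEG or EXE any such rewrite changes the data, which is why binary mode is the safe choice there.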

Related

How to encode text in Choregraphe NAO

Encoded text
I want to read a list from a file, but the text comes out encoded and .encode doesn't really help.
import json,sys
with open('your_file.txt') as f:
    lines = f.read().splitlines()
self.logger.info(lines)
self.tts.say(lines[1])
If your file is saved with UTF-8 encoding, this should work:
with open('text.txt', encoding = 'utf-8', mode = 'r') as my_file:
    lines = my_file.read().splitlines()
If this doesn't work, your text file's encoding is probably not UTF-8; put your file's actual encoding in place of utf-8. See also: How to determine the encoding of text?
Or if you share your input file as is, I can figure that out for you.
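One caveat: the encoding keyword of the built-in open() exists only in Python 3. If the Choregraphe script runs under Python 2 (an assumption, the question does not say), io.open accepts the same keyword and behaves the same way in both versions. A minimal sketch, with read_lines as a hypothetical helper name:

```python
import io

def read_lines(path, encoding='utf-8'):
    """Return the file's lines as unicode text, decoded with the given encoding."""
    # io.open takes `encoding` on Python 2.6+; on Python 3 it is the same function as open().
    with io.open(path, mode='r', encoding=encoding) as my_file:
        return my_file.read().splitlines()
```

Calling read_lines('your_file.txt') would then return a list of decoded strings, so something like lines[1] could be handed to the text-to-speech call from the question.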

Issue with writing multiple lines into a file in python

I want to save several specific links (image URLs) to a txt file (or any file where all the links can be listed one below another).
I get the links, but the code writes each link over the previous one, so in the end only one link remains :(. I also don't want repeated URLs.
def dlink(self, image_url):
    r = self.session.get(image_url, stream=True)
    with open('Output.txt','w') as f:
        f.write(image_url + '\n')
The issue is most simply that opening a file with mode 'w' truncates any existing file. You should change 'w' to 'a' instead. This will open an existing file for writing, but append instead of truncating.
More fundamentally, the problem may be that you are opening the file over and over in a loop. This is very inefficient. The only time the approach you use could be really useful is if your program is approaching the OS-imposed limit on the number of open files. If this is not the case, I would recommend putting the loop inside the with block, keeping the mode as 'w' since you now open the file just once, and passing the open file to your dlink function.
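A minimal sketch of that restructuring, assuming the URLs are already available as a list. The names write_links, image_urls and out_path are invented for this example, and the duplicate check is added because the question asks for no repeated URLs:

```python
def write_links(image_urls, out_path='Output.txt'):
    """Write each distinct URL on its own line, opening the file only once."""
    seen = set()
    # 'w' is fine here: the file is opened a single time, outside the loop.
    with open(out_path, 'w') as f:
        for url in image_urls:
            if url in seen:
                continue  # skip URLs that were already written
            seen.add(url)
            f.write(url + '\n')
    return len(seen)
```

Unlike list(set(...)), tracking seen URLs inside the loop keeps the links in the order in which they were found.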
Edit
Huge mistake on my part: since this is a method that you will call several times, opening the file in write mode ('w') or similar overwrites the existing file on every call.
So use append mode ('a') instead, which is described as:
Opens a file for appending. The file pointer is at the end of the file
if the file exists. That is, the file is in the append mode. If the
file does not exist, it creates a new file for writing.
The other problem is that image_url is a list, so you need to write it line by line:
def dlink(self, image_url):
    r = self.session.get(image_url, stream=True)
    with open('Output.txt','a') as f:
        for url in list(set(image_url)):  # set() drops duplicate URLs
            f.write(url + '\n')  # write the single URL, not the whole list
Another way to do it:
your_file = open('Output.txt', 'a')
r = self.session.get(image_url, stream=True)
for url in list(set(image_url)):
    your_file.write("%s\n" % url)
your_file.close()  # don't forget to close it :)
The file open mode is wrong: 'w' mode overwrites the file every time you open it instead of appending to it. Replace it with 'a' mode.
See https://stackoverflow.com/a/23566951/8178794 for more detail.
Opening a file with mode w overwrites the file if it exists; use mode a to append data to an existing file.
Try :
import requests
from os.path import splitext
# use mode='a' to append the result without erasing the file
def dlink(url, filename, mode='w'):
    r = requests.get(url)
    if r.status_code != 200:
        return
    # here the link is valid
    with open(filename, mode) as desc:
        desc.write(url + '\n')  # newline so each link gets its own line
def dimg(img_url, img_name):
    r = requests.get(img_url, stream=True)
    if r.status_code != 200:
        return
    _, ext = splitext(img_url)
    with open(img_name + ext, 'wb') as desc:
        for chunk in r:
            desc.write(chunk)
dlink('https://image.flaticon.com/teams/slug/freepik.jpg', 'links.txt')
dlink('https://image.flaticon.com/teams/slug/freepik.jpg', 'links.txt', 'a')
dimg('https://image.flaticon.com/teams/slug/freepik.jpg', 'freepik')

python3 convert str to bytes-like obj without use encode

I wrote a httpserver to serve html files for python2.7 and python3.5.
def do_GET(self):
    ...
    # if resource is api
    data = json.dumps({'message':['thanks for your answer']})
    # if resource is file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data) # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the error above.
I could use bytearray(data, 'utf-8') to convert str to bytes, but then the HTML shown in the browser is changed.
My questions:
How can I support both Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource,"rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, so it works in both versions.
As a positive side effect, this code still works when it hits a Windows box (otherwise binary files such as images get corrupted by the line-ending conversion applied in text mode).
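The file branch then yields bytes, but the API branch in the question builds its payload with json.dumps, which returns a str on Python 3 and would hit the same TypeError. A minimal sketch of one way to normalize both branches before calling wfile.write; as_bytes is a hypothetical helper, not part of the original code:

```python
import json

def as_bytes(payload, encoding='utf-8'):
    """Return payload as a byte string suitable for wfile.write() on Python 2 and 3."""
    if isinstance(payload, bytes):
        # Already bytes (e.g. read from a file opened with 'rb'); on Python 2, bytes is str.
        return payload
    # Text (str on Python 3, unicode on Python 2) has to be encoded first.
    return payload.encode(encoding)

body = as_bytes(json.dumps({'message': ['thanks for your answer']}))
```

Because file contents read with 'rb' pass through untouched, the bytes sent to the client are exactly the bytes on disk, so the file's encoding is never changed.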

Python: only run command once in for loop

I have a for loop which creates a CSV of values of several files in a directory.
Within this loop I only want to create the file and write in the header once, currently I am doing this:
#name&path to table file
test = tablefile+"/"+str(cell[:-10])+"_Table.csv"
#write file
if not os.path.isfile(test):
    csv.writer(open(test, "wt"))
    with open(test, 'w') as output:
        wr = csv.writer(output, lineterminator=',')
        for val in header_note:
            wr.writerow([val])
and to append data I have:
with open(test, 'a') as output:
    wr = csv.writer(output, lineterminator=',')
    for val in table_all:
        wr.writerow([val])
Which works well, however, when I run the script over again another time it will append more data to the bottom of that same .csv. What I want is for the first time through the for-loop, is to just overwrite any existing .csv with a new one with a header then continue on appending data, and overwrite/re-write header once the script is run again. Thanks!
It looks like you may have some code problems other than file handling, but here goes: your problem is basically that opening a file in 'w' mode will overwrite everything in the file, and opening it in 'a' mode will not allow you to change the header line.
To get around this, you will have to read the contents of the file (if it already exists), then overwrite the file, including those lines that were there to begin with.
You will want something along the lines of:
old_lines = [] # nothing to carry over if the file does not exist yet
if os.path.exists(file_name): # if file already exists
    with open(file_name, 'r') as in_file: # open it
        old_lines = in_file.readlines()[1:] # read all lines from file EXCEPT header line
with open(file_name, 'w') as out_file: # open file again, with 'w' to create/overwrite
    out_file.write(new_header_line) # write new header line to file
    for line in old_lines:
        out_file.write(line) # write all preexisting lines back into file
    # continue writing whatever you want.
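If the goal is instead what the question describes (start each run with a fresh file and header, then keep appending during that run), one option is to remember which files have already been started. This is only a sketch under assumptions: it targets Python 3 (the newline keyword), it writes ordinary comma-separated rows rather than the lineterminator=',' layout from the question, and append_rows and started are names invented here.

```python
import csv

def append_rows(path, header, rows, started):
    """Write rows to path; the first call per run for a given path overwrites it and adds the header."""
    first_time = path not in started
    mode = 'w' if first_time else 'a'
    # newline='' is the csv module's recommended setting on Python 3.
    with open(path, mode, newline='') as output:
        wr = csv.writer(output)
        if first_time:
            wr.writerow(header)
            started.add(path)  # later calls in this run will append
        wr.writerows(rows)
```

The caller would create started = set() once before the for loop and pass it to every call, so a rerun of the script starts with an empty set and overwrites the old files.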

Creating and then writing to a file

So I want to read in a text file and then use some of that to write to another file that doesn't exist in the same directory. So for instance if I have a file named text.txt, I want to write a script that reads it and then creates another file, text2.txt which has some of its contents determined by what was in text.txt.
To read the file I'm using the command,
with open(inpath, 'r') as f:
...
But then what is the preferred way to create a new file and start writing to it? If I had to guess, I'd think it would be
with open(inpath, 'r') as f:
    outtext = open(outpath, 'w')
    ...
where the variable outpath stores the directory of the file to be written. If I understand all this correctly, if the directory outpath happens to exist, running this script would destroy it or at least append to it. But if it doesn't exist, then Python would create the file. Is that accurate? And is there a better, more elegant way to do this?
I believe inpath and outpath are absolute paths to directories. If so, you cannot do:
with open(inpath, 'r') as f:
    ...
It will throw an IOError exception: the open method expects a file path, but you are providing a path to a directory. The same applies to outpath. Now let's assume the values of inpath and outpath are:
input_path = '/Users/avi/inputs'
output_path = '/Users/avi/outputs'
Now, to read a file, you could do:
import os
input_file_path = os.path.join(input_path, 'input.txt')
The input_file_path will now be /Users/avi/inputs/input.txt
and to open it:
with open(input_file_path, 'r') as f:
    ...
Now, coming to the second question: yes, if the file already exists, Python will overwrite it. If it does not, Python creates a new one. So you can first check whether the file exists and, if it does, write to a different file instead:
output_path_file = os.path.join(output_path, 'output.txt')
if os.path.isfile(output_path_file):
    # file already exists
    # do something else like create another file
    output_path_file = os.path.join(output_path, 'new_output.txt')
# now write to output file
with open(output_path_file, 'w') as f:
    ...
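As for a more elegant way to read one file and write another, a single with statement can manage both files (supported in Python 2.7 and 3). The sketch below is illustrative only: transform_file is a made-up name, and upper-casing the non-empty lines merely stands in for whatever processing of text.txt is actually needed.

```python
def transform_file(inpath, outpath):
    """Copy the non-empty lines of inpath to outpath in upper case; return how many were written."""
    count = 0
    # Both files are managed by the same with statement, so both are closed on exit.
    with open(inpath, 'r') as src, open(outpath, 'w') as dst:
        for line in src:
            if line.strip():
                dst.write(line.upper())
                count += 1
    return count
```

Opening outpath with 'w' creates the file if it is missing and truncates it if it already exists; it never appends.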