Gzip and Encode file in Python 2.7 - python-2.7

with gzip.open(sys.argv[5] + ".json.gz", mode="w", encoding="utf-8") as outfile:
It throws:
TypeError: open() got an unexpected keyword argument 'encoding'
But the docs say it exists:
https://docs.python.org/3/library/gzip.html
Update
How can I encode and gzip the file in Python 2.7?
I have now tried this, but it doesn't work:
with gzip.open(sys.argv[5] + ".json.gz", mode="w") as outfile:
    outfile = io.TextIOWrapper(outfile, encoding="utf-8")
    json.dump(fdata, outfile, indent=2, ensure_ascii=False)
TypeError: must be unicode, not str
What can I do?

Those are the Python 3 docs. The Python 2 version of gzip does not allow encoding= as a keyword argument to gzip.open().
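One way to get a gzipped UTF-8 JSON file on both Python 2.7 and Python 3 is to do the encoding by hand and write bytes to the gzip stream. The following is a minimal sketch, not the only approach; write_json_gz is a hypothetical helper name, and it assumes that any byte strings inside the data are already UTF-8.

```python
import gzip
import json


def write_json_gz(path, data):
    """Serialize data as UTF-8 JSON and gzip it (hypothetical helper)."""
    text = json.dumps(data, indent=2, ensure_ascii=False)
    if isinstance(text, bytes):
        # Python 2 only: json.dumps may return a byte str here.
        # Assumption: those bytes are already UTF-8.
        text = text.decode("utf-8")
    # Open in binary mode and encode explicitly, so no encoding=
    # keyword is needed on gzip.open().
    with gzip.open(path, "wb") as outfile:
        outfile.write(text.encode("utf-8"))
```

Because the file is opened in binary mode and receives bytes, neither the missing encoding= keyword nor the "must be unicode, not str" error comes into play.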

The question seems to have been answered sufficiently, but as an alternative you can make Python 2 use UTF-8 as its default encoding, in which case it becomes unnecessary to specify an encoding. Note that this is a Python 2-only workaround that changes implicit str/unicode conversions for the whole process:
import sys
reload(sys)
sys.setdefaultencoding('UTF8')

Related

Reading multiple files in a directory with pyyaml

I'm trying to read all YAML files in a directory, but I am having trouble. I am using Python 2.7 (and cannot switch to 3), and all of my files are UTF-8 (and need to stay that way).
import os
import yaml
import codecs
def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data
def yaml_dump(filepath, data):
    with open(filepath, 'w') as file_descriptor:
        yaml.dump(data, file_descriptor)
if __name__ == "__main__":
    filepath = os.listdir(os.getcwd())
    data = yaml_reader(filepath)
    print data
When I run this code, python gives me the message:
TypeError: coercing to Unicode: need string or buffer, list found.
I want this program to show the content of the files. Can anyone help me?
I guess the issue is with filepath.
os.listdir(os.getcwd()) returns a list of all the files in the directory, so you are passing a list to codecs.open() instead of a filename.
There are multiple problems with your code, apart from it being invalid Python in the way you formatted it.
def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data
However, it is not necessary to do the decoding yourself; PyYAML is perfectly capable of processing UTF-8:
def yaml_reader(filepath):
    with open(filepath, "rb") as file_descriptor:
        # load_all() is lazy; build the list while the file is still open
        data = list(yaml.load_all(file_descriptor))
    return data
I hope you realise you're trying to load multiple documents, so you always get a list as the result in data, even if your file contains only one document.
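One detail worth checking here: yaml.load_all() is a generator function, so nothing is read from the stream until the result is iterated. If the generator leaves the with block unconsumed, the file is already closed by the time parsing starts. The sketch below shows the difference using only the standard library; parse_documents is a hypothetical stand-in for a lazy parser, not PyYAML itself.

```python
def parse_documents(stream):
    # Stand-in (assumption) for a lazy parser such as yaml.load_all():
    # the stream is only read once the generator is iterated.
    for line in stream:
        yield line.strip()


def read_lazy(path):
    with open(path) as fd:
        # The generator is returned unread; the file closes right after.
        return parse_documents(fd)


def read_eager(path):
    with open(path) as fd:
        # list() consumes the generator while the file is still open.
        return list(parse_documents(fd))
```

Iterating over the result of read_lazy() raises a ValueError about an I/O operation on a closed file, while read_eager() returns the parsed items.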
Then the line:
filepath = os.listdir(os.getcwd())
gives you a list of files, so you need to do:
filepath = os.listdir(os.getcwd())[0]
or decide in some other way, which of the files you want to open. If you want to combine all files (assuming they are YAML) in one big YAML file, you need to do:
if __name__ == "__main__":
    data = []
    for filepath in os.listdir(os.getcwd()):
        data.extend(yaml_reader(filepath))
    print data
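Since os.listdir() returns every entry in the directory, including subdirectories and non-YAML files, it may help to filter the names first. A minimal sketch, assuming the files use a .yaml or .yml extension; list_yaml_files is a hypothetical helper name:

```python
import os


def list_yaml_files(directory):
    """Return sorted full paths of the .yaml/.yml files in directory."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        # Skip subdirectories and anything without a YAML extension.
        if os.path.isfile(full) and name.lower().endswith((".yaml", ".yml")):
            paths.append(full)
    return paths
```

The loop above could then iterate over list_yaml_files(os.getcwd()) instead of the raw directory listing.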
And your dump routine would need to change to:
def yaml_dump(filepath, data):
    with open(filepath, 'wb') as file_descriptor:
        yaml.dump(data, file_descriptor, allow_unicode=True, encoding='utf-8')
However, this all brings you to the biggest problem: you are using PyYAML, which will mangle your YAML, dropping flow style, comments, anchor names, special int/float, quotes around scalars, etc. Apart from that, PyYAML has not been updated to support YAML 1.2 documents (which has been the standard since 2009). I recommend you switch to ruamel.yaml (disclaimer: I am the author of that package), which supports YAML 1.2 and leaves comments etc. in place.
And even if you are bound to Python 2, you should use Python 3-like syntax, e.g. for print, which you can get with from __future__ imports.
So I recommend you do:
pip install pathlib2 ruamel.yaml
and then use:
from __future__ import absolute_import, unicode_literals, print_function
from pathlib2 import Path  # the pathlib backport installed above
from ruamel.yaml import YAML
if __name__ == "__main__":
    data = []
    yaml = YAML()
    yaml.preserve_quotes = True
    for filepath in Path('.').glob('*.yaml'):
        data.extend(yaml.load_all(filepath))
    print(data)
    yaml.dump(data, Path('your_output.yaml'))

encoding in python and writing it to a YAML file in Python

I have a unicode string that is read from a CSV file:
In [41]: df.iloc[0,1]
Out[41]: u'EU-repr\xe6sentant udpeget'
In [42]: type(df_translated.iloc[0,1])
Out[42]: unicode
I would like to have it as EU-repræsentant udpeget. The final goal is to write this into a dictionary and then finally save that dict to a YAML file with PyYAML using safe_dump. However, I struggle with the encoding.
If you really need to use PyYAML you should provide the arguments
encoding='utf-8' and allow_unicode=True to the safe_dump()
routine.
If you ever intend to upgrade to YAML 1.2 and use ruamel.yaml
(disclaimer: I am the author of that package), those are the (much
more sensible) defaults:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = [u'EU-repr\xe6sentant udpeget']
yaml.dump(data, sys.stdout)
which gives:
- EU-repræsentant udpeget
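It may also help to note that u'EU-repr\xe6sentant udpeget' is only the repr of the string; the value itself already contains the character æ, and \xe6 is how Python 2 displays it. A small sketch of the round trip through UTF-8, with illustrative variable names:

```python
# The escape \xe6 and the character U+00E6 are the same thing.
text = u'EU-repr\xe6sentant udpeget'

# Encoding produces the bytes that end up in a UTF-8 file;
# U+00E6 becomes the two bytes C3 A6.
encoded = text.encode('utf-8')

# Decoding those bytes gives back the original unicode string.
decoded = encoded.decode('utf-8')
```

So nothing has to be fixed in the string itself; only the dump step needs to be told to emit UTF-8 rather than escapes.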

python3 convert str to bytes-like obj without use encode

I wrote a httpserver to serve html files for python2.7 and python3.5.
def do_GET(self):
    ...
    # if resource is api
    data = json.dumps({'message':['thanks for your answer']})
    # if resource is file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data) # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the above error.
I could use bytearray(data, 'utf-8') to convert str to bytes, but then the HTML is altered in the browser.
My questions:
How can I support Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource,"rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, and it works for both versions.
As a positive side effect, this code still works when it runs on a Windows box (otherwise binary files such as images are corrupted by the line-ending conversion that text mode performs).
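The other branch in the question, where data comes from json.dumps(), yields a text str in Python 3, and wfile.write() needs bytes there as well. One option that should behave the same on both versions is to encode the JSON text explicitly. A minimal sketch; json_body is a hypothetical helper name:

```python
import json


def json_body(payload):
    """Return payload as UTF-8 encoded JSON bytes (hypothetical helper)."""
    # json.dumps() escapes non-ASCII characters by default
    # (ensure_ascii=True), so the text is plain ASCII and encoding
    # it is safe on Python 2 and Python 3 alike.
    text = json.dumps(payload)
    return text.encode("utf-8")
```

In the handler, data = json_body({'message': ['thanks for your answer']}) can then be passed to self.wfile.write(data) unchanged.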

Python 2.7 and Textblob - TypeError: The `text` argument passed to `__init__(text)` must be a string, not <type 'list'>

Update: Issue resolved. (see comment section below.) Ultimately, the following two lines were required to transform my .csv to unicode and utilize TextBlob: row = [cell.decode('utf-8') for cell in row], and text = ' '.join(row).
Original question:
I am trying to use a Python library called TextBlob to analyze text from a .csv file. The error I receive when I call TextBlob in my code is:
Traceback (most recent call last):
  File "C:\Users\Marcus\Documents\Blog\Python\Scripts\Brooks\textblob_sentiment.py", line 30, in <module>
    blob = TextBlob(row)
  File "C:\Python27\lib\site-packages\textblob\blob.py", line 344, in __init__
    'must be a string, not {0}'.format(type(text)))
TypeError: The `text` argument passed to `__init__(text)` must be a string, not <type 'list'>
My code is:
#from __future__ import division, unicode_literals #(This was recommended for Python 2.x, but didn't help in my case.)
#-*- coding: utf-8 -*-
import csv
from textblob import TextBlob
with open(u'items.csv', 'rb') as scrape_file:
    reader = csv.reader(scrape_file, delimiter=',', quotechar='"')
    for row in reader:
        row = [unicode(cell, 'utf-8') for cell in row]
        print row
        blob = TextBlob(row)
        print type(blob)
I have been working through UTF/unicode issues. I'd originally had a different subject which I posed to this thread. (Since my code and the error have changed, I'm posting to a new thread.) Print statements indicate that the variable "row" is of type=str, which I thought indicated that the reader object had been transformed as required by Textblob. The source .csv file is saved as UTF-8. Can anyone provide feedback as to how I can get unblocked on this, and the flaws in my code?
Thanks so much for the help.
So maybe you can make a change like the one below:
row = str([cell.encode('utf-8') for cell in row])
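The resolution mentioned in the update (decode each cell, then join the cells into one string) can be sketched as a small helper. This is an illustration only; row_to_text is a hypothetical name, and the sketch assumes the cells are UTF-8 encoded bytes or already decoded text.

```python
def row_to_text(row, encoding="utf-8"):
    """Join the cells of one CSV row into a single unicode string."""
    cells = [
        # Decode byte cells (what the Python 2 csv module yields);
        # leave cells that are already text untouched.
        cell.decode(encoding) if isinstance(cell, bytes) else cell
        for cell in row
    ]
    return u" ".join(cells)
```

The result is a single string rather than a list, which is what the error message asks for, so it can be handed to TextBlob(row_to_text(row)).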

string vs unicode encoding - Struct() argument

I am experiencing a strange problem that returns the same error, regardless of the encoding I use. The code works well, without the encoding part in Python 2.7.8, but it breaks in 2.7.6 which is the version that I use for all my development.
import MIDI_PY2 as md
import glob
import ast
import os
dir = '/Users/user/Desktop/sample midis/'
os.chdir(dir)
file_list = []
for file in glob.glob('*.mid'):
    file_list.append((dir + file))
dir = '/Users/user/Desktop/sample midis/'
os.chdir(dir)
file_list returns this:
[u'/Users/user/Desktop/sample midis/M1.mid',
u'/Users/user/Desktop/sample midis/M2.mid',
u'/Users/user/Desktop/sample midis/M3.mid',
u'/Users/user/Desktop/sample midis/M4.mid']
md.concatenate_midis(file_list,'/Users/luissanchez/Desktop/temp/out.mid') returns this error:
-
TypeError Traceback (most recent call last)
<ipython-input-73-2d7eef92f566> in <module>()
----> 1 md.concatenate_midis(file_list_1,'/Users/user/Desktop/temp/out.mid')
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in concatenate_midis(paths, outPath)
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in midi2score(midi)
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in midi2opus(midi)
TypeError: Struct() argument 1 must be string, not unicode
Then I modified the code so that the first argument is a string, not unicode:
file_list_1 = [str(x) for x in file_list]
which returns:
['/Users/user/Desktop/sample midis/M1.mid',
'/Users/user/Desktop/sample midis/M2.mid',
'/Users/user/Desktop/sample midis/M3.mid',
'/Users/user/Desktop/sample midis/M4.mid']
Running the function concatenate_midis with this last list (file_list_1) returns exactly the same error: TypeError: Struct() argument 1 must be string, not unicode.
Does anybody know what's going on here? concatenate_midis works well in Python 2.7.8, but I can't figure out why it doesn't work in what I use, Enthought Canopy Python 2.7.6 | 64-bit.
Thanks
The error
TypeError: Struct() argument 1 must be string, not unicode
is usually raised by the struct module (for example via struct.unpack()), which in Python versions before 2.7.7 requires its format string to be a str and not unicode. Argument 1 is that format string, not your file paths, so check that the format strings passed to struct are str and not unicode.
One possible cause is a from __future__ import unicode_literals statement:
>>> type('a')
<type 'str'>
>>> from __future__ import unicode_literals
>>> type('a')
<type 'unicode'>
Check whether your code contains the statement.
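If the module does use unicode_literals, wrapping the format string in str() is one way to get a native string that struct accepts on older 2.7 releases as well as on Python 3. The sketch below is only an illustration: the format shown (the 14-byte MIDI header chunk) and the names are assumptions, not taken from MIDI_PY2.

```python
import struct

# The u"" prefix mimics what unicode_literals does to a plain literal.
# str() turns it back into the native string type of the running Python.
HEADER_FORMAT = str(u">4sLHHH")

header = struct.Struct(HEADER_FORMAT)

# Pack and unpack an example header: chunk id, length, format,
# number of tracks, division.
packed = header.pack(b"MThd", 6, 1, 2, 480)
fields = header.unpack(packed)
```

The same str() wrapping applies to format strings passed directly to struct.pack() and struct.unpack().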