Encoding a unicode string and writing it to a YAML file in Python - python-2.7

I have a unicode string, which is read from a CSV file:
df.iloc[0,1]
Out[41]: u'EU-repr\xe6sentant udpeget'
In [42]: type(df_translated.iloc[0,1])
Out[42]: unicode
I would like to have it as EU-repræsentant udpeget. The final goal is to put this into a dictionary and then save that dict to a YAML file with PyYAML using safe_dump. However, I am struggling with the encoding.

If you really need to use PyYAML, you should provide the arguments
encoding='utf-8' and allow_unicode=True to the safe_dump()
routine.
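For example, a minimal sketch (Python 2; the dictionary and output filename are only placeholders for your own data):
import yaml

# hypothetical dict holding the unicode value read from the CSV
data = {'status': u'EU-repr\xe6sentant udpeget'}

with open('out.yaml', 'wb') as stream:
    # allow_unicode=True keeps the text readable instead of escaped,
    # encoding='utf-8' makes safe_dump write UTF-8 encoded bytes
    yaml.safe_dump(data, stream, encoding='utf-8', allow_unicode=True)
The resulting file then contains status: EU-repræsentant udpeget rather than an escaped representation.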
If you ever intend to upgrade to YAML 1.2 and use ruamel.yaml
(disclaimer: I am the author of that package), those are the (much
more sensible) defaults:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = [u'EU-repr\xe6sentant udpeget']
yaml.dump(data, sys.stdout)
which gives:
- EU-repræsentant udpeget
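Since the final goal is a dictionary written to a file, the same defaults apply when dumping a dict to a path (the key and filename below are only examples):
import ruamel.yaml
try:
    from pathlib import Path        # Python 3
except ImportError:
    from pathlib2 import Path       # pip install pathlib2 on Python 2.7

yaml = ruamel.yaml.YAML()
data = {'text': u'EU-repr\xe6sentant udpeget'}   # placeholder key
yaml.dump(data, Path('out.yaml'))                # written as UTF-8 by default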

Related

how to pass the current date in Python when using os.system for copying files to a GCS location in Python 2.7.5

I have a Python script which moves a file from a local dir to a gs:// location using os.system. I need to pass today's date to the filename in the GCS bucket.
Here is the script:
#!/usr/bin/python
import time
import requests
import csv
import json
import os
from datetime import date

#current_date = date.today()
def uploadfile2GCSraw():
    current_date = date.today()
    os.system('gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT=current_date')
I'm very new to Python. When I run the above, the file is created as cre_dt=current_date, literally; it's not taking the date from date.today(). Can someone help? Thanks
When you write current_date inside the string on that final line, it stays literally as the text current_date; the variable is never substituted.
Try using an f-string, like this:
os.system(f"gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT={current_date}")
For Python 2, use this syntax:
os.system("gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT=%s" % date.today())
(And then upgrade to Python 3 when you can.)
Either version should do what you want.
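If you want control over how the date appears in the object name, formatting it explicitly works the same way on Python 2 and 3 (the %Y%m%d pattern below is just an example):
stamp = date.today().strftime('%Y%m%d')   # formats as YYYYMMDD; pick whatever layout you need
os.system("gsutil cp /u/y/XXXX/abcd.json gs://XXXX/XX/XX/CRE_DT={}".format(stamp))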

Reading multiple files in a directory with pyyaml

I'm trying to read all the YAML files in a directory, but I am having trouble: I am using Python 2.7 (and I cannot change to 3) and all of my files are UTF-8 (and I need them to stay that way).
import os
import yaml
import codecs

def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data

def yaml_dump(filepath, data):
    with open(filepath, 'w') as file_descriptor:
        yaml.dump(data, file_descriptor)

if __name__ == "__main__":
    filepath = os.listdir(os.getcwd())
    data = yaml_reader(filepath)
    print data
When I run this code, Python gives me this error:
TypeError: coercing to Unicode: need string or buffer, list found.
I want this program to show the content of the files. Can anyone help me?
I guess the issue is with filepath.
os.listdir(os.getcwd()) returns a list of all the files in the directory, so you are passing a list to codecs.open() instead of a filename.
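For instance, a minimal sketch that loops over the listing and loads each file while it is still open (the .yaml/.yml filter is an assumption about your filenames):
import codecs
import os
import yaml

for filename in os.listdir(os.getcwd()):
    if not filename.endswith(('.yaml', '.yml')):   # skip non-YAML entries
        continue
    with codecs.open(filename, 'r', encoding='utf-8') as file_descriptor:
        for document in yaml.load_all(file_descriptor):   # consume while the file is open
            print document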
There are multiple problems with your code, apart from the fact that it is invalid Python in the way you formatted it here. Your yaml_reader() does the decoding itself:
def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data
However, it is not necessary to do the decoding yourself; PyYAML is perfectly capable of processing UTF-8:
def yaml_reader(filepath):
    with open(filepath, "rb") as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data
I hope you realise that you are loading multiple documents, so data always ends up holding a sequence of documents, even if a file contains only one.
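Since load_all() is lazy, one way to get a concrete list (a sketch of one option, not the only one) is to materialize it before the file is closed:
def yaml_reader(filepath):
    with open(filepath, "rb") as file_descriptor:
        # force the lazy load_all() generator into a list while the file is still open
        return list(yaml.load_all(file_descriptor))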
Then the line:
filepath = os.listdir(os.getcwd())
gives you a list of files, so you need to do:
filepath = os.listdir(os.getcwd())[0]
or decide in some other way which of the files you want to open. If you want to combine all the files (assuming they are YAML) into one big YAML file, you need to do:
if __name__ == "__main__":
    data = []
    for filepath in os.listdir(os.getcwd()):
        data.extend(yaml_reader(filepath))
    print data
And your dump routine would need to change to:
def yaml_dump(filepath, data):
    with open(filepath, 'wb') as file_descriptor:
        yaml.dump(data, file_descriptor, allow_unicode=True, encoding='utf-8')
However, this all brings you to the biggest problem: you are using PyYAML, which will mangle your YAML, dropping flow style, comments, anchor names, special ints/floats, quotes around scalars, etc. Apart from that, PyYAML has not been updated to support YAML 1.2 documents (which has been the standard since 2009). I recommend you switch to ruamel.yaml (disclaimer: I am the author of that package), which supports YAML 1.2 and leaves comments etc. in place.
And even if you are bound to Python 2, you should use the Python 3-like syntax (e.g. for print) that you can get with from __future__ imports.
So I recommend you do:
pip install pathlib2 ruamel.yaml
and then use:
from __future__ import absolute_import, unicode_literals, print_function
try:
    from pathlib import Path        # Python 3
except ImportError:
    from pathlib2 import Path       # Python 2 backport installed above
from ruamel.yaml import YAML
if __name__ == "__main__":
    data = []
    yaml = YAML()
    yaml.preserve_quotes = True
    for filepath in Path('.').glob('*.yaml'):
        data.extend(yaml.load_all(filepath))
    print(data)
    yaml.dump(data, Path('your_output.yaml'))

pandas.DataFrame.to_pickle backward compatibility

pandas.DataFrame.to_pickle's compression parameter was introduced in pandas 0.20. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
Before pandas 0.20, there was no compression param that I needed to specify.
I have a webapp written against pandas 0.18, and I need to read the pickle file with pandas.read_pickle in version 0.18 without error. How should I pickle the file?
So far I have tried setting the compression parameter to None and to 'gzip'. Neither works.
It looks like you don't actually need to specify it; the default compression='infer' should work.
But why not just import and use pickle directly?
This is what I have been using
# import and save object as pickle
import pickle
pickle.dump(object, open('filename.pkl', 'wb'))
# and this is how to load them
loaded_object = pickle.load(open('filename.pkl', 'rb'))
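One extra detail: if the pickle also has to be loadable from a Python 2 interpreter, pinning the pickle protocol avoids surprises (protocol 2 is the highest protocol Python 2 understands):
import pickle

# my_object stands in for whatever you are persisting
with open('filename.pkl', 'wb') as fh:
    pickle.dump(my_object, fh, protocol=2)   # readable by both Python 2 and 3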

Gzip and Encode file in Python 2.7

with gzip.open(sys.argv[5] + ".json.gz", mode="w", encoding="utf-8") as outfile:
It throws:
TypeError: open() got an unexpected keyword argument 'encoding'
But the docs say it exists:
https://docs.python.org/3/library/gzip.html
Update
How can I encode and gzip the file in Python 2.7?
I have now tried this (but it doesn't work):
with gzip.open(sys.argv[5] + ".json.gz", mode="w") as outfile:
    outfile = io.TextIOWrapper(outfile, encoding="utf-8")
    json.dump(fdata, outfile, indent=2, ensure_ascii=False)
TypeError: must be unicode, not str
What can I do?
Those are the Python 3 docs. The Python 2 version of gzip does not allow encoding= as a keyword argument to gzip.open().
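One way around that on Python 2.7 (a sketch, assuming fdata from your snippet is the structure to serialize): build the JSON text first and hand gzip plain UTF-8 bytes:
import gzip
import json
import sys

with gzip.open(sys.argv[5] + ".json.gz", mode="wb") as outfile:
    text = json.dumps(fdata, indent=2, ensure_ascii=False)
    if isinstance(text, str):            # pure-ASCII output comes back as str on Python 2
        text = text.decode('ascii')
    outfile.write(text.encode('utf-8'))  # gzip wants bytes, so encode explicitly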
Seems the question has been answered sufficiently, but for your peace of mind: alternatively, to ensure that Python 2 uses UTF-8 as its default encoding, perhaps try the following, as it then becomes unnecessary to specify an encoding:
import sys
reload(sys)
sys.setdefaultencoding('UTF8')

Manually building a deep copy of a ConfigParser in Python 2.7

Just starting in on my Python learning curve, and hitting a snag in porting some code up to Python 2.7. It appears that in Python 2.7 it is no longer possible to perform a deepcopy() on instances of ConfigParser. It also appears that the Python team isn't terribly interested in restoring such a capability:
http://bugs.python.org/issue16058
Can someone propose an elegant solution for manually constructing a deepcopy/duplicate of an instance of ConfigParser?
Many thanks, -Pete
This is just an example implementation of Jan Vlcinsky's answer written in Python 3 (I don't have enough reputation to post this as a comment on Jan's answer). Many thanks to Jan for the push in the right direction.
To make a full (deep) copy of base_config into new_config, just do the following:
import io
import configparser
config_string = io.StringIO()
base_config.write(config_string)
# We must reset the buffer ready for reading.
config_string.seek(0)
new_config = configparser.ConfigParser()
new_config.read_file(config_string)
Based on @Toenex's answer, modified for Python 2.7:
import StringIO
import ConfigParser
# Create a deep copy of the configuration object
config_string = StringIO.StringIO()
base_config.write(config_string)
# We must reset the buffer to make it ready for reading.
config_string.seek(0)
new_config = ConfigParser.ConfigParser()
new_config.readfp(config_string)
The previous solution doesn't work in all Python 3 use cases. Specifically, if the original parser is using ExtendedInterpolation, the copy may fail to work correctly. Fortunately, the easy solution is to use the pickle module:
import configparser
import pickle

def deep_copy(config: configparser.ConfigParser) -> configparser.ConfigParser:
    """deep copy config"""
    rep = pickle.dumps(config)
    new_config = pickle.loads(rep)
    return new_config
If you need a new independent copy of a ConfigParser, then one option is:
take the original version of the ConfigParser,
serialize the config into a temporary file or StringIO buffer,
then use that tmpfile or StringIO buffer to create the new ConfigParser.
And you have it done.
If you are using Python 3 (3.2+), you can use the Mapping Protocol Access to copy (actually deep copy) the sections and options of a source configuration to another ConfigParser object.
You can use read_dict() to copy the state of a configuration parser.
Here is a demo:
import configparser
# the configuration to deep copy:
src_cfg = configparser.ConfigParser()
src_cfg.add_section("Section A")
src_cfg["Section A"]["key1"] = "value1"
src_cfg["Section A"]["key2"] = "value2"
# the destination configuration
dst_cfg = configparser.ConfigParser()
dst_cfg.read_dict(src_cfg)
dst_cfg.add_section("Section B")
dst_cfg["Section B"]["key3"] = "value3"
To display the resulting configuration, you can try:
import io
output = io.StringIO()
dst_cfg.write(output)
print(output.getvalue())
You get:
[Section A]
key1 = value1
key2 = value2
[Section B]
key3 = value3
After reading this article, I am more familiar with config.ini. Recorded as follows:
import io
import configparser

def copy_config_demo():
    with io.StringIO() as memory_file:
        memory_file.write(str(test_config_data.__doc__))  # original_config.write(memory_file)
        memory_file.seek(0)
        new_config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
        new_config.read_file(memory_file)

        # below is just for test
        for section_name, list_item in [(section_name, new_config.items(section_name)) for section_name in new_config.sections()]:
            print('\n[' + section_name + ']')
            for key, value in list_item:
                print(f'{key}: {value}')

def test_config_data():
    """
    [Common]
    home_dir: /Users
    library_dir: /Library
    system_dir: /System
    macports_dir: /opt/local

    [Frameworks]
    Python: >=3.2
    path: ${Common:system_dir}/Library/Frameworks/

    [Arthur]
    name: Carson
    my_dir: ${Common:home_dir}/twosheds
    my_pictures: ${my_dir}/Pictures
    python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
    """
output:
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
python: >=3.2
path: /System/Library/Frameworks/
[Arthur]
name: Carson
my_dir: /Users/twosheds
my_pictures: /Users/twosheds/Pictures
python_dir: /System/Library/Frameworks//Python/Versions/>=3.2
Hoping it is helpful to you.