python 2.7 line break - directory line to the file - python-2.7

I've got one more novice question:
I've got numerous links to external files and some of the directorate names are quite long (due to original folder structure). I've tried numerous methods for breaking a line, but most of them fails while using it with conjunction with pyodbc module.
So far I've got:
SIMD = xlrd.open_workbook(r'P:\Costing and Income\Projects & Planning\HRG, '\
'IRF, Programme Budgeting\__2008-11\Developments\SIMD\PI_upload (08.05.2012).xls')
Which works OK for xlrd module
Tried some simple stuff directly in the IDLE:
>>> a = 'some text'\
'more stuff'
>>> a
'some textmore stuff'
>>> b = r'some stuff'\
' even more'
>>> b
'some stuff even more'
>>> c = r'one' r'two'
>>> c
'onetwo'
>>>
And now the part that fails me:
PCPath1 = r'Z:\IRF\Data\Primary Care Hospitals\PI\_'\
'2008-11 (final)\2012.08.15 - 2008-11_PCH_v4.mdb'
PCConn1 = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb)}; DBQ='+PCPath1)
I've got following error:
Traceback (most recent call last):
File "Z:/IRF/Python/Production/S3_PC1_0811.py", line 7, in <module>
PCConn1 = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb)}; DBQ='+PCPath1)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 100: ordinal not in range(128)
It works OK when PCPath1 is not broken down.
One could ask why I'm trying to do it, well mostly is for the code readibility.
Any help with above would be greatly appreciated!

You need to put an r in front of the second line as well, otherwise the \ will be combined with the 201 to produce the non-ascii character \x81.
In [5]: r'Z:\IRF\Data\Primary Care Hospitals\PI\_'\
'2008-11 (final)\2012.08.15 - 2008-11_PCH_v4.mdb'
Out[5]: 'Z:\\IRF\\Data\\Primary Care Hospitals\\PI\\_2008-11 (final)\x812.08.15 - 2008-11_PCH_v4.mdb'
In [6]: r'Z:\IRF\Data\Primary Care Hospitals\PI\_'\
r'2008-11 (final)\2012.08.15 - 2008-11_PCH_v4.mdb'
Out[6]: 'Z:\\IRF\\Data\\Primary Care Hospitals\\PI\\_2008-11 (final)\\2012.08.15 - 2008-11_PCH_v4.mdb'

Related

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.`
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at the other similar Stackoverflow threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2 - open() returns str which are simply byte strings.
The error is telling you that you need to decode your text to Unicode string before use.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile: # 'U' means Universal line mode and is not necessary
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].decode('utf-8),
team=row['Team'].decode('utf-8),
position=row['Position'].decode('utf-8)
)
That'll work but it's ugly add decodes everywhere and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings which are the equivalent of Unicode strings in Py2.
To get the same functionality in Python 2, use the io module. This gives you a open() method which has an encoding option. Annoyingly, the Python 2.x CSV module is broken with Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
# now every row is automatically decoded from UTF-8
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Encode Player name in utf-8 using .encode('utf-8') in player name
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].encode('utf-8'),
team=row['Team'],
position=row['Position']
)
In Django, decode with latin-1, csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))), it would devour all special characters and all comma exceptions you get in utf-8.

Understanding struct.pack in python 2.7 and 3.5+

I am attempting to understand; and resolve, why the following happens:
$ python
>>> import struct
>>> list(struct.pack('hh', *(50,50)))
['2', '\x00', '2', '\x00']
>>> exit()
$ python3
>>> import struct
>>> list(struct.pack('hh', *(50, 50)))
[50, 0, 50, 0]
I understand that hh stands for 2 shorts. I understand that struct.pack is converting the two integers (shorts) to a c style struct. But why does the output in 2.7 differ so much from 3.5?
Unfortunately I am stuck with python 2.7 for right now on this project and I need the output to be similar to one from python 3.5
In response to comment from Some Programmer Dude
$ python
>>> import struct
>>> a = list(struct.pack('hh', *(50, 50)))
>>> [int(_) for _ in a]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
in python 2, struct.pack('hh', *(50,50)) returns a str object.
This has changed in python 3, where it returns a bytes object (difference between binary and string is a very important difference between both versions, even if bytes exists in python 2, it is the same as str).
To emulate this behaviour in python 2, you could get ASCII code of the characters by appling ord to each char of the result:
map(ord,struct.pack('hh', *(50,50)))

Handling map function in python2 & python3

Recently i came across a question & confused with a possible solution,
code part is
// code part in result reader
result = map(int, input())
// consumer call
result_consumer(result)
its not about how do they work, the problem is when you are running in python2 it will raise an exception, on result fetching part, so result reader can handle the exception, but incase of python3 a map object is returned, so only consumer will be able to handle exception.
is there any solution keeping map function & handle the exception in python2 & python3
python3
>>> d = map(int, input())
1,2,3,a
>>> d
<map object at 0x7f70b11ee518>
>>>
python2
>>> d = map(int, input())
1,2,3,'a'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
>>>
the behavior of map is not the only difference between python2 and python3, input is also difference, you need to keep in mind the basic differences between the two to make code compatible for both
python 3 vs python 2
map = itertools.imap
zip = itertools.izip
filter = itertools.ifilter
range = xrange
input = raw_input
so to make code for both, you can use alternatives like list comprehension that work the same for both, and for those that don't have easy alternatives, you can make new functions and/or use conditional renames, like for example
my_input = input
try:
raw_input
except NameError: #we are in python 3
my_input = lambda msj=None: eval(input(msj))
(or with your favorite way to check which version of python is in execution)
# code part in result reader
result = [ int(x) for x in my_input() ]
# consumer call
result_consumer(result)
that way your code do the same regardless of which version of python you run it.
But as jsbueno mentioned, eval and python2's input are dangerous so use the more secure raw_input or python3's input
try:
input = raw_input
except NameError: #we are in python 3
pass
(or with your favorite way to check which version of python is in execution)
then if your plan is to provide your input as 1,2,3 add an appropriate split
# code part in result reader
result = [ int(x) for x in input().split(",") ]
# consumer call
result_consumer(result)
If you always need the exception to occur at the same place you can always force the map object to yield its results by wrapping it in a list call:
result = list(map(int, input()))
If an error occurs in Python 2 it will be during the call to map while, in Python 3, the error is going to surface during the list call.
The slight downside is that in the case of Python 2 you'll create a new list. To avoid this you could alternatively branch based on sys.version and use the list only in Python 3 but that might be too tedious for you.
I usually use my own version of map in this situations to escape any possible problem may occur and it's
def my_map(func,some_list):
done = []
for item in some_list:
done.append( func(item) )
return done
and my own version of input too
def getinput(text):
import sys
ver = sys.version[0]
if ver=="3":
return input(text)
else:
return raw_input(text)
if you are working on a big project add them to a python file and import them any time you need like what I do.

Mailgun Talon: Signature extraction example throwing error

I installed mailgun/talon on GCE and was trying out the example in the README section, but it threw the following error at me:
>>> from talon import signature
>>> message = """Thanks Sasha, I can't go any higher and is why I limited it to the
... homepage.
...
... John Doe
... via mobile"""
>>> message
"Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage.\n\nJohn Doe\nvia mobile"
>>> text,signtr = signature.extract(message, sender='john.doe#example.com')
ERROR:talon.signature.extraction:ERROR when extracting signature with classifiers
Traceback (most recent call last):
File "talon/signature/extraction.py", line 57, in extract
markers = _mark_lines(lines, sender)
File "talon/signature/extraction.py", line 99, in _mark_lines
elif is_signature_line(line, sender, EXTRACTOR):
File "talon/signature/extraction.py", line 40, in is_signature_line
return classifier.decisionFunc(data, 0) > 0
AttributeError: 'NoneType' object has no attribute 'decisionFunc'
Do I need to train the model somehow (this signature seems to be the ML example)? I installed it using pip.
If you want to use signature parsing with classifiers you just need to call talon.init() before using the lib - it loads trained classifiers. Other methods like talon.signature.bruteforce.extract_signature() or talon.quotations.extract_from() don't require classifiers. Here's a full code sample:
import talon
# don't forget to init the library first
# it loads machine learning classifiers
talon.init()
from talon import signature
message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.
John Doe
via mobile"""
text, signature = signature.extract(message, sender='john.doe#example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# signature == "John Doe\nvia mobile"

ValueError: need more than 1 value to unpack - python graph core

I was planning to try to use this code in order to do a critical path analysis.
When running this code I got the following error but I have no idea what it means (because I don't now how the code works).
Traceback (most recent call last): File
"/Users/PeterVanvoorden/Desktop/test.py", line 22, in
G.add_edge('A','B',1) File "/Library/Python/2.7/site-packages/python_graph_core-1.8.2-py2.7.egg/pygraph/classes/digraph.py",
line 161, in add_edge
u, v = edge ValueError: need more than 1 value to unpack
# Copyright (c) 2007-2008 Pedro Matiello <pmatiello#gmail.com>
# License: MIT (see COPYING file)
import sys
sys.path.append('..')
import pygraph
from pygraph.classes.digraph import digraph
from pygraph.algorithms.critical import transitive_edges, critical_path
#demo of the critical path algorithm and the transitivity detection algorithm
G = digraph()
G.add_node('A')
G.add_node('B')
G.add_node('C')
G.add_node('D')
G.add_node('E')
G.add_node('F')
G.add_edge('A','B',1)
G.add_edge('A','C',2)
G.add_edge('B','C',10)
G.add_edge('B','D',2)
G.add_edge('B','E',8)
G.add_edge('C','D',7)
G.add_edge('C','E',3)
G.add_edge('E','D',1)
G.add_edge('D','F',3)
G.add_edge('E','F',1)
#add this edge to add a cycle
#G.add_edge('E','A',1)
print transitive_edges(G)
print critical_path(G)
I know it is kind of stupid just to copy code without understanding it but I thought I'd first try the example code in order to see if the package is working but apparently it's not working. Now I wonder if it's just because of a little mistake in the example code or if it's a more fundamental problem.
I peeked at the source code for this and see that add_edge is trying to unpack the first positional argument as a 2-tuple.
If you change these lines:
G.add_edge('A','B',1)
G.add_edge('A','C',2)
...
to:
G.add_edge(('A', 'B'), 1) # note the extra parens
G.add_edge(('A', 'C'), 2)
...
it should work. However, I have not used pygraph before so this may still not produce the desired results.