scheduler produces empty files - python-2.7

I'm using pythonanywhere for a simple scheduled task.
I want to download data from a link once a day and save csv files. Later once i have a decent time series I'll figure out how I actually want to manage the data. It's not much data so don't need anything fancy like a database.
My script takes the data from the google sheets link, adds a log column and a time column, then writes a csv with the date in the filename.
It works exactly as I want it to when I run it manually in pythonanywhere, but the scheduler is just creating empty csv files albeit with the correct name.
Any ideas what's up? I don't understand the log file. Surely the error should happen when it is run manually?
script:
import pandas as pd
import time
import datetime
def write_today(df):
date = time.strftime("%Y-%m-%d")
df.to_csv('Properties_'+date+'.csv')
url = 'https://docs.google.com/spreadsheets/d/19h2GmLN-2CLgk79gVxcazxtKqS6rwW36YA-qvuzEpG4/export?format=xlsx'
df = pd.read_excel(url, header=1).rename(columns={'Unnamed: 1':'code'})
source = pd.read_excel(url).columns[0]
df['source'] = source
df['time'] = datetime.datetime.now()
write_today(df)
the scheduler is set up as so:
log file:
Traceback (most recent call last):
File "/home/abmoore/load_data.py", line 24, in <module>
write_today(df)
File "/home/abmoore/load_data.py", line 16, in write_today
df.to_csv('Properties_'+date+'.csv')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1344, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1551, in save
self._save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1638, in _save
self._save_header()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1634, in _save_header
writer.writerow(encoded_labels)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

Your problem there is the UnicodeDecodeError -- you have some non-ascii data in your spreadsheet, and the pandas to_csv function defaults to ascii encoding. try specifying utf8 instead:
def write_today(df):
filename = 'Properties_{date}.csv'.format(date=time.strftime("%Y-%m-%d"))
df.to_csv(filename, encoding='utf8')
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

Related

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.`
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at the other similar Stackoverflow threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2 - open() returns str which are simply byte strings.
The error is telling you that you need to decode your text to Unicode string before use.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile: # 'U' means Universal line mode and is not necessary
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].decode('utf-8),
team=row['Team'].decode('utf-8),
position=row['Position'].decode('utf-8)
)
That'll work but it's ugly add decodes everywhere and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings which are the equivalent of Unicode strings in Py2.
To get the same functionality in Python 2, use the io module. This gives you a open() method which has an encoding option. Annoyingly, the Python 2.x CSV module is broken with Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
# now every row is automatically decoded from UTF-8
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Encode Player name in utf-8 using .encode('utf-8') in player name
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].encode('utf-8'),
team=row['Team'],
position=row['Position']
)
In Django, decode with latin-1, csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))), it would devour all special characters and all comma exceptions you get in utf-8.

retriving data saved under HDF5 group as Carray

I am new to HDF5 file format and I have a data(images) saved in HDF5 format. The images are saved undere a group called 'data' which is under the root group as Carrays. what I want to do is to retrive a slice of the saved images. for example the first 400 or somthing like that. The following is what I did.
h5f = h5py.File('images.h5f', 'r')
image_grp= h5f['/data/'] #the image group (data) is opened
print(image_grp[0:400])
but I am getting the following error
Traceback (most recent call last):
File "fgf.py", line 32, in <module>
print(image_grp[0:40])
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py-2.7.0/h5py/_objects.c:2846)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py
2.7.0/h5py/_objects.c:2804)
File "/..../python2.7/site-packages/h5py/_hl/group.py", line 169, in
__getitem__oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "/..../python2.7/site-packages/h5py/_hl/base.py", line 133, in _e name = name.encode('ascii')
AttributeError: 'slice' object has no attribute 'encode'
I am not sure why I am getting this error but I am not even sure if I can slice the images which are saved as individual datasets.
I know this is an old question, but it is the first hit when searching for 'slice' object has no attribute 'encode' and it has no solution.
The error happens because the "group" is a group which does not have the encoding attribute. You are looking for the dataset element.
You need to find/know the key for the item that contains the dataset.
One suggestion is to list all keys in the group, and then guess which one it is:
print(list(image_grp.keys()))
This will give you the keys in the group.
A common case is that the first element is the image, so you can do this:
image_grp= h5f['/data/']
image= image_grp(image_grp.keys[0])
print(image[0:400])
yesterday I had a similar error and wrote this little piece of code to take my desired slice of h5py file.
import h5py
def h5py_slice(h5py_file, begin_index, end_index):
slice_list = []
with h5py.File(h5py_file, 'r') as f:
for i in range(begin_index, end_index):
slice_list.append(f[str(i)][...])
return slice_list
and it can be used like
the_desired_slice_list = h5py_slice('images.h5f', 0, 400)

Load HDF file into list of Python Dask DataFrames

I have a HDF5 file that I would like to load into a list of Dask DataFrames. I have set this up using a loop following an abbreviated version of the Dask pipeline approach. Here is the code:
import pandas as pd
from dask import compute, delayed
import dask.dataframe as dd
import os, h5py
#delayed
def load(d,k):
ddf = dd.read_hdf(os.path.join(d,'Cleaned.h5'), key=k)
return ddf
if __name__ == '__main__':
d = 'C:\Users\User\FileD'
loaded = [load(d,'/DF'+str(i)) for i in range(1,10)]
ddf_list = compute(*loaded)
print(ddf_list[0].head(),ddf_list[0].compute().shape)
I get this error message:
C:\Python27\lib\site-packages\tables\group.py:1187: UserWarning: problems loading leaf ``/DF1/table``::
HDF5 error back trace
File "..\..\hdf5-1.8.18\src\H5Dio.c", line 173, in H5Dread
can't read data
File "..\..\hdf5-1.8.18\src\H5Dio.c", line 543, in H5D__read
can't initialize I/O info
File "..\..\hdf5-1.8.18\src\H5Dchunk.c", line 841, in H5D__chunk_io_init
unable to create file chunk selections
File "..\..\hdf5-1.8.18\src\H5Dchunk.c", line 1330, in H5D__create_chunk_file_map_hyper
can't insert chunk into skip list
File "..\..\hdf5-1.8.18\src\H5SL.c", line 1066, in H5SL_insert
can't create new skip list node
File "..\..\hdf5-1.8.18\src\H5SL.c", line 735, in H5SL_insert_common
can't insert duplicate key
End of HDF5 error back trace
Problems reading the array data.
The leaf will become an ``UnImplemented`` node.
% (self._g_join(childname), exc))
The message mentions a duplicate key. I iterated over the first 9 files to test out the code and, in the loop, I am using each iteration to assemble a different key that I use with dd.read_hdf. Across all iterations, I'm keeping the filename is the same - only the key is being changed.
I need to use dd.concat(list,axis=0,...) in order to vertically concatenate the contents of the file. My approach was to load them into a list first and then concatenate them.
I have installed PyTables and h5Py and have Dask version 0.14.3+2.
With Pandas 0.20.1, I seem to get this to work:
for i in range(1,10):
hdf = pd.HDFStore(os.path.join(d,'Cleaned.h5'),mode='r')
df = hdf.get('/DF{}' .format(i))
print df.shape
hdf.close()
Is there a way I can load this HDF5 file into a list of Dask DataFrames? Or is there another approach to vertically concatenate them together?
Dask.dataframe is already lazy, so there is no need to use dask.delayed to make it lazier. You can just call dd.read_hdf repeatedly:
ddfs = [dd.read_hdf(os.path.join(d,'Cleaned.h5'), key=k)
for k in keys]
ddf = dd.concat(ddfs)

Python 2.7 and Textblob - TypeError: The `text` argument passed to `__init__(text)` must be a string, not <type 'list'>

Update: Issue resolved. (see comment section below.) Ultimately, the following two lines were required to transform my .csv to unicode and utilize TextBlob: row = [cell.decode('utf-8') for cell in row], and text = ' '.join(row).
Original question:
I am trying to use a Python library called Textblob to analyze text from a .csv file. Error I receive when I call Textblob in my code is:
Traceback (most recent call last): File
"C:\Users\Marcus\Documents\Blog\Python\Scripts\Brooks\textblob_sentiment.py",
line 30, in
blob = TextBlob(row) File "C:\Python27\lib\site-packages\textblob\blob.py", line 344, in
init
'must be a string, not {0}'.format(type(text)))TypeError: The text argument passed to __init__(text) must be a string, not
My code is:
#from __future__ import division, unicode_literals #(This was recommended for Python 2.x, but didn't help in my case.)
#-*- coding: utf-8 -*-
import csv
from textblob import TextBlob
with open(u'items.csv', 'rb') as scrape_file:
reader = csv.reader(scrape_file, delimiter=',', quotechar='"')
for row in reader:
row = [unicode(cell, 'utf-8') for cell in row]
print row
blob = TextBlob(row)
print type(blob)
I have been working through UTF/unicode issues. I'd originally had a different subject which I posed to this thread. (Since my code and the error have changed, I'm posting to a new thread.) Print statements indicate that the variable "row" is of type=str, which I thought indicated that the reader object had been transformed as required by Textblob. The source .csv file is saved as UTF-8. Can anyone provide feedback as to how I can get unblocked on this, and the flaws in my code?
Thanks so much for the help.
So maybe you can make change as below:
row = str([cell.encode('utf-8') for cell in row])

PYPDF watermarking returns error

hi im trying to watermark a pdf fileusing pypdf2 though i get this error i cant figure out what goes wrong.
i get the following error:
Traceback (most recent call last): File "test.py", line 13, in <module>
page.mergePage(watermark.getPage(0)) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
self._mergePage(page2) File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
page2Content, rename, self.pdf) File "C:Python27\site-packages\PyPDF2\pdf.py", line 1547, in
_contentStreamRename
op = operands[i] KeyError: 0
using python 2.7.6 with pypdf2 1.19 on windows 32bit.
hopefully someone can tell me what i do wrong.
my python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages input1 has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to document-output.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
with this:
outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object
outputStream.close()
If above code works, then you might be having file writing permission issues. For reference, look at the PyPDF working example in my article.
I encountered this error when attempting to use PyPDF2 to merge in a page which had been generated by reportlab, which used an inline image canvas.drawInlineImage(...), which stores the image in the object stream of the PDF. Other PDFs that use a similar technique for images might be affected in the same way -- effectively, the content stream of the PDF has a data object thrown into it where PyPDF2 doesn't expect it.
If you're able to, a solution can be to re-generate the source pdf, but to not use inline content-stream-stored images -- e.g. generate with canvas.drawImage(...) in reportlab.
Here's an issue about this on PyPDF2.