Django download excel file results in corrupted Excel file - django

I am trying to export a Pandas dataframe from a Django app as an Excel file. I currently export it to a CSV file, and this works fine except as noted. The problem is that when the user opens the CSV file in Excel, a string that looks like a number (for example, a cell with a value of '111,1112' or '123E345', which is intended to be a string) ends up showing as an error or in exponent notation, even if I make sure that the Pandas column is not numeric.
This is how I export to CSV:
response = HttpResponse(content_type='text/csv')
filename = 'aFileName'
response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
df.to_csv(response, encoding='utf-8', index=False)
return response
To export with content type EXCEL, I saw several references where the following approach was recommended:
with BytesIO() as b:
    # Use the BytesIO object as the filehandle.
    writer = pd.ExcelWriter(b, engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.save()
    return HttpResponse(b.getvalue(), content_type='application/vnd.ms-excel')
When I try this, it appears to export something, but if I try to open the file in Excel (Office 2016), I get an Excel message that the file is corrupted. Generally the file size is a few KB at most.
Please advise what could be wrong with the 2nd approach that is causing a bad export. I am using Django 2.2.1, Pandas 0.25.1
Thank you
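One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the Excel snippet above sets no Content-Disposition header, so the browser may save the download without an .xlsx name, and Excel can then refuse to open it. The sketch below uses only the standard library; `xlsx_download_headers` and `looks_like_xlsx` are hypothetical helper names, not part of Django or pandas. The first builds headers for an .xlsx attachment, the second checks whether the bytes are at least a ZIP container with the `[Content_Types].xml` entry that .xlsx files carry.

```python
import io
import zipfile

# Registered MIME type for .xlsx workbooks.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def xlsx_download_headers(filename):
    """Return headers for an .xlsx attachment (hypothetical helper)."""
    if not filename.lower().endswith(".xlsx"):
        filename += ".xlsx"
    return {
        "Content-Type": XLSX_MIME,
        "Content-Disposition": f'attachment; filename="{filename}"',
    }


def looks_like_xlsx(payload):
    """True if the bytes are a ZIP archive containing [Content_Types].xml."""
    if not zipfile.is_zipfile(io.BytesIO(payload)):
        return False
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return "[Content_Types].xml" in archive.namelist()


headers = xlsx_download_headers("aFileName")
```

In the view, the same values could be applied with `HttpResponse(b.getvalue(), content_type=XLSX_MIME)` followed by `response['Content-Disposition'] = headers['Content-Disposition']`. Running `looks_like_xlsx(b.getvalue())` before returning helps separate a bad payload from a bad header.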

Related

Pandas closing django-like file, gives ValueError: I/O operation on closed file when uploading

In my process I need to upload a file to Django as:
newFile = request.FILES['file']
Then, in another big function, I open it with pandas:
data = pandas.read_csv(data_file, engine = 'python', header=headers_row, encoding = 'utf-8-sig')
and then I need to upload it:
uploaded_file = Uploaded_file(file = newFile, retailer = ret, date = date)
But randomly (about half the time) I get a ValueError: I/O operation on closed file.
Is there a solution to this? Is it possible to open the file again, or to make a copy of it so that pandas uses one and the upload uses the other?
I tried the latter, but I'm not sure of the implications of going this route:
from io import BytesIO
output = BytesIO(newFile.file.read())
For now it works, but I'd appreciate any input on this.
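A minimal sketch of the copy approach, assuming the upload is small enough to hold in memory. `duplicate_stream` is a hypothetical helper, and a plain `io.BytesIO` stands in for `request.FILES['file']`. Each consumer gets its own buffer, so one of them closing its handle cannot affect the other.

```python
import io


def duplicate_stream(uploaded, copies=2):
    """Read the upload once and return independent in-memory copies."""
    uploaded.seek(0)  # make sure we read from the start
    payload = uploaded.read()
    return [io.BytesIO(payload) for _ in range(copies)]


# Stand-in for request.FILES['file'] (assumption for illustration only).
upload = io.BytesIO(b"a,b\n1,2\n")

for_pandas, for_storage = duplicate_stream(upload)

# Simulate a parser closing the handle it was given.
for_pandas.close()

# The second copy is unaffected and still readable.
stored_bytes = for_storage.read()
```

The main implication is memory: the whole file is held once per copy, which is usually fine for modest CSV uploads but may not be for very large ones.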

Read a Django UploadedFile into a pandas DataFrame

I am attempting to read a .csv file uploaded to Django into a DataFrame.
I am following the instructions and the Django REST Framework page for uploading files. When I PUT a .csv file to a defined endpoint I end up with a Django UploadedFile object, in particular, a TemporaryUploadedFile.
I am trying to read this object into a pandas Dataframe using read_csv, however, there is additional formatting around the temporary uploaded file. I am wondering how to read the original .csv file that was uploaded.
According to the DRF docs, I have assigned:
file_obj = request.data['file']
Inside of a Python debugging console, I see:
ipdb> file_obj
<TemporaryUploadedFile: foobar.csv (multipart/form-data; boundary=--------------------------044608164241682586561733)>
Things I've tried so far.
With the original file path, I can read it into pandas like this.
dataframe = pd.read_csv(open("foobar.csv", "rb"))
However, the original file has additional metadata added by Django during the upload process.
ipdb> pd.read_csv(open(file_obj.temporary_file_path(), "rb"))
*** pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 32
If I try to use the UploadedFile.read() method, I run into the following issue.
ipdb> dataframe = pd.read_csv(file_obj.read())
*** OSError: Expected file path name or file-like object, got <class 'bytes'> type
Thanks!
P.S. The first few lines of the original file look like this.
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
When I look at the contents of the temporary file, I see this.
----------------------------789873173211443224653494
Content-Disposition: form-data; name="file"; filename="foobar.csv"
Content-Type: File
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
UploadedFile.read() returns the file data in bytes, not a file path or file-like object. In order to use pandas read_csv() function, you'll need to turn those bytes into a stream. Since your file is a csv, the most straightforward way would be to use bytes.decode() with io.StringIO(), like:
dataframe = pd.read_csv(io.StringIO(file_obj.read().decode('utf-8')), delimiter=',')
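As a dependency-free sketch of the wrapping step described above: the bytes returned by `read()` become a file-like object once they are wrapped in `io.StringIO` (after decoding) or in `io.BytesIO`. The sample bytes and variable names are made up for illustration, and the standard `csv` module stands in for `pd.read_csv`, which accepts the same kind of stream.

```python
import csv
import io

# Stand-in for file_obj.read(); the real call returns the uploaded bytes.
raw = b"SPID,SA_ID,UOM\n(Blanked),123456789,KWH\n"

# Decode to text and wrap it so a parser sees a file-like object.
text_stream = io.StringIO(raw.decode("utf-8"))
rows = list(csv.reader(text_stream))

# Alternatively, keep the bytes and wrap them without decoding.
byte_stream = io.BytesIO(raw)
first_line = byte_stream.readline()
```

Note that this only fixes the OSError from passing raw bytes. If the stored upload still contains the multipart boundary and header lines shown in the question, those would need to be removed at the parsing stage first.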

Flask response to excel file giving corrupt excel file

I have a website that has a button. When it's clicked, it writes a number of pandas dataframes into an Excel file and returns that Excel file automatically as a download.
It seems to work OK, except that when I open the file, it seems to be corrupted. It asks if some of the tabs should be recovered. I'm using the code below. Any suggestions as to what could be causing this are appreciated.
import io
from flask.helpers import make_response
from pandas.io.excel import ExcelWriter
output = io.BytesIO()
writer = ExcelWriter(output)
dfs = [df1,df2....]
tab_names = ['tab1','tab2',....]
for df, tab_name in zip(dfs, tab_names):
    df.to_excel(writer, tab_name)
writer.close()
resp = make_response(output.getvalue())
resp.headers['Content-Disposition'] = 'attachment; filename=output.xlsx'
resp.headers["Content-type"] = "text/csv"
return resp
You'll need to add
output.seek(0)
after you close the writer.
You might also find it easier to write
return send_file(output, attachment_filename="output.xlsx", as_attachment=True)
(after importing send_file from flask)
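To illustrate why the rewind matters, here is a small standard-library sketch (the bytes written are placeholder content, not a real workbook). After writing, the stream position sits at the end of the buffer, so anything that calls `read()` on it gets nothing until `seek(0)` is called.

```python
import io

buf = io.BytesIO()
buf.write(b"PK\x03\x04 placeholder workbook bytes")

# Position is at the end after writing, so read() returns nothing.
read_without_rewind = buf.read()

# getvalue() ignores the position and returns the whole buffer.
whole_buffer = buf.getvalue()

# After rewinding, read() returns the full content.
buf.seek(0)
read_after_rewind = buf.read()
```

Because `getvalue()` ignores the position, the rewind matters mainly when the buffer itself is handed to something that reads from it, such as `send_file`. Newer Flask releases name the `attachment_filename` argument `download_name`, so the keyword may need adjusting depending on the installed version.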

Pandas dataframe to existing excel workbook

I'm trying to use openpyxl to write data to an existing xlsx workbook and save it as a separate file. I would like to write a dataframe to a sheet called 'Data' and write some values to another sheet called 'Summary', then save the workbook object as 'test.xlsx'. I have the following code that writes the calculations I need to the summary sheet, but I get a TypeError for an unexpected keyword argument 'font' when I try to write the dataframe, which doesn't make much sense to me...
I'm using the following code which I've adapted from here
from openpyxl import load_workbook
import pandas as pd
book = load_workbook('template.xlsx')
# Write values to summary sheet
ws = book.get_sheet_by_name('Summary')
ws['A1'] = 'TEST'
book.save('test.xlsx')
# Write df
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name="Data", index=False)
writer.save()
The trace-back:
  File "C:\Users\me\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1274, in to_excel
    startrow=startrow, startcol=startcol)
  File "C:\Users\me\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\excel.py", line 778, in write_cells
    xcell.style = xcell.style.copy(**style_kwargs)
  File "C:\Users\me\AppData\Local\Continuum\Anaconda\lib\site-packages\openpyxl-2.2.2-py2.7.egg\openpyxl\compat\__init__.py", line 67, in new_func
    return obj(*args, **kwargs)
TypeError: copy() got an unexpected keyword argument 'font'
I think the error arises from necessary changes in the openpyxl styles API. As a result, Pandas installs an older version of openpyxl. You should be okay, therefore, if you remove openpyxl 2.2.

xlwt cannot format number to date

Why can't the following code format 44000 as a date in Excel? It shows up in the xls file as the original number no matter what I try.
Things I have tried:
Different format strings; none of them work. I copied them from the source file, so there is no mistake there
Checking the style object with a breakpoint; it gets the correct num_format_str
Quoting or un-quoting the number
I am using Mac Preview to open the xls file, if that's relevant.
import xlwt
book = xlwt.Workbook(encoding='utf8')
sheet = book.add_sheet('sheet 1')
style = xlwt.easyxf(num_format_str="M/D/YY")
sheet.write(1, 1, 44000, style=style)
response = HttpResponse(mimetype='application/vnd.ms-excel')
response['Content-Disposition'] = 'attachment; filename=test.xls'
book.save(response)
return response
There is no problem with the code. The problem is with Mac Preview. I opened the file in Excel on Windows and 44000 shows as a date.
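For reference when checking the result in another viewer: Excel stores dates as serial day counts, so the date that 44000 should display as can be computed independently. A small sketch, assuming the workbook uses the default 1900 date system; `excel_serial_to_date` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Offset commonly used for the 1900 date system. It is valid for serials
# from 61 upward, i.e. after Excel's fictitious 1900-02-29.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


expected = excel_serial_to_date(44000)
print(expected.strftime("%m/%d/%y"))  # prints 06/18/20
```

A viewer that applies the cell's number format should therefore show a date in June 2020 for this cell; one that shows 44000 is ignoring the format.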