Read Excel file from Memory in Django

I am trying to read an Excel file from memory in Django, but I keep getting the following error:
NotImplementedError: formatting_info=True not yet implemented
Here's the code:
from pyexcel_xls import get_data

def processdocument(file):
    print("file", file)
    data = get_data(file)
    return 1
When I read the same file from local storage, it works perfectly:
data = get_data(r"C:\Users\Rahul Sharma\Downloads\Sample PFEP (2).xlsx")
I have a workaround in mind: temporarily save the uploaded file to disk and then pass its path to the function.
Can I do that?
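A minimal sketch of that workaround, assuming the uploaded object supports `read(size)` and `seek()` (Django's `UploadedFile` does). It copies the upload into a named temporary file that keeps the original extension, on the assumption that the reading library chooses a parser from the file name, which would be consistent with the path version working. The helper names `upload_to_temp_path` and `read_via_temp_file` are hypothetical.

```python
import os
import tempfile


def upload_to_temp_path(uploaded, suffix=".xlsx", chunk_size=64 * 1024):
    """Copy a file-like upload into a named temporary file and return its path.

    Hypothetical helper: `uploaded` is assumed to support read(size),
    as Django's UploadedFile does. The caller must delete the file.
    """
    if hasattr(uploaded, "seek"):
        uploaded.seek(0)  # start from the beginning even if something already read it
    # delete=False keeps the file after close(), so another library can reopen it
    # by path (reopening a still-open NamedTemporaryFile fails on Windows).
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        for chunk in iter(lambda: uploaded.read(chunk_size), b""):
            tmp.write(chunk)
    return tmp.name


def read_via_temp_file(uploaded, reader, suffix=".xlsx"):
    """Pass a temporary on-disk copy of `uploaded` to `reader`, then clean up."""
    path = upload_to_temp_path(uploaded, suffix=suffix)
    try:
        return reader(path)  # e.g. pyexcel_xls.get_data, which worked with a path
    finally:
        os.remove(path)
```

In `processdocument` this would be something like `read_via_temp_file(file, get_data, suffix=os.path.splitext(file.name)[1])`, so the temporary copy carries the same extension as the uploaded file.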

Related

Read a Django UploadedFile into a pandas DataFrame

I am attempting to read a .csv file uploaded to Django into a DataFrame.
I am following the instructions on the Django REST Framework page for uploading files. When I PUT a .csv file to a defined endpoint, I end up with a Django UploadedFile object; in particular, a TemporaryUploadedFile.
I am trying to read this object into a pandas DataFrame using read_csv; however, there is additional formatting around the temporary uploaded file. I am wondering how to read the original .csv file that was uploaded.
According to the DRF docs, I have assigned:
file_obj = request.data['file']
Inside of a Python debugging console, I see:
ipdb> file_obj
<TemporaryUploadedFile: foobar.csv (multipart/form-data; boundary=--------------------------044608164241682586561733)>
Things I've tried so far.
With the original file path, I can read it into pandas like this.
dataframe = pd.read_csv(open("foobar.csv", "rb"))
However, the temporary file contains additional metadata added by Django during the upload process.
ipdb> pd.read_csv(open(file_obj.temporary_file_path(), "rb"))
*** pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 32
If I try to use the UploadedFile.read() method, I run into the following issue.
ipdb> dataframe = pd.read_csv(file_obj.read())
*** OSError: Expected file path name or file-like object, got <class 'bytes'> type
Thanks!
P.S. The first few lines of the original file look like this.
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
When I look at the contents of the temporary file, I see this.
----------------------------789873173211443224653494
Content-Disposition: form-data; name="file"; filename="foobar.csv"
Content-Type: File
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
UploadedFile.read() returns the file data as bytes, not a file path or file-like object. To use the pandas read_csv() function, you'll need to turn those bytes into a stream. Since your file is a CSV, the most straightforward way is to use bytes.decode() with io.StringIO(), like this:
dataframe = pd.read_csv(io.StringIO(file_obj.read().decode('utf-8')), delimiter=',')
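If you would rather not read the whole upload into memory before decoding it, a minimal sketch: wrap the binary stream in `io.TextIOWrapper`, which decodes lazily as lines are pulled. It uses the standard-library `csv` module so it runs without pandas; the wrapper is an ordinary text file-like object, so it can also be passed to `pd.read_csv`. The helper name is hypothetical, and it assumes UTF-8 content and a binary stream such as `io.BytesIO` (with Django you may need to pass `file_obj.file`). Like the one-liner above, it fixes the bytes-versus-file-object error; it does not strip the multipart headers shown in the question.

```python
import csv
import io


def read_csv_upload(uploaded, encoding="utf-8"):
    """Return the rows of a binary CSV upload as a list of dicts."""
    # TextIOWrapper decodes bytes to str as the csv reader pulls lines;
    # newline="" is what the csv module expects for correct line handling.
    text = io.TextIOWrapper(uploaded, encoding=encoding, newline="")
    try:
        return list(csv.DictReader(text))
    finally:
        # detach() hands the stream back so closing the wrapper won't close the upload.
        text.detach()
```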

flask send_from_directory function keeps sending the same old file

I have a Flask app that contains a link to download a file from the server. The file is updated by another callback function. The send_from_directory part looks like this:
import dash
import flask

app = flask.Flask(__name__)
dash_app = dash.Dash(__name__, server=app, url_base_pathname="/", external_stylesheets=external_stylesheets)
...
@dash_app.server.route('/download/', methods=["GET", "POST"])
def download_data():
    return flask.send_from_directory("../data/",
                                     filename='result.csv',
                                     as_attachment=True,
                                     attachment_filename='result.csv',
                                     cache_timeout=0)
I have 2 problems:
1) The downloaded file is always the same old file, even though I have set the cache timeout to 0.
2) The downloaded file is always named "download" instead of the file name I specified, "result.csv".

Django FileResponse - How to speed up file download

I have a setup that lets users download files that are stored in the DB as BYTEA data. Everything works, except that the download speed is very slow: it seems to download in 33KB chunks, one chunk per second.
Is there a setting I can specify to speed this up?
views.py
from django.http import FileResponse

def getFileResponse(filedata, filename, filesize, contenttype):
    response = FileResponse(filedata, content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response

# ... inside the view:
return getFileResponse(
    filedata=myfile.filedata,  # Binary data from DB
    filename=myfile.filename + myfile.fileextension,
    filesize=myfile.filesize,
    contenttype=myfile.filetype,
)
Previously, I had the binary data returned as an HttpResponse and it downloaded like a normal file, with normal speeds. This worked fine locally, but when I pushed to Heroku, it wouldn't download the file -- instead displaying <Memory at XXX> in the download file.
Another side issue: when I include a text file with non-ASCII data (e.g. á), I get an error as well:
UnicodeEncodeError: 'ascii' codec can't encode characters...: ordinal not in range(128)
How can I handle files with Unicode data?
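If the `UnicodeEncodeError` comes from the file name in the `Content-Disposition` header (the question does not say which part fails, so this is an assumption), the usual fix is the RFC 6266 / RFC 5987 `filename*` form, which percent-encodes the UTF-8 bytes so the header stays ASCII. A minimal sketch with a hypothetical helper name; newer Django versions' `FileResponse(..., as_attachment=True, filename=...)` builds a header of this kind itself.

```python
from urllib.parse import quote


def content_disposition(filename, as_attachment=True):
    """Build a Content-Disposition value that is always ASCII-safe."""
    disposition = "attachment" if as_attachment else "inline"
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # RFC 5987 extended parameter: charset, empty language tag, percent-encoded UTF-8.
        return "{}; filename*=UTF-8''{}".format(disposition, quote(filename, safe=""))
    # Plain ASCII name: quote it and escape backslashes and double quotes.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return '{}; filename="{}"'.format(disposition, escaped)
```

In `getFileResponse` it would replace the `%s` formatting: `response['Content-Disposition'] = content_disposition(filename)`.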
Update
Does anyone know why the download speed gets so slow when changing from HttpResponse to FileResponse? Or, alternatively, why returning a file with HttpResponse doesn't work on Heroku?
Update - Google Drive
I reworked my application and hooked it up to a Google Drive back end for serving files. It uses BytesIO(), as suggested by Eric below:
def download_file(self, fileid, mimetype=None):
    # Get binary file data
    request = self.get_file(fileid=fileid, mediaflag=True)
    stream = io.BytesIO()
    downloader = MediaIoBaseDownload(stream, request)
    done = False
    error_message = None
    # Retry up to 5 times if we receive an HTTPError
    for retry in range(5):
        try:
            while done is False:
                status, done = downloader.next_chunk()
                print("Download %d%%." % int(status.progress() * 100))
            return stream.getvalue()
        except HTTPError as error:
            # Remember the failure and try again instead of giving up on the first error
            error_message = 'API error: {}. Try # {} failed.'.format(error.response, retry)
    return error_message
I think the difference you observe between HttpResponse and FileResponse is caused by the spec: https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming
In your previous code, an HttpResponse was created with one huge byte string containing your whole file, and the first iteration pass returned the complete response body. With a FileResponse, the file is iterated in chunks (of 4 KB, 8 KB or another size depending on your WSGI app server), which (I think) are streamed immediately upstream (to the reverse proxy, then to the client), which may add overhead (more communication across process boundaries?).
It would help to know which app server you use (uWSGI, Gunicorn, Waitress, other) and its relevant config. More details about the Heroku error would also help, in case that can be solved!
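To make the chunking described above concrete: in the Django versions I'm aware of, `FileResponse` builds its body from repeated `read(block_size)` calls, with `block_size` a class attribute defaulting to 4096 bytes, and each block is handed to the WSGI layer separately. The sketch below reproduces that iteration with the standard library only (the helper name is hypothetical), so you can see how many pieces a payload turns into.

```python
import io


def iter_blocks(data, block_size=4096):
    """Return an iterator over `data` in blocks of at most `block_size` bytes."""
    # bytes() accepts bytes, bytearray or a memoryview (what a DB driver may return).
    stream = io.BytesIO(bytes(data))
    # iter(callable, sentinel) calls read() until it returns b"" at end of file.
    return iter(lambda: stream.read(block_size), b"")


payload = b"x" * 33_000                             # about one of the 33KB chunks in the question
print(len(list(iter_blocks(payload))))              # 9 blocks of up to 4096 bytes
print(len(list(iter_blocks(payload, 64 * 1024))))   # 1 block
```

If per-block overhead turns out to be the bottleneck, one option to try (untested here) is a `FileResponse` subclass with a larger `block_size`, fed `io.BytesIO(bytes(myfile.filedata))`. The `bytes()` call also turns a `memoryview` from the database driver into real bytes, which may be related to the `<Memory at XXX>` output seen on Heroku.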
Why store the whole file in the database?
The best approach is to store the file on disk and store only its path in the database.
Then, depending on your web server, you can let the web server serve the file;
web servers serve files better than Django does.
If the files have no access checks, store them under media.
If your files have access control, you can use response headers specific to your web server.
With Nginx, use X-Accel-Redirect (other web servers have equivalents); there is a tutorial at https://wellfire.co/learn/nginx-django-x-accel-redirects/
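A minimal sketch of the Nginx variant, under these assumptions: Nginx has an `internal` location (here a hypothetical `/protected/`) that maps to the directory where the files are stored, and the database holds a path relative to that directory. After Django has done the access check, it returns an empty response whose `X-Accel-Redirect` header names the internal URI, and Nginx serves the file itself. Only the URI construction is shown; the helper name, `myfile.path` and `/srv/files/` are hypothetical.

```python
import posixpath
from urllib.parse import quote


def x_accel_redirect(stored_path, internal_prefix="/protected/"):
    """Build the X-Accel-Redirect URI for a file path stored in the database."""
    # Normalising against "/" collapses any "../" so the URI cannot leave the prefix.
    clean = posixpath.normpath("/" + stored_path).lstrip("/")
    # Nginx expects an escaped URI in this header, so percent-encode spaces and non-ASCII.
    return internal_prefix + quote(clean)


# In a Django view, after checking permissions (sketch):
#     response = HttpResponse()
#     response["X-Accel-Redirect"] = x_accel_redirect(myfile.path)
#     return response
# with an Nginx block such as:
#     location /protected/ { internal; alias /srv/files/; }
```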

Django how to open file in FileField

I need to open a file saved in a FileField, create a list with the content of the file and pass it to the template. How can I open the file? I tried open(stocklist.csv_file.url, "wb"), but it gave me a "File not found" error. If I do this:
csv_file = stocklist.csv_file.open(mode="rb")
csv_file is None. However, there is a file. If I run print("stocklist.csv_file.url: %s" % stocklist.csv_file.url), I do get
stocklist.csv_file: https://d391vo1.cloudfront.net/csv_pricechart/...ss7.csv
And if I go to the admin, I can download the file. So, how can I open a file saved in a FileField?
The .open() method opens the file but does not return it, since that depends on your storage (filesystem, S3, FTP...). Once it is open, you can use .read() to read the file content.
stocklist.csv_file.open(mode="rb")
content = stocklist.csv_file.read()
stocklist.csv_file.close()
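Building on that open/read/close pattern, a sketch that turns the stored CSV into a list of rows for the template. It assumes UTF-8 content and a `FieldFile`-like object with `open()`, `read()` and `close()` as described above; the function name is hypothetical.

```python
import csv
import io


def read_csv_rows(field_file, encoding="utf-8"):
    """Return the rows of a CSV stored in a FileField as a list of lists.

    `field_file` is assumed to behave like Django's FieldFile as described
    above: call open(), then read(), then close().
    """
    field_file.open(mode="rb")
    try:
        content = field_file.read()
    finally:
        # Close even if read() fails so the storage handle is not left open.
        field_file.close()
    # newline="" lets the csv module handle \r\n and quoted line breaks itself.
    return list(csv.reader(io.StringIO(content.decode(encoding), newline="")))
```

In a view this would be `rows = read_csv_rows(stocklist.csv_file)`, followed by passing `{"rows": rows}` as the context to `render()`.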
If you specifically want to work with a file object, you can use your storage's functionality directly:
from django.core.files.storage import DefaultStorage
storage = DefaultStorage()
f = storage.open(stocklist.csv_file.name, mode='rb')

How can I get the temporary name of an UploadedFile in Django?

I'm doing some file validation and want to load an UploadedFile into an external library while it is in the '/tmp' directory before I save it somewhere that it can be executed. Django does the following:
Django will write the uploaded file to a temporary file stored in your system's temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload.
It is the "tmpzfp6I6.upload" that I want to get my hands on. UploadedFile.name gives me "", while file.name gives me the proper name of the file, "example.mp3".
With the library I am using, I need to pass the path of the temporary file to the library rather than the file itself, so I need the string.
Any ideas?
Thanks in advance.
EDIT: Here's my code:
from django.core.files.uploadedfile import UploadedFile

class SongForm(forms.ModelForm):
    def clean_audio_file(self):
        file = self.cleaned_data.get('audio_file', False)
        if file:
            [...]
            if file._size > 2.5*1024*1024:
                try:
                    # The following two lines are where I'm having trouble, MP3 takes the path to file as input.
                    path = UploadedFile.temporary_file_path
                    audio = MP3('%s' % path)
                except HeaderNotFoundError:
                    raise forms.ValidationError("Cannot read file")
        else:
            raise forms.ValidationError("Couldn't read uploaded file")
        return file
Using "UploadedFile" I get an AttributeError "type object 'UploadedFile' has no attribute 'temporary_file_path'". If I instead use file.temporary_file_path (just throwing darts in the dark here) I get an IOError:
[Errno 2] No such file or directory: 'bound method TemporaryUploadedFile.temporary_file_path of >'
I realize temporary_file_path is the solution I'm looking for, I just can't figure out how to use it and neither the docs nor google seem to be much help in this particular instance.
UploadedFile.temporary_file_path()
Only files uploaded onto disk will have this method; it returns the full path to the temporary uploaded file. Because it is a method, it has to be called: path = file.temporary_file_path().
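The `IOError` in the question shows what went wrong: `file.temporary_file_path` without parentheses is the bound method itself, so its repr was passed to `MP3`. A sketch that calls the method when it exists and otherwise copies an in-memory upload to a temporary file; the helper name is hypothetical, and it assumes the library only needs the path while the `with` block is open.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def upload_path(uploaded):
    """Yield a filesystem path for an uploaded file (hypothetical helper).

    Uploads that Django streamed to disk have temporary_file_path(), which is a
    method and must be called; anything else is copied to a temporary file.
    """
    if hasattr(uploaded, "temporary_file_path"):
        yield uploaded.temporary_file_path()
        return
    uploaded.seek(0)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(uploaded, tmp)
    try:
        yield tmp.name
    finally:
        os.remove(tmp.name)  # only remove the copy we made, never Django's own file
```

In `clean_audio_file`, the two troublesome lines would become `with upload_path(file) as path: audio = MP3(path)`.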