Created temporary file is not accessible in production - Django

I have written a custom file field, AudioFileField. For this I created a check to verify that a file really is a valid audio file. To do that I use the sox command-line tool, which means I first have to create a file on disk. Since sox relies on the file suffix for that validation, I needed to write my own TemporaryUploadedAudioFile that keeps the original suffix (instead of .upload):
class TemporaryUploadedAudioFile(TemporaryUploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, suffix='.upload'):
        """
        The init method overrides the name creation to allow passing
        an extension, so that sox is able to test the file
        """
        if settings.FILE_UPLOAD_TEMP_DIR:
            file = tempfile.NamedTemporaryFile(suffix=suffix,
                                               dir=settings.FILE_UPLOAD_TEMP_DIR)
        else:
            file = tempfile.NamedTemporaryFile(suffix=suffix)
        super(TemporaryUploadedFile, self).__init__(file, name, content_type, size, charset)
I use that file to do the audio validation in the AudioFileForm's to_python method:
def to_python(self, data):
    """
    Checks that the file-upload field data contains a valid audio file.
    """
    f = super(AudioFileForm, self).to_python(data)
    if f is None:
        return None
    # get the file suffix, sox needs this to be able to test the file
    suffix = os.path.splitext(data.name)[1]
    # We need to get a temporary file for sox. Even if we already have a temporary
    # file, we have to create a new one ending with the correct suffix
    file = TemporaryUploadedAudioFile(data.name, data.content_type, 0, data.charset, suffix=suffix)
    with open(file.temporary_file_path(), 'w') as f:
        f.write(data.read())
    # Do the validation of the audiofile.
    filetype = subprocess.Popen([sox, '--i', '-t', '%s' % file.temporary_file_path()],
                                shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    filetype = filetype.communicate()[0]
    filetype = filetype.replace('\n', '')
    if filetype not in ['wav', 'aiff', 'flac']:
        raise forms.ValidationError('Not a valid audiofile (valid are: aif, flac & wav | 16 or 24 bit | 44.1 or 48 kHz)')
    return data
Now to the strange part: this works like a charm on the development server, but as soon as I switch to apache2/mod_wsgi it stops working. sox returns an error telling me that the file is missing.
I have already checked permissions; the tmp location on the production server is /tmp, and all rights are granted there (777). What else could be happening here?

mod_wsgi is known to have problems with standard output and with this kind of subprocess use from Django. There are already a lot of questions about this answered on stackoverflow.com.
A quick Google search should help you!
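If you want to check whether that is what is biting you, here is a small debugging sketch along those lines (not a verified fix; the /usr/bin/sox path and the logger name are assumptions for your server): call sox with an absolute path and log its stderr instead of relying on stdout.

import logging
import subprocess

def probe_audio_type(path, sox_bin='/usr/bin/sox'):
    # Use an absolute path to the binary: mod_wsgi often runs with a minimal PATH.
    proc = subprocess.Popen([sox_bin, '--i', '-t', path],
                            shell=False,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if err:
        # Writing to stdout can misbehave under mod_wsgi; log instead.
        logging.getLogger('audiofield').error('sox stderr for %s: %r', path, err)
    return out.strip()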

Related

Django FileResponse - How to speed up file download

I have a setup that lets users download files that are stored in the DB as BYTEA data. Everything works OK, except the download speed is very slow...it seems to download in 33KB chunks, one chunk per second.
Is there a setting I can specify to speed this up?
views.py
from django.http import FileResponse

def getFileResponse(filedata, filename, filesize, contenttype):
    response = FileResponse(filedata, content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response

return getFileResponse(
    filedata=myfile.filedata,  # Binary data from DB
    filename=myfile.filename + myfile.fileextension,
    filesize=myfile.filesize,
    contenttype=myfile.filetype
)
Previously, I had the binary data returned as an HttpResponse and it downloaded like a normal file, with normal speeds. This worked fine locally, but when I pushed to Heroku, it wouldn't download the file -- instead displaying <Memory at XXX> in the download file.
And another side issue...when I include a text file with non-ASCII data (i.e. á), I get an error as well:
UnicodeEncodeError: 'ascii' codec can't encode characters...: ordinal not in range(128)
How can I handle files with Unicode data?
Update
Does anyone know why the download speed gets so slow when changing from HttpResponse to FileResponse? Or alternatively, why does using HttpResponse to return a file not work on Heroku?
Update - Google Drive
I re-worked my application and hooked it up with a Google Drive back-end for serving files. It employs BytesIO() suggested by Eric below:
def download_file(self, fileid, mimetype=None):
    # Get binary file data
    request = self.get_file(fileid=fileid, mediaflag=True)
    stream = io.BytesIO()
    downloader = MediaIoBaseDownload(stream, request)
    done = False
    # Retry if we received HTTPError
    for retry in range(0, 5):
        try:
            while done is False:
                status, done = downloader.next_chunk()
                print("Download %d%%." % int(status.progress() * 100))
            return stream.getvalue()
        except HTTPError as error:
            return 'API error: {}. Try # {} failed.'.format(error.response, retry)
I think the difference you observe between HttpResponse vs. FileResponse is caused by the spec: https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming
In your previous code, an HttpResponse was created with one huge byte string containing your whole file, and the first iteration pass returned the complete response body. With a FileResponse, the file is iterated in chunks (of 4 KB, 8 KB or another size depending on your WSGI app server), which (I think) are streamed immediately upstream (to the reverse proxy, then the client), which may add overhead (more communication over process boundaries?).
It would help to know the app server used (uwsgi, gunicorn, waitress, other) and its relevant config. Also more details about the heroku error in case that can be solved!
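If the chunk size is indeed the bottleneck, one thing you could try (a sketch only; the 64 KB figure is arbitrary and BigChunkFileResponse is a made-up name) is to raise FileResponse's block_size and hand it a file-like object wrapping the BYTEA bytes:

import io
from django.http import FileResponse

class BigChunkFileResponse(FileResponse):
    # FileResponse reads the underlying file object in block_size chunks (default 4096).
    block_size = 64 * 1024

def getFileResponse(filedata, filename, filesize, contenttype):
    response = BigChunkFileResponse(io.BytesIO(filedata), content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response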
Why do you store the whole file in the database?
The better approach is to store the file on disk and keep only its path in the database.
Then, depending on your web server, you can let the web server serve the file; web servers serve files better than Django does.
If the files need no access control, store them under media and serve them directly.
If your files do need access control, you can use special response headers depending on your web server.
If you use Nginx you must use X-Accel-Redirect (other web servers have equivalents). There is a tutorial at https://wellfire.co/learn/nginx-django-x-accel-redirects/
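A minimal sketch of that X-Accel-Redirect approach on the Django side, assuming an nginx internal location named /protected/ that maps to your media directory (both the location name and the URL layout are assumptions; see the linked tutorial for the nginx configuration):

from django.http import HttpResponse

def serve_protected_file(request, relative_path):
    # Your access-control checks go here.
    response = HttpResponse()
    # An empty content type lets nginx pick one based on the file extension.
    response['Content-Type'] = ''
    # nginx intercepts this header and serves the file from the internal location.
    response['X-Accel-Redirect'] = '/protected/%s' % relative_path
    return response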

Django how to open file in FileField

I need to open a file saved in a FileField, create a list with the content of the file and pass it to the template. How can I open the file? I tried with open(stocklist.csv_file.url, "wb") but it gave me a "File not found" error. If I do this:
csv_file = stocklist.csv_file.open(mode="rb")
csv_file is None. However, there is a file. If I run print("stocklist.csv_file.url: %s" % stocklist.csv_file.url) I do get
stocklist.csv_file.url: https://d391vo1.cloudfront.net/csv_pricechart/...ss7.csv
And if I go to the admin, I can download the file. So, how can I open a file saved in a FileField?
The .open() call opens the file cursor but does not return it, since the underlying object depends on your storage (filesystem, S3, FTP...). Once the file is opened, you can use .read() to get its content.
stocklist.csv_file.open(mode="rb")
content = stocklist.csv_file.read()
stocklist.csv_file.close()
If you want to specifically work with file descriptor then you can use your storage functionality:
from django.core.files.storage import DefaultStorage
storage = DefaultStorage()
f = storage.open(stocklist.csv_file.name, mode='rb')
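To tie this back to the question, here is a short sketch (assuming Python 3 and a UTF-8 encoded CSV; stock_rows is a made-up helper name) that builds a list of rows to pass to the template:

import csv
import io

def stock_rows(stocklist):
    # Open through the storage backend, read the raw bytes, then close.
    stocklist.csv_file.open(mode='rb')
    content = stocklist.csv_file.read().decode('utf-8')
    stocklist.csv_file.close()
    # Parse the text into a list of rows for the template context.
    return list(csv.reader(io.StringIO(content)))

# In the view:
# return render(request, 'stocklist.html', {'rows': stock_rows(stocklist)})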

Django Tweepy can't access Amazon S3 file

I'm using Tweepy, a tweeting python library, django-storages and boto. I have a custom manage.py command that works correctly locally, it gets an image from the filesystem and tweets that image. If I change the storage to Amazon S3, however, I can't access the file. It gives me this error:
raise TweepError('Unable to access file: %s' % e.strerror)
I tried making the images in the bucket "public". Didn't work. This is the code (it works without S3):
filename = model_object.image.file.url
media_ids = api.media_upload(filename=filename) # ERROR
params = {'status': tweet_text, 'media_ids': [media_ids.media_id_string]}
api.update_status(**params)
This line:
model_object.image.file.url
Gives me the complete url of the image I want to tweet, something like this:
https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg?Signature=xxxExpires=1467645897&AWSAccessKeyId=yyy
I also tried constructing the url manually, since it is a public image stored in my bucket, like this:
filename = "https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg"
But it doesn't work.
Why do I get the "Unable to access file" error?
The source code from tweepy looks like this:
def media_upload(self, filename, *args, **kwargs):
    """ :reference: https://dev.twitter.com/rest/reference/post/media/upload
        :allowed_param:
    """
    f = kwargs.pop('file', None)
    headers, post_data = API._pack_image(filename, 3072, form_field='media', f=f)  # ERROR
    kwargs.update({'headers': headers, 'post_data': post_data})

def _pack_image(filename, max_size, form_field="image", f=None):
    """Pack image from file into multipart-formdata post body"""
    # image must be less than 700kb in size
    if f is None:
        try:
            if os.path.getsize(filename) > (max_size * 1024):
                raise TweepError('File is too big, must be less than %skb.' % max_size)
        except os.error as e:
            raise TweepError('Unable to access file: %s' % e.strerror)
Looks like Tweepy can't get the image from the Amazon S3 bucket, but how can I make it work? Any advice will help.
The issue occurs when tweepy attempts to get file size in _pack_image:
if os.path.getsize(filename) > (max_size * 1024):
The function os.path.getsize assumes it is given a file path on disk; however, in your case it is given a URL. Naturally, the file is not found on disk and os.error is raised. For example:
# The following raises OSError on my machine
os.path.getsize('https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg')
What you could do is to fetch the file content, temporarily save it locally and then tweet it:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as f:
    name = model_object.image.file.name
    f.write(model_object.image.read())
    media_ids = api.media_upload(filename=name, f=f)
    params = dict(status='test media', media_ids=[media_ids.media_id_string])
    api.update_status(**params)
For your convenience, I published a fully working example here: https://github.com/izzysoftware/so38134984

Writing between characters in a text file?

I have a module that I want to write into, and I'm having several problems. One of them is locating a string within the file. Currently I open the file, loop over it with for line in the file object, and use an if to check whether the line contains the string; all of that works. Before that (it is commented out now) I tried to determine the position with tell(), but it gave me an incorrect position, 1118 I believe, instead of around 660. So I determined the position manually and used seek().
The second problem is that if I write to the file at that position, it just overwrites all the data from there on. I want to insert the data instead of overwriting it.
Unless I insert a string of exactly the same character length at the spot where I want the write to happen, it just overwrites most of the if statements and everything else below.
Is there any way to do this natively?
Here is the file I want to write into:
# Filename: neo_usercurves.py
# Created By: Gregory Smith
# Description: A script containing a library of user created curves
# Purpose: A library to store names of all the user curves, and deletes curves
# if specified to do so

import os
import maya.cmds as mc
import module_locator

my_path = module_locator.module_path()

def usercurve_lib(fbxfile=None, remove=None):
    """All control/curve objects created by user

    Keyword Arguments:
    fbxfile -- (string) name of fbx file to import
    remove -- (boolean) will remove an entry from the library and delete the
              associated fbx file
    """
    curves_dict = {
        #crvstart
        #crvend
    }
    if remove is None:
        return curves_dict
    elif not remove:
        try:
            name = mc.file(curves_dict[fbxfile], typ='FBX', i=1,
                           iv=True, pmt=False)
            return name[0]
        except RuntimeError:
            return None
    else:
        try:
            os.remove('%s\%s.fbx' %(my_path, fbxfile))
            return '%s.fbx' %(fbxfile)
        except OSError:
            print 'File %s does not exist.' %(fbxfile)
            return None
Below is the code I'm running in a module called neo_curves.py (this is not the complete code, and my_path is just the path of the directory neo_curves.py is being run from):
def create_entry(self, crv):
    """Exports user curve to user data directory and adds entry into
    neo_usercurves.py

    Keyword Arguments:
    crv -- (PyNode) the object to export
    """
    # set settings
    mel.eval('FBXExportFileVersion "FBX201400"')
    mel.eval('FBXExportInputConnections -v 0')
    select(crv)
    mc.file('%s\userdat\%s.fbx' %(my_path, str(crv)), force=True, options='',
            typ='FBX export', pr=True, es=True)
    with open('%s\userdat\\neo_usercurves.py' %(my_path), 'r+') as usercrvs:
        for line in usercrvs:
            if line.strip() == '#crvstart':
                #linepos = usercrvs.tell()
                #linepos = int(linepos)
                #usercrvs.seek(linepos, 0)
                usercrvs.seek(665, 0)
                usercrvs.write("\n "+str(crv)+" : '%s\%s' %(my_path, '"+
                               str(crv)+".fbx')")
                break
This will give me this result below:
# Filename: neo_usercurves.py
# Created By: Gregory Smith
# Description: A script containing a library of user created curves
# Purpose: A library to store names of all the user curves, and deletes curves
# if specified to do so

import os
import maya.cmds as mc
import module_locator

my_path = module_locator.module_path()

def usercurve_lib(fbxfile=None, remove=None):
    """All control/curve objects created by user

    Keyword Arguments:
    fbxfile -- (string) name of fbx file to import
    remove -- (boolean) will remove an entry from the library and delete the
              associated fbx file
    """
    curves_dict = {
        #crvstart
 loop_crv : '%s\%s' %(my_path, 'loop_crv.fbx') return curves_dict
    elif not remove:
        try:
            name = mc.file(curves_dict[fbxfile], typ='FBX', i=1,
                           iv=True, pmt=False)
            return name[0]
        except RuntimeError:
            return None
    else:
        try:
            os.remove('%s\%s.fbx' %(my_path, fbxfile))
            return '%s.fbx' %(fbxfile)
        except OSError:
            print 'File %s does not exist.' %(fbxfile)
            return None
In short: on most operating systems you cannot insert into a file without rewriting everything after the insertion point if the lengths are not the same.
Have a look at a long discussion here: Why can we not insert into files without the additional writes? (I neither mean append, nor over-write)
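Since in-place insertion is not possible, the usual pattern for a file this small is read-modify-write. Here is a sketch under the question's setup (same #crvstart marker and my_path/crv names; quoting the dict key is my own choice so the generated file stays valid Python):

path = '%s\\userdat\\neo_usercurves.py' % my_path

# Read the whole module into memory.
with open(path, 'r') as usercrvs:
    lines = usercrvs.readlines()

# Insert the new dict entry right after the #crvstart marker.
entry = "        '{0}': '%s\\{0}.fbx' % my_path,\n".format(str(crv))
for i, line in enumerate(lines):
    if line.strip() == '#crvstart':
        lines.insert(i + 1, entry)
        break

# Write everything back, so nothing below the marker gets clobbered.
with open(path, 'w') as usercrvs:
    usercrvs.writelines(lines)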

How can I get the temporary name of an UploadedFile in Django?

I'm doing some file validation and want to load an UploadedFile into an external library while it is in the '/tmp' directory before I save it somewhere that it can be executed. Django does the following:
Django will write the uploaded file to a temporary file stored in your system's temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload.
It is the "tmpzfp6I6.upload" that I want to be able to get my hands on. UploadedFile.name gives me "" while file.name gives me the proper name of the file, "example.mp3".
With the library I am using, I need to pass the filepath of the temporary file to the library, rather than the file itself and so, need the string.
Any ideas?
Thanks in advance.
EDIT: Here's my code:
from django.core.files.uploadedfile import UploadedFile

class SongForm(forms.ModelForm):
    def clean_audio_file(self):
        file = self.cleaned_data.get('audio_file', False)
        if file:
            [...]
            if file._size > 2.5*1024*1024:
                try:
                    # The following two lines are where I'm having trouble,
                    # MP3 takes the path to file as input.
                    path = UploadedFile.temporary_file_path
                    audio = MP3('%s' % path)
                except HeaderNotFoundError:
                    raise forms.ValidationError("Cannot read file")
        else:
            raise forms.ValidationError("Couldn't read uploaded file")
        return file
Using "UploadedFile" I get an AttributeError "type object 'UploadedFile' has no attribute 'temporary_file_path'". If I instead use file.temporary_file_path (just throwing darts in the dark here) I get an IOError:
[Errno 2] No such file or directory: 'bound method TemporaryUploadedFile.temporary_file_path of >'
I realize temporary_file_path is the solution I'm looking for, I just can't figure out how to use it and neither the docs nor google seem to be much help in this particular instance.
UploadedFile.temporary_file_path
Only files uploaded onto disk will have this method; it returns the full path to the temporary uploaded file.
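So the fix is to call it as a method on the uploaded file instance, not on the UploadedFile class. A sketch of your clean_audio_file along those lines, assuming the upload is large enough to be streamed to disk and that MP3/HeaderNotFoundError come from the library used in your snippet:

def clean_audio_file(self):
    file = self.cleaned_data.get('audio_file', False)
    if not file:
        raise forms.ValidationError("Couldn't read uploaded file")
    if file._size > 2.5 * 1024 * 1024:
        # Only uploads streamed to disk have temporary_file_path().
        if not hasattr(file, 'temporary_file_path'):
            raise forms.ValidationError("Upload was kept in memory; no temporary path available")
        try:
            path = file.temporary_file_path()  # note the parentheses: it is a method
            audio = MP3(path)
        except HeaderNotFoundError:
            raise forms.ValidationError("Cannot read file")
    return file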