Django: Flaky file downloads

I've made a simple file server that runs on my Raspberry Pi (1/2 GB RAM, 1 CPU). It runs under Gunicorn (3 workers) behind nginx (1 worker).
I've got a weird issue: when I try to download too many files simultaneously (say 5), they all get part way through and then just abort. There's no output from the Django server. (I get the same issue with the development server, which is why it now runs behind Gunicorn and nginx, but still no joy.)
My download view is:
```python
@never_cache
def download_media(request, user_id, session_key, id, filepath):
    "Download an individual media file"
    context = RequestContext(request)
    # validate the user_id & session_key pair
    if not __validate_session_key(user_id, session_key):
        return HttpResponseRedirect(reverse('handle_logout'))
    filepath = unicode(urllib.unquote(filepath))
    if '..' in filepath:
        raise SuspiciousOperation('Invalid characters in subdir parameter.')
    location = MediaCollectionLocation.objects.get(id=id)
    path = os.path.join(location.path, filepath)
    response = HttpResponse(FileWrapper(file(path)), content_type='application/octet-stream')
    response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(path)
    response['Content-Length'] = os.path.getsize(path)
    response["Cache-Control"] = "no-cache, no-store, must-revalidate"
    return response
```
I'm serving files this way because I want clients to authenticate (so I don't just want to redirect and serve static content with nginx).
Does anyone have any idea why the downloads would drop out when I make several requests in parallel?
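Separately from the download problem, the `'..' in filepath` check in the view doesn't cover every way out of `location.path`. `os.path.join` discards everything before an absolute component, and the view unquotes the path first, so a request carrying `%2Fetc%2Fpasswd` would become `/etc/passwd` and pass the check. Below is a sketch of a stricter check using only the standard library. The name `safe_join` is my own choice. Django has a similar helper in the private module `django.utils._os`, but it isn't public API.

```python
import os


def safe_join(base, *parts):
    """Join parts onto base and refuse any result outside base.

    os.path.realpath resolves '..' segments and symlinks; the commonpath
    check then confirms the resolved path still lies inside base.
    """
    base_real = os.path.realpath(base)
    candidate = os.path.realpath(os.path.join(base_real, *parts))
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError("path escapes base directory: %r" % (candidate,))
    return candidate
```

In the view, `path = safe_join(location.path, filepath)` would replace the `os.path.join` call, and a `ValueError` could be turned into `SuspiciousOperation`.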

I'm not entirely sure why the downloads are failing, but I'd guess it has something to do with requesting more files at once than you have workers, or with a timeout between nginx and Gunicorn.
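One way to test the timeout theory: with Gunicorn's default `sync` worker class, a worker that is still busy with one request after `timeout` seconds (30 by default) is killed and restarted. Any download it was streaming is cut off part way, which matches the symptom. Below is a sketch of a Gunicorn config file (Gunicorn config files are plain Python, passed with `-c`). It raises the timeout and switches to threaded workers. It assumes a Gunicorn version that ships the `gthread` worker class, and the numbers are guesses for a single-CPU Pi, not tuned values.

```python
# gunicorn.conf.py (the file name is only a convention)
# Start with:  gunicorn -c gunicorn.conf.py myproject.wsgi
# ("myproject" is a placeholder for your project package.)

# Three processes, as in the question; on one CPU, more processes
# mostly add memory pressure.
workers = 3

# Threaded workers let each process handle several requests at once,
# so five parallel downloads need not queue behind three workers.
worker_class = "gthread"
threads = 4

# Seconds a worker may go without reporting to the master before it is
# killed (default 30). A sync worker busy with one long download stops
# reporting, so this limit cuts the download off. Threaded workers keep
# reporting while their threads work, so a generous value here is mainly
# a safeguard.
timeout = 300
```

If the downloads stop failing with this config, the worker timeout was the likely culprit.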
You can get nginx to serve the files only after Django has authenticated the user: Django sets a particular header on its response, and nginx reads that header (it stays internal and is never passed on to the client) and serves the file itself.
This mechanism is generally known as X-Sendfile; nginx's version of it is the X-Accel-Redirect header. You can then either write some middleware or a helper function to set the appropriate headers from Django, or use something like django-sendfile with its nginx backend to have it all done for you.
If the issue you're experiencing is caused by a timeout between Django and nginx, this should fix it. If it doesn't, also increase the number of nginx workers, since nginx will now be responsible for serving the files.
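On the nginx side, this needs a location that only internal redirects can reach, for example `location /protected/ { internal; alias /srv/media/; }`. The `/protected/` prefix and `/srv/media/` directory are placeholders. The Django view then returns an empty response whose `X-Accel-Redirect` header names a URI under that location. Below is a sketch of building that URI. It assumes the path the view receives is relative to the aliased directory, and `accel_redirect_uri` is a hypothetical helper name.

```python
import posixpath
from urllib.parse import quote


def accel_redirect_uri(relative_path, internal_prefix="/protected/"):
    """Return an X-Accel-Redirect value for a file below the internal location.

    relative_path is relative to the directory nginx aliases for the
    internal location; internal_prefix is that location's URI (placeholder).
    """
    # Anchor at "/" before normalising so ".." segments cannot climb above
    # the internal location, then drop the leading slash(es) again.
    clean = posixpath.normpath("/" + relative_path).lstrip("/")
    # Percent-encode so spaces and non-ASCII names survive as a header value
    return internal_prefix.rstrip("/") + "/" + quote(clean)
```

After authenticating, the view would do roughly `response = HttpResponse()` and `response["X-Accel-Redirect"] = accel_redirect_uri(filepath)`, keeping its `Content-Disposition` header. nginx then streams the file itself, so the Gunicorn worker is freed straight away. Apache and lighttpd use `X-Sendfile` with a filesystem path instead of a URI.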

Related

Files are being downloaded to the PythonAnywhere server as well as to the user's laptop/PC. How do I avoid writing them on the PythonAnywhere server?

The problem: I have hosted my Django site on PythonAnywhere, and the video gets downloaded both to the PythonAnywhere server and to the user's/client's system. That's why I use os.remove(path): it removes the file from the server after downloading.
Is there any way to avoid writing the files on the PythonAnywhere server, so that I don't need os.remove(path)?
How do I stop it writing to the PythonAnywhere server, so that it only downloads to the user's system?
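```python
import os
import re

import requests
import wget
from django.http import FileResponse


def fb_download(request):
    link = request.GET.get('url')
    html = requests.get(link)
    try:
        url = re.search('hd_src:"(.+?)"', html.text)[1]
    except TypeError:  # no HD source: re.search returned None
        url = re.search('sd_src:"(.+?)"', html.text)[1]
    path = wget.download(url, 'Video.mp4')
    response = FileResponse(open(path, 'rb'), as_attachment=True)
    os.remove(path)
    return response
```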
If I understand correctly, you're trying to get a request from a browser, which contains a URL. You then access the page at that URL and extract a further URL from it, and then you want to present the contents of that second URL -- a video -- to the browser.
The way you are doing that is to download the file to the server, and then to serve that up as a file attachment to the browser.
If you do it that way, there is no way to avoid writing the file on the server. Indeed, the way you are doing it right now might have problems, because you delete the file before the response has been sent to the browser. Depending on how the deletion is processed and whether FileResponse has already opened or cached the file's contents, there may be cases where there is no file left to send. (On Linux, which PythonAnywhere runs, a file that is already open stays readable after os.remove() until it is closed; on Windows, removing an open file raises an error.)
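If you keep the download-to-server approach, one way to make the deletion happen *after* the file has been sent is to delete it when it is closed. Django's `FileResponse` closes the file object it was given once the response has been sent, so clean-up in `close()` runs at the right moment. Below is a minimal sketch. `DeleteOnCloseFile` is a hypothetical name, not a Django or stdlib class.

```python
import io
import os


class DeleteOnCloseFile(io.FileIO):
    """A read-only file that removes itself from disk when it is closed."""

    def __init__(self, path):
        super().__init__(path, "r")
        self._path = path

    def close(self):
        if self.closed:
            return  # close() may run more than once (explicitly and from __del__)
        try:
            super().close()
        finally:
            try:
                os.remove(self._path)
            except FileNotFoundError:
                pass  # already gone; nothing to clean up
```

In the view, this would be used as `FileResponse(DeleteOnCloseFile(path), as_attachment=True, filename="Video.mp4")` (the `filename` argument needs Django 2.1+), with no `os.remove` afterwards. Giving each request its own temporary file name, instead of the fixed `Video.mp4`, would also stop parallel requests from overwriting each other. `tempfile.NamedTemporaryFile` (which by default deletes the file on close) is an alternative if you write the download into it yourself.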
But an alternative that might work is to send a redirect response to the URL (the one in your variable url), like this, without downloading it at all:
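```python
import re

import requests
from django.shortcuts import redirect


def fb_download(request):
    link = request.GET.get('url')
    html = requests.get(link)
    try:
        url = re.search('hd_src:"(.+?)"', html.text)[1]
    except TypeError:  # no HD source: re.search returned None
        url = re.search('sd_src:"(.+?)"', html.text)[1]
    return redirect(url)
```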
By doing that, the download happens on the browser instead of on the server.
I don't understand JavaScript very well, but I think you could download the file to the server and then deliver it to the user using JS. And I think you can use

Make some Django/Wagtail files private

Is there a way to make some files inaccessible via direct URL? For example, an image appears on a page, but the image's location doesn't work on its own.
How Things Normally Work
By default, all static and media files are served from the static root and media root folders. A good practice is to have NGINX or Apache route to these, or to use something like WhiteNoise to serve the static files.
In production, you definitely don't want runserver serving these files, because 1) it doesn't scale well, and 2) it means you're running in DEBUG mode in production, which is an absolute no-no.
Protecting Those Files
You can keep this configuration for most files that you don't need to protect. The files you do need to protect, serve through Django from a different file path instead. Use the upload_to parameter of the FileField to specify where you want those files to live. For example:
```python
protectedfile = models.FileField(upload_to="protected/")
publicfile = models.FileField(upload_to="public/")
```
Then in NGINX, point the location block for your media URL at /var/www/myproject/MEDIAROOT/public instead of the whole media root. That lets you keep serving the public files directly, while the protected ones are no longer reachable by URL.
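`upload_to` also accepts a callable. Django calls it with the model instance and the uploaded file's original name, and stores the file under the relative path it returns. That makes it easy to keep every protected file under the `protected/` prefix while, for example, grouping files per owner. Below is a sketch. It assumes the model has a hypothetical `owner` foreign key (so `owner_id` exists), and `protected_upload_to` is my own name.

```python
import os


def protected_upload_to(instance, filename):
    """upload_to callable: store each protected file under protected/user_<id>/.

    Django passes the model instance being saved and the uploaded file's
    original name; the returned path is relative to the storage root
    (MEDIA_ROOT for the default storage).
    """
    # Keep only the final path component in case the client sent a path
    name = os.path.basename(filename)
    # owner_id is a hypothetical ForeignKey to the uploading user
    return "protected/user_{0}/{1}".format(instance.owner_id, name)


# In the model (a sketch):
#     protectedfile = models.FileField(upload_to=protected_upload_to)
```

Define the function at module level, not as a lambda, so migrations can refer to it by its import path.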
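Meanwhile, the protected files can be served by a view such as:
```python
import os

from django.http import FileResponse, Http404
from django.shortcuts import get_object_or_404

from .models import MyDocumentModel  # your model with the protectedfile field


def view_to_serve_up_docs(request, document_id):
    my_doc = get_object_or_404(MyDocumentModel, id=document_id)
    # Do some check against this user (here: just that they are logged in)
    if not request.user.is_authenticated:
        raise Http404()
    response = FileResponse(my_doc.protectedfile)
    # Send only the base name, not the "protected/" storage prefix
    filename = os.path.basename(my_doc.protectedfile.name)
    response["Content-Disposition"] = 'attachment; filename="%s"' % filename
    return response
```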
And link to this view within your templates:
```html
<a href='/downloadFileView/12345'>Download File #12345 Here!</a>
```
Reference
More about the FileResponse object: https://docs.djangoproject.com/en/1.11/ref/request-response/#fileresponse-objects

How to serve Django user-uploaded media files with Cherokee, restricted to logged-in users

How do I configure Django and Cherokee so that Cherokee serves media (user-uploaded) files in production, but only to logged-in users, as with @login_required?
1. Create a Django view which serves the file.
2. Use @login_required on this view to restrict access.
3. Read the file from disk using standard Python I/O operations.
4. Use StreamingHttpResponse, so the file is sent in chunks rather than read into memory all at once.
5. Set the response MIME type correctly.
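The steps above can be sketched as two small helpers. The first is a generator that reads the file in chunks, suitable for `StreamingHttpResponse`. The second guesses the MIME type from the file name via the standard `mimetypes` module. The helper names and the 8192-byte chunk size are my own choices.

```python
import mimetypes


def iter_file_chunks(path, chunk_size=8192):
    """Yield a file's bytes in fixed-size chunks, closing the file at the end."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def guess_content_type(path):
    """Best-effort MIME type from the file name, with a generic binary fallback."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"
```

In a view decorated with `@login_required`, this would be used roughly as `StreamingHttpResponse(iter_file_chunks(path), content_type=guess_content_type(path))`. Recent Django versions' `FileResponse` is a `StreamingHttpResponse` subclass that already reads the file in chunks and guesses the content type, so it may be the simpler choice.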
I will answer my own question.
As you are using Cherokee:
1. Remove direct access to the media folder at the media URL (localhost/media/...), for example by removing the virtual server that serves it.
2. Enable (tick) Allow X-Sendfile under the Handler tab, in Common CGI Options, on the virtual server page that handles Django requests.
Let's say the users' pictures under media/pictures are to be protected, visible to logged-in users only (this is just an example; adapt it as you want).
Every user picture is stored as media/pictures/<pk>.jpg (1.jpg, 2.jpg, ...).
3. Create a view:
```python
import os

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse, HttpResponseForbidden


@login_required(redirect_field_name=None)
def media_pictures(request, pk):
    response = HttpResponse()
    path = os.path.join(settings.MEDIA_ROOT, 'pictures', pk + '.jpg')
    if os.path.isfile(path):
        # response['Content-Type'] = ""  # add it if it doesn't work without it
        response['X-Accel-Redirect'] = path
        # response['X-Sendfile'] = path  # same as the previous line:
        # X-Accel-Redirect is for nginx and X-Sendfile is for Apache; Cherokee
        # understands both, so use one of them. (nginx expects a URI pointing
        # at an internal location here, not a filesystem path.)
        return response
    return HttpResponseForbidden()
```
Cherokee now takes care of serving the file; that's why we ticked Allow X-Sendfile. This won't work without it.
The path variable here is the full path to the file. It can be anywhere, as long as it is readable by Cherokee's user or group.
4. URL conf
As we disabled direct access to the media folder, we need to provide a URL through which Django serves the files, using the previous view.
For example, to make the picture of the user with id 17 accessible at localhost/media/pictures/17.jpg:
```python
# On Django 4.0+, where url() was removed, use re_path() with the same arguments
url(r"^media/pictures/(?P<pk>\d+)\.jpg$", views.media_pictures, name="media_pictures"),
```
This also works with Apache, nginx, etc.: just configure your server to use X-Sendfile (or X-Accel-Redirect for nginx); the details are easy to find in their docs.
With this, every logged-in user can view all users' pictures. Feel free to add more checks before serving the file, such as a per-user check.
Hope it will help someone

Django as reverse proxy

My client-server application is mainly based on a special-purpose HTTP server that communicates with the client in an Ajax-like fashion, i.e. the client GUI is refreshed upon asynchronous HTTP request/response cycles.
The special-purpose HTTP server is hard to evolve, and as the application grows, more and more standard features are needed, which Django (for instance) provides.
Hence, I would like to add a Django application as a facade/reverse proxy, to hide the non-standard special-purpose server and to benefit from Django. I would like the Django app to act as a gateway rather than use HTTP redirects, for security reasons and to hide complexity.
However, my concern is that tunneling the traffic through Django on the server might spoil performance. Is this a valid concern?
Would there be an alternative solution to the problem?
Usually in production, you are hosting Django behind a web container like Apache httpd or nginx. These have modules designed for proxying requests (e.g. proxy_pass for a location in nginx). They give you some extras out of the box like caching if you need it. Compared with proxying through a Django application's request pipeline this may save you development time while delivering better performance. However, you sacrifice the power to completely manipulate the request or proxied response when you use a solution like this.
For local testing with ./manage.py runserver, I add a url pattern via urls.py in an if settings.DEBUG: ... section. Here's the view function code I use, which supports GET, PUT, and POST using the requests Python library: https://gist.github.com/JustinTArthur/5710254
I went ahead and built a simple prototype. It was relatively simple: I just had to set up a view that maps all the URLs I want to forward. The view function looks something like this:
```python
import urllib.request

from django.http import HttpResponse

server = "localhost:8080"  # address of the special-purpose server (placeholder)


def redirect(request):
    url = "http://%s%s" % (server, request.path)
    # add GET parameters (QueryDict.urlencode() keeps repeated keys)
    if request.GET:
        url += '?' + request.GET.urlencode()
    # add headers of the incoming request
    # see https://docs.djangoproject.com/en/1.7/ref/request-response/#django.http.HttpRequest.META for details about the request.META dict
    def convert(s):
        s = s.replace('HTTP_', '', 1)
        s = s.replace('_', '-')
        return s
    request_headers = dict((convert(k), v) for k, v in request.META.items() if k.startswith('HTTP_'))
    # add Content-Type and Content-Length, but only when the client sent them
    if request.META.get('CONTENT_TYPE'):
        request_headers['CONTENT-TYPE'] = request.META['CONTENT_TYPE']
    if request.META.get('CONTENT_LENGTH'):
        request_headers['CONTENT-LENGTH'] = request.META['CONTENT_LENGTH']
    # get original request payload (request.body replaced raw_post_data, which Django 1.6 removed)
    if request.method == "GET":
        data = None
    else:
        data = request.body
    downstream_request = urllib.request.Request(url, data, headers=request_headers, method=request.method)
    page = urllib.request.urlopen(downstream_request)
    return HttpResponse(page.read(), status=page.status, content_type=page.getheader('Content-Type'))
```
So it is actually quite simple and the performance is good enough, in particular if the redirect goes to the loopback interface on the same host.
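A proxy generally shouldn't pass on every incoming header. Hop-by-hop headers such as `Connection` apply to a single connection, and forwarding the client's `Host` header would announce the wrong host to the upstream server. Below is a sketch of a stricter header-copying step. `meta_to_upstream_headers` is a hypothetical helper. The hop-by-hop names follow the list in RFC 2616 section 13.5.1, plus `trailer`, the header's actual name.

```python
# Hop-by-hop headers (RFC 2616 section 13.5.1, plus "trailer") describe a
# single connection and should not be forwarded by a proxy.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
}


def meta_to_upstream_headers(meta):
    """Build upstream request headers from a Django/WSGI META dict.

    HTTP_* keys become header names (HTTP_USER_AGENT -> User-Agent);
    CONTENT_TYPE and CONTENT_LENGTH are copied only when non-empty;
    hop-by-hop headers and Host (urllib derives Host from the URL) are dropped.
    """
    headers = {}
    for key, value in meta.items():
        if key.startswith("HTTP_"):
            name = key[5:].replace("_", "-").title()
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            name = key.replace("_", "-").title()
        else:
            continue  # other META entries (REMOTE_ADDR, wsgi.*) are not headers
        if not value or name.lower() in HOP_BY_HOP or name.lower() == "host":
            continue
        headers[name] = value
    return headers
```

In the view, this would replace the `convert` step: `urllib.request.Request(url, data, headers=meta_to_upstream_headers(request.META), method=request.method)`.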

Redirect and force download

This is my problem: I have some PDF files on a server, and my Django web application is hosted on another server (not the same one as the PDF files).
In my application I know the links to the PDF files on the other server. I want users to download those PDF files through my application without the files being read by my web application's server.
Let me try to explain. If I click on a download link, my browser shows the PDF in its internal PDF viewer. I don't want this: I want the user to download the file when clicking a button, without it opening in the browser's viewer.
I looked here: http://docs.djangoproject.com/en/dev/ref/request-response/#telling-the-browser-to-treat-the-response-as-a-file-attachment
but this is not a good way for me, because it requires that I read the file inside my web application and then return it to the user.
Is it possible?
Hmm, sounds like the wrong tool for the job. You can't really "redirect" and modify the response header, which means using Django just to set the Content-Disposition header would require you to stream the file through Django and then have Django stream it to the client.
Let a lighter-weight web server handle that. If you happen to be using nginx, here's an awesome solution that fits your scenario 99% (the 1% being that it's Rails, not Django, setting the header that nginx is waiting for).
If all you want is to set the header and the file doesn't need Django processing, it would be even easier to just proxy it!
If you are not using nginx, I would change the title to a web-server-specific question about proxying a file and setting headers.
I had a similar problem recently. I solved it by downloading the file on my server and writing it to the HttpResponse.
Here is my code:
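```python
import requests
from wsgiref.util import FileWrapper
from django.http import HttpResponse, StreamingHttpResponse


def startDownload(request):
    url, filename, ext = someFancyLogic()  # the author's own lookup, not shown
    upstream = requests.get(url, stream=True)
    # Was the request OK?
    if upstream.status_code != requests.codes.ok:
        return HttpResponse(status=400)
    # Stream the raw body through in blocks; a plain HttpResponse would read
    # the whole iterator into memory before sending anything
    wrapper = FileWrapper(upstream.raw)
    content_type = upstream.headers.get('content-type', 'application/octet-stream')
    response = StreamingHttpResponse(wrapper, content_type=content_type)
    # Forward Content-Length only when the upstream server sent one
    if 'content-length' in upstream.headers:
        response['Content-Length'] = upstream.headers['content-length']
    response['Content-Disposition'] = 'attachment; filename="{0}.{1}"'.format(filename, ext)
    return response
```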