I have a Django app that serves images when a certain page is loaded. The images are stored on S3, and I retrieve them using boto and send the image content as an HttpResponse with the appropriate content type.
The problem here is that this is a blocking call. Sometimes it takes a long time (a few seconds for an image of a few hundred KB) to retrieve the images and serve them to the client.
I tried converting this process to a Celery task (async, non-blocking), but I am not sure how I can send back the data (images) when they are done downloading. Just returning an HttpResponse from a Celery task does not work. I found docs related to HTTP callback tasks in the old Celery docs here, but this is not supported in newer Celery versions.
So, should I use polling in the JS? (I have used Celery tasks in other parts of my website, but all of them are socket based.) Or is this even the right way to approach the problem?
Code:
Django view code that fetches the images from S3 using boto3 (in views.py):
@csrf_protect
@ensure_csrf_cookie
def getimg(request, public_hash):
    if request.user.is_authenticated:
        query = img_table.objects.filter(public_hash=public_hash)
    else:
        query = img_table.objects.filter(public_hash=public_hash, is_public=True)
    if query.exists():
        item_dir = construct_s3_path(s3_map_thumbnail_folder, public_hash)
        if check(s3, s3_bucket, item_dir):  # checks if the file exists
            item = s3.Object(s3_bucket, item_dir)
            item_content = item.get()['Body'].read()
            return HttpResponse(item_content, content_type="image/png", status=200)
        else:  # if no image is found, return a blank image
            blank = Image.new('RGB', (1000, 1000), (255, 255, 255))
            response = HttpResponse(content_type="image/jpeg")
            blank.save(response, "JPEG")
            return response
    else:  # if the image corresponding to the hash is not found in the db
        return render(request, 'core/404.html')
I call the above django view in a page like this:
<img src='/api/getimg/123abc' alt='img'>
In urls.py I have:
url(r'^api/getimg/(?P<public_hash>[a-zA-Z0-9]{6})$', views.getimg, name='getimg')
Related
I'm building a gallery webapp based on Django (4.1.1) and Vue. I want to also upload and display videos (not only images). To support formats that don't work in an HTML video tag, I'm converting those formats to MP4 via pyffmpeg.
For that I created a custom field for my model based on FileField. In its save method I take the file's content, convert it, and save the result. This is called by the serializer through a corresponding ViewSet. This works, but the video conversion takes way too long, so the web request from my Vue app (executed with axios) runs into a timeout.
It is clear that I need to offload the conversion somehow, return a corresponding response directly, and save the data in the database as soon as the conversion is finished.
Is this even possible? Or do I need to write a custom view apart from the ViewSet to do the calculation? Can you give me a hint on how to offload that calculation? I only have rudimentary knowledge of things like asyncio.
TL;DR: How do I run extensive calculations asynchronously on file data before saving it to a model with a FileField, returning a response before the calculation ends?
I can provide my current code if necessary.
I've now solved my problem, though I'm still interested in other/better solutions. My solution works but might not be the best, and I feel it is a bit hacky in some places.
TL;DR: I installed django-q as a task queue manager with a Redis database as the broker, connected it to Django, and then called the function for transcoding the video file from my view via
taskid = async_task("apps.myapp.services.transcode_video", data)
This should be a robust system to handle these transcode tasks in parallel and without blocking the request.
I found this tutorial about Django-Q. Django-Q manages and executes tasks from Django. It runs in parallel with Django and is connected to it via its broker (a Redis database in this case).
First I installed django-q and the Redis client module via pip:
pip install django-q redis
Then I set up a Redis database (here running in a Docker container on my machine with the official Redis image). How to do that depends largely on your platform.
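For reference, a minimal way to do that with Docker (the container name, port mapping, and password are placeholders chosen to match the settings below, not a production setup):

docker run -d --name redis -p 6379:6379 redis redis-server --requirepass mysecureredisdbpassword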
Then I configured Django to use Django-Q by adding the configuration to settings.py (note that I disabled timeouts because the transcode tasks can take rather long; I may change that in the future):
Q_CLUSTER = {
    'name': 'django_q_django',
    'workers': 8,
    'recycle': 500,
    'timeout': None,
    'compress': True,
    'save_limit': 250,
    'queue_limit': 500,
    'cpu_affinity': 1,
    'label': 'Django Q',
    'redis': {
        'host': 'redishostname',
        'port': 6379,
        'password': 'mysecureredisdbpassword',
        'db': 0,
    }
}
and then activating Django-Q by adding it to the installed apps in settings.py:
INSTALLED_APPS = [
    ...
    'django_q',
]
Then migrate the model definitions of Django Q via:
python manage.py migrate
and start Django-Q via (the Redis database should be running at this point):
python manage.py qcluster
This runs in a separate terminal from the typical
python manage.py runserver
Note: Of course these two are only for development. I currently don't know how to deploy Django Q in production yet.
Now we need a file for our functions. As in the tutorial I added the file services.py to my app. There I simply defined the function to run:
def transcode_video(data):
    # Doing my transcoding stuff here
    return {'entryid': entry.id, 'filename': target_name}
This function can then be called inside the view code via:
taskid = async_task("apps.myapp.services.transcode_video", data)
So I can provide data to the function and get a task ID as a return value. The return value of the executed function will appear in the result field of the created task, so that you can even return data from there.
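For illustration, a minimal sketch of reading that result back with Django-Q's task helpers (standard django_q.tasks usage, not part of my original code):

from django_q.tasks import fetch, result

# fetch() returns the Task object once the task has finished (None before that)
task = fetch(taskid)
if task is not None and task.success:
    print(task.result)   # e.g. {'entryid': ..., 'filename': ...}

# result() is a shortcut that returns just the function's return value (or None)
value = result(taskid)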
I encountered a problem at that stage: the data contains a TemporaryUploadedFile object, which resulted in a pickle error. The data seems to get pickled before it gets passed to Django-Q, which didn't work for that object. There might be a way to convert the file into a picklable format, but since I already need the file on the filesystem to invoke pyffmpeg on it, in the view I just write the data to a file (in chunks, to avoid loading the whole file into memory at once) with
with open(filepath, 'wb') as f:
    for chunk in self.request.data['file'].chunks():
        f.write(chunk)
Normally in the ViewSet I would call serializer.save() at the end, but for transcoding I don't do that, since the new object gets saved inside the Django-Q function after the transcoding. There I create it like this (UploadedFile being from django.core.files.uploadedfile and AlbumEntry being my own model for which I want to create an instance):
with open(target_path, 'rb') as f:
    file = UploadedFile(
        file=f,
        name=target_name,
        content_type=data['file_type'] + "/" + data['target_ext'],
    )
    entry = AlbumEntry(
        file=file,
        # ... other model fields here
    )
    entry.save()
To return a defined Response from the viewset even when the object hasn't been created yet, I had to override the create() method in addition to the perform_create() method (where I did all the handling). For this I copied the code from the parent class and changed it slightly to return a specific response depending on the return value of perform_create() (which previously didn't return anything):
def create(self, request, *args, **kwargs):
    serializer = self.get_serializer(data=request.data)
    serializer.is_valid(raise_exception=True)
    taskid = self.perform_create(serializer)
    if taskid:
        return HttpResponse(json.dumps({'taskid': taskid, 'status': 'transcoding'}), status=status.HTTP_201_CREATED)
    headers = self.get_success_headers(serializer.data)
    return Response(serializer.data, status=status.HTTP_201_CREATED, headers=headers)
So perform_create() returns a task ID for transcode jobs and None otherwise. A corresponding response is sent here.
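For context, a rough sketch of what such a perform_create() could look like (this is not the original code; needs_transcoding() and build_task_data() are hypothetical helpers standing in for the file handling described above):

from django_q.tasks import async_task

def perform_create(self, serializer):
    if needs_transcoding(self.request.data):   # hypothetical check
        data = build_task_data(self.request)   # hypothetical helper that writes the upload to disk
        # Offload the conversion and hand the task ID back to create()
        return async_task("apps.myapp.services.transcode_video", data)
    # Other uploads are saved synchronously; returning None makes create()
    # fall through to the normal serializer response
    serializer.save()
    return None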
Last but not least, there was the problem of the frontend not knowing when the transcoding was done. So I built a simple view to get a task by ID:
@api_view(['GET'])
@authentication_classes([authentication.SessionAuthentication])
@permission_classes([permissions.IsAuthenticated])
def get_task(request, task_id):
    task = Task.get_task(task_id)
    if not task:
        return HttpResponse(json.dumps({
            'success': False
        }))
    return HttpResponse(json.dumps({
        'id': task.id,
        'result': task.result,
        # ... some more data to return
    }))
You can see that I return a fixed response when the task is not found. This is my workaround, since by default the Task object only gets created once the task is finished. For my purposes it is OK to just assume that it is still running. A comment in this GitHub issue of Django-Q suggests that to get an up-to-date Task object you would need to write your own Task model and implement it in a way that regularly checks Django-Q for the task status. I didn't want to do this.
I also put the result in the response, so that my frontend can poll the task regularly (by its task ID); when the transcode is finished, the result will contain the ID of the created model object in the database. When my frontend sees this, it loads the object's content.
I am creating a site that downloads videos from other sites and converts them to GIF when requested.
The problem is that it takes too long to download and convert videos.
This causes a 504 timeout error.
How can I avoid timeout?
Currently, I am downloading with Celery when a request is received.
While downloading, Django redirects right away.
def post(self, request):
    form = URLform(request.POST)
    ctx = {'form': form}
    ....
    t = downloand_video.delay(data)
    return redirect('gif_to_mp4:home')
This prevents transferring the file to the user, because Celery cannot return a file or a response to the user.
How can I send the file to the user?
I have several APIs as sources of data, for example blog posts. What I'm trying to achieve is to send requests to these APIs in parallel from a Django view and get the results. There's no need to store the results in the db; I need to pass them to my view response. My project is written in Python 2.7, so I can't use asyncio. I'm looking for advice on the best practice to solve this (Celery, Tornado, something else?), with examples of how to achieve it, since I'm only starting my way in async. Thanks.
A solution is to use Celery, pass your request args to it, and use AJAX on the frontend.
Example:
def my_def(request):
    do_something_in_celery.delay()
    return Response(something)
To check whether a task has finished in Celery, you can store Celery's return value in a variable:
task_run = do_something_in_celery.delay()
task_run has an .id property.
You return this .id to your frontend and use it to monitor the status of the task.
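For example, a minimal status endpoint could look like this (the view name and URL wiring are assumptions; AsyncResult is Celery's standard API for checking a task by its ID):

from celery.result import AsyncResult
from django.http import JsonResponse

def task_status(request, task_id):
    # Look up the task by the id returned from .delay()
    res = AsyncResult(task_id)
    payload = {'task_id': task_id, 'state': res.state}
    if res.ready():
        # str() keeps failed results (exceptions) JSON-serializable
        payload['result'] = str(res.result)
    return JsonResponse(payload)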
And your function executed by Celery must have the @task decorator:
@task
def do_something_in_celery(*args, **kwargs):
    ...
You will also need a broker to manage the tasks, such as Redis or RabbitMQ.
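For reference, a minimal broker configuration in settings.py, assuming a local Redis instance (the setting names follow the usual Celery/Django integration with a CELERY_ namespace):

# settings.py -- assumed local Redis broker; adjust host/db as needed
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'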
Look at these URLs:
http://masnun.com/2014/08/02/django-celery-easy-async-task-processing.html
https://buildwithdjango.com/blog/post/celery-progress-bars/
http://docs.celeryproject.org/en/latest/index.html
I found a solution using ThreadPoolExecutor from concurrent.futures (available on Python 2.7 via the futures backport).
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
You can also check out the rest of the concurrent.futures doc.
Important!
The ProcessPoolExecutor class has known (unfixable) problems on Python 2 and should not be relied on for mission critical work.
Instead of using the BlobstoreUploadHandler supplied with AppEngine, I'd prefer to use a Django view, so I can keep all the URLs and view functions together. However, I can't find out how to get the blob-key of the uploaded file (like get_uploads() does for the upload handler). I saw that the BlobstoreUploadHandler uses request.params, but I don't think that is available from Django's request.
def upload_form(request):
    upload_url = blobstore.create_upload_url(reverse(upload_blob))
    output = '<html><body>'
    output += '<form action="%s" method="POST" enctype="multipart/form-data">' % upload_url
    output += ('''Upload File: <input type="file" name="file"><br> <input type="submit"
               name="submit" value="Submit"> </form></body></html>''')
    return HttpResponse(output)

def upload_blob(request):
    print request
    # How to get the 'blob-key' from request?!
When I examine the request object, all I get is
<WSGIRequest
GET:<QueryDict: {}>,
POST:<QueryDict: {u'submit': [u'Submit']}>
# And COOKIES, META, etcetera
EDIT: Request.FILES
I discovered that some info can be extracted using Request.FILES, which gives:
<MultiValueDict: {u'file': [<InMemoryUploadedFile: my_file (message/external-body)>]}>
However, I assume that the blobstore still handles the file content (is that why it says "content_type=message/external-body"?), so I still need the key somehow. Calling read() gives:
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Length: 17
Content-MD5: ZmQ3OTJhNjMzNGE0OTAzNGU4NjE5MDNmMGEwNjliMGE=
content-type: application/octet-stream
content-disposition: form-data; name="file"; filename="a1_blob"
X-AppEngine-Upload-Creation: 2012-02-12 22:11:49.643751
So it looks like AppEngine actually replaced the file content by this descriptor, but still, where does AppEngine put the key?
I'm starting to suspect that the blob-key is just lost when not using the webapp framework, since the UploadedFile object has no key() method.
It took me a long time to find, but the message/external-body content type requires extra parameters to locate the actual file; in AppEngine's case, this is the blob-key. However, Django doesn't support these extra content-type parameters, so they are indeed lost in the process. There seems to be a patch, but I don't think it's in the AppEngine Django version yet.
https://code.djangoproject.com/ticket/13721
I had the same problem yesterday. Thanks to your post I realized that the problem was Django and its class views. I ended up using some code that I have had since 2011, and it still works. It does not use BlobstoreUploadHandler, but it gets the blob_infos from the request after the file is automatically uploaded to the blobstore.
You can use that function in the following way from your Django callback function or class (I did not try it in a Django class-based view, but I think it will work; currently I'm using it in a Django function view with its request):
media_blobs = get_uploads(request, populate_post=True)
The function is the next:
import cgi
from google.appengine.ext import blobstore

def get_uploads(request, field_name=None, populate_post=False):
    """Get uploads sent to this handler.

    Args:
        field_name: Only select uploads that were sent as a specific field.
        populate_post: Add the non blob fields to request.POST

    Returns:
        A list of BlobInfo records corresponding to each upload.
        Empty list if there are no blob-info records for field_name.
    """
    if hasattr(request, '__uploads') == False:
        request.META['wsgi.input'].seek(0)
        ja = request.META['wsgi.input']
        fields = cgi.FieldStorage(request.META['wsgi.input'], environ=request.META)
        request.__uploads = {}
        if populate_post:
            request.POST = {}
        for key in fields.keys():
            field = fields[key]
            if isinstance(field, cgi.FieldStorage) and 'blob-key' in field.type_options:
                request.__uploads.setdefault(key, []).append(blobstore.parse_blob_info(field))
            elif populate_post:
                if isinstance(field, list):
                    request.POST[key] = [val.value for val in field]
                else:
                    request.POST[key] = field.value
    if field_name:
        try:
            return list(request.__uploads[field_name])
        except KeyError:
            return []
    else:
        results = []
        for uploads in request.__uploads.itervalues():
            results += uploads
        return results
The last function is not mine. I do not remember where I got it three or four years ago. But I think it will help someone.
UPDATE:
You can also use a webapp.WSGIApplication handler and Django at the same time. This allows you to use BlobstoreUploadHandler and BlobstoreDownloadHandler (for video streaming, for example). You only need to add the handler class in main.py and create its WSGI application:
class ServeVideoHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        ...

downloader_handler = webapp.WSGIApplication(
    [('/pathA/pathB/([A-Za-z0-9\-\=_]+)', ServeVideoHandler)], debug=True)
And in app.yaml, add the handler before the script main.application that contains your Django app:
- url: /pathA/pathB/(.+)
  script: main.downloader_handler
The key info isn't directly in the file, it's in file.blobstore_info.key()
Post your form containing your image to a URL created using blobstore.create_upload_url():
from google.appengine.ext import blobstore
upload_url = blobstore.create_upload_url('/add_image/')
The created URL will save the image in the blobstore and redirect the request (with a modified file object) to /add_image/.
Define a URL pattern and view for /add_image/ and handle the image:
def add_action_image(request):
    image = request.data.get('image')
    image_key = image.blobstore_info.key()
    # ... additional logic to save a record with the image_key ...
As you noted, BlobstoreUploadHandler is open source, so you can see the logic they use to parse the key out of the request params. Note that request.params just includes variables from both the query string and the request body (for POST requests). So you might want to start with your Django request's REQUEST object.
I have a view to which I am trying to submit multiple ajax uploads via raw post data (e.g. via an octet-stream). These requests are submitted one after the other so that they process in parallel. The problem is that django thinks that only the last request is valid. For example, if I submit 5 files, the first four give:
Upload a valid image. The file you uploaded was either not an image or a corrupted image.
I'm guessing this occurs because somehow the requests overlap? And so the image isn't completely loaded before the form attempts to validate it?
And the last one works fine.
My upload view:
def upload(request):
    form = UploadImageForm(request.POST, request.FILES)
    print form
    if form.is_valid():
        # ..process image..
And my upload image form:
class UploadImageForm(forms.Form):
    upload = forms.ImageField()
To submit the requests I'm using the html5uploader js pretty much right out of the box.
On a different note, have you tried https://github.com/blueimp/jQuery-File-Upload/ ? It is a pretty good non-Flash-based file uploader with a progress bar.