I am not familiar with Django-Celery, so before going deeper into the docs I would like to know if it is the right tool for what I need to do.
My Django app has a web service for tiling map images that is called like this: http://host.com/tiling/x/y/z.png
x, y and z are integer variables that are used by the tiling function to compute the output.
My question is: can Django-Celery create workers to run this tiling function in parallel when repeated requests are detected?
For instance, 10 or more requests could be sent by a user at a time: http://host.com/tiling/0/1/1.png, http://host.com/tiling/1/0/1.png etc...
Can Django-Celery create workers for each of them in parallel instead of computing each request one by one? What are the requirements on the server side? Do I need something like NGINX, Gunicorn, WSGI or CGI? I am confused about those things...
In most cases Celery is used for asynchronous task handling, but it also works for concurrent tasks!
By default Celery uses multiprocessing, but you can also use Eventlet, a concurrent networking library for Python.
References:
http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#concurrency-eventlet
http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency
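As a rough sketch (the module, task and broker names below are placeholders, not from your project), your tiling function could be wrapped in a task and the worker started with whichever pool you prefer:
# tasks.py -- hypothetical module; render_tile stands in for your real tiling code
from celery import Celery

app = Celery('tiling', broker='redis://localhost:6379/0')

@app.task
def render_tile(x, y, z):
    # placeholder body: the real code would compute and return the tile's PNG bytes
    return {'x': x, 'y': y, 'z': z}

# Start a worker with the default prefork (multiprocessing) pool:
#   celery -A tasks worker --loglevel=info
# or with the Eventlet pool for many concurrent green threads:
#   celery -A tasks worker --pool=eventlet --concurrency=100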
So I'm building a Django app on Heroku. The tasks that the app performs frequently run longer than 30 seconds, and therefore I'm running into Heroku's 30-second request timeout. I first tried solving it by submitting the task from my Django view to AWS Lambda, but in that case the view waits for the AWS Lambda function to finish, so it doesn't solve my problem.
I have already read Heroku's tutorials on handling background tasks with Django. I'm now faced with a few different options on how to proceed, and would love to get outside input on which one makes the most sense:
Use Celery & Redis to manage the background tasks, and let the tasks be executed on AWS Lambda.
Use Celery & Redis to manage the background tasks, but let the tasks be executed in a Python script on Heroku.
Solve it with asyncio in order to keep it leaner (not sure whether that specific case could be solved with asyncio, though?).
Maybe there's an even better solution that I don't see?
Looking forward to any input/suggestions!
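To make option 1 concrete, this is the kind of hand-off I have in mind (a sketch only, assuming boto3; the Lambda function name and region are placeholders):
# tasks.py -- illustrative sketch, not my actual code
import json

import boto3
from celery import shared_task

@shared_task
def run_on_lambda(payload):
    client = boto3.client('lambda', region_name='us-east-1')
    # InvocationType='Event' is fire-and-forget, so the Celery worker is not blocked either
    client.invoke(
        FunctionName='my-long-running-task',
        InvocationType='Event',
        Payload=json.dumps(payload),
    )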
In my Django project I'm using Celery with a RabbitMQ broker for asynchronous tasks. How can I record information about all of my tasks (e.g. creation time (when the task appears in the queue), the time a worker consumes the task, execution time, status, ...) to monitor how Celery is doing?
I know there are solutions like Flower, but that seems like too much for what I need. django-celery-results looks like what I want, but it's missing some information I need, such as the task creation time.
Thanks!
It seems like you often find the answer yourself after asking on SO. I settled on using Celery signals to do all the recording I want and store the results in a database table.
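Roughly, that looks like this (TaskRecord is an assumed Django model of mine, not an existing package, and the id lookup in before_task_publish assumes the default task protocol):
# celery_recording.py -- sketch of the signal handlers
from celery.signals import before_task_publish, task_prerun, task_postrun
from django.utils import timezone

from myapp.models import TaskRecord  # assumed model: task_id, name, timestamps, status

@before_task_publish.connect
def record_created(sender=None, headers=None, **kwargs):
    # runs in the process that enqueues the task -> "created" time
    TaskRecord.objects.create(task_id=headers['id'], name=sender, created_at=timezone.now())

@task_prerun.connect
def record_started(task_id=None, **kwargs):
    # runs in the worker just before execution -> "consumed" time
    TaskRecord.objects.filter(task_id=task_id).update(started_at=timezone.now())

@task_postrun.connect
def record_finished(task_id=None, state=None, **kwargs):
    # runs in the worker after execution -> finish time and final status
    TaskRecord.objects.filter(task_id=task_id).update(finished_at=timezone.now(), status=state)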
I use Django and Django Rest Framework for my internal API and I use Vue.js for my frontend. The backend (API) and the frontend are totally separated.
I need to run a background task (every time a user is created) and I am considering 2 solutions:
Call (with a post_save signal) a function that runs the task.
Note that this function will call a 3rd party API. The call might fail for various reasons and/or take a long time (~20 seconds).
Create a background task
With Redis or RabbitMQ or django-background-tasks.
Which solution should I go for?
If both solutions are acceptable, what would be the limitations/advantages of each one?
You might need Django Celery. It is a great package for background tasks in Django; you can choose either Redis or RabbitMQ as the broker, and in my opinion the choice of broker doesn't matter much.
Why can this be a good solution for your problem?
It is easy to set up: you just install Celery and Redis (I prefer Redis), configure some settings, and you now have async functions.
If you soon need scheduled tasks, you just need to install an additional package.
You only need to write a plain function and attach a decorator for it to be async:
from celery import shared_task

@shared_task
def add(x, y):
    return x + y
and call it anywhere in your code:
add.delay(2, 3)
You now have a background task.
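For your user-creation case, the call site could be a post_save receiver like this (a sketch; replace add with the task that wraps your 3rd-party API call):
# signals.py -- example wiring, names are arbitrary
from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

from .tasks import add  # in practice, the task that calls the 3rd-party API

@receiver(post_save, sender=User)
def on_user_created(sender, instance, created, **kwargs):
    if created:
        # .delay() only enqueues the job, so the request returns immediately
        add.delay(2, 3)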
So I'm trying to accomplish the following. A user browses a webpage and at the same time there is a task running in the background. When the task completes, it should return arguments, one of which is the flag True, in order to trigger some JavaScript that shows a modal form.
I tested it before without async tasks and it works, but now with Celery it just stores the results in the database. I did some research on tornado-celery and related tools, but some components like tornado-redis are not maintained anymore, so in my opinion it would not be wise to use them.
So what are my options, thanks?
If I understand you correctly, then you want to communicate something from the server side back to the client. You generally have three options for that:
1) Make a long pending request to the server - kinda bad. Skipping over the details: it will bog down your web server if it's not configured to handle that, it will make your site score low on performance tests, and if the request fails, everything fails.
2) Poll the server with numerous requests at a short time interval (0.2 s, something like that) - better. It will increase the traffic, but the requests will be tiny and will not interfere with the site's performance very much. If you set a longer interval so as not to load the server with pointless requests, then the users will see the data with a bit of a delay. On the upside this will not fail (if written correctly) even if the connection is interrupted.
3) Websockets where the server can just hit the client with any message whenever needed - nice, but takes some time to get used to. If you want to try, you can use django-channels which is a nice library for Django websockets.
If I did not understand you correctly and the actual problem is how to get data back from a Celery task to Django, then you can store the Celery task IDs and use them to first check whether the task is completed and then query the data from Celery.
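As a sketch of that last approach (the view and URL names are made up), a small polling endpoint could look like the following; your JavaScript polls it until ready is true and then opens the modal:
# views.py -- assumes you stored the task id when you enqueued the task
from celery.result import AsyncResult
from django.http import JsonResponse

def task_status(request, task_id):
    result = AsyncResult(task_id)
    payload = {'ready': result.ready()}
    if result.ready():
        # whatever the task returned, e.g. {'flag': True}
        payload['result'] = result.get()
    return JsonResponse(payload)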
One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks are always executing the most current version of the code. Conversely, my gearman workers all run code within my django project -- so when I push a new version live, I have to kill off the old worker and run a new one so that it uses the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by url just like the google app engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a gearman task to POST to the specified URL.
The gearman server finds a worker and passes the URL and POST data to it.
The worker simply posts to the URL with the data, thus executing the task.
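A worker doing step 4 could be as small as the following sketch (using the python-gearman and requests packages; the host, port and task name are placeholders):
# url_worker.py -- illustrative only, not production code
import json

import gearman
import requests

def post_to_url(worker, job):
    # job.data is expected to be JSON like {"url": "...", "payload": {...}}
    task = json.loads(job.data)
    resp = requests.post(task['url'], data=task['payload'], timeout=10)
    return str(resp.status_code)  # python-gearman expects a string result

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('post_to_url', post_to_url)
worker.work()  # blocks and processes jobs until stopped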
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (there would be no long tasks that could time out).
What are the potential pitfalls of such an approach? Here's one that worries me:
The server can potentially get hammered with many requests all at once that are triggered by a previous request. So one user request might entail 10 concurrent http requests. I suppose I could have a single worker with a sleep before every request to rate-limit.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened, and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers ultimately can be Python functions. You can trigger workers using either queue messages or, perhaps more pertinent to what you're describing, using webhooks.
Check out the Celery webhook documentation. Using all these tools you can build a production-ready distributed task queue that implements your requirements above.
I should also mention that, in regard to your first pitfall, Celery implements rate-limiting of tasks using a token bucket algorithm.
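To give a flavour of how those pieces fit together (the broker URL and task name below are placeholders, not from your setup), a webhook-style task with a built-in rate limit could look like this:
# tasks.py -- sketch; RabbitMQ as the broker, requests for the outbound POST
import requests
from celery import Celery

app = Celery('hooks', broker='amqp://guest@localhost//')

@app.task(rate_limit='10/s')  # token bucket: at most 10 executions per second per worker
def hit_webhook(url, payload):
    # the worker, not the web process, makes the outbound request
    return requests.post(url, data=payload, timeout=10).status_code

Enqueueing is then just hit_webhook.delay(url, data) from wherever the user request comes in.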