Django Celery FIFO - django

So I have these two applications connected via a REST API (JSON messages). One is written in Django and the other in PHP. I have an exact database replica on both sides (using MySQL).
When I press "submit" on one of them, I want that data to be saved in the current app's database, and a background job started with Celery/Redis to update the remote database for the other app over REST.
My question is: how do I assign the same worker to all my tasks in order to keep FIFO order?
I need my data to be consistent, and FIFO is really important.
OK, I'll detail what I want to do a little further:
So I have this Django app, and when I press submit after filling in the form, my Celery worker wakes up, takes the submitted data, and posts it to a remote server. This I can do without problems.
Now, imagine that my internet goes down at that exact moment: my Celery worker keeps retrying until it succeeds. But if I do another submit before the previous data has gone through, my data won't be consistent on the remote server.
That is my problem: I am not able to keep these requests FIFO with the retry option Celery gives me, so that's where I need some help figuring it out.

This is the answer I got from another forum:
Use named queues with celery:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues
Start a worker process with a single worker:
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html#starting-the-worker-process
Set this worker to consume from the appropriate queue:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues-adding-consumers
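Putting those three steps together, a rough sketch (the names "proj" and "sync", the broker URL, and the remote endpoint are placeholders, not taken from the question; task_routes is the newer spelling of the older CELERY_ROUTES setting):

```python
# tasks.py -- a rough sketch; "proj", "sync" and the URL are placeholder names.
import time

import requests
from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')

# Route the sync task to its own named queue so only the dedicated worker picks it up.
# Adjust the dotted task name to wherever the task actually lives in your project.
app.conf.task_routes = {'proj.tasks.sync_remote': {'queue': 'sync'}}


@app.task
def sync_remote(payload):
    """POST one submitted record to the other app's REST API."""
    # Retrying inside the task body (instead of self.retry()) matters for FIFO:
    # a Celery retry re-enqueues the task, which would let a newer submission
    # overtake it. Because the "sync" queue is served by a single worker
    # process, blocking here keeps later submissions waiting behind this one.
    while True:
        try:
            resp = requests.post('https://remote.example.com/api/records/',
                                 json=payload, timeout=10)
            resp.raise_for_status()
            return resp.status_code
        except requests.RequestException:
            time.sleep(30)  # remote unreachable; wait and try again

# Start one worker with a single process, consuming only from that queue:
#   celery -A proj worker -Q sync --concurrency=1
```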
For the FIFO part, I can sort my Celery broker into FIFO order before sending my requests.

Related

How to record all task information with Django and Celery?

In my Django project I'm using Celery with a RabbitMQ broker for asynchronous tasks. How can I record the information of all my tasks (e.g. creation time (when the task appears in the queue), the time a worker consumes the task, execution time, status, ...) to monitor how Celery is doing?
I know there are solutions like Flower, but that seems too much for what I need. django-celery-results looks like what I want, but it's missing some information I need, like task creation time.
Thanks!
It seems like you often find the answer yourself after asking on SO. I settled on using Celery signals to do all the recording I want and store the results in a database table.
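For reference, a rough sketch of the signal approach, assuming a hypothetical TaskRecord bookkeeping model you define yourself (the model, its fields, and the app name are all illustrative):

```python
# signals.py -- a rough sketch; TaskRecord and its fields are hypothetical.
# Import this module at startup (e.g. from your AppConfig) so the handlers connect.
from celery.signals import before_task_publish, task_prerun, task_postrun
from django.utils import timezone

from myapp.models import TaskRecord  # your own bookkeeping model


@before_task_publish.connect
def task_queued(sender=None, headers=None, **kwargs):
    # Runs in the process that sends the task, i.e. when it enters the queue
    # (with the default task message protocol the task id lives in the headers).
    TaskRecord.objects.create(task_id=headers['id'], name=sender,
                              created_at=timezone.now())


@task_prerun.connect
def task_started(task_id=None, **kwargs):
    # Runs in the worker just before execution.
    TaskRecord.objects.filter(task_id=task_id).update(started_at=timezone.now())


@task_postrun.connect
def task_finished(task_id=None, state=None, **kwargs):
    # Runs in the worker after execution, with the final state.
    TaskRecord.objects.filter(task_id=task_id).update(
        finished_at=timezone.now(), status=state)
```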

Control the consumption speed of a Celery worker

I am trying to set up Celery with my Django app. This app lets users call an external API and get the response. However, this API has a limitation: a user may call it no more than once per second.
So I am planning to use Celery to control the calling frequency. Multiple users might use this app at the same time, so I think I should set up the calling part as a worker: everybody submits requests to call the API into a queue, and the worker consumes them.
Here is the question: how can I limit the worker to consuming tasks (i.e. sending the API request) at one per second?
Thanks!
Celery tasks have a rate_limit option that should do what you want. This is per-worker, so you'll need to use a dedicated queue to ensure that all requests for this task go to the one worker.
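A rough sketch of that combination (the task name, queue name, and endpoint are illustrative):

```python
# A rough sketch; call_external_api, the queue name and the endpoint are illustrative.
import requests
from celery import Celery

app = Celery('proj', broker='amqp://localhost')

# Send this task to its own queue so only one dedicated worker handles it,
# which makes the per-worker rate limit behave like a global one.
app.conf.task_routes = {'proj.tasks.call_external_api': {'queue': 'api_calls'}}


@app.task(rate_limit='1/s')  # this worker starts at most one of these per second
def call_external_api(params):
    resp = requests.get('https://api.example.com/endpoint', params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Run that dedicated worker with:
#   celery -A proj worker -Q api_calls --concurrency=1
```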

How to consume a message from Celery

I have a few questions about Celery.
1. Celery contains a producer and a consumer. Is the task in Celery equal to the producer? What is the consumer?
2. I call a task to send a message. How can I consume that message somewhere else?
I have read the docs of Celery and RabbitMQ. I want to develop a message center with Django.
A message center is where a user can receive messages from other users and from the system. How can I design this?
This is not the right approach.
Celery is used to queue/distribute messages, which are consumed. Once a message is consumed, it's gone forever.
An example of this is sending documents to a set of printers. Documents are put on the queue. Each printer consumes from the queue when it's available to print. Once it's printed, it "acknowledges" the document, which removes it from the queue permanently. If a printer fails to print for some reason (it runs out of ink, say), it tells Celery it was unable to process the document. The document is then made available for a different printer to process.
Think of Celery as a queue/flow system. Using it for messages might make sense if you've got multiple servers and need to route messages to the appropriate one.
In your case, you want a database table of messages with a fromId, toId, message, date, etc...
That way, a user can see the message more than once.
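A rough sketch of such a table as a Django model (the model and field names are illustrative):

```python
# models.py -- a rough sketch; the model and field names are illustrative.
from django.conf import settings
from django.db import models


class Message(models.Model):
    from_user = models.ForeignKey(settings.AUTH_USER_MODEL,
                                  related_name='sent_messages',
                                  on_delete=models.CASCADE)
    to_user = models.ForeignKey(settings.AUTH_USER_MODEL,
                                related_name='received_messages',
                                on_delete=models.CASCADE)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)
    read_at = models.DateTimeField(null=True, blank=True)

    # Unlike a consumed queue message, a row stays around, so the user can
    # open their message center and see the same message as often as they like.
```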

Django Celery RabbitMQ execution delay

I use Django-Celery + RabbitMQ to execute some async tasks. I define a queue 'sendmail' to execute the send-email task; sending mail is triggered by a specific task (which has its own queue). But now I encounter a problem: after the specific task finishes, the mail sometimes goes out at once and sometimes takes 5-20 minutes. I want to know what causes this.
Django-Celery packages the task name and parameters into a message for RabbitMQ when task.delay() is called.
I want to know when the message reaches RabbitMQ, but with the web management tool I can only see the total number of messages, not each message's details, in particular the time it arrived. In the Django-Celery log I can only see when the worker got the task from the broker and when it executed it. I want to know all the related timepoints so I can tell which step consumes most of the time.
Django-Celery does (I believe) report task data on a per-task basis. When you sync your database, it creates a bunch of monitoring tables which are accessible via the admin. However, in order for these tasks to be recorded in those tables, you need to run the celerycam program in the Django context (python ./manage.py celerycam). The celerycam program will take "snapshots" of your tasks every second or so (by default) and record information about them. Another useful tool for monitoring is the celerymon program (which also has to run in the Django context). This is a command-line ncurses program that reports real-time information about tasks as they occur. Finally, rabbitmqctl has a bunch of options that might help with monitoring.
This is a particularly useful page in the docs:
http://celery.github.com/celery/userguide/monitoring.html
Anyway, this is what I use to monitor my tasks when using celery.

Simulating Google App Engine's Task Queue with Gearman

One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks always execute the most current version of the code. Conversely, my Gearman workers all run code within my Django project, so when I push a new version live I have to kill off the old worker and run a new one so that it uses the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by url just like the google app engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a gearman task to POST to the specified URL.
The Gearman server finds a worker and passes the URL and POST data to it.
The worker simply POSTs to the URL with the data, thus executing the task.
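A rough sketch of such a generic URL-posting worker, assuming the python-gearman client library (the task name, server address, and signing header are illustrative):

```python
# worker.py -- a rough sketch assuming the python-gearman library;
# the task name, server address and signature header are illustrative.
import json

import gearman
import requests


def post_to_url(gearman_worker, gearman_job):
    # The job payload only carries a URL, the POST data and a signature,
    # so the worker never needs the application code and can keep running
    # across deploys.
    job = json.loads(gearman_job.data)
    resp = requests.post(job['url'], data=job['data'],
                         headers={'X-Task-Signature': job['signature']},
                         timeout=10)  # tasks are assumed to finish well under 10s
    return str(resp.status_code)


worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('post_to_url', post_to_url)
worker.work()

# Enqueue from the web process with something like:
#   client = gearman.GearmanClient(['localhost:4730'])
#   client.submit_job('post_to_url',
#                     json.dumps({'url': url, 'data': data, 'signature': sig}),
#                     background=True)
```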
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (There would be no long tasks that could timeout)
What are the potential pitfalls of such an approach? Here's one that worries me:
The server can potentially get hammered with many requests all at once that are triggered by a previous request. So one user request might entail 10 concurrent http requests. I suppose I could have a single worker with a sleep before every request to rate-limit.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened, and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers can ultimately be Python functions. You can trigger workers using either queue messages or, perhaps more pertinent to what you're describing, webhooks.
Check out the Celery webhook documentation. Using all these tools you can build a production-ready distributed task queue that implements your requirements above.
I should also mention that, with regard to your first pitfall, Celery implements rate-limiting of tasks using a token bucket algorithm.
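For illustration, a rough hand-rolled sketch of that webhook-style dispatch with a rate limit (the task name, rate limit value, and signing header are illustrative, not Celery's built-in webhook API):

```python
# A rough hand-rolled sketch of webhook-style dispatch; the names, the rate
# limit and the signing header are illustrative.
import requests
from celery import Celery

app = Celery('dispatch', broker='amqp://localhost')


@app.task(bind=True, rate_limit='10/s', max_retries=3, default_retry_delay=5)
def post_to_url(self, url, data, signature):
    # The worker only knows how to POST, so deploying new application code
    # behind the URLs never requires restarting workers.
    try:
        resp = requests.post(url, data=data,
                             headers={'X-Task-Signature': signature},
                             timeout=10)
        resp.raise_for_status()
        return resp.status_code
    except requests.RequestException as exc:
        # Back off and retry; the rate_limit above caps how fast a worker can
        # fire these requests in the first place.
        raise self.retry(exc=exc)
```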