How to consume a message from Celery in Django

I have a few questions about Celery.
1. Celery involves a producer and a consumer.
Is a Celery task the producer?
What is the consumer?
2. I call a task to send a message. How can I consume that message somewhere else?
I have read the Celery and RabbitMQ docs. I want to develop a message center with Django.
A message center is where a user can receive messages from other users and from the system. How should I design this?

This is not the right approach.
Celery is used to queue/distribute messages which are consumed. Once a message is consumed, it's gone forever.
An example of this is sending documents to a set of printers. Documents are put on the queue, and each printer consumes from the queue whenever it is available to print. Once a document is printed, the printer "acknowledges" it, which removes it from the queue permanently. If a printer fails to print for some reason (for example, it runs out of ink), it tells Celery that it was unable to process the document, and the document is made available for a different printer to process.
Think of Celery as a queue/flow system. Using it for messages might make sense if you have multiple servers and need to route messages to the appropriate one.
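The printer example can be modelled in a few lines to show why a consumed message is gone. This is a toy model in plain Python, not Celery's API; `dispatch` and the `(name, has_ink)` printer pairs are hypothetical:

```python
from collections import deque


def dispatch(documents, printers):
    """Toy model of the printer example: each document is consumed exactly once.

    printers: list of (name, has_ink) pairs that take turns pulling from the queue.
    """
    if not any(has_ink for _, has_ink in printers):
        raise ValueError("no printer can print; documents would be requeued forever")
    queue = deque(documents)
    printed = []
    turn = 0
    while queue:
        name, has_ink = printers[turn % len(printers)]
        turn += 1
        doc = queue.popleft()  # the printer takes the next document off the queue
        if has_ink:
            printed.append((doc, name))  # acknowledged: gone from the queue for good
        else:
            queue.appendleft(doc)  # rejected: put back so another printer can take it
    return printed
```

After `dispatch` returns, the queue is empty: nothing is left to read a second time, which is exactly what a message center needs to avoid.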
In your case, you want a database table of messages with a fromId, toId, message, date, etc.
That way, a user can see the message more than once.
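A minimal sketch of such a table, using the standard-library `sqlite3` module only so that it runs standalone; in Django this would be a model with foreign keys to the user table. The column names and the convention that `from_id = 0` means "system message" are assumptions for illustration:

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE message (
               id INTEGER PRIMARY KEY,
               from_id INTEGER NOT NULL,  -- hypothetical: 0 = system
               to_id INTEGER NOT NULL,
               body TEXT NOT NULL,
               sent_at TEXT NOT NULL
           )"""
    )
    return conn


def send_message(conn, from_id, to_id, body):
    # Storing a row instead of publishing to a queue: nothing is consumed on read.
    conn.execute(
        "INSERT INTO message (from_id, to_id, body, sent_at) VALUES (?, ?, ?, ?)",
        (from_id, to_id, body, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def inbox(conn, user_id):
    # Reading is a plain SELECT, so the same messages can be shown again and again.
    rows = conn.execute(
        "SELECT from_id, body FROM message WHERE to_id = ? ORDER BY id", (user_id,)
    )
    return rows.fetchall()
```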


How do confirm_publish and acks_late compare in Celery?

I've noticed the following in the docs:
Ideally task functions should be idempotent: meaning the function won’t cause unintended effects even if called multiple times with the same arguments. Since the worker cannot detect if your tasks are idempotent, the default behavior is to acknowledge the message in advance, just before it’s executed, so that a task invocation that already started is never executed again.
If your task is idempotent you can set the acks_late option to have the worker acknowledge the message after the task returns instead. See also the FAQ entry Should I use retry or acks_late?.
If set to True messages for this task will be acknowledged after the task has been executed, not just before (the default behavior).
Note: This means the task may be executed multiple times should the worker crash in the middle of execution. Make sure your tasks are idempotent.
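The difference between the default (early) acknowledgement and `acks_late` can be seen in a toy model. This is a sketch in plain Python, not Celery's implementation: `Broker` and `run_worker` are hypothetical, and a "crash" simply makes the broker requeue whatever the worker had not yet acknowledged, which is what RabbitMQ does when a consumer's connection drops:

```python
from collections import deque


class Broker:
    """Tiny in-memory broker: delivered messages stay unacked until acked."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}
        self._tag = 0

    def deliver(self):
        self._tag += 1
        self.unacked[self._tag] = self.ready.popleft()
        return self._tag, self.unacked[self._tag]

    def ack(self, tag):
        del self.unacked[tag]

    def consumer_lost(self):
        # Requeue (at the front, in order) everything the dead consumer never acked.
        self.ready.extendleft(reversed(list(self.unacked.values())))
        self.unacked.clear()


def run_worker(tasks, acks_late, crash_on):
    """Return (started, completed) task lists; the worker dies once on `crash_on`."""
    broker = Broker(tasks)
    started, completed = [], []
    while broker.ready:
        tag, task = broker.deliver()
        if not acks_late:
            broker.ack(tag)  # default: acknowledge just before executing
        started.append(task)
        if task == crash_on:
            broker.consumer_lost()  # worker crashes in the middle of execution
            crash_on = None  # the restarted worker does not crash again
            continue
        completed.append(task)
        if acks_late:
            broker.ack(tag)  # acks_late: acknowledge only after the task returns
    return started, completed
```

With early acknowledgement the interrupted task `b` is lost; with `acks_late` it is redelivered and started twice, which is why the docs insist on idempotent tasks.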
Then there is the BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True} option found here. I could not find official documentation for that.
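Both options from this question can be set side by side in one Celery configuration module. A minimal sketch, assuming Celery 4 or later (which uses lowercase setting names, so `BROKER_TRANSPORT_OPTIONS` becomes `broker_transport_options`) and RabbitMQ with the py-amqp transport, to which `confirm_publish` is handed as a transport option; check both against your versions' documentation:

```python
# celeryconfig.py: a plain settings module, loaded with
# app.config_from_object("celeryconfig").

# Publisher side: ask the broker to confirm each published message
# (RabbitMQ publisher confirms); relevant to the py-amqp transport.
broker_transport_options = {"confirm_publish": True}

# Worker side: acknowledge a task message only after the task has returned,
# so that if the worker dies before acknowledging, the broker can redeliver it.
# Tasks must then be idempotent.
task_acks_late = True
```

The two settings cover different hops: `confirm_publish` concerns the application publishing to the broker, `task_acks_late` concerns the worker consuming from it.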
I want to be certain that tasks which are submitted to celery (1) arrive at celery and (2) eventually get executed.
Here is how I think it works:
Celery stores the information about which tasks should be executed in a broker (typically RabbitMQ or Redis).
The application (e.g. Django) submits a task to Celery, which immediately stores it in the broker. confirm_publish confirms that it was added (right?). If confirm_publish is set but the confirmation is missing, it retries (right?).
Celery takes messages from the broker, so Celery now acts as a consumer of the broker. The consumer acknowledges (confirms) that it received a message, and the broker stores this information. If the consumer doesn't send an acknowledgement, the broker will try to deliver the message again.
Is that correct?

Another reliable way to synchronize PUSH/PULL in ZeroMQ

If you're using PUSH sockets, you'll find that the first PULL socket to connect will grab an unfair share of messages. The accurate rotation of messages only happens when all PULL sockets are successfully connected, which can take some milliseconds. As an alternative to PUSH/PULL, for lower data rates, consider using ROUTER/DEALER and the load balancing pattern.
So one way to synchronize PUSH/PULL is to use the load balancing pattern.
For the specific case below, I wonder whether there is another way to synchronize:
I could make each worker's PULL socket block until its connection is successfully set up, and then have the worker send a special message to the 'sink' through its PUSH socket. After the sink has received one special message from each worker, it sends a message over REQ-REP to the 'ventilator' to notify it that all workers are ready, and the ventilator starts distributing jobs to the workers.
Is it reliable?
The picture is from here
Yes, as long as the Sink knows how many Workers to wait for before telling the Ventilator that it's OK to start sending messages. There's the question of whether the special messages from the Workers get through if the Workers start up before the Sink connects, but you could solve that by having them keep sending their special message until they start getting data from the Ventilator. If you do this, the Sink would simply ignore any duplicates it receives.
Of course, that's not quite the same as the Workers having a live, working connection to the Ventilator, but the Ventilator could itself send out special do-nothing messages for the Workers to receive. When a Worker receives one of those, it can start sending its special message to the Sink.
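The handshake described in this answer can be modelled without ZeroMQ to check its logic. A sketch in plain Python, assuming threads and `queue.Queue` objects as stand-ins for the sockets (a shared `Queue` is not a fair round-robin PUSH socket) and a `threading.Event` in place of the sink-to-ventilator REQ-REP message; all function names are hypothetical:

```python
import queue
import threading

STOP = object()  # end-of-work marker sent by the ventilator


def worker(wid, jobs, to_sink):
    # Keep announcing readiness until the first item arrives from the ventilator,
    # since the sink may not have been listening when the first READY was sent.
    while True:
        to_sink.put(("READY", wid))
        try:
            item = jobs.get(timeout=0.01)
            break
        except queue.Empty:
            continue
    while item is not STOP:
        to_sink.put(("RESULT", wid, item * item))
        item = jobs.get()


def sink(to_sink, num_workers, all_ready, num_jobs, results):
    ready = set()
    while len(ready) < num_workers:
        kind, wid, *_ = to_sink.get()
        if kind == "READY":
            ready.add(wid)  # a set silently ignores duplicate READY messages
    all_ready.set()  # stand-in for the REQ-REP "all workers ready" message
    while len(results) < num_jobs:
        msg = to_sink.get()
        if msg[0] == "RESULT":  # late READY duplicates are dropped here
            results.append(msg[2])


def ventilator(jobs, all_ready, job_list, num_workers):
    all_ready.wait()  # do not distribute anything until the sink says so
    for job in job_list:
        jobs.put(job)
    for _ in range(num_workers):
        jobs.put(STOP)


def run(num_workers, job_list):
    jobs, to_sink = queue.Queue(), queue.Queue()
    all_ready = threading.Event()
    results, job_list = [], list(job_list)
    threads = [threading.Thread(target=worker, args=(i, jobs, to_sink))
               for i in range(num_workers)]
    threads.append(threading.Thread(
        target=sink, args=(to_sink, num_workers, all_ready, len(job_list), results)))
    threads.append(threading.Thread(
        target=ventilator, args=(jobs, all_ready, job_list, num_workers)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

The important parts are the sink counting distinct workers rather than messages, and the workers resending until data flows; with real sockets the same loop structure would apply.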

Amazon Simple Queue Service (SQS): sending and confirmation

An Elastic Beanstalk environment has a few web servers that send requests to a separate worker environment for user sign-up, etc.
When one of the worker machines has finished a task, I also want it to send a confirmation to the web server.
It seems that SQS does not have a "confirmation" message.
For example, when we offload a job that sends an email, I also want to let the web server know that the email was sent successfully.
One solution would be to implement another queue that the web servers poll. However, many servers can poll the same queue, so the confirmation meant for Server 1 can be received by Server 2; we would then need to wait for the message's timeout, but then Server 3 might intercept the message. It could take a while for Server 1 to get its confirmation.
The way your worker machines "confirm" they processed a message is by deleting it from the queue. The lifecycle of a queue message is:
1. Web server sends a message with a request.
2. Worker receives the message.
3. Worker processes the message successfully.
4. Worker deletes the message.
A message being deleted is the confirmation that it was processed successfully.
If the worker does not delete the message in step 4, the message will reappear in the queue after a specified timeout (the queue's visibility timeout), and the next worker to check for messages will get it and process it.
If you are concerned that a message might be impossible to process and keep reappearing in the queue forever, you can set up a "dead letter" queue. After a message has been received a specified number of times but never deleted, the message is transferred to the dead letter queue, where you can have a process for dealing with these unusual situations.
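The lifecycle and the dead letter behaviour can be modelled in a few lines. This is an in-memory simulation, not the boto3 API: `SimulatedQueue` and its methods are hypothetical, time is passed in explicitly, and a single id stands in for SQS's separate message ID and receipt handle. In real SQS the corresponding settings are the `VisibilityTimeout` queue attribute and the `maxReceiveCount` field of the `RedrivePolicy`:

```python
import itertools


class SimulatedQueue:
    """In-memory model of the SQS message lifecycle; not the boto3 API."""

    def __init__(self, visibility_timeout, max_receive_count, dead_letter):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.dead_letter = dead_letter  # plain list standing in for the dead letter queue
        self._messages = {}  # id -> {"body", "visible_at", "receives"}
        self._ids = itertools.count()

    def send(self, body):
        self._messages[next(self._ids)] = {"body": body, "visible_at": 0, "receives": 0}

    def receive(self, now):
        """Return (id, body) for one visible message, or None."""
        for msg_id, msg in list(self._messages.items()):
            if msg["visible_at"] > now:
                continue  # still hidden: a worker received it recently
            if msg["receives"] >= self.max_receive_count:
                # Received the maximum number of times but never deleted:
                # move it to the dead letter queue instead of handing it out again.
                del self._messages[msg_id]
                self.dead_letter.append(msg["body"])
                continue
            msg["receives"] += 1
            msg["visible_at"] = now + self.visibility_timeout  # hide it while processed
            return msg_id, msg["body"]
        return None

    def delete(self, msg_id):
        """Deleting the message is the worker's confirmation of success."""
        self._messages.pop(msg_id, None)
```

A message that is deleted after processing never comes back; one that is never deleted reappears after each timeout until it has been received the maximum number of times, and then lands in the dead letter list.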
Amazon's Simple Queue Service Developer Guide has a good, clear explanation of this lifecycle and how you can be sure that every message is processed or moved to a dead letter queue for special handling.

How to detect stale workers (or auto-restart)

We recently ran into a nasty situation with the Celery framework. There were a lot of messages in the queue, but those messages weren't being processed. We restarted Celery and the messages started being processed again. However, we do not want this to happen again and are looking for a permanent solution.
It appears that Celery's workers had gone stale. The Celery documentation notes the following about stale workers:
This shows that there’s 2891 messages waiting to be processed in the task queue, and there are two consumers processing them.
One reason that the queue is never emptied could be that you have a stale worker process taking the messages hostage. This could happen if the worker wasn’t properly shut down.
When a message is received by a worker the broker waits for it to be acknowledged before marking the message as processed. The broker will not re-send that message to another consumer until the consumer is shut down properly.
If you hit this problem you have to kill all workers manually and restart them
See documentation
However, this relies on checking for stale workers manually, which leaves lots of room for error and costs manual labor. What would be a good way to keep Celery working?
You could use supervisor or a similar tool to run the workers; refer to Running the worker as a daemon.
Moreover, assuming you are using RabbitMQ, you could monitor the queue status with rabbitmq-management to check whether the queue becomes too large; Celery's monitoring tools also provide some mechanisms for this.
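One way to reduce the manual checking is a watchdog that compares periodic samples. A minimal sketch of only the detection rule, assuming you already sample two numbers on a schedule: the queue depth (for example from rabbitmq-management) and a running total of processed tasks. How you collect them and how you restart the workers (e.g. through supervisor) are left out, and `StallDetector` is a hypothetical name:

```python
from collections import deque


class StallDetector:
    """Report a stall when the queue stayed non-empty across the last
    window + 1 samples while the processed-task total did not change."""

    def __init__(self, window):
        self.window = window
        self.samples = deque(maxlen=window + 1)

    def observe(self, queue_length, processed_total):
        self.samples.append((queue_length, processed_total))
        if len(self.samples) <= self.window:
            return False  # not enough history yet
        backlog = all(length > 0 for length, _ in self.samples)
        # processed_total only grows, so equal endpoints mean no progress at all.
        no_progress = self.samples[0][1] == self.samples[-1][1]
        return backlog and no_progress
```

A backlog alone is not treated as a problem, and an idle, empty queue is not either; only "work is waiting and nothing is being finished" triggers, which is the stale-worker symptom described in the question.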

Django Celery FIFO

So I have these two applications connected through a REST API (JSON messages). One is written in Django and the other in PHP. I have an exact database replica on both sides (using MySQL).
When I press "submit" in one of them, I want that data to be saved in the current app's database, and to start a background job with Celery/Redis that updates the other app's remote database through the REST API.
My question is: how do I assign the same worker to my tasks in order to keep a FIFO order?
I need my data to be consistent, and FIFO is really important.
OK, I am going to describe what I want to do in a little more detail:
So I have this Django app, and when I press submit after filling in the form, my Celery worker wakes up and takes care of posting the submitted data to a remote server. This I can do without problems.
Now, imagine that my internet connection goes down at that exact moment: my Celery worker keeps retrying until it succeeds. But if I do another submit before my previous data has been sent, my data won't be consistent on the remote server.
That is my problem. I am not able to make these requests FIFO with the retry option Celery provides, so that's where I need some help.
This is the answer I got from another forum:
Use named queues with Celery:
Start a worker process with a single worker:
Set this worker to consume from the appropriate queue:
For the FIFO part, I can sort the tasks in my Celery broker into FIFO order before sending my requests.
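The underlying requirement is that a failed request blocks the ones behind it. Celery's retry re-publishes the task as a new message, so the worker is free to run later tasks first. A sketch of the ordering rule in plain Python, assuming a single consumer such as the one-process worker on the named queue described above; `drain_in_order` is a hypothetical helper, and `ConnectionError` stands in for whatever your HTTP client raises:

```python
import time
from collections import deque


def drain_in_order(pending, send, max_attempts=5, delay=0.0, sleep=time.sleep):
    """Send items strictly in FIFO order, retrying the head item until it
    succeeds before touching the next one. Returns the items sent."""
    sent = []
    while pending:
        item = pending[0]  # peek only: the item stays queued until it is sent
        for attempt in range(1, max_attempts + 1):
            try:
                send(item)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up with the item still at the head, order intact
                sleep(delay * attempt)  # linear backoff between attempts
        pending.popleft()
        sent.append(item)
    return sent
```

Because nothing behind the head item is sent while it is failing, a later submit can never overtake an earlier one; the price is that one stuck request holds up everything queued after it.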