Create a Django background worker, best practices - django

I have an API in Django that needs to consume a RabbitMQ queue. I'm currently using an AMQPStorm robust consumer, started via a Django management command, like this:
python3 manage.py mycommand
The API itself runs in another container using Gunicorn.
The queue has a high incoming message rate. My concern is whether this is the best practice for running this kind of background process with Django. Is there a better way to run this worker? I ask because my container is shutting down sporadically, without logging anything except this message:
Logs
Killed.
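For reference, a minimal sketch of what such a management command typically looks like with AMQPStorm (the queue name, credentials and message handler below are placeholder assumptions, not from the question):

# myapp/management/commands/mycommand.py -- minimal sketch; connection details,
# queue name and the handler are placeholders.
import amqpstorm
from django.core.management.base import BaseCommand


def on_message(message):
    # Process the message body, then acknowledge it.
    print(message.body)
    message.ack()


class Command(BaseCommand):
    help = "Consume messages from RabbitMQ"

    def handle(self, *args, **options):
        connection = amqpstorm.Connection("rabbitmq-host", "guest", "guest")
        channel = connection.channel()
        channel.queue.declare("my_queue", durable=True)
        # Cap the number of unacknowledged messages held in memory; with a high
        # message rate an unbounded prefetch can get the container OOM-killed
        # ("Killed" with no traceback usually points at the kernel OOM killer).
        channel.basic.qos(prefetch_count=100)
        channel.basic.consume(on_message, "my_queue", no_ack=False)
        channel.start_consuming()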

Related

How can I make celery more robust with regards to dropped tasks?

Occasionally (read: too often) my celery setup will drop tasks. I'm running the latest celery 4.x on Django 1.11 with a redis backend for the queue and results.
I don't know exactly why tasks are being dropped, but what I suspect is that a worker is starting a job, then the worker is killed for some reason (autoscaling action, redeployment, out-of-memory...) and the job is killed in the middle.
At this point it has probably already left the Redis queue, so it won't be picked up again.
So my questions are:
How can I monitor this kind of thing? I use celerymon, and the task is not reported as failed, yet I don't see in my database the data that I expected from the task that I suspect failed.
How can I make celery retry such tasks without implementing my own "fake queue" with flags in the database?
How do I make celery more robust and dependable in general?
Thanks for any pointers!
You should use RabbitMQ instead of Redis. I read this in the Celery documentation (right here: https://docs.celeryproject.org/en/stable/getting-started/first-steps-with-celery.html#choosing-a-broker):
RabbitMQ is feature-complete, stable, durable and easy to install. It’s an
excellent choice for a production environment.
Redis is also feature-complete, but is more susceptible to data loss in the
event of abrupt termination or power failures.
With RabbitMQ, your problem of losing messages on restart should go away.
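Beyond the broker choice, the retry behaviour the question asks about can be approximated with late acknowledgements. A minimal sketch for Celery 4.x (the app name, broker URL and task body are illustrative assumptions, not from the original post):

# celery.py -- late-ack sketch; app name, broker URL and task body are placeholders.
from celery import Celery

app = Celery("proj", broker="amqp://guest:guest@localhost:5672//")

app.conf.update(
    # Acknowledge a message only after the task finishes, so a task killed
    # mid-execution is redelivered instead of being silently lost.
    task_acks_late=True,
    # Also requeue the message if the worker process itself dies (OOM, SIGKILL).
    task_reject_on_worker_lost=True,
)


@app.task(bind=True, max_retries=3, default_retry_delay=60)
def import_record(self, record_id):
    try:
        ...  # do the actual work
    except Exception as exc:
        # Explicit retry for failures raised inside the task itself.
        raise self.retry(exc=exc)

Note that late acknowledgements only help if the tasks are idempotent, since a killed task may end up running more than once.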

Remote Django application sending messages to RabbitMQ

I'm starting to get familiar with the RabbitMQ lingo so I'll try my best to explain. I'll be going into a public beta test in a few weeks and this is the setup I am hoping to achieve. I would like Django to be the producer, producing messages to a remote RabbitMQ box, and another Celery box listening on the RabbitMQ queue for tasks. So in total there would be three boxes: Django, RabbitMQ & Celery. So far, from the Celery docs, I have successfully been able to run Django and Celery together and RabbitMQ on another machine. Django simply calls the task in the view:
add.delay(3, 3)
And the message is sent over to RabbitMQ. RabbitMQ sends it back to the same machine that the task was sent from (since Django and celery share the same box) and celery processes the task.
This is great for development purposes. However, having Django and Celery running on the same box isn't a great idea since both will have to compete for memory and CPU. The whole goal here is to get clients in and out of the HTTP Request cycle and have celery workers process the tasks. But the machine will slow down considerably if it is accepting HTTP requests and also processing tasks.
So I was wondering: is there a way to keep all of this separate from one another? Have Django send the tasks, RabbitMQ forward them, and Celery process them (Producer, Broker, Consumer).
How can I go about doing this? Really simple examples would help!
You need to deploy your application's code on the third machine as well, and on that machine run only the command that starts the worker handling the tasks.
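Concretely, that means pointing both the Django box and the worker box at the remote broker and running only the worker command on the third machine. A hedged sketch (the host, credentials, vhost and project name are placeholders):

# settings.py -- shared by the Django box and the Celery box; the broker URL
# below points at the separate RabbitMQ machine (placeholder values).
BROKER_URL = "amqp://myuser:mypassword@rabbitmq.example.com:5672/myvhost"

# On the Celery box, which has the same project code deployed, start only the
# worker process, e.g. (with current Celery versions):
#   celery -A proj worker --loglevel=info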

Celery and Twitter streaming api with Django

I'm having a really hard time conceptualising how I can connect to the the twitter streaming api and process tweets via an admin interface provided by Django.
The main problem is starting a daemon from Django and having the ability to stop/start it, plus making sure there is provision for monitoring. I don't really want to use upstart for this purpose because I want to try and keep the project as self-contained as possible.
I'm currently attempting the following, and am unsure whether it's the wrong way to go about things:
Start a celery task from Django which establishes a persistent connection to the streaming API
The above task creates subtasks which will process tweets and store them
Because celeryd runs as a daemon it will automatically run the first task again if the connection breaks and the task fails - does this mean I don't need any additional monitoring?
Does the above make sense or have I misunderstood how celery works?
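A rough sketch of what steps 1 and 2 might look like; connect_to_stream() below is a hypothetical helper standing in for an actual streaming client, and the task names are purely illustrative:

# tasks.py -- illustrative sketch only.
from celery import shared_task


def connect_to_stream(track_terms):
    """Hypothetical generator yielding tweets from the streaming API."""
    yield from ()  # replace with a real streaming client (e.g. tweepy)


@shared_task(bind=True, ignore_result=True)
def stream_tweets(self, track_terms):
    # Long-running task holding the persistent streaming connection; each
    # received tweet is handed off to a short subtask for processing/storage.
    for tweet in connect_to_stream(track_terms):
        process_tweet.delay(tweet)


@shared_task
def process_tweet(tweet):
    ...  # parse the tweet and store it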

Getting started with Celery in Django

I'm currently working on a project and I'd like to integrate asynchronous task processing as well as some sort of message queue early on so that I'll be able to scale up quickly by simply adding message queue processor servers to the cluster.
I came across Celery a while back and it caught my eye. Since it's pretty well integrated with Django, I figured I'd get pretty good support with it. I'm just not really sure how to start, as there's a lot of configuration involved.
For now, I'm running just about everything out of my Django project (serving static files, pipeline, etc.) so I'd like to have a messaging queue built in to run with django runserver if possible. (Don't worry, this is only for development.) How can I get started using Celery with my existing Django project?
djkombu is now deprecated; the Django transport is now integrated directly into the kombu package.
For defining the backend in your Django settings.py, you can use:
BROKER_BACKEND = "django"
You can find different transport aliases from Kombu here.
This was tested with django-celery 2.5.5, celery 2.5.3 and kombu 2.1.8.
Celery has quite good documentation, including a getting-started guide, but two facts are worth mentioning for beginners:
Use djkombu as the BROKER_BACKEND. This will give you a pretty simple message queue for development, where all messages are stored in the SQL database used by Django. Thanks to Celery's API you can easily replace it with a "real" message queue for production:
BROKER_TRANSPORT = "kombu.transport.django"
Django-celery has a setting CELERY_ALWAYS_EAGER. If set to True, there is no asynchronous background processing; all tasks called via Celery run synchronously (so there is no need to start any additional Celery workers, which is very useful for debugging as well).
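Putting both points together, a development-only settings sketch (following the old django-celery / kombu versions mentioned above; the names differ on current releases):

# settings.py -- development sketch for the django-celery 2.5 / kombu 2.1 era.
import djcelery
djcelery.setup_loader()

INSTALLED_APPS += (
    "djcelery",
    "kombu.transport.django",  # creates the tables backing the database "broker"
)

# Store task messages in the Django database instead of a separate broker.
BROKER_TRANSPORT = "kombu.transport.django"

# Optional while debugging: run all tasks synchronously, in-process, no workers.
# CELERY_ALWAYS_EAGER = True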

Getting the Twitter stream on Heroku with Django App

I need to constantly monitor a Twitter stream using Heroku. Basically what I want to do is start the monitoring process up and never have it stop. I was looking into Celery, but from my understanding of it, it looks like a user-initiated or short-term process adds tasks to a queue that are then processed by a worker. This is a different model from having a background process constantly monitoring a Twitter stream. What would be the best way to monitor a Twitter stream for a Django app on Heroku?
I'm not aware of anything in Django that can run in the background like that. It's certainly one of the limitations of living in the web-app sandbox.
If you have access to your server in Heroku (?) you could write your own script/application along the lines of this tutorial and daemonize using Supervisord.
If not:
Celery has a nice periodic scheduler. If you're okay with polling instead of the streaming API, I might just use the Twitter REST API and Celery's scheduler to poll and update periodically. It's helpful to match the schedule to the rate limits as well.
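A hedged sketch of that polling approach (shown with the current beat_schedule style; the broker URL and the fetch/store logic are placeholder assumptions):

# tasks.py -- poll the Twitter REST API on a schedule with Celery beat.
from celery import Celery

app = Celery("twitter_poller", broker="redis://localhost:6379/0")

# Poll every five minutes -- choose an interval that stays inside the rate limits.
app.conf.beat_schedule = {
    "poll-twitter": {
        "task": "tasks.poll_twitter",
        "schedule": 300.0,
    },
}


@app.task
def poll_twitter():
    ...  # call the Twitter REST search endpoint and store any new tweets

Run it with a worker plus the beat scheduler, for example: celery -A tasks worker -B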