Celery and the Twitter streaming API with Django

I'm having a really hard time conceptualising how I can connect to the Twitter streaming API and process tweets via an admin interface provided by Django.
The main problem is starting a daemon from Django and having the ability to stop/start it, plus making sure there is provision for monitoring. I don't really want to use Upstart for this purpose because I want to try and keep the project as self-contained as possible.
I'm currently attempting the following and am unsure whether it's the wrong way to go about things:
Start a celery task from Django which establishes a persistent connection to the streaming API
The above task creates subtasks which will process tweets and store them
Because celeryd runs as a daemon it will automatically run the first task again if the connection breaks and the task fails - does this mean I don't need any additional monitoring?
Does the above make sense, or have I misunderstood how Celery works?
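Roughly, the structure I have in mind looks like the sketch below. This is only an outline: iter_tweets() is a stand-in for whatever client call actually holds the persistent streaming connection, and the task names are made up.

    # tasks.py - rough sketch only; iter_tweets() is a placeholder for the
    # library call that yields tweets from a persistent streaming connection.
    from celery import shared_task


    def iter_tweets():
        """Placeholder: yield tweets from a persistent streaming connection."""
        raise NotImplementedError


    @shared_task(bind=True, max_retries=None)
    def consume_stream(self):
        try:
            for tweet in iter_tweets():       # blocks while the connection is open
                process_tweet.delay(tweet)    # hand each tweet to a subtask
        except Exception as exc:              # narrow this to your client's connection errors
            # if the stream drops, ask Celery to re-run this task after a pause
            raise self.retry(exc=exc, countdown=10)


    @shared_task
    def process_tweet(tweet):
        # parse and store the tweet here
        ...

The retry settings here are arbitrary; they only cover the connection dropping, not celeryd itself dying.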

Related

Create a Django background worker, best practices

I have an API in Django which should consume a RabbitMQ queue. I'm currently using an AMQPStorm robust consumer, started from a Django management command, like this:
python3 manage.py mycommand
The API runs in another container using Gunicorn.
My queue has a high rate of incoming messages. My concern is whether this is the best practice for running this kind of background process with Django. Is there a better way to run this worker? I ask because my container is shutting down sporadically, without logging anything, just this message:
Logs
Killed.
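For context, a minimal management command wrapping an AMQPStorm consumer looks roughly like the sketch below. The host, credentials, queue name and message handler are placeholders, not the project's actual code.

    # myapp/management/commands/mycommand.py - sketch only
    import amqpstorm
    from django.core.management.base import BaseCommand


    def on_message(message):
        # process the message body, then acknowledge it
        payload = message.body
        # ... handle payload ...
        message.ack()


    class Command(BaseCommand):
        help = "Consume the RabbitMQ queue in the foreground"

        def handle(self, *args, **options):
            connection = amqpstorm.Connection('rabbitmq-host', 'guest', 'guest')
            channel = connection.channel()
            channel.basic.qos(prefetch_count=10)   # cap how many unacked messages are buffered
            channel.basic.consume(on_message, 'my_queue', no_ack=False)
            channel.start_consuming()

A bare Killed with no traceback usually means the kernel OOM killer ended the process, so it is worth checking the container's memory limit and keeping the prefetch count low so the consumer cannot buffer an unbounded backlog in memory.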

Django and Celery - How to Distribute?

I'm trying to distribute Django and Celery.
I've created a small project with Django and Celery. Django will ask a Celery worker to process some data in the database, and the result is then passed back to Django.
My idea is that:
Django stack installed on one server
Message queue (RabbitMQ) on one server
Celery worker on one server
Hence 3 servers in total.
However, the problem is that Celery has to use some code from Django, for example the models, because it accesses them. Hence, it would also need the settings.py file to know where the servers are.
Does this mean that for #3 I would need to install Django and Celery on the server, but disable Django and only run Celery? For example celery -A PROJECT_NAME worker -l INFO, but without an Apache server for Django?
If you want your celery workers to operate on a different server, you need to make sure that all the resources required by the worker are accessible from that server.
For example, if you have a simple task, you can copy only the code required for that task to the server. If your worker needs any other resources, such as other code, files or a database, you need to make sure it has access to them.
Really, if you want to have two servers working on the same tasks, you will have to use a simple web interface (such as Flask) to communicate between the servers (and extend the functionality of your queue). Then, you will have to ensure they are both using the same data source.
Consider hosting your database remotely, or have the remote server access the database remotely. Either way, any workers running on a server will need access to the database and all source code necessary to complete the task. Then, you must simply have the two servers share a messaging queue.
Source: how to configure and run celery worker on remote system
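To make that concrete, a minimal layout could look like the sketch below (standard Celery/Django wiring; host names, credentials and the project name are placeholders):

    # PROJECT_NAME/celery.py - lives in the same project code on every box
    import os
    from celery import Celery

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "PROJECT_NAME.settings")

    app = Celery("PROJECT_NAME")
    app.config_from_object("django.conf:settings", namespace="CELERY")
    app.autodiscover_tasks()

    # settings.py - both the Django box and the worker box point at the same
    # broker and the same database
    CELERY_BROKER_URL = "amqp://user:password@rabbitmq-host:5672//"
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "mydb",
            "HOST": "db-host",
            "USER": "myuser",
            "PASSWORD": "secret",
        }
    }

On the worker-only server you then run just celery -A PROJECT_NAME worker -l INFO; no Apache (or other web server) process is needed there, but the full project code and settings are.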

Remote Django application sending messages to RabbitMQ

I'm starting to get familiar with the RabbitMQ lingo so I'll try my best to explain. I'll be going into a public beta test in a few weeks and this is the setup I am hoping to achieve. I would like Django to be the producer, producing messages to a remote RabbitMQ box, with another Celery box listening on the RabbitMQ queue for tasks. So in total there would be three boxes: Django, RabbitMQ and Celery. So far, from the Celery docs, I have successfully been able to run Django and Celery together, with RabbitMQ on another machine. Django simply calls the task in the view:
add.delay(3, 3)
And the message is sent over to RabbitMQ. RabbitMQ sends it back to the same machine the task was sent from (since Django and Celery share the same box) and Celery processes the task.
This is great for development purposes. However, having Django and Celery running on the same box isn't a great idea, since both will have to compete for memory and CPU. The whole goal here is to get clients in and out of the HTTP request cycle and have Celery workers process the tasks. But the machine will slow down considerably if it is accepting HTTP requests and also processing tasks.
So I was wondering whether there is a way to make this all separate from one another: have Django send the tasks, RabbitMQ forward them, and Celery process them (producer, broker, consumer).
How can I go about doing this? Really simple examples would help!
What you need is to deploy your application's code on the third machine and execute there only the command that handles the tasks. The code has to be present on that machine as well.
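For example, with a shared tasks module deployed on both the Django box and the Celery box (the broker host below is a placeholder for the RabbitMQ machine):

    # tasks.py - deployed on the Django box (producer) and the Celery box (consumer)
    from celery import Celery

    app = Celery("myproject", broker="amqp://user:password@rabbitmq-host:5672//")


    @app.task
    def add(x, y):
        return x + y

The Django box only imports add and calls add.delay(3, 3); the Celery box only runs celery -A tasks worker -l INFO. Neither talks to the other directly; both talk to the broker.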

Django+celery: how to show worker logs on the web server

I'm using Django + Celery for my first ever web development project, with RabbitMQ as the broker. My Celery workers are running on a different system from the web server and are executing long-running tasks. During task execution, the task output is dumped to local log files on the workers. I'd like to display these task log files through the web server so the user can see in real time how far execution has got, but I have no idea how to transfer these log files between the workers and the system where the web server runs. Any suggestion is appreciated.
Do not move the logs, just log to the same place. It can really be any database (relational or non-relational) accessible from both the web server and the Celery workers. You can even create (or look for) an appropriate Python logging handler that saves log records to the centralized storage.
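A rough sketch of that idea, with an invented model and handler (the workers log through the handler, the web server queries the model):

    import logging

    from django.db import models


    # myapp/models.py - one row per log record
    class TaskLogEntry(models.Model):
        task_id = models.CharField(max_length=255, blank=True)
        level = models.CharField(max_length=10)
        message = models.TextField()
        created = models.DateTimeField(auto_now_add=True)


    # myapp/handlers.py - attach this to the task logger inside each worker
    class DatabaseLogHandler(logging.Handler):
        def emit(self, record):
            TaskLogEntry.objects.create(
                task_id=getattr(record, "task_id", ""),  # pass task_id via logging's extra=
                level=record.levelname,
                message=self.format(record),
            )

A bound task can then log with logger.info("step done", extra={"task_id": self.request.id}), and a Django view simply renders the TaskLogEntry rows for that task_id as they arrive.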
Maybe the solution isn't to move the logs, but to aggregate them. Take a look at some logging tools like Splunk, Loggly or Logscape.

Getting the Twitter stream on Heroku with Django App

I need to constantly monitor a Twitter stream using Heroku. Basically what I want to do is start the monitoring process up and never have it stop. I was looking into Celery, but from my understanding of it, a user-initiated or short-term process adds tasks to a queue that are then processed by a worker. This is a different model from having a background process constantly monitoring a Twitter stream. What would be the best way to monitor a Twitter stream for a Django app on Heroku?
I'm not aware of anything in Django that can run in the background like that. It's certainly one of the limitations of living in the web-app sandbox.
If you have access to your server on Heroku (?) you could write your own script/application along the lines of this tutorial and daemonize it using Supervisord.
If not:
Celery has a nice periodic scheduler. If you're okay with polling instead of the streaming API, I would just use the Twitter REST API and the scheduler in Celery to poll and update periodically. It's helpful to match the schedule to the rate limits as well.
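A rough sketch of the polling approach (task name, broker host and interval are made up; pick an interval that stays inside the REST API rate limits):

    # tasks.py - poll on a schedule instead of holding a streaming connection
    from celery import Celery

    app = Celery("poller", broker="amqp://user:password@rabbitmq-host//")

    app.conf.beat_schedule = {
        "poll-twitter": {
            "task": "tasks.poll_twitter",   # registered name of the task below
            "schedule": 60.0,               # seconds between polls
        },
    }


    @app.task
    def poll_twitter():
        # call the Twitter REST (search) API here and store anything new
        ...

On Heroku this can run in a single worker dyno with the embedded beat scheduler, e.g. celery -A tasks worker -B -l INFO.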