Subscribe to a redis channel in a Django project

I have multiple applications, written in nodejs or python/django or ...
These services work fine, but they need async pub/sub communication with each other.
In nodejs this is no problem: it can easily pub/sub to any redis channel.
Question: My question is, how can I continuously subscribe to a redis channel and receive data published by other services?
Note: many links suggest using django-channels, but I don't think that's the way to do it. If so, can anyone help me and give details on how to do it?
Update:
Django is not event-based by default, unlike nodejs. So if I use a plain redis client, I would have to check redis every second, for example, to see whether anything was published. I don't think just using a redis client in python will be enough.
Really appreciate it.

There are a lot of alternatives. If you have a FIFO requirement, you have to use queues to connect one microservice to another. For me, if you don't have a Big Data problem you can use RabbitMQ; it is very practical and very effective. Otherwise, if you do have a Big Data problem, you can use Kafka. There is a wide variety of services.
If you want just Pub/Sub, the best tool is Redis: it is very fast and easy to integrate. If you are wondering how to implement it in Python, just look at this article.
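For a sense of how little code this takes in Python, a publisher with redis-py can be as small as this (the channel name and payload are only illustrative, chosen to match the subscriber shown further down):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=1)
# any service (python, nodejs, ...) can publish the equivalent of this
r.publish('topic.events', 'user 42 created')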
[Update]
It's possible to create a manage.py command in django, subscribe to redis in that management command, and run this script as a separate process from the django server:

import redis
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        # connect to the same redis instance the other services publish to
        r = redis.StrictRedis(host='localhost', port=6379, db=1)
        p = r.pubsub()
        p.psubscribe('topic.*')
        # listen() blocks and yields every message published on matching channels
        for message in p.listen():
            if message:
                print('message received, do anything you want with it.')
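With that file placed under yourapp/management/commands/ (the file name, say redis_listener.py, is up to you and purely hypothetical here), the subscriber runs as its own long-lived process next to the web server, e.g. python manage.py redis_listener, typically kept alive by a process supervisor.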

In order to handle subscriptions to redis, you will need a separate, continuously running process (a "server") which listens to redis and then does something with your data. django-channels would do the same by running your code in a worker.
As pointed out above, Django provides a convenient way to run such "servers" through its management command mechanism. When running a django management command you have complete access to your code, i.e. to the ORM as well.
One detail: you mentioned async communication. Keep in mind that Django's ORM is strictly synchronous code, so you need to pay attention to how you combine the ORM with async code. You probably need to clarify what you mean by async here.
As for processing the redis messages themselves, you can use any library that supports it, for example aioredis or redis-py.
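If the consumer itself should be async, a minimal sketch with redis-py's asyncio support (redis-py >= 4.2; aioredis exposes a near-identical API) could look like this; the channel pattern mirrors the management command above:

import asyncio
import redis.asyncio as redis  # aioredis works much the same way

async def listen():
    r = redis.Redis(host='localhost', port=6379, db=1)
    pubsub = r.pubsub()
    await pubsub.psubscribe('topic.*')
    # the async generator yields each published message without polling
    async for message in pubsub.listen():
        if message['type'] == 'pmessage':
            print('received:', message['data'])

asyncio.run(listen())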

Related

Should I use a task queue (Celery), asyncio or neither for an API that polls other APIs?

I have written an API with Django whose purpose is to operate as a bridge between a website back-end and the external services we use, so that the website doesn't have to handle many requests to external APIs (CRM, calendar events, email providers etc.).
The API mainly polls other services, parses the results and forwards them to the website backend.
I initially went for a Celery-based task queue, as it seemed to me like the right tool to offload that processing to another instance, but I'm starting to think it doesn't really fit the purpose.
As the website expects synchronous responses, my code contains a lot of:
results = my_task.delay().get()
or
results = chain(fetch_results.s(), parse_results.s()).delay().get()
Which doesn't feel like the proper way to use Celery tasks.
It is efficient when pulling dozens of requests and processing the results in parallel - a periodic refresh task for example - but adds a lot of overhead for simple requests (fetch - parse - forward), which represent most of the traffic.
Should I go fully synchronous for those "simple requests" and keep Celery tasks for specific scenarios? Is there an alternative design (maybe involving asyncio) that would better suit the purpose of my API?
Using Django, Celery (w/ Amazon SQS) on an EBS EC2 instance.
You could consider using Gevent with your Django webserver to allow it to operate efficiently for the "simple requests" you've mentioned without being blocked. If you proceed with this approach, be sure to pool database connections with PgBouncer, Pgpool-II or a Python library, since each greenlet will make its own connection.
Once you've implemented that, it's possible to also use Gevent instead of Celery to handle asynchronous processing by joining on multiple Greenlets that each make an external API request, rather than incur the overhead of passing messages to an external celery worker.
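To make that fan-out concrete, here is a hedged sketch with gevent (the URLs, timeouts and function names are made up for illustration):

from gevent import monkey
monkey.patch_all()  # patch sockets so blocking HTTP calls yield cooperatively

import gevent
import requests

def fetch(url):
    # each greenlet performs one blocking HTTP call
    return requests.get(url, timeout=10).json()

urls = ['https://crm.example.com/api/items', 'https://calendar.example.com/api/events']
jobs = [gevent.spawn(fetch, u) for u in urls]
gevent.joinall(jobs, timeout=15)
results = [job.value for job in jobs if job.successful()]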
Your implementation is similar to what we've done at Kloudless, which provides a single API to access multiple other APIs, including CRM, calendar, storage, etc.

Handling long requests

I'm working on a long request to a django app (nginx reverse proxy, mysql db, celery-rabbitMQ-redis set) and have some doubts about the solution I should apply:
Functioning: one feature of the app allows users to migrate thousands of objects from one system to another. Each migration is logged in a db, and users are given the possibility to download the history of the migration in csv format: which objects have been migrated, with which status (success, errors, ...).
To get the history, a GET request is sent to a django view, which returns the download response after serialization and rendering to csv.
Problem: for a large set of objects (e.g. 160,000), the serialization and rendering processes are quite long and the request times out.
Some solutions I was thinking about/found thanks to previous searches are:
Increasing the amount of time before timeout: easy, but everywhere I looked this is described as a global nginx setting that would affect every request on the server.
Using an asynchronous task handled by celery: the idea would be to make an initial request to the server, which launches the serializing and rendering task with celery and returns a special HttpResponse to the client. The client then regularly asks the server whether the job is done, and the server delivers the history once processing has finished. I like this one but I'm not sure how to implement it technically.
Creating and temporarily storing the csv file on the server, and giving the user a way to access and download it. I'm not a big fan of that one.
So my question is: has anyone already faced a similar problem? Do you have advice on the technical implementation of solution #2, or a better solution to propose?
Thanks!
Clearly you should use Celery + RabbitMQ/Redis. If you look at the docs it's not that hard to set up.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in django is really simple. You can just wrap django functions with celery tasks (using the @task decorator) and they become async, so this is the easy part.
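For the polling pattern described in option #2 of the question, a minimal sketch could look like the following (the task and view names are invented, and it assumes a result backend is configured so the CSV can be fetched once ready):

# tasks.py
from celery import shared_task

@shared_task
def build_history_csv(migration_id):
    # placeholder: fetch the logged rows for this migration and render them as csv text
    rows = [['object_id', 'status'], ['42', 'success']]
    return '\n'.join(','.join(r) for r in rows)

# views.py
from celery.result import AsyncResult
from django.http import JsonResponse
from .tasks import build_history_csv

def start_history(request, migration_id):
    task = build_history_csv.delay(migration_id)
    return JsonResponse({'task_id': task.id}, status=202)

def history_status(request, task_id):
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({'state': result.state, 'csv': result.get()})
    return JsonResponse({'state': result.state})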
The problem I see in your project is that the server handling http traffic is the same server running the long process. That can affect performance and user experience even if celery is running in the background. Of course, that depends on how much traffic you are expecting on that machine and how many migrations can run at the same time.
One of the things you configure in Celery is the number of workers (concurrent processing units) available, so the number of cores in your machine matters.
If you need to handle http calls quickly, I would suggest delegating the migration process to another machine. Celery/Redis can be configured that way. Let's say you've got 2 servers: one handles only normal django calls (no celery) and triggers celery tasks on the other server (the one that actually runs the migration process). Both servers can connect to the same database.
But this is just an infrastructure optimization and you may not need it.
I hope this answers your question. If you have specific Celery issues it would be better to create another question.

Django + RabbitMQ + Celery all on different machines (Servers)

I managed to get Django, RabbitMQ and Celery working on a single machine, following the instructions from here. Now I want to make them work together when they are on different servers. I do not want Django to know anything about Celery, nor Celery about Django.
So, basically I just want Django to send some message to a RabbitMQ queue (probably an id, the type of task, maybe some other info), and then I want RabbitMQ to publish that message (when possible) to Celery on another server. Celery/Django should not know about each other; basically I want an architecture where it is easy to replace either of them.
Right now I have several calls in my Django app like
create_project.apply_async(args, countdown=10)
I want to replace that with similar calls directly to RabbitMQ (as I said, Django should not depend on Celery). Then RabbitMQ should notify Celery (when possible) and Celery will do its job (probably interacting with Django, but through a REST interface).
Also, I need to have Celery workers on two or more servers, and I want RabbitMQ to notify only one of them depending on some field in the message. If this is too complicated, I could just have every task (on the different machines) check something like: is this something I should do (e.g. by checking an ip address field in the message), and if not, just stop executing the task.
How can I achieve this? If possible I would prefer code + configuration examples, not just a theoretical explanation.
Edit:
I think that for my use case celery is total overhead. Simple RabbitMQ routing with custom clients will do the job. I already tried a simple use case (one server) and it works perfectly. It should be easy to make the communication multi-server ready. I do not like celery: it is "magical", hides too many details and is not easy to configure. But I will leave this question alive, because I am interested in others' opinions.
The short of it
How can I achieve this?
Celery only sends the task name and a serialized set of parameters as the message body. That is, your scenario is absolutely in line with how Celery operates.
if possible I would prefer code + configuration examples not just theoretical explanation.
For the client app, i.e. your Django app, define stub tasks, like so:
from celery import task  # or shared_task in newer Celery versions

@task
def foo():
    pass
For the Celery processing, on your remote server, define the actual tasks to be executed.
from celery import task  # or shared_task in newer Celery versions

@task
def foo():
    pass
It is important that the tasks live in the same Python package on both sides (i.e. app.tasks.py), otherwise Celery won't be able to match the message to the actual task.
Note that this also means your Django app becomes untestable if you have set CELERY_ALWAYS_EAGER=True, unless you make the Celery app's tasks.py available locally to the Django app.
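For example, a layout along these lines (purely illustrative) satisfies that constraint, with the same dotted path app.tasks on both machines:

# on the Django server            # on the Celery server
app/                              app/
    __init__.py                       __init__.py
    tasks.py  (stub tasks)            tasks.py  (real implementations)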
Even Simpler Alternative
An alternative to the above stub tasks is to send tasks by name:
>>> app.send_task('tasks.add', args=[2, 2], kwargs={})
<AsyncResult: 373550e8-b9a0-4666-bc61-ace01fa4f91d>
On Message Patterns
Also, I have need to have Celery workers on two or more servers and I want RabbitMQ to notify only one of them depending on some field in message.
RabbitMQ offers several messaging patterns, their tutorials are quite well written and to the point. What you want (one message processed by one worker) is trivially achieved with a simple queue/exchange setup, which (with Celery at least) is the default if you don't do anything else. If you need specific workers to attend to specific tasks/respond to specific messages, use Celery's task routing which works hand-in-hand with RabbitMQ's concept of queues and exchanges.
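A hedged sketch of such routing with old-style Celery settings (the queue names and the send_email task are made up; create_project comes from the question above):

# settings.py (Celery 3.x-style names; newer versions use lowercase task_routes)
CELERY_ROUTES = {
    'app.tasks.create_project': {'queue': 'projects'},
    'app.tasks.send_email': {'queue': 'mail'},
}

# each worker then consumes only from its own queue, e.g.:
#   celery -A proj worker -Q projects    (on server A)
#   celery -A proj worker -Q mail        (on server B)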
Trade-Offs
I think that for my use case celery is total overhead. Simple RabbitMQ routing with custom clients will do the job. I already tried simple use case (one server) and it works perfectly.
Of course, you may use RabbitMQ out of the box, at the cost of having to deal with the lower-level API that RabbitMQ provides. Celery adds a task abstraction that makes it very straightforward to build any producer/consumer scenario, essentially using just plain Python functions or methods. Note that this is not a better/worse judgement of either RabbitMQ or Celery -- as always with engineering decisions, there is a trade-off involved:
If you use Celery, you probably lose some of the flexibility of the RabbitMQ API, but you gain ease of development, development speed and lower deployment complexity -- it basically just works.
If you use RabbitMQ directly, you gain flexibility, but with this comes deployment complexity that you need to manage yourself.
Depending on the requirements of your project, either approach may be valid - your call, really.
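For comparison, a rough sketch of the lower-level route with pika (assuming pika 1.x; the queue name and payload are made up):

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='tasks', durable=True)

# producer side (e.g. the Django view):
channel.basic_publish(
    exchange='',
    routing_key='tasks',
    body=json.dumps({'type': 'create_project', 'id': 10}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

# consumer side (the worker on the other server):
def on_message(ch, method, properties, body):
    payload = json.loads(body)
    # ... do the work with payload, then acknowledge ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='tasks', on_message_callback=on_message)
# channel.start_consuming()  # blocks, consuming messages forever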
Any sufficiently advanced technology is indistinguishable from magic ;-)
I do not like celery. It is "magical", hides too many details and it is not easy to configure.
I would choose to disagree. It may be "magical" in Arthur C. Clarke's sense, but it certainly is rather easy to configure if you compare it to a plain RabbitMQ setup. Of course if you're also the guy who does the RabbitMQ setup, it may just add a layer of abstraction that you don't really gain anything from. Maybe your developers will?

Celery and Twitter streaming api with Django

I'm having a really hard time conceptualising how I can connect to the Twitter streaming API and process tweets via an admin interface provided by Django.
The main problem is starting a daemon from Django and having the ability to stop/start it, plus making sure there is provision for monitoring. I don't really want to use upstart for this purpose because I want to try and keep the project as self contained as possible.
I'm currently attempting the following and am unsure if it's perhaps the wrong way to go about things:
Start a celery task from Django which establishes a persistent connection to the streaming API
The above task creates subtasks which will process tweets and store them
Because celeryd runs as a daemon it will automatically run the first task again if the connection breaks and the task fails - does this mean I don't need any additional monitoring?
Does the above make sense or have I misunderstood how celery works?
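For what it's worth, a rough sketch of steps 1 and 2 with the pre-4.0 tweepy API might look like this (credentials and task names are placeholders, not a statement that this is the right design):

from celery import shared_task
import tweepy

@shared_task
def process_tweet(data):
    # parse and store the tweet here
    pass

@shared_task
def stream_tweets():
    class Listener(tweepy.StreamListener):
        def on_status(self, status):
            process_tweet.delay(status._json)  # hand each tweet to a subtask
    auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
    auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
    # blocks while the persistent streaming connection is open
    tweepy.Stream(auth=auth, listener=Listener()).filter(track=['django'])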

Long running tasks with Django

My goal is to create an application that will be able to run long-lasting, mainly system-level tasks, such as:
checking out code from the repositories,
copying directories between various locations,
etc.
The problem is that I need it to work independently of the web browser. I mean that, for example, after starting the checkout/copy action, closing the web browser will not interrupt the action. So after coming back to the site I can see that the copying is still in progress, or that another action started while the browser was closed...
I have been looking at various tools, like RabbitMQ + Celery, Twisted, Pyro and XML-RPC, but I don't know if any of these will suit me. Has anyone encountered similar needs when creating a Django app? Please let me know if there are any methods/packages that I should know about. Code samples will also be more than welcome!
Thank you in advance for your suggestions!
(And sorry for my bad English. I'm working on it.)
Basically you need to have a process that runs outside of the request. The absolute simplest way to do this (on a Unix-like operating system, at least) is to fork():
import os
import sys

if os.fork() == 0:
    # child process: do the long-running work, then exit
    do_long_thing()
    sys.exit(0)
# parent process: continue with the request
This has some downsides, though (for example, if the server crashes, the "long thing" will be lost), which is where something like Celery comes in handy. It will keep track of the jobs that need to be done and the results of jobs (success/failure/whatever), and make it easy to run the jobs on other machines.
Using Celery with a Redis backend (see Kombu's Redis transport) is very simple, so I would recommend looking there first.
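If you go that route, the broker configuration is just a couple of settings; a sketch with old-style setting names (the Redis URL is an assumption):

# settings.py
BROKER_URL = 'redis://localhost:6379/0'             # Kombu's Redis transport
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'  # optional, to store results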
You might need to have a process outside the request / response cycle. If that is the case, Celery with a Redis backend is what I would suggest looking into, as that integrates nicely with Django (as David Wolever suggested).
Another option is to create Django management commands, and then use cron to execute them at scheduled intervals.
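For the cron route, a crontab entry like this one (the command name is hypothetical) runs a management command every 10 minutes:

*/10 * * * * /path/to/venv/bin/python /path/to/project/manage.py checkout_repos >> /var/log/checkout_repos.log 2>&1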