MQTT async worker in Django

In my Django application, I need to connect to an MQTT broker from several places in the code.
It would be great if I could create some kind of MQTT worker that runs in the background (or in a different thread), which I can use to publish and subscribe to messages so that I don't have to create a separate MQTT connection for each function.
Example:
Create an MQTT worker with connection details. On startup, this connection is established and managed, restarted if the connection is lost, etc. (maybe use Celery for this?)
Create functions that are available inside my Django project for publishing and subscribing. Publishing seems more straightforward, but I'm not sure about the subscribe part.
My current implementation:
@shared_task(
    bind=True,
    name="tasks.send_command",
)
def send_command(self, username):
    pedestals = User.objects.filter(username=username)
    client = MqttClient("scheduled")
    client.connect()
    ...

@shared_task(
    bind=True,
    name="tasks.toggle_switch",
)
def toggle_switch(self, switch):
    from django.core.cache import cache
    client = MqttClient("toggle-switch")
    client.connect()
    ...
As you can see, I need to create the client in every task. Also I'm using it multiple times in Django as well, not just as a Celery task.
How can I create a worker for this? Something like:
@shared_task(
    bind=True,
    name="tasks.toggle_switch",
)
def toggle_switch(self, switch):
    from django.core.cache import cache
    from mqtt.worker import mqtt_worker
    mqtt_worker.publish()
    ...
That way I could simplify my codebase and I would not have to wait for the client to connect every time the task runs.
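For illustration, this is roughly the kind of shared worker module I have in mind (a rough sketch based on paho-mqtt and its loop_start() background thread; the module path mqtt/worker.py and the MqttWorker name are just placeholders):

# mqtt/worker.py -- rough sketch, not a tested implementation
import paho.mqtt.client as mqtt

class MqttWorker:
    """Holds one long-lived MQTT connection shared by the whole process."""

    def __init__(self, host, port=1883, keepalive=60):
        self._client = mqtt.Client()
        self._client.connect(host, port, keepalive)
        # loop_start() runs the network loop in a background thread.
        self._client.loop_start()

    def publish(self, topic, payload=None, qos=0):
        return self._client.publish(topic, payload, qos=qos)

    def subscribe(self, topic, callback, qos=0):
        self._client.message_callback_add(topic, callback)
        self._client.subscribe(topic, qos)

# Module-level instance, so every import shares the same connection.
mqtt_worker = MqttWorker("localhost")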
I have found mqttasgi but I don't know if it will fit my needs.

Related

Listen to mqtt topics with django channels and celery

I would like a way to integrate Django with MQTT, and the first thing that came to mind was using django-channels and an MQTT broker that supports MQTT over WebSockets, so I could communicate directly between the broker and django-channels.
However, I did not find a way to start a WebSocket client from Django, and according to this link it's not possible.
Since I'm also starting to study task queues, I wonder if it would be good practice to start an MQTT client using paho-mqtt and run it in a separate process using Celery. This process would then forward the messages received from the broker to django-channels through WebSockets. That way I could also communicate with the client process to publish data or stop the MQTT client when needed, all directly from Django.
I'm a little skeptical about this idea since I also read that tasks run in Celery should not take too long to complete, and in this case that's exactly what I want to do.
So my question is, how much of a bad idea is that? Is there any other option to directly integrate Django with MQTT?
*Note: I don't want to have a separate process running on the server; I want to be able to start and stop the process from Django, in order to have full control over the MQTT client from the web GUI.
I found a better way that does not need Celery.
I simply start an MQTT client in app/apps.py in the ready() method, so a client is started every time I run the application. From there I can communicate with other parts of the system using django-channels or signals.
apps.py:
from django.apps import AppConfig
from threading import Thread
import paho.mqtt.client as mqtt

class MqttClient(Thread):
    def __init__(self, broker, port, timeout, topics):
        super(MqttClient, self).__init__()
        self.client = mqtt.Client()
        self.broker = broker
        self.port = port
        self.timeout = timeout
        self.topics = topics
        self.total_messages = 0

    # run method override from Thread class
    def run(self):
        self.connect_to_broker()

    def connect_to_broker(self):
        self.client.on_connect = self.on_connect
        self.client.on_message = self.on_message
        self.client.connect(self.broker, self.port, self.timeout)
        self.client.loop_forever()

    # The callback for when a PUBLISH message is received from the server.
    def on_message(self, client, userdata, msg):
        self.total_messages = self.total_messages + 1
        print(str(msg.payload) + "Total: {}".format(self.total_messages))

    # The callback for when the client receives a CONNACK response from the server.
    def on_connect(self, client, userdata, flags, rc):
        # Subscribe to the list of topics
        for topic in self.topics:
            client.subscribe(topic)

class CoreConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'core'

    def ready(self):
        MqttClient("192.168.0.165", 1883, 60, ["teste/01"]).start()
If you are using ASGI in your Django application, you can use MQTTAsgi. Full disclosure: I'm the author of MQTTAsgi.
It's a complete protocol server for Django and MQTT.
To utilize the MQTT protocol server, you first need to create an MQTT consumer:
from mqttasgi.consumers import MqttConsumer

class MyMqttConsumer(MqttConsumer):
    async def connect(self):
        await self.subscribe('my/testing/topic', 2)

    async def receive(self, mqtt_message):
        print('Received a message at topic:', mqtt_message['topic'])
        print('With payload', mqtt_message['payload'])
        print('And QOS:', mqtt_message['qos'])
        pass

    async def disconnect(self):
        await self.unsubscribe('my/testing/topic')
Then you should add this protocol to the protocol router:
from channels.routing import ProtocolTypeRouter, URLRouter
from channels.security.websocket import AllowedHostsOriginValidator
from django.urls import re_path as url

application = ProtocolTypeRouter({
    'websocket': AllowedHostsOriginValidator(URLRouter([
        url('.*', WebsocketConsumer)
    ])),
    'mqtt': MyMqttConsumer,
    ....
})
Then you can run the MQTT protocol server with*:
mqttasgi -H localhost -p 1883 my_application.asgi:application
*Assuming the broker is at localhost on port 1883.
I wanted to solve this problem too but found no good solutions out there that really fitted the Channels architecture (though MQTTAsgi came close but it uses paho-mqtt and doesn't fully use the Channels-layer system).
I created: https://pypi.org/project/chanmqttproxy/
(src at https://github.com/lbt/channels-mqtt-proxy)
Essentially it's a fully async Channels 3 proxy to MQTT that allows publishing and subscribing. The documentation shows how to extend the standard Channels tutorial so chat messages are seen on MQTT topics - and can be sent from MQTT topics to all websocket browser clients.
I don't know if this is what the OP wants as far as listening to MQTT topics goes, but for the general case I think this is a good solution.

Django: How to establish persistent connection to rabbitmq?

I am looking for a way to publish messages to a RabbitMQ server from my Django application. This is not for task offloading, so I don't want to use Celery. The purpose is to publish to the exchange using the Django application and have a sister (non-Django) application in the Docker container consume from that queue.
This all seems very straightforward, however, I can't seem to publish to the exchange without establishing and closing a connection each time, even without explicitly calling for that to happen.
In an attempt to solve this, I have defined a class with a nested singleton class that maintains a connection to the rabbitmq server using Pika. The idea was that the nested singleton would be instantiated only once, declaring the connection at that time. Any time something is to be published to the queue, the singleton handles it.
import logging
import pika
import os

logger = logging.getLogger('django')

class PikaChannelSingleton:

    class __Singleton:
        channel = pika.adapters.blocking_connection.BlockingChannel

        def __init__(self):
            self.initialize_connection()

        def initialize_connection(self):
            logger.info('Attempting to establish RabbitMQ connection')
            credentials = pika.PlainCredentials(rmq_username, rmq_password)
            parameters = pika.ConnectionParameters(rmq_host, rmq_port, rmq_vhost, credentials, heartbeat=0)
            connection = pika.BlockingConnection(parameters)
            con_chan = connection.channel()
            con_chan.exchange_declare(exchange='xchng', exchange_type='topic', durable=True)
            self.channel = con_chan

        def send(self, routing_key, message):
            if self.channel.is_closed:
                PikaChannelSingleton.instance.initialize_connection()
            self.channel.basic_publish(exchange='xchng', routing_key=routing_key,
                                       body=message)

    instance = None

    def __init__(self, *args, **kwargs):
        if not PikaChannelSingleton.instance:
            logger.info('Creating channel singleton')
            PikaChannelSingleton.instance = PikaChannelSingleton.__Singleton()

    @staticmethod
    def send(routing_key, message):
        PikaChannelSingleton.instance.send(routing_key, message)

rmq_connection = PikaChannelSingleton()
I then import rmq_connection where needed in the Django application. Everything works in toy applications and in the Python REPL, but a new connection is established every time the send function is called in the Django application. The connection then immediately closes with the message 'client unexpectedly closed TCP connection'. The message does get published to the exchange correctly.
So I am sure there is something going on with django and how it handles processes and such. The question still remains, how do I post numerous messages to a queue without re-establishing a connection each time?
If I understand correctly, connections cannot be kept alive like that in a single-threaded context. As your Django app continues executing, the AMQP client is not sending heartbeats on the connection, and the connection will die.
You could use SelectConnection instead of BlockingConnection, though that is probably not easy in the context of Django.
A good compromise could be to simply collect messages in your singleton but only send them all at once with a BlockingConnection at the very end of your Django request.
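A rough sketch of that compromise, using a simple per-process buffer and Django's request_finished signal to flush it (the names _pending, queue_message and flush_pending_messages are mine, and this is not thread-safe as written):

import pika
from django.core.signals import request_finished
from django.dispatch import receiver

_pending = []

def queue_message(routing_key, message):
    # Called from views instead of publishing immediately.
    _pending.append((routing_key, message))

@receiver(request_finished)
def flush_pending_messages(sender, **kwargs):
    # Open one short-lived connection at the end of the request and
    # publish everything that was collected while handling it.
    if not _pending:
        return
    parameters = pika.ConnectionParameters('localhost')  # adjust host/credentials
    connection = pika.BlockingConnection(parameters)
    channel = connection.channel()
    for routing_key, message in _pending:
        channel.basic_publish(exchange='xchng', routing_key=routing_key, body=message)
    _pending.clear()
    connection.close()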

How to start celery task after django request finished

I need to run celery task only when django request finished.
Is it possible?
I've found that the best way to make sure your task happens after the request is finished is to write a custom middleware. In the process_response method, you can handle any quick actions that don't impact page load time or performance too much. Anything else, you can hand off to Celery. Any saving or database transactions are completed by the time process_response is called (AFAICT).
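For illustration, a minimal version of that middleware could look like this (a sketch; the class name is mine and my_task stands in for whatever Celery task you hand the slow work to):

from django.utils.deprecation import MiddlewareMixin
from app.tasks import my_task

class DeferToCeleryMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        # The view has run and its DB writes are done at this point;
        # hand anything slow off to Celery instead of doing it inline.
        my_task.delay()
        return response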
Try something like this:
Django sends the request_finished signal at the end of every request.
You can inspect the sender argument (the handler class) in your receiver:
from django.dispatch import receiver
from django.core.signals import request_finished
from app.tasks import my_task

@receiver(request_finished)
def add_celery_task(sender, **kwargs):
    if sender.__name__ != 'StaticFilesHandler':
        my_task.delay()
If you are running the server in a development environment, it's good to check the sender's name to avoid queuing a Celery task for every static file you are serving.
You can run the task in the background using Celery's delay method. Just before returning the response, call delay to put the task in the background.
Something like this:
task_name.delay(arg1, arg2, ...)
By doing this your task will be put into the background and run asynchronously; this will not block the request/response cycle.

django-socketio with Celery: send to socket after async task completes in separate process

How can I access the result of a Celery task in my main Django application process? Or, how can I publish to an existing socket connection from a separate process?
I have an application in which users receive scores. When a score is recorded, calculations are made (progress towards goals, etc), and based on those calculations notifications are sent to interested users. The calculations may take 30s+, so to avoid sluggish UI those operations are performed in a background process via a Celery task, invoked by the post_save signal of my Score model.
Ideally the post_save signal on my Notification model would publish a message to subscribed clients (I'm using django-socketio, a wrapper for gevent-socketio). This seems straightforward...
Create a Score
Do some calculations on the new Score instance in a background process
Based on those calculations, create a Notification
On Notification save, grab the instance and publish to subscribed clients via socket connection
However after trying the following I'm not sure this is possible:
passing gevent's SocketIOServer instance to the callback method invoked by the task, but this requires pickling the passed object, which isn't possible
storing the socket's session_id (different from Django's session_id) in memcached and retrieving that in the Celery task process.
using Redis pub/sub, so methods called by post_save signals on models created in a background process could simply publish to a Redis channel, but listening to that channel in the main application process (which has access to the socket connection) blocks the rest of the application.
I've also tried spawning new threads for each Redis client, which are created for each socket subscriber. As far as I can tell this requires spawning a new gevent.greenlets.Greenlet, and gevent can't be used in multiple threads.
Surely this is a solved problem. What am I missing?
You already have django-socketio, writing a pub/sub with redis would be a pity :)
client side:
var socket = new io.Socket();
socket.connect();
socket.on('connect', function() {
    socket.subscribe({{ score_update_channel }});
});
server side:
from django_socketio import broadcast_channel

def user_score_update(user):
    return 'score_updates_user_%s' % user.pk

channel = user_score_update(user)
broadcast_channel(score_result_data, channel)
You need to run the broadcast in the django-socketio process; if you run it from a different process (the Celery worker) it will not work, because the channels are referenced in memory by the django-socketio process. You can solve this by wrapping the broadcast in a view that Celery calls (making a real HTTP request) when the task is complete.
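A rough sketch of that workaround (the URL, view name and the requests call are placeholders; only broadcast_channel comes from django-socketio):

# views.py -- runs inside the django-socketio process
from django.http import HttpResponse
from django_socketio import broadcast_channel

def notify_score(request, user_id):
    channel = 'score_updates_user_%s' % user_id
    broadcast_channel({'message': 'score updated'}, channel)
    return HttpResponse('ok')

# tasks.py -- runs inside the Celery worker
import requests
from celery import shared_task

@shared_task
def do_calculations(user_id):
    # ... long-running work, create the Notification, etc. ...
    # Poke the django-socketio process over HTTP so *it* does the broadcast.
    requests.get('http://localhost:8000/notify-score/%s/' % user_id)

The view still needs an entry in urls.py; the point is simply that the broadcast itself happens in the process that owns the socket connections.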

Django multiprocessing and database connections

Background:
I'm working on a project which uses Django with a Postgres database. We're also using mod_wsgi, in case that matters, since some of my web searches have mentioned it. On web form submit, the Django view kicks off a job that will take a substantial amount of time (more than the user would want to wait), so we kick off the job via a system call in the background. The job that is now running needs to be able to read and write to the database. Because this job takes so long, we use multiprocessing to run parts of it in parallel.
Problem:
The top level script has a database connection, and when it spawns off child processes, it seems that the parent's connection is available to the children. Then there's an exception about how SET TRANSACTION ISOLATION LEVEL must be called before a query. Research has indicated that this is due to trying to use the same database connection in multiple processes. One thread I found suggested calling connection.close() at the start of the child processes so that Django will automatically create a new connection when it needs one, and therefore each child process will have a unique connection - i.e. not shared. This didn't work for me, as calling connection.close() in the child process caused the parent process to complain that the connection was lost.
Other Findings:
Some stuff I read seemed to indicate you can't really do this, and that multiprocessing, mod_wsgi, and Django don't play well together. That just seems hard to believe I guess.
Some suggested using celery, which might be a long term solution, but I am unable to get celery installed at this time, pending some approval processes, so not an option right now.
Found several references on SO and elsewhere about persistent database connections, which I believe to be a different problem.
Also found references to psycopg2.pool and pgpool and something about bouncer. Admittedly, I didn't understand most of what I was reading on those, but it certainly didn't jump out at me as being what I was looking for.
Current "Work-Around":
For now, I've reverted to just running things serially, and it works, but is slower than I'd like.
Any suggestions as to how I can use multiprocessing to run in parallel? Seems like if I could have the parent and two children all have independent connections to the database, things would be ok, but I can't seem to get that behavior.
Thanks, and sorry for the length!
Multiprocessing copies connection objects between processes because it forks processes, and therefore copies all the file descriptors of the parent process. That said, a connection to the SQL server is just a file; you can see it in Linux under /proc/<pid>/fd/. Any open file will be shared between forked processes. You can find more about forking here.
My solution was simply to close the DB connection just before launching processes; each process will recreate the connection itself when it needs one (tested in Django 1.4):

from django import db
from multiprocessing import Process

db.connections.close_all()

def db_worker():
    some_parallel_code()

Process(target=db_worker, args=()).start()
PgBouncer/pgpool is not related to threads in the multiprocessing sense. It's rather a solution for not closing the connection on each request, i.e. speeding up connections to Postgres under high load.
Update:
To completely remove problems with the database connection, simply move all logic connected with the database into db_worker. I wanted to pass a QuerySet as an argument... A better idea is to simply pass a list of ids; see values_list('id', flat=True), and do not forget to turn it into a list! list(qs) before passing it to db_worker. Thanks to that we do not copy the model's database connection.
def db_worker(model_ids):
    obj = PartModelWorkerClass(model_ids)  # here you do Model.objects.filter(id__in=model_ids)
    obj.run()

model_ids = Model.objects.all().values_list('id', flat=True)
model_ids = list(model_ids)  # cast to list
process_count = 5
delta = (len(model_ids) // process_count) + 1

# do all the db stuff here ...
# here you can close the db connection
from django import db
db.connections.close_all()

for it in range(process_count):
    Process(target=db_worker, args=(model_ids[it * delta:(it + 1) * delta],)).start()
When using multiple databases, you should close all connections.
from django import db

for connection_name in db.connections.databases:
    db.connections[connection_name].close()
EDIT
Please use the same approach as @lechup mentioned to close all connections (I'm not sure since which Django version this method was added):
from django import db
db.connections.close_all()
For Python 3 and Django 1.9 this is what worked for me:
import multiprocessing
import django
from django import db

django.setup()  # Must call setup

def db_worker():
    for name in db.connections.databases:  # Close the inherited DB connections
        db.connections[name].close()
    # Execute parallel code here

if __name__ == '__main__':
    multiprocessing.Process(target=db_worker).start()
Note that without the django.setup() I could not get this to work. I am guessing something needs to be initialized again for multiprocessing.
I had "closed connection" issues when running Django test cases sequentially. In addition to the tests, there is also another process intentionally modifying the database during test execution. This process is started in each test case setUp().
A simple fix was to inherit my test classes from TransactionTestCase instead of TestCase. This makes sure that the database was actually written, and the other process has an up-to-date view on the data.
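For example, the only change is the base class (a sketch; the class and method names here are made up):

from django.test import TransactionTestCase

class MyIntegrationTests(TransactionTestCase):
    def setUp(self):
        # start the external process that modifies the database here
        ...

    def test_external_process_sees_data(self):
        # data written by this test is actually committed, so the other
        # process gets an up-to-date view of it
        ...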
Another way around your issue is to initialise a new connection to the database inside the forked process using:
from django.db import connection
connection.connect()
(not a great solution, but a possible workaround)
If you can't use Celery, maybe you could implement your own queueing system: basically, add tasks to a task table and have a regular cron job that picks them up and processes them (via a management command).
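A minimal sketch of that idea (the Task model, the app path, the command name and run_task are all placeholders):

# myapp/models.py -- a bare-bones task table
from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=100)
    payload = models.JSONField(default=dict)
    done = models.BooleanField(default=False)
    created = models.DateTimeField(auto_now_add=True)

# myapp/management/commands/process_tasks.py -- run this from cron
from django.core.management.base import BaseCommand
from myapp.models import Task  # adjust to your app

class Command(BaseCommand):
    help = "Process pending tasks from the task table"

    def handle(self, *args, **options):
        for task in Task.objects.filter(done=False):
            run_task(task)  # run_task() is your own dispatch logic
            task.done = True
            task.save(update_fields=['done'])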
Hey I ran into this issue and was able to resolve it by performing the following (we are implementing a limited task system)
task.py
from django.db import connection

def as_task(fn):
    """ this is a decorator that handles task duties, like setting up loggers, reporting on status...etc """
    connection.close()  # this is where i kill the database connection VERY IMPORTANT
    # This will force django to open a new unique connection, since on linux at least
    # Connections do not fare well when forked
    # ...etc
ScheduledJob.py
import multiprocessing

from django.core.management import call_command
from django.db import connection

def run_task(request, job_id):
    """ Just a simple view that when hit with a specific job id kicks off said job """
    # your logic goes here
    # ...
    processor = multiprocessing.Queue()
    multiprocessing.Process(
        target=call_command,  # all of our tasks are set up as management commands in django
        args=[
            job_info.management_command,
        ],
        kwargs=dict({'web_processor': processor}, **vars(options))).start()

    result = processor.get(timeout=10)  # wait to get a response on a successful init
    # Result is a tuple of [TRUE|FALSE, <ErrorMessage>]
    if not result[0]:
        raise Exception(result[1])
    else:
        # THE VERY VERY IMPORTANT PART HERE, notice that up to this point we haven't
        # touched the db again, but now we absolutely have to call connection.close()
        connection.close()
        # we do some database accessing here to get the most recently updated job id in the database
Honestly, to prevent race conditions (with multiple simultaneous users) it would be best to call connection.close() as quickly as possible after you fork the process. There may still be a chance that another user somewhere down the line makes a request to the db before you have a chance to flush the database, though.
In all honesty it would likely be safer and smarter to have your fork not call the command directly, but instead call a script on the operating system so that the spawned task runs in its own django shell!
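For example, something along these lines (a sketch; the manage.py path and the command name are placeholders):

import subprocess

def launch_job(job_id):
    # Launch the job as a separate OS process running its own manage.py
    # command, so it gets a fresh Django setup and its own DB connections.
    subprocess.Popen([
        'python', '/path/to/manage.py', 'run_scheduled_job', str(job_id),
    ])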
If all you need is I/O parallelism and not processing parallelism, you can avoid this problem by switching your processes to threads. Replace
from multiprocessing import Process
with
from threading import Thread
The Thread object has the same interface as Process.
If you're also using connection pooling, the following worked for us, forcibly closing the connections after forking. Closing them beforehand did not seem to help.
from django.db import connections
from django.db.utils import DEFAULT_DB_ALIAS
connections[DEFAULT_DB_ALIAS].dispose()
One possibility is to use the multiprocessing 'spawn' child-process creation method, which will not copy Django's DB connection details to the child processes. The child processes need to bootstrap from scratch, but are free to create/close their own Django DB connections.
In calling code:
import multiprocessing
from myworker import work_one_item  # <-- Your worker method

...

# Uses connection A
list_of_items = django_db_call_one()

# 'spawn' starts new python processes
with multiprocessing.get_context('spawn').Pool() as pool:
    # work_one_item will create its own DB connection
    parallel_results = pool.map(work_one_item, list_of_items)

# Continues to use connection A
another_db_call(parallel_results)
In myworker.py:
import django
django.setup()  # <-- needed if you'll make DB calls in worker

def work_one_item(item):
    try:
        # This will create a new DB connection
        return len(MyDjangoModel.objects.all())
    except Exception as ex:
        return ex
Note that if you're running the calling code inside a TestCase, mocks will not be propagated to the child processes (will need to re-apply them).
You could give more resources to Postgres; in Debian/Ubuntu you can edit:
nano /etc/postgresql/9.4/main/postgresql.conf
replacing 9.4 with your Postgres version.
Here are some useful lines that should be updated (with example values; the names speak for themselves):
max_connections=100
shared_buffers = 3000MB
temp_buffers = 800MB
effective_io_concurrency = 300
max_worker_processes = 80
Be careful not to boost these parameters too much, as it might lead to errors with Postgres trying to take more resources than are available. The examples above run fine on a Debian machine with 8 GB RAM and 4 cores.
Override the Thread class and close all DB connections at the end of the thread. The code below works for me:
from threading import Thread
from django.db import connections

class MyThread(Thread):
    def run(self):
        super().run()
        connections.close_all()

def myasync(function):
    def decorator(*args, **kwargs):
        t = MyThread(target=function, args=args, kwargs=kwargs)
        t.daemon = True
        t.start()
    return decorator
When you need to call a function asynchronously:

@myasync
def async_function():
    ...