As I understand it, using Flask without Celery will block the server's availability when a user starts a long operation.
My web server is not actually exposed to the internet, and at most 3 users will be connected at a time (it's for internal use, invoking automation scripts).
I created a test environment in order to check how Flask handles several calls at a time.
I created 'task1' and 'task2', each running a loop with a print statement + sleep in order to block the main thread for several seconds.
It seems like it's not really blocking the main thread!
I can run 'task1', see the output for every loop iteration, then run 'task2' and see the output of 'task2' interleaved with that of 'task1'.
I checked the limit and it seems like I can run 7 tasks without blocking.
How is that possible? According to the above, I don't need to use Celery in my organization, since there will be no scenario in which 2 users run more than 2 tasks at a time.
Can someone explain why 'task1' does not block the start of 'task2'?
import time
from flask import Flask

app = Flask(__name__)

@app.route('/task1', methods=['POST'])
def task1():
    for i in range(4):
        print('task 1 - ' + str(i))
        time.sleep(1)
    return 'message'

@app.route('/task2', methods=['POST'])
def task2():
    for i in range(5):
        print('task 2 - ' + str(i))
        time.sleep(1)
    return 'message'
<script>
function runTask() {
    document.getElementById('task').value = "this is a value";
    let req = $.ajax({
        url : '/task1',
        type : 'POST', // post request
        data : { }
    });
    req.done(function (data) {
    });
}

function runLongerTask() {
    document.getElementById('longer_task').value = "this is longer value";
    let req = $.ajax({
        url : '/task2',
        type : 'POST', // post request
        data : { }
    });
    req.done(function (data) {
    });
}
</script>
I expected 'task1' to start only after 'task2' had finished, but it seems the two tasks are running in threads (without my actually configuring a thread).
Here are the results that I got:
task 2 - 0
task 1 - 0
task 2 - 1
task 1 - 1
task 2 - 2
task 1 - 2
task 2 - 3
task 1 - 3
task 2 - 4
As I understand, using Flask without Celery will block the server availability when a user starts a long operation.
This is not precisely correct, although it's a good rule of thumb to keep heavy workloads out of your webserver for lots of reasons.
You haven't described how you are running Flask - with a WSGI container, or which run options. I'd look there to understand how concurrency is configured.
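For context: if this is Flask's built-in development server, the likely explanation is that since Flask 1.0 app.run() defaults to threaded=True, so each request is handled in its own thread, which matches the interleaved output above. A minimal sketch, reusing the question's app object, to see the difference:

if __name__ == '__main__':
    # app.run(threaded=True)   # the default since Flask 1.0: each request gets its own thread
    app.run(threaded=False)    # requests are handled strictly one at a time

A production WSGI server (gunicorn, uWSGI, etc.) has its own worker/thread settings that determine the same thing.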
We are using GCP Workflows to do some API calls for status checks every n seconds via an http.post call.
Everything was fine until recently, when all of our workflows started failing with an internal error:
{"message":"ResourceLimitError: Memory usage limit exceeded","tags":["ResourceLimitError"]}
I found out that when we use GET with query params, the failure happens a bit later than with POST and a body.
Here is the testing workflow:
main:
  steps:
    - init:
        assign:
          - i: 0
          - body:
              foo: 'thisismyhorsemyhorseisamazing'
    - doRequest:
        call: http.request
        args:
          url: https://{my-location-and-project-id}.cloudfunctions.net/workflow-test
          method: GET
          query: ${body}
        result: res
    - sleepForOneSecond:
        call: sys.sleep
        args:
          seconds: 1
    - logCounter:
        call: sys.log
        args:
          text: ${"Iteration - " + string(i)}
          severity: INFO
    - increaseCounter:
        assign:
          - i: ${i + 1}
    - checkIfFinished:
        switch:
          - condition: ${i < 500}
            next: doRequest
        next: returnOutput
    - returnOutput:
        return: ${res.body}
It can do up to 37 requests with GET and 32 with POST, and then execution stops with an error. And those numbers don't change.
For reference, the Firebase function returns 200 for both POST and GET with the following JSON:
{
    "bar": "thisismyhorsemyhorseisamazing",
    "fyz": []
}
Any ideas what goes wrong there? I don't think the 64 KB quota for variables is exceeded there. It shouldn't be calculated as the sum of all assignments, should it?
This looks like an issue with the product. I found this Google issue tracker where the issue was reported.
It is better to continue over the public issue tracker.
Currently my Flask app only processes one request at a time. Any request has to wait for the previous request to finish before being processed, which is not a good user experience.
While I do not want to increase the number of requests the Flask app can process at one time, how can I return a Server Busy message immediately when the next request comes in before the previous request finishes?
I have tried the threading approach below, but I only get both the 'Server busy message' and the "Proper return message" after 10 seconds.
import time
import threading
from contextlib import ExitStack

busy = threading.Lock()

@app.route("/hello")
def hello():
    if busy.acquire(timeout=1):
        return 'Server busy message'
    with ExitStack() as stack:
        stack.callback(busy.release)
        # simulate heavy processing
        time.sleep(10)
        return "Proper return message"
I need to continuously get data from a MySQL database, which receives data with an update frequency of around 200 ms, and continuously update a text field on my dashboard with the latest value. My dashboard is built on Django.
I have read a lot about Channels, but all the tutorials are about chat applications. I know that I need to implement WebSockets, which will basically keep an open connection and push the data. With a chat application that makes sense, but I haven't come across anything that talks about a MySQL database.
I also read about mysql-events. Since the data that ends up in the table comes from an external sensor, I don't understand how I can monitor a table from inside Django, i.e. whenever a new row is added to the table, I need to get that newly inserted row based on a column value.
Any ideas on how to go about it? I have gone through a lot of articles and I couldn't find something specific to this requirement.
Thanks to Timothee Legros' answer, which helped me move along in the right direction.
Everywhere on the internet it says that Django Channels is/can be used for real-time applications, but nowhere does it talk about the exact implementation (other than chat applications).
I used Celery, Django Channels and Celery Beat to accomplish the task, and it works as expected.
There are three parts to it: setting up Channels, creating a Celery task and calling it periodically (with the help of Celery Beat), and then sending that task's output to Channels so that it can push the data to the websocket.
Channels
I followed the original tutorial on the Channels website and built on that.
routing.py
from django.urls import re_path

from . import consumers

websocket_urlpatterns = [
    re_path(r'ws/chat/(?P<room_name>\w+)/$', consumers.ChatConsumer),
    re_path(r'ws/realtimeupdate/$', consumers.RealTimeConsumer),
]
consumers.py
import json

from channels.generic.websocket import AsyncWebsocketConsumer

class RealTimeConsumer(AsyncWebsocketConsumer):

    async def connect(self):
        self.channel_group_name = 'core-realtime-data'
        # Join room group
        await self.channel_layer.group_add(
            self.channel_group_name,
            self.channel_name
        )
        await self.accept()

    async def disconnect(self, close_code):
        # Leave room group
        await self.channel_layer.group_discard(
            self.channel_group_name,
            self.channel_name
        )

    # Receive message from WebSocket
    async def receive(self, text_data):
        print(text_data)

    async def loc_message(self, event):
        # print(event)
        message_trans = event['message_trans']
        message_tag = event['message_tag']
        # print("sending data to websocket")
        await self.send(text_data=json.dumps({
            'message_trans': message_trans,
            'message_tag': message_tag
        }))
This class will basically send data to the websocket once it receives it. The two files above are specific to the app.
Now we will set up Celery.
In the project's base directory, where the settings file resides, we need to create three files:
celery.py - this initializes Celery.
routing.py - this routes the Channels websocket addresses.
tasks.py - this is where we define the task.
celery.py
import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj_name.settings')

app = Celery('proj_name', backend='redis://localhost', broker='redis://localhost/')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
routing.py
from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter

from app_name import routing

application = ProtocolTypeRouter({
    # (http->django views is added by default)
    'websocket': AuthMiddlewareStack(
        URLRouter(
            routing.websocket_urlpatterns
        )
    ),
})
tasks.py
import time

from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer
from django.core import serializers

from app_name.models import CustomModel_1, CustomModel_2

@shared_task(name='realtime_task')
def RealTimeTask():
    time_s = time.time()
    result_trans = CustomModel_1.objects.all()
    result_tag = CustomModel_2.objects.all()
    result_trans_json = serializers.serialize('json', result_trans)
    result_tag_json = serializers.serialize('json', result_tag)
    # output = {"ktr": result_trans_json, "ktag": result_tag_json}
    # print(output)
    channel_layer = get_channel_layer()
    message = {'type': 'loc_message',
               'message_trans': result_trans_json,
               'message_tag': result_tag_json}
    async_to_sync(channel_layer.group_send)('core-realtime-data', message)
    print(time.time() - time_s)
After it finishes its work, the task sends the result back to Channels, which in turn relays it to the websocket.
settings.py
# Channels
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            "hosts": [('127.0.0.1', 6379)],
        },
    },
}

CELERY_BEAT_SCHEDULE = {
    'task-real': {
        'task': 'realtime_task',
        'schedule': 1  # this means the task will run itself every second
    },
}
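Note that the Beat schedule only fires if a Celery worker and the Beat scheduler are both running alongside the Django/Channels server, e.g. with something like celery -A proj_name worker and celery -A proj_name beat (the exact invocation depends on your Celery version; during development the worker can also be started with -B to embed Beat).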
Now the only thing left is to create a websocket in the javascript file and start listening to it.
//Create web socket to receive data
const chatSocket = new WebSocket(
    'ws://'
    + window.location.host
    + '/ws/realtimeupdate'
    + '/'
);

chatSocket.onmessage = function(e) {
    const data = JSON.parse(e.data);
    console.log(e.data + '\n');
    // For trans
    var arrayOfObjects = JSON.parse(data.message_trans);
    //Do your thing
    //For tags
    var arrayOfObjects_tag = JSON.parse(data.message_tag);
    //Do your thing
};

chatSocket.onclose = function(e) {
    console.error('Chat socket closed unexpectedly');
};
To answer the MySQL usage: I am inserting data into the MySQL database from an external sensor, and in tasks.py I am querying the table using the Django ORM.
Overall, it does the intended work: populating a real-time dashboard with real-time data from MySQL. I am sure there may be different and better approaches to this; please let me know about them.
Your best bet, if you need to constantly query your SQL database, is to use Celery, or dramatiq (which is simpler/easier but less battle-tested), in combination with Django Channels.
Celery allows you to create workers (kind of like background processes) that you can send tasks (functions) to. When a worker receives a task, it executes it. All this is done in the background. From the task that the worker is executing you can actually send data back through a websocket directly from the worker. This only works if you have Django Channels + channel layers enabled, because when you enable channel layers, each consumer instance created when you open a channel/websocket has a name that you can pass to the worker so that it knows which websocket to send the query data back to.
Here is what the flow of this process would look like:
Client requests to connect to your websocket
Consumer instance is created, and with it a specific name
Consumer instance accepts the connection
Consumer triggers the celery task and passes the name
Worker begins polling your SQL database every X seconds
When the worker finds a new entry, it uses the name it was given to send the new entry back through the websocket
I suggest reading the Django Channels documentation on consumers and channel layers, as well as Celery or dramatiq tutorials, to understand how those work. For all this to work you will also have to learn about Redis and a message queue service such as RabbitMQ. There is just too much to put in a simple answer, but I can provide more information if you have specific questions.
Edit:
Get a Redis server set up on your machine. If you are on Windows like me, then you have to download WSL 2 and install Ubuntu from the Windows Store (free). This link can walk you through it.
Get a RabbitMQ server set up. Follow their tutorial.
Enable Django Channels and Django channel layers, and then set Redis as your default Django Channels backend.
Set up Dramatiq or Celery. I prefer Dramatiq, as it is basically a new and improved version of Celery, albeit less popular. It is much easier to set up and use. This is the github repo for django-dramatiq and it will walk you through how to set it up. Note that just like when you launch your Django server with python manage.py runserver, you have to launch Dramatiq workers with python manage.py rundramatiq before testing your website.
Create a tasks.py file in your Django app, and inside it implement a task that checks the MySQL database for new entries (a sketch of such an actor follows the consumer explanation below). If you haven't figured that out already, here is the link to get started with that. In your tasks file you should have a function with the dramatiq.actor decorator on top, so that Dramatiq knows that the function is a task.
Build a django-channels consumer to handle WebSocket connections as well as allow you to send data through the WebSocket connection. This is what the standard consumer would look like:
from asgiref.sync import sync_to_async
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class AsyncDashboardConsumer(AsyncJsonWebsocketConsumer):

    async def connect(self):
        await self.accept()

    async def disconnect(self, code):
        await self.close()

    async def receive_json(self, text_data=None, bytes_data=None, **kwargs):
        someData = text_data['someData']
        someOtherData = text_data['someOtherData']
        if 'execute_getMySQLdata' in text_data['function']:
            await self.getData(someData, someOtherData)

    async def sendDataToClient(self, event):
        await self.send(text_data=event['text'])

    async def getData(self, someData, someOtherData):
        # SQLData is the dramatiq actor defined in tasks.py
        sync_to_async(SQLData.send(self.channel_name, someData, someOtherData))
The connect function is called when the client attempts to connect to the WebSocket URL that your routing file (in step 2) points at this consumer.
The receive_json function is called whenever the client sends data to your Django server.
The getData function is called from the receive_json function and sends a message to start the dramatiq task that you created earlier to check the SQL db. Note that when you send the message you must pass in self.channel_name, as you use that channel_name to send data back through the WebSocket directly from the dramatiq worker/task.
The sendDataToClient function is used when you send data back to the client. When you send data from your task, this is the handler you must reference by name.
To send data from the task you created earlier, use something like this: async_to_sync(channel_layer.send)(channelName, {'type': 'sendDataToClient', 'text': jsonPayload}). Notice how you pass the channelName as well as the name of the sendDataToClient handler from your consumer.
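For reference, here is a rough sketch of what such a tasks.py actor could look like; the model (SensorReading), its fields and the polling interval are assumptions for illustration, not part of the original setup:

# tasks.py - hypothetical dramatiq actor that polls MySQL and pushes new rows to one websocket
import json
import time

import dramatiq
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

from myapp.models import SensorReading  # hypothetical model fed by the external sensor

@dramatiq.actor
def SQLData(channel_name, someData, someOtherData):
    # signature matches the consumer's SQLData.send(...) call above; the extra
    # arguments are whatever the client sent and could be used to filter the query
    channel_layer = get_channel_layer()
    last_id = 0
    while True:  # a real actor would also need a stop condition / time limit
        for row in SensorReading.objects.filter(id__gt=last_id).order_by('id'):
            last_id = row.id
            async_to_sync(channel_layer.send)(channel_name, {
                'type': 'sendDataToClient',
                'text': json.dumps({'id': row.id, 'value': row.value}),
            })
        time.sleep(0.2)  # the sensor writes roughly every 200 ms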
Finally, this is what the javascript on the client side would look like:
let socket = new WebSocket("wss://javascript.info/article/websocket/demo/hello");

socket.onopen = function(e) {
    alert("[open] Connection established");
    alert("Sending to server");
    socket.send("My name is John");
};

socket.onmessage = function(event) {
    alert(`[message] Data received from server: ${event.data}`);
};

socket.onclose = function(event) {
    if (event.wasClean) {
        alert(`[close] Connection closed cleanly, code=${event.code} reason=${event.reason}`);
    } else {
        // e.g. server process killed or network down
        // event.code is usually 1006 in this case
        alert('[close] Connection died');
    }
};

socket.onerror = function(error) {
    alert(`[error] ${error.message}`);
};
This code comes directly from this JavaScript WebSocket walkthrough.
This is how a basic web application with background workers can continually update information in real time. There are probably other ways of doing this without background workers, but since you want to get information as soon as it arrives, it is better to have a background process that is continually checking for updates. On another note, the code above means that a separate connection to the database is opened for each new client that connects, but you can easily take advantage of django-channels groups and have one connection to your database that then sends to all clients in certain groups.
Build a microservice for WebSocket connections
Another way to implement such a feature is to build a standalone WebSocket microservice.
A monolith architecture isn't what you need here. Every WebSocket will open a connection to Django (which will be behind a reverse proxy and application server, e.g. NGINX and Gunicorn). If your client opens two tabs in the browser you will get 2 connections, etc.
My recommendation is to adjust the tech stack (yes, I'm a huge fan of Django, but there are many cool solutions for building WS):
Use Starlette, a production-ready framework with built-in WebSockets: https://www.starlette.io/websockets/
Use uvicorn.workers.UvicornWorker for Gunicorn to manage your ASGI application: this is only 1 line of code, like gunicorn -w 4 -k uvicorn.workers.UvicornWorker --log-level warning example:app
Handle your WebSocket connections and use the examples to request updates from the database: https://www.starlette.io/database/
Use super simple JavaScript code to open the connection on the client side and listen for updates.
So your models, templates and views will be managed by Django.
Your WebSocket connections will be managed by Starlette in a native async way.
If you're interested in such an option I can provide detailed instructions; a rough sketch is below.
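As a rough illustration only (the endpoint path, polling interval and payload are assumptions, not something from the original answer), a minimal Starlette WebSocket service could look like this:

# ws_service.py - standalone websocket microservice sketch
import asyncio

from starlette.applications import Starlette
from starlette.routing import WebSocketRoute
from starlette.websockets import WebSocket, WebSocketDisconnect

async def updates(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # in a real service, query the database here for the latest row
            await websocket.send_json({"value": "latest reading goes here"})
            await asyncio.sleep(0.2)
    except WebSocketDisconnect:
        pass

app = Starlette(routes=[WebSocketRoute("/ws/updates", updates)])

# run with, e.g.: gunicorn -w 4 -k uvicorn.workers.UvicornWorker ws_service:app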
A relatively long-running task is delegated to Celery workers, which run separately, on another server.
However, results are added back to the relational database (a table is updated using task_descr.id as a key, see below); the worker uses ignore_result.
The task is requested from the Flask application:
task = app.celery.send_task('tasks.mytask', [task_descr.id, attachments])
The problem is that the task is requested while the transaction is not yet closed on the Flask side. This causes a race condition, because sometimes the Celery worker completes the task before the end of the transaction in the Flask app.
What is the proper way to send tasks only after a successful transaction?
Or should the worker check for the availability of task_descr.id before attempting the conditional UPDATE, and retry the task (this feels like too complex an arrangement)?
The answer to Run function after a certain type of model is committed discusses a similar situation, but here the task sending is explicit, so there is no need to listen for updates/inserts on some model.
One of the ways is Per-Request After-Request Callbacks, thanks to Armin Ronacher:
from flask import g

def after_this_request(func):
    if not hasattr(g, 'call_after_request'):
        g.call_after_request = []
    g.call_after_request.append(func)
    return func

@app.after_request
def per_request_callbacks(response):
    for func in getattr(g, 'call_after_request', ()):
        response = func(response)
    return response
In my case the usage is in the form of a nested function:
task_descr = ...
attachments = ...
# ...

@after_this_request
def send_mytask(response):
    if response.status_code in {200, 302}:
        task = app.celery.send_task('tasks.mytask', [task_descr.id, attachments])
    return response
Not ideal, but it works. My tasks are only for successfully served requests, so I do not care about 500s or other error conditions.
I have a long-running Celery task which iterates over an array of items and performs some actions.
The task should somehow report back which item it is currently processing, so the end user is aware of the task's progress.
At the moment my Django app and Celery sit together on one server, so I am able to use Django's models to report the status, but I am planning to add more workers which are away from Django, so they can't reach the DB.
Right now I see a few solutions:
Store intermediate results manually using some storage, like Redis or MongoDB, making them available over the network. This worries me a little bit, because if, for example, I use Redis, then I have to keep the code on the Django side (reading the status) and the Celery task (writing the status) in sync so they use the same keys.
Report the status back to Django from Celery using REST calls, like PUT http://django.com/api/task/123/items_processed
Maybe use the Celery event system and create events like Item processed, on which Django updates the counter
Create a separate worker which runs on the server with Django and holds a task which only increases the items-processed count, so when the main task is done with an item it issues increase_messages_proceeded_count.delay(task_id).
Are there any other solutions, or hidden problems with the ones I mentioned?
There are probably many ways to achieve your goal, but here is how I would do it.
Inside your long-running Celery task, set the progress using Django's caching framework:
from django.core.cache import cache

@app.task(bind=True)
def long_running_task(self, *args, **kwargs):
    key = "my_task: %s" % self.request.id
    ...
    # do whatever you need to do and set the progress
    # using cache:
    cache.set(key, progress, timeout="whatever works for you")
    ...
Then all you have to do is make a recurring AJAX GET request with that key and retrieve the progress from cache. Something along those lines:
import json
from django.core.cache import cache
from django.http import HttpResponse

def task_progress_view(request, *args, **kwargs):
    key = request.GET.get('task_key')
    progress = cache.get(key)
    return HttpResponse(content=json.dumps({'progress': progress}),
                        content_type="application/json; charset=utf-8")
Here is a caveat, though: if you are running your server as multiple processes, make sure that you are using something like memcached, because Django's default local-memory cache will be inconsistent among the processes. Also, I probably wouldn't use Celery's task_id as a key, but it is sufficient for demonstration purposes.
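For example, a shared cache backend in settings.py could look something like this (the backend and location are illustrative; PyMemcacheCache is available from Django 3.2, older versions use MemcachedCache):

# settings.py - a cache shared by all server processes (sketch)
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    }
}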
Take a look at flower - a real-time monitor and web admin for Celery distributed task queue:
https://github.com/mher/flower#api
http://flower.readthedocs.org/en/latest/api.html#get--api-tasks
You need it for presentation, right? Flower works with websockets.
For instance - receive task completion events in real-time (taken from official docs):
var ws = new WebSocket('ws://localhost:5555/api/task/events/task-succeeded/');
ws.onmessage = function (event) {
    console.log(event.data);
}
You would likely need to work with tasks ('ws://localhost:5555/api/tasks/').
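(Flower itself runs as a separate process alongside your workers, typically started with something like celery -A proj flower, and serves its dashboard and API on port 5555 by default; check the Flower docs for the invocation matching your Celery version.)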
I hope this helps.
Simplest:
Your tasks and Django app already share access to one or two data stores - the broker and the results backend (if you're using one that is different from the broker).
You can simply put some data into one or other of these data stores to indicate which item the task is currently processing.
E.g. if using Redis, simply have a key 'task-currently-processing' and store the data relevant to the item currently being processed in there.
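A minimal sketch of that idea, assuming Redis is reachable on localhost and using the redis-py client (the key name and helper functions are arbitrary):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def report_progress(item_id):
    # called from inside the Celery task each time a new item is picked up
    r.set('task-currently-processing', item_id)

def read_progress():
    # called from the Django side to show which item is in flight
    return r.get('task-currently-processing')  # bytes or None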
You can use something like Swampdragon to reach the user from the Celery instance (you have to be able to reach it from the client though; take care not to run afoul of CORS). It can be latched onto the counter, not the model itself.
lehins' solution looks good if you don't mind your clients repeatedly polling your backend. That may be fine but it gets expensive as the number of clients grows.
Artur Barseghyan's solution is suitable if you only need the task lifecycle events generated by Celery's internal machinery.
Alternatively, you can use Django Channels and WebSockets to push updates to clients in real-time. Setup is pretty straightforward.
Add channels to your INSTALLED_APPS and set up a channel layer. E.g., using a Redis backend:
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("redis", 6379)]
        }
    }
}
Create an event consumer. This will receive events from Channels and push them via Websockets to the client. For instance:
import json

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer

class TaskConsumer(WebsocketConsumer):
    def connect(self):
        self.task_id = self.scope['url_route']['kwargs']['task_id']  # your task's identifier
        async_to_sync(self.channel_layer.group_add)(f"tasks-{self.task_id}", self.channel_name)
        self.accept()

    def disconnect(self, code):
        async_to_sync(self.channel_layer.group_discard)(f"tasks-{self.task_id}", self.channel_name)

    def item_processed(self, event):
        item = event['item']
        self.send(text_data=json.dumps(item))
Push events from your Celery tasks like this:
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

...

async_to_sync(get_channel_layer().group_send)(f"tasks-{task.task_id}", {
    'type': 'item_processed',
    'item': item,
})
You can also write an async consumer and/or invoke group_send asynchronously. In either case you no longer need the async_to_sync wrapper.
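For instance, an async variant of the push above might look like this (a sketch; it applies when the sending code already runs in an async context):

from channels.layers import get_channel_layer

async def notify_item_processed(task_id, item):
    channel_layer = get_channel_layer()
    # group_send is a coroutine on the channel layer, so it can be awaited directly
    await channel_layer.group_send(f"tasks-{task_id}", {
        'type': 'item_processed',
        'item': item,
    })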
Add websocket_urlpatterns to your urls.py:
from django.urls import path

websocket_urlpatterns = [
    path('ws/tasks/<task_id>/', TaskConsumer.as_asgi()),
]
Finally, to consume events from JavaScript in your client, you can do something like this:
let task_id = 123;
let protocol = location.protocol === 'https:' ? 'wss://' : 'ws://';
let socket = new WebSocket(`${protocol}${window.location.host}/ws/tasks/${task_id}/`);

socket.onmessage = function(event) {
    let data = JSON.parse(event.data);
    let item = data.item;
    // do something with the item (e.g., push it into your state container)
}