Gunicorn not responding - django

I'm using Gunicorn to serve a Django application. It was working fine until I changed its timeout from 30s to 900000s; I had to do this because of a use case in which a huge file needs to be uploaded and processed (the processing takes more than 30 minutes in some cases). After this change, Gunicorn goes unresponsive after a few hours. I guess the problem is that all workers (30 of them) end up busy with some requests after that amount of time. The weird thing is that it happens even if I don't run that long request at all; it happens with normal browsing in the Django admin. I want to know if there's a way to monitor requests on Gunicorn and see which requests the workers are busy with, so I can find the requests that are tying them up. I tried --log-file=- --log-level=debug, but it doesn't say anything about requests; I need more detailed logs.

From the latest Gunicorn docs, on the cmd line/script you can use:
--log-file - ("-" means log to stderr)
--log-level debug
or in the config file you can use:
errorlog = '-'
accesslog = '-'
loglevel = 'debug'
but there is no mention of the parameter format you specified in your question:
--log-file=-
--log-level=debug
The default for the access log is 'None' (ref), so this is a shot in the dark, but it may explain why you are not receiving detailed log information. Here is a sample config file from the Gunicorn source, and the latest Gunicorn config docs.
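To actually see which requests the workers are handling, enabling the access log and a verbose access_log_format in a config file is the usual route. A minimal gunicorn.conf.py sketch (the format fields are standard Gunicorn ones; %(L)s is the request duration in seconds, handy for spotting the slow requests):

# gunicorn.conf.py -- a minimal sketch; adjust paths and format to taste
errorlog = '-'     # '-' sends the error log to stderr
accesslog = '-'    # '-' sends the access log to stdout; the default is None
loglevel = 'debug'

# One line per request: remote address, request line, status, duration in seconds
access_log_format = '%(h)s "%(r)s" %(s)s %(L)s'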
Also, you may want to look into changing your logging configuration in Django.

While I too am looking for a good answer on how to see how many workers are busy, you are solving this problem the wrong way. For a task that takes that long you need a task queue, such as Celery with RabbitMQ as the broker, to do the heavy lifting asynchronously while your request/response cycle remains fast.
I have a script on my site that can take 5+ minutes to complete, and here's a good pattern I'm using:
When the request first comes in, spawn the task and simply return an HTTP 202 Accepted. Its purpose is: "The request has been accepted for processing, but the processing has not been completed."
Have your front end poll the same endpoint (every 30 seconds should suffice). As long as the task is not complete, return a 202.
Eventually, when it finishes, return a 200 along with any data the front end might need.
For my site, we want to update data that is more than 10 minutes old. I pass the Accept-Datetime header to indicate how old the data is allowed to be. If our local cached copy is older than that, we spawn the task cycle.
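A rough Django-flavored sketch of that pattern, assuming a Celery task named process_upload and a client that polls the same URL (all names here are hypothetical):

# views.py -- 202/polling pattern; process_upload is an assumed Celery task
from celery.result import AsyncResult
from django.http import JsonResponse

from .tasks import process_upload


def generate(request):
    task_id = request.session.get('upload_task_id')
    if task_id is None:
        # First request: spawn the task and tell the client we accepted it.
        result = process_upload.delay(request.POST['file_id'])  # hypothetical param
        request.session['upload_task_id'] = result.id
        return JsonResponse({'status': 'processing'}, status=202)

    result = AsyncResult(task_id)
    if not result.ready():
        # Still running: keep answering 202 so the client keeps polling.
        return JsonResponse({'status': 'processing'}, status=202)

    # Done: clear the marker and hand back the data with a 200.
    del request.session['upload_task_id']
    return JsonResponse({'status': 'done', 'data': result.get()})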

Related

gUnicorn/Flask/GAE - two processes started for processing the same http request

I have an app on Google App Engine (Python 3.9 standard environment) running on gUnicorn and Flask. I'm making a request to the server from a client-side app for a long-running operation, and I'm seeing the request processed twice. The second process (worker) started a while (an hour and a half) after the first one had been working.
I'm not sure whether it's related to gUnicorn specifically or to GAE.
The server controller has logging at the beginning:
@app.route("/api/campaign/generate", methods=["GET"])
def campaign_generate():
    logging.info('Entering campaign_generate')
    # some very long processing here
The controller is called by clicking a button in the UI app. I checked the network tab in the browser's DevTools: only one request is fired. And I can see only one request in the server logs at the moment the workers execute (more on this below).
The whole app.yaml is like this:
runtime: python39
default_expiration: 0
instance_class: B2
basic_scaling:
  max_instances: 1
entrypoint: gunicorn -b :$PORT server.server:app --timeout 0 --workers 2
So I have 2 workers with infinite timeouts, and basic scaling with max instances = 1.
I expect that while the app is processing one request for a long-running operation, the other worker is available for serving.
I don't expect the second worker to be used to process the same request; that would be nonsense (unless the user starts another operation from another browser).
Thanks to timeout=0, I expect gUnicorn to wait indefinitely till the controller finishes. The only thing that could get in the way is GAE's timeout, but thanks to basic scaling that's 24 hours. So I expect the app to be able to process requests for several hours without problems.
But what I'm seeing instead is that a while after the request starts processing, another execution of it is started. Here are the simplified logs I see in Cloud Logging:
13:00:58 GET /api/campaign/generate
13:00:59 Entering campaign_generate
..skipped
13:39:13 Starting generating zip-archive (it's something that takes a while)
14:25:49 Entering campaign_generate
So at 14:25, 1:25 after the original request came in, another processing of the same request started!
And now there are two request processings running in parallel.
Needless to say, this increases memory pressure and doubles the execution time.
When the first "worker" finished its processing (at 14:29:28 in our example), its result was not returned to the client. It looks like gUnicorn or GAE simply abandoned the first request, and the client has to wait till the second worker finishes processing.
Why is it happening?
And how can I fix it?
Regarding the HTTP request records in the log:
I saw only one request in Cloud Logging (the first one) while the processing was active, and even after the controller was called the second time ('Entering campaign_generate' appeared in the logs) there was no new GET request in the logs. But after everything completed (actually, the second processing returned a response), a mysterious second GET request appeared. So technically, once everything is done, from the server logs' point of view (Cloud Logging) it looks like there were two subsequent requests from the client. But there weren't! There was only one, and I can see it in the browser's DevTools.
The two requests have different traceId and requestId HTTP headers.
It's very hard to understand what's going on. I tried running the app locally (on the same data), and there it works as intended.

Best wsgi service for handling webhooks with few resources

I'm currently working on a virtual server with 2 CPUs and 4 GB of RAM, running Flask + uwsgi + nginx to host the web server. The server needs to accept about 10 out of the 2500-ish requests that come in per day. The requests that don't pass the check take about 2 ms on average, yet the queue is consistently backed up. The issue I've been encountering lately is both speed and duplication when it does work: the accepted webhooks are forwarded to another server, and I either get duplicates or completely miss a bunch.
[uwsgi]
module = wsgi
master = true
processes = 4
enable-threads = true
threads = 2
socket = API.sock
chmod-socket = 660
vacuum = true
harakiri = 10
die-on-term = true
This is my current .ini file. I have messed around with harakiri and have spent countless hours reading the uwsgi documentation and trying different things; it is unbelievably frustrating.
Picture of Systemctl status API
The check for it looks similar to this (I've redacted some info):
@app.route('/api', methods=['POST'])
def handle_callee():
    authorization = request.headers.get('authorization')
    if authorization == SECRET and check_callee(request.json):
        data = request.json
        name = data["payload"]["object"]["caller"]["name"]
        create_json(name, data)
        return 'success', 200
    else:
        return 'failure', 204
The JSON is then parsed through a number of functions. This is my first time deploying a WSGI service and I don't know if my configuration is incorrect. I've poured hours of research into trying to fix this. Should I try switching to Gunicorn? I asked this question differently a couple of days ago but to no avail; I'm trying to add more context in hopes someone can point me in the right direction. I don't even know whether, in the systemctl status, the | req: 12/31 means how many requests that PID has done so far versus what's queued. Any insight into this situation would make my week. I've been unable to fix this for about two weeks of trying different configs: increasing workers and processes, messing with harakiri, disabling logging. But none of this has gotten the requests to process at the speed I want.
Thank you to anyone who took the time to read this, I am still learning and have tried to add as much context as possible. If you need more I will gladly respond. I just can't wrap my head around this issue.
You would need to take a systematic approach to figuring out:
How many requests per second you can handle
What your app's bottlenecks and scaling factors are
Cloudbees have written a great article on performance tuning for uwsgi + flask + nginx.
To give an overview, here is what the steps to tune your service might look like:
First, make sure you have the required tooling, particularly a benchmarking tool like Apache Bench, k6, etc.
1. Establish a baseline. Configure your application with the minimum setup to run, i.e. a single process and a single thread, no multi-threading. Run the benchmark and record the results.
2. Start tweaking the setup: add threads, processes, etc.
3. Benchmark after the tweaks.
4. Repeat steps 2 & 3 until you see the upper limits and understand the service's characteristics: are you CPU- or IO-bound?
5. Try changing the hardware/VM, as some offerings come with performance penalties due to CPU shared with other tenants, bandwidth limits, etc.
Tip: run the benchmark tool from a different system than the one you are benchmarking, since the tool itself consumes resources and loads the system further.
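If you don't have ab or k6 handy, a tiny Python stand-in is enough for rough numbers (a sketch; the URL, payload, and counts are placeholders for your setup):

# loadtest.py -- fire N requests with C concurrent threads, report latencies
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # pip install requests

URL = 'http://localhost:8000/api'  # placeholder: your service under test
N, C = 200, 8                      # total requests, concurrency


def one_request(_):
    start = time.perf_counter()
    requests.post(URL, json={'ping': 'pong'}, timeout=10)
    return time.perf_counter() - start


with ThreadPoolExecutor(max_workers=C) as pool:
    timings = sorted(pool.map(one_request, range(N)))

print(f'median: {timings[N // 2] * 1000:.1f} ms, '
      f'p95: {timings[int(N * 0.95)] * 1000:.1f} ms')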
In your code sample you have two functions, create_json(name, data) and check_callee(request.json); do you know their performance?
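For instance, a quick check could look like this (a sketch; payload and name stand for representative sample data):

import time

start = time.perf_counter()
check_callee(payload)  # payload: a representative webhook body (dict)
print(f'check_callee: {(time.perf_counter() - start) * 1000:.2f} ms')

start = time.perf_counter()
create_json(name, payload)
print(f'create_json: {(time.perf_counter() - start) * 1000:.2f} ms')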
Note: I can't comment, so I had to write this as an answer.

Set Django Rest Framework endpoint a timeout for a specific view

I'm running Django 4.0.5 + Django Rest Framework + Nginx + Gunicorn
Sometimes I'm going to need to handle POST requests with a lot of data to process.
The user waits for an "ok" or "fail" response and a list of IDs resulting from the process.
Everything works fine so far for mid-size request bodies (this is subjective), but with big ones the process takes 1 min+.
It's in these cases that I get a 500 error response from DRF, while my process keeps running in the background to completion (but the user never learns that it finished successfully).
I did some investigating and changed the Gunicorn timeout parameter (to 180), but that didn't change the service's behavior.
Is there a way to set a timeout larger than 60s at the @api_view level, or somewhere else?
Use Celery async tasks to process such requests as background tasks.
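For example, a minimal sketch (the task and view names are hypothetical; the task body is whatever your view currently does):

# tasks.py -- move the slow work into a Celery task
from celery import shared_task


@shared_task
def process_payload(payload):
    # ... the long-running processing that used to live in the view ...
    return {'ids': []}  # whatever list of IDs the processing produces


# views.py -- enqueue the work and answer before any 60s timeout can hit
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .tasks import process_payload


@api_view(['POST'])
def bulk_process(request):
    result = process_payload.delay(request.data)
    # 202 Accepted: the client polls a status endpoint with this ID
    # to fetch the "ok"/"fail" outcome and the resulting IDs later.
    return Response({'task_id': result.id}, status=202)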

How can my Heroku Flask web application support N users concurrently downloading an image file?

I am working on a Flask web application using Heroku. As part of the application, users can request to download an image from the server. That calls a function which then has to retrieve multiple images from my cloud storage (about 500 KB in total), apply some formatting, and return a single image (about 60 KB). It looks something like this:
import io

from flask import Flask, Response, request

app = Flask(__name__)


@app.route('/download_image', methods=['POST'])
def download_image():
    # Retrieve about 500 KB of images from cloud storage
    base_images = retrieve_base_images(request.form)
    # Apply image formatting into a single image
    formatted_image = format_images(base_images)
    # Return image of about 60 KB for download
    formatted_image_file = io.BytesIO()
    formatted_image.save(formatted_image_file, format='JPEG')
    formatted_image_data = formatted_image_file.getvalue()
    return Response(formatted_image_data,
                    mimetype='image/jpeg',
                    headers={'Content-Disposition': 'attachment;filename=download.jpg'})
My Procfile is
web: gunicorn my_app:app
How can I design/configure this to support N concurrent users? Let's say, for example, I want to make sure my application can support 100 different users all requesting to download an image at the same time. With several moving parts, I am unsure how to even go about doing this.
Also, if someone requests a download but then loses their internet connection before the download is complete, would this cause some sort of lock that could stall endlessly, or would that thread/process automatically time out after a short period and be handled smoothly?
I currently have 1 dyno (on the Heroku free plan). I am willing to add more dynos if needed.
Run multiple Gunicorn workers:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
…
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
Note that Heroku sets a default WEB_CONCURRENCY for you based on your dyno size. You can probably handle a small number of concurrent requests right now.
However, you're not going to get anywhere close to 100 on a free dyno. This section appears between the previous two in the documentation:
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a free, hobby or standard-1x dyno. Your application may allow for a variation of this, depending on your application’s specific memory requirements.
Even if your application is very lightweight, you probably won't be able to go above 6 workers on a single small dyno. Adding more dynos and/or moving to larger dynos will be required.
Do you really need to support 100 concurrent requests? If you have four workers going, four users' requests can be served at the same time. If a fifth user makes a request, it just won't be responded to until one of the workers frees up. That's usually reasonable.
If your request takes an unreasonable amount of time to complete you have a few options besides adding more workers:
Can you cache the generated images? (See the sketch after this answer.)
Can you return a response immediately, create the images in a background job, and then notify the user that the images are ready? With some fancy front-end work this can be fairly transparent to the end user.
The right solution will depend on your specific use case. Good luck!
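On the caching idea, a minimal per-process sketch reusing the helpers from the question (the cache-key derivation is an assumption; swap in Redis or memcached if results must be shared across workers and dynos):

import io

# Per-process cache: each Gunicorn worker keeps its own copy, so a shared
# store (Redis, memcached) is the next step if hit rates matter.
image_cache = {}


def get_formatted_image(form):
    key = tuple(sorted(form.items()))  # assumes form params fully determine the image
    if key not in image_cache:
        base_images = retrieve_base_images(form)
        formatted_image = format_images(base_images)
        buf = io.BytesIO()
        formatted_image.save(buf, format='JPEG')
        image_cache[key] = buf.getvalue()
    return image_cache[key]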

Returning the result of celery task to the client in Django template

So I'm trying to accomplish the following: the user browses a web page while a task runs in the background. When the task completes, it should return args, one of which is flag: True, in order to trigger some JavaScript that shows a modal form.
I tested it before without async tasks and it works, but now with Celery it just stores the results in the database. I did some research on tornado-celery and related tools, but some of the components, like tornado-redis, are not maintained anymore, so in my opinion it would not be wise to use them.
So what are my options? Thanks.
If I understand you correctly, then you want to communicate something from the server side back to the client. You generally have three options for that:
1) Make a long-pending request to the server - kinda bad. Skipping over the details: it will bog down your web server if it's not configured to handle that, it will make your site score low on performance tests, and if the request fails, everything fails.
2) Poll the server with numerous requests at a time interval (0.2 s, something like that) - better. It will increase traffic, but the requests will be tiny and will not interfere with the site's performance very much. If you set a long interval to avoid loading the server with pointless requests, the users will see the data with a bit of a delay. On the upside, this will not fail (if written correctly) even if the connection is interrupted.
3) Websockets, where the server can just hit the client with any message whenever needed - nice, but it takes some time to get used to. If you want to try it, you can use django-channels, which is a nice library for Django websockets.
If I did not understand you correctly and the problem at hand is instead how to get data back from a Celery task into Django, then you can store the Celery task IDs and use them to first check whether the task is completed and then query the data from Celery.
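For that last approach, a minimal Django sketch of checking a stored task ID (the view name and URL wiring are assumptions; the flag key comes from the question):

from celery.result import AsyncResult
from django.http import JsonResponse


def task_status(request, task_id):
    # Polled by the front-end; the JavaScript shows the modal once flag is True.
    result = AsyncResult(task_id)
    if result.successful():
        # result.result is whatever the task returned, e.g. {'flag': True, ...}
        return JsonResponse({'complete': True, 'result': result.result})
    return JsonResponse({'complete': False})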