gUnicorn/Flask/GAE - two processes started for processing the same HTTP request

I have an app on Google App Engine (Python 3.9 standard environment) running on gUnicorn and Flask. I'm making a request to the server from a client-side app for a long-running operation and I'm seeing that the request is processed twice. The second process (worker) starts a while (an hour and a half) after the first one has already been working.
I'm not sure whether this is related to gUnicorn specifically or to GAE.
The server controller has logging at the beginning:
@app.route("/api/campaign/generate", methods=["GET"])
def campaign_generate():
    logging.info('Entering campaign_generate')
    # some very long processing here
The controller is called by clicking a button in the UI app. I checked the Network tab in the browser's DevTools and confirmed that only one request was fired. And I can see only one request in the server logs at the time the workers are executing (more on this below).
The whole app.yaml is like this:
runtime: python39
default_expiration: 0
instance_class: B2
basic_scaling:
  max_instances: 1
entrypoint: gunicorn -b :$PORT server.server:app --timeout 0 --workers 2
So I have 2 workers with infinite timeouts, basic scaling with max instances = 1.
I expect that while the app is processing one long-running request, the other worker remains available for serving.
I don't expect the second worker to be used to process the same request; that would be nonsense (unless the user started another operation from another browser).
Thanks to timeout=0 I expect gUnicorn to wait indefinitely until the controller finishes. The only thing that could get in the way is GAE's timeout, but thanks to basic scaling that's 24 hours. So I expect the app to process requests for several hours without problems.
But what I'm seeing instead is that, after processing the request for a while, another execution starts. Here are simplified logs I see in Cloud Logging:
13:00:58 GET /api/campaign/generate
13:00:59 Entering campaign_generate
..skipped
13:39:13 Starting generating zip-archive (it's something that takes a while)
14:25:49 Entering campaign_generate
So at 14:25, an hour and 25 minutes after the original request came in, another processing of the same request started!
And now there are two request processings running in parallel.
Needless to say, this increases memory pressure and doubles the execution time.
When the first "worker" finished its processing (at 14:29:28 in our example), its result was not returned to the client. It looks like gUnicorn or GAE simply abandoned the first request, and the client has to wait until the second worker finishes processing.
Why is it happening?
And how can I fix it?
Regarding the HTTP request records in the log:
I saw only one request in Cloud Logging (the first one) while the processing was active, and even after the controller was called the second time ('Entering campaign_generate' appeared in the logs) there was no new GET request in the logs. But after everything completed (actually, after the second processing returned a response), a mysterious second GET request appeared. So technically, once everything is done, from the server logs' point of view (Cloud Logging) it looks like there were two consecutive requests from the client. But there weren't! There was only one, and I can see it in the browser's DevTools.
The two requests have different traceId and requestId HTTP headers.
It's very hard to understand what's going on. I tried running the app locally (on the same data), but it works as intended.

Related

Continue request django rest framework

I have a request that lasts more than 3 minutes. I want the request to be sent and to immediately return a 200 answer, and after the work finishes, to deliver the result.
The workflow you've described is called asynchronous task execution.
The main idea is to remove time- or resource-consuming parts of the work from the code that handles HTTP requests and delegate them to some kind of worker. The worker might be a different thread or process, or even a separate service that runs on a different server.
This makes your application more responsive, as the user gets the HTTP response much quicker. Also, with this approach you can display UI-friendly things such as progress bars and status marks for the task, create retry policies if a task fails, etc.
Example workflow:
the user makes an HTTP request initiating the task
the server creates the task, adds it to the queue, and immediately returns an HTTP response with a task_id
the front-end code starts AJAX polling to get the results of the task, passing the task_id
the server handles the polling HTTP requests and looks up status information for this task_id. It returns the info (either the results or "still waiting") with the HTTP response
the front-end displays a spinner if the server returns "still waiting", or the results if they are ready
The most popular way to do this in Django is using the Celery distributed task queue.
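For illustration, here is a minimal sketch of that workflow with Celery and plain Django views. It assumes a configured Celery app and broker (e.g. Redis); the task and view names are made up for the example, they are not from the question.
# tasks.py -- the long-running work lives in a Celery task, not in the request handler
from celery import shared_task

@shared_task
def generate_campaign(campaign_id):
    # ...very long processing here...
    return {"campaign_id": campaign_id, "status": "done"}

# views.py -- enqueue the task, return its id right away, and let the front-end poll
from celery.result import AsyncResult
from django.http import JsonResponse

from .tasks import generate_campaign

def start_generation(request, campaign_id):
    result = generate_campaign.delay(campaign_id)   # returns immediately with a task id
    return JsonResponse({"task_id": result.id})

def generation_status(request, task_id):
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"status": "done", "result": result.result})
    return JsonResponse({"status": "still waiting"})
The front-end then polls the status endpoint with the task_id until it gets the results back.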
When a request comes in, you will have to verify it, then send the response and use some mechanism to complete the work in the background; you have to be sure the work can actually be completed. You can use pipelining, where you put every task into a pipeline. Django-Celery is an option, but don't use it unless required; find the easiest way to resolve the issue.

Set a timeout for a specific Django Rest Framework view

I'm running Django 4.0.5 + Django Rest Framework + Nginx + Gunicorn
Sometimes, I'm going to need to handle some POST requests with a lot of data to process.
The user will wait for an "ok" or "fail" response and a list of ids resulting from the process.
Everything works fine so far for mid-size request bodies (this is subjective), but when I get into big ones the process takes 1 min+.
It's in these cases that I get a 500 error response from DRF, but my process keeps running in the background till the end (though the user will not know it finished successfully).
I did some investigating and changed the Gunicorn timeout parameter (to 180), but it didn't change the behavior of the service.
Is there a way to set a timeout larger than 60s at the @api_view level or somewhere else?
Use celery async tasks to process such requests as background tasks.
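As a hedged sketch of what that could look like with DRF (assuming a working Celery setup; process_upload and the view names are hypothetical, not from the question):
# views.py -- enqueue the heavy POST processing and answer immediately
from celery.result import AsyncResult
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .tasks import process_upload   # hypothetical Celery task that returns the list of ids

@api_view(["POST"])
def bulk_process(request):
    result = process_upload.delay(request.data)   # returns at once, work continues in the worker
    return Response({"task_id": result.id}, status=202)

@api_view(["GET"])
def bulk_process_status(request, task_id):
    result = AsyncResult(task_id)
    if result.successful():
        return Response({"status": "ok", "ids": result.result})
    if result.failed():
        return Response({"status": "fail"})
    return Response({"status": "pending"}, status=202)
This way the initial request returns in well under a second, so no Gunicorn or proxy timeout comes into play; the client polls the status endpoint for the result.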

GAE - how to avoid service request timing out after 1 day

As I explained in this post, I'm trying to scrape tweets from Twitter.
I implemented the suggested solution with services, so that the actual heavy lifting happens in the backend.
The problem is that after about one day, I get this error
"Process terminated because the request deadline was exceeded. (Error code 123)"
I guess this is because the manual scaling has the requests timing out after 24 hours.
Is it possible to make it run for more than 24 hours?
You can't make a single request / task run for more than 24 hours, but you can split your work into different parts, each lasting a day. It's unwise to have a request run indefinitely; that's why App Engine closes them after a certain time, to prevent idle / loopy requests that last forever.
I would recommend having your task fire a call at the end to trigger queuing of the next task; that way it's automatic and you don't have to queue a task daily. Make sure there's a cursor or some other way for your task to communicate progress so it won't duplicate work.
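A hedged sketch of that chaining with Cloud Tasks on App Engine; the project, location, queue name and the /scrape handler are placeholders, not details from the question:
# chain.py -- at the end of each day-long run, persist your progress cursor and
# enqueue the next run so it picks up where this one stopped
import json
from google.cloud import tasks_v2

def enqueue_next_chunk(cursor):
    client = tasks_v2.CloudTasksClient()
    # placeholder project / location / queue -- replace with your own
    parent = client.queue_path("my-project", "us-central1", "scrape-queue")
    task = {
        "app_engine_http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "relative_uri": "/scrape",                      # the handler in your backend service
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"cursor": cursor}).encode(),
        }
    }
    client.create_task(parent=parent, task=task)
Each invocation of /scrape then works through its slice, stores the new cursor, and calls enqueue_next_chunk again before finishing.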

Returning the result of celery task to the client in Django template

So I'm trying to accomplish the following. The user browses a webpage and at the same time there is a task running in the background. When the task completes it should return args, where one of the args is flag: True, in order to trigger some JavaScript that shows a modal form.
I tested it before without async tasks and it works, but now with Celery it just stores the results in the database. I did some research on tornado-celery and related tools, but some of the components, like tornado-redis, are not maintained anymore, so in my opinion it would not be wise to use them.
So what are my options? Thanks.
If I understand you correctly, then you want to communicate something from the server side back to the client. You generally have three options for that:
1) Make a long-pending request to the server - kinda bad. Skipping over the details: it will bog down your web server if it's not configured to handle that, it will make your site score low on performance tests, and if the request fails, everything fails.
2) Poll the server with numerous requests at a short interval (0.2 s, something like that) - better. It will increase the traffic, but the requests will be tiny and will not interfere with the site's performance very much. If you set a long interval so as not to load the server with pointless requests, then users will see the data with a bit of a delay. On the upside, this will not fail (if written correctly) even if the connection is interrupted.
3) Websockets, where the server can just hit the client with any message whenever needed - nice, but takes some time to get used to. If you want to try it, you can use django-channels, which is a nice library for Django websockets.
If I did not understand you correctly and the actual problem is how to get data back from a Celery task into Django, then you can store the Celery task IDs and use them to first check whether the task is completed and then query the data from Celery.
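For option 3, here is a minimal sketch with django-channels, assuming Channels and a channel layer (e.g. Redis) are already configured; the consumer, group naming and notify_done helper are illustrative, not a prescribed API:
# consumers.py -- one websocket group per task id, routed e.g. as ws/tasks/<task_id>/
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class TaskStatusConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        self.group = f"task_{self.scope['url_route']['kwargs']['task_id']}"
        await self.channel_layer.group_add(self.group, self.channel_name)
        await self.accept()

    async def disconnect(self, code):
        await self.channel_layer.group_discard(self.group, self.channel_name)

    async def task_done(self, event):
        # invoked when a worker sends {"type": "task.done", ...} to the group
        await self.send_json(event)

# called from the Celery task when the work is finished; the browser-side JS can
# react to flag: True by opening the modal form
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

def notify_done(task_id, payload):
    layer = get_channel_layer()
    async_to_sync(layer.group_send)(f"task_{task_id}", {"type": "task.done", "flag": True, **payload})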

Gunicorn not responding

I'm using Gunicorn to serve a Django application. It was working alright till I changed its timeout from 30s to 900000s; I had to do this because I had a use case in which a huge file needed to be uploaded and processed (the processing taking more than 30 minutes in some cases). But after this change, Gunicorn goes unresponsive after a few hours. I guess the problem is that all workers (there are 30) end up busy with some requests after this amount of time. The weird thing is that it happens even if I don't run that long request at all; it happens with normal browsing in the Django admin. I want to know if there's a way to monitor requests on Gunicorn and see which requests the workers are busy with; I want to find out the requests that are making them busy. I tried --log-file=- --log-level=debug, but it doesn't tell me anything about the requests; I need more detailed logs.
From the latest Gunicorn docs, on the cmd line/script you can use:
--log-file - ("-" means log to stderr)
--log-level debug
or in the config file you can use:
errorlog = '-'
accesslog = '-'
loglevel = 'debug'
but there is no mention of the parameter format you specified in your question:
--log-file=-
--log-level=debug
The default logging for access is 'None' (ref), so it's a shot in the dark, but this may explain why you are not receiving detailed log information. Here is a sample config file from the Gunicorn source, and the latest Gunicorn config docs.
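Putting those options together, a minimal gunicorn.conf.py sketch that also turns on a per-request access log line (the access_log_format placeholders are standard Gunicorn ones; the exact format is just an example):
# gunicorn.conf.py -- send error and access logs to the console with per-request detail
errorlog = '-'            # '-' means stderr
accesslog = '-'           # '-' means stdout; the default None disables access logging
loglevel = 'debug'
# remote address, request line, status code, response length, request duration in seconds
access_log_format = '%(h)s "%(r)s" %(s)s %(b)s %(L)s'
Started with something like gunicorn -c gunicorn.conf.py myproject.wsgi:application, every finished request then shows up with its duration, which helps spot the requests that keep workers busy.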
Also, you may want to look into changing your logging configuration in Django.
While I am also looking for a good answer on how to see how many workers are busy, you are solving this problem the wrong way. For a task that takes that long you need a worker, like Celery/RabbitMQ, to do the heavy lifting asynchronously, while your request/response cycle remains fast.
I have a script on my site that can take 5+ minutes to complete, and here's a good pattern I'm using:
When the request first comes in, spawn the task and simply return an HTTP 202 Accepted. Its purpose is: "The request has been accepted for processing, but the processing has not been completed."
Have your front-end poll the same endpoint (every 30 seconds should suffice). As long as the task is not complete, return a 202.
Eventually, when it finishes, return a 200 along with any data the front-end might need.
For my site, we want to update data that is more than 10 minutes old. I pass the Accept-Datetime header to indicate how old the data is allowed to be. If our local cached copy is older than that, we spawn the task cycle.
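As an illustration of that last freshness check, a small sketch (the helper name and header handling are assumptions, not code from this answer):
# hedged sketch: spawn the refresh task only when the cached copy is older than
# what the client's Accept-Datetime header says is acceptable
from email.utils import parsedate_to_datetime

def needs_refresh(accept_datetime_header, cached_at):
    # cached_at must be a timezone-aware datetime for the comparison to be valid
    if not accept_datetime_header:
        return False
    oldest_acceptable = parsedate_to_datetime(accept_datetime_header)
    return cached_at < oldest_acceptable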