Use Nginx to prevent downstream timeout by sending blank lines - django

We have a setup where a CDN is calling Nginx which is calling a uwsgi server. Some of the requests take a lot of time for Django to handle, so we are relying on the CDN for caching. However, the CDN has a hard timeout of 30 seconds, which is unfortunately not configurable.
If we were able to send a blank line every few seconds before the response is received from the uwsgi server, the CDN would not time out. Is there a way to have Nginx send a blank line every few seconds until the response is received?

I see a few possibilities:
Update your Django app to work this way: have it start dribbling out a response immediately (see the streaming sketch below).
Rework your design so users don't periodically hit requests that take more than 30 seconds to complete. Use a frequent cron job to prime the cache on your backend server, so that when the CDN asks for assets they are already ready. Web servers can be configured to check for static ".gz" versions of URLs, which might be a good fit here.
Configure Nginx to cache the responses (see the proxy_cache sketch below). The first time the CDN requests the slow URL it may time out, but Nginx ought to eventually cache the result anyway. The next time the CDN asks, Nginx should have the cached response ready.
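For the first possibility, a minimal sketch of what "dribbling out a response immediately" could look like in Django, assuming the slow work can be restructured behind a polling interface (build_report and its done()/result() methods are hypothetical stand-ins):

    import time
    from django.http import StreamingHttpResponse

    def slow_view(request):
        def stream():
            # build_report() is a hypothetical stand-in for the slow work,
            # restructured so it can be started and then polled.
            job = build_report(request)
            while not job.done():
                yield "\n"         # heartbeat: a blank line every few seconds
                time.sleep(5)
            yield job.result()     # the real payload, sent last
        return StreamingHttpResponse(stream(), content_type="text/plain")

This only suits response formats that tolerate leading blank lines, and nginx may need proxy_buffering switched off for the heartbeats to reach the CDN promptly.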
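For the third possibility, a rough nginx caching sketch, assuming the Django/uwsgi upstream is proxied over HTTP on 127.0.0.1:8000 (paths, zone names and times are illustrative; if you use uwsgi_pass, the analogous uwsgi_cache* directives apply):

    proxy_cache_path /var/cache/nginx/django levels=1:2 keys_zone=django_cache:10m
                     max_size=1g inactive=60m;

    server {
        listen 80;

        location / {
            proxy_pass http://127.0.0.1:8000;   # the Django upstream
            proxy_cache django_cache;
            proxy_cache_valid 200 10m;          # keep good responses for 10 minutes
            # Serve a stale copy while a fresh one is being generated,
            # instead of making the CDN wait on a slow regeneration.
            proxy_cache_use_stale updating timeout;
            proxy_cache_lock on;                # only one request regenerates at a time
        }
    }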

Related

Huge latency between nginx and upstream Django backend

So we have a setup of the nginx ingress controller as a reverse proxy for a Django-based backend app in production (GKE k8s cluster). We have used OpenTelemetry to trace this entire stack (SigNoz being the actual tool). One of our most critical APIs is validate-cart.
We have observed that this API sometimes takes a long time, 10-20 seconds or even more. But if we look at the trace of one such request in SigNoz, the actual backend takes very little time, around 100 ms, while the total trace starting from nginx shows 29+ seconds, as you can see from the attached screenshot.
And looking at the p99 latency, the nginx service shows far bigger spikes than the order service. This graph is populated for the same validate-cart API.
I have been banging my head against this for quite some time and I am still stuck.
I am assuming requests may be getting queued at either the nginx or the Django layer. But I trust the OTel libraries used to trace Django to start the trace the moment a request hits the Django layer, and since there isn't much latency at the Django layer, the issue is probably at the nginx layer. I have traced nginx using the third-party OpenTracing module, which provides more tracing information about the nginx layer, since ingress-nginx doesn't yet support OpenTelemetry.
Nginx opentracing

libcurl: send GET requests after timeout limit is reached

Problem:
OS: Ubuntu 20.04.1 LTS
When the target URL updates its content, libcurl has recently had unexpected polling delays / timeouts of anywhere between 2 and 20+ seconds between sending a GET request and receiving any response.
I have no idea what has been causing this behaviour. I have detailed all of the strace reports, tshark results, the entire libcurl C++ program, attempts to diagnose, and other terminal outputs at the following SO question, but have had no luck in diagnosing this for about four months:
libcurl: abnormal GET response delays
There seems to be something between the client server and remote server that is stopping packets from being returned, but only when the page changes its content. During this polling delay / timeout, no other requests can be sent - therefore any new data uploaded on the remote server cannot be retrieved quickly.
This issue did not exist before mid-July 2021. Given that after four months this problem still hasn't been solved, I want to attempt a workaround that will still send requests to the target when this polling delay presents itself. I won't understand what caused the polling timeouts, but hopefully I will be able to retrieve the data without delays like the program used to do.
Target URL: https://ir.eia.gov/wpsr/table4.csv
Summary questions:
Q1. Is there a timeout option in libcurl such that, when it is exceeded, the program does not exit but instead sends another GET request to the target URL? (See the sketch after this question.)
Q2. Since this problem only arises when the target URL makes a scheduled content update, could the target URL be changing its IP address, so that some delay is introduced by DNS resolution between the client and the remote side on the return leg? I am going to try a tool like PingPlotter to see whether there is a delay at some specific IP address between the outbound GET request and the response.
Before any scheduled page content changes, the latency between the outbound GET request and the response is <100 ms.
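On Q1: libcurl's CURLOPT_TIMEOUT caps the whole transfer; when it fires, curl_easy_perform() returns an error code rather than terminating the program, so the caller can simply issue another GET. A rough sketch using the pycurl binding for brevity (the same options exist in the C API; the numbers are illustrative):

    import pycurl
    from io import BytesIO

    def fetch_with_retry(url, timeout_s=10, max_attempts=3):
        for attempt in range(1, max_attempts + 1):
            buf = BytesIO()
            c = pycurl.Curl()
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.WRITEDATA, buf)
            c.setopt(pycurl.CONNECTTIMEOUT, 5)   # CURLOPT_CONNECTTIMEOUT
            c.setopt(pycurl.TIMEOUT, timeout_s)  # CURLOPT_TIMEOUT: whole-transfer cap
            try:
                c.perform()                      # raises pycurl.error on timeout
                return buf.getvalue()
            except pycurl.error:
                if attempt == max_attempts:
                    raise                        # give up after the last attempt
            finally:
                c.close()

This retries immediately; whether retrying actually helps depends on what is stalling the original transfer.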

Django on Apache - Prevent 504 Gateway Timeout

I have a Django server running on Apache via mod_wsgi. I have a massive background task, triggered via an API call, that searches emails and generally takes a few hours; it runs in the background.
In order to facilitate debugging, since exceptions and everything else happen in the background, I created an API call that runs the task blocking. So the browser actually blocks for those hours and receives the results.
On localhost this is fine. However, in the real Apache environment, after about 30 minutes I get a 504 Gateway Timeout error.
How do I change the settings so that Apache allows - just in this debug phase - for the HTTP request to block for a few hours without returning a 504 Gateway Timeout?
I'm assuming this can be changed in the Apache configuration.
You should not be doing long-running tasks within Apache processes, nor even waiting for them. Use a background task queueing system such as Celery to run them. Have the web request return as soon as the task is queued, and implement some sort of polling mechanism as necessary to see whether the job is complete and the results can be obtained (a minimal sketch follows).
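A rough sketch of that pattern with Celery, assuming a working Celery setup; the task, view, and field names here are hypothetical:

    # tasks.py
    from celery import shared_task

    @shared_task
    def search_emails(query):
        ...   # the hours-long work runs in a Celery worker, not in Apache

    # views.py
    from celery.result import AsyncResult
    from django.http import JsonResponse
    from .tasks import search_emails

    def start_search(request):
        result = search_emails.delay(request.GET.get("q", ""))
        return JsonResponse({"task_id": result.id}, status=202)   # returns immediately

    def search_status(request, task_id):
        result = AsyncResult(task_id)
        payload = {"state": result.state}
        if result.ready():
            payload["result"] = result.get()
        return JsonResponse(payload)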
Also, are you sure the 504 isn't coming from some front-end proxy (explicit or transparent) or load balancer? Apache has no default timeout of 30 minutes.
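If you still want to raise the limit just for this debug phase, the obvious knob on the Apache side is the core Timeout directive (the value below is purely illustrative); anything sitting in front of Apache has its own timeout settings:

    # httpd.conf / the relevant VirtualHost -- illustrative value only
    Timeout 7200    # core I/O timeout; the default is 60 seconds

    # Also check any mod_wsgi daemon-mode limits you may have configured
    # (e.g. request-timeout on WSGIDaemonProcess), plus the timeouts of any
    # proxy or load balancer sitting in front of Apache.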

throttling http api calls with delay

I'm trying to implement some throttles on our REST API. A typical approach is to block requests after a certain threshold (with a 403 or 429 response). However, I've seen one API that adds a delay to the response instead.
As you make calls to the API, we will be looking at your average calls per second (c/s) over the previous five-minute period. Here's what will happen:
over 3c/s and we add a 2 second delay
over 5c/s and we add a 4 second delay
over 7c/s and we add a 5 second delay
From the client's perspective, I see this being better than getting back an error. The worst that can happen is that you'll slow down.
I am wondering how this can be achieved without negatively impacting the app server, i.e. to add those delays, the server needs to keep the request open, which keeps more and more request processors busy and leaves less capacity for new requests coming in.
What's the best way to accomplish this? (i.e. is this something that can be done on the web server / load balancer so that the application server is not negatively affected? Is there some kind of a throttling layer that can be added for this purpose?)
We're using Django/Tastypie, but the question is more on the architecture/conceptual level.
If you are using a synchronous application server, which is the most common setup for Django applications (for example gunicorn with the default --worker-class sync), then adding such a delay in the application would indeed have a very bad impact on performance: a worker handling a delayed request is blocked for the whole delay period.
But you can use an asynchronous application server (for example gunicorn with --worker-class gevent), and then the overhead should be negligible: a worker handling a delayed request can serve other requests while the delay is in progress (a minimal sketch follows).
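A rough sketch of the in-application variant, assuming gunicorn runs with --worker-class gevent; the rate bookkeeping (average_calls_per_second) is hypothetical and would typically live in Redis or similar:

    # middleware.py -- illustrative only
    import time

    # (calls-per-second threshold, delay in seconds), checked highest first
    DELAYS = [(7, 5), (5, 4), (3, 2)]

    class ThrottleDelayMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            cps = average_calls_per_second(request)  # hypothetical lookup, e.g. from Redis
            for threshold, delay in DELAYS:
                if cps > threshold:
                    # Under gevent workers time.sleep() is monkey-patched, so this
                    # greenlet yields and the worker keeps serving other requests.
                    time.sleep(delay)
                    break
            return self.get_response(request)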
Doing this in the reverse proxy may be a better option, because it lets you adjust the policy easily and flexibly. There is an external nginx module for exactly this kind of thing.
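For reference, nginx's built-in limit_req module can also delay, rather than reject, requests over a rate; a rough sketch with illustrative numbers:

    # Requests over 3 r/s are delayed while they fit in the burst;
    # only requests beyond the burst are rejected (503 by default).
    limit_req_zone $binary_remote_addr zone=api:10m rate=3r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20;    # no "nodelay", so excess requests are queued
            proxy_pass http://app_backend;  # hypothetical upstream name
        }
    }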

nginx/gunicorn connection hanging for 60 seconds

I'm doing an HTTP POST request against an nginx -> gunicorn -> Django app. The response body comes back quickly, but the request doesn't finish completely for about 60 more seconds.
By "finish completely" I mean that various clients I've tried (Chrome, wget, an Android app I'm building) indicate that the request is still in progress, as if waiting for more data. Listening in with Wireshark, I see that all the data arrives quickly, then after 60 seconds the FIN/ACK finally comes.
The same POST request on local development server (./manage.py runserver) executes quickly. Also, it executes quickly against gunicorn directly, bypassing nginx. Also works quickly in Apache/mod_wsgi setup.
GET requests have no issues. Even other POST requests are fine. One difference I'm aware of is that this specific request returns 201, not 200.
I figure it has something to do with Content-Length headers and closed vs keep-alive connections, but I don't yet know how things are supposed to work correctly.
The backend server (gunicorn) is currently closing connections, which makes sense.
Should the backend server include a Content-Length header, or Transfer-Encoding: chunked? Or should nginx be able to cope without these and add them as needed?
I assume connection keep-alive is good to have, and should not be disabled in nginx.
Update: Setting keepalive_timeout to 0 in nginx.conf fixes my problem. But, of course, keep-alive is gone. I'm still not sure what the issue is. Probably something in the stack (my Django app or gunicorn) doesn't implement chunked transfer correctly and confuses clients.
It sounds like your upstream server (gunicorn) is somehow holding the connection open on that specific API call. I don't know why (it depends on your code, I think), but the default proxy_read_timeout in nginx is 60 seconds, so it sounds like the end of this response isn't being received for some reason.
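For reference, that timeout is configured per location in nginx (60s is the default; the address below is illustrative). Changing it only moves the point at which nginx gives up waiting; it doesn't address why the connection stays open:

    location / {
        proxy_pass http://127.0.0.1:8000;   # the gunicorn upstream
        proxy_read_timeout 60s;             # default; matches the observed 60-second hang
    }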
I use a very similar setup and I don't notice any issues on POSTs generally, or any other requests.
Note that return HttpResponse(status=201) has caused me issues before; it seems Django prefers an explicitly empty body, return HttpResponse("", status=201), to work. I usually set something in the body where I'm expected to, so this may be something to watch out for.
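A quick illustration of that difference in a view (whether it fully explains the 60-second hang depends on the Django/gunicorn versions involved):

    from django.http import HttpResponse

    def create_thing(request):
        # ... create the resource ...
        # return HttpResponse(status=201)      # no explicit body: clients hung for me
        return HttpResponse("", status=201)    # explicitly empty body: closes cleanly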