I've just deployed a simple Java/Tomcat based application into Elastic Beanstalk (using the java8/tomcat8 config). Mostly the application works fine.
However, all HEAD requests seem to take 60 seconds. It feels like a timeout of some kind. I can't seem to find any settings for filtering or delaying particular types of requests. These requests work fine when I run locally, and GET requests to the same URL work fine.
I've confirmed that both Tomcat and the Apache instance on the server log the HEAD request instantly (which indicates they are done with it, right?).
I've confirmed (using telnet) that the client is not receiving any response header bytes until very late. This isn't a problem of the client waiting for a payload or something like that.
Furthermore, the delay is clearly tied to the load balancer's "Idle Timeout" setting. If I push that down to 5 seconds, the HEAD requests take about 5 seconds; if I set the idle timeout to 20 seconds, the HEAD requests take just about 20 seconds (always a few ms over). The default is 60s.
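For reference, here is roughly how I was adjusting the idle timeout, via an .ebextensions config file (this assumes a classic ELB; treat the option name as approximate, and the aws:elbv2:loadbalancer namespace applies instead if the environment uses an ALB):

option_settings:
  aws:elb:policies:
    ConnectionSettingIdleTimeout: 20    # seconds; the default is 60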
What could be causing all HEAD requests (even those returning a 401 unauthorized error, no processing) to clog up the works like that?
Turns out the problem was a firewall issue at the local site. AWS Elastic Beanstalk was returning the responses in a timely manner, but they were getting clogged up in a local firewall. Grr..
I have an app on Google App Engine (python39 standard environment) running on Gunicorn and Flask. I'm making a request to the server from a client-side app for a long-running operation and seeing that the request is processed twice. The second process (worker) started a while (an hour and a half) after the first one had started working.
I'm not sure whether this is related to Gunicorn specifically or to GAE.
The server controller has logging at the beginning:
import logging
from flask import Flask

app = Flask(__name__)  # defined in server/server.py (referenced by the gunicorn entrypoint)

@app.route("/api/campaign/generate", methods=["GET"])
def campaign_generate():
    logging.info('Entering campaign_generate')
    # some very long processing here
The controller is called by clicking a button in the UI app. I checked the Network tab in the browser's DevTools and confirmed that only one request was fired. And I can see only one request in the server logs at the time the workers are executing (more on this follows below).
The whole app.yaml is like this:
runtime: python39
default_expiration: 0
instance_class: B2
basic_scaling:
  max_instances: 1
entrypoint: gunicorn -b :$PORT server.server:app --timeout 0 --workers 2
So I have 2 workers with infinite timeouts, basic scaling with max instances = 1.
I expect that while the app is processing one long-running request, the other worker remains available to serve new requests.
I don't expect the second worker to be used to process the same request; that would be nonsense (unless the user starts another operation from another browser).
Thanks to --timeout 0, I expect Gunicorn to wait indefinitely until the controller finishes. The only other thing that could interfere is GAE's request timeout, but thanks to basic scaling that is 24 hours. So I expect the app to process requests for several hours without a problem.
But what I'm seeing instead is that, after the request has been processing for a while, another execution of it is started. Here are the simplified logs I see in Cloud Logging:
13:00:58 GET /api/campaign/generate
13:00:59 Entering campaign_generate
..skipped
13:39:13 Starting generating zip-archive (it's something that takes a while)
14:25:49 Entering campaign_generate
So at 14:25, an hour and 25 minutes after the original request came in, another processing of the same request started!
And now there are two processings of the request running in parallel.
Needless to say, this increases memory pressure and doubles the execution time.
When the first "worker" finishes its processing (14:29:28 in our example), its result is not returned to the client. It looks like Gunicorn or GAE simply abandons the first request, and the client has to wait until the second worker finishes processing.
Why is it happening?
And how can I fix it?
Regarding the HTTP request records in the log:
I saw only one request in Cloud Logging (the first one) while the processing was active, and even after the controller was called the second time ('Entering campaign_generate' appeared in the logs) there was no new GET request in the logs. But after everything completed (the second processing actually returned a response), a mysterious second GET request appeared. So, technically, from the server logs' point of view (Cloud Logging) it looks like there were two subsequent requests from the client. But there weren't! There was only one, and I can see it in the browser's DevTools.
The two requests have different traceId and requestId HTTP headers.
It's very hard to understand what's going on. I tried running the app locally (on the same data), but it works as intended.
If I understand it correctly, Google Cloud Run makes an API publicly available. Once a request is received, an instance is started and the job is processed. Once the job is done, the instance is terminated. Is that right?
If so, I presume that Google decides the instance can be shut down once the HTTP response is sent back to the client. Is that also right?
In my case the process will run for 10 to 20 minutes. Can I still send the HTTP response after that much time? Any advice on how to implement that?
Frankly, all of this is well documented in the Cloud Run docs:
Somewhat, but this depends on how you configured your scaling: https://cloud.google.com/run/docs/about-instance-autoscaling
Also see above, but a request is considered "done" when the HTTP connection is closed (either by you or the client), yes.
60 mins is the limit, see:
https://cloud.google.com/run/docs/configuring/request-timeout
Any advice on how to implement that?
You just keep the connection open for 20 minutes, but do note the remark on long-lived connections in the link above.
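As a rough sketch of what that configuration looks like (the service name below is made up; the same value can be set with the --timeout flag at deploy time), the request timeout is just a field on the service's revision template:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-long-job-service        # hypothetical name
spec:
  template:
    spec:
      timeoutSeconds: 1200         # 20 minutes; the documented maximum is 60 minutes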
[Screenshot: response headers]
Why am I receiving a network error? Does anyone have a clue at what layer this is occurring, or how I can resolve this issue?
What I've Tried
(1) Checked CORS... everything seems to be ok.
(2) Tried adding timeouts as annotations on my LB in the YAML file.
(Note) The request seems to be timing out after 60 seconds
Process:
(1) Axios POST request triggered from front via button click.
(2) Flask server (back) receives POST request and begins to process.
[ERROR OCCURS HERE] (3) Flask server is still processing the request on the back end; however, the client receives a 504 timeout, and there is also some mention of CORS origin (I don't think this is the issue, though, as I've set my CORS settings properly and it doesn't pop up for any other requests...).
(4) Server responds with a 200 and successfully sets data.
Current stack:
(1) AWS EKS / Kubernetes for deployment (relevant config shown).
(2) Flask backend.
(3) React frontend.
My initial thought is that this has to do with the deployment... it works perfectly fine in a local context, but I think there is some timeout setting; however, I'm unsure where it is or how to increase it. For additional context, this doesn't seem to happen with short-lived requests... just this one particular request that takes more time.
If it's failing specifically for long-running calls, then you may have to adjust your ELB idle timeout. It's 60 seconds by default. Check out the following resource for reference:
https://aws.amazon.com/blogs/aws/elb-idle-timeout-control/
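If your backend is exposed through a Kubernetes Service of type LoadBalancer (i.e. a classic ELB), the idle timeout can be raised with an annotation along these lines (names and ports below are made up; if you're fronting it with an ALB ingress instead, the equivalent setting lives in the ingress annotations):

apiVersion: v1
kind: Service
metadata:
  name: flask-backend                   # hypothetical name
  annotations:
    # Raise the ELB idle timeout from the default 60s to 5 minutes
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300"
spec:
  type: LoadBalancer
  selector:
    app: flask-backend
  ports:
    - port: 80
      targetPort: 5000                  # assumed Flask port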
Some troubleshooting tips here.
I would like to have a custom "under maintenance" page show up if my webservers on EC2 go down.
My current setup: 2 A records on Route 53, using DNS failover with a static page as the secondary and the ELB as the primary.
The problem with this is DNS caching - when the server first goes down, the default 502 error appears for a while for clients who were recently on the page. It takes around 5 minutes for our custom maintenance page to show up for them. When the servers come back up, it also takes around 5 minutes for the maintenance page to go away.
The first problem is more pressing for me - I don't want users to see a plain "502 Bad Gateway" message, ever. If they visit our site and things are broken or down, they should always see our custom maintenance HTML page, regardless of whether the ELB targets went down half a second ago or 10 minutes ago.
How can I make it so that if my ELB instance returns a 502, the users will automatically see a custom error page 100% of the time, so that they never see the default "502 Bad Gateway" error page?
We are trying to configure the same thing for our web application. The exact same enhancement request has been pending with AWS for a long time: https://forums.aws.amazon.com/thread.jspa?threadID=72363&start=125&tstart=0
The other option is to use CloudFront for the whole application (not just static content) and configure custom error pages for specific error codes: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/custom-error-pages-procedure.html; we don't like this option for multiple reasons, one of which is the added complexity.
So, at this time, it looks like we will have to live with this default 502 page.
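For completeness, if the CloudFront option ever becomes acceptable, the custom error page is configured on the distribution itself. A rough CloudFormation sketch (origin ID, domain name, and page path are all made up) looks like this:

Resources:
  SiteDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Origins:
          - Id: app-origin
            DomainName: my-elb-123456.us-east-1.elb.amazonaws.com   # hypothetical ELB DNS name
            CustomOriginConfig:
              OriginProtocolPolicy: http-only
        DefaultCacheBehavior:
          TargetOriginId: app-origin
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: true
        CustomErrorResponses:
          - ErrorCode: 502
            ResponseCode: 502
            ResponsePagePath: /maintenance.html   # must be servable from one of the distribution's origins
            ErrorCachingMinTTL: 0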
I have hosted my Node app on Cloud Run, and all of my requests are served within 300-600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2-2.5 s to complete.
My doubts regarding this are:
Are 1.2-2.5 s requests suitable for Cloud Run? Or is there any rule that requests should be completed within xx ms?
Also see the screenshot: I got this message along with the request in the logs: "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or workaround to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero", meaning there were no container instances left running to serve requests. In that case, a new instance has to be started, and the request waits for whatever initialization/startup cost is associated with that process. It's also possible that it was auto-scaled because all other instances were at their request limits. Make sure that your setting for max concurrent requests per instance is greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
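As a sketch (the service name is made up), both knobs live on the revision template if you manage the service declaratively; the same values can also be set in the Cloud Run console or at deploy time:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: node-api                              # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1" # keep one warm instance to avoid cold starts
    spec:
      containerConcurrency: 80                # allow many concurrent requests per instance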
In situations where you get very long (30 seconds, minutes+) operations, it may be a good idea to switch to some different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks if the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't have support for that.
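To make the polling idea concrete, here is a minimal sketch in Python/Flask (Python only because Flask already appears earlier in this thread; the endpoint names, the in-memory jobs dict, and the background thread are illustrative, not a drop-in for your Node service):

import threading
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # in-memory job store; fine for a sketch, lost on scale-down or across instances

def long_running_task(job_id):
    # Call the slow third-party service here.
    # Note: on Cloud Run, CPU may be throttled once the /api/start response has been sent,
    # so a real implementation would usually hand this off to a task queue instead.
    jobs[job_id] = {"status": "done", "result": "some payload"}

@app.route("/api/start", methods=["POST"])
def start():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending"}
    threading.Thread(target=long_running_task, args=(job_id,)).start()
    return jsonify({"job_id": job_id}), 202

@app.route("/api/status/<job_id>", methods=["GET"])
def status(job_id):
    # The client polls this every few seconds until it sees status == "done".
    return jsonify(jobs.get(job_id, {"status": "unknown"}))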
TL;DR: longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may incur at scale.