Why respond time is 100 times slower than processing request on server? - google-cloud-platform

I have a computer engine server in us-east1-b zone.
n1-highmem-4 (4 vCPUs, 26 GB memory) with 50 GB SSD and everything shows normal in monitoring graphs.
we are using this server as rails based RESTful API.
The problem is when we send a request to the server it takes very long time to receive the response.
Here is our server log:
as you can see it took 00:01 second to response to the request
and here is the response received by postman:
as you can see X-Runtime is 0.036319 as expected but we received the response in 50374 ms which means almost 1 min after server response!

I hope this answer can help people with same problem.
Passenger's highly optimized load balancer assumes that Ruby apps can handle 1 (or thread-limited amount of) concurrent connection(s). This is usually the case and results in optimal load-balancing. But endpoints that deal with SSE/Websockets can handle many more concurrent connections, so the assumption leads to degraded performance.
You can use the force max concurrent requests per process configuration option to override this. The example below shows how to set the concurrency to unlimited for /special_websocket_endpoint:
server {
listen 80;
server_name www.example.com;
root /webapps/my_app/public;
passenger_enabled on;
# Use default concurrency for the app. But for the endpoint
# /special_websocket_endpoint, force a different concurrency.
location /special_websocket_endpoint {
passenger_app_group_name foo_websocket;
passenger_force_max_concurrent_requests_per_process 0;
}
}
In Passenger 5.0.21 and below the option above was not available yet. In those versions there is a workaround for Ruby apps. Enter code below into config.ru to set the concurrency (on the entire app).

Related

How to do server-side backpressure in gRPC?

I just find that in C++, when using AsyncService, even if I don't request a new request, gRPC will still read data from the network. This caused a huge memory usage in my system.
Detailed Scenario:
I have a client that will send a lot of requests to the server.
On the server-side, I didn't request any requests. The server blocked in cq_->Next(&tag, &ok) but was kept consuming memory. Caused an OOM in my system.
So my question is how to prevent the server from reading data from the network when I don't request a new request? i.e. how to do server-side backpressure so I can save the memory??
Could anyone help me? thanks!
EDIT: Reproduce
I made a simple example for you to reproduce this problem, the code is based on the v1.46.3 tag of the official gRPC code base. I just modified the example to make the server don't request any requests and make the client send more requests. Check this commit for what I modified.
git clone -b v1.46.3_reproduce_oom --depth 1 https://github.com/lixin-wei/grpc.git && cd grpc
git submodule update --init
bazel build //examples/cpp/helloworld:all
in one session, start server: ./bazel-bin/examples/cpp/helloworld/greeter_async_server
in aonther session, start client: ./bazel-bin/examples/cpp/helloworld/greeter_async_client2
keep running ps -aux | grep greeter_async_server, you'll notice an increasing memory usage in the server.
The server code is examples/cpp/helloworld/greeter_async_server.cc, the client code is examples/cpp/helloworld/greeter_async_client.cc.
One option is to use the ResourceQuota to restrict buffer memory usage across the server. The size you specify is not an absolute system memory limit, since not all memory in gRPC core/C++ is tracked, but it will result in a cap on the total memory usage.
In the server, you can add:
// Set a maximum memory cap
grpc::ResourceQuota quota("greeter_callback_server");
quota.Resize(30*1024*1024); // 30MB
builder.SetResourceQuota(quota);
And after a memory cap is reached, adding the error code to the client output, the clients will see something like
RPC failed with: Received RST_STREAM with error code 11
On my system, this happens when the server processes reach ~140MB RES memory.
Edit: another option is to set the maximum number of concurrent streams that the server is willing to accept using the GRPC_ARG_MAX_CONCURRENT_STREAMS channel argument. Each unary call is a separate RPC, and handled as a separate stream.

Bypassing Cloud Run 32mb error via HTTP2 end to end solution

I have an api query that runs during a post request on one of my views to populate my dashboard page. I know the response size is ~35mb (greater than the 32mb limits set by cloud run). I was wondering how I could by pass this.
My configuration is set via a hypercorn server and serving my django web app as an asgi app. I have 2 minimum instances, 1gb ram, 2 cpus per instance. I have run this docker container locally and can't bypass the amount of data required and also do not want to store the data due to costs. This seems to be the cheapest route. Any pointers or ideas would be helpful. I understand that I can bypass this via http2 end to end solution but I am unable to do so currently. I haven't created any additional hypecorn configurations. Any help appreciated!
The Cloud Run HTTP response limit is 32 MB and cannot be increased.
One suggestion is to compress the response data. Django has compression libraries for Python or just use zlib.
import gzip
data = b"Lots of content to compress"
cdata = gzip.compress(s_in)
# return compressed data in response
Cloud Run supports HTTP/1.1 server side streaming, which has unlimited response size. All you need to do is use chunked transfer encoding.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding

How use throttling in Wso2 EI using client IP

I am planning to use throttling in wso2-ei 6.4.0, From local system i tested the scenario i face some problems could please help me if any one know thanks in advance.
If we restart the wso2-ei node policy is not working. It taking again from starting ( suppose request limit is 10 for 1 hour,Before restarting the node it processing 5 request after restarting it should take remaining 5 request but it accepting 10 request
Throttling is working based on wso2-ei node level but suppose Linux server having 10 nodes how to distribute the throttling policy in Linux server level .
How to consider client ip in throttling. If request coming from F5 load balance i need to consider the requested system IP not F5 server IP.
If we restart the wso2-ei node policy is not working. It taking again from starting ( suppose request limit is 10 for 1 hour,Before restarting the node it processing 5 request after restarting it should take remaining 5 request but it accepting 10 request
The throttle mediator does not store the throttle count. Therefore if you perform a server restart it will reset the throttle count value and start from zero. In a production environment, it is not expected to have frequent server restarts.
Throttling is working based on wso2-ei node level but suppose Linux server having 10 nodes how to distribute the throttling policy in Linux server level .
If you want to maintain the throttle count across all the nodes you need to cluster the nodes. Throttle mediator uses hazelcast cluster messages to maintain a global count across the cluster.

How to handle long requests in Google Cloud Run?

I have hosted my node app in Cloud Run and all of my requests served within 300 - 600ms time. But one endpoint that gets data from a 3rd party service so that request takes 1.2s - 2.5s to complete the request.
My doubts regarding this are
Is 1.2s - 2.5s requests suitable for cloud run? Or is there any rule that the requests should be completed within xx ms?
Also see the screenshot, I got a message along with the request in logs "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or work around to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So, probably doesn't matter and cloud run can handle that just fine (my own app does requests longer than that with no problem)
It's possible that your service was "scaled to zero" meaning that there were no containers left running to serve requests. In that case, it would be necessary to start up a new instance and wait for whatever initializing/startup costs are associated with that process. It's also possible that it was auto-scaled due to all other instances being at their request limits. Make sure that your setting for max concurrent requests per instance is set greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spend, not per request:
In situations where you get very long (30 seconds, minutes+) operations, it may be a good idea to switch to some different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks if the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't have support for that.
TL;DR longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may occur at scale.

Maximum length of HTTP GET request

What's the maximum length of an HTTP GET request?
Is there a response error defined that the server can/should return if it receives a GET request that exceeds this length?
This is in the context of a web service API, although it's interesting to see the browser limits as well.
The limit is dependent on both the server and the client used (and if applicable, also the proxy the server or the client is using).
Most web servers have a limit of 8192 bytes (8 KB), which is usually configurable somewhere in the server configuration. As to the client side matter, the HTTP 1.1 specification even warns about this. Here's an extract of chapter 3.2.1:
Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
The limit in Internet Explorer and Safari is about 2 KB, in Opera about 4 KB and in Firefox about 8 KB. We may thus assume that 8 KB is the maximum possible length and that 2 KB is a more affordable length to rely on at the server side and that 255 bytes is the safest length to assume that the entire URL will come in.
If the limit is exceeded in either the browser or the server, most will just truncate the characters outside the limit without any warning. Some servers however may send an HTTP 414 error.
If you need to send large data, then better use POST instead of GET. Its limit is much higher, but more dependent on the server used than the client. Usually up to around 2 GB is allowed by the average web server.
This is also configurable somewhere in the server settings. The average server will display a server-specific error/exception when the POST limit is exceeded, usually as an HTTP 500 error.
You are asking two separate questions here:
What's the maximum length of an HTTP GET request?
As already mentioned, HTTP itself doesn't impose any hard-coded limit on request length; but browsers have limits ranging on the 2 KB - 8 KB (255 bytes if we count very old browsers).
Is there a response error defined that the server can/should return if it receives a GET request exceeds this length?
That's the one nobody has answered.
HTTP 1.1 defines status code 414 Request-URI Too Long for the cases where a server-defined limit is reached. You can see further details on RFC 2616.
For the case of client-defined limits, there isn't any sense on the server returning something, because the server won't receive the request at all.
Browser limits are:
Browser Address bar document.location
or anchor tag
---------------------------------------------------
Chrome 32779 >64k
Android 8192 >64k
Firefox >64k >64k
Safari >64k >64k
Internet Explorer 11 2047 5120
Edge 16 2047 10240
Want more? See this question on Stack Overflow.
A similar question is here: Is there a limit to the length of a GET request?
I've hit the limit and on my shared hosting account, but the browser returned a blank page before it got to the server I think.
Technically, I have seen HTTP GET will have issues if the URL length goes beyond 2000 characters. In that case, it's better to use HTTP POST or split the URL.
As already mentioned, HTTP itself doesn't impose any hard-coded limit on request length; but browsers have limits ranging on the 2048 character allowed in the GET method.
Yes. There isn't any limit on a GET request.
I am able to send ~4000 characters as part of the query string using both the Chrome browser and curl command.
I am using Tomcat 8.x server which has returned the expected 200 OK response.
Here is the screenshot of a Google Chrome HTTP request (hiding the endpoint I tried due to security reasons):
RESPONSE