How to do server-side backpressure in gRPC? - c++

I just found that in C++, when using AsyncService, gRPC still reads data from the network even if I don't request a new call. This causes huge memory usage in my system.
Detailed Scenario:
I have a client that will send a lot of requests to the server.
On the server side, I don't request any calls. The server blocks in cq_->Next(&tag, &ok) but keeps consuming memory, which eventually causes an OOM in my system.
So my question is: how can I prevent the server from reading data from the network when I haven't requested a new call? In other words, how do I apply server-side backpressure to save memory?
Could anyone help me? thanks!
EDIT: Reproduce
I made a simple example to reproduce this problem. The code is based on the v1.46.3 tag of the official gRPC code base. I only modified the example so that the server doesn't request any calls and the client sends more requests. Check this commit for what I modified.
git clone -b v1.46.3_reproduce_oom --depth 1 https://github.com/lixin-wei/grpc.git && cd grpc
git submodule update --init
bazel build //examples/cpp/helloworld:all
In one session, start the server: ./bazel-bin/examples/cpp/helloworld/greeter_async_server
In another session, start the client: ./bazel-bin/examples/cpp/helloworld/greeter_async_client2
Keep running ps -aux | grep greeter_async_server; you'll notice increasing memory usage in the server.
The server code is examples/cpp/helloworld/greeter_async_server.cc, the client code is examples/cpp/helloworld/greeter_async_client.cc.

One option is to use the ResourceQuota to restrict buffer memory usage across the server. The size you specify is not an absolute system memory limit, since not all memory in gRPC core/C++ is tracked, but it will result in a cap on the total memory usage.
In the server, you can add:
// Set a maximum memory cap
grpc::ResourceQuota quota("greeter_callback_server");
quota.Resize(30*1024*1024); // 30MB
builder.SetResourceQuota(quota);
Once the memory cap is reached, the clients (with the error code added to their output) will see something like
RPC failed with: Received RST_STREAM with error code 11
On my system, this happens when the server process reaches ~140MB RES memory.
Edit: another option is to set the maximum number of concurrent streams that the server is willing to accept using the GRPC_ARG_MAX_CONCURRENT_STREAMS channel argument. Each unary call is a separate RPC, and handled as a separate stream.
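Both limits can be applied in one place while the server is being built. The following is a minimal sketch, not taken from the reproduction repository: the helper name ApplyServerLimits, the quota name, and the stream limit of 100 are assumptions chosen for illustration. Note that the stream limit applies per HTTP/2 connection.

```cpp
#include <grpc/grpc.h>              // GRPC_ARG_MAX_CONCURRENT_STREAMS
#include <grpcpp/resource_quota.h>  // grpc::ResourceQuota
#include <grpcpp/server_builder.h>  // grpc::ServerBuilder

// Hypothetical helper: call it on the ServerBuilder before BuildAndStart().
void ApplyServerLimits(grpc::ServerBuilder& builder) {
  // Cap the buffer memory that gRPC tracks (not a hard limit on process RSS).
  grpc::ResourceQuota quota("greeter_server_quota");
  quota.Resize(30 * 1024 * 1024);  // 30 MB, same value as the snippet above
  builder.SetResourceQuota(quota);

  // Limit how many concurrent HTTP/2 streams (RPCs) one connection may open.
  // The value 100 is an assumption; tune it to what the server can buffer.
  builder.AddChannelArgument(GRPC_ARG_MAX_CONCURRENT_STREAMS, 100);
}
```

With a stream limit in place, a client that issues more calls than the limit should see the excess calls queue on its side instead of being buffered by the server.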

Related

How to handle long requests in Google Cloud Run?

I have hosted my Node app on Cloud Run, and all of my requests are served within 300 - 600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2 s - 2.5 s to complete.
My questions are:
Are 1.2 s - 2.5 s requests suitable for Cloud Run? Or is there any rule that requests should complete within xx ms?
Also see the screenshot, I got a message along with the request in logs "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or work around to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So, probably doesn't matter and cloud run can handle that just fine (my own app does requests longer than that with no problem)
It's possible that your service was "scaled to zero", meaning that there were no containers left running to serve requests. In that case, it would be necessary to start up a new instance and wait for whatever initialization/startup costs are associated with that process. It's also possible that it was auto-scaled because all other instances were at their request limits. Make sure that your setting for max concurrent requests per instance is greater than one: Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
In situations where you get very long (30 seconds, minutes+) operations, it may be a good idea to switch to a different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks if the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run did not support that when this answer was written.
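The polling idea is independent of the client language. Below is a minimal sketch in C++ (used here only to stay consistent with the other examples on this page); poll_until_ready and its parameters are hypothetical names, and in a real client the check callback would issue an HTTP request to a status endpoint.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Hypothetical helper: calls `check` every `interval` until it yields a value
// or the next poll would fall after `timeout`. Returns std::nullopt on timeout.
std::optional<std::string> poll_until_ready(
    const std::function<std::optional<std::string>()>& check,
    std::chrono::milliseconds interval,
    std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (auto result = check()) return result;  // the response is ready
    if (std::chrono::steady_clock::now() + interval > deadline) {
      return std::nullopt;  // give up: the next poll would miss the deadline
    }
    std::this_thread::sleep_for(interval);
  }
}
```

For the 5-second interval mentioned above, a caller would pass std::chrono::seconds(5) as the interval and pick a timeout that matches the longest operation it is willing to wait for.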
TL;DR longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may incur at scale.

Why is the response time 100 times slower than processing the request on the server?

I have a Compute Engine server in the us-east1-b zone.
It is an n1-highmem-4 (4 vCPUs, 26 GB memory) with a 50 GB SSD, and everything looks normal in the monitoring graphs.
We are using this server as a Rails-based RESTful API.
The problem is when we send a request to the server it takes very long time to receive the response.
Here is our server log:
As you can see, it took 00:01 second to respond to the request.
And here is the response received by Postman:
As you can see, X-Runtime is 0.036319 as expected, but we received the response after 50374 ms, which is almost 1 minute after the server responded!
I hope this answer can help people with the same problem.
Passenger's highly optimized load balancer assumes that Ruby apps can handle 1 (or thread-limited amount of) concurrent connection(s). This is usually the case and results in optimal load-balancing. But endpoints that deal with SSE/Websockets can handle many more concurrent connections, so the assumption leads to degraded performance.
You can use the force max concurrent requests per process configuration option to override this. The example below shows how to set the concurrency to unlimited for /special_websocket_endpoint:
server {
    listen 80;
    server_name www.example.com;
    root /webapps/my_app/public;
    passenger_enabled on;
    # Use default concurrency for the app. But for the endpoint
    # /special_websocket_endpoint, force a different concurrency.
    location /special_websocket_endpoint {
        passenger_app_group_name foo_websocket;
        passenger_force_max_concurrent_requests_per_process 0;
    }
}
In Passenger 5.0.21 and below the option above was not available yet. In those versions there is a workaround for Ruby apps. Enter code below into config.ru to set the concurrency (on the entire app).

Integration test two http requests that depend upon each other in C/C++

I have an async (epoll-based) HTTP server written in a mix of C and C++ that serves as a message broker and runs on Linux/macOS. This is the scenario that I currently test manually with curl in multiple shell windows and want to automate.
Request 1: Long poll asking for a message. There are none, so this request waits until a message arrives.
Request 2: Puts in a message that resolves request 1.
I'm unsure of the best way to orchestrate this. Any recommendations would be massively appreciated. My current thought is to use threads for the requests and have the responses write to files, and then a sleep/wake/check file for data loop with some timeout...but I'm hoping that better tooling/approaches exists :)
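One way to avoid the files and the sleep/check loop is to run request 1 on a background task and wait on its future with a deadline. The sketch below illustrates only that orchestration, under stated assumptions: FakeBroker is a hypothetical in-process stand-in so the example runs on its own, and in a real test its two methods would be replaced by HTTP calls to the server (for example through libcurl).

```cpp
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the real broker; replace with real HTTP requests.
class FakeBroker {
 public:
  // Request 1: blocks until a message arrives or `timeout` elapses.
  std::optional<std::string> long_poll(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    if (!cv_.wait_for(lock, timeout, [this] { return !messages_.empty(); })) {
      return std::nullopt;  // timed out with no message
    }
    std::string msg = std::move(messages_.front());
    messages_.pop();
    return msg;
  }

  // Request 2: stores a message and wakes one waiting long poll.
  void publish(std::string msg) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      messages_.push(std::move(msg));
    }
    cv_.notify_one();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::string> messages_;
};

// Starts request 1 in the background, sends request 2, then waits with a
// deadline for request 1 to resolve. Returns std::nullopt on failure.
std::optional<std::string> run_long_poll_scenario(FakeBroker& broker,
                                                  const std::string& payload) {
  using namespace std::chrono_literals;
  auto pending = std::async(std::launch::async,
                            [&broker] { return broker.long_poll(2s); });
  broker.publish(payload);
  if (pending.wait_for(3s) != std::future_status::ready) {
    return std::nullopt;  // the long poll never completed
  }
  return pending.get();
}
```

The test then reduces to asserting on the returned value. Against a real server, the test may also need to confirm that request 1 is actually parked before request 2 is sent, unless the broker queues messages for later polls as this stand-in does.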

WSO2 delivery-guarantee pattern implementation: sampling processor doesn't work with more than 20 attempts

I'm quite a newbie in WSO2, so sorry for any mistakes (and for my English too ...)
I need to implement a proxy with the delivery-guarantee pattern, and here is my solution (I started from this post http://charith.wickramaarachchi.org/2012/05/another-message-redelivery-pattern-with.html):
a proxy invokes an external service, giving the initial client message as input
if the external service is running, all works fine and the reply is given to the client
if the external service is down or generates a SOAP fault, I put the message in a store (retry store) and then, using a sampling processor (after a time "t"), I try again for at most "n" attempts:
at any attempt, if the external service is down or generates a SOAP fault, I put the message in the retry store again, and the process is repeated
after "n" attempts, if the external service is still out of service, the message is stored in another store (garbage store)
All works fine when I test with one message, but when I test with more messages (> 20, but this number varies ...), the sampling processor hangs completely and nothing is shown in the logs. Looking in the console, sometimes (but not always ...) the processor is off (deactivated), and in this case, to restore it, I need to undeploy, stop and restart, and then deploy my .car again.
NOTE: I have to use the sampling processor and not the forwarding processor because the forwarding processor deactivates itself after "n" attempts, so I can't use it for my goals.
I can't put the complete code here because it is too long, but I can give you a sample .car that you can deploy and execute on your WSO2 installation (to simulate the external service I've used the echo service ...).
Here is the sample .car that you can download
Thank you very much in advance: all suggestions are appreciated!!!
Cesare
Message Forwarding Processor
Retrieves the messages stored in a message store and reliably forwards them to a specified endpoint. This processor attempts to send one message at a time and it does not dequeue a message from the store until it receives a response from the target endpoint. Therefore this processor is ideal for implementing in-order delivery scenarios and guaranteed delivery scenarios.
Sampling Processor
Retrieves the messages stored in a message store and injects them to a given sequence at specified intervals. This processor utilizes the Quartz scheduler framework for periodically processing messages. This can be used to implement message rate throttling scenarios.
You can use the forwarding processor and configure it so that it is never deactivated. Just add this parameter: <parameter name="max.delivery.attempts">-1</parameter>

Netty file transfer proxy suffers big connection delay under high concurrency

I am doing a project of building a file transfer proxy using netty which should efficiently handle high concurrency.
Here is my structure:
Back Server, a normal file server just like the HTTP (File Server) example on netty.io, which receives and confirms a request and sends out a file using either ChunkedBuffer or zero-copy.
Proxy, with both NioServerSocketChannelFactory and NioClientSocketChannelFactory, both using cachedThreadPool, listens for clients' requests and fetches the file from Back Server for the clients. Once a new client is accepted, a new accepted Channel (channel1) is created by NioServerSocketChannelFactory and waits for the request. Once the request is received, the Proxy establishes a new connection to Back Server using NioClientSocketChannelFactory, and the new Channel (channel2) sends the request to Back Server and delivers the response to the client. channel1 and channel2 each use their own pipeline.
More simply, the procedure is
channel1 accepted
channel1 receives the request
channel2 connected to Back Server
channel2 sends the request to Back Server
channel2 receives the response (including the file) from Back Server
channel1 sends the response received from channel2 to the client
once the transfer is done, channel2 closes and channel1 closes on flush (each client only sends one request)
Since the required file can be big (10 MB), the proxy stops channel2.readable when channel1 is NOT writable, just like the Proxy Server example on netty.io.
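The effect of pausing reads while the client side is not writable can be illustrated with a small single-threaded simulation. This is a sketch of the general watermark technique, not Netty code: simulate_relay, RelayStats, the chunk sizes, and the high/low water marks are all hypothetical, and the readable flag only mirrors what the proxy does with channel2.readable.

```cpp
#include <algorithm>
#include <cstddef>

struct RelayStats {
  std::size_t max_buffered = 0;   // peak bytes queued for the client
  std::size_t total_relayed = 0;  // bytes delivered to the client
};

// Simulates relaying `file_size` bytes. On each tick the upstream delivers up
// to `read_chunk` bytes (only while reading is enabled) and the client drains
// up to `write_chunk` bytes. Reading is paused once the outbound buffer
// reaches `high_water` and resumed once it drains to `low_water` or below.
// Assumes low_water < high_water.
RelayStats simulate_relay(std::size_t file_size, std::size_t read_chunk,
                          std::size_t write_chunk, std::size_t high_water,
                          std::size_t low_water) {
  RelayStats stats;
  if (read_chunk == 0 || write_chunk == 0) return stats;  // nothing can move

  std::size_t remaining = file_size;  // bytes not yet read from upstream
  std::size_t buffered = 0;           // bytes read but not yet written
  bool readable = true;               // mirrors channel2's readable flag

  while (remaining > 0 || buffered > 0) {
    if (readable && remaining > 0) {
      const std::size_t n = std::min(read_chunk, remaining);
      remaining -= n;
      buffered += n;
      stats.max_buffered = std::max(stats.max_buffered, buffered);
    }
    if (buffered >= high_water) readable = false;  // client side not writable

    const std::size_t w = std::min(write_chunk, buffered);
    buffered -= w;
    stats.total_relayed += w;
    if (buffered <= low_water) readable = true;  // client side writable again
  }
  return stats;
}
```

With the pause in place, the peak buffer stays below high_water + read_chunk regardless of the file size. Without it (an effectively unlimited high water mark), a slow client lets most of the file accumulate in the proxy.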
With the above structure, each client has one accepted Channel, and once it sends a request it also corresponds to one client Channel until the transfer is done.
Then I use ab(apache bench) to fire up thousands of requests to the proxy and evaluate the request time. Proxy, Back Server and Client are three boxes on one rack which has no other traffic loaded.
The results are weird:
File size 10 MB: when concurrency is 1, connection delay is very small, but when concurrency increases from 1 to 10, the top 1% connection delay becomes incredibly high, up to 3 secs. The other 99% are very small. When concurrency increases to 20, the top 1% goes to 8 secs, and ab even times out if concurrency is higher than 100. The 90% processing delay is usually linear with the concurrency, but the top 1% can go abnormally high at a random level of concurrency (it varies over multiple test runs).
File size 1 KB: everything is fine, at least with concurrency below 100.
With all of them on a single local machine, there is no connection delay.
Can anyone explain this issue and tell me which part is wrong? I have seen many benchmarks online, but they are pure ping-pong tests rather than large file transfers through a proxy. I hope this is interesting to you guys :)
Thank you!
========================================================================================
After reading some source code today, I found one place that may prevent new sockets from being accepted. In NioServerSocketChannelSink.bind(), the boss executor calls Boss.run(), which contains a for loop that accepts incoming sockets. In each iteration of this loop, after getting the accepted channel, AbstractNioWorker.register() is called, which is supposed to add the new socket to the selector running in the worker executor. However, in register(), a mutex called startStopLock has to be acquired before the worker executor is invoked. This startStopLock is also used in AbstractNioWorker.run() and AbstractNioWorker.executeInIoThread(), both of which check the mutex before they invoke the worker thread. In other words, startStopLock is used in 3 functions. If AbstractNioWorker.register() has to wait for it, the for loop in Boss.run() is blocked, which can delay incoming accepts. Hope this helps.