Akka HTTP server: Acceptor thread in blocked state - akka

I am running an Akka HTTP server, and under load I observe that the "Acceptor1" and "Acceptor2" threads are always in the blocked state. Below is an excerpt from the thread dump.
Why are the acceptors in the blocked state, and how can I increase the number of acceptor threads?
"qtp1907228381-105 Acceptor1 SelectChannelConnector#0.0.0.0:9000" prio=5 tid=105 BLOCKED
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:225)
    at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:109)
    at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:938)
        local variable: java.lang.String#54885
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        local variable: org.eclipse.jetty.server.AbstractConnector$Acceptor#2
    at java.lang.Thread.run(Thread.java:745)

Basically this means the acceptor is waiting for more connections to arrive; a thread blocked in ServerSocketChannelImpl.accept() is typically not an indication of a problem.
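To the second part of the question, the acceptor count is a connector setting. A minimal sketch, assuming embedded Jetty 8 with the SelectChannelConnector shown in the dump (in Jetty 9+ the count is passed to the ServerConnector constructor instead):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.nio.SelectChannelConnector;

public class AcceptorCountExample {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        SelectChannelConnector connector = new SelectChannelConnector();
        connector.setPort(9000);
        // Raise the number of acceptor threads calling accept() on the listen
        // socket; with several acceptors, the ones not currently inside accept()
        // will still show up as BLOCKED in a thread dump.
        connector.setAcceptors(4);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}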

Related

Armeria - Request is waiting for ForkJoinPool.commonPool-worker

The request is not being served and times out.
The observation is that the request reaches the worker thread (16:19:23.051 [armeria-common-worker-nio-2-2] DEBUG c.l.a.server.logging.LoggingService ...) but never reaches the fork-join thread (ForkJoinPool.commonPool-worker-0). This happens for random requests.
The Armeria version is 1.14.0.

AkkaHttpClient: Equivalent of socketTimeout

We have 3 timeouts in Apache-HttpClient:
HttpClients.custom()
    .setConnectionManager(cm)
    .setDefaultRequestConfig(RequestConfig.custom()
        .setConnectTimeout(...)
        .setSocketTimeout(...)
        .setConnectionRequestTimeout(...)
        .build())
    .build();
Where:
* Connection Timeout: the time to establish the connection with the remote host.
* Socket Timeout: the time spent waiting for data after the connection has been established; the maximum period of inactivity between two data packets.
But AkkaHttpClient only has connecting-timeout and doesn't have any configuration property for a socket timeout. Is there an equivalent property, or another way to set a default socket timeout for requests?
In general for timeouts in the client beyond the connecting-timeout, the recommendation is to use the various Akka Streams operators (e.g. idleTimeout), which give you far more control.
There is also a general idle timeout which will close connections if nothing is sent or received: this is intended as a global safety feature, so it can't be configured per-request.
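A rough sketch of the stream-level approach with the Java API, assuming Akka HTTP 10.2 on Akka 2.6 (the URL and the 10-second value are placeholders); idleTimeout fails the response-entity stream if no bytes arrive within the window, which is roughly the role socketTimeout plays in Apache HttpClient:

import akka.actor.ActorSystem;
import akka.http.javadsl.Http;
import akka.http.javadsl.model.HttpRequest;
import akka.stream.javadsl.Sink;
import java.time.Duration;

public class SocketTimeoutExample {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("client");

        Http.get(system)
            .singleRequest(HttpRequest.create("https://example.com/"))
            .thenCompose(response ->
                response.entity()
                        .getDataBytes()
                        // Fail the stream if no data arrives for 10 seconds,
                        // i.e. a per-response timeout on reads.
                        .idleTimeout(Duration.ofSeconds(10))
                        .runWith(Sink.ignore(), system))
            .whenComplete((done, failure) -> system.terminate());
    }
}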

Jetty closes the connection with an idle timeout of 30s instead of the configured timeout (60s)

I am using Spring Boot for REST API services.
We see lots of idle-timeout problems when reading data; Jetty reports "java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms". Below is what I configured for the Jetty thread pool. Does anyone know why it fails with a 30s timeout rather than the 60s one?
int threadPoolIdleTimeout = 60000;
ThreadPool threadpool = new QueuedThreadPool(maxThreads, maxThreads, threadPoolIdleTimeout,
        new ArrayBlockingQueue<>(threadPoolQueueSize));
Unrelated.
That's the thread idle timeout, for reducing the number of idle threads in the thread pool.
The connection idle timeout is a different configuration:
* Check the ServerConnector if it is a normal server connection (see the sketch after this list).
* Check the AsyncContext idle timeout if you are using Servlet Async Processing or Servlet Async I/O.
* Check the WebSocket Session if you are doing WebSocket requests.
* Check the database DataSource configuration if you are worried about database connection idle timeouts.
* Check the HTTP/2 Session configuration for dealing with the virtual connections on an HTTP/2 connector.
* And many more.
There are lots of idle timeouts, each specific to the situation you are dealing with; be aware of them.
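A minimal sketch of the ServerConnector case with plain embedded Jetty (the port and the 60-second value are placeholders); with Spring Boot the same setting is normally applied through the embedded-Jetty customization hooks rather than by building the Server yourself:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class ConnectionIdleTimeoutExample {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        // Connection idle timeout: Jetty's default is 30000 ms, which matches the
        // error above. This is separate from the thread pool's idle timeout.
        connector.setIdleTimeout(60_000);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}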

WSO2 BPS timeouts and wait nodes are not processed after restart

Using WSO2 BPS 3.6.0, we encountered a serious issue.
We have a few processes waiting for external events (with timeout) and several processes polling for updates (using wait node).
The problem arises as soon as we restart the server:
* timeouts that expired during the downtime are not processed
* wait nodes are not processed at all
Reading related articles:
https://issues.jboss.org/browse/RIFTSAW-466
wso2 wait loop doesn't work after restart
I found that the timeout timestamps are stored in the ode_job table, so I tried to update them before starting up the BPS server:
update ode_job set ts=(near_future_timestamp) where ts>(before_restart) and ts<(near_future_timestamp)
which resolved the scope timeouts; however, the wait nodes are no longer processed, even though their timestamps were set in the future. That effectively blocks all the polling instances without any means to move them further.
Is there a way to "revive" or timeout the wait nodes after restarting the server?

Netty file transfer proxy suffers big connection delays under high concurrency

I am working on a project to build a file transfer proxy using Netty that should efficiently handle high concurrency.
Here is my structure:
Back Server: a normal file server, just like the HTTP file server example on netty.io, which receives and confirms a request and sends out the file using either ChunkedBuffer or zero-copy.
Proxy: has both a NioServerSocketChannelFactory and a NioClientSocketChannelFactory, each using a cachedThreadPool. It listens for clients' requests and fetches the file from the Back Server on their behalf. Once a new client is accepted, the accepted channel (channel1) created by the NioServerSocketChannelFactory waits for the request. Once the request is received, the Proxy establishes a new connection to the Back Server using the NioClientSocketChannelFactory, and the new channel (channel2) sends the request to the Back Server and delivers the response back to the client. channel1 and channel2 each use their own pipeline.
More simply, the procedure is
channel1 is accepted
channel1 receives the request
channel2 connects to the Back Server
channel2 sends the request to the Back Server
channel2 receives the response (including the file) from the Back Server
channel1 sends the response obtained from channel2 to the client
once the transfer is done, channel2 closes and channel1 closes on flush (each client sends only one request)
Since the requested file can be big (10 MB), the proxy turns off channel2's readability whenever channel1 is NOT writable, just like the proxy server example on netty.io (see the sketch below).
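That throttling is usually wired up from the interest-changed event, as in the Netty 3 proxy example. A minimal sketch, where the handler sits in channel1's pipeline and backendChannel is an assumed reference to channel2 (names are illustrative, not from the original code):

import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ChannelStateEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;

// Added to channel1's pipeline; backendChannel points at channel2 (the Back Server side).
public class FrontendThrottleHandler extends SimpleChannelUpstreamHandler {
    private final Channel backendChannel;

    public FrontendThrottleHandler(Channel backendChannel) {
        this.backendChannel = backendChannel;
    }

    @Override
    public void channelInterestChanged(ChannelHandlerContext ctx, ChannelStateEvent e) {
        // Fired when channel1's writability flips: stop reading from the Back
        // Server while the client cannot keep up, and resume once it can.
        backendChannel.setReadable(e.getChannel().isWritable());
        ctx.sendUpstream(e);
    }
}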
With the above structure, each client has one accepted channel, and once it sends a request it also corresponds to one client channel until the transfer is done.
Then I use ab (Apache Bench) to fire thousands of requests at the proxy and measure the request time. The Proxy, Back Server and client are three boxes on one rack with no other traffic on it.
The results are weird:
File size 10 MB: when concurrency is 1, the connection delay is very small, but when concurrency increases from 1 to 10, the top 1% of connection delays becomes incredibly high, up to 3 seconds; the other 99% are very small. When concurrency increases to 20, the top 1% goes to 8 seconds, and ab even times out if concurrency is higher than 100. The processing delay for 90% of requests usually grows linearly with concurrency, but the top 1% can go abnormally high at a random concurrency level (it varies across test runs).
File size 1 KB: everything is fine, at least with concurrency below 100.
With everything on a single local machine, there is no connection delay.
Can anyone explain this issue and tell me which part is wrong? I have seen many benchmarks online, but they are pure ping-pong tests rather than this kind of large-file transfer through a proxy. I hope this is interesting to you :)
Thank you!
========================================================================================
After some source code reading today, I found one place that may prevent new sockets from being accepted. In NioServerSocketChannelSink.bind(), the boss executor calls Boss.run(), which contains a for loop for accepting incoming sockets. In each iteration of this loop, after getting the accepted channel, AbstractNioWorker.register() is called, which is supposed to add the new socket to the selector running in the worker executor. However, in register(), a mutex called startStopLock has to be acquired before the worker executor is invoked. This startStopLock is also used in AbstractNioWorker.run() and AbstractNioWorker.executeInIoThread(), both of which take the mutex before they invoke the worker thread. In other words, startStopLock is used in three functions. If it is held during AbstractNioWorker.register(), the for loop in Boss.run() is blocked, which can delay accepting incoming connections. Hope this helps.