How do acceptThreads and acceptQueueSize impact concurrent requests? - jetty

I am new to Jetty.
We are running into an issue with an application using Jetty 7.4 under a load test. Any request to the Jetty server gets stuck and no response is returned, even after several hours. New requests behave the same way, waiting for a response forever. The thread dump does not show any deadlocks, but it does show a good number of threads in the TIMED_WAITING state. There are no Jetty exceptions in the logs.
Details:
Jetty version is 7.4
JDK: AdoptOpenJDK 8
acceptQueueSize: 100
acceptTimeout: 1800000 (30 mins)
acceptors: 1
lingerTime: 10000 (milliseconds)
min number of threads: 5
max number of threads: 500
Concurrent requests: 700 (70 APIs, 10 requests per API)
SelectChannelConnector result = new MySelectChannelConnectorImpl();
result.setThreadPool(createThreadPool(url, config/* min and max threads*/));
result.setPort((url.getPort() > 0) ? url.getPort() : port);
result.setMaxIdleTime((int)acceptTimeout);
result.setAcceptors(acceptThreads);
result.setAcceptQueueSize(acceptQueueSize);
result.setSoLingerTime(linger);
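(createThreadPool is our own helper; roughly, it builds a Jetty QueuedThreadPool from the min/max thread values listed above. A simplified, illustrative sketch, not the exact code, with the url/config parameters reduced to plain ints:)
import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.eclipse.jetty.util.thread.ThreadPool;

// Illustrative sketch only: configures the pool from the min/max values above (5 and 500).
private static ThreadPool createThreadPool(int minThreads, int maxThreads) {
    QueuedThreadPool pool = new QueuedThreadPool();
    pool.setMinThreads(minThreads); // e.g. 5
    pool.setMaxThreads(maxThreads); // e.g. 500
    return pool;
}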
I know Jetty 7.4 is end of life, but the customer is not in a position to upgrade.
We need to prove to the customer that the issue is with the Jetty version and will be fixed by a Jetty upgrade.
Can I tweak acceptThreads and acceptQueueSize for better results?
I want to understand acceptor threads, selectors, Jetty worker threads, acceptQueueSize, and acceptTimeout.
Any pointers for learning these concepts?
@Joakim Erdfelt

Related

Jetty rejects connections even when there are a lot of free threads available in the thread pool

In production we are using jetty 9.2.9.v20150224 with the following configuration:
new Server(new QueuedThreadPool(200, 5, 30000, new ArrayBlockingQueue<Runnable>(128)));
maxThreads = 200, minThreads = 5, idleTimeout = 30000 ms
I tested our application and it was able to handle 200 requests/sec, but a few of our clients complain that sometimes, even under very light load, Jetty does not accept any new connections. In the log I found STARTED,5<=18<=200,i=4,q=128, which as far as I understand shows there are 18 live threads in the thread pool, far fewer than 200, yet new connections are still rejected. The following is a snippet from the log:
2021-01-08 00:20:47,813 WARN [qtp1720926658-14444] QueuedThreadPool : QueuedThreadPool[qtp1720926658]#669341c2{STARTED,5<=18<=200,i=4,q=128}[ReservedThreadExecutor#58a5954d{s=3/4,p=1}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint#263c181
2021-01-08 00:20:47,813 WARN [qtp1720926658-14444] QueuedThreadPool : QueuedThreadPool[qtp1720926658]#669341c2{STARTED,5<=18<=200,i=3,q=128}[ReservedThreadExecutor#58a5954d{s=3/4,p=1}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint#431081c5
2021-01-08 00:20:47,813 WARN [qtp1720926658-14445] QueuedThreadPool : QueuedThreadPool[qtp1720926658]#669341c2{STARTED,5<=18<=200,i=2,q=128}[ReservedThreadExecutor#58a5954d{s=3/4,p=1}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint#3866d8cd
2021-01-08 00:20:47,813 WARN [qtp1720926658-14442] QueuedThreadPool : QueuedThreadPool[qtp1720926658]#669341c2{STARTED,5<=18<=200,i=6,q=126}[ReservedThreadExecutor#58a5954d{s=1/4,p=2}] rejected CEP:SocketChannelEndPoint#4aa81d65{/10.0.1.29:53143<->/10.0.1.28:7777,OPEN,fill=FI,flush=-,to=16442/30000}{io=1/0,kio=1,kro=1}->HttpConnection#32cdb0b1[p=HttpParser{s=START,0 of -1},g=HttpGenerator#22c521c0{s=START}]=>HttpChannelOverHttp#e02e50f{r=1,c=false,a=IDLE,uri=null,age=0}:runFillable:BLOCKING
2021-01-08 00:20:47,829 WARN [qtp1720926658-14442] EatWhatYouKill :
java.util.concurrent.RejectedExecutionException: CEP:SocketChannelEndPoint#4aa81d65{/10.0.1.29:53143<->/10.0.1.28:7777,OPEN,fill=FI,flush=-,to=16442/30000}{io=1/0,kio=1,kro=1}->HttpConnection#32cdb0b1[p=HttpParser{s=START,0 of -1},g=HttpGenerator#22c521c0{s=START}]=>HttpChannelOverHttp#e02e50f{r=1,c=false,a=IDLE,uri=null,age=0}:runFillable:BLOCKING
at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:440)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.execute(EatWhatYouKill.java:370)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:305)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
STARTED,5<=18<=200,i=4,q=128 says ...
5 is your configured minimum threads
<=18<= is the current number of threads in the pool
200 is your configured max threads
i=4 you have 4 idle threads in the pool (eligible for cleanup/removal)
q=128 you have 128 tasks/jobs in the queue (this value is too high; it hints that your QTP configuration is insufficient for your load)
Keep in mind that when you experience the rejected tasks/jobs, it can be either the queue or the pool that rejects them; we don't know which one in Jetty 9.2.x (this information is now available in the logs in Jetty 9.4.x).
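If it turns out the bounded ArrayBlockingQueue(128) is the part doing the rejecting, one option to experiment with (a sketch, not a guaranteed fix for your load) is to drop the fixed-capacity queue and let QueuedThreadPool fall back to its default, growable job queue:
// Same pool sizing as your production config, but no 128-entry ArrayBlockingQueue;
// the pool then uses its default growable BlockingArrayQueue.
new Server(new QueuedThreadPool(200, 5, 30000));
Whether that is acceptable depends on how much queued work you are willing to buffer instead of rejecting.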
This limited view of the QTP in the Jetty 9.2.x series does not show you even half of what's going on internally.
Example: in Jetty 9.4.x+ you see ...
Configured Minimum
Configured Maximum
idle thread count
active thread count
busy thread count
inactive thread count
reserved thread count
leased thread count
utilized count = threads - idle - reserved - leased
max threads possible = inactive + idle + reserved + leased + utilized
threads total = idle + reserved + leased + utilized
busy threads = leased + utilized
ready threads = idle + reserved
available = max - busy
maxAvailable = max - leased
utilization = utilized / maxAvailable (0.0 means idle, 1.0 means at capacity)
All of which are exposed in JMX as well (a small worked example of these formulas is sketched below).
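For instance, plugging illustrative numbers (not taken from your logs) into those formulas:
// Illustrative only: made-up counts run through the formulas above.
int max = 200, threads = 18, idle = 4, reserved = 3, leased = 0;
int utilized = threads - idle - reserved - leased;      // 11
int busy = leased + utilized;                           // 11
int ready = idle + reserved;                            // 7
int available = max - busy;                             // 189
int maxAvailable = max - leased;                        // 200
double utilization = (double) utilized / maxAvailable;  // 0.055, i.e. mostly idle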

Gunicorn worker, threads for GPU tasks to increase concurrency/parallelism

I'm using Flask with Gunicorn to implement an AI server. The server takes in HTTP requests and calls the algorithm (built with pytorch). The computation is run on the nvidia GPU.
I need some input on how I can achieve concurrency/parallelism in this case. The machine has 8 vCPUs, 20 GB of memory, and 1 GPU with 12 GB of memory.
1 worker occupies 4 GB of memory and 2.2 GB of GPU memory.
The max number of workers I can give is 5 (because of GPU memory: 2.2 GB * 5 workers = 11 GB).
1 worker = 1 HTTP request (max simultaneous requests = 5)
The specific questions are:
How can I increase the concurrency/parallelism?
Do I have to specify the number of threads for computation on the GPU?
Now my gunicorn command is
gunicorn --bind 0.0.0.0:8002 main:app --timeout 360 --workers=5 --worker-class=gevent --worker-connections=1000
Fast tokenizers are apparently not thread-safe.
AutoTokenizer seems to be a wrapper that uses either the fast or the slow implementation internally; the default is fast (not thread-safe), so you'll have to switch to the slow (thread-safe) one. That's why you add the use_fast=False flag.
I was able to solve this by:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
Best,
Chirag Sanghvi

Jetty's HTTP/2 client slow?

Doing simple GET requests over a high speed internet connection (within an AWS EC2 instance), the throughput seems to be really low. I've tried this with multiple servers.
Here's the code that I'm using:
HTTP2Client http2Client = new HTTP2Client();
http2Client.start();
SslContextFactory ssl = new SslContextFactory(true);
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
client.start();
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
System.out.println("i" + i);
ContentResponse response = client.GET("https://http2.golang.org/");
System.out.println(response.getStatus());
}
System.out.println(System.currentTimeMillis() - start);
The throughput I get is about 8 requests per second.
This seems to be pretty low (as compared to curl on the command line).
Is there anything that I'm missing? Any way to turbo-charge this?
EDIT: How do I get Jetty to use multiple streams?
That is not the right way to measure throughput.
Depending on where you are geographically, you will be dominated by the latency between your client and the server. If the latency is 125 ms, you can only make 8 requests/s.
For example, from my location, the ping to http2.golang.org is 136 ms.
Even if your latency is less, there is a good chance that the server is throttling you: I don't think http2.golang.org will be happy to see you making 10k requests in a tight loop.
I'd be curious to know what the curl or nghttp latency is in the same test, but I guess it won't be much different (or probably worse if they close the connection after each request).
This test in the Jetty test suite, which is not a proper benchmark either, shows around 500 requests/s on my laptop; without TLS, it goes to around 2500 requests/s.
I don't know exactly what you're trying to do, but your test does not tell you anything about the performance of Jetty's HttpClient.
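To address the EDIT about multiple streams: HTTP/2 multiplexing only helps if you actually issue requests concurrently instead of one blocking GET at a time. A rough sketch (still not a proper benchmark; the request count and URL are placeholders) using HttpClient's asynchronous API, so several streams can be in flight on the same connection:
import java.util.concurrent.CountDownLatch;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Result;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Fire the requests without blocking on each response, then wait for all of them.
HTTP2Client http2Client = new HTTP2Client();
SslContextFactory ssl = new SslContextFactory(true); // trustAll, as in the question's snippet
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
client.start();

int requests = 100; // placeholder request count
CountDownLatch latch = new CountDownLatch(requests);
for (int i = 0; i < requests; i++) {
    client.newRequest("https://http2.golang.org/")
          .send((Result result) -> latch.countDown()); // completes asynchronously per stream
}
latch.await();
client.stop();
You will still be bound by latency and by whatever limits the server imposes (for example its max concurrent streams), but at least the requests are no longer strictly serialized.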

Why is Go's Martini less performant than Play Framework 2.2.x

I wrote two equivalent projects in Golang+Martini and Play Framework 2.2.x to compare their performance. Both have one action that renders a 10K HTML view. I tested them with ab -n 10000 -c 1000 and monitored the results via the ab output and htop. Both use production configs and compiled views. I wonder about the results:
Play: ~17000 req/sec + constant 100% usage of all cores of my i7 = ~0.059 msec/req
Martini: ~4000 req/sec + constant 70% usage of all cores of my i7 = ~0.25 msec/req
...as I understand it, Martini is not bloated, so why is it 4.5 times slower? Any way to speed it up?
Update: Added benchmark results
Golang + Martini:
./wrk -c1000 -t10 -d10 http://localhost:9875/
Running 10s test @ http://localhost:9875/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 241.70ms 164.61ms 1.16s 71.06%
Req/Sec 393.42 75.79 716.00 83.26%
38554 requests in 10.00s, 91.33MB read
Socket errors: connect 0, read 0, write 0, timeout 108
Requests/sec: 3854.79
Transfer/sec: 9.13MB
Play!Framework 2:
./wrk -c1000 -t10 -d10 http://localhost:9000/
Running 10s test @ http://localhost:9000/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 32.99ms 37.75ms 965.76ms 85.95%
Req/Sec 2.91k 657.65 7.61k 76.64%
276501 requests in 10.00s, 1.39GB read
Socket errors: connect 0, read 0, write 0, timeout 230
Requests/sec: 27645.91
Transfer/sec: 142.14MB
Martini running with runtime.GOMAXPROCS(runtime.NumCPU())
I want to use Golang in production, but after this benchmark I don't know how I can make such a decision...
Any way to speedup?
I made a simple Martini app that renders one HTML file, and it's quite fast:
✗ wrk -t10 -c1000 -d10s http://localhost:3000/
Running 10s test @ http://localhost:3000/
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 31.94ms 38.80ms 109.56ms 83.25%
Req/Sec 3.97k 3.63k 11.03k 49.28%
235155 requests in 10.01s, 31.17MB read
Socket errors: connect 774, read 0, write 0, timeout 3496
Requests/sec: 23497.82
Transfer/sec: 3.11MB
MacBook Pro i7.
This is the app: https://gist.github.com/zishe/9947025
It would help if you showed your code; perhaps you didn't disable logs or missed something.
But timeout 3496 seems bad.
@Kr0e, right! I figured out that the heavy use of reflection in Martini's DI makes it perform slowly.
I moved to gorilla/mux, wrote some Martini-style helpers, and got the performance I wanted.
@Cory LaNou: I can't accept your comment :) Now I agree with you: no framework in prod is a good idea. Thanks.
@user3353963: See my question: both use production configs and compiled views.
Add this code:
func init() {
    martini.Env = martini.Prod
}
Sorry, your code already does that.

Jetty 8.1 flooding the log file with "Dispatched Failed" messages

We are using Jetty 8.1 as an embedded HTTP server. Under overload conditions the server sometimes starts flooding the log file with these messages:
warn: java.util.concurrent.RejectedExecutionException
warn: Dispatched Failed! SCEP#76107610{l(...)<->r(...),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}...
The same message is repeated thousands of times, and the amount of logging appears to slow down the whole system. The messages themselves are fine; our request handler is just too slow to process the requests in time. But the huge number of repeated messages makes things actually worse and makes it more difficult for the system to recover from the overload.
So, my question is: is this a normal behaviour, or are we doing something wrong?
Here is how we set up the server:
Server server = new Server();
SelectChannelConnector connector = new SelectChannelConnector();
connector.setAcceptQueueSize( 10 );
server.setConnectors( new Connector[]{ connector } );
server.setThreadPool( new ExecutorThreadPool( 32, 32, 60, TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>( 10 )));
The SelectChannelEndPoint is the origin of this log message.
To stop seeing it, just set the named logger org.eclipse.jetty.io.nio.SelectChannelEndPoint to LEVEL=OFF.
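How you set that level depends on which logging backend you have wired in. Assuming Jetty's default StdErrLog (if you route Jetty logging through slf4j/logback/log4j, configure the equivalent logger there instead), the named logger level can be given as a system property; a sketch:
// Sketch: silence only this named logger when using Jetty's default StdErrLog.
// Must be set before Jetty's logging is initialized, e.g. first thing in main()
// or as -Dorg.eclipse.jetty.io.nio.SelectChannelEndPoint.LEVEL=OFF on the command line.
// The exact LEVEL values understood can vary by Jetty version.
System.setProperty("org.eclipse.jetty.io.nio.SelectChannelEndPoint.LEVEL", "OFF");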
Now as for why you see it, that is more interesting to the developers of Jetty. Can you detail what specific version of Jetty you are using and also what specific JVM you are using?