High load on Jetty

I'm running load tests on my MacBook Pro. The load is injected using Gatling.
My web server is Jetty 9.2.6.
Under heavy load, the number of threads remains constant at 300, but the number of open sockets grows from 0 to 4000+, which produces a "too many open files" error at the OS level.
What does this mean?
Any ideas on how to improve the situation?
Here is the output of the Jetty statistics:
Statistics:
Statistics gathering started 643791ms ago
Requests:
Total requests: 56084
Active requests: 1
Max active requests: 195
Total requests time: 36775697
Mean request time: 655.7369791202325
Max request time: 12638
Request time standard deviation: 1028.5144674112403
Dispatches:
Total dispatched: 56084
Active dispatched: 1
Max active dispatched: 195
Total dispatched time: 36775697
Mean dispatched time: 655.7369791202325
Max dispatched time: 12638
Dispatched time standard deviation: 1028.5144648655212
Total requests suspended: 0
Total requests expired: 0
Total requests resumed: 0
Responses:
1xx responses: 0
2xx responses: 55644
3xx responses: 0
4xx responses: 0
5xx responses: 439
Bytes sent total: 281222714
Connections:
org.eclipse.jetty.server.ServerConnector#243883582
Protocols:http/1.1
Statistics gathering started 643784ms ago
Total connections: 8788
Current connections open: 1
Max concurrent connections open: 4847
Mean connection duration: 77316.87629452601
Max connection duration: 152694
Connection duration standard deviation: 36153.705226514794
Total messages in: 56083
Total messages out: 56083
Memory:
Heap memory usage: 1317618808 bytes
Non-heap memory usage: 127525912 bytes
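For context, statistics output like the block above is produced by Jetty's statistics support: a StatisticsHandler wrapping the handler chain, plus per-connector statistics. The following is only a minimal embedded-Jetty 9.2 sketch of wiring that up with a stand-in handler; the class name and port are illustrative, not the asker's actual configuration.
import org.eclipse.jetty.server.ConnectorStatistics;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.DefaultHandler;
import org.eclipse.jetty.server.handler.StatisticsHandler;

public class StatsEnabledServer {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);

        // Wrap the real handler chain so request/dispatch stats are collected
        // (DefaultHandler is just a stand-in for the application's handlers)
        StatisticsHandler stats = new StatisticsHandler();
        stats.setHandler(new DefaultHandler());
        server.setHandler(stats);

        // Per-connector statistics: total/open/max connections, durations, messages
        ConnectorStatistics.addToAllConnectors(server);

        server.start();
        server.join();
    }
}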

Some advice:
Don't have the client load and the server load on the same machine (and don't cheat by attempting to put the load on 2 different VMs on a single physical machine)
Use multiple client machines, not just 1 (when the Jetty developers test load characteristics, we use at least a 10:1 ratio of client machines to server machines)
Don't test with loopback, virtual network interfaces, localhost, etc. Use a real network interface.
Understand how your load client manages its HTTP version and connections (such as keep-alive or HTTP/1.1 close), and make sure you read the response body content, close the response content/streams, and finally disconnect the connection (see the sketch after this list).
Don't test with unrealistic load scenarios. Real-world usage of your server will be mostly persistent HTTP/1.1 connections carrying multiple requests per physical connection: some on fast networks, some on slow networks, some even on unreliable networks (think mobile).
Raw speed, serving the same content over all-unique connections, is a fascinating number and can produce impressive results, but it is also completely pointless and proves nothing about how your application on Jetty will behave in real-world scenarios.
Finally, be sure you are testing load in realistic ways.
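To illustrate the connection-handling point in the list above, here is a bare-bones client loop using java.net.HttpURLConnection that reads the full response body, closes the stream, and disconnects. The URL and request count are placeholders, and this is only a sketch of the pattern, not how Gatling itself manages connections.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleLoadClient {
    public static void main(String[] args) throws Exception {
        // Placeholder target; point this at the server under test
        URL url = new URL("http://server-under-test:8080/");
        for (int i = 0; i < 1000; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            // Drain the whole body so the underlying socket can be reused (keep-alive)
            try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
                byte[] buf = new byte[8192];
                while (in != null && in.read(buf) != -1) {
                    // discard the content; a real client would validate it
                }
            }
            // Release the connection once this client is done with it;
            // move this outside the loop pattern if you want keep-alive reuse across requests
            conn.disconnect();
        }
    }
}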

Related

Why is the response time 100 times slower than processing the request on the server?

I have a Compute Engine server in the us-east1-b zone:
n1-highmem-4 (4 vCPUs, 26 GB memory) with a 50 GB SSD, and everything shows normal in the monitoring graphs.
We are using this server for a Rails-based RESTful API.
The problem is that when we send a request to the server, it takes a very long time to receive the response.
Here is our server log:
As you can see, it took 00:01 seconds to respond to the request.
And here is the response received by Postman:
As you can see, X-Runtime is 0.036319 as expected, but we received the response in 50374 ms, which means almost 1 minute after the server responded!
I hope this answer can help people with the same problem.
Passenger's highly optimized load balancer assumes that Ruby apps can handle 1 (or thread-limited amount of) concurrent connection(s). This is usually the case and results in optimal load-balancing. But endpoints that deal with SSE/Websockets can handle many more concurrent connections, so the assumption leads to degraded performance.
You can use the force max concurrent requests per process configuration option to override this. The example below shows how to set the concurrency to unlimited for /special_websocket_endpoint:
server {
    listen 80;
    server_name www.example.com;
    root /webapps/my_app/public;
    passenger_enabled on;

    # Use default concurrency for the app. But for the endpoint
    # /special_websocket_endpoint, force a different concurrency.
    location /special_websocket_endpoint {
        passenger_app_group_name foo_websocket;
        passenger_force_max_concurrent_requests_per_process 0;
    }
}
In Passenger 5.0.21 and below, the option above was not yet available. In those versions there is a workaround for Ruby apps: enter the code below into config.ru to set the concurrency (for the entire app).
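A sketch of that config.ru workaround, based on the Passenger documentation for those versions; the PhusionPassenger.advertised_concurrency call is an assumption here and should be verified against the docs for your exact Passenger release:
# config.ru -- assumed workaround for Passenger <= 5.0.21; verify against your version's docs
if defined?(PhusionPassenger)
  # Advertise unlimited concurrency so the load balancer stops assuming 1 request per process
  PhusionPassenger.advertised_concurrency = 0
end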

Django API testing with JMeter fails due to HTTP connection timeout

JMeter config:
Web server:
Server name: 120.0.0.1
Port: 9000
Timeouts:
Connect: blank
Response: blank
Method: GET
Number of threads (users): 500
Ramp-up: 1
Loop count: 1 (forever checkbox not checked)
Out of 500 users, 70 fail.
Here is the code:
# Imports assumed from the app's own modules (they were not shown in the question)
from django.http import JsonResponse
from rest_framework.decorators import api_view
# from django.core.cache import cache  # needed if the commented cache lines are re-enabled
from .models import Country
from .serializers import CountrySerializer

@api_view(['GET'])
def country_list(request):
    # country = cache.get('country')
    try:
        countryData = Country.objects.values('country_id', 'country_name').all().order_by('country_name')
        # countryData = Country.objects.extra(select={'name': 'country_name', 'id': 'country_id'}).values('id', 'name').all().order_by('country_name')[:5]
        serializer = CountrySerializer(countryData, many=True)
        # cache.set('country', serializer.data, 30)
        return JsonResponse({'data': serializer.data, 'error': 0})
    except (KeyError, Country.DoesNotExist):
        return JsonResponse({'error': 1})
And the response is here:
Thread Name: country 1-169
Sample Start: 2017-05-26 15:43:44 IST
Load time: 21014
Connect Time: 0
Latency: 0
Size in bytes: 2015
Sent bytes:0
Headers size in bytes: 0
Body size in bytes: 2015
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: Non HTTP response code: java.net.ConnectException
Response message: Non HTTP response message: Connection timed out: connect
HTTPSampleResult fields:
ContentType:
DataEncoding: null
So it looks like you have found the bottleneck in your application. However, it is hard to say what the maximum number of users is that can be served with a reasonable response time, and at what point the errors start occurring.
I would recommend the following amendments:
Make sure you run JMeter and the Django application on different hosts (JMeter is quite resource-intensive, and this way you will avoid mutual interference).
Change the Ramp-up to something above 1 second (e.g. 500 seconds, so JMeter adds one user each second). The idea is to increase the load gradually; this way you will be able to correlate increasing response times with the increasing load and determine the saturation and failure points exactly. See JMeter Ramp-Up - The Ultimate Guide for more details.
Change the Loop Count to -1 so your requests loop forever.
Set the desired test duration in the "Scheduler" section of the Thread Group.
Monitor OS resource consumption on the Django server using, for example, the JMeter PerfMon Plugin.
So when the failure occurs you should know at least the following:
how many concurrent users there were when it happened (you can view this using the Active Threads Over Time listener or the HTML Reporting Dashboard)
whether the failure is caused by a lack of CPU, RAM, etc. If not, further steps would be examining your backend configuration (i.e. whether the web/database server configuration is suitable for that many connections, checking logs for any suspicious entries, using Python profiling tools to find the reason for the slowness or failure, etc.)

Counting number of requests per second generated by JMeter client

This is how the application setup goes:
2 c4.8xlarge instances
10 m4.4xlarge JMeter clients generating load. Each client used 70 threads.
While conducting a load test on a simple GET request (685-byte page), I came across an issue of reduced throughput after some time of the test run. A throughput of about 18000 requests/sec is reached with 700 threads, remains at this level for 40 minutes, and then drops. The thread count remains 700 throughout the test. I have executed tests with different load patterns, but the results have been the same.
The application response time is considerably low throughout the test.
According to the ELB monitor, there is a reduction in the number of requests (and, I suppose, hence the lower throughput).
There are no errors encountered during the test run. I also set a connect timeout on the HTTP request, but still no errors.
I discussed this issue with AWS support at length, and according to them I am not blocked by any network limit during test execution.
Given that the number of threads remains constant during the test run, what are these threads doing? Is there a metric I can check to find out the number of requests generated (not hits/sec) by a JMeter client instance?
Testplan - http://justpaste.it/qyb0
Try adding the following Test Elements:
HTTP Cache Manager
and especially a DNS Cache Manager, as it might be that all your threads are hitting only one c4.8xlarge instance while the other one sits idle. See The DNS Cache Manager: The Right Way To Test Load Balanced Apps article for explanation and details.

Camel-Jetty benchmark testing for requests per second

I am building a high-load HTTP service that will consume thousands of messages per second and pass them to a messaging system like ActiveMQ.
I currently have a REST service (non-Camel, non-Jetty) that accepts POSTs from HTTP clients and returns a plain successful response, and I could load test it using Apache ab.
We are also looking at camel-jetty as the input endpoint, since Camel has integration components for ActiveMQ and can be part of an ESB if required. Before I start building a camel-jetty to ActiveMQ route, I want to test the load that camel-jetty can support. What should my Jetty-only route look like?
I am thinking of the route
from("jetty:http://0.0.0.0:8085/test").transform(constant("a"));
and using Apache ab to test.
I am concerned whether this route reflects real camel-jetty capacity, since transform could add overhead. Or would it not?
Based on these tests I am planning to build the HTTP-to-MQ route with or without Camel.
The transform API will not add significant overhead. I just ran a test against your basic route:
ab -n 2000 -c 50 http://localhost:8085/test
and got the following:
Concurrency Level: 50
Time taken for tests: 0.459 seconds
Complete requests: 2000
Failed requests: 0
Write errors: 0
Non-2xx responses: 2010
Total transferred: 2916510 bytes
HTML transferred: 2566770 bytes
Requests per second: 4353.85 [#/sec] (mean)
Time per request: 11.484 [ms] (mean)
Time per request: 0.230 [ms] (mean, across all concurrent requests)
Transfer rate: 6200.21 [Kbytes/sec] received
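For reference, the route from the question can be wrapped in a standalone main class so the same ab run is easy to reproduce. This is only a minimal sketch assuming Camel 2.x with camel-core and camel-jetty on the classpath; the class name is illustrative.
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class JettyBenchmarkRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Expose the endpoint and reply with a constant body, as in the question
        from("jetty:http://0.0.0.0:8085/test")
            .transform(constant("a"));
    }

    public static void main(String[] args) throws Exception {
        // Boot a standalone CamelContext containing just this route
        Main main = new Main();
        main.addRouteBuilder(new JettyBenchmarkRoute());
        main.run(args);
    }
}
With that running, the ab command above (ab -n 2000 -c 50 http://localhost:8085/test) exercises the same path.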

Benchmarking EC2

I am running some quick tests to try to estimate hardware costs for a launch and in the future.
Specs
Ubuntu Natty 11.04 64-bit
Nginx 0.8.54
m1.large
I feel like I must be doing something wrong here. What I am trying to do is estimate how many simultaneous users I can support before having to add an extra machine. I am using Django app servers, but for right now I am just testing nginx serving the static index.html page.
Results:
$ ab -n 10000 http://ec2-107-20-9-180.compute-1.amazonaws.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking ec2-107-20-9-180.compute-1.amazonaws.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: nginx/0.8.54
Server Hostname: ec2-107-20-9-180.compute-1.amazonaws.com
Server Port: 80
Document Path: /
Document Length: 151 bytes
Concurrency Level: 1
Time taken for tests: 217.748 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 3620000 bytes
HTML transferred: 1510000 bytes
Requests per second: 45.92 [#/sec] (mean)
Time per request: 21.775 [ms] (mean)
Time per request: 21.775 [ms] (mean, across all concurrent requests)
Transfer rate: 16.24 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 9 11 10.3 10 971
Processing: 10 11 9.7 11 918
Waiting: 10 11 9.7 11 918
Total: 19 22 14.2 21 982
Percentage of the requests served within a certain time (ms)
50% 21
66% 21
75% 22
80% 22
90% 22
95% 23
98% 25
99% 35
100% 982 (longest request)
So before I even add a Django backend, the basic nginx setup can only support 45 req/second?
This is horrible for an m1.large ... no?
What am I doing wrong?
You've only set the concurrency level to 1. I would recommend upping the concurrency (the -c flag for ApacheBench) if you want more realistic results, such as:
ab -c 10 -n 1000 http://ec2-107-20-9-180.compute-1.amazonaws.com/.
What Mark said about concurrency. Plus, I'd shell out a few bucks for a professional load testing service like loadstorm.com and hit the thing really hard that way. Ramp up the load until it breaks. Creating simulated traffic that is at all realistic (which is important for estimating server capacity) is not trivial, and these services help by loading resources, following links, and so on. You won't get very realistic numbers just loading one static page. Get something like the real app running and hit it with a whole lot of virtual browsers. You can't count on finding the limits of a well-configured server with just one machine generating traffic.