Normal CPU Usage But Slow Jetty Response Time During Peak Hour

We have several web servers running Jetty to serve 100 requests per second.
During peak hour, Jetty's response time becomes slow and the number of requests it handles drops.
We have checked that:
- The CPU usage of Jetty is around 20-30%, which is healthy.
- IO figures are normal.
- There are no slow queries in our DB, and the DB server is healthy too.
- The network is healthy.
Adding more web servers solves the problem.
But I don't understand why the CPU usage does not rise when the web traffic load is heavy.
Has anyone had a similar experience?
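One possible explanation, sketched under assumed numbers: if each request spends most of its wall-clock time waiting (on a lock, a connection pool, a downstream call) rather than computing, the thread pool can fill up long before the CPU does. Only the 100 requests per second figure comes from the post above; the pool size, timings and core count below are hypothetical values chosen for illustration, and `saturation_point` is a made-up helper name.

```python
def saturation_point(pool_size, wall_time_s, cpu_time_s, cores):
    """Throughput ceiling of a fixed thread pool, and CPU utilisation at that ceiling.

    Each request holds one thread for wall_time_s seconds but only uses
    cpu_time_s seconds of CPU; the rest of the time it is waiting.
    """
    # Little's law: threads in use = arrival rate * time each request holds a thread.
    max_rps = pool_size / wall_time_s
    # Fraction of all cores kept busy when running at that ceiling.
    cpu_utilisation = max_rps * cpu_time_s / cores
    return max_rps, cpu_utilisation


# Hypothetical values: 200 threads, 2 s per request of which 20 ms is CPU, 8 cores.
max_rps, cpu = saturation_point(pool_size=200, wall_time_s=2.0, cpu_time_s=0.02, cores=8)
print(f"ceiling: {max_rps:.0f} req/s at {cpu:.0%} CPU")
```

With these made-up numbers the pool tops out at 100 requests per second while the CPU is only about a quarter busy, which would look like "slow but healthy" on a CPU graph. Whether this is what is happening here would need a thread dump or pool metrics to confirm.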


Huge latency between nginx and upstream Django backend

We have a setup with the nginx ingress controller as a reverse proxy for a Django-based backend app in production (GKE k8s cluster). We use OpenTelemetry to trace this entire stack (Signoz being the actual tool). One of our most critical APIs is validate-cart.
We have observed that this API sometimes takes a long time, 10-20 seconds or even more. But if we look at the trace of one such request in Signoz, the actual backend takes very little time, around 100 ms, while the total trace starting from nginx shows 29+ seconds, as you can see from the attached screenshot.
Looking at the p99 latency, the nginx service has much bigger spikes than the order service. This graph is for the same validate-cart API.
I have been banging my head against this for quite some time and I am still stuck.
I am assuming requests might be getting queued at either the nginx or the Django layer. But I am trusting the OTel libraries used to trace Django to start the trace the moment a request hits the Django layer, and since there isn't much latency at the Django layer, the issue might be at the nginx layer. I have traced nginx using the third-party OpenTracing module, since ingress-nginx doesn't yet support OpenTelemetry, which would provide more tracing information about the nginx layer.
Nginx opentracing

Concurrent requests to Nginx Server

I am having a problem with my server handling a large volume of concurrent users signing on and operating at the same time. Our business case requires the user base to log in at the same time (within a 1 minute window) and perform various operations in our application. Once the server goes past 1000 concurrent users, the website starts loading very slowly and constantly gives 502 errors.
We have reviewed the server metrics (CPU, RAM, data traffic utilization) on the cloud and most resources are operating at below 10%. Scaling up the server doesn't help.
While the website is constantly giving 502 errors or responding after a long time, direct database queries and SSH connections work fine. As such, we have concluded that the issue is primarily about the number of concurrent requests the server can handle, due to some Nginx or Gunicorn configuration we may have set up incorrectly.
Please advise on any possible incorrect configuration (or any other solution) for this issue:
Server info :
Server specs - AWS c4.8xlarge ( CPU and RAM)
Web Server - Nginx
Backend Server - Gunicorn
[nginx conf image]
[Gunicorn conf file]

Scalable Inference Server for Object Detection

I have created a Django service (nginx + Gunicorn) for object detection models.
In my case I have 50+ models with a ResNet-50 based backbone.
Server Machine Specification:
16 CPU
64 GB Ram
I have pre-loaded all the models in my service. I am running 20 inference requests in parallel.
The issue I am facing is that Gunicorn intermittently restarts one of the 8 workers while inference is running, and it is not due to a timeout. The restarted worker then starts reloading the models again, and because of this my inference requests fail.
I have noticed that this might be because all of the server's CPUs are utilised at 100%.
Can you please suggest a solution, or another way to run inference as a service?

504 gateway timeout for any request to Nginx with lots of free resources

We have been maintaining a project internally which has both web and mobile application platform. The backend of the project is developed in Django 1.9 (Python 3.4) and deployed in AWS.
The server stack consists of Nginx, Gunicorn, Django and PostgreSQL. We use Redis based cache server to serve resource intensive heavy queries. Our AWS resources include:
t2.medium EC2 (2 core, 4 GB RAM)
PostgreSQL RDS with one additional read-replica.
Right now Gunicorn is set to create 5 workers (following the 2*n+1 rule). Load-wise, there are about 20-30 mobile users making requests every minute and 5-10 users checking the web panel every hour. So I would say, not very much load.
Now this setup works alright on 80% of days. But when something goes wrong (for example, we detect a bug in the live system and have to switch off the server for maintenance for a few hours; in the meantime, the mobile apps build up a queue of pending requests, so when we bring the backend back up, a lot of users hit the system at the same time), the server stops behaving normally and starts responding with 504 gateway timeout errors.
Surprisingly, every time this has happened, we found the server resources (CPU, memory) to be 70-80% free and the connection pools in the databases mostly free.
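For reference, the 2*n+1 rule mentioned above works out as follows. This is only a sketch of the arithmetic; `recommended_workers` is a name made up for this example, not a Gunicorn API.

```python
import multiprocessing


def recommended_workers(cores):
    """Worker count from the 2*n+1 rule of thumb, where n is the number of cores."""
    return 2 * cores + 1


# The 2-core instance described above gives 5 workers.
print(recommended_workers(2))

# The same rule applied to the machine running this snippet.
print(recommended_workers(multiprocessing.cpu_count()))
```

Assuming the default sync worker class, that number is also the maximum number of requests being processed at any one moment; anything beyond it has to wait.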
Any idea where the problem is? How to debug? If you have already faced a similar issue, please share the fix.
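A rough way to reason about the restart scenario, assuming sync workers, a fixed service time and a single burst with no retries: requests are served in waves of 5, and anything that finishes later than the proxy timeout comes back as a 504 even though the machine is mostly idle. The 5 workers come from the post; the burst size and service time are hypothetical, and 60 s is nginx's default `proxy_read_timeout`, which may not match the actual config. `burst_outcome` is a made-up helper name.

```python
import math


def burst_outcome(burst_size, workers, service_time_s, proxy_timeout_s):
    """Return (seconds needed to drain the burst, requests finishing after the timeout)."""
    waves = math.ceil(burst_size / workers)
    drain_time_s = waves * service_time_s

    timed_out = 0
    for i in range(burst_size):
        # Request i is handled in wave i // workers and finishes when that wave ends.
        finish_s = (i // workers + 1) * service_time_s
        if finish_s > proxy_timeout_s:
            timed_out += 1
    return drain_time_s, timed_out


# Hypothetical burst: 1000 queued requests, 0.5 s each, 5 workers, 60 s timeout.
drain, late = burst_outcome(burst_size=1000, workers=5, service_time_s=0.5, proxy_timeout_s=60.0)
print(f"drain time: {drain:.0f} s, requests past the timeout: {late}")
```

If the requests are mostly waiting on the database or cache, CPU and memory stay low the whole time, so checking the Gunicorn listen queue and the nginx error log during such a burst may be more telling than the resource graphs.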

Apache too slow to respond, but CPU and memory not maxed out

The problem
Two Apache servers have a long response time, but I do not see CPU or memory maxing out.
I have 2 Apache servers serving static content for a client.
This web site has a lot of traffic.
At high traffic I have ~10 requests per second (html, css, js, images).
Each HTML page makes 30 other requests to the servers to load js, css, and images.
Safari's developer tools show that 2 MB is transferred each time I hit an HTML page.
These two servers are running on Amazon Web Services
both instances are m1.large (2 CPUs, 7.5 GB RAM)
I'm serving images from the same servers
the servers are in the US but a lot of traffic comes from Europe
I tried
changing from prefork to worker
increasing processes
increasing threads
increasing the timeout
I'm running benchmarks with ab (ApacheBench) and I do not see any improvement.
My questions are:
Is it possible that serving the images and large resources like js (400k) might be slowing down the server?
Is it possible that 5 requests per second per server is just too much traffic and there is no tuning I can do, so the only solution is to add more servers?
Does Amazon Web Services have a problem with bandwidth?
New Info
My files are being read from a mounted directory on GlusterFS
Metrics collected with ab (apache bench) run on a EC2 instance on same network
Connections: 500
Concurrency: 200
Server with files on mounted directory (files on GlusterFS)
Requests per second: 25.26 [#/sec]
Time per request: 38.954 [ms]
Transfer rate: 546.02 [Kbytes/sec]
Server without files on mounted directory (files on local storage)
Requests per second: 1282.62 [#/sec]
Time per request: 0.780 [ms]
Transfer rate: 27104.40 [Kbytes/sec]
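To put the two runs side by side, here is a small sketch that computes the ratios from the figures reported above. The dictionary keys and the `ratios` helper are just labels chosen for this example.

```python
# Figures copied from the two ab runs above.
glusterfs = {"requests_per_second": 25.26, "time_per_request": 38.954, "transfer_rate": 546.02}
local = {"requests_per_second": 1282.62, "time_per_request": 0.780, "transfer_rate": 27104.40}


def ratios(slow, fast):
    """How many times better the fast run is than the slow run on each metric."""
    return {
        "throughput": fast["requests_per_second"] / slow["requests_per_second"],
        "latency": slow["time_per_request"] / fast["time_per_request"],
        "transfer": fast["transfer_rate"] / slow["transfer_rate"],
    }


for name, value in ratios(glusterfs, local).items():
    print(f"{name}: {value:.1f}x")
```

All three ratios come out at roughly 50x in favour of local storage.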
New Question
Is it possible that reading the resources (html, js, css, images) from a mounted directory (NFS or GlusterFS) might dramatically slow down the performance of Apache?

It is absolutely possible (and indeed probable) that serving up large static resources could slow down your server. Apache has to keep a worker thread open the entire time that each one of these pieces of content is being downloaded. The larger the file, the longer the download, and the longer you have to hold a thread open. You might be reaching your max thread limit before reaching any sort of memory limit you have set for Apache.
First, I would recommend getting all of your static content off of your server and into CloudFront or a similar CDN. This will make it so that your web server only has to worry about the primary web requests. This might take the requests per second (and the related number of open Apache threads) down from 10 requests/second to something like 0.3 requests/second (based on your 30:1 ratio of secondary content requests to primary requests).
Reducing the number of requests you are serving by over an order of magnitude will certainly help server performance and possibly allow you to reduce down to a single server or, if you still want multiple servers (which is a good idea), possibly reduce the size of your servers.
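As a quick check of that estimate, here is a sketch of the arithmetic, assuming every page load is 1 primary HTML request plus 30 secondary requests, as described in the question. `primary_request_rate` is a name made up for this example.

```python
def primary_request_rate(total_rps, secondary_per_page):
    """Requests per second left on the web server once secondary content is offloaded."""
    # Each page load accounts for 1 primary request plus its secondary requests.
    return total_rps / (1 + secondary_per_page)


total_rps = 10          # peak traffic reported in the question
secondary_per_page = 30  # js, css and image requests triggered by each HTML page

remaining = primary_request_rate(total_rps, secondary_per_page)
print(f"primary requests only: {remaining:.2f} req/s")
print(f"reduction factor: {total_rps / remaining:.0f}x")
```

That is about 0.32 requests/second left on the web servers, a reduction of roughly 31x.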
One thing you will find that basically all high volume websites have in common is that they leave the business of serving up static content to a CDN. Once you get to the point of being a high volume site, you must absolutely consider this (or at least serve static content from different servers using Nginx, Lighty, or some other web server better suited for serving static content than Apache is).
After offloading your static traffic, then you can really start with worrying about tuning your web servers to handle the primary requests. When you get to that point, you will need to know a few things:
The average memory usage for a single request thread
The amount of memory that you have allocated to Apache (maybe 70-80% of overall instance memory if this is a dedicated Apache server)
The average amount of time it takes your application to respond to requests
Based on that, it is a pretty simple formula to make a good starting point for tuning your max thread settings.
Say you had the following:
Apache memory: 4000KB
Avg. thread memory: 20KB
Avg. time per request: 0.5 s
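That means your configuration could handle request throughput up to the following ceiling:
400 requests/second = 4000 KB / (20 KB * 0.5 seconds/request)
Put another way, memory allows 200 threads (4000 KB / 20 KB) and each thread completes 2 requests per second. If your actual traffic were 100 requests/second, then since each request averages 0.5 s, you could assume that you would need 50 threads to handle it.
Obviously, you would want to set your max threads higher than 50 to account for request spikes and such (while staying under the 200 that memory allows), but at least this gives you a good place to start.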
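The same arithmetic as a small sketch, using the example figures above. The function names are made up for illustration. Evaluating 4000 / (20 * 0.5) gives a memory-bound ceiling of 400 requests/second, and a load of 100 requests/second at 0.5 s each keeps 50 threads busy.

```python
import math


def max_threads(apache_memory_kb, thread_memory_kb):
    """How many threads fit in the memory allocated to Apache."""
    return apache_memory_kb // thread_memory_kb


def throughput_ceiling(threads, avg_request_s):
    """Requests per second if every thread is busy all the time."""
    return threads / avg_request_s


def threads_needed(target_rps, avg_request_s):
    """Threads kept busy by a given request rate (rate * time per request)."""
    return math.ceil(target_rps * avg_request_s)


threads = max_threads(apache_memory_kb=4000, thread_memory_kb=20)
print("threads that fit in memory:", threads)
print("throughput ceiling (req/s):", throughput_ceiling(threads, avg_request_s=0.5))
print("threads needed for 100 req/s:", threads_needed(target_rps=100, avg_request_s=0.5))
```

The gap between the threads needed for typical load and the number that fit in memory is the headroom available for spikes.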
Try to start/stop the instance. This will move you to a different host. If the host your instance is on is having any issues, that will mitigate it.
Beyond checking system load numbers, take a look at memory usage, IO and CPU usage.
Look at your system log to see if anything produced an error that may explain the current situation.
