JMeter: how to produce a large number of service requests per second (e.g. 100,000 req/sec) - web-services

I have been doing load testing at my company for a long time, but TPS never exceeded 500 transactions per minute. I have a more challenging problem right now.
Problem:
My company will start a campaign and ask its customers a question; the first correct answer will be rewarded. Analysts expect 100,000 requests per second at the global maximum. (That doesn't seem realistic to me, but this is negotiable.)
Resources:
JMeter,
2 different service requests,
5 slaves with 8 GB RAM each,
80 Mbps internet connection,
3.0 GHz CPUs,
Master computer with the same capabilities as the slaves.
Question:
How do I simulate this scenario, and is it possible? What are the limitations? What should the load model look like? Are there any alternatives?
Any comment is appreciated.

Your load test always needs to represent real usage of the application by real users, so first of all carefully implement your test scenario to mimic a real human using a real browser, including:
cookies
headers
embedded resources (proper handling of images, scripts, styles, fonts, etc.)
cache
think times
etc.
Make sure your test is following JMeter Best Practices, i.e.:
being run in non-GUI mode
all listeners are disabled
JVM settings are optimised for maximum performance
etc.
Once done, you need to set up monitoring of your JMeter engines' health metrics (CPU, RAM, swap usage, network and disk IO, JVM stats, etc.) in order to see whether there is headroom to continue. The JMeter PerfMon Plugin is very handy as its results can be correlated with the test metrics.
Start your test from 1 virtual user and gradually increase the load until you reach the target throughput, your application under test dies, or your JMeter engine(s) run out of resources, whichever comes first. Depending on the outcome you will either report success or a defect, or you will need to request more hosts to use as JMeter engines / upgrade the existing hardware.
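To see why the target may need negotiating, here is some back-of-the-envelope arithmetic for spreading the load over the 5 engines. The 200 ms average response time is an assumption for illustration, not a number from the question:

```python
# Rough capacity arithmetic for distributing the target load across
# JMeter engines. TARGET_RPS and ENGINES come from the question;
# AVG_RESPONSE_S is an assumed value for illustration only.
TARGET_RPS = 100_000      # analysts' expected global peak
ENGINES = 5               # slave machines available
AVG_RESPONSE_S = 0.2      # assumed average response time (200 ms)

rps_per_engine = TARGET_RPS / ENGINES
# Little's law: concurrent threads = throughput * response time
threads_per_engine = rps_per_engine * AVG_RESPONSE_S

print(rps_per_engine)      # 20000.0 requests/second per engine
print(threads_per_engine)  # 4000.0 concurrent threads per engine
```

4,000 threads on a single 8 GB engine is at the upper end of what one JMeter instance can typically sustain, which supports the advice above: watch the engines' health metrics and add hosts if you run out of headroom.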

Related

what will happen if my virtual machine too slow

I have a newbie question here; I'm new to clouds and Linux. I'm using Google Cloud now and wondering about choosing a machine config.
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The image below shows the last 6 hours of CPU usage for a Python script I'm running. It's obviously using less than 2% of the CPU most of the time, but there's a small spike. Should I care about the spike? Also, how high should my CPU usage get before I upgrade? If a script I'm running is using 50-60% of the CPU most of the time, I assume I'm safe, or what's the max before you upgrade?
what if my machine is too slow? will it make the app crash? or just
slow it down
It depends.
Some applications will just respond slower. Some will fail if they have timeout restrictions. Some applications will begin to thrash, which means that all of a sudden the app becomes very, very slow.
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use the rule 50% so that my service can handle burst traffic or denial of service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc) then it is time to upgrade that resource.
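The upgrade rule above can be expressed as a trivial check. The 50% threshold is the policy choice described here, not a hard limit; some architects use 80%:

```python
def needs_upgrade(utilisation: float, threshold: float = 0.5) -> bool:
    """Return True when a resource's utilisation (CPU, memory,
    disk I/O, ...) exceeds the chosen headroom threshold.
    The default of 50% follows the rule above; it is a policy
    choice, and other architects use values up to 80%."""
    return utilisation > threshold

print(needs_upgrade(0.30))  # False - healthy headroom, like the graph in the question
print(needs_upgrade(0.60))  # True - time to upgrade that resource
```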
Also, consider that there are other services that you might want to add. Examples are load balancers, Cloud Storage, CDNs, firewalls such as Cloud Armor, etc. Those types of services tend to offload requirements from your service and make your service more resilient, available and performant. The biggest plus is your service is usually faster for the end user. Some of those services are so cheap, that I almost always deploy them.
You should choose a machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern you should create a managed instance group that automatically scales based on CPU usage. Usually 80-85% is a good value for a max CPU value. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
You should also consider the availability needed for your workload to keep costs efficient. See the link below for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option

NodeJS CPU utilisation statistics

NOTE: This is on Windows.
I have an application that is started as pm2 start index.js --name dvc -- config.json. Then, I started a new command window to monitor the application pm2 monit. To test the application, I am using the Runner option in Postman where, the number of iterations is set to 1000 with a delay of 0 ms.
In the pm2 monit window, the CPU % remains between 0 and 11%. In Task Manager, the node.exe process shows CPU % in the 20s. Process Explorer shows CPU utilisation close to the values reported by pm2 monit. So, I am not able to conclude exactly what the CPU utilisation is.
Can you please advise?
I would recommend looking into Windows Performance Monitor instead, as it exposes more precise counters:
Start Performance Monitor (i.e. type perfmon in "Search" or "Run" boxes and click "Enter")
Add a new Counter (click green plus sign)
Choose Process from the "Available counters" and search for node
You should see charts for different counters (including, but not limited to CPU usage)
Be aware of following:
On multi-core processor systems you might need to monitor CPU usage for all cores in order to ensure that your application can be parallelised
Your 1000 iterations don't really create any load, as Postman waits for the previous response before sending a new request. Therefore you always have only 1 request being processed by your system, and it might even be cached. If you would like to load test your application, I would recommend a tool capable of sending requests in a multithreaded fashion; Apache JMeter would be a reasonable choice. Check out the REST API Testing - How to Do it Right article for instructions on setting up JMeter for API load testing.
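The difference between sequential iterations and true concurrent load is easy to demonstrate. In this sketch a 50 ms `time.sleep` stands in for a real HTTP round trip (an assumption for illustration; no real requests are sent):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request():
    time.sleep(0.05)  # stand-in for a 50 ms HTTP round trip

# Sequential, Postman-Runner style: only one request in flight at a time,
# so 20 iterations take roughly 20 * 50 ms = 1 second.
start = time.monotonic()
for _ in range(20):
    fake_request()
sequential = time.monotonic() - start

# Multithreaded, JMeter style: 20 requests in flight concurrently,
# so the whole batch finishes in roughly one round-trip time.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(lambda _: fake_request(), range(20)))
concurrent = time.monotonic() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The sequential run never stresses the server: at any instant it is handling exactly one request, which is why the CPU numbers in the question stay so low.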

JMeter: Low CPU usage but response too low

I am load testing web services with 1000 users using JMeter. I can see CPU usage is around 30% once 1000 users are injected, but the maximum response time is around 12 seconds.
My query is: if the CPU is not 100% utilized, the maximum response time should not be more than a few seconds.
It is good that you are monitoring CPU usage on the application server side. However, the devil may live somewhere else; for instance, the application may lack available RAM, swap intensively, or reach the limits of network or disk IO, so you should consider these metrics as well. The aforementioned ones (and more) can be monitored using the JMeter PerfMon Plugin.
Basically the same as point 1, but applied to the JMeter side of things. JMeter tests are very resource-intensive, and if JMeter lacks resources it will send requests much more slowly. So make sure you monitor baseline OS health metrics on the JMeter machine(s) as well. Also, 1000 users is quite a high load; double-check that your test follows JMeter Best Practices.
There may be a bottleneck in your application, i.e. it isn't capable of providing a good response time with 1000 concurrent users. Use a relevant profiler tool to detect the longest-running functions and investigate the root cause.

Apache too slow to respond, but CPU and memory not maxed out

The problem
Two Apache servers have long response times, but I do not see CPU or memory maxed out.
Details
I have 2 Apache servers serving static content for clients.
This web site has a lot of traffic.
At high traffic I have ~10 requests per second (HTML, CSS, JS, images).
Each HTML page makes 30 other requests to the servers to load JS, CSS, and images.
Safari developer tools show that 2 MB is transferred each time I hit an HTML page.
These two servers are running on Amazon Web Services:
both instances are m1.large (2 CPUs, 7.5 GB RAM)
I'm serving images from the same server
servers are in the US, but a lot of traffic comes from Europe
I tried
changing from prefork to worker
increasing processes
increasing threads
increasing time out
I'm running benchmarks with ab (apachebench) and I do not see improvement.
My questions are:
Is it possible that serving the images and large resources like JS (400 KB) might be slowing down the server?
Is it possible that 5 requests per second per server is just too much traffic and there is no tuning I can do, so the only solution is to add more servers?
Does Amazon Web Services have a problem with bandwidth?
New Info
My files are being read from a mounted directory on GlusterFS.
Metrics collected with ab (ApacheBench) run on an EC2 instance on the same network:
Connections: 500
Concurrency: 200
Server with files on a mounted directory (GlusterFS):
Requests per second: 25.26
Time per request: 38.954 [ms]
Transfer rate: 546.02 [KB/s]
Server with files on local storage:
Requests per second: 1282.62
Time per request: 0.780 [ms]
Transfer rate: 27104.40 [KB/s]
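The two ab runs already quantify the penalty of the mounted directory. Using the throughput numbers reported above:

```python
# Throughput figures taken directly from the ab results above.
glusterfs_rps = 25.26    # requests/second with files on the GlusterFS mount
local_rps = 1282.62      # requests/second with files on local storage

slowdown = local_rps / glusterfs_rps
print(round(slowdown, 1))  # 50.8 - roughly a 50x throughput drop on the mount
```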
New Question
Is it possible that reading the resources (HTML, JS, CSS, images) from a mounted directory (NFS or GlusterFS) might dramatically slow down the performance of Apache?
Thanks
It is absolutely possible (and indeed probable) that serving up large static resources could slow down your server. You have to keep an Apache worker thread open the entire time each piece of content is being downloaded. The larger the file, the longer the download, and the longer you have to hold a thread open. You might be reaching your max-threads limit before reaching any of the memory limits you have set for Apache.
First, I would recommend getting all of your static content off of your server and into CloudFront or a similar CDN. Then your web server will only have to worry about the primary web requests. This might take the requests per second (and the related number of open Apache threads) down from 10 requests/second to about 0.3 requests/second (based on your 30:1 ratio of secondary content requests to primary requests).
Reducing the number of requests you are serving by over an order of magnitude will certainly help server performance and possibly allow you to reduce down to a single server (or if you still want multiple servers - which is a good idea) possibly reduce the size of your servers.
One thing you will find that basically all high volume websites have in common is that they leave the business of serving up static content to a CDN. Once you get to the point of being a high volume site, you must absolutely consider this (or at least serve static content from different servers using Nginx, Lighty, or some other web server better suited for serving static content than Apache is).
After offloading your static traffic, then you can really start with worrying about tuning your web servers to handle the primary requests. When you get to that point, you will need to know a few things:
The average memory usage for a single request thread
The amount of memory that you have allocated to Apache (maybe 70-80% of overall instance memory if this is dedicated Apache server)
The average amount of time it takes your application to respond to requests
Based on that, it is a pretty simple formula to make a good starting point for tuning your max thread settings.
Say you had the following:
Apache memory: 4000KB
Avg. thread memory: 20KB
Avg. time per request: 0.5 s
That means your memory budget supports up to 200 concurrent threads (4000 KB / 20 KB per thread), which at 0.5 s per request works out to a maximum throughput of about 400 requests/second (200 threads / 0.5 s per request).
Conversely, since each request averages 0.5 s, a target throughput of 100 requests/second would need about 50 threads (100 requests/second × 0.5 s per request).
Obviously, you would want to set your max threads higher than 50 to account for request spikes and such, but at least this gives you a good place to start.
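As a sanity check, the same arithmetic as a small script: the thread ceiling comes from the memory budget, and the thread count for a 100 req/s target follows from Little's law (threads = throughput × time per request). All numbers are the example figures above:

```python
# Example figures from the text above.
apache_memory_kb = 4000   # memory budget allocated to Apache
thread_memory_kb = 20     # average memory per request thread
avg_request_s = 0.5       # average application response time

# Memory budget caps the number of concurrent worker threads.
max_threads = apache_memory_kb / thread_memory_kb   # 200 threads
# Each thread completes 1 / 0.5 = 2 requests per second.
max_rps = max_threads / avg_request_s               # 400 req/s ceiling

# Little's law: threads needed for a given target throughput.
target_rps = 100
threads_needed = target_rps * avg_request_s         # 50 threads

print(max_threads, max_rps, threads_needed)
```

This starting point is then padded upward (e.g. a max-threads setting above 50) to absorb request spikes, as noted above.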
Try stopping and starting the instance. This will move you to a different host; if the host your instance is on is having any issues, that will mitigate them.
Beyond checking system load numbers, take a look at memory usage, IO and CPU usage.
Look at your system log to see if anything produced an error that may explain the current situation.
Check out Eric J.'s answer in this thread: Amazon EC2 Bitnami Wordpress Extremely Slow

How to serve CPU intensive webservice requests in the cloud?

Background: I'm running a webservice in which each request involves a fair amount of computations (up to 10 seconds on a quadcore machine).
Each request can be broken down to about 150 independent (and equally small) subtasks.
What I'm after: I'm looking for a hosting service that allows me to serve these kinds of requests efficiently and in a scalable manner.
What I've considered: I've looked into Google App Engine and Rackspace.
It seems to me that GAE is intended for simple requests requiring little resources to process. The problem with something like Rackspace is that I can't tell in advance how many vCPUs I may need (and even if I knew how big future spikes would be, I don't want to sit with, say, 40 servers idling the rest of the time).
Questions:
Would it be possible to use GAE in the following way:
For each request, split it up into 150 subtasks
Process all subtasks independently by making 150 concurrent HTTP requests to the same webapp (but through a different method)
Collect the results from the "subresults" and return a response to the original request.
Is there any possibility that Map Reduce for GAE could be of any help?
Is there any other service better suited for this task?
Yes, this is possible. The usual way would be to use Task Queue, possibly via DeferredTask helper class.
Normal web requests (to frontend instances) are limited to 30 s, so doing this synchronously is not guaranteed to succeed. Also note that instances are artificially limited to 10 parallel requests (if multithreading is enabled).
Yes, this is a job for map reduce. But note that map reduce is asynchronous: you give it tasks to do and they will be done sometime in the future.
Given the processing you need, you might want to look at GAE backends (they are long-running, support multithreading, and come in different sizes). If you need even more processing power, look at Compute Engine.
Unless all of these 150 subtasks are read-only activities, trying to run them all in a single thread is just not safe. Web requests are unreliable - people can cancel, hit refresh if it takes too long, close windows in the middle, or just time out due to network issues. The background HTTP requests, likewise, can have a whole mess of problems. The standard solution is to have your front-end code simply build a list of things that need to be done, so it can get back to the user quickly, and have a back-end 'worker' process handle the (potentially unreliable) subtasks. Depending on what your application is doing, you might bounce the user to a "working" screen (like searching for airfare) where they can safely wait for the results to come up, or it might just be stuffed away as a "pending" job (like ordering something from Amazon).
There are countless ways to handle this basic workflow. If you stick with Google App Engine, it has a "task queue" as part of the platform, providing a simple mechanism for creating & dispatching background tasks. If you go with Rackspace, their cloud offering is less of a unified platform, so you'll have to either roll your own queue or find one that plugs into your setup.
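The split/process/collect flow the question describes can be sketched locally with a thread pool. This is an illustration of the fan-out pattern only, not App Engine code; on GAE the fan-out would go through the Task Queue or map reduce, and `subtask` here is a hypothetical stand-in for the real computation:

```python
from concurrent.futures import ThreadPoolExecutor

def subtask(i: int) -> int:
    """Hypothetical stand-in for one of the ~150 independent,
    equally small computations described in the question."""
    return i * i

def handle_request(n_subtasks: int = 150) -> int:
    """Fan the subtasks out to a worker pool and collect the
    subresults into a single response. max_workers=10 mirrors
    the 10-parallel-request limit mentioned in the answer."""
    with ThreadPoolExecutor(max_workers=10) as pool:
        subresults = list(pool.map(subtask, range(n_subtasks)))
    return sum(subresults)

print(handle_request())  # aggregated result of all 150 subtasks
```

The important structural point survives the simplification: the front end only aggregates subresults, while the potentially unreliable work is delegated to workers that can be retried independently.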