I have a newbie question here; I'm new to cloud and Linux. I'm using Google Cloud now and wondering, when choosing a machine configuration:
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The image below shows the last 6 hours of CPU usage for a Python script I'm running. It's obviously using less than 2% of the CPU most of the time, but there's a small spike. Should I care about the spike? Also, how high should my CPU usage get before I upgrade? If a script I'm running uses 50-60% of the CPU most of the time, I assume I'm safe, but what's the maximum before you upgrade?
What if my machine is too slow? Will it make the app crash, or just slow it down?
It depends.
Some applications will just respond more slowly. Some will fail if they have timeout restrictions. Some applications will begin to thrash, which means that all of a sudden the app becomes very, very slow.
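For example, a client with a strict timeout turns a slow backend into a hard failure rather than a slow response. A minimal Python sketch of that effect, using the requests library; the URL is a placeholder, not a real endpoint:

    import requests

    # Placeholder endpoint used for illustration only.
    URL = "https://example.com/api/report"

    try:
        # If the overloaded server takes longer than 5 seconds to answer,
        # this raises an exception instead of merely returning slowly.
        response = requests.get(URL, timeout=5)
        print(response.status_code)
    except requests.exceptions.Timeout:
        print("Request timed out - to the caller this is a failure, not just slowness.")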
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use a 50% rule so that my service can handle burst traffic or denial-of-service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc) then it is time to upgrade that resource.
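As a rough illustration of that rule of thumb, here is a small Python sketch using psutil (assuming it is installed on the VM) that flags any resource sitting above a chosen threshold; the 50% figure is the guideline from above, not a hard limit:

    import psutil

    THRESHOLD = 50.0  # percent - the rule of thumb discussed above

    # Sample current utilization of the main resources.
    cpu = psutil.cpu_percent(interval=1)    # CPU over a 1-second window
    mem = psutil.virtual_memory().percent   # RAM in use
    disk = psutil.disk_usage("/").percent   # root filesystem capacity

    for name, value in (("CPU", cpu), ("memory", mem), ("disk", disk)):
        status = "consider upgrading" if value > THRESHOLD else "ok"
        print(f"{name}: {value:.1f}% - {status}")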
Also, consider that there are other services that you might want to add. Examples are load balancers, Cloud Storage, CDNs, and firewalls such as Cloud Armor. Those types of services tend to offload requirements from your service and make it more resilient, available and performant. The biggest plus is that your service is usually faster for the end user. Some of those services are so cheap that I almost always deploy them.
You should choose machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern, you should create a managed instance group that automatically scales based on CPU usage. Usually 80-85% is a good value for the maximum CPU utilization. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
You should also consider the availability needed for your workload to keep costs efficient. See the link below for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option
Related
I have a REST API web server, built in .NET Core, that has data-heavy APIs.
This is hosted on AWS EC2. I have noticed that the average response time for certain APIs is ~4 seconds, and if I turn up the AWS EC2 specs, the response time goes down to a few milliseconds. I guess this is expected. What I don't understand is that even when I load test the APIs on a lower-end CPU, the server never crosses 50% utilization of memory/CPU. So what is the correct technical explanation for the APIs performing faster if the lower-end instance never reaches 100% utilization of memory/CPU?
There is no simple answer; there are so many EC2 variations that you first need to figure out what is slowing down your API.
When you 'turn up' your EC2 instance, you are getting some combination of more memory, faster CPU, faster disk and more network bandwidth - and we can't tell which of those 'more' features is improving your performance. Different instance classes are optimized for different problems.
It could be as simple as the better network bandwidth, or it could be that your application is disk-bound and the better instance you chose is optimized for I/O performance.
Knowing which feature your instance is lacking would help you decide which type of instance to upgrade to - or, as you have found out, you can just upgrade to something 'bigger' and be happy with the performance (at the tradeoff of it being more expensive).
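One practical way to narrow it down is to sample each resource while your load test runs and see which one moves. A rough Python sketch using psutil (assuming you can install it on the instance); interpreting the numbers is still up to you:

    import psutil

    # Take a before/after snapshot of disk and network counters while the
    # load test is running, and sample CPU and memory in between.
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()

    cpu = psutil.cpu_percent(interval=30)   # average CPU over 30 s of load
    mem = psutil.virtual_memory().percent

    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()

    print(f"CPU: {cpu:.1f}%  memory: {mem:.1f}%")
    print(f"disk read/written: {disk_after.read_bytes - disk_before.read_bytes} / "
          f"{disk_after.write_bytes - disk_before.write_bytes} bytes")
    print(f"network recv/sent: {net_after.bytes_recv - net_before.bytes_recv} / "
          f"{net_after.bytes_sent - net_before.bytes_sent} bytes")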
At the moment, I have a single c4.large (3.75 GB RAM, 2 vCPU) instance in my workers cluster, currently running 21 tasks for 16 services. These tasks range from image processing to data transformation, with most sending HTTP requests too. As you can see, the instance is quite well utilised.
My question is, how do I know how many tasks to place on an instance? I am placing up to 8 tasks for a service, but I'm unsure as to whether this results in a speed increase, given they are using the same underlying instance. How do I find the optimal placement?
Should I put many chefs in my kitchen, or will just two get the food out to customers faster?
We typically run lots of smaller-sized servers in our clusters - for example, 4-6 t2.small instances for our workers, with 6-7 tasks placed on each. The main reason for this is not to speed up processing but to reduce the blast radius of servers going down.
We've quite often seen a server simply fail an instance health check and get taken down by AWS. Having the workers spread out reduces the effect on the system.
I agree with the other people's 80% rule. But you never want a single host for any kind of critical application; if that goes down, you're screwed. I also think it's better to use larger-sized servers because of their increased network performance. You should look into a host with enhanced networking, especially because you say you have a lot of HTTP work.
Another thing to consider is disk I/O. If you are piling too many tasks onto a host and there is a failure, it's going to try to schedule them all somewhere else. I have had servers crash because too many tasks were scheduled and burned through disk credits.
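If you suspect you are burning through disk credits, you can watch the volume's BurstBalance metric in CloudWatch. A rough sketch with boto3, assuming the host uses a burstable gp2 volume; the volume ID below is a placeholder you would replace with your own:

    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Placeholder volume ID - replace with the EBS volume backing your host.
    VOLUME_ID = "vol-0123456789abcdef0"

    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName="BurstBalance",  # percentage of I/O credits remaining
        Dimensions=[{"Name": "VolumeId", "Value": VOLUME_ID}],
        StartTime=datetime.utcnow() - timedelta(hours=6),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )

    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], f'{point["Average"]:.1f}%')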
I'm running a Django app in an EC2 instance, which uses RabbitMQ + Celery for task queuing. Are there any drawbacks to running my RabbitMQ node from the same EC2 instance as my production app?
The answer to this question really depends on the context of your application.
When you're faced with scenarios like this, you should always consider a few things.
Separation of concerns
Here, we want to make sure that one system is not responsible for the running of the other systems. This includes things like:
If the EC2 instance running all the stuff goes down, will the remaining tasks in the queue continue running?
If my RAM is full, will all systems remain functioning?
Can I scale just one segment of my app without having to redesign the infrastructure?
By having RabbitMQ and Django (with some kind of WSGI server - gunicorn, waitress, etc.) all on one box, you lose a lot of resource contingency.
Although RAM and CPU may be abundant, there is a limit to IO, disk writes, network writes, etc. This means that if for some reason you have a heavy write function, all other systems may suffer as a result. If you have a function that writes heavily to RAM, the same applies.
So really, the downsides of keeping everything in one system, from what I can see in your question and my own experience, are as follows.
Everything shares a single point of failure: if your one instance of RabbitMQ fails, your queues and tasks stop working.
If your app starts generating big traffic, other systems start to contend for resources.
If any component goes down, that could mean downtime for other services.
System downtime means complete downtime of all components.
Lots of headaches when your application demands more resources with minimal downtime.
Lots of web traffic will slow down task running
Lots of task running will slow down web requests
Lots of IO will slow down all the things
The rule of thumb that I usually follow is to keep single points of failure separate from each other - that way you only need to manage those components individually. A good use case for this would be to use one EC2 instance for your app, another for your workers and another for your RabbitMQ. That way you can apply smaller/bigger instances to just those components if you need to. You can even create AMIs and set up auto-scaling groups - if that is your use case.
Here are some articles for reference
Separation of concerns
Modern design architectures
Single points of failure
TL;DR: If you can run on one EC2 instance you should, but make it easy to scale from day one.
Both Joshnidhin and Giannis covered the RAM, IO and CPU aspects.
I have run production apps on single instances with containerization and slept with peace of mind, knowing that if lots of people suddenly want what I have built tomorrow, I can scale pretty quickly by deploying those containers on different instances instead of one single instance.
Docker allows you to put a limit on CPU consumption and memory usage for each container, so you can also be sure that they will not step on each other.
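For instance, with the Docker SDK for Python you can set those caps when starting a container. A rough sketch, where the image name and the limit values are placeholders rather than recommendations:

    import docker

    client = docker.from_env()

    # Illustrative values: cap the worker container at 512 MB of RAM and
    # roughly one CPU so it cannot starve the web container on the same host.
    container = client.containers.run(
        "my-celery-worker:latest",   # placeholder image name
        detach=True,
        mem_limit="512m",            # maps to docker run --memory
        nano_cpus=1_000_000_000,     # 1e9 nano-CPUs == 1 CPU, maps to --cpus
    )
    print(container.short_id)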
If we take EC2 instance out of this question it becomes:
Are there any drawbacks in running RabbitMQ Node on the same server as my productions app?
I would say it depends on various things: the kind of workloads and their composition, the complexity of the workload, whether you expect growth in usage, etc.
If your workload is well behaved and the server is big enough for both (app + task queue), then why not - there will be only one server to manage. Make sure to protect these two processes from each other by limiting their system resource usage.
If your traffic is not well behaved, then you might want more than one server. In that case having dedicated servers is better (separation of concerns), even though you will have to manage more than one server.
Now back to EC2: all of the above still applies. EC2 makes horizontal scaling of applications easier, so if you have them on separate instances you can scale them individually and cost-effectively. If not, you will waste resources when you scale.
I am trying to load test web services with 1000 users using JMeter. I can see CPU usage is around 30% once 1000 users are injected, but when it comes to response time, the maximum is around 12 seconds.
My query is: if the CPU is not 100% utilized, the maximum response time should not be more than a few seconds, should it?
1. It is good that you are monitoring CPU usage on the application server side. However, the devil may live somewhere else: for instance, the application may lack available RAM, swap intensively, or reach the limits of network or disk IO, so you should consider these metrics as well. The aforementioned ones (and more) can be monitored using the JMeter PerfMon Plugin.
2. Basically the same as point 1, but applied to the JMeter side of things. JMeter tests are very resource intensive, and if JMeter lacks resources it will send requests much more slowly. So make sure you monitor baseline OS health metrics on the JMeter machine(s) as well. Also, 1000 users is quite a high load, so double-check that your test follows JMeter Best Practices.
3. The bottleneck may be in your application itself, i.e. it isn't capable of providing a good response time for 1000 concurrent users. Use the relevant profiler tool to detect the longest-running functions and investigate the root cause, as in the sketch below.
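If the application happens to be written in Python, the standard-library cProfile module is one such tool; the handler below is a hypothetical stand-in for your real endpoint code:

    import cProfile
    import pstats

    def handle_request():
        # Hypothetical request handler standing in for your real endpoint code.
        return sum(i * i for i in range(1_000_000))

    profiler = cProfile.Profile()
    profiler.enable()
    handle_request()
    profiler.disable()

    # Print the functions that consumed the most cumulative time.
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(10)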
What are the different pros and cons for running (scheduled) background tasks and handling web requests on the same (Java) server?
Some points to consider I thought about:
How the Garbage Collector would operate
Data isolation
CPU / memory usage
Traffic surges
Security
tl;dr
pros
convenience: fewer machines to operate
cost: sharing infrastructure will keep cost under control
cons
complexity: might not be obvious or simple to manage different apps on the same server
scalability: scaling the API is simpler than scaling batch jobs
availability: if you introduce HA then you will have to implement some sort of locking mechanism to avoid batch job concurrency issues
security: will increase attack surface
Details
Resource usage patterns are very different between online, short-lived request-response operations such as an API and scheduled background tasks such as maintenance jobs or data curating operations.
For this reason it is generally a good idea to isolate such tasks at the lowest possible level, running them on different VMs or even physical machines.
Garbage Collector perspective
If the same JVM instance is being used, then batch jobs that consume lots of memory and then release it will cause garbage collection to pause the execution of the online requests, degrading response time.
This can be mitigated by running each type of operation on its own JVM to minimize the impact of the stop-the-world pauses.
Data Isolation
If the background tasks operate on the same data as the online requests, then you can probably reuse at least some data access and mapping layer code, potentially saving some work at the expense of introducing coupling at the data layer.
CPU / Memory usage
If the batch job is CPU or memory bound then it will affect the performance of the online requests unless you set some limits on the CPU / memory share that each process can use. This can be done at the VM, container or process level.
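At the process level on Linux, for example, you could cap the batch job's address space so a memory-hungry run fails fast instead of squeezing the online requests. A minimal Python sketch (the 1 GB figure is illustrative and the resource module is Unix-only):

    import resource

    # Illustrative cap: restrict this process (e.g. the batch job) to 1 GB of
    # virtual memory. Allocations beyond the cap raise MemoryError instead of
    # pushing the whole machine into swap.
    ONE_GB = 1024 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (ONE_GB, ONE_GB))

    # ... run the batch work here ...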
Traffic surges
If the batch job uses a lot of bandwidth, then it will affect the online requests' ability to send content to the clients. As with CPU and memory, bandwidth can also be throttled per application to mitigate this effect.
Security
Traditionally, client-facing applications such as APIs should be properly audited and hardened to avoid vulnerabilities. Scheduled background batch processes usually have a smaller attack surface as they don't need to be exposed to clients and are therefore more difficult to compromise.
Depending on the nature of the deployment of course, sharing the same infrastructure for both applications will increase the risk for the batch jobs.
To me, it really all depends on the kind of server you have and the processes that are executed in the background.
Data isolation would be the least of my concerns. If your different services operate on the same data, you'll have conflicts no matter what, even if they are not on the same server, and you'll have to look into something like Paxos for consensus. You can look into Raft for this, which is really well explained and has a few implementations already.
If security is a concern for you, the most important thing you need to protect is obviously your data, stored in whatever database you currently have. The server hosting your database should be isolated from the one(s) running the web and background services, since that forces an attacker to compromise the exposed services on your first layer rather than being able to target your DB directly. See the recent attacks on MongoDB running on exposed servers.
Which brings me back to my first point, revolving around CPU, RAM and surges. These points can only be answered by knowing the budget at your disposal, the traffic you expect and the kind of jobs you are running.
However, if you do want to host them on the same server and you are flexible about it, you could prevent any downtime or slowdown for your users during a traffic peak by pausing your background tasks, or allocating fewer resources to them, and resuming them once things stabilize. You can also think about analyzing your traffic patterns every hour and running the jobs only when you know there won't be any issues at that time.
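A simple version of that idea is a guard that checks the system load before kicking off a background job and defers it while the machine is busy. A minimal Python sketch with arbitrary thresholds; the job itself is a placeholder:

    import os
    import time

    def run_background_job():
        # Placeholder for the actual maintenance/batch work.
        print("running background job")

    def wait_for_quiet_period(threshold=0.5, check_every=60):
        """Block until the 1-minute load average per CPU drops below the threshold."""
        cpus = os.cpu_count() or 1
        while True:
            load_per_cpu = os.getloadavg()[0] / cpus   # Unix-only
            if load_per_cpu < threshold:
                return
            time.sleep(check_every)

    if __name__ == "__main__":
        wait_for_quiet_period()
        run_background_job()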