CPU Credit balance on my AWS EC2 server dropped to zero and now my system is very slow.
What can I do to fix this?
I use t2.micro EC2 instance.
How long will it stay at zero? Until the end of the month, or forever?
The credits will only come back if you reduce your CPU load. You can enable T2 Unlimited to avoid the limitation, but please note that extra costs will likely apply.
If you are frequently running out of credits, you should consider using a larger instance type (e.g. t2.small, t2.medium) or a different instance family. T2/T3 instances are good for workloads that occasionally burst, but are not ideal for sustained workloads.
See: CPU Credits and Baseline Performance for Burstable Performance Instances
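To give a rough feel for how long the balance takes to recover, here is a minimal sketch. It assumes the t2.micro figures of 6 credits earned per hour and a maximum balance of 144 credits, with one credit equal to one vCPU at 100% for one minute. The function name and the utilization values are hypothetical, chosen only for illustration.

```python
# Assumed t2.micro figures: 6 credits earned per hour, balance capped at 144.
EARN_PER_HOUR = 6.0
MAX_BALANCE = 144.0


def hours_to_reach(target_credits: float, avg_cpu_utilization: float) -> float:
    """Hours for a zero balance to grow to target_credits at a given average load.

    avg_cpu_utilization is a fraction of the single vCPU (0.05 means 5%).
    Returns float('inf') when the load is at or above the 10% baseline,
    because credits are then spent at least as fast as they are earned.
    """
    spent_per_hour = avg_cpu_utilization * 60.0  # one credit = one vCPU-minute at 100%
    net_per_hour = EARN_PER_HOUR - spent_per_hour
    if net_per_hour <= 0:
        return float("inf")
    return min(target_credits, MAX_BALANCE) / net_per_hour


if __name__ == "__main__":
    print(hours_to_reach(144, 0.0))   # fully idle instance refilling to the cap
    print(hours_to_reach(30, 0.05))   # 5% average load
    print(hours_to_reach(30, 0.50))   # sustained load above baseline never recovers
```

Under these assumptions the balance is not tied to the calendar month: it grows whenever average CPU use is below the baseline and stays at zero for as long as the load remains above it.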
I'm planning to deploy a production environment in the AWS cloud, and I'm not sure which instance type to choose for my servers. While researching instance types, I learned about fixed performance instances (m4, m5, etc.) and burstable instances (t2, t3, etc.). I doubted that burstable instances would be suitable, since we are planning to go live in production. Later I learned about Unlimited mode for burstable instances.
So my main question is: if I use a burstable instance in Unlimited mode, will it work exactly like a fixed performance instance? That is, any time there is a spike in compute, will it burst to the required performance because of Unlimited mode?
Is that similar to fixed performance? What is the difference?
The main difference is the cost. If your t2/t3 instances stay within their accrued CPU credits then they will cost much less than the fixed performance m4/m5 instances. As long as they stay within that CPU usage limit and only go into Unlimited CPU mode rarely and for very short periods of time, then they will save you money. If they end up using Unlimited CPU mode a lot then they may end up costing more than the m4/m5 instances at which point you might as well use the fixed performance versions.
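The break-even point described above can be sketched with a little arithmetic. The prices below are illustrative assumptions only (on-demand Linux prices vary by region and change over time), as is the surplus credit rate of $0.05 per vCPU-hour. The function names are hypothetical, and the model assumes the average utilization is sustained, so no banked credits offset the surplus.

```python
SURPLUS_RATE = 0.05  # assumed USD per vCPU-hour of surplus credits (Linux)


def unlimited_hourly_cost(base_price: float, vcpus: int, baseline: float,
                          avg_utilization: float) -> float:
    """Approximate hourly cost of a burstable instance in Unlimited mode.

    baseline and avg_utilization are per-vCPU fractions (0.30 means 30%).
    """
    surplus_vcpu_hours = max(0.0, avg_utilization - baseline) * vcpus
    return base_price + surplus_vcpu_hours * SURPLUS_RATE


def break_even_utilization(burstable_price: float, fixed_price: float,
                           vcpus: int, baseline: float) -> float:
    """Average per-vCPU utilization at which both instances cost the same."""
    return baseline + (fixed_price - burstable_price) / (vcpus * SURPLUS_RATE)


if __name__ == "__main__":
    # Assumed example figures: a t3.large at $0.0832/h with a 30% baseline
    # per vCPU, compared with an m5.large at $0.096/h, both with 2 vCPUs.
    u = break_even_utilization(0.0832, 0.096, vcpus=2, baseline=0.30)
    print(f"break-even average utilization: {u:.1%}")
    print(unlimited_hourly_cost(0.0832, 2, 0.30, 0.80))
```

With these assumed numbers, the burstable instance stops being cheaper once sustained average CPU use rises to roughly 43% per vCPU.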
AWS EC2 T3 instances offer a balance of compute, memory, and network resources, and are designed to provide a baseline level of CPU performance with the ability to burst above that baseline when needed.
That being said, if your application is not CPU intensive and doesn't usually exceed the baseline, a T3 instance is a good choice to save cost. (T3 instances are cheaper than T2 instances for Linux.)
I have a script that I run 24/7 that uses 90-100% CPU constantly. I am running this script in multiple virtual machines from Google Cloud Platform. I run one script per VM.
I am trying to reduce cost by using AWS EC2. I looked at the price per hour of t3.micro (2 vCPU) instances and it says the cost is around $0.01/h, which is cheaper than GCP's equivalent instance with 2 vCPU.
Now, I tried to run the script on one t3.micro instance, just to get a real estimate of how much each t3.micro instance running my script will cost. I was expecting the monthly cost per instance to be ~$7.20 (720h/month * $0.01/h). The thing is that I have been running the script for 2-3 days, and the cost reports already show a cost of more than $4.
I am trying to understand why the cost is so far from my estimate (and from the AWS monthly calculator's estimate). All these extra costs seem to come from "EC2 Other" and "CPU Credit", but I don't understand these costs.
I suspect these come from my 24-7 full CPU usage, but could someone explain what are these costs and if there is a way to reduce them?
A burstable EC2 instance allows a certain baseline CPU usage: 10% per vCPU for a t3.micro. When the instance operates below that threshold it accumulates CPU credits, which are applied to usage above the threshold. A t3.micro earns 12 credits an hour (one credit being equal to one vCPU at 100% CPU utilisation for 1 minute). If you regularly use more CPU credits than the instance earns, the surplus is charged at an additional rate, which I understand to be 5c per vCPU-hour.
It may be that t3.micro is not your best choice for that type of workload and you may need to select a different instance type or a bigger instance.
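A rough worked estimate shows why the bill is so far from $7.20. This is only a sketch: it takes the $0.01/h base price from the question, and assumes 2 vCPUs, 12 credits earned per hour and a surplus rate of $0.05 per vCPU-hour. The function name is hypothetical.

```python
def t3_micro_unlimited_hourly_cost(avg_utilization: float,
                                   base_price: float = 0.01,
                                   vcpus: int = 2,
                                   credits_earned_per_hour: float = 12.0,
                                   surplus_rate: float = 0.05) -> float:
    """Approximate hourly cost in Unlimited mode at a sustained average load.

    avg_utilization is a per-vCPU fraction (1.0 means both vCPUs at 100%).
    Assumes the credit balance is already exhausted, as it would be
    under a 24/7 load.
    """
    credits_spent = avg_utilization * vcpus * 60           # vCPU-minutes per hour
    surplus_credits = max(0.0, credits_spent - credits_earned_per_hour)
    surplus_vcpu_hours = surplus_credits / 60
    return base_price + surplus_vcpu_hours * surplus_rate


if __name__ == "__main__":
    hourly = t3_micro_unlimited_hourly_cost(1.0)
    print(f"per hour:  ${hourly:.4f}")
    print(f"48 hours:  ${hourly * 48:.2f}")
    print(f"720 hours: ${hourly * 720:.2f}")
```

Under these assumptions, full load on both vCPUs costs about ten times the base price per hour, which is consistent with seeing more than $4 after 2-3 days.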
The purple in your chart is CPU credits, not instance usage.
Looks like you enabled “T2/T3 Unlimited” when launching your instance and your script is causing it to burst beyond the provided capacity. When you burst beyond the baseline capacity, you’re charged for that usage at the prevailing rate. You can read more about T2/T3 Unlimited and burstable performance in the AWS documentation.
To bring these costs down, disable T2/T3 Unlimited by switching the instance's credit specification to standard.
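Besides the console, the credit mode can also be changed programmatically. The sketch below assumes boto3 is installed and AWS credentials are configured. The instance ID and region are placeholders, and the helper names are hypothetical. Bear in mind that in standard mode the instance is throttled to its baseline once the credit balance runs out, so a script that needs 90-100% CPU would run much slower.

```python
def standard_credit_request(instance_id: str) -> dict:
    """Build the request parameters that switch one instance to standard mode."""
    return {
        "InstanceCreditSpecifications": [
            {"InstanceId": instance_id, "CpuCredits": "standard"}
        ]
    }


def disable_unlimited(instance_id: str, region: str) -> dict:
    """Ask EC2 to switch the instance from unlimited to standard credits."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_instance_credit_specification(
        **standard_credit_request(instance_id)
    )


if __name__ == "__main__":
    # Placeholder instance ID, for illustration only; nothing is sent to AWS here.
    print(standard_credit_request("i-0123456789abcdef0"))
```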
I've been running a scheduler for my workload for a while now. Recently demand has become more inconsistent, and the workload has been backing up at what should be slow points of the week. I've started implementing Auto Scaling groups in two of my regions that scale based on CPU load.
I've got it set at 80% CPU load average, and my queued work is good at maximizing the CPU, and I opted for more, smaller instances that are cheaper to run. Everything appears to be operating ideally, but I just have a concern about instances being started and stopped too often. I know on EC2 you pay for the full hour regardless of how long it runs during that hour, so...
Is the auto scaling taking this into account and leaving them running for at least a certain amount of time like ~30-45 minutes?
Do I have to instead work with the CPU average and the various timeouts to help prevent wasteful start/stops?
Depending on which AMI you're running, you might benefit from per-second billing. In this case, you'll only be charged a minimum of 60 seconds. From my understanding of your use case, this billing method would be ideal (cost-wise) for you, as you seem to frequently start and stop instances that live for short amounts of time.
To my knowledge, there's no built-in mechanism in autoscaling that will try to optimize your EC2 usage to minimise costs.
If, however, you're using an AMI that is not eligible for per-second billing, you could look into Spot Instances to further minimise your costs, if your workload suits that purchasing model.
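To make the billing difference concrete, here is a small sketch comparing the two models for a short-lived instance. The $0.10/h price is a made-up example and the function names are hypothetical. The 60-second minimum comes from the answer above, and the hourly model rounds every started hour up.

```python
import math


def per_second_cost(hourly_price: float, runtime_seconds: int) -> float:
    """Cost under per-second billing with a 60-second minimum charge."""
    billed_seconds = max(60, runtime_seconds)
    return hourly_price * billed_seconds / 3600


def per_hour_cost(hourly_price: float, runtime_seconds: int) -> float:
    """Cost under hourly billing, where each started hour is charged in full."""
    billed_hours = max(1, math.ceil(runtime_seconds / 3600))
    return hourly_price * billed_hours


if __name__ == "__main__":
    runtime = 10 * 60  # an instance that the Auto Scaling group keeps for 10 minutes
    print(f"per-second billing: ${per_second_cost(0.10, runtime):.4f}")
    print(f"hourly billing:     ${per_hour_cost(0.10, runtime):.4f}")
```

Under per-second billing, frequent short-lived instances cost only what they run, so there is little reason to keep an instance alive artificially for 30-45 minutes.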
We are considering upgrading from a t2.micro AWS server instance to an m3.medium instance, based on the recommendation here and some research offline. We feel the need to upgrade primarily because of speed issues and to ensure Googlebot crawls our fast-growing site fast enough. We have upward of 8000 products (on Magento) and that will grow.
While trying to understand what exactly could be the constraint of the current t2.micro instance, we went through a lot of logs but couldn't find anything specific that would indicate a bottleneck in the current usage.
Could anyone help point out
1. What clues in the logs could point to potential bottlenecks (if any) with the current t2.micro instance?
2. How can we find out whether Googlebot had issues while crawling and stopped crawling due to server performance problems?
There are two things to note about t2.micro instances:
They have CPU limitations based upon a CPU credits system
They have limited network bandwidth
CPU credits
The T2 family is very powerful (see comparison between t2.medium and m3.medium), but there is a limit on the amount of CPU that can be used.
From the T2 documentation:
Each T2 instance starts with a healthy initial CPU credit balance and then continuously (at a millisecond-level resolution) receives a set rate of CPU credits per hour, depending on instance size. The accounting process for whether credits are accumulated or spent also happens at a millisecond-level resolution, so you don't have to worry about overspending CPU credits; a short burst of CPU takes a small fraction of a CPU credit.
Therefore, you should look at the CloudWatch CPUCreditBalance metric for the instance to determine whether it has consumed all available credits. If so, the CPU will be limited to its baseline (10% for a t2.micro), and you either need a larger T2 instance or should move away from the T2 family.
In general, T2 instances are great for bursty workloads, where the CPU only spikes at certain times. They are not a good fit for sustained workloads.
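One way to check the credit balance without the console is sketched below, assuming boto3 is installed and credentials are configured. The helper names, the 24-hour window and the threshold of 1 credit are arbitrary choices for illustration. The pure helper works on the list of datapoints that CloudWatch returns.

```python
from datetime import datetime, timedelta, timezone


def credits_exhausted(datapoints: list, threshold: float = 1.0) -> bool:
    """True if the minimum credit balance dropped below threshold at any point."""
    return any(dp["Minimum"] < threshold for dp in datapoints)


def fetch_credit_balance(instance_id: str, region: str, hours: int = 24) -> list:
    """Fetch 5-minute minimums of CPUCreditBalance for the last `hours` hours."""
    import boto3  # imported lazily so credits_exhausted works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=["Minimum"],
    )
    return response["Datapoints"]


if __name__ == "__main__":
    # Hand-written sample datapoints, not real measurements.
    sample = [{"Minimum": 42.0}, {"Minimum": 3.5}, {"Minimum": 0.0}]
    print(credits_exhausted(sample))
```

If the check reports exhaustion during the periods when the site felt slow, CPU throttling is a likely bottleneck.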
Network Bandwidth
Each Amazon EC2 instance type has a limited amount of network bandwidth. This is done to prevent noisy neighbour situations. While AWS only describes bandwidth as Low/Moderate/High, there are some better details at: EC2 Instance Types's EXACT Network Performance?
You can monitor network traffic of your EC2 instances using CloudWatch. Pay attention to NetworkIn and NetworkOut to determine whether the instances are hitting limits.
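CloudWatch reports NetworkIn and NetworkOut in bytes, so a conversion to Mbit/s helps when comparing against a bandwidth figure. The sketch below assumes you read the Sum statistic over a known period. The function name and the sample numbers are hypothetical.

```python
def average_mbit_per_second(bytes_in_period: float, period_seconds: int) -> float:
    """Convert a byte count over a period into an average rate in Mbit/s."""
    bits = bytes_in_period * 8
    return bits / period_seconds / 1_000_000


if __name__ == "__main__":
    # Example: 1.5 GB transferred during one 5-minute (300 s) period.
    rate = average_mbit_per_second(1_500_000_000, 300)
    print(f"{rate:.1f} Mbit/s")
```

Because this is an average over the period, short spikes inside it can still hit the limit even when the average looks comfortable.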
They appear to be approximately the same in terms of performance.
Model     | vCPU | Mem (GiB) | SSD Storage (GB)
m3.medium | 1    | 3.75      | 1 x 4

Model     | vCPU | CPU Credits / hour | Mem (GiB) | Storage
t2.medium | 2    | 24                 | 4         | EBS-Only
The t2.medium allows for burstable performance whereas the m3.medium doesn't. The t2.medium even has more vCPUs (2 vs 1) and more memory (4 vs 3.75 GiB) than the m3.medium. The only performance advantage of the m3.medium is its SSD instance storage, which I recognize could be significant if I'm doing heavy I/O.
Would this be the only scenario where I would choose an m3.medium over a t2.medium?
I'd like to run a web server that gets 20-30k hits a month so I suspect either is okay for my needs, but what's the better option?
30000 hits per month is on average a visitor every 90 seconds. Unless your site is highly atypical, load on the server is likely to be invisibly small. Bursting will handle spikes up to hundreds (or thousands, with some optimizations) of visitors.
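The arithmetic behind that figure is easy to check. This is a minimal sketch assuming a 30-day month and evenly spread traffic; the function name is hypothetical.

```python
def seconds_between_hits(hits_per_month: int, days_per_month: int = 30) -> float:
    """Average gap between hits, assuming traffic is spread evenly."""
    seconds_per_month = days_per_month * 24 * 3600
    return seconds_per_month / hits_per_month


if __name__ == "__main__":
    for hits in (20_000, 30_000):
        gap = seconds_between_hits(hits)
        print(f"{hits} hits/month is one hit every {gap:.1f} s on average")
```

Real traffic is not evenly spread, which is why the burst headroom matters more than the average.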
With appropriate caching, a VPS of comparable specs to a t2.micro can serve a WordPress blog with 30000 hits PER MINUTE. If you were saturating that continuously, you couldn't rely on burst performance for the t2.micro, of course. A t2.medium is roughly 4x as powerful in all regards as a micro, and an m3.medium has similar RAM and bandwidth but less peak CPU.
On the m3.medium, the instance storage will be a few times faster than a large EBS GP2 (SSD) volume, of course. The t2.medium and m3.medium will both have roughly 300-400 Mbit/s of network bandwidth, while a t2.micro gets ~60-70 Mbit/s.
One benchmark shows that t2.medium in bursting mode actually beats a c3.large (let alone the m3.medium, which is less than half as powerful, at 3 ECU vs 7).
But as noted, you can probably save money by using something less powerful than either of your suggestions and still have excellent performance.
If you don't need the power to completely configure your server, shared hosting or a platform-as-a-service solution will be easier. I recommend OpenShift, because they explicitly suggest a single small gear for up to 50k hits a month. You get 3 of those for free.
If you do need to configure the server, you really only need enough memory to run your server and/or DB. A t2.nano has 512 MB, and a t2.micro has 1 GB. The real performance bottlenecks will probably be disk I/O and network bandwidth. The first can be improved with a larger general-purpose SSD volume (more IOPS), the second by using multiple instances and an ELB.
Make sure you host all static assets in S3 and use caching well, and even the smaller AWS instances can handle hundreds of requests per second.
Basically: "don't worry about it, use the cheapest and easiest thing that will run it."
Although the "hardware" specs look similar for the T2.medium instance and the M3.medium instance, the difference is when you consider Burstable vs. Fixed Performance. See this link from Amazon Web Services:
http://aws.amazon.com/ec2/faqs/#burst
The following quote comes from that link:
Q: When should I choose a Burstable Performance Instance, such as T2?
Workloads ideal for Burstable Performance Instances (e.g. web servers, developer environments, and small databases) don’t use the full CPU often or consistently, but occasionally need to burst. If your application requires sustained high CPU performance, we recommend our Fixed Performance Instances, such as M3, C3, and R3.
A T2 instance accrues CPU credits, but only as long as it runs. If it is stopped or terminated, the credits accrued are gone.
There is an important piece of information further down the page concerning the CPU credits for the T2 instances:
Q: What happens to CPU performance if my T2 instance is running low on credits (CPU Credit balance is near zero)?
If your T2 instance has a zero CPU Credit balance, performance will remain at baseline CPU performance. For example, the t2.micro provides baseline CPU performance of 10% of a physical CPU core. If your instance’s CPU Credit balance is approaching zero, CPU performance will be lowered to baseline performance over a 15-minute interval.
This means if you run out of burstable credits, your performance will be limited to a fixed percentage of a single core until you accrue more; 10% for T2.micro, 20% for T2.small, and 40% for T2.medium.
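Those percentages follow directly from the credit earn rate, since one credit is one minute of a full core. The sketch below assumes earn rates of 6, 12 and 24 credits per hour for the three sizes; the table and function names are hypothetical.

```python
# Assumed credits earned per hour for each instance size.
CREDITS_PER_HOUR = {"t2.micro": 6, "t2.small": 12, "t2.medium": 24}


def baseline_percent_of_one_core(instance_type: str) -> float:
    """Baseline as a percentage of a single core: credits per hour out of 60 minutes."""
    return CREDITS_PER_HOUR[instance_type] * 100 / 60


if __name__ == "__main__":
    for name in CREDITS_PER_HOUR:
        print(f"{name}: {baseline_percent_of_one_core(name):.0f}% of one core")
```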
Another important difference, which the OP mentions, is that the M3.medium instance can be provisioned with 4 GB of ephemeral instance storage, which has much greater I/O capacity than persistent Elastic Block Store (EBS) volumes. T2 instances do not have this option.
Finally, it depends on what a "hit" is. In my opinion, if a hit means a few static page downloads that are less than 64k or small dynamic pages, then I'd explore the T2 option. For longer sessions, more data traffic, or higher numbers of concurrent users, I'd consider the M3. And if performance over an extended time period is a key issue, I think you're definitely in M3 land.
Look at the logs for your present site or a site similar to what you're setting up and determine which situation you're in.
Benchmark your application on both and determine the right fit for you. That's the only way to know for sure. The "better option" is dependent on how your application runs and your cost requirements.
Alternatively, you could simply choose one, based on cost or other criteria, and if it's insufficient, or overly sufficient, then change the instance type to the other.