I've recently started using Lightsail. I'm currently running on a 3-month free trial that Amazon offers for a 2GB RAM and 1 vCPU Windows instance.
I'm only experimenting with VMs for the moment and using them for personal purposes, so I'd really like to take advantage of the free trial to see if it suits me.
However, in the roughly 24 hours my instance has been running, I've noticed it is always slightly above the sustainable zone and into the burstable zone. I've read plenty of articles about bursting but haven't really figured this out:
It seems that my instance is running at 0% remaining CPU burst capacity. If I weren't on the free trial, would I be charged for every minute I spend at 0%? Or would I be charged whenever I'm above the sustainable zone, even with burst capacity remaining?
Can Amazon start charging me extra for the fact I'm always slightly above the sustainable zone, even when I'm in free trial?
Can my instance suddenly stop working after a prolonged period of time running in the burstable zone?
Here are screenshots of my instance's metrics:
It is true that the page does not actually say what happens when the burst capacity reaches zero.
The descriptions of burst behaviour appear to match that of the T2-family of Amazon EC2 instances.
The Lightsail pricing makes no mention of an 'Unlimited burst mode', which in EC2 can be used to burst beyond the provided credit balance.
Therefore, it would appear that the Lightsail instance is simply limited to operating at a maximum of 20% CPU when the burst capacity balance reaches zero.
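To make the behaviour concrete, here is a toy model of how a burst capacity balance might drain and refill. The 20% baseline, the expression of capacity as minutes of full-CPU burst, and the 60-minute cap are all assumptions for illustration, not documented Lightsail figures:

```python
# Toy model of Lightsail burst capacity (assumed numbers: 20% baseline,
# capacity expressed as minutes of full-CPU burst, 60-minute cap).
def burst_capacity_after(minutes, cpu_pct, start_capacity,
                         baseline_pct=20.0, max_capacity=60.0):
    """Remaining burst capacity after running at cpu_pct for `minutes`."""
    capacity = start_capacity
    # CPU above the baseline spends capacity; CPU below it refills capacity.
    delta_per_min = (baseline_pct - cpu_pct) / 100.0
    for _ in range(minutes):
        capacity = min(max_capacity, max(0.0, capacity + delta_per_min))
    return capacity

# Running at 25% CPU (5 points above a 20% baseline) for 10 hours
# drains 0.05 * 600 = 30 minutes of burst capacity.
print(round(burst_capacity_after(600, 25.0, 60.0), 2))  # → 30.0
```

The point of the model: running "slightly above the sustainable zone" drains capacity slowly but relentlessly, which matches the chart hovering at 0% remaining burst capacity.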
Related
CPU Credit balance on my AWS EC2 server dropped to zero and now my system is very slow.
What can I do to fix this?
I use t2.micro EC2 instance.
For how long will it be zero? Until the end of the month, or forever?
The credits will only come back if you reduce your CPU load. You can enable T2 Unlimited to avoid the limitation, but please note that extra costs will likely apply.
If you are frequently running out of credits, you should consider using a larger instance type (e.g. t2.small, t2.medium) or a different instance family. T2/T3 instances are good for workloads that occasionally burst, but are not ideal for sustained workloads.
See: CPU Credits and Baseline Performance for Burstable Performance Instances
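As a rough guide to how long recovery takes once you reduce load, here is a sketch using t2.micro figures as I understand them (6 credits earned per hour, one credit equalling one vCPU-minute at 100% utilisation); check the linked documentation for your instance size:

```python
# Rough estimate of how long a T2 instance takes to rebuild credits once
# load drops below the baseline (assumed t2.micro figures: 6 credits
# earned per hour, 1 credit = one vCPU-minute at 100% utilisation).
def hours_to_accrue(target_credits, cpu_pct, earn_rate=6.0, vcpus=1):
    """Hours until `target_credits` accumulate at a steady cpu_pct load."""
    # Credits spent per hour = 60 vCPU-minutes * utilisation fraction.
    spent_per_hour = 60.0 * vcpus * (cpu_pct / 100.0)
    net_per_hour = earn_rate - spent_per_hour
    if net_per_hour <= 0:
        return float("inf")  # load at/above baseline: balance never grows
    return target_credits / net_per_hour

# At 5% CPU a t2.micro nets 6 - 3 = 3 credits/hour,
# so 30 credits take about 10 hours to rebuild.
print(round(hours_to_accrue(30, 5), 2))  # → 10.0
```

Note the `inf` case: if your steady load sits at or above the baseline, the balance never recovers, which is the signal to change instance type rather than wait.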
I have a script that I run 24/7 that uses 90-100% CPU constantly. I am running this script in multiple virtual machines from Google Cloud Platform. I run one script per VM.
I am trying to reduce cost by using AWS EC2. I looked at the price per hour of t3.micro (2 vCPU) instances and it says the cost is around $0.01/h, which is cheaper than GCP's equivalent instance with 2 vCPUs.
Now, I tried to run the script in one t3.micro instance, just to get a realistic estimate of how much each instance running my script will cost. I was expecting the monthly cost per instance to be ~$7.20 (720 h/month * $0.01/h). The thing is that I have been running the script for 2-3 days, and the cost reports already show a cost of more than $4.
I am trying to understand why the cost is so far from my estimate (and from the AWS monthly calculator's estimate). All these extra costs seem to come from "EC2 Other" and "CPU Credit", but I don't understand these costs.
I suspect they come from my 24/7 full CPU usage, but could someone explain what these costs are and whether there is a way to reduce them?
The EC2 instance allows a certain baseline CPU usage: 10% for a t3.micro. When the instance is operating below that threshold it accumulates CPU credits, which are applied to usage above the threshold. A t3.micro can earn up to 12 credits an hour (with one credit being equal to 100% CPU utilisation on one vCPU for 1 minute). If you are regularly using more CPU credits than the instance earns, the excess will be charged at a higher rate, which I understand to be 5c per vCPU-hour.
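A back-of-envelope check, using the figures above (12 credits earned per hour, $0.05 per surplus vCPU-hour; both are my understanding, not quoted pricing), shows why a pinned-CPU workload blows past the on-demand estimate:

```python
# Surplus-credit cost sketch for a t3.micro (assumed figures: 2 vCPUs,
# 12 credits earned/hour, $0.05 per surplus vCPU-hour, 1 credit =
# 1 vCPU-minute at 100% utilisation).
def monthly_surplus_cost(cpu_pct, vcpus=2, earn_rate=12.0,
                         surplus_price=0.05, hours=720):
    spent_per_hour = 60.0 * vcpus * (cpu_pct / 100.0)   # credits used
    surplus_per_hour = max(0.0, spent_per_hour - earn_rate)
    # 60 surplus credits = 1 surplus vCPU-hour
    return surplus_per_hour / 60.0 * surplus_price * hours

# Both vCPUs pinned at 100% for a month: (120 - 12)/60 * $0.05 * 720 h
print(round(monthly_surplus_cost(100), 2))  # → 64.8
```

So the surplus-credit charge can dwarf the ~$7.20 instance charge itself, which is consistent with seeing several dollars of "CPU Credit" cost after only a few days.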
It may be that t3.micro is not your best choice for that type of workload and you may need to select a different instance type or a bigger instance.
The purple in your chart is CPU credits, not instance usage.
Looks like you enabled “T2/T3 Unlimited” when launching your instance, and your script is causing it to burst beyond the baseline capacity. When you burst beyond the baseline, you’re charged for that usage at the prevailing rate. You can read more about T2/T3 Unlimited and burstable performance here.
To bring these costs down, disable T2/T3 unlimited by following instructions here.
I've been running a scheduler for my workload for a while now. Recently demand has become more inconsistent, and the workload has been backing up at what should be slow points of the week. I've started implementing auto scaling groups in two of my regions that scale based on CPU load.
I've got it set at 80% CPU load average, and my queued work is good at maximizing the CPU, and I opted for more, smaller instances that are cheaper to run. Everything appears to be operating ideally, but I just have a concern about instances being started and stopped too often. I know on EC2 you pay for the full hour regardless of how long it runs during that hour, so...
Is the auto scaling taking this into account and leaving them running for at least a certain amount of time like ~30-45 minutes?
Do I have to instead work with the CPU average and the various timeouts to help prevent wasteful start/stops?
Depending on which AMI you're running, you might benefit from per-second billing, in which case you'll be charged for a minimum of 60 seconds per instance run. From my understanding of your use case, this billing method would be ideal (cost-wise) for you, as you seem to frequently start and stop instances that live for short amounts of time.
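To see how much the billing granularity matters for short-lived autoscaled instances, here is a quick comparison (the $0.01/h rate is illustrative; the 60-second minimum is the per-second billing rule as described above):

```python
import math

# Compare classic hourly billing against per-second billing with a
# 60-second minimum charge (rates here are illustrative).
def billed_cost(runtime_seconds, hourly_rate, per_second=True):
    if per_second:
        billable = max(60, runtime_seconds)  # 60-second minimum charge
        return billable / 3600.0 * hourly_rate
    # Classic hourly billing rounds up to the next full hour.
    return math.ceil(runtime_seconds / 3600.0) * hourly_rate

# A 10-minute autoscaled instance at $0.01/h:
print(billed_cost(600, 0.01))         # per-second: ~1/6 of the hourly rate
print(billed_cost(600, 0.01, False))  # hourly: one full hour
```

Under hourly billing, frequent short-lived instances each pay for a full hour, so aggressive scale-in is wasteful; under per-second billing the same churn costs close to the actual runtime.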
To my knowledge, there's no built-in mechanism in autoscaling that will try to optimize your EC2 usage to minimise costs.
If, however, you're using an AMI that is not eligible for per-second billing, you could look into Spot Instances to further minimise your costs, if your workload suits this scheduling model.
We are considering upgrading from a t2.micro AWS server instance to an m3.medium instance based on the recommendation here and some research offline. We feel the need to upgrade primarily for speed issues and to ensure Google bots crawl our fast-growing site fast enough. We have upward of 8000 products (on Magento) and that number will grow.
While trying to understand what exactly could be the constraint of the current t2.micro instance, we ran through a lot of logs but couldn't find anything specific that could indicate a bottleneck as such in the current usage.
Could anyone help point out
1. What clues can be found in logs that could show potential bottleneck issues (if any) with the current t2.micro instance?
2. How could we find out whether Googlebot had issues while crawling and stopped crawling due to server-performance-related issues?
There are two things to note about t2.micro instances:
They have CPU limitations based upon a CPU credits system
They have limited network bandwidth
CPU credits
The T2 family is very powerful (see comparison between t2.medium and m3.medium), but there is a limit on the amount of CPU that can be used.
From the T2 documentation:
Each T2 instance starts with a healthy initial CPU credit balance and then continuously (at a millisecond-level resolution) receives a set rate of CPU credits per hour, depending on instance size. The accounting process for whether credits are accumulated or spent also happens at a millisecond-level resolution, so you don't have to worry about overspending CPU credits; a short burst of CPU takes a small fraction of a CPU credit.
Therefore, you should look at the CloudWatch CPUCreditBalance metric for the instance to determine whether it has consumed all available credits. If so, then the CPU will be limited to 10% of the time and you either need a larger T2 instance, or you should move away from the T2 family.
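If you want intuition for what the CPUCreditBalance metric will show, here is a minimal hour-by-hour simulation using t2.micro figures as I understand them (6 credits earned per hour, one credit = one vCPU-minute at 100%); the starting balance is arbitrary:

```python
# Minimal simulation of a T2 credit balance (assumed t2.micro figures:
# 6 credits earned/hour, 1 credit = 1 vCPU-minute at 100%). Once the
# balance hits zero the instance is pinned to its baseline.
def simulate_balance(load_pct_per_hour, start=30.0, earn=6.0):
    """Yield (hour, balance) given a list of hourly CPU loads (%)."""
    balance = start
    for hour, load in enumerate(load_pct_per_hour, start=1):
        spent = 60.0 * load / 100.0            # vCPU-minutes used
        balance = max(0.0, balance + earn - spent)
        yield hour, balance

# A steady 60% load burns 36 - 6 = 30 credits/hour, so a 30-credit
# balance is gone after the first hour.
for hour, bal in simulate_balance([60, 60, 60]):
    print(hour, bal)
```

A flat-zero CPUCreditBalance graph in CloudWatch is exactly this state: every earned credit is spent immediately, and the instance runs at its baseline.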
In general, T2 instances are great for bursty workloads, where the CPU only spikes at certain times. They are not good for sustained workloads.
Network Bandwidth
Each Amazon EC2 instance type has a limited amount of network bandwidth. This is done to prevent noisy neighbour situations. While AWS only describes bandwidth as Low/Moderate/High, there are some better details at: EC2 Instance Types's EXACT Network Performance?
You can monitor network traffic of your EC2 instances using CloudWatch. Pay attention to NetworkIn and NetworkOut to determine whether the instances are hitting limits.
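CloudWatch reports NetworkIn/NetworkOut as byte totals per period, so you need a small conversion to compare against a bandwidth figure. A sketch (the limit you compare against is whatever estimate you have for your instance class, since AWS only publishes Low/Moderate/High):

```python
# Convert a CloudWatch NetworkOut sum (bytes over a period, default the
# 300-second CloudWatch period) into an average Mbit/s figure.
def avg_mbps(bytes_in_period, period_seconds=300):
    return bytes_in_period * 8 / period_seconds / 1_000_000

def near_limit(bytes_in_period, limit_mbps, threshold=0.9):
    """True if average throughput exceeds 90% of the assumed limit."""
    return avg_mbps(bytes_in_period) >= limit_mbps * threshold

# 7.5 GB sent in 5 minutes averages 200 Mbit/s:
print(avg_mbps(7_500_000_000))  # → 200.0
```

Keep in mind this is a period average; short bursts can hit the instance cap even when the 5-minute average looks comfortable.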
I've been having a problem with my AWS EC2 Ubuntu instances: they always reach 100% CPU utilization after a certain amount of time (around 8 hours) until I restart them.
The instance is ubuntu server 13.04 and it has a basic LAMP, that's all.
I have a cron job that does a ping every couple of minutes to keep a VPN tunnel up, but that shouldn't cause this.
When it's at 100% CPU utilization I can't ping it, SSH into it, or browse it, but it doesn't reject the connection; it just keeps "trying".
Any idea what's the reason behind it? I'm guessing it has something to do with Amazon throttling the instance, but it's weird that it stays at 100% CPU use over 8 hours.
This is the CPU log of the instance, every other indicator seems normal.
I can't attach images here, so I'm posting a link:
100% cpu utilization
EDIT
This happened to me before with other instances, and right now I have an Amazon Linux AMI that has been running at 100% for 4 days straight, and that one only has Tomcat, with no apps deployed. I just realized it's unresponsive; I'm terminating it.
Author's note, 2019: this post was originally written in 2013, and is about the t1.micro instance type. The current EC2 free tier now allows you to choose either the t1.micro or t2.micro instance class. Unlike the t1.micro's intermittent hard-clamping behavior, the t2.micro runs continuously at full capacity until your CPU credit balance nears depletion, and degrades much more gracefully.
This is the expected behavior. See t1.micro Instances in the EC2 User Guide for Linux Instances.
Note the graphs that say "CPU level limited." I have measured this, and if you consume 100% cpu on a micro instance for more than about 15 seconds, the throttling kicks in and your available cycles drop from 2 ECU to approximately 0.2 ECU (roughly 200MHz) for the next 2-3 minutes, at which point the cycle repeats and you'll be throttled again in just a few seconds if you are still pulling hard on the processor.
During throttle time, you only get ~1/10th of the cycles compared to when you are getting peak performance, because the hypervisor "steals" the rest¹... so you are still going to see that you were using a solid 100%... because you were using all that were available. It doesn't take much to pin a micro to the ceiling. Or floor... so either you are asking too much of the instance class or you have something unexpectedly maxing your CPU.
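Plugging the measurements above into a duty-cycle average shows how little compute a pinned t1.micro actually delivers (the 15-second burst and 150-second clamp windows are taken from the observations in this answer, not from any AWS specification):

```python
# Average compute available over the observed t1.micro throttle cycle:
# ~15 seconds at full speed (2 ECU), then ~150 seconds clamped to
# roughly 0.2 ECU, repeating while the CPU stays pinned.
def effective_ecu(burst_s=15, burst_ecu=2.0, clamp_s=150, clamp_ecu=0.2):
    total = burst_s + clamp_s
    return (burst_s * burst_ecu + clamp_s * clamp_ecu) / total

# (15*2 + 150*0.2) / 165 ≈ 0.36 ECU on average when continuously pinned
print(round(effective_ecu(), 2))  # → 0.36
```

So a sustained workload sees well under a fifth of the nominal 2 ECU, while the CPU graph still reads a solid 100% of what is available.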
Establish an SSH connection while the machine is responsive, start top running, and then stay connected, so that when the instance starts to slow down, you already have the tool running that you need to find the CPU hog.
¹ hypervisor steals the rest: A common misconception at one time was that the time stolen from EC2 instances by the hypervisor (visible in top and similar utilities) was caused by "noisy neighbors" -- other instances on the same hardware competing for CPU cycles. This is not the cause of stolen cycles. For some older instance families, like the m1, stolen cycles would be seen if AWS had provisioned your instance on a host machine that had faster processors than those specified for the instance class; the cycles were stolen so that the instance had performance matching what you were paying for, rather than the performance of the actual underlying hardware. EC2 instances don't share the physical resources underlying your virtualized CPU resources.
Run top and see how high st (or steal) is. If st is at 97%, then you are being throttled and only have 3% of your CPU to work with. You don't need to be doing anything CPU intensive for that to be slow!
If that is the case and you cannot change how much CPU you require, the only fix is to upgrade to a small instance. Small instances are not throttled as aggressively.
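If you want to watch for this automatically rather than eyeballing top, you can extract the st figure from the CPU summary line. A best-effort sketch that parses the line format printed by procps top (field layout can vary between versions):

```python
import re

# Pull the steal ("st") percentage out of a top(1) CPU summary line so a
# script can flag heavy hypervisor throttling. Best-effort parsing of the
# procps top format; other top variants may print the line differently.
def steal_pct(top_cpu_line):
    match = re.search(r"([\d.]+)%?st", top_cpu_line.replace(" ", ""))
    return float(match.group(1)) if match else None

line = "%Cpu(s):  2.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 97.0 st"
print(steal_pct(line))  # → 97.0
```

A steal figure in the 90s, as described above, means the hypervisor is clamping you and only a few percent of the CPU is actually yours.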
http://theon.github.io/you-may-want-to-drop-that-ec2-micro-instance.html