I'm trying to launch a fleet of 700 r4.16xlarge instances via spot request.
I used cfncluster to launch a fleet with initial_queue_size = 10 and max_queue_size = 700. However, this fleet scaled up and maxed out at 50 instances (and only $120/hr :P). There are many hundreds of tasks queued up in squeue, but something is preventing more instances from being launched.
After I realized this, I attempted to create another fleet of the same instance type in the same region and received the following error message:
- AWS::AutoScaling::AutoScalingGroup ComputeFleet Received 0 SUCCESS signal(s) out of 10. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement
I do not know which service limit I am hitting, since my limit for on-demand r4.16xlarge is 20. Is there a separate limit for Spot instances, distinct from the on-demand limit?
I checked ec2 limits as well as the trusted advisor service limits (linked below) and nothing seems to be maxed out.
https://console.aws.amazon.com/trustedadvisor/home?#/category/service-limits
Any help is much appreciated!
I am trying to use aws spot instances (m5.large in eu-west-2 region) with a maximum bid equal to the price of on demand instances. According to https://aws.amazon.com/ec2/spot/instance-advisor/ these instances should have a < 5% frequency of interruption, however, after launching 40 such instances this morning, I have found that within the hour 34 of them were evicted by aws ("instance-terminated-no-capacity" according to the spot requests page on the ec2 dashboard).
This eviction rate looks much too high compared to both Amazon's own advisor and other users' experiences. Does anybody know what could be causing this behaviour, whether there is a better way to debug or predict it, or whether this is just what I should expect from spot instances?
Thank you!
Actually, for m5.large instances in the eu-west-2 (London) region it's a 5%-10% frequency of interruption, so you can expect a maximum of 10%. I'm not saying that the issue you are facing is because of this.
AWS terminates your spot instances for any of these reasons:
The Spot price is above the maximum price.
There isn't enough capacity.
Amazon EC2 can't meet the constraints you placed on your Spot request.
In your case, since you are seeing instance-terminated-no-capacity message it is definitely because of the second reason. Since you've asked for 40 such instances, the amazon spot instance pool might not have enough capacity at that time.
The capacity of the available spot instance pool depends on the demand for regular instances. When users ask for regular on-demand instances and there is not enough capacity, AWS will start terminating spot instances to fulfil those requests.
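To see which of the three reasons dominates across many requests, you could tally the status codes. This is only a sketch: it assumes a response shaped like the output of the EC2 DescribeSpotInstanceRequests call (a "SpotInstanceRequests" list whose items carry "Status" -> "Code"), and the grouping of codes into the three reasons above is my own assumption, not an official mapping. The sample data is made up.

```python
from collections import Counter

# Assumed grouping of Spot status codes into the three reasons listed above.
REASON_BY_CODE = {
    "instance-terminated-by-price": "price above maximum",
    "instance-terminated-no-capacity": "not enough capacity",
    "instance-terminated-launch-group": "request constraints",
}


def tally_termination_reasons(response):
    """Count Spot requests per termination reason in a describe-style response."""
    requests = response.get("SpotInstanceRequests", [])
    codes = (req.get("Status", {}).get("Code", "unknown") for req in requests)
    # Codes outside the assumed mapping are kept visible rather than dropped.
    return Counter(REASON_BY_CODE.get(code, "other: " + code) for code in codes)


# Made-up sample data, for illustration only.
sample = {
    "SpotInstanceRequests": [
        {"Status": {"Code": "instance-terminated-no-capacity"}},
        {"Status": {"Code": "instance-terminated-no-capacity"}},
        {"Status": {"Code": "fulfilled"}},
    ]
}

for reason, count in tally_termination_reasons(sample).items():
    print(reason, count)
```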
In my experience spot interruption is highly variable, and for some instances more or less likely at different times of the day.
If you need 40 instances and they do not need to be in the same availability zone (AZ) as each other, you can reduce the chance of a mass interruption of all or most instances by spreading the machines across different AZs within the region, as each availability zone has its own pool of machines. The trade-off is that you will likely increase the chance that some machines will be interrupted.
Note that this is not an option if you are using EMR, because then the instances have to be in the same AZ.
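As a rough sketch of the spreading idea, here is how the 40 instances could be split evenly over the zones before making one request per zone. The function name and the list of zones are assumptions for illustration; check which AZs your account actually has in the region.

```python
def spread_across_zones(total, zones):
    """Split `total` instances as evenly as possible over the given zones."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    base, extra = divmod(total, len(zones))
    # The first `extra` zones take one additional instance each.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


# Zone names assumed for illustration.
plan = spread_across_zones(40, ["eu-west-2a", "eu-west-2b", "eu-west-2c"])
print(plan)
```

Each entry in the plan would then become its own spot request (or launch specification) pinned to that zone.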
I planned to use an EC2 Spot instance/fleet as our Jenkins slave solution, based on this article: https://jenkins.io/blog/2016/06/10/save-costs-with-ec2-spot-fleet/.
EXPECTED
If the spot instance nodes remain free for the specified idle time (I have configured 5 minutes), then Jenkins releases the nodes, and my Spot fleet nodes are automatically scaled down.
ACTUAL
My spot instances keep running for days. I also noticed that when I have more pending jobs, Jenkins does not automatically scale my Spot fleet to add more nodes.
Is the automatic scale up/down supposed to be triggered by the AWS service, or by the Jenkins plugin?
CONFIGURATION
Jenkins version : 2.121.2-1.1
EC2 Fleet Jenkins Plugin version : 1.1.7
Spot instance configuration :
Request type : request & maintain
Target Capacity : 1
Spot fleet plugin configuration :
Max Idle Minutes Before Scaledown : 5
Minimum Cluster Size : 0
Maximum Cluster Size : 3
Any help or lead would be really appreciated.
I had the same issue, and by looking in Jenkins' logs I saw that it tried to terminate the instances but the request was refused by AWS.
So, I checked in AWS Cloudtrail all the actions Jenkins tried and for which there was an error.
In order for the plugin to scale your Spot Fleet, check that your AWS EC2 Spot Fleet plugin has the following permissions with the right conditions:
ec2:TerminateInstances
ec2:ModifySpotFleetRequest
In my case, the condition in the policy was malformed and didn't work.
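For reference, a minimal policy statement with just those two actions could be generated like this. It is a sketch only: the "*" resource is a placeholder, no condition block is included, and the plugin may well need further permissions that are not covered in this answer.

```python
import json


def build_fleet_scaling_policy(resource="*"):
    """Build a minimal IAM policy document with the two actions named above.

    The default "*" resource is a placeholder; narrow it for real use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:TerminateInstances",
                    "ec2:ModifySpotFleetRequest",
                ],
                "Resource": resource,
            }
        ],
    }


policy = build_fleet_scaling_policy()
print(json.dumps(policy, indent=2))
```

If you add a Condition to the statement, compare the resulting JSON against the failed calls in CloudTrail to confirm the condition actually matches.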
I am getting the error "Master Instance Group: Exceeded EC2 Instance Quota" when I create a new cluster on Amazon EMR with 1 Master node only, or with 1 Master and 2 Core nodes. However, there are no EC2 instances running on my account.
What should I do? I have raised a ticket, but I am asking here in case I can get a quicker solution.
In your case I think you are trying to access a new region or a new instance type. AWS sometimes restricts this when you are on the free tier and allows access to only 2-3 regions or to the free instance types. You then have to request access from AWS by raising a case.
In the normal scenario, this is what happens:
You may face an error like Exceeded EC2 Instance Quota while you are trying to spin up new instances either standalone or in cluster.
This error is caused because you have hit the limit on number of instances allowed in your AWS account.
This limit is region and instance size specific. To get rid of this error you will have to request Amazon to increase the EC2 instance limit.
Requesting a limit increase is simple. Below are the steps:
Most service limit increases can be requested through the AWS Support Center by choosing Create Case and then choosing Service Limit Increase.
Most service limits are specific to a particular AWS Region, so be sure to submit a request for each Region you plan to use. Many services support requesting multiple limit increases for the same service through one support case. After creating your first request, choose Add another request and then choose a new limit type or Region.
This is from the amazon ec2 FAQ :
Q: How quickly can I scale my capacity both up and down?
Amazon EC2 provides a truly elastic computing environment. Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. When you need more instances, you simply call RunInstances, and Amazon EC2 will typically set up your new instances in a matter of minutes. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
Now, according to the same FAQ, I am only allowed to launch 20 instances per region. It says I have to fill in a request form if I need more than 20 instances. So, in effect, I can't spin up more than 20 programmatically?
What am I missing here? How can we launch 100 instances, let alone thousands? Sorry if this is the wrong place for such a question.
You cannot launch instances beyond the instance limit. You need to make a request to increase the instance limit. This is a safety feature so that:
A wild loop in your SDK/API script does not launch instances continuously
A malicious user does not launch a large number of instances
A hacker who gets access to your account cannot launch a large number of instances
An incorrectly configured autoscaling group does not launch a huge number of instances
If you require more than your instance limit, you need to submit a request to AWS. See: Amazon EC2 Service Limits. AWS will review your request and approve it.
You are missing the fact that limit increase requests are very easy to make and are almost always granted with no questions asked within a day or two.
To request a limit increase:
Open the AWS Support Center page, sign in if necessary, and choose Create Case.
For Regarding, choose Service Limit Increase.
Complete Limit Type, Use Case Description, and Contact method. If this request is urgent, choose Phone as the method of contact instead of Web.
Choose Submit.
The AWS FAQ provides a clear answer:
You are limited to running up to a total of 20 On-Demand instances across the instance family, purchasing 20 Reserved Instances, and requesting Spot Instances per your dynamic Spot limit per region. New AWS accounts may start with limits that are lower than the limits described here. Certain instance types are further limited per region as follows
For Spot instance limits, AWS states:
The usual Amazon EC2 limits apply to instances launched by a Spot Fleet, such as Spot request price limits, instance limits, and volume limits. In addition, the following limits apply:
The number of active Spot Fleets per region: 1,000
The number of launch specifications per fleet: 50
The size of the user data in a launch specification: 16 KB
The target capacity per Spot Fleet: 3,000
The target capacity across all Spot Fleets in a region: 5,000
A Spot Fleet request can't span regions.
A Spot Fleet request can't span different subnets from the same Availability Zone.
These limits protect you from hacker attacks, stolen API keys, etc. If you want to increase these limits, you need to submit a request to the AWS support team through the AWS Support Center.
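If you want to sanity-check a planned Spot Fleet request against the numbers quoted above before submitting it, something like the following would do. It is a sketch: the constants are copied from the quoted list (they are defaults and may differ for your account), and the function and parameter names are made up for illustration.

```python
# Default Spot Fleet limits as quoted above; your account's values may differ.
MAX_LAUNCH_SPECS_PER_FLEET = 50
MAX_USER_DATA_BYTES = 16 * 1024
MAX_TARGET_CAPACITY_PER_FLEET = 3000
MAX_TARGET_CAPACITY_PER_REGION = 5000


def check_fleet_request(target_capacity, launch_spec_count, user_data_bytes,
                        region_capacity_in_use=0):
    """Return a list of limit violations for a planned Spot Fleet request."""
    problems = []
    if launch_spec_count > MAX_LAUNCH_SPECS_PER_FLEET:
        problems.append("too many launch specifications")
    if user_data_bytes > MAX_USER_DATA_BYTES:
        problems.append("user data larger than 16 KB")
    if target_capacity > MAX_TARGET_CAPACITY_PER_FLEET:
        problems.append("target capacity above per-fleet limit")
    # Capacity already requested by other fleets in the region counts too.
    if region_capacity_in_use + target_capacity > MAX_TARGET_CAPACITY_PER_REGION:
        problems.append("target capacity above per-region limit")
    return problems


print(check_fleet_request(700, 1, 1024))
print(check_fleet_request(3500, 60, 20000, region_capacity_in_use=2000))
```

An empty list only means the request fits these fleet-level limits; the usual EC2 instance limits still apply on top.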
Going through the AWS Service Limits documentation, I can't understand how the AWS resources usage is calculated. Does it decrease every time you launch a new instance? Does it increase when you terminate it? Is it a monthly limit? Annual?
The AWS documentation is unclear.
Update:
In many regions, I have Running On-Demand EC2 instances (max number of EC2s limit is 0).
In other regions I have some instances types limit equal to 0.
In other regions, I have instance type limit > Running On-Demand EC2 instances.
Am I missing something?
The limit applies at any point in time; it is not a monthly or annual limit. Suppose your limit is 20 EC2 instances: at any time you can have a maximum of 20 instances (running + stopped).
When you launch a new instance, your instance count (running + stopped) is checked and only if it is less than 20 (your limit) AWS will continue to launch a new instance. Otherwise you will get a message to increase the limit and no instance is launched.
So your limit is checked when you launch a new instance.
(Current running + stopped) + 1 > instance_limit ==> Cannot launch
(Current running + stopped) + 1 <= instance_limit ==> Launched
This limit is per region and per instance type.
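The same check written as code, as a minimal sketch of the rule described in this answer (the function name and the `requested` parameter are made up for illustration):

```python
def can_launch(running, stopped, instance_limit, requested=1):
    """Apply the rule above: the launch succeeds only if the count stays within the limit."""
    return running + stopped + requested <= instance_limit


print(can_launch(running=15, stopped=4, instance_limit=20))
print(can_launch(running=15, stopped=5, instance_limit=20))
```

With a limit of 20, the first call is allowed (15 + 4 + 1 = 20) and the second is refused (15 + 5 + 1 = 21).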
The Amazon EC2 Management Console has an Amazon EC2 Service Limits Report section that displays the current limits on EC2 resources.
The Running On-Demand EC2 instances limit is described as:
The total number of running On-Demand instances that you can have in this region. Some instance types have different limits for this region that count against your total limit; these are listed below. Check the Current Limit column to find out how many instances per instance type you can run.
The limits can be increased by clicking the Request limit increase link and providing a use-case. AWS Customer Service then evaluates the case and adjusts the limits accordingly.
In some situations, customers request a decrease of their limit to avoid accidentally using some of the more-expensive instance types, or to prevent instances from running in unwanted Regions.
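If you would rather read the overall instance limit from a script than from the console, one possibility is the EC2 DescribeAccountAttributes call with the "max-instances" attribute. The sketch below assumes boto3 is installed and credentials are configured for the live call; the sample response is made up, and the attribute may not reflect per-instance-type limits or newer quota schemes.

```python
def parse_max_instances(response):
    """Extract the "max-instances" value from a DescribeAccountAttributes response."""
    for attribute in response.get("AccountAttributes", []):
        if attribute.get("AttributeName") == "max-instances":
            values = attribute.get("AttributeValues", [])
            if values:
                return int(values[0]["AttributeValue"])
    return None


def fetch_max_instances(region_name):
    """Query the live limit for one region (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the parser above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_account_attributes(AttributeNames=["max-instances"])
    return parse_max_instances(response)


# Made-up response, for illustration only.
sample_response = {
    "AccountAttributes": [
        {"AttributeName": "max-instances",
         "AttributeValues": [{"AttributeValue": "20"}]}
    ]
}
print(parse_max_instances(sample_response))
```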