Jenkins AWS Spot fleet plugin doesn't automatically scale spot instances - amazon-web-services

I planned to use an EC2 Spot instance/fleet as our Jenkins slave solution, based on this article: https://jenkins.io/blog/2016/06/10/save-costs-with-ec2-spot-fleet/.
EXPECTED
If the Spot instance nodes remain free for the specified idle time (I have configured 5 minutes), then Jenkins releases the nodes and my Spot fleet is automatically scaled down.
ACTUAL
My Spot instances have been running for days. I also noticed that when I have more pending jobs, Jenkins does not automatically scale my Spot fleet up to add more nodes.
Is automatic scale up/down supposed to be triggered by an AWS service, or by the Jenkins plugin?
CONFIGURATION
Jenkins version : 2.121.2-1.1
EC2 Fleet Jenkins Plugin version : 1.1.7
Spot instance configuration :
Request type : request & maintain
Target Capacity : 1
Spot fleet plugin configuration :
Max Idle Minutes Before Scaledown : 5
Minimum Cluster Size : 0
Maximum Cluster Size : 3
Any help or lead would be really appreciated.

I had the same issue. By looking in Jenkins' logs I saw that it tried to terminate the instances but was refused by AWS.
So I checked in AWS CloudTrail for all the actions Jenkins attempted that resulted in an error.
In order for the plugin to scale your Spot Fleet, check that the IAM credentials used by your EC2 Spot Fleet plugin have the following permissions with the right conditions:
ec2:TerminateInstances
ec2:ModifySpotFleetRequest
In my case, the condition in the policy was malformed, so AWS denied the calls.
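As a sketch, a minimal IAM policy statement granting those two actions might look like the following. The wide-open "Resource": "*" is only for illustration; in practice you would scope it down, and a malformed Condition added during that scoping is exactly what broke my setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowJenkinsFleetScaling",
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances",
        "ec2:ModifySpotFleetRequest"
      ],
      "Resource": "*"
    }
  ]
}
```

If you add a Condition block, it is worth validating it with the IAM policy simulator before relying on it, since a condition that never matches silently denies the plugin's calls.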

Related

Maintaining desired instances when using AWS CodeDeploy with ASG as a capacity provider

I have an application running in ECS with an ASG as the capacity provider, using CodeDeploy for rolling out new images.
The scaling policy configured on the ECS service triggers autoscaling based on CPU/memory metrics through CloudWatch alarms (target tracking policy).
When I trigger a blue/green rollout in CodeDeploy, the ASG needs roughly double the instances in order to accommodate the replacement version before traffic is routed to it.
However, at this point the ASG won't trigger autoscaling, and hence CodeDeploy will not get enough instances ready to start the replacement version.
I think there are a few ways to achieve this (although I have not tried them), but I am looking for a simpler, out-of-the-box solution where I need not maintain a lot of configuration.

ECS/EC2 ASG not scaling out for new service tasks

I'm relatively new to ECS. For the most part it's been fine, but lately I'm facing an issue that I can't seem to find an intuitive solution for.
I'm running an ECS cluster with an EC2 capacity provider. The EC2 AutoScalingGroup has min_capacity: 1 & max_capacity: 5.
Each ECS service task has auto scaling enabled based upon CPU/Memory utilisation.
The issue I'm seeing is that when new Tasks are being deployed as part of our CI/CD, ECS returns "unable to place a task because no container instance met all of its requirements".
I'm wondering how I get ECS to trigger a scale-out event for the ASG when this happens. Do I need a particular scaling policy for the ASG? (I feel it's related to this.)
I attempted to set up an EventBridge/CloudWatch alarm to trigger a scale-out event whenever this happens, but that seems hacky. It worked, not ideally, but it worked. Surely there is a nicer/simpler way of doing this?
Any advice or points from experience would be greatly appreciated!
(PS - let me know if you need any more information/screenshots/code examples etc.)

What AWS service limit prevents a large fleet of instances from launching?

I'm trying to launch a fleet of 700 r4.16xlarge instances via spot request.
I used cfncluster to launch a fleet with initial_queue_size = 10 and max_queue_size = 700. However, this fleet scaled up and maxed out at 50 instances (and only $120/hr :P). There are many hundreds of tasks queued up in squeue, but something is preventing more instances from being launched.
After I realized this, I attempted to create another fleet of the same instance type in the same region and received the following error message:
- AWS::AutoScaling::AutoScalingGroup ComputeFleet Received 0 SUCCESS signal(s) out of 10. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement
I do not know what service limit I am maxing out, as my limit for on-demand r4.16xlarge is 20. Is there a separate limit for Spot instances, distinct from the on-demand limit?
I checked the EC2 limits as well as the Trusted Advisor service limits (linked below), and nothing seems to be maxed out.
https://console.aws.amazon.com/trustedadvisor/home?#/category/service-limits
Any help is much appreciated!

Correct way to scale AWS ECS

I'm currently architecting AWS ECS infrastructure.
For automatic scale in/out, I used Auto Scaling.
My system is running on AWS ECS (deployed with docker-compose).
Assume that we have 1 cluster and 1 service with 2 EC2 instances.
I defined a scaling policy via CloudWatch that fires when CPU utilization goes above 50%.
For autoscaling, we have to apply our policy to both the ECS service and the Auto Scaling group.
When the CloudWatch policy is attached to the ECS service, it automatically increases the running task count when CPU utilization goes above 50%.
When the CloudWatch policy is attached to the Auto Scaling group, it automatically increases the EC2 instance count when CPU utilization goes above 50%.
After testing it, everything works fine.
But in my service event logs, errors like this appear:
service v1 was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 8bdf994d-9f73-42ec-8299-04b0c5e7fdd3 has insufficient memory available.
I think it occurred because the service scaling starts before the EC2 instance scaling. (Service scaling, i.e. scaling the task count in/out, needs EC2 instances to run on.)
But it works in the end; maybe it retries automatically several times. (I'm not sure.)
I wonder: is this a normal configuration for AWS ECS autoscaling?
Or is there any point missing in my flow?
Thanks.
ECS can only schedule a service task if a container instance is available that matches the container's CPU/memory requirements. Ensure you have this spare capacity available to guarantee smooth auto-scaling.
The EC2 ASG scaling should happen before service auto-scaling, so that a container instance is available for the task scheduler.
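One common way to make the ASG scale ahead of the service is to track the cluster's reservation rather than its utilization, since CPUReservation/MemoryReservation rise as soon as tasks are scheduled, before the instances are saturated. A sketch of a target-tracking configuration for the ASG keyed on the AWS/ECS CPUReservation metric might look like this ("my-cluster" and the 75% target are placeholder values you would replace with your own):

```json
{
  "TargetValue": 75.0,
  "CustomizedMetricSpecification": {
    "MetricName": "CPUReservation",
    "Namespace": "AWS/ECS",
    "Dimensions": [
      { "Name": "ClusterName", "Value": "my-cluster" }
    ],
    "Statistic": "Average",
    "Unit": "Percent"
  }
}
```

Attached to the Auto Scaling group (e.g. via put-scaling-policy with policy type TargetTrackingScaling), this adds instances when reserved capacity climbs toward the target, leaving headroom for the next task placement.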

How to restrict a Jenkins job on EC2 spot fleet?

I have installed the EC2 Spot Fleet plugin in Jenkins to use EC2 machines as slaves.
I want only a particular job to be executed on this EC2 fleet.
However, even though I restricted that job to a label where it should run, every label is now being served by this Amazon EC2 fleet.
Only that particular job should run on this EC2 fleet, since the instances in this fleet are configured to run only this job and not the other jobs.
(Screenshots: before creating the spot fleet, creating the spot fleet, after adding the spot fleet.)
So, now every label is using this spot fleet to serve the requests assigned to it. However, this spot fleet can run only a particular job.
How can this be solved so that only a particular job runs on this spot fleet?
Starting from version 1.5.0, the EC2 Fleet Jenkins Plugin has the property:
Only build jobs with label expressions matching this node
When checked, only properly labeled jobs will be executed on the plugin's nodes.