I wonder if it would be possible to temporarily stop the worker VM instances so they are not running at night when I am not working on cluster development. So far the only way I am aware of to "stop" the instances from running is to delete the cluster itself, which I don't want to do. Any suggestions are highly appreciated.
P.S. Edited later
The cluster was created following steps outlined in this guide.
I'm just learning myself, but this might help. If you have eksctl installed, you can use it from the command line to scale your cluster. I scale mine down to the minimum size when I'm not using it:
eksctl get cluster
eksctl get nodegroup --cluster CLUSTERNAME
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes NEWSIZE
To completely scale the nodes down to zero, use this (max=0 threw errors):
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 0 --nodes-max 1 --nodes-min 0
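To bring the workers back later, the same command can be reused with non-zero sizes; a small sketch, with example sizes of my own:
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 2 --nodes-min 1 --nodes-max 2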
Go to the EC2 dashboard for your node group's instances, click on Auto Scaling Groups in the navigation panel, select your group by ticking its checkbox, click the Edit button, and change the Desired, Min, and Max capacities to 0.
Edit the autoscaling group and set the instances to 0.
This will shut down all worker nodes.
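The same change can be made from the CLI; a small sketch, assuming you substitute your own Auto Scaling group name:
aws autoscaling update-auto-scaling-group --auto-scaling-group-name <your-asg-name> --min-size 0 --max-size 0 --desired-capacity 0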
Now you can use AWS Automation to schedule a recurring action, through automation documents, that stops and starts the nodes at given times.
You can't stop the master nodes as they are managed by AWS.
Take a look at kube-downscaler, which can be deployed to the cluster to scale deployments in and out based on the time of day.
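If you go that route, kube-downscaler is driven by annotations on the workloads; a minimal sketch, assuming its downscaler/uptime annotation and a deployment name of my own:
kubectl annotate deployment my-app 'downscaler/uptime=Mon-Fri 09:00-19:00 UTC'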
More cost reduction techniques in this blog.
Background: I'm running docker-compose ecs locally and need to ensure I use Spot instances due to my hobbyist budget.
Question: How do I determine and guarantee that instances are running as Fargate Spot instances?
Evidence:
I have set up the default capacity provider strategy as FARGATE_SPOT
I have both the default-created capacity providers 'FARGATE' and 'FARGATE_SPOT'
(screenshots: capacity providers, default strategy)
You can see this in the web console when you view a specific task:
To find this page, open ECS, click on your cluster, then go to the "Tasks" tab and click on the task id.
You can also see this through the aws cli:
aws ecs describe-tasks --cluster <your cluster name> --tasks <your task id> | grep capacityProviderName
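If you prefer not to grep, the same field can be pulled out with a JMESPath query; a small sketch, assuming your own cluster name and task id:
aws ecs describe-tasks --cluster <your cluster name> --tasks <your task id> --query 'tasks[].capacityProviderName' --output text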
I use eksctl to create an EKS cluster on AWS.
After creating a YAML configuration file that defines the EKS cluster, following the docs, I run the command eksctl create cluster -f k8s-dev/k8s-dev.yaml to execute the create-cluster action, and the log shows some lines below:
2021-12-15 16:23:55 [ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
2021-12-15 16:23:55 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
What is the difference between a nodegroup and a managed nodegroup?
I have read the official AWS docs about managed nodegroups, but I still can't tell exactly when to choose a nodegroup versus a managed nodegroup.
What would you use when you need to create an EKS cluster?
eksctl only provides the option to choose nodeGroups or managedNodeGroups (docs: https://eksctl.io/usage/container-runtime/#managed-nodes) but does not describe the difference. I think the following document will give you the information you need.
It describes the different features of EKS managed node groups, self-managed nodes, and AWS Fargate:
https://docs.aws.amazon.com/eks/latest/userguide/eks-compute.html
It depends on what you want to use the cluster for, so choose the one that matches your purpose; if I were you, I would choose a managed nodegroup.
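For a concrete picture of how the two are declared, here is a minimal sketch of an eksctl config file written and applied from the shell (the cluster name, region, and sizes are illustrative placeholders, not taken from k8s-dev.yaml):
# Write a minimal cluster config that uses a managed nodegroup; swap the
# "managedNodeGroups" key for "nodeGroups" to get a self-managed nodegroup instead.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster      # placeholder name
  region: us-west-2          # placeholder region
managedNodeGroups:
  - name: mng-1
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 3
EOF
eksctl create cluster -f cluster.yaml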
Total noob and have a runaway EKS cluster adding up $$ on AWS.
I'm having a tough time scaling down my cluster and not sure what to do. I'm following the recommendations here: How to stop AWS EKS Worker Instances (reference below).
If I run:
"eksctl get cluster", I get the following:
NAME                        REGION      EKSCTL CREATED
my-cluster                  us-west-2   True
unique-outfit-1636757727    us-west-2   True
I then try the next line "eksctl get nodegroup --cluster my-cluster" and get:
2021-11-15 15:31:14 [ℹ] eksctl version 0.73.0
2021-11-15 15:31:14 [ℹ] using region us-west-2
Error: No nodegroups found
I'm desperate to scale down the cluster, but I'm stuck at the above command.
Everything seems to be installed and running as intended, but the management part is failing! What am I doing wrong? Thanks in advance!
Reference --
eksctl get cluster
eksctl get nodegroup --cluster CLUSTERNAME
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes NEWSIZE
To completely scale down the nodes to zero use this (max=0 threw errors):
eksctl scale nodegroup --cluster CLUSTERNAME --name NODEGROUPNAME --nodes 0 --nodes-max 1 --nodes-min 0
You don't have a managed node group, therefore eksctl does not return any nodegroup results. The same applies to the aws eks CLI.
...scaling down my cluster...
You can log on to the console, go to EC2 -> Auto Scaling Groups, locate the launch template, and scale by updating the "Group details". Depending on how your cluster was created, you can look for the launch template tag kubernetes.io/cluster/<your cluster name> to find the correct template.
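If you prefer the CLI, the same lookup and scale-down can be scripted; a rough sketch, assuming the cluster tag above and placeholder names of my own:
# Find the Auto Scaling group carrying the cluster tag
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[?Tags[?Key=='kubernetes.io/cluster/my-cluster']].AutoScalingGroupName" --output text
# Scale that group down to zero
aws autoscaling update-auto-scaling-group --auto-scaling-group-name <asg-name-from-above> --min-size 0 --max-size 0 --desired-capacity 0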
Haven't been able to find this in the docs. Can I just pause an ECS service so it stops creating new tasks? Or do I have to delete it to stop that behavior?
I just want to temporarily suspend it from creating new tasks on the cluster.
It is enough to set the desired number of tasks for a service to 0.
ECS will automatically remove all running tasks.
aws ecs update-service --desired-count 0 --cluster "ecs-my-ClusterName" --service "service-my-ServiceName-117U7OHVC5NJP"
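To resume later, the same command works in reverse; a small sketch with a placeholder count:
aws ecs update-service --desired-count 1 --cluster "ecs-my-ClusterName" --service "service-my-ServiceName-117U7OHVC5NJP"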
You can accomplish a "pause" by adjusting your service configuration to match your current number of running tasks. For example, if you currently have 3 running tasks in your service, you'd configure the service as below:
This tells the service:
The number of tasks I want is [current-count]
I want you to maintain at least [current-count]
I don't want more than [current-count]
These combined effectively halt your service from making any changes.
The accepted answer is incorrect.
If you set both "Minimum healthy percent" and "Maximum healthy percent" to 100, AWS will give you an error similar to the following:
To stop a service from creating new tasks, you have to update the service and set the desired number of tasks to 0. After that you can use the AWS CLI (fastest option) to stop the existing running tasks, for example:
aws ecs list-services --cluster "ecs-my-ClusterName"
aws ecs list-tasks --cluster "ecs-my-ClusterName" --service "service-my-ServiceName-117U7OHVC5NJP"
After that you will get the list of the running tasks for the service, such as:
{
    "taskArns": [
        "arn:aws:ecs:us-east-1:XXXXXXXXXXX:task/12e13d93-1e75-4088-a7ab-08546d69dc2c",
        "arn:aws:ecs:us-east-1:XXXXXXXXXXX:task/35ed484a-cc8f-4b5f-8400-71e40a185806"
    ]
}
Finally use below to stop each task:
aws ecs stop-task --cluster "ecs-my-ClusterName" --task 12e13d93-1e75-4088-a7ab-08546d69dc2c
aws ecs stop-task --cluster "ecs-my-ClusterName" --task 35ed484a-cc8f-4b5f-8400-71e40a185806
UPDATE: By setting the desired number of running tasks to 0, ECS will stop and drain all running tasks in that service. There is no need to stop them individually afterwards using CLI commands originally posted above.
In Jenkins there are two similar plugins available:
Both are linked to the same Jenkins wiki page
I haven't found any documentation for the scalable version of the plugin and I have the following question:
Is it possible to scale the ECS instances in the cluster from 0 (none) to 1 using this plugin?
I want to have active ECS instances only when there are jobs to be done.
I would appreciate any help.
Try uninstalling these plugins and compiling the ecs-slave plugin manually from the autoscaling branch: https://github.com/cbamelis/amazon-ecs-plugin
I found a workaround to scale out the number of ECS instances from zero.
I created a new job with the following shell code:
# Count the container instances registered in the cluster; grep -c exits with a
# non-zero status when the count is 0, so "|| true" keeps the job from failing.
result=$(aws ecs list-container-instances --cluster ${cluster-name} | grep -c arn:aws:ecs:${aws-region}) || true
if [ "$result" = '0' ]
then
    # No container instances yet: ask the Auto Scaling group for one instance.
    aws autoscaling set-desired-capacity --auto-scaling-group-name ${asg-name} --desired-capacity 1
else
    echo "Container already exists"
fi
Replace the variables ${cluster-name}, ${aws-region}, and ${asg-name} with actual values.
This job increases the number of ECS VMs to 1 if it was 0.
Scaling in can be done using a CloudWatch alarm and an Auto Scaling policy.
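As a rough sketch of that scale-in side (the policy name, alarm name, metric choice, and thresholds below are my own assumptions, not part of the original setup):
# 1. Scaling policy that removes one instance from the Auto Scaling group.
aws autoscaling put-scaling-policy --auto-scaling-group-name ${asg-name} --policy-name scale-in-when-idle --adjustment-type ChangeInCapacity --scaling-adjustment -1
# 2. CloudWatch alarm that fires the policy when the cluster's CPU reservation stays near zero.
aws cloudwatch put-metric-alarm --alarm-name ecs-cluster-idle --namespace AWS/ECS --metric-name CPUReservation --dimensions Name=ClusterName,Value=${cluster-name} --statistic Average --period 300 --evaluation-periods 6 --threshold 1 --comparison-operator LessThanThreshold --alarm-actions <policy-arn-returned-by-step-1>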