I have a service running on AWS EC2 Container Service (ECS). My setup is a relatively simple one. It operates with a single task definition and the following details:
Desired capacity set at 2
Minimum healthy set at 50%
Maximum available set at 200%
Tasks run with 80% CPU and memory reservations
Initially, I am able to get the necessary EC2 instances registered to the cluster that holds the service without a problem. The associated task then starts running on the two instances. As expected – given the CPU and memory reservations – the tasks take up almost the entirety of the EC2 instances' resources.
Sometimes, I want the task to use a new version of the application it is running. To make this happen, I create a revision of the task definition, de-register the previous revision, and then update the service. Note that the minimum healthy percentage of 50% requires 2 * 0.50 = 1 task running at all times, and the maximum percent of 200% permits up to 2 * 2.00 = 4 tasks running.
Accordingly, I expected 1 of the tasks running the de-registered revision to be drained and taken offline so that 1 task of the new revision could be brought online. Then the process would repeat itself, bringing the deployment to a successful state.
Unfortunately, the cluster does nothing. In the events log, it tells me that it cannot place the new tasks, even though the process I have described above would permit it to do so.
How can I get the cluster to perform the behavior that I am expecting? I have only been able to get it to do so when I manually register another EC2 instance to the cluster and then tear it down after the update is complete (which is not desirable).
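For reference, a rough sketch of the update step I am describing, using boto3 with placeholder cluster, service, and task definition names:

    import boto3

    ecs = boto3.client("ecs")

    ecs.update_service(
        cluster="my-cluster",                 # placeholder cluster name
        service="my-service",                 # placeholder service name
        taskDefinition="my-task:2",           # new revision of the task definition
        deploymentConfiguration={
            "minimumHealthyPercent": 50,      # allow 2 * 0.50 = 1 task during the rollout
            "maximumPercent": 200,            # allow up to 2 * 2.00 = 4 tasks
        },
    )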
I have faced the same issue, where tasks would get stuck because there was no space to place them. The following snippet from the AWS documentation on updating a service helped me decide what to do.
If your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.
In other words, the cluster needs spare resources (or spare container instances) available so that the new tasks can start and the older ones can then drain.
These are the things I do:
Before doing a service update, add roughly 20% capacity to your cluster. You can do this from the ASG (Auto Scaling group) command line or API by increasing the desired capacity by about 20% (see the sketch below). This way you will have some additional instances during the deployment.
Once you have the extra instances, the new tasks will start spinning up quickly and the older ones will start draining.
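A rough sketch of that capacity bump with boto3 (the ASG name is a placeholder; adjust the factor to your needs):

    import math
    import boto3

    autoscaling = boto3.client("autoscaling")

    asg = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=["my-ecs-asg"]          # placeholder ASG name
    )["AutoScalingGroups"][0]

    current = asg["DesiredCapacity"]
    # add ~20% headroom for the deployment, without exceeding the ASG max size
    bumped = min(math.ceil(current * 1.2), asg["MaxSize"])

    autoscaling.set_desired_capacity(
        AutoScalingGroupName="my-ecs-asg",
        DesiredCapacity=bumped,
        HonorCooldown=False,
    )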
But does this mean I will have extra container instances?
Yes, during the deployment you will add some instances, and once the older tasks have drained those extra instances will hang around. The way to remove them is:
Create a MemoryReservationLow alarm (a ~70% threshold in your case) with a long evaluation period, something like 25 minutes, to be sure the cluster really is over-provisioned. Once those extra servers are no longer being used, the cluster's memory reservation drops below the threshold and they can be removed.
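A sketch of what that alarm could look like with boto3; the cluster name and the scale-in policy ARN it triggers are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="MemoryReservationLow",
        Namespace="AWS/ECS",
        MetricName="MemoryReservation",
        Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],  # placeholder
        Statistic="Average",
        Period=300,                      # 5-minute datapoints
        EvaluationPeriods=5,             # 5 x 5 minutes = 25 minutes
        Threshold=70.0,
        ComparisonOperator="LessThanThreshold",
        AlarmActions=[
            "arn:aws:autoscaling:...:scalingPolicy:...",  # placeholder scale-in policy ARN
        ],
    )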
I have seen this before. If your port mapping is attempting to map a static host port to the container within the task, you need more cluster instances.
Also this could be because there is not enough available memory to meet the memory (soft or hard) limit requested by the container within the task.
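If static host ports are the cause, switching to dynamic host-port mapping (hostPort 0, typically paired with an ALB target group) avoids the conflict. A minimal sketch with placeholder names:

    import boto3

    ecs = boto3.client("ecs")

    ecs.register_task_definition(
        family="my-app",                          # placeholder family name
        containerDefinitions=[
            {
                "name": "web",
                "image": "my-app:latest",         # placeholder image
                "memoryReservation": 256,
                "portMappings": [
                    # hostPort 0 lets ECS pick an ephemeral host port, so several
                    # copies of the task can run on the same instance
                    {"containerPort": 8080, "hostPort": 0, "protocol": "tcp"},
                ],
            }
        ],
    )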
Related
I have a cloud run service (receiving no traffic) where I set the max instances and min instances to 1 so that it's always running.
When I deploy a new revision, the instance count jumps to 3. This is a problem (I make some requests on instance start that hit a 429 if two instances are making those requests simultaneously).
Why is CloudRun instance count going over my max?
I can confirm my settings are correct and looking at the logs there are two new instances that start up.
PS: Cloud Run does show this message, which makes me think what I'm trying to do isn't possible. I just figured the consequence would be downtime rather than extra instances.
Revisions using a maximum number of instances of 3 or less might experience unexpected downtime.
Your scenario seems to fit one described in the documentation, in which a new deployment for a Cloud Run service might temporarily create additional instances:
When you deploy a new revision, Cloud Run gradually migrates traffic from the old revision to the new one. Because maximum instance limits are set for each revision, you may temporarily exceed the specified limit during the period after deployment.
Additional instances might be created to handle traffic migration from the previous revision to a new one. Cloud Run offers settings that can alter how migration occurs between revisions. One of these settings is used to instantly serve all new traffic on the new revision. You can test if using this setting helps reduce the number of instances that are created. I tested one of the provided sample services and created multiple revisions, which did not exceed 1 active instance.
I read that this is a Cloud Run problem: if you allow 10 instances, the error disappears.
As I understood it, it is a bug related to Go that does not handle only two instances correctly, but with 10 instances it should be fixed.
https://github.com/ahmetb/cloud-run-faq/issues/54
I've been given an AWS environment to look after. It runs ECS on EC2 instances and has scaling configured using the ECS MemoryReservation metric. The system was originally set up before cluster auto scaling was made generally available, so it just uses a CloudWatch metric to scale out and scale in. As far as I can work out, it follows a basic AWS design.
The EC2 has an autoscaling group and allows scale from 1 to 5 instances with 1 being the desired state.
There is 1 cluster service running with 6 tasks configured.
5 of those tasks are configured to run a maximum of 2 copies with a desired count of 1; the other has a maximum of 1.
The tasks have MemoryReservation (soft limit) figures configured but not Memory (hard limit).
The tasks are primarily running Java.
The highest memory reservation is set at about 200MB and most are around this figure.
The scale out rule is based on MemoryReservation at 85%.
Docker stats shows most of the tasks are running about 300MB and some exceed 600MB.
The instance size has 4GB of RAM.
If the maximum reservation is 2GB, even if the tasks are consuming more like 3GB in reality, am I right in believing that the scale out rule will NEVER be invoked because 2GB is 50% of available RAM? Do I need to increase the memory reservations to something more realistic?
Also, if it is only running a single EC2 instance, am I right in thinking that even if I increased the MemoryReservation figures to something more realistic, it won't spin up a second EC2 instance automatically just because there's no theoretical room to start another task? I've just picked this up from different articles I've been reading while searching.
Thanks
Even after the May 2022 update to Capacity Providers, they still have a gap to fill in memory-based scaling.
As per the OP "ECS Memory Reservation" seems not to even be an option any more (at least in the web console)
And when creating the Capacity Provider, only the target value is configurable.
There are more details on how this capacity is calculated in this blog, and while it mentions:
This calculation accounts for vCPU, memory, ENI, ports, and GPUs of the tasks and the instances
this does not help if you have tasks whose memory consumption does not necessarily grow, but a service with scheduled actions configured to scale tasks (e.g. a different minimum task count at different times of day).
That case will not trigger a scale-out, since the memory on the instances never gets used when the tasks simply do not fit due to their configuration, and you will see errors (in the service events) like:
service myservice was unable to place a task because no container instance met all of its requirements. The closest matching container-instance abc123xxxx has insufficient memory available.
This basically means a scheduled task-scaling change may not happen if the task's memory setting is just big enough that it doesn't fit on the running instances, and the CapacityProviderReservation metric does not change, because the calculation is only done for tasks in the Provisioned state, which never happens in this case.
Possible workarounds
Decrease the capacity target. This basically means keeping spare capacity: by default the target is 100 (%), so the provider tries to use the ASG cluster resources as fully as possible. A value below 100 makes the cluster scale out once it is used at that capacity, keeping a margin of spare resources at all times, so new scheduled tasks will fit as long as the spare is enough (e.g. compare the per-task memory reservation against the expected cluster memory reservation of all running tasks). See the sketch after this list.
Set up ASG scaling rules that match the service scaling rules.
While possible, this may be bound for problems with timing and auto scaling due to other triggers.
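A minimal sketch of the first workaround, assuming boto3 and a placeholder capacity provider name:

    import boto3

    ecs = boto3.client("ecs")

    ecs.update_capacity_provider(
        name="my-capacity-provider",           # placeholder capacity provider name
        autoScalingGroupProvider={
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": 80,          # keep ~20% of the cluster free as headroom
            },
        },
    )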
A few things:
Cluster auto scaling is usually just the term ECS uses for "an Auto Scaling group that launches instances into the cluster", and it sounds like that's what you are currently using. Capacity Providers are a newer feature where ECS more directly manages the ASG, which might be the newer feature you're thinking of.
'Desired Capacity' isn't a state that you set for where you want the group to be; it's the current amount of capacity that Auto Scaling wants there to be in the group. So if a scaling policy fires and says +1, the desired capacity will change to 2, and then Auto Scaling will try to launch an instance, since you presumably only had 1 before (because the desired was 1 before).
Memory reservation is based on what is reserved, so for scaling purposes it doesn't matter how much is actually in use. This is important because even if you had 6/8 GB reserved (from three 2 GB tasks) but 7.5 GB in use, ECS would still allow another task to be launched, since there are still 2 reservable GBs.
Because of 3), you should probably increase the reservation values; you wouldn't want an instance to get overloaded, and Java can be nasty about RAM issues. This would also help with your scale-out threshold issue.
For your second question, scaling will only happen after the CloudWatch alarm is triggered. So if the metric never goes above that threshold, the alarm can't trigger the scaling policy. There is a whole host of cases where scaling won't happen even though the alarm triggers (more of them for scaling in than scaling out, but it can still happen on scale-out too), but the alarm going into the ALARM state is definitely a required step.
I am running an ECS cluster that currently has 3 services on a T3 medium instance. Each of those services runs only one task, which has a soft memory limit of 1GB; the hard limit is different for each (but that should not be the problem). I will always have enough memory to run one newly deployed task (the new one will also take 1GB, and the T3 medium can handle it since it has 4GB total). After the new task is up and running, the old one is stopped and I again have 1GB free for the next deployment. I did something similar with the CPU (2048 CPU units, each task has 512, and 512 free for new deployments).
So everything runs fine now, but I am not completely satisfied with this setup for the future. What will happen if I need to add another service with another task? I would need to redeploy all existing tasks and modify their task definitions to use less CPU and memory in order to make room for the new task (and new deployments). I am planning to get a reserved EC2 instance, so it will not be easy to swap the current EC2 instance for a larger one.
Is there a way to spin up another EC2 instance in the same ECS cluster to handle bursts in my tasks? The same goes for deployments: it's not a perfect scenario to only be able to deploy one task at a time, waiting for the old one to be killed before deploying the next, just to avoid downtime.
And my biggest concern: what if I need a new service and task? I would again need to adjust all the others in order to run the new one and redeploy everything, which is not very maintainable. And what if I cannot lower CPU and memory any further because I have already reached the lowest point at which the tasks run smoothly?
I was thinking about having another EC2 instance in the same cluster to handle bursts, deployments, and new services/tasks, but I'm not sure if that's possible or if it's the best way of doing this. I was also thinking about Fargate, but it is much more expensive and I cannot afford it for now. What do you think? Any ideas, suggestions, and hints would be helpful, since I am desperate to find the best way to avoid the problems mentioned above.
Thanks in advance!
So unfortunately, there is no out-of-the-box solution to ensure that all your tasks run on the minimum possible number of instances (i.e. one). You can use our new feature called Capacity Providers (CP), which lets you keep just the minimum number of EC2 instances required to run all your tasks. The major difference between CP and a plain ASG policy is that CP gives more weight to task placement (whereas the ASG scales in/out based on resource utilization, which isn't ideal in your case).
However, it's not an ideal solution. Just as you said in your comment, when the service needs to scale out during a deployment, CP will spin up another instance, the new task will be placed on it and once it gets to Running state, the old task will be stopped.
But now you have an "extra" EC2 instance because there is no way to replace a running task. The only way I can think of would be to use a lambda function that drains the new instance, which will move all the service tasks to the other instance. CP will, after about 15 minutes, terminate this instance as there are no tasks are running on it.
A couple of caveats:
CPs are new, a little rough around the edges, and you can't delete/modify them. You can only create or deactivate them.
CP needs an underlying ASG and they must have a 1-1 relationship
Make sure to enable managed scaling when creating CP
Choose 100% capacity target
Don't forget to add a default capacity strategy for the cluster
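Putting those caveats together, a rough sketch of the setup with boto3 (the ASG ARN, capacity provider name, and cluster name are placeholders):

    import boto3

    ecs = boto3.client("ecs")

    ecs.create_capacity_provider(
        name="my-capacity-provider",
        autoScalingGroupProvider={
            "autoScalingGroupArn": "arn:aws:autoscaling:...:autoScalingGroup:...",  # placeholder, 1-1 with the CP
            "managedScaling": {
                "status": "ENABLED",       # enable managed scaling
                "targetCapacity": 100,     # 100% capacity target
            },
            "managedTerminationProtection": "DISABLED",
        },
    )

    # Don't forget the default capacity provider strategy for the cluster
    ecs.put_cluster_capacity_providers(
        cluster="my-cluster",
        capacityProviders=["my-capacity-provider"],
        defaultCapacityProviderStrategy=[
            {"capacityProvider": "my-capacity-provider", "weight": 1, "base": 0},
        ],
    )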
Minimizing EC2 instances used:
If you're using a capacity provider, the 'binpack' placement strategy minimises the number of EC2 hosts that are used.
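For reference, a minimal sketch of what that looks like on a service definition (boto3; cluster, service, and task definition names are placeholders):

    import boto3

    ecs = boto3.client("ecs")

    ecs.create_service(
        cluster="my-cluster",
        serviceName="my-service",
        taskDefinition="my-task:1",
        desiredCount=2,
        placementStrategy=[
            # pack tasks onto as few instances as possible, by remaining memory
            {"type": "binpack", "field": "memory"},
        ],
    )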
However, there are some scale-in scenarios where you can end up with a single task running on its own EC2 instance. As Ali mentions in their answer, ECS will not replace this running task, but depending on your setup, it may be fairly easy for you to replace it yourself by configuring your task to voluntarily 'quit'.
In my case; I always have at least 2 tasks running per service. So I just added some logic to my tasks' healthchecks, so they report as unhealthy after ~6 hours. ECS will spot the 'unhealthy' task, remove it from the load balancer, and spin up a replacement (according to the binpack strategy).
Note: If you take this approach, add some variation to your timeout so you're less likely to have all of your tasks expire at the same time. Something like: expiry = now + timedelta(hours=random.uniform(5.5, 6.5))
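A minimal sketch of that kind of healthcheck, assuming a Flask-style health endpoint (the framework and route are my assumptions, not part of the original setup):

    import random
    from datetime import datetime, timedelta

    from flask import Flask

    app = Flask(__name__)

    # pick a randomized expiry around ~6 hours so tasks don't all expire at once
    expiry = datetime.utcnow() + timedelta(hours=random.uniform(5.5, 6.5))

    @app.route("/health")
    def health():
        if datetime.utcnow() > expiry:
            # ECS/the load balancer sees the non-200 response, drains this task,
            # and the replacement is placed according to the binpack strategy
            return "expired", 503
        return "ok", 200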
Sharing memory 'headspace' with soft-limits:
If you set both soft and hard memory limits, ECS will place your tasks based on the soft limit. If your tasks' memory usage varies with load, it's fairly easy for your EC2 instance to start swapping.
For example: say you have a task defined with a soft limit of 900MB and a hard limit of 1800MB. You spin up a service with 4 running tasks, and ECS provisions all 4 of them on a single t3.medium. Notice that each task thinks it can safely use up to 1800MB, when in fact there's very little free memory on the host. When you hit your service with some traffic, each task tries to use some more memory, and your t3.medium is incapacitated as it starts swapping memory to disk. ECS does not recover from this type of failure very well: it notices that the tasks are no longer available and attempts to provision replacements, but the capacity provider is very slow to replace the swapping t3.medium.
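For clarity, here is roughly how that scenario looks in a task definition, where memoryReservation is the soft limit ECS places on and memory is the hard limit (boto3; names are placeholders):

    import boto3

    ecs = boto3.client("ecs")

    ecs.register_task_definition(
        family="my-app",                  # placeholder family name
        containerDefinitions=[
            {
                "name": "app",
                "image": "my-app:latest", # placeholder image
                "memoryReservation": 900, # soft limit: what ECS uses for placement
                "memory": 1800,           # hard limit: what the container may actually grow to
            }
        ],
    )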
My suggestion:
Configure your service to auto-scale based on memory usage (this will be a percentage of your soft-limit), for example: a target memory usage of 70%
Configure your tasks' healthchecks so that they report as unhealthy when they are nearing their soft-limit. This way, your tasks still have some headroom for quick spikes of memory usage, while giving your load balancer a chance to drain and gracefully replace tasks that are getting greedy. This is fairly easy to do by reading the value within /sys/fs/cgroup/memory/memory.usage_in_bytes.
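A minimal sketch of that memory check, assuming the cgroup v1 path mentioned above and that the soft limit is passed to the container as an environment variable (both assumptions on my part):

    import os

    # soft limit handed to the container, e.g. MEMORY_SOFT_LIMIT_MB=900
    SOFT_LIMIT_BYTES = int(os.environ.get("MEMORY_SOFT_LIMIT_MB", "900")) * 1024 * 1024
    UNHEALTHY_AT = 0.9  # report unhealthy when within ~10% of the soft limit

    def memory_healthy() -> bool:
        # current memory usage of this container's cgroup
        with open("/sys/fs/cgroup/memory/memory.usage_in_bytes") as f:
            usage = int(f.read().strip())
        return usage < SOFT_LIMIT_BYTES * UNHEALTHY_AT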
Currently, I don't have an AWS load balancer set up yet.
A request comes to a single EC2 instance: it first hits Nginx, which then forwards it to Node/Express.
Now I want to create an Auto Scaling group and attach an AWS load balancer to distribute the incoming requests. I am wondering if this is a good setup:
Request -> AWS Load Balancer -> Nginx A + EC2 A
                             -> Nginx B + EC2 B
                             -> ... C + ... C
Nginx is installed on the same EC2 instance that runs Node.js. The Nginx config has logic to detect the user's location using the geoip module, as well as gzip compression settings and SSL handling.
I will also move the SSL handling to the load balancer.
Ideally (if Nginx can be decoupled from specific Node tasks) you'd want an Auto Scaling group dedicated to each service, and I'd suggest using containerization for this because it is exactly what it's meant for, though all of this will obviously require some non-trivial changes to your program...
This will enable...
Efficient Resource Allocation
Select instance types with the ideal mix of CPU/RAM/Network/Storage per service (Node or Nginx)
Maintain granular control over the number of tasks running relative to their actual demand.
Intelligent Scaling
The thresholds that trigger scaling actions need to reflect the resources they're scaling. You may not want to, say, double your more compute-intensive Node capacity when there are spikes in simple read operations to your application. By segmenting the services by resources, the thresholds can be tied to the resource each service demands the most of. You may want to scale...
Nginx based on maximum inbound requests over a 1-minute period
Node based on average CPU utilization over a 5-minute period
The 'chunks' that your instances are broken into, relative to the size of your tasks, also make a big difference in how efficiently they will scale. Here's an exaggerated example, on just one service...
1 EC2 t3.large running 5 Node tasks @ 50% RAM utilization.
The AS group hits 70% (or whatever threshold you assigned) and scales out 1 instance.
2 instances now running, say, 6 Node tasks @ 30% RAM utilization.
This causes 2 problems...
You're now wasting a lot of money.
Possibly more importantly... what is your scale-in threshold? 20% utilization?
The tighter the gap between your upper and lower scaling bounds, the more efficient you'll be. When the tasks you're running are all homogeneous, you can add and remove resources in smaller and more precise 'chunks'.
In the scenario above you'd ideally want something like...
3 t3.small instances running 5 Node tasks
The AS group hits 70% utilization and scales out 1 instance.
Now you have 6 tasks on 4 instances at 50% utilization.
Utilization drops to 40%, and the group scales in 1 instance.
You can obviously still do all of this running Node and Nginx on the same underlying resources, but the mathematics of it all gets pretty crazy and makes your system brittle.
(I've simplified the above to target memory utilization on the AS group, but in practice you'd have ECS adding tasks based on utilization, which then adds to the memory reservation of the cluster, which would then initiate the AS actions.)
Simplified & Efficient Deployment
You don't want to be redeploying your whole Node code base for every update to Nginx configuration.
Simplifies Blue/Green deployments testing and rollbacks.
Minimize the resources you have to spin up for the 'Blue' portion of your deployments.
Use customized AMIs with pre-installed binaries, if needed, for only the service that depends on them.
Whether you want to do it immediately or not (and you will), this configuration will allow you to move to Spot Instances to handle more of your variable workloads. As with all of this, you can still use Spot Instances with the configuration you've laid out, but handling termination procedures efficiently and without disruption is a whole other mess, and when you get to that point you'll want the rest of this very organized and working smoothly.
I don't know what you're using for deployment, but AWS CodeDeploy works beautifully with ECS to manage your container clusters as well.
I've decided to start playing with the AWS ECS service, and I created a cluster and one service. My issue is that I want to connect it to an AWS Auto Scaling group.
I have followed the following guide.
The guide works; my issue is that it's a total waste of money.
The guide says I need to add a machine when the total amount of CPU units that my services reserve is above 75%, but in reality my services always reserve 100%, because I don't want to waste money. Also, it's pretty useless to put 3 Node.js tasks on a 2-CPU machine, and there is no hard limit anyway.
I have been breaking my head over this for a few days now and have no idea how to make the two work together properly.
EDIT:
Currently this is what happens:
CPU gets above 75%; service scaling creates 2 new tasks on the same server, which means I now have 1 instance with 4 tasks.
Instance reservation is now 100%; the Auto Scaling group creates a new instance.
Once the new instance is created, service scaling removes 2 tasks from the old instance and adds 2 new tasks to the new instance.
Is it just me, or does this whole process look like a waste of time? Is this how it really should be, or have I (probably) done something wrong?
I think you are missing a few insights.
For ECS autoscaling to work properly, you also have to set up scaling at the ECS service level (a sketch of such a policy follows the flow below).
Then, the scaling flow would look like this:
ECS Service is reaching 100% used CPU and has 100% reserved CPU
ECS Service scales by starting an additional task, making the total reserved CPU 200%
The Auto Scaling group sees there is more reserved capacity than available capacity and launches a new machine.
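A minimal sketch of that service-level piece, assuming boto3, Application Auto Scaling, and placeholder cluster/service names:

    import boto3

    aas = boto3.client("application-autoscaling")

    # make the service's desired count scalable
    aas.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",   # placeholder
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=2,
        MaxCapacity=10,
    )

    # target-track CPU so the service adds tasks, which raises the cluster's
    # reserved CPU and in turn lets the ASG scale out
    aas.put_scaling_policy(
        PolicyName="cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 75.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    )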
In addition, you can perfectly well run multiple Node.js tasks on a 2-CPU machine. Especially in a microservice environment, these Node.js services can be quite small (128 CPU units, for example) and still run perfectly fine together on the same host machine.
Eventually, I figured, what I want to do is not possible.
It is not possible to create a resource-optimized cluster with ECS like in Kubernetes.
(unless of course if you write some magic with lambda)
Service auto-scaling and Auto Scaling groups don't work together. You can, however, make it work perfectly with Fargate, but that's expensive. The main issue is that there is no trigger for cluster reservation going above 100%.