Confusion about instances used inside a Amazon Ec2 Container Service - amazon-web-services

When a Ec2 Container Engine cluster is created, it creates a Compute Engine managed instance group to manage the created instances. These instances are from Ec2 service, which means, they are Virtual machines.
But we know that containers represent a new way to deploy containers based on operating-system-level virtualization rather than hardware virtualization
like VMs that are heavyweight and non-portable, isn't a contradiction? correct me if I'm wrong.
We use containers because they are extremely fast (either in boot time or tasks execution) compared to VMs, and they save a lot of space storage. So if we have one node(vm) that can supports 4 containers max, our clients can rapidly lunch 4 containers, but beyond this number, Ec2 autoscaler will need to lunch a new node(vm) to support upcoming containers, which incurs some tasks delay.
Is it impossible to launch containers over physical machines?
And what do you recommend for running critical time execution tasks?

I believe you are working under an erroneous assumption that ECS scales the virtual machines ("container instances" -- the instances where containers will run) directly with task demand.
If that were true, you would have a point, because the cluster would be sluggish and unresponsive any time insufficient container instance resources were not immediately available.
ECS doesn't do that, the presence of the Auto Scaling Group notwithstanding.
Depending on the Amazon EC2 instance types you use in your clusters, and quantity of container instances you have in a cluster, your tasks have a limited amount of resources that they can use when they are run. ECS monitors the resources available in the cluster to work with the schedulers to place tasks. If your cluster runs low on any of these resources, such as memory, you will eventually be unable to launch more tasks until you add more container instances, reduce the number of desired tasks in a service, or stop some of the running tasks in your cluster to free up the constrained resource. (emphasis added)
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_alarm_autoscaling.html
So, no... it doesn't launch the new tasks slowly when you are out of capacity. It doesn't launch them at all.
But don't get ahead of me.
The link above explains, with examples, how scaling of the virtual machines (container instances) is designed to actually work.
Of course, you don't have to make them adaptively scalable at all. You can go with your physical server model (note: I say physical server model -- meaning a fixed, inelastic pool of resources, on always-running virtual machines, since virtual machines is what EC2 provides), and just choose how many instances you wait to have running at all times, essentially emulating physical servers. If you wanted, say, 8 container instances, the "auto scaling group" would maintain exactly 8 at all times, creating replacements if, say, one of them experienced a hardware failure. That "auto" accomplishment would be maintaining the status quo. And, of course, in this configuration, you could manually reconfigure from 8 to, say, 12 and the "auto" accomplishment would be that you'd automatically get 4 new ones to add to the existing 8.
But the idea of how the service is ideally used is that your group of virtual machines scales up and down by rules you devise, to anticipate the resources needed by future tasks -- or a future lack of tasks.
In the example given, memory reservation is the trigger:
When the memory reservation of your cluster rises above 75% (meaning that only 25% of the memory in your cluster is available to for new tasks to reserve), the alarm triggers the Auto Scaling group to add another instance and provide more resources for your tasks and services.
It triggers the addition of more container instances so that you always have whatever you have determined to be the appropriate threshold of surplus capacity already online by the time you need it.
Of course, memory is just one resource, and 75% is just an arbitrary threshold chosen for the example.
Auto Scaling Groups can scale on a variety of triggers -- the phrase of the moon, the price trends in the stock market, whatever is appropriate to anticipating your desired amount of surplus capacity and can be quantified and monitored can be used... but this service does not scale itself directly by the actual attempt to launch a new task when the task can't be launched due to insufficient resources.
Herein lies the flaw in your original argument.
Why virtual machines? Simply enough, because when you destroy a virtual machine because the capacity is not expected to be needed, you stop paying for it.
In this light, perhaps you'll agree that this is not a weakness, it's a strength. Physical servers never stop costing you when you are not using them.
You don't need to pay anything at all for capacity you will not be needing with VMs -- you only have to pay for the capacity you're using plus the amount you need to keep immediately available to handle anticipated demand.
You can have as much idle surplus immediately ready as you are willing to pay for, or you can maximize savings by allowing as little surplus capacity as you are comfortable with being able to access without delay.

Related

ECS clarify on resources

I'm having trouble understanding the config definitions of a task.
I want to understand the resources. There are a few options (if we talk only about memory):
memory
containerDefinitions.memory
containerDefinitions.memoryReservation
There are a few things I'm not sure about.
First of all, the docs say that when the hard limit is exceeded, the container will stop running. Isn't the goal of a container orchestration service to keep the service alive?
Root level memory must be greater than all containers memory. In theory I would imagine once there aren't enough containers deployed, new containers are created for the image. I wouldn't like to use more resources than I need, but if I reserve the memory on root level, first, I do reserve much more than needed, and second, if my application receives a huge load, the whole cluster will shut down if the memory limit is exceeded or what?
I want to implement a system that auto-scales, and I would imagine that this way I don't have to define resources allocated, it just uses the amount needed, and deploys/kills new containers if the load increases/decreases.
For me there are a lot of confusion around ECS, and Fargate, and how it works, how it scales, and the more I read about it, the more confusing it gets.
I would like to set the minimum amount of resources per container, at how much load to create a new container, and at how much load to kill one (because it's not needed anymore).
P.S. not experienced in devops in general, I used kubernetes at my company, and there are things I'm not clear about, just learning this ECS world.
First of all, the docs say that when the hard limit is exceeded, the container will stop running. Isn't the goal of a container orchestration service to keep the service alive?
I would say the goal of a container orchestration service is to deploy your containers, and restart them if they fail for some reason. A container orchestration service can't magically add RAM to a server as needed.
I want to implement a system that auto-scales, and I would imagine that this way I don't have to define resources allocated, it just uses the amount needed, and deploys/kills new containers if the load increases/decreases.
No, you always have to define the amount of RAM and CPU that you want to reserve for each of your Fargate tasks. Amazon charges you by the amount of RAM and CPU you reserve for your Fargate tasks, regardless of what your application actually uses, because Amazon is having to allocate physical hardware resources to your ECS Fargate task to ensure that much RAM and CPU are always available to your task.
Amazon can't add extra RAM or CPU to a running Fargate task just because it suddenly needs more. There will be other processes, of other AWS customers, running on the same physical server, and there is no guarantee that extra RAM or CPU are available on that server when you need it. That is why you have to allocate/reserve all the CPU and RAM resources your task will need at the time it is deployed.
You can configure autoscaling to trigger on the amount of RAM your tasks are using, to start more instances of your task, thus spreading the load across more tasks which should hopefully reduce the amount of RAM being used by each of your individual tasks. You have to realize each of those new Fargate task instances created by autoscaling are spinning up on different physical servers, and each one is reserving a specific amount of RAM on the server they are on.
I would like to set the minimum amount of resources per container, at how much load to create a new container, and at how much load to kill one (because it's not needed anymore).
You need to allocate the maximum amount of resources all the containers in your task will need, not the minimum. Because more physical resources can't be allocated to a single task at run time.
You would configure autoscaling with the target value, of for example 60% RAM usage, and it would automatically add more task instances if the average of the current instances exceeds 60%, and automatically start removing instances if the average of the current instances is well below 60%.

Is there a way to specify a task placement strategy such that a new task is placed on the instance having most available memory?

I have multiple EC2 instances reserved in the ECS cluster. Currently, the tasks are placed by spread. This sometimes causes an issue when certain instances have very little memory remaining, while the other instances have a lot of it.
Is there a way to create a placement strategy to place new tasks on the instance with the most available memory?
According to the documentation you can give the scheduler three kinds of strategies:
Amazon ECS supports the following task placement strategies:
binpack
Tasks are placed on container instances so as to leave the
least amount of unused CPU or memory. This strategy minimizes the
number of container instances in use.
When this strategy is used and a scale-in action is taken, Amazon ECS
will terminate tasks based on the amount of resources that will be
left on the container instance after the task is terminated. The
container instance that will have the most available resources left
after task termination will have that task terminated.
random Tasks are placed randomly.
spread Tasks are placed evenly based on the specified value. Accepted
values are instanceId (or host, which has the same effect), or any
platform or custom attribute that is applied to a container instance,
such as attribute:ecs.availability-zone. Service tasks are spread
based on the tasks from that service. Standalone tasks are spread
based on the tasks from the same task group.
When this strategy is used and a scale-in action is taken, Amazon ECS
will select tasks to terminate that maintains a balance across
Availability Zones. Within an Availability Zone, tasks will be
selected at random.
What you seem to need seems like the opposite of binpack, so you might need to create a custom scheduler.
Alternatively you could increase the required memory in your task definitions with the memoryReservation parameter. That way your containers should get as much memory as they need for their operations.

AWS ECS Scaling based on memoryreservation

I've been given a AWS environment to look after and it runs ECS on EC2 instances and has scaling configured using ECS Memory Reservation. The system was originally running before Cluster Autoscaling was made generally available so it's just using a cloudwatch metric to scale out and scale in. As far as I can work out it is following a basic AWS design.
The EC2 has an autoscaling group and allows scale from 1 to 5 instances with 1 being the desired state.
There is 1 cluster service running with 6 tasks configured.
5 of those tasks are configured to run up to 2 copies of the task maximum and 1 the desired, the other is set to maximum of 1.
The tasks have MemoryReservation (soft limit) figures configured but not Memory (hard limit).
The tasks are primarily running Java.
The highest memory reservation is set at about 200MB and most are around this figure.
The scale out rule is based on MemoryReservation at 85%.
Docker stats shows most of the tasks are running about 300MB and some exceed 600MB.
The instance size has 4GB of RAM.
If the maximum reservation is 2GB, even if the tasks are consuming more like 3GB in reality, am I right in believing that the scale out rule will NEVER be invoked because 2GB is 50% of available RAM? Do I need to increase the memory reservations to something more realistic?
Also if it is only running a single EC2 instance am I right in thinking even if I increased the MemoryReservation figures to something more realistic, just because there's no theoretical room to start another task it won't spin up a second EC2 instance automatically? Just picked this up from different articles I've been reading when searching.
Thanks
After the update of Capacity Providers in May 2022, Capacity Providers still have a gap to fill in Memory scaling.
As per the OP "ECS Memory Reservation" seems not to even be an option any more (at least in the web console)
And when creating the Capacity Provider, only the target value is configurable.
There are more details into how this Capacity is calculated in this blog, but while it mentions:
This calculation accounts for vCPU, memory, ENI, ports, and GPUs of the tasksĀ andĀ the instances
If you have tasks that not necessarily grow memory consumption, but you have a service with scheduled actions configured to scale tasks (eg: minimum tasks at different times of day)
This case will not trigger a scale out, since the memory in the instances does not get to be used if the tasks simply does not fit in, due to its configuration and you will see errors (in the service events) like:
service myservice was unable to place a task because no container
instance met all of its requirements. The closest matching
container-instance abc123xxxx has insufficient memory available.
This basically mean a scheduled task scaling change may not happen if the task memory setting is just big enough so it doesn't fit in the running instances, and the CapacityProviderReservation does not change because the calculation is only done when tasks are in Provisioned state, which does not happen in this case.
Possible workarounds
Decrease the Capacity Reservation. This basically means "to have spare capacity", ie: by default Reservation is 100 (%) so it tries to use the ASG cluster resources as much as possible, so having a number less than 100, means it will scale out when the cluster is used at that capacity therefore having a margin spare of resources at all times, which means new scheduled tasks will fit in, as long as the spare is enough (eg: calculate per task memory reservation and cluster memory reservation of all expected running tasks)
Setup ASG rules for scaling that match the service scaling rules.
While possible, this may be bound for problems with timing and auto scaling due to other triggers.
A few things:
Cluster AutoScaling usually is just the term ECS uses for "An AutoScaling Group that launches instances into the cluster", and it sounds like that's what you are currently using. Capacity Providers are a newer feature where ECS more directly manages the ASG, which might be the newer feature you're thinking onf
'Desired Capacity' isn't a state that you set for where you want the group to be, its the current amount of capacity that AutoScaling wants there to be in the group. So if a scaling policy goes off and says +1, the desired will change to 2, and then AutoScaling will try to launch an instance since you presumably only had 1 before (since the desired was 1 before)
Memory reservation is based on that 2GB's reserved, so it doesn't mater how much is in use for scaling purposes. This is importaint because even if you had 6/8GB reserved (from 3 2GB tasks), but 7.5Gb in use, ECS would still allow another task to be launched, since there's still 2 reservable GBs
Because of 3) you should probably increase the reservation value, wouldn't want an instance to get overloaded. Java can be nasty about RAM issues. This would also help with your scale out threshold issue.
For your second question, scaling will only happen after the cloudwatch alarm is triggered. So if the metric never goes above that threshold, alarm can't trigger the scaling policy. There are a whole host of cases where just because the alarm triggers, scaling won't happen (more of them for scaling in than scaling out, but it can still happen on scale out too); but the alarm going into the Alarm state is definitely a required step.

Problems with Memory and CPU limits in AWS ECS cluster running on reserved EC2 instance

I am running the ECS cluster that currently has 3 services running on T3 medium instance. Each of those services is running only one task which has a soft memory limit of 1GB, the hard limit is different for each (but that should not be the problem). I will always have enough memory to run one, new deployed task (new one will also take 1GB, and T3 medium will be able to handle it since it has 4GB total). After the new task is up and running, the old one will be stopped and I will have again 1GB free for the new deployment. I did similar to the CPU (2048 CPU, each task has 512, and 512 free for new deployments).
So everything runs fine now, but I am not completely satisfied with this setup for the future. What will happen if I need to add another service with another task? I need to deploy all existing tasks and to modify their task definitions to use less CPU and memory in order to run this new task (and new deployments). I am planning to get a reserved EC2 instance, so it will not be easy to swap the current EC2 instance with the larger one.
Is there a way to spin up another EC2 instance for the same ECS cluster to handle bursts in my tasks? Also deployments, it's not a perfect scenario to have the ability to deploy only one task, and then wait for old to be killed in order to deploy the next one, without downtimes.
And biggest concern, what if I need new service and task, I need again to adjust all others in order to run a new one and deploy others, which is not very maintainable and what if I cannot lower CPU and memory more because I already reached the lowest point in order to run the task smoothly.
I was thinking about having another EC2 instance for the same cluster, that will handle bursts, deployments, and new services/tasks. But not sure if that's possible and if that's the best way of doing this. I was also thinking about Fargate, but this is much more expensive and I cannot afford it for now. What do you think? Any ideas, suggestions, and hints will be helpful since I am desperate to find the best way to avoid the problems mentioned above.
Thanks in advance!
So unfortunately, there is no out of the box solution to ensure that all your tasks run on min possible (i.e. one) instance. You can use our new feature called Capacity Providers (CP), which will allow you to ensure the minimum number of ec2 instances required to run all your tasks. The major difference between CP vs ASG is that CP gives more weight to task placement (where as ASG will scale in/out based on resource utilization which isn't ideal in your case).
However, it's not an ideal solution. Just as you said in your comment, when the service needs to scale out during a deployment, CP will spin up another instance, the new task will be placed on it and once it gets to Running state, the old task will be stopped.
But now you have an "extra" EC2 instance because there is no way to replace a running task. The only way I can think of would be to use a lambda function that drains the new instance, which will move all the service tasks to the other instance. CP will, after about 15 minutes, terminate this instance as there are no tasks are running on it.
A couple caveats:
CP are new, a little rough around the edges, and you can't
delete/modify them. You can only create or deactivate them.
CP needs an underlying ASG and they must have a 1-1 relationship
Make sure to enable managed scaling when creating CP
Choose 100% capacity target
Don't forget to add a default capacity strategy for the cluster
Minimizing EC2 instances used:
If you're using a capacity provider, the 'binpack' placement strategy minimises the number of EC2 hosts that are used.
However, there are some scale-in scenarios where you can end up with a single task running on its own EC2 instance. As Ali mentions in their answer; ECS will not replace this running task, but depending on your setup, it may be fairly easy for you to replace it yourself by configuring your task to voluntarily 'quit'.
In my case; I always have at least 2 tasks running per service. So I just added some logic to my tasks' healthchecks, so they report as unhealthy after ~6 hours. ECS will spot the 'unhealthy' task, remove it from the load balancer, and spin up a replacement (according to the binpack strategy).
Note: If you take this approach; add some variation to your timeout so you're less likely to have all of your tasks expire at the same time. Something like: expiry = now + timedelta(hours=random.uniform(5.5,6.5))
Sharing memory 'headspace' with soft-limits:
If you set both soft and hard memory limits; ECS will place your tasks based on the soft limit. If your tasks' memory usage varies with usage, it's fairly easy to get your EC2 instance to start swapping.
For example: Say you have a task defined with a soft limit of 900mb, and a hard limit of 1800mb. You spin up a service with 4 running instances. ECS provisions all 4 of these instances on a single t3.medium. Notice here that each instance thinks it can safely use up to 1800mb, when in fact there's very little free memory on the host server. When you hit your service with some traffic; each task tries to use some more memory, and your t3.medium is incapacitated as it starts swapping memory to disk. ECS does not recover from this type of failure very well. It notices that the task instances are no longer available, and will attempt to provision replacements, but the capacity provider is very slow to replace the swapping t3.medium.
My suggestion:
Configure your service to auto-scale based on memory usage (this will be a percentage of your soft-limit), for example: a target memory usage of 70%
Configure your tasks' healthchecks so that they report as unhealthy when they are nearing their soft-limit. This way, your tasks still have some headroom for quick spikes of memory usage, while giving your load balancer a chance to drain and gracefully replace tasks that are getting greedy. This is fairly easy to do by reading the value within /sys/fs/cgroup/memory/memory.usage_in_bytes.

How to get consistent CPU utilization on AWS

I've now learnt that when I start a new EC2 instance it has a certain number of CPU credits due to which it's performance is high when it starts processing but gradually reduces over time as the credits run out. Past that point, the instance runs at which appears to be the baseline CPU utilisation rate. To numerate, when I started the EC2 instance (t2.nano), Cloudwatch reported around 80% CPU utilisation gradually decreasing down to 5%.
Now I'm happy to use one of the better instance types pending the instance limit request. But whilst that is in progress, I'd like to know whether the issue of reducing performance over time will still hold even with the better instance type?
Would I require a dedicated host setup if I wish to ensure I get consistent CPU utilisation? The only problem I can see here is that I'm running a SQS worker queue and Elastic Beanstalk allows us to easily setup a worker environment which reads messages from the queue. From what I've read and from looking at the configuration options available in Elastic Beanstalk, I don't think I'll be able to launch instances into a dedicated host directly. Most of my reading has lead me to believe that I'll have to learn how to use a VPC. Would that be correct?
So I guess my questions are - would simply increasing the instance type to a more powerful instance guarantee consistent CPU utilisation performance or is a dedicated host required and if so, is it possible to set up one with Elastic Beanstalk or would it have to be setup manually and if it is set up manually can it be configured to work with an SQS queue automatically?
If you want consistent CPU performance, you should avoid the burstable performance instances (the T2 family). All other families of instances (M5, C5, etc) will have consistent CPU performance over time. You can use any instance family with Elastic Beanstalk. No need for a dedicated host.