Task definition CPU reservation on AWS ECS EC2 - amazon-web-services

I am building my cluster on ECS using EC2 instances, and I am curious about specifying CPU reservation in my task definitions. How does AWS manage my tasks inside EC2 instances when I leave CPU reservation empty or set it to 0?
I have read this article: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
And here it says:
when you don’t specify any CPU units for a container, ECS intrinsically enforces two Linux CPU shares for the cgroup (which is the minimum allowed).
I am not really sure what this means, and whether it is different for tasks, since the statement is specifically about containers.

Control groups (cgroups) are a feature of the Linux kernel that allow resources such as CPU time to be distributed, with priorities, among hierarchies of processes running on your host.
This enables your containers to operate independently from each other (each has access to a portion of the available CPU), whilst also providing the ability for higher-priority tasks to gain access to the CPU if required.
A CPU share defines how much of the overall CPU your container can have access to; as you add more containers, this becomes a ratio by which the CPU is divided between them. In your case each container will get 2 shares, so with 4 containers each holds 2 of the 8 total shares and gets a quarter of the available CPU under contention.
If you define limits at the task level, you cap the maximum amount of the host's resources that can be used, and the CPU shares of the containers are then split as a ratio within that cap. However, this also affects the scheduling of new containers: if there is not enough resource available for the task and auto scaling is not enabled, your task cannot be placed.
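To make the shares concrete, here is a minimal sketch of registering a task definition with explicit container-level CPU units using boto3; the family, container names, and images are illustrative assumptions, not taken from the question:

import boto3

ecs = boto3.client("ecs")

# Two containers with explicit CPU shares: when both are busy and nothing
# else runs on the instance, "web" gets 512 / (512 + 256) = 2/3 of the CPU
# and "worker" gets 1/3. A container with no "cpu" value at all falls back
# to 2 shares, the cgroup minimum mentioned above.
ecs.register_task_definition(
    family="shares-demo",  # hypothetical family name
    containerDefinitions=[
        {"name": "web", "image": "nginx:latest",
         "cpu": 512, "memory": 256, "essential": True},
        {"name": "worker", "image": "busybox:latest",
         "cpu": 256, "memory": 256, "essential": False},
    ],
)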
There is documentation available on cgroups; it is technical, so if you have little experience with Linux it might be a little confusing.

Related

Increase vCPUs/RAM if needed

I have created an AWS EC2 instance to run a computation routine that works for most cases; however, every now and then I get a user who needs to run a computation routine that crashes my program due to lack of RAM.
Is it possible to scale the EC2 instance's RAM and/or vCPUs if required, or when a certain threshold is reached (say, when 80% of RAM is used)? What I'm trying to avoid is keeping an unnecessarily large instance, scaling resources up only when needed.
It is not possible to adjust the number of vCPUs or the amount of RAM on a running Amazon EC2 instance.
Instead, you must:
Stop the instance
Change the Instance Type
Start the instance
The virtual machine will be provisioned on a different 'host' computer that has the correct resources matched to the Instance Type.
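As a sketch, the same stop, resize, start sequence can be scripted with boto3; the instance ID and target instance type below are placeholders:

import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder instance ID

# 1. Stop the instance and wait until it is fully stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# 2. Change the instance type (only possible while the instance is stopped).
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m5.xlarge"},  # assumed target type
)

# 3. Start it again; it comes back on a host matching the new type.
ec2.start_instances(InstanceIds=[instance_id])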
A common approach is to scale the quantity of instances to handle the workload. This is known as horizontal scaling, and it works well where work can be distributed amongst multiple computers rather than making a single computer 'bigger' (which is vertical scaling); see the sketch below.
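With an Auto Scaling group, the horizontal approach amounts to adjusting the group's capacity rather than any single instance; a minimal boto3 sketch, assuming a hypothetical group name:

import boto3

autoscaling = boto3.client("autoscaling")

# Widen the fleet instead of growing one machine: raise the desired
# capacity within the group's min/max bounds.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="app-fleet",  # hypothetical group name
    MinSize=2,
    MaxSize=8,
    DesiredCapacity=4,
)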
The only exception to the above is burstable performance instances, which are capable of providing high amounts of CPU but only for limited periods. This is great when you have bursty needs (eg hourly processing or spiky workloads), but they should not be used when there is a need for consistently high workloads.

AWS Fargate and its memory management

From the AWS documentation I can see that the CPU and Memory properties are required in AWS::ECS::TaskDefinition for Fargate, but not in the ContainerDefinition within that resource.
How exactly does this work? If I do not specify them in the ContainerDefinition, will the container use as much of the resources as the task has available? If there is only one container within the task, does it make any sense to define those values? If they were required there, it would seem pretty redundant and verbose to me.
When you register a task definition, you can specify the total cpu and memory used for the task. This is separate from the cpu and memory values at the container definition level.
If using the Fargate launch type, these task definition fields are required, and there are specific supported values for both cpu and memory. They act as a hard limit on the CPU and memory presented to the task. For example, if your task is configured to use 1 vCPU and 2 GB of memory, the memory limit is 2 GB; if the task's memory utilization ever exceeds 2 GB, the task is terminated with an OutOfMemory error.
Task size: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#task_size
You can also specify cpu and memory resource on the container level. This will be the amount of resources to present to the container (a Task can have multiple containers). If your container attempts to exceed the resource specified here, the container is killed. These fields are optional for tasks using the Fargate launch type, and the only requirement is that the total amount of cpu and memory reserved for all containers within a task be lower than the task-level cpu and memory value, if one is specified.
On a container level, the Docker daemon reserves a minimum of 4 MiB of memory for a container, so you should not specify fewer than 4 MiB of memory for your containers.
Standard Container Definition Parameters: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#standard_container_definition_params
When a container has no limits specified in the TaskDefinition, it will use all the resources available to the task, and the task-level values are mandatory for a Fargate task.
This means that there is no need to define container-level values if there is only one container in the TaskDefinition. They can be specified, but they are redundant (if equal to those of the task itself) or even harmful (if they are lower than the amount given to the task).
If more than one container belongs to the same task, Fargate will distribute the resources evenly among all the containers. This may or may not be the desired behaviour.
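Putting the two levels together, here is a sketch of registering a Fargate task definition with the required task-level size and optional container-level splits via boto3; the family, images, role ARN, and values are illustrative assumptions:

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="fargate-demo",  # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",   # required for Fargate tasks
    cpu="1024",             # task level: 1 vCPU (required on Fargate)
    memory="2048",          # task level: 2 GB hard limit (required on Fargate)
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[
        # Container-level cpu/memory are optional on Fargate; if set,
        # their totals must fit within the task-level size above.
        {"name": "app", "image": "nginx:latest",
         "cpu": 512, "memory": 1024, "essential": True},
        {"name": "sidecar", "image": "busybox:latest",
         "cpu": 256, "memory": 512, "essential": False},
    ],
)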

Capacity planning on AWS

I need some understanding of how to do capacity planning for AWS and what kind of infrastructure components to use. I am taking the example below.
I need to set up a Node.js-based server which uses Kafka, Redis, and MongoDB. There will be 250 devices connecting to the server and sending in data every 10 seconds. The size of each data packet will be approximately 10 KB. I will be using the 64-bit Ubuntu image.
What I need to estimate:
MongoDB requires at least 3 servers for redundancy. How do I estimate the size of the VM and the EBS volume required? e.g. should it be m4.large, m4.xlarge, or something else? The default EBS volume size is 30 GB.
What should be the size of the VM for running the other application components, which include 3-4 Node.js processes, Kafka, and Redis? e.g. should it be m4.large, m4.xlarge, or something else?
Can I keep just one application server in an Auto Scaling group and increase the count as the load increases, or should I go with a minimum of 2?
I want to understand, in general, how we go about estimating which VM to consider, how much storage to consider, and any other considerations, given the number of devices, the data packet size, and the data frequency.
Nobody can answer this question for you. It all depends on your application and usage patterns.
The only way to correctly answer this question is to deploy some infrastructure and simulate standard usage while measuring the performance of the systems (throughput, latency, disk access, memory, CPU load, etc).
Then, modify the infrastructure (add/remove instances, change instance types, etc) and measure again.
You should certainly run a minimal deployment per your requirements (eg instances in separate Availability Zones for High Availability), and you can use Auto Scaling to add extra capacity when required, but simulated testing would also be required to determine the right trigger points at which more capacity should be added. For example, the best indicator might be memory, or CPU, or latency. It all depends on the application and how it behaves under load.
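That said, the raw ingest rate implied by the question's figures is easy to bound, and it makes a useful starting point for such a simulation; a back-of-envelope sketch, assuming the stated 250 devices sending 10 KB every 10 seconds:

# Back-of-envelope ingest estimate from the question's stated figures.
devices = 250
packet_kb = 10        # ~10 KB per packet
interval_s = 10       # one packet per device every 10 seconds

ingest_kb_per_s = devices * packet_kb / interval_s
print(ingest_kb_per_s)            # 250.0 KB/s sustained

daily_gib = ingest_kb_per_s * 86_400 / 1024 ** 2
print(round(daily_gib, 1))        # ~20.6 GiB/day before indexes and replication

A few hundred KB/s of sustained ingest is modest for the instance families mentioned, which is exactly why measuring the real workload matters more than the headline numbers.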

Amazon ECS Task Definition - CPU units & Memory - set container to use 100% of the EC2 available Resources

I'd like to have multiple different services running on an ECS cluster, with each service running on a single EC2 instance. The EC2 instance type is the same for all services, and I would like those services to use all of their host EC2 instance's available resources.
My assumption is that if I use only the soft memory parameter (without the hard one) in the task configuration, my container instance will be allowed to use all the available memory on the EC2 instance hosting it, without being limited. Is that correct?
As for the EC2 type (a t2.micro [vCPU=1, Memory=1 GiB], for example), is it possible to simply put:
{
    ...
    "memory": 1024,
    "cpu": 1024,
    ...
}
given that the EC2 instance should already be set up with a bunch of container service requirements?
Is it correct that you're trying to have each ECS instance handle only a single task?
The short answer to your question is no. The amount of memory made available to your containers is usually a bit less than the amount of memory on the machine itself, so that the operating system has enough memory to keep running. From my experience, a t2.small, which has 2048 MB of memory, will end up with 2004 MB available for containers.
When it comes to your task definition, there are two ways of specifying memory. The memory setting is a hard limit: if the container's memory usage hits this amount, the container will be terminated. If, on the other hand, you specify memoryReservation, that much memory is reserved for the task, but the task can use more, up to the total amount on the machine. Check out the Task Definition documentation for further details.
An important consideration here is that only one of memory and memoryReservation is required; if both are used, memoryReservation should be less than memory. If you are only going to specify one of these, I'd recommend memoryReservation, as it will allow your task to use up to the total memory on the machine. If both are used, memoryReservation is what is used when calculating the amount of memory consumed by a task.
When placing tasks on an instance, ECS looks at the amount of available memory: the registered amount of memory for the instance, minus any tasks already placed on it. If this number is less than the amount of memory required by a task, the task will not be placed on that instance. If no instance has enough memory for the task, it will not be placed at all, and the error will be logged in the service's Events log.
So it's important to look at the amount of memory actually registered by your instance type, and then ensure your memory or memoryReservation values are lower than the amount registered by your instances. Otherwise, your tasks will never be placed.
As for cpu, this value is not required; if it is not specified, all tasks on an instance are allowed an equal portion of the CPU available on the system. If only one task is on the instance, it can use the entire CPU of the instance by default.
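As a sketch of the recommended shape (soft limit only, no hard cap), registered via boto3 with hypothetical names:

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="soft-limit-demo",  # hypothetical family name
    containerDefinitions=[
        {
            "name": "app",
            "image": "nginx:latest",  # illustrative image
            "essential": True,
            # Soft limit only: 256 MiB counts against the instance's
            # registered memory for placement, but the container may
            # use more, up to what the instance actually has.
            "memoryReservation": 256,
            # No "memory" (hard limit) is set, so the container is not
            # capped below the instance's available memory.
        }
    ],
)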

Confusion about instances used inside an Amazon EC2 Container Service

When an EC2 Container Service cluster is created, it creates an Auto Scaling group to manage the created instances. These instances come from the EC2 service, which means they are virtual machines.
But we know that containers represent a new way to deploy applications based on operating-system-level virtualization rather than hardware virtualization,
unlike VMs, which are heavyweight and non-portable. Isn't that a contradiction? Correct me if I'm wrong.
We use containers because they are extremely fast compared to VMs (both in boot time and in task execution), and they save a lot of storage space. So if we have one node (VM) that can support at most 4 containers, our clients can rapidly launch 4 containers, but beyond that number the EC2 autoscaler will need to launch a new node (VM) to support the incoming containers, which introduces some delay for those tasks.
Is it impossible to launch containers on physical machines?
And what do you recommend for running time-critical tasks?
I believe you are working under an erroneous assumption that ECS scales the virtual machines ("container instances" -- the instances where containers will run) directly with task demand.
If that were true, you would have a point, because the cluster would be sluggish and unresponsive any time sufficient container instance resources were not immediately available.
ECS doesn't do that, the presence of the Auto Scaling Group notwithstanding.
Depending on the Amazon EC2 instance types you use in your clusters, and quantity of container instances you have in a cluster, your tasks have a limited amount of resources that they can use when they are run. ECS monitors the resources available in the cluster to work with the schedulers to place tasks. If your cluster runs low on any of these resources, such as memory, you will eventually be unable to launch more tasks until you add more container instances, reduce the number of desired tasks in a service, or stop some of the running tasks in your cluster to free up the constrained resource. (emphasis added)
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_alarm_autoscaling.html
So, no... it doesn't launch the new tasks slowly when you are out of capacity. It doesn't launch them at all.
But don't get ahead of me.
The link above explains, with examples, how scaling of the virtual machines (container instances) is designed to actually work.
Of course, you don't have to make them adaptively scalable at all. You can go with your physical server model (note: I say physical server model -- meaning a fixed, inelastic pool of resources, on always-running virtual machines, since virtual machines are what EC2 provides), and just choose how many instances you want to have running at all times, essentially emulating physical servers. If you wanted, say, 8 container instances, the "auto scaling group" would maintain exactly 8 at all times, creating replacements if, say, one of them experienced a hardware failure. That "auto" accomplishment would be maintaining the status quo. And, of course, in this configuration, you could manually reconfigure from 8 to, say, 12, and the "auto" accomplishment would be that you'd automatically get 4 new ones to add to the existing 8.
But the idea of how the service is ideally used is that your group of virtual machines scales up and down by rules you devise, to anticipate the resources needed by future tasks -- or a future lack of tasks.
In the example given, memory reservation is the trigger:
When the memory reservation of your cluster rises above 75% (meaning that only 25% of the memory in your cluster is available for new tasks to reserve), the alarm triggers the Auto Scaling group to add another instance and provide more resources for your tasks and services.
It triggers the addition of more container instances so that you always have whatever you have determined to be the appropriate threshold of surplus capacity already online by the time you need it.
Of course, memory is just one resource, and 75% is just an arbitrary threshold chosen for the example.
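As a sketch, such a trigger can be wired up with boto3; the alarm name, cluster name, and scaling-policy ARN below are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the cluster's MemoryReservation metric exceeds 75%, and
# invoke a scaling policy that adds container instances to the group.
cloudwatch.put_metric_alarm(
    AlarmName="ecs-memory-reservation-high",  # hypothetical alarm name
    Namespace="AWS/ECS",
    MetricName="MemoryReservation",
    Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],  # placeholder cluster
    Statistic="Average",
    Period=60,
    EvaluationPeriods=1,
    Threshold=75.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],  # placeholder scale-out policy ARN
)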
Auto Scaling groups can scale on a variety of triggers -- the phase of the moon, the price trends in the stock market, whatever is appropriate to anticipating your desired amount of surplus capacity and can be quantified and monitored can be used... but this service does not scale itself directly by the actual attempt to launch a new task when the task can't be launched due to insufficient resources.
Herein lies the flaw in your original argument.
Why virtual machines? Simply enough, because when you destroy a virtual machine because the capacity is not expected to be needed, you stop paying for it.
In this light, perhaps you'll agree that this is not a weakness, it's a strength. Physical servers never stop costing you when you are not using them.
With VMs, you don't need to pay anything at all for capacity you will not be needing -- you only have to pay for the capacity you're using plus the amount you need to keep immediately available to handle anticipated demand.
You can have as much idle surplus immediately ready as you are willing to pay for, or you can maximize savings by allowing as little surplus capacity as you are comfortable with being able to access without delay.