Get mem and cpu usage from AWS fargate task - amazon-web-services

What APIs are available for tasks running under the ECS Fargate service to get their own memory and CPU usage?
My use case is load shedding / adjusting: the task is an executor which retrieves work items from a queue and processes them in parallel. If load is low it should take on more items; if high, it should shed load or take on fewer.

You can look at CloudWatch Container Insights. Container Insights reports CPU utilization relative to instance capacity. So if the container is using only 0.2 vCPU on an instance with 2 CPUs and nothing else is running on the instance, then the CPU utilization will only be reported as 10%.
Average CPU utilization, on the other hand, is based on the ratio of CPU utilization relative to the reservation. So if the container reserves 0.25 vCPU and is actually using 0.2 vCPU, then the average CPU utilization (assuming a single task) is 80%. More details about the ECS metrics can be found here.

You can get those metrics in CloudWatch by enabling Container Insights. Note that there is an added cost for enabling that.
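If you want the executor to adjust itself at runtime, you can poll those Container Insights metrics from the task with boto3. A minimal sketch, assuming Container Insights is enabled; the cluster and service names and the 80%/40% thresholds are placeholders:

```python
# Sketch: poll the service-level metrics that Container Insights publishes to
# CloudWatch and derive a utilization ratio to drive load shedding.
# Assumes Container Insights is enabled; names and thresholds are placeholders.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def average(metric_name, cluster, service):
    """Average of a Container Insights metric over the last 5 minutes."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="ECS/ContainerInsights",
        MetricName=metric_name,  # e.g. CpuUtilized, CpuReserved (CPU units)
        Dimensions=[
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = resp["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else None

cpu_used = average("CpuUtilized", "my-cluster", "my-executor-service")
cpu_reserved = average("CpuReserved", "my-cluster", "my-executor-service")

if cpu_used is not None and cpu_reserved:
    utilization = cpu_used / cpu_reserved  # 0.8 == 80% of the reservation
    if utilization > 0.8:
        print("High load: shed work / pull fewer items from the queue")
    elif utilization < 0.4:
        print("Low load: take on more work items")
```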

Related

OpenShift K8s cluster CPU and memory metrics issues

We have an AWS-hosted OpenShift cluster running; at the moment we have 10 worker nodes and 3 control planes. We are using New Relic as the monitoring platform. Our problem is as follows. Overall cluster resource usage is low, that is:
CPU usage - average 25%
Memory usage - 37%.
But under load, the metrics show that some nodes are fully occupied, at maximum CPU and memory usage, while others are not, and overall cluster resource usage is still low.
We have a feeling that we have over-provisioned compute resources; we have actually noted the same using AWS Compute Optimizer.
How do we make cluster resource utilization optimal, e.g. overall utilization above 70%?
Why are some worker nodes being utilized to the maximum while others are seriously underutilized?
Any links on K8s cluster optimization would be appreciated.
Use node taints and tolerations (together with a node selector or node affinity) to assign some workloads to specific worker nodes.
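For illustration, a minimal sketch of that idea using the official kubernetes Python client; the taint key/value (dedicated=batch) and the node label are hypothetical:

```python
# Sketch: pin a workload to tainted/labelled nodes with a toleration plus a
# node selector. The taint key/value ("dedicated=batch") and the node label
# are hypothetical; the matching taint would be applied with something like:
#   kubectl taint nodes <node> dedicated=batch:NoSchedule
from kubernetes import client

pod_spec = client.V1PodSpec(
    containers=[client.V1Container(name="worker", image="my-worker:latest")],
    # Only pods carrying this toleration may land on the tainted nodes.
    tolerations=[
        client.V1Toleration(
            key="dedicated", operator="Equal", value="batch", effect="NoSchedule"
        )
    ],
    # The node selector requires the pod to run on nodes carrying this label.
    node_selector={"workload-type": "batch"},
)
print(pod_spec.to_dict())
```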

Discrepancy between CPU utilization of ECS service and EC2 instance

I'm seeing some discrepancy between ECS service and EC2 in terms of CPU utilization metrics.
We have an EC2 instance of the t2.small type with two different ECS containers running inside it. I have allocated 512 CPU units to one container and 128 CPU units to the other. The problem is that the container CPU utilization goes up to more than 90%, as shown in the first screenshot.
Meanwhile, the CPU utilization of the underlying EC2 instance is not even greater than 40%, as shown in the second screenshot.
What could be the reason for this discrepancy? What could have gone wrong?
Well, if you assign CPU units to your containers, CloudWatch will report the CPU usage in relation to the allocated CPU capacity. Your container with 512 CPU units has access to 0.5 vCPUs and the one with 128 units has access to 0.125 of a vCPU, which is not a lot, so high utilization of those is easy to achieve.
Since the CPU utilization of the t2.small, which has about 1 vCPU (ignoring the credit/bursting system for now), is hovering around 20%, my guess is that the first graph is from the smaller container.
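To make the arithmetic behind that concrete, here is a small worked example; the 0.115 vCPU usage figure is an assumption for illustration, not read off the screenshots:

```python
# Illustrative arithmetic: a container's utilization is reported against its
# own CPU-unit reservation, not against the whole instance.
INSTANCE_VCPUS = 1.0          # t2.small has about 1 vCPU (ignoring burst credits)
CPU_UNITS_PER_VCPU = 1024

reserved_units = 128          # the smaller container
actual_vcpu_used = 0.115      # assumed figure for illustration

container_pct = actual_vcpu_used / (reserved_units / CPU_UNITS_PER_VCPU) * 100
instance_pct = actual_vcpu_used / INSTANCE_VCPUS * 100

print(f"Container-level utilization: {container_pct:.0f}%")  # ~92%
print(f"Share of the EC2 instance:   {instance_pct:.0f}%")   # ~12%
```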

AWS Fargate Task Prices

I have set up a Task Definition with a maximum allocation of 1024 CPU units and 2048 MiB of memory, with Fargate as the launch type. When I looked at the costs it was way more expensive than I thought ($1.00 per day or $0.06 per hour [us-east-1]). What I did was reduce it to 256 units and I am waiting to see if the cost goes down. But how does the task maximum allocation work? Is the task definition's maximum allocation responsible for Fargate provisioning a more powerful server with a higher cost, even if I don't use 100%?
The apps in the containers, running 24/7, are a NestJS application + Apache (do not ask why) + Redis, and I can see that CPU usage is low, but the price is too high for me. Is Fargate the wrong choice for this? Should I go for EC2 instances with ECS?
When you run a task, Fargate provisions a container with the resources you have requested. It's not a question of "use up to this maximum CPU and memory," but rather "use this much CPU and memory." You'll pay for that much CPU and memory for as long as it runs, as per the AWS Fargate pricing. At current prices, for the CPU and memory you listed (1024 CPU units, 2048 MiB), the cost comes to $0.04937/hour, or $1.18488/day, or $35.55/month.
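For reference, here is the arithmetic behind that figure; the per-vCPU and per-GB rates used are the us-east-1 Linux/x86 on-demand rates implied by the totals above (check the pricing page for current values):

```python
# Fargate on-demand cost arithmetic for 1 vCPU (1024 CPU units) + 2 GB,
# using the us-east-1 Linux/x86 rates implied by the totals quoted above.
PER_VCPU_HOUR = 0.04048   # USD per vCPU-hour (assumed current published rate)
PER_GB_HOUR = 0.004445    # USD per GB-hour  (assumed current published rate)

vcpus, memory_gb = 1.0, 2.0
hourly = vcpus * PER_VCPU_HOUR + memory_gb * PER_GB_HOUR
print(f"${hourly:.5f}/hour, ${hourly * 24:.5f}/day, ${hourly * 720:.2f}/month")
# -> $0.04937/hour, $1.18488/day, $35.55/month
```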
Whether Fargate is the right or wrong choice is subjective. It depends what you're optimizing for. If you just want to hand off a container and allow AWS to manage everything about how it runs, it's hard to beat ECS Fargate. OTOH, if you are optimizing for lowest cost, on-demand Fargate is probably not the best choice. You could use Fargate Spot ($10.66/month) if you can tolerate the constraints of spot. Alternatively, you could use an EC2 instance (t3.small @ $14.98/month), but then you'll be responsible for managing everything.
You didn't mention how you're running Redis, which will factor in here as well. If you're running Redis on ElastiCache, you'll incur that cost as well, but you won't have to manage anything. If you end up using an EC2 instance, you could run Redis on the same instance, saving latency and expense, with the trade-off that you'll have to install and operate Redis yourself.
Ultimately, you're making tradeoffs between time saved and money spent on managed services.

AWS EC2 instance cost far above estimate, why?

I have a script that I run 24/7 that uses 90-100% CPU constantly. I am running this script in multiple virtual machines from Google Cloud Platform. I run one script per VM.
I am trying to reduce cost by using AWS EC2. I looked at the price per hour of t3.micro (2 vCPU) instances and it says the cost is around $0.01/h, which is cheaper than GCP's equivalent instance with 2 vCPUs.
Now, I tried running the script in one t3.micro instance, just to get a real estimate of how much each t3 instance running my script will cost. I was expecting the monthly cost per instance to be ~$7.20 (720 h/month * $0.01/h). The thing is that I have been running the script for 2-3 days, and the cost reports already show a cost of more than $4.
I am trying to understand why the cost is so far from my estimate (and from the AWS monthly calculator's estimate). All this extra cost seems to come from "EC2 Other" and "CPU Credit", but I don't understand these costs.
I suspect these come from my 24/7 full CPU usage, but could someone explain what these costs are and whether there is a way to reduce them?
A burstable EC2 instance allows a certain baseline CPU usage: 10% for a t3.micro. When the instance is operating below that threshold it accumulates CPU credits, which are applied to usage above the threshold. A t3.micro can accumulate up to 12 credits an hour (with one credit being equal to one vCPU at 100% utilisation for 1 minute). If you regularly use more CPU credits than the instance earns, the excess will be charged at a higher rate, which I understand to be 5c per vCPU-hour.
It may be that a t3.micro is not the best choice for that type of workload and you may need to select a different instance family or a bigger instance size.
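Putting rough numbers on it for a t3.micro pinned at ~100% on both vCPUs with unlimited mode on (the on-demand price is the ~$0.01/h from the question; the surplus rate is the 5c per vCPU-hour mentioned above):

```python
# Rough arithmetic for a t3.micro running ~100% CPU on both vCPUs 24/7
# with unlimited mode enabled. Rates are approximate / as cited above.
ON_DEMAND_PER_HOUR = 0.0104    # ~$0.01/h as stated in the question
SURPLUS_PER_VCPU_HOUR = 0.05   # surplus credit charge cited above
CREDITS_EARNED_PER_HOUR = 12   # 1 credit = 1 vCPU-minute at 100%

vcpus = 2
vcpu_minutes_used = vcpus * 60                                          # 120 per hour
deficit_vcpu_hours = (vcpu_minutes_used - CREDITS_EARNED_PER_HOUR) / 60 # 1.8

hourly = ON_DEMAND_PER_HOUR + deficit_vcpu_hours * SURPLUS_PER_VCPU_HOUR
print(f"~${hourly:.3f}/hour, ~${hourly * 24:.2f}/day")  # ~$0.100/hour, ~$2.41/day
```

That is roughly $2.40/day, which lines up with seeing more than $4 after 2-3 days.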
The purple in your chart is CPU credits, not instance usage.
It looks like you enabled "T2/T3 Unlimited" when launching your instance, and your script is causing it to burst beyond the baseline capacity. When you burst beyond the baseline, you're charged for that usage at the prevailing rate. You can read more about T2/T3 Unlimited and burstable performance here.
To bring these costs down, disable T2/T3 Unlimited by following the instructions here.
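If you would rather do it programmatically, the same switch is exposed through the EC2 API; a sketch with a placeholder instance ID:

```python
# Sketch: switch a burstable instance from "unlimited" back to "standard"
# credit mode. The instance ID is a placeholder.
import boto3

ec2 = boto3.client("ec2")
resp = ec2.modify_instance_credit_specification(
    InstanceCreditSpecifications=[
        {"InstanceId": "i-0123456789abcdef0", "CpuCredits": "standard"}
    ]
)
print(resp["SuccessfulInstanceCreditSpecifications"])
```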

AWS ElastiCache for Redis Engine CPU Utilization metrics, how to interpret?

We are using AWS ElastiCache for Redis in our application, and we need some help understanding the metrics. During high load we saw a CPU utilization of 30%, but Engine CPU Utilization was showing almost 80%. Could someone please elaborate on the difference between these metrics and what the optimum limits are for each for good performance?
Thanks in advance.
Now I have a better understanding of both metrics. CPUUtilization is the total CPU utilization of the host. EngineCPUUtilization is specific to the Redis engine thread, which handles all the Redis queries. Since Redis processes commands on a single thread, on a host with 4 cores only one core will be used by Redis for query processing, so in that case the maximum CPUUtilization attributable to Redis will be about 25%.
In other words, CPUUtilization shows you the CPU resources consumed by the entire host, whereas EngineCPUUtilization shows you the CPU consumed by the particular core running the Redis engine.
Because Redis is single-threaded, on a node with two cores a 90% threshold on the engine corresponds to a host-level CPUUtilization threshold of 90/2, or 45%.
For reference, you can check out: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.WhichShouldIMonitor.html
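To compare the two metrics side by side, you can pull both from CloudWatch; a sketch with placeholder cluster and node IDs:

```python
# Sketch: compare host-level CPUUtilization with EngineCPUUtilization for an
# ElastiCache for Redis node. The cache cluster ID and node ID are placeholders.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

def latest(metric_name, cache_cluster_id, cache_node_id="0001"):
    """Most recent 1-minute average of an ElastiCache node metric."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "CacheClusterId", "Value": cache_cluster_id},
            {"Name": "CacheNodeId", "Value": cache_node_id},
        ],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None

for metric in ("CPUUtilization", "EngineCPUUtilization"):
    print(metric, latest(metric, "my-redis-001"))
```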