I have an EMR cluster that I created back in April 2020 with 1 master node (on-demand), 1 core node (spot), and multiple task nodes (spot). I have been using it actively and things had been going well until a few days ago. For some reason, the cluster has gone into "Waiting" mode while it tries to find Spot instances for the core node. I have the provisioning timeout set to "After 300 minutes, switch to on-demand instances", and I see the status "Resizing" for the core node.
I don't know what to do next. I am on Basic Support with AWS. I would really prefer not to terminate this cluster and rebuild it, as I spent a lot of time putting my personal configuration touches on it. What could I do better to prevent this in the future?
Resizing an EMR cluster has a lot of issues. It is not advisable to run everything on Spot instances from a single instance type. I would suggest a few workarounds.
Stop the resize operation, wait to see what changes, and read the error message. Then issue the resize request again.
If you don't use HDFS (say, you do everything from S3), kill the cluster and create a new one.
If you are using HDFS, I suggest you go through https://aws.amazon.com/blogs/big-data/best-practices-for-resizing-and-automatic-scaling-in-amazon-emr/ .
When creating a cluster that requires only Spot capacity, use 2 or 3 boxes, for example:
a. 10% of the On-Demand value - 1 instance
b. 20% of the On-Demand value - 1 instance
I suggest you keep 1 On-Demand instance if budget permits, and then try again.
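For reference, the fallback behaviour mentioned in the question ("after 300 minutes, switch to On-Demand") can be expressed with instance fleets. Below is only a rough sketch with the AWS CLI; the cluster name, release label, instance types, and subnet are placeholders, not the original cluster's configuration.

# Sketch: core fleet asks for Spot first and switches to On-Demand if no Spot
# capacity is found within 300 minutes. All names and IDs are placeholders.
aws emr create-cluster \
  --name "spot-with-fallback" \
  --release-label emr-5.30.0 \
  --service-role EMR_DefaultRole \
  --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetIds=['subnet-ab12345c'] \
  --instance-fleets \
    InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=m5.xlarge}'] \
    InstanceFleetType=CORE,TargetSpotCapacity=1,InstanceTypeConfigs=['{InstanceType=m5.xlarge}','{InstanceType=m5a.xlarge}'],LaunchSpecifications={SpotSpecification='{TimeoutDurationMinutes=300,TimeoutAction=SWITCH_TO_ON_DEMAND}'}

Listing more than one instance type in the core fleet also gives EMR more Spot pools to draw from, which is the spirit of the "2 or 3 boxes" suggestion above.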
I'm evaluating Karpenter (https://karpenter.sh/) and I wanted to know if there's a way to vertically scale down a large node with few pods. The only scaling actions seem to be triggered by either unschedulable pods or empty nodes.
Scenario: I scheduled 5 pods and the scheduler gave me one c5d.2xlarge instance, which resulted in 65% utilization (not bad). I killed 3 pods and utilization dropped, as expected, to 25%. I waited to see if an optimization process would kick in, but nothing happened (over 20 hours). The feature is not well documented; in fact, the only reference to it is in this independent article: https://blog.sivamuthukumar.com/karpenter-scaling-nodes-seamlessly-in-aws-eks
How does it work?
Observes the pod resource requests of unscheduled pods
Provisions just-in-time capacity for the node directly (groupless node autoscaling)
Terminates nodes if outdated
Reallocates the pods in nodes for better resource utilization
Am I missing something? Is there a way to do this, using Karpenter or another solution? TIA
So there's a feature request on Karpenter's GitHub project addressing this specific issue: https://github.com/aws/karpenter/issues/1091. I'll update this answer once a solution is available.
The workaround suggested by the project team was to set a short TTL on the nodes (like 1 day), forcing Karpenter to re-evaluate node utilization daily.
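A minimal sketch of that workaround, assuming the v1alpha5 Provisioner API that Karpenter exposed at the time; the provisioner name, TTL values, and discovery tags are placeholders:

# Sketch of the TTL workaround: expire nodes after ~1 day so Karpenter has to
# re-provision (and right-size) capacity for the pods that are still running.
kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 86400   # ~1 day: node is drained and replaced after this
  ttlSecondsAfterEmpty: 30        # remove nodes quickly once they are empty
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
EOF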
I have an EMR cluster that can scale up to a maximum of 10 SPOT nodes. When not being used it defaults to 1 CORE node (and 1 MASTER) to save costs, obviously. So in total it can scale up to a maximum of 11 nodes: 1 CORE + 10 SPOT.
When I run my spark job it takes a while to spin up the 10 SPOT nodes and my job ends up taking about 4hrs to complete.
I tried waiting until all the nodes were spun up, then canceled my job and immediately restarted it so that it can start using the max resources immediately, and my job took only around 3hrs to complete.
I have 2 questions:
1. Is there a way to make YARN spin up all the necessary resources before starting my job? I already specify the spark-submit parameters such as num-executors, executor-memory, executor-cores etc. during job submit.
2. I haven't done the cost analysis yet, but is it even worthwhile to do number 1 above? Does AWS charge for spin-up time, even when a job is not being run?
Would love to know your insights and suggestions.
Thank You
I am assuming you are using AWS managed scaling for this. If you can switch to custom scaling, you can set more aggressive scaling rules; you can also set the number of nodes to add or remove on each scale-out and scale-in, which will help you converge faster to the required number of nodes.
The only downside to custom scaling is that it takes 5 minutes to trigger.
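As a rough sketch of what a custom scaling rule can look like with the AWS CLI (the cluster ID, instance-group ID, capacity limits, and thresholds below are placeholders, not values from the question):

# Sketch: attach a custom automatic scaling policy to an instance group so it
# adds 3 nodes at a time when available YARN memory drops below 15%.
aws emr put-auto-scaling-policy \
  --cluster-id j-XXXXXXXXXXXXX \
  --instance-group-id ig-XXXXXXXXXXXXX \
  --auto-scaling-policy '{
    "Constraints": {"MinCapacity": 1, "MaxCapacity": 10},
    "Rules": [{
      "Name": "ScaleOutOnLowMemory",
      "Action": {
        "SimpleScalingPolicyConfiguration": {
          "AdjustmentType": "CHANGE_IN_CAPACITY",
          "ScalingAdjustment": 3,
          "CoolDown": 300
        }
      },
      "Trigger": {
        "CloudWatchAlarmDefinition": {
          "ComparisonOperator": "LESS_THAN",
          "EvaluationPeriods": 1,
          "MetricName": "YARNMemoryAvailablePercentage",
          "Namespace": "AWS/ElasticMapReduce",
          "Period": 300,
          "Threshold": 15.0,
          "Unit": "PERCENT"
        }
      }
    }]
  }'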
Is there a way to make YARN spin up all the necessary resources before starting my job?
I do not know how to achieve this. But, in my opinion, it is not worth doing. Spark is intelligent enough to do this for us.
It knows how to distribute tasks as instances come up or go away in the cluster. There is a certain Spark configuration you should be aware of to achieve this.
You should set spark.dynamicAllocation.enabled to true. There are some other relevant configurations that you can change or leave as they are.
For more detail, refer to the documentation for spark.dynamicAllocation.enabled.
Please see the documentation for your Spark version; this link is for Spark 2.4.0.
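For illustration, a spark-submit invocation with dynamic allocation turned on could look like the following sketch; the executor bounds, resources, and the application JAR are placeholder values, and on YARN with Spark 2.x the external shuffle service also needs to be enabled:

# Sketch: let Spark grow and shrink executors as task nodes join or leave the cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --conf spark.dynamicAllocation.initialExecutors=10 \
  --conf spark.shuffle.service.enabled=true \
  --executor-memory 8g \
  --executor-cores 4 \
  my-application.jar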
Does AWS charge for spin up time, even when a job is not being run?
You get charged for every second of the instance that you use, with a one-minute minimum. It does not matter whether your job is running or not; even if instances are idle in the cluster, you will have to pay for them.
Refer to these links for more detail:
EMR FAQ
EMR PRICING
Hope this gives you some idea about EMR pricing and the Spark configuration related to dynamic allocation.
I am trying to automate our Auto Scaling setup in AWS.
Normal deployment for us is just checking code into GitLab and creating a new tag; GitLab CI/CD then runs automatically and pushes the code up to Artifactory. We have 3 instances; we autoscale up to 6 and then reduce back to 3. But then we just need to scale out our Auto Scaling group in AWS so it builds 3 new servers that pull the new code down, and then scale the ASG back in, killing the 3 old servers. I want to automate this process. Can anyone help me figure out whether we can achieve this after deployment?
There is a new feature called "Instance Refresh" that will probably do what you want. You just need to call the StartInstanceRefresh API and give the MinHealthy% (which determines the batch size) and the warmup time (which determines the time between batches).
It will terminate and launch the instances in a batch at about the same time, so unless you're OK with a bit of downtime, probably leave the MinHealthy% at the default of 90% so that it only does 1 instance per batch.
https://docs.aws.amazon.com/cli/latest/reference/autoscaling/start-instance-refresh.html
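A minimal sketch of kicking off an instance refresh from the CLI; the Auto Scaling group name and warmup time below are placeholders:

# Sketch: roll the Auto Scaling group one instance at a time.
# MinHealthyPercentage=90 on a 3-instance group means only 1 instance is
# replaced per batch; InstanceWarmup is the pause between batches.
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-app-asg \
  --preferences '{"MinHealthyPercentage": 90, "InstanceWarmup": 300}'

# Check progress of the refresh (placeholder group name).
aws autoscaling describe-instance-refreshes \
  --auto-scaling-group-name my-app-asg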
I'm trying to optimize cost for my project; for some valid reasons we're running it on very expensive instances.
To the best of my knowledge, Amazon charges by the hour. For instance, if I'm running my EC2 instance for 1 hour and 4 minutes, I'll be charged for 2 hours.
What would be the best way to shut down instance closest to the next billing cycle, but not exceeding current one?
I was trying to do this based on uptime, but there is some difference between aws billing and uptime value.
I'm looking to use a watchdog sitting on the instance itself, so I can pass parameters during provisioning and have it shut itself down after, say, 2 full billing cycles.
You can get the time that Amazon starts billing from the EC2 instance (assumes you have jq installed)
curl -s http://169.254.169.254/latest/dynamic/instance-identity/document/ | jq .pendingTime
and you could run a shell script once a minute to shut down after, say 58 minutes.
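A rough sketch of such a watchdog, run from cron every minute; the 58-minute cutoff and the shutdown call are assumptions rather than part of the original answer:

#!/bin/bash
# Sketch: shut the instance down shortly before the next full hour of uptime.
# Assumes jq is installed; 58 minutes is an arbitrary safety margin.
PENDING=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document/ | jq -r .pendingTime)
START=$(date -d "$PENDING" +%s)
NOW=$(date +%s)
# Minutes elapsed within the current billing hour
MINUTES=$(( ((NOW - START) / 60) % 60 ))
if [ "$MINUTES" -ge 58 ]; then
    shutdown -h now
fi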
But this is a pain. If your processing can handle an instance being interrupted, you should look at using Spot instances, perhaps with a fixed duration. This allows you to run at a reduced price for a known period of time without any additional costs from running over.
If your workload is complete before the full hour, stop/terminate your instance right away when the work is complete. No need to keep the instance idle for the remainder of the hour.
The only time this may not be efficient is if you may have more work coming in before the full hour, and then you want to keep it running to process that new work. But that will only be the case if work is sporadic. And if it is sporadic, then it just may be better to keep it running.
I have a service running on AWS EC2 Container Service (ECS). My setup is a relatively simple one. It operates with a single task definition and the following details:
Desired capacity set at 2
Minimum healthy set at 50%
Maximum available set at 200%
Tasks run with 80% CPU and memory reservations
Initially, I am able to get the necessary EC2 instances registered to the cluster that holds the service without a problem. The associated task then starts running on the two instances. As expected – given the CPU and memory reservations – the tasks take up almost the entirety of the EC2 instances' resources.
Sometimes, I want the task to use a new version of the application it is running. In order to make this happen, I create a revision of the task, de-register the previous revision, and then update the service. Note that I have set the minimum healthy percentage to require 2 * 0.50 = 1 instance running at all times and the maximum healthy percentage to permit up to 2 * 2.00 = 4 instances running.
Accordingly, I expected 1 of the de-registered task instances to be drained and taken offline so that 1 instance of the new revision of the task could be brought online. Then the process would repeat itself, bringing the deployment to a successful state.
Unfortunately, the cluster does nothing. In the events log, it tells me that it cannot place the new tasks, even though the process I have described above would permit it to do so.
How can I get the cluster to perform the behavior that I am expecting? I have only been able to get it to do so when I manually register another EC2 instance to the cluster and then tear it down after the update is complete (which is not desirable).
I have faced the same issue, where tasks would get stuck with no space to place them. The below snippet from the AWS doc on updating a service helped me make the following decision.
If your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.
We need cluster resources / container instances available so the new tasks can start and the older ones can drain.
These are the things I do:
Before doing a service update, add about 20% capacity to your cluster. You can use the ASG (Auto Scaling group) command line and add 20% to the desired capacity. This way you will have some additional instances during the deployment (see the CLI sketch at the end of this answer).
Once you have the instance the new tasks will start spinning up quickly and the older one will start draining.
But does this mean I will have extra container instances?
Yes, during the deployment you will add some instances, and as the older tasks drain those extra instances will hang around. The way to remove them is:
Create a MemoryReservationLow alarm (~70% threshold in your case) for around 25 minutes (a longer duration, to be sure that we really have over-provisioned). Since the reservation will drop once those extra servers are no longer being used, they can then be removed.
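As a rough illustration of both steps with the AWS CLI (the ASG name, capacities, cluster name, and scale-in policy ARN are placeholders, not values from the question):

# Sketch 1: temporarily bump the ECS cluster's ASG by ~20% before the deployment.
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-ecs-asg \
  --desired-capacity 12        # e.g. 10 instances + 20%

# Sketch 2: alarm when MemoryReservation stays below ~70% for ~25 minutes,
# wired to a scale-in policy so the extra instances are removed afterwards.
aws cloudwatch put-metric-alarm \
  --alarm-name MemoryReservationLow \
  --namespace AWS/ECS \
  --metric-name MemoryReservation \
  --dimensions Name=ClusterName,Value=my-ecs-cluster \
  --statistic Average \
  --period 300 \
  --evaluation-periods 5 \
  --threshold 70 \
  --comparison-operator LessThanThreshold \
  --alarm-actions <scale-in-policy-arn>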
I have seen this before. If your port mapping is attempting to map a static host port to the container within the task, you need more cluster instances.
Also this could be because there is not enough available memory to meet the memory (soft or hard) limit requested by the container within the task.
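To illustrate the static-port point: with a fixed hostPort, only one copy of the task can bind that port on a given instance, so a rolling update has nowhere to place the new task. Below is a hedged sketch of switching to a dynamic host port (hostPort 0) in the container definition; the family, image, and resource values are placeholders:

# Sketch: use a dynamic host port so several copies of the task can share one
# container instance; pair this with an ALB target group for routing.
aws ecs register-task-definition --cli-input-json '{
  "family": "my-service",
  "containerDefinitions": [{
    "name": "web",
    "image": "my-repo/my-service:latest",
    "memoryReservation": 512,
    "portMappings": [{
      "containerPort": 8080,
      "hostPort": 0,
      "protocol": "tcp"
    }]
  }]
}'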