We set up an AWS Batch compute environment, job queue, and job definition. The minimum vCPUs for the compute environment is set to 16, so it should always have at least one EC2 instance running. It's a MANAGED environment, but it is not starting any instances, yet everything is still reporting healthy. I've looked at the troubleshooting page and nothing useful has come of it yet.
Where can I go to see what is going wrong? Is this completely a black box? If I make a mistake somewhere in my config (probably some kind of ARN/permissions problem), do I have to scan every line until I happen to see the mistake?
The answer is: look at EC2 Auto Scaling groups. There should be an Auto Scaling group named after the compute environment. Any errors from launching EC2 instances should show up in that Auto Scaling group, which is created and managed by the Batch compute environment.
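If you prefer the CLI, the same scaling activity (including launch failures and their reasons) can be pulled there; a rough sketch, where the group name has to be taken from the first command's output:

# List Auto Scaling groups; the one created by Batch is named after the compute environment
aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].AutoScalingGroupName'

# Show recent scaling activities for that group; failed launches appear here with a StatusMessage
aws autoscaling describe-scaling-activities --auto-scaling-group-name <group-name-from-above>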
I have inherited a nasty project running on AWS. The architecture is very complex and, not being an AWS expert, I'm not entirely sure where to start unpicking the structure of the project.
I've been asked to "hibernate" the entire system - meaning we can't lose any data, but all running instances can be switched off. As long as it can eventually be resurrected, the data can be stored anywhere within AWS.
From what I can tell, I think everything is controlled by ECS. There are several ECS clusters with various tasks running, and most seem to have a volume attached. I know that if I set the desired instance count for any cluster to 0, it will shut down all associated instances. But the question is, if I later set that back to the previous count, will the data come back when the instances do? Or are the volumes deleted once the EC2 instances are terminated (as is usual with a "standalone" EC2 instance)?
I'd prefer not to have to set up the entire architecture again manually in the future.
I have tried to find out whether the volumes currently in use by the ECS cluster instances will be deleted when I reduce the desired instance count to 0, but I haven't been able to find an answer.
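For a standalone instance this is controlled by the DeleteOnTermination flag on each block device mapping, which can be inspected per instance (a sketch; the instance ID is a placeholder), though I'm not sure how that interacts with the volumes ECS attaches for tasks:

# Show each attached EBS volume and whether it is deleted when the instance terminates
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].{Device:DeviceName,Volume:Ebs.VolumeId,DeleteOnTermination:Ebs.DeleteOnTermination}' \
  --output table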
I have an infrastructure that uses Amazon Elastic Beanstalk to deploy my application.
I need to scale my app by adding some spot instances, which EB does not support.
So I created a second Auto Scaling group from a launch configuration that uses spot instances.
This Auto Scaling group uses the same load balancer created by Beanstalk.
To bring up instances with the latest version of my app, I copied the user data from the original launch configuration (created by Beanstalk) to the launch configuration with spot instances (created by me).
This works fine, but:
How do I update the spot instances launched by the second Auto Scaling group when Beanstalk updates the instances it manages with a new version of the app?
Is there another way, just as easy and elegant, to use spot instances and still enjoy the benefits of Beanstalk?
UPDATE
Elastic Beanstalk has supported spot instances since 2019; see:
https://docs.aws.amazon.com/elasticbeanstalk/latest/relnotes/release-2019-11-25-spot.html
I was asking this myself and found a built-in solution in Elastic Beanstalk. It was described as follows:
Add a file under the .ebextensions folder; for our setup we've named the file spot_instance.config (the .config extension is important). Paste into it the content available at:
https://gist.github.com/rahulmamgain/93f2ad23c9934a5da5bc878f49c91d64
The value for EC2_SPOT_PRICE can be set through the Elastic Beanstalk environment configuration. To disable the usage of spot instances, just delete the variable from the environment settings.
If the environment already exists and the above settings are updated, the older Auto Scaling group will be destroyed and a new one created.
The environment then submits a request for spot instances, which can be seen under the Spot Instances tab on the EC2 dashboard.
Once the request is fulfilled, the instance is added to the new cluster and Auto Scaling group.
You can use the Spot Instance Advisor tool to ascertain the best price for the instances in use.
A price point of 30% of the original price seems like a decent level.
I personally would just bid the on-demand price for the given instance type, since that price is the upper bound of what you would be willing to pay. This reduces the likelihood of being outbid and thus of your instances being terminated.
This might not be the best approach for production systems, as it is not possible to split capacity between a number of on-demand instances and an additional number of spot instances, and there is a small chance that no spot instances are available because someone else is buying up the whole market with high bids.
For production use cases I would look into https://github.com/AutoSpotting/AutoSpotting, which actively manages all your auto-scaling groups and tries to strike a balance between the lowest prices and a configurable number or percentage of on-demand instances.
As of 25th November 2019, AWS natively supports using Spot Instances with Beanstalk.
Spot instances can be enabled in the console by going to the desired Elastic Beanstalk environment, then selecting Configuration > Capacity and changing the Fleet composition to "Spot instance enabled".
There you can also set options such as the On-Demand vs Spot percentage and the instance types to use.
More information can be found in the Beanstalk Auto Scaling Group support page
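The same fleet composition can also be applied without the console, through the aws:ec2:instances option namespace; a sketch using the CLI, where the environment name, instance types, and percentages are placeholders:

# Enable spot instances on an existing environment and set the on-demand/spot split
aws elasticbeanstalk update-environment --environment-name my-env --option-settings '[
  {"Namespace": "aws:ec2:instances", "OptionName": "EnableSpot", "Value": "true"},
  {"Namespace": "aws:ec2:instances", "OptionName": "InstanceTypes", "Value": "t3.medium,t3a.medium"},
  {"Namespace": "aws:ec2:instances", "OptionName": "SpotFleetOnDemandBase", "Value": "1"},
  {"Namespace": "aws:ec2:instances", "OptionName": "SpotFleetOnDemandAboveBasePercentage", "Value": "25"}
]'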
Here at Spotinst, we were dealing with exactly that dilemma for our customers.
As Elastic Beanstalk creates a whole stack of services (load balancers, ASGs, Route 53 entries, etc.) that are tied together, it isn't a simple task to manage Spots within it.
After a lot of research, we figured that removing the ASG will always be prone to errors, as keeping the configuration intact gets complex. Instead, we simply replicate the ASG and let our Elastigroup and the ASG live side by side, with all the scaling policies affecting only the Elastigroup and the ASG's configuration updates feeding into it as well.
With the instances running inside Elastigroup, you achieve managed Spot instances with full SLA.
Some of the benefits of running your Spot instances in Elastigroup include:
1) Our algorithm makes live choices for the best Spot markets in terms of price and availability whenever new instances spin up.
2) When an interruption happens, we predict it about 15 minutes in advance and take all the necessary steps to ensure (and insure) the capacity of your group.
3) In the extreme case that none of the markets have Spot availability, we simply fall back to an on-demand instance.
Since AWS clearly states that Beanstalk does not support spot instances out of the box, you need to tinker a bit. My customer wanted both a mixed environment (on-demand + spot) and a full-spot one. What I created for my customer was the following (I had access to the GUI only):
For the mixed env:
start the environment with a regular on-demand instance;
copy the respective launch configuration and choose spot instances during the process;
edit the Auto Scaling group, choose the launch configuration you just created, and be sure to change the Termination Policy to NewestInstance (a CLI sketch of this step follows below).
Such a setup gives you a basic on-demand fleet (not terminated on scale-in) plus some extra spot instances when required, e.g., for higher-than-usual traffic. Remember that if you terminate the environment and recreate it, all of your edits will be removed.
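A rough CLI equivalent of that ASG edit, assuming the spot launch configuration already exists (both names are placeholders):

# Point the Beanstalk-created ASG at the spot launch configuration and terminate the newest instances first on scale-in
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-beanstalk-asg \
  --launch-configuration-name my-spot-launch-config \
  --termination-policies NewestInstance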
For full spot env:
similar steps as before, with one difference: terminate the running on-demand instance and wait for the ASG to launch a new (spot) one. If you want to do this without downtime, just add one to the Desired number, wait for the extra instance to launch, and then terminate the on-demand one.
I do not know much about how AWS works since the person who set the whole thing up does not work with us anymore, and I do not specialize in Amazon at all.
I need to set up auto scaling for my EC2 instances. I am currently reading all the available tutorials to learn how, but there is one thing I cannot find at all. Auto scaling automatically starts new EC2 instances, but I cannot find anything about how to run anything on those instances.
Currently, to start our web services, we need to log into the instance, pull the code from git, and launch the whole thing with PM2. I cannot find anything about how to do all of those things automatically when an instance starts.
I think this is supposed to be basic stuff, but as I said, I know next to nothing about where to start, and I do not have much time to learn (my boss just told me it has to be done by the end of the week!)
So if anyone know where to learn this, that would be really helpful. Thanks!
You need a Launch Configuration for setting up an Auto Scaling Group (ASG). The Launch Configuration is where you define all your instance configuration, such as the type, disk size, security groups, etc. One of these settings is the AMI ID, which refers to the image to be used when launching a new instance in the ASG. So you basically need to launch a machine, install everything needed on it, create an image from it, create a launch configuration using that image, and use that launch configuration in your ASG. This way you do not need to go to the newly added servers every time. But if you want them to run the latest version of your application, you should have a job in your image that is triggered on start; this job is responsible for copying the files (e.g. compiled files) from somewhere (a deployment machine, for instance) to the newly added instance and then starting the application.
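A rough CLI outline of that flow (all names and IDs are placeholders; the AMI ID in the second step is the one returned by the first):

# 1. Create an image from an instance you have already configured by hand
aws ec2 create-image --instance-id i-0123456789abcdef0 --name my-app-ami-v1

# 2. Create a launch configuration that uses the new image
aws autoscaling create-launch-configuration \
  --launch-configuration-name my-app-lc-v1 \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --security-groups sg-0123456789abcdef0

# 3. Point the Auto Scaling group at the new launch configuration
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-configuration-name my-app-lc-v1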
The method for configuring an Amazon EC2 instance does not actually require Auto Scaling. The two main options for configuring an instance are:
Launching from a pre-configured AMI that already contains the desired software, or
Running a startup script via User Data, which executes once the instance has launched
You can choose one of the above and then test it by launching an instance via the management console or from a script that calls the AWS Command-Line Interface (CLI).
To incorporate it into Auto Scaling, configure the Auto Scaling Launch Configuration with the same parameters and then each new instance launched by Auto Scaling will automatically be configured.
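A minimal User Data sketch for the second option, matching the git + PM2 workflow from the question (the repository URL, directory, and entry point are hypothetical, and git, Node.js and PM2 are assumed to be baked into the AMI):

#!/bin/bash
# Runs once when the instance first boots, if supplied as User Data
cd /home/ec2-user
git clone https://github.com/example/my-webservice.git app   # hypothetical repository
cd app
npm install --production
pm2 start server.js --name my-webservice                     # hypothetical entry point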
Here's what I have in AWS:
Application ELB
Auto Scaling Group with 2 instances in different Availability Zones (Windows IIS servers)
Launch Config pointing to AMI_A
all associated back end stuff configured (VPC, subnets, security groups, etc.)
Everything works. However, when I need to make an update or change to the servers, I am currently manually creating a new AMI_B, creating a new LaunchConfig using AMI_B, updating the AutoScalingGroup to use the new LaunchConfig, increasing min number of instances to 4, waiting for them to become available, then decreasing the number back to 2 to kill off the old instances.
I'd really love to automate this process. Amazon gave me some links to CLI stuff, and I'm able to script the AMI creation, create the LaunchConfig, and update the AutoScalingGroup...but I don't see an easy way to script spinning up the new instances.
After some searching, I found some CloudFormation templates that look like they'd do what I want, but most do more, and it's a bit confusing to me.
Should I be exploring CloudFormation? Is there a simple guide I can follow to get started? Or should I stay with the scripting I have started?
PS - sorry if this is a repeated question. Things change frequently at AWS, so sometimes the older responses may not be the current best answers.
You have a number of options to automate the process of updating the instances in an Auto Scaling Group to a new or updated Launch Configuration:
CloudFormation
If you do want to use CloudFormation to manage updates to your Auto Scaling Group's instances, refer to the UpdatePolicy attribute of the AWS::AutoScaling::AutoScalingGroup Resource for documentation, and the "What are some recommended best practices for performing Auto Scaling group rolling updates?" page in the AWS Knowledge Center for more advice.
If you'd also like to script the creation/update of your AMI within a CloudFormation resource, see my answer to the question, "Create AMI image as part of a cloudformation stack".
Note, however, that CloudFormation is not a simple tool: it's a complex, relatively low-level service for orchestrating AWS resources, and migrating your existing scripts to it will likely take some time investment due to its steep learning curve.
Elastic Beanstalk
If simplicity is most important, then I'd suggest you evaluate Elastic Beanstalk, which also supports both rolling and immutable updates during deployments, in a more fully managed, console-oriented, platform-as-a-service environment. Refer to my answer to the question, "What is the difference between Elastic Beanstalk and CloudFormation for a .NET project?" for further comparisons between CloudFormation and Elastic Beanstalk.
CodeDeploy
If you want a solution for updating instances in an auto-scaling group that you can plug into existing scripts, AWS CodeDeploy might be worth looking into. You install an agent on your instances, then trigger deployments through the API/CLI/Console and it manages deploying application updates to your fleet of instances. See Deploy an Application to an Auto Scaling Group Using AWS CodeDeploy for a complete tutorial. While CodeDeploy supports 'in-place' deployments and 'blue-green' deployments (see Working With Deployments for details), I think this service assumes an approach of swapping out S3-hosted application packages onto a static base AMI rather than replacing AMIs on each deployment. So it might not be the best fit for your AMI-swapping use case, but perhaps worth looking into anyway.
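For example, once the agent is installed and a deployment group targets the Auto Scaling group, a deployment can be triggered from the CLI; a sketch in which the application name, deployment group, bucket, and key are placeholders:

aws deploy create-deployment \
  --application-name MyApp \
  --deployment-group-name MyApp-ASG-DeploymentGroup \
  --s3-location bucket=my-deploy-bucket,key=builds/myapp-123.zip,bundleType=zip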
You want a custom Termination policy on the Auto Scaling Group.
OldestLaunchConfiguration. Auto Scaling terminates instances that have the oldest launch configuration. This policy is useful when you're updating a group and phasing out the instances from a previous configuration.
To customize a termination policy using the console:
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
On the navigation pane, choose Auto Scaling Groups.
Select the Auto Scaling group.
For Actions, choose Edit.
On the Details tab, locate Termination Policies and choose one or more termination policies. If you choose multiple policies, list them in the order that you would like them to apply. If you use the Default policy, make it the last one in the list.
Choose Save.
On the CLI
aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --termination-policies "OldestLaunchConfiguration"
https://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
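With that policy in place, the scale-out/scale-in step from the question can be scripted as well; a rough sketch (group and launch configuration names are placeholders, and the group's max size must allow the temporary doubling):

# Switch the group to the new launch configuration
aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --launch-configuration-name my-new-lc

# Double the capacity so new-AMI instances come up alongside the old ones
aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 4

# Poll until the new instances are InService
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg \
  --query 'AutoScalingGroups[0].Instances[].[InstanceId,LifecycleState]' --output table

# Scale back; with OldestLaunchConfiguration first, the old instances are the ones terminated
aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 2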
We use Ansible's ec2_asg module for this; it has replace_all_instances and replace_batch_size settings for exactly that purpose. Per the documentation:
In a rolling fashion, replace all instances that used the old launch configuration with one from the new launch configuration.
It increases the ASG size by C(replace_batch_size), waits for the new instances to be up and running.
After that, it terminates a batch of old instances, waits for the replacements, and repeats, until all old instances are replaced.
Once that's done the ASG size is reduced back to the expected size.
If you provide target_group_arns, module will check for health of instances in target groups before going to next batch.
Edit: in order to maintain desired number of instances, we first set min to desired.
I am writing a django app which I plan on deploying to AWS via Elastic Beanstalk. I am trying to understand why I would need to specify 'leader_only' for a container command I want to run for my app. More details about this can be found here.
It says:
Additionally, you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.
If I have several instances running my app because I want to scale it, wouldn't using 'leader_only' run the command on only one instance, and not affect the rest? I am probably misunderstanding the purpose of it, but that seems non-ideal because the environment in the leader may differ from the other instances, and the end user may get different results depending on which instance they happen to connect to.
From a technical point of view, an Elastic Beanstalk environment is backed by an Auto Scaling group, and when you deploy something you need to assume that your commands can potentially be executed simultaneously on several EC2 instances.
The main goal of the leader_only option is to make sure that your commands are executed on only one EC2 instance. It is useful for tasks such as running database migration scripts, creating a database, etc., which should be executed just once, on one instance. So leader_only is just a marker that certain commands will be executed on that instance only.
However, keep in mind that the leader attribute is set once, when your environment is created. If the leader dies and is replaced by a new instance, you can end up with no leader at all in the Auto Scaling group.
I've done considerable testing of this recently. Both leader_only and EB_IS_COMMAND_LEADER. Both Apache 1 and Apache 2 setups.
The two named values above can be found in many discussions, guides and documents, but the situation is basically this:
You cannot trust being able to reliably detect a leader in a multiple EC2 instance environment, except during deployment and scale up
That means you cannot use the testing of either of the values above to confirm a command will run on exactly one (not zero, not 2+) instance as part of a cron job or scheduled task.
Recent improvements and changes to the way leader status is managed may well mean that a leader is always available during deployments and scale up, but at other times, including after instance replacement, there may not be a leader instance to be found.
There are two main options available if you really need to only run a scheduled task once while managing multiple instances.
A worker environment specifically for scheduled tasks, or another external service like Lambda with EventBridge (CloudWatch Events)
Set up the crons to run across all instances via the deployment configs, and include a small amount of code that runs before the cron job, connects to the AWS API, gets a list of the current instances, and checks the ID of the first one returned against its own ID to decide whether it should run; see the sketch below.
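A sketch of that pre-check, assuming the instance profile allows autoscaling:DescribeAutoScalingGroups and that the group name is passed in (both the group name and the task script are placeholders):

#!/bin/bash
# Run the scheduled task only if this instance is the "first" InService instance in the ASG
ASG_NAME="$1"
# IMDSv1 metadata call shown for brevity; IMDSv2 would need a session token
MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

FIRST_ID=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[0].Instances[?LifecycleState==`InService`].InstanceId' \
  --output text | tr '\t' '\n' | sort | head -n 1)

if [ "$MY_ID" = "$FIRST_ID" ]; then
  /usr/local/bin/run-scheduled-task   # placeholder for the actual cron job
fi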