Set Desired Capacity error during the bootstrap of the instance - amazon-web-services

I've created an ASG with a min size and desired capacity set to 1. The EC2 instance is bound to an Application Load Balancer. I use Ignition to define the user data of the Launch Configuration. In Ignition I define a script that executes these two commands:
# Set the ASG Desired Capacity - get CoreOS metadata
ASG_NAME=$(/usr/bin/docker run --rm --net=host \
  "$AWSCLI_IMAGE" aws autoscaling describe-auto-scaling-instances \
  --region="$COREOS_EC2_REGION" --instance-ids="$COREOS_EC2_INSTANCE_ID" \
  --query 'AutoScalingInstances[].AutoScalingGroupName' --output text)

echo "Check desired capacity of Auto Scaling group..."
# shellcheck disable=SC2154,SC2086
/usr/bin/docker run --rm --net=host \
  $AWSCLI_IMAGE aws autoscaling set-desired-capacity \
  --region="$COREOS_EC2_REGION" --auto-scaling-group-name "$ASG_NAME" \
  --desired-capacity 3 \
  --honor-cooldown
The problem is that I get a ScalingActivityInProgress error, so I can't change the desired capacity.
First I'd like to understand the root cause. Is it maybe because the instance is not yet healthy in the ALB when I run the above commands?

Solved by removing the --honor-cooldown parameter from the request. With --honor-cooldown, set-desired-capacity is rejected with ScalingActivityInProgress while a cooldown period is in effect, and right after the instance's initial launch the group is still inside that cooldown.
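For reference, a minimal sketch of the corrected call (same variables as above, minus the cooldown flag):
/usr/bin/docker run --rm --net=host \
  $AWSCLI_IMAGE aws autoscaling set-desired-capacity \
  --region="$COREOS_EC2_REGION" --auto-scaling-group-name "$ASG_NAME" \
  --desired-capacity 3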

Related

Docker service update downtime

I have an AWS EC2 instance with a docker service.
The service has just 1 container, and when I update the service (changing the image), I have downtime (about 1 minute).
This is my docker service create code:
docker service create \
  --name service-$IMAGE_NAME \
  --publish 80:80 \
  --env ENVIRONMENT=$(cat /etc/service_environment) \
  --env-file=/etc/.env \
  --replicas=1 \
  --update-failure-action rollback \
  --update-order start-first \
  $ECR_IMAGE
Here is the update code:
# Pull image from private ECR repository
docker pull $IMAGE

docker service update \
  --force \
  --image $IMAGE:latest \
  --update-failure-action rollback \
  --update-order start-first \
  service-$IMAGE_NAME
Why does this happen? What's wrong?
Thank you
Changing the image means stopping the existing container and starting a new one; that's why the service is down for a while, in your case about 1 minute, which is the time the new container needs to start. You can make this seamless by using 2 EC2 instances and a load balancer: update one instance while traffic points at the other, then update the second one once the first update has succeeded.
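A minimal sketch of that approach with the AWS CLI, assuming the two instances sit behind an ALB target group (the TARGET_GROUP_ARN variable and the instance ID are placeholders):
# Take the first instance out of rotation and wait for connection draining
aws elbv2 deregister-targets \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --targets Id=i-0123456789abcdef0
aws elbv2 wait target-deregistered \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --targets Id=i-0123456789abcdef0

# ...update the service on that instance, then put it back in rotation
aws elbv2 register-targets \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --targets Id=i-0123456789abcdef0
aws elbv2 wait target-in-service \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --targets Id=i-0123456789abcdef0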

How to get Launch Configuration through ASG

I am trying to get launch configuration details via the aws-cli. I know there is a command aws autoscaling describe-launch-configurations --launch-configuration-names my-launch-config, but I don't know the launch configuration name; I only have the ASG (Auto Scaling group) name. In the AWS console I can find the launch configuration name on the ASG detail page, but how can I do this with the aws-cli?
In other words, I want to get the launch configuration details with only the ASG name as input. What should the aws-cli command(s) be?
You can get the LC name using describe-auto-scaling-groups and then use describe-launch-configurations to get its details:
asg_name="dddd"

launch_config_name=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names ${asg_name} \
  --query "AutoScalingGroups[0].LaunchConfigurationName" \
  --output text)
echo ${launch_config_name}

aws autoscaling describe-launch-configurations \
  --launch-configuration-names ${launch_config_name}
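The two calls can also be chained into a single command (the group name my-asg below is a placeholder):
aws autoscaling describe-launch-configurations \
  --launch-configuration-names "$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names my-asg \
    --query 'AutoScalingGroups[0].LaunchConfigurationName' \
    --output text)"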

How to keep Google Dataproc master running?

I created a cluster on Dataproc and it works great. However, after the cluster is idle for a while (~90 min), the master node automatically stops. This happens to every cluster I create. I see there is a similar question here: Keep running Dataproc Master node
It looks like a problem with the initialization actions, but that post does not give me enough info to fix the issue. Below are the commands I used to create the cluster:
gcloud dataproc clusters create $CLUSTER_NAME \
  --project $PROJECT \
  --bucket $BUCKET \
  --region $REGION \
  --zone $ZONE \
  --master-machine-type $MASTER_MACHINE_TYPE \
  --master-boot-disk-size $MASTER_DISK_SIZE \
  --worker-boot-disk-size $WORKER_DISK_SIZE \
  --num-workers=$NUM_WORKERS \
  --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
  --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
  --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
  --scopes cloud-platform \
  --metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
  --optional-components=ANACONDA,JUPYTER \
  --image-version=1.3
I need the BigQuery connector, GCS connector, Jupyter and DataLab for my cluster.
How can I keep my master node running? Thank you.
As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple of ways to change this behavior:
1. Upon first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click the "Idle timeout in about ..." text to disable it; the text will change to "Idle timeout is disabled". See https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer
2. Edit the initialization action to set the environment variable, as suggested by yelsayed:
function run_datalab(){
  if docker run -d --restart always --net=host \
      -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
      -v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
    echo 'Cloud Datalab Jupyter server successfully deployed.'
  else
    err 'Failed to run Cloud Datalab'
  fi
}
Then use your custom initialization action instead of the stock gs://dataproc-initialization-actions one. It could also be worth filing a tracking issue in the GitHub repo for Dataproc initialization actions, suggesting that the timeout be disabled by default or made configurable via metadata. The auto-shutdown behavior is arguably unexpected in default usage on a Dataproc cluster, since the master also performs roles other than running the Datalab service.
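A minimal sketch of wiring that in, assuming you host the edited script in your own bucket (gs://my-bucket is a placeholder; the other flags from the question's create command stay the same):
# Upload the modified datalab.sh to a bucket you control
gsutil cp datalab.sh gs://my-bucket/datalab-no-timeout.sh

# Reference it in place of the stock datalab action when creating the cluster
gcloud dataproc clusters create $CLUSTER_NAME \
  --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://my-bucket/datalab-no-timeout.sh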

AWS Auto Scaling groups and non-ELB health checks

We have Auto Scaling groups in one of our CloudFormation stacks, with a CPU-based alarm for determining when to scale the instances.
This is great, but we recently had it scale up from one node to three, and one of those nodes failed to bootstrap via cfn-init. Once the workload reduced and the group scaled back down to one node, it killed the two good instances and left the partially bootstrapped node as the only remaining instance. This meant that we stopped processing work until someone logged in and re-ran the bootstrap process.
Obviously this is not ideal. What is the best way to notify the Auto Scaling group that a node is not healthy when it does not sit behind an ELB?
Since this is just the initial bootstrap, what I'd really like is to communicate back to the Auto Scaling group that this node failed, and have it terminated and a new node spun up in its place.
A colleague just showed me http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-configure-healthcheck.html which looks handy.
If you have your own health check system, you can use the information from your health check system to set the health state of the instances in the Auto Scaling group.
UPDATE - I managed to get this working during launch.
Here's what my UserData section for the ASG looks like:
#!/bin/bash -v
set -x
export AWS_DEFAULT_REGION=us-west-1

# Run cfn-init; if bootstrapping fails, mark this instance Unhealthy
# so the Auto Scaling group terminates and replaces it
cfn-init --region us-west-1 --stack bapi-prod --resource LaunchConfiguration -v
if [[ $? -ne 0 ]]; then
  export INSTANCE=`curl http://169.254.169.254/latest/meta-data/instance-id`
  aws autoscaling set-instance-health \
    --instance-id $INSTANCE \
    --health-status Unhealthy
fi
This can also be done as a one-liner. For example, I'm using the following in Terraform:
runcmd:
- /tmp/runcmd-puppet.sh || { export INSTANCE=`curl http://169.254.169.254/latest/meta-data/instance-id`; aws autoscaling --region eu-west-1 set-instance-health --instance-id $INSTANCE --health-status Unhealthy; }

Stop and Start Elastic Beanstalk Services

I wanted to know if there is an option to STOP Amazon Elastic Beanstalk as an atomic unit, as I can do with EC2 servers, instead of going through each service (e.g. load balancer, EC2...) and STOPping (and STARTing) them independently.
The EB command line interface has an eb stop command. Here is a little bit about what the command actually does:
The eb stop command deletes the AWS resources that are running your application (such as the ELB and the EC2 instances). However, it leaves behind all of the application versions and configuration settings that you had deployed, so you can quickly get started again. eb stop is ideal when you are developing and testing your application and don't need the AWS resources running overnight. You can get going again by simply running eb start.
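For reference, with that legacy CLI the pair of commands was simply (run from the application directory; this applies to the old eb CLI only, not v3):
eb stop   # tears down the ELB and EC2 instances, keeps versions and config
eb start  # recreates the environment from the saved settings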
EDIT:
As stated in the comment below, this is no longer a command in the new eb CLI.
If you have a load-balanced environment, you can try the following trick:
$ aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-auto-scaling-group \
    --min-size 0 --max-size 0 --desired-capacity 0
It will remove all instances from the environment but won't delete the environment itself. Unfortunately you will still pay for the Elastic Load Balancer, but usually EC2 is the most expensive part.
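To bring the environment back, restore the original sizes (the values below are examples; use whatever your group had before):
$ aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-auto-scaling-group \
    --min-size 1 --max-size 2 --desired-capacity 1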
Does it work for 0?
Yes, it does:
$ aws autoscaling describe-auto-scaling-groups --region us-east-1 \
    --auto-scaling-group-names ASG_NAME \
    --query "AutoScalingGroups[].{DesiredCapacity:DesiredCapacity,MinSize:MinSize,MaxSize:MaxSize}"
[
    {
        "MinSize": 2,
        "MaxSize": 2,
        "DesiredCapacity": 2
    }
]
$ aws autoscaling update-auto-scaling-group --region us-east-1 \
    --auto-scaling-group-name ASG_NAME \
    --min-size 0 --max-size 0 --desired-capacity 0
$ aws autoscaling describe-auto-scaling-groups --region us-east-1 \
    --auto-scaling-group-names ASG_NAME \
    --query "AutoScalingGroups[].{DesiredCapacity:DesiredCapacity,MinSize:MinSize,MaxSize:MaxSize}"
[
    {
        "MinSize": 0,
        "MaxSize": 0,
        "DesiredCapacity": 0
    }
]
Then you can check the environment status:
$ eb status -v
Environment details for: test
Application name: TEST
Region: us-east-1
Deployed Version: app-170925_181953
Environment ID: e-1234567890
Platform: arn:aws:elasticbeanstalk:us-east-1::platform/Multi-container Docker running on 64bit Amazon Linux/2.7.4
Tier: WebServer-Standard
CNAME: test.us-east-1.elasticbeanstalk.com
Updated: 2017-09-25 15:23:22.980000+00:00
Status: Ready
Health: Grey
Running instances: 0
In the Beanstalk web console you will see the following message:
INFO Environment health has transitioned from Ok to No Data.
There are no instances. Auto Scaling group desired capacity is set to zero.
eb stop is deprecated. I also had the same problem, and the only solution I could come up with was to back up the environment and then restore it.
Here's a blog post in which I'm explaining it:
http://pminkov.github.io/blog/how-to-shut-down-and-restore-an-elastic-beanstalk-environment.html
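A minimal sketch of that backup/restore flow with the v3 eb CLI (my-env and my-env-backup are placeholder names):
# Snapshot the environment's settings as a saved configuration
eb config save my-env --cfg my-env-backup

# Tear the environment down completely
eb terminate my-env

# Later, recreate it from the saved configuration
eb create my-env --cfg my-env-backup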