terraform autoscaling group destroy timeouts - amazon-web-services

Is there any way to change the terraform default timeouts?
For example, on terraform apply I frequently time out while trying to destroy autoscaling groups:
module.foo.aws_autoscaling_group.bar (deposed #0): Still destroying... (10m0s elapsed)
Error applying plan:
1 error(s) occurred:
* aws_autoscaling_group.bar (deposed #0): group still has 1 instances
If I re-run the terraform apply, it works. It seems like the timeout is 10 minutes -- I'd like to double the time so that it finishes reliably. Alternatively, is there a way to get the auto scaling groups to delete faster?

You can override the default by adding a timeouts block to the specific resource in your Terraform configuration:
timeouts {
  create = "60m"
  delete = "2h"
}
https://www.terraform.io/docs/configuration/resources.html
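For instance, applied to the autoscaling group from the question, it might look like the sketch below. Note that which timeouts (create/update/delete) a given resource actually supports is listed on that resource's documentation page, so check there first:
resource "aws_autoscaling_group" "bar" {
  # ... existing arguments ...

  timeouts {
    delete = "20m"
  }
}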

I had the same problem when trying to delete an autoscaling group with terraform destroy.
I solved it by adding the following lines to my resource definition:
timeouts {
  delete = "60m"
}

Related

Erroneous Aws::ECS::Errors::ClusterNotFoundException — what is happening?

I have an ECS cluster, an active service for it, and a task for this service. I am trying to call ListTasks with the Ruby AWS SDK.
When there is no active task, it comes through with an empty list, as expected. But when there is a running task, I get the Aws::ECS::Errors::ClusterNotFoundException.
I tried calling ListClusters, and got a successful response:
{:cluster_arns=>["arn:aws:ecs:<region>:<account_num>:cluster/<cluster_name>"], :next_token=>nil}.
I also tried calling DescribeClusters, and got a successful response as well: {:clusters=>[{:cluster_arn=>"arn:aws:ecs:<region>:<account_num>:cluster/<cluster_name>", :cluster_name=>"<cluster_name>", :status=>"ACTIVE", :registered_container_instances_count=>0, :running_tasks_count=>1, :pending_tasks_count=>0, :active_services_count=>1, :statistics=>[], :tags=>[], :settings=>[{:name=>"containerInsights", :value=>"enabled"}], :capacity_providers=>["FARGATE_SPOT", "FARGATE"], :default_capacity_provider_strategy=>[{:capacity_provider=>"FARGATE", :weight=>1, :base=>0}], :attachments=>nil, :attachments_status=>nil}], :failures=>[]}.
In addition, I regularly call DescribeServices and UpdateService for the same cluster name successfully.
But the error persists for ListTasks.
Has anyone encountered something similar? What do you think is happening?
UPDATE: The code that generates the error:
@@ecs_client = Aws::ECS::Client.new(
  region: Aws.config[:region],
  access_key_id: Aws.config[:credentials].access_key_id,
  secret_access_key: Aws.config[:credentials].secret_access_key
)
...
tasks = @@ecs_client.list_tasks({ cluster: '<cluster_name>' })
If you do not specify a cluster when calling the "ListTasks" API, the "default" cluster is assumed. Also, double check the region used in your script.
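For example, a minimal sketch in Ruby (the region string is a placeholder; use whichever region the cluster actually lives in):
ecs_client = Aws::ECS::Client.new(region: 'us-east-1') # must match the cluster's region
tasks = ecs_client.list_tasks({ cluster: '<cluster_name>' })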

How to ensure CodeDeploy keeps all auto-scaling group instances in maintenance while purging cache?

I am working on a deployment process involving the following:
Gitlab runner pushes a Magento 1.9 app to a S3 bucket
Gitlab runner deploys the app using CodeDeploy
CodeDeploy deploys the application on all instances that are up in the auto-scaling group
The issue with this is that CodeDeploy events do not necessarily happen at the exact same second and that might cause issues with the way we reload the application cache.
Our application should clear the cache only when all active instances are in maintenance to avoid getting new http requests (otherwise it might throw a "Front controller reached 100 router match iterations" exception).
We thought of using lock files on a shared folder across all instances but that sounds very old-school.
Any idea on how to ensure all instances are in maintenance for the clear cache to happen would be much appreciated!
I don't know much about Magento, but it sounds like there are a few things that need to happen:
1. Put instances into maintenance mode
Inside of your appspec, you presumably use the ApplicationStop lifecycle hook or another hook to put an individual instance into maintenance mode.
2. Wait for all instances to go into maintenance mode and purge cache
Assuming you put the hosts into maintenance mode in ApplicationStop, you could use another lifecycle hook to wait for all instances to go into maintenance mode. For example, you could have a script in BeforeInstall that checks if all instances got past ApplicationStop and starts the purge if they did (i.e. it's the last one).
Here's a sketch of that check as a hook script (Python with boto3; purge_cache() is a placeholder for your own purge logic, and DEPLOYMENT_ID is one of the environment variables the CodeDeploy agent sets for hook scripts):
# BeforeInstall or some other hook script
import os
import sys

import boto3

codedeploy = boto3.client('codedeploy')
deployment_id = os.environ['DEPLOYMENT_ID']  # set by the CodeDeploy agent

# 1. Get the instance list from CodeDeploy
instance_ids = codedeploy.list_deployment_instances(
    deploymentId=deployment_id)['instancesList']

# 2. Check whether all of the instances have completed ApplicationStop
for instance_id in instance_ids:
    summary = codedeploy.get_deployment_instance(
        deploymentId=deployment_id,
        instanceId=instance_id)['instanceSummary']
    stop_event = next(e for e in summary['lifecycleEvents']
                      if e['lifecycleEventName'] == 'ApplicationStop')
    if stop_event['status'] == 'Failed':
        # Abort the deployment or handle some other way
        sys.exit(1)
    if stop_event['status'] != 'Succeeded':
        # Not done yet; let the last instance kick off the purge instead
        sys.exit(0)

# Every instance got past ApplicationStop, i.e. this one is the last
purge_cache()
3. Wait for purge to complete and exit maintenance mode
Using a different lifecycle hook, probably ApplicationStart or ValidateService, wait for your purge to complete and exit maintenance mode. As long as the purge takes less than 1 hour, your instances shouldn't time out the lifecycle hooks.
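Tying the three steps together, the appspec might look roughly like this (a sketch; the script names and destination path are hypothetical placeholders for your own maintenance, purge, and validation logic):
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/magento
hooks:
  ApplicationStop:
    - location: scripts/enter_maintenance.sh
      timeout: 300
  BeforeInstall:
    - location: scripts/purge_cache_if_last.py
      timeout: 3600
  ValidateService:
    - location: scripts/exit_maintenance.sh
      timeout: 3600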
If you wanted to do this via CodeDeploy, I'd do something like the above. Of course, you could manage this outside of the deployment and have some sort of code running on your instances that manages all of that.

Terraform : How to autoscale managed instance group in GCP with stackdriver metric

I want to autoscale an instance group based on the Pub/Sub metric pubsub.googleapis.com/subscription/num_undelivered_messages. For every 2 undelivered messages, I want to spin up a new worker instance. Doing this manually through the GUI is quite easy.
Now I have written Terraform code to automate this so that it is repeatable.
I looked through the Terraform documentation for the autoscaler and couldn't find a way to do this, even though it mentions using customMetricUtilizations. I just couldn't make it work.
Here is my autoscaler part:
resource "google_compute_autoscaler" "foobar" {
  name   = "scaler"
  zone   = "${var.region}-a"
  target = "${google_compute_instance_group_manager.appserver.self_link}"

  autoscaling_policy = {
    max_replicas    = 10
    min_replicas    = 0
    cooldown_period = 60

    metric {
      name   = "pubsub.googleapis.com/subscription/num_undelivered_messages"
      target = "2"
      type   = "GAUGE"
    }
  }
}
Could anybody please help me figure this out?

cloudformation error: Received 1 FAILURE signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement

Good day, I am using the AWS quick start for linux-bastion.
After changing the QSS3BucketName and QSS3KeyPrefix parameters to the ones in my account, it throws the error:
Received 1 FAILURE signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement
Everything else in the stack gets created: the script is pulled from the S3 bucket and the user data ends up on the instance. The only issue is that the Auto Scaling group fails to create, despite the instance launching and the user data running. My guess is that something is happening in the S3AccessCreds resource which I am not able to fathom.
What could be the catch? I would really appreciate any help, thank you.
I had a similar problem. You need to make sure that the bastion_bootstrap.sh file is placed at the correct location in your bucket:
your-bucket-name/your-prefix/scripts/bastion_bootstrap.sh
And, of course, that the bastion_bootstrap script itself doesn't throw any errors. If it does, you'll see them in /var/log/cfn-init.log
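For example, you can verify the object actually exists at that key with the AWS CLI (bucket name and prefix are the placeholders from above):
aws s3 ls s3://your-bucket-name/your-prefix/scripts/bastion_bootstrap.sh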

How to verify volume successfully created/attached in boto3?

I'm using boto3 client.create_volume and client.attach_volume APIs, but the return values are dictionaries, and the key State within the dictionary is creating for create_volume, and attaching for attach_volume. Is there any way to check if the volume is successfully created/attached within boto3?
Fortunately, boto3 has a concept called Waiters that can do the waiting for you!
See: EC2.Waiter.VolumeInUse
Polls EC2.Client.describe_volumes() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
For those using ec2 client (ec2 = boto3.client('ec2')), you can do
ec2.get_waiter('volume_available').wait(VolumeIds=[new_volume['VolumeId']])
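Putting it together, a minimal sketch of create-then-attach with waiters (the availability zone, size, instance ID, and device name are placeholder values):
import boto3

ec2 = boto3.client('ec2')

# create_volume returns while the volume is still in the 'creating' state
new_volume = ec2.create_volume(AvailabilityZone='us-east-1a', Size=8)

# Block until the volume reaches 'available' (raises WaiterError on timeout)
ec2.get_waiter('volume_available').wait(VolumeIds=[new_volume['VolumeId']])

# Attach, then wait for the attachment to complete ('in-use' state)
ec2.attach_volume(VolumeId=new_volume['VolumeId'],
                  InstanceId='i-0123456789abcdef0', Device='/dev/sdf')
ec2.get_waiter('volume_in_use').wait(VolumeIds=[new_volume['VolumeId']])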
See describe_volumes
Pass your volume ID, and describe_volumes returns information about:
Creation State:
'State': 'creating'|'available'|'in-use'|'deleting'|'deleted'|'error'
Attachment State:
'State': 'attaching'|'attached'|'detaching'|'detached'
and a lot more information about your volume.
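A minimal check along these lines might look like this (the volume ID is a placeholder):
import boto3

ec2 = boto3.client('ec2')
volume = ec2.describe_volumes(VolumeIds=['vol-0123456789abcdef0'])['Volumes'][0]

print(volume['State'])                # creation state, e.g. 'available'
for attachment in volume['Attachments']:
    print(attachment['State'])        # attachment state, e.g. 'attached'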