I have a Managed Instance Group of Google Compute Engine VMs (based on a template with container deployment on Container-Optimized OS). The MIG is regional (multi-zoned).
I can release an updated container image (docker build, docker tag, docker push), and then I'd like to restart all VMs in the MIG one by one so that they pick up the updated container (I'm not sure if there's a simpler/better alternative to refresh a VM's attached container). But I also want to introduce a slight delay (say 60 seconds) between each VM's restart, so that only one or two VMs are unavailable during their restart.
What are some ways to do this programmatically (either via gcloud CLI or their API)?
I tried a rolling restart of the MIG, with maximum unavailable and minimum wait time flags set:
gcloud beta compute instance-groups managed rolling-action restart MIG_NAME \
--project="..." --region="..." \
--max-unavailable=1 --min-ready=60
... but it returns an error:
ERROR: (gcloud.beta.compute.instance-groups.managed.rolling-action.restart) Could not fetch resource:
- Invalid value for field 'resource.updatePolicy.maxUnavailable.fixed': '1'. Fixed updatePolicy.maxUnavailable for regional managed instance group has to be either 0 or at least equal to the number of zones.
Is there a way to perform one-by-one instance restarts with a slight delay in between each action?
Unfortunately the MIGs don't handle this use case for regional deployments as of Jan 2023. You can, however, orchestrate the rolling restart yourself, along the lines of the sketch below (the health check is left as a placeholder):
# Iterate over the instances currently in the MIG
for INSTANCE in $(gcloud compute instance-groups managed list-instances MIG_NAME \
    --project="..." --region="..." --format="value(instance.basename())"); do
  # Force-restart this single instance
  gcloud compute instance-groups managed update-instances MIG_NAME \
      --project="..." --region="..." \
      --instances="$INSTANCE" --minimal-action=restart \
      --most-disruptive-allowed-action=restart
  # Wait before touching the next instance
  sleep 60
  # if the container on $INSTANCE is not working correctly:
  #   break and alert the operator
done
Related
I plan to use Compute Engine with containers, but every time I update the container image via gcloud compute instances update-container ... it takes some (down)time stopping and preparing the instance, which causes downtime in the production environment of my application. What pipeline or cloud strategy would you put in place to mitigate this behaviour?
From Updating a container on a VM instance docs,
When you update a VM running a container, Compute Engine performs two steps:
1. Updates the container declaration on the instance. Compute Engine stores the updated container declaration in instance metadata under the gce-container-declaration metadata key.
2. Stops and restarts the instance to actuate the updated configuration, if the instance is running. If the instance is stopped, it updates the container declaration and keeps the instance stopped. The VM instance downloads the new image and launches the container on VM start.
To avoid application downtime, make use of a Managed Instance Group.
Managed instance groups maintain high availability of your applications by proactively keeping your instances available, meaning in the Running state. A MIG automatically recreates an instance that is not Running.
So, deploy a container on a managed instance group and update the MIG to a new version of the container image; a rough gcloud sketch follows.
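As a sketch (the template, image, MIG and region names below are placeholders, not taken from the question), updating the container in a MIG usually means creating a new instance template that points at the new image and then starting a rolling update:

# Create a new template pointing at the updated container image
gcloud compute instance-templates create-with-container my-app-template-v2 \
    --container-image=gcr.io/MY_PROJECT/my-app:v2

# Roll the MIG over to the new template gradually
gcloud compute instance-groups managed rolling-action start-update MY_MIG \
    --version=template=my-app-template-v2 \
    --max-surge=1 --max-unavailable=0 \
    --region=MY_REGION

The rolling update replaces instances a few at a time, so the group never loses more capacity than --max-unavailable allows.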
I have a Google VM running Windows Server 2012 R2, and the server is shutting down daily. I need to stop that; is there any setting to change in the Google Console?
Make sure that Preemptibility is off. Other possibilities include:
a billing problem
someone else in your organization having set that configuration
another mechanism such as a Cloud Scheduler job
Please check all of the above possibilities.
There are a few things that could be checked; an instance rebooting itself can be triggered by the availability policies, which by default are configured as follows:
Preemptibility: Off (recommended). If this is On, the VM you are running is a preemptible VM and the instance will be terminated after 24 hours.
On host maintenance: Migrate VM instance (recommended). In case of migrateOnHostMaintenance or hostError events your instance is moved to a new host, because the host your VM is on needs an update or has had an error; the only other option is Terminate the VM instance.
Automatic restart: If your instance is set to terminate when there is a maintenance event (onHostMaintenance), or if your instance crashes because of an underlying hardware issue (hostError), you can set up Compute Engine to automatically restart the instance by setting the automaticRestart field to On. This is the default, and it can be turned off manually.
To check whether you have had events of this type, you can open Cloud Shell in the project where you're having problems and run gcloud compute operations list, combined with grep to filter for the migrateOnHostMaintenance or hostError events, like this: gcloud compute operations list | grep migrateOnHostMaintenance and gcloud compute operations list | grep hostError.
If you don't find any of the above operations, you can use the same command with the instance name, gcloud compute operations list | grep INSTANCE_NAME, and check for the start and stop operations by describing them with gcloud compute operations describe OPERATION_ID --zone. You will be able to see the details of the stop and start operations your instance is having, including the user who made the operation.
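Collected as a block (OPERATION_ID, ZONE and INSTANCE_NAME are placeholders to fill in):

# Look for host-maintenance or host-error events
gcloud compute operations list | grep migrateOnHostMaintenance
gcloud compute operations list | grep hostError

# Check the operations affecting a specific instance
gcloud compute operations list | grep INSTANCE_NAME

# Describe one operation in detail (shows timestamps and the acting user)
gcloud compute operations describe OPERATION_ID --zone ZONE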
A VM shouldn't auto-shutdown by default unless someone in your organisation has configured it to do so.
The Google service that supports this behaviour is Cloud Scheduler. Take a look at the Cloud Scheduler jobs and see whether any jobs are listed; if so, see what those jobs are doing.
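A quick way to check is listing the jobs from Cloud Shell (a sketch; depending on your gcloud version you may also need to pass --location):

# List any Cloud Scheduler jobs that might be stopping the VM on a schedule
gcloud scheduler jobs list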
Background
I have the following setup with AWS CodeDeploy:
Currently we have our EC2 application servers connected to an auto-scaling group, but there is a missing step: once a new server is fired up, we don't automatically deploy the latest code on it from our git repo
Question
I was going over this tutorial:
Basically i want to run a bunch of commands as soon as an instance is launched but before it's hooked up to the load balancer.
The above tutorial describes things in general, but I couldn't answer the following questions:
Where do I save the script on the ec2 instance?
How is that script executed once the instance is launched during scale-out, but before it's connected to the load balancer?
I think you do not need a lifecycle hook; lifecycle hooks are useful when you want to perform an action at different states such as stop, start, and terminate, but you just need to pull the latest code and run some other commands.
To answer your question I will suggest the approach below, although there are many more approaches for the same task.
You do not need to save the script or commands on the instance; just put them in the user data of your launch configuration. You can run them as a bash script directly, or you can pull your scripts from S3.
The simplest case is pulling the code; something like the sketch below will run whenever a new instance launches in this auto-scaling group.
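A minimal user-data sketch, assuming a git-based deployment; the repository path, branch and service name are placeholders, not from the question:

#!/bin/bash
# Runs as root on first boot of every instance launched by the auto-scaling group
cd /var/www/myapp
git pull origin master
# Restart the application after pulling the latest code
systemctl restart myapp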
Another option is to run a more complex script: place it on S3 and pull it during scale-up, as in the sketch below.
I assume you have already set up permissions for S3 and Bitbucket; you can run any complex script during this time.
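A sketch of that approach, assuming the instance profile already has read access to the bucket (bucket and object names are made up):

#!/bin/bash
# Fetch the deployment script from S3 and execute it on boot
aws s3 cp s3://my-deploy-bucket/bootstrap.sh /tmp/bootstrap.sh
chmod +x /tmp/bootstrap.sh
/tmp/bootstrap.sh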
The second step is not as tricky as it sounds; the instance will never receive traffic until it is healthy, so pull the updated code and let all the required scripts finish executing, and only then, at the end of the user data, start your application.
Another approach can be:
a) Health Check Grace Period
Frequently, an Auto Scaling instance that has just come into service
needs to warm up before it can pass the health check. Amazon EC2
Auto Scaling waits until the health check grace period ends before
checking the health status of the instance.
b) Custom Health Checks
If you have your own health check system, you can send the instance's
health information directly from your system to Amazon EC2 Auto
Scaling.
Use the following set-instance-health command to set the health state
of the specified instance to Unhealthy:
aws autoscaling set-instance-health --instance-id i-123abc45d --health-status Unhealthy
You can get the instance-id with a curl call to the instance metadata endpoint from the script that we place in the user data, as in the sketch below.
If you have custom health checks, you can send the information from your health checks to Amazon EC2 Auto Scaling so that Amazon EC2 Auto Scaling can use this information. For example, if you determine that an instance is not functioning as expected, you can set the health status of the instance to Unhealthy. The next time that Amazon EC2 Auto Scaling performs a health check on the instance, it will determine that the instance is unhealthy and then launch a replacement instance.
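A sketch of reporting health from the instance itself, assuming the instance profile allows the autoscaling:SetInstanceHealth action and that your own check logic decides when the instance is bad:

#!/bin/bash
# Look up this instance's ID from the EC2 instance metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Report the instance as Unhealthy so Auto Scaling replaces it
aws autoscaling set-instance-health \
    --instance-id "$INSTANCE_ID" \
    --health-status Unhealthy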
c) Instance Warmup
With step scaling policies, you can specify the number of seconds that
it takes for a newly launched instance to warm up. Until its specified
warm-up time has expired, an instance is not counted toward the
aggregated metrics of the Auto Scaling group. While scaling out, AWS
also does not consider instances that are warming up as part of the
current capacity of the group. Therefore, multiple alarm breaches that
fall in the range of the same step adjustment result in a single
scaling activity. This ensures that we don't add more instances than
you need.
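As an illustrative sketch (the group name, policy name and numbers are made up), the warm-up time is supplied when creating a step scaling policy:

# Scale-out step policy whose newly launched instances warm up for 300 seconds
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-asg \
    --policy-name my-step-scale-out \
    --policy-type StepScaling \
    --adjustment-type ChangeInCapacity \
    --estimated-instance-warmup 300 \
    --step-adjustments MetricIntervalLowerBound=0,ScalingAdjustment=1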
Again, the second step is not that big a deal: you can control the flow in your script and start the application at the end, so that only then will the instance be marked healthy.
You can also try the enter-standby / exit-standby operations, but I think custom health checks plus warm-up can do this job.
I am running a simple Java HelloWorld program using a Docker container in AWS Batch. I have created a managed Compute Environment with the following values:
Minimum vCPUs: 0
Desired vCPUs: 0
Maximum vCPUs: 256
Instance types: optimal
On submitting the job, it executes successfully: the job is submitted to the queue, the scheduler provisions the EC2 instance (with the aws-ecs agent container and the java helloworld container specified in the Job Definition), and the job completes successfully with its logs in a CloudWatch log stream.
My issue is that after the job succeeds, the compute environment (EC2 instance) provisioned by the scheduler keeps running instead of terminating.
Please suggest if I am missing anything.
Your compute environment will terminate if it is idle near the end of an AWS billing hour.
The Compute Environment Parameters documentation for AWS Batch contains a definition of State: a compute environment in the Enabled state can accept jobs from the queue; once the compute environment is Disabled and idle, toward the end of an AWS billing hour it is scaled in (which terminates your EC2 instance).
However, from the Oct 5, 2017 announcement:
AWS Batch evaluates compute resources more frequently and immediately scales down any idle instances when there are no more runnable jobs in your job queues.
So, your compute environment's instance will be terminated immediately once it is idle.
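If you want to verify this from the CLI, a sketch (the environment name is a placeholder) is to watch the compute environment's state and desired vCPUs drop back to the configured minimum of 0 once the queue is empty:

# Inspect the compute environment's state and current/desired vCPUs
aws batch describe-compute-environments \
    --compute-environments my-compute-environment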
Autoscaling helps you to automatically add or remove Compute Engine instances based on load. The prerequisites for autoscaling in GCP are an instance template and a managed instance group.
This question is a part of another question's answer, which is about building an autoscaled and load-balanced backend.
I have written the below answer that contains the steps to set up autoscaling in GCP.
Autoscaling is a feature of managed instance group in GCP. This helps to handle very high traffic by scaling up the instances and at the same time it also scales down the instances when there is no traffic, which saves a lot of money.
To set up autoscaling, we need the following:
Instance template
Managed Instance group
Autoscaling policy
Health Check
An instance template is a blueprint that defines the machine type, image, and disks of the homogeneous instances that will run in the autoscaled, managed instance group. I have written the steps for setting up an instance template here; a minimal gcloud sketch is also shown below.
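A minimal sketch of creating such a template, assuming a plain Debian image and an e2-medium machine type (none of these values come from the original answer):

# Create the instance template that the MIG will be based on
gcloud compute instance-templates create sample-template \
    --machine-type=e2-medium \
    --image-family=debian-11 \
    --image-project=debian-cloud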
A managed instance group keeps a group of homogeneous instances based on a single instance template. Assuming the instance template is named sample-template, the group can be set up by running the following command in gcloud:
gcloud compute instance-groups managed \
create autoscale-managed-instance-group \
--base-instance-name autoscaled-instance \
--size 3 \
--template sample-template \
--region asia-northeast1
The above command creates a managed instance group containing 3 Compute Engine instances located in three different zones of the asia-northeast1 region, based on sample-template.
base-instance-name is the base name for all the automatically created instances. In addition to the base name, every instance name has a uniquely generated random string appended to it.
size represents the desired number of instances in the group. As of now, 3 instances will be running at all times, irrespective of the amount of traffic generated by the application. Later, the group can be autoscaled by applying a policy to it.
region (multi-zone) or single-zone: a managed instance group can either be set up in a region (multi-zone), i.e. the homogeneous instances are evenly distributed across all the zones of a given region, or all the instances can be deployed in the same zone within a region. It can also be deployed cross-region, which is currently in alpha.
Autoscaling policy determines the autoscaler behaviour. The autoscaler aggregates data from the instances and compares it with the desired capacity as specified in the policy and determines the action to be taken. There are many auto-scaling policies like:
Average CPU Utilization
HTTP load balancing serving capacity (requests / second)
Stackdriver standard and custom metrics
and many more
Now, introduce autoscaling to this managed instance group by running the following command in gcloud:
gcloud compute instance-groups managed \
set-autoscaling \
autoscale-managed-instance-group \
--max-num-replicas 6 \
--min-num-replicas 2 \
--target-cpu-utilization 0.60 \
--cool-down-period 120 \
--region asia-northeast1
The above command sets up an autoscaler based on CPU utilization, ranging from 2 instances (in case of no traffic) to 6 instances (in case of heavy traffic).
The cool-down-period flag specifies the number of seconds to wait after an instance has started before the associated autoscaler begins collecting information from it.
An autoscaler can be associated with a maximum of 5 different policies. In case of more than one policy, the autoscaler follows the policy that results in the maximum number of instances.
Interesting fact: when an instance is spun up by the autoscaler, it is kept running for at least 10 minutes irrespective of the traffic. This is because GCP bills a minimum of ten minutes of running time for a compute engine; it also protects against erratic spinning up and shutting down of instances.
Best practices: from my perspective, it is better to create a custom image with all your software installed than to use a startup script, as the time taken to launch new instances in the autoscaling group should be as short as possible. This increases the speed at which you can scale your web app; a sketch of baking such an image follows.
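One way to bake such an image, sketched with made-up VM/disk and zone names, is to create it from the boot disk of an already configured VM:

# Create a reusable custom image from a configured VM's boot disk
gcloud compute images create my-custom-image \
    --source-disk=my-configured-vm \
    --source-disk-zone=asia-northeast1-a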
This is part 2 of 3-part series about building an autoscaled and load-balanced backend.