Google Cloud cancel replacing instances in group - google-cloud-platform

How can I cancel replacing instances in an instance group without the website going offline? We have a managed instance group of Compute Engine instances. We start a replace operation with maximum unavailable instances set to 0. If the new instance doesn't become healthy for some reason, there is an option to remove it; however, that removes all instances, taking the website down until a new instance is created. Is this supposed to happen?

This seems to be expected behavior. Have a look at rolling updates for updating your instances; zero downtime is achievable. To make the update less disruptive, you may consider the following two strategies:
Max surge. This creates instances above the target size in order to speed up the update process.
Opportunistic update. “An opportunistic update is only applied when new instances are created by the managed instance group”
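A rolling update with surge capacity can be started from the CLI. A minimal sketch, assuming a hypothetical group name `web-mig`, template `web-template-v2`, and zone:

```shell
# Start a rolling replace that may create up to 3 extra instances (max surge)
# while never taking an existing instance out of service (max unavailable 0).
gcloud compute instance-groups managed rolling-action start-update web-mig \
  --version=template=web-template-v2 \
  --max-surge=3 \
  --max-unavailable=0 \
  --zone=us-central1-a
```

With `--max-unavailable=0`, old instances are only deleted after their replacements pass health checks, so a failing new instance should not reduce serving capacity.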

Related

Cloud Run is going over max instance count (of 1)

I have a cloud run service (receiving no traffic) where I set the max instances and min instances to 1 so that it's always running.
When deploying a new revision, the instance count jumps to 3. This is a problem (I make some requests on instance start that hit a 429 if two instances make these requests simultaneously).
Why is the Cloud Run instance count going over my max?
I can confirm my settings are correct, and looking at the logs, there are two new instances that start up.
PS: Cloud Run does show this message, which makes me think what I'm trying to do isn't possible. I just figured the consequence would be downtime rather than extra instances.
Revisions using a maximum number of instances of 3 or less might experience unexpected downtime.
Your scenario seems to fit one described in the documentation, in which a new deployment for a Cloud Run service might temporarily create additional instances:
When you deploy a new revision, Cloud Run gradually migrates traffic from the old revision to the new one. Because maximum instance limits are set for each revision, you may temporarily exceed the specified limit during the period after deployment.
Additional instances might be created to handle traffic migration from the previous revision to a new one. Cloud Run offers settings that can alter how migration occurs between revisions. One of these settings is used to instantly serve all new traffic on the new revision. You can test if using this setting helps reduce the number of instances that are created. I tested one of the provided sample services and created multiple revisions, which did not exceed 1 active instance.
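The setting that serves all new traffic on the new revision immediately can be applied from the CLI. A sketch, assuming a hypothetical service name `my-service` and image path:

```shell
# Deploy the new revision without shifting any traffic to it, then cut all
# traffic over at once, shortening the window in which two revisions overlap.
gcloud run deploy my-service --image=gcr.io/my-project/my-image --no-traffic
gcloud run services update-traffic my-service --to-latest
```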
I read that this is a known Cloud Run problem: set the maximum to 10 instances and the error disappears.
My understanding is that it is a bug in the Go implementation that cannot cope with two instances, but with 10 instances it should be fixed.
https://github.com/ahmetb/cloud-run-faq/issues/54

ECS: is there a way to avoid downtime when I change instance type in CloudFormation?

I have created a cluster to run our test environment on AWS ECS, and everything seems to work fine, including zero-downtime deploys. But I realized that when I change instance types in CloudFormation for this cluster, it brings all the instances down, and my ELB starts to fail because there are no instances running to serve the requests.
The cluster is running on spot instances, so my question is: is there any way to update instance types for spot instances without bringing the whole cluster down?
Do you have an Auto Scaling group? That would allow you to change the launch template or launch configuration to use the new instance type. Then set the ASG desired and minimum counts to a higher number, let the new instance type spin up and go into service in the target group, then delete the old instance and set your Auto Scaling metrics back to normal.
Without an ASG, you could launch a new instance manually and place it in the ECS target group. Confirm that it joins the cluster and is running your service and task, then delete the old instance.
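The ASG approach can be sketched with the AWS CLI. A non-authoritative example; the group name `my-ecs-asg`, template ID `lt-0abc123`, instance ID, and counts are all hypothetical:

```shell
# Point the ASG at a launch template version that uses the new instance type.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-ecs-asg \
  --launch-template LaunchTemplateId=lt-0abc123,Version='$Latest'

# Temporarily raise capacity so new-type instances come up alongside old ones.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-ecs-asg \
  --min-size 2 --desired-capacity 2

# After the new instances are in service, terminate an old one and shrink back.
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id i-0123456789abcdef0 --should-decrement-desired-capacity
```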
You might want to break this activity into smaller chunks and do it one instance at a time. You can write a small CloudFormation template for this as well, because by default, if you update the instance type, your instances will be restarted; to achieve zero downtime, you may have to do it one at a time.
However, there are two other ways that I can think of here but both will cost you money.
ASG: Create a new autoscaling group or use the existing one and change the launch configuration.
Blue/Green Deployment: Create the exact set of resources but this time with updated instance type and use Route53's weighted routing policy to control the traffic.
It solely depends on the requirement: if you can spend the money, go with the two approaches above; otherwise, stick with the small incremental deployments.

GCP VM can't start or move TERMINATED instance

I'm running into a problem starting my Google Cloud VM instance. I wanted to restart the instance so I hit the stop button but this was just the beginning of a big problem.
The start failed with an error saying the zone did not have enough capacity. Message:
The zone 'XXX' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
I tried and retried until I decided to move it to another zone and ran:
gcloud compute instances move VM_NAME --destination-zone NEW_ZONE
I then get the error:
Instance cannot be moved while in state: TERMINATED
What am I supposed to do???
I'm assuming that this is a basic enough issue that there's a common way to solve for this.
Thanks
Edit: I have since managed to start the instance but would like to know what to do next time
The correct solution depends on your criteria.
I assume you're using preemptible instances for their cost savings, but, as you've seen, there's a price: sometimes non-preemptible resources are given priority, and sometimes (more frequently than for regular cores) there are insufficient preemptible cores available.
While it's reasonable to want to, you cannot move stopped instances between zones in a region.
I think there are a few options:
Don't use Preemptible. You'll pay more but you'll get more flexibility.
Use Managed Instance Groups (MIGs) to maintain ~1 instance (in the region|zone)
(for completeness) consider using containers and perhaps Cloud Run or Kubernetes
You describe wanting to restart your instance, perhaps because you made some changes to it. If so, you may wish to consider treating your instances as being more disposable.
When you wish to make changes to the workload:
IMPORTANT ensure you're preserving any important state outside of the instance
create a new instance (at this time, you will be able to find a zone with capacity for it)
once the new instance is running correctly, delete the prior version
NB Both options 2 (MIGs) and 3 (Cloud Run|Kubernetes) above implement this practice.
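The create-then-delete workflow above can be sketched with gcloud. Instance names, zones, and machine type are hypothetical, and this assumes important state already lives outside the VM (e.g. in Cloud SQL or Cloud Storage):

```shell
# Create a replacement instance in a zone that currently has capacity.
gcloud compute instances create my-vm-v2 \
  --zone=us-central1-b \
  --machine-type=e2-medium \
  --image-family=debian-12 --image-project=debian-cloud

# Once the new instance is verified to be serving correctly, delete the old one.
gcloud compute instances delete my-vm-v1 --zone=us-central1-a --quiet
```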

Single instance on Google Cloud to group of instances

I have one instance on Google Cloud, but the CPU usage is over 99%. How can I scale, or move that instance into a group? I am using Node.js and MySQL. Is this possible?
You have different options to create an instance group. You can use an unmanaged instance group (link) for dissimilar instances that you can arbitrarily add and remove.
Or you can create an instance template and use a managed instance group (link) with identical instances, where you will be able to autoscale the number of instances.
But to give you a better answer, could you explain your application a little? If the only issue is CPU usage, you can change the number of CPUs with the Edit button once the instance has stopped.
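The managed-instance-group route can be sketched as follows. The template name, group name, zone, and sizing values are all hypothetical; ideally the template would reference a custom image with Node.js already installed:

```shell
# Build a reusable instance template.
gcloud compute instance-templates create node-template \
  --machine-type=e2-standard-2 \
  --image-family=debian-12 --image-project=debian-cloud

# Create a managed instance group from the template.
gcloud compute instance-groups managed create node-mig \
  --template=node-template --size=1 --zone=us-central1-a

# Autoscale on CPU so load like the 99% described above adds replicas.
gcloud compute instance-groups managed set-autoscaling node-mig \
  --zone=us-central1-a --max-num-replicas=5 --target-cpu-utilization=0.6
```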

Google cloud instance group VM's keep getting reset back to original image

For some reason my instance group VMs keep getting reset back to the original image, i.e., after I've installed and configured software, everything gets wiped out. Additionally, on some occasions their IPs also change, so I have to go and edit my Cloud SQL instance to allow network connections. Has anyone seen this behavior before?
It sounds like you're using Managed Instance Groups, which are designed for stateless workloads. MIGs scale their size up and down if you have the autoscaler enabled, and scaling down deletes instances. The health-checking feature can also destroy and recreate instances.
If you need extra software installed on MIG instances, create a single VM configured the way you want, then create a snapshot of that VM's disk (and then an image from the snapshot). The instance template creates fresh instances from that image every time.
Even if you recreate your image with all software installed, MIGs will still create and destroy instances on the assumption that nothing of value lives on any of them. And yes, their IPs can change too, because new instances are being created.
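The disk → snapshot → image → template flow can be sketched with gcloud. All resource names here are hypothetical:

```shell
# Snapshot the configured VM's boot disk.
gcloud compute disks snapshot my-configured-vm \
  --zone=us-central1-a --snapshot-names=my-baked-snapshot

# Turn the snapshot into a reusable image.
gcloud compute images create my-baked-image \
  --source-snapshot=my-baked-snapshot

# Point a new instance template at the baked image so every MIG instance
# starts with the software already installed.
gcloud compute instance-templates create my-baked-template \
  --image=my-baked-image --machine-type=e2-medium
```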