Marking a compute instance as busy to prevent disrupting connections - google-cloud-platform

I have a Golang service using TCP running on GCP's compute VMs with autoscaling. When the CPU usage spikes, new instances are created and deployed (as expected), but when the CPU usage settles again the instances are destroyed. This would be fine and it's entirely reasonable as to why this is done, but destroying instances does not take into account the established TCP connections and thus disconnects users.
I'd like to keep the VM instances running until the last connection has been closed to prevent disconnecting users. Is there a way to mark the instance as "busy" telling the autoscaler not to remove that instance until it isn't busy? I have implemented health checks but these do not signal the busyness of the instance, only whether the instance is alive or not.

You need to enable connection draining on the backend service that your autoscaled instance group belongs to:
If the group is part of a backend service that has enabled connection draining, it can take up to 60 seconds after the connection draining duration has elapsed before the VM instance is removed or deleted.
Here are the steps on how to achieve this:
1. Go to the Load balancing page in the Google Cloud Console.
2. Click the Edit button for your load balancer or create a new load balancer.
3. Click Backend configuration.
4. Click Advanced configurations at the bottom of your backend service.
5. In the Connection draining timeout field, enter a value from 0 - 3600. A setting of 0 disables connection draining.
Currently you can set a connection draining timeout of up to 3600 seconds (1 hour), which should suffice for your requirements.
see: https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions
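On the application side, the service can also wait for its own connections to finish before exiting. The following is a minimal Go sketch of that idea, assuming the process is told to shut down while clients are still connected. The names connTracker, open, closed and drain are hypothetical and not part of any GCP API, and the simulated connections stand in for real TCP handlers.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// connTracker is a hypothetical helper that counts in-flight connections.
type connTracker struct {
	wg sync.WaitGroup
}

// open records a newly accepted connection.
func (t *connTracker) open() { t.wg.Add(1) }

// closed records that a connection has finished.
func (t *connTracker) closed() { t.wg.Done() }

// drain waits until every tracked connection has finished or until
// timeoutMillis elapses. It reports whether the drain completed in time.
func (t *connTracker) drain(timeoutMillis int) bool {
	done := make(chan struct{})
	go func() {
		t.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Duration(timeoutMillis) * time.Millisecond):
		return false
	}
}

func main() {
	t := &connTracker{}
	// Simulate three client connections that finish at different times.
	for i := 1; i <= 3; i++ {
		t.open()
		go func(d time.Duration) {
			defer t.closed()
			time.Sleep(d)
		}(time.Duration(i) * 20 * time.Millisecond)
	}
	// In a real service this would run once the instance has stopped
	// receiving new connections, for example after a shutdown signal.
	fmt.Println("drained:", t.drain(1000))
}
```

In a real accept loop, open would be called for each accepted connection and closed when its handler returns. The timeout passed to drain should stay below the connection draining timeout configured on the backend service, since the instance is removed once that elapses.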

Related

GCP Backend service connection draining options for deployments

Has anybody ever tried to achieve GC HTTP(S) load balancer backend connection draining by any of the following?
1. Setting the capacity of the respective instance groups inside the backend service to 0% (0 RPS)
2. Removing the instance group(s) from the backend service
3. Changing the backend service in the URL map to point to another backend service
I would like to achieve A/B testing deployment with a GCLB in front of two GKE clusters. The docs only say connection draining is triggered for a specific instance when an instance is removed from the instance group (automatically or manually):
https://cloud.google.com/load-balancing/docs/enabling-connection-draining
Those are very particular scenarios; however, the expected behaviour is the following:
Setting a max rate per instance or max rate (per instance group) to zero (when the balancing mode is rate) won't drain existing connections. Balancing mode simply helps the load balancer rank backends (instance groups in this situation) from most to least attractive for handling new connections. When the balancing mode is rate and the max RPS is zero, that just means that the backend is "not attractive" even when it is servicing zero requests. But if all backends have RPS set to zero, or if they don't but are near capacity, it's possible that a backend with an RPS of zero is just as unattractive as all the other backends.
Removing the instance group as a backend from the backend service will most likely not respect any connection draining, because that removes the load balancer from the equation.
Changing the URL map is pretty similar to the previous scenario, without the downside of removing the load balancer. However, I think that pointing the URL map to a different backend service won't trigger connection draining, since the instances will still be reachable even though you are referring to a different backend. Downtime is expected, but draining shouldn't be activated.

ELB always reports instances as inservice

I am using an AWS ELB to report the status of my instances to an Auto Scaling group, so that a non-functional instance is terminated and replaced by a new one. The ELB is configured to ping TCP:3000 every 60 seconds and to treat a 10-second timeout as a health check failure. The unhealthy threshold is 5 consecutive checks.
However, the ELB always reports my instances as healthy and InService, even though I periodically come across an instance that is timing out. I then have to terminate it manually and launch a new one, despite the ELB having reported it as InService the whole time.
Why does this happen?
After investigating a little, I found the cause.
I was trying to assess the health of the app through an API call to a web app running on the instance, waiting for the response to time out before declaring the instance faulty. A TCP health check only verifies that the port accepts a connection, so it passes even when the app itself is timing out. I needed to use HTTP instead of TCP as the protocol to call port 3000, with a custom path, through the load balancer.
Note: The API needs to return a status code of 200 for the load balancer to consider it healthy. It now works perfectly.
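For illustration, here is a minimal Go sketch of such a health endpoint. The /health path, the healthStatus and healthHandler names, and the choice of 503 for the unhealthy case are assumptions made for this example; the only requirement stated above is that a healthy instance answers with 200.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthStatus maps application state to the HTTP status the check returns.
// 200 marks the instance healthy; 503 is an assumed choice for unhealthy.
func healthStatus(ok bool) int {
	if ok {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// healthHandler builds a handler for a hypothetical /health path.
// check is supplied by the application and reports whether it can serve.
func healthHandler(check func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(healthStatus(check()))
	}
}

func main() {
	// Exercise the handler in memory instead of starting a server.
	h := healthHandler(func() bool { return true })
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	fmt.Println(rec.Code) // 200
}
```

In a real service the handler would be registered with http.Handle("/health", ...) and served on port 3000, and the check function would exercise whatever the app needs in order to respond.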

ELB Connection Draining Configuration

So, we are kinda lost using the AWS ELB connection draining feature.
We have an Auto Scaling Group and we have an application that has independent sessions (A session on every instance). We configured the ELB listener over HTTP on port 80, forwarding to port 8080 (this is of course the port where the application is deployed) and we created a LBCookieStickinessPolicy. We also enabled the connection draining for 120 seconds.
The behavior we want:
We want to scale down an instance, but since each session is stuck to one instance, we want to "maintain" that session for 120 seconds (or whatever the connection draining configuration is).
The behavior we have:
We have tried to deregister an instance, set it to standby, terminate it, stop it, and set it to unhealthy. But no matter what we do, the instance shuts down immediately, causing the session to end abruptly. We also changed the ELB listener configuration to work over TCP, with no luck.
Thoughts?
Connection draining refers to open TCP connections with the client; it has nothing to do with sessions on your instance. You may be able to do something with keep-alives if you use TCP passthrough instead of an HTTP listener.
The best route is to set up sessions to be shared between your instances and then disable stickiness on the load balancer.

Trying to understand how does the AWS scaling work

There is one thing about scaling that I do not yet understand. Assume a simple scenario: ELB -> EC2 front-end -> EC2 back-end
When there is high traffic, new front-end instances are created, but how is the connection to the back-end established?
How does the back-end application keep track of which EC2 instance it is receiving from, so that it can respond to the right end-user?
Moreover, what happens if a connection was established from one of the automatically created instances, and then the traffic is low again and the instance is removed? Is the connection to the end-user lost?
FWIW, the connection between the servers is through WebSocket.
Assuming that, for example, your EC2 'front-ends' are web servers and your back-end is a database server, new front-end instances must either be created from a 'gold' AMI that you previously set up with all the required software and configuration information, or install all of your customizations as part of starting up (either approach is valid). With either approach, they will know how to find the back-end server, by IP address or perhaps a DNS record, from the configuration information on the newly started machine.
You don't need to worry about the back-end keeping track of the clients: every client talking to the back-end will have an IP address, and TCP/IP will take care of that handshaking for you.
As far as shutting down instances, you can enable connection draining to make sure existing conversations/connections are not lost:
When Connection Draining is enabled and configured, the process of
deregistering an instance from an Elastic Load Balancer gains an
additional step. For the duration of the configured timeout, the load
balancer will allow existing, in-flight requests made to an instance
to complete, but it will not send any new requests to the instance.
During this time, the API will report the status of the instance as
InService, along with a message stating that “Instance deregistration
currently in progress.” Once the timeout is reached, any remaining
connections will be forcibly closed.
https://aws.amazon.com/blogs/aws/elb-connection-draining-remove-instances-from-service-with-care/

Prevent machine on Amazon from shutting down before all users finished tasks

I'm planning a server environment on AWS with auto scaling over VPC.
My application has a process that is done in several steps on the server, and the user should stick to the same server by using the ELB's sticky sessions.
The problem is that when the auto scaling group is supposed to shut down a server, some users may be in the middle of the process (the process takes multiple requests, for example:
1. create an album
2. upload photos to the album one at a time
3. convert photos to movie and delete photos
4. store movie on S3)
Is it possible to configure the ELB to stop passing NEW users to the server that is about to shut down, while still passing previous users (who have the sticky session set)? And is it possible to tell the server to wait for, let's say, 10 minutes after the shutdown rule is applied before it actually shuts down?
Thank you very much
This feature wasn't available in Elastic Load Balancing at the time of your question; however, AWS has since addressed the main part of your question by adding ELB Connection Draining, which avoids breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains updated software.
Please note that you still need to specify a sufficiently large timeout based on the maximum time you expect users to need to finish their activity; see Connection Draining:
When you enable connection draining for your load balancer, you can set a maximum time for the load balancer to continue serving in-flight requests to the deregistering instance before the load balancer closes the connection. The load balancer forcibly closes connections to the deregistering instance when the maximum time limit is reached.
[...]
If your instances are part of an Auto Scaling group and if connection draining is enabled for your load balancer, Auto Scaling will wait for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances due to a scaling event or health check replacement. [...] [emphasis mine]
The emphasized part confirms that it is not possible to specify an additional timeout that only applies after the last connection has been drained.
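The "whichever comes first" rule quoted above can be sketched as a small Go function. The name effectiveDrainWait and its parameters are hypothetical; the function only models the documented behaviour and is not an AWS API.

```go
package main

import "fmt"

// effectiveDrainWait models the quoted rule: the wait ends when the last
// in-flight request completes or when the draining timeout expires,
// whichever comes first. All values are in seconds.
func effectiveDrainWait(remaining []int, timeout int) int {
	longest := 0
	for _, r := range remaining {
		if r > longest {
			longest = r
		}
	}
	if longest < timeout {
		return longest
	}
	return timeout
}

func main() {
	fmt.Println(effectiveDrainWait([]int{5, 40, 12}, 120)) // 40
	fmt.Println(effectiveDrainWait([]int{5, 600}, 120))    // 120
	fmt.Println(effectiveDrainWait(nil, 120))              // 0
}
```

The second call shows the case that matters for the question: a request that needs 600 seconds is cut off at the 120-second timeout, so the timeout has to cover the longest activity you expect.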