We are in the process of developing a WebSocket application that will run on the same application servers that serve our APIs, which are all within a target group of a new Amazon Application Load Balancer.
I'm not certain that sticky sessions would even be needed once the socket is upgraded; in any case, listener forwarding rules should make easy work of that.
My concern is with the scaling actions performed on the target groups during auto scaling, specifically scale-in. Scaling is currently based on RequestCountPerTarget, so when an instance is terminated because that metric has fallen below the threshold, there is no guarantee that the instance has no active WebSocket connections.
I'm assuming this means that when the instance is shut down and terminated, those socket connections would be abruptly interrupted.
What would be the best way to combat this?
Is there another metric I could use to scale out the group based on the number of active connections per target, to better support WebSocket scaling in addition to API requests?
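One idea I had is to publish a custom CloudWatch metric from each instance and scale on that instead. A rough sketch of what I mean, using boto3 - the namespace, metric name, and the get_active_connection_count() helper are all made up for illustration:

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    def get_active_connection_count():
        # Placeholder: in the real application this would return the number of
        # open WebSocket connections held by this server process.
        return 0

    def publish_connection_count(instance_id):
        # Push the current connection count as a custom metric so the Auto
        # Scaling group can use it in a scaling policy.
        cloudwatch.put_metric_data(
            Namespace='MyApp/WebSockets',                     # made-up namespace
            MetricData=[{
                'MetricName': 'ActiveWebSocketConnections',   # made-up metric name
                'Dimensions': [{'Name': 'InstanceId', 'Value': instance_id}],
                'Value': get_active_connection_count(),
                'Unit': 'Count',
            }],
        )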
I thought about creating an SNS topic and a LifecycleHook for instance termination on the Auto Scaling group, which I could handle in the WebSocketHandler by sending a message to all sockets on that server telling them they will need to disconnect and reconnect to another server behind the load balancer.
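Roughly, I picture the handler doing something like the sketch below (boto3; the hook name, group name, and notify_all_clients_to_reconnect() are placeholders, not real names from our setup):

    import boto3

    autoscaling = boto3.client('autoscaling')

    def notify_all_clients_to_reconnect():
        # Placeholder for the WebSocketHandler broadcast that asks clients to
        # drop their connection and reconnect through the load balancer.
        pass

    def on_termination_notice(message):
        # 'message' is the parsed SNS notification published by the lifecycle hook.
        instance_id = message['EC2InstanceId']
        notify_all_clients_to_reconnect()

        # Once the clients have moved off, allow the termination to proceed.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName='websocket-drain-hook',   # placeholder name
            AutoScalingGroupName='my-websocket-asg',    # placeholder name
            LifecycleActionResult='CONTINUE',
            InstanceId=instance_id,
        )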
Related
I have a container deployed on ECS Fargate as a service.
The container serves long-lived HTTP WebSocket connections and performs real-time processing. Depending on the use case, each connection can live from a few minutes to a few hours.
Each container can serve only a fixed number of connections simultaneously (e.g. a maximum of 10 connections) so that it can process the input in real time.
An AWS Application Load Balancer sits in front of this service.
Under the regular auto scaling rules, the number of containers can be scaled out or in by monitoring CPU.
The Application Load Balancer uses a round-robin routing algorithm for each incoming request.
My questions:
Given the requirement of a constant HARD limit on connections per container, how can I make the ALB stop routing new connections to a container with no available connection slot?
Can the service inside the container tell the ALB that it is closed to new connections, perhaps with a specific HTTP response?
Is there any other good practice for handling this requirement?
You will need to write your own code for this.
A possible solution is to combine:
Auto Scaling
Lifecycle hooks
Container Instance Draining.
Your code will need to detect how many connections it is processing. When the number hits your limit of 10, remove the container from the auto scaling group. By using Lifecycle hooks, you can keep the container alive. Once your 10 connections reach 0, complete the termination of the container.
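As a rough, untested sketch of that drain-then-terminate step using boto3 (the hook name, group name, and count_open_connections() are placeholders):

    import time
    import boto3

    autoscaling = boto3.client('autoscaling')

    def count_open_connections():
        # Placeholder: return the number of connections this container is
        # currently serving.
        return 0

    def drain_and_terminate(instance_id):
        # Keep the lifecycle hook alive while connections are still open.
        while count_open_connections() > 0:
            autoscaling.record_lifecycle_action_heartbeat(
                LifecycleHookName='drain-hook',      # placeholder
                AutoScalingGroupName='my-asg',       # placeholder
                InstanceId=instance_id,
            )
            time.sleep(60)

        # All connections have finished; let the termination continue.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName='drain-hook',
            AutoScalingGroupName='my-asg',
            LifecycleActionResult='CONTINUE',
            InstanceId=instance_id,
        )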
Note this will cause a new container to be launched while you are draining the container that has reached its peak.
I don't know of another method to tell the ALB to stop sending traffic to a specific container without removing it. The key is the draining and termination lifecycle part, since you want the container to keep its existing connections to the clients.
We are using AWS Classic ELB for our service, and our service can only serve x requests at a time. If the number of requests is greater than x, we do not want to route those requests to an instance, but we also do not want to lose them. We would like to limit the number of connections to the instances registered with the ELB. Is there some ELB setting to configure the maximum number of connections to an instance?
Another solution I could find was to use ELB connection draining, but based on the ELB documentation [1], connection draining will mark the instance as OutOfService after serving in-flight requests. Does that mean the instance will be terminated and deregistered from the ELB after in-flight requests are served? We do not want to terminate and deregister the instances; we just want to limit the number of connections to them. Any solutions?
[1] http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html
ELB is meant to spread traffic evenly across the instances registered with it. If you have more traffic, you throw up more instances to deal with it. This is generally why a load balancer is paired with an Auto Scaling group. The Auto Scaling group looks at the constraints you set and, based on those, either spins up more instances or pulls them down (e.g. when your traffic starts to slow down).
Connection draining is meant for pulling traffic away from bad instances so it doesn't get lost. 'Bad' instances are ones that aren't passing health checks because something on the instance is broken. ELB by itself doesn't terminate instances; that's another part of what the Auto Scaling group is meant to do (basically terminate the bad instance and spin up a new instance to replace it). All ELB does is stop sending traffic to it.
It appears your situation is:
Users are sending API requests to your Load Balancer
You have several instances associated with your Load Balancer to process those requests
You do not appear to be using Auto Scaling
You do not always have sufficient capacity to respond to incoming requests, but you do not want to lose any of the requests
In situations where requests come at a higher rate than you can process them, you basically have three choices:
You could put the messages into a queue and consume them when capacity is available. You could either put everything in a queue (simple), or only use a queue when things are too busy (more complex).
You could scale to handle the load, either by using Auto Scaling to add additional Amazon EC2 instances or by using AWS Lambda to process the requests (Lambda automatically scales).
You could drop requests that you are unable to process. Unless you have implemented a queue, this is going to happen at some point if requests rise above your capacity to process them.
The best solution is to use AWS Lambda functions rather than requiring Amazon EC2 instances. Lambda can tie directly to AWS API Gateway, which can front-end the API requests and provide security, throttling and caching.
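For example, a minimal Lambda handler behind API Gateway (proxy integration) can be as small as the sketch below; the response body is just an illustration:

    import json

    def handler(event, context):
        # 'event' carries the HTTP request details passed in by API Gateway.
        # Do the actual request processing here, then return a proxy-style response.
        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'message': 'processed'}),
        }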
The simplest method is to use Auto Scaling to increase the number of instances to try to handle the volume of requests you have arriving. This is best when there are predictable usage patterns, such as high loads during the day and less load at night. It is less useful when spikes occur in short, unpredictable periods.
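As an illustration, a target tracking policy that keeps the average request count per target near a chosen value could be created with boto3 roughly like this (the group name, resource label, and target value are placeholders):

    import boto3

    autoscaling = boto3.client('autoscaling')

    autoscaling.put_scaling_policy(
        AutoScalingGroupName='my-asg',               # placeholder
        PolicyName='requests-per-target',
        PolicyType='TargetTrackingScaling',
        TargetTrackingConfiguration={
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'ALBRequestCountPerTarget',
                # Placeholder label: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>
                'ResourceLabel': 'app/my-alb/1234567890abcdef/targetgroup/my-tg/fedcba0987654321',
            },
            'TargetValue': 1000.0,                   # desired requests per target
        },
    )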
To fully guarantee no loss of requests, you would need to use a queue. Rather than requests going directly to your application, you would need an initial layer that receives the request and pushes it into a queue. A backend process would then process the message and return a result that is somehow passed back as a response. (It's more difficult providing responses to messages passed via a queue because there is a disconnect between the request and the response.)
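A bare-bones sketch of that pattern with Amazon SQS and boto3 (the queue URL and process_message() are placeholders) might look like:

    import boto3

    sqs = boto3.client('sqs')
    QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/requests'  # placeholder

    def enqueue_request(payload):
        # Front layer: accept the request and push it onto the queue.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=payload)

    def process_message(body):
        # Placeholder for the actual request processing.
        pass

    def worker_loop():
        # Back-end process: pull messages off the queue as capacity allows.
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                       WaitTimeSeconds=20)
            for msg in resp.get('Messages', []):
                process_message(msg['Body'])
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])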
AWS ELB has practically no limit on incoming requests. If your application can handle only 'N' connections, run multiple servers behind the ELB and set the ELB health check URL to an endpoint of your application. Once an application can no longer respond to requests, it fails its health check and the ELB sends new requests to the other healthy servers behind it, so you are not going to miss any requests.
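As an illustration only, the application's health check endpoint could report the instance as unavailable once it is at capacity, along the lines of the sketch below (plain Python standard library; the connection counter is a placeholder the real application would maintain). Keep in mind that if an Auto Scaling group also replaces instances on failed ELB health checks, repeatedly failing the check this way may get the instance replaced, so it needs care:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    MAX_CONNECTIONS = 10          # the instance's capacity ('N'); placeholder value
    current_connections = 0       # placeholder: updated by the real application

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == '/health':
                # Report healthy only while there is spare capacity, so the load
                # balancer stops sending new traffic once the instance is full.
                status = 200 if current_connections < MAX_CONNECTIONS else 503
            else:
                status = 404
            self.send_response(status)
            self.end_headers()

    if __name__ == '__main__':
        HTTPServer(('0.0.0.0', 8080), HealthHandler).serve_forever()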
There is one aspect of scaling that I do not yet understand. Assume a simple scenario: ELB -> EC2 front-end -> EC2 back-end.
When there is high traffic, new front-end instances are created, but how is the connection to the back-end established?
How does the back-end application keep track of which EC2 it is receiving from, so that it can respond to the right end-user?
Moreover, what happens if a connection was established from one of the automatically created instances, and then traffic drops again and the instance is removed? Is the connection to the end user lost?
FWIW, the connection between the servers is through WebSocket.
Assuming, for example, that your EC2 'front-ends' are web servers and your back-end is a database server: when new front-end instances are spun up, they must either be created from a 'gold' AMI that you previously set up with all the required software and configuration information, OR install all of your customizations as part of the machine's startup (either approach is valid). With either approach they will know how to find the back-end server, either by IP address or perhaps by a DNS record, from the configuration information on the newly started machine.
You don't need to worry about the back-end keeping track of the clients - every client talking to the back-end will have an IP address, and TCP/IP will take care of that handshaking for you.
As far as shutting down instances, you can enable connection draining to make sure existing conversations/connections are not lost:
When Connection Draining is enabled and configured, the process of deregistering an instance from an Elastic Load Balancer gains an additional step. For the duration of the configured timeout, the load balancer will allow existing, in-flight requests made to an instance to complete, but it will not send any new requests to the instance. During this time, the API will report the status of the instance as InService, along with a message stating that “Instance deregistration currently in progress.” Once the timeout is reached, any remaining connections will be forcibly closed.
https://aws.amazon.com/blogs/aws/elb-connection-draining-remove-instances-from-service-with-care/
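Connection draining on a Classic ELB can be turned on with a single attribute change, for example with boto3 (the load balancer name and timeout are placeholders):

    import boto3

    elb = boto3.client('elb')

    elb.modify_load_balancer_attributes(
        LoadBalancerName='my-load-balancer',     # placeholder
        LoadBalancerAttributes={
            'ConnectionDraining': {
                'Enabled': True,
                'Timeout': 300,                  # seconds to let in-flight requests finish
            },
        },
    )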
Is it possible to automatically 'roll over' a connection between auto scaled instances?
Given instances which provide a compute intensive service, we would like to
Autoscale a new instance after CPU reaches, say, 90%
Have requests for service handled by the new instance.
It does not appear that there is a way with the AWS Dashboard to set this up, or have I missed something?
What you're looking for is a load balancer. If you're using HTTP, this works pretty much out of the box. Clients open connections to the load balancer, which then distributes individual HTTP requests from the connection evenly across instances in your auto scaling group. When a new instance joins the group, the load balancer automatically shifts a portion of the incoming requests over to the new instance.
Things get a bit trickier if you're speaking a protocol other than HTTP(S). A generic TCP load balancer can't tell where one "request" ends and the next begins (or if that even makes sense for your protocol), so incoming TCP connections get mapped directly to a particular backend host. The load balancer will route new connections to the new instance when it spins up, but it can't migrate existing connections over.
Typically what you'll want to do in this scenario is to have clients periodically close their connections to the service and create new ones - especially if they're seeing increased latencies or other evidence that the instance they're talking to is overworked.
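As a very rough sketch of that client-side behaviour using plain TCP and the Python standard library (the host, port, interval, and traffic are placeholders; a real WebSocket client would use its own library's close/reconnect calls instead):

    import socket
    import time

    HOST, PORT = 'service.example.com', 9000   # placeholder endpoint behind the load balancer
    RECONNECT_EVERY = 300                      # seconds; placeholder interval

    while True:
        # Each fresh connection gives the load balancer a chance to place us on a
        # different (possibly newer, less loaded) backend instance.
        conn = socket.create_connection((HOST, PORT))
        deadline = time.time() + RECONNECT_EVERY
        try:
            while time.time() < deadline:
                conn.sendall(b'ping\n')        # placeholder application traffic
                time.sleep(5)
        finally:
            conn.close()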
I'm planning a server environment on AWS with auto scaling over VPC.
My application has a process that is done in several steps on the server, and the user should stick to the same server by using ELB sticky sessions.
The problem is that when the Auto Scaling group is supposed to shut down a server, some users may be in the middle of the process (the process takes multiple requests), for example:
1. create an album
2. upload photos to the album, one at a time
3. convert photos to movie and delete photos
4. store movie on S3)
Is it possible to configure the ELB to stop passing NEW users to the server that is about to shut down, while still passing previous users (who have the sticky session set)? And is it possible to tell the server to wait for, let's say, 10 minutes after the shutdown rule is applied before it actually shuts down?
Thank you very much
This feature wasn't available in Elastic Load Balancing at the time of your question; however, AWS has since addressed the main part of your question by adding ELB Connection Draining to avoid breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains updated software.
Please note that you still need to specify a sufficiently large timeout, based on the maximum time you expect users to need to finish their activity; see Connection Draining:
When you enable connection draining for your load balancer, you can set a maximum time for the load balancer to continue serving in-flight requests to the deregistering instance before the load balancer closes the connection. The load balancer forcibly closes connections to the deregistering instance when the maximum time limit is reached.
[...]
If your instances are part of an Auto Scaling group and if connection draining is enabled for your load balancer, Auto Scaling will wait for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances due to a scaling event or health check replacement. [...] [emphasis mine]
The emphasized part confirms that it is not possible to specify an additional timeout that only applies after the last connection has been drained.