There is one aspect of scaling that I don't yet understand. Assume a simple scenario: ELB -> EC2 front-end -> EC2 back-end.
When traffic is high, new front-end instances are created, but how is the connection to the back-end established?
How does the back-end application keep track of which EC2 instance it is receiving from, so that it can respond to the right end user?
Moreover, what happens if a connection was established from one of the automatically created instances and then traffic drops and the instance is removed: is the connection to the end user lost?
FWIW, the connection between the servers is through WebSocket.
Assuming that, for example, your EC2 'front-ends' are web servers and your back-end is a database server: when new front-end instances are spun up, they must either be created from a 'gold' AMI that you previously set up with all the required software and configuration information, OR, as part of the machine starting up, they must install all of your customizations (either approach is valid). With either approach they will know how to find the back-end server, either by IP address or perhaps via a DNS record, from the configuration information on the newly started machine.
You don't need to worry about the back-end keeping track of the clients - every client talking to the back-end will have an IP address, and TCP/IP will take care of that handshaking for you.
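As a concrete illustration, here is a minimal sketch (the environment variable names, hostname, and port are placeholder assumptions, not anything AWS prescribes): a newly launched front-end simply reads the back-end endpoint from its baked-in or user-data configuration, resolves it, and connects; the back-end just sees another client socket.

    import os
    import socket

    # Hypothetical settings baked into the AMI or passed in via user data.
    BACKEND_HOST = os.environ.get("BACKEND_HOST", "backend.internal.example.com")
    BACKEND_PORT = int(os.environ.get("BACKEND_PORT", "8080"))

    # Resolve the configured DNS name (or use an IP directly) and connect.
    # The back-end does not need to know about this front-end in advance;
    # it just sees another client socket with a source IP and port.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        BACKEND_HOST, BACKEND_PORT, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(sockaddr)
        conn.sendall(b"hello from a freshly launched front-end\n")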
As far as shutting down instances, you can enable connection draining to make sure existing conversations/connections are not lost:
When Connection Draining is enabled and configured, the process of deregistering an instance from an Elastic Load Balancer gains an additional step. For the duration of the configured timeout, the load balancer will allow existing, in-flight requests made to an instance to complete, but it will not send any new requests to the instance. During this time, the API will report the status of the instance as InService, along with a message stating that “Instance deregistration currently in progress.” Once the timeout is reached, any remaining connections will be forcibly closed.
https://aws.amazon.com/blogs/aws/elb-connection-draining-remove-instances-from-service-with-care/
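Connection draining can be enabled from the console or from the API; a minimal boto3 sketch, with the load balancer name and timeout being placeholder assumptions:

    import boto3

    elb = boto3.client("elb")  # classic Elastic Load Balancing API

    # Give deregistering instances a 300-second grace period: in-flight
    # requests may finish, but no new requests are sent to them.
    elb.modify_load_balancer_attributes(
        LoadBalancerName="my-classic-elb",
        LoadBalancerAttributes={
            "ConnectionDraining": {"Enabled": True, "Timeout": 300},
        },
    )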
Can an AWS Elastic Load Balancer be set up so it sends all traffic to a main server, and only if that server fails send traffic to a second server?
I have an existing web app I picked up that was never built to run on multiple servers, and the client has become worried about redundancy. They don't want to invest enough to make it run well across multiple servers, so I was thinking I could set up a second EC2 server with a MySQL slave and periodically copy files from the primary server to the secondary using rsync, then have an AWS ELB send traffic to the primary server and only if that fails send it to the second server.
AWS load balancers don't support "backup" nodes that only take traffic when the primary is down.
Beyond that, you are proposing a complicated scenario.
was thinking I could setup a second EC2 server with a MySQL slave
If you do that, you can only fail over once, then you can't fail back, because the master database will then be obsolete. For a configuration like this to work and be useful, your two MySQL servers need to be configured with master/master (circular) replication, so that each is a replica of the other. This is an advanced configuration that requires expertise and caution.
For the MySQL component, an RDS instance with multi-AZ enabled will provide you with hands-off fault tolerance of the database.
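For reference, Multi-AZ is a single flag when creating (or later modifying) the RDS instance; a minimal boto3 sketch, with the identifier, instance class, and credentials being placeholder assumptions:

    import boto3

    rds = boto3.client("rds")

    # MySQL instance with a synchronous standby in another Availability Zone;
    # RDS fails over automatically and the endpoint DNS name stays the same.
    rds.create_db_instance(
        DBInstanceIdentifier="app-db",
        Engine="mysql",
        DBInstanceClass="db.t3.small",
        AllocatedStorage=20,
        MasterUsername="admin",
        MasterUserPassword="change-me",
        MultiAZ=True,
    )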
Of course, the client may be unwilling to pay for this as well.
A reasonable shortcut for small systems might be EC2 instance recovery which will bring the site back up if the underlying hardware fails. This feature replaces a failed instance with a new instance, reattaches the EBS volumes, and starts it back up. If the system is stable and you have a solid backup strategy for all data, this might be sufficient. Effective redundancy as a retrofit is non-trivial.
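Instance recovery is driven by a CloudWatch alarm whose action is the ec2:recover automate ARN; a minimal boto3 sketch, with the region and instance ID being placeholder assumptions:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Recover the instance (same instance ID, same EBS volumes, new hardware)
    # when the system status check fails for two consecutive minutes.
    cloudwatch.put_metric_alarm(
        AlarmName="recover-web-server",
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=2,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"],
    )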
I run a RabbitMQ HA cluster with 3 nodes and a classic AWS load balancer (LB) in front of them. There are two apps: one publishes and the other consumes through the LB.
When the publisher app starts sending 3 million messages, after a short period of time its connection is put into the Flow Control state. After the publishing is finished, in the publisher app logs I can see that all 3 million messages were sent. On the other hand, in the consumer app log I can only see 500K - 1M messages (it varies between runs), which means that a large number of messages is lost.
So what is happening is that in the middle of a run, the classic LB decides to change its IP address or drop connections, thus losing a lot of messages (see my update for more details).
The issue does not occur if I skip the LB and hit the nodes directly, doing load balancing on the app side. Of course in this case I lose all the benefits of ELB.
My questions are:
Why is the LB changing IP addresses and dropping connections? Is that related to the high message rate from the publisher or to the Flow Control state?
How can I configure the LB so that this issue doesn't occur?
UPDATE:
This is my understanding of what is happening:
I use AMQP 0-9-1 and publish without publisher confirms, so a message is considered sent as soon as it's put on the wire. Also, the connection on the RabbitMQ node is between the LB and the node, not between the publisher app and the node.
Before the communication enters Flow Control, messages are passed from the LB to a node immediately.
Then the connection between the LB and the node enters Flow Control; the publisher app's connection is not blocked, so it continues to publish at the same rate. That causes messages to pile up on the LB.
Then the LB decides to change IP(s) or drop the connection for whatever reason and create a new one, causing all the piled-up messages to be lost. This is clearly visible from the RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2018::10:35:50 ===
closing AMQP connection <0.30342.375> (10.1.1.250:29564 -> 10.1.1.223:5672):
client unexpectedly closed TCP connection
=INFO REPORT==== 6-Jan-2018::10:35:51 ===
accepting AMQP connection <0.29123.375> (10.1.1.22:1886 -> 10.1.1.223:5672)
The solution is to use an AWS Network Load Balancer. The Network LB effectively gives you a connection between the publisher app and the RabbitMQ node, so if the connection is blocked or dropped, the publisher is aware of it and can act accordingly. I have run the same test with 3M messages and not a single message was lost.
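Independently of the balancer choice, turning on publisher confirms makes this kind of loss visible to the publisher instead of counting a message as sent the moment it is on the wire. A minimal sketch using the pika client (the hostname, queue name, and message count are placeholder assumptions):

    import pika

    # Placeholder endpoint: a broker node or the load balancer's DNS name.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="rabbitmq.example.com"))
    channel = connection.channel()
    channel.queue_declare(queue="test", durable=True)

    # With confirms on, basic_publish blocks until the broker acknowledges
    # the message, so a dropped connection surfaces as an exception instead
    # of silent loss.
    channel.confirm_delivery()

    for i in range(1000):
        try:
            channel.basic_publish(exchange="", routing_key="test",
                                  body=("message %d" % i).encode(),
                                  mandatory=True)
        except pika.exceptions.UnroutableError:
            print("message %d was not confirmed; retry or log it here" % i)

    connection.close()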
In the AWS docs, there's this line which explains the behaviour:
Preserve source IP address: Network Load Balancer preserves the client side source IP, allowing the back-end to see the IP address of the client. This can then be used by applications for further processing.
From: https://aws.amazon.com/elasticloadbalancing/details/
ELBs will change their addresses when they scale in reaction to traffic. New nodes come up, and appear in DNS, and then old nodes may go away eventually, or they may stay online.
It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds. (emphasis added)
— from “Best Practices in Evaluating Elastic Load Balancing”
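That scaling is easy to observe by re-resolving the balancer's DNS name periodically and watching the returned set of addresses change over time; a small sketch, with the hostname being a placeholder:

    import socket
    import time

    # Placeholder name of a classic ELB.
    ELB_NAME = "my-elb-1234567890.us-east-1.elb.amazonaws.com"

    # Honour the 60-second TTL: look the name up again at least once a
    # minute instead of pinning one resolved IP for the life of the process.
    while True:
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(ELB_NAME, 80)})
        print(time.strftime("%H:%M:%S"), addresses)
        time.sleep(60)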
You may find more useful information in that "best practices" guide, including the concept of pre-warming a balancer with the help of AWS support, and how to ramp up your test traffic in a way that the balancer's scaling can keep up.
The behavior of a classic ELB is automatic, and not configurable by the user.
But it also sounds as if you have configuration issues with your queue, because it seems like it should be more resilient to dropped connections.
Note also that an AWS Network Load Balancer does not change its IP addresses and does not need to scale by replacing resources the way ELB does, because unlike ELB, it doesn't appear to run on hidden instances -- it's part of the network infrastructure, or at least appears that way. This might be a viable alternative.
I am developing an application service based on WSO2 AS. My intention is that the application should be deployed in an AS cluster in order to cope with high-volume traffic.
The cluster should be a dynamic one, so that it can scale up or down as the traffic changes.
Also, a user's service might persist in one of the instances for quite some time; in case of failure, a user's service should be restored on a peer instance by the backup-and-restore mechanism of an object archive (database).
So, the challenge is:
I need to tell the load balancer something about the instance in which the user's service persists, so that the load balancer will always route the same user's requests to the same instance in the cluster, and in case of failure I can update the load balancer with the new instance on which the user's service has been restored.
Preferably it would be something that can be generated dynamically by an application server instance, is accessible in the program environment, and is understood and used by the load balancer to route requests.
Does anyone have any idea?
Thanks a lot.
After googling around for some time, I found an alternative that WSO2 claims to support (http://wso2.com/products/elastic-load-balancer/).
NGINX Plus comes with a feature named Session Persistence (https://www.nginx.com/products/session-persistence/), which provides methods for directing the load balancer to route incoming requests from the same client to a specific back-end server.
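If neither of those is an option, the route-key idea from the question can also be approximated in the application tier by hashing a user identifier onto the current cluster member list. A minimal sketch, with the member names being placeholders (this is a simplification for illustration, not a WSO2 or NGINX feature):

    import hashlib

    def pick_instance(user_id, instances):
        """Deterministically map a user to one cluster member.

        The same user always lands on the same instance while the member
        list is unchanged; if an instance fails and is removed from the
        list, its users are re-mapped to the surviving members.
        """
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return instances[int(digest, 16) % len(instances)]

    members = ["as-node-1:9443", "as-node-2:9443", "as-node-3:9443"]
    print(pick_instance("alice@example.com", members))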
Is it possible to automatically 'roll over' a connection between auto scaled instances?
Given instances which provide a compute-intensive service, we would like to:
Autoscale a new instance after CPU reaches, say, 90%
Have requests for the service handled by the new instance.
It does not appear that there is a way with the AWS Dashboard to set this up - or have I missed something?
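The CPU-based trigger itself can be set up outside the dashboard; a minimal boto3 sketch of a target tracking policy on an existing Auto Scaling group, with the group and policy names being placeholder assumptions:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Keep the group's average CPU around 90%; instances are added as
    # utilisation climbs above the target and removed as it falls.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="compute-workers",
        PolicyName="cpu-target-90",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 90.0,
        },
    )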
What you're looking for is a load balancer. If you're using HTTP, this works pretty much out of the box. Clients open connections to the load balancer, which then distributes individual HTTP requests from the connection evenly across instances in your auto scaling group. When a new instance joins the group, the load balancer automatically shifts a portion of the incoming requests over to the new instance.
Things get a bit trickier if you're speaking a protocol other than HTTP(S). A generic TCP load balancer can't tell where one "request" ends and the next begins (or if that even makes sense for your protocol), so incoming TCP connections get mapped directly to a particular backend host. The load balancer will route new connections to the new instance when it spins up, but it can't migrate existing connections over.
Typically what you'll want to do in this scenario is to have clients periodically close their connections to the service and create new ones - especially if they're seeing increased latencies or other evidence that the instance they're talking to is overworked.
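A minimal sketch of that client-side behaviour, assuming a plain TCP protocol (the host, port, and recycling thresholds are illustrative assumptions):

    import socket
    import time

    # Illustrative placeholders for a hypothetical TCP service.
    HOST, PORT = "service.example.com", 9000
    MAX_REQUESTS_PER_CONNECTION = 100   # recycle periodically
    SLOW_RESPONSE_SECONDS = 2.0         # also recycle when latency degrades

    def send_requests(payloads):
        conn, used = None, 0
        for payload in payloads:
            if conn is None:
                # A new connection lets the balancer pick a (possibly newer,
                # less loaded) instance from the auto scaling group.
                conn = socket.create_connection((HOST, PORT))
                used = 0
            started = time.monotonic()
            conn.sendall(payload)
            reply = conn.recv(4096)
            used += 1
            if used >= MAX_REQUESTS_PER_CONNECTION or time.monotonic() - started > SLOW_RESPONSE_SECONDS:
                conn.close()
                conn = None
            yield reply
        if conn is not None:
            conn.close()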
I'm planning a server environment on AWS with auto scaling over VPC.
My application has a process that is done in several steps on the server, and the user should stick to the same server by using ELB's sticky sessions.
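For reference, duration-based stickiness of this kind is typically configured on a classic ELB with a cookie stickiness policy; a minimal boto3 sketch, with the balancer name, listener port, and cookie lifetime being placeholder assumptions:

    import boto3

    elb = boto3.client("elb")  # classic Elastic Load Balancing API

    # Duration-based stickiness: each user keeps hitting the same instance
    # for up to an hour, via a cookie attached to the HTTP listener.
    elb.create_lb_cookie_stickiness_policy(
        LoadBalancerName="app-elb",
        PolicyName="sticky-1h",
        CookieExpirationPeriod=3600,
    )
    elb.set_load_balancer_policies_of_listener(
        LoadBalancerName="app-elb",
        LoadBalancerPort=80,
        PolicyNames=["sticky-1h"],
    )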
The problem is that when the auto scaling group is supposed to shut down a server, some users may be in the middle of the process (the process takes multiple requests - for example:
1. create an album
2. upload photos to the album each at a time
3. convert photos to movie and delete photos
4. store movie on S3)
Is it possible to configure the ELB to stop passing NEW users to the server that is about to shut down, while still passing previous users (that have the sticky session set)? And is it possible to tell the server to wait, let's say, 10 min. after the shutdown rule is applied before it actually shuts down?
Thank you very much
This feature wasn't available in Elastic Load Balancing at the time of your question; however, AWS has meanwhile addressed the main part of your question by adding ELB Connection Draining to avoid breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains updated software.
Please note that you still need to specify a sufficiently large timeout based on the maximum time you expect users to take to finish their activity; see Connection Draining:
When you enable connection draining for your load balancer, you can set a maximum time for the load balancer to continue serving in-flight requests to the deregistering instance before the load balancer closes the connection. The load balancer forcibly closes connections to the deregistering instance when the maximum time limit is reached.
[...]
If your instances are part of an Auto Scaling group and if connection draining is enabled for your load balancer, Auto Scaling will wait for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances due to a scaling event or health check replacement. [...] [emphasis mine]
The emphasized part confirms that it is not possible to specify an additional timeout that only applies after the last connection has been drained.