AWS - Does Elastic Load Balancing actually prevent LOAD BALANCER failover? - amazon-web-services

I've taken this straight from some AWS documentation:
"As traffic to your application changes over time, Elastic Load Balancing scales your load balancer and updates the DNS entry. Note that the DNS entry also specifies the time-to-live (TTL) as 60 seconds, which ensures that the IP addresses can be remapped quickly in response to changing traffic."
Two questions:
1) I was originally under the impression that a single static IP address would be mapped to multiple instances of an AWS load balancer, providing fault tolerance at the balancer level: if, for instance, one machine crashed for whatever reason, the static IP address registered to my domain name would simply be dynamically 'moved' to another balancer instance and continue serving requests. Is this wrong? Based on the quote above from AWS, it seems that the only magic happening here is that AWS's DNS servers hold multiple A records for your AWS-registered domain name, and once the 60-second TTL expires, Amazon's DNS entry is updated so that requests go only to active IPs. That still means up to 60 seconds of failed connections on the client side. True or false? And why?
2) If the above is true, would it be functionally equivalent if I were using a DNS provider such as GoDaddy, entered multiple A records, and set the TTL to 60 seconds?
Thanks!

The ELB is assigned a DNS name which you can then assign to an A record as an alias, see here. If you have your ELB set up with multiple instances, you define the health check: you can determine what path is checked, how often, and how many failures indicate an instance is down (for example, check / every 10s with a 5s timeout and consider the instance unhealthy if it fails 2 checks). When an instance becomes unhealthy, all the remaining instances still serve requests just fine without delay. If the instance returns to a healthy state (for example, it passes 2 checks in a row) then it returns as a healthy host in the load balancer.
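To make that concrete, here is a minimal sketch of configuring such a health check on a classic ELB with boto3; the load balancer name my-elb is a placeholder, and the values mirror the example above (check / over HTTP every 10s, 5s timeout, unhealthy after 2 failed checks, healthy again after 2 passes):

import boto3

elb = boto3.client('elb')

# Hit "/" over HTTP on port 80 every 10 seconds with a 5-second timeout;
# 2 consecutive failures mark an instance unhealthy, 2 passes mark it healthy.
elb.configure_health_check(
    LoadBalancerName='my-elb',  # placeholder load balancer name
    HealthCheck={
        'Target': 'HTTP:80/',
        'Interval': 10,
        'Timeout': 5,
        'UnhealthyThreshold': 2,
        'HealthyThreshold': 2,
    },
)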
What the quote is referring to is the load balancer itself. In the event it has an issue or an AZ becomes unavailable, it's describing what happens with the underlying ELB DNS record, not the alias record you assign to it.
Whether or not traffic is affected also depends partly on how sessions are handled by your setup: whether they are sticky or handled by another system such as ElastiCache or your database.

Related

Creating AWS Application Load Balancer Listener rules based on intermediary DNS host name

In AWS EC2, I have one Application Load Balancer (ALB) alb1-556083996.us-east-1.elb.amazonaws.com, and two target groups tg-blue and tg-green.
In AWS route53, I have two records api.myapp.example.com and api-blue.myapp.example.com.
The route53 record api.myapp.example.com is aliased to api-blue.myapp.example.com, which is aliased to an AWS ALB alb1-123456789.us-east-1.elb.amazonaws.com.
Both of these route53 records are latency based with Evaluate target health enabled.
The route53 record api.myapp.example.com can also be aliased to api-green.myapp.example.com at some point instead of api-blue.myapp.example.com.
I'd like to create the following ALB listener rules:
When the host api.* points to api-blue.*, traffic will be forwarded to the target group tg-blue.
When the host api.* points to api-green.*, traffic will be forwarded to the target group tg-green.
The default rule will return a FIXED_RESPONSE.
In addition, as Evaluate target health and latency-based routing are enabled on both levels of aliases, I'd expect that traffic will never be dropped as long as at least one target group has targets registered to it.
I have LB access logs enabled. When I hit api.myapp.example.com or api-blue.myapp.example.com, the host api-blue.* does not appear in the access logs. So I'm afraid that host-header-based rules won't work here, and it would take a lot of experimenting to figure out whether the above works.
So I'm trying to find out from someone knowledgeable in this area whether it is possible. If it is not possible, how can I achieve this?
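For reference, a host-header listener rule on an ALB is defined roughly as in the sketch below (boto3; the listener and target group ARNs and the priority are placeholders). The caveat, consistent with the access-log observation above, is that the ALB matches the Host header the client actually sent, so a rule on api-blue.* would only fire for clients that request that name directly, not for clients hitting api.* that merely resolves through the api-blue alias:

import boto3

elbv2 = boto3.client('elbv2')

# Placeholder ARNs for the listener on alb1 and the target group tg-blue.
listener_arn = 'arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/alb1/abc/def'
tg_blue_arn = 'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/tg-blue/abc'

# Forward to tg-blue when the client's Host header is api-blue.myapp.example.com.
# The rule never sees Route 53 aliases, only the Host header sent by the client.
elbv2.create_rule(
    ListenerArn=listener_arn,
    Priority=10,
    Conditions=[{
        'Field': 'host-header',
        'HostHeaderConfig': {'Values': ['api-blue.myapp.example.com']},
    }],
    Actions=[{
        'Type': 'forward',
        'TargetGroupArn': tg_blue_arn,
    }],
)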

ELB cross-AZ balancing DNS resolution with Sticky sessions

I am preparing for AWS certification and came across a question about an ELB with sticky sessions enabled for instances in 2 AZs. The problem is that requests from a software-based load tester in one of the AZs end up only in the instances in that AZ instead of being distributed across AZs. At the same time, regular requests from customers are evenly distributed across AZs.
The correct answers to fix the load tester issue are:
Force the software-based load tester to re-resolve DNS before every request;
Use a third-party load-testing service to send requests from globally distributed clients.
I'm not sure I can understand this scenario. What is the default behaviour of Route 53 when it comes to ELB IP resolution? In any case, those DNS records have a 60-second TTL. Isn't it redundant to re-resolve DNS on every request? Besides, DNS resolution is a responsibility of the DNS service itself, not the load-testing software, isn't it?
I can understand that requests from the same instance, with load-testing software on it, will go to the same load-balanced EC2 instance, but why does it have to be an instance in the same AZ? That could only be achieved by Geolocation- or Latency-based routing, but I can't find anything in the docs saying whether those are the defaults.
When an ELB is in more than one availability zone, it always has more than one public IP address -- at least one per zone.
When you request these records in a DNS lookup, you get all of these records (assuming there are not very many) or a subset of them (if there are a large number, which would be the case in an active cluster with significant traffic) but they are unordered.
If the load testing software resolves the IP address of the endpoint and holds onto exactly one of the IP addresses -- as is a likely outcome -- then all of the traffic will go to one node of the balancer, which is in one zone, and will send traffic to instances in that zone.
But what about...
Cross-Zone Load Balancing
The nodes for your load balancer distribute requests from clients to registered targets. When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones. When cross-zone load balancing is disabled, each load balancer node distributes traffic across the registered targets in its Availability Zone only.
https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
If stickiness is configured, those sessions will initially land in one AZ and then stick to that AZ because they stick to the initial instance where they landed. If cross-zone is enabled, the outcome is not quite as clear, but either the balancer nodes prefer instances in their own zone in that scenario (when first establishing stickiness), or this wasn't really the point of the question. Stickiness requires coordination, and cross-AZ traffic takes a non-zero amount of time due to distance (typically <10 ms), but it would make sense for a balancer to prefer instances in its local zone for sessions with no established affinity.
In fact, configuring the load test software to re-resolve the endpoint for each request is not really the focus of the solution -- the point is to ensure that (1) the load test software uses all of the addresses and does not latch onto exactly one, and (2) if more addresses become available due to the balancer scaling out under load, the load test software expands its pool of targets.
In any case, those DNS records have a 60-second TTL. Isn't it redundant to re-resolve DNS on every request?
The software may not see the TTL, may not honor the TTL, and, as noted above, may stick to one answer even if multiple are available, because it only needs one in order to make the connection. Re-resolving on every request is not strictly necessary, but it does solve the problem.
Besides, DNS resolution is a responsibility of the DNS service itself, not the load-testing software, isn't it?
To "resolve DNS" in this context simply means to do a DNS lookup, whatever that means in the specific instance, whether using the OS's DNS resolver or making a direct query to a recursive DNS server. When software establishes a connection to a hostname, it "resolves" (looks up) the associated IP address.
The other solution, "use third party load-testing service to send requests from globally distributed clients," solves the problem by accident, since the distributed clients -- even if they stick to the first address they see -- are more likely to see all of the available addresses. The "global" distribution aspect is a distraction.
ELB relies on random arrival of requests across its external-facing nodes as part of the balancing strategy. Load testing software whose design overlooks this is not properly testing the ELB. Both solutions mitigate the problem in different ways.
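To illustrate the point, a load-test loop along these lines (plain Python with the standard socket module and the requests library; the ELB hostname is a placeholder) re-resolves the name before each request and rotates across every address it gets back instead of latching onto one:

import socket
import requests

HOST = 'my-elb-1234567890.us-east-1.elb.amazonaws.com'  # placeholder ELB DNS name

def current_ips(host):
    # Fresh lookup on each call (subject to any OS-level resolver caching);
    # returns every IPv4 address the resolver hands back for the name.
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 80, socket.AF_INET)})

for i in range(1000):
    ips = current_ips(HOST)
    ip = ips[i % len(ips)]  # rotate across all balancer nodes, not just the first
    # Connect to a specific node's IP but keep the original Host header so the
    # balancer still sees the expected hostname (plain HTTP for simplicity).
    requests.get('http://' + ip + '/', headers={'Host': HOST}, timeout=5)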
Stickiness is the issue, see here: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
The load balancer uses a special cookie to associate the session with the instance that handled the initial request, but follows the lifetime of the application cookie specified in the policy configuration. The load balancer only inserts a new stickiness cookie if the application response includes a new application cookie. The load balancer stickiness cookie does not update with each request. If the application cookie is explicitly removed or expires, the session stops being sticky until a new application cookie is issued.
The first solution, to re-resolve DNS, will create new sessions and thereby break the stickiness of the ELB. The second solution is to use multiple clients; stickiness is not an issue if the number of globally distributed clients is large.
PART 2 (could not add as a comment, it is too long):
Yes, my answer was too simple and incomplete.
What we know is that the ELB spans 2 AZs and will have 2 nodes with different IPs. It is not clear how many IPs in total; that depends on the number of requests and the number of servers in each AZ. Route 53 rotates the IPs for every new request: the first time NodeA-IP, NodeB-IP, the second time NodeB-IP, NodeA-IP. The load-testing application will take the first IP with every new request, balancing between the 2 AZs. Because a node can route only inside its own AZ, if the sticky cookie is for NodeA and the request arrives at NodeB, NodeB will send it to one of its servers in AZ2, ignoring the cookie for a server in AZ1.
I need to run some tests. I quickly tested Route 53 with a classic ELB and 2 AZs, and it rotates the IPs every time. What I want to test is whether, when I have a sticky cookie for AZ1 and I reach Node 2, it will forward me to Node 1 or not (the case of no available servers is described in the doc, an interesting flow). Hope to have updates shortly.
Just found another piece of evidence that Route 53 returns multiple IPs and rotates them in ELB scaling scenarios:
By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones.
And then:
To ensure that clients are taking advantage of the increased capacity, Elastic Load Balancing uses a TTL setting on the DNS record of 60 seconds. It is critical that you factor this changing DNS record into your tests. If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses.
What I didn't realize at first is that even if regular traffic is distributed evenly across AZs, it doesn't mean that Cross-Zone Load Balancing is enabled. As Michael pointed out, regular traffic will naturally come through various locations and end up in different AZs.
And as it was not explicitly mentioned in the question, Cross-AZ balancing might not have been in place.
https://aws.amazon.com/articles/best-practices-in-evaluating-elastic-load-balancing/

Customizing/Architecting AWS ELB to have Zero Downtime

So the other day we faced an issue where one of the instances behind our Application Load Balancer failed the Instance Status Check and System Check. It took about 10 seconds (the minimum we can get) for our ELB to detect this and mark the instance as "unhealthy"; however, we lost some amount of traffic in those 10 seconds as the ELB kept routing traffic to the unhealthy instance. Is there a solution where we can avoid literally any downtime, or am I being too unrealistic?
I'm sure this isn't the answer you want to hear, but in order to minimize traffic loss on your systems if 10s is not tolerable, you'll need to implement your own health check/load balancing solution. My organization has systems where packet loss is unacceptable as well, and that's what we needed to do.
This solution is twofold.
You need to implement your own load-balancing infrastructure. We chose to use Route53 weighted record sets (TTL of 1s, we'll get back to this) with equal weight for each server.
Launch an ECS container instance per load-balanced EC2 instance whose sole purpose is to health check. It runs both DNS and IP health checks (the requests library in Python) and will add/remove the Route53 weighted record in real time as it sees an issue.
In our testing, however, we discovered that while the upstream DNS servers from Route53 honor the 1-second TTL upon removal of a DNS record, they "blacklist" that record (FQDN + IP combo) from coming back up again for up to 10 minutes (we see resolution times vary from 1m to 10m). So you'll be able to fail over quickly, but you must take into account that it will take up to 10 minutes for the re-addition of the record to be honored.
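A stripped-down sketch of that health-check loop, assuming boto3's Route 53 API; the hosted zone ID, record name, instance IP, /health path, and set identifier are all placeholders, and a real deployment would add DNS checks, retries, and alerting:

import time
import boto3
import requests

route53 = boto3.client('route53')

ZONE_ID = 'Z123EXAMPLE'        # placeholder hosted zone
RECORD = 'api.example.com.'    # placeholder weighted record name
SERVER_IP = '203.0.113.10'     # the instance this checker is responsible for
SET_ID = 'server-a'            # identifier of this instance's weighted record

def set_record(action):
    # UPSERT puts the weighted record back into rotation, DELETE pulls it out.
    route53.change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={'Changes': [{
            'Action': action,
            'ResourceRecordSet': {
                'Name': RECORD,
                'Type': 'A',
                'SetIdentifier': SET_ID,
                'Weight': 1,
                'TTL': 1,
                'ResourceRecords': [{'Value': SERVER_IP}],
            },
        }]},
    )

last_healthy = True
while True:
    try:
        healthy = requests.get('http://' + SERVER_IP + '/health', timeout=2).ok
    except requests.RequestException:
        healthy = False
    if healthy != last_healthy:
        # Only touch Route 53 when the state flips; deleting a record that is
        # already gone would be rejected by the API.
        set_record('UPSERT' if healthy else 'DELETE')
        last_healthy = healthy
    time.sleep(1)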

How rate limiting will work when same instance group is behind two different load balancers

I was reading about rate limiting and auto-scaling in GCP and got stuck at this question:
Scenario:
I created an instance group ig with auto-scaling OFF.
I created a load balancer lb1, details are:
lb1 contains a backend service bs1 which points to instance group ig, with Maximum RPS set to 1000 for the whole group.
frontend port :8080
path rule : /alpha/*
lb1 is an external load balancer
I created one more load balancer lb2, details are:
lb2 contains a backend service bs2 which points to instance group ig, with Maximum RPS set to 2000 for the whole group.
frontend port :9090
path rule : /beta/*
lb2 is a regional load balancer
Questions that I have:
Who will monitor the requests served by both load balancers?
Which limit will be honoured, 1000 or 2000?
Will the overall requests (i.e. via lb1 and lb2) be rate limited, or will individual limits be applied to each request flow?
TL;DR - The RPS is set in the Backend Service, so each load balancer will have its own RPS limit independent of another.
Who will monitor the requests served by both load balancers?
Google Compute Engine (GCE) will monitor the requests being served by the load balancers and direct traffic accordingly to stay within the RPS limit of each backend within the backend service.
Which limit will be honoured, 1000 or 2000?
1000 with respect to the first load balancer and 2000 with respect to the second load balancer. Remember that you're using 2 separate backend services bs1 and bs2 for lb1 and lb2 respectively.
Will the overall requests (i.e. via lb1 and lb2) be rate limited, or will individual limits be applied to each request flow?
Requests going through lb1 for bs1 will conform to a maximum of 1000 RPS per backend VM. Requests going through lb2 for bs2 will conform to a maximum of 2000 RPS per backend VM. So your service running in any given backend VM instance should be capable of handling at least 3000 RPS.
Longer version
Instance groups do not have a way to specify RPS, only backend services do. Instance groups only help to group a list of instances. So although you could use the same instance groups in multiple backend services, you need to account for the RPS value you set in the corresponding backend service if your goal is to share instances among multiple backend services. GCE will not be able to figure this out automatically.
A backend service represents a micro-service ideally, which is served by a group of backend VMs (from the instance group). You should calculate beforehand how much maximum RPS a single backend instance (i.e. your service running inside the VM) can handle to set this limit. If you intend to share VMs across backend services, you will need to ensure that the combined RPS limit in the worst case is something that your service inside the VM is able to handle.
Google Compute Engine (GCE) will monitor the metrics per backend service (i.e. number of requests per second in your case) and will use that for load balancing. Each load balancer is logically different, and hence there will be no aggregation across load balancers (even if using the same instance group).
Load distribution algorithm
HTTP(S) load balancing provides two methods of determining instance load. Within the backend service object, the balancingMode property selects between the requests per second (RPS) and CPU utilization modes. Both modes allow a maximum value to be specified; the HTTP load balancer will try to ensure that load remains under the limit, but short bursts above the limit can occur during failover or load spike events.
Incoming requests are sent to the region closest to the user, provided that region has available capacity. If more than one zone is configured with backends in a region, the traffic is distributed across the instance groups in each zone according to each group's capacity. Within the zone, the requests are spread evenly over the instances using a round-robin algorithm. Round-robin distribution can be overridden by configuring session affinity.
maxRate and maxRatePerInstance
In the backend service, there are 2 configuration fields related to RPS: one is maxRate and the other is maxRatePerInstance. maxRate can be used to set the RPS for the whole group, whereas maxRatePerInstance sets the RPS per instance. It looks like both can be used in conjunction if needed (see the sketch after the field reference below).
backends[].maxRate (integer)
The max requests per second (RPS) of the group. Can be used with either RATE or UTILIZATION balancing modes, but required if RATE mode. For RATE mode, either maxRate or maxRatePerInstance must be set. This cannot be used for internal load balancing.
backends[].maxRatePerInstance (float)
The max requests per second (RPS) that a single backend instance can handle. This is used to calculate the capacity of the group. Can be used in either balancing mode. For RATE mode, either maxRate or maxRatePerInstance must be set. This cannot be used for internal load balancing.
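As an illustration, a backend service shaped like bs1 with a per-group RATE cap of 1000 could be created along these lines with the google-api-python-client; the project name, health check URL, and instance group URL are placeholders:

from googleapiclient import discovery

compute = discovery.build('compute', 'v1')

# bs1 caps the whole instance group ig at 1000 requests per second using
# RATE balancing mode; swap maxRate for maxRatePerInstance to cap per VM.
body = {
    'name': 'bs1',
    'protocol': 'HTTP',
    'healthChecks': [
        'https://www.googleapis.com/compute/v1/projects/my-project/global/healthChecks/hc1',
    ],
    'backends': [{
        'group': 'https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/instanceGroups/ig',
        'balancingMode': 'RATE',
        'maxRate': 1000,
    }],
}
compute.backendServices().insert(project='my-project', body=body).execute()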
Receiving requests at a higher rate than specified RPS
If you happen to receive requests at a rate higher than the specified RPS and you have autoscaling disabled, I could not find any documentation on the Google Cloud website regarding the exact expected behavior. The closest I could find is this one, which specifies that the load balancer will try to keep each instance at or below the specified RPS. So it could mean that requests get dropped once the rate exceeds the RPS, and clients might see one of the 5XX error codes (possibly 502) based on this:
failed_to_pick_backend
The load balancer failed to pick a healthy backend to handle the request.
502
You could probably figure it out the hard way by setting a fairly low RPS like 10 or 20 and seeing what happens. Look at the timestamps at which you receive the requests on your backend to determine the behavior. Also, the limiting might not happen on exactly the 11th or 21st request, so try sending far more than that per second to verify whether requests are being dropped.
With Autoscaling
If you enable autoscaling, though, the increased load will automatically trigger the autoscaler and make it expand the number of instances in the instance group, based on the target utilization level you set in the autoscaler.
NOTE: Updated answer since you actually specified that you're using 2 separate backend services.

How do I set up ElasticSearch nodes on EC2 so they have consistent DNS entries?

I have 3 ElasticSearch nodes in a cluster on AWS EC2. My client apps use connection pooling and have the public IP addresses for all 3 nodes in their config files.
The problem I have is that EC2 seems to occasionally reassign public IP addresses for these instances. They also change if I stop and restart an instance.
My app will actually stay online since the connection pool will round robin the three known IP addresses, but eventually, all three will change and the app will stop working.
So, how should I be setting up an ElasticSearch cluster on EC2 so that my clients can continue to connect even if the instances change IP addresses?
I could use Elastic IPs, but these are limited to 5 per account and I will eventually have many more than 5 nodes (different environments, dev, staging, test, etc.)
I could use Elastic Load Balancers, and put one node behind each ELB, but that seems like a pretty hacky solution and an improper use of load balancers.
I could create my own DNS entries under my own domain and update the DNS table whenever I notice an IP address has changed, but that seems really error prone if no one is checking the IPs every day.
Is there an option I'm missing?
I haven't seen the changing IP address on a running instance that you're describing, but using this approach, it shouldn't matter:
Use DNS names for everything, not IP addresses.
Let's say you want to hit your cluster via http://elastic.rabblerabble.com:9200.
Create the EC2 instances for your nodes. Name them elastic-0, elastic-1, and elastic-2.
In EC2 Load Balancers, create an ELB named 'es-elb' that includes each of these instances by name, with port forwarding of port 9200.
In Route 53, create unique CNAMEs for each of your instances, with the Public DNS as the value, and a CNAME for your ELB:
Name Type Value
elastic-0.rabblerabble.com. CNAME Public DNS of instance elastic-0
elastic-1.rabblerabble.com. CNAME Public DNS of instance elastic-1
elastic-2.rabblerabble.com. CNAME Public DNS of instance elastic-2
elastic.rabblerabble.com. CNAME Public DNS of ELB es-elb
There's more needed for security, health checks, etc. but that's outside the scope of the question.
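As a sketch, the CNAMEs in the table above could be created in a single boto3 change batch; the hosted zone ID and the Public DNS values are placeholders you would look up from EC2:

import boto3

route53 = boto3.client('route53')

# Stable names in your zone mapped to the instances' current Public DNS names
# and the ELB's DNS name (all values here are placeholders).
records = {
    'elastic-0.rabblerabble.com.': 'ec2-198-51-100-10.compute-1.amazonaws.com',
    'elastic-1.rabblerabble.com.': 'ec2-198-51-100-11.compute-1.amazonaws.com',
    'elastic-2.rabblerabble.com.': 'ec2-198-51-100-12.compute-1.amazonaws.com',
    'elastic.rabblerabble.com.': 'es-elb-1234567890.us-east-1.elb.amazonaws.com',
}

route53.change_resource_record_sets(
    HostedZoneId='Z123EXAMPLE',  # placeholder hosted zone for rabblerabble.com
    ChangeBatch={'Changes': [
        {
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': name,
                'Type': 'CNAME',
                'TTL': 60,
                'ResourceRecords': [{'Value': value}],
            },
        }
        for name, value in records.items()
    ]},
)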
Use one or two query only nodes - referred to in the documentation as "non data" nodes.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html
In front of the cluster we can start one or more "non data" nodes which will start with HTTP enabled. All HTTP communication will be performed through these "non data" nodes.
The benefit of using that is first the ability to create smart load balancers. These "non data" nodes are still part of the cluster, and they redirect operations exactly to the node that holds the relevant data. The other benefit is the fact that for scatter / gather based operations (such as search), these nodes will take part of the processing since they will start the scatter process, and perform the actual gather processing.
These nodes don't need much disk (they are query and index processing only). You route all your requests through them. You can add more and more data nodes as you ingest more data without changing these "non data" nodes. You run a couple of them (to be safe) and use either DNS or Elastic IP addresses. You need far fewer IP addresses since these are not data nodes, and you tend not to need to change them as frequently as you do data nodes.
This configuration approach is documented in the elasticsearch.yml file, quoted below:
# You want this node to be neither master nor data node, but
# to act as a "search load balancer" (fetching data from nodes,
# aggregating results, etc.)
node.master: false
node.data: false
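Client applications then only need the stable names of those "non data" nodes in their connection pool; a minimal sketch with the Python elasticsearch client (hostnames are placeholders):

from elasticsearch import Elasticsearch

# Point the connection pool at the "non data" (client) nodes only; data nodes
# can be added or replaced without touching client configuration.
es = Elasticsearch([
    'http://es-client-0.example.com:9200',
    'http://es-client-1.example.com:9200',
])

print(es.cluster.health())  # simple sanity check routed through the client nodes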