It has been almost a year since AWS launched the ALB, yet there seems to be little to no technical comparison between the Classic Load Balancer and the ALB.
The only thing my research yielded is this AWS forum question.
While the above implies that the ALB is not reliable, it is only one source.
I am interested in hearing from people who have made the switch from ELB to ALB, mainly with regard to latency, resilience, and high availability (HA).
It seems to me that Layer 4 balancing is more robust than Layer 7, and that general performance would therefore be better.
I would like to share my experience here, though I won't go into much detail. Basically, my company has over 200k devices in the field actively connected to our backend via WebSocket. We were using an ELB in front of our servers and it worked quite well; all the connections were stable. We tried switching from ELB to ALB, but the results weren't good and we switched back.
The main problem we found was that connections were dropped suddenly (with a pattern of roughly every 15 hours). Please check the graph of connection counts I captured from our monitoring system:
As you can see, there are dips in the graph showing that connections were dropped. I kept the same server setup; the only factor I changed was ELB to ALB, and this was the result.
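For anyone trying to catch this pattern in their own monitoring, the dips are easy to flag automatically. A minimal sketch (pure Python; the 50% threshold is an assumption, tune it to your traffic) that scans a series of connection-count samples for sudden drops:

```python
def find_drops(counts, threshold=0.5):
    """Return the indices of samples where the connection count fell by
    more than `threshold` (as a fraction) versus the previous sample."""
    drops = []
    for i in range(1, len(counts)):
        prev = counts[i - 1]
        if prev > 0 and (prev - counts[i]) / prev > threshold:
            drops.append(i)
    return drops

# e.g. ~200k stable connections with one sudden dip
print(find_drops([200_000, 201_000, 90_000, 199_000]))  # → [2]
```

Feed it whatever your monitoring system exports (for an ALB, CloudWatch's ActiveConnectionCount would be the natural source) and alert on a non-empty result.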
I'm trying to find a way to relay real production requests to a test environment.
I found this:
https://aws.amazon.com/ko/blogs/networking-and-content-delivery/mirror-production-traffic-to-test-environment-with-vpc-traffic-mirroring/
However, I'm using ECS, not EC2, and my setup consists of an ALB and ECS. So I'm wondering whether this Traffic Mirroring works with ECS, and if so, how.
Any help would be appreciated.
Leaving an answer for anyone who has the same question as me.
After spending several hours, I was able to set up Traffic Mirroring from ECS to ECS.
However, automating it for various ports can be pretty tricky, so I decided not to use Traffic Mirroring to relay traffic from prod to test.
I chose to use request logs instead of the Traffic Mirroring feature.
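If anyone else goes the log route with an ALB in front, the access logs already contain everything needed to rebuild each request. A sketch (field positions follow the published ALB access-log format, where the quoted request is the 13th field; the test hostname is made up) that extracts the request from a log line and retargets it at the test environment:

```python
import shlex
from urllib.parse import urlsplit, urlunsplit

def replay_target(log_line, test_host):
    """Pull the quoted request (index 12 in the space-separated ALB
    access-log format) out of a log line and rewrite its host so the
    request can be replayed against a test environment."""
    fields = shlex.split(log_line)  # shlex keeps quoted fields intact
    method, url, _http_version = fields[12].split(" ", 2)
    parts = urlsplit(url)
    return method, urlunsplit(
        (parts.scheme, test_host, parts.path, parts.query, parts.fragment))
```

Pair this with an HTTP client in a loop over the downloaded log files, replaying at whatever rate the test environment tolerates.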
First, the background:
Yesterday our AWS-based business in us-west-2, consisting of two Auto Scaling groups (and various other components, like RDS, further back) behind an ALB, went offline for six hours. Service was only reinstated by building an entirely new ALB (migrating over the rules and target groups).
At 4:15 am our local time (GMT+10), the ALB ceased to receive inbound traffic and would not respond to web traffic. We used it for port 80 and port 443 (with an SSL cert) termination. At the same time, all target-group instances were marked "Unhealthy" (although they were most certainly operable) and no traffic was forwarded to them. DNS resolved correctly to the ALB; it simply stopped responding. The symptoms were equivalent to a network router/switch being switched off or firewalled out of existence.
Our other EC2 servers that were not behind the ALB continued to operate.
Initial thoughts were:
a) Deliberate isolation by AWS? Bill not paid, or some offence taken at an abuse report? Unlikely, and AWS had not notified us of any transgression or reason to take action.
b) A mistake on our part in network configuration? No change had been made to NACLs or security groups in days. Furthermore, we were sound asleep when it happened; nobody was fiddling with settings. When we built the replacement ALB, we used the same NACLs and security groups without problem.
c) Maintenance activity gone wrong? This seems most likely, but AWS appeared not to detect the failure, and we didn't pick it up because we considered a complete, inexplicable, and undetected failure of an ALB "unlikely". We will need to put in place some external health checks of our own. We have some based on Nagios, so we can enable alarming. But this doesn't help if an ALB is unstable: it is not practical to keep building a new one if this recurs.
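The external health check itself can be tiny. A sketch of the kind of probe we would wire into Nagios (the fetch hook is injectable purely so the decision logic can be exercised without a network; the URL is hypothetical):

```python
import urllib.request

def check_endpoint(url, timeout=5, fetch=None):
    """Probe `url` from outside AWS and report (healthy, detail).
    Any 2xx/3xx status counts as healthy; exceptions count as
    unreachable, which is exactly the failure mode described above."""
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.status
    try:
        status = fetch(url)
    except Exception as exc:
        return False, f"unreachable: {exc}"
    return 200 <= status < 400, f"status {status}"
```

The important part is running it from outside AWS (e.g. the Nagios host), so it sees what users see rather than what the ALB's own health checks claim.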
The biggest concern is that this happened suddenly and unexpectedly and that AWS did not detect it. Normally we never worry about AWS network infrastructure because "it just works". Until now. There are no user-serviceable options for an ALB (e.g. restart/refresh).
And now my actual question:
Has anyone else ever seen something like this? If so, what can be done to get service back faster or prevent it in the first place? If this happened to you what did you do?
I'm going to close this off.
It happened again the following Sunday, and again this evening, with the exact same symptoms. Restoration was initially achieved by creating a new ALB and migrating the rules and target groups over. Curiously, the previous ALB was observed to be operational again, but when we tried to reinstate it, it failed once more.
Creating new ALBs is no longer a workaround, and we've signed up for AWS Business Support to get direct help from AWS.
Our best hypothesis is this: AWS has changed something in their maintenance process, and the ALB (which is really just a collection of EC2 instances running some AWS proprietary code) is failing as a result. But that's really just wild speculation.
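For what it's worth, the rebuild-and-migrate stop-gap can at least be scripted so restoration takes minutes rather than hours. A sketch of the listener-cloning step (pure dict manipulation over boto3 elbv2 shapes; the ARNs are placeholders):

```python
def clone_listener_params(listener, new_alb_arn):
    """Translate one entry from elbv2 describe_listeners on the dead ALB
    into the kwargs for create_listener on its replacement."""
    params = {
        "LoadBalancerArn": new_alb_arn,
        "Protocol": listener["Protocol"],
        "Port": listener["Port"],
        "DefaultActions": listener["DefaultActions"],
    }
    if listener.get("Certificates"):  # only present on HTTPS listeners
        params["Certificates"] = listener["Certificates"]
    return params
```

A full script would loop over describe_listeners, then copy rules via describe_rules/create_rule the same way; the target groups themselves are separate resources and can be pointed at by the new listeners once the old ALB's listeners are gone.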
We are building a scaled application that uses WebSockets on AWS EC2. We were considering using the default ELB (Elastic Load Balancing) for this, but that unnecessarily makes the load balancer itself a bottleneck for traffic-heavy operations (see this related thread), so we are currently looking into a way to send the client the connection details of a "good instance" to connect to instead. However, the Elastic Load Balancing API does not seem to support a query of the sort "give me the (public) connection details of a good instance", which is odd, because that is the core functionality of any load balancer. Maybe I have just not looked in the right place?
UPDATE:
Currently, we are investigating two simple solutions using default implementations:
1) Use ELB in TCP mode, which tunnels all traffic through the ELB.
2) Simply connect to the public IP of the instance that the ELB routed your initial GET request to. The second solution requires public IPs to be enabled, but does not route all traffic through the ELB.
I was concerned about that last part because I assumed the ELB might not be in the same data center as the instance it hands you. But I assume it usually is in the same data center, or at least has some other high-speed connection to the instances? In that case, the tunnelling overhead is negligible.
Both solutions seem equally viable, or am I overlooking something?
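For the second option, the "give me a good instance" query can be assembled from two calls the classic ELB API does expose: describe_instance_health for the in-service set, and EC2 describe_instances for the public IPs. A sketch of the selection step, operating on data those calls return (field names match the APIs; the port is an assumption):

```python
import random

def pick_healthy_endpoint(instance_states, public_ips, port=8080):
    """instance_states: the InstanceStates list returned by
    elb.describe_instance_health; public_ips: instance id -> public IP
    (gathered from ec2.describe_instances).
    Returns "ip:port" for a random in-service instance, or None."""
    healthy = [s["InstanceId"] for s in instance_states
               if s["State"] == "InService" and s["InstanceId"] in public_ips]
    if not healthy:
        return None
    return f"{public_ips[random.choice(healthy)]}:{port}"
```

The client does its initial GET, receives this endpoint, and opens the WebSocket directly, so the long-lived connection bypasses the ELB entirely.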
If your application manages to make the ELB a bottleneck, then you are a pretty big fish. Why not first try their load balancer and trust that they do their job right? It is difficult to do "better", and the most difficult part is defining what "better" means in the first place. You did not really define that in your question, so I am fairly sure you would be well off just using their load balancer.
In some cases it might make sense to develop your own load balancing logic, especially if your machine usage depends on very special metrics not per se accessible to the ELB system.
Yes, I'd say both solutions are viable.
The upside of the second is that it allows greater customization of the load-balancing logic you may want to implement (improving on the ELB's round robin), dispatching requests to a server of your choosing after the initial HTTP GET request.
The downside may be on the security front. It's not clear whether security and SSL are part of your requirements, but if they are, the second solution forces you to handle them at the EC2 instance level, which can be inconvenient and affect each node's performance. Otherwise, WebSocket communications may be left unsecured.
Questions about load balancers if you have time.
So I've been using AWS for some time now. Super basic instances, used to run tasks whenever I needed something done.
I have a task that needs to be load balanced now. It's not a public service though. It's pretty much a giant cron job that I don't want running on the same servers as my website.
I set up an AWS load balancer, but it doesn't do what I expected it to do.
It gets stuck on one server and doesn't load balance at all. I've read why it does this, and that's all fine and well, but I need it to be a proper round-robin load balancer.
edit:
I've set up the instances in different zones, but no matter how many instances I add to the ELB, it just uses one. If I take that instance down, it switches to a different one, so I know it's working. But I would really like it to always use a different one, under every circumstance.
I know there are alternatives. Here's my question(s):
Would a custom PHP load balancer be an OK option for now?
i.e., have a list of servers and have PHP randomly select an EC2 instance. It wouldn't be scalable at all, but at least I could set it up in 2 minutes and it would work for now.
or
Should I take the time to learn how HAProxy works, and set that up in place of the AWS ELB?
or
Am I doing it wrong? Does AWS's ELB do round-robin after all, and I just have something configured wrong?
edit:
Structure:
1) A web server finds a task to do.
2) If it's too large, it sends it off to AWS (to the load balancer).
3) The job runs on EC2.
4) The result is reported back via curl to an API.
5) Rinse and repeat.
Everything works great. But because the connection always comes from my server (one IP), it gets stuck to a single EC2 machine.
ELB works well for sites whose load increases gradually. If you are expecting an uncommon, sudden increase in load, you can ask AWS to pre-warm it for you.
I can tell you I have used ELB in several scenarios and it has always worked well for me. Since you didn't provide much information about your architecture, I would bet that ELB will work for you too. As for all connections hitting only one server, I would ask:
1) Did you check the ELB to see how many instances are behind it?
2) Are all the instances behind the ELB alive and healthy?
3) Are you accessing your application through the ELB's DNS name?
Anyway, here is an excerpt from an excellent article that compares ELB and HAProxy in depth: http://harish11g.blogspot.com.br/2012/11/amazon-elb-vs-haproxy-ec2-analysis.html
ELB provides Round Robin and Session Sticky algorithms based on EC2
instance health status. HAProxy provides variety of algorithms like
Round Robin, Static-RR, Least connection, source, uri, url_param etc.
Hope this helps.
This point comes as a surprise to many users using Amazon ELB. Amazon
ELB behaves little strange when incoming traffic is originated from
Single or Specific IP ranges, it does not efficiently do round robin
and sticks the request. Amazon ELB starts favoring a single EC2 or
EC2’s in Single Availability zones alone in Multi-AZ deployments
during such conditions. For example: If you have application
A(customer company) and Application B, and Application B is deployed
inside AWS infrastructure with ELB front end. All the traffic
generated from Application A(single host) is sent to Application B in
AWS, in this case ELB of Application B will not efficiently Round
Robin the traffic to Web/App EC2 instances deployed under it. This is
because the entire incoming traffic from application A will be from a
Single Firewall/ NAT or Specific IP range servers and ELB will start
unevenly sticking the requests to Single EC2 or EC2’s in Single AZ.
Note: Users encounter this usually during load test, so it is ideal to
load test AWS Infra from multiple distributed agents.
More info at Point 9 in the following article: http://harish11g.blogspot.in/2012/07/aws-elastic-load-balancing-elb-amazon.html
HAProxy is not hard to learn and is tremendously lightweight yet flexible. I actually use HAProxy behind ELB for the best of both worlds: the hardened, managed, hands-off reliability of ELB facing the Internet and unwrapping SSL, and the flexible configuration of HAProxy letting me fine-tune how things hit my servers. I've never lost an HAProxy instance yet, but if I do, ELB will just take that one out of rotation... as I have seen happen when the back-end servers all became inaccessible, which (because of the way it's configured) makes ELB think the HAProxy itself is unhealthy. But that's by design in my setup.
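For anyone wanting to copy this layered setup, the HAProxy side is only a few lines. A sketch (names, ports, and addresses are all assumptions; here the ELB terminates SSL and forwards plain HTTP to HAProxy on :8080):

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend from_elb
    bind *:8080
    default_backend app_servers

backend app_servers
    balance roundrobin           # or leastconn, source, uri, ...
    option httpchk GET /health
    server app1 10.0.1.10:80 check
    server app2 10.0.1.11:80 check
```

The `balance` line is where you get the algorithm choice ELB lacks, and `option httpchk` gives HAProxy its own view of back-end health, independent of the ELB's checks.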
Hey guys, I was wondering if this seems like a viable solution to the age-old problem of Amazon Elastic Load Balancers lacking a dedicated IP, and thus A-record support.
What if I created a micro/small instance and attached an Elastic IP to it? I could then use that IP as the A-record address for my website. That instance would forward 100% of its traffic to the ELB's address (via HAProxy?), which would then operate normally and forward the traffic on to my server pool.
With this architecture I can use my A record and an ELB.
Are there any downsides to this aside from the cost of the initial instance that forwards its traffic to the ELB?
Will this double forwarding add too much lag, or is it really negligible since they're all within AWS?
Thanks for feedback.
If you are currently using Route 53 for your DNS, it does have support for handling the zone apex.
https://forums.aws.amazon.com/message.jspa?messageID=260459
Not sure if this answers your question, since you didn't mention why you need a dedicated IP.
Are there any downsides to this aside from the cost of the initial instance that forwards its traffic to the ELB?
Er, yes. You're losing about 99.9% of the benefits of ELB.
Will this double forwarding create too much lag or is it really negligible since they're all in AWS?
No, the lag should be small (sub-millisecond). The two main problems are:
1) Your instance will become a bottleneck when your traffic increases. You won't be able to survive a sudden rush, such as being linked from a high-traffic website like Slashdot or Oprah.
The whole point of ELB is that it manages scaling (of both the front end and the back end) for you. If you insert a single box into the flow, it largely prevents ELB from doing anything useful.
Also, a micro instance can handle very little traffic. You have to go to at least an m1.large if you don't want your network packets throttled.
2) Your instance will become a single point of failure. When your box dies, your website is down. ELB prevents problems on both the front end and the back end through redundancy.
Perhaps you could explain why you need an A record?
(It is also possible to run your own front end(s): just create a box with an EIP and put nginx and/or HAProxy on it. But as with everything, there are trade-offs.)
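If you do build that EIP front end with nginx, one detail matters: the ELB's IPs change over time, so nginx must keep re-resolving the ELB's DNS name rather than caching it once at startup. A sketch (the resolver address is the VPC DNS endpoint; the ELB hostname is a placeholder):

```
server {
    listen 80;
    # Re-resolve the ELB's name every 10s; using a variable in
    # proxy_pass forces nginx to honour the resolver instead of
    # resolving the name once at startup.
    resolver 169.254.169.253 valid=10s;
    set $elb "my-elb-1234567890.us-west-2.elb.amazonaws.com";
    location / {
        proxy_pass http://$elb;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Without the variable trick, nginx will happily keep proxying to stale ELB addresses long after AWS has rotated them, which looks exactly like a mysterious outage.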