What is the major benefit of Active-Active in AWS routing - amazon-web-services

I came across the so-called Active-Active and Active-Passive routing setups, diagrammed as below.
For the latter, Active-Passive:
It is easy to understand: Passive (HTTP Server 2) is the standby service/instance that Active (HTTP Server 1) fails over to.
For the former, Active-Active:
I don't understand what the major benefit is. It seems to me that both services/instances must be up and running at the same level, and the routing is perhaps just something like round robin. Wouldn't that be a waste of resources/cost? Does it add extra computing power? What is the use case for it?

In active-passive mode, one web server is sitting there costing you money but not serving any requests. If a sudden surge in traffic came in, the extra web server would not be able to help absorb the extra load. The only time the second web server starts being used is when the first web server crashes and can no longer serve requests. This gives you failover in the event of a server crash, but does not help you at all in the event of a sudden surge in traffic.
In active-active mode each web server is serving some of the traffic. In order to scale out your web servers (horizontal scaling) you would have two or more servers, all in "active" mode serving some portion of the web requests. If a sudden surge in traffic comes in, that surge is spread across multiple servers which can hopefully absorb the load, and new servers can be added automatically by AWS as needed, and removed when no longer needed.
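As a rough sketch of how the two modes could be wired up (this assumes the routing in your diagram is done with Amazon Route 53 and uses boto3; the hosted-zone ID, record names, IPs and health-check ID below are all placeholders, not taken from your setup): active-active maps to weighted records that split traffic across both servers, while active-passive maps to a failover record pair where the secondary only gets traffic when the primary's health check fails.

import boto3

route53 = boto3.client("route53")
ZONE_ID = "Z123EXAMPLE"  # hypothetical hosted-zone ID

def upsert(record_set):
    route53.change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record_set}]},
    )

# Active-active: two weighted records; equal weights split traffic roughly 50/50.
for name, ip in [("server-1", "203.0.113.10"), ("server-2", "203.0.113.20")]:
    upsert({
        "Name": "www.example.com.",
        "Type": "A",
        "SetIdentifier": name,
        "Weight": 50,
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    })

# Active-passive: a PRIMARY record guarded by a health check, plus a SECONDARY standby.
upsert({
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "primary",
    "Failover": "PRIMARY",
    "HealthCheckId": "11111111-2222-3333-4444-555555555555",  # hypothetical health check
    "TTL": 60,
    "ResourceRecords": [{"Value": "203.0.113.10"}],
})
upsert({
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "secondary",
    "Failover": "SECONDARY",
    "TTL": 60,
    "ResourceRecords": [{"Value": "203.0.113.20"}],
})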

Related

Having load balancer before Akka Http multiple applications

I have multiple identical Scala Akka HTTP applications, each one installed on a dedicated server (around 10 apps), responding to HTTP requests on port 80. In front of this setup I am using a single HAProxy instance that receives all the incoming traffic and balances the workload across these 10 servers.
We would like to replace HAProxy (we suspect that it is causing us latency problems) and use a different load balancer. The requirement is to adopt a different third-party load balancer, or to develop a simple one in Scala that round-robins each HTTP request to the backend Akka HTTP apps and proxies back the response.
Is there another recommended open-source load balancer that I can use to load balance/proxy the incoming HTTP requests to the multiple apps, other than HAProxy (maybe Apache httpd)?
Does it make sense to write a simple Akka HTTP application route as the load balancer, register the backend app hosts in some configuration file, and round-robin the requests to them?
Maybe I should consider Akka Cluster for that purpose? The thing is, the applications are already standalone Akka HTTP services with no cluster support, and it might be too much to go for clustering (I would like to keep it simple).
What is the best practice for load balancing requests to HTTP apps (especially Akka HTTP Scala apps)? I might be missing something here.
Note: back pressure is something we would also like to have, meaning that if the servers are busy, we would like to respond with a 204 or some other status code so our clients won't hit timeouts when the back end is busy.
Although Akka HTTP performance is quite impressive, I would not use it for writing a simple reverse proxy since there are tons of others out there in the community.
I am not sure where you deploy your app, but the best (and most secure) approach is to use a load balancer provided by your cloud provider. Most of them have one, and it usually has a good cost-benefit ratio.
If your cloud provider does not provide one, or you are hosting your app yourself, then you should first take a look at your HAProxy. Did you test HAProxy in isolation to see whether it still has the same latency issues? Are you sure the config is optimised for what you want? Does your HAProxy have enough resources (CPU and memory) to operate? Is your HAProxy in the same data centre as your deployed app?
If you check all of these questions and are still having latency issues, then I would recommend choosing another one. There are plenty out there, such as Envoy and NGINX. I really like Envoy and I've been using it at work for a few months now without any complaints.
Hope I could help.

Scalable server hosting

I have a simple server now (some Xeon CPU hosted somewhere) running Apache/PHP/MySQL (no Docker, but it's a possibility), and I'm expecting some heavy traffic that I need my server to handle.
Currently the server can handle about 100 users at once; I need it to handle possibly a couple of thousand.
What would be the easiest and fastest way to move my app to some scalable hosting?
I have no experience with AWS or anything like that.
I have been reading about AWS and similar offerings, but I'm mostly confused and not sure what I should choose.
The basic choice is:
Scale vertically by using a bigger computer. However, you will eventually hit a limit and you will have a single point of failure (one server!), or
Scale horizontally by adding more servers and spreading the traffic across the servers. This has the added advantage of handling failure because, if one server fails, the others can continue serving traffic.
A benefit of doing horizontal scaling in the cloud is the ability to add/remove servers based on workload. When things are busy, add more servers. When things are quiet, remove servers. This also allows you to lower costs when things are quiet (which is not possible on-premises when you own your own equipment).
The architecture involves putting multiple servers behind a Load Balancer:
Traffic comes into a Load Balancer
The Load Balancer sends the request to a server (often based upon some measure of how "busy" each server is)
The server processes the request and sends a response back to the Load Balancer
The Load Balancer sends the response to the original requester
AWS has several Load Balancers available, which vary by need. If you are simply sending traffic to a single application that is installed on all servers, a Network Load Balancer should be sufficient. For situations where different parts of the application are on different servers (eg mobile interface vs web interface), you could use an Application Load Balancer.
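As a hedged sketch of the Application Load Balancer case (using boto3; the subnet, VPC and path values are placeholders, not taken from the question): one target group serves the web interface by default, and a path-based rule sends the mobile interface to its own servers.

import boto3

elbv2 = boto3.client("elbv2")

# Create the Application Load Balancer (subnet IDs are placeholders).
lb = elbv2.create_load_balancer(
    Name="web-alb",
    Subnets=["subnet-aaa111", "subnet-bbb222"],
    Type="application",
    Scheme="internet-facing",
)["LoadBalancers"][0]

# One target group for the web servers, another for the mobile-interface servers.
web_tg = elbv2.create_target_group(
    Name="web-tg", Protocol="HTTP", Port=80, VpcId="vpc-0example", TargetType="instance"
)["TargetGroups"][0]
mobile_tg = elbv2.create_target_group(
    Name="mobile-tg", Protocol="HTTP", Port=80, VpcId="vpc-0example", TargetType="instance"
)["TargetGroups"][0]

# The default action sends everything to the web servers...
listener = elbv2.create_listener(
    LoadBalancerArn=lb["LoadBalancerArn"],
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": web_tg["TargetGroupArn"]}],
)["Listeners"][0]

# ...while a path-based rule routes the mobile interface to its own servers.
elbv2.create_rule(
    ListenerArn=listener["ListenerArn"],
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/mobile/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": mobile_tg["TargetGroupArn"]}],
)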
AWS also assists with horizontal scaling by providing the Amazon EC2 Auto Scaling service. This allows you to specify details of the servers to launch (disk image, instance type, network settings) and Auto Scaling can then automatically launch new servers when required and terminate ones that aren't required. (Note that they launch and terminate, not start and stop.)
You can further define scaling policies that tell Auto Scaling when to launch/terminate instances by measuring metrics such as CPU Utilization. This way, the number of servers can approximately match the volume of traffic.
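A minimal sketch of that setup with boto3 (the launch template ID, subnets and target group ARN are placeholders): create the Auto Scaling group, then attach a target-tracking policy that keeps average CPU Utilization around 50%.

import boto3

autoscaling = boto3.client("autoscaling")

# The launch template, subnets and target group ARN below are placeholders.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
    LaunchTemplate={"LaunchTemplateId": "lt-0example", "Version": "$Latest"},
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/0123456789abcdef"],
)

# Target-tracking policy: launch/terminate instances to keep average CPU near 50%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)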
It should be mentioned that if you have a database, it should be stored separately from the application servers so that it does not get terminated. You could use the Amazon Relational Database Service (RDS) to run a database for you, or you could run one on a separate Amazon EC2 instance.
If you want to find out more about any of the above technologies, there are plenty of talks on YouTube or blog posts that can explain and demonstrate their use.

AWS and ELB Network throughput limits

My site runs on AWS and uses an ELB.
I regularly see 2K concurrent users, and during these times requests through my stack become slow and take a long time to get a response (30s-50s).
None of my servers or my database shows significant load at these times,
which leads me to believe my issue could be related to the ELB.
I have added some images of a busy day on my site, which show graphs of my main ELB. Can you perhaps spot something that would give me insight into my problem?
Thanks!
UPDATE
The ELB in the screengrabs is my main ELB, forwarding to multiple Varnish cache servers. In my Varnish VCL I was sending misses for a couple of URLs, but Varnish has a queueing (request coalescing) behaviour, so what I ended up doing was setting a high TTL for these requests and returning hit_for_pass for them. This lets Varnish know at lookup time that these requests should be passed to the back end immediately instead of being queued. Since doing this, the problem outlined above has been completely fixed.
Did you SSH into one of the servers? Maybe you are reaching a connection limit in Apache or whatever server you run. Also check the CloudWatch metrics of the EBS volumes attached to your instances; maybe they are causing an I/O bottleneck.

Consistency between two Varnish servers behind AWS ELB

We are using ELB to load balance requests between two different Nginx+Varnish servers in two different AZs. These Varnish servers have been configured to balance requests to another ELB distributing requests to our app servers. In this way, we should be able to keep the site working if one AZ stops working.
The issue we are facing with this approach is that we don't know how to keep the site from serving different cached objects to the same client, i.e. keeping the consistency of the cached content between the two Varnish servers.
One possible solution would be using ELB's IP hashing so that depending on the client IP one Varnish or the other would serve the request. This would mitigate the problem somewhat.
Is there any other way to sync the contents between these two Varnish servers?
There is no active state synchronization available in Varnish.
You can do this with a helper process that tails varnishlog and calls out to the other n Varnish servers, but this is brittle and will probably break on you. The common approach is just to do round-robin and have enough traffic that everything is cached where it needs to be. :)
There is some underlying knowledge of how your application behaves baked into your question, but not many details. Why is it a problem that a different backend has made the response? If they are identical (since you want redundancy, I'd expect that they are?) this shouldn't be an issue.
If the backend replies with user-specific response data for some URLs, it should tell Varnish that with the Vary header.
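For illustration only, here is a minimal WSGI sketch of a backend doing that (the question doesn't show the app servers, so this setup is an assumption): the Vary: Cookie header tells Varnish to cache a separate variant per cookie value rather than sharing one cached object between users.

from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Vary", "Cookie"),  # cache a separate variant per Cookie value
    ])
    return [b"user-specific content\n"]

if __name__ == "__main__":
    make_server("", 8080, app).serve_forever()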
Adding session stickiness (~IP hashing) in ELB will just hide your problem until one of the AZs goes down and traffic is rerouted, at which point I'd guess you are pretty busy already.
You can enable ELB stickiness to achieve what you need; there is no Varnish cluster feature that shares state between Varnish instances.

How can I make an ELB forward requests from a single client to multiple nodes?

I'm currently running a REST API app on two EC2 nodes under a single load balancer. Rather than the standard load-balancing scenario of small amounts of traffic coming from many IPs, I get huge amounts of traffic from only a few IPs. Therefore, I'd like requests from each individual IP to be spread among all available nodes.
Even with session stickiness turned off, however, this doesn't appear to be the case. Looking at my logs, almost all requests are going to one server, with my smallest client going to the secondary node. This is detrimental, as requests to my service can last up to 30 seconds and losing that primary node would mean a disproportionate amount of requests get killed.
How can I instruct my ELB to round-robin for each client's individual requests?
You cannot. ELB uses a non-configurable round-robin algorithm. What you can do to mitigate (and not solve) this problem is adding additional servers to your ELB and/or making the health check requests initiated by your ELB more frequent.
I understand where you're coming from. However, I think you should approach the problem from a different angle. It appears your problem isn't specifically related to the fact that the load is not balanced. Let's say you do get this balancing problem solved: you're still going to lose a large number of requests. I don't know how your clients connect to your services, so I can't go into detail on how you might fix the problem, but you may want to look at making the code more robust and planning for the connection to get dropped. No service that has connections of 30+ seconds should rely on the connection not getting dropped. Back in the days of TCP/UDP sockets there was a lot more work done on building for failures; somehow that has gotten lost in today's HTTP world.
What I'm trying to say is: if you write the code your clients use to connect, build it to be more robust and handle failures with retries. Once you start performing retries, you'll need to make sure that your API calls are atomic and use transactions where necessary.
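A small client-side sketch of that idea in Python (assuming an HTTP client such as python-requests; the URL, timeout and retry counts are arbitrary placeholders): retry when the connection drops, which is only safe if the API call itself is atomic/idempotent.

import time
import requests

def call_api(url, attempts=3, backoff=1.0):
    # Retry when the connection drops mid-request; this is only safe if the
    # API call itself is atomic/idempotent, as noted above.
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.get(url, timeout=35)
            resp.raise_for_status()
            return resp.json()
        except (requests.ConnectionError, requests.Timeout):
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)  # back off a little more on each retry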
Lastly, I'll answer your original question. Amazon's ELBs are round robin, even from the same computer/IP address. If your clients are always connecting to the same server, it's most likely the browser or code that is caching the response. If they're not directly accessing your REST API from a browser, most languages allow you to get a list of IPs for a given host name. Those IPs will be the IPs of the load balancers, and you can just shuffle the list and use the top entry each time. For example, you could use the following PHP code to randomly send requests to a different load balancer.
public function getHostByName($domain) {
    // Resolve every A record (load balancer IP) behind the given host name.
    $ips = gethostbynamel($domain);
    if ($ips === false) {
        return $domain; // DNS lookup failed; fall back to the host name itself.
    }
    shuffle($ips);   // spread successive requests across the load balancer IPs
    return $ips[0];
}
I have had similar issues with Amazon ELB; however, for me it turned out that the HTTP client used Connection: keep-alive. In other words, the requests from the same client were served over the same connection, and for that reason the ELB did not switch between the servers.
I don't know which server you use, but it is probably possible to turn off keep-alive, forcing the client to make a new connection for every request. This might be a good solution for requests with a lot of data. If you have a large number of requests with small payloads, it might affect performance negatively.
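For example, a client-side sketch in Python (assuming something like python-requests; the URL is a placeholder): sending Connection: close forces a new TCP connection per request, which gives the ELB a chance to pick a different backend each time.

import requests

for i in range(5):
    # "Connection: close" forces a fresh TCP connection per request, so the ELB
    # can pick a different backend each time instead of reusing one connection.
    resp = requests.get("https://api.example.com/resource",
                        headers={"Connection": "close"}, timeout=35)
    print(i, resp.status_code)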
This may happen when you have the two instances in different availability zones.
When one ELB is working with multiple instances in a single availability zone, it will round-robin the requests between the instances.
When two instances are in two different availability zones, the way ELB works is to create two load balancer nodes (ELB servers), each with its own IP, and the load is balanced between them with DNS.
When your client asks the DNS for the IP address of your server, it receives two (or more) responses. Then the client chooses one IP and caches it (the OS usually does). There is not much you can do about this unless you control the clients.
If your problem is that the two instances are in different availability zones, the solution might be to have at least two instances in each availability zone. Then a single ELB node will handle the round robin across the two servers in its zone and will have just one IP, so when a server fails it will be transparent to the clients.
PS: Another case where ELB creates more nodes with unique IPs is when you have a lot of servers in a single availability zone and a single ELB node can't handle all the load and distribute it to the connected servers. Again, a new ELB node is created for the extra instances and the load is distributed using DNS and multiple IPs.
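You can see this DNS behaviour yourself with a short Python sketch (the ELB host name below is a placeholder): resolving the ELB's name returns one IP per ELB node, typically two or more when instances span multiple availability zones.

import socket

# One IP is returned per ELB node; with instances in two AZs you will
# typically see two or more addresses for the same ELB host name.
addrs = socket.getaddrinfo("my-elb-1234567890.us-east-1.elb.amazonaws.com", 80,
                           proto=socket.IPPROTO_TCP)
print(sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in addrs}))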