Consistency between two Varnish servers behind AWS ELB

We are using ELB to load balance requests between two different Nginx+Varnish servers in two different AZs. These Varnish servers have been configured to balance requests to another ELB distributing requests to our app servers. In this way, we should be able to keep the site working if one AZ stops working.
The issue we are facing with this approach is that we don't know how to keep the site from serving different cached objects to the same client, i.e. how to keep the cached content consistent between the two Varnish servers.
One possible solution would be using ELB's IP hashing so that depending on the client IP one Varnish or the other would serve the request. This would mitigate the problem somewhat.
Is there any other way to sync the contents between these two Varnish servers?

There is no active state synchronization available in Varnish.
You can do this with a helper process that tails varnishlog and calls out to the other N Varnish servers, but this is brittle and will probably break on you. The common approach is just to do round robin and have enough traffic that everything is cached where it needs to be. :)
There is some underlying knowledge of how your application behaves baked into your question, but not many details. Why is it a problem that a different backend produced the response? If the responses are identical (since you want redundancy, I'd expect they are), this shouldn't be an issue.
If the backend replies with user-specific response data for some URLs, it should tell Varnish that with the Vary header.
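For example, if some URLs return language-specific content, the backend response could carry a Vary header like the sketch below (the headers and values are illustrative); both Varnish servers will then store one variant per Accept-Language value, so serving "different" objects is deliberate rather than an inconsistency:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: public, max-age=300
Vary: Accept-Language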
Adding session stickiness (~IP hashing) in ELB will just hide your problem until one of the AZs goes down and traffic is rerouted, at which point I'd guess you are pretty busy already.

You can enable ELB stickiness to achieve what you need; there is no Varnish cluster mode that shares state between Varnish instances.

Related

Application ELB - sticky sessions based on consistent hashing

I couldn't find anything in the documentation, but I'm still asking to make sure I didn't miss it. I want all connections from different clients that carry the same value for a certain request parameter to end up on the same upstream host. With ELB sticky sessions, you can have the same client connect to the same host, but there are no guarantees across different clients.
This is possible with Envoy proxy, see: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash
We already use ELB, so if the above is possible with ELB we can avoid introducing another layer in between with Envoy.
UPDATE:
Use-case - in a multi-tenant cloud solution, we want all clients from a given customer account to connect to the same upstream host.
Unfortunately this is not possible with an ALB.
An Application Load Balancer controls all the logic over which host receives the traffic, with features such as sticky sessions and pattern-based routing.
If there is no workaround, you could look at a Classic Load Balancer, which supports application-controlled stickiness where your application sets the sticky-session cookie name and value.
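For reference, application-controlled stickiness on a Classic Load Balancer is wired up roughly like this (a sketch; the load balancer name, policy name, cookie name and port are placeholders), after which the ELB's stickiness follows the lifetime of your application's cookie:

aws elb create-app-cookie-stickiness-policy \
    --load-balancer-name my-classic-elb \
    --policy-name tenant-cookie-policy \
    --cookie-name TENANT_ID

aws elb set-load-balancer-policies-of-listener \
    --load-balancer-name my-classic-elb \
    --load-balancer-port 80 \
    --policy-names tenant-cookie-policy

Your application would then set the TENANT_ID cookie (for example, per customer account) in its responses.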
Best practice is for your application to be stateless; is it possible to look at rearchitecting your app instead of working around this? Some suggestions I would have are:
Using DynamoDB to store any session-based data, moving away from disk-based sessions (if that's what your application does); see the sketch after this list.
Any disk-based files that need to persist could be shared between all hosts using EFS for your Linux-based hosts, or FSx on Windows.
Medium/long-term persistent files could be migrated to S3; assets that rarely change could be stored there, and your application could then use S3 rather than local disk.
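As a sketch of the DynamoDB suggestion, the AWS SDK for PHP provides a DynamoDB-backed session handler; the region, table name and lifetime below are assumptions, and the table needs a partition key named id:

<?php
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\SessionHandler;

// Assumed region and table name.
$client = new DynamoDbClient(['region' => 'us-east-1', 'version' => 'latest']);

$handler = SessionHandler::fromClient($client, [
    'table_name'       => 'php-sessions',
    'session_lifetime' => 3600, // seconds before a stored session is considered expired
]);
$handler->register();   // replaces PHP's default disk-based session handler

session_start();        // sessions are now stored in DynamoDB and shared by all hosts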
It's important to remember that, as stated above, you should keep your application as stateless as you can. Assume that your EC2 instances can fail; preparing for this will make it easier to recover.

What is the major benefit of Active-Active in AWS routing

I came across the so-called Active-Active and Active-Passive routing setups (typically diagrammed as two HTTP servers behind a router).
For the latter, Active-Passive:
It is easy to understand: the passive node (HTTP Server 2) is the standby service/instance that the active node (HTTP Server 1) fails over to.
For the former, Active-Active:
I don't understand what the major benefit is, though. It seems to me that both services/instances must be up and running at the same level, and the routing is maybe just something like round robin. Wouldn't this be a waste of resources/cost? Does it add extra computing power? What is the use case for it?
In active-passive mode one web server is sitting there costing you money but not serving any requests. If a sudden surge in traffic came in the extra web server would not be able to help absorb the extra load. The only time the second web server starts being used is when the first web server crashes and can no longer serve requests. This gives you failover in the event of a server crash, but does not help you at all in the event of a sudden surge in traffic.
In active-active mode each web server is serving some of the traffic. In order to scale out your web servers (horizontal scaling) you would have two or more servers, all in "active" mode serving some portion of the web requests. If a sudden surge in traffic comes in, that surge is spread across multiple servers which can hopefully absorb the load, and new servers can be added automatically by AWS as needed, and removed when no longer needed.
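As a sketch of the "added automatically" part, an Auto Scaling group can grow and shrink around a target metric; the group name, policy name and target value here are placeholders:

aws autoscaling put-scaling-policy \
    --auto-scaling-group-name web-asg \
    --policy-name keep-cpu-near-50 \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
      "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
      "TargetValue": 50.0
    }'

With this in place, a traffic surge that pushes average CPU above the target causes instances to be added behind the load balancer, and they are removed again when the surge passes.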

AWS and ELB Network throughput limits

My site runs on AWS and uses ELB
I regularly see 2K concurrent users, and during these times requests through my stack become slow and take a long time to get a response (30s-50s).
None of my servers or my database show significant load at these times.
Which leads me to believe my issue could be related to ELB.
I have added some images of a busy day on my site, which show graphs of my main ELB. Can you perhaps spot something that would give me insight into my problem?
Thanks!
UPDATE
The ELB in the screengrabs is my main ELB, forwarding to multiple Varnish cache servers. In my Varnish VCL I would send misses for a couple of URLs, but Varnish has a request-queueing behaviour for cache misses, so what I ended up doing was setting a high TTL for these requests and returning hit_for_pass for them. This lets Varnish know at lookup time that these requests should be passed to the backend immediately instead of queueing behind a pending fetch. Since doing this, the problem outlined above has been completely fixed.
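Roughly what that looks like in Varnish 3-style VCL (a sketch; the URL pattern and TTL are illustrative, and newer Varnish versions use vcl_backend_response with beresp.uncacheable instead):

sub vcl_fetch {
    if (req.url ~ "^/api/uncacheable") {
        # Cache a hit-for-pass marker instead of the response itself, so later
        # requests for this URL go straight to the backend rather than being
        # queued behind a pending fetch.
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
}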
Did you SSH into one of the servers? Maybe you are hitting a connection limit in Apache or whatever server you run. Also check the CloudWatch metrics of the EBS volumes attached to your instances; maybe they are causing an I/O bottleneck.

How do I add/remove IPs based on AWS autoscaling?

I have an ELB pointing to two instances of Varnish. The Varnish servers talk to an app server, and both kinds of servers need to be autoscaled.
This is all set up happily, but for one little detail:
The varnish servers have a list of IPs they are proxying for and will accept purges from, and the app server has a list of the varnish server IPs so it can purge pages from cache.
How do I get this information at the time servers are added or removed, and trigger a process? I can write a script to tweak the list of IPs on the Varnish and app servers once I have it; it's just hooking into and fetching this information that's not obvious.
Or am I completely misunderstanding this problem and there's a simpler approach?
It's possible there is a way to configure it without needing to maintain IP lists. Without knowing more about how your application works, it's hard to provide advice on that solution.
However, you can configure your Auto Scaling group to send notifications whenever instances are launched or terminated. These are delivered via SNS -> SQS (an SQS queue subscribed to the SNS topic).
You will then need to build a reader for that SQS queue which updates your configuration.
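A sketch of that wiring; the group name, topic ARN, queue URL and the regenerate_ip_lists() helper are all placeholders. First, have the Auto Scaling group publish launch/terminate notifications to an SNS topic that your SQS queue is subscribed to:

aws autoscaling put-notification-configuration \
    --auto-scaling-group-name varnish-asg \
    --topic-arn arn:aws:sns:us-east-1:123456789012:asg-events \
    --notification-types autoscaling:EC2_INSTANCE_LAUNCH autoscaling:EC2_INSTANCE_TERMINATE

Then a small reader (here in PHP with the AWS SDK, to match the code elsewhere on this page) polls the queue and regenerates the IP lists whenever an instance comes or goes:

<?php
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs      = new SqsClient(['region' => 'us-east-1', 'version' => 'latest']);
$queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/asg-events'; // assumed

while (true) {
    $result = $sqs->receiveMessage([
        'QueueUrl'        => $queueUrl,
        'WaitTimeSeconds' => 20,    // long polling
    ]);
    foreach ($result->get('Messages') ?? [] as $message) {
        $envelope     = json_decode($message['Body'], true);              // SNS envelope
        $notification = json_decode($envelope['Message'] ?? '{}', true);  // ASG event
        $event        = $notification['Event'] ?? '';
        if ($event === 'autoscaling:EC2_INSTANCE_LAUNCH' ||
            $event === 'autoscaling:EC2_INSTANCE_TERMINATE') {
            // Your own script: query the current group members, rewrite the Varnish
            // ACL and the app server's purge list, then reload the services.
            regenerate_ip_lists();
        }
        $sqs->deleteMessage([
            'QueueUrl'      => $queueUrl,
            'ReceiptHandle' => $message['ReceiptHandle'],
        ]);
    }
}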

How can I make an ELB forward requests from a single client to multiple nodes?

I'm currently running a REST API app on two EC2 nodes under a single load balancer. Rather than the standard load-balancing scenario of small amounts of traffic coming from many IPs, I get huge amounts of traffic from only a few IPs. Therefore, I'd like requests from each individual IP to be spread among all available nodes.
Even with session stickiness turned off, however, this doesn't appear to be the case. Looking at my logs, almost all requests are going to one server, with my smallest client going to the secondary node. This is detrimental, as requests to my service can last up to 30 seconds, and losing that primary node would mean a disproportionate number of requests get killed.
How can I instruct my ELB to round-robin for each client's individual requests?
You cannot. ELB uses a non-configurable round-robin algorithm. What you can do to mitigate (not solve) this problem is add additional servers behind your ELB and/or make the health-check requests initiated by your ELB more frequent.
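For the health-check part, on a Classic Load Balancer the interval and thresholds are adjustable; the load balancer name, path and values below are placeholders:

aws elb configure-health-check \
    --load-balancer-name my-elb \
    --health-check Target=HTTP:80/health,Interval=10,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3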
I understand where you're coming from. However, I think you should approach the problem from a different angle. Your problem, it appears, isn't specifically related to the fact that the load is not balanced. Let's say you do get this balancing problem solved: you're still going to lose a large number of requests. I don't know how your clients connect to your services, so I can't go into detail on how you might fix the problem, but you may want to look at making the code more robust and planning for connections to get dropped. No service that holds connections open for 30+ seconds should rely on the connection not getting dropped. Back in the days of TCP/UDP sockets there was a lot more work done on building for failures; somehow that's gotten lost in today's HTTP world.
What I'm trying to say is: if you write the code your clients use to connect, build it to be more robust and handle failures with retries, for example along the lines of the sketch below. Once you start performing retries, you'll need to make sure your API calls are atomic and use transactions where necessary.
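A minimal sketch of the retry idea (the function name, exception type and attempt count are placeholders, not part of any particular framework):

// Call $callApi, retrying with exponential backoff if it throws.
function withRetries(callable $callApi, int $maxAttempts = 3) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $callApi();
        } catch (RuntimeException $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;                      // give up after the last attempt
            }
            usleep(100000 * (2 ** $attempt));  // 0.2s, 0.4s, 0.8s, ...
        }
    }
}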
Lastly, I'll answer your original question. Amazon's ELBs are round robin even from the same computer/IP address. If your clients are always connecting to the same server, it's most likely the browser or code that is caching the DNS response. If they're not accessing your REST API directly from a browser, most languages allow you to get the list of IPs for a given host name. Those IPs will be the IPs of the load balancer nodes, and you can just shuffle the list and then use the top entry each time. For example, you could use the following PHP code to send requests to a randomly chosen load balancer node.
public function getHostByName($domain) {
    // Resolve every A record published for the ELB host name.
    $ips = gethostbynamel($domain);
    if ($ips === false || count($ips) === 0) {
        return $domain; // resolution failed; fall back to the host name itself
    }
    shuffle($ips);      // pick a different load balancer node at random on each call
    return $ips[0];
}
I have had similar issues with Amazon ELB; for me it turned out that the HTTP client used Connection: keep-alive. In other words, requests from the same client were served over the same connection, and for that reason they did not switch between servers.
I don't know which server you use, but it is probably possible to turn off keep-alive, forcing the client to make a new connection for every request. This might be a good solution for requests with a lot of data. If you have a large number of requests with small payloads, it might affect performance negatively.
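If the server behind the ELB happens to be Nginx, for example, keep-alive can be switched off with a one-line setting (a sketch; the port and upstream address are placeholders):

server {
    listen 80;
    # A keepalive_timeout of 0 disables keep-alive: every request uses a new
    # connection, so a client cannot pin itself to one backend via a persistent connection.
    keepalive_timeout 0;

    location / {
        proxy_pass http://127.0.0.1:8080;  # assumed application/Varnish upstream
    }
}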
This may happen when you have the two instances in different availability zones.
When one ELB is working with multiple instances in a single availability zone, it will round-robin the requests between the instances.
When the two instances are in two different availability zones, the way ELB works is to create two ELB nodes, each with its own IP, and balance the load between them with DNS.
When your client asks DNS for the IP address of your server, it receives two (or more) answers. The client then chooses one IP and caches it (the OS usually does). There is not much you can do about this unless you control the clients.
If your problem is that the two instances are in different availability zones, the solution might be to have at least two instances in each availability zone. Then a single ELB node will handle the round robin across the two servers in its zone and will have just one IP, so when a backend server fails it will be transparent to the clients.
PS: Another case where ELB creates more nodes with unique IPs is when you have a lot of servers in a single availability zone and a single ELB node can't handle all the load and distribute it to the connected servers. Then, again, a new ELB node is created to serve the extra instances and the load is distributed using DNS and multiple IPs.
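You can see this behaviour by resolving the ELB's DNS name yourself; the host name below is a placeholder, and each address returned is a separate ELB node:

$ dig +short my-elb-1234567890.us-east-1.elb.amazonaws.com
203.0.113.11
203.0.113.42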