AWS multi-zone disaster recovery and load balancing - best approach?

I’m using Amazon Web Services, and trying to set up a modest system for load balancing and disaster recovery. The application is PHP based, with Zend Framework 2 (ZF2) on the front end, a local memcached server and MySQL through RDS. All servers are running Amazon Linux.
I am trying to configure the elastic load balancer to use two servers in two different AWS “availability zones.” To seamlessly allow one server to shut down and another take over, we need shared PHP sessions. So I set up PHP database sessions with ZF2.
In general, I assume the likelihood of an outage of an AWS zone is considerably lower than the chance of a fatal problem in the individual servers or the application itself. So I am considering a different approach:
All the servers in the same availability zone
Separate AWS ElastiCache server (essentially memcached, cannot be used across zones)
PHP sessions stored in the cache (built-in support for memcached)
One emergency server in a different zone – in the rare case of a zone outage, we would change the DNS record to use the different server
Is this a good standard approach to DR and load balancing? I don't like the DR solution in the case of a zone outage, but I haven't seen a zone go down often, and we can probably accept that level of risk if it simplifies the design. If the load balancer could weight the servers, I would put all the weight on one zone, with the backup server weighted much lower.
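If it helps, here's roughly how I'd script the DNS swap from the last bullet with boto3; the hosted zone ID, record name, and backup IP below are placeholders for ours:

```python
import boto3

# Placeholder identifiers -- substitute your own hosted zone and record.
HOSTED_ZONE_ID = "Z1EXAMPLE"
RECORD_NAME = "www.example.com."
BACKUP_SERVER_IP = "203.0.113.10"  # emergency server in the other zone

route53 = boto3.client("route53")

# Repoint the A record at the emergency server in the other availability zone.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Fail over to emergency server",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "A",
                "TTL": 60,  # keep the TTL low so the failover propagates quickly
                "ResourceRecords": [{"Value": BACKUP_SERVER_IP}],
            },
        }],
    },
)
```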

What would be the benefit of keeping all the PHP servers in the same AZ vs distributing them among multiple AZs? I can't think of any, except a very small (3-5ms) latency improvement. Since there's very little downside, why not spread the servers among multiple AZs?
Your ElastiCache memcached cluster is still a single point of failure. If the AZ that the ElastiCache instance is running in has a problem, you will lose sessions. You could switch to ElastiCache with Redis (which supports master/slave replication) to get multi-AZ at your cache layer as well.
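If you go the Redis route, what gives you the multi-AZ behavior is a replication group with automatic failover enabled. A rough sketch with boto3 (the group name and node type are placeholders):

```python
import boto3

elasticache = boto3.client("elasticache")

# Create a Redis replication group with a primary and one replica so the
# cache survives the loss of a single node or availability zone.
elasticache.create_replication_group(
    ReplicationGroupId="php-sessions",           # placeholder name
    ReplicationGroupDescription="PHP session store",
    Engine="redis",
    CacheNodeType="cache.t2.micro",              # size for your load
    NumCacheClusters=2,                          # primary + 1 replica
    AutomaticFailoverEnabled=True,               # promote the replica on failure
)
```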

Related

Are managed databases (e.g. Amazon RDS) slower to access than databases on the same machine (EC2) as the web server?

Imagine two cases:
I have a web server running in an EC2 instance, and it is connected to a database in RDS, the managed database service.
I have the web server and database running in the same EC2 instance.
Is my database in RDS going to be slower to access because it's not on the same machine?
How many milliseconds, approximately, does it add to the latency between the two?
Does this become a bottleneck?
What about other managed database services like Azure, GCP, DigitalOcean, etc.?
Do they behave the same?
Yes, access to an RDS instance from your web server will be slower than to a database on the same host, because you need to go over the network, and that adds latency.
The drawback of running the DB on the same server is that you can't use a managed service to take care of your database and you're mixing largely stateless components (webserver) with stateful components (database). This solution is typically not scalable either. If you add more webservers, things get messy.
I don't know about Azure, GCP, or DigitalOcean, but I'd be very surprised if things were different there. There are good reasons to separate these components.

Scalable server hosting

I have a simple server now (a Xeon CPU hosted somewhere) running Apache/PHP/MySQL (no Docker, but it's a possibility), and I'm expecting some heavy traffic that my server needs to handle.
Currently the server can handle about 100 concurrent users; I need it to handle a couple of thousand.
What would be the easiest and fastest solution to move my app to some scalable hosting?
I have no experience with AWS or anything like that.
I was reading about AWS and similar services, but I'm mostly confused and not sure what I should choose.
The basic choice is:
Scale vertically by using a bigger computer. However, you will eventually hit a limit, and you will have a single point of failure (one server!), or
Scale horizontally by adding more servers and spreading the traffic across the servers. This has the added advantage of handling failure because, if one server fails, the others can continue serving traffic.
A benefit of doing horizontal scaling in the cloud is the ability to add/remove servers based on workload. When things are busy, add more servers. When things are quiet, remove servers. This also allows you to lower costs when things are quiet (which is not possible on-premises when you own your own equipment).
The architecture involves putting multiple servers behind a Load Balancer:
Traffic comes into a Load Balancer
The Load Balancer sends the request to a server (often based upon some measure of how "busy" each server is)
The server processes the request and sends a response back to the Load Balancer
The Load Balancer sends the response to the original requester
AWS has several Load Balancers available, which suit different needs. If you are simply sending traffic to a single application that is installed on all servers, a Network Load Balancer should be sufficient. For situations where different parts of the application are on different servers (eg mobile interface vs web interface), you could use an Application Load Balancer.
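For a feel of what this involves, creating an Application Load Balancer and wiring servers to it could look roughly like this with boto3 (all IDs and names below are placeholders):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Create an Application Load Balancer spanning two subnets (placeholder IDs).
lb = elbv2.create_load_balancer(
    Name="web-alb",
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
    SecurityGroups=["sg-0123456789abcdef0"],
)
lb_arn = lb["LoadBalancers"][0]["LoadBalancerArn"]

# A target group holds the servers the load balancer forwards requests to.
tg = elbv2.create_target_group(
    Name="web-servers",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

# Register the instances and wire a listener from the ALB to the group.
elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": "i-aaaa1111"}, {"Id": "i-bbbb2222"}],
)
elbv2.create_listener(
    LoadBalancerArn=lb_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)
```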
AWS also assists with horizontal scaling by providing the Amazon EC2 Auto Scaling service. This allows you to specify details of the servers to launch (disk image, instance type, network settings) and Auto Scaling can then automatically launch new servers when required and terminate ones that aren't required. (Note that they launch and terminate, not start and stop.)
You can further define scaling policies that tell Auto Scaling when to launch/terminate instances by measuring metrics such as CPU Utilization. This way, the number of servers can approximately match the volume of traffic.
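As a rough sketch, creating such a group and a CPU-based target tracking policy with boto3 could look like this (the group name, launch template, and subnet IDs are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Create an Auto Scaling group from a launch template (placeholder names/IDs).
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-server", "Version": "$Latest"},
    MinSize=2,
    MaxSize=10,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # spread across AZs
)

# Target tracking keeps average CPU near 50% by launching and terminating
# instances as the load changes.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```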
It should be mentioned that if you have a database, it should be stored separately from the application servers so that it does not get terminated. You could use the Amazon Relational Database Service (RDS) to run the database for you, or you could run one on a separate Amazon EC2 instance.
If you want to find out more about any of the above technologies, there are plenty of talks on YouTube or blog posts that can explain and demonstrate their use.

Can AWS Elastic Load Balancer be used to only send traffic to a second server if the first fails?

Can an AWS Elastic Load Balancer be set up so that it sends all traffic to a main server and, only if that server fails, sends traffic to a second server?
I have an existing web app I picked up that was never built to run on multiple servers, and the client has become worried about redundancy. They don't want to invest enough to make it run well across multiple servers, so I was thinking I could set up a second EC2 server with a MySQL slave and periodically copy files from the primary server to the secondary using rsync. Then have an AWS ELB send traffic to the primary server and only if that fails send it to the second server.
AWS load balancers don't support "backup" nodes that only take traffic when the primary is down.
Beyond that, you are proposing a complicated scenario.
was thinking I could set up a second EC2 server with a MySQL slave
If you do that, you can only fail over once, then you can't fail back, because the master database will then be obsolete. For a configuration like this to work and be useful, your two MySQL servers need to be configured with master/master (circular) replication, so that each is a replica of the other. This is an advanced configuration that requires expertise and caution.
For the MySQL component, an RDS instance with multi-AZ enabled will provide you with hands-off fault tolerance of the database.
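As a sketch, launching such a Multi-AZ instance with boto3 might look like this (the identifier, instance class, and credentials are placeholders):

```python
import boto3

rds = boto3.client("rds")

# Create a Multi-AZ MySQL instance: RDS keeps a synchronous standby in a
# second availability zone and fails over to it automatically.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",        # placeholder name
    Engine="mysql",
    DBInstanceClass="db.t3.micro",        # size for your workload
    AllocatedStorage=20,                  # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me",       # use a real secret in practice
    MultiAZ=True,
)
```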
Of course, the client may be unwilling to pay for this as well.
A reasonable shortcut for small systems might be EC2 instance recovery, which will bring the site back up if the underlying hardware fails. This feature replaces a failed instance with a new instance, reattaches the EBS volumes, and starts it back up. If the system is stable and you have a solid backup strategy for all data, this might be sufficient. Effective redundancy as a retrofit is non-trivial.
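Instance recovery is typically wired up as a CloudWatch alarm on the system status check with a recover action. A minimal sketch with boto3, assuming a placeholder instance ID and region:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Recover the instance onto new hardware when the system status check fails.
cloudwatch.put_metric_alarm(
    AlarmName="recover-web-server",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,                 # two bad minutes in a row
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"],
)
```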

Usefulness of Amazon ELB (Elastic Load Balancing)

We're considering implementing an ELB in our production Amazon environment. It seems it will require that production server instances be synced by a nightly script. Also, there is a Solr search engine which will need to be replicated and maintained for each paired server. There's also the issue of debugging - which server is the request going to? If there's a crash, do you have to search both logs? If a production app isn't behaving, how do you isolate which server it is on, or do you just deploy debugging code to both instances?
We aren't having issues with response time or server load. This seems like added complexity in exchange for a limited upside. It seems like it may be overkill to me. Thoughts?
You're enumerating the problems that arise when you need high availability :)
You need to consider how critical the availability of the service is, and take that into account when deciding what is the right solution and what is just over-engineering :)
Solutions to some caveats:
To avoid nightly syncs: Use an EC2 instance with an NFS server and mount the share on both EC2 instances (or use Amazon EFS when it's available)
Debugging problem: You can give the EC2 instances behind the ELB public IPs, restricted in the Security Groups to just the developers' PCs, and when debugging, point your /etc/hosts (or the Windows equivalent) at one particular server
Logs: store the logs in S3 (or on the NFS server mentioned above)
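For the S3 option, a cron job on each instance could push the day's log with boto3; the bucket name and log path below are placeholders:

```python
import socket
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Upload today's Apache log under a per-host prefix so logs from every
# instance behind the ELB end up in one searchable place.
host = socket.gethostname()
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
s3.upload_file(
    Filename="/var/log/httpd/access_log",   # placeholder log path
    Bucket="my-app-logs",                   # placeholder bucket
    Key=f"apache/{host}/{today}/access_log",
)
```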

Web server and database server hosted on separate instances of Amazon EC2

I am planning to run a web application and am expecting traffic of around 100 to 200 users.
Currently I have set up a single Small instance on Amazon. This instance consists of everything: the web server (Apache), the database server (MySQL), and the IMAP server (Dovecot). I am thinking of moving my database server out of this instance and creating a separate instance for it. Now my questions are:
Do I get latency when communicating between my web server and database server (both hosted on separate instances on Amazon)?
If yes, what is the standard way to overcome this? (Or do I need to set up a Virtual Private Cloud?)
If you want your architecture to scale you should separate your web server from your database server.
The small latency penalty you will pay (~1-2 ms, even between availability zones) is outweighed by the better performance you get from being able to scale each tier separately:
You can add small (even micro) instances behind a load balancer to handle more web requests, without needing to duplicate an instance that also runs the database
You can add an Auto Scaling group for your web servers that will automatically scale the web tier based on load
You can scale up your DB instance to have more memory, getting a better cache hit rate
You can add ElastiCache between your web server and your database
You can use Amazon RDS as a managed database service, which removes the need to run and maintain a database instance yourself
Another benefit you gain is better security for your database. If your database is on a separate instance, you can prevent access to it from the internet: use a security group that allows only SQL connections from your web server and no HTTP connections from the internet.
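As a sketch, that ingress rule could be created with boto3 like so (both security group IDs are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Allow MySQL (port 3306) into the database security group only from
# instances in the web server security group -- nothing from the internet.
ec2.authorize_security_group_ingress(
    GroupId="sg-0db0db0db0db0db0d",      # database security group (placeholder)
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "UserIdGroupPairs": [
            {"GroupId": "sg-0web0web0web0web"},  # web tier SG (placeholder)
        ],
    }],
)
```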
This configuration can run in a regular EC2 environment, without using a VPC. You can certainly add a VPC for an even more controlled environment, without additional cost or much added complexity.
In short, for scalability and high availability you should separate your tiers (web and DB). You will probably find yourself saving on cost as well.
Of course there will be latency when communicating between separate machines. If they are both in the same availability zone it will be extremely low, typically what you'd expect for two servers on the same LAN.
If they are in different availability zones in the same region, expect a latency on the order of 2-3ms (per information provided at the 2012 AWS re:Invent conference). That's still quite low.
Using a VPC will not affect latency. That does not give you different physical connections between instances, just virtual isolation.
Finally, consider using Amazon's RDS (Relational Database Service) instead of a dedicated EC2 instance for your MySQL database. The cost is about the same, and Amazon takes care of the housekeeping.
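If you want to measure the cross-instance latency yourself rather than rely on quoted numbers, timing a trivial query is enough. A rough sketch using pymysql (the endpoint and credentials are placeholders):

```python
import time

import pymysql

# Connect to the database (placeholder endpoint and credentials).
conn = pymysql.connect(
    host="app-db.xxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="change-me",
    database="app",
)

# Time a trivial query repeatedly; the median approximates the network
# round trip plus minimal server-side work.
samples = []
with conn.cursor() as cur:
    for _ in range(100):
        start = time.perf_counter()
        cur.execute("SELECT 1")
        cur.fetchall()
        samples.append((time.perf_counter() - start) * 1000)

samples.sort()
print(f"median round trip: {samples[len(samples) // 2]:.2f} ms")
```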
Do I get latency when communicating between my web server and database server (both hosted on separate instances on Amazon)?
Yes, but it's rather insignificant compared to the benefits gained by separating the roles.
If yes, what is the standard way to overcome this? (Or do I need to set up a Virtual Private Cloud?)
VPC increases security and ease of management of the resources; it does not affect performance. A latency of a millisecond or two isn't normally problematic for a SQL database. Writes are transactional, so data isn't accessible to other requests until it's fully completed and committed. I/O throughput and availability are much more of a concern, which is why separating the database from the application is important.
I'd highly recommend that you take a look at RDS, which is AWS's version of a managed MySQL, Oracle, or MS SQL Server database server. This will allow you to easily set up and manage your database, including cross-availability-zone replication and automated backups. I also wrote a blog post yesterday that's fairly relevant to your question.