AWS architecture help for running database dumps

AWS architecture help for running database dumps - amazon-web-services

I have mysql running on one ec2-instance and tableau uses this database. mysqldump runs from production servers every 4 hours during which the system is down for probably 10-15 mins due to the dump. I am planning to have another ec2 instance with mysql running and and elb on top of these two instances so that the system wont be down trough the dump. For this I might have to de-register the instances from elb during the dump and register them back after the dump. Is this the right way to do it in the situations like this?

You can't use an ELB with MySQL servers. The ELB wouldn't know which server was master and which was slave, so it wouldn't know which to send updates to.
Is there any reason you aren't using Amazon's RDS service for your database servers? It provides automated snapshots that don't cause any down-time. It also makes it easy to create a read-replica against which you could perform mysqldumps without affecting the main server.

Currently you are taking logical backups of your system every 4 hours. Logical backups in most cases should only be used in a worst case scenario. In the event of a restore, logical backups are very slow compared to alternatives, such as snapshots and binary backups. If snapshoting using Amazon RDS or any of the other multitude of alternatives out there in your environment is not an option, I would look into Xtrabackup. This is a free stand alone HOT online binary backup tool that can be used with a Vanilla install of MySQL. This should not bring down your production server, assuming you are using InnoDB and not an alternative storage engine such as MyISAM. I personally used it for hot online binary backups and to automate building slaves in my previous work environment. A binary backups bottleneck is your network speed in terms of the restore process and is exponentially faster than a logical backup.
If setting up another MySQL instance is your only option look into GTID replication and/or Master-Passive HA environment in order to take the mysqldump off of the secondary non-active production server so that your production environment does not go down.
The bottom line is that you should not be taking production down to do a logical backup every 4 hours. This is def not ideal in any production environment.

Have a look at Amazon Database Migration Service (https://aws.amazon.com/dms/). It allows you to do zero-downtime database migration or just synchronization.

Related

Optimizing latency between application server (EC2) and RDS

here's how the story goes.
We started transforming a monolith, single-machine, e-commerce application (Apache/PHP) to cloud infrastructure. Obviously, the application and the database (MySQL) were on the same machine.
We decided to move to AWS. And as the first step of transformation, we decided to split the database and application. Hosting application on a c4.xlarge machine. And hosting database to RDS Aurora MySQL on a db.r5.large machine, with default options.
This setup performed well. Especially the database performance went up high.
Unfortunately, when the traffic spiked up, we started experiencing long response times. Looked like RDS, although being really fast for executing queries, wasn't returning results fast enough over the network to the EC2 machine.
So that was our conclusion after an in-depth analysis of the setup including Apache/MySQL/PHP tuning parameters. The delayed response time was definitely due to the network latency between EC2 and RDS/Aurora machine, both machines being in the same region.
Before adding additional resources (ex: ElastiCache etc) we'd first like to look into any default configuration we can play around with to solve this problem.
What do you think we missed there?

One of the bigest strength with the cloud is the scalability and you should always design your application to utilise it and it sounds like your RDS instance is getting chocked due to nr of request more than the process time for the queries. So rather go more small instances with load balancing than one big doing all the job. And with Load Balancers you will get away from a singel point of failure due to you can have replicas of your database and they can even be placed in different AZ.
Here is a blogpost you can read on the topic:
https://aws.amazon.com/blogs/database/scaling-your-amazon-rds-instance-vertically-and-horizontally/
Good luck in your aws journey.

The Best answer to your question is using read replicas, but remember only your read requests could be sent to your read replicas so you would need to design your application that way
Also for some cost savings, you should try aurora serverless
One more option is passing traffic between ec2 and rds through a private network rather than using the public internet to connect your ec2 to rds that can be one of the mistakes that might be happening

Major differences of AWS and normal VPS (server)

I have a very basic idea on servers. So far I have only worked with few Ubuntu VPS server which I can easily maintain, install a database, upload my code and run my projects. And to save static data like image/video I use local SSD storage of my server.
Now I got some projects where AWS is required to use. In the beginning, I thought it would be very similar to my normal Ubuntu based VPS server. But while I start researching/reading articles also their own docs I find out it has lots more cool features for server and at the same, it's little complicated for a beginner. I would be really glad if someone give his time and reply on these questions of mine to clear concept about AWS of mine and people like me
As my plan is to use one EC2 instance to run my project. But I can see many experts suggest to use Elastic Beanstalk and create EC2 instance inside that. While I can directly run my project with EC2 without taking help from Elastic Beanstalk. So why it's better / what other help do it(Elastic Beanstalk) provide?
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
Lastly, what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
Thanks in advance for giving time :)

Let me try to answer your questions inline.
As my plan is to use one EC2 instance to run my project. But I can
see many experts suggest to use Elastic Beanstalk and create EC2
instance inside that. While I can directly run my project with EC2
without taking help from Elastic Beanstalk. So why it's better /
what other help do it(Elastic Beanstalk) provide?
If you are planning to use a single server and a database going with EC2 and RDS would be straightforward. However, if you are planning to set up, autoscaling (automatically increasing the number of servers only when load increases and return back to one server), load balancing and DevOps support, you need to set them up which requires more knowledge on AWS platform. AWS Elastic Beanstalk does these for you automatically, also by giving you the options to select the technology of your application and simply upload the code.
When I am checking the pricing of EC2(On-demand > Linux Unix) it says ECU as Variable. What does that mean? And where does ECU work
ECU is simply a rough figure to compare the processing across multiple EC2 classes that are having the different levels processing power.
Instance Storage (GB) as EBS only. Does that mean I can't have any storage with my server I must buy separately? But in my previous VPS server, I use to get fewer storages with my server. Because storage is required if I want to install new software like MySQL/Redis/Python each of them requires local storage. Also if I want to upload my code or few static images it requires storage.
EBS storage is reliable storage (With internal redundancy) that will last beyond your instance lifetime. Which means, you can upgrade the EC2 class and install software, or store files, which will remain in the EBS volume unless you delete it.
Since you are basically paying for the GBs, you can also create another EBS volume for static files and mount it to the EC2 instance if you want.
Like storage do I also need to buy other instances for a database? Like if I want to use PostgreSQL as my database do I need to buy AWS RDS or I can install that inside my Linux system?
It's not mandatory but recommended since you can even use a smaller instance for a web server and use another one for the DB. It's up to you. For example, the cost would be roughly similar if you use two small EC2 instances for a web server and DB server (Or use RDS) or use a single medium-size EC2 instance where both DB and web is running.
Lastly what are the main differences of my normal VPS Linux server and in AWS EC2 Linux server?
You will get more options in terms of selecting the hardware underneath since AWS provides different configuration options. In addition, EC2 instances are able to utilize the AWS ecosystem for Networking, Security, Load balancing & etc for better-optimized solution architectures in terms of reliability, security, performance & etc.

Q1) Beanstalk is a management application. AWS has several: CloudFormation, OpsWorks. Third party vendors have their own: Chef, Ansible, Terraform, etc. I really like Beanstalk and how it makes deploying code very easy for small sites (one command). I can scale up or scale down with a button push. I also use CloudFormation every day for just about everything.
Q2) ECU is a AWS Equivalent Compute Unit used to compare one instance with another. How does that translate to physical CPUs? Don't know as AWS does not publish its absolute meaning. Use is only to compare EC2 instances.
Q3) When you launch an EC2 instance, you will need storage. This is an additional cost (around $0.10 per GB per month). You will specify the size and type of storage (there are a number of types). There is also Instance Store Volumes. Stay away from these unless you really understand how to use them (they don't persist a shutdown so all data is lost). There are good use cases for Instance Store (AI, Big Data, Image processing), but a website is not one of them.
Q4) If your EC2 instance is big enough (2 GB of memory and larger), you can install PostgreSQL, MySQL, etc on your EC2 instance. Otherwise AWS has a number of database optios: DynamoDB, RDS, Aurora, etc.
Q5) Difficult to answer as each vendor offers its own set of features. EC2 instances are virtual machines. You have control over the raw power of that VM. Most VPS servers have management interfaces that EC2 does not. Usually EC2 is more expensive than VPS servers.
Watch a couple of AWS videos on YouTube. This will help you to understand AWS and why it is so successful in the cloud. Linux Academy, A Cloud Guru, etc. have very good training courses on AWS.
AWS Essentials: EC2 Basics
If you have further questions, open a new StackOverflow question per question. You will seldom get answers to long multi-question questions.

Use RDS or a container (ECS) for database? Advantages and disadvantages

I want to host a database on AWS. RDS is one option but I have heard something about containers (and ECS). I see containers as useful for testing, but I'm uncertain about running a production database on one. What would be the advantages and disadvantages of each of them?

Run a database yourself on an EC2 instance:
You choose the database
You control all the configuration
You control what else runs on that machine
Backup, restoration, and other tasks can be customized
You are wholly responsible for keeping the DB running
You are wholly responsible for backups
Run a database on RDS:
Limited selection of DBs
You can run Aurora, Amazon's proprietary DB
Some (few) configuration options are not permitted
No access to the underlying machine
Automatic backups
Basic maintenance is automated
You can't run a cheaper DB than the smallest machine Amazon will rent
Run a database inside a container on an EC2 instance:
All the advantages & disadvantages of running the DB yourself, plus
You have to do some extra work to persist data across containers
You can easily run the exact same DB setup for local development, in test, and in production
You pay some additional overhead (small)
Process isolation makes it easy to share a machine (maybe your entire workload is less than a t2.micro)
Running a DB in a container under ECS doesn't really get you advantages over managing the containers yourself. But if you're using ECS for the rest of your stack and you're putting the DB in a container, then you'd just want to use ECS for that also.

Migrating Redis to AWS Elasticache with minimal downtime

Let's start by listing some facts:
Elasticache can't be a slave of my existing Redis setup. Real shame, that would be so much more efficent.
I have only one Redis server to migrate, with roughly 3gb of data.
Downtime must be less than 10 mins. I assume the usual "stop the site, stop redis, provision cluster with snapshot" will take longer than this.
Similar to this question: How do I set an elasticache redis cluster as a slave?
One idea on how this might work:
Set Redis to use an AOF and trigger BGSAVE at the same time.
When BGSAVE finishes, provision the Elasticache cluster with RDB seed.
Stop the site and shut down my local Redis instance.
Use an aof-replay tool to replay the AOF into Elasticache.
Start the site again, pointed at the Elasticache cluster.
My questions:
How can I guarantee that my AOF file begins at exactly the point the RDB file ends, and that no data will be written in between?
Is there an AOF tool supported by the maintainers of Redis, or are they all third-party solutions, and therefore (potentially) of questionable reliability?*
* No offence intended to any authors of such tools, I'm sure they're great, I just feel much more confident using a tool written by the same team as the product to avoid potential compatibility bugs.

I have only one Redis server to migrate, with roughly 3gb of data
I would halt, save the REDIS to S3 and then upload it to a new cluster.
I'm guessing 10 mins to save the file and get it into s3.
10 minutes to just launch an elasticache cluster from that data.
Leaves you ten extra minutes to configure and test.
But there is a simple way of knowing EXACTLY how long.
Do a test migration of it.
DONT stop your live system
Run BGSAVE and get a dump of your Redis (leave everything running as normal)
move the dump S3
launch an elasticache cluster for it.
Take DETAILED notes, TIME each step, copy the commands to a notepad window.
Put a Word/excel document so you have a migration document. That way you know how long it takes and there are no surprises. Let us know how it goes.

ElastiCache has online migration support. You can use the start-migration API to start migration from self managed cluster to ElastiCache cluster.
aws elasticache start-migration --replication-group-id <ElastiCache Replication Group Id> --customer-node-endpoint-list "Address='<IP Address>',Port=<Port>"
The input to the API is your ElastiCache replication group id and the IP and port of the master of your self managed cluster. You need to ensure that the IP address is accessible from ElastiCache node. (An example IP address would be the private IP address of the master of your self managed cluster). This API will make the master node of the ElastiCache cluster call 'SLAVEOF' on the master of your self managed cluster. This will establish a replication stream and will start migrating data from self-managed cluster to ElastiCache cluster. During migration, the master of the ElastiCache cluster will stop accepting writes sent to it directly. You can start using ElastiCache cluster from your application for reads.
Once you have all your data in ElastiCache cluster, you can use the complete-migration API to stop the migration. This API will stop the replication from self managed cluster to ElastiCache cluster.
aws elasticache complete-migration --replication-group-id <ElastiCache Replication Group Id>
After this, the master of the ElastiCache cluster will start accepting writes. You can start using ElastiCache cluster from your application for both read and write.
The following limitations to be aware of for this migration method:
An existing or newly created ElastiCache deployment should meet the following requirements for migration:
It's cluster-mode disabled using Redis engine version 5.0.5 or higher.
It doesn't have either encryption in-transit or encryption at-rest enabled.
It has Multi-AZ with Auto-Failover enabled.
It has sufficient memory available to fit the data from your Redis on EC2 instance. To configure the right reserved memory settings, see Managing Reserved Memory.

There are a few ways to migrate the data without downtime. They are harder to achieve though.
you could have your app write to two redis instances simultaneously - one of which would be on EC. Once the caches are both 'warm', you could just restart your app, and read from the EC cache.
You could initially migrate to EC2 instead of EC. not really what you were hoping to hear, I imagine. this is easy to do because you can set EC2 as salve of your redis instance. Also, migrating from EC2 to EC is somewhat easier (the data is already on AWS), so there's a benefit for users with huge sets of data.
You could, in theory, intercept the commands from the client and send them to EC, thus effectively "replicating". But this requires some programming ( I dont believe a tool like this exists ATM) and would be hard with multiple, ephemeral clients.

Migrating from AWS RDS to an AWS EC2 running MySQL

Yes, fellow SOrs, I'm doing it backwards. I tried an AWS RDS but the CPU seems to be spiking so often that I need the flexibility of an EC2 to run some fine tuning. I'm not a MySQL expert, so I'm asking:
How can I create a setup on the EC2 so that it reads and replicates my RDS?
Ideally I'd do the switch in real time via DNS but first I need the EC2 to act like a clone of the RDS updating with any new data happening between now and the actual migration period.
Any pointers are much appreciated. Thanks!

Why can't you use mysql-tuner with RDS?
You shouldn't need to run sysbench, since Amazon handles OS level tuning for you on RDS
Aurora is a drop-in replacement for MySQL and will scale better than any MySQL cluster you could setup on EC2
You should be addressing why your Wordpress instance is hammering the database so much instead of trying to optimize the database.
You should put a CDN in front of your Wordpress site and cache as much as you can to reduce the load on both your web server and database server. It looks like there are also solutions out there for using Redis to cache data so that Wordpress doesn't have to constantly go back to MySQL for data.
Amazon provides the CloudFront CDN, but I would also recommend looking into CloudFlare.
Honestly, given your number of concurrent users, unless you have tons of dynamic constantly changing content, you should be able to run your entire site on a t2.micro with CloudFlare in front of it with cache everything enabled.

I'd like to offer an update:
Mark B's input has been extremely valuable as I have discovered that I can run mysql tuner remotely and touch the RDS. Therefore there was no need to migrate after all.
The RDS CPU spikes were due to a large amount of non-INDEX JOINs.
I have added indexes and the results are fantastic:

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js