EC2 goes down when db backup cron is running - amazon-web-services

We are using a t3.small EC2 instance running Ubuntu 18.04. It hosts a local MongoDB service and a few Node.js services.
At 19:30 IST, a cron job is scheduled to dump the MongoDB database and upload it to Google Cloud Storage (GCS).
Sometimes this cron job brings the server down, though not every time.
I checked the CloudWatch metrics: read ops spike to around 10k+ at that time, and the EBS burst balance also drops.
Is there any way to mitigate this?

If your read ops are that high, you would probably also see a large number of queued tasks at the same time, which will cause performance to drop.
The most costly solution would be to move your volume to Provisioned IOPS and provision IOPS at your highest consumption point, but this has a cost impact on your solution.
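If you did go that route, the volume type and IOPS can be changed in place via the EC2 API; a minimal boto3 sketch, where the volume ID and the IOPS figure are hypothetical placeholders rather than recommendations:

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")  # region is an assumption

# Switch the data volume to Provisioned IOPS (io1), sized at the observed peak.
# The volume ID and IOPS value below are hypothetical placeholders.
resp = ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    VolumeType="io1",
    Iops=4000,
)
print(resp["VolumeModification"]["ModificationState"])
```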
The following steps will help mitigate the performance problem:
Do not run the backup on the MongoDB server; run it from a standalone server.
Add a standby replica and back up from that, so the backup does not affect the performance of your primary node.
Add pauses between backup actions: rather than doing everything at once, spread the backup over a longer period with gaps in between, as in the sketch below.
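A minimal sketch of that pacing idea, dumping collections one at a time with a pause in between (the database name, collection list, output path, and sleep interval are all assumptions for illustration):

```python
import subprocess
import time

DB = "mydb"                                  # hypothetical database name
COLLECTIONS = ["users", "events", "orders"]  # hypothetical collection list
OUT_DIR = "/backup/mongodump"                # hypothetical output path
PAUSE_SECONDS = 120                          # gap to let the burst balance recover

for coll in COLLECTIONS:
    # Dump one collection at a time instead of the whole database in one burst.
    subprocess.run(
        ["mongodump", "--db", DB, "--collection", coll, "--gzip", "--out", OUT_DIR],
        check=True,
    )
    time.sleep(PAUSE_SECONDS)

# Once everything is dumped, upload OUT_DIR to GCS (e.g. gsutil rsync) in one go.
```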

Related

Why is disk IO on my new AWS EC2 instance so much slower?

I have a regular EC2 instance A with a 200GB SSD filled with data. I used this disk to create an AMI and used that AMI to spin up another EC2 instance B with the same specs.
B started almost instantaneously, which surprised me since I thought there would be a delay while AWS copied my 200GB EBS volume to the SSD backing the new instance. However, I noticed I/O is extremely slow on B. It takes 3x as long to parse data on B.
Why is this, and how can I overcome this? It's too slow for my application which requires fast disk IO.
This happens because a newly-created EBS volume is built from S3 on-demand: when EC2 first reads a block from that volume it's retrieved from S3. You only get the "full" EBS performance once all blocks have been loaded. This is a huge problem, btw, for big databases restored from snapshot.
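A common workaround that follows from this is to initialize the restored volume up front by reading every block once, so nothing has to be fetched from S3 later. A rough sketch of that idea (the device name is a hypothetical placeholder, the script needs root, and tools like dd or fio do the same job):

```python
# Read the entire block device once so every block gets pulled from S3 now,
# not on first application access. The device name is a hypothetical placeholder.
DEVICE = "/dev/nvme1n1"
CHUNK = 1024 * 1024  # 1 MiB reads

total = 0
with open(DEVICE, "rb", buffering=0) as dev:
    while True:
        data = dev.read(CHUNK)
        if not data:
            break
        total += len(data)

print(f"Touched {total / (1024**3):.1f} GiB")
```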
One solution may be fast snapshot restore. Although the docs don't describe what's happening behind the scenes, my guess is that they do a parallel disk copy from an existing EBS image. However, you will pay $0.75 per hour per snapshot, and are limited to 10 restores per hour.
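For what it's worth, fast snapshot restore is enabled per snapshot and per Availability Zone; a minimal boto3 sketch, with the snapshot ID and AZ as hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# The snapshot ID and AZ below are hypothetical placeholders.
resp = ec2.enable_fast_snapshot_restores(
    AvailabilityZones=["us-east-1a"],
    SourceSnapshotIds=["snap-0123456789abcdef0"],
)
print(resp["Successful"])
```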
Given the use-case that you described in another question, I think that the best solution is to keep an on-demand instance that you start and stop for your job. Assuming you're using Linux, you are charged per-second, so if you only run for 10-20 minutes out of the hour, you'll pay a pro-rated price. And unlike spot instances, you'll know that the machine will always be available and always be able to finish the job.
Another alternative is to just leave the spot instance running. If you're running for a significant portion of every hour, you're not really saving that much by shutting the instance down.

Does AWS DBInstance maintenance keep data intact

We are using a CloudFormation template to create a MySQL AWS::RDS::DBInstance.
My question is: when maintenance is in progress (applying OS upgrades or software/security patches),
1. Will the database instance be unavailable for the time of the maintenance?
2. Does it wipe out data from the database instance during maintenance?
3. If the answer to the first is yes, will using a DBCluster help avoid that short downtime, if I use more than one instance?
From the documentation I did not find any indication that data loss is a possibility.
Database Instance be unavailable for the time of maintenance
They may reboot the server to apply the maintenance. I've personally never seen anything more than a reboot, but I suppose it's possible they may have to shut it down for a few minutes.
Does it wipe out data from database instance during maintenance?
Definitely not.
If answer to first is yes, will using DBCluster help avoid that short downtime, if I use more than one instances?
Yes, a database in cluster mode would fail-over to another node while they were applying patches to one node.
I have been actively working with RDS database systems for the last 5 years. Based on my experience, my answers to your questions follow each point below.
Database Instance be unavailable for the time of maintenance
[Yes, your RDS instance will be unavailable during maintenance of the database]
Does it wipe out data from database instance during maintenance?
[ Definitely BIG NO ]
If answer to first is yes, will using DBCluster help avoid that short downtime, if I use more than one instances?
[Yes. In cluster mode or a Multi-AZ deployment, AWS essentially applies the patches to the standby node or replica first and then fails over to the patched instance. There may still be a brief downtime during the switchover process.]
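If you want to confirm how an instance is configured ahead of a maintenance window, the RDS API exposes both the Multi-AZ flag and the window itself; a minimal boto3 sketch, with the instance identifier as a hypothetical placeholder:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption

# "mydb-instance" is a hypothetical identifier.
db = rds.describe_db_instances(DBInstanceIdentifier="mydb-instance")["DBInstances"][0]

print("Multi-AZ:              ", db["MultiAZ"])
print("Maintenance window:    ", db["PreferredMaintenanceWindow"])
print("Pending modifications: ", db.get("PendingModifiedValues", {}))
```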

What AWS service can I use to efficiently process large amounts of S3 data on a weekly basis?

I have a large amount of images stored in an AWS S3 bucket.
Every week, I run a classification task on all these images. The way I'm doing it currently is by downloading all the images to my local PC, processing them, then making database changes once the process is complete.
I would like to reduce the amount of time spent downloading images to increase the overall speed of the classification task.
EDIT2:
I actually am required to process 20,000 images at a time to increase the performance of the classification engine. This means I can't use Lambda, since the maximum RAM available is 3GB and I need 16GB to process all 20,000 images.
The classification task uses about 16GB of RAM. What AWS service can I use to automate this task? Is there a service that can be put on the same VLAN as the S3 Bucket so that images transfer very quickly?
The entire process takes about 6 hours. If I spin up an EC2 instance with 16GB of RAM, it would be very cost-ineffective, as it would finish after 6 hours and then spend the remainder of the week sitting there doing nothing.
Is there a service that can automate this task in a more efficient manner?
EDIT:
Each image is around 20-40KB. The classification is a neural network, so I need to download each image so I can feed it through the network.
Multiple images are processed at the same time (batches of 20,000), but the processing part doesn't actually take that long. The longest part of the whole process is the downloading part. For example, downloading takes about 5.7 hours, processing takes about 0.3 hours in total. Hence why I'm trying to reduce the amount of downloading time.
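For context, the download step being described is essentially a per-object loop like the sketch below (bucket, prefix, and local directory are hypothetical placeholders); running it from inside AWS, e.g. on the EC2 instance suggested in the answer below, avoids pulling every small object over an internet link:

```python
import os
import boto3

BUCKET = "my-image-bucket"   # hypothetical bucket name
PREFIX = "images/"           # hypothetical key prefix
LOCAL_DIR = "/data/images"   # hypothetical local directory

s3 = boto3.client("s3")
os.makedirs(LOCAL_DIR, exist_ok=True)

# List and download every object under the prefix, one GET per small image.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        s3.download_file(BUCKET, key, os.path.join(LOCAL_DIR, os.path.basename(key)))
```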
For your purpose you can still use an EC2 instance. And if you have a large amount of data to be downloaded from S3, you can attach an EBS volume to the instance.
You need to set up the instance with all the tools and software required for running your job. When you don't have any process to run, you can shut the instance down, and boot it up again when you want to run the process.
EC2 instances are not charged for the time they are in the stopped state. You will be charged for the EBS volume and any Elastic IP attached to the instance.
You will also be charged for the storage of the EC2 image on S3.
But I think these costs will be less than the cost of running the EC2 instance all the time.
You can schedule starting and stopping the instance using the AWS Instance Scheduler.
https://www.youtube.com/watch?v=PitS8RiyDv8
You can also use AutoScaling but that would be more complex solution than using the Instance Scheduler.
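A minimal sketch of that start/stop idea, as it might run from a scheduled Lambda or cron job (the instance ID and region are hypothetical placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption
INSTANCE_ID = "i-0123456789abcdef0"                 # hypothetical instance ID

def start_worker():
    # Boot the processing instance just before the weekly job.
    ec2.start_instances(InstanceIds=[INSTANCE_ID])

def stop_worker():
    # Stop (not terminate) so the EBS volume and installed software persist.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```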
I would look into Kinesis streams for this, but it's hard to tell because we don't know exactly what processing you are doing to the images.

AWS architecture help for running database dumps

I have MySQL running on one EC2 instance, and Tableau uses this database. mysqldump runs from the production servers every 4 hours, during which the system is down for probably 10-15 minutes because of the dump. I am planning to add another EC2 instance running MySQL and put an ELB in front of the two instances so that the system won't be down during the dump. For this I might have to de-register an instance from the ELB during its dump and register it back afterwards. Is this the right way to handle situations like this?
You can't use an ELB with MySQL servers. The ELB wouldn't know which server was master and which was slave, so it wouldn't know which to send updates to.
Is there any reason you aren't using Amazon's RDS service for your database servers? It provides automated snapshots that don't cause any down-time. It also makes it easy to create a read-replica against which you could perform mysqldumps without affecting the main server.
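If you do keep using mysqldump, pointing it at a read replica with --single-transaction leaves the primary untouched; a rough sketch wrapping the CLI from Python (the replica hostname, user, and output path are hypothetical, and credentials are assumed to come from an option file):

```python
import subprocess

REPLICA_HOST = "replica.example.internal"  # hypothetical read-replica endpoint
DUMP_PATH = "/backup/dump.sql.gz"          # hypothetical output path

# --single-transaction takes a consistent InnoDB snapshot without locking tables.
dump = subprocess.Popen(
    ["mysqldump", "--host", REPLICA_HOST, "--user", "backup_user",
     "--single-transaction", "--quick", "--all-databases"],
    stdout=subprocess.PIPE,
)
with open(DUMP_PATH, "wb") as out:
    subprocess.run(["gzip"], stdin=dump.stdout, stdout=out, check=True)
dump.stdout.close()
dump.wait()
```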
Currently you are taking logical backups of your system every 4 hours. Logical backups should, in most cases, be treated as a worst-case option: in the event of a restore they are very slow compared to alternatives such as snapshots and binary backups. If snapshotting with Amazon RDS, or any of the other alternatives out there, is not an option in your environment, I would look into XtraBackup. It is a free, standalone, hot online binary backup tool that can be used with a vanilla install of MySQL. It should not bring down your production server, assuming you are using InnoDB and not an alternative storage engine such as MyISAM. I personally used it for hot online binary backups and to automate building slaves in my previous work environment. The bottleneck of a binary backup during a restore is your network speed, and it is far faster than restoring a logical backup.
If setting up another MySQL instance is your only option, look into GTID replication and/or a master-passive HA setup so that you can take the mysqldump off the secondary, non-active server and your production environment does not go down.
The bottom line is that you should not be taking production down every 4 hours to do a logical backup. That is definitely not ideal in any production environment.
Have a look at Amazon Database Migration Service (https://aws.amazon.com/dms/). It allows you to do zero-downtime database migration or just synchronization.

Downgrade Amazon RDS Multi AZ deployment to Standard Deployment

What might happen if I downgrade my Multi-AZ deployment to a standard deployment? Is there any possibility of an I/O freeze or data loss? If yes, what might be the proper way to minimize downtime in data availability?
I have tried downgrading from Multi AZ deployment to a standard deployment.
The entire process took around 2-3 minutes (the transition time should depend on your database size). The transition was seamless: we did not experience any downtime, and our website worked as expected during this period.
Just to ensure that nothing was affected, I took a snapshot and a manual database dump before downgrading.
Hope this helps.
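That same sequence (manual snapshot first, then switch off Multi-AZ) can be driven through the RDS API; a minimal boto3 sketch, with the instance identifier and region as hypothetical placeholders:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption
DB_ID = "mydb-instance"                             # hypothetical identifier

# Safety net: take a manual snapshot before changing the deployment.
rds.create_db_snapshot(
    DBSnapshotIdentifier=f"{DB_ID}-pre-downgrade",
    DBInstanceIdentifier=DB_ID,
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier=f"{DB_ID}-pre-downgrade"
)

# Move from Multi-AZ to a standard (single-AZ) deployment.
rds.modify_db_instance(
    DBInstanceIdentifier=DB_ID,
    MultiAZ=False,
    ApplyImmediately=True,
)
```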