AWS disaster recovery together with backup and storage

I have a hybrid AWS setup: an on-prem Hadoop cluster with replication enabled to AWS, where a similar Hadoop cluster runs at low capacity for disaster recovery. This is an active-active disaster recovery setup in AWS. Is it still recommended to take backups of data that is stored on AWS?

Is it still recommended to take backups for data that is stored on AWS?
It's not clear which AWS services you're referring to.
Well, let's say you have an S3 bucket bound only to us-east-1 and that region becomes unavailable: you can't access your data. Therefore, it's encouraged to replicate to another region. However, S3 is advertised with several 9's of availability, and if an AWS service is down in a major region, a good portion of the internet is probably inaccessible too, not only your data.

Related

AWS RDS disaster recovery using cross-account

We are running AWS RDS PostgreSQL with daily automatic snapshots, encrypted with the AWS managed KMS key. My objective is to minimize risk and data loss in case the main AWS account (running RDS) gets compromised, or the RDS instance is deleted or damaged in some way.
What we've implemented so far: RDS snapshots are shared with a different (backup) account, periodically copied to the backup account, and re-encrypted with a KMS key from the backup account, so the copies are local and independent of the main AWS account.
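A minimal sketch of the share-and-copy step, assuming boto3's RDS client. The functions only build request dictionaries (nothing here calls AWS); the account IDs, snapshot names, key ARNs and the `-backup` suffix are hypothetical. One detail worth knowing: RDS does not allow sharing a snapshot encrypted with the default AWS managed key (`aws/rds`), so the snapshot has to be copied under a customer managed key before it can be shared.

```python
"""Sketch: request parameters for sharing an RDS snapshot with a backup account
and copying it there under the backup account's own KMS key.

Hypothetical values throughout; pass each dict to the named boto3 RDS method.
"""


def share_snapshot_request(snapshot_id: str, backup_account_id: str) -> dict:
    # For rds_client.modify_db_snapshot_attribute(**request) in the source account.
    # The "restore" attribute lists the accounts allowed to copy/restore the snapshot.
    return {
        "DBSnapshotIdentifier": snapshot_id,
        "AttributeName": "restore",
        "ValuesToAdd": [backup_account_id],
    }


def snapshot_arn(region: str, account_id: str, snapshot_id: str) -> str:
    # A snapshot shared from another account is referenced by its full ARN.
    return f"arn:aws:rds:{region}:{account_id}:snapshot:{snapshot_id}"


def copy_to_backup_request(
    region: str, source_account_id: str, snapshot_id: str, backup_kms_key_arn: str
) -> dict:
    # For rds_client.copy_db_snapshot(**request) in the backup account.
    # KmsKeyId re-encrypts the local copy with a key owned by the backup account.
    return {
        "SourceDBSnapshotIdentifier": snapshot_arn(region, source_account_id, snapshot_id),
        "TargetDBSnapshotIdentifier": f"{snapshot_id}-backup",  # hypothetical naming
        "KmsKeyId": backup_kms_key_arn,
    }
```

Keeping the request construction separate from the API calls makes the naming and ARN logic easy to test without credentials.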
I'm wondering if there are better ways to minimize the recovery time objective (RTO) and recovery point objective (RPO) in case of a disaster.
This AWS blog post seems to weigh the options well.
Automated backups are limited to a single AWS Region while manual snapshots and Read Replicas are supported across multiple Regions.
Having a cross-Region Read Replica would give you the best RPO and RTO, since you can promote the replica to an independent instance.
Alternatively, if you choose to use Amazon Aurora, Backtrack seems to offer an option similar to a read replica, but I don't have personal experience with this feature, so I can't say how effective it is at improving RTO and RPO.
I wrote two scripts implementing the flow in the diagram above; the idea is to run them daily:
src_acc_take_share_rds_snapshot.py in the src account:
list available RDS snapshots matching the provided regexp
re-encrypt them with the KMS key shared from the dst account
share the re-encrypted RDS snapshots with the dst account
remove old decrypted snapshots
dst_acc_copy_shared_rds_snapshot_to_local.py in the dst account:
list RDS snapshots shared from the src account with the dst account
copy RDS snapshots from the src account to the dst account
remove old decrypted snapshots
fire an SNS message if the desired snapshot count != the actual count
and put them on GitHub: https://github.com/mvasilenko/dr-rds-share-snapshot

Move AWS EC2 snapshot from S3 to Glacier

I ran a full initial snapshot of an EC2 instance I don't see a need for in the foreseeable future. I plan to terminate the instance later today and turn off the domain DNS for the site pointing to it.
I'm aware AWS stores snapshots on S3, but Glacier is cheaper. So as an alternative, can I set a lifecycle policy on the snapshot so it automatically moves to Glacier after a period of time? If so, how exactly can I do this, since the S3 console doesn't provide access to snapshot buckets? (The sooner it moves, the better in my case -- I want the cheap, long-term storage.)
Once moved, I'll want to delete the snapshot from S3. There will be no more incremental changes or snapshots; this is it.
Please be specific with CLI commands or steps if you don't mind -- I'm not terribly familiar with AWS yet.
There is no native way to do this. EBS snapshots are stored in S3, but that is a "behind the scenes" implementation detail. The snapshots are not visible in an S3 bucket, nor are they exposed via the S3 API, so you cannot move them to Glacier.
N2WS, a third-party tool, recently announced support for offloading snapshots to Glacier at AWS re:Invent 2018. However, it stores the snapshots in its own format, and it runs "on top of" AWS rather than doing this natively.
http://n2ws.com/

Amazon Web Services - Durability

Can you let me know whether the following AWS services keep data in multiple facilities? How many? In different Availability Zones?
S3, EBS, DynamoDB
Also, I want to know in general what the distance between two AZs is, because I want to make sure that a single catastrophe cannot destroy a complete region.
To start, all of the questions above are easily answered in the AWS documentation.
What are Regions and Availability Zones?
Refer to this documentation:
Each region is a separate geographic area. Each region has multiple, isolated locations known as Availability Zones.
What is the distance between two AZs in general?
I don't think anyone outside AWS knows the answer to that. Amazon does not publish that kind of information about its data centers; they are secretive about it.
Starting with S3, as per the AWS documentation:
By default, Amazon S3 stores your data across multiple geographically distant Availability Zones.
You can also enable Cross-Region Replication, as per the AWS documentation, but it incurs extra cost:
Cross-region replication is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions.
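As an illustration of that bucket-level configuration, here is a sketch of the `ReplicationConfiguration` document for boto3's `put_bucket_replication`, assuming one rule that replicates every object. The role ARN and bucket names are hypothetical. Both buckets must have versioning enabled, the IAM role must allow S3 to replicate on your behalf, and only objects written after the rule is enabled are replicated.

```python
def replication_config(role_arn: str, destination_bucket: str, storage_class: str = "STANDARD") -> dict:
    # For s3_client.put_bucket_replication(Bucket=<source>, ReplicationConfiguration=config).
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix filter matches every object in the bucket.
                "Filter": {"Prefix": ""},
                # Required when rules use the Filter-based schema.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }
```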
For EBS, as per the AWS documentation:
Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability.
Also, as per the documentation, you can create point-in-time snapshots and make them available in another AWS Region; the snapshots are stored in Amazon S3.
For DynamoDB, as per the AWS documentation:
DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.
You can also make a table available across regions (DynamoDB global tables); for more details, please refer to this AWS documentation.
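A sketch of adding replica regions to an existing table, assuming boto3 and the `ReplicaUpdates` parameter of `update_table` (global tables version 2019.11.21). The table name and regions are hypothetical. As a conservative assumption, it builds one request per region, to be sent one at a time while waiting for the table to return to ACTIVE; check the global tables documentation for prerequisites such as stream settings.

```python
def replica_requests(table_name: str, regions: list[str]) -> list[dict]:
    # One request per new replica region, for dynamodb_client.update_table(**request)
    # called in the table's current region. Duplicate regions are skipped.
    seen: set[str] = set()
    requests = []
    for region in regions:
        if region in seen:
            continue
        seen.add(region)
        requests.append(
            {
                "TableName": table_name,
                "ReplicaUpdates": [{"Create": {"RegionName": region}}],
            }
        )
    return requests
```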
Hope this clears your doubts!
By default, all these services replicate data across different AZs (Availability Zones) in the same AWS Region.
AWS also provides mechanisms to replicate data across different regions (which you can choose), giving you more fault tolerance and lower latency for your users (you can serve users from servers in their own region).
However, keep in mind that replicating data across multiple regions involves extra cost.
You can read the AWS doc http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html to see which AWS Regions and AZs exist and where they are located.
The whole idea of having different AZs and Regions is to provide high availability, so you shouldn't worry about distance and availability if you replicate across multiple AZs or Regions.
Edit: Thanks to Michael for pointing out that EBS volumes are only replicated (mirrored) within the AZ where the volume is created.

Amazon s3 vs Ec2 Storing Files

Which one is better for storing pictures and videos uploaded by users?
Amazon S3 or the EC2 filesystem?
While opinion-based questions are discouraged on StackOverflow, and answers always depend upon the particular situation, it is highly likely that Amazon S3 is your better choice.
You didn't say whether you only wish to store the data, or whether you also wish to serve it to users. I'll assume both.
Benefits of using Amazon S3 to store static assets such as pictures and videos:
S3 is pay-as-you-go (only pay for the storage consumed, with different options depending upon how often/fast you wish to retrieve the objects)
S3 is highly available: You don't need to run any servers
S3 is highly durable: Your data is duplicated across three data centres, so it is more resilient to failure
S3 is highly scalable: It can handle massive volumes of requests. If you served content from Amazon EC2, you'd have to scale-out to meet requests
S3 has in-built security at the object, bucket and user level.
Basically, Amazon S3 is a fully-managed storage service that can serve static assets out to the Internet.
If you were to store data on an Amazon EC2 instance, and serve the content from the EC2 instance:
You would need to pre-provision storage using Amazon EBS volumes (and you pay for the entire volume even if it isn't all used)
You would need to Snapshot the EBS volumes to improve durability (EBS Snapshots are stored in Amazon S3, replicated between data centres)
You would need to scale your EC2 instances (make them bigger, or add more) to handle the workload
You would need to replicate data between instances if you are running multiple EC2 instances to meet request volumes
You would need to install and configure the software on the EC2 instance(s) to manage security, content serving, monitoring, etc.
The only benefit of storing this static data directly on an Amazon EC2 instance rather than Amazon S3 is that it is immediately accessible to software running on the instance. This makes the code simpler and access faster.
There is also the option of using Amazon Elastic File System (EFS), which is NAS-like storage. You can mount an EFS volume simultaneously on multiple EC2 instances. Data is replicated between multiple Availability Zones. It is charged on a pay-as-you-go basis. However, it is only the storage layer - you'd still need to use Amazon EC2 instance(s) to serve the content to the Internet.

SQL Backups to S3

I have the following Amazon EC2 configuration
Prod Web & DB server (Virginia)
Web & DB server (Oregon)
I would like to store my SQL backups in S3 so that they are available to be restored to my standby server in case the Virginia region goes down for any period of time (which has been known to happen :)
Here are the following 2 regions I am considering for my S3 bucket
US Standard
Oregon
I attempted first to specify Oregon. However, when I do that, I am unable (for some reason) to upload to that bucket from my Virginia instance. However, I am worried that if I specify US Standard, that my S3 bucket will not be available in the event Virginia becomes unavailable.
Does anyone have any recommendations for overcoming the issues with either of these scenarios?
Thanks!
My recommendation is to use RDS (Relational Database Service), which is basically a managed RDBMS service for MySQL (or MS-SQL or Oracle). It takes care of backup and restore for the DB.
With MySQL, it has the option of an automatic standby in a different Availability Zone within the same region. When you use the "Multi-AZ" option, RDS creates the standby and replicates to it synchronously. This way your failover will be very close to real time.
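A sketch of the Multi-AZ setup described above, assuming boto3's `create_db_instance`. The instance identifier, credentials, instance class and storage size are hypothetical placeholders (don't hard-code a real password). Note that the Multi-AZ standby lives in the same region as the primary, so on its own it does not cover a full Virginia outage.

```python
def multi_az_mysql_request(
    instance_id: str,
    master_username: str,
    master_password: str,
    instance_class: str = "db.t3.medium",
    storage_gb: int = 20,
    backup_retention_days: int = 7,
) -> dict:
    # For rds_client.create_db_instance(**request).
    # RDS accepts 0-35 days; 0 disables automated backups, so require at least 1.
    if not 1 <= backup_retention_days <= 35:
        raise ValueError("backup_retention_days must be between 1 and 35")
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
        # Synchronous standby in another AZ of the same region.
        "MultiAZ": True,
        # Automated backups and the point-in-time restore window, in days.
        "BackupRetentionPeriod": backup_retention_days,
    }
```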