How to synchronize data between 2 EBS volumes in AWS? - amazon-web-services

I have 2 EBS volumes in 2 availability zones in the same region, one is primary and another is backup. Generally, I just read and write data from primary volume. Is it possible to synchronize data from primary to back up EBS volume? if yes, how can I do that?
Thanks

1 year after this question has been posted but I hope it helps anyone looking into this.
Amazon EFS is a great solution. An alternative for what you require is using Snapshots. With AWS Backup you can schedule Amazon EBS snapshots and have them shared across AZ or even different accounts.
As very well proposed in the previous answer, you should first try to understand your performance requirements for the workload and also the RPO and RTO requirements.
Comparing EFS and EBS, I could say that:
A. EFS (Elastic File System) is a managed parallel NFS (based on NFSv4). You are going to mount it as a directory. EFS leverages the same technology as EBS, and the disks are replicated in the AZ and also between AZ. You don’t chose or control the disks, just what performance you expect from the managed service.
B. EBS (Elastic Block Storage) is also network attached but is a block storage, which means that your OS will see it as a disk and not a directory. You have to format it as a file system (or group it with other EBS and create LVM, RAID, etc) before you can use it. EBS are replicated within the same AZ but bot across AZ. You can have snapshots of your EBS and copy them to the other AZ, for example.
So you have to take into account not only the performance you require but also what type of storage (block or file) your application need.

Can you use EFS for this? You might be able to avoid having to replicate the data is you can have the primary and backup instance/applications looking at the same data volume.

Related

can i use multi-attach ebs volume with multiple ec2 in one zone for read and write some shared files?

i have multiple micro services and all use some local files, now i want to run each micro service on EC2 instance separately and perform file operations
(i found some hints from here :- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html )
so i want to know, is it possible?
if possible, then what should configuration of EC2 ?
if not possible then how can i archive it?
Definitely, yes.
According to documentation, there are some limitations:
Your EC2 instances should be in one Availability Zone
EBS multi-attach supported only for io1/io2 EBS volume family
You should use a file system that's cluster-aware (not EX4, etc...)
In case of microservices communication, best practice is use EFS that can be mounted to your EC2 instances. In case of EFS, you can use share storage between availability zones within VPC that increases availability of your application.
Yes, it's possible. However, multiple writes at a time might result in corrupted files (been there, done that). You can install Gluster to prevent that.
On the other hand, It's recommend to use EFS instead of EC2 multi attach for this kind of work, just remember to put dump file to EFS to increase iops.

How to use single EBS volume with a feel of spot EC2 instance

I am preparing for AWS certification and I found following question in mock test.
The Question is as mentioned in below image :
And they have mentioned EBS volume in the question, I selected to choose "Provisioned IOPS SSD Volume" to implement scalable and high throughput.
But the correct answer was EFS with the following justification.
But, I think EBS volume can only be mapped with one EC2 instance at a time. Can we map one EBS volume with feet of multiple EC2 instance ?
No, you can't map an EBS volume to more than one instance at a time.
But EFS doesn't use EBS, and an EFS filesystem had no meaningful limit on the number of EC2 instances that can access it simultaneously.
The question isn't a very good one. In fact, it proposes an initial scenario that you would never use.
EBS volumes attached to members of an auto-scaling group would never be used to store CMS documents uploaded by users, because those volumes will ether be destroyed or left attached to nothing when the cluster scales in and some of the instances are terminated due to the decreased load.
The giveaway to the correct answer lies in the fact that the question asks for a scalable, high-throughput, POSIX-compliant filesystem and this is pretty much the definition of Amazon EFS. EFS will scale larger than the largest provisioned IOPS EBS volume.

Cost-effectively store volumes that I won't need for a few months?

I have two EC2 instances I created this summer for personal use while learning basic ML concepts and doing Kaggle competitions. I'd like to save the work on them on eventually be able to use them again if I'm interested in competing in a Kaggle competition again without having to setup a new instance, but probably won't need them for a few months (and when I do need them, it won't be at a moment's notice).
Each instance has an 128gb EBS gp2 volume that's costing me ~$13/month. I was wondering if there's a way that I could pull these off AWS so that I'm not still paying for them when I don't need them. Is there a feature where I can store a snapshot outside of AWS and eventually upload it to AWS and restore the volumes if I need them?
Or is there a much cheaper (slower) storage method for keeping them on AWS? (sc1 volumes are $0.025/GB-month, but is there something even cheaper?)
Edit: Clarified volume type ($0.10/GB-month gp2)
Edit2: I think my best bet for now is to snapshot them since each only has ~30GB of used space (60GB*$0.05 = $3/month) and delete the original volumes.
If you wish to retain the exact contents of the disk volumes, the choice really comes down to:
Amazon EBS volume snapshots
ISO images
Amazon EBS volume snapshots are only charged for blocks that are used. They are the easiest to create and restore. It is not possible to export an Amazon EBS snapshot.
If you wish to move a disk image out of Amazon EC2 (eg to download, or to store in Amazon S3), use a standard disk utility to create a .iso image of the disk. This can later be restored to a new disk volume, and can even be directly mounted in read-only mode using disk utilities.
You can put all this data into Amazon Glacier which is far more cheaper ( around 10% cost )

AWS - HA NFS - Best practices

Anyone have a sound strategy for implementing NFS on AWS in such a way that it's not a SPoF (single point of failure), or at the very least, be able to recover quickly if an instance crashes?
I've read this SO post, relating to the ability to share files with multiple EC2 instances, but it doesn't answer the question of how to ensure HA with NFS on AWS, just that NFS can be used.
A lot of online assets are saying that AWS EFS is available, but it is still in preview mode and only available in the Oregon region, our primary VPC is located in N. Cali., so can't use this option.
Other online assets are saying that GlusterFS is a way to go, but after some research I just don't feel comfortable implementing this solution due to race conditions and performance concerns.
Another options is SoftNAS but I want to avoid bringing in an unknown AMI into a tightly controlled, homogeneous environment.
Which leaves NFS. NFS is what we use in our dev environment and works fine, but it's dev, so if it crashes we go get a couple beers while systems fixes the problem, but on production, this is obviously a no go.
The best solution I can come up with at this point is to create an EBS and two EC2 instances. Both instances will be updated as normal (via puppet) to maintain stack alignment (kernel, nfs libs etc), but only one instance will mount the EBS. We set up a monitor on the active NFS instance, and if it goes down, we are notified and we manually detach and attach to the backup EC2 instance. I'm thinking we also create a network interface that can also be de/re-attached so we only need to maintain a single IP in DNS.
Although I suppose we could do this automatically with keepalived, and a IAM policy that will allow the automatic detachment/re-attachment.
--UPDATE--
It looks like EBS volumes are tied to specific availability zones, so re-attaching to an instance in another AZ is impossible. The only other option I can think of is:
Create EC2 in each AZ, in public subnet (each have EIP)
Create route 53 healthcheck for TCP:2049
Create route 53 failover policies for nfs-1 (AZ1) and nfs-2 (AZ2)
The only question here is, what's the best way to keep the two NFS servers in-sync? Just cron an rsync script between them?
Or is there a best practice that I am completely missing?
There are a few options to build a highly available NFS server. Though I prefer using EFS or GlusterFS because all these solutions have their downsides.
a) DRBD
It is possible to synchronize volumes with the help of DRBD. This allows you to mirror your data. Use two EC2 instances in different availability zones for high availability. Downside: configuration and operation is complex.
b) EBS Snapshots
If a RPO of more than 30 minutes is reasonable you can use periodic EBS snapshots to be able to recover from an outage in another availability zone. This can be achieved with an Auto Scaling Group running a single EC2 instance, a user-data script and a cronjob for periodic EBS snapshots. Downside: RPO > 30 min.
c) S3 Synchronisation
It is possible to synchronize the state of an EC2 instance acting as NFS server to S3. The standby server uses S3 to stay up to date. Downside: S3 sync of lots of small files will take too long.
I recommend watching this talk from AWS re:Invent: https://youtu.be/xbuiIwEOCAs
AWS has reviewed and approved a number of SoftNAS AMIs, which are available on AWS Marketplace. The jointly published SoftNAS Architecture on AWS White Paper provides more details:
Security (pages 4-11)
HA across AZs (pages 13-14)
You can also try a 30 day free trial to see if it meets your needs.
http://softnas.com/tryaws
Full disclosure: I work for SoftNAS.

How stable is EBS?

I'm thinking about saving data from EC2 instances to the EBS and later save the result on S3. I don't have a lot of experience working with EBS, so my questions are:
How stable they are? I mean how often (if any) you had problem with EBS. Do they crash if overloaded or something like this?
What are the chances of loosing data from EBS?
Is it possible to mount one EBS to the multiple Instances? (let's say two ec2 share the same ebs )
I assume you've read AWS's take on EBS
Pretty stable. Last year, 10% of EBS volumes failed in 2-3 data centers in us-east for a couple hours. This is the only issue I've ever had with them.
I've never lost data from EBS. Even if I had, I take hourly snapshots (stored in s3), so I would have been just fine.
Not at the same time. To attach it to another instance, you must detach from the currently attached one.
Perhaps what you're look is s3fs - a way to mount s3 as a filesystem.
EBS is quite stable and every data you write is redundantly copied in 3 disks inside a AZ. If you take regular snapshots of your EBS volumes you can protect your data more. Since EBS operate in AZ scope it is recommended to moves assets like user documents, images, videos to Amazon S3. S3 offer more redundancy and availability than an EBS Volume.
You cannot mount single EBS volume to Multiple EC2 instances. You will have to use Solution like GlusterFS on AWS so that multiple EC2 instances can talk to common storage pool.