How to copy / move data from AWS EBS to EFS?

I've mostly found old answers and I'm not sure I understand the AWS guide.
The AWS guide for some reason assumes I have an NFS server as the source. Other people suggest simply mounting EBS and EFS and using a plain "cp" command. Unfortunately, that's very slow.
What's the fastest way of moving data from EBS to EFS?
Is EFS File Sync the right option?
Old answers suggesting a simple cp command:
AWS switch from EBS to EFS
AWS EBS Snapshot to EFS
File Sync guide assuming NFS:
https://docs.aws.amazon.com/efs/latest/ug/walkthrough-file-sync-ec2.html

I spoke to support, and the File Sync tool only makes sense for existing NFS servers.
You may run into problems with burst credits on the EFS, depending on how much you migrate. For me, a long-running rsync seems to be the best option; there is no better way to do this.
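If you go the rsync route, a minimal sketch looks like the following, assuming the EBS volume is already mounted at /data; the EFS file-system ID, region, and paths are placeholders you'd replace with your own:

sudo mkdir -p /mnt/efs
# Mount EFS over NFSv4.1 with the mount options AWS recommends.
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
# -a preserves permissions and timestamps, --inplace avoids the temp-file
# rename pattern that is slow on NFS, --info=progress2 shows overall progress.
sudo rsync -a --inplace --info=progress2 /data/ /mnt/efs/

Running the rsync inside screen, tmux, or nohup keeps the transfer alive if your SSH session drops.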


Backup and restore files in ECS Tasks in AWS

Description
I have an ECS cluster with multiple tasks. Every task is a WordPress website. These tasks automatically start and stop based on some Lambda functions. To persist the files when a task goes down for some reason, I tried using EFS, but that is very slow once the burst credits run out.
Now I use the Bind Mount volume type (just the normal filesystem, nothing fancy here). The websites are a lot faster, but the files are no longer persisted. When an instance goes down, the files of that website are gone. ECS starts the task again, but without the files the websites break.
First solution
My first solution is to run an extra container in the task that makes backups once a day and stores them in S3. All files are automatically packed into a .tar.gz and uploaded to S3. This all works fine, but I don't have a way to restore these backups yet. These things should be considered:
When a new task starts: check whether the current task/website already has a backup
If the latest backup should be restored: download the .tar.gz from S3 and unpack it
To realize this, I think it should be a bash script (or something like it) that runs on startup of a task?
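For the restore side, here is a rough sketch of such a startup script, written as an entrypoint wrapper; the bucket name, key layout, SITE_NAME environment variable, and web root are all assumptions, not anything from your setup:

#!/usr/bin/env bash
# Restore the latest backup (if any) before starting the web server.
set -euo pipefail
BACKUP_URI="s3://my-backup-bucket/${SITE_NAME}/latest.tar.gz"
WEB_ROOT="/var/www/html"
# Only restore when the web root is empty and a backup actually exists.
if [ -z "$(ls -A "$WEB_ROOT" 2>/dev/null)" ] && aws s3 ls "$BACKUP_URI" >/dev/null 2>&1; then
    aws s3 cp "$BACKUP_URI" /tmp/backup.tar.gz
    tar -xzf /tmp/backup.tar.gz -C "$WEB_ROOT"
    rm -f /tmp/backup.tar.gz
fi
# Hand off to the container's normal command.
exec "$@"

You would set this script as the container's ENTRYPOINT so it runs once on every task start and then execs the original command.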
Possible second solution
Another solution I thought about, which I think is a lot cleaner: instead of having an extra container doing backups every day, mount EFS to each task and sync data between the Bind Mount and EFS. This way EFS becomes a backup storage location instead of the working file system for my websites. Other pros: the tasks/websites will have more recent backups, and I keep more CPU and memory in my ECS cluster's EC2 instances for other tasks.
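A minimal sketch of what that periodic sync could look like; the paths, site name, and 15-minute interval are assumptions:

# Crontab entry on the instance (or in a small sidecar container running cron):
# push the bind-mounted web root to a per-site directory on the mounted EFS.
*/15 * * * * rsync -a --delete /var/www/html/ /mnt/efs/backups/my-site/

On task start you would rsync in the opposite direction (EFS to bind mount) before the web server comes up, mirroring the restore logic from the first solution.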
Help?
I would like some opinions on the solutions above, advice on whether the second solution is any good, and some tips on how to implement it. Any other advice would be helpful too!

Can I create a share between an EC2 instance and my local machine?

Just to give you some context... I'm new to the AWS world and all the services it provides.
I have a legacy application and I need to share some binaries with a client, so I was trying to use an EC2 instance (Amazon Linux AMI) with Samba to map it onto a local Windows machine.
I was able to establish a connection with another EC2 instance (same VPC), just as a tryout. But I wasn't able to do so with my Windows machine, or even with a Linux VM I have.
The inbound rules for this proof-of-concept EC2 instance were fully open (all traffic allowed).
Main question
Is it possible to share a file system between an EC2 instance and a local machine (over the internet)?
Just saying:
S3 storage isn't an option.
FSx isn't available in my region yet, and for latency reasons it's a no-go.
Please ask as many questions as you want; I'll try to answer them as fast as I can.
Kind regards.
TL;DR - it's possible, but there's no 'simple' solution (in my opinion).
I thought of two possible solutions that you can implement, here we go ...
1: AWS EFS, AWS Direct Connect and Docker
A possible solution would be using AWS Elastic File System (EFS), AWS Direct Connect and a Docker Linux container.
Drawbacks
If this is the first time you've encountered the above AWS services or Docker, then it's going to be a bit of a journey to learn about them
EFS pricing - it's not so cheap, and you also need to consider the inbound and outbound traffic, it's best to use the calculator that is in the pricing page
EFS performance - if you only share files then it should be okay, but if you expect to get high speeds, then remember that it's not an EBS volume, so for higher speeds you need to pay more money
AWS Direct Connect pricing - you also need to take that into consideration
Security - I'm not sure how sensitive your data is, but you need to make sure you create a very strict VPC, with Security Groups and Network Access List rules - read about the VPC Security Best Practices
Steps to implement the solution
Follow the Walkthrough: Create and Mount a File System On-Premises with AWS Direct Connect and VPN, also, here are the steps on how to combine it with Docker
(Optional) To make it a bit easier - for Windows to "support" the Linux file-system, you should use Windows Git Bash. If you're not sure how to install 3rd-party apps in Windows Git Bash (like aws-vault), then read this blog post
Create an EFS in AWS, and mount it to your EC2 instance, read more about it here
Use AWS Direct Connect to connect to your VPC from your local Windows machine
Install Docker for Windows on your local machine
Create a Docker volume and mount the same EFS to that volume - a good example for this step (a command sketch also follows after these steps)
Test it - SSH to your EC2 instance, create a file on the EFS volume and then check in your local Docker Linux container that this file appears on the EFS volume
I omitted the security steps because it's up to you how strict you want your solution to be.
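For the Docker volume step, a possible command sketch (untested, and the file-system ID, region, and volume name are placeholders), using Docker's built-in local driver with NFS options:

# Create a local Docker volume backed by the EFS mount target over NFSv4.1.
docker volume create \
    --driver local \
    --opt type=nfs \
    --opt o=addr=fs-12345678.efs.us-east-1.amazonaws.com,nfsvers=4.1,rw \
    --opt device=:/ \
    efs-share
# Run a Linux container with the volume mounted and list its contents (the test step).
docker run --rm -it -v efs-share:/mnt/efs alpine ls /mnt/efs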
2: Using S3 as a shared file-system
You can try out the s3fs-fuse tool, but you'll still need to use a Docker Linux container since you're on Windows. I haven't tested it, but it looks promising. You can read this blog post; it's a step-by-step tutorial on how to do it, and it also covers some other possible solutions.
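A rough sketch of the s3fs-fuse mount inside that Linux container or VM; the bucket name is a placeholder, and I'm assuming key-based credentials rather than an instance role:

# Store credentials in the format s3fs expects and lock the file down.
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs
# Mount the bucket; use_cache keeps a local cache of fetched objects.
mkdir -p /mnt/s3
s3fs my-shared-bucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs -o use_cache=/tmp

Keep in mind that s3fs is not a full POSIX file system, so it works best for simple read/write sharing of whole files.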

Google Cloud Persistent Disk Backup Strategy

We are working on setting up a GCP environment with some Windows servers. Our traditional backup approach is to back up our data daily. I know that I can run a disk snapshot.
gcloud compute --project=projectid disks snapshot diskname --zone=zonename --snapshot-names=snapshotname
I also understand that the snapshot is a forever-incremental snapshot. However, I want the ability to schedule this, and I am not sure what the best approach is, or whether this is even the best way to do it.
I appreciate any guidance regarding backups of instances. I have this set up in AWS using Lambda; I am just not sure how to do it in GCP.
Google App Engine can schedule tasks. You could use this to invoke a function running in a GAE app that calls the snapshot API.
You can now set a snapshot schedule to back up persistent disks.
https://cloud.google.com/compute/docs/disks/scheduled-snapshots
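A sketch of that approach with gcloud, following the linked doc; the schedule name, region, retention, and start time are placeholders, and projectid/diskname/zonename are carried over from the command above:

# Create a daily snapshot schedule (a resource policy).
gcloud compute resource-policies create snapshot-schedule daily-backup \
    --project=projectid \
    --region=regionname \
    --max-retention-days=14 \
    --daily-schedule \
    --start-time=04:00
# Attach the schedule to the disk so snapshots run automatically.
gcloud compute disks add-resource-policies diskname \
    --project=projectid \
    --zone=zonename \
    --resource-policies=daily-backup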

What's the best way/method to back up an AWS EFS to S3?

Amazon suggests using a backup EFS and Data Pipeline for EFS backups, but I am thinking of a simpler method that backs up to an S3 bucket. What are your thoughts/suggestions? Any scripts?
I would recommend using whatever Amazon suggests. However, for smaller projects where cost-effectiveness outweighs reliability, a custom approach can work.
Check out the s3 sync AWS CLI command. I believe that is what you are looking for:
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
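A minimal sketch, assuming the EFS is mounted at /mnt/efs; the bucket name is a placeholder, and you could run this from cron for regular backups:

# Mirror the EFS mount to S3. --delete removes objects that no longer exist
# on the EFS side; drop it if you want S3 to keep every file ever written.
aws s3 sync /mnt/efs s3://my-efs-backup-bucket/efs --delete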

Uploading files to S3, then processing in EMR, and finally transferring to Redshift

I am new to this forum and to the technology, and I'm looking for your advice. I am working on a POC and below are my requirements. Could you please guide me on the way to achieve this?
Copy data from NAS to S3.
Use S3 as a source in EMR Job with target to S3/Redshift.
Any link or PDF would also be helpful.
Thanks,
Pardeep
There's a lot here that you're asking, and there's not a lot of info on your use case to go by, so I'm going to be very general in my answer and hopefully it at least points you in the right direction.
You can use Lambda to copy data from your NAS to S3. Assuming your NAS is on-premises and you have a VPN into your VPC or even Direct Connect configured, you can use a VPC-enabled Lambda function to read from the on-premises NAS and write to S3.
If your NAS is running on EC2, the above remains the same except there's no need for a VPN or Direct Connect.
Are you looking to kick off the EMR job from Lambda? You can use S3 as a source for EMR to then output to S3 either from within Lambda or via other means as well.
If you can provide more info on your use case we could probably give you a better quality answer.
Copy data from NAS to S3.
It really depends on the amount of data and the frequency at which you run the copy job. If the data is in GBs, then you can install the AWS CLI on a machine where the NFS share is attached. AWS CLI commands like cp are multithreaded and can easily copy your datasets to S3 (see the command sketch after the links below). You might also enable S3 Transfer Acceleration to speed things up. Having AWS Direct Connect to your company network can also speed up transfers from on-premises to AWS.
http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
https://aws.amazon.com/directconnect/
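A command sketch for the GB-scale case, based on the s3-config doc linked above; the bucket, paths, and tuning values are assumptions:

# Raise the CLI's parallelism and chunk size for faster multipart uploads.
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB
# If Transfer Acceleration is enabled on the bucket, route uploads through it.
aws configure set default.s3.use_accelerate_endpoint true
# Copy the NFS-attached directory to S3.
aws s3 cp /mnt/nas/dataset s3://my-landing-bucket/dataset --recursive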
If the data is in TBs (and probably distributed across multiple volumes), then you might have to consider physical transfer services like AWS Snowball, AWS Import/Export, or AWS Snowmobile, based on the use case.
https://aws.amazon.com/cloud-data-migration/
Use S3 as a source in EMR Job with target to S3/Redshift.
Again, as there are a lot of applications on EMR, there are a lot of choices. Redshift supports COPY/UNLOAD commands to and from S3, which any application can make use of (a COPY sketch follows the links below). If you want to use Spark on EMR, then installing the Databricks spark-redshift driver is a viable option for you.
https://github.com/databricks/spark-redshift
https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html
https://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/
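As a sketch of the COPY route mentioned above, run from any machine that can reach the cluster; the cluster endpoint, schema/table, bucket, IAM role, and Parquet output format are all placeholders/assumptions:

# Load the EMR output from S3 into Redshift using COPY, issued through psql.
psql "host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=analytics user=admin" <<'SQL'
COPY my_schema.my_table
FROM 's3://my-landing-bucket/emr-output/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS PARQUET;
SQL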