AWS cross region replication to multiple regions? - amazon-web-services

I am trying to set up cross region replication so that my original file will be replicated to two different regions. Right now, I can only get it to replicate to one other region.
For example, my files are on US Standard. When a file is uploaded it is replicated from US Standard to US West 2. I would also like for that file to be replicated to US West 1.
Is there a way to do this?

It appears the Cross-Region Replication in Amazon S3 cannot be chained. Therefore, it cannot be used to replicate from Bucket A to Bucket B to Bucket C.
An alternative would be to use the AWS Command-Line Interface (CLI) to synchronise between buckets, e.g.:
aws s3 sync s3://bucket1 s3://bucket2
aws s3 sync s3://bucket1 s3://bucket3
The sync command only copies new and changed files. Data is transferred directly between the Amazon S3 buckets, even if they are in different regions -- no data is downloaded/uploaded to your own computer.
So, put these commands in a cron job or a Scheduled Task to run once an hour and the buckets will nicely replicate!
See: AWS CLI S3 sync command documentation
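The two sync commands above could be wrapped in a small script for the scheduled job. This is only a sketch: the function name replicate_to_all is made up for illustration, and it assumes the AWS CLI is installed and has credentials that can read the source bucket and write to every destination.

```shell
#!/usr/bin/env bash
# Sketch: fan one source bucket out to several destination buckets.
# replicate_to_all is a hypothetical helper name, not part of the AWS CLI.

replicate_to_all() {
  local source_bucket="$1"
  shift
  local failures=0
  local dest
  for dest in "$@"; do
    # sync copies only new and changed objects, directly bucket to bucket
    if ! aws s3 sync "s3://${source_bucket}" "s3://${dest}"; then
      echo "sync to ${dest} failed" >&2
      failures=$((failures + 1))
    fi
  done
  # Succeed only if every destination synced
  [ "$failures" -eq 0 ]
}
```

Calling replicate_to_all bucket1 bucket2 bucket3 from a script, and scheduling that script hourly (for example with a crontab entry such as 0 * * * * /path/to/script.sh, where the path is a placeholder), would match the hourly schedule suggested above.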

Related

Does S3 Replication/S3 Batch Ops offer Data Integrity?

I have a use case where we want to transfer data between AWS accounts. I want to use S3 Replication, S3 Batch Operations or DataSync, provided they can ensure data integrity so that I don't have to run additional checks after the data is transferred.
I have synced S3 data across two different AWS accounts via the AWS CLI. My use case was to push data from one bucket to another bucket in a different AWS account, so I used the aws s3 sync command.
I was satisfied with the data integrity in this process. On subsequent runs, the sync transferred only the newly created items in the source S3 bucket.

download files from AWS S3 bucket in parallel

I want to download millions of files from an S3 bucket, which would take more than a week if they were downloaded one by one. Is there any way or command to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
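A per-prefix sync could be sketched as below, with one aws s3 sync running in the background for each prefix. The function name sync_prefixes and its arguments are assumptions for illustration; the prefixes must be known in advance, and each sync still parallelises its own transfers internally.

```shell
# Sketch: sync several prefixes of one bucket to a local directory concurrently.
# sync_prefixes is a hypothetical helper name, not part of the AWS CLI.

sync_prefixes() {
  local bucket="$1" dest_dir="$2"
  shift 2
  local prefix
  for prefix in "$@"; do
    # One background sync per prefix, so each has a smaller listing to walk
    aws s3 sync "s3://${bucket}/${prefix}" "${dest_dir}/${prefix}" &
  done
  # Block until every background sync has finished.
  # Note: a bare wait does not report failures of individual jobs.
  wait
}
```

For example, sync_prefixes my-bucket /data images/ logs/ would run two syncs side by side. The number of concurrent requests inside each sync can be raised through the CLI's S3 configuration, e.g. aws configure set default.s3.max_concurrent_requests 20 (the default is 10).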
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
AWS Snowball
Another option is to use AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB), which are physical devices that you can pre-load with content from S3 and have shipped to your location. You then connect the device to your network and download the data. (It works in reverse too, for uploading bulk data to Amazon S3.)

AWS S3 replication without versioning

I have enabled AWS S3 replication in an account and I want to replicate the same S3 data to another account and it all works fine. But I don't want to use S3 versioning because of its additional cost.
So is there any other way to accommodate this scenario?
The automated Same-Region Replication (SRR) and Cross-Region Replication (CRR) features require versioning to be activated because of the way data is replicated between S3 buckets. For example, a new version of an object might be uploaded while a bucket is still being replicated, which can lead to problems without separate versions.
If you do not wish to retain other versions, you can configure Amazon S3 Lifecycle Rules to expire (delete) older versions.
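A lifecycle rule that expires older versions might look like the sketch below. The helper name expire_old_versions, the rule ID and the number of days are illustrative assumptions; the rule applies to the whole bucket because its filter uses an empty prefix. Note that putting a lifecycle configuration replaces any configuration already on the bucket.

```shell
# Sketch: expire noncurrent object versions after a given number of days.
# expire_old_versions and the rule ID are hypothetical names.

expire_old_versions() {
  local bucket="$1" days="$2" config rc
  config=$(mktemp) || return 1
  # Lifecycle configuration with a single rule covering every object
  cat > "$config" <<EOF
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": ${days} }
    }
  ]
}
EOF
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$bucket" \
    --lifecycle-configuration "file://${config}"
  rc=$?
  rm -f "$config"
  return "$rc"
}
```

For example, expire_old_versions my-bucket 1 would ask S3 to delete versions one day after they stop being the current version, which keeps the extra storage cost of versioning small.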
An alternative method would be to run the AWS CLI aws s3 sync command at regular intervals to copy the data between buckets. This command would need to be run on an Amazon EC2 instance or even your own computer. It could be triggered by a cron schedule (Linux) or a Scheduled Task (Windows).

AWS Synchronize S3 bucket with EC2 instance

I would like to synchronize an S3 bucket with a single directory on multiple Windows EC2 instances. When a file is uploaded to or deleted from the bucket, I would like it to be immediately pushed to or removed from all of the instances, respectively. Instances will be added and removed frequently (multiple times per week). Files will be uploaded and deleted frequently as well. Files could be up to 2 GB in size. What AWS services or features can solve this?
Based on what you've described, I'd propose the following solution to this problem.
You need to create an SNS topic for S3 change notifications. Then you need a script on each machine that subscribes to this topic and updates the local files based on the changes coming from S3. It should handle created, updated and deleted objects.
When an instance starts, run this script and then sync the contents of the bucket to the machine using the AWS CLI (aws s3 sync).
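The first step, pointing the bucket's change notifications at an SNS topic, could be sketched with the AWS CLI as follows. The helper name enable_bucket_notifications is made up, the topic is assumed to exist already, and its access policy must allow Amazon S3 to publish to it. The subscribing script on each instance is not shown. Putting a notification configuration replaces any configuration already on the bucket.

```shell
# Sketch: send object-created and object-removed events from a bucket to an SNS topic.
# enable_bucket_notifications is a hypothetical helper name.

enable_bucket_notifications() {
  local bucket="$1" topic_arn="$2" config rc
  config=$(mktemp) || return 1
  cat > "$config" <<EOF
{
  "TopicConfigurations": [
    {
      "TopicArn": "${topic_arn}",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }
  ]
}
EOF
  aws s3api put-bucket-notification-configuration \
    --bucket "$bucket" \
    --notification-configuration "file://${config}"
  rc=$?
  rm -f "$config"
  return "$rc"
}
```

This is written as a Unix shell function for brevity; on Windows instances the same aws s3api command could be issued from PowerShell with the JSON saved to a file.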
Yes, I have used the AWS CLI aws s3 sync command to keep a local server's content updated with S3 changes. It allows a local target directory's files to be synchronized with a bucket or prefix.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Edit: The following answer covers syncing from EC2 to the S3 bucket (source: EC2, destination: bucket).
If there were only one instance, aws s3 sync with the --delete option would work for both uploading files to the S3 bucket and deleting them.
But the case here involves multiple instances, so using aws s3 sync with the --delete option causes a problem.
To explain it simply, consider instance I1 with files a.jpg and b.jpg to be synced to the bucket.
A cron job syncs these files to the S3 bucket.
Now we have instance I2, which has files c.jpg and d.jpg.
When the cron job on this instance runs, it uploads c.jpg and d.jpg and also deletes a.jpg and b.jpg from the bucket, because those files don't exist on instance I2.
To rectify the problem, there are two approaches:
1. Sync all files across all instances (costly, and it defeats the purpose of S3 altogether).
2. Sync files without the --delete option, and implement deletion separately (using aws s3 rm).
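The second approach could be sketched as two small helpers, one for uploading and one for explicit deletion. The function names and arguments are hypothetical; how an instance decides that a file should be deleted is left to the application.

```shell
# Sketch: upload without --delete, and delete objects only on explicit request.
# push_new_files and remove_file are hypothetical helper names.

push_new_files() {
  local local_dir="$1" bucket="$2"
  # No --delete here, so objects uploaded by other instances are left alone
  aws s3 sync "$local_dir" "s3://${bucket}"
}

remove_file() {
  local local_dir="$1" bucket="$2" name="$3"
  # Remove the object from the bucket first, then the local copy
  aws s3 rm "s3://${bucket}/${name}" && rm -f "${local_dir}/${name}"
}
```

With this split, instance I2's cron job would call only push_new_files, so a.jpg and b.jpg from instance I1 stay in the bucket until something calls remove_file for them.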

Copying multiple files in large volume between two s3 buckets which are in the different regions

I need to copy a large amount of data, around 300 GB of files, from bucket A, which is in a us-east region, to bucket B, which is in an ap-southeast region. I also need to change the structure of the bucket: the files should be placed in different folders on bucket B according to the image name in bucket A. I tried using AWS Lambda, but it's not available in ap-southeast.
Also how much would it cost since data will be transferred between regions?
Method
The AWS Command-Line Interface (CLI) has the aws s3 cp command, which can be used to copy objects between buckets (even in different regions) and can rename them at the same time.
aws s3 cp s3://bucket-in-us/foo/bar.txt s3://bucket-in-ap/foo1/foo2/bar3.txt
There is also the aws s3 sync option that can be used to synchronize content between two buckets, but that doesn't help your requirement to rename objects.
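Copying every object to a folder derived from its name could be sketched as below. The naming rule is purely an assumption for illustration (the text before the first underscore in the file name becomes the folder, so cats_001.jpg goes to cats/), as are the function names; the real rule depends on how the images are named.

```shell
# Sketch: copy objects between buckets, choosing the destination folder from the file name.
# dest_key_for and copy_restructured are hypothetical helper names.

dest_key_for() {
  local key="$1" name folder
  name="${key##*/}"                   # drop any source folder path
  case "$name" in
    *_*) folder="${name%%_*}" ;;      # text before the first underscore
    *)   folder="unsorted" ;;         # fallback when the name has no underscore
  esac
  echo "${folder}/${name}"
}

copy_restructured() {
  # Reads source keys from standard input, one per line
  local src_bucket="$1" dst_bucket="$2" key
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    aws s3 cp "s3://${src_bucket}/${key}" \
      "s3://${dst_bucket}/$(dest_key_for "$key")" < /dev/null
  done
}
```

The key list could come from a listing command, for example aws s3api list-objects-v2 --bucket bucket-in-us --query 'Contents[].Key' --output text | tr '\t' '\n' | copy_restructured bucket-in-us bucket-in-ap, assuming no key contains tabs or newlines.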
Cost
Data Transfer charges from US regions to another region are shown on the Amazon S3 pricing page as US$0.02/GB.
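As a rough worked example using the figures in this question (about 300 GB at the quoted US$0.02/GB), the data transfer charge can be estimated as size times rate. The helper name transfer_cost is made up, and request charges are not included.

```shell
# Sketch: estimate the inter-region transfer charge as size (GB) x rate (US$/GB).
# transfer_cost is a hypothetical helper name.

transfer_cost() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

transfer_cost 300 0.02   # prints 6.00
```

So the transfer itself would cost roughly US$6 at that rate.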
Alternatively, create another bucket in your target region, use bucket replication to copy the data into it, and then do your S3 object key manipulation there.
Read more on S3 cross-region replication.