Is it possible to replicate a specific S3 folder between 2 buckets? - amazon-web-services

Does anyone know if it is possible to replicate just one folder of a bucket between 2 buckets using the AWS S3 replication feature?
P.S.: I don't want to replicate the entire bucket, just one folder of it.
If it is possible, what configuration do I need to add to filter that folder in the replication?

Yes. Amazon S3's Replication feature allows you to replicate objects at a prefix (i.e., folder) level from one S3 bucket to another, within the same region or across regions.
From the AWS S3 Replication documentation,
The objects that you want to replicate — You can replicate all of the objects in the source bucket or a subset. You identify a subset by providing a key name prefix, one or more object tags, or both in the configuration.
For example, if you configure a replication rule to replicate only objects with the key name prefix Tax/, Amazon S3 replicates objects with keys such as Tax/doc1 or Tax/doc2. But it doesn't replicate an object with the key Legal/doc3. If you specify both prefix and one or more tags, Amazon S3 replicates only objects having the specific key prefix and tags.
Refer to this guide on how to enable replication using the AWS console; Step 4 covers enabling replication at the prefix level. The same can be done via CloudFormation and the CLI as well.
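As a rough illustration, here is a minimal boto3 sketch of such a prefix-filtered rule; the bucket names, IAM role ARN, and rule ID are placeholders, not values from the question, and versioning must already be enabled on both buckets:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket names and replication role ARN for illustration
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::111122223333:role/s3-replication-role',
        'Rules': [{
            'ID': 'replicate-tax-folder',
            'Status': 'Enabled',
            'Priority': 1,
            'Filter': {'Prefix': 'Tax/'},  # only keys under the Tax/ "folder" are replicated
            'DeleteMarkerReplication': {'Status': 'Disabled'},
            'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
        }],
    },
)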

Yes, you can do this using the Cross-Region Replication feature. You can replicate objects either to a bucket in the same region or to one in a different region. Replicated objects in the new bucket keep their original storage class, object name, and object permissions.
However, you can optionally change the object owner to the owner of the destination bucket.
Despite all of this, there are some limitations to this feature:
You cannot replicate objects that were already present in the source bucket before you created the replication rule; only objects created after the rule is in place are replicated.
You cannot use SSE-C encryption with replication.

You can also do this with the sync command; append the folder to both paths to copy just that prefix (note that sync is a one-time copy that you must re-run, not continuous replication):
aws s3 sync s3://SOURCE_BUCKET_NAME/FOLDER_NAME s3://NEW_BUCKET_NAME/FOLDER_NAME
If the buckets belong to different accounts, you must grant the destination account the permissions to perform the cross-account copy.
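One common way to grant that is a bucket policy on the source bucket allowing the destination account to read it, and then running the sync from the destination account. A rough boto3 sketch, where the destination account ID is a placeholder:

import json

import boto3

s3 = boto3.client('s3')

# Hypothetical destination account ID for illustration
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowDestinationAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # destination account
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
            "arn:aws:s3:::SOURCE_BUCKET_NAME",    # bucket-level permission (ListBucket)
            "arn:aws:s3:::SOURCE_BUCKET_NAME/*",  # object-level permission (GetObject)
        ],
    }],
}

s3.put_bucket_policy(Bucket='SOURCE_BUCKET_NAME', Policy=json.dumps(policy))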

Related

AWS S3: Is there a way to replicate objects from destination to source bucket

We have S3 replication infrastructure in place to redirect PUTs/GETs to the replica (destination) S3 bucket if the primary (source) is down.
But I'm wondering how to copy objects from the destination bucket back to the source once the primary is restored.
You can use Cross-Region Replication - Amazon Simple Storage Service.
This can also be configured for bi-directional sync:
Configure CRR for bucket-1 to replicate to bucket-2
Configure CRR for bucket-2 to replicate to bucket-1
I tested it and it works!
CRR requires that you have Versioning activated in both buckets. This means that if objects are overwritten, then the previous versions of those objects are still retained. You will be charged for storage of all the versions of each object. You can delete old versions if desired.
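As a small sketch (bucket names are placeholders), enabling the required versioning on both buckets with boto3 looks like this; the two replication rules are then configured once in each direction, just like a normal one-way rule:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket names; versioning is a prerequisite on both sides for bi-directional replication
for bucket in ('bucket-1', 'bucket-2'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'},
    )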

Custom ACL for EMR Hive output objects written to S3

set fs.s3.canned.acl=BucketOwnerFullControl;
The line above is an example of configuring EMR's Hive jobs to write objects to S3 using a canned ACL (http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-s3-acls.html).
I was wondering if I can have custom ACL in the same way.
Use case:
EMR writes to S3 (regionA), which is then replicated to regionB, and Athena in regionB queries the replicated objects. In spite of the regionB account owning the bucket, the objects replicated into it from regionA are not owned by the regionB account.
So if anyone knows a way to set the ACL of the objects to allow read by cross account, I would appreciate the help.
Thanks.
EMR doesn't support writing objects to S3 with custom ACLs at the moment.
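If the underlying goal is just for the destination (regionB) account to own the replicated objects, the owner-override option of S3 replication mentioned in an answer above may help instead. A hedged sketch of the Destination block (account ID and bucket ARN are placeholders), which goes inside a replication rule such as the put_bucket_replication example earlier on this page:

# Hypothetical destination account ID and bucket ARN for illustration
destination = {
    'Bucket': 'arn:aws:s3:::regionb-bucket',
    'Account': '444455556666',  # destination (regionB) account
    'AccessControlTranslation': {'Owner': 'Destination'},  # destination account becomes the object owner
}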

In AWS how to Copy file from one s3 bucket to another s3 bucket using lambda function

I have created two S3 buckets named 'ABC' and 'XYZ'.
If I upload a file (object) to the 'ABC' bucket, it should automatically get copied to 'XYZ'.
For the above scenario I have to write a Lambda function using node.js.
I am new to Lambda, so detailed steps would be great for me.
It would be good if we can do it via the web console; otherwise no problem.
This post should be useful for copying between buckets in the same region: https://aws.amazon.com/blogs/compute/content-replication-using-aws-lambda-and-amazon-s3/
If the use case you are trying to achieve is for DR purposes in another region, you may use this: https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/. S3 natively does the replication for you, but it's unclear from your question whether you need the copy in the same region or a different one.
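For reference, here is a minimal sketch of such a copy function in Python (the question asked for node.js, so treat this only as an outline of the logic); it assumes an S3 "ObjectCreated" event trigger on the source bucket and uses the bucket names from the question:

import urllib.parse

import boto3

s3 = boto3.client('s3')
DESTINATION_BUCKET = 'XYZ'  # destination bucket from the question

def lambda_handler(event, context):
    # Triggered by an S3 ObjectCreated event on the source bucket ('ABC')
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        # Server-side copy; the object data never passes through the Lambda function
        s3.copy_object(
            CopySource={'Bucket': source_bucket, 'Key': key},
            Bucket=DESTINATION_BUCKET,
            Key=key,
        )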

backing up s3 buckets best practice

I want to do a daily backup of my S3 buckets. I was wondering if anyone knew what the best practice is?
I was thinking of using a Lambda function to copy contents from one S3 bucket to another as the S3 bucket is updated. But that won't mitigate against an S3 failure. How do I copy contents from one S3 bucket to another Amazon service like Glacier using Lambda? What's the best practice here for backing up S3 buckets?
NOTE: I want to do a backup, not an archive (where content is deleted afterward).
Look into S3 cross-region replication to keep a backup copy of everything in another S3 bucket in another region. Note that you can even have the destination bucket be in a different AWS Account, so that it is safe even if your primary S3 account is hacked.
Note that a combination of Cross Region Replication and S3 Object Versioning (which is required for replication) will allow you to keep old versions of your files available even if they are deleted from the source bucket.
Then look into S3 lifecycle management to transition objects to Glacier to save storage costs.
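As a rough sketch of that last step (the bucket name, rule ID, and 30-day threshold are placeholders), a lifecycle rule that transitions objects to Glacier with boto3 might look like:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket name and transition age for illustration
s3.put_bucket_lifecycle_configuration(
    Bucket='my-backup-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-to-glacier',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},  # apply to every object in the bucket
            'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
        }],
    },
)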

Does AWS S3 cross-region replication use same URL for multiple regions?

Using S3 cross-region replication, if a user downloads http://mybucket.s3.amazonaws.com/myobject , will it automatically download from the closest region, like CloudFront does? So there would be no need to specify the region in the URL, like http://mybucket.s3-[region].amazonaws.com/myobject ?
http://aws.amazon.com/about-aws/whats-new/2015/03/amazon-s3-introduces-cross-region-replication/
Bucket names are global, and cross-region replication involves copying objects to a different bucket.
In other words, having a bucket named example in us-west-1 and another named example in us-east-1 is not valid, as there can only be one bucket named 'example'.
That's implied in the announcement post: Mr. Barr uses buckets named jbarr and jbarr-replication.
Using S3 cross-Region replication will put your object into two (or more) buckets in two different Regions.
If you want a single access point that will choose the closest available bucket, then you want to use Multi-Region Access Points (MRAP).
MRAP makes use of Global Accelerator and puts bucket requests onto the AWS backbone at the closest edge location, which provides a faster, more reliable connection to the actual bucket. Global Accelerator also chooses the closest available bucket; if a bucket is not available, it serves the request from the other bucket, providing automatic failover.
You can also configure it in an active/passive configuration, always serving from one bucket until you initiate a failover.
The MRAP page in the AWS console even shows you a graphical representation of your replication rules.
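A minimal boto3 sketch of creating such an access point (the account ID, MRAP name, and bucket names are placeholders); this is only an outline of the s3control call, not a full setup:

import boto3

# MRAP control-plane requests are served out of us-west-2
s3control = boto3.client('s3control', region_name='us-west-2')

# Hypothetical account ID, access point name, and bucket names for illustration
s3control.create_multi_region_access_point(
    AccountId='111122223333',
    Details={
        'Name': 'my-mrap',
        'Regions': [
            {'Bucket': 'example-bucket-us-east-1'},
            {'Bucket': 'example-bucket-eu-west-1'},
        ],
    },
)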
S3 is a global service, so there is no need to specify the region in the URL; the bucket name has to be globally unique.
When you create a bucket you do need to specify a region, but that doesn't mean you have to put the region name in the URL when you access it. To speed up access from other regions, there are several options, like:
-- Amazon S3 Transfer Acceleration with the same bucket name.
-- Or set up another bucket with a different name in a different region, enable cross-region replication, and create a CloudFront origin group with the two buckets as origins.
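For the first option, a short boto3 sketch of enabling Transfer Acceleration (the bucket name is a placeholder); accelerated requests then use the bucket-name.s3-accelerate.amazonaws.com endpoint:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket name for illustration
s3.put_bucket_accelerate_configuration(
    Bucket='mybucket',
    AccelerateConfiguration={'Status': 'Enabled'},
)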