How to copy S3 objects between regions with Amazon AWS PHP SDK?

I'm trying to copy Amazon AWS S3 objects between two buckets in two different regions with the Amazon AWS PHP SDK v3. This would be a one-time process, so I don't need cross-region replication. I tried copyObject(), but there is no way to specify the region.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
Source:
http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectUsingPHP.html

You don't need to specify regions for that operation. It will work out the target bucket's region and copy the object there.
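In practice, SDK v3 requires a region when you construct the client anyway, so a minimal sketch (region names and variables are placeholders) is to point the client at the destination bucket's region; the credentials only need read access on the source bucket and write access on the target:
use Aws\S3\S3Client;
// Client configured for the *target* bucket's region.
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);
// CopyObject is a request against the target bucket; S3 performs the
// server-side copy even if the source bucket is in another region.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));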
But you may be right, because the AWS CLI has --source-region and --region options which do not exist in the PHP SDK. So you can accomplish the task like this:
Create an interim bucket in the source region.
Create the bucket in the target region.
Configure replication from the interim bucket to the target one (a sketch follows this list).
Set an expiration rule on the interim bucket so that files are deleted from it automatically after a short time.
Copy objects from the source bucket to the interim bucket using the PHP SDK.
All your objects will then be replicated to the other region as well.
You can remove the interim bucket a day later.
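If you prefer to script that setup instead of using the console, a rough PHP SDK v3 sketch could look like the following (the role ARN, bucket variables, and one-day expiry are placeholders, and versioning must already be enabled on both the interim and the target bucket for replication to work):
// Replicate everything from the interim bucket to the target bucket.
$s3Source->putBucketReplication([
    'Bucket' => $interimBucket,
    'ReplicationConfiguration' => [
        'Role'  => 'arn:aws:iam::123456789012:role/replication-role', // placeholder
        'Rules' => [[
            'ID'          => 'copy-everything',
            'Prefix'      => '',
            'Status'      => 'Enabled',
            'Destination' => ['Bucket' => "arn:aws:s3:::{$targetBucket}"],
        ]],
    ],
]);
// Expire objects in the interim bucket after one day.
$s3Source->putBucketLifecycleConfiguration([
    'Bucket' => $interimBucket,
    'LifecycleConfiguration' => [
        'Rules' => [[
            'ID'         => 'expire-interim-copies',
            'Prefix'     => '',
            'Status'     => 'Enabled',
            'Expiration' => ['Days' => 1],
        ]],
    ],
]);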
Or just use the CLI with this single command:
aws s3 cp s3://my-source-bucket-in-us-west-2/ s3://my-target-bucket-in-us-east-1/ --recursive --source-region us-west-2 --region us-east-1

A bucket in a different region could also be in a different account. What others have done is copy the objects out of one bucket, save the data temporarily to local storage, and then upload them to the other bucket with the other set of credentials (if your two regional buckets use different credentials).
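Here is a rough sketch of that approach with the PHP SDK v3 (the profile names, regions, and bucket variables are placeholders for illustration):
use Aws\S3\S3Client;
// Two clients, each with its own credential profile from ~/.aws/credentials.
$source = new S3Client(['version' => 'latest', 'region' => 'us-west-2', 'profile' => 'source-account']);
$target = new S3Client(['version' => 'latest', 'region' => 'ap-southeast-1', 'profile' => 'target-account']);
// Download from the source bucket, then re-upload to the target bucket.
$object = $source->getObject(['Bucket' => $sourceBucket, 'Key' => $key]);
$target->putObject([
    'Bucket' => $targetBucket,
    'Key'    => $key,
    'Body'   => $object['Body'],
]);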
The newest update of the CLI tool allows you to copy from bucket to bucket if they're under the same account, using something like what Çağatay Gürtürk mentioned.

Copy objects between buckets using aws sdk

Using the AWS CLI, we can copy or sync files directly from one bucket to another. Using the SDK, I can see APIs for download and upload. But can we directly copy files from one bucket to another bucket (in a different AWS account) using the SDK?
Yes. The CopyObject API call can copy an object between Amazon S3 buckets, including buckets in different regions and different accounts.
To copy objects between accounts, a single set of credentials needs sufficient permission to Read from the source bucket and Write to the destination bucket. You can either:
Use credentials from the destination account, and use a Bucket Policy on the source bucket to grant Read access, or
Use credentials from the source account, and use a Bucket Policy on the destination bucket to grant Write access. Make sure you set ACL=bucket-owner-full-control so that the destination account is given full control of the copied object.
Please note that it only copies one object at a time, so you would need to loop through a list of objects and call CopyObject for each one individually if you wish to copy multiple objects.
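For example, a rough loop with the AWS SDK for PHP v3 (assuming a single client whose credentials can read the source bucket and write to the destination; the bucket variables are placeholders):
// Page through every object in the source bucket and copy it across.
$pages = $s3->getPaginator('ListObjectsV2', ['Bucket' => $sourceBucket]);
foreach ($pages as $page) {
    foreach ($page['Contents'] ?? [] as $object) {
        $s3->copyObject([
            'Bucket'     => $destinationBucket,
            'Key'        => $object['Key'],
            'CopySource' => "{$sourceBucket}/{$object['Key']}",
            // When copying with source-account credentials, this grants the
            // destination account full control of the copied object.
            'ACL'        => 'bucket-owner-full-control',
        ]);
    }
}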
It's easy; you can see all the CLI commands with the built-in help:
aws s3 --help
Upload a file:
aws s3 cp <path-to-file-from-local> s3://<S3_BUCKET_NAME>/<file-name>
Download a file:
aws s3 cp s3://<S3_BUCKET_NAME>/<file-name> <path-to-file-from-local>
Move a file:
aws s3 mv s3://<S3_BUCKET_NAME>/<file-name> s3://<S3_BUCKET_NAME>/<file-name>
You can use . to specify the current directory, e.g.:
aws s3 cp s3://MyBucket/Test.txt .

Accessing s3 bucket on AWS ParallelCluster

I have a requirement to access an S3 bucket on the AWS ParallelCluster nodes. I did explore the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how we can access the bucket. For example, will it be mounted on the nodes, or will users be able to access it by default? I did test the latter by trying to access a bucket I declared using the s3_read_write_resource option in the config file, but was not able to access it (aws s3 ls s3://<name-of-the-bucket>).
I did go through this GitHub issue about mounting an S3 bucket using s3fs. In my experience, accessing objects through s3fs is very slow.
So, my question is: how can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?
These parameters are used in ParallelCluster to add S3 permissions to the instance role that is created for the cluster instances. They're mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource and used later in the CloudFormation template. There's no special way of accessing the S3 objects.
To access S3 from a cluster instance, use the AWS CLI or any SDK. Credentials will be obtained automatically from the instance role through the instance metadata service.
Please note that ParallelCluster doesn't grant permissions to list S3 objects.
Retrieving existing objects from S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to S3 bucket defined in s3_read_write_resource should work.
However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" need additional permissions. See https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/.
I wouldn't use s3fs: it's not supported by AWS, it's been reported to be slow (as you've already noticed), and there are other reasons to avoid it.
You might want to check the FSx section of the config instead. ParallelCluster can create and attach an FSx for Lustre filesystem, which can import/export files to/from S3 natively. You just need to set import_path and export_path in that section, for example:
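A rough example of what that section could look like in the ParallelCluster config file (the names and storage size are placeholders):
[cluster default]
fsx_settings = myfsx
[fsx myfsx]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://name-of-the-bucket
export_path = s3://name-of-the-bucket/export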

Is my s3 bucket set to the correct region?

When I go to the console in AWS by clicking the yellow cube in the top corner, it directs me to the following URL:
https://ap-southeast-1.console.aws.amazon.com/console/home?region=ap-southeast-1
This is correct, because my app is used primarily in Southeast Asia.
Now when I go to my S3 bucket, right-click, and select Properties, I see:
Bucket: examplebucket
Region: US Standard
I believe that when I first created my AWS account I had set it to us-west-2 and then later changed it to ap-southeast-1. Is there something I need to do to change the region of the S3 bucket from 'US Standard'?
In the navbar, under Global, it says "S3 does not require region selection", which is confusing to me.
The bucket is being used for photo storage. The majority of my web users are in Southeast Asia.
It would certainly make sense to locate the bucket closest to the majority of your users. Also, consider using Amazon CloudFront to cache objects, providing even faster data access to your users.
Each Amazon S3 bucket resides in a single region. Any data placed into that bucket stays within that region. It is also possible to configure cross-region replication of buckets, which will copy objects from one bucket to a different bucket in a different region.
The Amazon S3 management console displays all buckets in all regions (hence the message that "S3 does not require region selection"). Clicking on a bucket will display the bucket properties, which will show the region in which the bucket resides.
It is not possible to 'change' the region of a bucket. Instead, you should create a new bucket in the desired region and copy the objects to the new bucket. The easiest way to copy the files is via the AWS Command-Line Interface (CLI), with a command like:
aws s3 cp s3://source-bucket s3://destination-bucket --recursive
If you have many files, it might be safer to use the sync option, which can be run multiple times (in case of errors/failures):
aws s3 sync s3://source-bucket s3://destination-bucket
Please note that if you wish to retain the name of the bucket, you would need to copy to a temporary bucket, delete the original bucket, wait for the bucket name to become available again (10 minutes?), create the bucket in the desired region, then copy the objects to the new bucket.

AWS CLI syncing S3 buckets with multiple credentials

I have read-only access to a source S3 bucket. I cannot change permissions or anything of the sort on this source account and bucket. I do not own this account.
I would like to sync all files from the source bucket to my destination bucket. I own the account that contains the destination bucket.
I have separate sets of credentials for the source bucket that I do not own and for the destination bucket that I do own.
Is there a way to use the AWS CLI to sync between buckets using two sets of credentials?
aws s3 sync s3://source-bucket/ --profile source-profile s3://destination-bucket --profile default
If not, how can I set up permissions on the destination bucket I own so that I can sync with the CLI?
The built-in S3 copy mechanism, at the API level, requires that the request be submitted to the target bucket, identifying the source bucket and object inside the request, and using a single set of credentials that is authorized both to read from the source and to write to the target.
This is the only supported way to copy from one bucket to another without downloading and uploading the files.
The standard solution is found at http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html.
You can grant their user access to write to your bucket, or they can grant your user access to read from their bucket... but copying from one bucket to another without downloading and re-uploading the files is impossible unless both account owners cooperate to establish a single set of credentials with both privileges.
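For reference, the kind of statement the source bucket owner would have to add to their bucket policy looks roughly like this (the account ID and bucket name are placeholders); it grants your destination-account identity read and list access, so a single set of your credentials can drive the copy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDestinationAccountRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": [ "s3:GetObject", "s3:ListBucket" ],
      "Resource": [
        "arn:aws:s3:::source-bucket",
        "arn:aws:s3:::source-bucket/*"
      ]
    }
  ]
}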
Use rclone for this. It's convenient, but I believe it does download and upload the files, which makes it slow for large data volumes.
rclone --config=creds.cfg copy source:bucket-name1/path/ destination:bucket-name2/path/
creds.cfg:
[source]
type = s3
provider = AWS
access_key_id = AAA
secret_access_key = bbb
[destination]
type = s3
provider = AWS
access_key_id = CCC
secret_access_key = ddd
For this use case, I would consider "Cross-Region Replication Where Source and Destination Buckets Are Owned by Different AWS Accounts":
... you set up cross-region replication on the source bucket owned by one account to replicate objects in a destination bucket owned by another account.
The process is the same as setting up cross-region replication when both buckets are owned by the same account, except that you do one extra step: the destination bucket owner must create a bucket policy granting the source bucket owner permission for replication actions.
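That destination-side bucket policy would look roughly like this (the source account ID and destination bucket name are placeholders):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReplicationFromSourceAccount",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": [ "s3:ReplicateObject", "s3:ReplicateDelete" ],
      "Resource": "arn:aws:s3:::destination-bucket/*"
    }
  ]
}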