Amazon S3 / Merge between two buckets

I have two buckets, a and b. Bucket b contains 80% of the objects in a.
I want to copy the remaining 20% of objects, which are in a, into b, without downloading the objects to local storage.
I looked at the AWS Command Line Interface, but as I understand it, it copies all the objects from a to b. As I said, I want it to copy only the files that exist in a but do not exist in b.

Install the AWS CLI and configure it with your access credentials.
Make sure both buckets use the same directory structure.
From the AWS S3 docs:
The following sync command syncs objects under a specified prefix and
bucket to objects under another specified prefix and bucket by copying
s3 objects. A s3 object will require copying if the sizes of the two
s3 objects differ, the last modified time of the source is newer than
the last modified time of the destination, or the s3 object does not
exist under the specified bucket and prefix destination. In this
example, the user syncs the bucket mybucket to the bucket mybucket2.
The bucket mybucket contains the objects test.txt and test2.txt. The
bucket mybucket2 contains no objects:
aws s3 sync s3://mybucket s3://mybucket2
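For the buckets in the question, the same command with a dry run first (bucket names a and b taken from the question) would look like this:
# Preview which objects would be copied (only those missing from b or newer in a)
aws s3 sync s3://a s3://b --dryrun
# Perform the copy; objects are copied server-side, nothing is downloaded locally
aws s3 sync s3://a s3://b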

You can use the AWS SDK and write a PHP (or other supported language) script that builds a list of object keys from both buckets, uses array_diff to find the keys that exist only in bucket A, and then copies those objects from bucket A to bucket B; the copy happens server-side, so nothing is downloaded.
This is a good place to start: https://aws.amazon.com/sdk-for-php/
More in depth on listing object keys: http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingPHP.html
Some code for retrieving keys:
// Iterate over every key in the bucket (the iterator handles pagination).
$objects = $s3->getIterator('ListObjects', array('Bucket' => $bucket));
foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}
This describes how to copy objects from one bucket to another:
// Instantiate the client.
$s3 = S3Client::factory();
// Copy an object server-side (no download/upload through the client).
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
You are going to want to pull the keys from both buckets, do an array_diff to get the set of keys that exist only in bucket A, and then loop through that set, calling copyObject for each key. Hope this helps.
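A rough CLI equivalent of the same diff-and-copy idea (bucket names bucket-a and bucket-b are placeholders; aws s3 sync above does this for you, but the sketch makes the mechanics explicit):
# List the keys in each bucket, one per line, sorted
aws s3api list-objects --bucket bucket-a --query 'Contents[].Key' --output text | tr '\t' '\n' | sort > keys_a.txt
aws s3api list-objects --bucket bucket-b --query 'Contents[].Key' --output text | tr '\t' '\n' | sort > keys_b.txt
# comm -23 prints the lines only in keys_a.txt, i.e. the objects missing from bucket-b
comm -23 keys_a.txt keys_b.txt | while read -r key; do
    aws s3 cp "s3://bucket-a/${key}" "s3://bucket-b/${key}"
done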

Related

How to copy s3 bucket from a different account which requires different Access Key ID and Secret Key?

I need to copy some S3 objects from a client. The client sent us the key and secret, and I can list the objects using the following command.
AWS_ACCESS_KEY_ID=.... AWS_SECRET_ACCESS_KEY=.... aws s3 ls s3://bucket/company4/
I will need to copy/sync s3://bucket/company4/ (which is very large) from our client's S3. In the question Copy content from one S3 bucket to another S3 bucket with different keys, it is mentioned that this can be done by creating a bucket policy on the destination bucket. However, we probably can't create the bucket policy, because we have limited AWS permissions in our company.
I know we can finish the job by copying the external files to the local file system first and then uploading them to our S3 bucket. Is there a more efficient way to do the work?
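Without a bucket policy, no single set of credentials can perform a server-side copy, but you can at least avoid the local file system by streaming each object through a pipe. A per-object sketch (the object name and destination bucket are placeholders; the client's credentials apply only to the download half of the pipe):
# Download with the client's credentials, upload with ours, never touching disk
AWS_ACCESS_KEY_ID=.... AWS_SECRET_ACCESS_KEY=.... aws s3 cp s3://bucket/company4/some-object - \
    | aws s3 cp - s3://our-destination-bucket/company4/some-object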

Copy objects between buckets using aws sdk

Using the AWS CLI, we can copy or sync files directly from one bucket to another. Using the SDK, I can see APIs for download and upload, but can we directly copy files from one bucket to another bucket (in a different AWS account) using the SDK?
Yes. The CopyObject API call can copy an object between Amazon S3 buckets, including buckets in different regions and different accounts.
To copy objects between accounts, the one set of credentials requires sufficient permission to read from the source bucket and write to the destination bucket. You can either:
Use credentials from the destination account, and a bucket policy on the source bucket that grants read access, or
Use credentials from the source account, and a bucket policy on the destination bucket that grants write access. Make sure you set ACL=bucket-owner-full-control so that the destination account gains full control of the copied objects.
Please note that CopyObject only copies one object at a time, so you would need to loop through a list of objects and call it for each one individually if you wish to copy multiple objects.
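A minimal CLI sketch of one such server-side copy (all bucket and key names are placeholders):
# Copy a single object between buckets; --acl hands full control to the destination owner
aws s3api copy-object \
    --copy-source source-bucket/path/to/key \
    --bucket destination-bucket \
    --key path/to/key \
    --acl bucket-owner-full-control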
It's easy; you can see all the CLI commands with the built-in help:
aws s3 --help
Upload a file:
aws s3 cp <path-to-file-from-local> s3://<S3_BUCKET_NAME>/<file-name>
Download a file:
aws s3 cp s3://<S3_BUCKET_NAME>/<file-name> <path-to-file-from-local>
Move a file:
aws s3 mv s3://<S3_BUCKET_NAME>/<file-name> s3://<S3_BUCKET_NAME>/<file-name>
You can use . to specify the current directory, e.g.:
aws s3 cp s3://MyBucket/Test.txt .

Get a file into an S3 bucket in every region

I have a .zip file (in an S3 bucket) that needs to end up in an S3 bucket in every region within a single account.
Each of those buckets has an identical bucket policy that allows my account to upload files to it, and they all follow the same naming convention, like this:
foobar-{region}
ex: foobar-us-west-2
Is there a way to do this without manually dragging the file in the console into every bucket, or running the aws s3api copy-object command 19 times? This may need to happen fairly frequently as the file is updated, so I'm looking for a more efficient way to do it.
One way I thought of was a Lambda that has an array of all 19 regions I need, loops through them to build the 19 region-specific bucket names, and copies the object into each one.
Is there a better way?
You can simply put it into a bash loop. Using the AWS CLI and jq, you can do the following:
# List all buckets, keep only the foobar-{region} ones, and copy the file into each
aws s3api list-buckets | jq -r '.Buckets[].Name' | grep '^foobar-' | while read -r i; do
    echo "Bucket name: ${i}"
    aws s3 cp your_file_name "s3://${i}/"
done
A few options:
An AWS Lambda function could be triggered upon upload. It could confirm whether the object should be replicated (presumably you don't want to copy every file that is uploaded?), then copy it out to each region. Note that it can take quite a while to copy to all regions.
Use Cross-Region Replication to copy the contents of a bucket (or a sub-path) to other buckets. This would be done automatically upon upload.
Write a bash script or small Python program to run locally that copies the file to each location (a sketch follows below). Note that it is more efficient to call copy_object() to copy the file from one S3 bucket to another than to upload it 19 times: just upload to the first bucket, then copy from there to the other locations.
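A minimal bash sketch of that approach (the region list is abbreviated and the source bucket foobar-us-west-2 is an assumption):
# Upload once, then fan the object out server-side to each regional bucket
aws s3 cp file.zip s3://foobar-us-west-2/file.zip
for region in us-east-1 us-east-2 eu-west-1; do    # ...extend to all 19 regions
    aws s3 cp s3://foobar-us-west-2/file.zip "s3://foobar-${region}/file.zip" \
        --source-region us-west-2 --region "${region}"
done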

Can a single aws cli command do 'aws s3 cp from a single origin file to multiple destination files'?

I need to clone a cross-bucket copied file as below:
# 1. copying file_A -> file_B
aws s3 cp s3://bucket_a/file_A s3://bucket_b/file_B
# 2. cloning file_B -> file_C
aws s3 cp s3://bucket_b/file_B s3://bucket_b/file_C
Is there a shorter/better way to do this?
EDIT:
bucket_a -> bucket_b is cross-region (bucket_a and bucket_b are on opposite sides of the earth)
file_B and file_C have the same name but different prefixes (so it's like bucket_b/prefix_a/file_B and bucket_b/prefix_b/file_B)
In summary, I want file_A in the origin bucket_a to be copied to two places in the destination bucket_b, and I'm looking for a way to copy once instead of twice.
The AWS Command-Line Interface (CLI) can copy multiple files, but each source file is only copied to a single destination; there is no single command that writes one source object to two destination keys.
If your goal is to replicate the contents of a bucket to another bucket, you could use Cross-Region Replication (CRR), but it only works between regions and it only copies objects that are stored after CRR is activated.
You can always write a script or program yourself using an AWS SDK to do whatever you wish; a minimal sketch follows.
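Given the constraints in the question, a two-step sketch keeps the expensive transfer down to one cross-region copy (the regions here are assumptions):
# One cross-region copy...
aws s3 cp s3://bucket_a/file_A s3://bucket_b/prefix_a/file_B \
    --source-region eu-west-1 --region us-east-1
# ...then a cheap duplicate that never leaves the destination region
aws s3 cp s3://bucket_b/prefix_a/file_B s3://bucket_b/prefix_b/file_B --region us-east-1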

How to copy S3 objects between regions with Amazon AWS PHP SDK?

I'm trying to copy Amazon AWS S3 objects between two buckets in two different regions with the Amazon AWS PHP SDK v3. This would be a one-time process, so I don't need cross-region replication. I tried to use copyObject(), but there is no way to specify the region.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
Source:
http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectUsingPHP.html
You don't need to specify regions for that operation; it will find out the target bucket's region and copy the object there.
But you may be right, because the AWS CLI has source-region and target-region attributes that do not exist in the PHP SDK. So you can accomplish the task like this (a rough CLI sketch of the setup follows below):
Create an interim bucket in the source region.
Create the bucket in the target region.
Configure replication from the interim bucket to the target one.
On the interim bucket, set an expiration rule so files are deleted from it automatically after a short time.
Copy objects from the source bucket to the interim bucket using the PHP SDK.
All your objects will then also be copied to the other region.
You can remove the interim bucket a day later.
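A rough CLI sketch of the replication and expiration steps (bucket names and the JSON file contents are assumptions; replication also requires versioning on both buckets):
# Replication requires versioning to be enabled on both buckets
aws s3api put-bucket-versioning --bucket interim-bucket --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket target-bucket --versioning-configuration Status=Enabled
# The replication rules (IAM role ARN, destination bucket) live in replication.json
aws s3api put-bucket-replication --bucket interim-bucket --replication-configuration file://replication.json
# The short expiration rule for the interim bucket lives in lifecycle.json
aws s3api put-bucket-lifecycle-configuration --bucket interim-bucket --lifecycle-configuration file://lifecycle.json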
Or just use the CLI with this single command:
aws s3 cp s3://my-source-bucket-in-us-west-2/ s3://my-target-bucket-in-us-east-1/ --recursive --source-region us-west-2 --region us-east-1
A bucket in a different region could also be in a different account. What others have done is copy from one bucket, save the data temporarily to local storage, and then upload it to the other bucket with different credentials (if you have two regional buckets with different credentials).
The newest update to the CLI tool allows you to copy from bucket to bucket if both are under the same account, using something like what Çağatay Gürtürk mentioned.