Boto3 Amazon S3 copy object between buckets and capture response of destination

I am copying objects between two S3 buckets. As part of the copy process I am renaming the file. Is there a way to capture the object "Key" from the destination after a copy succeeds?
Reference: how to copy s3 object from one bucket to another using python boto3
s3_resource.meta.client.copy(copy_source, destination_bucket, modified_filename)

The only way that I know of doing this is to make a call to list the objects in the target bucket and make sure that your file modified_filename is in the keys. Something like this should work (assuming you have only one profile in your ~/.aws/config or ~/.aws/credentials file):
import boto3

s3_client = boto3.client('s3')

successful_copy = False
for obj in s3_client.list_objects(Bucket=destination_bucket)['Contents']:
    if modified_filename in obj['Key']:
        successful_copy = True
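Since you supply modified_filename as the destination key yourself, a head_object call on that exact key is a lighter-weight check than listing the whole bucket (list_objects also returns at most 1,000 keys per call). A minimal sketch of that alternative, not from the original answer:
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

try:
    # head_object only succeeds if the copied key exists in the destination bucket
    response = s3_client.head_object(Bucket=destination_bucket, Key=modified_filename)
    copied_key = modified_filename        # the destination key you passed to copy()
    copied_etag = response['ETag']        # metadata of the copied object
except ClientError as e:
    if e.response['Error']['Code'] == '404':
        copied_key = None                 # the expected key is not there; the copy failed
    else:
        raise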

Related

AWS S3 copy files from one bucket to other using boto3

I am planning to use the AWS Python SDK (Boto3) to copy files from one bucket to another. Below is the sample code I got from the AWS documentation:
dest_object.copy_from(CopySource={
    'Bucket': self.object.bucket_name,
    'Key': self.object.key
})
My question is: how do I trigger this code, and where should I deploy it?
I originally thought of a Lambda function, but I am looking for alternative options in case Lambda times out for larger files (1 TB etc.).
Can I use Airflow to trigger this code somehow, maybe by invoking it through Lambda? Looking for suggestions from AWS experts.
The easiest way to copy new files to another bucket is to use Amazon S3 Replication. It will automatically copy new objects to the selected bucket, no code required.
However, this will not meet your requirement of deleting the incoming file after it is copied. Therefore, you should create an AWS Lambda function and add an S3 trigger. This will invoke the Lambda function whenever an object is created in the bucket.
The Lambda function should:
Extract the bucket and object name from the event parameter
Copy the object to the target bucket
Delete the original object
The code would look something like:
import boto3
import urllib.parse

TARGET_BUCKET = 'target_bucket'  # Change this

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')

    # Loop through each incoming object
    for record in event['Records']:

        # Get incoming bucket and key
        source_bucket = record['s3']['bucket']['name']
        source_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Copy object to different bucket
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }
        s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

        # Delete original object
        s3_resource.Bucket(source_bucket).Object(source_key).delete()
The copy process is unlikely to approach the 15-minute limit of AWS Lambda, but it is worth testing on large objects.
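For the very large objects mentioned in the question (1 TB and up), the managed copy() used above performs a server-side multipart copy, and it can be tuned with a TransferConfig. A rough sketch, with placeholder bucket and key names:
import boto3
from boto3.s3.transfer import TransferConfig

s3_resource = boto3.resource('s3')

# Larger parts and more parallel part-copies for very large objects
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,    # switch to multipart copy above 64 MB
    multipart_chunksize=256 * 1024 * 1024,   # 256 MB per part
    max_concurrency=20                       # number of parallel part copies
)

copy_source = {'Bucket': 'source-bucket', 'Key': 'large-object.dat'}
s3_resource.Bucket('target-bucket').Object('large-object.dat').copy(copy_source, Config=config)
The object data itself never passes through Lambda; each part is copied inside S3 via UploadPartCopy, so the function mostly waits on API calls.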

automatically move object from one s3 bucket to another

I want to automatically move objects from the first S3 bucket to the second bucket. As and when a file is created or uploaded to the first bucket, it should be moved across to the second bucket. There shouldn't be any copy of the file on the source bucket after the transfer.
I have seen examples of aws s3 sync, but that leaves a copy on the source bucket and it's not automated.
The aws s3 mv command from the CLI will move the files across, but how do I automate the process? Creating a Lambda notification and sending the files to the second bucket could work, but I am looking for a simpler, more automated solution. Not sure if there is anything we could do with SQS? Is there anything we can set on the source bucket that would automatically send the object to the second? Appreciate any ideas.
There is no "move" command in Amazon S3. Instead, it involves CopyObject() and DeleteObject(). Even the AWS CLI aws mv command does a Copy and Delete.
The easiest way to achieve your object is:
Configure an Event on the source bucket to trigger an AWS Lambda function
Code the Lambda function to copy the object to the target bucket, then delete the source object
Here's some sample code:
import boto3
import urllib.parse

TARGET_BUCKET = 'my-target-bucket'

def lambda_handler(event, context):

    # Get incoming bucket and key
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy object to different bucket
    s3_resource = boto3.resource('s3')
    copy_source = {
        'Bucket': source_bucket,
        'Key': source_key
    }
    s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

    # Delete the source object
    s3_resource.Bucket(source_bucket).Object(source_key).delete()
It will copy the object to the same path in the destination bucket and then delete the source object.
The Lambda function should be assigned an IAM Role that has sufficient permission to GetObject and DeleteObject on the source bucket and PutObject on the destination bucket.

Boto3 S3 cp recursive from one region to another

I am looking for a way to copy from an S3 bucket in one region to another S3 bucket in a different region via a python script.
I am able to do it using AWS CLI using:
aws s3 cp s3://source-bucket s3://target-bucket --recursive --source-region region1 --region region2
However, I want to see if something similar is possible within a Python script using boto3.
Whatever I have researched seems to work only within the same region, using boto3.resource and resource.meta.client.copy.
Amazon S3 can only 'copy' one object at a time.
When you use that AWS CLI command, it first obtains a list of objects in the source bucket and then calls copy_object() once for each object to copy (it uses multithreading to do multiple copies simultaneously).
You can write your own python script to do the same thing. (In fact, the AWS CLI is a Python program!) Your code would also need to call list_objects_v2() and then call copy_object() for each object to copy.
Given that the buckets are in different regions, you would send the commands to the destination region, referencing the source region, for example:
s3_client = boto3.client('s3', region_name='destination-region-id')
response = s3_client.copy_object(Bucket=..., Key=..., CopySource={'Bucket': 'source-bucket', 'Key': 'source-key'})
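A rough sketch of the full "recursive" copy that the CLI command performs, with placeholder bucket names and region; pagination is needed because list_objects_v2() returns at most 1,000 keys per call:
import boto3

SOURCE_BUCKET = 'source-bucket'   # placeholder
TARGET_BUCKET = 'target-bucket'   # placeholder

# Send the requests to the destination region
s3_client = boto3.client('s3', region_name='us-east-1')

# Page through every object in the source bucket and copy it across
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get('Contents', []):
        s3_client.copy_object(
            Bucket=TARGET_BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}
        )
Note that copy_object() is limited to objects up to 5 GB; above that, use the managed s3_client.copy() call, which switches to a multipart copy automatically.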

How to copy s3 bucket from a different account which requires different Access Key ID and Secret Key?

I need to copy some S3 objects from a client. The client sent us the key and secret, and I can list the objects using the following command.
AWS_ACCESS_KEY_ID=.... AWS_SECRET_ACCESS_KEY=.... aws s3 ls s3://bucket/company4/
I will need to copy/sync s3://bucket/company4/ (very large) from our client's S3. In this question, Copy content from one S3 bucket to another S3 bucket with different keys, it is mentioned that this can be done by creating a bucket policy on the destination bucket. However, we probably don't have permission to create the bucket policy, because we have limited AWS permissions in our company.
I know we can finish the job by copying the external files to the local file system first and then uploading them to our S3 bucket. Is there a more efficient way to do the work?
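No answer is given in the thread, but one approach that at least avoids staging files on the local file system is to stream each object between two boto3 clients, one built from the client's credentials and one from your own. A sketch under those assumptions (bucket and credential names are placeholders; the data still passes through the machine running the script, just not through its disk):
import boto3

# Client built from the credentials the client sent (read access to their bucket)
source_client = boto3.client(
    's3',
    aws_access_key_id='CLIENT_ACCESS_KEY_ID',
    aws_secret_access_key='CLIENT_SECRET_ACCESS_KEY'
)

# Client built from your own credentials (write access to your bucket)
dest_client = boto3.client('s3')

SOURCE_BUCKET = 'bucket'       # the client's bucket
DEST_BUCKET = 'our-bucket'     # placeholder for your own bucket
PREFIX = 'company4/'

paginator = source_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        # Stream the object body straight from one client to the other
        body = source_client.get_object(Bucket=SOURCE_BUCKET, Key=obj['Key'])['Body']
        dest_client.upload_fileobj(body, DEST_BUCKET, obj['Key'])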

Amazon S3 / Merge between two buckets

I have two buckets a and b. Bucket b contains 80% of the objects in a.
I want to copy the remaining 20% of the objects in a into b, without downloading the objects to local storage.
I saw the AWS Command Line Interface, but as I understand it, it copies all the objects from a to b. As I said, I want it to copy only the files that exist in a but do not exist in b.
Install the AWS CLI and configure it with access credentials
Make sure both buckets have the same directory structure
AWS S3 docs
The following sync command syncs objects under a specified prefix and
bucket to objects under another specified prefix and bucket by copying
s3 objects. A s3 object will require copying if the sizes of the two
s3 objects differ, the last modified time of the source is newer than
the last modified time of the destination, or the s3 object does not
exist under the specified bucket and prefix destination. In this
example, the user syncs the bucket mybucket2 to the bucket mybucket.
The bucket mybucket contains the objects test.txt and test2.txt. The
bucket mybucket2 contains no objects:
aws s3 sync s3://mybucket s3://mybucket2
You can use the AWS SDK and write a PHP (or other supported language) script that will make a list of filenames from both buckets, use array_diff to find the files that are not common, then copy the missing files from Bucket A to Bucket B (copyObject performs the copy server-side, so the objects are not downloaded).
This is a good place to start: https://aws.amazon.com/sdk-for-php/
More in depth on creating arrays of filenames (keys): http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingPHP.html
Some code for retrieving keys:
$objects = $s3->getIterator('ListObjects', array('Bucket' => $bucket));
foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}
Here is how to copy objects from bucket to bucket:
// Instantiate the client.
$s3 = S3Client::factory();

// Copy an object.
$s3->copyObject(array(
    'Bucket'     => $targetBucket,
    'Key'        => $targetKeyname,
    'CopySource' => "{$sourceBucket}/{$sourceKeyname}",
));
You are going to want to pull the keys from both buckets, and do an array_diff to get a resulting set of keys that you can then loop through and transfer. Hope this helps.
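Since the rest of this page uses boto3, here is a rough Python equivalent of the same key-diff-and-copy idea (bucket names are placeholders; objects are compared by key only, not by size or timestamp):
import boto3

SOURCE_BUCKET = 'bucket-a'   # placeholder
TARGET_BUCKET = 'bucket-b'   # placeholder

s3_client = boto3.client('s3')

def list_keys(bucket):
    # Return the set of all object keys in a bucket, following pagination
    keys = set()
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            keys.add(obj['Key'])
    return keys

# Keys that exist in a but not in b (the equivalent of PHP's array_diff)
missing_keys = list_keys(SOURCE_BUCKET) - list_keys(TARGET_BUCKET)

for key in missing_keys:
    # Server-side copy: the object data never leaves S3
    s3_client.copy_object(
        Bucket=TARGET_BUCKET,
        Key=key,
        CopySource={'Bucket': SOURCE_BUCKET, 'Key': key}
    )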