I want to automatically move objects from a first S3 bucket to a second bucket. As soon as a file is created or uploaded to the first bucket, it should be moved across to the second bucket. There shouldn't be any copy of the file left in the source bucket after the transfer.
I have seen examples of aws s3 sync, but that leaves a copy in the source bucket and it's not automated.
The aws s3 mv command from the CLI will move the files across, but how do I automate the process? Creating a Lambda notification that sends the files to the second bucket could solve it, but I am looking for a simpler, more automated solution. Not sure if there is anything we could do with SQS? Is there anything we can set on the source bucket that would automatically send the object to the second? Appreciate any ideas.
There is no "move" command in Amazon S3. Instead, a move involves CopyObject() and DeleteObject(). Even the AWS CLI aws s3 mv command does a Copy and then a Delete.
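For reference, here is a minimal sketch of that copy-then-delete pattern using the low-level boto3 client (the bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')

# Copy the object to the destination bucket, then delete the original
s3.copy_object(
    CopySource={'Bucket': 'source-bucket', 'Key': 'path/to/object'},
    Bucket='destination-bucket',
    Key='path/to/object'
)
s3.delete_object(Bucket='source-bucket', Key='path/to/object')

Note that copy_object() handles objects up to 5 GB in a single call; the managed copy() used in the Lambda code below performs a multipart copy for larger objects.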
The easiest way to achieve your objective is:
Configure an Event on the source bucket to trigger an AWS Lambda function
Code the Lambda function to copy the object to the target bucket, then delete the source object
Here's some sample code:
import boto3
import urllib.parse

TARGET_BUCKET = 'my-target-bucket'

def lambda_handler(event, context):

    # Get incoming bucket and key
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy object to different bucket
    s3_resource = boto3.resource('s3')
    copy_source = {
        'Bucket': source_bucket,
        'Key': source_key
    }
    s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

    # Delete the source object
    s3_resource.Bucket(source_bucket).Object(source_key).delete()
It will copy the object to the same path in the destination bucket and then delete the source object.
The Lambda function should be assigned an IAM Role that has sufficient permission: GetObject and DeleteObject on the source bucket, and PutObject on the destination bucket.
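As a rough sketch (the role, policy and bucket names are placeholders, not taken from the question), that permission set could be attached as an inline policy alongside the basic Lambda execution policy for CloudWatch Logs:

import json
import boto3

iam = boto3.client('iam')

# Placeholder role, policy and bucket names: substitute your own
iam.put_role_policy(
    RoleName='my-lambda-move-role',
    PolicyName='s3-move-objects',
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:DeleteObject"],
                "Resource": "arn:aws:s3:::my-source-bucket/*"
            },
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::my-target-bucket/*"
            }
        ]
    })
)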
I am planning to use the AWS Python SDK (Boto3) to copy files from one bucket to another. Below is the sample code I got from the AWS documentation:
dest_object.copy_from(CopySource={
    'Bucket': self.object.bucket_name,
    'Key': self.object.key
})
My question is: how do I trigger this code, and where should I deploy it?
I originally thought of a Lambda function, but I am looking for alternate options in case Lambda times out for larger files (1 TB, etc.).
Can I use Airflow to trigger this code somehow, maybe invoking it through Lambda? Looking for suggestions from AWS experts.
The easiest way to copy new files to another bucket is to use Amazon S3 Replication. It will automatically copy new objects to the selected bucket, no code required.
However, this will not meet your requirement of deleting the incoming file after it is copied. Therefore, you should create an AWS Lambda function and add an S3 trigger, which will invoke the Lambda function whenever an object is created in the bucket.
The Lambda function should:
Extract the bucket and object name from the event parameter
Copy the object to the target bucket
Delete the original object
The code would look something like:
import boto3
import urllib.parse

TARGET_BUCKET = 'target_bucket'  # Change this

def lambda_handler(event, context):

    s3_resource = boto3.resource('s3')

    # Loop through each incoming object
    for record in event['Records']:

        # Get incoming bucket and key
        source_bucket = record['s3']['bucket']['name']
        source_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Copy object to different bucket
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }
        s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

        # Delete original object
        s3_resource.Bucket(source_bucket).Object(source_key).delete()
The copy process is unlikely to approach the 15-minute limit of AWS Lambda, but it is worth testing on large objects.
I'm using Lambda to modify some CSV files from an S3 bucket and write them to a different S3 bucket using the AWS JavaScript SDK. The buckets for getObject and putObject are in different regions. The Lambda is in the same region as the destination bucket. But the modified files in the destination bucket have this error in them:
AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'.
Whenever the source and destination buckets are in the same region, I get the properly modified files.
What changes do I need to make for this to work when the source and destination buckets are in different regions?
Thanks
The S3 service is global, but each bucket is regional, which means that when you need to use a bucket, you have to address it in the region where the bucket exists.
If I understood correctly, your source bucket is in us-west-2 and your destination bucket is in us-east-1.
So you need to use something like this:
s3_source = boto3.client('s3', region_name='us-west-2')
# ... your logic to get and handle the file ...

s3_destination = boto3.client('s3', region_name='us-east-1')
# ... your logic to write the file ...
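A fuller sketch of that pattern, shown with boto3 as in the snippet above (the bucket names, keys, and the CSV-modification step are placeholders; the same idea applies in the JavaScript SDK by giving each S3 client its own region):

import boto3

# One client per region: read from us-west-2, write to us-east-1 (assumed regions)
s3_source = boto3.client('s3', region_name='us-west-2')
s3_destination = boto3.client('s3', region_name='us-east-1')

def transform(csv_text):
    # Placeholder for whatever modification your Lambda performs
    return csv_text.upper()

response = s3_source.get_object(Bucket='source-bucket', Key='input.csv')
body = response['Body'].read().decode('utf-8')

s3_destination.put_object(
    Bucket='destination-bucket',
    Key='output.csv',
    Body=transform(body).encode('utf-8')
)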
I have 2 AWS accounts. account1 has 1 file in bucket1 in the us-east-1 region. I am trying to copy the file from account1 to account2, into bucket2 in the us-west-2 region. I have all the required IAM policies in place, and the same credentials work for both accounts. I am using the Python boto3 library.
cos = boto3.resource('s3', aws_access_key_id=COMMON_KEY_ID, aws_secret_access_key=COMMON_ACCESS_KEY, endpoint_url="https://s3.us-west-2.amazonaws.com")
copy_source = {
    'Bucket': bucket1,
    'Key': SOURCE_KEY
}
cos.meta.client.copy(copy_source, "bucket2", TARGET_KEY)
As seen, the copy function is executed on a client object pointing to the target account2/us-west-2. How does it get the source file in account1/us-east-1? Am I supposed to provide SourceClient as input to the copy function?
The cleanest way to perform such a copy is:
Use credentials (IAM User or IAM Role) from Account-2 that have GetObject permission on Bucket-1 (or all buckets) and PutObject permissions on Bucket-2
Add a Bucket policy to Bucket-1 that allows the Account-2 credentials to GetObject from the bucket
Send the copy command to the destination region
This method is good because it only requires one set of credentials.
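For illustration, here is a rough sketch of that setup with boto3 (the account ID, bucket names, keys and regions are placeholders): the bucket policy is attached to Bucket-1 in Account-1, and the copy is then run with Account-2 credentials against the destination region.

import json
import boto3

# Bucket policy on bucket-1 (Account-1) granting Account-2 read access (placeholder account ID)
s3_account1 = boto3.client('s3', region_name='us-east-1')
s3_account1.put_bucket_policy(
    Bucket='bucket-1',
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucket-1/*"
        }]
    })
)

# Run the copy with Account-2 credentials, sent to the destination region
s3_account2 = boto3.client('s3', region_name='us-west-2')
s3_account2.copy(
    CopySource={'Bucket': 'bucket-1', 'Key': 'source-key'},
    Bucket='bucket-2',
    Key='target-key'
)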
A few things to note:
If you instead copy files using credentials from the source account, be sure to set ACL=bucket-owner-full-control to hand over ownership to the destination bucket's owner.
The managed copy() method allows a SourceClient to be specified; this is available on both the resource and the client versions of copy(), but not on the low-level copy_object() call.
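For example, if the copy is instead driven by source-account credentials, the ACL can be passed through ExtraArgs (a sketch with placeholder names; the command still goes to the destination bucket's region):

import boto3

# Copy run with Account-1 (source) credentials; grant ownership to the destination bucket owner
s3_account1 = boto3.client('s3', region_name='us-west-2')
s3_account1.copy(
    CopySource={'Bucket': 'bucket-1', 'Key': 'source-key'},
    Bucket='bucket-2',
    Key='target-key',
    ExtraArgs={'ACL': 'bucket-owner-full-control'}
)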
I need to enable bucket replication for my S3 bucket.
I referred to the two links below and wrote the code (see below), but replication is not enabled. Weirdly, though, a file was created in the source bucket.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_bucket_replication
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTreplication.html
Could anyone help me with this?
source bucket name = tst;
Destination bucket name = rst
import boto3

s3 = boto3.client('s3', endpoint_url=S3_URL, aws_access_key_id=ACCESS_ID,
                  aws_secret_access_key=SECRET_KEY, region_name=REGION)

s3.put_bucket_replication(
    Bucket='tst',
    ReplicationConfiguration={'Role': 'arn:aws:iam::10000003:root',
                              'Rules': [{'Status': 'Enabled',
                                         'Destination': {'Bucket': 'arn:aws:s3:::rst'}}]})
I am copying objects between two S3 buckets. As part of the copy process I am renaming the file. Is there a way to capture the object "Key" response from the destination after a copy succeeds?
Reference: how to copy s3 object from one bucket to another using python boto3
s3_resource.meta.client.copy(copy_source, destination_bucket, modified_filename)
The only way that I know of doing this is to make a call to list the objects in the target bucket and make sure that your file modified_filename is among the keys. Something like this should work (assuming you have only one profile in your ~/.aws/config or ~/.aws/credentials file):
import boto3

s3_client = boto3.client('s3')

# Check whether the renamed key now exists in the destination bucket
for obj in s3_client.list_objects(Bucket=destination_bucket)['Contents']:
    if modified_filename in obj['Key']:
        successful_copy = True
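As a lighter-weight alternative (not from the original answer), since the destination key is already known, a head_object() call confirms the copy without listing the whole bucket:

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

# head_object succeeds only if the key exists in the destination bucket
try:
    s3_client.head_object(Bucket=destination_bucket, Key=modified_filename)
    successful_copy = True
except ClientError:
    successful_copy = False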