AWS S3 copy files from one bucket to another using boto3

I am planning to use the AWS Python SDK (Boto3) to copy files from one bucket to another. Below is the sample code I got from the AWS documentation:
dest_object.copy_from(CopySource={
    'Bucket': self.object.bucket_name,
    'Key': self.object.key
})
My question is how do I trigger this code and where should I deploy this code?
I originally thought of a Lambda function, but I am looking for alternate options in case Lambda times out for larger files (1 TB etc.).
Can I use Airflow to trigger this code somehow, maybe by invoking it through Lambda? Looking for suggestions from AWS experts.

The easiest way to copy new files to another bucket is to use Amazon S3 Replication. It will automatically copy new objects to the selected bucket, no code required.
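For reference, a minimal sketch of enabling replication with boto3, assuming versioning is already enabled on both buckets and a suitable replication IAM role already exists (the bucket names and role ARN below are placeholders):

import boto3

s3_client = boto3.client('s3')

# Replicate every new object from the source bucket to the target bucket
s3_client.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-everything',
                'Priority': 1,
                'Filter': {'Prefix': ''},   # empty prefix = replicate all objects
                'Status': 'Enabled',
                'DeleteMarkerReplication': {'Status': 'Disabled'},
                'Destination': {'Bucket': 'arn:aws:s3:::target-bucket'}
            }
        ]
    }
)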
However, this will not meet your requirement of deleting the incoming file after it is copied. Therefore, you should create an AWS Lambda function and add an S3 trigger. This will trigger the Lambda function whenever an object is created in the bucket.
The Lambda function should:
Extract the bucket and object name from the event parameter
Copy the object to the target bucket
Delete the original object
The code would look something like:
import boto3
import urllib.parse
TARGET_BUCKET = 'target_bucket' # Change this
def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')

    # Loop through each incoming object
    for record in event['Records']:

        # Get incoming bucket and key
        source_bucket = record['s3']['bucket']['name']
        source_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Copy object to different bucket
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }
        s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

        # Delete original object
        s3_resource.Bucket(source_bucket).Object(source_key).delete()
The copy process is unlikely to approach the 15-minute limit of AWS Lambda, but it is worth testing on large objects.
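If very large objects (such as the 1 TB files mentioned in the question) are a concern, the managed copy in boto3 performs a multipart copy inside S3, so the object data does not pass through the Lambda function itself. A minimal sketch, with illustrative bucket names and tuning values:

import boto3
from boto3.s3.transfer import TransferConfig

s3_resource = boto3.resource('s3')

# Tune multipart behaviour for very large objects (values are illustrative)
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,    # switch to multipart copy above 64 MB
    multipart_chunksize=256 * 1024 * 1024,   # 256 MB parts
    max_concurrency=10
)

copy_source = {'Bucket': 'source-bucket', 'Key': 'large-file.dat'}
s3_resource.meta.client.copy(copy_source, 'target-bucket', 'large-file.dat', Config=config)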

Related

automatically move object from one s3 bucket to another

I want to automatically move objects from a first S3 bucket to a second bucket. As and when a file is created or uploaded to the first bucket, it should be moved across to the second bucket. There shouldn't be any copy of the file left on the source bucket after the transfer.
I have seen examples of aws s3 sync but that leaves a copy on the source bucket and it's not automated.
The aws s3 mv command from the CLI will move the files across, but how do I automate the process? Creating a Lambda notification and sending the files to the second bucket could solve this, but I am looking for a more automated, simpler solution. Not sure if there is anything we could do with SQS? Is there anything we can set on the source bucket that would automatically send the object to the second? Appreciate any ideas.
There is no "move" command in Amazon S3. Instead, it involves CopyObject() and DeleteObject(). Even the AWS CLI aws mv command does a Copy and Delete.
The easiest way to achieve your objective is:
Configure an Event on the source bucket to trigger an AWS Lambda function
Code the Lambda function to copy the object to the target bucket, then delete the source object
Here's some sample code:
import boto3
import urllib.parse
TARGET_BUCKET = 'my-target-bucket'
def lambda_handler(event, context):

    # Get incoming bucket and key
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy object to different bucket
    s3_resource = boto3.resource('s3')
    copy_source = {
        'Bucket': source_bucket,
        'Key': source_key
    }
    s3_resource.Bucket(TARGET_BUCKET).Object(source_key).copy(copy_source)

    # Delete the source object (note: from the source bucket, not the target)
    s3_resource.Bucket(source_bucket).Object(source_key).delete()
It will copy the object to the same path in the destination bucket and then delete the source object.
The Lambda function should be assigned an IAM Role that has sufficient permission to GetObject and DeleteObject on the source bucket and PutObject on the destination bucket.

s3 presign url between s3 buckets using Lambda

Hope all is well. I am trying to upload a file that is situated in my S3 bucket to another bucket. However, I want to use a Lambda function to upload it to the other bucket using an S3 presigned URL, as I want it to have an expiration feature in the new bucket. I passed the object's file URL as the key when uploading to the destination bucket, but it does not seem to work. Some guidance would be appreciated.
import json
import time
import boto3
s3 = boto3.client('s3')
time.sleep(10)

bucket_name_file = 'mybucketname'

# fetch last modified item from bucket
response = s3.list_objects_v2(Bucket=bucket_name_file)
all_objects = response['Contents']
latest = max(all_objects, key=lambda x: x['LastModified'])
my_file_name = latest['Key']
url_of_my_filename = 'https://' + bucket_name_file + '.s3.amazonaws.com/' + my_file_name

###################################################

destination_bucket_to_send = 'my_destination_bucket'

url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': destination_bucket_to_send,
        'Key': url_of_my_filename,
    },
    ExpiresIn=20000
)
It would appear that your goal is to use Amazon Translate to translate some text via a Transcription Job. You then want to offer the resulting translation via a temporary URL.
To accomplish this, you can create an Amazon S3 pre-signed URL on the object that was created by the transcription job. This URL can then be used from the Internet to obtain the translation. Once the expiry period has passed, the URL will no longer provide access to the object.
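A minimal sketch of creating such a pre-signed GET URL, with placeholder bucket and key names (note that the Key is the object's key, not its https:// URL):

import boto3

s3_client = boto3.client('s3')

# Pre-signed URL for downloading (GET) an existing object; expires after 1 hour
url = s3_client.generate_presigned_url(
    'get_object',
    Params={
        'Bucket': 'my-output-bucket',   # placeholder: bucket holding the transcription output
        'Key': 'results/output.json'    # placeholder: the object's key, not a URL
    },
    ExpiresIn=3600
)
print(url)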

Are incoming files via AWS Transfer Family into S3 taggable?

At the moment I am facing a problem: I can't determine whether a file was PUT via AWS Transfer Family or via the S3 console.
Is there any chance to default-tag files which are PUT to S3 via AWS Transfer Family?
Regards
Ribase
There is S3 object metadata, described in the Transfer Family user guide for post-upload processing, which indicates that Transfer Family uploaded the object.
One use case for the metadata is when an SFTP user has an inbox and an outbox. For the inbox, objects are put by an SFTP client. For the outbox, objects are put by the post-upload processing pipeline. If there is an S3 event notification, the downstream service on the processor side can make an S3 HeadObject call, dismiss the event if the metadata is not present, and only process incoming files.
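A minimal sketch of that check, assuming the metadata key and value documented in the Transfer Family user guide (shown here as user-agent with the value AWSTransfer; verify the exact key and value against the guide) and a hypothetical process_incoming_file() handler:

import boto3
import urllib.parse

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # HeadObject returns the user-defined metadata without downloading the object
        head = s3_client.head_object(Bucket=bucket, Key=key)
        metadata = head.get('Metadata', {})

        # Dismiss objects that were not uploaded by Transfer Family
        if metadata.get('user-agent') != 'AWSTransfer':
            continue

        process_incoming_file(bucket, key)   # hypothetical downstream handler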
You could also use Transfer Family managed workflows to apply a Tag step. An example of using the Tag step can be found in demo 1 of the AWS Transfer Family managed workflows demo video.
Configure the S3 bucket where Transfer Family is writing the files to trigger a Lambda using an Event Notification.
Use this Boto3 code in the Lambda. It will tag the file with the principal that placed the file in S3. If it is Transfer Family, then it is the role that was assigned to Transfer Family to write the files to the bucket. If it is a user uploading the files via the Console, then it will be that user's role.
import boto3
import json
import urllib.parse
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    principal = event['Records'][0]['userIdentity']['principalId']

    try:
        s3 = boto3.client('s3')
        response = s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={
                'TagSet': [
                    {
                        'Key': 'Principal',
                        'Value': str(principal)
                    },
                ]
            }
        )
    except Exception as e:
        print('Error {}.'.format(e))

How to stop an EC2 instance using Lambda if there are no objects in a folder of an S3 bucket?

Basically, I tried a Lambda function with an S3 "All object delete events" trigger, but it stops the instance even if I delete just one file from the given folder of the bucket. Whenever an object gets deleted from the bucket's dump/ directory, the following function stops the instance, but what I want is for it to stop the instance only if there are no files left in the dump/ directory.
import boto3
region = 'us-west-1'
instances = ['i-12345cb6de4f78g9h']
ec2 = boto3.client('ec2', region_name=region)
def lambda_handler(event, context):
    ec2.stop_instances(InstanceIds=instances)
    print('stopped your instances: ' + str(instances))
There is no trigger from S3 to Lambda that is "empty bucket". The S3 triggers are:
New object created events
Object removal events
Restore object events
Reduced Redundancy Storage (RRS) object lost events
Replication events
The closest is s3:ObjectRemoved:Delete or s3:ObjectRemoved:DeleteMarkerCreated. Your code will have to do the work to see if the bucket is now empty when you get the Delete trigger. If the bucket is empty then stop the EC2.
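A minimal sketch of that check, using the dump/ prefix from the question and placeholder bucket and instance IDs:

import boto3

REGION = 'us-west-1'
BUCKET = 'my-dump-bucket'               # placeholder: the bucket from the question
PREFIX = 'dump/'
INSTANCES = ['i-12345cb6de4f78g9h']     # placeholder instance ID

s3_client = boto3.client('s3')
ec2_client = boto3.client('ec2', region_name=REGION)

def lambda_handler(event, context):
    # Listing a single key is enough to know whether the prefix is empty
    # Note: a zero-byte 'dump/' folder-marker object, if present, also counts as a key
    response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=1)

    if response['KeyCount'] == 0:
        ec2_client.stop_instances(InstanceIds=INSTANCES)
        print('Stopped instances: ' + str(INSTANCES))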
It appears that you have configured Amazon S3 to trigger your AWS Lambda function for All object delete events. This means that the Lambda function will be triggered whenever any object is deleted in the bucket (for a given Prefix, if defined).
Since you only wish to Stop the EC2 instance when the bucket is empty, you will need to add code to your Lambda function that determines whether the bucket is empty. If it is empty, it should then Stop the instance.
The function can do this by listing the contents of the bucket. If no objects are returned in the listing, then it will know that the bucket is empty.
Alternative method
I am assuming that you have code running on the EC2 instance that processes objects from the Amazon S3 bucket and then deletes each object once it has been processed. Therefore, after deleting an object, your code on the EC2 instance could just as easily check the contents of the S3 bucket and, if there are no more objects, stop the instance itself (without having to use an AWS Lambda function). This can be done either by calling StopInstances() or by simply telling the operating system to shut down (eg sudo shutdown now -h), which will stop the instance. (Make sure you include -h to halt the machine.)
Or, you could totally change your architecture to feed information to the EC2 instance via an Amazon SQS queue rather than using files in an Amazon S3 bucket.
See: Auto-Stop EC2 instances when they finish a task - DEV Community
Why not check the contents of the bucket, and if there are no objects, stop the instance? You could use code like the following:
import boto3
region = 'us-west-1'
instances = ['i-12345cb6de4f78g9h']
ec2 = boto3.client('ec2', region_name=region)
s3 = boto3.resource('s3')
def lambda_handler(event, context):
    # get the bucket and a list of all objects
    bucket = s3.Bucket('bucket_name')
    count_obj = 0
    for i in bucket.objects.all():
        count_obj = count_obj + 1

    # check bucket object length, stop instance if there is no object
    if count_obj == 0:
        ec2.stop_instances(InstanceIds=instances)
        print('stopped your instances: ' + str(instances))
This is not something that can be easily triggered using an event notification. You can create a scheduled CloudWatch Events (EventBridge) rule to run your Lambda function every hour, for example. Within the Lambda function, you can check whether your S3 bucket is empty or not. If it is empty, you can turn off your EC2 instance.
This medium guide is doing something similar: https://medium.com/geekculture/terraform-setup-for-automatically-turning-off-ec2-instances-upon-inactivity-d7f414390800
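A minimal sketch of wiring up such an hourly schedule with boto3 (the rule name, function name, and ARN below are placeholders):

import boto3

events_client = boto3.client('events')
lambda_client = boto3.client('lambda')

RULE_NAME = 'check-empty-bucket-hourly'
FUNCTION_NAME = 'stop-ec2-if-bucket-empty'
FUNCTION_ARN = 'arn:aws:lambda:us-west-1:123456789012:function:' + FUNCTION_NAME

# Create a rule that fires every hour
rule = events_client.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='rate(1 hour)',
    State='ENABLED'
)

# Allow EventBridge / CloudWatch Events to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId='allow-eventbridge-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)

# Point the rule at the Lambda function
events_client.put_targets(
    Rule=RULE_NAME,
    Targets=[{'Id': 'stop-ec2-lambda', 'Arn': FUNCTION_ARN}]
)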

Boto3 Amazon S3 copy object between buckets and capture response of destination

I am copying objects between two S3 buckets. As part of the copy process I am renaming the file. Is there a way to capture the object "Key" response from the destination after a copy succeeds?
Reference: how to copy s3 object from one bucket to another using python boto3
s3_resource.meta.client.copy(copy_source, destination_bucket, modified_filename)
The only way that I know of doing this is to make a call to list the objects in the target bucket and make sure that your file modified_filename is in the keys. Something like this should work (assuming you have only one profile in your ~/.aws/config or ~/.aws/credentials file):
import boto3

s3_client = boto3.client('s3')

for obj in s3_client.list_objects(Bucket=destination_bucket)['Contents']:
    if modified_filename in obj['Key']:
        successful_copy = True
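As an alternative to listing the whole bucket (not from the original answer), a HeadObject call on the destination bucket confirms that a single key exists; this sketch assumes the same destination_bucket and modified_filename variables:

import boto3
import botocore.exceptions

s3_client = boto3.client('s3')

try:
    # HeadObject succeeds only if the key exists in the destination bucket
    response = s3_client.head_object(Bucket=destination_bucket, Key=modified_filename)
    successful_copy = True
    print('Copied key:', modified_filename, 'ETag:', response['ETag'])
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == '404':
        successful_copy = False
    else:
        raise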