how to delete a file in s3 via lambda? - amazon-web-services

I am able to copy a file from one bucket to another, but not sure if i'm doing this wrong but i can't delete the file . any thoughts?
import boto3
import os
from requests_aws4auth import AWS4Auth
session = boto3.Session()
credentials = session.get_credentials()
aws4auth = AWS4Auth(credentials.access_key,credentials.secret_key,region, service, session_token=credentials.token)
s3 = boto3.resource('s3')
name = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
s3.meta.client.copy({'Bucket': name, 'key': key}, targetBucket, key)
s3.meta.client.delete({{'Bucket': name, 'key': key}})

Since you are creating s3 = boto3.resource('s3'), you may use it to delete the object.
For this you would create Object and then used its delete method. For example:
s3 = boto3.resource('s3')
object_to_be_deleted = s3.Object(name, key)
object_to_be_deleted.delete()
Also since you are using lambda, make sure that your function's execution role has permissions to delete the object or there are no bucket policies which prohibit such an action.

I would suggest you to use boto3 client() rather than resource(). Anyways, here is what I tried and worked for me:
To copy file
import boto3
client = boto3.client('s3')
copy_source = {'Bucket': 'from-bucket-s3', 'Key': 'cfn.json'}
client.copy(copy_source, 'to-bucket-s3', 'other-cfn.json')
To delete file
import boto3
client = boto3.client('s3')
client.delete_object(Bucket='to-bucket-s3', Key='other-cfn.json')
boto3 client() supports vast number of APIs than resource()

Related

Copying files between S3 buckets -- only 2 files copy

I am trying to copy multiple files from one s3 bucket to another s3 bucket using lambda function but it is just copying 2 files in destination s3 bucket.
Here is my code:
# using python and boto3
import json
import boto3
s3_client = boto3.client('s3')
def lambda_handler(event, context):
source_bucket_name = event['Records'][0]['s3']['bucket']['name']
file_name = event['Records'][0]['s3']['object']['key']
destination_bucket_name = 'nishantnkd'
copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
s3_client.copy_object(CopySource=copy_object,
Bucket=destination_bucket_name, Key=file_name)
return {'statusCode': 3000,
'body': json.dumps('File has been Successfully Copied')}
I presume that the Amazon S3 bucket is configured to trigger the AWS Lambda function when a new object is created.
When the Lambda function is triggered, it is possible that multiple event records are sent to the function. Therefore, it should loop through the event records like this:
# using python and boto3
import json
import boto3
s3_client = boto3.client('s3')
def lambda_handler(event, context):
for record in event['Records']: # This loop added
source_bucket_name = record['s3']['bucket']['name']
file_name = urllib.parse.unquote_plus(record['s3']['object']['key']) # Note this change too
destination_bucket_name = 'nishantnkd'
copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
s3_client.copy_object(CopySource=copy_object, Bucket=destination_bucket_name, Key=file_name)
return {'statusCode': 3000,
'body': json.dumps('File has been Successfully Copied')}

Copying S3 objects from one account to other using Lambda python

I'm using boto3 to copy files from s3 bucket from one account to other. I need a similar functionality like aws s3 sync. Please see my code. My company has decided to 'PULL' from other S3 bucket (source account). Please don't suggest replication, S3 batch, S3 trigger Lambda..etc. We have gone through all these options and my management do not want to do any configuration at source side. Can you please review this code and let me know if this code works for thousands of objects. Source bucket has nearly 10000 objects. We will create this lambda function in destination account and create a cloudwatch event to trigger the lambda once in a day.
I am checking ETag so that modified files will be copied across when this function is triggered.
Edit: I simplified my code just to see pagination works. It's working if I don't add client.copy(). If I add this line in for loop after reading 3,4 objects it's throwing "errorMessage": "2021-08-07T15:29:07.827Z 82757747-7b72-4f29-ae9f-22e95f969d6c Task timed out after 3.00 seconds". Please advise. Please note that 'test/' folder in my source bucket has around 1100 objects.
import os
import logging
import botocore
logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
client = boto3.client('s3')
def handler(event, context):
main(event, logger)
def main(event, logger):
try:
SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
DEST_BUCKET = os.environ.get('DST_BUCKET')
REGION = os.environ.get('REGION')
prefix = 'test/'
# Create a reusable Paginator
paginator = client.get_paginator('list_objects_v2')
print ('after paginator')
# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET,Prefix = prefix)
print ('after page iterator')
index = 0
for page in page_iterator:
for obj in page['Contents']:
index += 1
print ("I am looking for {} in the source bucket".format(obj['ETag']))
copy_source = {'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}
client.copy(copy_source, DEST_BUCKET, obj['Key'])
logger.info("number of objects copied {}:".format(index))
except botocore.exceptions.ClientError as e:
raise
This version is working fine if I increase the Lambda timeout to 15 min and memory to 512MB. This checks if the source object already exists in destination before copying.
import boto3
import os
import logging
import botocore
from botocore.client import Config
logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
config = Config(connect_timeout=5, retries={'max_attempts': 0})
client = boto3.client('s3', config=config)
#client = boto3.client('s3')
def handler(event, context):
main(event, logger)
def main(event, logger):
try:
DEST_BUCKET = os.environ.get('DST_BUCKET')
SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
REGION = os.environ.get('REGION')
prefix = ''
# Create a reusable Paginator
paginator = client.get_paginator('list_objects_v2')
print ('after paginator')
# Create a PageIterator from the Paginator
page_iterator_src = paginator.paginate(Bucket=SOURCE_BUCKET,Prefix = prefix)
page_iterator_dest = paginator.paginate(Bucket=DEST_BUCKET,Prefix = prefix)
print ('after page iterator')
index = 0
for page_source in page_iterator_src:
for obj_src in page_source['Contents']:
flag = "FALSE"
for page_dest in page_iterator_dest:
for obj_dest in page_dest['Contents']:
# checks if source ETag already exists in destination
if obj_src['ETag'] in obj_dest['ETag']:
flag = "TRUE"
break
if flag == "TRUE":
break
if flag != "TRUE":
index += 1
client.copy_object(Bucket=DEST_BUCKET, CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj_src['Key']}, Key=obj_src['Key'],)
print ("source ETag {} and destination ETag {}".format(obj_src['ETag'],obj_dest['ETag']))
print ("source Key {} and destination Key {}".format(obj_src['Key'],obj_dest['Key']))
print ("Number of objects copied{}".format(index))
logger.info("number of objects copied {}:".format(index))
except botocore.exceptions.ClientError as e:
raise

Dump lambda output to csv and have it email as an attachment

I have a lambda function that generates a list of untagged buckets in AWS environment. Currently I send the output to a slack channel directly. Instead I would like to have my lambda dump the output to a csv file and send it as a report. Here is the code for it, let me know if you need any other details.
import boto3
from botocore.exceptions import ClientError
import urllib3
import json
http = urllib3.PoolManager()
def lambda_handler(event, context):
#Printing the S3 buckets with no tags
s3 = boto3.client('s3')
s3_re = boto3.resource('s3')
buckets = []
print('Printing buckets with no tags..')
for bucket in s3_re.buckets.all():
s3_bucket = bucket
s3_bucket_name = s3_bucket.name
try:
response = s3.get_bucket_tagging(Bucket=s3_bucket_name)
except ClientError:
buckets.append(bucket)
print(bucket)
for bucket in buckets:
data = {"text": "%s bucket has no tags" % (bucket)}
r = http.request("POST", "https://hooks.slack.com/services/~/~/~",
body = json.dumps(data),
headers = {"Content-Type": "application/json"})

How to use boto3 to write to S3 standard infrequent access?

I searched in the boto3 doc but didn't find relevant information there. In this link, it is mentioned that it can be done using
k.storage_class='STANDARD_IA'
Can someone share a full code snippet here? Many thanks.
New file
import boto3
client = boto3.client('s3')
client.upload_file(
Filename = '/tmp/foo.txt',
Bucket = 'my-bucket',
Key = 'foo.txt',
ExtraArgs = {
'StorageClass': 'STANDARD_IA'
}
)
Existing file
From How to change storage class of existing key via boto3:
import boto3
s3 = boto3.client('s3')
copy_source = {
'Bucket': 'mybucket',
'Key': 'mykey'
}
s3.copy(
CopySource = copy_source,
Bucket = 'target-bucket',
Key = 'target-key',
ExtraArgs = {
'StorageClass': 'STANDARD_IA',
'MetadataDirective': 'COPY'
}
)
From the boto3 Storing Data example, it looks like the standard way to put objects in boto3 is
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
But to set the storage class, S3.Object.Put suggests we'd want to use parameter:
StorageClass='STANDARD_IA'
So combining the two, we have:
import boto3
s3 = boto3.resource('s3')
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'), StorageClass='STANDARD_IA')
Hope that helps

Retrieve S3 file as Object instead of downloading to absolute system path

I just started learning and using S3, read the docs. Actually I didn't find anything to fetch the file into an object instead of downloading it from S3? if this could be possible, or I am missing something?
Actually I want to avoid additional IO after downloading the file.
You might be looking for the get_object() method of the boto3 S3 client:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
This will get you a response object dictionary with member Body that is a StreamingBody object, which you can use as normal file and call .read() method on it. To get the entire content of the S3 object into memory you would do something like this:
s3_client = boto3.client('s3')
s3_response_object = s3_client.get_object(Bucket=BUCKET_NAME_STRING, Key=FILE_NAME_STRING)
object_content = s3_response_object['Body'].read()
I prefer this approach, equivalent to a previous answer:
import boto3
s3 = boto3.resource('s3')
def read_s3_contents(bucket_name, key):
response = s3.Object(bucket_name, key).get()
return response['Body'].read()
But another approach could read the object into StringIO:
import StringIO
import boto3
s3 = boto3.resource('s3')
def read_s3_contents_with_download(bucket_name, key):
string_io = StringIO.StringIO()
s3.Object(bucket_name, key).download_fileobj(string_io)
return string_io.getvalue()
You could use StringIO and get file content from S3 using get_contents_as_string, like this:
import pandas as pd
from io import StringIO
from boto.s3.connection import S3Connection
AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('YOUR_BUCKET')
fileName = "test.csv"
content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))