How to use boto3 to write to S3 Standard-Infrequent Access?

I searched the boto3 documentation but didn't find anything relevant. A linked post mentions that it can be done using
k.storage_class='STANDARD_IA'
Can someone share a full code snippet here? Many thanks.

New file
import boto3

client = boto3.client('s3')
client.upload_file(
    Filename='/tmp/foo.txt',
    Bucket='my-bucket',
    Key='foo.txt',
    ExtraArgs={
        'StorageClass': 'STANDARD_IA'
    }
)
Existing file
From How to change storage class of existing key via boto3:
import boto3

s3 = boto3.client('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.copy(
    CopySource=copy_source,
    Bucket='target-bucket',
    Key='target-key',
    ExtraArgs={
        'StorageClass': 'STANDARD_IA',
        'MetadataDirective': 'COPY'
    }
)
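If the goal is to change the storage class of an object in place rather than copy it to another bucket, one option is to copy the object onto itself. The sketch below assumes `s3_client` is a `boto3.client('s3')` instance; the helper name `change_storage_class` is hypothetical and only wraps the `copy` call shown above.

```python
def change_storage_class(s3_client, bucket, key, storage_class="STANDARD_IA"):
    """Copy an object onto itself so that only its storage class changes.

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    """
    source = {"Bucket": bucket, "Key": key}
    s3_client.copy(
        CopySource=source,
        Bucket=bucket,  # same bucket and key as the source
        Key=key,
        ExtraArgs={
            "StorageClass": storage_class,
            "MetadataDirective": "COPY",  # keep the existing metadata
        },
    )
```

Usage would look like `change_storage_class(boto3.client('s3'), 'mybucket', 'mykey')`.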

From the boto3 Storing Data example, it looks like the standard way to put objects in boto3 is
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
But to set the storage class, the S3.Object.put documentation suggests passing the parameter:
StorageClass='STANDARD_IA'
So combining the two, we have:
import boto3

s3 = boto3.resource('s3')
# Open the file in a with block so it is closed after the upload
with open('/tmp/hello.txt', 'rb') as body:
    s3.Object('mybucket', 'hello.txt').put(Body=body, StorageClass='STANDARD_IA')
Hope that helps
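The same idea can be written with the client API. This is a minimal sketch, assuming `s3_client` is a `boto3.client('s3')` instance; the helper name `put_with_storage_class` is made up for illustration, while `put_object` and its `StorageClass` parameter are part of the boto3 S3 client.

```python
def put_with_storage_class(s3_client, path, bucket, key,
                           storage_class="STANDARD_IA"):
    """Upload a local file with an explicit storage class.

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    """
    # The with block closes the file once the request has finished.
    with open(path, "rb") as body:
        return s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            StorageClass=storage_class,
        )
```

For large files, `upload_file` with `ExtraArgs` (as in the first answer) is probably the better fit, since it handles multipart uploads.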

Related

Copying files between S3 buckets -- only 2 files copy

I am trying to copy multiple files from one S3 bucket to another using a Lambda function, but only 2 files are copied to the destination bucket.
Here is my code:
# using python and boto3
import json
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_name = event['Records'][0]['s3']['object']['key']
    destination_bucket_name = 'nishantnkd'
    copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
    s3_client.copy_object(CopySource=copy_object,
                          Bucket=destination_bucket_name, Key=file_name)
    return {'statusCode': 3000,
            'body': json.dumps('File has been Successfully Copied')}
I presume that the Amazon S3 bucket is configured to trigger the AWS Lambda function when a new object is created.
When the Lambda function is triggered, it is possible that multiple event records are sent to the function. Therefore, it should loop through the event records like this:
# using python and boto3
import json
import urllib.parse
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    destination_bucket_name = 'nishantnkd'
    for record in event['Records']:  # This loop added
        source_bucket_name = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in the event, so decode them first
        file_name = urllib.parse.unquote_plus(record['s3']['object']['key'])  # Note this change too
        copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
        s3_client.copy_object(CopySource=copy_object, Bucket=destination_bucket_name, Key=file_name)
    # Return only after every record has been processed
    return {'statusCode': 200,
            'body': json.dumps('File has been Successfully Copied')}
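The record handling can be checked locally without touching S3. The sketch below pulls the bucket name and decoded key out of every record; the function name `iter_s3_objects` and the sample event are hypothetical, and the event shape is assumed to follow the `Records[...]['s3']` layout used in the code above.

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket_name, decoded_key) for every record in an S3 event."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Keys in S3 event notifications are URL-encoded ('+' means space).
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


# A made-up event with two records, one of which has an encoded key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "src"}, "object": {"key": "a.txt"}}},
        {"s3": {"bucket": {"name": "src"}, "object": {"key": "my+file%281%29.txt"}}},
    ]
}
objects = list(iter_s3_objects(sample_event))
```

Inside the handler, each yielded pair could then be passed to `copy_object` as in the answer.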

how to delete a file in s3 via lambda?

I am able to copy a file from one bucket to another, but I can't delete the file afterwards, and I'm not sure what I'm doing wrong. Any thoughts?
import boto3
import os
from requests_aws4auth import AWS4Auth
session = boto3.Session()
credentials = session.get_credentials()
aws4auth = AWS4Auth(credentials.access_key,credentials.secret_key,region, service, session_token=credentials.token)
s3 = boto3.resource('s3')
name = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
s3.meta.client.copy({'Bucket': name, 'key': key}, targetBucket, key)
s3.meta.client.delete({{'Bucket': name, 'key': key}})
Since you are creating s3 = boto3.resource('s3'), you can use it to delete the object.
To do this, create an Object and then call its delete method. For example:
s3 = boto3.resource('s3')
object_to_be_deleted = s3.Object(name, key)
object_to_be_deleted.delete()
Also, since you are using Lambda, make sure that your function's execution role has permission to delete the object and that no bucket policy prohibits the action.
I would suggest using boto3 client() rather than resource(). Anyway, here is what I tried and what worked for me:
To copy file
import boto3
client = boto3.client('s3')
copy_source = {'Bucket': 'from-bucket-s3', 'Key': 'cfn.json'}
client.copy(copy_source, 'to-bucket-s3', 'other-cfn.json')
To delete file
import boto3
client = boto3.client('s3')
client.delete_object(Bucket='to-bucket-s3', Key='other-cfn.json')
boto3 client() supports many more API operations than resource().
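The copy and delete steps above can be combined into a single "move" helper. This is only a sketch: `move_object` is a hypothetical name, and `s3_client` is assumed to be a `boto3.client('s3')` instance. The delete runs only if the copy did not raise.

```python
def move_object(s3_client, source_bucket, source_key, dest_bucket, dest_key=None):
    """Copy an object to dest_bucket, then delete the original.

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    Returns the key the object was written to.
    """
    dest_key = dest_key or source_key
    copy_source = {"Bucket": source_bucket, "Key": source_key}
    # If copy raises, the delete below is never reached.
    s3_client.copy(copy_source, dest_bucket, dest_key)
    s3_client.delete_object(Bucket=source_bucket, Key=source_key)
    return dest_key
```

Note that the dictionary keys are 'Bucket' and 'Key' with capital letters; a lowercase 'key', as in the question, is not accepted.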

Dump lambda output to csv and have it email as an attachment

I have a lambda function that generates a list of untagged buckets in AWS environment. Currently I send the output to a slack channel directly. Instead I would like to have my lambda dump the output to a csv file and send it as a report. Here is the code for it, let me know if you need any other details.
import boto3
from botocore.exceptions import ClientError
import urllib3
import json

http = urllib3.PoolManager()

def lambda_handler(event, context):
    # Printing the S3 buckets with no tags
    s3 = boto3.client('s3')
    s3_re = boto3.resource('s3')
    buckets = []
    print('Printing buckets with no tags..')
    for bucket in s3_re.buckets.all():
        s3_bucket = bucket
        s3_bucket_name = s3_bucket.name
        try:
            response = s3.get_bucket_tagging(Bucket=s3_bucket_name)
        except ClientError:
            buckets.append(bucket)
            print(bucket)
    for bucket in buckets:
        data = {"text": "%s bucket has no tags" % (bucket)}
        r = http.request("POST", "https://hooks.slack.com/services/~/~/~",
                         body=json.dumps(data),
                         headers={"Content-Type": "application/json"})
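One possible direction, sketched with the standard library only: build the CSV in memory and attach it to an email message. The function names, the subject line, the addresses, and the file name below are all hypothetical, and the input is assumed to be a list of bucket names (for example `bucket.name` for each entry collected above).

```python
import csv
import io
from email.message import EmailMessage


def build_csv_report(bucket_names):
    """Return CSV text with a header row and one row per bucket name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["bucket_name"])
    for name in bucket_names:
        writer.writerow([name])
    return buffer.getvalue()


def build_report_email(sender, recipient, csv_text,
                       filename="untagged_buckets.csv"):
    """Return an EmailMessage with the CSV text attached as a file."""
    message = EmailMessage()
    message["Subject"] = "Untagged S3 buckets"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("The attached CSV lists buckets without tags.")
    message.add_attachment(
        csv_text.encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename=filename,
    )
    return message


# Placeholder data; real names would come from the tagging check.
report = build_csv_report(["bucket-a", "bucket-b"])
email_message = build_report_email("sender@example.com",
                                   "team@example.com", report)
```

Actually sending the message is left out here. One option, assuming Amazon SES is set up and the addresses are verified, would be to pass `email_message.as_bytes()` as the raw message data to the SES client's `send_raw_email` call.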

How could I list all objects under a prefix in S3

There are more than 3k objects under the prefix. I use the following code to list all objects and get their names, but the API only retrieves 1000 objects.
s3_client = boto3.client('s3')
response = s3_client.list_objects(
Bucket = "my-bucket",
Prefix = "my-prefix",
MaxKeys=50000
)
s3 = boto3.resource('s3')
bucket = s3.Bucket(S3)
print(len(response['Contents'])) # only retrieve 1000
Use paginators to loop through multiple pages. See: Creating Paginators
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
operation_parameters = {'Bucket': 'my-bucket',
                        'Prefix': 'my-prefix'}
page_iterator = paginator.paginate(**operation_parameters)
for page in page_iterator:
    # 'Contents' is absent when a page has no matching objects
    print(page.get('Contents', []))
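To get just the object names, the pages can be flattened into one list. This is a sketch assuming `s3_client` is a `boto3.client('s3')` instance; `list_keys` is a hypothetical helper name, and it uses the `list_objects_v2` paginator, which boto3 also provides alongside `list_objects`.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every key under prefix, following pagination.

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without matches has no 'Contents' entry at all.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Each page holds at most 1000 objects, so for the 3k objects in the question the loop would run over several pages.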

Retrieve S3 file as Object instead of downloading to absolute system path

I just started learning and using S3 and have read the docs, but I couldn't find a way to fetch a file into an object instead of downloading it from S3 to disk. Is this possible, or am I missing something?
I want to avoid the additional I/O after downloading the file.
You might be looking for the get_object() method of the boto3 S3 client:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
This returns a response dictionary whose Body member is a StreamingBody object; you can treat it like a file and call its .read() method. To load the entire content of the S3 object into memory, you would do something like this:
s3_client = boto3.client('s3')
s3_response_object = s3_client.get_object(Bucket=BUCKET_NAME_STRING, Key=FILE_NAME_STRING)
object_content = s3_response_object['Body'].read()
I prefer this approach, equivalent to a previous answer:
import boto3

s3 = boto3.resource('s3')

def read_s3_contents(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read()
Another approach reads the object into an in-memory BytesIO buffer (download_fileobj writes bytes, so a text-mode StringIO will not work on Python 3):
import io
import boto3

s3 = boto3.resource('s3')

def read_s3_contents_with_download(bucket_name, key):
    bytes_io = io.BytesIO()
    s3.Object(bucket_name, key).download_fileobj(bytes_io)
    return bytes_io.getvalue()
You could also use StringIO and get the file content from S3 using get_contents_as_string from the legacy boto library (not boto3), like this:
import pandas as pd
from io import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('YOUR_BUCKET')
fileName = "test.csv"
# Decode to str so that StringIO can wrap the content
content = bucket.get_key(fileName).get_contents_as_string(encoding='utf-8')
reader = pd.read_csv(StringIO(content))
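With boto3, the same result can be had without pandas or a temporary file by parsing the bytes returned by get_object. The sketch below assumes `s3_client` is a `boto3.client('s3')` instance and that the object is UTF-8 encoded CSV with a header row; `read_csv_rows` is a hypothetical helper name.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key, encoding="utf-8"):
    """Fetch an object into memory and parse it as CSV with a header row.

    s3_client is assumed to be a boto3 S3 client (boto3.client('s3')).
    Returns a list with one dict per data row.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Body is a file-like stream; read() loads all of it into memory.
    text = response["Body"].read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

Since the whole object is held in memory, this is probably only suitable for files that comfortably fit there.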