I just started learning and using S3 and have read the docs, but I couldn't find a way to fetch a file from S3 into an in-memory object instead of downloading it to disk. Is this possible, or am I missing something?
I want to avoid the additional IO after downloading the file.
You might be looking for the get_object() method of the boto3 S3 client:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
This returns a response dictionary whose Body member is a StreamingBody object; you can use it like a normal file and call its .read() method. To get the entire content of the S3 object into memory, you would do something like this:
import boto3
s3_client = boto3.client('s3')
s3_response_object = s3_client.get_object(Bucket=BUCKET_NAME_STRING, Key=FILE_NAME_STRING)
object_content = s3_response_object['Body'].read()
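If a file-like object is wanted rather than raw bytes, the result of .read() can be wrapped in io.BytesIO. This is only a minimal sketch, assuming the object fits in memory; the helper name load_object_to_buffer is made up for illustration, and the client is passed in as an argument so the function can be tried with a stub instead of a real bucket.

```python
import io


def load_object_to_buffer(s3_client, bucket, key):
    """Fetch an S3 object and return its content as an in-memory, seekable buffer."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Body is a streaming object; read() pulls the whole payload into memory.
    data = response["Body"].read()
    return io.BytesIO(data)
```

With a real client this would be called as load_object_to_buffer(boto3.client('s3'), BUCKET_NAME_STRING, FILE_NAME_STRING), and the returned buffer can be handed to anything that expects a binary file.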
I prefer this approach, equivalent to a previous answer:
import boto3
s3 = boto3.resource('s3')
def read_s3_contents(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read()
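But another approach could read the object into an in-memory buffer. On Python 3 this has to be a BytesIO, because download_fileobj writes bytes:
import io
import boto3
s3 = boto3.resource('s3')
def read_s3_contents_with_download(bucket_name, key):
    # download_fileobj needs a binary file-like object with a write() method
    bytes_io = io.BytesIO()
    s3.Object(bucket_name, key).download_fileobj(bytes_io)
    return bytes_io.getvalue()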
You could use StringIO and get the file content from S3 using get_contents_as_string from the legacy boto library, like this:
import pandas as pd
from io import StringIO
from boto.s3.connection import S3Connection
AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('YOUR_BUCKET')
fileName = "test.csv"
# get_contents_as_string returns bytes on Python 3, so decode before wrapping in StringIO
content = bucket.get_key(fileName).get_contents_as_string().decode('utf-8')
reader = pd.read_csv(StringIO(content))
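Since boto is the older library, the same idea might look roughly like this with a boto3 client. This is a sketch, not a drop-in replacement: read_csv_rows is a hypothetical helper, the file is assumed to be UTF-8 text, and the standard csv module stands in for pandas to keep the example dependency-free.

```python
import csv
import io


def read_csv_rows(s3_client, bucket, key, encoding="utf-8"):
    """Return the rows of a CSV object stored in S3 as a list of dicts."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    # Decode the bytes first so the parser receives a text buffer.
    text = io.StringIO(body.decode(encoding))
    return list(csv.DictReader(text))
```

The same io.StringIO buffer could be passed to pd.read_csv instead of csv.DictReader if a DataFrame is wanted.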
I have recently completed this tutorial from AWS on how to create a thumbnail generator using lambda and S3: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-tutorial.html . Basically, I'm uploading an image file to my '-source' bucket and then lambda generates a thumbnail and uploads it to my '-thumbnail' bucket.
Everything works as expected. However, I wanted to use s3 object URL in the '-thumbnail' bucket so that I can load the image from there for a small app I'm building. The issue I'm having is that the URL doesn't display the image in the browser but instead downloads the file. This causes my app to error out.
I did some research and learned that I had to change the content-type to image/jpeg and then also made the object public using ACL. This works for all of the other buckets I have except the one that has the thumbnail. I have recreated this bucket several times. I even copied the settings from my existing buckets. I have compared settings to all the other buckets and they appear to be the same.
I wanted to reach out and see if anyone has run into this type of issue before, or if there is something I might be missing.
Here is the code I'm using to generate the thumbnail.
import boto3
from boto3.dynamodb.conditions import Key, Attr
import os
import sys
import uuid
import urllib.parse
from urllib.parse import unquote_plus
from PIL.Image import core as _imaging
import PIL.Image
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['DB_TABLE_NAME'])
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    recordId = key
    tmpkey = key.replace('/', '')
    download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmpkey)
    upload_path = '/tmp/resized-{}'.format(tmpkey)
    try:
        s3.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        bucket = bucket.replace('source', 'thumbnail')
        s3.upload_file(upload_path, bucket, key)
        print(f"Thumbnail created and uploaded to {bucket} successfully.")
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
    else:
        s3.put_object_acl(ACL='public-read',
                          Bucket=bucket,
                          Key=key)
    #create image url to add to dynamo
    url = f"https://postreader-thumbnail.s3.us-west-2.amazonaws.com/{key}"
    print(url)
    #create record id to update the appropriate record in the 'Posts' table
    recordId = key.replace('.jpeg', '')
    #add the image_url column along with the image url as the value
    table.update_item(
        Key={'id':recordId},
        UpdateExpression=
            "SET #statusAtt = :statusValue, #img_urlAtt = :img_urlValue",
        ExpressionAttributeValues=
            {':statusValue': 'UPDATED', ':img_urlValue': url},
        ExpressionAttributeNames=
            {'#statusAtt': 'status', '#img_urlAtt': 'img_url'},
    )
def resize_image(image_path, resized_path):
    with PIL.Image.open(image_path) as image:
        #change to standard/hard-coded size
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)
This can happen if the Content-Type of the file you're uploading is binary/octet-stream. You can modify your script as below to provide a custom content type while uploading:
s3.upload_file(upload_path, bucket, key,
               ExtraArgs={'ContentType': "image/jpeg"})
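If the function may receive more than one image format, the content type could be guessed from the key instead of being hard-coded. A minimal sketch using the standard mimetypes module; content_type_args is a made-up helper name, and the fallback value is simply the generic type mentioned above.

```python
import mimetypes


def content_type_args(key, default="binary/octet-stream"):
    """Build an ExtraArgs dict whose ContentType is guessed from the key's extension."""
    guessed, _encoding = mimetypes.guess_type(key)
    # guess_type returns None for unknown extensions, so fall back to the default.
    return {"ContentType": guessed or default}
```

The upload call would then become s3.upload_file(upload_path, bucket, key, ExtraArgs=content_type_args(key)).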
After more troubleshooting the issue was apparently related to the bucket's name. I created a new bucket with a different name than it had previously. After doing so I was able to upload and share images without issue.
I edited my code so that the lambda uploads to the new bucket name and I am able to share the image via URL without downloading.
I am able to copy a file from one bucket to another, but I can't delete the file, and I'm not sure whether I'm doing something wrong. Any thoughts?
import boto3
import os
from requests_aws4auth import AWS4Auth
session = boto3.Session()
credentials = session.get_credentials()
aws4auth = AWS4Auth(credentials.access_key,credentials.secret_key,region, service, session_token=credentials.token)
s3 = boto3.resource('s3')
name = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
s3.meta.client.copy({'Bucket': name, 'key': key}, targetBucket, key)
s3.meta.client.delete({{'Bucket': name, 'key': key}})
Since you are creating s3 = boto3.resource('s3'), you can use it to delete the object.
To do this, create an Object and then call its delete method. For example:
s3 = boto3.resource('s3')
object_to_be_deleted = s3.Object(name, key)
object_to_be_deleted.delete()
Also, since you are using Lambda, make sure that your function's execution role has permission to delete the object and that no bucket policy prohibits such an action.
I would suggest using boto3 client() rather than resource(). Anyway, here is what I tried and what worked for me:
To copy a file
import boto3
client = boto3.client('s3')
copy_source = {'Bucket': 'from-bucket-s3', 'Key': 'cfn.json'}
client.copy(copy_source, 'to-bucket-s3', 'other-cfn.json')
To delete a file
import boto3
client = boto3.client('s3')
client.delete_object(Bucket='to-bucket-s3', Key='other-cfn.json')
boto3 client() supports a larger number of APIs than resource().
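Putting the two client calls together, a "move" could be sketched as a copy followed by a delete of the source object (the snippet above deletes the copy, which is a different operation). The function name move_object is hypothetical; the client is passed in so the logic can be exercised without AWS access.

```python
def move_object(s3_client, source_bucket, key, target_bucket):
    """Copy an object to another bucket, then delete the original."""
    copy_source = {"Bucket": source_bucket, "Key": key}  # note the capitalised 'Key'
    s3_client.copy(copy_source, target_bucket, key)
    # Only reached if the copy did not raise, so the source is not lost on failure.
    s3_client.delete_object(Bucket=source_bucket, Key=key)
```

Inside a Lambda handler this would be called with the bucket name and key taken from the event record.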
I am receiving images in an S3 bucket. Using a Lambda function, I want to resize the images to a thumbnail and copy the thumbnail into another S3 bucket. The following is the code:
import json
import boto3
import ast
from urllib.request import urlopen
import time
from boto3.dynamodb.conditions import Key, Attr
from PIL import Image
s3_client=boto3.client('s3')
s3_res = boto3.resource('s3')
def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    tnlBuck = s3_res.Bucket('aivuthumbnail')
    for record in event['Records']:
        bucket=record['s3']['bucket']['name']
        ikey = record['s3']['object']['key']
        params = {'Bucket': bucket, 'Key': ikey}
        proj = ikey.split('_')[0]
        outlet = ikey.split('_')[1]
        parameter = ikey.split('_')[2]
        dat = ikey.split('_')[3]
        table = client.Table("telescopeImageReceipt")
        table.put_item(Item={'image':ikey,'project':proj,'outlet':outlet,'parameter':parameter,'date':dat})
        url = s3_client.generate_presigned_url(ClientMethod='get_object', Params=params)
        with urlopen(url) as conn:
            image = Image.open(conn)
            MAX_SIZE = (100, 100)
            image.thumbnail(MAX_SIZE)
            image.copy("Bucket":tnlBuck)
I have changed the last line to various combinations, but nothing works. The Lambda function has full access to S3, DynamoDB and CloudWatch Logs.
These are some of the options I tried, with the error messages I got:
Option tried: tnlBuck.copy(image, ikey)
Error: Expecting dictionary formatted: {"Bucket": bucket_name, "Key": key} but got <PIL.JpegImagePlugin.JpegImageFile image
Option tried: s3_client.copy({"Bucket":tnlBuck, "Key":ikey})
Error: TypeError: copy() missing 2 required positional arguments: 'Bucket' and 'Key'
Option tried: image.copy({"Bucket":tnlBuck, "Key":ikey})
Error: TypeError: copy() takes 1 positional argument but 2 were given
Other options produced similar errors or threw a syntax error.
You need to upload through the S3 bucket object, not through the PIL Image object: image.copy() only duplicates the image in memory.
Your code should be changed to this:
import json
import io
import boto3
import ast
from urllib.request import urlopen
import time
from boto3.dynamodb.conditions import Key, Attr
from PIL import Image
s3_client=boto3.client('s3')
s3_res = boto3.resource('s3')
def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    tnlBuck = s3_res.Bucket('aivuthumbnail')
    for record in event['Records']:
        bucket=record['s3']['bucket']['name']
        ikey = record['s3']['object']['key']
        params = {'Bucket': bucket, 'Key': ikey}
        proj = ikey.split('_')[0]
        outlet = ikey.split('_')[1]
        parameter = ikey.split('_')[2]
        dat = ikey.split('_')[3]
        table = client.Table("telescopeImageReceipt")
        table.put_item(Item={'image':ikey,'project':proj,'outlet':outlet,'parameter':parameter,'date':dat})
        url = s3_client.generate_presigned_url(ClientMethod='get_object', Params=params)
        with urlopen(url) as conn:
            image = Image.open(conn)
            MAX_SIZE = (100, 100)
            image.thumbnail(MAX_SIZE)
            # save the thumbnail into an in-memory buffer and upload its bytes
            img_bytes = io.BytesIO()
            image.save(img_bytes, format='JPEG')
            img_bytes.seek(0)
            tnlBuck.Object(ikey).put(Body=img_bytes.read())
You should use tnlBuck to create a new object from the thumbnail image bytes.
img_bytes = io.BytesIO()
image.save(img_bytes, format='JPEG')
img_bytes.seek(0)
tnlBuck.Object(ikey).put(Body=img_bytes.read())
PIL can save to a file path or to a BytesIO buffer. After saving, you need to return to the beginning of the stream with .seek(0) so that it can be read from the start to get the bytes for the put method.
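The effect of the stream position can be seen with the standard library alone. In this small sketch a plain write() stands in for image.save(), which is an assumption made only to keep the example free of PIL:

```python
import io

buffer = io.BytesIO()
buffer.write(b"thumbnail bytes")  # stands in for image.save(buffer, format='JPEG')

at_end = buffer.read()      # the position is at the end, so nothing is returned
buffer.seek(0)              # rewind to the start of the stream
from_start = buffer.read()  # now the full content comes back
whole = buffer.getvalue()   # getvalue() ignores the current position
```

Because getvalue() does not depend on the position, passing Body=img_bytes.getvalue() would be an alternative to seek(0) followed by read().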
I am using Lambda to read image files when they are uploaded to S3 through an S3 trigger. The following is my code:
import json
import numpy as np
import face_recognition as fr
def lambda_handler(event, context):
    for record in event['Records']:
        bucket=record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print(bucket,key)
This correctly prints the bucket name and key. However, how do I read the image so that I can run the face-recognition module on it? Can I generate the ARN for each uploaded image and use it to read the image?
You can read the image from S3 directly:
import boto3
s3 = boto3.client('s3')
resp = s3.get_object(Bucket=bucket, Key=key)
image_bytes = resp['Body'].read()
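One detail worth handling is that object keys arrive URL-encoded in S3 event notifications, so a key containing spaces or parentheses needs decoding before it is passed to get_object. A possible sketch of the whole step follows; iter_s3_records and read_image_bytes are hypothetical helper names, and the client is injected so the code can be tried with a stub.

```python
from urllib.parse import unquote_plus


def iter_s3_records(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (e.g. spaces as '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def read_image_bytes(s3_client, event):
    """Return {key: bytes} for every object referenced in the event."""
    return {
        key: s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        for bucket, key in iter_s3_records(event)
    }
```

The resulting bytes can be wrapped in io.BytesIO if the image library expects a file-like object.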
I searched in the boto3 doc but didn't find relevant information there. In this link, it is mentioned that it can be done using
k.storage_class='STANDARD_IA'
Can someone share a full code snippet here? Many thanks.
New file
import boto3
client = boto3.client('s3')
client.upload_file(
    Filename = '/tmp/foo.txt',
    Bucket = 'my-bucket',
    Key = 'foo.txt',
    ExtraArgs = {
        'StorageClass': 'STANDARD_IA'
    }
)
Existing file
From How to change storage class of existing key via boto3:
import boto3
s3 = boto3.client('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.copy(
    CopySource = copy_source,
    Bucket = 'target-bucket',
    Key = 'target-key',
    ExtraArgs = {
        'StorageClass': 'STANDARD_IA',
        'MetadataDirective': 'COPY'
    }
)
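To change the class of an object without moving it, the object can be copied onto itself with a new StorageClass. The following is a sketch under a few assumptions: change_storage_class is a made-up name, it uses the lower-level copy_object call (which is limited to objects of up to 5 GB), and it reads the result back with head_object.

```python
def change_storage_class(s3_client, bucket, key, storage_class="STANDARD_IA"):
    """Rewrite an object in place with a new storage class and return the class S3 reports."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},  # copy the object onto itself
        StorageClass=storage_class,
        MetadataDirective="COPY",  # keep the existing metadata
    )
    head = s3_client.head_object(Bucket=bucket, Key=key)
    # S3 omits StorageClass in the response for objects in the STANDARD class.
    return head.get("StorageClass", "STANDARD")
```

For larger objects, the managed s3.copy call shown above is the safer choice because it can fall back to a multipart copy.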
From the boto3 Storing Data example, it looks like the standard way to put objects in boto3 is
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
But to set the storage class, the S3.Object.put documentation suggests we'd want to use the parameter:
StorageClass='STANDARD_IA'
So combining the two, we have:
import boto3
s3 = boto3.resource('s3')
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'), StorageClass='STANDARD_IA')
Hope that helps