Lambda Copy Image from one bucket to another after resize - amazon-web-services

I am receiving images in an S3 bucket. Using a Lambda function, I want to resize the images to a thumbnail and copy the thumbnail into another S3 bucket. The following is the code:
import json
import boto3
import ast
from urllib.request import urlopen
import time
from boto3.dynamodb.conditions import Key, Attr
from PIL import Image

s3_client = boto3.client('s3')
s3_res = boto3.resource('s3')

def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    tnlBuck = s3_res.Bucket('aivuthumbnail')
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        ikey = record['s3']['object']['key']
        params = {'Bucket': bucket, 'Key': ikey}
        proj = ikey.split('_')[0]
        outlet = ikey.split('_')[1]
        parameter = ikey.split('_')[2]
        dat = ikey.split('_')[3]
        table = client.Table("telescopeImageReceipt")
        table.put_item(Item={'image': ikey, 'project': proj, 'outlet': outlet, 'parameter': parameter, 'date': dat})
        url = s3_client.generate_presigned_url(ClientMethod='get_object', Params=params)
        with urlopen(url) as conn:
            image = Image.open(conn)
            MAX_SIZE = (100, 100)
            image.thumbnail(MAX_SIZE)
            image.copy("Bucket":tnlBuck)
I have changed the last line to various combinations, but nothing works. The Lambda function has full access to S3, DynamoDB and CloudWatch Logs.
The following were some of the options I tried and got the error messages:
Option tried: tnlBuck.copy(image, ikey)
Error: Expecting dictionary formatted: {"Bucket": bucket_name, "Key": key} but got <PIL.JpegImagePlugin.JpegImageFile image
Option tried: s3_client.copy({"Bucket":tnlBuck, "Key":ikey})
Error: TypeError: copy() missing 2 required positional arguments: 'Bucket' and 'Key'
Option tried: image.copy({"Bucket":tnlBuck, "Key":ikey})
Error: TypeError: copy() takes 1 positional argument but 2 were given
Other options had more or less similar errors or threw a syntax error.

You need to copy the image into the S3 bucket; you cannot call copy() on the PIL Image object for that.
Your code should be changed to this:
import json
import io
import boto3
import ast
from urllib.request import urlopen
import time
from boto3.dynamodb.conditions import Key, Attr
from PIL import Image

s3_client = boto3.client('s3')
s3_res = boto3.resource('s3')

def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    tnl_bucket = s3_res.Bucket('aivuthumbnail')
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        ikey = record['s3']['object']['key']
        params = {'Bucket': bucket, 'Key': ikey}
        proj = ikey.split('_')[0]
        outlet = ikey.split('_')[1]
        parameter = ikey.split('_')[2]
        dat = ikey.split('_')[3]
        table = client.Table("telescopeImageReceipt")
        table.put_item(Item={'image': ikey, 'project': proj, 'outlet': outlet, 'parameter': parameter, 'date': dat})
        url = s3_client.generate_presigned_url(ClientMethod='get_object', Params=params)
        with urlopen(url) as conn:
            image = Image.open(conn)
            MAX_SIZE = (100, 100)
            image.thumbnail(MAX_SIZE)
            # Save the thumbnail into an in-memory buffer and upload its bytes
            img_bytes = io.BytesIO()
            image.save(img_bytes, format='JPEG')
            img_bytes.seek(0)
            tnl_bucket.Object(ikey).put(Body=img_bytes.read())
You should use tnl_bucket to create a new object from the thumbnailed image bytes:
img_bytes = io.BytesIO()
image.save(img_bytes, format='JPEG')
img_bytes.seek(0)
tnl_bucket.Object(ikey).put(Body=img_bytes.read())
PIL can save either to a file path or to a BytesIO object. You need to return to the beginning of the stream with .seek(0) so that it can be read from the start and the full image bytes are passed to the put method.
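As a small variation (a sketch, assuming the same tnl_bucket resource, img_bytes buffer and ikey from the loop above), you could hand the buffer to upload_fileobj instead of calling read() yourself:
# Rewind the buffer, then let boto3 stream the file-like object to S3
img_bytes.seek(0)
tnl_bucket.upload_fileobj(img_bytes, ikey)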

Related

Lambda task timeout but no application log

I have a Python Lambda that is triggered by S3 uploads to a specific folder. The Lambda function processes the uploaded file and writes the output to another folder in the same S3 bucket.
The issue is that when I do a bulk upload using the AWS console, some files do not get processed. I ended up setting up a dead letter queue to catch these invocations. While inspecting a message in the queue, there is a request ID which I tried to find in the Lambda logs.
These are the logs for the request ID:
Now the odd part is that in the Python code the first line after the imports is print('Loading function'), yet it does not show up in the Lambda log.
I've added the Python code here. It should still print "Processing file name: " + key, which is inside the handler, right?
import urllib.parse
from datetime import datetime
import boto3
from constants import CONTENT_TYPE, XML_EXTENSION, VALIDATING
from xml_process import *
from s3Integration import download_file

print('Loading function')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    print("Processing file name: " + key)
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        xml_content = response["Body"].read()
        content_type = response["ContentType"]
        tree = ET.fromstring(xml_content)
        key_file_name = key.split("/")[1]
        # Creating a temporary copy by downloading file to get the namespaces
        temp_file_name = "/tmp/" + key_file_name
        download_file(key, temp_file_name)
        namespaces = {node[0]: node[1] for _, node in ET.iterparse(temp_file_name, events=['start-ns'])}
        for name, value in namespaces.items():
            ET.register_namespace(name, value)
        # Preparing path for file processing
        processed_file = key_file_name.split(".")[0] + "_processed." + key_file_name.split(".")[1]
        print(processed_file, "processed")
        db_record = XMLMapping(file_path=key,
                               processed_file_path=processed_file,
                               uploaded_by="lambda",
                               status=VALIDATING, uploaded_date=datetime.now(), is_active=True)
        session.add(db_record)
        session.commit()
        if key_file_name.split(".")[1] == XML_EXTENSION:
            if content_type in CONTENT_TYPE:
                xml_parse(tree, db_record, processed_file, True)
            else:
                print("Content Type is not valid. Provided value: ", content_type)
        else:
            print("File extension is not valid. Provided extension: ", key_file_name.split(".")[1])
        return "success"
    except Exception as e:
        print(e)
        raise e
I don't think it's a permission issue, as other files uploaded in the same batch were processed successfully.
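As a debugging aid (a suggestion, not part of the original code), logging the request ID and the raw event at the very top of the handler makes it possible to match a DLQ message's request ID to a specific CloudWatch log stream:
import json

def lambda_handler(event, context):
    # Log the request ID and the raw event first, so a DLQ message's request ID
    # can be matched to a log stream even if later processing fails or times out
    print("Request ID:", context.aws_request_id)
    print("Event:", json.dumps(event))
    # ... rest of the handler as above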

Dump lambda output to csv and have it email as an attachment

I have a Lambda function that generates a list of untagged buckets in my AWS environment. Currently I send the output to a Slack channel directly. Instead, I would like to have my Lambda dump the output to a CSV file and send it as a report. Here is the code for it; let me know if you need any other details.
import boto3
from botocore.exceptions import ClientError
import urllib3
import json

http = urllib3.PoolManager()

def lambda_handler(event, context):
    # Printing the S3 buckets with no tags
    s3 = boto3.client('s3')
    s3_re = boto3.resource('s3')
    buckets = []
    print('Printing buckets with no tags..')
    for bucket in s3_re.buckets.all():
        s3_bucket = bucket
        s3_bucket_name = s3_bucket.name
        try:
            response = s3.get_bucket_tagging(Bucket=s3_bucket_name)
        except ClientError:
            buckets.append(bucket)
            print(bucket)
    for bucket in buckets:
        data = {"text": "%s bucket has no tags" % (bucket)}
        r = http.request("POST", "https://hooks.slack.com/services/~/~/~",
                         body=json.dumps(data),
                         headers={"Content-Type": "application/json"})
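One way to approach this, as a sketch under stated assumptions rather than a definitive implementation: write the bucket names to a CSV in memory and send it through SES as a raw MIME message with the CSV attached. The sender and recipient addresses are placeholders that would need to be verified in SES, and send_report would be called from the handler with the buckets list collected above.
import io
import csv
import boto3
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

ses = boto3.client('ses')

def send_report(bucket_names, sender, recipient):
    # Build the CSV report in memory
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerow(['bucket_name'])
    for name in bucket_names:
        writer.writerow([name])

    # Build a MIME message with the CSV attached
    msg = MIMEMultipart()
    msg['Subject'] = 'Untagged S3 buckets report'
    msg['From'] = sender
    msg['To'] = recipient
    msg.attach(MIMEText('Attached is the list of untagged buckets.'))
    attachment = MIMEApplication(csv_buffer.getvalue())
    attachment.add_header('Content-Disposition', 'attachment', filename='untagged_buckets.csv')
    msg.attach(attachment)

    # send_raw_email is the SES call that supports attachments
    ses.send_raw_email(Source=sender,
                       Destinations=[recipient],
                       RawMessage={'Data': msg.as_string()})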

I would like to export DynamoDB Table to S3 bucket in CSV format using Python (Boto3)

This question has been asked earlier in the following link:
How to write dynamodb scan data's in CSV and upload to s3 bucket using python?
I have amended the code as advised in the comments. The code looks as follows:
import csv
import boto3
import json

dynamodb = boto3.resource('dynamodb')
db = dynamodb.Table('employee_details')

def lambda_handler(event, context):
    AWS_BUCKET_NAME = 'session5cloudfront'
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(AWS_BUCKET_NAME)
    path = '/tmp/' + 'employees.csv'
    try:
        response = db.scan()
        myFile = open(path, 'w')
        for i in response['Items']:
            csv.register_dialect('myDialect', delimiter=' ', quoting=csv.QUOTE_NONE)
            with myFile:
                writer = csv.writer(myFile, dialect='myDialect')
                writer.writerows(i)
            print(i)
    except:
        print("error")
    bucket.put_object(
        ACL='public-read',
        ContentType='application/csv',
        Key=path,
        # Body=json.dumps(i),
    )
    # print("here")
    body = {
        "uploaded": "true",
        "bucket": AWS_BUCKET_NAME,
        "path": path,
    }
    # print("then here")
    return {
        "statusCode": 200,
        "body": json.dumps(body)
    }
I am a novice; please help me fix this code, as it has a problem inserting data into the file created in the S3 bucket.
Thanks
I have revised the code to be simpler and to also handle paginated responses for tables with more than 1MB of data:
import csv
import boto3
import json

TABLE_NAME = 'employee_details'
OUTPUT_BUCKET = 'my-bucket'
TEMP_FILENAME = '/tmp/employees.csv'
OUTPUT_KEY = 'employees.csv'

s3_resource = boto3.resource('s3')
dynamodb_resource = boto3.resource('dynamodb')
table = dynamodb_resource.Table(TABLE_NAME)

def lambda_handler(event, context):
    with open(TEMP_FILENAME, 'w') as output_file:
        writer = csv.writer(output_file)
        header = True
        first_page = True

        # Paginate results
        while True:
            # Scan DynamoDB table
            if first_page:
                response = table.scan()
                first_page = False
            else:
                response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
            for item in response['Items']:
                # Write header row?
                if header:
                    writer.writerow(item.keys())
                    header = False
                writer.writerow(item.values())
            # Last page?
            if 'LastEvaluatedKey' not in response:
                break

    # Upload temp file to S3
    s3_resource.Bucket(OUTPUT_BUCKET).upload_file(TEMP_FILENAME, OUTPUT_KEY)

AWS Lambda : read image from S3 upload event

I am using Lambda to read image files when they are uploaded to S3 through an S3 trigger. The following is my code:
import json
import numpy as np
import face_recognition as fr

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print(bucket, key)
This correctly prints the bucket name and key. However, how do I read the image so that I can run the face_recognition module on it? Can I generate the ARN for each uploaded image and use it to read the image?
You can read the image from S3 directly:
import boto3

s3 = boto3.client('s3')
resp = s3.get_object(Bucket=bucket, Key=key)
image_bytes = resp['Body'].read()
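If the goal is to feed those bytes into face_recognition without writing to disk, one option (a sketch, assuming bucket and key come from the event record as in the question) is to wrap them in a BytesIO object, since load_image_file also accepts a file-like object:
import io
import boto3
import face_recognition as fr

s3 = boto3.client('s3')

def load_image_from_s3(bucket, key):
    # Fetch the object and hand the bytes to face_recognition as a file-like object
    resp = s3.get_object(Bucket=bucket, Key=key)
    return fr.load_image_file(io.BytesIO(resp['Body'].read()))

# Example usage inside the handler loop:
# image = load_image_from_s3(bucket, key)
# face_locations = fr.face_locations(image)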

Retrieve S3 file as Object instead of downloading to absolute system path

I just started learning and using S3 and have read the docs. I didn't find anything about fetching a file into an object instead of downloading it from S3. Is this possible, or am I missing something?
I want to avoid the additional I/O of downloading the file.
You might be looking for the get_object() method of the boto3 S3 client:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object
This will get you a response dictionary whose Body member is a StreamingBody object, which you can use like a normal file and call the .read() method on. To get the entire content of the S3 object into memory you would do something like this:
s3_client = boto3.client('s3')
s3_response_object = s3_client.get_object(Bucket=BUCKET_NAME_STRING, Key=FILE_NAME_STRING)
object_content = s3_response_object['Body'].read()
I prefer this approach, equivalent to a previous answer:
import boto3

s3 = boto3.resource('s3')

def read_s3_contents(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read()
But another approach could download the object into an in-memory buffer; note that download_fileobj writes bytes, so it needs a BytesIO buffer rather than StringIO:
import io
import boto3

s3 = boto3.resource('s3')

def read_s3_contents_with_download(bucket_name, key):
    bytes_io = io.BytesIO()
    s3.Object(bucket_name, key).download_fileobj(bytes_io)
    return bytes_io.getvalue()
You could also use the older boto library and get the file content from S3 using get_contents_as_string, like this:
import pandas as pd
from io import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'

aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('YOUR_BUCKET')
fileName = "test.csv"

# get_contents_as_string returns bytes, so decode before wrapping in StringIO
content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO(content.decode('utf-8')))