Dynamically resizing images using AWS S3/Lambda - amazon-web-services

So I have a web app that stores images in a single bucket, following this structure: a folder named after the user ID, containing pictures named with the user ID plus some random characters.
I already have a Python script that takes an uploaded image from one bucket (the root folder, or any folder I specify) and outputs a resized copy to another bucket/folder I specify. I'm wondering whether it's possible to do this in real time in my situation (I don't even need to export the resized pictures to another bucket; they can stay in the same folder the original was uploaded to). This is part of the script I'm using right now. Any help appreciated.
import uuid
import boto3
from PIL import Image

s3_client = boto3.client('s3')

def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.thumbnail((128, 128))
        image.save(resized_path)

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Lambda can only write to /tmp; flatten any folder prefix in the key
        # so the local file paths stay valid
        tmp_name = key.replace('/', '')
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmp_name)
        upload_path = '/tmp/resized-{}'.format(tmp_name)
        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, '{}-resized'.format(bucket), key)

Ah! It looks like you grabbed the sample code from the Lambda documentation: Tutorial: Using AWS Lambda with Amazon S3 - AWS Lambda
You can configure an Amazon S3 Event to trigger the AWS Lambda function whenever a new object is added to the S3 bucket. In fact, that is how the tutorial operates. This is effectively "real-time" because it triggers as soon as an object is uploaded. (Just configure the prefixes so it doesn't trigger an infinite loop.)
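For reference, here is a minimal sketch of setting that trigger up from Python. The bucket name, function ARN, and the uploads/ prefix are placeholders, the function also needs permission for S3 to invoke it (added separately, e.g. with lambda add_permission), and this call replaces any existing notification configuration on the bucket.
import boto3

s3 = boto3.client('s3')

# Placeholders: substitute your own bucket name and function ARN
s3.put_bucket_notification_configuration(
    Bucket='my-upload-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:resize-image',
            'Events': ['s3:ObjectCreated:*'],
            # Filter on a prefix/suffix so the resized output does not
            # re-trigger the function in an infinite loop
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'uploads/'},
                {'Name': 'suffix', 'Value': '.jpg'},
            ]}},
        }]
    },
)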
An alternative to resizing the images yourself is to use a service that can resize on-the-fly, such as:
Cloudinary
Imgix

Related

When uploading a file into aws s3 with boto3 is it possible to get the s3 object url as a return value?

I am uploading an image file into AWS S3 using the boto3 library. I noticed that the S3 object URL ending does not match the given Key. Is it possible to get the S3 object URL as a return value from the boto3 upload_file function?
example:
import boto3
s3 = boto3.client('s3')
file_location = ...
bucket = ...
folder = ...
filename = ...
url = s3.upload_file(
    Filename=file_location,
    Bucket=bucket,
    Key=f'{folder}/{filename}',
)
I read in the docs that it might be possible with a callback function, but I could not get it working with boto3.
If not, what is the simplest way to get the URL of the uploaded object?
Using the AWS SDK, you can get a URL for an object in an Amazon S3 bucket. I am not sure there is a Python example for this exact use case; however, you can get an idea of how to perform the task by looking at the Java example:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/s3/src/main/java/com/example/s3/GetObjectUrl.java
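In Python, a rough equivalent would be to build the URL yourself (upload_file itself returns None, and the exact URL format depends on your region and addressing style, so treat this as a sketch rather than an official API):
import boto3
from urllib.parse import quote

s3 = boto3.client('s3')
bucket = 'my-bucket'          # placeholder
key = 'folder/filename.jpg'   # placeholder

s3.upload_file(Filename='local.jpg', Bucket=bucket, Key=key)  # returns None

# Build a virtual-hosted-style URL; quote() applies the same percent-encoding
# (e.g. '#' -> '%23') that S3 uses, so the URL matches the stored key
region = s3.meta.region_name
url = f'https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}'

# For private objects, a pre-signed URL is usually what you want instead
presigned = s3.generate_presigned_url(
    'get_object', Params={'Bucket': bucket, 'Key': key}, ExpiresIn=3600
)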
Okay, my problem was that the filenames I was using to build the object Key contained a hash symbol (#), which has a specific meaning in a URL. AWS was automatically encoding the hash as %23, which created a mismatch between the Key and the URL. I changed the file naming convention so it no longer contains a hash, and the problem no longer occurs.

AWS Lambda create folder in S3 bucket

I have a Lambda that runs when files are uploaded to bucket S3-A and moves them to another bucket, S3-B. The challenge is that I need to create a folder inside the S3-B bucket named with the upload date of the files and move the files into that folder. Any help or ideas are greatly appreciated. It might sound confusing, so feel free to ask questions. Thank you!
Here's a Lambda function that can be triggered by an Amazon S3 Event and copy the object to another bucket under a date-based prefix:
import urllib.parse
from datetime import date
import boto3

DEST_BUCKET = 'bucket-b'

def lambda_handler(event, context):
    s3_client = boto3.client('s3')
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    # Prefix the destination key with today's date, e.g. '2021-06-01/<key>'
    dest_key = str(date.today()) + '/' + key
    s3_client.copy_object(
        Bucket=DEST_BUCKET,
        Key=dest_key,
        CopySource=f'{bucket}/{key}'
    )
The only thing to consider is timezones. The Lambda function runs in UTC and you might be expecting a slightly different date in your timezone, so you might need to adjust the time accordingly.
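If you do need a local date rather than UTC, a minimal sketch (assuming Python 3.9+ and an example timezone) would be:
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# 'America/New_York' is just an example; use your own timezone
local_date = datetime.now(ZoneInfo('America/New_York')).date()
dest_key = str(local_date) + '/' + key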
Just to clear up some confusion: in S3 there is no such thing as a folder. What you see in the interface is actually the result of a ListObjects call using a prefix; the prefix is what you are seeing as the folder hierarchy.
To illustrate, an object might have a key (a piece of metadata that defines its name) of folder/subfolder/file.txt; in the console you're actually browsing with a prefix of folder/subfolder/*. This makes sense if you think of S3 as a key-value store, where the value is the object itself.
For this reason, you can create a key under a prefix that has never existed before, without creating any other hierarchical structure first.
In your Lambda function, you can copy each object to its new key and then delete the old object. Some SDKs have a managed helper that performs these steps for you (such as Boto3 with the copy function), as sketched below.
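A minimal sketch of that copy-then-delete "move", with hypothetical bucket names and key:
import boto3
from datetime import date

s3 = boto3.client('s3')

# Hypothetical names for illustration
src_bucket, dest_bucket = 's3-a', 's3-b'
key = 'uploads/report.txt'
dest_key = str(date.today()) + '/' + key   # the date "folder" is just part of the key

# copy() is Boto3's managed copy helper (server-side, handles multipart copies)
s3.copy({'Bucket': src_bucket, 'Key': key}, dest_bucket, dest_key)
s3.delete_object(Bucket=src_bucket, Key=key)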

How to programmatically add files to AWS S3 Bucket?

I am trying to write a Lambda function that captures an image from my PC's webcam feed every time a trigger occurs. I want to programmatically add the images to an S3 bucket without overwriting them by reusing the same key (like "image.jpg"). What's the best way to increment the filename every time the function is called (e.g. image1.jpg, image2.jpg, etc.)? Note: I am using Boto3 to upload to S3 buckets.
You can store a current counter in DynamoDB or in Parameter Store (a sketch of the DynamoDB approach is below).
Or just use a timestamp with enough resolution and be done with it.
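A minimal sketch of the DynamoDB counter idea, assuming a hypothetical table named image-counter with a string partition key pk:
import boto3

dynamodb = boto3.client('dynamodb')

def next_image_key(table='image-counter'):
    # Atomically increment a counter item; ADD initialises the attribute at 0
    # on first use, so no seeding is required
    resp = dynamodb.update_item(
        TableName=table,
        Key={'pk': {'S': 'webcam'}},
        UpdateExpression='ADD imgcount :one',
        ExpressionAttributeValues={':one': {'N': '1'}},
        ReturnValues='UPDATED_NEW',
    )
    n = resp['Attributes']['imgcount']['N']
    return f'image{n}.jpg'
The returned key ('image1.jpg', 'image2.jpg', ...) can then be passed straight to upload_file.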
An easy way to do this is to append the date and time to the image name when uploading to S3; that way you always get a new key name in your bucket. The code below will do this.
import boto3
import datetime

# Append the current timestamp to the object name so every upload gets a unique key
i = datetime.datetime.now()
ptr = str(i)
smg = 'group1' + ptr + '.jpeg'

s3 = boto3.resource('s3')
s3.meta.client.upload_file('local/file/group1.jpeg', 'bucket_name', smg)

using file stored on aws s3 storage -- media storage as input of a process in django view

I am trying to put my Django app into production on AWS. I used Elastic Beanstalk to deploy it, so the EC2 instance is created and connected to an RDS MySQL database instance, and I use an Amazon S3 bucket to store my media files.
When a user uploads a video, it is stored in S3 as: "https://bucketname.s3.amazonaws.com/media/videos/videoname.mp4".
In Django development mode, I was using the video filename as the input to a batch script which produces a video as output.
My view in development mode is as follows:
def get(request):
    # get video
    var = Video.objects.order_by('id').last()
    v = '/home/myproject/media/videos/' + str(var)
    # call process
    subprocess.call("./step1.sh %s" % (str(v)), shell=True)
    return render(request, 'endexecut.html')
In production mode on AWS (the problem), I tried:
v = 'https://bucketname.s3.amazonaws.com/media/videos/' + str(var)
but the batch process doesn't accept the URL as input.
How can I use my video file from the S3 bucket in the process in my view, as described above? Thank you in advance.
You should not hard-code that string. There are a couple of things wrong with it:
"bucketname" is a placeholder, not the name of your bucket. You would need to use your actual bucket name for this to work at all.
Your MEDIA_URL (in settings.py) should point to the bucket URL where your files are saved (if it's configured correctly). Then you can use:
video_path = settings.MEDIA_URL + video_name
I am assuming you are using s3boto (django-storages) to handle your storage. That's not a prerequisite, but it makes your storage handling smarter and is highly recommended if you are pushing to S3 from a Django app.
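For context, a minimal sketch of that kind of configuration, assuming django-storages with the S3Boto3Storage backend (the bucket name and paths are placeholders):
# settings.py
AWS_STORAGE_BUCKET_NAME = 'bucketname'   # your real bucket name
AWS_S3_CUSTOM_DOMAIN = f'{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com'
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
MEDIA_URL = f'https://{AWS_S3_CUSTOM_DOMAIN}/media/'

# in the view, build the path from settings rather than hard-coding it
from django.conf import settings
video_path = settings.MEDIA_URL + 'videos/' + str(var)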

aws lambda s3 events for existing files

I am considering moving to Lambdas, and after spending some time reading the docs and various blogs with user experiences, I am still struggling with a simple question: is there a proposed/proper way to use Lambda with existing S3 files?
I have an S3 bucket that contains archived data spanning a couple of years. The data is rather large (hundreds of GB). Each file is a simple txt file, and each line in the file represents an event as a comma-separated string.
My end goal is to consume these files, parse each one line by line, apply some transformation, create batches of lines, and send them to an external service. From what I've read so far, if I write a proper Lambda, it will be triggered by an S3 event (for example the upload of a new file).
Is there a way to apply the Lambda to all the existing contents of my bucket?
Thanks
For existing objects you would need to write a script that lists them all and sends each item to a Lambda function somehow. I'd probably look into sending the location of each of your existing S3 objects to a Kinesis stream and configuring a Lambda function to pull records from that stream and process them, as sketched below.
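A rough sketch of that backfill script (bucket and stream names are placeholders; the Lambda subscribed to the stream would reuse your existing processing code):
import json
import boto3

s3 = boto3.client('s3')
kinesis = boto3.client('kinesis')

BUCKET = 'my-archive-bucket'      # placeholder
STREAM = 'existing-s3-objects'    # hypothetical Kinesis stream name

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get('Contents', []):
        # Each record only carries the object's location; the Lambda function
        # reading the stream does the actual line-by-line processing
        kinesis.put_record(
            StreamName=STREAM,
            Data=json.dumps({'bucket': BUCKET, 'key': obj['Key']}).encode('utf-8'),
            PartitionKey=obj['Key'],
        )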
Try using s3cmd.
s3cmd modify --recursive --add-header="touched:touched" s3://path/to/s3/bucket-or-folder
This will modify the objects' metadata and generate an event for each of them, which will invoke the Lambda function.
I had a similar problem, which I solved with minimal changes to my existing Lambda function. The solution involves creating an API Gateway trigger (in addition to the S3 trigger): the API Gateway trigger is used to process historical files in S3, while the regular S3 trigger processes files as they are uploaded to my S3 bucket.
Initially, I built my function to expect an S3 event as the trigger. Recall that S3 events have this structure, so I would look for the S3 bucket name and key to process, like so:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'], encoding='utf-8')
    temp_dir = tempfile.TemporaryDirectory()
    video_filename = os.path.basename(key)
    local_video_filename = os.path.join(temp_dir.name, video_filename)
    s3_client.download_file(bucket, key, local_video_filename)
But when the request comes from the API Gateway trigger, there is no "Records" object in the event. You can use query parameters in the API Gateway trigger, so the modification required to the above snippet of code is:
if 'Records' in event:
    # this means we are working off of an S3 event
    records_to_process = event['Records']
else:
    # this is for ad-hoc posts via the API Gateway trigger for Lambda
    records_to_process = [{
        "s3": {"bucket": {"name": event["queryStringParameters"]["bucket"]},
               "object": {"key": event["queryStringParameters"]["file"]}}
    }]

for record in records_to_process:
    # the lines below are the same as in the earlier snippet
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'], encoding='utf-8')
    temp_dir = tempfile.TemporaryDirectory()
    video_filename = os.path.basename(key)
    local_video_filename = os.path.join(temp_dir.name, video_filename)
    s3_client.download_file(bucket, key, local_video_filename)
(Screenshot: Postman result of sending the POST request.)
Try copying your bucket contents and catching the resulting create events with Lambda.
copy:
s3cmd sync s3://from/this/bucket/ s3://to/this/bucket
for larger buckets:
https://github.com/paultuckey/s3_bucket_to_bucket_copy_py