I have a Lambda that runs when files are uploaded to bucket S3-A and moves those files to another bucket, S3-B. The challenge is that I need to create a folder inside the S3-B bucket named after the upload date of the files and move the files into that folder. Any help or ideas are greatly appreciated. It might sound confusing, so feel free to ask questions. Thank you!
Here's a Lambda function that can be triggered by an Amazon S3 Event and move the object to another bucket:
import urllib.parse
from datetime import date

import boto3

DEST_BUCKET = 'bucket-b'

def lambda_handler(event, context):
    s3_client = boto3.client('s3')

    # Extract the source bucket and object key from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Prefix the destination key with today's date, e.g. '2024-01-31/file.txt'
    dest_key = str(date.today()) + '/' + key

    # Server-side copy into the destination bucket (the dict form of
    # CopySource avoids problems with special characters in the key)
    s3_client.copy_object(
        Bucket=DEST_BUCKET,
        Key=dest_key,
        CopySource={'Bucket': bucket, 'Key': key}
    )

    # Delete the original so the object is moved rather than just copied
    s3_client.delete_object(Bucket=bucket, Key=key)
The only thing to consider is timezones. The Lambda function runs in UTC and you might be expecting a slightly different date in your timezone, so you might need to adjust the time accordingly.
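For example, here is a minimal sketch of computing the date in a specific timezone using the standard-library zoneinfo module (Python 3.9+); the 'Australia/Sydney' zone name is just a placeholder for whatever timezone you expect:
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# The Lambda's clock is UTC
utc_date = datetime.now(timezone.utc).date()

# Date in your own timezone (the zone name here is just a placeholder)
local_date = datetime.now(ZoneInfo('Australia/Sydney')).date()

# Use the local date when building the destination key
dest_key = str(local_date) + '/' + 'file.txt'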
Just to clear up some confusion: in S3 there is no such thing as a folder. What you see in the console is actually the result of a ListObjects call using a prefix, and that prefix is what appears as the folder hierarchy.
To illustrate: an object might have a key (the piece of metadata that defines its name) of folder/subfolder/file.txt; when you browse to it in the console you are really filtering on the prefix folder/subfolder/. This makes more sense if you think of S3 as a key-value store, where the value is the object itself.
For this reason you can write an object under a prefix that has never existed before without having to create any folder hierarchy first.
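As a small sketch (the bucket and key names are placeholders), writing an object at a brand-new prefix is enough to make the "folders" appear in the console:
import boto3

s3_client = boto3.client('s3')

# No need to create 'reports/' or 'reports/2024/' first;
# the console will display them as folders once this object exists.
s3_client.put_object(
    Bucket='my-example-bucket',
    Key='reports/2024/summary.txt',
    Body=b'hello'
)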
In your Lambda function, you could download each file locally and then upload it under its new object key (remembering to delete the old object), but you don't have to: most SDKs provide a copy operation that does the work server-side, such as Boto3 with its copy function.
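A minimal sketch of such a server-side "move" with Boto3's managed copy (bucket names and the key are placeholders; copy also handles multipart copies for large objects):
import boto3

s3 = boto3.resource('s3')

copy_source = {'Bucket': 'source-bucket', 'Key': 'folder/subfolder/file.txt'}

# Server-side copy into the destination bucket under the same key...
s3.Bucket('destination-bucket').copy(copy_source, 'folder/subfolder/file.txt')

# ...then delete the original to complete the move
s3.Object('source-bucket', 'folder/subfolder/file.txt').delete()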
I am uploading an image file to AWS S3 using the boto3 library. I noticed that the end of the S3 object URL does not match the Key I provided. Is it possible to get the S3 object URL as a return value from the boto3 upload_file function?
example:
import boto3

s3 = boto3.client('s3')

file_location = ...
bucket = ...
folder = ...
filename = ...

url = s3.upload_file(
    Filename=file_location,
    Bucket=bucket,
    Key=f'{folder}/{filename}',
)
I read in the docs that it might be possible with a callback function, but I could not get it working with boto3.
If not, what is the simplest way to get the uploaded object's URL?
Using the AWS SDK, you can get a URL for an object in an Amazon S3 bucket. I am not sure there is a Python example for this use case; however, you can get an idea of how to perform the task by looking at the Java example:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/s3/src/main/java/com/example/s3/GetObjectUrl.java
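In Python, a rough equivalent looks like the sketch below (not an official helper): upload_file returns None, so you either build the URL yourself from the bucket, region, and URL-encoded key, or generate a presigned URL. The bucket name, key, and local filename are placeholders:
import urllib.parse
import boto3

s3_client = boto3.client('s3')

bucket = 'my-example-bucket'     # placeholder
key = 'folder/my image#1.jpg'    # placeholder

s3_client.upload_file('local.jpg', bucket, key)

# Option 1: build the public-style URL yourself (only useful if the object is readable)
region = s3_client.get_bucket_location(Bucket=bucket)['LocationConstraint'] or 'us-east-1'
url = f'https://{bucket}.s3.{region}.amazonaws.com/{urllib.parse.quote(key)}'

# Option 2: a presigned URL, which also works for private objects (expires in an hour here)
presigned = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': bucket, 'Key': key},
    ExpiresIn=3600
)
Note that urllib.parse.quote encodes characters such as '#' into '%23', which is exactly the mismatch described in the follow-up below.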
Okay, my problem was that the filenames I was using to build the file Key contained a hash (#) symbol, which has a specific meaning in a URL. AWS was automatically encoding the hashes as %23, which created a mismatch between the Key and the URL. I have now changed the file naming convention so that it contains no hashes, and the problem no longer occurs.
I am new to AWS. I'm looking for the Python boto3 API calls for the scenarios below:
API call to get the list of files using s3 path
API call to remove all the files under s3 path
API call to check whether the given S3 path exists or not
I'd appreciate it if anyone can help me with this.
"Paths" (directories, folders) do not actually exist in Amazon S3. It uses a flat (non-hierarchical) storage model where the filename (Key) of each object contains the full path of the object.
However, much of the functionality of paths is still provided by referencing a Prefix, which refers to the first part of a Key.
For example, let's say there is an object with a Key of: invoices/january/invoice.txt
It has a Prefix of invoices/ and also has a prefix of invoices/january/. A Prefix simply checks "Does the Key start with this string?"
Therefore, you can get the list of files under an S3 path with:
import boto3

s3_resource = boto3.resource('s3')

for object in s3_resource.Bucket('my-bucket').objects.filter(Prefix='invoices/'):
    print(object.key)
Or, using the client method:
import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='invoices/')
for object in response['Contents']:
    print(object['Key'])
To remove all the files under an S3 path, you would need to use the above code to iterate through each object and then call delete_object(). Alternatively, you could build a list of Keys to delete and then call delete_objects(), which accepts up to 1000 keys per call, as sketched below.
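A minimal sketch of the delete_objects() approach (the bucket name and prefix are placeholders; only one page of up to 1000 objects is handled here):
import boto3

s3_client = boto3.client('s3')

# Collect the keys under the prefix
response = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='invoices/')
keys = [{'Key': obj['Key']} for obj in response.get('Contents', [])]

if keys:
    # Delete them in a single batch call
    s3_client.delete_objects(Bucket='my-bucket', Delete={'Objects': keys})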
To check whether a given S3 object exists or not, you can call head_object(). Please note that this works on an object, but it will not work on a "path", because directories do not actually exist.
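For example (the bucket and key names are placeholders), head_object raises a ClientError with a 404 code when the object is missing:
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def object_exists(bucket, key):
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            return False
        raise  # some other error, e.g. missing permissions

print(object_exists('my-bucket', 'invoices/january/invoice.txt'))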
However, if you create a Folder in the Amazon S3 management console, a zero-length object is created with the name of the directory. This will make it "appear" that there is a directory, but it is not required. You can create an object in any path without actually creating the directories. They will simply "appear". Then, when all objects in that directory are deleted, the directory will no longer be displayed. It's magic!
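If you do want the same placeholder the console creates, it is simply a zero-length object whose key ends in a slash; a sketch (bucket name is a placeholder):
import boto3

s3_client = boto3.client('s3')

# Equivalent of "Create folder" in the console: a zero-byte object ending in '/'
s3_client.put_object(Bucket='my-bucket', Key='invoices/february/')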
See also: Amazon S3 examples — Boto3 documentation
I have two different accounts:
1) Account one, which is the vendor's account; they gave us an Access ID and secret key for access.
2) Our account, where we have full access.
We need to copy files from the vendor's S3 bucket to our S3 bucket using boto3 Python 3.7 scripts.
What is the best function in boto3 to use to get the best performance?
I tried using get_object and put_object. The problem with that approach is that I am actually reading the file body and writing it back. How do we copy directly from one account to another in a faster way?
Is there any setup I can do from my end to copy directly? We are okay with using Lambda as well, as long as I get good performance. I cannot request any changes from the vendor except that they give us access keys.
Thanks
Tom
One of the fastest ways to copy data between two buckets is S3DistCp. It is only worth using if you have a lot of files to copy, since it copies them in a distributed way using an EMR cluster.
A Lambda function with boto3 is an option only if the copy completes within the Lambda timeout; if it takes longer, you can consider using ECS tasks (basically Docker containers).
Regarding how to copy with boto3, you can check the boto3 documentation.
It looks like you can do something like this:
import boto3

s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

source_bucket_name = 'src_bucket_name'
destination_bucket_name = 'dst_bucket_name'

# List every object under the prefix, following pagination
paginator = s3_client.get_paginator('list_objects')
response_iterator = paginator.paginate(
    Bucket=source_bucket_name,
    Prefix='your_prefix',
    PaginationConfig={
        'PageSize': 1000,
    }
)
objs = response_iterator.build_full_result()['Contents']
keys_to_copy = [o['Key'] for o in objs]  # or use a generator (o['Key'] for o in objs)

# Server-side copy of each object into the destination bucket
for key in keys_to_copy:
    print(key)
    copy_source = {
        'Bucket': source_bucket_name,
        'Key': key
    }
    s3_resource.meta.client.copy(copy_source, destination_bucket_name, key)
The proposed solution first gets the names of the objects to copy, then calls the copy command for each object.
To make it faster, instead of a sequential for loop you can run the copies concurrently, for example with asyncio or a thread pool, as sketched below.
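A minimal sketch of the concurrent variant using a thread pool (bucket names, keys, and the worker count are placeholder choices; boto3 clients are generally safe to share across threads):
from concurrent.futures import ThreadPoolExecutor

import boto3

s3_client = boto3.client('s3')

source_bucket_name = 'src_bucket_name'       # placeholder
destination_bucket_name = 'dst_bucket_name'  # placeholder

def copy_one(key):
    # Server-side copy of a single object
    copy_source = {'Bucket': source_bucket_name, 'Key': key}
    s3_client.copy(copy_source, destination_bucket_name, key)
    return key

# Placeholder list; in practice build it with the paginator shown above
keys_to_copy = ['your_prefix/a.txt', 'your_prefix/b.txt']

# Run up to 20 copies at a time (tune the worker count to taste)
with ThreadPoolExecutor(max_workers=20) as executor:
    for done_key in executor.map(copy_one, keys_to_copy):
        print('copied', done_key)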
If you run the code in a Lambda or an ECS task, remember to create an IAM role with access to both the source bucket and the destination bucket.
I am trying to write a Lambda function that captures an image from my PC's webcam feed every time a trigger occurs. I want to programmatically add the images to an S3 bucket without overwriting them under the same key (like "image.jpg"). What's the best way to increment the filename every time the function is called (e.g. image1.jpg, image2.jpg, etc.)? Note: I am using Boto3 to upload to S3.
You can store a running counter in DynamoDB or in Parameter Store (a DynamoDB sketch is shown below).
Or just use a timestamp with enough resolution and be done with it.
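If you go the counter route, here is a minimal sketch using a DynamoDB atomic counter; the table name 'image-counter', its key schema, and the bucket/file names are assumptions for illustration:
import boto3

dynamodb = boto3.client('dynamodb')
s3_client = boto3.client('s3')

def next_image_key():
    # Atomically increment the counter and read back the new value
    response = dynamodb.update_item(
        TableName='image-counter',              # hypothetical table
        Key={'counter_name': {'S': 'webcam'}},  # hypothetical partition key
        UpdateExpression='ADD #n :one',
        ExpressionAttributeNames={'#n': 'current_value'},
        ExpressionAttributeValues={':one': {'N': '1'}},
        ReturnValues='UPDATED_NEW'
    )
    n = response['Attributes']['current_value']['N']
    return f'image{n}.jpg'

s3_client.upload_file('/tmp/capture.jpg', 'my-webcam-bucket', next_image_key())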
An easy way to do this is to add the date and time to the name of the image when uploading it to S3; that way you will always have a new key name in your bucket. The code given below will do the job.
import boto3
import datetime

# Build a unique key by appending the current timestamp to the base name
i = datetime.datetime.now()
ptr = str(i)
smg = 'group1' + ptr + '.jpeg'

s3 = boto3.resource('s3')
s3.meta.client.upload_file('local/file/group1.jpeg', 'bucket_name', smg)
If there are too many files in a bucket and I want to get only the 100 newest files, how can I get just that list?
s3.bucket.list does not seem to have that function. Does anybody know how to do this?
Please let me know. Thanks.
There is no way to do this type of filtering on the service side. The S3 API does not support it. You might be able to accomplish something like this by using prefixes in your object names. For example, if you named all of your objects using a pattern like this:
YYYYMMDD/<objectname>
20140618/foobar (as an example)
you could use the prefix parameter of the ListBucket request in S3 to return only the objects that were stored today. In boto, this would look like:
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

for key in bucket.list(prefix='20140618'):
    # do something with the key object
    pass
You would still have to retrieve all of the objects with that prefix and then sort them locally by their last_modified date, but that would be much easier than listing all of the objects in the bucket and then sorting.
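For example, a sketch of that local sort with legacy boto (bucket name and prefix are placeholders); the newest 100 keys come out first:
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

# List only today's objects, then sort them newest-first by last_modified
keys = list(bucket.list(prefix='20140618'))
keys.sort(key=lambda k: k.last_modified, reverse=True)

for key in keys[:100]:
    print(key.name, key.last_modified)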
The other option would be to store metadata about the S3 objects in a database like DynamoDB and then query that database to find the objects to retrieve from S3.
You can find out more about hierarchical listing in S3 in the Amazon S3 documentation on listing keys using a prefix and delimiter.
You can try this code; it worked for me.
import boto, time

con = boto.connect_s3()
key_repo = []
bucket = con.get_bucket('<your bucket name>')
bucket_keys = bucket.get_all_keys()

# Pair each key name with its parsed last-modified timestamp
for object in bucket_keys:
    t = (object.key, time.strptime(object.last_modified[:19], "%Y-%m-%dT%H:%M:%S"))
    key_repo.append(t)

# Sort newest first
key_repo.sort(key=lambda item: item[1], reverse=True)

for key in key_repo[:10]:  # top 10 items in the list; use [:100] for the 100 newest
    print(key[0], ' ', key[1])
PS: I am a beginner in Python, so the code might not be optimized. Feel free to edit the answer to improve it.