Add whole S3 images bucket to Amazon Rekognition collection

In the AWS CLI, I can only add a single image to a collection at a time.
Is there any way to add the whole S3 bucket to a collection?

The IndexFaces() API call accepts only one image at a time, but can index up to 100 faces from that image.
If you wish to add faces from multiple images (eg a whole bucket or folder), you would need to call IndexFaces() multiple times (once per image). This would involve a call to Amazon S3 to list the files, then a loop to call IndexFaces().
It would be relatively simple in a scripting language like Python.
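As an illustration, here is a minimal Python sketch of that loop using boto3. The bucket name and collection ID are placeholders, and it assumes every object in the bucket is an image format Rekognition can read:

import boto3

s3 = boto3.client("s3")
rekognition = boto3.client("rekognition")

BUCKET = "my-images-bucket"      # placeholder bucket name
COLLECTION_ID = "my-collection"  # placeholder collection ID

# List every object in the bucket (paginated, 1000 keys per API call)
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Index the faces found in this image directly from S3
        response = rekognition.index_faces(
            CollectionId=COLLECTION_ID,
            Image={"S3Object": {"Bucket": BUCKET, "Name": key}},
            ExternalImageId=key.replace("/", "_"),  # optional label for later searches
        )
        print(f"{key}: indexed {len(response['FaceRecords'])} face(s)")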

Related

S3 Bucket AWS CLI takes forever to get specific files

I have a log archive bucket, and that bucket has 2.5m+ objects.
I am looking to download files from a specific time period. I have tried different methods for this, but all of them are failing.
My observation is that these queries start from the oldest files, but the files I am looking for are the newest ones, so it takes forever to find them.
aws s3 sync s3://mybucket . --exclude "*" --include "2021.12.2*" --include "2021.12.3*" --include "2022.01.01*"
Am I doing something wrong?
Is it possible to make these query start from newest files so it might take less time to complete?
I also tried using S3 Browser and CloudBerry: same problem. I also tried from an EC2 instance inside the same AWS network: same problem.
2.5m+ objects in an Amazon S3 bucket is indeed a large number of objects!
When listing the contents of an Amazon S3 bucket, the S3 API only returns 1000 objects per API call. Therefore, when the AWS CLI (or CloudBerry, etc) is listing the objects in the S3 bucket it requires 2500+ API calls. This is most probably the reason why the request is taking so long (and possibly failing due to lack of memory to store the results).
You can possibly reduce the time by specifying a Prefix, which reduces the number of objects returned from the API calls. This would help if the objects you want to copy are all in a sub-folder.
Failing that, you could use Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You could then extract from that CSV file a list of objects you want to copy (eg use Excel or write a program to parse the file). Then, specifically copy those objects using aws s3 cp or from a programming language. For example, a Python program could parse the file and then use download_file() to download each of the desired objects.
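A rough sketch of that last step, assuming boto3 and that you have already downloaded (and un-gzipped) an S3 Inventory CSV whose first two columns are bucket and key (the default report layout); the file and bucket names here are placeholders:

import csv
import boto3

s3 = boto3.client("s3")
BUCKET = "mybucket"              # source bucket (placeholder)
INVENTORY_CSV = "inventory.csv"  # local copy of the inventory report (placeholder)
WANTED_PREFIXES = ("2021.12.2", "2021.12.3", "2022.01.01")

with open(INVENTORY_CSV, newline="") as f:
    for row in csv.reader(f):
        key = row[1]  # second column is the object key in the default report
        if key.startswith(WANTED_PREFIXES):
            # Download each matching object, using a flattened key as the local file name
            s3.download_file(BUCKET, key, key.replace("/", "_"))
            print(f"downloaded {key}")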
The simple fact is that a flat-structure Amazon S3 bucket with 2.5m+ objects will always be difficult to list. If possible, I would encourage you to use 'folders' to structure the bucket so that you would only need to list portions of the bucket at a time.

Is there a way to remove image metadata from all existing images in S3?

I have a huge S3 bucket of images. It's impractical to manually edit each one or write a script to strip the metadata one by one. Is there another way to strip the image metadata from all of the images?
To be clear: I mean the image metadata (Exif etc), NOT the s3 metadata.
Thank you!
There is no built-in feature for this. I recommend using S3 Batch Operations together with an AWS Lambda function to do it.
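A hedged sketch of what that Lambda handler could look like, assuming Pillow is available (eg via a Lambda layer). Re-saving with Pillow drops Exif and other embedded metadata but also re-encodes the image, so verify the quality trade-off first. The event/response shapes follow the S3 Batch Operations Lambda contract as I understand it; double-check the field names against the current documentation:

import io
import urllib.parse

import boto3
from PIL import Image

s3 = boto3.client("s3")

def handler(event, context):
    results = []
    for task in event["tasks"]:
        bucket = task["s3BucketArn"].split(":::")[-1]
        key = urllib.parse.unquote_plus(task["s3Key"])
        try:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            img = Image.open(io.BytesIO(body))
            buf = io.BytesIO()
            # Saving without an exif argument writes the pixels but not the metadata
            img.save(buf, format=img.format)
            s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())
            results.append({"taskId": task["taskId"],
                            "resultCode": "Succeeded",
                            "resultString": "metadata stripped"})
        except Exception as exc:
            results.append({"taskId": task["taskId"],
                            "resultCode": "PermanentFailure",
                            "resultString": str(exc)})
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }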

Sorting s3 picture files according to size

I want to map out the sizes of the picture files in an S3 bucket.
Is it possible to get the percentage of a bucket's files that are bigger than 5 MB?
Your question isn't too clear, but it appears that you want to obtain information about the size of objects in an Amazon S3 bucket.
The GET Bucket (List Objects) Version 2 API call (and its equivalent in various SDKs such as list-objects in the AWS CLI and list_objects_v2() in Python) will return a list of objects in a bucket, including the size of the objects. You could then use this information to calculate which objects are consuming the most storage space.
When listing objects, the only filter is the ability to specify a path (folder). It is not possible to list files based upon their size. Instead, all objects in the desired path will be returned.
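If the bucket is small enough to list directly, a quick Python sketch of that calculation might look like this (boto3 assumed; the bucket name is a placeholder, and a Prefix can be added to restrict it to one folder):

import boto3

s3 = boto3.client("s3")
BUCKET = "my-picture-bucket"   # placeholder
THRESHOLD = 5 * 1024 * 1024    # 5 MB in bytes

total = 0
large = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):  # optionally add Prefix="some/folder/"
    for obj in page.get("Contents", []):
        total += 1
        if obj["Size"] > THRESHOLD:
            large += 1

if total:
    print(f"{large} of {total} objects ({100 * large / total:.1f}%) are larger than 5 MB")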
If you have many objects (eg millions), it might be easier to use Amazon S3 Inventory, which can provide a daily CSV file listing all objects in a bucket.

Is there any service on AWS that can help me convert mp4 files to mp3?

I'm new to Amazon Web Services and I'm wondering if the platform offers any solution to convert media files to different formats (mp4 to mp3), or do I have to use a Lambda function with a third-party library to achieve this?
Thank you !
You can get up and running quickly with Elastic Transcoder. You will need to:
Create two S3 buckets: your 'inbox' and your 'outbox'.
Add a transcoder pipeline specifying which buckets are your in/out buckets, and what file types you want to transcode from and to.
Set up a trigger so that every time something lands in the 'in' bucket the process runs, or place something in the 'in' bucket and use the SDK or CLI to trigger a job.
Two things to note:
When you fire a job, you have to pass in the name of the file that will be created. If the file already exists in the out bucket, an error will be thrown.
As with all of AWS's fully managed services, you get a little for free up front, then it gets expensive. Once you get the hang of it, you can save some money rolling your own in Lambda.
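For reference, a minimal sketch of kicking off a job with boto3, assuming you have already created a pipeline. The pipeline ID, object keys and preset ID below are placeholders; look up your own pipeline ID and the MP3 system preset ID in the Elastic Transcoder console:

import boto3

transcoder = boto3.client("elastictranscoder", region_name="us-east-1")

response = transcoder.create_job(
    PipelineId="1111111111111-abcde1",             # placeholder pipeline ID
    Input={"Key": "uploads/podcast-episode.mp4"},  # object key in the 'inbox' bucket
    Output={
        "Key": "audio/podcast-episode.mp3",        # must not already exist in the 'outbox' bucket
        "PresetId": "1351620000001-300040",        # placeholder: use the MP3 system preset ID
    },
)
print(response["Job"]["Id"], response["Job"]["Status"])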

How to use AWS Elastic Transcoder on a bucket full of videos?

So I have an S3 bucket full of over 200 GB of different videos. It would be very time-consuming to manually set up jobs to transcode all of these.
How can I use either the web UI or the AWS CLI to transcode all videos in this bucket at 1080p, replicating the same output path in a different bucket?
I also want any new videos added to the original bucket to be transcoded automatically immediately after upload.
I've seen some posts about Lambda functions, but I don't know anything about this.
A Lambda function is just temporary compute that runs some code on demand.
The sample code in your link is what you are looking for as a solution. You can call your Lambda function once for each item in the S3 bucket and kick off concurrent processing of the entire bucket.
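A hedged sketch of that back-fill step, assuming boto3: it lists every object in the source bucket and creates one Elastic Transcoder job per video. The bucket name, pipeline ID and preset ID are placeholders; new uploads can instead be handled by an S3 event trigger on a Lambda function that runs the same create_job call.

import os
import boto3

s3 = boto3.client("s3")
transcoder = boto3.client("elastictranscoder")

SOURCE_BUCKET = "my-source-videos"     # placeholder
PIPELINE_ID = "1111111111111-abcde1"   # placeholder pipeline ID
PRESET_1080P = "1351620000001-000001"  # placeholder: use the 1080p system preset ID

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        base, _ = os.path.splitext(key)
        transcoder.create_job(
            PipelineId=PIPELINE_ID,
            Input={"Key": key},
            # Keep the same path in the output bucket, just with a new extension
            Output={"Key": f"{base}.mp4", "PresetId": PRESET_1080P},
        )
        print(f"submitted job for {key}")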