AWS CLI filename search: only look inside a specific folder? - amazon-web-services

Is there a way to not look through the entire bucket when searching for a filename?
We have millions of files so each search like this one takes minutes:
aws s3api list-objects --bucket XXX --query "Contents[?contains(Key, 'tokens.json')]"
I can also make the key contain the folder name, but that doesn't speed things up at all:
aws s3api list-objects --bucket XXX --query "Contents[?contains(Key, 'folder/tokens.json')]"

There is a --prefix option. You have to use this option, not the --query syntax, because the query filter is applied client-side after the objects have already been listed, whereas the prefix is applied server-side and limits what gets listed in the first place. See the details in the documentation.
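A sketch combining the two, assuming the file lives under folder/ as in the question: --prefix narrows the server-side listing to that folder, and the JMESPath query then matches the file name within the much smaller result set.
aws s3api list-objects --bucket XXX --prefix "folder/" --query "Contents[?contains(Key, 'tokens.json')].Key"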

If you are regularly searching for objects within an Amazon S3 bucket with a large number of objects, you could consider using Amazon S3 Inventory, which can provide a regular CSV listing of the objects in the bucket.

Related

AWS S3 old object creation date

We want to find out the upload/creation date of the oldest object present in an AWS S3 bucket.
Could you please suggest how we can get it?
You can use the AWS Command-Line Interface (CLI) to list objects sorted by a field:
aws s3api list-objects --bucket MY-BUCKET --query 'sort_by(Contents, &LastModified)[0].[Key,LastModified]' --output text
This gives an output like:
foo.txt 2021-08-17T21:53:46+00:00
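To get the newest object instead, index the same sorted list from the end (JMESPath supports negative indices):
aws s3api list-objects --bucket MY-BUCKET --query 'sort_by(Contents, &LastModified)[-1].[Key,LastModified]' --output text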
See also: How to list recent files in AWS S3 bucket with AWS CLI or Python

How to copy subset of files from one S3 bucket folder to another by date

I have a bucket in AWS S3. There are two folders in the bucket - folder1 & folder2. I want to copy the files from s3://myBucket/folder1 to s3://myBucket/folder2. But there is a twist: I ONLY want to copy the items in folder1 that were created after a certain date. I want to do something like this:
aws s3 cp s3://myBucket/folder1 s3://myBucket/folder2 --recursive --copy-source-if-modified-since 2020-07-31
There is no aws-cli command that will do this for you in a single line. If the number of files is relatively small, say a hundred thousand or fewer, I think it would be easiest to write a bash script, or use your favourite language's AWS SDK, that lists the first folder, filters on creation date, and issues the copy commands.
If the number of files is large you can create an S3 Inventory that will give you a listing of all the files in the bucket, which you can download and generate the copy commands from. This will be cheaper and quicker than listing when there are lots and lots of files.
Something like this could be a start, using @jarmod's suggestion about --copy-source-if-modified-since:
# List every key under folder1/ (the CLI paginates through all results automatically)
for key in $(aws s3api list-objects --bucket my-bucket --prefix folder1/ --query 'Contents[].Key' --output text); do
  # Rewrite the destination key: folder1/... becomes folder2/...
  relative_key=${key/folder1/folder2}
  # Server-side copy; S3 rejects it with a precondition error for objects not modified since the cutoff
  aws s3api copy-object --bucket my-bucket --key "$relative_key" --copy-source "my-bucket/$key" --copy-source-if-modified-since THE_CUTOFF_DATE
done
It will copy each object individually, and it will be fairly slow if there are lots of objects, but it's at least somewhere to start.

Extract Links Within Specific Folder in AWS S3 Buckets

I am trying to get my AWS S3 API to list objects that I have stored in my S3 buckets. I have successfully used the code below to pull some of the links from my S3 buckets.
aws s3api list-objects --bucket my-bucket --query "Contents[].[Key]" --output text
The problem is that the output in my command prompt is not listing the entire S3 bucket inventory. Is it possible to alter this code so that the output in my CLI lists the full inventory? If not, is there a way to alter the code to target specific file names within the S3 bucket? For example, all the file names in my bucket are dates, so I would try to pull all the links from the folder titled 3_15_20 Videos within the "my-bucket" bucket. Thanks in advance!
From list-objects — AWS CLI Command Reference:
list-objects is a paginated operation. Multiple API calls may be issued in order to retrieve the entire data set of results. You can disable pagination by providing the --no-paginate argument.
Therefore, try using --no-paginate and see whether it returns all objects.
If you are regularly listing a bucket that contains a huge number of objects, you could also consider using Amazon S3 Inventory, which can provide a daily CSV file listing the contents of a bucket.
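For the second part of the question, if the date-named folder forms the start of the key, a server-side prefix filter can target just those objects. A sketch using the folder name from the question (adjust the prefix to the actual key layout):
aws s3api list-objects --bucket my-bucket --prefix "3_15_20 Videos/" --query "Contents[].[Key]" --output text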

How can I download files for a given date range (e.g. 8 Aug to 15 Aug) from an S3 bucket using the AWS CLI?

I am able to filter a particular date's data, but not a date range, such as 12-09-2019 to 15-09-2019, using the AWS CLI.
For example, to filter 2019's data I am using:
--recursive --exclude "*" --include "2019"
You will need to use s3api to handle the query, which uses JMESPath syntax:
aws s3api list-objects-v2 --bucket BUCKET --query "Contents[?(LastModified>='2019-09-12' && LastModified<='2019-09-15')].Key"
You can also specify time as well
aws s3api list-objects-v2 --bucket BUCKET --query "Contents[?(LastModified>='2019-09-12T12:00:00.00Z' && LastModified<='2019-09-15T12:00:00.00Z')].Key"
The downside to this approach is that it must list every object and then apply the query client-side. For large buckets, limiting the listing to a prefix will speed up your search.
aws s3api list-objects-v2 --bucket BUCKET --prefix PREFIX --query "Contents[?(LastModified>='2019-09-12T12:00:00.00Z' && LastModified<='2019-09-15T12:00:00.00Z')].Key"
And if your primary lookup is by date, then consider storing the objects in date/time sort order, since the prefix option can then speed up your searches. A couple of example key layouts:
prefix/20190615T041019Z.json.gz
2019/06/15/T041019Z.json.gz
This will let you restrict a listing to a single day or month with just a prefix instead of scanning the whole bucket.
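For example, with the second layout above, one day's objects can be listed with a prefix alone (the query here just trims the output to the keys; the bucket name is the same placeholder used in the commands above):
aws s3api list-objects-v2 --bucket BUCKET --prefix "2019/06/15/" --query "Contents[].Key"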

Using AWS CLI to query file names inside folders?

Our bucket structure goes from MyBucket -> CustomerGUID(folder) -> [actual files]
I'm having a hell of a time trying to use the AWS CLI (on Windows) --query option to try and locate a file across all of the customer folders. Can someone look at my --query and see what I'm doing wrong here? Or tell me the proper way to search for a specific file name?
This is an example of how I'm able to list ALL the files in the bucket, filtered by LastModified date.
I need to limit the output based on filename, and that is where I'm getting stuck. When I look at the individual files in S3, I can see other files have a "Key"; is the Key the 'name' of the file?
aws s3 ls s3://mybucket --recursive --output text --query "Contents[?contains(LastModified) > '2018-12-8']"
The aws s3 ls command only returns a text list of objects.
If you wish to use --query, then use: aws s3api list-objects
See: list-objects — AWS CLI Command Reference
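A sketch of the equivalent s3api call for this case, searching every customer folder for a given file name (tokens.json stands in for whatever name you are looking for):
aws s3api list-objects --bucket mybucket --query "Contents[?contains(Key, 'tokens.json')].[Key,LastModified]" --output text
And yes, the Key is the object's full name, including any folder-style prefix, so a contains() filter on the key finds the file in whichever customer folder it sits.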