We want to find the upload/creation date of the oldest object in an AWS S3 bucket.
Could you please suggest how we can get it?
You can use the AWS Command-Line Interface (CLI) to list objects sorted by a field:
aws s3api list-objects --bucket MY-BUCKET --query 'sort_by(Contents, &LastModified)[0].[Key,LastModified]' --output text
This gives an output like:
foo.txt 2021-08-17T21:53:46+00:00
See also: How to list recent files in AWS S3 bucket with AWS CLI or Python
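If you'd rather do this from Python, here is a minimal boto3 sketch along the same lines (MY-BUCKET is a placeholder; LastModified is the time the object was last written, which equals the upload time if it was never overwritten):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('MY-BUCKET')  # placeholder bucket name

# Walk every object (boto3 pages through the listing for you) and keep the
# one with the earliest LastModified timestamp.
oldest = min(bucket.objects.all(), key=lambda obj: obj.last_modified, default=None)
if oldest is not None:
    print(oldest.key, oldest.last_modified)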
I have an Amazon S3 bucket in AWS, and I have tried to list all the files in the bucket by:
aws s3 ls s3://bucket-1 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_1_files
This works for most files, but some files that are listed in bucket_1_files cannot be found when I search for them by name in the Amazon S3 console (no matches are returned). Would anyone know of possible reasons this could be the case? The file is a .png file, and there are other .png files listed that I can find within the console.
I think there is something wrong with the command I'm using. When I just run
aws s3 ls s3://bucket-1 --recursive
I am finding that a lot of these missing files actually have a "t" in front of them.
Rather than playing with awk, sed and sort, you can list objects with:
aws s3api list-objects --bucket bucket-1 --query 'Contents[].[Key]' --output text
However, each underlying API call returns at most 1,000 objects.
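If the bucket holds more than that, a small boto3 sketch along these lines (the bucket name is taken from the question; adjust for your environment) will page through everything for you:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-1')  # bucket name from the question

# The resource API pages through the listing transparently, so buckets with
# more than 1,000 objects need no extra handling.
for key in sorted(obj.key for obj in bucket.objects.all()):
    print(key)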
Terraform deploys an S3 bucket using a name prefix.
How do I get the full name of an S3 bucket by providing the name prefix to an AWS CLI command?
aws s3api list-buckets --query 'Buckets[*].[Name]' --output text | grep "admin-passwords"
By running the above AWS CLI command and passing the name prefix/string/identifier to grep, we can get the full bucket name.
Output:
admin-passwords202104262231251212001
You can use a combination of the AWS command that lists all the buckets and grep for your string/identifier in its output. Since all bucket names are unique, it should work easily. For example:
aws s3 ls | grep "my-bucket-identifier"
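If you'd rather not shell out to grep, a rough boto3 sketch (the prefix string is just the example from above) does the same filtering in Python:

import boto3

s3 = boto3.client('s3')
prefix = 'admin-passwords'  # example prefix from the question

# list_buckets returns every bucket in the account; filter client-side by prefix.
names = [b['Name'] for b in s3.list_buckets()['Buckets'] if b['Name'].startswith(prefix)]
print(names)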
Is there a way to not look through the entire bucket when searching for a filename?
We have millions of files so each search like this one takes minutes:
aws s3api list-objects --bucket XXX --query "Contents[?contains(Key, 'tokens.json')]"
I can also make the key contain the folder name, but that doesn't speed things up at all:
aws s3api list-objects --bucket XXX --query "Contents[?contains(Key, 'folder/tokens.json')]"
There is a --prefix option. You have to use this option, not the --query syntax, because the --query filter is applied client-side after the objects have already been listed. See the details in the documentation.
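For reference, a rough boto3 equivalent (the bucket name and prefix are placeholders from the question) that narrows the listing with a prefix and only filters the remainder client-side:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('XXX')  # placeholder bucket name from the question

# Prefix narrows the listing on the server side, so only keys under 'folder/'
# come back; the filename check then runs client-side over that subset.
for obj in bucket.objects.filter(Prefix='folder/'):
    if obj.key.endswith('tokens.json'):
        print(obj.key)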
If you are regularly searching for objects within an Amazon S3 bucket with a large number of objects, you could consider using Amazon S3 Inventory, which can provide a regular CSV listing of the objects in the bucket.
This Stack Overflow answer helped a lot. However, I want to search for all PDFs inside a given bucket.
I click "None".
Start typing.
I type *.pdf
Press Enter
Nothing happens. Is there a way to use wildcards or regular expressions to filter bucket search results via the online S3 GUI console?
As stated in a comment, Amazon's UI can only be used to search by prefix as per their own documentation:
http://docs.aws.amazon.com/AmazonS3/latest/UG/searching-for-objects-by-prefix.html
There are other methods of searching, but they require a bit of effort. Just to name two options: the AWS CLI or Boto3 for Python.
I know this post is old but it is high on Google's list for s3 searching and does not have an accepted answer. The other answer by Harish is linking to a dead site.
UPDATE 2020/03/03: The AWS link above has been removed. This link to a very similar topic was as close as I could find: https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
AWS CLI search:
In the AWS Console, you can only search for objects within the current directory, not across the whole bucket, and only by the prefix of the file name (an S3 console search limitation).
The best way is to use the AWS CLI with the command below (on Linux):
aws s3 ls s3://bucket_name/ --recursive | grep search_word | cut -c 32-
(cut -c 32- strips the fixed-width date, time and size columns that aws s3 ls prints, leaving just the key.)
Searching for files matching a pattern (note that grep takes a regular expression, not a shell wildcard):
aws s3 ls s3://bucket_name/ --recursive | grep '\.pdf$'
You can use the copy command with the --dryrun flag:
aws s3 cp s3://your-bucket/any-prefix/ . --recursive --exclude "*" --include "*.pdf" --dryrun
It will show all of the files that are PDFs without actually copying anything.
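Note that the --exclude and --include filters are evaluated in the order they appear, with later filters taking precedence, which is why excluding everything and then including *.pdf leaves only the PDFs.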
If you use boto3 in Python it's quite easy to find the files. Replace 'bucket' with the name of the bucket.
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    if '.pdf' in obj.key:
        print(obj.key)
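If you only need to look under a known part of the bucket, bucket.objects.filter(Prefix='some/prefix/') (the prefix here is hypothetical) restricts the listing on the server side instead of walking every key.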
The CLI can do this; aws s3 only supports prefixes, but aws s3api supports arbitrary filtering. For s3 links that look like s3://company-bucket/category/obj-foo.pdf, s3://company-bucket/category/obj-bar.pdf, s3://company-bucket/category/baz.pdf, you can run
aws s3api list-objects --bucket "company-bucket" --prefix "category/" --query "Contents[?ends_with(Key, '.pdf')]"
or for a more general wildcard
aws s3api list-objects --bucket "company-bucket" --prefix "category/" --query "Contents[?contains(Key, 'foo')]"
or even
aws s3api list-objects --bucket "company-bucket" --prefix "category/obj" --query "Contents[?ends_with(Key, '.pdf') && contains(Key, 'ba')]"
The full query language is described in the JMESPath documentation.
The documentation for the Java SDK suggests it can be done:
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
Specifically, the listObjectsV2 call (which returns a ListObjectsV2Result) lets you specify a prefix filter, e.g. "files/2020-01-02", so you only return results matching today's date.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsV2Result.html
My guess is that the files were uploaded from a Unix system and you're downloading to Windows, so s3cmd is unable to preserve file permissions, which don't apply on NTFS.
To search for files and grab them, try this from the target directory (or change ./ to your target):
for i in `s3cmd ls s3://bucket | grep "searchterm" | awk '{print $4}'`; do s3cmd sync --no-preserve $i ./; done
This works in WSL on Windows.
I have used this in one of my projects, but it is a bit hard-coded:
import subprocess

bucket = "Abcd"
# Shell out to the AWS CLI and keep only the lines that mention '.csv'
command = "aws s3 ls s3://" + bucket + "/sub_dir/ | grep '.csv'"
listofitems = subprocess.check_output(command, shell=True)
listofitems = listofitems.decode('utf-8')
# Each line ends with the key name, so keep the last space-separated field
print([item.split(" ")[-1] for item in listofitems.split("\n")[:-1]])