How to delete all buckets starting with a specific name using the AWS CLI?

Is there an AWS CLI command to delete all buckets whose names start with a specific prefix?
When I run:
aws s3 ls
I get a very large number of buckets, and we need to clean them up frequently. If there is such a command, I can create a GitLab pipeline for it and use that for the cleanup.
As you can see, I want to delete all the buckets starting with somename-
I tried using
aws s3 rb --force s3://somename-*
It didn't work.

There's no built-in way to accomplish this. On Unix-like platforms, you can list all of the buckets, filter the list with standard tools, and call the CLI to remove every bucket that matches the pattern:
aws s3api list-buckets --query 'Buckets[].[Name]' --output text | grep "^somename-" | xargs -n1 -IB echo aws s3 rb s3://B
Remove the "echo " after verifying the command will remove the buckets you want removed. Add --force to aws s3 rb if the buckets are not empty.

Related

How do I get the full name of an S3 bucket which contains a given string using AWS CLI?

Terraform deploys an S3 bucket using a name prefix.
How do I get the full name of an S3 bucket by providing a name prefix to the AWS CLI?
aws s3api list-buckets --query 'Buckets[*].[Name]' --output text | grep "admin-passwords"
Using the above AWS CLI command and passing the name prefix/string/identifier to grep, we can achieve the desired output.
Output:
admin-passwords202104262231251212001
You can use the AWS command that lists all the buckets and grep for your string/identifier in its output. Since all bucket names are unique, this works reliably. For example:
aws s3 ls | grep "my-bucket-identifier"

How to determine s3 folder size?

I have a bunch of s3 folders for different projects/clients and I would like to estimate total size (so I can for instance consider reducing sizes/cost). What is a good way to determine this?
I can do this with a combination of Python and the AWS CLI:
import os

bucket_rows = os.popen('aws s3 ls').read().splitlines()
sizes = dict()
for bucket in bucket_rows:
    buck = bucket.split(' ')[-1]  # each listing row also contains date and time; the name is the last field
    cmd = f"aws s3 ls --summarize --human-readable --recursive s3://{buck}/ | grep 'Total'"
    sizes[buck] = os.popen(cmd).read()
As stated here, the AWS CLI natively supports the --query parameter, which can sum the size of every object in an S3 bucket.
aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
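If you prefer to stay in Python, a minimal Boto3 sketch that computes the same two numbers for a single bucket (the bucket name is a placeholder, as in the command above) might look like this:
import boto3

s3 = boto3.client('s3')
bucket = 'BUCKETNAME'  # placeholder bucket name

total_size = 0
total_count = 0
# paginate so buckets with more than 1000 objects are fully counted
for page in s3.get_paginator('list_objects_v2').paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        total_size += obj['Size']
        total_count += 1

print(f"{total_count} objects, {total_size} bytes")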
I hope it helps.
If you want to check via the console:
If you mean a single folder rather than the whole bucket, select that folder, open the "Actions" drop-down, and choose "Get total size".
If you mean the whole bucket, go to the "Management" tab and open "Metrics"; it will show the entire bucket size.
This would do the magic 🚀
for bucket_name in `aws s3 ls | awk '{print $3}'`; do
  echo "$bucket_name"
  aws s3 ls s3://$bucket_name --recursive --summarize | tail -n2
done

AWS CLI Download list of S3 files

We have ~400,000 files in a private S3 bucket that are inbound/outbound call recordings. The file names follow a pattern that lets me search for numbers, both inbound and outbound. Note that these recordings are in the Glacier storage class.
Using the AWS CLI, I can search through this bucket and grep out the files I need. What I'd like to do now is initiate an S3 restore job with expedited retrieval (so ~1-5 minute recovery time), and then maybe 30 minutes later run a command to download the files.
My efforts so far:
aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32-
Retrieves the keys of about 200 files. I am unsure of how to proceed next, as aws s3 cp won't work for objects in the Glacier storage class.
Cheers,
The AWS CLI has two separate commands for S3: s3 and s3api. s3 is a high-level abstraction with limited features, so for restoring files you'll have to use one of the commands available with s3api. Note that restore-object also needs a --restore-request; for the expedited retrieval mentioned in the question that would be, for example:
aws s3api restore-object --bucket exetel-logs --key your-key --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Expedited"}}'
If you afterwards want to copy the files, but want to ensure you only copy files that have been restored from Glacier, you can use the following snippet:
for key in $(aws s3api list-objects-v2 --bucket exetel-logs --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
  if [ $(aws s3api head-object --bucket exetel-logs --key ${key} --query "contains(Restore, 'ongoing-request=\"false\"')") == true ]; then
    echo ${key}
  fi
done
Have you considered using one of the AWS SDKs instead of the raw CLI? It will make these kinds of tasks easier to integrate into your workflows. I prefer the Python SDK (Boto3). Here is example code for how to download all files from an S3 bucket.
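For example, a minimal Boto3 sketch (the bucket name and search string come from the question; the local directory is a placeholder) that downloads the matching recordings once their Glacier restore has finished:
import os
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('exetel-logs')  # bucket from the question
download_dir = 'recordings'        # placeholder local destination
os.makedirs(download_dir, exist_ok=True)

for obj in bucket.objects.all():
    if '042222222' not in obj.key:  # same pattern the grep in the question matches
        continue
    head = s3.meta.client.head_object(Bucket=bucket.name, Key=obj.key)
    restored = 'ongoing-request="false"' in head.get('Restore', '')
    # only download objects that are not in Glacier or whose restore has completed
    if obj.storage_class != 'GLACIER' or restored:
        bucket.download_file(obj.key, os.path.join(download_dir, os.path.basename(obj.key)))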

AWS S3 download and copy

We have a bucket in AWS S3 where backups from production are being copied to.
My task is to copy the most recent backup file from AWS S3 to the local sandbox SQL Server, then do the restore.
I have installed all of the AWS tools for windows on the local server. Credentials to connect to AWS S3 work, etc.
My local server can list all of the files in the AWS S3 bucket. I can successfully download a single file if I specifically name that file.
Here is an example of that working, pulling the most recent copy, from July 25, 2016.
aws s3 cp s3://mybucket/databasefile_20160725.zip E:\DBA
My goal is to have a copy script that only pulls the most recent file, which I won't know the name of. I want to schedule this.
Nothing I google or try is getting me the correct syntax to do this.
To retrieve the latest file in your bucket, you can do the following:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output
The first command lists the objects of your bucket as JSON; the elements of the JSON are listed here.
Then you sort the elements by their LastModified date, take the last one, and extract its Key (i.e. the name of the file in the bucket). The --raw-output flag strips the quotes from the key name.
You can reuse that in a script, or pipe it into the s3 cp command like below:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output |\
xargs -I {} aws s3 cp s3://mybucket/{} E:\DBA
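Since the destination is a Windows path and xargs may not be available there, a small Boto3 sketch doing the same thing (the bucket name and destination folder come from the question) may be easier to schedule:
import os
import boto3

s3 = boto3.client('s3')
bucket = 'mybucket'

# walk every page of the listing and keep the object with the newest LastModified timestamp
newest = None
for page in s3.get_paginator('list_objects_v2').paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        if newest is None or obj['LastModified'] > newest['LastModified']:
            newest = obj

if newest:
    local_path = os.path.join(r'E:\DBA', os.path.basename(newest['Key']))
    s3.download_file(bucket, newest['Key'], local_path)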

How to search an Amazon S3 Bucket using Wildcards?

This stackoverflow answer helped a lot. However, I want to search for all PDFs inside a given bucket.
I click "None".
Start typing.
I type *.pdf
Press Enter
Nothing happens. Is there a way to use wildcards or regular expressions to filter bucket search results via the online S3 GUI console?
As stated in a comment, Amazon's UI can only be used to search by prefix as per their own documentation:
http://docs.aws.amazon.com/AmazonS3/latest/UG/searching-for-objects-by-prefix.html
There are other methods of searching, but they require a bit of effort. Just to name two options: the AWS CLI or Boto3 for Python.
I know this post is old, but it is high on Google's list for S3 searching and does not have an accepted answer. The other answer by Harish links to a dead site.
UPDATE 2020/03/03: The AWS link above has been removed. This is a link to a very similar topic that was as close as I could find. https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
AWS CLI search:
In the AWS console we can search for objects only within a single directory, not across directories, and only by the prefix of the file name (an S3 search limitation).
The best way is to use the AWS CLI with the command below on a Linux OS:
aws s3 ls s3://bucket_name/ --recursive | grep search_word | cut -c 32-
Searching for files by extension (note that grep takes a regular expression, not a shell wildcard):
aws s3 ls s3://bucket_name/ --recursive | grep '\.pdf$'
You can use the aws s3 cp command with the --dryrun flag:
aws s3 cp s3://your-bucket/any-prefix/ . --recursive --exclude "*" --include "*.pdf" --dryrun
It will list all of the files that are PDFs without actually copying anything.
If you use boto3 in Python it's quite easy to find the files. Replace 'bucket' with the name of the bucket.
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    if '.pdf' in obj.key:
        print(obj.key)
The CLI can do this; aws s3 only supports prefixes, but aws s3api supports arbitrary filtering. For s3 links that look like s3://company-bucket/category/obj-foo.pdf, s3://company-bucket/category/obj-bar.pdf, s3://company-bucket/category/baz.pdf, you can run
aws s3api list-objects --bucket "company-bucket" --prefix "category/" --query "Contents[?ends_with(Key, '.pdf')]"
or for a more general wildcard
aws s3api list-objects --bucket "company-bucket" --prefix "category/" --query "Contents[?contains(Key, 'foo')]"
or even
aws s3api list-objects --bucket "company-bucket" --prefix "category/obj" --query "Contents[?ends_with(Key, '.pdf') && contains(Key, 'ba')]"
The full query language is described at JMESPath.
The Java SDK documentation suggests it can be done:
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
Specifically, listObjectsV2 (which returns a ListObjectsV2Result) lets you specify a prefix filter, e.g. "files/2020-01-02", so you only return results matching today's date; note that the prefix is a literal string, not a wildcard pattern.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsV2Result.html
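The same prefix-plus-suffix filtering can be sketched with Boto3, in the style of the earlier Python answer (the bucket name and prefix below are hypothetical):
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('company-bucket')  # hypothetical bucket name

# the Prefix argument is a literal string (no wildcards); the suffix check happens client-side
for obj in bucket.objects.filter(Prefix='files/2020-01-02'):
    if obj.key.endswith('.pdf'):
        print(obj.key)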
My guess is that the files were uploaded from a Unix system and you're downloading to Windows, so s3cmd is unable to preserve file permissions, which don't apply on NTFS.
To search for files and grab them, try this from the target directory, or change ./ to your target:
for i in `s3cmd ls s3://bucket | grep "searchterm" | awk '{print $4}'`; do s3cmd sync --no-preserve $i ./; done
This works in WSL on Windows.
I have used this in one of my projects, but it is a bit hard-coded:
import subprocess

bucket = "Abcd"
# shell out to the AWS CLI and keep only the rows containing '.csv'
command = "aws s3 ls s3://" + bucket + "/sub_dir/ | grep '.csv'"
listofitems = subprocess.check_output(command, shell=True)
listofitems = listofitems.decode('utf-8')
# the key name is the last whitespace-separated field of each listing row
print([item.split(" ")[-1] for item in listofitems.split("\n")[:-1]])