Delete files from folder in S3 bucket - amazon-web-services

I have an AWS S3 bucket test-bucket with a data folder. The data folder will have multiple files.
I am able to delete the files in the S3 bucket.
But what I want is to delete the files in the data folder without deleting the folder.
I tried the following:
aws s3 rm s3://test-bucket/data/*
I also tried the --recursive option, but that did not work either.
Is there a way I can delete the files in the folder using AWS CLI?

The following AWS CLI command worked:
aws s3 rm s3://test-bucket --recursive --exclude="*" --include="data/*.*"

You can do it with the AWS CLI (https://aws.amazon.com/cli/) and some Unix commands.
This AWS CLI command should work:
aws s3 rm s3://<your_bucket_name> --recursive --exclude "*" --include "<your_pattern>"
The --recursive flag is needed for the filters to select anything: without it, rm expects a single object key. The filters take glob-style patterns (*, ?, [...]), not regular expressions.
Or with Unix commands:
aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I% <your_os_shell> -c 'aws s3 rm s3://<your_bucket_name>/%'
Explanation:
list all files in the bucket --pipe-->
take the 4th field (the file name) --pipe-->
run the delete command with the AWS CLI
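One caveat with the pipeline above: awk '{print $4}' keeps only the first word of a key, so an object named "my file.txt" comes out as "my", and the "PRE" lines that aws s3 ls prints for sub-folders produce empty names. Below is a minimal sketch of a more careful extraction. It assumes the usual listing layout of date, time, size, key; list_keys is a made-up helper name, not part of the AWS CLI.

```shell
# list_keys: read `aws s3 ls s3://bucket/` output on stdin, print one key per line.
# Hypothetical helper; assumes each object line looks like "DATE TIME SIZE KEY".
list_keys() {
  awk '
    $1 == "PRE" { next }                 # sub-folder (prefix) lines carry no object key
    NF >= 4 {
      sub(/^[^ ]+ +[^ ]+ +[0-9]+ /, "")  # strip date, time and size; keep the rest verbatim
      print
    }
  '
}
```

It could then be used as: aws s3 ls s3://<your_bucket_name>/ | list_keys | while IFS= read -r key; do aws s3 rm "s3://<your_bucket_name>/$key"; done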

aws s3 rm s3://bucket/folder1/folder2/ --recursive --dryrun
From what I see happening when I try it, adding the slash at the end means delete below folder2, not including folder2.
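A minimal sketch of that preview-then-delete workflow, wrapped in a shell function. The names empty_prefix and AWS_BIN are my own (AWS_BIN is just an assumed override for the aws executable so the command can be swapped out); only aws s3 rm, --recursive and --dryrun come from the CLI itself.

```shell
# empty_prefix BUCKET PREFIX [--delete]
# Hypothetical helper: previews the deletion by default, deletes only with --delete.
empty_prefix() {
  bucket=$1
  prefix=${2%/}        # drop a trailing slash if the caller supplied one
  mode=${3:---dryrun}  # default to a preview
  if [ -z "$bucket" ] || [ -z "$prefix" ]; then
    echo "usage: empty_prefix BUCKET PREFIX [--delete]" >&2
    return 2
  fi
  if [ "$mode" = "--delete" ]; then
    "${AWS_BIN:-aws}" s3 rm "s3://$bucket/$prefix/" --recursive
  else
    # --dryrun prints what would be deleted without touching anything
    "${AWS_BIN:-aws}" s3 rm "s3://$bucket/$prefix/" --recursive --dryrun
  fi
}
```

For example, empty_prefix test-bucket data shows what would go, and empty_prefix test-bucket data --delete actually removes it.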

We can remove all files, including sub-folders, from a path in an S3 bucket
in Node.js by using the function below.
It simply shells out to the AWS CLI, so the same command can be run directly on the command line once the AWS CLI is configured.
const { exec } = require('child_process');

// Deletes every object under s3://sample-staging/assets/videos by shelling out to the AWS CLI.
function removeAllFilesFromBucket() {
  const S3_REGION = "eu-west-1";
  const S3_BUCKET_NAME = "sample-staging";
  const filePathBucket = S3_BUCKET_NAME + '/assets/videos';
  const awsS3ShellCommand = 'aws s3 rm s3://' + filePathBucket + ' --region ' + S3_REGION + ' --recursive';
  exec(awsS3ShellCommand, (err, stdout, stderr) => {
    if (err) {
      // A non-zero exit code or a failure to start the command arrives here.
      console.log('err', err);
      return;
    }
    console.log('Bucket files and sub-folders deleted successfully');
    console.log(`stdout: ${stdout}`);
    console.log(`stderr: ${stderr}`);
  });
}

Related

How can I download a specific file from an S3 bucket

I'm trying to download one file from my S3 bucket.
I'm using this command:
aws s3 sync %inputS3path% %inputDatapath% --include "20211201-1500-euirluclprd01-olX8yf.1.gz"
and I have also tried:
aws s3 sync %inputS3path% %inputDatapath% --include "*20211201-1500-euirluclprd01-olX8yf.1*.gz"
but when the command runs, I get every file in the folder.
The folder looks like:
/2021/12/05
20211201-1500-euirluclprd01-olX8yf.1.gz
20211201-1505-euirluclprd01-olX8yf.1.gz
You can use aws s3 cp to copy a specific file. For example:
aws s3 cp s3://bucketname/path/file.gz .
Looking at your variables, you could probably use:
aws s3 cp %inputS3path%/20211201-1500-euirluclprd01-olX8yf.1.gz %inputDatapath%
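As for why the sync attempts pulled everything: by default every file is included, so an --include on its own filters nothing. It only has an effect after something has been excluded. If sync is preferred over cp, a minimal sketch looks like the following. The function name sync_only and the AWS_BIN override are hypothetical, and it is written for a POSIX shell rather than the Windows %var% syntax used in the question.

```shell
# sync_only SRC NAME DEST
# Hypothetical helper that syncs a single named file from an S3 prefix.
# AWS_BIN is an assumed override for the aws executable.
sync_only() {
  src=$1    # e.g. s3://bucketname/2021/12/05
  name=$2   # e.g. 20211201-1500-euirluclprd01-olX8yf.1.gz
  dest=$3   # local directory
  # Exclude everything first, then re-include the one file; filters that
  # appear later on the command line take precedence over earlier ones.
  "${AWS_BIN:-aws}" s3 sync "$src" "$dest" --exclude "*" --include "$name"
}
```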

Put files already in S3 in a folder

I used aws s3 sync myfolder my-bucket to copy files from EC2 to a bucket in s3. Now, they're loose (not in a folder). The usual mkdir and mv commands don't seem to be available - how can I make a folder and put my files into it?
One way would be to sync again, this time with the folder name you want (it will be created automatically):
aws s3 sync myfolder s3://my-bucket/my-folder-on-s3
The other way would be mv with --recursive option:
aws s3 mv s3://my-bucket s3://my-bucket/my-folder-on-s3 --recursive
The syntax to copy them to S3 would be:
aws s3 sync localdir s3://my-bucket
If you wish to move files around an Amazon S3 bucket, use:
aws s3 mv s3://my-bucket/object1.txt s3://my-bucket/folder1/
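When moving everything at the bucket root into a folder of the same bucket, it may be worth excluding the destination folder itself so objects already in it are not picked up again. This is a sketch under that assumption; move_into_folder and AWS_BIN are made-up names, and the --exclude is my addition rather than something either answer above uses.

```shell
# move_into_folder BUCKET FOLDER [extra aws flags, e.g. --dryrun]
# Hypothetical helper; AWS_BIN is an assumed override for the aws executable.
move_into_folder() {
  bucket=$1
  folder=${2%/}   # tolerate a trailing slash on the folder name
  if [ -z "$bucket" ] || [ -z "$folder" ]; then
    echo "usage: move_into_folder BUCKET FOLDER [extra aws flags]" >&2
    return 2
  fi
  shift 2
  # Filter patterns are relative to the source, so "$folder/*" skips the destination.
  "${AWS_BIN:-aws}" s3 mv "s3://$bucket/" "s3://$bucket/$folder/" \
    --recursive --exclude "$folder/*" "$@"
}
```

Running move_into_folder my-bucket my-folder-on-s3 --dryrun first shows which objects would be moved.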

Deleting S3 files using AWS data pipeline

I want to delete all S3 keys starting with some prefix using AWS Data Pipeline.
I am using a ShellCommandActivity for this.
These are the arguments:
"scriptUri": "https://s3.amazonaws.com/my_s3_bucket/hive/removeExitingS3.sh",
"scriptArgument": "s3://my_s3_bucket/output/2017-03-19",
I want to delete all S3 keys starting with 2017-03-19 in the output folder. What command should I use to do this?
I have tried these commands in the .sh file:
sudo yum -y upgrade aws-cli
aws s3 rm $1 --recursive
This is not working.
Sample files are
s3://my_s3_bucket/output/2017-03-19/1.txt
s3://my_s3_bucket/output/2017-03-19/2.txt
s3://my_s3_bucket/output/2017-03-19_3.txt
EDIT:
The date (2017-03-19) is dynamic; it is the output of #{format(@scheduledStartTime,"YYYY-MM-dd")}. So effectively:
"scriptArgument": "s3://my_s3_bucket/output/#{format(@scheduledStartTime,"YYYY-MM-dd")}"
Try
aws s3 rm "$1" --recursive --exclude "*" --include "2017-03-19*" --include "2017-03-19/*"
with
"scriptArgument": "s3://my_s3_bucket/output/"
EDIT:
As the date is a dynamic parameter, pass it as the second scriptArgument of the ShellCommandActivity and use:
aws s3 rm "$1" --recursive --exclude "*" --include "$2*" --include "$2/*"
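One thing to watch with the two-argument form: if the date argument ever arrives empty, the include pattern collapses to "*" and every key under the base path matches. Here is a minimal sketch of a guarded version of the script body. The function name remove_dated_keys and the AWS_BIN override are hypothetical; the aws s3 rm flags are the ones from the answer above.

```shell
# remove_dated_keys BASE DAY
# Hypothetical helper; AWS_BIN is an assumed override for the aws executable.
remove_dated_keys() {
  base=$1   # e.g. s3://my_s3_bucket/output/
  day=$2    # e.g. 2017-03-19
  if [ -z "$base" ]; then
    echo "usage: remove_dated_keys S3_BASE YYYY-MM-DD" >&2
    return 2
  fi
  # Refuse anything that is not YYYY-MM-DD; an empty value would turn
  # the include pattern into "*" and match every key under the base.
  case $day in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got '$day'" >&2; return 2 ;;
  esac
  "${AWS_BIN:-aws}" s3 rm "${base%/}/" --recursive \
    --exclude "*" --include "$day*" --include "$day/*"
}
```

Inside the .sh file this would be called as remove_dated_keys "$1" "$2", with the base path and the formatted date passed as two separate scriptArgument entries.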

Selective file download in AWS CLI

I have files in an S3 bucket. I was trying to download files based on a date, such as 8 August, 9 August, and so on.
I used the following code, but it still downloads the entire bucket:
aws s3 cp s3://bucketname/ folder/file \
--profile pname \
--exclude \"*\" \
--recursive \
--include \"" + "2015-08-09" + "*\"
I am not sure how to achieve this. How can I download only the files for a given date?
This command will copy all files starting with 2015-08-15:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2015-08-15*" --recursive
If your goal is to synchronize a set of files without copying them twice, use the sync command:
aws s3 sync s3://BUCKET/ folder
That will copy all files that have been added or modified since the previous sync.
In fact, this is the equivalent of the above cp command:
aws s3 sync s3://BUCKET/ folder --exclude "*" --include "2015-08-15*"
References:
AWS CLI s3 sync command documentation
AWS CLI s3 cp command documentation
Bash command to copy all files for a specific date or month to a local folder:
aws s3 ls s3://bucketname/ | grep '2021-02' | awk '{print $4}' | xargs -I{} aws s3 cp s3://bucketname/{} folder
The command does the following:
Lists all the files in the bucket
Keeps only the lines containing 2021-02, i.e. the files from February 2021
Extracts just the file name from each line
Runs aws s3 cp on each of those files
If your bucket is large (upwards of 10 to 20 GB, as it was in my case),
you can achieve the same goal by running sync in multiple terminal windows.
All the terminal sessions can use the same token, in case you need to generate a token for a prod environment.
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date1*" --profile UR_AC_SomeName
and in another terminal window (same pwd):
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date2*" --profile UR_AC_SomeName
and another two for "name_date3*" and "name_date4*".
Additionally, you can use multiple excludes in the same sync
command, as in:
$ aws s3 sync s3://bucket-name/sub-name/another-name my-local-path/ \
    --exclude="*.log/*" --exclude=img --exclude=".error" --exclude=tmp \
    --exclude="*.cache"
This Bash script copies all files from one bucket to another by modified date using the AWS CLI.
aws s3 ls <BCKT_NAME> --recursive | sort | grep "^2020-08" | cut -b 32- > a.txt
Inside the Bash file:
while IFS= read -r line; do
  aws s3 cp "s3://<SRC_BCKT>/${line}" "s3://<DEST_BCKT>/${line}" --sse AES256
done < a.txt
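The filter-and-cut step can also be done in a single awk pass that compares only the date column, so a key that merely contains 2020-08 in its name is not picked up and the fixed byte offset is not needed. A minimal sketch, assuming the usual recursive listing layout of date, time, size, key (keys_modified_in is a made-up helper name):

```shell
# keys_modified_in PREFIX
# Reads `aws s3 ls s3://bucket --recursive` output on stdin and prints the keys
# whose last-modified date starts with PREFIX (e.g. 2020-08). Hypothetical helper.
keys_modified_in() {
  awk -v want="$1" '
    index($1, want) == 1 {               # date column begins with the wanted prefix
      sub(/^[^ ]+ +[^ ]+ +[0-9]+ /, "")  # strip date, time and size; keep the key
      print
    }
  '
}
```

For example: aws s3 ls <BCKT_NAME> --recursive | keys_modified_in 2020-08 > a.txt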
The AWS CLI was really slow at this for me: I waited hours and nothing much happened, so I looked for alternatives.
https://github.com/peak/s5cmd worked great.
It supports globs, for example:
s5cmd -numworkers 30 cp 's3://logs-bucket/2022-03-30-19-*' .
It is very fast, so you can work with buckets that hold S3 access logs without much fuss.

Recursive list s3 bucket contents with AWS CLI

How can I recursively list all the contents of a bucket using the AWS CLI, similar to using find . on Unix?
aws s3 ls s3://MyBucket --recursive complains about an unknown option.
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations claims that --recursive is a valid parameter.
aws s3 ls s3://MyBucket --recursive works fine for me.
Try updating your AWS CLI. My version is aws-cli/1.6.2
aws --version
With recent AWS CLI versions, --recursive option is supported.
You can list recursively all the files under a bucket named MyBucket using following command:
aws s3 ls s3://MyBucket/ --recursive
You can list recursively all the files under a folder named MyFolder in the bucket, using following command:
aws s3 ls s3://MyBucket/MyFolder/ --recursive
As @Francisco Cardoso said, the final / is very important. It lists the contents of the folder instead of the folder itself.
For more information, see: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
I was not able to interpret the link you referred to: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations
However, I was able to make the --recursive option work according to this link: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#single-local-file-and-s3-object-operations
According to this link, cp, mv and rm support the --recursive option.
The command you are trying is ls.
I tried cp and rm with the --recursive option and they work fine.
You cannot recursively list all the contents of a bucket with just:
aws s3 ls s3://MyBucket
To list the objects in a folder, run:
aws s3 ls s3://MyBucket/MyFolder/
The command above lists the objects that reside inside the folder named MyFolder.
To get an object from such a logical hierarchy in Amazon S3, you need to specify the full key name of the object in the GET operation.
--recursive: the command is performed on all files or objects under the specified directory or prefix.
Thanks
The one-line Bash script below lists all S3 buckets with their objects recursively, printing each bucket name and numbering its objects so the count is visible.
/usr/bin/sudo /usr/local/bin/aws s3 ls | awk '{print $NF}' | while read -r l; do echo -e "#######---$l objects---##########\n\n"; /usr/bin/sudo /usr/local/bin/aws s3 ls "$l" --recursive | nl; done
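If only the totals are needed rather than a numbered listing, the listing can be reduced to a count and a byte total. A minimal sketch, assuming the date, time, size, key layout of aws s3 ls output (summarize_listing is a made-up helper name; the CLI also has a --summarize flag for aws s3 ls that prints similar totals on its own):

```shell
# summarize_listing: read `aws s3 ls ... --recursive` output on stdin and
# print the number of objects and their combined size. Hypothetical helper.
summarize_listing() {
  awk '
    $1 == "PRE" { next }              # prefix lines are not objects
    NF >= 4 { count++; bytes += $3 }  # third column is the object size in bytes
    END { printf "%d objects, %d bytes\n", count, bytes }
  '
}
```

For example: aws s3 ls s3://MyBucket/ --recursive | summarize_listing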