Deleting S3 files using AWS Data Pipeline - amazon-web-services

I want to delete all S3 keys starting with some prefix using AWS Data Pipeline.
I am using a ShellCommandActivity for this.
These are the arguments:
"scriptUri": "https://s3.amazonaws.com/my_s3_bucket/hive/removeExitingS3.sh",
"scriptArgument": "s3://my_s3_bucket/output/2017-03-19",
I want to delete all S3 keys starting with 2017-03-19 in the output folder. What command should I use to do this?
I have tried the following in the .sh file:
sudo yum -y upgrade aws-cli
aws s3 rm $1 --recursive
This is not working.
Sample files are
s3://my_s3_bucket/output/2017-03-19/1.txt
s3://my_s3_bucket/output/2017-03-19/2.txt
s3://my_s3_bucket/output/2017-03-19_3.txt
EDIT:
The date (2017-03-19) is dynamic; it is the output of #{format(#scheduledStartTime,"YYYY-MM-dd")}. So effectively:
"scriptArgument": "s3://my_s3_bucket/output/#{format(#scheduledStartTime,"YYYY-MM-dd")}"

Try
aws s3 rm $1 --recursive --exclude "*" --include "2017-03-19*" --include "2017-03-19/*"
with
"scriptArgument": "s3://my_s3_bucket/output/"
EDIT:
As the date is a dynamic parameter, pass it as the second scriptArgument to the ShellCommandActivity:
aws s3 rm $1 --recursive --exclude "*" --include "$2*" --include "$2/*"
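Putting the pieces together, the full removeExitingS3.sh might look like this (a minimal sketch; it assumes the first scriptArgument is the base path and the second is the formatted date, as described above):
#!/bin/bash
# $1 = base path, e.g. s3://my_s3_bucket/output/
# $2 = date prefix, e.g. 2017-03-19, produced by #{format(#scheduledStartTime,"YYYY-MM-dd")}
sudo yum -y upgrade aws-cli
# Removes both keys like output/2017-03-19_3.txt and everything under output/2017-03-19/
aws s3 rm "$1" --recursive --exclude "*" --include "$2*" --include "$2/*"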

Related

aws s3 cp multiple files in one command

I am copying multiple files to S3 using one command
[ec2-user@ip-172-31-38-250 ~]$ ls *.rpm
1.rpm
2.rpm
3.rpm
How do I copy these 3 RPMs in one AWS CLI command?
I tried
aws s3 cp *.rpm s3://mybucket1/rpms/
I got an error
Unknown options: 1.rpm, 2.rpm, 3.rpm,s3://mybucket1/rpms/
You can use filters:
aws s3 cp . s3://mybucket1/rpms/ --recursive --exclude "*" --include "*.rpm"
or with sync:
aws s3 sync . s3://mybucket1/rpms/ --exclude "*" --include "*.rpm"
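If you prefer plain shell globbing over the CLI filters, a loop works as well, at the cost of one aws invocation per file (a minimal sketch):
for f in *.rpm; do aws s3 cp "$f" s3://mybucket1/rpms/; done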

How to get files to copy from S3 bucket

Need some help with the cp command in the AWS CLI. I am trying to copy files from an S3 bucket to a local folder. The command I used seems to have run successfully in PowerShell, but the folder is still empty.
Command:
aws s3 cp s3://<my bucket path> <my local destination> --exclude "*" --include "*-20201023*" --recursive --dryrun
The --dryrun parameter prevents the command from actually copying anything; it only displays the operations that would be performed. Try removing that parameter and running the command again.
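That is, the same command without --dryrun should perform the actual copy:
aws s3 cp s3://<my bucket path> <my local destination> --exclude "*" --include "*-20201023*" --recursive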

aws cli rm when wild cards are needed

The AWS CLI docs say that wildcards are not supported in the s3 commands; you can use the --include and --exclude options instead. But that can take a while when the file structure is wide:
aws s3 rm s3://your-bucket/your-folder/year=2020/month=05/ --exclude "*" --include "*/provider=400/qk=0001" --include "*/provider=400/qk=0002" --include "*/provider=400/qk=0003" --include "*/provider=400/qk=0010" ...
So what are other options?
In a shell terminal you can use the following trick:
for i in s3://your-bucket/your-folder/year=2020/month=05/day={01,02,03,04,05,06,07,08,09,10...}/provider=400/qk={0001,0002,0003,0010,...}; do aws s3 rm $i --recursive; done
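The trick relies on bash brace expansion, which generates the cross product of all brace values before the loop runs. You can preview the expansion with echo (a small illustration with two values per brace):
echo s3://your-bucket/your-folder/day={01,02}/qk={0001,0002}
# prints four paths: day=01/qk=0001, day=01/qk=0002, day=02/qk=0001, day=02/qk=0002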

Logrotate Postrotate aws s3 Wildcards

I am trying to rotate a bunch of log files and upload them to S3 with the postrotate command.
However, it appears that the postrotate script is not expanding the * glob wildcard:
My logrotate configuration:
/var/log/application/*.log {
missingok
dateext
size 500M
notifempty
copytruncate
compress
rotate 1512
postrotate
/usr/bin/aws s3 mv /var/log/application/*.gz s3://mygreatbucket/
endscript
}
The error I see when running logrotate with that configuration:
The user-provided path /var/log/application/*.gz does not exist.
This is an error message from the AWS CLI s3 command, which I can replicate if I manually run my command:
/usr/bin/aws s3 mv '/var/log/application/*.gz' s3://mygreatbucket
(note the single quotes).
What can I do so that the glob wildcard is expanded during the postrotate step?
The AWS CLI documentation states that the tool does not directly support glob wildcards. Instead you should use the --include or --exclude parameters.
I ended up using:
/usr/bin/aws s3 mv /var/log/application/ s3://mybucket --exclude '*' --include '*.gz' --recursive
The --recursive flag is important, otherwise it won't work.
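For completeness, the working logrotate configuration is then identical to the original except for the postrotate command:
/var/log/application/*.log {
missingok
dateext
size 500M
notifempty
copytruncate
compress
rotate 1512
postrotate
/usr/bin/aws s3 mv /var/log/application/ s3://mygreatbucket/ --exclude '*' --include '*.gz' --recursive
endscript
}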

How to copy multiple files matching name pattern to AWS S3 bucket using AWS CLI?

I would like to copy files matching a file name pattern from my machine to an AWS S3 bucket using AWS CLI. Using the standard unix file name wildcards does not work:
$ aws s3 cp *.csv s3://wesam-data/
Unknown options: file1.csv,file2.csv,file3.csv,s3://wesam-data/
I followed an SO answer addressing a similar problem that advises using the --exclude and --include filters, as shown below, but without success.
$ aws s3 cp . s3://wesam-data/ --exclude "*" --include "*.csv"
Solution
$ aws s3 cp . s3://wesam-data/ --exclude "*" --include "*.csv" --recursive
Explanation
It turns out that I have to use the --recursive flag together with the --include and --exclude flags, since this is a multi-file operation.
Without the --recursive flag, the following commands operate on a single file/object:
cp
mv
rm
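As noted earlier in this thread, --dryrun is a convenient way to verify which files the filters will match before performing the real copy:
$ aws s3 cp . s3://wesam-data/ --exclude "*" --include "*.csv" --recursive --dryrun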