aws cli rm when wildcards are needed

The AWS CLI docs say that wildcards are not supported; the --include and --exclude options are the suggested workaround. But that gets unwieldy when the file structure is wide:
aws s3 rm s3://your-bucket/your-folder/year=2020/month=05/ --exclude "*" --include "*/provider=400/qk=0001" --include "*/provider=400/qk=0002" --include "*/provider=400/qk=0003" --include "*/provider=400/qk=0010" ...
So what are the other options?

In a shell terminal you can use the following trick, which relies on brace expansion:
for i in s3://your-bucket/your-folder/year=2020/month=05/day={01,02,03,04,05,06,07,08,09,10,...}/provider=400/qk={0001,0002,0003,0010,...}; do aws s3 rm "$i" --recursive; done
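This works because the shell expands every combination inside the braces before the loop runs, so each iteration removes one fully spelled-out prefix. A minimal illustration with hypothetical, shortened paths:
echo s3://your-bucket/day={01,02}/qk={0001,0002}
# s3://your-bucket/day=01/qk=0001 s3://your-bucket/day=01/qk=0002 s3://your-bucket/day=02/qk=0001 s3://your-bucket/day=02/qk=0002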

Related

aws sync exclude not excluding all files

The aws sync command below does execute, but I cannot seem to exclude the xxxx files, since they also match the --include pattern.
The prefix will always be xxxx, but I am trying to exclude those files from the sync. Thank you :).
Files in the directory:
xxxx.id.1.bam
xxxx.id.1.bai
aaa.id.1.bam
aaa.id.1.bai
bbb.bam
bbb.bai
Desired:
aaa.id.1.bam
aaa.id.1.bai
Command:
aws s3 sync . s3://bucket/ --exclude "*" --exclude "*xxxx" --include "*.id.1.bam" --include "*.id.1.bai" --dryrun
The order of --exclude and --include matters. It should be:
aws s3 sync . s3://bucket/ --exclude "*" --include "*.id.1.bam" --include "*.id.1.bai" --exclude "xxxx.*" --dryrun
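This follows from how the CLI evaluates filters: they are applied in the order given, and the last filter that matches a file wins. Here xxxx.id.1.bam matches --include "*.id.1.bam", but the trailing --exclude "xxxx.*" overrides it. You can watch the rule in isolation with a throwaway directory and --dryrun (the bucket name is a placeholder, and nothing is actually uploaded):
mkdir demo && cd demo && touch apple abacus
aws s3 sync . s3://bucket/ --exclude "*" --include "a*" --exclude "ab*" --dryrun
# roughly: (dryrun) upload: apple to s3://bucket/apple -- abacus is caught by the later exclude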

How to copy multiple files matching name pattern to AWS S3 bucket using AWS CLI?

I would like to copy files matching a filename pattern from my machine to an AWS S3 bucket using the AWS CLI. Using the standard Unix filename wildcards does not work:
$ aws s3 cp *.csv s3://wesam-data/
Unknown options: file1.csv,file2.csv,file3.csv,s3://wesam-data/
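Note that the error itself comes from the shell, not the CLI: the shell expands *.csv into a list of filenames before aws runs, so aws receives several positional arguments it does not expect. Prefixing the command with echo shows exactly what aws is handed:
echo aws s3 cp *.csv s3://wesam-data/
# aws s3 cp file1.csv file2.csv file3.csv s3://wesam-data/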
I followed an SO answer addressing a similar problem, which advises using the --exclude and --include filters as shown below, but without success.
$ aws s3 cp . s3://wesam-data/ --exclude "*" --include "*.csv"
Solution
$ aws s3 cp . s3://wesam-data/ --exclude "*" --include "*.csv" --recursive
Explanation
It turns out that I have to use the --recursive flag with the --include and --exclude flags, since this is a multi-file operation.
The following commands are single file/object operations if no --recursive flag is provided (see the sketch after this list):
cp
mv
rm
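Since mv and rm accept the same --recursive, --exclude, and --include flags, the same pattern carries over. For example, a sketch that moves rather than copies the matching files (same placeholder bucket as above):
aws s3 mv . s3://wesam-data/ --exclude "*" --include "*.csv" --recursive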

Deleting S3 files using AWS data pipeline

I want to delete all S3 keys starting with some prefix using AWS Data Pipeline.
I am using a Shell Command Activity for this.
These are the arguments:
"scriptUri": "https://s3.amazonaws.com/my_s3_bucket/hive/removeExitingS3.sh",
"scriptArgument": "s3://my_s3_bucket/output/2017-03-19",
I want to delete all S3 keys starting with 2017-03-19 in the output folder. What should the command be to do this?
I have tried this command in the .sh file:
sudo yum -y upgrade aws-cli
aws s3 rm $1 --recursive
This is not working.
Sample files are:
s3://my_s3_bucket/output/2017-03-19/1.txt
s3://my_s3_bucket/output/2017-03-19/2.txt
s3://my_s3_bucket/output/2017-03-19_3.txt
EDIT:
The date (2017-03-19) is dynamic; it is the output of #{format(#scheduledStartTime,"YYYY-MM-dd")}. So effectively:
"scriptArgument": "s3://my_s3_bucket/output/#{format(#scheduledStartTime,"YYYY-MM-dd")}"
Try:
aws s3 rm $1 --recursive --exclude "*" --include "2017-03-19*" --include "2017-03-19/*"
with:
"scriptArgument": "s3://my_s3_bucket/output/"
EDIT:
As the date is a dynamic parameter, pass it as the second scriptArgument to the Shell Command Activity:
aws s3 rm $1 --recursive --exclude "*" --include "$2*" --include "$2/*"
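Putting it together, a sketch of the complete script, assuming the first scriptArgument is the folder and the second is the formatted date as described above:
#!/bin/bash
# $1 = folder prefix, e.g. s3://my_s3_bucket/output/
# $2 = date prefix from the pipeline, e.g. 2017-03-19
sudo yum -y upgrade aws-cli
aws s3 rm "$1" --recursive --exclude "*" --include "$2*" --include "$2/*"
The two --include patterns cover both naming layouts in the sample files: keys under the 2017-03-19/ prefix and keys that merely start with 2017-03-19 (such as 2017-03-19_3.txt).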

Glob pattern with amazon s3

I want to move files from one S3 bucket to another S3 bucket. I want to move only files whose names start with "part". I can do it using Java, but is it possible with the Amazon CLI? Can we use a glob pattern in the CLI?
My object names are like:
part0000
part0001
Yes, this is possible through the aws CLI, using the --include and --exclude options.
As an example, you can use the aws s3 sync command to sync your part files:
aws s3 sync --exclude '*' --include 'part*' s3://my-amazing-bucket/ s3://my-other-bucket/
You can also use the cp command, with the --recursive flag:
aws s3 cp --recursive --exclude '*' --include 'part*' s3://my-amazing-bucket/ s3://my-other-bucket/
Explanation:
aws: The aws CLI command
s3: The AWS service to interface with
sync: The operation for the service to perform
--exclude <value>: A UNIX-style wildcard for files to ignore, unless they are re-included by a later --include
--include <value>: A UNIX-style wildcard for files to act upon
As noted in the documentation, you can also specify --include and --exclude multiple times.
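For instance, to pick up two prefixes in one pass (the second prefix is hypothetical, purely to show the repetition):
aws s3 sync --exclude '*' --include 'part*' --include 'data*' s3://my-amazing-bucket/ s3://my-other-bucket/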

AWS S3 `--exclude` being ignored

I can't make --exclude work on AWS S3. None of the three versions of the command work. No matter how I exclude the directories, they are still being uploaded.
root#taurus [/]# aws s3 sync / s3://server.taurus --exclude "disk3/*" --exclude "backup/*"
root#taurus [/]# aws s3 sync / s3://server.taurus --exclude 'disk3/*' --exclude 'backup/'
root#taurus [/]# aws s3 sync / s3://server.taurus --exclude 'disk3/' --exclude 'backup/'
Please see my AWS CLI version below.
root@taurus [/]# aws --version
aws-cli/1.10.14 Python/2.6.6 Linux/2.6.32-531.29.2.lve1.3.11.1.el6.x86_64.debug botocore/1.4.5
root@taurus [/]#
What could be wrong?
From the AWS Command-Line Interface (CLI) documentation for the sync command:
--include (string) Don't exclude files or objects in the command that match the specified pattern. See Use of Exclude and Include Filters for details.
--exclude (string) Exclude all files or objects from the command that matches the specified pattern.
So (strange as it may seem), you must specify objects to --include AND objects to --exclude. Using --include "*" is acceptable.
Specifying --exclude on its own will not match any files.
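Following that advice, a sketch of the first command from the question with an explicit include added; the include comes first so that the later excludes take precedence:
aws s3 sync / s3://server.taurus --include "*" --exclude "disk3/*" --exclude "backup/*"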