AWS S3 CLI creating multiple directories when using mv command - amazon-web-services

I am trying to move S3 bucket files from one folder to an archive folder in the same S3 bucket and I am using mv command to do this. While moving I want to exclude the movement of files in the archive folder.
I am using the following command:
aws s3 mv s3://mybucket/incoming/ s3://mybucket/incoming/archive/ --recursive --exclude "incoming/archive/" --include "*.csv"
This command moves the files, but it also creates another nested archive folder each time it is run:
1st run - files moved from /mybucket/incoming/ to /mybucket/incoming/archive/
2nd run - new files moved from /mybucket/incoming/ to /mybucket/incoming/archive/archive/
3rd run - new files moved from /mybucket/incoming/ to /mybucket/incoming/archive/archive/archive/
4th run - new files moved from /mybucket/incoming/ to /mybucket/incoming/archive/archive/archive/archive/
Can someone suggest/advise what exactly I am doing wrong here?

Use:
aws s3 mv s3://bucket/incoming/ s3://bucket/incoming/archive/ --recursive --include "*.csv" --exclude "archive/*"
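The order of the include/exclude filters matters, and the patterns are relative to the source path given (s3://bucket/incoming/), so the archive folder is matched by "archive/*" rather than by "incoming/archive/".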
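To see why the order matters, the precedence rule (everything is included by default, and a later matching filter overrides an earlier one) can be mimicked with a small shell function. This is only a sketch: s3_would_transfer is a hypothetical helper, not part of the AWS CLI, and it assumes that the shell's glob matching behaves like the CLI's pattern matching for simple patterns such as these.

```shell
# Hypothetical helper, not an AWS CLI feature.
# Usage: s3_would_transfer KEY [include|exclude PATTERN]...
# KEY is relative to the source prefix; filters are given in command-line order.
# Returns 0 if the key would be transferred, 1 if it would be skipped.
s3_would_transfer() {
  key=$1
  shift
  decision=include                # everything is included by default
  while [ "$#" -ge 2 ]; do
    # The unquoted $2 is treated as a glob; "*" also matches "/".
    case $key in
      $2) decision=$1 ;;          # a later matching filter overrides earlier ones
    esac
    shift 2
  done
  [ "$decision" = include ]
}

# Example calls (keys are made up for illustration):
# s3_would_transfer "archive/old.csv" include "*.csv" exclude "archive/*" || echo skipped
# s3_would_transfer "notes.txt" exclude "*" include "*.csv" exclude "archive/*" || echo skipped
```

With only --include "*.csv" --exclude "archive/*", a non-CSV key such as notes.txt is still moved, because no filter ever excluded it. If only CSV files should be moved, putting --exclude "*" first, followed by --include "*.csv" and then --exclude "archive/*", restricts the move accordingly. Adding --dryrun to the real aws s3 mv command previews what would be moved without moving anything.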

Related

Delete files older than 30 days under S3 bucket recursively without deleting folders using PowerShell

I can delete files and exclude folders with the following script:
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*"
When I try to add a pipe to delete only the older files, it does not work. Please help with the script.
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*" | Where-Object {($_.LastModified -lt (Get-Date).AddDays(-31))}
The approach should be the reverse of what you have: list the files you need, then pipe the results to a delete call. This is probably better managed by a full script than by a one-line shell command. There's an article on this and some examples here.
Going forward, you should let S3 versioning take care of this, then you don't have to manage a script or remember to run it. Note: it'll only work with files that are added after versioning has been enabled.
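The list-then-delete approach described above could be sketched as follows. This is a bash illustration rather than PowerShell, and it rests on some assumptions: keys_older_than is a hypothetical helper, the input is the usual four-column output of aws s3 ls --recursive (date, time, size, key), and zero-byte "folder" placeholders are recognised by a key ending in "/".

```shell
# Hypothetical helper: reads `aws s3 ls --recursive` output on stdin and
# prints the keys last modified before the cutoff date (YYYY-MM-DD).
keys_older_than() {
  awk -v cutoff="$1" '{
    key = $0
    # Strip the date, time and size columns; what remains is the key,
    # which may itself contain spaces.
    sub(/^[^ ]+ +[^ ]+ +[0-9]+ /, "", key)
    # Compare the dates as strings and skip "folder" keys ending in "/".
    if (($1 "") < (cutoff "") && key !~ /\/$/) print key
  }'
}

# Possible usage (the bucket name is a placeholder, and `date -d` is GNU date syntax):
# cutoff=$(date -d '30 days ago' +%Y-%m-%d)
# aws s3 ls s3://my-bucket/ --recursive | keys_older_than "$cutoff" |
#   while IFS= read -r key; do aws s3 rm "s3://my-bucket/$key"; done
```

Printing the keys first, before wiring in the aws s3 rm call, makes it easy to check the selection before anything is deleted.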

Download list of specific files from AWS S3 using CLI

I am trying to download only specific files from AWS. I have the list of file URLs. Using the CLI I can only download all files in a bucket using the --recursive option, but I only want to download the files in my list. Any ideas on how to do that?
This is possibly a duplicate of:
Selective file download in AWS S3 CLI
You can do something along the lines of:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2018-02-06*" --recursive
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
Since you have the s3 urls already in a file (say file.list), like -
s3://bucket/file1
s3://bucket/file2
You could download all the files to your current working directory with a simple bash script -
while read -r line; do aws s3 cp "$line" .; done < file.list
People, I found out a quicker way to do it: https://stackoverflow.com/a/69018735
WARNING: "Please make sure you don't have an empty line at the end of your text file".
It worked here! :-)
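The read loop above can be made tolerant of blank lines and of a missing final newline, which sidesteps the empty-line warning. The following is a sketch under assumptions: download_listed is a hypothetical function name, and the list file is expected to hold one s3:// URL per line.

```shell
# Hypothetical helper: download every s3:// URL listed in the file "$1"
# into the current directory.
download_listed() {
  # `|| [ -n "$url" ]` keeps the last line even without a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      # </dev/null stops the command from consuming the rest of the list.
      s3://*) aws s3 cp "$url" . </dev/null ;;
      # Skip blank lines and anything that is not an s3:// URL.
      *) continue ;;
    esac
  done < "$1"
}

# Possible usage:
# download_listed file.list
```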

AWS S3 Bucket upload all only zip files

I'm trying to upload all the zip files in a folder to my S3 bucket using this command:
aws s3 cp recursive s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --include="*.zip" --exclude="*" --exclude="*/*/*"
The exclude only works on files, not on directories, so all my directories with zip files inside are still being uploaded. Is there a way to upload only the zip files and exclude all other files and directories without specifying directory or file names?
https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters
When there are multiple filters, the rule is the filters that appear later in the command take precedence over filters that appear earlier in the command.
Had a similar issue, turns out you need to put exclude="*" first.
aws s3 cp recursive s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --exclude="*" --exclude="*/*/*" --include="*.zip"
Should work
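One point worth checking with this ordering: "*" in these patterns also matches "/", so --include "*.zip" still matches zip files inside subdirectories. If only the zip files directly in the source folder should be uploaded, one possible arrangement is to add --exclude "*/*" as the last filter. The sketch below is written for a bash-style shell rather than the Windows command prompt used in the question, and the function name and its arguments are hypothetical.

```shell
# Hypothetical wrapper: upload only the zip files found directly in a folder.
# $1 = local source folder, $2 = destination s3:// prefix,
# any remaining arguments (for example --dryrun) are passed through.
upload_top_level_zips() {
  src=$1
  dest=$2
  shift 2
  # 1. exclude everything, 2. re-include zip files at any depth,
  # 3. exclude anything that sits inside a subdirectory.
  aws s3 cp "$src" "$dest" --recursive \
    --exclude "*" --include "*.zip" --exclude "*/*" "$@"
}

# Possible usage (the bucket and folder names are placeholders):
# upload_top_level_zips ./backups s3://example-bucket/2024-01-01/ --dryrun
```

Running it with --dryrun first shows which files would be uploaded without transferring anything.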

AWS S3: How to delete all contents of a directory in a bucket but not the directory itself?

I have an AWS S3 bucket entitled static.mysite.com
This bucket contains a directory called html
I want to use the AWS Command Line Interface to remove all contents of the html directory, but not the directory itself. How can I do it?
This command deletes the directory too:
aws s3 rm s3://static.mysite.com/html/ --recursive
I don't see the answer to this question in the manual entry for AWS S3 rm.
Old question, but I didn't see the answer here. If you have a use case to keep the 'folder' prefix, but delete all the files, you can use --exclude with an empty match string. I found the --exclude "." and --exclude ".." options do not prevent the folder from being deleted. Use this:
aws s3 rm s3://static.mysite.com/html/ --recursive --exclude ""
I just want to confirm how the folders were created...
If you created the "subA" folder manually and then deleted the suba1 folder, you should find that the "subA" folder remains. When you create a folder manually, you are actually creating a folder "object" which is similar to any other file/object that you upload to S3.
However, if a file was uploaded directly to a location in S3 (when the "subA" and "suba1" folder don't exist yet) you'll find that the "subA" and "suba1" folders are created automatically. You can do this using something like the AWS CLI tool e.g:
aws s3 cp file1.txt s3://bucket/subA/suba1/file1.txt
If you now delete file1.txt, there will no longer be any objects within the "subA" folder and you'll find that the "subA" and "suba1" folders no longer exist.
If another file (file2.txt) was uploaded to the path "bucket/subA/file2.txt", and you deleted file1.txt (from the previous example) you'll find that the "subA" folder remains and the "suba1" folder disappears.
https://forums.aws.amazon.com/thread.jspa?threadID=219733
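Following that explanation, another way to end up with an empty folder is to delete everything under the prefix and then recreate the folder "object". The sketch below assumes that a manually created folder is simply a zero-byte object whose key ends in "/", as described above; empty_prefix_keep_folder is a hypothetical name.

```shell
# Hypothetical helper: remove all objects under a prefix, then recreate
# the zero-byte placeholder object so the "folder" stays visible.
# $1 = bucket name, $2 = folder prefix (with or without a trailing slash)
empty_prefix_keep_folder() {
  bucket=$1
  prefix=${2%/}/        # normalise to exactly one trailing slash
  aws s3 rm "s3://$bucket/$prefix" --recursive &&
    aws s3api put-object --bucket "$bucket" --key "$prefix"
}

# Possible usage with the names from the question:
# empty_prefix_keep_folder static.mysite.com html
```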
aws s3 rm s3://static.mysite.com/html/ --recursive --exclude ""
this command worked for me to delete all the files but not the folder.

aws s3 cp clobbers files?

Um, not quite sure what to make out of this.
I am trying to download 50 files from S3 to EC2 machine.
I ran:
for i in `seq -f "%05g" 51 101`; do (aws s3 cp ${S3_DIR}part-${i}.gz . &); done
A few minutes later, I checked on pgrep -f aws and found 50 processes running. Moreover, all files were created and started to download (large files, so expected to take a while to download).
At the end, however, I got only a subset of files:
$ ls
part-00051.gz part-00055.gz part-00058.gz part-00068.gz part-00070.gz part-00074.gz part-00078.gz part-00081.gz part-00087.gz part-00091.gz part-00097.gz part-00099.gz part-00101.gz
part-00054.gz part-00056.gz part-00066.gz part-00069.gz part-00071.gz part-00075.gz part-00080.gz part-00084.gz part-00089.gz part-00096.gz part-00098.gz part-00100.gz
Where is the rest??
I did not see any errors, but I saw these for successfully completed files (and these are the files that are shown in the ls output above):
download: s3://my/path/part-00075.gz to ./part-00075.gz
If you are copying many objects to/from S3, you might try the --recursive option to instruct aws-cli to copy multiple objects:
aws s3 cp s3://bucket-name/ . --recursive --exclude "*" --include "part-*.gz"
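The pattern "part-*.gz" copies every part file under the prefix. If only parts 00051 through 00101 are wanted, as in the question's loop, the include filters can be narrowed with character classes. The sketch below checks such a set of patterns locally with the shell's own glob matching, which is assumed to behave like the CLI's for these patterns; the function name is hypothetical.

```shell
# Hypothetical check: does a file name fall under the three include
# patterns that together cover part-00051.gz through part-00101.gz?
matches_part_51_to_101() {
  case $1 in
    part-0005[1-9].gz | part-000[6-9]?.gz | part-0010[01].gz) return 0 ;;
    *) return 1 ;;
  esac
}

# The corresponding command, previewed with --dryrun
# (the bucket name is a placeholder):
# aws s3 cp s3://bucket-name/ . --recursive --dryrun --exclude "*" \
#   --include "part-0005[1-9].gz" \
#   --include "part-000[6-9]?.gz" \
#   --include "part-0010[01].gz"
```

Note that "?" matches any single character, not only a digit, so the middle pattern relies on the part files being numbered consistently.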