Is there a way, using the AWS CLI, to download files with --recursive and --exclude + --include without overwriting files I have already downloaded? It rewrites files even if they haven't changed, and it won't resume downloads after a crash.
I think you are looking for the sync command. It assumes the --recursive flag by default:
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
Something like this will work:
aws s3 sync s3://bucket/path/to/folder/ . --exclude '*' --include 'filesToMatch*.txt'
As hjpotter92 said, --recursive is implied with sync, unlike with cp.
And you can always include the --dryrun flag to verify what will run before actually executing it.
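As a rough illustration (the bucket and prefix names are placeholders), running the same sync twice transfers nothing the second time, and re-running it after a crash only picks up the files that are still missing or changed:
# preview what would be transferred
aws s3 sync s3://bucket/path/to/folder/ . --exclude '*' --include 'filesToMatch*.txt' --dryrun
# actually download; safe to re-run after an interruption
aws s3 sync s3://bucket/path/to/folder/ . --exclude '*' --include 'filesToMatch*.txt'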
Related
I'm syncing the entire contents of an external hard drive, used with macOS, to an S3 bucket. I'd like to exclude all macOS hidden files.
I've tried:
aws s3 sync --dryrun --exclude "^\." --exclude "\/\." ./ s3://bucketname
However, the result when I run that is exactly the same as just:
aws s3 sync --dryrun . s3://bucketname
So, I must be doing something wrong.
Any suggestions?
Thanks.
aws s3 sync --dryrun . s3://bucketname --exclude ".*" --exclude "*/.*"
Adding the two exclusion arguments will exclude hidden files both in the current directory and in any subfolders.
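As a quick illustration with made-up paths, relative to the directory being synced:
# .DS_Store          -> skipped by --exclude ".*"
# photos/.DS_Store   -> skipped by --exclude "*/.*"
# photos/image.jpg   -> still uploaded
aws s3 sync . s3://bucketname --exclude ".*" --exclude "*/.*" --dryrun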
This seems to work:
aws s3 sync --dryrun . s3://bucketname --exclude ".*"
However, I don't think it will exclude such files in sub-directories.
Try this:
aws s3 sync --dryrun . s3://bucketname --exclude '*/.*'
Combined with --exclude '.*' from above, this should exclude hidden files in subfolders as well.
aws s3 sync --dryrun . s3://bucketname --exclude '.*' --exclude '*/.*'
I have a problem where I can't push all of my zip files to my S3 bucket. When I run the .bat file, the cmd window loads for a second and then closes automatically, and when I refresh my S3 bucket folder there is no copy of the zip files.
My Script:
aws s3 cp s3://my_bucket/07-08-2020/*.zip C:\first_folder\second_folder\update_folder --recursive
The issue is with the *.zip. To copy files with a specific extension, use the following syntax:
aws s3 cp [LOCAL_PATH] [S3_PATH] --recursive --exclude "*" --include "*.zip"
From the docs:
Note that, by default, all files are included. This means that
providing only an --include filter will not change what files are
transferred. --include will only re-include files that have been
excluded from an --exclude filter. If you only want to upload files
with a particular extension, you need to first exclude all files, then
re-include the files with the particular extension.
More info can be found here.
@AmitBaranes is right. I checked on a Windows box. You could also simplify your command by using sync instead of cp.
So the command using sync could be:
aws s3 sync "C:\first_folder\second_folder\update_folder" s3://my_bucket/07-08-2020/ --exclude "*" --include "*.zip"
Consider an AWS bucket/key structure along these lines:
myBucket/dir1/file1
myBucket/dir1/file2
myBucket/dir1/dir2/dir2file1
myBucket/dir1/dir2/dir2file2
When using:
aws s3 cp --recursive s3://myBucket/dir1/ .
Then we will copy down dir2file[1,2] along with file[1,2]. How can we copy only the latter files and not the files under subdirectories?
Responding to a comment: I am not interested in adding an --exclude for every subdirectory, so this is not a duplicate of excluding directories from aws cp.
As far as I understand, you want to make sure that the files in the current directory are copied but that nothing in child directories is copied. I think you can use something like this:
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "*/*"
Here we are excluding any key that has a path separator after "dir1".
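As a sketch, using the example layout from the question, a dry run should list only the top-level files:
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "*/*" --dryrun
# expected output, roughly:
# (dryrun) download: s3://myBucket/dir1/file1 to ./file1
# (dryrun) download: s3://myBucket/dir1/file2 to ./file2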
You can exclude paths using the --exclude option, e.g.
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "dir1/dir2/*"
More options and examples can be found by using the aws cli help
aws s3 cp help
There is no way to control the recursion depth while copying files with aws s3 cp. Nor is it supported by aws s3 ls.
So, if you do not wish to use --exclude or --include options, I suggest you:
Use the aws s3 ls command without the --recursive option to list files directly under a directory, extract only the file names from the output, and save the names to a file. Refer to this post.
Then write a simple script to read the file names and run aws s3 cp for each one (a rough sketch follows below).
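A rough sketch of those two steps (the bucket and prefix names are placeholders, and the awk split assumes file names without spaces):
# 1. list only the objects directly under the prefix, keep the name column,
#    and skip the "PRE" lines that mark subdirectories
aws s3 ls s3://myBucket/dir1/ | awk '$1 != "PRE" {print $4}' > files.txt
# 2. copy each listed file individually
while read -r fname; do
  aws s3 cp "s3://myBucket/dir1/${fname}" .
done < files.txt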
Alternatively, you may use:
aws s3 cp s3://spaces/dir1/ . --recursive --exclude "*/*"
Is there a way I could copy a list of files from one S3 bucket to another? Both S3 buckets are in the same AWS account. I am able to copy a single file at a time using the aws cli command:
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt
However, I have 1000+ files to copy. I do not want to copy all files in the source bucket, so I am not able to use the sync command. Is there a way to pass a file with the list of file names that need to be copied, to automate this process?
You can use the --exclude and --include filters, as well as the --recursive flag, with the s3 cp command to copy multiple files.
The following is an example:
aws s3 cp /tmp/foo/ s3://bucket/ --recursive --exclude "*" --include "*.jpg"
For more details click here
Approaching this problem from the Python side, you can run a Python script that does it for you. Since you have a lot of files, it might take a while, but it should get the job done. Save the following code in a file with a .py extension and run it. You might need to run pip install boto3 beforehand in your terminal in case you don't already have it.
import boto3

s3 = boto3.resource('s3')
mybucket = s3.Bucket('oldBucket')
list_of_files = ['file1.txt', 'file2.txt']

for obj in mybucket.objects.all():
    if obj.key in list_of_files:
        # read the object body and write it to the new bucket under the same key
        s3.Object('newBucket', obj.key).put(Body=obj.get()["Body"].read())
If you want to use the AWS CLI, you could use cp in a loop over a file containing the names of the files you want to copy:
while read -r FNAME
do
  aws s3 cp "s3://source-bucket/$FNAME" "s3://target-bucket/$FNAME"
done < file_list.csv
I've done this for a few hundred files. It's not efficient because you have to make a request for each file.
A better way would be to use the --include argument multiple times in one cp line. If you could generate all those arguments in the shell from a list of files, you would effectively have:
aws s3 cp s3://source-bucket/ s3://target-bucket/ --exclude "*" --include "somefile.txt" --include "someotherfile.jpg" --include "another.json" ...
I'll let someone more skilled figure out how to script that.
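One possible way to script it in bash (a sketch: it assumes file_list.csv holds one object key per line, and a very long list may run into the shell's command-line length limit):
# build an --include argument for each listed file
INCLUDES=()
while read -r fname; do
  INCLUDES+=(--include "$fname")
done < file_list.csv
# exclude everything, then re-include only the listed keys
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive --exclude "*" "${INCLUDES[@]}"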
I have some files that I want to copy to s3.
Rather than doing one call per file, I want to include them all in one single call (to be as efficient as possible).
However, I only seem to get it to work if I add the --recursive flag, which makes it look in all child directories (all the files I want are in the current directory only).
So this is the command I have now, which works:
aws s3 cp --dryrun . mybucket --recursive --exclude * --include *.jpg
but ideally I would like to remove the --recursive to stop it traversing,
e.g. something like this (which does not work)
aws s3 cp --dryrun . mybucket --exclude * --include *.jpg
(I have simplified the example, in my script I have several different include patterns)
AWS CLI's S3 wildcard support is a bit primitive, but you could use multiple --exclude options to accomplish this. Note: the order of includes and excludes is important.
aws s3 cp --dryrun . s3://mybucket --recursive --exclude "*" --include "*.jpg" --exclude "*/*"
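To see why the order matters (the file names below are made up), the filters apply in sequence and the last one that matches a path wins:
# photo.jpg     : excluded by "*", re-included by "*.jpg", not matched by "*/*" -> copied
# sub/photo.jpg : excluded by "*", re-included by "*.jpg", re-excluded by "*/*" -> skipped
aws s3 cp --dryrun . s3://mybucket --recursive --exclude "*" --include "*.jpg" --exclude "*/*"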
Try the command:
aws s3 cp --dryrun . s3://mybucket --recursive --exclude "*/*"
Hope it helps.
I tried the suggested answers and could not get aws to skip nested folders. I saw some weird output about calculating size and 0-size objects, despite using the exclude flag.
I eventually gave up on the --recursive flag and used bash to perform a single s3 upload for each file matched. Remove --dryrun once you're ready to roll!
for i in *.{jpg,jpeg}; do aws s3 cp --dryrun "${i}" "s3://your-bucket/your-folder/${i}"; done
I would suggest a utility called s4cmd, which provides Unix-like file system operations and also allows wildcards:
https://github.com/bloomreach/s4cmd