Copy list of files from S3 bucket to S3 bucket

Is there a way I could copy a list of files from one S3 bucket to another? Both S3 buckets are in the same AWS account. I am able to copy a single file at a time using the AWS CLI command:
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt
However, I have 1000+ files to copy. I do not want to copy all of the files in the source bucket, so I am not able to use the sync command. Is there a way to pass in a file containing the list of file names that need to be copied, to automate this process?

You can use the --exclude and --include filters, together with the --recursive flag, with the s3 cp command to copy multiple files.
Following is an example:
aws s3 cp /tmp/foo/ s3://bucket/ --recursive --exclude "*" --include "*.jpg"
For more details, see the AWS CLI s3 cp documentation.
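Applied to the bucket-to-bucket case in the question, this might look like the following (the file names here are just placeholders):
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive --exclude "*" --include "file1.txt" --include "file2.txt"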

Approaching this problem from the Python side, you can run a Python script that does it for you. Since you have a lot of files, it might take a while, but it should get the job done. Save the following code in a file with a .py extension and run it. You might need to run pip install boto3 beforehand in your terminal if you don't already have it.
import boto3

s3 = boto3.resource('s3')
mybucket = s3.Bucket('oldBucket')
list_of_files = ['file1.txt', 'file2.txt']

# Walk the source bucket and re-upload each matching object to the target bucket
for obj in mybucket.objects.all():
    if obj.key in list_of_files:
        s3.Object('newBucket', obj.key).put(Body=obj.get()["Body"].read())

If you want to use the AWS CLI, you could use cp in a loop over a file containing the names of the files you want to copy:
while read -r FNAME
do
    aws s3 cp "s3://source-bucket/$FNAME" "s3://target-bucket/$FNAME"
done < file_list.csv
I've done this for a few hundred files. It's not efficient because you have to make a request for each file.
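If the one-request-per-file approach is too slow, a possible variation (a sketch, not from the original answer; it assumes an xargs that supports -P and object keys without spaces) is to run several copies in parallel:
xargs -P 8 -I {} aws s3 cp "s3://source-bucket/{}" "s3://target-bucket/{}" < file_list.csv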
A better way would be to use the --include argument multiple times in one cp command. If you could generate all those arguments in the shell from a list of files, you would effectively have
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive --exclude "*" --include "somefile.txt" --include "someotherfile.jpg" --include "another.json" ...
I'll let someone more skilled figure out how to script that.
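One possible way to script it (a rough sketch, assuming Bash and a file_list.csv with one object key per line):
include_args=()
while IFS= read -r fname; do
    include_args+=(--include "$fname")
done < file_list.csv
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive --exclude "*" "${include_args[@]}"
Note that with 1000+ files the argument list gets long and every filter is evaluated against every object, so you may need to batch the list.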

Related

How to push all zip files in a specific folder to my S3 bucket folder?

I have a problem where I can't push all of my zip files to my S3 bucket. When I run the .bat file, the cmd window only loads for a second and then automatically closes, and when I refresh my S3 bucket folder there is no copy of the zip files.
My Script:
aws s3 cp s3://my_bucket/07-08-2020/*.zip C:\first_folder\second_folder\update_folder --recursive
The issue is with the *.zip. In order to copy files with a specific extension, use the following syntax:
aws s3 cp [LOCAL_PATH] [S3_PATH] --recursive --exclude "*" --include "*.zip"
From the docs:
Note that, by default, all files are included. This means that
providing only an --include filter will not change what files are
transferred. --include will only re-include files that have been
excluded from an --exclude filter. If you only want to upload files
with a particular extension, you need to first exclude all files, then
re-include the files with the particular extension.
More info can be found in the AWS CLI s3 cp documentation.
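To illustrate the ordering (paths here are placeholders):
# Copies only the .zip files: everything is excluded first, then .zip is re-included
aws s3 cp "C:\first_folder\second_folder\update_folder" s3://my_bucket/07-08-2020/ --recursive --exclude "*" --include "*.zip"
# Copies nothing: the later --exclude "*" takes precedence over the earlier --include
aws s3 cp "C:\first_folder\second_folder\update_folder" s3://my_bucket/07-08-2020/ --recursive --include "*.zip" --exclude "*"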
@AmitBaranes is right. I checked on a Windows box. You could also simplify your command by using sync instead of cp.
So the command using sync could be:
aws s3 sync "C:\first_folder\second_folder\update_folder" s3://my_bucket/07-08-2020/ --exclude "*" --include "*.zip"

How to copy multiple files from local to S3?

I am trying to upload multiple files from my local machine to an AWS S3 bucket. I am able to use aws s3 cp to copy files one by one, but I need to upload multiple (selective, not all) files to the same S3 folder. Is it possible to do this in a single AWS CLI call, and if so, how?
Eg -
aws s3 cp test.txt s3://mybucket/test.txt
Reference -
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
If you scroll down the documentation link you provided to the section entitled "Recursively copying local files to S3", you will see the following:
When passed with the parameter --recursive, the following cp command recursively copies all files under a specified directory to a specified bucket and prefix while excluding some files by using an --exclude parameter. In this example, the directory myDir has the files test1.txt and test2.jpg
So, assuming you wanted to copy all .txt files in some subfolder to the same bucket in S3, you could try something like:
aws s3 cp yourSubFolder s3://mybucket/ --recursive
If there are any other files in this subfolder, you need to add the --exclude and --include parameters (otherwise all files will be uploaded):
aws s3 cp yourSubFolder s3://mybucket/ --recursive --exclude "*" --include "*.txt"
If you're doing this from bash, then you can use this pattern as well:
for f in *.png; do aws s3 cp "$f" s3://my/dest; done
You would of course customize *.png to be your glob pattern, and the s3 destination.
If you have a weird set of files you can do something like put their names in a text file, call it filenames.txt and then:
for f in $(cat filenames.txt); do aws s3 cp "$f" s3://my/dest; done
aws s3 cp <your directory path> s3://<your bucket name>/ --recursive --exclude "*.jpg" --include "*.log"

Use the aws client to copy s3 files from a single directory only (non-recursively)

Consider an aws bucket/key structure along these lines
myBucket/dir1/file1
myBucket/dir1/file2
myBucket/dir1/dir2/dir2file1
myBucket/dir1/dir2/dir2file2
When using:
aws s3 cp --recursive s3://myBucket/dir1/ .
Then we will copy down dir2file[1,2] along with file[1,2]. How can we copy only the latter files and not the files under subdirectories?
Responding to a comment: I am not interested in adding an --exclude for every subdirectory, so this is not a duplicate of excluding directories from aws cp.
As far as I understand, you want to make sure that the files directly under the current directory are copied, but that anything in child directories is not. I think you can use something like this:
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "*/*"
Here we are excluding files which will have a path separator after "dir1".
You can exclude paths using the --exclude option, e.g.
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "dir1/dir2/*"
More options and examples can be found by using the AWS CLI help:
aws s3 cp help
There is no way to control the recursion depth while copying files using aws s3 cp, nor is it supported by aws s3 ls.
So, if you do not wish to use the --exclude or --include options, I suggest you:
Use the aws s3 ls command without the --recursive option to list files directly under a directory, extract only the file names from the output, and save the names to a file. Refer to this post.
Then write a simple script to read the file names and run aws s3 cp for each (see the sketch below).
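A rough sketch of that approach (illustrative only; it assumes the object names contain no spaces):
# List only the objects directly under dir1/ ("PRE" lines are sub-prefixes) and keep just the names
aws s3 ls s3://myBucket/dir1/ | awk '$1 != "PRE" {print $4}' > names.txt
# Copy each listed object individually
while IFS= read -r name; do
    aws s3 cp "s3://myBucket/dir1/$name" .
done < names.txt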
Alternatively, you may use:
aws s3 cp s3://spaces/dir1/ . --recursive --exclude "*/*"

Amazon S3: Use aws s3 cp without downloading existing files?

Is there a way using the AWS CLI to download files using --recursive and --exclude + --include, and not overwrite files I have already downloaded? It just rewrites files even if they haven't changed, and won't resume downloads after a crash.
I think you are looking for the sync command. It assumes the --recursive flag by default:
Syncs directories and S3 prefixes. Recursively copies new and updated
files from the source directory to the destination. Only creates
folders in the destination if they contain one or more files.
Something like this will work:
aws s3 sync s3://bucket/path/to/folder/ . --exclude '*' --include 'filesToMatch*.txt'
As hjpotter92 said, --recursive is implied for sync, unlike with cp.
And you can always include the --dryrun flag to verify what will run before actually executing it.
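For example, appending it to the command above:
aws s3 sync s3://bucket/path/to/folder/ . --exclude '*' --include 'filesToMatch*.txt' --dryrun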

How to include and copy files that are in the current directory to S3 (and not recursively)

I have some files that I want to copy to S3.
Rather than making one call per file, I want to include them all in a single call (to be as efficient as possible).
However, I only seem to get it to work if I add the --recursive flag, which makes it look in all child directories (all the files I want are in the current directory only).
So this is the command I have now, which works:
aws s3 cp --dryrun . mybucket --recursive --exclude * --include *.jpg
But ideally I would like to remove the --recursive flag to stop it traversing,
e.g. something like this (which does not work):
aws s3 cp --dryrun . mybucket --exclude * --include *.jpg
(I have simplified the example, in my script I have several different include patterns)
AWS CLI's S3 wildcard support is a bit primitive, but you could use multiple --exclude options to accomplish this. Note: the order of includes and excludes is important.
aws s3 cp --dryrun . s3://mybucket --recursive --exclude "*" --include "*.jpg" --exclude "*/*"
Try the command:
aws s3 cp --dryrun . s3://mybucket --recursive --exclude "*/"
Hope it helps.
I tried the suggested answers and could not get aws to skip nested folders. I saw some weird output about calculating size and 0-size objects, despite using the exclude flag.
I eventually gave up on the --recursive flag and used bash to perform a single s3 upload for each file matched. Remove --dryrun once you're ready to roll!
for i in *.{jpg,jpeg}; do aws s3 cp --dryrun "$i" "s3://your-bucket/your-folder/$i"; done
I would suggest a utility called s4cmd, which provides Unix-like file system operations and also supports wildcards:
https://github.com/bloomreach/s4cmd