Retrieve one file from AWS Glacier using cli - amazon-web-services

I have uploaded several thousand video files to an S3 bucket, then changed the bucket management settings to migrate the files to Glacier. I'm trying to retrieve a single file and copy it to my local machine. Typically, I follow the instructions here. I use the S3 management console, select the 'Action' to restore the selected file from Glacier and then download it using the following command:
aws s3 cp s3://my-bucket/my_video_file.mp4 .
This works as I want it to, but I'm wondering if there is a way to restore the file from Glacier without needing to sign in through the web browser and manually select it for retrieval. Looking through the documentation for aws s3 cp, there is an option called --force-glacier-transfer, but when I include it in my command I get the following:
Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
Here's the relevant passage for that option from the manual page:
--force-glacier-transfer (boolean) Forces a transfer request on all Glacier objects in a sync or recursive copy.
Is it possible to retrieve and download a single file from Glacier in a single CLI command, or will I always need to use the management console to retrieve the file first? I'm also open to using a Python script or something similar if it can be done that way.

You can restore a file from Glacier using the CLI, but there is no way to both restore and download it in a single command: you first need to restore the file from Glacier, and then, some time later (possibly hours later) once the restore is complete, download it.
To restore the file, use a command like this:
aws s3api restore-object --bucket bucketname --key path/to/object.zip --restore-request '{"Days":25,"GlacierJobParameters":{"Tier":"Standard"}}'
You can check on the status of the restore request with a command like this:
aws s3api head-object --bucket bucketname --key path/to/object.zip
That will output a JSON object with details on the S3 object, including the restore status:
.... While the restore is still in progress ...
"Restore": "ongoing-request=\"true\"",
.... Or, when the restore is done ...
"Restore": "ongoing-request=\"false\", expiry-date=\"...\"",
Once the restore is done, it's an ordinary object in S3 and you can simply copy it to your local machine:
aws s3 cp s3://bucketname/path/to/object.zip object.zip
Of course, scripting all of this is possible. boto3 in Python makes it fairly straightforward to follow this same pattern, but it's possible to do this in whatever language you prefer to use.
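For example, here's a minimal boto3 sketch of the same restore -> poll -> download flow (the bucket, key, and polling interval are placeholders; adjust the Days and Tier values to your needs):
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-bucket"
key = "my_video_file.mp4"

# Start the restore; ignore the error if a restore is already in progress.
try:
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": 25, "GlacierJobParameters": {"Tier": "Standard"}},
    )
except ClientError as e:
    if e.response["Error"]["Code"] != "RestoreAlreadyInProgress":
        raise

# Poll head_object until the temporary copy is available, then download it.
while 'ongoing-request="false"' not in s3.head_object(Bucket=bucket, Key=key).get("Restore", ""):
    time.sleep(600)  # Standard-tier restores typically take a few hours

s3.download_file(bucket, key, "my_video_file.mp4")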

Related

Preserve metadata when uploading multiple files to an Amazon S3 bucket using AWS CLI

I have an archive of files for a client that needs uploading to an s3 bucket (100,000+ files). From there, the files need to be listed & available for download on their website.
It is crucial that the files I upload preserve the 'Date Created' or 'Date Modified' metadata in order for the files to be listed in the original date they were created. User-defined metadata is also fine, as long as I can get the data after upload.
I've tried using the following command on AWS CLI as a test:
aws s3 cp test.pdf s3://test-bucket/ --metadata-directive COPY
however it doesn't preserve the original metadata. Is there a bug with the COPY command or am I missing something? Any advice or alternative ways I could achieve this would be appreciated, thanks all.
--metadata-directive only takes effect when both the source and destination paths are S3 paths; it has no effect on local files. Instead, you can pass the local file's attributes as --metadata (user-defined metadata) on the newly created S3 object. For example:
#!/bin/sh
file="$1"
meta=$(stat --printf='{"x-change-time":"%z","x-access-time":"%x","x-modify-time":"%y"}' "$file")
aws s3 cp "$file" s3://test-bucket/ --metadata "$meta"
./test.sh ./test.pdf
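If you'd rather script the whole 100,000+ file upload in Python than shell out once per file, here's a hedged boto3 sketch of the same idea: record each file's modification time as user-defined metadata at upload time (the bucket name, local directory, and "x-modify-time" metadata key are just example values):
import os
import boto3

s3 = boto3.client("s3")
bucket = "test-bucket"
root = "archive"  # local directory tree to upload

for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        key = os.path.relpath(path, root).replace(os.sep, "/")
        mtime = os.path.getmtime(path)  # seconds since the epoch
        s3.upload_file(
            path,
            bucket,
            key,
            ExtraArgs={"Metadata": {"x-modify-time": str(mtime)}},
        )

# After upload, head_object(Bucket=bucket, Key=<key>)["Metadata"]["x-modify-time"]
# returns the stored timestamp for listing files by their original dates.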

Can I copy files from one aws bucket to another aws bucket without downloading to local machine?

I have some huge files in bucket1 and I need to copy some of them to bucket2. I know ways where I would download the files from bucket1 to my local machine and then upload them to bucket2.
Can I skip this download-and-upload step and ask Amazon to copy the files without downloading? Is this even possible?
Amazon S3 has API calls that can copy objects between buckets (even between regions), which does not involve any downloading.
The easiest method is to use the AWS Command-Line Interface (CLI), which has some useful commands:
aws s3 sync s3://bucket1/ s3://bucket2/
will synchronize files between buckets, so they have the same content.
aws s3 cp --recursive s3://bucket1/ s3://bucket2/
does something similar, but lets you be more selective about which files are copied.
See: Using High-Level s3 Commands with the AWS Command Line Interface - AWS Command Line Interface
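If you'd rather do the copy programmatically, here's a hedged boto3 sketch of the same server-side copy; the data moves directly between buckets without being downloaded. The bucket names and prefix are placeholders, and the managed copy() call also handles multipart copies for objects larger than 5 GB:
import boto3

s3 = boto3.client("s3")
source_bucket = "bucket1"
dest_bucket = "bucket2"
prefix = "some/prefix/"  # only copy keys under this prefix

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        # Server-side copy: S3 copies the bytes itself; nothing touches your machine.
        s3.copy({"Bucket": source_bucket, "Key": obj["Key"]}, dest_bucket, obj["Key"])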

How to stop changing storage class to Glacier by S3 [duplicate]

I'm wondering whether there is an easy way to permanently restore Glacier objects to S3. It seems that you can only restore Glacier objects for the amount of time you specify when restoring to S3. So for example, we now have thousands of files restored to S3 that will go back to Glacier in 90 days, but we do not want them back in Glacier.
To clarify a technicality on one point: your files will not "go back to" Glacier in 90 days, because they are still in Glacier. Since you have done a restore, there are temporary copies living in S3 reduced redundancy storage (RRS) that S3 will delete in 90 days (or whatever day value you specified when you did the restore operation). Restoring files doesn't remove the Glacier copy.
The answer to your question is no, and yes.
You cannot technically change an object from the Glacier storage class back to the standard or RRS class...
The transition of objects to the GLACIER storage class is one-way. You cannot use a lifecycle configuration rule to convert the storage class of an object from GLACIER to Standard or RRS.
... however...
If you want to change the storage class of an already archived object to either Standard or RRS, you must use the restore operation to make a temporary copy first. Then use the copy operation to overwrite the object as a Standard or RRS object.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
You can copy that object to what is, technically, a new object, but one that has the same key (path) as the old object... so for practical purposes, yes, you can.
The PUT/COPY action is discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/ChgStoClsOfObj.html
First, restore from Glacier (as you have done). This makes the file available so that you can copy it.
Then, once the file is available, you can copy/overwrite it using the AWS CLI:
aws s3 cp --metadata-directive "COPY" --storage-class "STANDARD" s3://my-bucket/my-image.png s3://my-bucket/my-image.png
Notes
In the above command:
The from and the to file paths are the same (we are overwriting it).
We are setting --metadata-directive "COPY". This tells cp to copy the metadata along with the file contents (documentation here).
We are setting the --storage-class "STANDARD". This tells cp to use the STANDARD s3 storage class for the new file (documentation here).
The result is a new file, so this will update the modified date.
If you are using versioning, you may need to make additional considerations.
This procedure is based on the info from the AWS docs here.
Bulk
If you want to do it in bulk (over many files/objects), you can use the below commands:
Dry Run
This command will list the Glacier files at the passed bucket and prefix:
aws s3api list-objects --bucket my-bucket --prefix some/path --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text | xargs -I {} echo 'Would be copying {} to {}'
Bulk Upgrade
Once you are comfortable with the list of files that will be upgraded, run the below command to upgrade them.
Before running, make sure that the bucket and prefix match what you were using in the dry run. Also make sure that you've already run the standard S3/Glacier "restore" operation on all of the files (as described above).
This combines the single file/object upgrade command with the list-objects command in the dry run using xargs.
aws s3api list-objects --bucket my-bucket --prefix some/path --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text | xargs -I {} aws s3 cp --metadata-directive "COPY" --storage-class "STANDARD" s3://my-bucket/{} s3://my-bucket/{}
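The same bulk upgrade can be scripted with boto3 if you prefer Python to xargs. A hedged sketch (the bucket and prefix are placeholders; it assumes the restore step has already completed for every object, and note that copy_object is limited to objects of up to 5 GB):
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
prefix = "some/path"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") != "GLACIER":
            continue
        # Copy the object over itself, changing only the storage class.
        s3.copy_object(
            Bucket=bucket,
            Key=obj["Key"],
            CopySource={"Bucket": bucket, "Key": obj["Key"]},
            StorageClass="STANDARD",
            MetadataDirective="COPY",
        )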

How to create folder on S3 from Ec2 instance

I want to create a folder in an S3 bucket from an EC2 instance. I tried the put object command but it's not working. Is there any way of creating a folder on S3 from an EC2 instance using the CLI?
You don't need to create a folder to put an item in it. For example, just run something like the below command and S3 will create the folders if they don't exist:
aws s3 cp ./testfile s3://yourBucketName/any/path/you/like
If you want to use cp recursively you can specify the --recursive option, or use aws s3 sync.
If your command does not work, then you may have permission issues. Paste your error so that we can help you.
aws s3api put-object --bucket bucketname --key foldername/
This command works like a charm.
Courtesy AWS Support.
aws s3 sync <folder_name> s3://<you-prefix>/<some_other_folder>/<folder_name>
And bear in mind that S3 is an object store; it doesn't deal with folders.
If you create /xyz/ and upload a file called /xyz/foo.txt, those are actually two different objects. If you delete /xyz/, it will not delete /xyz/foo.txt.
The S3 console allows you to "create a folder", but after you play with it you will notice that you CANNOT RENAME a folder, or do ANYTHING that you can do with a real folder (like moving a tree structure, or recursively specifying access rights).
In S3, there is something called a "prefix": the API allows you to list/filter files with a particular prefix, which gives you the folder-like abstraction.
As mentioned above, since you CANNOT do anything like you would with a file-system folder, if you want to perform a task like moving one folder to another folder, you need to write your own code to "rewrite" the file name (to be specific, the "Key" in S3), i.e. copy it to the new object name and delete the old object.
If you want to build more advanced control on top of S3, you may use any of the AWS SDKs to do it:
https://aws.amazon.com/tools/
You can play around with the API function call put_object() (naming varies depending on the SDK language) and verify these facts for yourself (most of them are found in the AWS documentation).
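For example, here's a hedged boto3 sketch of "moving a folder" the only way S3 allows: copy every object under the old prefix to the new prefix, then delete the originals (bucket and prefix names are placeholders):
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
old_prefix = "xyz/"
new_prefix = "abc/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=old_prefix):
    for obj in page.get("Contents", []):
        new_key = new_prefix + obj["Key"][len(old_prefix):]
        # "Rename" = copy to the new key, then delete the old object.
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": obj["Key"]},
        )
        s3.delete_object(Bucket=bucket, Key=obj["Key"])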
Update: since @Tom raised the issue.
You cannot create a virtual folder using the AWS CLI (maybe @Tom can show how); the only way to do it is with the AWS SDK's put_object().
Let's try this.
First, create a dummy file in the shell:
echo "dummy" > test.txt
Then try the Python AWS SDK:
import boto3
s3 = boto3.client("s3")
s3.create_bucket(Bucket="dummy")
# Now create the so-called xyz/ "empty virtual folder"
s3.put_object(Bucket="dummy", Key="xyz/")
# Now put the file created above into S3 under the key xyz/test.txt.
# put_object only takes bytes or a file object for Body, so open the file first.
myfile = open("test.txt", "rb")
s3.put_object(Bucket="dummy", Key="xyz/test.txt", Body=myfile)
Now, go to your command shell, fire up your AWS CLI (or continue to play with boto3)
# Check everything
aws s3 ls s3://dummy --recursive
# Now delete the so-called "folder"
aws s3 rm s3://dummy/xyz/
# And you will see that the file "xyz/test.txt" is still there
aws s3 ls s3://dummy --recursive
You can find the commands in the official AWS documentation:
http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html
There are also various other tools available that can be used to create buckets/folders in S3. One well-known tool is S3 Browser, which is available for Windows servers. Install it on your EC2 instance and provide your AWS access key and secret key to access S3. The tool provides a simple UI for doing this.
There is no CLI command that lets you simply create a folder in an S3 bucket. To create one, I would use the following command, which creates an empty file with nothing inside it. If you delete the file, you also delete the folder, as long as you have not added anything else under it afterwards.
aws s3api put-object --bucket bucket_name --key folder_name/empty.csv

Downloading an entire S3 bucket?

I noticed that there does not seem to be an option to download an entire s3 bucket from the AWS Management Console.
Is there an easy way to grab everything in one of my buckets? I was thinking about making the root folder public, using wget to grab it all, and then making it private again but I don't know if there's an easier way.
AWS CLI
See the "AWS CLI Command Reference" for more information.
AWS recently released their Command Line Tools, which work much like boto and can be installed using
sudo easy_install awscli
or
sudo pip install awscli
Once installed, you can then simply run:
aws s3 sync s3://<source_bucket> <local_destination>
For example:
aws s3 sync s3://mybucket .
will download all the objects in mybucket to the current directory.
And will output:
download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt
This will download all of your files using a one-way sync. It will not delete any existing files in your current directory unless you specify --delete, and it won't change or delete any files on S3.
You can also do S3 bucket to S3 bucket, or local to S3 bucket sync.
Check out the documentation and other examples.
While the above example shows how to download a full bucket, you can also download a folder recursively by running
aws s3 cp s3://BUCKETNAME/PATH/TO/FOLDER LocalFolderName --recursive
This will instruct the CLI to download all files and folder keys recursively within the PATH/TO/FOLDER directory within the BUCKETNAME bucket.
You can use s3cmd to download your bucket:
s3cmd --configure
s3cmd sync s3://bucketnamehere/folder /destination/folder
There is another tool you can use called rclone. This is a code sample in the Rclone documentation:
rclone sync /home/local/directory remote:bucket
I've used a few different methods to copy Amazon S3 data to a local machine, including s3cmd, and by far the easiest is Cyberduck.
All you need to do is enter your Amazon credentials and use the simple interface to download, upload, sync any of your buckets, folders or files.
You have many options to do that, but the best one is using the AWS CLI.
Here's a walk-through:
Download and install AWS CLI in your machine:
Install the AWS CLI using the MSI Installer (Windows).
Install the AWS CLI using the Bundled Installer for Linux, OS X, or Unix.
Configure AWS CLI:
Make sure you input valid access and secret keys, which you received when you created the account.
Sync the S3 bucket using:
aws s3 sync s3://yourbucket /local/path
In the above command, replace the following fields:
yourbucket >> your S3 bucket that you want to download.
/local/path >> path in your local system where you want to download all the files.
To download using AWS S3 CLI:
aws s3 cp s3://WholeBucket LocalFolder --recursive
aws s3 cp s3://Bucket/Folder LocalFolder --recursive
To download using code, use the AWS SDK.
To download using GUI, use Cyberduck.
The answer by @Layke is good, but if you have a ton of data and don't want to wait forever, you should read "AWS CLI S3 Configuration".
The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:
aws configure set default.s3.max_concurrent_requests 1000
aws configure set default.s3.max_queue_size 100000
After running these, you can use the simple sync command:
aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path
or
aws s3 sync s3://source-bucket/source-path c:\my\local\data\path
On a system with 4 CPU cores and 16 GB RAM, for cases like mine (3-50 GB files) the sync/copy speed went from about 9.5 MiB/s to 700+ MiB/s, a speed increase of 70x over the default configuration.
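If you script the transfer with boto3 instead of the CLI, the analogous knob is TransferConfig. A hedged sketch for a single large download (the bucket, key, and tuning values are placeholders; tune them to your machine and network):
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
config = TransferConfig(
    max_concurrency=64,                    # parallel threads for this transfer
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MiB
    multipart_chunksize=8 * 1024 * 1024,   # size of each part
)
s3.download_file("source-bucket", "source-path/big-file.bin", "big-file.bin", Config=config)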
This works 100% for me; I have downloaded all files from my AWS S3 bucket.
Install AWS CLI. Select your operating system and follow the steps here: Installing or updating the latest version of the AWS CLI
Check AWS version: aws --version
Run config command: aws configure
aws s3 cp s3://yourbucketname your\local\path --recursive
Eg (Windows OS): aws s3 cp s3://yourbucketname C:\aws-s3-backup\project-name --recursive
Check out this link: How to download an entire bucket from S3 to local folder
If you use Visual Studio, download "AWS Toolkit for Visual Studio".
After installing, go to Visual Studio - AWS Explorer - S3 - Your bucket - Double click
In the window you will be able to select all files. Right click and download files.
For Windows, S3 Browser is the easiest way I have found. It is excellent software, and it is free for non-commercial use.
Use this command with the AWS CLI:
aws s3 cp s3://bucketname . --recursive
Another option that could help some OS X users is Transmit.
It's an FTP program that also lets you connect to your S3 files. And, it has an option to mount any FTP or S3 storage as a folder in the Finder, but it's only for a limited time.
I've done a bit of development for S3 and I have not found a simple way to download a whole bucket.
If you want to code in Java the jets3t lib is easy to use to create a list of buckets and iterate over that list to download them.
First, get an access key and secret key pair from the AWS management console so you can create an S3Service object:
AWSCredentials awsCredentials = new AWSCredentials(YourAccessKey, YourAwsSecretKey);
s3Service = new RestS3Service(awsCredentials);
Then, get an array of your buckets objects:
S3Object[] objects = s3Service.listObjects(YourBucketNameString);
Finally, iterate over that array to download the objects one at a time with:
S3Object obj = s3Service.getObject(bucket, fileName);
file = obj.getDataInputStream();
I put the connection code in a threadsafe singleton. The necessary try/catch syntax has been omitted for obvious reasons.
If you'd rather code in Python you could use Boto instead.
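Here's a hedged boto3 (the current Python SDK) sketch that downloads every object in a bucket to the current directory, mirroring the key layout locally (the bucket name is a placeholder):
import os
import boto3

s3 = boto3.client("s3")
bucket = "mybucket"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip zero-byte "folder" marker objects
        os.makedirs(os.path.dirname(key) or ".", exist_ok=True)
        s3.download_file(bucket, key, key)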
Having looked around, BucketExplorer and its "Downloading the whole bucket" feature may do what you want.
The AWS CLI is the simplest option for uploading an entire folder to AWS S3 and for downloading an entire AWS S3 bucket locally.
To upload whole folder to AWS S3: aws s3 sync . s3://BucketName
To download whole AWS S3 bucket locally: aws s3 sync s3://BucketName .
You can also append a path like BucketName/Path to download only a particular folder in the AWS S3 bucket.
If you only want to download the bucket from AWS, first install the AWS CLI in your machine. In terminal change the directory to where you want to download the files and run this command.
aws s3 sync s3://bucket-name .
If you also want to sync the both local and s3 directories (in case you added some files in local folder), run this command:
aws s3 sync . s3://bucket-name
You can do this with MinIO Client as follows: mc cp -r https://s3-us-west-2.amazonaws.com/bucketName/ localdir
MinIO also supports sessions, resumable downloads, uploads and many more. MinIO supports Linux, OS X and Windows operating systems. It is written in Golang and released under Apache Version 2.0.
AWS CLI is the best option to download an entire S3 bucket locally.
Install AWS CLI.
Configure AWS CLI for using default security credentials and default AWS Region.
To download the entire S3 bucket use command
aws s3 sync s3://yourbucketname localpath
Reference to AWS CLI for different AWS services: AWS Command Line Interface
To add another GUI option, we use WinSCP's S3 functionality. It's very easy to connect, only requiring your access key and secret key in the UI. You can then browse and download whatever files you require from any accessible buckets, including recursive downloads of nested folders.
Since it can be a challenge to clear new software through security and WinSCP is fairly prevalent, it can be really beneficial to just use it rather than try to install a more specialized utility.
If you use Firefox with S3Fox, that DOES let you select all files (shift-select first and last) and right-click and download all.
I've done it with 500+ files without any problem.
On Windows, my preferred GUI tool for this is CloudBerry Explorer Freeware for Amazon S3. It has a fairly polished file explorer and an FTP-like interface.
You can use sync to download a whole S3 bucket. For example, to download the whole bucket named bucket1 into the current directory:
aws s3 sync s3://bucket1 .
If you have only files there (no subdirectories) a quick solution is to select all the files (click on the first, Shift+click on the last) and hit Enter or right click and select Open. For most of the data files this will download them straight to your computer.
Try this command:
aws s3 sync yourBucketnameDirectory yourLocalDirectory
For example, if your bucket name is myBucket and local directory is c:\local, then:
aws s3 sync s3://myBucket c:\local
For more information about the AWS CLI, check this:
aws cli installation
It's always better to use the AWS CLI for downloading/uploading files to S3. sync will let you resume without any hassle.
aws s3 sync s3://bucketname/ .
aws s3 sync s3://<source_bucket> <local_destination>
is a great answer, but it won't work if the objects are in the Glacier Flexible Retrieval storage class, even if the files have been restored. In that case you need to add the flag --force-glacier-transfer.
Here is a summary of what you have to do to copy an entire bucket:
1. Create a user that can operate on the AWS S3 bucket
Follow this official article: Configuration basics
Don't forget to:
tick "programmatic access" in order to have the possibility to deal with with AWS via CLI.
add the right IAM policy to your user to allow him to interact with the s3 bucket
2. Download, install and configure AWS CLI
See this link allowing to configure it: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
You can use the following command in order to add the keys you got when you created your user:
$ aws configure
AWS Access Key ID [None]: <your_access_key>
AWS Secret Access Key [None]: <your_secret_key>
Default region name [None]: us-west-2
Default output format [None]: json
3. Use the following command to download content
You could use a recursive cp command, but the aws s3 sync command is a better fit:
aws s3 sync s3://your_bucket /local/path
To see which files would be downloaded before actually doing the download, you can use the --dryrun option.
To improve speed, you can adjust s3 max_concurrent_requests and max_queue_size properties. See: http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
You can exclude/include some files using --exclude and --include options. See: https://docs.aws.amazon.com/cli/latest/reference/s3/
For example, the command below will show all the .png files present in the bucket. Re-run the command without --dryrun to actually download the resulting files.
aws s3 sync s3://your_bucket /local/path --exclude "*" --include "*.png" --dryrun
Windows users can download S3 Browser from this link, which also has installation instructions: http://s3browser.com/download.aspx
Then provide your AWS credentials (access key, secret key and region) to S3 Browser; this link contains configuration instructions: s3browser.com/s3browser-first-run.aspx
Now all your S3 buckets will be visible in the left panel of S3 Browser.
Simply select the bucket, click on the Buckets menu in the top left corner, then select the Download all files to option from the menu. Below is a screenshot of the same:
[Screenshot: bucket selection screen]
Then browse to a folder where the bucket should be downloaded.
Click on OK and your download will begin.
aws s3 sync is the perfect solution. It does not do a two-way sync; it is one-way, from source to destination. Also, if you have lots of items in the bucket, it is a good idea to create an S3 VPC endpoint first so that the download happens faster (because the traffic stays on the AWS network rather than going over the internet) and incurs no data transfer charges.
As @layke said, it is best practice to download the file through the S3 CLI; it is safe and secure. But in some cases, people need to use wget to download the file, and here is the solution:
aws s3 presign s3://<your_bucket_name>/<your_object_key>
This will get you a temporary public (presigned) URL which you can use to download the content from S3, in your case using wget or any other download client.
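The boto3 equivalent, in case you are already scripting in Python, is generate_presigned_url (the bucket, key, and expiry below are placeholders):
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your_bucket_name", "Key": "path/to/file.mp4"},
    ExpiresIn=3600,  # seconds the URL stays valid
)
print(url)  # pass this URL to wget or any other download client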
You just need to pass --recursive & --include "*" in the aws s3 cp command as follows: aws --region "${BUCKET_REGION}" s3 cp s3://${BUCKET}${BUCKET_PATH}/ ${LOCAL_PATH}/tmp --recursive --include "*" 2>&1
In addition to the suggestions for aws s3 sync, I would also recommend looking at s5cmd.
In my experience I found this to be substantially faster than the AWS CLI for multiple downloads or large downloads.
s5cmd supports wildcards so something like this would work:
s5cmd cp s3://bucket-name/* ./folder