I'm wondering whether there is an easy way to permanently restore Glacier objects to S3. It seems that you can only restore Glacier objects for a certain amount of time that you specify when restoring. So, for example, we now have thousands of files restored to S3 that will go back to Glacier in 90 days, but we do not want them back in Glacier.
To clarify a technicality on one point: your files will not "go back to" Glacier in 90 days, because they never left Glacier. Since you have done a restore, there are temporary copies living in S3 reduced redundancy storage (RRS) that S3 will delete in 90 days (or whatever day value you specified when you did the restore operation). Restoring files doesn't remove the Glacier copy.
The answer to your question is no, and yes.
You cannot technically change an object from the Glacier storage class back to the standard or RRS class...
The transition of objects to the GLACIER storage class is one-way. You cannot use a lifecycle configuration rule to convert the storage class of an object from GLACIER to Standard or RRS.
... however...
If you want to change the storage class of an already archived object to either Standard or RRS, you must use the restore operation to make a temporary copy first. Then use the copy operation to overwrite the object as a Standard or RRS object.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
You can copy that object to what is, technically, a new object, but one that has the same key (path) as the old object... so for practical purposes, yes, you can.
The PUT/COPY action is discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/ChgStoClsOfObj.html
First, restore from Glacier (as you have done). This makes the file available so that you can copy it.
Then, once the file is available, you can copy/overwrite it using the AWS CLI:
aws s3 cp --metadata-directive "COPY" --storage-class "STANDARD" s3://my-bucket/my-image.png s3://my-bucket/my-image.png
Notes
In the above command:
The from and the to file paths are the same (we are overwriting it).
We are setting --metadata-directive "COPY". This tells cp to copy the metadata along with the file contents (documentation here).
We are setting --storage-class "STANDARD". This tells cp to use the STANDARD S3 storage class for the new file (documentation here).
The result is a new file, so this will update the modified date.
If you are using versioning, you may need to make additional considerations.
This procedure is based on the info from the AWS docs here.
Bulk
If you want to do it in bulk (over many files/objects), you can use the below commands:
Dry Run
This command will list the Glacier files at the passed bucket and prefix:
aws s3api list-objects --bucket my-bucket --prefix some/path --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text | xargs -I {} echo 'Would be copying {} to {}'
Bulk Upgrade
Once you are comfortable with the list of files that will be upgraded, run the below command to upgrade them.
Before running, make sure that the bucket and prefix match what you were using in the dry run. Also make sure that you've already run the standard S3/Glacier "restore" operation on all of the files (as described above).
This combines the single file/object upgrade command with the list-objects command in the dry run using xargs.
aws s3api list-objects --bucket my-bucket --prefix some/path --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text | xargs -I {} aws s3 cp --metadata-directive "COPY" --storage-class "STANDARD" s3://my-bucket/{} s3://my-bucket/{}
Related
I have uploaded several thousand video files to an S3 bucket, then changed the bucket management settings to migrate the files to Glacier. I'm trying to retrieve a single file and copy it to my local machine. Typically, I follow the instructions here. I use the S3 management console, select the 'Action' to restore the selected file from Glacier and then download it using the following command:
aws s3 cp s3://my-bucket/my_video_file.mp4 .
This works as I want it to, but I'm wondering if there is a way to restore the file from Glacier without needing to sign in through the web browser and manually select it for retrieval. Looking through the documentation for aws s3 cp there is an option called --force-glacier-transfer, but when I include it in my command I get the following:
Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
Here's the relevant passage from the manual page:
--force-glacier-transfer (boolean) Forces a transfer request on all Glacier objects in a sync or recursive copy.
Is it possible to retrieve and download a single file from glacier in a single cli command or will I always need to use the management console to retrieve the file first? I'm also open to using a python script or something similar if it can be done that way.
You can restore a file from Glacier using the CLI, but there is no way to both restore and download it in a single command: you need to restore the file from Glacier, then some time later (possibly hours later), after the restore is complete, download the file.
To restore the file, use a command like this:
aws s3api restore-object --bucket bucketname --key path/to/object.zip --restore-request '{"Days":25,"GlacierJobParameters":{"Tier":"Standard"}}'
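If you run restores often, the --restore-request JSON can be assembled from shell variables rather than typed inline each time. A small sketch (the variable names and values here are illustrative):

```shell
days=25
tier="Standard"   # or "Bulk" / "Expedited"

# Build the JSON payload passed to --restore-request
restore_request=$(printf '{"Days":%d,"GlacierJobParameters":{"Tier":"%s"}}' "$days" "$tier")
echo "$restore_request"

# Then:
# aws s3api restore-object --bucket bucketname --key path/to/object.zip --restore-request "$restore_request"
```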
You can check on the status of the restore request with a command like this:
aws s3api head-object --bucket bucketname --key path/to/object.zip
That will output a JSON object with details on the S3 object, including the restore status:
.... While the restore is still in progress ...
"Restore": "ongoing-request=\"true\"",
.... Or, when the restore is done ...
"Restore": "ongoing-request=\"false\", expiry-date=\"...\"",
And from there, since it's a restored object in S3, you can simply copy it to your local machine:
aws s3 cp s3://bucketname/path/to/object.zip object.zip
Of course, scripting all of this is possible. boto3 in Python makes it fairly straightforward to follow this same pattern, but it's possible to do this in whatever language you prefer to use.
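If you do script this in Python, one piece worth factoring out is interpreting the Restore header from head-object. A sketch (the helper name is mine; the header format is the one shown in the head-object output above):

```python
def restore_complete(restore_header):
    """Return True once a head-object "Restore" value reports completion.

    restore_header is the string from the head-object response, e.g.
    'ongoing-request="false", expiry-date="..."'. It is None (absent)
    for objects that have never had a restore requested.
    """
    if restore_header is None:
        return False
    return 'ongoing-request="false"' in restore_header
```

A polling loop would call head_object via boto3, pass response.get("Restore") to this helper, and sleep between checks until it returns True.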
I have an S3 bucket with millions of files copied there by a Java process I do not control. The Java process runs on an EC2 instance in "AWS Account A" but writes to a bucket owned by "AWS Account B". B was able to see the files but not to open them.
I figured out what the problem was and requested a change to the Java process to write new files with "acl = bucket-owner-full-control"... and it works! New files can be read from "AWS Account B".
But my problem is that I still have millions of files with the incorrect ACL. I can fix one of the old files easily with:
aws s3api put-object-acl --bucket bucketFromAWSAccountA --key datastore/file0000001.txt --acl bucket-owner-full-control
What is the best way to do that?
I was thinking in something like
# Copy to TEMP folder (sync is recursive by default)
aws s3 sync s3://bucketFromAWSAccountA/datastore/ s3://bucketFromAWSAccountA/datastoreTEMP/ --acl bucket-owner-full-control
# Delete original store
aws s3 rm s3://bucketFromAWSAccountA/datastore/ --recursive
# Sync it back to original folder
aws s3 sync s3://bucketFromAWSAccountA/datastoreTEMP/ s3://bucketFromAWSAccountA/datastore/ --acl bucket-owner-full-control
But it is going to be very time-consuming. I wonder:
Is there a better way to achieve this?
Could this be achieved easily at the bucket level? I mean, is there some "put-bucket-acl" change that allows the owner to read everything?
One option seems to be to recursively copy all objects in the bucket over themselves, specifying the ACL change to make.
Something like:
aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/folder s3://bucket/folder --metadata-directive REPLACE
That code snippet was taken from this answer: https://stackoverflow.com/a/63804619
It is worth reviewing the other options presented in answers to that question, as it looks like there is a possibility for losing content-type tags or metadata information if you don't form the command properly.
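If you'd rather not re-copy the objects at all, another route is to run put-object-acl over each key, which leaves the content, storage class, and metadata untouched. A dry-run style sketch (the bucket and key are placeholders) that just prints the commands; you could feed it keys from aws s3api list-objects and pipe the output to sh:

```shell
# Build the put-object-acl command for one object (printed, not executed)
make_acl_cmd() {
  printf 'aws s3api put-object-acl --bucket %s --key %s --acl bucket-owner-full-control\n' "$1" "$2"
}

make_acl_cmd "bucketFromAWSAccountA" "datastore/file0000001.txt"
```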
I have a bucket in AWS S3. There are two folders in the bucket - folder1 & folder2. I want to copy the files from s3://myBucket/folder1 to s3://myBucket/folder2. But there is a twist: I ONLY want to copy the items in folder1 that were created after a certain date. I want to do something like this:
aws s3 cp s3://myBucket/folder1 s3://myBucket/folder2 --recursive --copy-source-if-modified-since 2020-07-31
There is no aws-cli command that will do this for you in a single line. If the number of files is relatively small, say a hundred thousand or fewer, I think it would be easiest to write a bash script, or use your favourite language's AWS SDK, that lists the first folder, filters on creation date, and issues the copy commands.
If the number of files is large you can create an S3 Inventory that will give you a listing of all the files in the bucket, which you can download and generate the copy commands from. This will be cheaper and quicker than listing when there are lots and lots of files.
Something like this could be a start, using #jarmod's suggestion about --copy-source-if-modified-since:
for key in $(aws s3api list-objects --bucket my-bucket --prefix folder1/ --query 'Contents[].Key' --output text); do
  relative_key=${key/folder1/folder2}
  aws s3api copy-object --bucket my-bucket --key "$relative_key" --copy-source "my-bucket/$key" --copy-source-if-modified-since THE_CUTOFF_DATE
done
It will copy each object individually, and it will be fairly slow if there are lots of objects, but it's at least somewhere to start.
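For reference, the ${key/folder1/folder2} expression in the loop is plain bash pattern substitution (replace the first match), not anything AWS-specific:

```shell
# Rewrite the first occurrence of "folder1" in a key to "folder2"
key="folder1/videos/clip.mp4"
relative_key=${key/folder1/folder2}
echo "$relative_key"   # folder2/videos/clip.mp4
```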
I have a S3 bucket with the below architecture -
Bucket
|__2019-08-23/
| |__SubFolder1
| |__Files
|
|__2019-08-22/
|__SubFolder2
I want to delete all folders, subfolders, and the files within them that are older than X days.
How can that be done? I am not sure if S3 Lifecycle can be used for this.
Also when I do -
aws s3 ls s3://bucket/
I get this -
PRE 2019-08-23/
PRE 2019-08-22/
Why do I see PRE in front of the folder name?
As per the valuable comments I tried this -
$ Number=1;current_date=$(date +%Y-%m-%d);
past_date=$(date -d "$current_date - $Number days" +%Y-%m-%d);
aws s3api list-objects --bucket bucketname --query 'Contents[?LastModified<=$past_date ].{Key:Key,LastModified: LastModified}' --output text | xargs -I {} aws s3 rm bucketname/{}
I am trying to remove all files which are 1 day old. But I get this error -
Bad jmespath expression: Unknown token $:
Contents[?LastModified<=$past_date ].{Key:Key,LastModified: LastModified}
How can I pass a variable in lastmodified?
You can use a lifecycle rule, a Lambda function if you have more complex logic, or the command line.
Here is an example using the command line:
aws s3api list-objects --bucket your-bucket --query 'Contents[?LastModified>=`2019-01-01` ].{Key:Key,LastModified: LastModified}' --prefix "2019-01-01" --output text | xargs -I {} aws s3 rm s3://your-bucket/{}
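As for the "Bad jmespath expression: Unknown token $" error: it comes from the single quotes around the --query string, which stop the shell from expanding $past_date, so the literal text $past_date reaches the JMESPath parser. Build the query in double quotes (escaping the JMESPath backticks) so the date is substituted before aws sees it. A sketch using the same GNU date call as the question:

```shell
Number=1
# GNU date, as in the question
past_date=$(date -d "$Number days ago" +%Y-%m-%d)

# Double quotes let the shell expand $past_date; \` keeps the JMESPath
# backtick literals intact
query="Contents[?LastModified<=\`$past_date\`].{Key: Key, LastModified: LastModified}"
echo "$query"

# Then pass it along, e.g.:
# aws s3api list-objects --bucket bucketname --query "$query" --output text | ...
```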
#Elzo's answer already covers the life cycle policy and how to delete the objects, therefore here I have an answer for the second part of your question:
PRE stands for PREFIX as stated in the aws s3 cli's manual.
If you run aws s3 ls help you will come across the following section:
The following ls command lists objects and common prefixes under a specified bucket and prefix. In this example, the user owns the bucket mybucket with the objects test.txt and somePrefix/test.txt. The LastWriteTime and Length are arbitrary. Note that since the ls command has no interaction with the local filesystem, the s3:// URI scheme is not required to resolve ambiguity and may be omitted:
aws s3 ls s3://mybucket
Output:
PRE somePrefix/
2013-07-25 17:06:27         88 test.txt
This is just to differentiate keys that have a prefix (split by forward slashes) from keys that don't.
Therefore, if your key is prefix/key01 you will always see a PRE in front of it. However, if your key is key01, then PRE is not shown.
Keep in mind that S3 does not work with directories, even though the console UI may suggest otherwise. S3's file structure is just one flat, single-level container of files.
From the docs:
In Amazon S3, buckets and objects are the primary resources, where
objects are stored in buckets. Amazon S3 has a flat structure with no
hierarchy like you would see in a file system. However, for the sake
of organizational simplicity, the Amazon S3 console supports the
folder concept as a means of grouping objects. Amazon S3 does this by
using a shared name prefix for objects (that is, objects that have
names that begin with a common string). Object names are also referred
to as key names.
For example, you can create a folder in the console called photos, and
store an object named myphoto.jpg in it. The object is then stored
with the key name photos/myphoto.jpg, where photos/ is the prefix.
S3 Lifecycle rules can be used at the bucket level. For folder and subfolder management, you can write a simple AWS Lambda function to delete the folders and subfolders that are X days old. Leverage the AWS SDK for JavaScript, Java, Python, etc. to develop the Lambda.
I am trying to encrypt an existing s3 bucket. When I do this:
aws s3 cp s3://test/ s3://test/ --recursive --sse
it encrypts all the files in the bucket by re-copying the objects. My issue here is that I have objects in the bucket in the Standard, Standard-IA, and Glacier storage classes. So, when I run the above copy command, the objects in Standard-IA storage are converted to Standard storage. (I haven't tested what happens to objects in Glacier yet; probably it won't even allow me to copy them.)
Is there any way to preserve the storage class of each object and just enable encryption for an existing bucket?
You could do something like this using bash and jq; obviously Python with boto3 or similar would be cleaner.
I don't know if you'd be better off adding a check to skip the GLACIER files; there's no magic way to apply encryption to them without unfreezing and then re-freezing them.
You'll want to run this on an ec2 instance local to the s3 bucket.
#!/bin/bash
bucketname="bucket-name"

# Build a list of every key in the bucket.
# Note: awk '{ print $NF }' will truncate keys that contain spaces.
aws s3 ls "${bucketname}" --recursive | awk '{ print $NF }' > /tmp/filelist

while read -r file; do
    # head-object omits StorageClass for STANDARD objects, so jq emits
    # "null"; map that back to STANDARD (-r avoids the sed quote-stripping)
    class=$(aws s3api head-object --bucket "${bucketname}" --key "${file}" | jq -r '.StorageClass')
    if [ "$class" = "null" ]; then
        class="STANDARD"
    fi
    # Print the copy command rather than running it, so you can review first
    echo "aws s3 cp s3://${bucketname}/${file} s3://${bucketname}/${file} --sse --storage-class ${class}"
done < /tmp/filelist
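On the open question of skipping the GLACIER files: a small guard like this could be dropped into the loop above, since frozen objects can't be copied in place until they are restored (a sketch; DEEP_ARCHIVE is included for completeness):

```shell
# True for storage classes that cannot be copied in place without a restore
should_skip() {
  [ "$1" = "GLACIER" ] || [ "$1" = "DEEP_ARCHIVE" ]
}

should_skip "GLACIER" && echo "would skip this object"
```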
You need to add the command line option --storage-class STANDARD_IA
Does your bucket have a lifecycle policy? If so, it's actually behaving as it's supposed to: you are, in effect, creating a new object in the bucket, so the transition over to Standard is, in fact, correct.
The approach by Ewan Leith above is really the only way to do it: programmatically determine the current storage class, then override the storage class of the 'new' object on save.
Hope this helps...