I wanted to automate the process of copying files between two S3 buckets.
Scenario: whenever the same files are uploaded to S3 (first bucket), the older versions of those files should be moved to another S3 bucket (second bucket), and only the newer version of each file should remain in the first bucket. Please suggest a process for doing this.
I don't know if there is a direct way, but I want to use shell scripts to achieve this:
aws s3 cp s3://newfile/ s3://backupfile --recursive
aws s3 cp /Local/files/ s3://newfile/ --recursive
When an object is uploaded to Amazon S3 with the same Key (filename) as an existing object, the new object will overwrite the existing object.
This can be avoided by activating Versioning on the bucket, which will retain all versions of an object. So, if a new object is uploaded with the same Key, then the old object becomes a 'previous version' and is still accessible in S3. (You will pay for the storage of all versions.)
If your requirement is to preserve previous versions of objects, then this should be sufficient for your needs, without having to copy the older versions to a different bucket.
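For example, with versioning enabled an older version can still be retrieved by its VersionId. A minimal boto3 sketch, assuming the bucket from your script and a hypothetical key name:

# Hypothetical sketch: read a previous version of an object from a
# versioning-enabled bucket. The key name 'report.csv' is a placeholder.
import boto3

s3 = boto3.client("s3")

# List every version of the key; the current upload has IsLatest=True.
versions = s3.list_object_versions(Bucket="newfile", Prefix="report.csv").get("Versions", [])
previous = [v for v in versions if v["Key"] == "report.csv" and not v["IsLatest"]]

if previous:
    # Fetch the most recent non-current version (versions are listed newest first).
    obj = s3.get_object(Bucket="newfile", Key="report.csv", VersionId=previous[0]["VersionId"])
    print(obj["Body"].read()[:100])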
If you really wanted to do as you ask, then you would need:
Versioning turned on (to preserve older versions)
An AWS Lambda function, triggered by the upload, that copies the old version to a different bucket and optionally deletes the 'old version' (see the sketch after this list)
A plan for what to do when there is another upload of the same object -- should it copy it to the 'other bucket' and overwrite the older version that is already there? It might need Versioning too!
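A minimal sketch of such a Lambda function in Python/boto3, assuming versioning is enabled on the source bucket and that the backup bucket name is passed in via a hypothetical BACKUP_BUCKET environment variable:

# Hypothetical sketch: on each upload, copy any non-current versions of the key
# to a backup bucket and remove them from the source bucket.
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
BACKUP_BUCKET = os.environ["BACKUP_BUCKET"]  # assumed environment variable


def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # The fresh upload is the version with IsLatest=True; everything else is old.
        versions = s3.list_object_versions(Bucket=bucket, Prefix=key).get("Versions", [])
        old_versions = [v for v in versions if v["Key"] == key and not v["IsLatest"]]

        for v in old_versions:
            # Copy the older version into the backup bucket...
            s3.copy_object(
                Bucket=BACKUP_BUCKET,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key, "VersionId": v["VersionId"]},
            )
            # ...and optionally remove that version from the source bucket.
            s3.delete_object(Bucket=bucket, Key=key, VersionId=v["VersionId"])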
I want to use the AWS S3 sync command to sync a large bucket with another bucket.
I found an answer that says the files are synced between the buckets over the AWS backbone and are not copied to the local machine, but I can't find a reference for this anywhere in the documentation. Does anyone have proof of this behaviour? Any formal documentation that explains how it works?
I tried to find something in the documentation but found nothing there.
To learn more about the sync command, check the CLI docs. You can refer directly to the section named -
Sync from S3 bucket to another S3 bucket
The following sync command syncs objects to a specified bucket and prefix from objects in another specified bucket and prefix by copying s3 objects. An s3 object will require copying if one of the following conditions is true:
The s3 object does not exist in the specified bucket and prefix destination.
The sizes of the two s3 objects differ.
The last modified time of the source is newer than the last modified time of the destination.
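In practice, a bucket-to-bucket sync or cp is carried out with the S3 CopyObject API (or multipart UploadPartCopy for large objects), so the copy happens server-side within S3; only API calls, not the object data, pass through the machine running the CLI. A minimal boto3 sketch of such a server-side copy, with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# CopyObject asks S3 to copy the object internally between buckets;
# the object bytes never pass through the machine running this code.
s3.copy_object(
    Bucket="destination-bucket",  # placeholder
    Key="path/to/object.txt",     # placeholder
    CopySource={"Bucket": "source-bucket", "Key": "path/to/object.txt"},
)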
Use the S3 replication capability if you only want to replicate the data that moves from bucket1 to bucket2.
I have uploaded 365 files (one file per day) to an S3 bucket, all in one go, so all of the files now have the same upload date. I want to move the files that are more than 6 months old to S3 Glacier, but an S3 lifecycle policy will only take effect after 6 months because the upload date in S3 is the same for every file. The actual upload date of each file is stored in a DynamoDB table along with the S3KeyUrl.
I want to know the best way to move these files to S3 Glacier. I came up with the following approaches:
Create an S3 lifecycle policy to move the files to S3 Glacier, which will only take effect after 6 months.
Create an app that queries the DynamoDB table to get the list of files that are more than 6 months old, downloads each file from S3 (since ArchiveTransferManager uploads files from a local directory), and uses ArchiveTransferManager (Amazon.Glacier.Transfer) to upload the file to an S3 Glacier vault.
In the production scenario there will be some 10 million files, so the solution should be reliable.
There are two versions of Glacier:
The 'original' Amazon Glacier, which uses Vaults and Archives
The Amazon S3 Storage Classes of Glacier and Glacier Deep Archive
Trust me... You do not want to use the 'original' Glacier. It is slow and difficult to use. So, avoid anything that mentions Vaults and Archives.
Instead, you simply want to change the Storage Class of the objects in Amazon S3.
Normally, the easiest way to do this is to "Edit storage class" in the S3 management console. However, you mention Millions of objects, so this wouldn't be feasible.
Instead, you will need to copy objects over themselves, while changing the storage class. This can be done with the AWS CLI:
aws s3 cp s3://<bucket-name>/ s3://<bucket-name>/ --recursive --storage-class <storage_class>
Note that this would change the storage class for all objects in the given bucket/path. Since you only wish to selectively change the storage class, you would either need to issue lots of the above commands (each for only one object), or you could use an AWS SDK to script the process. For example, you could write a Python program that loops through the list of objects, checks DynamoDB to determine whether the object is '6 months old' and then copies it over itself with the new Storage Class.
See: StackOverflow: How to change storage class of existing key via boto3
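A minimal boto3 sketch of that approach, assuming a hypothetical DynamoDB table named FileUploadDates keyed on S3KeyUrl with an ISO-8601 UploadDate attribute, and a placeholder bucket name:

# Hypothetical sketch: copy objects older than 6 months over themselves,
# changing only the storage class. Table, attribute and bucket names are assumptions.
from datetime import datetime, timedelta
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("FileUploadDates")  # assumed table
BUCKET = "my-bucket"                                         # placeholder bucket
cutoff = datetime.utcnow() - timedelta(days=180)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Look up the real upload date recorded in DynamoDB.
        item = table.get_item(Key={"S3KeyUrl": key}).get("Item")
        if not item:
            continue
        uploaded = datetime.fromisoformat(item["UploadDate"])  # assumed naive UTC timestamp
        if uploaded < cutoff:
            # Copy the object over itself with a new storage class.
            s3.copy_object(
                Bucket=BUCKET,
                Key=key,
                CopySource={"Bucket": BUCKET, "Key": key},
                StorageClass="GLACIER",
            )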
If you have millions of objects, it can take a long time to merely list the objects. Therefore, you could consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You could then use this CSV file as the 'input list' for your 'copy' operation rather than having to list the bucket itself.
Or, just be lazy (which is always more productive!) and archive everything to Glacier. Then, if somebody actually needs one of the files in the next 6 months, simply restore it from Glacier before use. So simple!
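If you do go that route and later need one of the archived files, a restore request might look like this in boto3 (bucket, key, and retention period are placeholders):

# Hypothetical sketch: request a temporary restored copy of an archived object.
import boto3

s3 = boto3.client("s3")
s3.restore_object(
    Bucket="my-bucket",                                # placeholder
    Key="reports/2020-01-15.csv",                      # placeholder
    RestoreRequest={
        "Days": 7,                                     # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
    },
)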
I am attempting to use the following command in a bash script to update the x-amz-meta-uid and x-amz-meta-gid of files and folders recursively.
aws s3 cp s3://$SOURCE_CLOUD_BUCKET_PATH s3://$DESTINATION_CLOUD_BUCKET_PATH --recursive --quiet --metadata-directive "REPLACE" --metadata "uid=$USER_UID,gid=$USER_GID"
However, it only seems to be updating the metadata on files. How can I also get this to update the metadata on the directories/folders also?
aws --version
aws-cli/2.0.43 Python/3.7.3 Linux/5.4.0-1029-aws exe/x86_64.ubuntu.18
As the Amazon S3 documentation states: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html
In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.
AWS S3 is an object storage service and is not POSIX compliant, which means there is no disk-level folder structure maintained.
The folders you are seeing are logical: an object named hello/world.txt will show hello as the parent folder name and world.txt as the file name, but the actual key stored is hello/world.txt.
So the metadata is also managed at the object level and not at the folder level, since there are no physical folders.
The CLI behaviour is correct, and you need to modify the metadata of the objects themselves. You can, however, modify the metadata of all files / multiple files in one go (see the sketch below).
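For instance, a boto3 sketch that replaces the uid/gid metadata on every object under a prefix; the bucket name, prefix, and values are placeholders. Note that 'folders' are only included if they exist as zero-byte marker objects (keys ending in /):

# Hypothetical sketch: replace user metadata on every object under a prefix.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"     # placeholder
PREFIX = "some/prefix/"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # Copy each object over itself, replacing its user metadata
        # (stored as x-amz-meta-uid / x-amz-meta-gid).
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
            Metadata={"uid": "1000", "gid": "1000"},  # placeholder values
            MetadataDirective="REPLACE",
        )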
No need to change the metadata of the files recursively; the metadata of the whole folder can be changed. Follow this
Unfortunately, this morning I accidentally deleted a number of images from my S3 account, and I need to restore them. I have read about versioning, however this was not enabled on the bucket at the time of deletion (I have now enabled it).
Is there any way of restoring these files either manually, or via Amazon directly?
Thanks,
Pete
Unfortunately, I don't think you can. Here is what AWS says in their docs -
To be able to undelete a deleted object, you must have had versioning enabled on the bucket that contains the object before the object was deleted.
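For anyone reading this later, versioning can be enabled up front with a single call, so that a delete only adds a delete marker and the previous versions remain recoverable. A minimal boto3 sketch with a placeholder bucket name:

import boto3

s3 = boto3.client("s3")
# Enable versioning so future deletes are recoverable.
s3.put_bucket_versioning(
    Bucket="my-bucket",  # placeholder
    VersioningConfiguration={"Status": "Enabled"},
)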
I am trying to copy an S3 bucket's objects across regions. My source bucket has versioning enabled, and I need the destination bucket to retain all the versions present in the source bucket (though I think we can't preserve the actual timestamps while copying). I would prefer an officially supported tool.
Amazon S3 Cross-Region Replication (CRR) will do this for you. In fact, it requires versioning to be activated.
However, it will only take effect after it is activated. Any existing objects and versions will not be replicated.
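A minimal boto3 sketch of enabling such a replication rule; the bucket names and the IAM role ARN are placeholders, and both buckets must already have versioning enabled:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_replication(
    Bucket="source-bucket",  # placeholder
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},  # placeholder
            }
        ],
    },
)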