I want to keep a backup of an AWS S3 bucket. If I use Glacier, it will archive the files from the bucket and move them to Glacier, but it will also delete the files from S3. I don't want to delete the files from S3. One option is to use an EBS volume: mount the S3 bucket with s3fs and copy it to the EBS volume. Another way is to do an rsync of the existing bucket to a new bucket, which will act as a clone. Is there any other way?
What you are looking for is cross-region replication:
https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
Set up versioning and set up the replication.
On the target bucket you can then set up a lifecycle policy to archive to Glacier (or you can just use the bucket as a backup as-is).
(This only works between two regions, i.e. the buckets cannot be in the same region.)
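If you prefer to script that setup, here is a minimal boto3 sketch; the bucket names and the replication role ARN are placeholders, and the IAM role that grants S3 permission to replicate is assumed to already exist.
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-source-bucket"   # placeholder names
DEST_BUCKET = "my-backup-bucket"     # must be in a different region
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-replication-role"  # placeholder

# Versioning must be enabled on both buckets before replication can be configured
for bucket in (SOURCE_BUCKET, DEST_BUCKET):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate every object in the source bucket to the destination bucket
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE_ARN,
        "Rules": [
            {
                "ID": "backup-everything",
                "Prefix": "",          # apply to the whole bucket
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::" + DEST_BUCKET},
            }
        ],
    },
)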
If you want your data to be present in both primary and backup locations then this is more of a data replication use case.
Consider using AWS Lambda which is an event driven compute service.
You can write a simple piece of code to copy the data wherever you want; it will execute every time there is a change in the S3 bucket.
For more information, check the official documentation.
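As a rough illustration (not an official example), a minimal Python handler for such a Lambda could look like the sketch below; the backup bucket name is a placeholder, and it assumes the Lambda execution role can read the source bucket and write to the destination.
import urllib.parse

import boto3

s3 = boto3.client("s3")
BACKUP_BUCKET = "my-backup-bucket"  # placeholder destination bucket

def handler(event, context):
    # Triggered by an S3 ObjectCreated event; copy each new object to the backup bucket
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            CopySource={"Bucket": source_bucket, "Key": key},
            Bucket=BACKUP_BUCKET,
            Key=key,
        )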
Related
I want to move my EBS snapshot into my S3 bucket, but after researching a lot I didn't find a way.
Is there any possible way to do it?
Amazon EBS Snapshots use Amazon S3 for storage, but they cannot be moved into your own Amazon S3 bucket.
Amazon Athena Log Analysis Services with S3 Glacier
We have petabytes of data in S3. We are https://www.pubnub.com/ and we store usage data for our network in S3 for billing purposes. We have tab-delimited log files stored in an S3 bucket, and Athena is giving us a HIVE_CURSOR_ERROR failure.
Our S3 bucket is set up to automatically push objects to AWS Glacier after 6 months. The bucket holds hot S3 files ready to read alongside the Glacier backup files, and we are getting access errors from Athena because of this. The file referenced in the error is a Glacier backup.
My guess is the answer will be: don't keep Glacier backups in the same bucket. That isn't an easy option for us because of our data volumes. I suspect Athena will not work in this setup and we will not be able to use it for our log analysis.
However, if there is a way we can use Athena, we would be thrilled. Is there a solution to the HIVE_CURSOR_ERROR, or a way to skip Glacier files? Our S3 bucket is flat, without folders.
The S3 object name is omitted from the screenshots, but the file referenced in the HIVE_CURSOR_ERROR is in fact the Glacier object, as can be seen in the screenshot of our S3 bucket.
Note I tried to post on https://forums.aws.amazon.com/ but that was no bueno.
The documentation from AWS dated May 16, 2017 states specifically that Athena does not support the GLACIER storage class:
Athena does not support different storage classes within the bucket specified by the LOCATION clause, does not support the GLACIER storage class, and does not support Requester Pays buckets. For more information, see Storage Classes, Changing the Storage Class of an Object in Amazon S3, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.
We are also interested in this; if you get it to work, please let us know how. :-)
Since the release of February 18, 2019, Athena ignores objects with the GLACIER storage class instead of failing the query:
[…] As a result of fixing this issue, Athena ignores objects transitioned to the GLACIER storage class. Athena does not support querying data from the GLACIER storage class.
You must have an S3 bucket to work with. In addition, the AWS account that you use to initiate an S3 Glacier Select job must have write permissions for the S3 bucket. The Amazon S3 bucket must be in the same AWS Region as the vault that contains the archive object being queried.
S3 Glacier Select runs the query and stores the results in an S3 bucket.
Bottom line: you must move the data into an S3 bucket to use the S3 Glacier Select statement, then use Athena on the 'new' S3 bucket.
I want to do a daily backup of my S3 buckets. I was wondering if anyone knew what best practice is?
I was thinking of using a Lambda function to copy contents from one S3 bucket to another as the S3 bucket is updated. But that won't mitigate against an S3 failure. How do I copy contents from one S3 bucket to another Amazon service like Glacier using Lambda? What's the best practice here for backing up S3 buckets?
NOTE: I want to do a backup, not an archive (where the content is deleted afterward).
Look into S3 cross-region replication to keep a backup copy of everything in another S3 bucket in another region. Note that you can even have the destination bucket be in a different AWS Account, so that it is safe even if your primary S3 account is hacked.
Note that a combination of Cross Region Replication and S3 Object Versioning (which is required for replication) will allow you to keep old versions of your files available even if they are deleted from the source bucket.
Then look into S3 lifecycle management to transition objects to Glacier to save storage costs.
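As a sketch of that lifecycle step (the bucket name and the 180-day threshold are placeholders), the transition rule could be configured with boto3 roughly like this:
import boto3

s3 = boto3.client("s3")

# Transition every object to Glacier 180 days after creation (placeholder values)
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)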
I need to copy a large chunk of data, around 300 GB of files, from bucket A, which is in the us-east region, to bucket B, which is in the ap-southeast region. I also need to change the structure of the bucket: the files have to be pushed into different folders in bucket B according to the image names in bucket A. I tried using AWS Lambda, but it's not available in ap-southeast.
Also, how much would it cost, since the data will be transferred between regions?
Method
The AWS Command-Line Interface (CLI) has the aws s3 cp command that can be used to move objects between buckets (even in different regions), and can rename them at the same time.
aws s3 cp s3://bucket-in-us/foo/bar.txt s3://bucket-in-ap/foo1/foo2/bar3.txt
There is also the aws s3 sync option, which can be used to synchronize content between two buckets, but that doesn't help with your requirement to rename objects.
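If you do need to restructure the keys, one option is to script the copy with boto3, along these lines; the derive_prefix rule is purely hypothetical (the actual naming scheme for the images isn't given) and the bucket names are placeholders.
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "bucket-in-us"   # placeholder names
DEST_BUCKET = "bucket-in-ap"

def derive_prefix(key):
    # Hypothetical rule: group images into folders by the first dash-separated part of the name
    return key.split("-")[0]

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        new_key = derive_prefix(key) + "/" + key
        # Server-side copy; the object data never passes through the machine running this script
        s3.copy_object(
            CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
            Bucket=DEST_BUCKET,
            Key=new_key,
        )
Note that a single copy_object call is limited to objects of up to 5 GB; larger objects would need a multipart copy.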
Cost
Data Transfer charges from US regions to another region are shown on the Amazon S3 pricing page as US$0.02/GB, so 300 GB would cost roughly 300 × $0.02 = US$6.
Alternatively, use bucket replication: create another bucket in your target region, replicate into it, and then do your S3 object key manipulation there.
Read more on S3 cross-region replication.
I'd like to mirror an S3 bucket with Amazon Glacier.
The Glacier FAQ states:
Amazon S3 now provides a new storage option that enables you to utilize Amazon Glacier’s extremely low-cost storage service for data archiving. You can define S3 lifecycle rules to automatically archive sets of Amazon S3 objects to Amazon Glacier to reduce your storage costs. You can learn more by visiting the Object Lifecycle Management topic in the Amazon S3 Developer Guide.
This is close, but I'd like to mirror. I do not want to delete the content on S3, only copy it to Glacier.
Is this possible to setup automatically with AWS?
Or does this mirror need to be uploaded to Glacier manually?
It is now possible to achieve an S3-to-Glacier "mirror": first create a cross-region replication bucket on Amazon S3 (this replication bucket will be a mirror of your original bucket - see http://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html), then set up a lifecycle rule within the replication bucket to move the data to Glacier.
Amazon doesn't offer this feature through its API. We had the same problem and solved it by running a daily cron job that re-uploads files to Glacier.
Here is a snippet of code, using Python and boto, that copies a file to a Glacier vault. Note that you have to download the file locally from S3 before you can run it (you can use s3cmd, for instance); the code below then uploads the local file to Glacier.
import boto.glacier.layer2

# Set up your AWS key and secret, and vault name
aws_key = "AKIA1234"
aws_secret = "ABC123"
glacierVault = "someName"

# Assumption is that this file has already been downloaded from S3
fileName = "localfile.tgz"

try:
    # Connect to Glacier via boto's layer2 interface
    l = boto.glacier.layer2.Layer2(aws_access_key_id=aws_key, aws_secret_access_key=aws_secret)
    # Get your Glacier vault
    v = l.get_vault(glacierVault)
    # Upload file using concurrent upload (so large files are OK)
    archiveID = v.concurrent_create_archive_from_file(fileName)
    # Append this archiveID to a local file, that way you remember what file
    # in Glacier corresponds to a local file. Glacier has no concept of files.
    open("glacier.txt", "a").write(fileName + " " + archiveID + "\n")
except Exception:
    print("Could not upload gzipped file to Glacier")
This is done via a lifecycle policy, but the object is no longer available in S3 afterwards. You can duplicate it into a separate bucket to keep it.
If you first enable versioning on your S3 bucket then lifecycle rules can be applied to previous versions. This will achieve a very similar outcome, except there won't be a backup of the current version.
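As a rough sketch of that approach (the bucket name and the 30-day threshold are placeholders), a lifecycle rule targeting noncurrent versions could look like this in boto3:
import boto3

s3 = boto3.client("s3")

# Archive previous (noncurrent) versions to Glacier 30 days after they are superseded
s3.put_bucket_lifecycle_configuration(
    Bucket="my-versioned-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)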