Do I need to set up a Glacier Vault to archive data from S3? - amazon-web-services

I'm really new to AWS and quite confused about the purpose of a Glacier Vault when I can archive my objects through S3 via a lifecycle rule. So do I have to set up a Glacier Vault first in order to archive my objects?

Once upon a time, there was a service called Amazon Glacier. It was very low-cost, but it was very painful to use. Every request (even listing the contents of a vault) took a long time (eg make a request, come back an hour later to get the result).
Then, the clever people in Amazon S3 realized that they could provide a more friendly interface to Glacier. By simply changing the storage class of objects in S3 to Glacier, S3 would move the files into its own Glacier vault and save you all the hassle.
Then, the S3 team introduced Glacier Deep Archive, which is only available via Amazon S3 and is even lower cost than Glacier itself!
The children rejoiced and all cried out in unison... "We will now only use Glacier via S3. We will never go direct to Glacier again!"

No, you don't have to. You use Glacier Vaults if you want the extra features that the S3 Glacier service provides, such as Vault Lock Policies and/or Vault Access Policies.
To use just the Glacier storage class, you can use the Amazon S3 service and lifecycle rules.
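As a minimal sketch (the bucket name and rule ID are placeholders), a lifecycle rule that transitions everything to the Glacier storage class after 30 days can be applied with the AWS CLI:

    cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "archive-to-glacier",
          "Status": "Enabled",
          "Filter": {"Prefix": ""},
          "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
        }
      ]
    }
    EOF
    aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
        --lifecycle-configuration file://lifecycle.json

No vault is created or managed by you; S3 handles the archival behind the scenes.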

Related

Uploading files to Glacier using AWS S3 vs S3 Glacier upload

The standard S3 console supports uploading files and changing the storage class, but in S3 Glacier we need to create a vault, and console support is not provided. Let's say I selected the S3 Glacier storage class in a standard S3 upload: how is it different from Glacier? Will it internally create a vault? Is there any price variation?
Uploading to Glacier via Amazon S3 storage classes looks simpler and easier.
There are two different types of Glacier.
The 'original' Amazon Glacier uses vaults and jobs. Quite frankly, it is awful to use. It's bearable if you are using a software package that knows how to use Glacier, but it is not a pleasant experience. For example, even just listing the contents of a vault requires waiting for a job to run, and then results need to be retrieved.
Using Glacier as a Storage Class in Amazon S3 is a much more pleasant way to use Glacier. You can use all standard S3 commands and utilities and it gives immediate feedback when you list objects. The only thing that takes time is retrieving an object that is in a Glacier storage class.
Plus, the Glacier and Glacier Deep Archive storage classes are cheaper than Glacier itself! (I'd like to prove this, but the pricing page for Glacier now redirects to S3 pricing, so it's no longer possible to see what the old service costs.)
Bottom line: Use S3 storage classes, not the old 'Glacier' service that uses Vaults.
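To illustrate the S3 route (bucket and key names are made up), listing is immediate, and only retrieving an archived object's contents requires a restore job:

    # Listing is immediate, even for objects in a Glacier storage class:
    aws s3 ls s3://my-bucket/backups/
    # Retrieving the contents requires a restore first:
    aws s3api restore-object --bucket my-bucket --key backups/2020-01-01.tar.gz \
        --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'
    # Once the restore completes (minutes to hours, depending on the tier), download as usual:
    aws s3 cp s3://my-bucket/backups/2020-01-01.tar.gz .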

AWS - Can you assign a public IP or URL to an S3 Glacier Vault?

Can you assign a public IP or URL to an S3 Glacier Vault? I want to use it for automatic backups.
I realize that I can upload to an S3 bucket and then use lifecycle rules to move it over to Glacier, but I'm asking if I can skip that step entirely and upload directly to a Glacier Vault.
Thanks for any tips!
When originally released, Amazon Glacier was only accessible directly (rather than via Amazon S3). It offers low-cost storage, but it is only accessible via API (not much can be done in the Management Console) and it is very slow because all requests are processed as jobs. This even makes it slow to list the contents of a Vault.
You can certainly access Amazon Glacier directly, but it would be via API calls to the Glacier Endpoint. I would recommend that you use tools such as Cloudberry Backup that know how to talk directly to Glacier.
However, a much simpler way to use Glacier is to store files in Amazon S3 and then select the Glacier or Glacier Deep Archive storage class. This allows use of the S3 interface and the Deep Archive storage class is actually cheaper than Glacier itself! You can also use the AWS CLI to upload backups, which is much easier than working with a Glacier Vault.
By the way, if you are purely wanting to use S3/Glacier for "backups", I would highly recommend using traditional backup tools that know how to use S3. They are much more reliable, and offer more capabilities, than doing it yourself. For example, they can keep multiple versions of files and can retain deleted files for a period of time to allow recovery.
Specify --storage-class GLACIER if you are using the aws s3 cp command of the CLI. Use upload-archive if you are using the aws glacier command of the CLI.
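For example (the bucket and vault names are placeholders), the two upload paths look like this:

    # Via Amazon S3, using the Glacier storage class:
    aws s3 cp backup.tar.gz s3://my-bucket/backups/ --storage-class GLACIER
    # Via the original Glacier service ("-" means the current account; the vault must already exist):
    aws glacier upload-archive --account-id - --vault-name my-backup-vault --body backup.tar.gz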

Amazon S3 Glacier vs Glacier Storage Class

This question might look like an easy beginner question, but I'm really confused about the difference between the two.
Why do I need to use Amazon S3 Glacier if I can use the normal S3 Bucket and just change the storage class of the objects inside to Glacier manually or by using Lifecycle rule?
Thanks in advance,
In the old days, Amazon Glacier was only available as a separate product. Frankly, the Glacier service is a pain to use.
Every request has to be submitted as a Job, which takes a long time to return. Even obtaining a list of archives is slow, let alone restoring a file from the archive.
The best way to use the Amazon Glacier service is with a third-party tool (eg Cloudberry Backup) that knows how to interface with Glacier, isolating you from having to use it directly.
Then, in 2012, the Amazon S3 team introduced a new Glacier Storage Class where S3 would move the data to Glacier, but still present the objects as being "in S3". (Well, the objects appear in S3 and their metadata is accessible, but the contents of the objects are stored in Glacier.) Then, in 2019, a new Glacier Deep Archive storage class offered even lower prices than were available through Amazon Glacier itself.
Therefore, it is now both easier and lower cost to use Glacier via Amazon S3 storage classes.
Amazon Glacier remains available for use, and has been renamed Amazon S3 Glacier to further confuse things. There might be some use-cases where it is preferable (eg acting like traditional tape backups for AWS Storage Gateway Tape Gateways), but Glacier Deep Archive in S3 would be the lowest-cost option.
These days most people just use the S3 Glacier storage class, because the S3 API is much more convenient to work with than the Glacier API.
However, Amazon S3 Glacier offers some extra functionality not available in regular S3. Most notably, this is Vault Lock Policies, which allow fine-grained control over locking vaults of archives for regulatory purposes.
S3 offers Object Lock, which performs a similar function, but it is not as versatile as vault lock policies. For example, S3 Object Lock can only be enabled at bucket creation, and legal holds apply only to individual versions of objects. In contrast, vault lock policies, as the name suggests, are policy documents written in JSON, which don't have such limitations.
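As a rough sketch of what a vault lock policy looks like (the vault name, account ID and retention period are made up), the policy below denies deletion of any archive younger than 365 days. Note the CLI expects the policy document embedded as an escaped string inside a {"Policy": "..."} wrapper:

    # lock-policy.json (shown unescaped for readability):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "deny-deletes-for-one-year",
          "Principal": "*",
          "Effect": "Deny",
          "Action": "glacier:DeleteArchive",
          "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/my-vault",
          "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}}
        }
      ]
    }

    # Locking is a two-step process: initiate returns a lockId and gives you
    # 24 hours to test; then either abort-vault-lock or make it permanent:
    aws glacier initiate-vault-lock --account-id - --vault-name my-vault \
        --policy file://lock-policy.json
    aws glacier complete-vault-lock --account-id - --vault-name my-vault --lock-id <lockId>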

Amazon Glacier How to delete files after a certain period of time

Thanks for reading this.
I am able to transfer files from S3 to Glacier after 30 days using a lifecycle rule. However, how do I make the same files get deleted from Glacier after 3 months?
Thanks.
If the objects were moved from S3 to Glacier via a Lifecycle Policy, add an expiration action to the same lifecycle policy to permanently delete the objects after n days. This will delete the objects from both S3 and Glacier.
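A minimal sketch of such a lifecycle configuration (the bucket name and rule ID are placeholders): transition to Glacier after 30 days, then expire (permanently delete) after 90 days:

    cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "glacier-then-delete",
          "Status": "Enabled",
          "Filter": {"Prefix": ""},
          "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
          "Expiration": {"Days": 90}
        }
      ]
    }
    EOF
    aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
        --lifecycle-configuration file://lifecycle.json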
If, instead, the objects were uploaded directly to Glacier, then there is no auto-deletion capability.
As far as I'm aware, Glacier does not currently offer lifecycle policies for vaults the way S3 does for buckets.
You could create your own auto-delete setup (likely within the portion of the AWS Free Tier that does not expire after 12 months) by writing metadata about the Glacier archives to DynamoDB (vault name, archive ID, timestamp) and having a scheduled Lambda function look for archives older than your retention period (90 days, in your case) and delete them from both Glacier and DynamoDB.
It's a bit of work to set up, but it would accomplish what you're trying to do.
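As an illustration of the cleanup step (the table name, vault name and key schema are all hypothetical), the scheduled function would effectively issue the equivalent of:

    # Delete the expired archive from Glacier ("-" means the current account):
    aws glacier delete-archive --account-id - \
        --vault-name my-backup-vault --archive-id "$ARCHIVE_ID"
    # ...then remove its tracking record from the (hypothetical) DynamoDB table:
    aws dynamodb delete-item --table-name glacier-archives \
        --key '{"archive_id": {"S": "'"$ARCHIVE_ID"'"}}'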

Backing up S3 buckets best practice

I want to do a daily backup for S3 buckets. I was wondering if anyone knew what the best practice is?
I was thinking of using a Lambda function to copy contents from one S3 bucket to another as the bucket is updated. But that won't mitigate an S3 failure. How do I copy contents from an S3 bucket to another Amazon service, like Glacier, using Lambda? What's the best practice here for backing up S3 buckets?
NOTE: I want to do a backup, not an archive (where content is deleted afterward).
Look into S3 cross-region replication to keep a backup copy of everything in another S3 bucket in another region. Note that you can even have the destination bucket be in a different AWS Account, so that it is safe even if your primary S3 account is hacked.
Note that a combination of Cross Region Replication and S3 Object Versioning (which is required for replication) will allow you to keep old versions of your files available even if they are deleted from the source bucket.
Then look into S3 lifecycle management to transition objects to Glacier to save storage costs.
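As a sketch of the replication setup (the bucket names and IAM role ARN are placeholders; the role must allow S3 to read the source and write to the destination):

    # Versioning must be enabled on both buckets before replication can be configured:
    aws s3api put-bucket-versioning --bucket my-source-bucket \
        --versioning-configuration Status=Enabled
    aws s3api put-bucket-versioning --bucket my-backup-bucket \
        --versioning-configuration Status=Enabled

    cat > replication.json <<'EOF'
    {
      "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
      "Rules": [
        {
          "ID": "backup-everything",
          "Status": "Enabled",
          "Prefix": "",
          "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket"}
        }
      ]
    }
    EOF
    aws s3api put-bucket-replication --bucket my-source-bucket \
        --replication-configuration file://replication.json

For a true cross-region or cross-account backup, create my-backup-bucket in a different region (and, ideally, a different account) from the source.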