Unable to upload files to Amazon Glacier - amazon-web-services

I have an AWS EC2 instance producing some data, which is meant to be moved to AWS Glacier. According to Is it possible to move EC2 volumes to Amazon Glacier without having to download and upload it? - Stack Overflow there are only two ways to put data in Glacier:
1. Upload data directly as described in Using Amazon Glacier with the AWS Command Line Interface - AWS Command Line Interface
2. Copy/move the data to S3 and create a lifecycle rule.
Unfortunately neither approach worked: when I access my vault nothing is there, even after a week. Furthermore, once I complete the example provided, the "aws glacier describe-vault" command outputs:
{
"SizeInBytes": 0,
"NumberOfArchives": 0,
"CreationDate": "2018-08-14T12:59:31.456Z",
...
}
What am I missing?

For Option #2, where you created a lifecycle rule to move objects to Glacier, you will not see the objects in Glacier itself.
When Amazon S3 transitions objects to Glacier through a lifecycle rule, the objects are kept in a Glacier vault that is managed by Amazon S3 and is not visible to you. Instead, the objects in S3 will show a Storage Class of Glacier, which means that the object metadata (name, size, etc.) is kept in S3 but the actual contents of the object have been moved to Glacier.
As long as you can see the Storage Class of Glacier, your objects have been successfully moved to Glacier.
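One way to check the Storage Class without clicking through the console is to list the bucket and filter on the StorageClass field. The following is only a sketch: it assumes `s3` is a boto3 S3 client (for example `boto3.client("s3")`), and the helper names and sample keys are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def glacier_keys(objects: Iterable[Dict]) -> List[str]:
    """Return the keys of objects whose S3 storage class is GLACIER."""
    return [obj["Key"] for obj in objects if obj.get("StorageClass") == "GLACIER"]


def iter_bucket_objects(s3, bucket: str) -> Iterator[Dict]:
    """Yield every object entry in a bucket.

    `s3` is assumed to be a boto3 S3 client; each yielded dict is one
    entry of the "Contents" list returned by list_objects_v2.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        yield from page.get("Contents", [])


# Hypothetical listing entries, shaped like list_objects_v2 results.
sample = [
    {"Key": "logs/2018-01.tsv", "Size": 1024, "StorageClass": "GLACIER"},
    {"Key": "logs/2018-08.tsv", "Size": 2048, "StorageClass": "STANDARD"},
]
print(glacier_keys(sample))  # ['logs/2018-01.tsv']
```

Against a real bucket this would be called as `glacier_keys(iter_bucket_objects(s3, "my-bucket"))`, where the bucket name is a placeholder.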

Related

Uploading files to Glacier using AWS S3 v/s S3 Glacier upload

The standard S3 console supports uploading files and changing the storage class, but in S3 Glacier we need to create a vault, and console upload support is not provided. If I select the S3 Glacier storage class in a standard S3 upload, how is that different from Glacier? Will it internally create a vault? Is there any price variation?
Uploading to Glacier via Amazon S3 storage classes looks simpler and easier.
There are two different types of Glacier.
The 'original' Amazon Glacier uses vaults and jobs. Quite frankly, it is awful to use. It's bearable if you are using a software package that knows how to use Glacier, but it is not a pleasant experience. For example, even just listing the contents of a vault requires waiting for a job to run, and then results need to be retrieved.
Using Glacier as a Storage Class in Amazon S3 is a much more pleasant way to use Glacier. You can use all standard S3 commands and utilities and it gives immediate feedback when you list objects. The only thing that takes time is retrieving an object that is in a Glacier storage class.
Plus, the Glacier and Glacier Deep Archive storage classes are cheaper than Glacier itself! I'd like to prove this, but the pricing page for Glacier now redirects to S3 pricing so it's not possible to see how much it costs!
Bottom line: Use S3 storage classes, not the old 'Glacier' service that uses Vaults.
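As noted above, retrieval is the one step that still takes time with the S3 storage classes: an archived object has to be restored before it can be read. A minimal sketch of such a request follows, assuming `s3` is a boto3 S3 client; the function names and default values are illustrative, not taken from the answer.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_restore_request(days: int, tier: str = "Standard") -> dict:
    """Build the RestoreRequest payload for an object in a Glacier storage class.

    `days` is how long the temporary restored copy stays available;
    `tier` trades retrieval speed against cost.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError("tier must be one of %s" % (VALID_TIERS,))
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def request_restore(s3, bucket: str, key: str, days: int = 7, tier: str = "Bulk") -> None:
    """Ask S3 to restore one archived object (assumes `s3` is a boto3 S3 client).

    The call only starts the restore; the object becomes readable later.
    """
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days, tier),
    )


print(build_restore_request(7, "Bulk"))
```

The restore runs asynchronously, so a caller would typically poll the object (for example with `head_object`) before trying to download it.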

Do I need to setup Glacier Vault to archive data from S3?

I'm really new to AWS and quite confused about the purpose of a Glacier vault when I can archive my objects through S3 via a lifecycle rule. Do I have to set up a Glacier vault first in order to archive my objects?
Once upon a time, there was a service called Amazon Glacier. It was very low-cost, but it was very painful to use. Every request (even listing the contents of a vault) took a long time (eg make a request, come back an hour later to get the result).
Then, the clever people in Amazon S3 realized that they could provide a more friendly interface to Glacier. By simply changing the storage class of objects in S3 to Glacier, they would move the files to their own Glacier vault and save you all the hassle.
Then, the S3 team introduced Glacier Deep Archive, which is only available via Amazon S3 and is even lower cost than Glacier itself!
The children rejoiced and all cried out in unison... "We will now only use Glacier via S3. We will never go direct to Glacier again!"
No, you don't have to. You use Glacier vaults if you want the extra features that the S3 Glacier service provides, such as Vault Lock policies and/or vault access policies.
For using just the Glacier storage, you can use Amazon S3 service and lifecycle rules.
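A lifecycle rule of the kind mentioned above could look roughly like this. It is a sketch under assumptions: `s3` is taken to be a boto3 S3 client, and the rule ID, prefix and day count are placeholders rather than recommended values.

```python
def glacier_lifecycle_config(days: int, prefix: str = "", rule_id: str = "archive-to-glacier") -> dict:
    """Build a lifecycle configuration that transitions objects to GLACIER.

    `days` is the object age at which the transition happens; an empty
    `prefix` makes the rule apply to the whole bucket.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3, bucket: str, days: int) -> None:
    """Install the rule on a bucket (assumes `s3` is a boto3 S3 client).

    Note: this call replaces any lifecycle configuration already on the bucket.
    """
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle_config(days),
    )


print(glacier_lifecycle_config(180))
```

No vault is created or referenced anywhere in this configuration, which matches the point made in the answers: the rule lives entirely on the S3 bucket.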

S3 deleting files in Glacier storage class

Our S3 bucket contains a mix of objects in the Standard and Glacier storage classes (due to the lifecycle rules we set up). I want to understand how to delete the Glacier storage class objects. I looked at the Glacier console and we don't have a vault/archive to delete. So, my guess is that S3 is managing the Glacier vault for us. I also looked at the lifecycle rule blog post (https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/) to validate this.
TL;DR
So, is deleting an object (in Glacier storage class) from S3 (using the aws CLI or lifecycle rules) sufficient to delete it from Glacier as well?
Yes, it is. Deleting the object from S3 removes it from Glacier.
The GLACIER storage class uses the very low-cost Amazon Glacier storage service, but you still manage objects in this storage class through Amazon S3.
https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html
When S3 stores objects in the GLACIER storage class, S3 doesn't put that data into a Glacier vault in your AWS account. S3 seems to have its own, separate interface to Glacier. (Similarly, EBS snapshots are stored "in S3," but not in "your" S3.)
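Since the deletion goes through S3, removing all Glacier-class objects from a bucket comes down to ordinary S3 delete calls. The sketch below assumes `s3` is a boto3 S3 client and that `objects` are listing entries as returned by `list_objects_v2`; the helper names are hypothetical. Keys are grouped into batches because a single `delete_objects` request accepts at most 1000 keys.

```python
from typing import Dict, Iterable, Iterator


def delete_batches(keys: Iterable[str], batch_size: int = 1000) -> Iterator[Dict]:
    """Yield Delete payloads for delete_objects, at most `batch_size` keys each."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


def delete_glacier_objects(s3, bucket: str, objects: Iterable[Dict]) -> int:
    """Delete every GLACIER-class object in `objects`; return how many were requested.

    Assumes `s3` is a boto3 S3 client. Nothing is done on the Glacier side:
    the answer above says the S3 delete is sufficient.
    """
    keys = [obj["Key"] for obj in objects if obj.get("StorageClass") == "GLACIER"]
    for payload in delete_batches(keys):
        s3.delete_objects(Bucket=bucket, Delete=payload)
    return len(keys)


# 2500 made-up keys split into batches of 1000, 1000 and 500.
batches = list(delete_batches("key-%d" % i for i in range(2500)))
print([len(b["Objects"]) for b in batches])  # [1000, 1000, 500]
```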

Amazon AWS Athena S3 and Glacier Mixed Bucket

Amazon Athena Log Analysis Services with S3 Glacier
We have petabytes of data in S3. We are https://www.pubnub.com/ and we store usage data of our network in S3 for billing purposes. We have tab-delimited log files stored in an S3 bucket. Athena is giving us a HIVE_CURSOR_ERROR failure.
Our S3 bucket is set up to automatically push to AWS Glacier after 6 months. Our bucket has S3 files hot and ready to read in addition to the Glacier backup files. We are getting access errors from Athena because of this. The file referenced in the error is a Glacier backup.
My guess is the answer will be: don't keep glacier backups in the same bucket. We don't have this option with ease due to our data volume sizes. I believe Athena will not work in this setup and we will not be able to use Athena for our log analysis.
However if there is a way we can use Athena, we would be thrilled. Is there a solution to HIVE_CURSOR_ERROR and a way to skip Glacier files? Our s3 bucket is a flat bucket without folders.
(Screenshots of the error and of the S3 bucket listing are not reproduced here; the object name was redacted from them.) The file referenced in the HIVE_CURSOR_ERROR is in fact the Glacier object.
Note I tried to post on https://forums.aws.amazon.com/ but that was no bueno.
The documentation from AWS dated May 16 2017 states specifically that Athena does not support the GLACIER storage class:
Athena does not support different storage classes within the bucket specified by the LOCATION clause, does not support the GLACIER storage class, and does not support Requester Pays buckets. For more information, see Storage Classes, Changing the Storage Class of an Object in Amazon S3, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.
We are also interested in this; if you get it to work, please let us know how. :-)
Since the release of February 18, 2019 Athena will ignore objects with the GLACIER storage class instead of failing the query:
[…] As a result of fixing this issue, Athena ignores objects transitioned to the GLACIER storage class. Athena does not support querying data from the GLACIER storage class.
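Given that behaviour, it can be useful to know how much of a mixed bucket a query would silently skip. The following sketch totals bytes per storage class from listing entries shaped like `list_objects_v2` results; the function names and sample data are made up, and only the GLACIER class named in the release note is counted as skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bytes_by_storage_class(objects: Iterable[Dict]) -> Dict[str, int]:
    """Sum object sizes per storage class."""
    totals = defaultdict(int)
    for obj in objects:
        totals[obj.get("StorageClass", "STANDARD")] += obj.get("Size", 0)
    return dict(totals)


def athena_skipped_fraction(objects: Iterable[Dict]) -> float:
    """Fraction of total bytes held in GLACIER, i.e. data Athena would ignore."""
    totals = bytes_by_storage_class(objects)
    total = sum(totals.values())
    return totals.get("GLACIER", 0) / total if total else 0.0


# Hypothetical flat bucket: one archived log file and one recent one.
listing = [
    {"Key": "usage-2018-01.tsv", "Size": 300, "StorageClass": "GLACIER"},
    {"Key": "usage-2018-08.tsv", "Size": 100, "StorageClass": "STANDARD"},
]
print(bytes_by_storage_class(listing))
print(athena_skipped_fraction(listing))  # 0.75
```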
You must have an S3 bucket to work with. In addition, the AWS account that you use to initiate a S3 Glacier Select job must have write permissions for the S3 bucket. The Amazon S3 bucket must be in the same AWS Region as the vault that contains the archive object that is being queried.
S3 Glacier Select runs the query and stores the results in an S3 bucket.
Bottom line: you must move the data into an S3 bucket to use the S3 Glacier Select statement. Then use Athena on the 'new' S3 bucket.

aws s3 glacier storage and transfer

I am trying to see if there is a way to transfer S3 objects in Glacier from one bucket to another while keeping the storage class the same. I can restore the Glacier object and transfer it, but in the new bucket the file is saved in Standard storage. I would like to know if there is a way to have the file stored directly in Glacier without enforcing lifecycle policies on the bucket.
There isn't.
Objects can only be copied to another bucket once restored, and objects can only be transitioned into the Glacier storage class by lifecycle policies, not by creating them with this storage class ... which essentially rules out the possibility of the desired outcome for two different reasons.
S3 does not have either a "move" or a "rename" feature -- both of these can only be emulated by copy-and-delete.
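The copy-and-delete emulation can be sketched as follows. This is an illustration, not a complete tool: `s3` is assumed to be a boto3 S3 client, the bucket and key names are placeholders, and `RecordingS3` is a hypothetical stand-in used only to show the order of calls without touching AWS. As the answer says, an object in the Glacier storage class has to be restored before the copy can succeed, and a single `copy_object` call is limited to objects of up to 5 GB.

```python
from typing import Optional


class RecordingS3:
    """Stand-in for an S3 client that only records calls (dry run)."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(("copy_object", kwargs))

    def delete_object(self, **kwargs):
        self.calls.append(("delete_object", kwargs))


def move_object(s3, src_bucket: str, key: str, dst_bucket: str, dst_key: Optional[str] = None) -> None:
    """Emulate a move: copy to the destination, then delete the source.

    The delete only runs if the copy did not raise.
    """
    s3.copy_object(
        Bucket=dst_bucket,
        Key=dst_key or key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )
    s3.delete_object(Bucket=src_bucket, Key=key)


dry_run = RecordingS3()
move_object(dry_run, "source-bucket", "logs/a.tsv", "target-bucket")
print([name for name, _ in dry_run.calls])  # ['copy_object', 'delete_object']
```

A rename is the same operation with `src_bucket` equal to `dst_bucket` and a different `dst_key`.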