Mimicking WHM backup rotations with S3 Lifecycle

I'm setting up a new managed VPS server to back up to Amazon S3. WHM has S3 backup natively implemented now, but it does not support deletion/rotation. I'd like to keep a set of backups something like this:
2 daily backups in S3
1 weekly backup in S3
4 weekly backups in Glacier
12 monthly backups in Glacier
yearly backups in Glacier
After WHM backups run, the S3 bucket contains this file structure:
yyyy-mm-dd/
accountname1.tar.gz
accountname2.tar.gz
accountname3.tar.gz
I might even want different backup rules for different accounts (some more active, some less so). Given how many WHM accounts are using S3 for backup, surely this is a solved problem? I searched Stack Overflow and Google, but I'm not finding any info on how to use S3 Lifecycle to do anything other than "move files older than X."
If this just isn't feasible, feel free to recommend a different WHM backup strategy (though my host's custom offsite backup is prohibitively expensive, so not an option).

Use different folders (S3 paths) for your different backup types, then create a Lifecycle rule on each path with the time you want the objects to stay in S3, the Glacier transition, and the expiration:
/daily/yyyy-mm-dd/ <- no lifecycle rule
accountname1.tar.gz
accountname2.tar.gz
accountname3.tar.gz
/weekly/yyyy-mm-dd/ <- lifecycle rule "weekly": files older than 7 days are moved to Glacier, files older than 45 days are removed from Glacier
accountname1.tar.gz
accountname2.tar.gz
accountname3.tar.gz
/monthly/yyyy-mm-dd/ <- lifecycle rule "monthly": files older than 1 day are moved to Glacier, files older than 366 days are removed from Glacier
accountname1.tar.gz
accountname2.tar.gz
accountname3.tar.gz
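For what it's worth, a minimal sketch of how those prefix-scoped rules could be written with boto3; the bucket name is a placeholder and the day counts simply mirror the listing above (the /daily/ prefix gets no rule):

    # Sketch: one lifecycle rule per backup prefix. "my-whm-backups" is a placeholder.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-whm-backups",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "weekly",
                    "Filter": {"Prefix": "weekly/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 45},
                },
                {
                    "ID": "monthly",
                    "Filter": {"Prefix": "monthly/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 366},
                },
            ]
        },
    )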

It turns out that WHM backup rotation actually is working now with S3 (rumors and documentation to the contrary notwithstanding). This means that greg_diesel's suggestion of using the lifecycle is not necessary to expire old backups (and keep my costs down), but it's the right answer for moving older monthly files to Glacier before they are deleted by the WHM rotation.
Thanks!

Related

Confused about storage-class 'Glacier' use by S3 and the 'S3-Glacier' service

Still confused about the 'Glacier' storage class used by S3 and the 'S3-Glacier' service.
What is the difference between them, and how do upload and retrieval work for each?
See an example question below.
You’re researching third-party backup solutions to backup 10 TB of data nightly to Amazon S3. File restores won’t be needed often, but when they are, they’ll need to be available in under five minutes. Your analysis shows that you will exceed your budget for backup storage and you need to find a way to reduce the estimated monthly costs. How should you modify the solution to achieve the cost reduction needed?
1. Create an S3 lifecycle rule to move the data immediately to Amazon S3 Glacier.
2. Choose a third-party backup solution that writes directly to the Amazon S3 Glacier API.
3. Choose a third-party backup solution that leverages AWS Storage Gateway to write data to Amazon S3 Glacier.
Why is option 2 correct, and what about options 1 and 3? Thanks.
Glacier is a storage class under the S3 service. Glacier is used for archiving data. Glacier and Glacier Deep Archive have a longer retrieval time than the other S3 storage tiers (Standard, Standard-IA, One Zone-IA), but also cost significantly less.
This looks like a certification question, CSA - Associate, maybe? You may have forgotten to provide the fourth answer choice.
You cannot move data to Glacier immediately using a lifecycle policy. You can set it to 0 days but it still takes time to make the move.
You do not need third-party software to write to the AWS APIs; you can use the CLI and SDKs.
This may be the answer because using a third-party piece of software that can take care of some of the overhead involved in getting a Storage Gateway File Gateway up and running, and configuring it to store data in Glacier or Glacier Deep Archive, is easier.
Typically, "third-party" is not the answer in certification exam questions.
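As an aside to the comment above about the CLI and SDKs: calling the Glacier API directly from an SDK is straightforward. A minimal boto3 sketch, where the vault name and file name are placeholder assumptions:

    # Sketch: upload an archive straight to a Glacier vault with the SDK,
    # no third-party software required. Vault and file names are placeholders.
    import boto3

    glacier = boto3.client("glacier")
    with open("backup-2024-01-01.tar.gz", "rb") as f:
        response = glacier.upload_archive(
            vaultName="nightly-backups",
            archiveDescription="Nightly backup 2024-01-01",
            body=f,
        )
    print(response["archiveId"])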

Checking the integrity of an archive uploaded to AWS Glacier

We have daily database backups created and stored on a server. In order to free up space, it was decided that all the backups older than 30 days should be archived using AWS Glacier.
So far so good, I managed to write a PowerShell script to select the required files and upload them to Glacier, but since I am new to all the AWS stuff, I have one question: is it possible to check that the files I have uploaded are indeed in the archive and that there has been no information loss?
My first approach was to send job retrieval requests for all the files that we have uploaded and, 4 hours later, compare the checksums and archive IDs of our original files with the ones we retrieved from Glacier. However, I think this process takes too long, costs extra money and, most importantly, makes no sense at all.
I have also found that I can use inventory retrieval, but as far as I can tell this approach would be very similar to the one described above, just without downloading all the files again.
Lastly, is there even a point to trying to ensure that a file upload was successful if there are no errors? My vague understanding is that AWS would come back with error messages should an upload to Glacier fail, and it computes checksums internally during uploads.
I know that Stack Overflow has seen more precisely worded questions, but any clarification regarding this would be immensely appreciated.
You have to try pretty hard to upload a corrupt file to Glacier, because Glacier requires checksums sent with each API request, and will reject the uploads if they don't match the hashes. Obviously you need to spot check your archives, but each one does not need to be downloaded and verified because of the built-in protections.
See Computing Checksums in the Amazon S3 Glacier Developer Guide for descriptions of how this works, on the wire.
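For illustration, a rough Python sketch of the SHA-256 tree hash described there (1 MiB leaf hashes combined pairwise); treat it as an outline of the idea, not the guide's exact code:

    # Rough sketch of Glacier's SHA-256 tree hash: hash the file in 1 MiB
    # chunks, then combine adjacent hashes pairwise until one root remains.
    import hashlib

    ONE_MIB = 1024 * 1024

    def tree_hash(path):
        chunk_hashes = []
        with open(path, "rb") as f:
            while True:
                chunk = f.read(ONE_MIB)
                if not chunk:
                    break
                chunk_hashes.append(hashlib.sha256(chunk).digest())
        if not chunk_hashes:  # empty file
            return hashlib.sha256(b"").hexdigest()
        while len(chunk_hashes) > 1:
            paired = []
            for i in range(0, len(chunk_hashes), 2):
                if i + 1 < len(chunk_hashes):
                    paired.append(hashlib.sha256(chunk_hashes[i] + chunk_hashes[i + 1]).digest())
                else:
                    paired.append(chunk_hashes[i])  # odd hash carries up unchanged
            chunk_hashes = paired
        return chunk_hashes[0].hex()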
Then, consider not using Glacier at all... not directly, anyway. Use S3, and upload your files using the GLACIER or DEEP_ARCHIVE storage class. Or upload them as Standard, with a lifecycle policy that moves them into one of the archive storage classes after 1 day. (Useful because if you delete Glacier or Deep Archive uploads before the minimum storage time, you're billed for the entire minimum time... this way you have a 24 hour "oops I don't like the way I set this up" window, since Standard storage has no minimum storage time period).
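A minimal sketch of that approach with boto3, where the bucket and key names are placeholders:

    # Sketch: upload a backup through S3 directly into an archive storage class.
    # "my-backup-bucket" and the key are placeholders.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        "db-backup-2024-01-01.bak",
        "my-backup-bucket",
        "backups/db-backup-2024-01-01.bak",
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},  # or "GLACIER"
    )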
Using S3 is a far better solution, because S3 has a much better API and console, but the pricing is identical, because S3 is actually using Glacier as its backend, while you have the advantage of S3 as the frontend. Glacier has essentially no console functionality, is very opaque, and is not really designed for human interaction -- Glacier appears to have been designed as a backing store for an archiving system or service, which is exactly how S3 uses Glacier.
Amazon Simple Storage Service (Amazon S3) supports lifecycle configuration on an S3 bucket, which enables you to transition objects to the Amazon S3 GLACIER storage class for archival. When you transition Amazon S3 objects to the GLACIER storage class, Amazon S3 internally uses Glacier for durable storage at lower cost. Although the objects are stored in Glacier, they remain Amazon S3 objects that you manage in Amazon S3, and you cannot access them directly through Glacier.
https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html
It is unfortunate that AWS recently muddied this issue by dumbing things down and rebranding "Glacier" as "S3 Glacier," as if they were the same thing, when they are two very different services, one of which operates in a mode that gives you a gateway to the other. It's similarly unfortunate how Glacier has traditionally been marketed. Without S3 in front, Glacier is not well suited to very many applications.

Automatically delete old backups from S3 and move monthly backups to Glacier

I have set up GitLab to save a daily backup to an Amazon S3 bucket. I want to keep monthly backups going one year back on Glacier and daily backups going one week back on Standard storage. Is this cleanup strategy viable and doable using S3 lifecycle rules? If yes, how?
Amazon S3 Object Lifecycle Management can Transition storage classes and/or Delete (expire) objects.
It can also work with Versioning, such that different rules can apply to the 'current' version and 'all previous' versions. For example, the current version could be kept accessible while previous versions are transitioned to Glacier and eventually deleted.
However, it does not have the concept of a "monthly backup" or "weekly backup". Rather, rules are applied to all objects equally.
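A minimal sketch of such a versioning-aware rule with boto3; the bucket name and day counts are placeholder assumptions, not values from the question:

    # Sketch: keep the current version in Standard, move noncurrent (previous)
    # versions to Glacier after 7 days and expire them after a year.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-gitlab-backups",  # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-previous-versions",
                    "Filter": {"Prefix": ""},  # whole bucket
                    "Status": "Enabled",
                    "NoncurrentVersionTransitions": [
                        {"NoncurrentDays": 7, "StorageClass": "GLACIER"}
                    ],
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
                }
            ]
        },
    )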
To achieve your monthly/weekly objective, you could:
Store the first backup of each month in a particular directory (path)
Store other backups in a different directory
Apply Lifecycle rules differently to each directory
Or, you could use the same Lifecycle rules on all backups but write some code that deletes unwanted backups at various intervals (e.g. each day delete the week-old backup unless it is the first backup of the month). This code could be triggered as a daily Lambda function.
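A rough sketch of what such a daily Lambda could look like, assuming (purely for illustration) that daily backups live under a daily/YYYY-MM-DD/ prefix and that "first backup of the month" means the backup dated the 1st:

    # Rough sketch of a daily cleanup Lambda: delete week-old daily backups
    # unless they are the first backup of their month. The bucket name, the
    # "daily/" prefix and the key layout are assumptions for illustration.
    import datetime
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-gitlab-backups"  # placeholder

    def handler(event, context):
        cutoff = datetime.date.today() - datetime.timedelta(days=7)
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix="daily/"):
            for obj in page.get("Contents", []):
                # Assumes keys look like "daily/2024-01-15/backup.tar.gz".
                parts = obj["Key"].split("/")
                if len(parts) < 2 or not parts[1]:
                    continue
                try:
                    backup_date = datetime.date.fromisoformat(parts[1])
                except ValueError:
                    continue
                # Keep anything newer than a week, and keep month-opening backups.
                if backup_date < cutoff and backup_date.day != 1:
                    s3.delete_object(Bucket=BUCKET, Key=obj["Key"])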

How to restore an older version of millions of files within a directory in an S3 bucket?

We are taking a backup of our tool's home directories and configuration within the same S3 bucket, separated by directory names. We have versioning enabled on the S3 bucket. For fear of losing important files in the S3 backup as well, if something goes wrong with our distributed file system and the scheduled S3 sync job runs, we didn't enable the --delete option during s3 sync.
Now, I wonder -
If something goes wrong with the local backups and we only realise it, say, 4 days later, and we need to restore the 4-day-old versions of files within just one directory of the S3 bucket, how can I achieve that?
Also, if you can answer -
Since we didn't enable the --delete option, a restore will also bring back files which were deleted or are no longer present locally and are no longer required. How can we avoid that during the restore?
For business-critical backups, I would recommend using a "real" backup tool that can provide point-in-time restoration.
There are many backup tools that can store data in Amazon S3 and Amazon Glacier, such as CloudBerry Backup.

How to make automated S3 Backups

I am working on an app which uses S3 to store important documents. These documents need to be backed up on a daily and weekly rotation, much like how database backups are maintained.
Does S3 support a feature where a bucket can be backed up into multiple buckets periodically, or perhaps into Amazon Glacier? I want to avoid using an external service as much as possible, and was hoping S3 had some mechanism to do this, as it's a common use case.
Any help would be appreciated.
Quote from Amazon S3 FAQ about durability:
Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years
These numbers are, first of all, almost unbeatable. In other words, your data is safe in Amazon S3.
Thus, the only reason why you would need to backup your data objects is to prevent their accidental loss (by your own mistake). To solve this problem Amazon S3 enables versioning of S3 objects. Enable this feature on your S3 bucket and you're safe.
P.S. Actually, there is one more possible reason: cost optimization. Amazon Glacier is cheaper than S3. I would recommend using AWS Data Pipeline to move S3 data to Glacier routinely.
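A minimal sketch of enabling the versioning suggested above with boto3 (the bucket name is a placeholder):

    # Sketch: enable versioning so overwritten or deleted objects keep
    # previous versions. "my-documents-bucket" is a placeholder.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket="my-documents-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )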
Regarding Glacier, you can configure your bucket to move (old) S3 data to Glacier once it is older than a specified duration. This can save you cost if you want infrequently accessed data to be archived.
S3 buckets have lifecycle rules with which data can be moved automatically from S3 to Glacier.
But if you want to access these important documents frequently from the backup, you can also back your data up to another S3 bucket. This backup can be scheduled daily, weekly, etc. using AWS Data Pipeline.
*Glacier is cheaper than S3 for data that is rarely accessed.
I created a Windows application that will allow you to schedule S3 bucket backups. You can create three kinds of backups: Cumulative, Synchronized and Snapshots. You can also include or exclude root level folders and files from your backups. You can try it free with no registration at https://www.bucketbacker.com