S3 lifecycle is not working

My S3 lifecycle policy is not working. I configured it two days ago, but my objects still show the RRS storage class in S3.
Any help would be appreciated.

The process that archives files to Glacier can take up to 48 hours to take effect. However, you will receive the billing benefit immediately.
So, check again after 48 hours to confirm whether it is working as expected.
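If you want to double-check in the meantime, a quick look with boto3 (a sketch with placeholder bucket and key names) will show whether the rule is attached and what storage class an object currently reports:

import boto3

s3 = boto3.client("s3")

# Confirm the lifecycle rule is actually attached to the bucket.
config = s3.get_bucket_lifecycle_configuration(Bucket="my-bucket")
for rule in config["Rules"]:
    print(rule.get("ID"), rule["Status"], rule.get("Transitions"))

# Check what storage class an individual object currently reports.
head = s3.head_object(Bucket="my-bucket", Key="photos/example.jpg")
print(head.get("StorageClass", "STANDARD"))  # missing field means STANDARD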
By the way, RRS (Reduced Redundancy Storage) stores data in only two data centers instead of the normal three. Historically it cost less, but these days it is not cheaper, so it should be avoided. There is no benefit to using RRS (and it is possibly more expensive).

Related

Does the number or size of objects in an AWS S3 bucket have any performance implications?

We have an AWS S3 bucket that is used to store a number of files, about 50,000 a day totalling roughly 5-10 GB.
Currently we have lifecycle rules to clear the files out after 2 days. We now need to keep these files for longer (1 year). The names are unique (they start with a GUID) and we're comfortable with the cost implications.
The question I have is whether insert or retrieval performance will be affected at all.
We don't list the contents of the bucket (obviously that would be slower). The AWS documentation is vague but seems to imply that there will be no change, but I wonder if anyone has any real-world observations.
Based on my experience, I don't believe there will be any implications, as long as you are not listing the contents of the bucket in order to find the object that you want (and you said you are not).
If you already know the object key when you are about to GET it, then there should be zero performance implications.
There will be no implications for write or read performance.
Lifecycle expiration actions are queued as a background job and executed over time rather than happening immediately at the expiry time.
Remember that your S3 bucket is distributed across a large pool of nodes shared within the region, so any performance issue would affect many other customers.
The delete actions do not count towards your request quotas either.
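To illustrate the access pattern being described, here is a minimal boto3 sketch (bucket and key names are placeholders) that fetches an object directly by its GUID-prefixed key. No ListObjects call is involved, so the number of objects in the bucket has no bearing on the request:

import boto3

s3 = boto3.client("s3")

# Direct key-based access: read a known object without listing the bucket.
obj = s3.get_object(
    Bucket="my-bucket",
    Key="9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d/report.csv",
)
data = obj["Body"].read()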

Bigstore increasing almost linearly on Google Cloud

I use many APIs from Google Cloud. Recently I noticed that bigstore traffic is gradually increasing on a daily basis. I am worried that if this continues I won't be able to pay the bill.
I do not know however how to check where this increase is coming from. Is there a way to see which cloud functions are causing this increased traffic?
The reason I am surprised about the increase in bigstore traffic is that I have cron jobs running multiple times per day to store data in BigQuery. I have not changed these settings, so I would assume this traffic should not increase the way the chart shows.
One other explanation I can think of is that the amount of data that I am storing has increased, which is indeed true on a daily basis. But why does this increase the traffic?
What is the way to check this?
There are two main data sources you should use:
GCP-wide billing export. This will give you an exact breakdown of your costs, which is important for targeting your effort where the cost is largest. It also provides some level of detail about what the usage is (see the query sketch below).
Enable access & storage logging. The access log will give you an exact accounting of incoming requests down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
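As a rough illustration of the first data source, here is a sketch using the BigQuery Python client to break recent costs down by service and SKU. It assumes you have enabled billing export to BigQuery; the project, dataset, and table names are placeholders you would replace with your own export table.

from google.cloud import bigquery

client = bigquery.Client()

# Cost per service/SKU for the last 30 days, largest first.
query = """
    SELECT service.description AS service,
           sku.description AS sku,
           SUM(cost) AS total_cost
    FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
    WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY service, sku
    ORDER BY total_cost DESC
"""
for row in client.query(query).result():
    print(f"{row.service:30} {row.sku:45} ${row.total_cost:,.2f}")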
In addition, if you keep a snapshot of your bigstore, your storage charges will grow over time as you replace or even rename files: where you once had two views of the same storage, each changed file forks into two copies (one in the current view of your storage, one in the snapshot).

AWS S3 Standard Infrequent Access vs Reduced Redundancy storage class when coupled with CloudFront?

I'm using CloudFront to cache and distribute all of my thumbnails currently stored on S3 in Standard storage class. Since CloudFront caches originals and accesses them only every 24 hours, it makes sense to use a cheaper storage class than Standard: either Standard Infrequent Access (IA) or Reduced Redundancy (RR). But I'm not sure which one would be more suitable and cost effective.
Standard-IA has the cheapest storage of all (58% cheaper than the Standard class and 47% cheaper than RR), but its request pricing is 60% higher than both Standard and RR. However, all files under 128 KB stored in the Standard-IA class are billed as 128 KB, which would apply to most of my thumbnail images.
Meanwhile, storage in RR class is only 20% cheaper than Standard, but its request cost is 60% cheaper than that of Standard-IA.
I'm unsure which one would be most cost effective in practice and would appreciate anyone with experience using both to give some feedback.
There's a problem with the premise of your question. The fact that CloudFront may cache your objects for some period of time actually has little relevance when selecting an S3 storage class.
REDUCED_REDUNDANCY is sometimes less expensive¹ because S3 stores your data on fewer physical devices, reducing the reliability somewhat in exchange for lower pricing... and in the event of failures, the object is statistically more likely to be lost by S3. If S3 loses the object because of the reduced redundancy, CloudFront will at some point begin returning errors.
The deciding factor in choosing this storage class is whether the object is easily replaced.
Reduced Redundancy Storage (RRS) is an Amazon S3 storage option that enables customers to reduce their costs by storing noncritical, reproducible data at lower levels of redundancy than Amazon S3’s standard storage. It provides a cost-effective, highly available solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced.
https://aws.amazon.com/s3/reduced-redundancy/
STANDARD_IA (infrequent access) is less expensive for a different reason: the storage savings are offset by retrieval charges. If an object is downloaded more than once per month, the combined charge will exceed the cost of STANDARD. It is intended for objects that will genuinely be accessed infrequently.
Since CloudFront has multiple edge locations, each with its own independent cache,² whether an object is "currently stored in" CloudFront is not a question with a simple yes/no answer. It is also not possible to "game the system" by specifying large Cache-Control: max-age values. CloudFront has no charge for its cache storage, so it's only sensible that an object can be purged from the cache before the expiration time you specify. Indeed, anecdotal observations confirm what the docs indicate: objects are sometimes purged from CloudFront due to a relative lack of "popularity."
The deciding factor in choosing this storage class is whether the increased data transfer (retrieval) charges will be low enough to justify the storage charge savings that they offset. Unless the object is expected to be downloaded less than once or twice a month, this storage class does not represent a cost savings.
Standard/Infrequent Access should be reserved for things you really don't expect to be needed often, like tarballs and database dumps and images unlikely to be reviewed after they are first accessed, such as (borrowing an example from my world) a proof-of-purchase/receipt scanned and submitted by a customer for a rebate claim. Once the rebate has been approved, it's very unlikely we'll need to look at that receipt again, but we do need to keep it on file. Hello, Standard_IA. (Note that S3 does this automatically for me, after the file has been stored for 30 days, using a lifecycle policy on the bucket).
Standard - IA is ideally suited for long-term file storage, older data from sync and share, backup data, and disaster recovery files.
https://aws.amazon.com/s3/faqs/#sia
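For reference, the kind of 30-day transition rule mentioned above can be expressed with boto3 roughly as follows. This is a sketch, not my exact policy, and the bucket name and prefix are placeholders.

import boto3

s3 = boto3.client("s3")

# Move objects under the prefix to STANDARD_IA 30 days after they are created.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "receipts-to-standard-ia",
                "Filter": {"Prefix": "receipts/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            }
        ]
    },
)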
Side note: one alternative mechanism for saving some storage cost is to gzip -9 the content before storing, and set Content-Encoding: gzip. I have been doing this for years with S3 and am still waiting for my first support ticket to come in reporting a browser that can't handle it. Even content that is allegedly already compressed -- such as .xlsx spreadsheets -- will often shrink a little bit, and every byte you squeeze out means slightly lower storage and download bandwidth charges.
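A minimal boto3 sketch of that approach, with placeholder bucket and key names: compress the body at the highest gzip level and record the encoding so clients decompress transparently on download.

import gzip
import boto3

s3 = boto3.client("s3")

# Compress before uploading and store the encoding alongside the object.
with open("report.xlsx", "rb") as f:
    body = gzip.compress(f.read(), compresslevel=9)

s3.put_object(
    Bucket="my-bucket",
    Key="reports/report.xlsx",
    Body=body,
    ContentEncoding="gzip",
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)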
Fundamentally, if your content is easily replaceable, such as resized images where you still have the original... or reports that can easily be rerun from source data... or content backed up elsewhere (AWS is essentially always my first choice for cloud services, but I do have backups of my S3 assets stored in another cloud provider's storage service, for example)... then reduced redundancy is a good option.
¹ REDUCED_REDUNDANCY is sometimes less expensive only in some regions as of late 2016. Prior to that, it was priced lower than STANDARD, but in an odd quirk of the strange world of competitive pricing, as a result of S3 price reductions announced in November, 2016, in some AWS regions, the STANDARD storage class is now slightly less expensive than REDUCED_REDUNDANCY ("RRS"). For example, in us-east-1, Standard was reduced from $0.03/GB to $0.023/GB, but RRS remained at $0.024/GB... leaving no obvious reason to ever use RRS in that region. The structure of the pricing pages leaves the impression that RRS may no longer be considered a current-generation offering by AWS. Indeed, it's an older offering than both STANDARD_IA and GLACIER. It is unlikely to ever be fully deprecated or eliminated, but they may not be inclined to reduce its costs to a point that lines up with the other storage classes if it's no longer among their primary offerings.
² "CloudFront has multiple edge locations, each with its own independent cache" is still a technically true statement, but CloudFront quietly began to roll out and then announced some significant architectural changes in late 2016, with the introduction of the regional edge caches. It is now, in a sense, "less true" that the global edge caches are indepenent. They still are, but it makes less of a difference, since CloudFront is now a two-tier network, with the global (outer tier) edge nodes sometimes fetching content from the regional (inner tier) edge nodes, instead of directly from the origin server. This should have the impact of increasing the likelihood of an object being considered to be "in" the cache, since a cache miss in the outer tier might be transformed into a hit by the inner tier, which is also reported to have more available cache storage space than some or all of the outer tier. It is not yet clear from external observations how much of an impact this has on hit rates on S3 origins, as the documentation indicates the regional edges are not used for S3 (only custom origins) but it seems less than clear that this universally holds true, particularly with the introduction of Lambda#Edge. It might be significant, but as of this writing, I do not believe it to have any material impact on my answer to the question presented here.
Since CloudFront caches originals and accesses them only every 24 hours
You can actually make CloudFront cache things for much longer if you want. You just need to add metadata to your objects that sets a Cache-Control header, and according to the S3 documentation you can specify an age of up to 100 years. You simply set a max-age in seconds, so if you really want your objects cached for 100 years:
Cache-Control: max-age=3153600000
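With boto3, for example, that header can be set at upload time (a sketch with placeholder names):

import boto3

s3 = boto3.client("s3")

# Upload a thumbnail with a long Cache-Control header; CloudFront and browsers
# use it to decide how long the object may stay cached.
with open("photo-123.jpg", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="thumbnails/photo-123.jpg",
        Body=f,
        ContentType="image/jpeg",
        CacheControl="max-age=3153600000",  # ~100 years, as above
    )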
As for your main question regarding SIA vs. RR, you've pretty much hit on all the differences between the two. It's just a matter of calculating the costs of using one vs. the other. You'll just need to run some calculations and see what the cost estimates are. If you have 100 thumbnails all under 128K then SIA will charge you for 100 * 128K bytes, whereas RR will just charge you for the costs of the total size of those 100 thumbnails. Similarly, if you set a fairly high cache timeout in CloudFront then you may see only 10 fetches from S3 each day, so SIA would charge you for retrieval of 10 * 128K bytes each day while RR would only charge you for the cost of the size of those 10 thumbnails.
Using some real numbers based on the size & quantity of your thumbnails and the amount of traffic you anticipate it should be pretty easy to come up with cost estimates.
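As a starting point, here is a rough sketch of that calculation in Python. The per-GB and per-request prices are placeholders for illustration; substitute the current figures for your region from the S3 pricing page. It captures the two mechanics discussed above: the 128 KB billing floor for Standard-IA and its per-GB retrieval charge.

def storage_class_cost(n_thumbs, avg_kb, fetches_per_month,
                       storage_per_gb, retrieval_per_gb, per_10k_requests,
                       min_billable_kb=0):
    billable_kb = max(avg_kb, min_billable_kb)          # IA bills a 128 KB floor
    storage_gb = n_thumbs * billable_kb / (1024 * 1024)
    retrieved_gb = fetches_per_month * avg_kb / (1024 * 1024)
    return (storage_gb * storage_per_gb
            + retrieved_gb * retrieval_per_gb
            + fetches_per_month / 10_000 * per_10k_requests)

# Example: 100 thumbnails of ~40 KB each, 10 CloudFront origin fetches per day.
ia = storage_class_cost(100, 40, 300, storage_per_gb=0.0125,
                        retrieval_per_gb=0.01, per_10k_requests=0.01,
                        min_billable_kb=128)
rr = storage_class_cost(100, 40, 300, storage_per_gb=0.024,
                        retrieval_per_gb=0.0, per_10k_requests=0.004)
print(f"Standard-IA: ${ia:.4f}/month   RR: ${rr:.4f}/month")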
FYI, you might also want to take a look at some of these slideshows and/or these videos. These are all from Amazon's re:Invent conferences, and these links should provide you with S3-specific presentations at those conferences.

Is switching over to Amazon S3 for Drupal 7 image hosting worth it?

So I just have a quick question with regards to using amazon s3.
I have a small Drupal 7 site hosted on a VPS with not too much storage space. I put together the site for members of my School's Photographic Arts Committee to upload photos of School events and projects.
The full-quality photos are stored in a private folder on the server, and the images displayed on the site are watermarked 2048px width ones stored publicly.
I'm worried that I'm going to blow through my storage space very fast, and I fear that I'm going to blow my practically non-existent budget by using Amazon S3 with the Drupal module.
So, I would like to know whether Amazon S3 is a worthwhile investment; I'd be willing to spend roughly $5 on it.
My monthly usage will include about 3 GB of uploads and probably 20 GB of downloads at most, slowly increasing over time.
Also, I'm a bit confused about storage billing: do I have to pay for, say, the 50 GB of storage accumulated from previous months' uploads, or just the 3 GB I uploaded this month?
PS: I live in South Africa and will probably use the Ireland S3 servers as they have the best latency.
Any feedback much appreciated!
Thanks.
S3 may be a good option in your case, given your limited storage space.
You can calculate things fairly easily. Ignoring the 'requests' charge since it's tiny, here's the formula for Ireland:
(GB of storage * 0.03) + (avg image size in GB * requests * 0.09) + (requests * 0.004/10000)
There are some volume discounts and some "first N free" transfer allowances, but this is a good ceiling, especially for a low-volume site like yours. Also note that storing the full-size images (and not downloading them) means only the first term of the formula matters. As an example, if you have 5 GB of full-size images plus another 1 GB of 350 KB "2048px" images that add up to 10,000 image views per month:
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000*.09)+(10000*.004/10000)=0.331
So, your monthly costs are about 50 cents.
What happens if your site is slashdotted? Imagine you get 10 million hits:
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000000*.09)+(10000000*.004/10000)=301.03
So, your monthly cost is now over $300. (this is why billing alarms are important!)
Now, let's imagine you put CloudFront in front of S3 (which is a really good idea for several reasons) and look at the pricing in this scenario. (I've simplified the pricing here a little bit, and I'm assuming nothing is loaded twice by the same browser, so no caching benefit.)
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000000*.085)+(10000000*.009/10000)=289.53
so it saved about $10 but gave you better performance.
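If it helps, the same formula is easy to turn into a small Python sketch. The constants below are the illustrative Ireland prices used in this answer; check the current pricing pages before relying on the output.

STORAGE_PER_GB = 0.03   # S3 standard storage, $/GB-month
TRANSFER_PER_GB = 0.09  # data transfer out to the internet, $/GB
GET_PER_10K = 0.004     # GET requests, $ per 10,000

def s3_monthly_cost(storage_gb, avg_image_gb, views):
    return (storage_gb * STORAGE_PER_GB
            + avg_image_gb * views * TRANSFER_PER_GB
            + views / 10_000 * GET_PER_10K)

# 5 GB of originals (never downloaded) plus 1 GB of 350 KB "2048px" images:
print(s3_monthly_cost(5, 0, 0) + s3_monthly_cost(1, 0.00033, 10_000))      # ~0.48
print(s3_monthly_cost(5, 0, 0) + s3_monthly_cost(1, 0.00033, 10_000_000))  # ~301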
If you need more features (image resizing, for instance), you may want to consider a photo host like Flickr or Smugmug. They pay for bandwidth, which makes your costs more predictable.

Backup: Amazon S3 or Glacier - lots of little files?

I'm trying to understand the complicated Amazon Glacier pricing model. I don't want to store a huge amount of data, just a few GB, say 10. I hope never to download the files, and if I did need to, I don't care how long it takes.
Is there a cost per file I upload? Is it cheaper to zip lots of tiny files and upload them in a few chunks, or do, say, 10,000 images not matter? (I cannot get a straight answer to this from searching.)
Am I able to request the download of a whole Archive/Bucket or is it file-by-file?
I know this is a bit old, but you may still find my answer helpful (I hope). The other answer is based on S3, which I believe wasn't your question.
Glacier is intended for rare file access. With that in mind, they sort of punish you if you need to retrieve many files at once. In your particular case I would suggest uploading 10,000 separate files instead of, say, 100 ZIP files with 100 files each. The reason is very simple: Glacier lets you download only 5% of the total archive for free each month, prorated daily. So if, for example, you need to download 10 photos you took on a weekend, you would be able to get those 10 photos for free if they are spread through the vault. On the other hand, if you have a ZIP file that holds 100 photos, you'll be forced to download a zip that will probably be more than 5% of the total archive, meaning you'll pay retrieval fees.
The only reason it makes sense to upload fewer files is to avoid a high number of upload requests (10,000 files usually mean 10,000 requests). Requests are charged at $0.05 per 1,000. These fees are much lower than the retrieval fees (taking into account the limits imposed), which is why I would always recommend uploading separate files. Of course, you may zip files that make sense to keep together.
Retrieval costs are very complex in Amazon Glacier. They have a good explanation here:
http://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retrieve_for_free
But even there you'll need to pay attention to the calculations to get a clear idea of how costs are billed.
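As a rough back-of-the-envelope illustration of the trade-off under the old 5% policy described above (numbers are only indicative):

# Free retrieval allowance: 5% of stored data per month, prorated daily.
stored_gb = 10
daily_free_gb = stored_gb * 0.05 / 30
print(f"free retrieval allowance: ~{daily_free_gb * 1024:.0f} MB per day")

# Upload request fees at $0.05 per 1,000 requests (one-time):
print(f"10,000 individual uploads: ${10_000 / 1_000 * 0.05:.2f}")
print(f"100 zip archives:          ${100 / 1_000 * 0.05:.3f}")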
Regarding this question:
Am I able to request the download of a whole Archive/Bucket or is it file-by-file?
Retrieval requests are file-by-file, although you can select many files at once and download them together.
Deciding whether to use S3 or Glacier really depends on how often you need to access your files. If you will rarely need access to them, then Glacier is your answer. Otherwise, for 10 GB, S3 can still be cheap and is more flexible than Glacier.
In my case, I find family photos to be a very precious thing, which is why I have a 100 GB backup on Glacier with all my family photos. I don't intend to access it unless there is some kind of disaster at home. In that case, I don't think I would mind the retrieval cost if it saved something I really care about. But that's just me.
Detailed pricing information for S3 is available here. Specifics of the API functions available are here.
For S3, you are mostly charged for upload bandwidth (bytes sent TO S3), download bandwidth (bytes received FROM S3), and storage (bytes IN S3). You are also charged for the number and type of API calls.
So, if you upload your 10GB of data to S3 in 10,000 1MB files, store it for a month, and then download each of the files once, you'll be charged:
$0.00 for upload bandwidth (this is free)
$0.10 for the 10,000 PUT requests to upload the files
$0.95 for storing the 10GB for a month
$1.08 for 10GB download bandwidth (the first GB is free, then $0.12/GB)
$0.01 for the 10,000 GET requests to download the files
That's $2.14. If you uploaded and downloaded once each, but kept the data for a year, only the storage cost would go up to 12 * $0.95, or $11.40. If your files averaged only 100KB, so you had 100,000 of them, you'd pay 10 times as much for the PUT and GET requests, or $1.10 instead of $0.11.
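For what it's worth, the arithmetic above is easy to reproduce; the prices are the ones used in this answer and have since changed, so treat this as a sketch of the method rather than current numbers.

PUT_PER_1K = 0.01       # $ per 1,000 PUT requests
GET_PER_10K = 0.01      # $ per 10,000 GET requests
STORAGE_PER_GB = 0.095  # $ per GB-month
XFER_OUT_PER_GB = 0.12  # $ per GB after the first free GB

files, size_gb = 10_000, 10

total = (files / 1_000 * PUT_PER_1K          # $0.10 upload requests
         + size_gb * STORAGE_PER_GB          # $0.95 one month of storage
         + (size_gb - 1) * XFER_OUT_PER_GB   # $1.08 download bandwidth
         + files / 10_000 * GET_PER_10K)     # $0.01 download requests
print(f"${total:.2f}")  # $2.14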
You can only upload and download a single file per operation. If you combined your files into one using Zip, you'd only save by using fewer operations, which, as you can see, are pretty cheap to start with.
There is one quirk here, though. I'm pretty sure you are charged for all bandwidth usage when uploading and downloading, including request headers, not just the bodies containing your data. So if your files were really tiny the request headers might become significant, perhaps as much as the files themselves. In that case your bandwidth costs would double.
Glacier pricing is more complicated, and I've never used it myself. Basically, it reduces storage cost by almost ten-fold, leaving the other costs the same, and adding costs to archive and restore per object. Those costs seem to be significant if you have a lot of small objects, need to get a lot of your files at a time, or get files frequently. Glacier seems to be best when you have a lot of data (terabytes or more, not just gigabytes), but few operations. Given that you only have 10GB of data, S3 is so inexpensive it doesn't seem worth it to consider Glacier.
Finally, AWS has a free usage tier for the first year, which looks like it would cover all your costs except for half the storage charges.
Better to use a few larger files than lots of small ones
There are two approaches to putting files into Amazon Glacier. You either interact with vaults directly, or use S3 as frontend.
I am using S3 (and the AWS Management Console) so that I can see the contents of the archive and at the same time have it stored cheaply in Glacier.
This approach has one drawback: storing any piece of information in Glacier carries some data overhead (which you pay for too), so there is logically a break-even point. Before the April 2014 price reduction I did the calculation, and the critical size was about 16 kB; storing smaller files in Glacier (using AWS S3 as the frontend) was more expensive than keeping them only on S3. With the price reduction for S3 storage (Glacier did not change), the break-even point has moved even higher.
I expect that even without S3 as a frontend the situation is similar, though a bit more friendly to smaller files.
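The break-even calculation itself is simple. The sketch below assumes the per-object overhead AWS documents for lifecycle transitions (roughly 32 KB of Glacier-rate index data plus 8 KB of S3-rate metadata per archived object) and uses illustrative prices; the exact crossover point moves with the current price list.

S3_PER_GB = 0.03
GLACIER_PER_GB = 0.01
KB_PER_GB = 1024 * 1024

def s3_cost(size_kb):
    return size_kb / KB_PER_GB * S3_PER_GB

def glacier_cost(size_kb):
    return ((size_kb + 32) / KB_PER_GB * GLACIER_PER_GB   # data + Glacier index
            + 8 / KB_PER_GB * S3_PER_GB)                  # S3 metadata stub

for kb in (4, 16, 64, 256):
    cheaper = "Glacier" if glacier_cost(kb) < s3_cost(kb) else "S3"
    print(f"{kb:>4} kB object: cheaper in {cheaper}")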
Since November 21, 2016, Amazon has a new free tier policy for Glacier retrievals, replacing the "5% of your average monthly storage" allowance with a flat 10 GB free per month. However, if your retrieval policy was set prior to that day, then you're still on the "5%" policy and the other answers here still apply to you.
If your retrieval policy was set after Nov 21, 2016, and you're in the OP's shoes:
You're only storing 10 GB, so you could retrieve all of your data for free once per month using Standard retrievals. For retrievals, it would make no difference whether all 10,000 photos are zipped into one file or not.
The only variable in this scenario is the number of upload requests. 10,000 requests at $0.05 per 1,000 is only $0.50, and that's a one-time fee in your specific case.
More pricing info at AWS Glacier FAQ
UPDATE:
Glacier docs recommend using multipart upload for files larger than 100MB.
I came to this conclusion independently after a couple of timeouts when trying to upload an 8 GB file.
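If you upload through S3 (for example with a lifecycle rule into Glacier, as described in an earlier answer) rather than through the native Glacier API, boto3's managed transfer can do the multipart splitting and per-part retries for you. A rough sketch, with placeholder file and bucket names:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split anything over 100 MB into parts and upload a few parts in parallel;
# individual parts are retried, so one timeout no longer fails the whole upload.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=4,
)
s3.upload_file("backup-2016.tar", "my-backup-bucket", "backup-2016.tar",
               Config=config)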