AWS S3 how to get prefix cost in period - amazon-web-services

I have a bucket that receives something around 20 new prefixes in a day.
The prefixes have files that are our products, and we need to know how much each product costs to keep on air.
I was researching how to get the total cost of each product (storage and data transfer) with 'Cost Explorer' and 'CloudWatch'.
The first does not seems to help me, while cloudwatch does have prefix or tags options, but I need to previously specify which prefix to watch over.
Is there a way to get this cost without previous configuration?

Cost is easy, since it is based on volume of data. Use Amazon S3 Inventory to obtain a daily listing of content.
Access costs is not available broken down by prefix. Instead, use Amazon S3 Server Access Logging to break down access by object and, therefore, by prefix. Then allocate the billed data transfer costs amongst prefixes. Use Bytes Sent to determine volume.

Related

Costs Related to Individual Bucket Items in S3

My AWS S3 costs have been going up pretty quickly for usage type "DataTransfer-Out-Bytes". I have thousands of images in this one bucket and I can't seem to find a way to drill down into the bucket to see which individual bucket items might be causing the increase. Is there a way to see which individual files are attributing to the higher data transfer cost?
Use Cloudfront if you can - its cheaper(if you properly set your cache headers!) than directly hosting from S3 and Cloudfront includes a popular objects report - which would answer your question.
If your using S3 alone you need to enable logging on the bucket (more storage cost) and then crunch the data in the logs (more data transfer cost) to get your answer. You can use AWS Athena to process the s3 access logs or use unix command line tools like grep/wc/uniq/cut to operate on the log files locally/from a server to find the culprits.

How to limit number of reads from Amazon S3 bucket

I'm hosting a static website in Amazon S3 with CloudFront. Is there a way to set a limit for how many reads (for example per month) will be allowed for my Amazon S3 bucket in order to make sure I don't go above my allocated budget?
If you are concerned about going over a budget, I would recommend Creating a Billing Alarm to Monitor Your Estimated AWS Charges.
AWS is designed for large-scale organizations that care more about providing a reliable service to customers than staying within a particular budget. For example, if their allocated budget was fully consumed, they would not want to stop providing services to their customers. They might, however, want to tweak their infrastructure to reduce costs in future, such as changing the Price Class for a CloudFront Distribution or using AWS WAF to prevent bots from consuming too much traffic.
Your static website will be rather low-cost. The biggest factor will likely be Data Transfer rather than charges for Requests. Changing the Price Class should assist with this. However, the only true way to stop accumulating Data Transfer charges is to stop serving content.
You could activate CloudTrail data read events for the bucket, create a CloudWatch Event Rule to trigger an AWS Lambda Function that increments the number of reads per object in an Amazon DynamoDB table and restrict access to the objects once a certain number of reads has been reached.
What you're asking for is a very typical question in AWS. Unfortunately with near infinite scale, comes near infinite spend.
While you can put a WAF, that is actually meant for security rather than scale restrictions. From a cost-perspective, I'd be more worried about the bandwidth charges than I would be able S3 requests cost.
Plus once you put things like Cloudfront or Lambda, it gets hard to limit all this down.
The best way to limit, is to put Billing Alerts on your account -- and you can tier them, so you get a $10, $20, $100 alerts, up until the point you're uncomfortable with. And then either manually disable the website -- or setup a lambda function to disable it for you.

Reducing AWS Data Transfer Cost for AWS S3 files

AWS S3 has a standard public bucket and folder (Asia Pacific region) which hosts ~30 GB of images/media. On another hand, the website and app access these images by using a direct S3 object URL. Unknowingly we run into high data transfer cost and its significantly unproportionate:
Amazon Simple Storage Service: USD 30
AWS Data Transfer: USD 110
I have also read that if EC2 and S3 is in the same region cost will be significantly lower, but problem is S3 objects are accessible from anywhere in the world from client machine directly and no EC2 is involved in between.
Can someone please suggest how can data transfer costs be reduced?
The Data Transfer charge is directly related to the amount of information that goes from AWS to the Internet. Depending on your region, it is typically charged at 9c/GB.
If you are concerned about the Data Transfer charge, there are a few things you could do:
Activate Amazon S3 Server Access Logging, which will create a log file for each web request. You can then see how many requests are coming and possibly detect strange access behaviour (eg bots, search engines, abuse).
You could try reducing the size of files that are typically accessed, such as making images smaller. Take a look at the Access Logs and determine which objects are being accessed the most, and are therefore causing the most costs.
Use less large files on your website (eg videos). Again, look at the Access Logs to determine where the data is being used.
A cost of $110 suggests about 1.2TB of data being transferred.

How to migrate millions of files in aws s3 bucket from one account to another really fast

I have an s3 bucket in account A with millions of files that take up many GBs
I want to migrate all this data into a new bucket in account B
So far, I've given account B permissions to run s3 commands on the bucket in account A.
I am able to get some results with the
aws s3 sync command with the setting aws configure set default.s3.max_concurrent_requests 100
its fast but it only does a speed of some 20,000 parts per minute.
Is there an approach to sync/move data across aws buckets in different accounts REALLY fast?
I tried to do aws transfer acceleration but it seems that that is good for uploading and downloading from the buckets and I think it works within an aws account.
20,000 parts per minute.
That's > 300/sec, so, um... that's pretty fast. It's also 1.2 million per hour, which is also pretty respectable.
S3 Request Rate and Performance Considerations implies that 300 PUT req/sec is something of a default performance threshold.
At some point, make too many requests too quickly and you'll overwhelm your index partition and you'll start encountering 503 Slow Down errors -- though hopefully aws-cli will handle that gracefully.
The idea, though, seems to be that that S3 will scale up to accommodate the offered workload, so if you leave this process running, you may find that it actually does get faster with time.
Or...
If you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Note, also, that it says "temporary limits." This is where I come to the conclusion that, all on its own, S3 will -- at some point -- provision more index capacity (presumably this means a partition split) to accommodate the increased workload.
You might also find that you get away a much higher aggregate trx/sec if you run multiple separate jobs, each handling a different object prefix (e.g. asset/1, asset/2, asset/3, etc. depending on how the keys are designed in your bucket, because you're not creating such a hot spot in the object index.
The copy operation going on here is an internal S3-to-S3 copy. It isn't download + upload. Transfer acceleration is only used for actual downloads.

can i meter or set a size limit to an s3 folder

I'd like to set up a separate s3 bucket folder for each of my mobile app users for them to store their files. However, I also want to set up size limits so that they don't use up too much storage. Additionally, if they do go over the limit I'd like to offer them increased space if they sign up for a premium service.
Is there a way I can set folder file size limits through s3 configuration or api? If not would I have to use the apis somehow to calculate folder size on every upload? I know that there is the devpay feature in Amazon but it might be a hassle for users to sign up with Amazon if they want to just use small amount of free space.
There does not appear to be a way to do this, probably at least in part because there is actually no such thing as "folders" in S3. There is only the appearance of folders.
Amazon S3 does not have concept of a folder, there are only buckets and objects. The Amazon S3 console supports the folder concept using the object key name prefixes.
— http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
All of the keys in an S3 bucket are actually in a flat namespace, with the / delimiter used as desired to conceptually divide objects into logical groupings that look like folders, but it's only a convenient illusion. It seems impossible that S3 would have a concept of the size of a folder, when it has no actual concept of "folders" at all.
If you don't maintain an authoritative database of what's been stored by clients (which suggests that all uploads should pass through an app server rather than going directly to S3, which is the the only approach that makes sense to me at all) then your only alternative is to poll S3 to discover what's there. An imperfect shortcut would be for your application to read the S3 bucket logs to discover what had been uploaded, but that is only provided on a best-effort basis. It should be reliable but is not guaranteed to be perfect.
This service provides a best effort attempt to log all access of objects within a bucket. Please note that it is possible that the actual usage report at the end of a month will slightly vary.
Your other option is to develop your own service that sits between users and Amazon S3, that monitors all requests to your buckets/objects.
— http://aws.amazon.com/articles/1109#13
Again, having your app server mediate all requests seems to be the logical approach, and would also allow you to detect immediately (as opposed to "discover later") that a user had exceeded a threshold.
I would maintain a seperate database in the cloud to hold each users total hdd usage count. Its easy to manage the count via S3 Object Lifecycle Events which could easily trigger a Lambda which in turn writes to a DB.