Does AWS S3 offer any kind of rate limiting or protection against abuse for publicly accessible files? - amazon-web-services

I have a web app which serves media files (in other words, pretty large files) with public access. The files are hosted on S3. I'm wondering if AWS offers any kind of abuse protection, for example detection of or protection against download hogs via some type of rate limiting. A scenario might be a single source re-downloading the same content repeatedly. I was hoping there might be some mechanism to detect that behavior and either take preventative action or notify me.
I'm looking at AWS docs and don't see anything but perhaps I'm not looking smartly enough.
How do folks who host files which are available publicly handle this?

S3 is mostly a file storage service, with elementary web server capabilities. I would highly recommend you place a CDN between your end users and S3. A good CDN will provide protection from the sort of abuse you are talking about, while also serving the files to the user more quickly.
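If the CDN you pick is CloudFront, you can also attach an AWS WAF rate-based rule to the distribution, which is the closest thing to per-source rate limiting you'll get here. A rough sketch (the rule name, limit, and metric names are placeholders, not anything S3 gives you out of the box):

```python
# Sketch: a WAF rate-based rule that blocks IPs exceeding ~2000 requests
# per 5-minute window on a CloudFront distribution. Names/limits are placeholders.
import boto3

# WAF resources for CloudFront must be created in us-east-1 with Scope=CLOUDFRONT
waf = boto3.client("wafv2", region_name="us-east-1")

response = waf.create_web_acl(
    Name="media-abuse-protection",            # hypothetical name
    Scope="CLOUDFRONT",
    DefaultAction={"Allow": {}},               # allow everything by default
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 0,
            "Statement": {
                "RateBasedStatement": {
                    "Limit": 2000,             # requests per 5-minute window, per IP
                    "AggregateKeyType": "IP",
                }
            },
            "Action": {"Block": {}},           # block offenders until they slow down
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "rateLimitPerIp",
            },
        }
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "mediaAbuseProtection",
    },
)
# The returned web ACL ARN is then referenced in the CloudFront distribution config.
print(response["Summary"]["ARN"])
```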

If you are mostly worried about how the abuse will affect your bill (and it can get very large, so it's good to be concerned about this), I would suggest that you put some billing alerts on your account that alarm when certain thresholds are reached.
I have step alarms set on my account so that I know when it hits 25%, 50%, 75% and 100% of what I budget each month. That way, for example, if I hit the alarm that tells me I have used 25% of my budget in the first two days of the month, I know I'd better look into it.
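If you want to script that, it can be done against the AWS/Billing EstimatedCharges metric (billing metrics only exist in us-east-1 and have to be enabled in your billing preferences first). A rough sketch, with the monthly budget and SNS topic as made-up placeholders:

```python
# Sketch: one CloudWatch alarm per budget threshold (25/50/75/100%).
# MONTHLY_BUDGET and the SNS topic ARN are placeholders.
import boto3

MONTHLY_BUDGET = 200.0                               # USD, hypothetical budget
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:111111111111:billing-alerts"

# Billing metrics are only published in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

for pct in (25, 50, 75, 100):
    cloudwatch.put_metric_alarm(
        AlarmName=f"billing-{pct}pct-of-budget",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=6 * 60 * 60,                          # billing data updates a few times a day
        EvaluationPeriods=1,
        Threshold=MONTHLY_BUDGET * pct / 100,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[ALERT_TOPIC_ARN],              # notify via SNS (email, etc.)
    )
```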

Related

Cloud File Storage with Bandwidth Limits

I want to develop an app for a friend's small business that will store/serve media files. However, I'm afraid of a piece of media going viral, or of getting DDoS'd. The bill could go up quite easily with a service like S3 and I really want to avoid surprise expenses like that. Ideally I'd like some kind of max-bandwidth limit.
Now, a solution for S3 has been posted here.
But it does require quite a few steps. So I'm wondering if there is a cloud storage solution that makes this simpler, i.e. where I don't need to create a custom microservice. I've talked to support at DigitalOcean and they don't support this either.
So in the interest of saving time, and perhaps for anyone else who finds themselves in a similar dilemma, I want to ask this question here; I hope that's okay.
Thanks!
Not an out-of-the-box solution, but you could:
Keep the content private
When rendering a web page that contains the file or links to the file, have your back-end generate an Amazon S3 pre-signed URL to grant time-limited access to the object
The back-end could keep track of the "popularity" of the file and, if it exceeds a certain rate (e.g. 1,000 requests over 15 minutes), it could instead point to a small file with a message of "please try later"
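A rough sketch of that flow (the bucket name, threshold, and in-memory counter are placeholders; a real back-end would keep the counts in a database or cache):

```python
# Sketch: serve pre-signed URLs, falling back to a "please try later" object
# once a file gets too popular. Bucket/threshold/counter are placeholders.
import time
from collections import defaultdict, deque

import boto3

s3 = boto3.client("s3")
BUCKET = "my-media-bucket"                 # hypothetical bucket
WINDOW_SECONDS = 15 * 60
MAX_HITS_PER_WINDOW = 1000
FALLBACK_KEY = "please-try-later.txt"      # small "come back later" object

_hits = defaultdict(deque)                 # key -> timestamps of recent requests

def url_for(key: str) -> str:
    now = time.time()
    hits = _hits[key]
    hits.append(now)
    # Drop hits that fell out of the 15-minute window.
    while hits and hits[0] < now - WINDOW_SECONDS:
        hits.popleft()

    target = key if len(hits) <= MAX_HITS_PER_WINDOW else FALLBACK_KEY
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": target},
        ExpiresIn=300,                     # link valid for 5 minutes
    )
```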

How to measure my clients API and Bandwidth (Store) usage in Google Cloud Platform?

I have an App that consumes my own API (Google Cloud Functions) and my own Storage (which holds images).
Now I have a couple of clients that want to consume my API and my Storage (a Google Cloud Storage bucket).
The Cloud Storage bucket contains a lot of photos that have public read access.
I'm trying to define a tier pricing model, in which the price depends on 2 things:
The number of API calls,
The Cloud Storage Bandwidth
Meaning, I want to set some pricing in relation to the costs they are consuming on my Google Cloud account.
To give an example:
If a client does between 1 and 500,000 API calls, I'll charge them 10 dollars. Between 500,001 and 1,000,000, I'll charge 18 dollars, etc.
Same thing for the Cloud Storage bandwidth: if they consume between 0 GB and 10 GB, it's going to cost 10 dollars. If they consume between 10 GB and 100 GB, it's going to cost 18 dollars, etc.
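(Just to be explicit, the tier lookup itself is trivial once I have the usage numbers; the hard part, as asked below, is getting those numbers out of Google Cloud. A throwaway sketch using the example tiers above, where the top tier is made up:)

```python
# Sketch: map a usage value onto the example pricing tiers from the question.
# Tier boundaries/prices beyond the examples given are placeholders.
API_CALL_TIERS = [(500_000, 10), (1_000_000, 18), (float("inf"), 30)]
BANDWIDTH_GB_TIERS = [(10, 10), (100, 18), (float("inf"), 30)]

def tier_price(usage, tiers):
    """Return the price of the first tier whose upper bound covers the usage."""
    for upper_bound, price in tiers:
        if usage <= upper_bound:
            return price
    raise ValueError("tiers must end with an open-ended bound")

monthly_bill = (
    tier_price(740_000, API_CALL_TIERS)        # -> 18 dollars
    + tier_price(7.5, BANDWIDTH_GB_TIERS)      # -> 10 dollars
)
```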
How can I do it with Google Cloud? How can I know how my clients are consuming? And is there a way to share that information with them, so they are able to monitor the usage every day?
I'm thinking that measuring the API usage is not going to be THAT hard, because I can just save a value in the DB every time the user calls the API, but if there is a way to avoid that it would be good, because Google Cloud is going to charge me for that DB write action (which I would only be using to track API usage).
On the other hand, for measuring the Cloud Storage, I was thinking something like this:
Let's suppose I have a Public Bucket with photos in the URL: buckets.google.com/photos.
If my client wants to get the /cats/ugly-cat.jpg photo, I can ask them to call A FUNCTION at /api/get-photo/?url=/cats/ugly-cat.jpg, so in that Function I can track that the user just got a photo, and then I redirect the call to the real URL where the user is going to see the photo (buckets.google.com/photos/cats/ugly-cat.jpg). As you can see, this idea seems to perform poorly, because it's going to incur charges for the Function usage, the DB write, and also the Storage bandwidth. And even then, that way doesn't track the bandwidth; it only tracks the number of photos that the client wants to show.
As you can see, both ideas are a bit ugly, with poor performance.
There should be something already done that makes it beautiful.
Obviously, the API call (and also the photo link) could include the client's API key, to help measure the usage. Something like:
functions.google.com/api/search-photos/?api-key=111, and
bucket.google.com/photos/cats/ugly-cat.jpg?api-key=111
Where 111 identifies the client 111.
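For clarity, the proxy idea would look roughly like this as an HTTP Cloud Function (the function name, the logging call, and the bucket URL are placeholders; as noted above, it counts photos served, not bandwidth):

```python
# Sketch of the "track then redirect" idea: count the request, then send the
# client to the public object URL. Names and the logging step are placeholders.
import functions_framework
from flask import redirect

PUBLIC_BASE = "https://storage.googleapis.com/photos"   # hypothetical bucket URL

def record_usage(api_key: str, path: str) -> None:
    # Placeholder: write to Firestore/BigQuery, or just print to Cloud Logging
    # and aggregate later with a log sink instead of a per-request DB write.
    print(f"usage api_key={api_key} path={path}")

@functions_framework.http
def get_photo(request):
    path = request.args.get("url")          # e.g. /cats/ugly-cat.jpg
    api_key = request.args.get("api-key")
    if not path or not api_key:
        return ("missing url or api-key", 400)
    record_usage(api_key, path)
    # 302 to the real object; note this counts objects served, not bytes.
    return redirect(f"{PUBLIC_BASE}{path}", code=302)
```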
So, the question: Do you know if there is a "best-known" way to do measure those usages?
I think Cloud Endpoints is the best solution for you because managing your API as you suggest might get unwieldy quickly.
Endpoints provides all the tools to control authentication, quota and cost management and a developer portal so your users can access documentation and interact with your API. It also integrates with all Cloud Platform products including Cloud Functions.

Estimate AWS cost

The company I work for right now is planning to use AWS to host a new website for a client. Their old website had roughly 75,000 sessions and 250,000 page views per year. We haven't used AWS before and I need to give a rough cost estimate to my project manager.
This new website is going to be mostly content-driven with a CMS backend (probably WordPress) + a cost calculator for their services. Can anyone give me a rough idea of the cost to host this kind of website on AWS?
I have used the Simple Monthly Calculator with a single Linux t2.small (3-year upfront), which gave me around $470.
(forgive my English)
The only way to know the cost is to know the actual services you will consume (Amazon EC2, Amazon EBS, database, etc). It is not possible to give an accurate "guess" of these requirements because it really does depend upon the application and usage patterns.
It is normally recommended that you implement the system and run it for a while before committing to Reserved Instances so that you have a chance to measure performance and test a few different instance types.
Be careful using T2 instances for production workloads. They are very powerful instances, but if the CPU Credits run out, the amount of CPU is limited.
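If you do go with T2, it is worth keeping an eye on the CPUCreditBalance metric in CloudWatch. A quick sketch (the instance ID is a placeholder):

```python
# Sketch: read the remaining CPU credits for a T2 instance over the last hour.
# The instance ID is a placeholder.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```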
Bottom line: Implement, measure, test. Then you'll know what is right for your needs.
Take Note
When you are new to AWS you get a one-year free tier that covers a single t2.micro.
Just pointing that out; looking at your requirements you may not need much more than this.
One load balancer and app server should be fine (just use Route 53 to serve some static pages from S3 while upgrading or scaling).
Email subscriptions and processing of some documents can be handled with AWS Lambda, SNS and SQS, which may further reduce the cost (you may be able to reduce the server size and do all the heavy lifting from Lambda; see the sketch below).
A simple webpage with 3,000 requests/month can be handled by a t2.micro, which is almost free for one year as mentioned in the note above.
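To make the Lambda/SNS/SQS part concrete, here is a minimal sketch (in Python; the queue URL is a placeholder) of a Lambda that takes an SNS notification, e.g. a new email subscription, and queues it for heavier processing later:

```python
# Sketch: Lambda triggered by SNS, pushing each message onto an SQS queue
# so heavier processing happens off the web server. Queue URL is a placeholder.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/subscription-tasks"

def handler(event, context):
    records = event.get("Records", [])
    for record in records:
        message = record["Sns"]["Message"]      # standard SNS -> Lambda event shape
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"task": "process-subscription", "payload": message}),
        )
    return {"queued": len(records)}
```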
You don't have a lot of details in your question. AWS has a wide variety of services that you could be using in that scenario. To accurately estimate costs, you should gather these details:
What will the AWS storage be used for? A database, applications, file storage?
How big will the objects be? Each type of storage has different limits on individual file size, estimate your largest object size.
How long will you store these objects? This will help you determine static, persistent or container storage.
What is the total size of the storage you need? Again, different products have different limits.
How often do you need to do backup snapshots? Where will you store them?
Every cloud vendor has a detailed calculator to help you determine costs. However, to use them effectively you need to have all of these questions answered and you need to understand what each product is used for. If you would like to get a quick estimate of costs, you can use this calculator by NetApp.

I need feedback on this partly serverless architecture design

I want to host a scalable blog or an application of this sort in Node.js on AWS, making use of AWS technologies. The idea here is to have a small EC2 server that is not responsible for serving the website, but only for running the CMS/admin panel. While these operations could be serverless as well, I think having a dedicated small EC2 instance could be more efficient, and it would work better with existing frameworks, etc.
In my diagram above, you can see there are two types of users: audiences and admins/writers. Admin CRUD operations also cause Lambda to run. Lambda generates the static site after admin changes, and the result is delivered to S3. Users are directed to the static site hosted in S3. Only admins/writers have access to the server-connected part of the site.
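Roughly, the publish step I have in mind looks like this (sketched in Python for brevity, though the real implementation would be Node.js; the bucket name and the rendering step are placeholders):

```python
# Sketch: on an admin change, render pages and publish them to the S3 bucket
# that serves the static site. Bucket and the render step are placeholders.
import boto3

s3 = boto3.client("s3")
SITE_BUCKET = "my-blog-static-site"          # hypothetical bucket behind the site

def render_page(slug: str) -> str:
    # Placeholder: real code would pull content from the CMS database
    # and run it through a template engine.
    return f"<html><body><h1>{slug}</h1></body></html>"

def handler(event, context):
    # event is assumed to carry the slugs the admin just changed
    for slug in event.get("changed_slugs", ["index"]):
        s3.put_object(
            Bucket=SITE_BUCKET,
            Key=f"{slug}.html",
            Body=render_page(slug).encode("utf-8"),
            ContentType="text/html",
            CacheControl="max-age=300",       # keep it short so updates show up quickly
        )
```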
I think this is a good design for an extremely scalable and relatively cheap site, as long as the user-facing side is all static. An alternative to this is a CDN, but then I have to deal with cache invalidation issues, a site that updates slower, and a larger server.
This seems like a win-win to me. Feedback?
This ought to be a comment rather than an answer, but as I don't have enough points...
There are a couple of other considerations for this architecture. Lambda functions are great for scaling out microservices horizontally, with each small function being executed in parallel tens or hundreds of times. Generating a static site is typically a single-threaded operation, so you may not see the gains you expect. You'll also need to watch the timeout period (maximum 300 seconds currently) and make sure that you can generate the site in that time. Of course, if you are not running Lambda code you are not getting charged.
For your admin frontend I would suggest Elastic Beanstalk; even if you peg it at a single instance, it gives you lots of great features like rolling updates.
Good luck with the project.

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic in the last few days. I'm using WordPress with the W3 Total Cache plugin and Amazon CloudFront to deliver the images and files from the site.
The problem is that the cost of CloudFront is quite huge, nearly $500 just in the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDNs, so I might not be implementing this well. I've created a CloudFront distribution and configured it in the W3 Total Cache plugin. However, I'm not using S3 and don't know if I should or how. To be honest, I'm not quite sure what the difference between CloudFront and S3 is.
Can anyone give me some hints here?
I'm not quite sure what the difference between CloudFront and S3 is.
That's easy. S3 is a data store. It stores files, and is super-scalable (easily scaling to serve thousands of people at once). The problem is that it's centralized (i.e. served from one place in the world).
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requests, 20 of them will hit an empty cache and see an additional 200ms latency as CF fetches the file. They generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site to grow massively in media (i.e. lots of user photo uploads).
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate headers, and make sure your expires time isn't too short (it should be days, weeks, or months, ideally); see the sketch after this list.
The "best practice" is to never expire pages, except maybe your index page, which should expire every X minutes, hours or days (depending on how fast you want it updated). Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested > 100's of times per cache time. If you have millions of pages, each requested a few times, CF may not be useful.
3) Requests from Asia are much more expensive than those from the US. Consider launching your server in Tokyo if you're more popular there.
4) Look at your web server logs and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are set up wrong. If you set up "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc.), and a few hundred requests when you publish a new file (i.e. one request per CF edge node).
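For example, if your CloudFront origin is S3, the cache lifetime can be set per object at upload time (a sketch; the bucket, key and max-age are placeholders):

```python
# Sketch: upload an asset with long-lived cache headers so CloudFront (and
# browsers) re-fetch it rarely. Bucket/key/max-age are placeholders.
import boto3

s3 = boto3.client("s3")

s3.upload_file(
    "header.jpg",
    "my-site-assets",                       # hypothetical origin bucket
    "images/header.jpg",
    ExtraArgs={
        "ContentType": "image/jpeg",
        # ~30 days; use versioned filenames (header.v2.jpg) when the asset changes
        "CacheControl": "public, max-age=2592000",
    },
)
```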
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
You could give Cloudflare a go. It's not a full CDN, so it might not have all the features of CloudFront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon CloudFront costs are based on two factors:
Number of requests
Data transferred in GB
Solution
Reduce image requests. For that, combine small images into one image (an image sprite) and use that image:
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img (image sprites)
Don't use the CDN for video files, because videos are large and they drive the CDN cost up the most.
What components make up your bill? One thing to check with the W3 Total Cache plugin is the number of invalidation requests it is sending to CloudFront. It's known to send a large number of invalidation paths on each change, which can add up.
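If invalidations turn out to be a big line item, one mitigation (a sketch; the distribution ID is a placeholder) is to replace many per-file invalidations with a single wildcard path, since CloudFront charges per path submitted beyond the monthly free allowance:

```python
# Sketch: invalidate everything under /wp-content/ with one wildcard path
# instead of submitting hundreds of individual file paths. ID is a placeholder.
import time

import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/wp-content/*"]},
        # CallerReference just needs to be unique per invalidation request
        "CallerReference": str(time.time()),
    },
)
```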
Aside from that, if your spend is predictable, one option is to use CloudFront Security Savings Bundle to save up to 30% by committing to a minimum amount for a one year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that CloudFront has 3 different price classes, which influence how widely your data is replicated; at the same time, a smaller price class will make it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use Price Class 100 (the cheapest one) and still occasionally get served from edge locations in regions you are not paying for <3
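Switching an existing distribution over is just a get-config/modify/update round trip; a rough sketch (the distribution ID is a placeholder):

```python
# Sketch: switch an existing distribution to the cheapest price class.
# The distribution ID is a placeholder.
import boto3

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = "E1234567890ABC"

# Fetch the current config plus the ETag required by the update call.
current = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = current["DistributionConfig"]
config["PriceClass"] = "PriceClass_100"      # US/Canada/Europe edge locations only

cloudfront.update_distribution(
    Id=DISTRIBUTION_ID,
    DistributionConfig=config,
    IfMatch=current["ETag"],                 # optimistic-locking token from the GET
)
```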