How can I rate limit AWS S3 bucket access - amazon-web-services

I have an S3 bucket which holds a generated sitemap file that needs to be publicly accessible. I'm afraid that if someone finds the URL and DDoSes it, it could cost me a fortune. Is there a way to rate limit the requests per second to an S3 bucket?

1) You can go for a Content Delivery Network (CDN). With a CDN that specializes in DDoS protection, you can set up a web service to serve the S3 files and cache responses, e.g. based on the query string.
2) You can use API Gateway in front of your S3 requests to limit the number of requests. But I'm afraid that, in case of a DDoS attack, you will also lock the real users out from making requests.
3) You can use a CDN together with a WAF (Web Application Firewall), where you can define rules to safeguard against DDoS attacks. I am not sure whether it will work directly with S3, but using a combination of CloudFront and CloudWatch logs you can implement this (see the sketch below).
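To make option 3 a bit more concrete, here is a minimal boto3 sketch that creates a rate-based rule with the classic (global) WAF API, which is the flavour that fronts CloudFront. The rule name and the 2000-requests-per-5-minutes limit are placeholders, and the rule still has to be added to a web ACL associated with your CloudFront distribution; treat it as a starting point rather than a full setup.

```python
import boto3

# Classic (global) WAF client - this is the API that fronts CloudFront.
waf = boto3.client("waf")

# Every classic WAF write call needs a fresh change token.
token = waf.get_change_token()["ChangeToken"]

# Placeholder name and limit: block any single IP that exceeds
# 2000 requests in a rolling 5-minute window.
waf.create_rate_based_rule(
    Name="S3SitemapRateLimit",
    MetricName="S3SitemapRateLimit",
    RateKey="IP",        # the rate is tracked per source IP
    RateLimit=2000,      # requests per 5-minute window
    ChangeToken=token,
)
```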

If it is your personal AWS account and you have access to billing alerts and budgets, you can set up an alarm to notify you at a threshold, and even stop a particular service like S3 at a threshold:
Using AWS Budgets to stop services
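For reference, here is a rough boto3 sketch of creating such a budget with an email notification at 80% of a monthly S3 cost limit. The account ID, dollar amount, and email address are placeholders, and the cost filter value is an assumption based on the S3 service name in billing data, so double-check it against your own account.

```python
import boto3

# The Budgets API is served from us-east-1.
budgets = boto3.client("budgets", region_name="us-east-1")

budgets.create_budget(
    AccountId="123456789012",                     # placeholder account ID
    Budget={
        "BudgetName": "s3-monthly-cost-cap",
        "BudgetLimit": {"Amount": "10.0", "Unit": "USD"},
        # Assumed filter value; S3 appears under this service name in billing.
        "CostFilters": {"Service": ["Amazon Simple Storage Service"]},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,                # notify at 80% of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}
            ],
        }
    ],
)
```

Note that a budget like this only notifies by default; actually stopping a service at a threshold needs an additional action wired to the alert, as described in the linked article.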

Related

Get cost or bandwidth contributed by a file in total AWS bill

I am using AWS S3 to serve assets for my website. Even though I have added a Cache-Control metadata header to all my assets, my overall daily bandwidth usage almost doubled in the past month.
I am sure that traffic on my website has not increased dramatically enough to account for the increase in S3 bandwidth usage.
Is there a way to find out how much a single file contributes to the total bill, in terms of bandwidth or cost?
I am routing all my traffic through Cloudflare, so it should be protected against DDoS attacks.
I expect the bandwidth of my S3 bucket to decrease, or at least to find a valid reason that explains why bandwidth almost doubled when there is no increase in daily traffic.
You need to enable Server Access Logging on your content bucket. Once you do this, all bucket accesses will be written to logfiles that are stored in a (different) S3 bucket.
You can analyze these logfiles with a custom program (you'll find examples on the web) or AWS Athena, which lets you write SQL queries against structured data.
I would focus on the remote IP address of the requester, to understand what proportion of requests are served via Cloudflare versus people going directly to your bucket.
If you find that Cloudflare is constantly reloading content from the bucket, you'll need to give some thought to cache-control headers, either as metadata on the objects in S3 or in your Cloudflare configuration.
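If it helps, here is a minimal boto3 sketch of turning on server access logging; the bucket names are placeholders, and the target log bucket must already exist and allow the S3 log delivery group to write to it.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names: "my-content-bucket" serves the assets,
# "my-log-bucket" receives the access logs.
s3.put_bucket_logging(
    Bucket="my-content-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",
            "TargetPrefix": "access-logs/my-content-bucket/",
        }
    },
)
```

Once logs accumulate, you can point Athena at that log prefix and group requests by object key and remote IP to see which files dominate the bandwidth.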
From: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-cloudtrail-events.html
To enable CloudTrail data events logging for objects in an S3 bucket:
1) Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2) In the Bucket name list, choose the name of the bucket that you want.
3) Choose Properties.
4) Choose Object-level logging.
5) Choose an existing CloudTrail trail in the drop-down menu. The trail you select must be in the same AWS Region as your bucket, so the drop-down list contains only trails that are in the same Region as the bucket or trails that were created for all Regions.
If you need to create a trail, choose the CloudTrail console link to go to the CloudTrail console. For information about how to create trails in the CloudTrail console, see Creating a Trail with the Console in the AWS CloudTrail User Guide.
6) Under Events, select Read to specify that you want CloudTrail to log Amazon S3 read APIs such as GetObject. Select Write to log Amazon S3 write APIs such as PutObject. Select both Read and Write to log both read and write object APIs. For a list of supported data events that CloudTrail logs for Amazon S3 objects, see Amazon S3 Object-Level Actions Tracked by CloudTrail Logging in the Amazon Simple Storage Service Developer Guide.
7) Choose Create to enable object-level logging for the bucket.
To disable object-level logging for the bucket, you must go to the CloudTrail console and remove the bucket name from the trail's Data events.
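The same object-level (data event) logging can also be configured outside the console; here is a minimal boto3 sketch, with a placeholder trail name and bucket name, assuming the trail already exists in the bucket's Region.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="my-trail",                         # placeholder, existing trail
    EventSelectors=[
        {
            "ReadWriteType": "ReadOnly",          # log reads such as GetObject
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # Trailing slash = all objects in this bucket.
                    "Values": ["arn:aws:s3:::my-content-bucket/"],
                }
            ],
        }
    ],
)
```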

Can you get the AWS usage report for subdirectory for buckets?

Can you get the AWS usage report for subdirectories of a bucket? I want to know the amount of 'GetObject' traffic for each subdirectory of my S3 bucket.
First, remember that there are no "subdirectories" in S3. Everything within a bucket is in a flat index and identified by an object key. However, in the AWS console, objects that contain a shared prefix are represented together in a "folder" named after the shared prefix.
With that in mind, it should be easier to understand why you cannot get an AWS usage report for a specific "subdirectory". The AWS usage report is meant to be an overview of your AWS services and is not meant to be used for more detailed analytics.
Instead, there is another AWS service that gives you insight into more detailed analytics for your other AWS services: Amazon CloudWatch. With CloudWatch you can:
Set up daily storage metrics
Set up request (GET) metrics on a bucket
And, for your specific case, you can set up request metrics for specific prefixes ("subdirectories") within a bucket (see the sketch below).
Using request metrics from AWS CloudWatch is a paid service (and another reason why you cannot get detailed request metrics in the AWS usage report).
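As a concrete sketch of that last point, the following boto3 call enables request metrics for a single prefix; the bucket name, configuration Id, and "photos/" prefix are placeholders.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_metrics_configuration(
    Bucket="my-bucket",                 # placeholder bucket
    Id="photos-prefix-metrics",
    MetricsConfiguration={
        "Id": "photos-prefix-metrics",
        # Only requests whose key starts with this prefix are counted.
        "Filter": {"Prefix": "photos/"},
    },
)
```

After a short delay, metrics such as GetRequests appear in the AWS/S3 CloudWatch namespace with the FilterId dimension set to the configuration Id above.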

Low upload speed to EC2 instance running in another region

I have a few EC2 instances (t2.micro) behind a load balancer in the us-east-1 region (N. Virginia), and my users access the application from South America. This is my current setup mainly because costs are about 50% of what I would pay for the same services here in Brazil.
My uploads all go to S3 buckets, also in the us-east-1 region.
When a user requests a file from my app, I check for permission because the buckets are not public (hence why I need all data to go through EC2 instances) and I stream the file from S3 to the user. The download speeds for the users are fine and usually reach the maximum the user connection can handle, since I have transfer acceleration enabled for my buckets.
My issue is uploading files through the EC2 instances. The upload speeds suffer a lot and, in this case, having transfer acceleration enabled on S3 does not help in any way. It feels like I'm being throttled by AWS, because the maximum speed is capped around 1Mb/s.
I could maybe transfer files directly from the user to S3, then update my databases, but that would introduce a few issues to my main workflow.
So, I have two questions:
1) Is it normal for upload speeds to EC2 instances to suffer like that?
2) What options do I have, other than moving all services to South America, closer to my users?
Thanks in advance!
There is no need to 'stream' data from Amazon S3 via an Amazon EC2 instance. Nor is there any need to 'upload' via Amazon EC2.
Instead, you should be using Pre-signed URLs. These are URLs that grant time-limited access to upload to, or download from, Amazon S3.
The way it works is:
Your application verifies whether the user is permitted to upload/download a file
The application then generates a Pre-signed URL with an expiry time (eg 5 minutes)
The application supplies the URL to the client (eg a mobile app) or includes it in an HTML page (as a link for downloads or as a form for uploads)
The user then uploads/downloads the file directly to Amazon S3
The result is a highly scalable system because your EC2 system does not need to be involved in the actual data transfer.
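A minimal boto3 sketch of both directions, with placeholder bucket and key names and a 5-minute expiry:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-app-bucket"  # placeholder

# Download: a time-limited GET URL the browser can fetch directly from S3.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": "reports/monthly.pdf"},
    ExpiresIn=300,  # seconds
)

# Upload: a pre-signed POST; the client submits the file straight to S3.
upload = s3.generate_presigned_post(
    Bucket=BUCKET,
    Key="uploads/${filename}",   # S3 substitutes the uploaded file's name
    ExpiresIn=300,
)
# upload["url"] is the form action; upload["fields"] must be sent as
# hidden form fields alongside the file part.
```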
See:
Share an Object with Others - Amazon Simple Storage Service
Uploading Objects Using Pre-Signed URLs - Amazon Simple Storage Service

With Amazon S3, can I prevent trolls/grievers from making millions of GET-requests with bots?

I'm working on a website that contains photo galleries, and these images are stored on Amazon S3. Since Amazon charges like $0.01 per 10k GET-requests, it seems that a potential troll could seriously drive up my costs with a bot that makes millions of page requests per day.
Is there an easy way to protect myself from this?
The simplest strategy would be to create randomized URLs for your images.
You can serve these URLs with your page information, but they cannot be guessed by a brute-forcer, so guessed requests will usually just lead to a 404.
So something like yoursite/images/long_random_string
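A tiny sketch of generating such an unguessable key (the "images/" prefix and extension are just placeholders):

```python
import secrets

def random_image_key(extension: str = "jpg") -> str:
    # 24 random bytes -> a 32-character URL-safe token, far too large to brute-force.
    return f"images/{secrets.token_urlsafe(24)}.{extension}"

print(random_image_key())  # e.g. images/<32 random characters>.jpg
```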
Add the AWS CloudFront service in front of your S3 image objects, so requests are served from cached data at the edge locations.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/MigrateS3ToCloudFront.html
As @mohan-shanmugam pointed out, you should use a CloudFront CDN with your origin set to the S3 bucket. It is considered bad practice for external entities to hit S3 buckets directly.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
With a CloudFront distribution, you can alter your S3 bucket's security policy to only allow access from the distribution. This will block direct access to S3 even if the URLs are known.
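A minimal sketch of such a policy, applied with boto3 and assuming you have already created a CloudFront Origin Access Identity (the OAI ID and bucket name below are placeholders):

```python
import json

import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontOAIReadOnly",
            "Effect": "Allow",
            "Principal": {
                # Placeholder OAI ID (E2EXAMPLE123).
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2EXAMPLE123"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-image-bucket/*",
        }
    ],
}

# Note: put_bucket_policy replaces the existing policy wholesale.
s3.put_bucket_policy(Bucket="my-image-bucket", Policy=json.dumps(policy))
```

With the bucket itself kept private (no public-read ACLs), the OAI becomes the only reader, so direct S3 URLs stop working even if they are known.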
In reality, you would likely suffer from website performance issues long before needing to worry about additional charges, as a direct DDoS attempt against S3 should result in AWS throttling API requests.
In addition, you can set up AWS WAF in front of your CloudFront distribution and use it for advanced control of security related concerns.

How To Block Bad Bots in Amazon S3 (Amazon Simple Storage Service)?

I have signed up to Amazon Web Services and created a static website via the Amazon S3 service (created a bucket and mapped a domain to that bucket).
This service looks great, but I have one problem - I don't know how to block bad bots and prevent them from wasting my bandwidth (you all know that Amazon charges for bandwidth).
Amazon Web Services doesn't support .htaccess, and I have no idea how to block them.
What I need is to block the bad bots via 2 ways:
Via Bot Name, e.g.: BadBot1
Via Bot IP, e.g.: 185.11.240.175
Can you please help me to do it?
Your S3 bucket policy will definitely allow you to block specified IP addresses, but there is a size limit (~20 KB) on bucket policies, which would probably make maintaining a policy that lists every disreputable IP address infeasible.
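For completeness, a minimal sketch of such an IP-blocking statement, using the example IP from the question and a placeholder bucket name. Note that put_bucket_policy replaces the whole policy, so in practice you would merge this Deny statement with the Allow statement your static website already relies on.

```python
import json

import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBadBotIPs",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-static-site-bucket/*",
            # Add one CIDR per bad-bot IP; this list counts against the size limit.
            "Condition": {"IpAddress": {"aws:SourceIp": ["185.11.240.175/32"]}},
        }
    ],
}

s3.put_bucket_policy(Bucket="my-static-site-bucket", Policy=json.dumps(policy))
```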
AWS's WAF & Shield service, fronted by Cloudfront, is the most powerful way AWS provides to block IPs, and you could easily integrate this with an S3 origin. Cloudfront allows you to plug in a Waf & Shield ACL, which is comprised of rules that allow or disallow sets of IPs that you define.
AWS has some sample Lambda functions here that you can use as a starting point. You would probably want a Lambda function to run on a schedule, obtain the list of IPs that you want to block, parse that list, and add new IPs found to your WAF's IP sets (or remove ones no longer on the list). The waf-tor-blocking and waf-reputation-lists functions in the above link provide good examples for how to do this.
I'm not sure exactly what you mean by detecting Bot Name, but the standard Waf & Shield approach is currently to parse Cloudfront logs sent to an s3 bucket. Your s3 bucket would trigger SNS or a Lambda function directly whenever it receives a new gzipped log file. The Lambda function will then download that file, parse it for malicious requests, and block the associated IP addresses. The waf-block-bad-behaving and waf-reactive-blacklist functions in the repo I linked to provide examples for how you would approach this. Occasionally you will see signatures for bad bots in the user-agent string of the request. The Cloudfront logs will show the user-agent string, so you could potentially parse that and block associated IPs accordingly.
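At their core, those sample functions come down to updating a WAF IP set; a stripped-down boto3 sketch (classic global WAF, placeholder IP set ID) looks roughly like this:

```python
import boto3

# Classic (global) WAF API, i.e. the one that fronts CloudFront.
waf = boto3.client("waf")

def block_ip(ip_set_id: str, cidr: str) -> None:
    """Insert a single IPv4 CIDR into an existing WAF IP set."""
    token = waf.get_change_token()["ChangeToken"]  # required for every write
    waf.update_ip_set(
        IPSetId=ip_set_id,
        ChangeToken=token,
        Updates=[
            {
                "Action": "INSERT",
                "IPSetDescriptor": {"Type": "IPV4", "Value": cidr},
            }
        ],
    )

# Placeholder IP set ID; the IP is the example from the question.
block_ip("example-ip-set-id", "185.11.240.175/32")
```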