I am building a website to run a survey; it will store some answers and user data. Obviously, I want to keep costs low and within what the free tier offers. I am trying to build a low-cost solution for mitigating DDoS attacks. Here is what I have come up with, but I am not sure I am going in the right direction. I plan to put my frontend as well as my backend service behind CloudFront, with AWS WAF and Shield on that CloudFront distribution. Along with that, I plan to add two WAF rules:
Every request should have a "user-agent" header
Requests should originate only from a specific country, i.e. the one with my target audience
Along with this, I plan to add reCAPTCHA to ensure only human users interact with my application, as a deterrent from a cost perspective. Any other suggestions or feedback are really appreciated. Please note: cost is a huge factor.
AWS Shield, CloudFront, and WAF should be sufficient for your use case. Use the geo restrictions, but I don't think a header check will add any value, as it's so easy to spoof. Additionally, you may consider auto scaling for your backend to achieve more resilience, but be careful with the scaling cost: have a proper scaling policy and set alarms (especially a billing alarm, if you don't have one already) and notifications for scaling events.
Check this whitepaper for more information: https://docs.aws.amazon.com/whitepapers/latest/aws-best-practices-ddos-resiliency/aws-best-practices-ddos-resiliency.pdf
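For concreteness, here is roughly how the geo restriction (plus an optional rate-based rule) look as WAFv2 rule statements. This is a sketch only: the rule names, the rate limit, and the country code "US" are placeholders.

```python
# Sketch of WAFv2 rule statements for a geo restriction plus a
# rate-based rule. Rule names, the limit, and the country code "US"
# are placeholders. In practice these dicts would go into the Rules
# list of a boto3 wafv2.create_web_acl(..., Scope="CLOUDFRONT") call.

geo_rule = {
    "Name": "allow-target-country-only",
    "Priority": 0,
    "Statement": {
        # Block anything NOT originating from the target country.
        "NotStatement": {
            "Statement": {"GeoMatchStatement": {"CountryCodes": ["US"]}}
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "geo-restriction",
    },
}

rate_rule = {
    "Name": "per-ip-rate-limit",
    "Priority": 1,
    "Statement": {
        # Block an IP once it exceeds 1000 requests per 5-minute window.
        "RateBasedStatement": {"Limit": 1000, "AggregateKeyType": "IP"}
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "rate-limit",
    },
}

rules = [geo_rule, rate_rule]
```

Keep in mind that WAF itself bills per web ACL, per rule, and per million requests, so each added rule carries a small fixed monthly cost.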
FluxCDN, DDoS-Guard, Cloudflare, and StackPath would also be good options.
Use geo restrictions (China, India, ...) and challenge the requests (CAPTCHA).
Block empty user agents and bot user agents such as "python-requests" if you don't need them.
Block an IP from accessing your site if it reaches a threshold (rate limit).
Block bad ASNs from accessing your site.
Use a JS challenge to test the legitimacy of the request.
Cache static files (PNGs, HTML, ...).
Block HTTP/1.0 and HTTP/1.1 if not needed (many DDoS tools only speak these older versions, so this blocks a large share of attacks).
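The user-agent items above can be sketched as a simple filter; the blocklist substrings are illustrative only, not a recommended list:

```python
from typing import Optional

# Illustrative user-agent filter: block empty UAs and known scripted
# clients. The substrings below are examples, not an exhaustive list.
BOT_UA_SUBSTRINGS = ("python-requests", "curl", "go-http-client")

def should_block(user_agent: Optional[str]) -> bool:
    """Return True if a request should be blocked based on its User-Agent."""
    if not user_agent:  # missing or empty header
        return True
    ua = user_agent.lower()
    return any(bot in ua for bot in BOT_UA_SUBSTRINGS)
```

Remember the caveat from the first answer, though: the User-Agent header is trivially spoofed, so treat this as a filter for lazy bots, not a security boundary.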
How can I prevent Denial of Wallet attacks against AWS Cloudfront?
Here's my specific situation: I have a CloudFront distribution where Lambda@Edge functions serve web pages and API requests for my application. I need to rate-limit requests made to CloudFront based on the IP address of the user. Without any kind of rate limiting in place, it's possible for a malicious user to make millions of slow requests to the distribution that wouldn't be blocked by AWS's DDoS protections and which would lead to significant charges. This is especially important here since Lambda@Edge functions cost 3x as much as ordinary Lambda functions and don't come with a free tier.
It seemed practical to use AWS WAF to accomplish this. However, I recently found out that WAF charges for all incoming requests, regardless of whether they are blocked. So a Denial of Wallet attack would still be possible here.
Is there a method or a general strategy that I can implement here that doesn't involve AWS WAF?
The limits need to be very tight. Even paying $50 per month for malicious requests would be considered too high.
AWS Shield Standard is free when you use CloudFront, and it automatically protects against common DDoS attacks. Source
If you want to use WAF to tighten the requests to your Lambda, you can set up caching for HTTP 403 responses in CloudFront, so the attacker won't get their request past the CloudFront cache.
You have to decide which has priority for you: your service being down, or your bill going above your budget. If it's the first, you can use WAF and AWS Shield Advanced.
If it's the second, you can implement a request-throttling method. For example, you can take advantage of the fact that incoming requests to EC2 instances are free. So you can implement a queue on a free-tier EC2 instance that forwards requests to your Lambda but drops them when the rate exceeds a defined threshold. Keep in mind that you get charged for outgoing requests from EC2 to your Lambda@Edge.
OR you can implement another Lambda function in front of your Lambda@Edge to keep track of which IP address sent how many requests. If it's past the threshold, respond with HTTP 403 and have that cached in CloudFront. Then the next request from that IP address won't reach your Lambda. But again, keep in mind that you'll get charged for this additional Lambda.
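A minimal sketch of that IP-counting idea as a Lambda@Edge viewer-request handler; the in-memory dict stands in for a real DynamoDB table, and the threshold is arbitrary:

```python
# Sketch of a rate-limiting viewer-request handler. The in-memory dict
# stands in for DynamoDB; per-container memory is NOT shared across
# Lambda instances, so treat this as an illustration of the logic only.
THRESHOLD = 100
_request_counts = {}  # ip -> request count

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    ip = request["clientIp"]
    _request_counts[ip] = _request_counts.get(ip, 0) + 1
    if _request_counts[ip] > THRESHOLD:
        # Return 403 directly; with Cache-Control set, CloudFront can
        # cache it so repeat offenders never reach the origin.
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "headers": {
                "cache-control": [{"key": "Cache-Control",
                                   "value": "max-age=60"}]
            },
        }
    return request  # pass the request through to the origin
```

Returning the request object forwards it; returning a dict with a "status" key makes CloudFront generate the response at the edge without invoking the origin.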
The last resort is creating a billing alarm that notifies you when monthly charges reach $50, so that you can stop the costs before they climb higher.
AWS Shield Advanced includes DDoS cost protection, a safeguard from scaling charges as a result of a DDoS attack that causes usage spikes on protected Amazon EC2, Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, or Amazon Route 53. If any of the AWS Shield Advanced protected resources scale up in response to a DDoS attack, you can request credits via the regular AWS Support channel.
Please note that, at the time of writing, this service involves a monthly fee of $3,000 per account, plus data transfer fees starting at $0.050 per GB.
I'm building a Laravel application that offers an authoring tool to customers. Each customer will get their own subdomain, e.g.:
customer-a.my-tool.com
customer-b.my-tool.com
My tool is hosted on Amazon in multiple regions, for performance but mostly for privacy-law reasons (GDPR++). Each customer has their data in only one region: Australian customers in Australia, Europeans in Europe, etc. So customers' users must be directed to the correct region; if a European user ends up being served by the US region, their data won't be there.
We could solve this manually using DNS, simply pointing each subdomain to the correct IP, but we don't want to do this for two reasons: (1) updating the DNS might take up to 60 seconds, and we don't want the customer to wait; (2) the sites we've researched seem to use wildcard domains, for instance Slack and atlassian.net, and we know that atlassian.net also has multiple regions.
So the question is:
How can we use a wildcard domain and still route the traffic to the regions where the content is located?
Note:
We don't want the content in all regions, but we can have for instance a DynamoDB available in all regions mapping subdomains to regions.
We don't want to tie an organization to a region. I.e. a domain structure like customer-a.region.my-tool.com is an option we've considered, but discarded
We, of course, don't want to be paying for transferring the data twice, and having apps in all regions accessing the databases in the regions the data belong to is not an option since it will be slow.
How can we use a wildcard domain and still route the traffic to the regions where the content is located?
It is, in essence, not possible to do everything you are trying to do, given all of the constraints you are imposing: automatically, instantaneously, consistently, and with zero overhead, zero cost, and zero complexity.
But that isn't to say it's entirely impossible.
You have asserted that other vendors are using a "wildcard domain," but that concept is likely different from what you believe it entails. A wildcard in DNS, like *.example.com, is not something you can observe to the exclusion of other possibilities, because wildcard records are overridden by more specific records.
For a tangible example that you can observe, yourself... *.s3.amazonaws.com has a DNS wildcard. If you query some-random-non-existent-bucket.s3.amazonaws.com, you will find that it's a valid DNS record, and it routes to S3 in us-east-1. If you then create a bucket by that name in another region, and query the DNS a few minutes later, you'll find that it has begun returning a record that points to the S3 endpoint in the region where you created the bucket. Yes, it was and is a wildcard record, but now there's a more specific record that overrides the wildcard. The override will persist for at least as long as the bucket exists.
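The override behavior can be illustrated with a toy resolver; the records here are invented, since real resolution of course happens in authoritative name servers:

```python
# Toy illustration of DNS wildcard override: a more specific record
# always wins over the wildcard. All records here are invented examples.
records = {
    "*.s3.amazonaws.com": "s3.us-east-1.endpoint",  # wildcard default
}

def resolve(name: str) -> str:
    if name in records:  # an exact record overrides the wildcard
        return records[name]
    # otherwise fall back to the wildcard covering this name
    wildcard = "*." + name.split(".", 1)[1]
    return records[wildcard]

print(resolve("new-bucket.s3.amazonaws.com"))  # answered by the wildcard

# After the bucket is created in eu-west-1, a specific record appears:
records["new-bucket.s3.amazonaws.com"] = "s3.eu-west-1.endpoint"
print(resolve("new-bucket.s3.amazonaws.com"))  # now answered by the override
```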
Architecturally, other vendors that segregate their data by regions (rather than replicating it, which is another possibility, but not applicable to your scenario) must necessarily be doing something along one of these lines:
creating specific DNS records and accepting the delay until the DNS is ready or
implementing what I'll call a "hybrid" environment that behaves one way initially and a different way eventually; this environment uses specific DNS records to override a wildcard and can temporarily deliver a misrouted request, via a reverse proxy, to the correct cluster, allowing instantaneous correct behavior until the DNS propagates; or
an ongoing "two-tier" environment, using a wildcard without more specific records to override it, operating a two-tier infrastructure, with an outer tier that is distributed globally, that accepts any request, and has internal routing records that deliver the request to an inner tier -- the correct regional cluster.
The first option really doesn't seem unreasonable. Waiting a short time for your own subdomain to be created seems reasonably common. But, there are other options.
The second option, the hybrid environment, would simply require that the location where your wildcard points to by default be able to do some kind of database lookup to determine where the request should go, and proxy the request there. Yes, you would pay for inter-region transport, if you implement this yourself in EC2, but only until the DNS update takes effect. Inter-region bandwidth between any two AWS regions costs substantially less than data transfer to the Internet -- far less than "double" the cost.
This might be accomplished in any number of ways that are relatively straightforward.
You must, almost by definition, have a master database of the site configuration, somewhere, and this system could be queried by a complicated service that provides the proxying -- HAProxy and Nginx both support proxying and both support Lua integrations that could be used to do a lookup of routing information, which could be cached and used as long as needed to handle the temporarily "misrouted" requests. (HAProxy also has static-but-updatable map tables and dynamic "stick" tables that can be manipulated at runtime by specially-crafted requests; Nginx may offer similar things.)
But EC2 isn't the only way to handle this.
Lambda@Edge allows a CloudFront distribution to select a back-end based on logic, such as a query to a DynamoDB table or a call to another Lambda function that can query a relational database. Your "wildcard" CloudFront distribution could implement such a lookup, caching results in memory (container reuse allows very simple in-memory caching using an object in a global variable). Once the DNS record propagates, the requests would go directly from the browser to the appropriate back-end. CloudFront is marketed as a CDN, but it is in fact a globally-distributed reverse proxy with an optional response-caching capability. This capability may not be obvious at first.
In fact, CloudFront and Lambda@Edge could be used for a scenario like yours in either the "hybrid" environment or the "two-tier" environment. The outer tier is CloudFront, which automatically routes requests to the edge on the AWS network that is nearest the viewer, at which point a routing decision can be made at the edge to determine the correct cluster of your inner tier to handle the request. You don't pay for anything twice here, since bandwidth from EC2 to CloudFront costs nothing. This will not impact site performance other than the time needed for that initial database lookup, and once your active containers have that cached, the responsiveness of the site will not be impaired. CloudFront, in general, improves the responsiveness of sites even when most of the content is dynamic, because it optimizes both the network path and the protocol exchanges between the viewer and your back-end, with optimized TCP stacks and connection reuse (particularly helpful at reducing the multiple round trips required by TLS handshakes).
In fact, CloudFront seems to offer an opportunity to have it both ways -- an initially hybrid capability that automatically morphs into a two-tier infrastructure -- because CloudFront distributions also have a wildcard functionality with overrides: a distribution with *.example.com handles all requests unless a distribution with a more specific domain name is provisioned -- at which point the other distribution will start handling the traffic. CloudFront takes a few minutes before the new distribution overrides the wildcard, but when the switchover happens, it's clean. A few minutes after the new distribution is configured, you make a parallel DNS change to the newly assigned hostname for the new distribution, but CloudFront is designed in such a way that you do not have to tightly coordinate this change -- all endpoints will handle all domains because CloudFront doesn't use the endpoint to make the routing decision, it uses SNI and the HTTP Host header.
This seems almost like a no-brainer. A default, wildcard CloudFront distribution is pointed to by a default, wildcard DNS record, and uses Lambda@Edge to identify which of your clusters handles a given subdomain using a database lookup, followed by the automated deployment of a distribution for each of your customers, which already knows how to forward the request to the correct cluster, so no further database queries are needed once the subdomain is fully live. You'll need to ask AWS Support to increase your account's limit for the number of CloudFront distributions from the default of 200, but that should not be a problem.
There are multiple ways to accomplish that database lookup. As mentioned before, the Lambda@Edge function can invoke a second Lambda function inside a VPC to query the database for routing instructions, or you could push the domain-location config to a DynamoDB global table, which would replicate your domain-routing instructions to multiple DynamoDB regions (currently Virginia, Ohio, Oregon, Ireland, and Frankfurt), and DynamoDB can be queried directly from a Lambda@Edge function.
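A sketch of such an origin-request function in Python (one of the runtimes Lambda@Edge supports); the SUBDOMAIN_REGION table stands in for the cached DynamoDB lookup, and all domain names are hypothetical:

```python
# Sketch of a Lambda@Edge origin-request handler that routes each
# customer subdomain to its regional cluster. SUBDOMAIN_REGION stands
# in for a (cached) DynamoDB lookup; all domain names are hypothetical.
SUBDOMAIN_REGION = {
    "customer-a": "eu-west-1",
    "customer-b": "ap-southeast-2",
}
DEFAULT_REGION = "us-east-1"

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    host = request["headers"]["host"][0]["value"]  # e.g. customer-a.my-tool.com
    subdomain = host.split(".", 1)[0]
    region = SUBDOMAIN_REGION.get(subdomain, DEFAULT_REGION)
    origin_domain = f"origin.{region}.my-tool.com"
    # Point the request at the regional origin cluster; for a custom
    # origin the Host header must match the new origin domain.
    request["origin"]["custom"]["domainName"] = origin_domain
    request["headers"]["host"] = [{"key": "Host", "value": origin_domain}]
    return request
```

Because the module-level table survives container reuse, a real version could lazily populate it from DynamoDB on first sight of each subdomain.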
I'm trying to figure out where the latency in my calls is coming from, please let me know if any of this information could be presented in a format that is more clear!
Some background: I have two systems--System A and System B. I manually (through Postman) hit an endpoint on System A that invokes an endpoint on System B.
System A is hosted on an EC2 instance.
When System B is hosted on a Lambda function behind API Gateway, the latency for the call is 125 ms.
When System B is hosted on an EC2 instance, the latency for the call is 8 ms.
When System B is hosted on an EC2 instance behind API Gateway, the latency for the call is 100 ms.
So, my hypothesis is that API Gateway is the reason for increased latency when it's paired with the Lambda function as well. Can anyone confirm if this is the case, and if so, what is API Gateway doing that increases the latency so much? Is there any way around it? Thank you!
It might not be exactly what the original question asks for, but I'll add a comment about CloudFront.
In my experience, both CloudFront and API Gateway will add at least 100 ms each for every HTTPS request on average - maybe even more.
This is because, in order to secure your API call, API Gateway enforces SSL in all of its components. This means that if you are using SSL on your backend, your first API call will have to negotiate 3 SSL handshakes:
Client to CloudFront
CloudFront to API Gateway
API Gateway to your backend
It is not uncommon for these handshakes to take over 100 milliseconds, meaning that a single request to an inactive API could see over 300 milliseconds of additional overhead. Both CloudFront and API Gateway attempt to reuse connections, so over a large number of requests you’d expect to see that the overhead for each call would approach only the cost of the initial SSL handshake. Unfortunately, if you’re testing from a web browser and making a single call against an API not yet in production, you will likely not see this.
In the same discussion, it was eventually clarified what the "large number of requests" should be to actually see that connection reuse:
Additionally, when I meant large, I should have been slightly more precise in scale. 1000 requests from a single source may not see significant reuse, but APIs that are seeing that many per second from multiple sources would definitely expect to see the results I mentioned.
...
Unfortunately, while I cannot give you an exact number, you will not see any significant connection reuse until you approach closer to 100 requests per second.
Bear in mind that this is a thread from mid-to-late 2016, and there should be some improvements already in place. But in my own experience, this overhead is still present, and a load test on a simple API at 2,000 rps is still giving me >200 ms of extra latency as of 2018.
source: https://forums.aws.amazon.com/thread.jspa?messageID=737224
Heard from Amazon support on this:
With API Gateway it requires going from the client to API Gateway, which means leaving the VPC and going out to the internet, then back to your VPC to go to your other EC2 instance, then back to API Gateway, which means leaving your VPC again, and then back to your first EC2 instance.
So this additional latency is expected. The only way to lower the latency is to add API caching, which is only going to be useful if the content you are requesting is static and not updating constantly. You will still see the longer latency when the item is removed from the cache and needs to be fetched from the system, but it will lower most calls.
So I guess the latency is normal, which is unfortunate, but hopefully not something we'll have to deal with constantly moving forward.
In the direct case (#2) are you using SSL? 8 ms is very fast for SSL, although if it's within an AZ I suppose it's possible. If you aren't using SSL there, then using APIGW will introduce a secure TLS connection between the client and CloudFront which of course has a latency penalty. But usually that's worth it for a secure connection since the latency is only on the initial establishment.
Once a connection is established all the way through, or when the API has moderate, sustained volume, I'd expect the average latency with APIGW to drop significantly. You'll still see the ~100 ms latency when establishing a new connection though.
Unfortunately the use case you're describing (EC2 -> APIGW -> EC2) isn't great right now. Since APIGW is behind CloudFront, it is optimized for clients all over the world, but you will see additional latency when the client is on EC2.
Edit:
And the reason why you only see a small penalty when adding Lambda is that APIGW already has lots of established connections to Lambda, since it's a single endpoint with a handful of IPs. The actual overhead (not connection related) in APIGW should be similar to Lambda overhead.
I plan to have the following setup:
Completely STATIC front-end web interface (built with AngularJS or the like)
Serverless Framework back-end APIs
I want to store my front-end in S3 and my back-end in Lambda.
Since I'm charged every time the lambda function gets executed, I don't want everyone to be able to make requests directly to it. On the other hand, I want to store my front-end simply in S3 as opposed to a server.
How do I go about protecting my back-end API from abuse or DoS?
I'm not sure you can protect your back-end from people calling it more than they should, since that's extremely hard to determine.
However for real DDoS or DoS protection you would probably want to use the features of API Gateway (check the question about threats or abuse) or AWS's new WAF. I know WAF has the ability to block ranges of IP addresses and the like.
What @Boushley said, plus:
you may want to check out Cloudflare: https://www.cloudflare.com/ddos
Actually, Amazon API Gateway automatically protects your backend systems from distributed denial-of-service (DDoS) attacks, whether attacked with counterfeit requests (Layer 7) or SYN floods (Layer 3).
In your serverless.yml you can now provide a provider.usagePlan property, assuming you are using AWS.
provider:
  ...
  usagePlan: # limit expenditures
    quota:
      limit: 5000
      period: DAY
    throttle:
      burstLimit: 200
      rateLimit: 100
While this does not mean that you cannot be DDoSed (as @mrBorna mentioned, AWS tries to prevent this by default), it should mean that if you are DDoSed, you will not be significantly affected from a financial perspective.
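A back-of-the-envelope check of why the quota bounds the damage; the prices below are illustrative assumptions, not current AWS pricing:

```python
# Worst-case monthly Lambda cost under the usage plan above.
# All prices are illustrative assumptions; check current AWS pricing.
requests_per_day = 5000            # the usage plan quota
days = 30
price_per_million_requests = 0.20  # assumed Lambda request price (USD)
gb_seconds_per_request = 0.125     # assumed: 128 MB memory for 1 s
price_per_gb_second = 0.0000167    # assumed Lambda duration price (USD)

monthly_requests = requests_per_day * days
request_cost = monthly_requests / 1_000_000 * price_per_million_requests
duration_cost = monthly_requests * gb_seconds_per_request * price_per_gb_second
print(f"worst-case monthly Lambda cost ~= ${request_cost + duration_cost:.2f}")
```

Even if every one of the 5,000 daily requests reaches Lambda, the worst-case spend stays in the cents range under these assumptions; API Gateway's own per-request charges are bounded by the same quota.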
I have a RESTful webservice running on Amazon EC2. Since my application needs to deal with large number of photos, I plan to put them on Amazon S3. So the URL for retrieving a photo from S3 could look like this:
http://johnsmith.s3.amazonaws.com/photos/puppy.jpg
Is there any way or necessity to cache the images on EC2? The pros and cons I can think of is:
1) Reduced S3 usage and cost, with improved image-fetching performance. On the other hand, EC2 cost can rise, and EC2 may not be able to handle the image cache due to bandwidth restrictions.
2) Increased development complexity, because you need to check the cache first and, on a miss, ask S3 to transfer the image to EC2 and then transfer it to the client.
I'm using an EC2 micro instance and feel it might be better not to do the image cache on EC2. But the scale might grow fast and eventually we'll need an image cache. (Am I right?) If a cache is needed, is it better to do it on EC2 or on S3? (Is there a way of caching for S3?)
By the way, when the client uploads an image, should it be uploaded to EC2 or S3 directly?
Why bring EC2 into the equation? I strongly recommend using CloudFront for the scenario.
When you use CloudFront with S3 as the origin, the content gets distributed to dozens of edge locations worldwide (49 at the time of writing), effectively working as a global cache, with content fetched from the location nearest your end users based on latency.
This way you don't need to worry about the scale and performance of the cache or of EC2; you can straightforwardly offload this to CloudFront and S3.
Static vs dynamic
Generally speaking, here are the tiers:
best CDN (cloudfront)
good static hosting (S3)
okay dynamic (EC2)
Why? There are a few reasons.
maintainability and scalability: cloudfront and S3 scale "for free". You don't need to worry about capacity or bandwidth or request rate.
price: approximately speaking, it's cheaper to use S3 than EC2.
latency: CDNs are located around the world, leading to shorter load times.
Caching
No matter where you are serving your static content from, proper use of the Cache-Control header will make life better. With that header you can tell a browser how long the content is good for. If it is something that never changes, you can instruct a browser to keep it for a year. If it frequently changes, you can instruct a browser to keep it for an hour, or a minute, or revalidate every time. You can give similar instructions to a CDN.
Here's a good guide, and here are some examples:
# keep for one year
Cache-Control: max-age=31536000
# keep for a day on a CDN, but a minute on client browsers
Cache-Control: s-maxage=86400, max-age=60
You can add this to pages served from your EC2 instance (no matter if it's nginx, Tornado, Tomcat, IIS), you can add it to the headers on S3 files, and CloudFront will use these values.
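For S3 specifically, the header is attached as object metadata at upload time. A sketch with an example policy table (the boto3 call in the comment uses placeholder bucket and key names):

```python
# Choose a Cache-Control value per content type, to be set at S3
# upload time. The mapping is an example policy, not a recommendation.
CACHE_POLICY = {
    "image/png": "max-age=31536000",           # immutable assets: one year
    "text/html": "s-maxage=86400, max-age=60"  # HTML: CDN one day, browser one minute
}

def cache_control_for(content_type: str) -> str:
    """Return the Cache-Control header value for a given content type."""
    return CACHE_POLICY.get(content_type, "no-cache")

# With boto3 this would be applied roughly as:
#   s3.put_object(Bucket="my-bucket", Key="photos/puppy.jpg",
#                 Body=data, ContentType="image/png",
#                 CacheControl=cache_control_for("image/png"))
```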
I would not pull the images from S3 to EC2 and then serve them. It's wasted effort. There are only a small number of use cases where that makes sense.
A few scenarios where an EC2 caching instance makes sense:
your upload/download ratio is far from 50/50
you hit the S3 limit of 100 req/sec
you need URL masking
you want to optimise kernel and TCP/IP settings, or cache SSL sessions for clients
you want a proper cache-invalidation mechanism for all geo locations
you need 100% control over where data is stored
you need to count the number of requests
you have a custom authentication mechanism
For a number of reasons, I recommend taking a look at Nginx S3 proxy.