I want to start by saying that I'm a total newcomer to AWS.
I'm investigating using AWS WAF for dynamic rate limiting based on a component of the request URL. The AWS website has a tutorial for doing this by IP address, but I have no idea if it can be modified to do what I need.
So, with that in mind, please tell me what, if any, of the following is actually possible:
Rate limit by a component of the URL (an API key in this case)
Determine limit dynamically (different behaviour for different keys)
Perform some non-blocking action in the first instance of exceeding
the limit, then block if the limit is exceeded consistently
Log both of the above actions and do something with the outputted logs (i.e. forward them somewhere)
Again, I'm not looking for detailed how-tos here as they would probably warrant seperate questions - just want to know if this is possible.
API Gateway is probably the right fit for what you are looking to implement. It has throttling implemented out of the box.
Take a look at API Gateway Usage Plans for implementation details for your specific use case.
Related
I have 400 quotas and if I add one more I'm getting an error
'Maximum number of Resources for this API has been reached.'
What is the maximum number? 500-800?
I want to know if I can extend it for another 200-300 quotas or I need to create another API, thank you!
As per the documentation, the default quota for Resources per API is 300. Reviewing the documentation further we can see that this limit can be increased which I would suspect has already occurred on your account.
If you would like to increase this further, you can use the console again and request a service increase, a useful guide for this is here.
As for the upper limit, this is not listed and most likely wont be listed as it will be at the AWS service teams discretion to do so. Based on my experience, you can usually get 100-150% more than the default quotas just by requesting a service increase in the console. If you would like more than this you may have to create a support case and give justification for the request, but, as long as it is reasonable, it will usually be accepted.
Customers (around 1000) sign up to my service and receive a customer unique api key. They then use the key when calling a AWS lambda function through AWS api gateway in to access data in DynamoDb.
Requirement 1: The customers get billed by the number of api calls, so I have to be able to count those. AWS only provides metrics for total number of api calls per lambda so I have a few options:
At every api hit increment a counter in DynamoDB.
At every api hit enqueue a message in SQS, receive it in "hit
counter" lambda and increment a counter in DynamoDB.
Deploy a separate lambda for each customer. Use AWS built-in call
counter.
Requirement 2: The data that the lambda can access is unique for each customer and thus dependent on the api key provided.
To enable this I also have a number of options:
Store the required api key together with the data that the customer
has the right to access.
Deploy a separate lambda for each customer. Use api gateway to
protect it with a key.
Create a separate endpoint in api gateway for each customer, protect
it with the api key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only you'll have to build a very reliable mechanism to automate this process, but also you'll need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc), not to mention you'll have 10 thousand functions with exactly the same code (client specific parameters apart). I wouldn't even think about this one.
Requirement 2
1) It could work and it fits nicely in the DynamoDB model of doing things: store as much data as you can in a unique table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity for this answer, store the client's data as JSON in a column named data. Since your query only needs to query by the ApiKey, storing a JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes than you're in bad shoes, since DynamoDB's query capabilities are very limited)
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analysis and keep a finer grained control over the client's calls, access, billing and etc., that's not a problem either, just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you to read Multi-Tenant Storage with Amazon DynamoDB which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this on the comments section in case you have more info to share
Hope this helps!
I've got an ASP.NET Web API that is using AWS Cognito for authentication and resource access control. We've been using user pool groups up until this point to define certain entities users have access to (non-aws resources in a DB).
The problem is, now that our requirements for access control are more detailed, we are hitting the group cap of 25 per pool. I've looked into alternatives within Cognito, such as using custom attributes, but I've found that there are also limits on the number of custom attributes per pool, as well as they only support string & number types, not arrays.
Another alternative I've explored is intercepting the token when it hits our API, and adding claims based on permissions mapped in the DB. This works reasonably well, but this is only a solution server side, and I'm not entirely thrilled with needing to intercept every request to add claims with a DB call (not great for performance). We need some of these claims client side as well, so this isn't a great solution.
Beyond requesting a service limit increase to the amount of groups available per pool, am I missing anything obvious? Groups seem to be the suggested way to do this, based on documentation from AWS. Even if we went for a multi-tenant approach with multiple pools, I think the 25 group cap is still going to be an issue.
https://docs.aws.amazon.com/cognito/latest/developerguide/scenario-backend.html
You can request limit increases for nearly any part of the service. They will consider. Sometimes this is more straightforward than building side systems, as you point out. See https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html
I'm looking through Google Cloud Functions docs and I wonder if it is possible to restrict access to HTTP cloud function to the given network? I would like to avoid anyone to exhaust the free quota.
Is there any firewall rules or similar mechanism for Cloud Functions?
I don't believe there is any in-built security restrictions at the moment.
In terms of avoid quota exhaustion you could pass a header or parameter with some kind of shared secret. Even a fixed string value would help avoid this problem.
You can add authentication to a cloud function by using firebase authentication. Here's a github example of how to do to it: https://github.com/firebase/functions-samples/tree/master/authorized-https-endpoint
Note however that the authentication code is executed by your function, so rejecting unauthorized access would still consume a small portion of your free resource allowance.
The Google Function Authorizer module might be what you're looking for. It provides "a simple user authentication and management system for Google Cloud HTTP Functions." It doesn't seem to have a lot of users yet, but the project seems simple enough that you could at least use it as a basis to modify or implement your own solution if you prefer.
This article was helpful for me.
https://cloud.google.com/solutions/authentication-in-http-cloud-functions
Anyone can still invoke the function but it must contain credentials from a user that has access to the resources accessed by the function.
Before that I was doing something very simple that is probably not great for production but does provide a little bit more security that just leaving it open publicly. I call my function with a password in the payload and if it doesn't match one of the passwords I hardcoded on the function it just fails with a 403.
If you need to restrict to IP range then you can follow instructions here: https://sukantamaikap.com/posts/load-balancing-cloud-functions
The UI of Google Cloud has unfortunately changed and you need to do some searching before you get all done, but I managed to set it up. But note that the related services will cost roughly 25 eur per month at minimum.
You can estimate the pricing here:
https://cloudpricingcalculator.appspot.com/
You need to search for "Cloud Load Balancing and Network Services" and then enable "Cloud Load Balancing", "Google Cloud Armor", and "IP addresses".
Alternatively, in some cases it might be sufficient if you set the name of the function or some suffix to the name complex enough so that it will be effectively like a sort of password. Something like MyGoogleCloudFunc-abracadabra. Then it will not restrict the network but perhaps outsiders would not know the secret name anyway.
I have a web app which serves media files (in other words pretty large) with public access. The files are hosted on S3. I'm wondering if AWS offers any kind of abuse-protection, for example detection or prevention against download hogs via some type of rate limiting. A scenario might be a single source re-downloading the same content repeatedly. I was hoping there might be some mechanism to detect that behavior and either take preventative action or notify me.
I'm looking at AWS docs and don't see anything but perhaps I'm not looking smartly enough.
How do folks who host files which are available publicly handle this?
S3 is mostly a file storage service, with elementary web server capabilities. I would highly recommend you place a CDN between your end users and S3. A good CDN will provide protection from the sort of abuse you are talking about, while also serving the files to the user more quickly.
If you are mostly worried about how the abuse will affect your bills (and they can get very large so its good to be concerned about this), I would suggest that you put in some billing alerts on your account that alarm when certain thresholds are reached.
I have a step alarms set on my account so that I know when it hits 25%, 50%, 75% and 100% of what I budget each month. That way, for example, if I hit an alarm that tells me I have used 25% of my budget in the first two days of the month, I know I better look into it.