How AWS Cognito User Pool defends against brute-force attacks

I am going to use the AWS Cognito User Pool product as the user directory for an application and have several questions:
Does Amazon throttle requests to the Cognito User Pool, and if so, what is the rate limit at which calls get throttled?
How does Cognito defend against brute-force attacks on login/password?

After a couple of hours of searching I found these two exceptions in the source code:
TooManyFailedAttemptsException: This exception gets thrown when the user has made too many failed attempts for a given action (e.g., sign-in). HTTP Status Code: 400
TooManyRequestsException: This exception gets thrown when the user has made too many requests for a given operation. HTTP Status Code: 400
Also, I tried to log in with wrong credentials to test the limits; I got NotAuthorizedException: Password attempts exceeded after the 5th attempt.
In a similar scenario, I tried to brute-force the forgot password flow, but after 10 failed attempts I got LimitExceededException: Attempt limit exceeded, please try after some time.
I think that is how they do it.
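For anyone who wants to reproduce that test, here is a minimal sketch (my own illustration, not an official example) using boto3. The region, app client ID, and username are placeholders, and it assumes the USER_PASSWORD_AUTH flow is enabled on the app client:

```python
import boto3

# Repeatedly sign in with a wrong password and watch which exception
# Cognito raises on each attempt.
client = boto3.client("cognito-idp", region_name="us-east-1")

for attempt in range(1, 8):
    try:
        client.initiate_auth(
            ClientId="your-app-client-id",              # placeholder
            AuthFlow="USER_PASSWORD_AUTH",
            AuthParameters={
                "USERNAME": "someone@example.com",      # placeholder
                "PASSWORD": "definitely-wrong",
            },
        )
    except client.exceptions.NotAuthorizedException as err:
        # After about five failures the message changes to
        # "Password attempts exceeded".
        print(attempt, err.response["Error"]["Message"])
    except client.exceptions.TooManyRequestsException:
        print(attempt, "throttled")
```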

Yes, Cognito User Pools protects against brute-force attacks by using various security mechanisms. Throttling is one of those mechanisms. We do not share the limits, as they vary dynamically.

This contains the latest documentation on the lockout policies for Cognito.
https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-user-pools-authentication-flow.html
We allow five failed sign-in attempts. After that, we start temporary lockouts with exponentially increasing durations, starting at 1 second and doubling after each failed attempt, up to about 15 minutes. Attempts made during a temporary lockout period are ignored. After the temporary lockout period, if the next attempt fails, a new temporary lockout starts with twice the duration of the last. Waiting about 15 minutes without any attempts also resets the temporary lockout. Please note that this behavior is subject to change.
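To make the doubling concrete, here is a small illustration of how those lockout durations grow (AWS does not publish the exact formula, so the numbers are only indicative):

```python
# Illustration only: the doubling described above, starting at one second
# and capped at roughly 15 minutes.
lockout_seconds = 1
for extra_failure in range(1, 11):   # failed attempts beyond the first five
    print(f"extra failure {extra_failure}: locked out for ~{lockout_seconds}s")
    lockout_seconds = min(lockout_seconds * 2, 15 * 60)
```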

Rather than (or in addition to) focusing on brute-forcing the login endpoint, I think the forgot password flow deserves some attention.
The forgot password email contains a 6-digit code that can be used to set a new password.
This code is valid for 1 hour (see the User Pools code validity resource quotas).
In my tests I could make 5 attempts to set a new password within an hour for a single user before throttling came into effect (LimitExceededException: Attempt limit exceeded, please try after some time).
Now, if I do the math correctly, there are 1,000,000 possible values for a code (in my tests I never saw codes starting with 0, so there may be fewer). You have 5 attempts per hour to guess the code. So each hour you have a 5/1,000,000 × 100 = 0.0005% chance of resetting the password without knowing the code.
Is this a small chance? It seems so.
Considering a large-scale attack that brute-forces multiple users concurrently, with retries, should I sleep well at night? I don't know!
To solve the issue once and for all, why can't Cognito use longer codes that are harder to guess (I want to sleep well at night)? Maybe it has something to do with the fact that the same code mechanism is used in text messages. I wish there were an official comment.
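As a back-of-the-envelope check of the numbers above (my own illustration; the user count is an arbitrary assumption and it ignores Cognito's other protections):

```python
guesses_per_hour = 5
code_space = 10 ** 6            # 6-digit code, assuming leading zeros are allowed

p_user_hour = guesses_per_hour / code_space
print(f"per user per hour: {p_user_hour:.4%}")        # 0.0005%

# If an attacker could trigger the flow for many users concurrently during
# the one hour a code stays valid, the chance of at least one success grows:
users = 100_000                                       # arbitrary assumption
p_any = 1 - (1 - p_user_hour) ** users
print(f"targeting {users} users: {p_any:.1%}")        # roughly 39%
```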

Related

How to validate the RefreshToken programmatically in Google?

We are using the Google Ads API and want to validate the refresh token programmatically, because using an incorrect or expired refresh token takes a long time before throwing an exception (roughly 60 minutes or more) and therefore causes a 504 TIMEOUT. There is also a limit on the number of refresh tokens we can create: at most 50 at a time, and creating a 51st token expires the oldest one. That makes this issue more likely, so we wanted to know whether there is an API we can use to validate the token and take appropriate action, instead of calling the Google Ads API directly and running into the timeout.
We also raised this requirement on the Google Ads forum and were advised to reach out to GCP support. Link to the question asked: https://groups.google.com/g/adwords-api/c/tqOdXsnL5NI
We tried calling listaccessiblecustomers.
We were expecting to get an invalid-token exception within milliseconds or a few seconds so that we could log it and send an error notification to our customers; instead, after calling the API, the call got stuck for almost 61 minutes and then a 504 TIMEOUT occurred.
You really need to post your code. You said you tried calling the listaccessiblecustomers service, but how? Are you using the client libraries? If so, what language are you even using?
You need to put in a bit of effort if you need some help. Remember, we can't see what you see on the screen in front of you.
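That said, if the goal is simply to fail fast on a bad refresh token, one option (a sketch under my own assumptions, not an official Google Ads API check) is to exchange the refresh token for an access token directly against Google's OAuth 2.0 token endpoint with a short HTTP timeout:

```python
import requests

def refresh_token_is_valid(client_id, client_secret, refresh_token, timeout=10):
    """Try to exchange the refresh token for an access token at Google's
    OAuth 2.0 token endpoint. HTTP 200 means the token is usable; an
    invalid_grant error means it is expired or revoked."""
    resp = requests.post(
        "https://oauth2.googleapis.com/token",
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        },
        timeout=timeout,  # fail fast instead of hanging until a 504
    )
    if resp.status_code == 200:
        return True
    # invalid_grant => expired or revoked; other errors may be transient
    return False
```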

AWS lambda execution fails only first time I run it with 'customer function error'

I trigger a lambda function via API gateway and everything works perfectly with the one exception that the very first time I trigger it on a given day it fails.
Strangely, the lambda function logs don't show any errors. I get my usual START log statement and then the request and context of the trigger, then after 5s, it ends unexpectedly.
When I look into the API gateway logs this is the error it returns:
Lambda execution failed with status 200 due to customer function error: 2018-12-10T11:00:31.208Z cc233168-fc9n-11fc-a05a-577bb4sd2b2ccc Task timed out after 5.01 seconds.
Has anyone encountered a similar problem? What is customer function error and how may I resolve this?
Without knowing much about the background code you are using, I would term this a cold start. A cold start happens on the first request when your function has not been called for a long time. Notice the error message says "Task timed out after 5.01 seconds", which is the default timeout that was set; you can increase the timeout.
Alternatively, you could consider reducing the impact of cold starts by reducing their duration (reference), for example by:
authoring your Lambda functions in a language that doesn't incur a high cold start time, e.g. Node.js, Python, or Go
choosing a higher memory setting for functions on the critical path of handling user requests (i.e. anything that the user would have to wait for a response from, including intermediate APIs)
optimizing your function's dependencies and package size
You can also explore putting a cron job through CloudWatch at a specific interval to call your API with a PING and keep the function warm.
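For the timeout increase mentioned above, here is a minimal sketch (my own illustration; the function name and values are placeholders) using boto3:

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="my-api-handler",   # placeholder name
    Timeout=30,                      # seconds; the function above timed out at ~5s
    MemorySize=512,                  # more memory can also shorten cold starts
)
```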
Adding to Yash's answer:
I've only seen Lambda execution failed with status 200 in API Gateway execution logs, though in case it can manifest in other ways: ensure you have execution logging enabled for the endpoint. If you didn't already have it enabled you'll need to wait for the problem to manifest again.
You can verify it's a cold start problem as follows:
In the log entry with the error grab the #logStream value and the timestamp for the event; it'll be a long string of alphanumerics like a4f8115980dc83a511eeedc493a78741
Open the log group for that endpoint's execution log -> find the log stream with the identifier you just grabbed
Narrow the date/time range to a window around the time where the event occurred
If you chose a narrow window and it's a cold start problem, I would expect the offending request to be the first one in the list. Click "There are older events to load. Load more." at the top of the list.
You should now see a gap of time between the last request received and the offending request.
In my case the error says connection reset by peer, which leads me to think it's behaving as though a virtual machine were put to sleep and then awoken, in the sense that it believes TCP connections it previously had open are still valid.
In the short term the solution we're going with is to implement a retry strategy.
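A minimal retry sketch for the caller side (illustrative only; the URL, attempt count, and backoff are placeholders):

```python
import time
import requests

def call_with_retries(url, max_attempts=3, timeout=10):
    """Retry a request that may fail once due to a cold start."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.status_code < 500:
                return resp
        except requests.exceptions.ConnectionError:
            pass  # e.g. "connection reset by peer" on a stale connection
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("request still failing after retries")
```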
Besides the cold-start problem, there's another potential aspect of this problem: your API Gateway access log format.
Do the following:
Find the access log entries that correspond to the offending request in the execution log.
Is the HTTP status == 502?
502s in the API Gateway access log usually (always?) indicate the Lambda responded with malformed JSON.
The most obvious reason for it returning malformed JSON is a bug in your code. One of the less obvious reasons: a mistake in the access log format.
If you suspect that's the case, look for the following:
Quoted fields that shouldn't be, e.g. $context.error.messageString
Un-quoted fields that should be. A common idiom is to leave numeric fields un-quoted because it makes insights queries like this work: | filter #status >= 500. As convenient as that is, if the field isn't guaranteed to produce a numeric result then the JSON response will be malformed.
Trailing commas in {} bodies
Here's the documentation for many of the context variables, though one thing to keep in mind: the context variables that are available differ between the different API Gateway endpoint types (Lambda, WebSocket, etc.).
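For illustration only (not the asker's actual format), an access log format that avoids those pitfalls might look like the following: $context.error.messageString is left unquoted because the variable already emits its own surrounding quotes, the other string-valued fields are quoted, and there is no trailing comma.

```json
{
  "requestId": "$context.requestId",
  "sourceIp": "$context.identity.sourceIp",
  "httpMethod": "$context.httpMethod",
  "status": "$context.status",
  "errorMessage": $context.error.messageString
}
```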

How to edit the limit of attempts to change a password in AWS Cognito?

I have implemented a change password feature and now I would like to test it, but I am running into the attempt limit.
What should I do to prevent this error?
Attempt limit exceeded, please try after some time
I am on the Cognito team. This is not configurable. We do have protection mechanisms to prevent users from abusing the forgot password APIs, which is probably what you are witnessing.
This is not the exact answer, and the attempt limit is not configurable, for sure.
Still, if you want to test multiple times, you can try different emails, e.g.
if the attempt limit is exceeded for your Email1, you can start attempting with Email2.
Also, note that you can receive the emails for Email1 and Email2 at a single email address: suppose your Email1 = xyz@gmail.com, then you can register your Email2 = xyz+1@gmail.com.
This way you can receive emails at xyz@gmail.com for both Email1 and Email2.
Cognito allows 5 password reset/sign-in attempts.
After the allowed number is exceeded, the service starts temporary lockouts with exponentially increasing durations.
Here you can find more details on how it happens:
https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-user-pools-authentication-flow.html
The default lockout behavior is as follows:
Users can attempt but fail to sign in correctly five times before Amazon Cognito temporarily locks them out. Lockout time starts at one second and increases exponentially, doubling after each subsequent failed attempt, up to about 15 minutes. Amazon Cognito ignores attempts to log in during a temporary lockout period, and these attempts don't initiate a new lockout period. After a user waits 15 minutes, Amazon Cognito resets the temporary lockout. This behavior is subject to change.
Workaround solution:
If resetting the password using email, you can use something like guerrillamail to get many new temporary email addresses to work around the LimitExceededException.
If resetting the password using a phone, try a virtual phone number provider like Google Voice to get at least two phone numbers to work around the LimitExceededException.
The workaround I have used (while testing the user confirmation flow) is: once the limit is reached, make sure to complete a correct flow to reset the exponential backoff, then delete the account and continue testing after re-creating it.
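A sketch of that delete-and-recreate workaround with boto3 (the pool ID, username, and temporary password are placeholders, and admin credentials for the pool are assumed):

```python
import boto3

cognito = boto3.client("cognito-idp")
USER_POOL_ID = "us-east-1_EXAMPLE"          # placeholder
USERNAME = "test-user@example.com"          # placeholder

# Delete the throttled test account, then re-create it to keep testing.
cognito.admin_delete_user(UserPoolId=USER_POOL_ID, Username=USERNAME)
cognito.admin_create_user(
    UserPoolId=USER_POOL_ID,
    Username=USERNAME,
    TemporaryPassword="Temp-Passw0rd!",     # placeholder
    MessageAction="SUPPRESS",               # skip the invite email while testing
)
```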

When using the Admin SDK directory API to insert Org Units a dailyLimitExceeded error is returned even though that quota has not been reached

I work for a Student Information System, and we're using the Admin SDK Directory API to create school districts' Google org unit structures from within our software.
POST https://www.googleapis.com/admin/directory/v1/customer/customerId/orgunits
When generating these API requests we're consistently receiving dailyLimitExceeded errors even when the district's quota has not been reached.
This error can be bypassed by ignoring it and implementing an exponential back-off routine, but I believe it is acting much more like the quotaExceeded error is intended to act than like dailyLimitExceeded, in that the request succeeds on the first retry afterward.
In detail, the test I just ran successfully completed 9 of these API calls and then I received this response on the 10th:
Google.Apis.Requests.RequestError
Quota limit exceeded for the day. [403]
Errors [Message[Quota limit exceeded for the day.] Location[ - ] Reason[dailyLimitExceeded] Domain[usageLimits]
From the start of the batch of API calls it took about 10 seconds to get to the point where the error occurred.
Thanks for your help!
What I would suggest is to slow down your API requests. Don't make 10 requests in one second; leave some space between requests. You are correct to implement exponential backoff. Also, if you can, use other accounts as well to make requests.
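A sketch of that backoff, assuming the google-api-python-client and an already-built Directory API service object (names and retry counts are placeholders):

```python
import random
import time

from googleapiclient.errors import HttpError

def insert_org_unit_with_backoff(service, customer_id, body, max_retries=5):
    """Insert an org unit, retrying rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return service.orgunits().insert(customerId=customer_id, body=body).execute()
        except HttpError as err:
            if err.resp.status == 403 and attempt < max_retries - 1:
                # dailyLimitExceeded / rate limiting: wait, then retry
                time.sleep((2 ** attempt) + random.random())
            else:
                raise
```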

Unusual request activity log found in django server

Following is a screenshot of the server activity log. I can see that many requests are automatically raised on the server. How can I avoid this?
It looks like someone is fuzzing your website, scanning for common file names or extensions that often have security vulnerabilities. One way to limit this behaviour is to implement rate limiting: limit the number of requests from a client that result in HTTP 404 Not Found within some time period before giving them a temporary ban. Note: this doesn't stop the scanning, but it buys you time and may deter the attacker or researcher.
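As an illustration of that idea in Django (my own sketch with arbitrary thresholds; a reverse proxy or an existing rate-limiting package may be a better fit in practice), a middleware along these lines could count 404s per client IP and temporarily ban noisy ones:

```python
from django.core.cache import cache
from django.http import HttpResponseForbidden

# Thresholds are arbitrary placeholders.
MAX_404S = 20          # allowed 404s per window
WINDOW_SECONDS = 300   # counting window
BAN_SECONDS = 3600     # length of the temporary ban

class NotFoundRateLimitMiddleware:
    """Temporarily bans clients that generate too many 404s,
    e.g. a scanner probing for common file names."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        ip = request.META.get("REMOTE_ADDR", "unknown")
        if cache.get(f"banned:{ip}"):
            return HttpResponseForbidden("Too many bad requests.")
        response = self.get_response(request)
        if response.status_code == 404:
            cache.get_or_set(f"404s:{ip}", 0, WINDOW_SECONDS)  # ensure the counter exists
            if cache.incr(f"404s:{ip}") > MAX_404S:
                cache.set(f"banned:{ip}", True, BAN_SECONDS)
        return response
```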