Authentication with Cognito - where to find logs - amazon-web-services

We have 2 React Native app are using AWS Cognito for authentication. We use library react-native-aws-cognito-js in our code. The apps are working fine until these 2 days. Apps are experiencing intermittent "Internal Server Error".
How can I find more information about this error? Any tool can help us pinpoint the cause?
Update
From CloudTrail, each API call has an event "CreateNetworkInterface". Many of such API calls have error code "Client.NetworkInterfaceLimitExceeded". What is the cause and solution to this?
According to this AWS Doc (in Chinese), CloudWatch will not write to log when error is due to insufficient IP/ENI. That explains the increase in error number but no logs in CloudWatch.
Upate 2
We have found a scheduled Lambda job which may exhausted IP addresses. We stopped the batch job. But still can't have too many user login to server due to "Client.NetworkInterfaceLimitExceeded" error. I realized that there are many "CreateNetworkInterface" event and few "DeleteNetworkInterface" event. How can I "clean up / reset" all network interface in VPC?

Short answer: Cloud Trail.
Long answer with a suggestion
Assuming your application code is fine, most likely the cause of your 500 error is based on Cognito's initial limitations (e.g., number of calls per user): https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html.
AWS suggests to use Cloud Trail, for logging Api calls.
However I would suggest, to prove the limitations first, add some logs around the api call yourself, and in development you could call your app/api with a high number of calls; and most likely you will see the 500 error due to the limitations.
You could do the following in the terminal:
for i in `seq 1 1000`; do curl --cookie SecureCookie=TokenValueFromAWS http://localhost:desirablePort/SecuredPath; done

Related

Mysterious 500 error with AWS Lambda; unable to debug

I have an API that I host using Lambda (nodejs), with API-gateway. I'm using serverless to deploy.
Generally things have been fine, but while I was working on a specific function today, I started to receive HTTP 500 errors when hitting the endpoint. However, while there were still API-Gateway access logs for the end point, there were no Cloudwatch logs for the lambda functions getting hit. I was able to verify that the Authorizer was getting hit successfully, and not returning any issue (if it was, it would have been a 401). After using CLI tools to invoke the function from the command line, the 500 error went away and I was able to successfully hit the endpoints again.
Has anyone ever ran into this before? If I'm missing a debug step, I would really like to know. It was really concerning that my API could be generating 500 errors with no paper trail to help me understand what was happening.
You can check your role and permissions ,this link could help you https://aws.amazon.com/premiumsupport/knowledge-center/api-gateway-lambda-stage-variable-500/
Also you can debug further with X-ray : https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html

GCP PubSub: "The request was aborted because there was no available instance." - Doesn't Retry on Failure

We have a pubsub subscription setup passing requests to a Google Cloud Function.
Both the cloud function and the subscription to it are set to "Retry on Failure" (both with exponential back-off policies fwiw).
The Google Cloud Function is limited to 40 concurrent instances.
When the subscription queue is larger than the available instances, the expected behaviour is delivery will fail and be retried later.
What seems to be happening is the logs are filled with messages saying:
{
"textPayload": "The request was aborted because there was no available instance.",
"insertId": "6109fbbb0007ec4aaa3855a9",
...
}
And the subscription messages are just dropped and not retried.
Is this the expected behaviour? It seems crazy to me but if so, what architecture should you put in place to catch these dropped messages?
Edit: These issues started showing up in our logs on July 5 2021 and can't be found in logs before that date. Before that, the pubsub/gcf combo used to work as expected.
The error you are encountering is a known issue and the updates can be tracked through this Issue Tracker. You can also STAR the issue to receive automatic updates and give it traction by referring to this link. The tracker also discusses work-arounds to mitigate the request aborts. Since you have already implemented retries with exponential backoff, please take a look at the other solutions provided here.
If your concern is to do with Google Cloud Functions scalability or in general require further investigation of these errors, please reach out to GCP support in case you have a support plan. Otherwise, please open an issue in the issue tracker.

When using the Admin SDK directory API to insert Org Units a dailyLimitExceeded error is returned even though that quota has not been reached

I work for an Student Information System and we're using the Admin SDK directory API to create school districts Google Org Unit structures from within our software.
POST https://www.googleapis.com/admin/directory/v1/customer/customerId/orgunits
When generating these API requests we're consistently receiving dailyLimitExceeded errors even when the district's quota has not been reached.
This error can be bypassed by ignoring the error, and implementing an exponential back-off routine, but I believe this to be acting much more like the quotaExceeded error is intended to act rather than dailyLimitExceeded, in that the request succeeds afterward on the first retry of this request.
In detail, the test I just ran successfully completed 9 of these API calls and then I received this response on the 10th:
Google.Apis.Requests.RequestError
Quota limit exceeded for the day. [403]
Errors [Message[Quota limit exceeded for the day.] Location[ - ] Reason[dailyLimitExceeded] Domain[usageLimits]
From the start of the batch of API calls it took about 10 seconds to get to the point where the error occurred.
Thanks for your help!
What I would suggest is to slow down your API requests. Don't make like 10 requests in 1 second. Give it a space in between requests. You are correct to implement exponential backoff. Also, if you can, use other accounts as well to make requests.

AWS Elastic Beanstalk GUI: Query failed to deserialize response

after running AWS Elastic Beanstalk application for few weeks suddenly I can't open my application. Page simply displays an error which doesn't provide much information how to fix it.
Error
A problem occurred while loading your page: AWS Query failed to deserialize response
(and there is no more information, Googling also haven't found any answer)
So before updating my subscription and starting paying to Amazon not insignificant amount of money for being able to contact their technical support I thought I will ask here first if someone here encountered this issue.
Thanks for any suggestions.
After receiving this generic error, I was able to dig into the actual error message by using the EB CLI. In my case the CLI threw "ZIP does not support timestamps before 1980".

How to remove RequestTimeTooSkewed check from Amazon?

I have a Java 7 "agent" program running on several client machines (mostly Windows XP). My "agent" uploads client files to Amazon S3 and often I get this error:
RequestTimeTooSkewed
I know this is because the client's computer system time difference is too large compared to Amazon's. Here's my problem: I can't control the client's computer (system) time! So, I don't want Amazon to care about time differences.
I heard about jets3t, but I'm hoping not having to resort to yet another tool (agent footprint must remain small).
Any ideas how to remove this check and get rid of this pesky error?
Error detail:
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 59C9614D15006F23, AWS Error Code: RequestTimeTooSkewed, AWS Error Message: The difference between the request time and the current time is too large., S3 Extended Request ID: v1pGBm3ed2J9dZ3sG/3aDrG3DUGSlt3Ac+9nduK2slih2wyaAnc1n5Jrt5TkRzlV
The error is coming from the S3 service, not from the client, so there really isn't anything you can do other than correct the clock on the client. That check is being done on the service to help detect and prevent replay attacks so it's an important part of the overall security of the service.
Trying a different client-side SDK won't help.