Cognito passwordless solution sends code multiple times - amazon-web-services

I have successfully implemented these instructions from AWS (https://aws.amazon.com/de/blogs/mobile/implementing-passwordless-email-authentication-with-amazon-cognito/), but as soon as I execute the signIn function via aws-amplify, it often takes up to 7 seconds and I receive 3 emails with different codes.
The reason is that the createAuthChallenge event executes the corresponding lambda function 3 times, and each invocation generates and sends a code. This only happens if I have not logged in/registered for a certain time (~10 minutes). I thought this might be because the function is cold, so I tried to keep it warm by setting "Provisioned Concurrency" to 1 for the lambda functions
CreateAuthChallenge
VerifyAuthChallenge
DefineAuthChallenge
PreSignup
PostAuthentication
and additionally(!) tried to warm up the functions by invoking them every 5 minutes via CloudWatch.
I don't know what else I should do.
Thx!

We had followed a different post to set up our custom auth flow, but had the same issue with 3 codes being sent out.
In that post, the CreateAuthChallenge lambda starts with
exports.handler = async (event) => {
  const crypto = require('crypto')
  const aws = require('aws-sdk')
  ...
}
We have been able to stop sending 3 verification codes by moving those requires outside of the handler method.
const crypto = require('crypto')
const aws = require('aws-sdk')
exports.handler = async (event) => {
  ...
}
My guess is that loading the entire aws-sdk inside the handler was the cause of the slowness, and because the lambda took longer than the Cognito system allows, it ended up getting called multiple times and did eventually complete each time, thus causing the extra verification codes.
I did not see the same issue in the code from the link you posted, but it would be worth reviewing the specific code you have and checking whether it's pulling in a package that needs to be handled differently.

You get 5 seconds for the lambda to complete, otherwise Cognito retries it. Cold starts and the blocking call to send the email via SES are what eat up those 5 seconds. You can make the call to SES asynchronous by instead writing the code, email address, timestamp and other necessary details to the log with some fixed prefix like SEND_EMAIL. Since this is sensitive data, you should encode the details into a format like JSON, encrypt them and base64-encode them before writing them to the log. Then you can attach a CloudWatch subscription filter to the lambda's log group to route the log lines containing SEND_EMAIL to another lambda that decrypts and decodes the details and sends the actual email via SES. This allows the email sending to take longer than 5 seconds and works around the timeouts.
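To illustrate the consumer side of that pattern, here is a minimal Python sketch of the lambda behind the subscription filter; the KMS key, field names and email copy are assumptions, not code from the original setup:

import base64
import gzip
import json

import boto3

kms = boto3.client('kms')
ses = boto3.client('ses')

def handler(event, context):
    # CloudWatch Logs delivers subscription data as base64-encoded gzip JSON.
    payload = json.loads(gzip.decompress(base64.b64decode(event['awslogs']['data'])))

    for log_event in payload['logEvents']:
        message = log_event['message']
        if 'SEND_EMAIL' not in message:
            continue

        # Assumed producer format: "SEND_EMAIL <base64(kms-encrypted json)>"
        ciphertext = base64.b64decode(message.split('SEND_EMAIL', 1)[1].strip())
        details = json.loads(kms.decrypt(CiphertextBlob=ciphertext)['Plaintext'])

        # Here we can take as long as needed to send the email.
        ses.send_email(
            Source='no-reply@example.com',  # assumed sender address
            Destination={'ToAddresses': [details['email']]},
            Message={
                'Subject': {'Data': 'Your sign-in code'},
                'Body': {'Text': {'Data': f"Your code is {details['code']}"}},
            },
        )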

Boto3 invocations of long-running Lambda runs break with TooManyRequestsException

Experience with "long-running" Lambdas
In my company, we recently ran into this behaviour when triggering Lambdas that run for more than 60 seconds (boto3's default timeout for connection establishment and reads).
The beauty of invoking a Lambda with boto3 (using the 'InvocationType' 'RequestResponse') is that the API returns the result state of the respective Lambda run, so we wanted to stick to that.
The issue seems to be that the client fires too many requests per minute on the standing connection to the API. We therefore experimented with the boto3 client configuration, but increasing the read timeout resulted in new (unwanted) invocations after each timeout period, and increasing the connection timeout triggered a new invocation after the Lambda had finished.
Workaround
As various investigations and experiments with boto3's Lambda client did not result in a working setup using 'RequestResponse' invocations,
we now circumvent the problem by making use of CloudWatch Logs. For this, the Lambda has to be set up to write to an accessible log group. These logs can then be queried for the run's state. You would invoke the Lambda and monitor it like this:
import time

import boto3

lambda_client = boto3.client('lambda')
logs_client = boto3.client('logs')

# Invoke asynchronously; the response only confirms the event was accepted.
invocation = lambda_client.invoke(
    FunctionName='your_lambda',
    InvocationType='Event'
)
# Identifier of the invoked Lambda run
request_id = invocation['ResponseMetadata']['RequestId']

while True:
    # filter the logs for the Lambda end event
    events = logs_client.filter_log_events(
        logGroupName='your_lambda_loggroup',
        filterPattern=f'"END RequestId: {request_id}"'
    ).get('events', [])
    if len(events) > 0:
        # the Lambda invocation finished
        break
    time.sleep(5)  # wait before polling again to avoid throttling
This approach works for us now, but it's honestly ugly. To make it slightly better, I recommend setting the time range filters in the filter_log_events call.
One thing that has not been tested (yet): the above approach only tells you whether the Lambda terminated, not its state (failed or successful), and the default logs don't hold anything useful in that regard. Therefore, I will investigate whether a Lambda run can know its own request id at runtime. The Lambda code could then also write error messages tagged with the request id, which can be filtered for in the same way.
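For what it's worth, a Python handler does receive its own request id through the context object, so a sketch along these lines should make that possible (do_work and the log format are just placeholders):

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def do_work(event):
    # placeholder for the Lambda's real work
    pass

def handler(event, context):
    # context.aws_request_id matches the RequestId in the START/END log lines
    request_id = context.aws_request_id
    try:
        do_work(event)
    except Exception as exc:
        # Write a marker that can be found with a pattern like
        # f'"FAILED RequestId: {request_id}"'
        logger.error('FAILED RequestId: %s error: %s', request_id, exc)
        raise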

How should I handle asynchronous processes that occur after API calls in AWS?

I'm designing the backend for a website that uses API Gateway and Lambda to handle API requests, many of which target a MySQL DB on RDS. Some processes need to happen asynchronously but I'm debating which is best practice or cleaner.
In the given scenario, every time a user creates a new row in a certain table, let's say an email also needs to be sent asynchronously. There are many other similar scenarios, but this one will set the precedent.
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, write to something like SQS, which will later be read by another lambda that sends an email. When SQS confirms the record was added to the queue, send a 201 response saying the REST API call was successful.
Option 2: In the lambda that handles the API request, write to the MySQL instance to add the new row. When the response from MySQL comes back successful, send a 201 response saying the REST API call was successful. Then set up a DMS (Data Migration Service) task that runs indefinitely to send database modification binlogs to a Kinesis stream, which triggers a lambda that handles all DB changes, reads the change as a new row in a certain table, and sends an email.
Option 1:
less infrastructure
more direct tracking of logic from an API call
one extra HTTP call (to SQS), delaying response times for an API backing a web page
Option 2:
more infrastructure (dms task, replication instance)
scaling out shards may mean loss of ordering when processing binlog events, if ordering is a requirement (it is)
side question: Are you able to choose hash key for kinesis for dms tasks from mysql?
a single codebase for reacting to all modifications in the DB may actually make following logic in code simpler
Is this the tradeoff or am I missing something? What is best practice in this scenario?
Option 1 in my view seems most logical, but I would replace SQS and the second lambda with SNS. So, the modified option 1 could be:
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, publish a confirmation message to SNS, which sends the email. When the response from SNS is successful, send a 201 response saying the REST API call was successful.
This should be faster, cheaper and easier to implement than using SQS and a second lambda for sending the email.
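As a rough illustration of that modified Option 1, a handler might look like the sketch below; the topic ARN environment variable and payload shape are assumptions, and the MySQL write is left as a placeholder:

import json
import os

import boto3

sns = boto3.client('sns')

def handler(event, context):
    new_row = json.loads(event['body'])

    # 1. Write the new row to the MySQL instance on RDS
    #    (e.g. with pymysql; omitted here for brevity).

    # 2. Publish a confirmation message to an SNS topic that has an
    #    email subscription (topic ARN is an assumed env variable).
    sns.publish(
        TopicArn=os.environ['NEW_ROW_TOPIC_ARN'],
        Subject='New row created',
        Message=json.dumps(new_row),
    )

    # 3. Only after SNS accepts the message, report success to the caller.
    return {
        'statusCode': 201,
        'body': json.dumps({'status': 'created'}),
    }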

How to debounce events on AWS grouped by a key?

Our frontend application sends user actions to a lambda function behind an API gateway, which then stores these actions in dynamodb.
We then use dynamodb streams to trigger a separate lambda function that'll parse these actions in dynamodb and decide if the user's actions should result in any notifications being sent (we call these notification events).
For example, if a user places a comment in our app, we'll store a "CREATED_COMMENT" action in dynamodb, which will then trigger a new lambda through a dynamodb stream. The new lambda may then create an "email notification event", which we may send to an email provider like customer.io
However, our users have informed us that they receive emails too frequently, and thus we'd like to start sending email digests aggregating multiple actions over time into a single email rather than sending an email for each action.
Our idea was to forward the dynamodb stream actions to something like AWS EventBridge, Kinesis, Step Functions, or even another DynamoDB stream, and then configure the new stream to group events by email address and debounce them by e.g. 10 minutes. If the user performs a new action, that user's stream would keep gathering actions for another 10 minutes, until there have been no new actions from that user for 10 minutes. Once that happens, the stream would "release" all gathered actions and invoke a lambda function. Our lambda function would then generate the email notification event and send it to e.g. customer.io.
However, we've been unable to find such grouping and debounced-flushing configuration in any of the aforementioned AWS stream services. For something as common as digesting (or rolling up), shouldn't there be a serverless approach that doesn't require writing our own queueing service?
The answer to me seems to be a tool such as SQS. SQS allows you to accumulate messages in a queue, and every x minutes you can read the queue with a Lambda function running on a schedule event. You do not need to have the Lambda triggered by SQS; you can still read the queue "manually" from within the Lambda instead.
Gareth McCumskey is on the right track.
Use a normal SQS queue strictly for debouncing.
Set a batch window, e.g. 5 seconds, and use a really large batch size when you read from the queue.
In code, use a hash map to group messages with the same messageId together, then use your deduped messageIds to do your work.
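As a rough Python sketch of that grouping step (the grouping key and payload shape are assumptions; for an email digest you would group on something like the user's email address):

import json
from collections import defaultdict

def handler(event, context):
    # One invocation receives a batch of SQS records (large batch size + batch window).
    grouped = defaultdict(list)
    for record in event['Records']:
        body = json.loads(record['body'])
        grouped[body['email']].append(body)  # assumed field used as the grouping key

    # One unit of work per group instead of one per message.
    for email, actions in grouped.items():
        send_digest(email, actions)

def send_digest(email, actions):
    # placeholder: build one digest and hand it to e.g. customer.io
    print(f'would send a digest of {len(actions)} actions to {email}')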
I wrote a blog post on something just like this. The short version is that it uses a scheduled Lambda function to identify the records that need to be processed.
The problem with using the delay in SQS is that you can only receive 10 messages at a time, so in order to get all the messages you'd have to call SQS repeatedly to clear the queue before you can aggregate them. This doesn't scale very well, as every message has to be read for it to work. By using DynamoDB you can instead have just one record that represents the collection of records, and query that single record, which can then result in a message in a queue for that specific group of messages. Consider the following data:
user | comment | time
user 1 | comment 1 | 11:43am
user 1 | comment 2 | 11:50am
user 2 | comment 1 | 11:51am
You can add another record that is a signal for the need to send a message for each user (in this example 15 minutes after the first message).
user | scheduled
user 1 | 11:58
user 2 | 12:06
When you insert the second set of records you are inserting the time at which you want to send the batch. You only do the insert if there isn't a record already, so you don't end up constantly pushing the time back. Your scheduled process reads that record to know which users it needs to send messages to and collects all the data for each user. Sending the messages to each user can be done in parallel (you could send a message to SQS for each user, or use a Map state in a step function, for example).
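The "only do the insert if there isn't a record already" part maps naturally onto a DynamoDB conditional write; here is a minimal sketch, with the table name, key and 15-minute delay as assumptions:

from datetime import datetime, timedelta

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('notification_schedule')  # assumed table name

def schedule_digest(user_id):
    send_at = (datetime.utcnow() + timedelta(minutes=15)).isoformat()
    try:
        table.put_item(
            Item={'user': user_id, 'scheduled': send_at},
            # Only create the schedule record if none exists yet, so later
            # actions don't keep pushing the send time back.
            ConditionExpression='attribute_not_exists(#u)',
            ExpressionAttributeNames={'#u': 'user'},
        )
    except ClientError as err:
        if err.response['Error']['Code'] != 'ConditionalCheckFailedException':
            raise
        # Otherwise a send is already scheduled for this user; nothing to do.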

Alexa sent multiple request to AWS Lambda

I'm building an Alexa skill that sends a request to my web server;
the web server then does some processing and uploads a file to Amazon S3.
While the web server is processing, I have the skill poll Amazon S3 every 10 seconds until it gets the file, and the response is based on the file's content.
Unfortunately, the web server process takes more than 1 minute, which means the skill has to wait more than 1 minute for the file before it can respond.
For now, I use a progressive response with async/await in my code,
and the skill does keep waiting for the file on S3.
But I found that a second request is sent to Lambda automatically after 50 seconds, which means that for the same skill I have two lambda functions running at the same time.
The result is: after the first progressive response, I hear another response 50 seconds later, also produced by a progressive response, this time belonging to the second request.
And nothing happens after that.
I know it is bad to make the skill wait this long, but I still want to figure out a workable approach in case the skill really does need to wait this long.
There are some points I want to figure out.
Is there any way to prevent the skill from sending the second request to Lambda?
Is there another way I can try to accomplish the goal?
Thanks
Eventually, I found that the second invocation of the Lambda is not from Alexa but from AWS Lambda itself. Refer to the following article:
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
So you have to deal with this kind of situation in your Lambda code. One thing you can use is that both invocations have the same request id, so you can tell whether this is the first execution by checking your storage for a request id that you stored during the first execution.
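A minimal sketch of that check, assuming the storage is a DynamoDB table (the table and attribute names are illustrative):

import boto3

table = boto3.resource('dynamodb').Table('processed_requests')  # assumed table name

def handler(event, context):
    request_id = context.aws_request_id

    # A retried invocation from Lambda carries the same request id, so if we
    # have already stored it, this is the duplicate run and we can bail out.
    if table.get_item(Key={'request_id': request_id}).get('Item'):
        return

    # First execution: remember the request id, then do the real work.
    table.put_item(Item={'request_id': request_id})
    # ... handle the Alexa request here ...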
Besides, I also found that once the Alexa skill waits for more than 1 minute, it crashes and speaks an error (tested with an Amazon Echo), yet there is nothing different in the AWS Lambda log compared to a normal execution. In other words, the log looks fine but the actual execution result is not.
Hope this helps someone who is also struggling with this problem.

Asynchronous HTTP request in AWS Lambda

I want to execute an HTTP request inside a lambda function, invoked by API Gateway. The problem is that the request takes a while to complete (<20 seconds) and I don't want the client waiting for a response. In my research on asynchronous requests, I learned that I can pass the X-Amz-Invocation-Type:Event header to make the request execute asynchronously; however, this isn't working and the code still "waits" for the HTTP request to complete.
Below is my lambda code:
'use strict';
const https = require('https');

exports.handler = function (event, context, callback) {
  let requestUrl;
  requestUrl = event.queryStringParameters.url;

  https.get(requestUrl, (res) => {
    console.log('statusCode:', res.statusCode);
    console.log('headers:', res.headers);
    res.on('data', (d) => {
      process.stdout.write(d);
    });
  }).on('error', (e) => {
    console.error(e);
  });

  let response = {
    "statusCode": 200,
    "body": JSON.stringify(event.queryStringParameters)
  };
  callback(null, response);
};
Any help would be appreciated.
You can use two Lambda functions.
Lambda 1 is triggered by API Gateway, then calls Lambda 2 asynchronously (InvocationType = Event) and returns a response to the user.
Lambda 2, once invoked, will trigger the HTTP request.
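If it helps, here is a rough Python sketch of the Lambda 1 side (the question's code is Node, but the pattern is the same; the function name and payload shape are assumptions):

import json

import boto3

lambda_client = boto3.client('lambda')

def handler(event, context):
    # InvocationType='Event' queues Lambda 2 asynchronously and returns
    # immediately, so the caller isn't kept waiting for the HTTP request.
    lambda_client.invoke(
        FunctionName='make-http-request',  # assumed name of Lambda 2
        InvocationType='Event',
        Payload=json.dumps({'url': event['queryStringParameters']['url']}),
    )
    return {
        'statusCode': 202,
        'body': json.dumps({'status': 'accepted'}),
    }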
Whatever you do, don't use two lambda functions.
You can't control how a lambda is called, async or sync; the caller of the lambda decides that, and API Gateway has decided to call lambda synchronously.
The possible solutions are one of:
SQS
Step Functions (SF)
SNS
In your API, you call out to one of these services, get back a success, and then immediately return a 202 to your caller.
If you have a high volume of single or double action executions, use SQS. If you have potentially long-running work with complex state logic, use SF. If for some reason you want to ignore my suggestions, use SNS.
Each of these can (and should) call back out to a lambda. In the case that you need to run for more than 15 minutes, they can call back out to CodeBuild. Ignore the name of the service; it's just a lambda that supports runs of up to 8 hours.
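For the SQS flavour of that handoff, a minimal sketch might look like this; the queue URL environment variable and the worker's HTTP call are assumptions, and the point is that a failed worker run is redelivered by SQS (and eventually parked on a dead-letter queue) rather than silently lost:

import json
import os
import urllib.request

import boto3

sqs = boto3.client('sqs')

def api_handler(event, context):
    # API-facing lambda: hand the work to SQS and acknowledge immediately.
    sqs.send_message(
        QueueUrl=os.environ['WORK_QUEUE_URL'],  # assumed environment variable
        MessageBody=json.dumps({'url': event['queryStringParameters']['url']}),
    )
    return {'statusCode': 202, 'body': json.dumps({'status': 'accepted'})}

def worker_handler(event, context):
    # SQS-triggered lambda: raising here makes SQS redeliver the message.
    for record in event['Records']:
        url = json.loads(record['body'])['url']
        with urllib.request.urlopen(url, timeout=30) as resp:
            print('statusCode:', resp.status)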
Now, why not use two lambdas (L1, L2)? The answer is simple. Once you respond to your users (202) that their async call was queued (SQS, SF, SNS), they'll expect that it works 100% of the time. But what happens if that L2 lambda fails? It won't retry, it won't continue, and you may never know about it.
That L2 lambda's handler no longer exists, so you don't know its state any more. Further, you could try to add logging to L2 with a wrapper try/catch, but so many other types of failures could happen. Even if you have that, if CloudWatch is down, will you get the log? Possibly not; it just isn't a reliable strategy. Sure, if you are doing something you don't care about, you can build this architecture, but this isn't how real production solutions are built. You want a reliable process. You want to trust that the baton was successfully passed to another service that takes care of completing the user's transaction. This is why you want to use one of the three services: SQS, SF, SNS.