I'm trying to build a chatbot on AWS Lambda.
However, 90% of my Lambda duration is spent waiting on requests.
For each interaction a user has with my chatbot, I send approximately 3 requests (1 to Dialogflow and 2 to Messenger). I have to wait until those requests are completed because:
for Dialogflow, I need the answer
for Messenger, I need to make sure the previous message has been sent before sending the next one
Requests take approximately 400ms so for every API call to my Lambda function, I "lose" most of my duration time waiting...
Do you have any hints about how I can avoid waiting 400ms on each request?
Maybe I should move to a more conventional EC2 instance.
I was first really interested in stateless and Lambda because I thought it would make sense for a chatbot, but the more features I add to my project, the more problems I get (the database connection takes really long...)
It sounds like you're mostly stuck. Maybe one thing you could do is try to make as many async calls as you can in parallel. It sounds like your flow is currently like this:
Event -> Dialogflow -> Messenger -> Messenger -> Finish
You could try and combine some of these calls and execute them in parallel:
Event -> Messenger -> Messenger -> Finish
-> Dialogflow ->
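A minimal Python sketch of the second flow, assuming the Dialogflow call and the Messenger chain really are independent of each other. The `call_api` helper is hypothetical and only simulates latency with a sleep. The two Messenger sends stay sequential (the second waits for the first), while Dialogflow runs alongside them:

```python
import asyncio
import time


async def call_api(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for an HTTP request; the sleep simulates latency."""
    await asyncio.sleep(delay_s)
    return f"{name} done"


async def messenger_chain() -> list[str]:
    # The second message is only sent after the first one has completed.
    first = await call_api("messenger-1", 0.2)
    second = await call_api("messenger-2", 0.2)
    return [first, second]


async def handle_event() -> tuple[str, list[str]]:
    # Run the Dialogflow call and the Messenger chain at the same time.
    dialogflow, messages = await asyncio.gather(
        call_api("dialogflow", 0.2),
        messenger_chain(),
    )
    return dialogflow, messages


def timed_run() -> tuple[tuple[str, list[str]], float]:
    """Run the handler once and report how long it took in seconds."""
    start = time.perf_counter()
    result = asyncio.run(handle_event())
    return result, time.perf_counter() - start
```

Run one after the other, the three simulated calls would take about 0.6s. Overlapped like this, the total is bounded by the longer branch, about 0.4s.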
AWS Lambda may not be cost-effective in cases like that.
To optimize the cost you can consider:
Use async requests as much as possible.
Reduce the Lambda's memory size. It will also make it run slower, so the optimal value can usually be found by trial and error. In your case, reducing it to the minimum possible may be the best fit. Check this example.
Batch multiple events to a single invocation, and process them asynchronously. For example, in your case, you can aggregate multiple interactions of different users using services such as Kinesis Data Streams and SQS, handle them in the same invocation, and send the separate response for each one of them.
My problem: every 20 minutes I want to execute around 25,000 or more curl requests and save each response in a database. PHP does not handle this well. Which AWS services can I use, other than Lambda?
A common technique for processing a large number of similar calls is:
Create an Amazon Simple Queue Service (SQS) queue and push each request into the queue as a separate message. In your case, the message would contain the URL that you wish to retrieve.
Create an AWS Lambda function that performs the download and stores the data in the database.
Configure the Lambda function to trigger off the SQS queue.
This way, the SQS queue can trigger hundreds of Lambda functions running in parallel. The default concurrency limit is 1,000 concurrent Lambda executions, but you can request an increase.
You would then need a separate process that, every 20 minutes, queries the database for the URLs and pushes the messages into the SQS queue.
The complete process is:
Schedule -> Lambda pusher -> messages into SQS -> Lambda workers -> database
The beauty of this design is that it can scale to handle large workloads and operates in parallel, rather than each curl request having to wait. If a message cannot be processed, Lambda will automatically try again. Repeated failures will send the message to a Dead Letter Queue for later analysis and reprocessing.
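As a rough sketch of the worker side in Python: the handler below walks the records of an SQS event and reports only the failed messages, so the rest of the batch is not retried. The `fetch` and `store` callables and the `{"url": ...}` message body are assumptions made for illustration. Returning `batchItemFailures` only has an effect if `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import json
from typing import Callable


def make_worker_handler(fetch: Callable[[str], str],
                        store: Callable[[str, str], None]):
    """Build a Lambda handler; fetch and store are hypothetical injected helpers."""

    def handler(event: dict, context: object = None) -> dict:
        failures = []
        for record in event.get("Records", []):
            try:
                # Assumed message format: {"url": "<address to download>"}
                url = json.loads(record["body"])["url"]
                store(url, fetch(url))
            except Exception:
                # Report only this message so SQS redelivers it on its own.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Injecting `fetch` and `store` keeps the sketch independent of any particular HTTP or database client.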
If you wish to perform 25,000 queries every 20 minutes (1200 seconds), this would need a query to complete every 0.05 seconds. That's why it is important to work in parallel.
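To put a number on "in parallel", here is a small worked calculation in Python. The 2-second average request time is purely an assumption; plug in your own measurement.

```python
import math


def required_concurrency(requests: int, window_s: float, latency_s: float) -> int:
    """Little's law: requests in flight = arrival rate * time per request."""
    rate_per_s = requests / window_s
    return math.ceil(rate_per_s * latency_s)


# One completion is needed every 1200 / 25000 seconds (about 0.05 s).
interval_s = 1200 / 25_000

# Assuming each request takes about 2 seconds end to end.
workers = required_concurrency(25_000, 1200, 2.0)
```

Under that assumption, about 42 requests need to be in flight at any moment, which is well within Lambda's default concurrency.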
By the way, if you are attempting to scrape this information from a single website, I suggest you investigate whether they provide an API; otherwise you might be violating the Terms & Conditions of the website, which I strongly advise against.
I would like to send a push notification to users in my database in a Lambda environment via an SQS / message queue architecture. To do that:
I would first need to query all users in my database with push notifications enabled.
loop over all of them
send an SQS message for each user.
let my SQS-triggered Lambda send the push notification
Is there a better way to implement this to avoid querying a big number of users and/or looping over all the results to send a SQS message for each?
I would take a slightly different approach here, but similar.
Query the database for the users
Loop over the users
Send one message to SQS for each batch of users, and use the SendMessageBatch operation of SQS to send them. So batches of batches: each message in a batch carries several "users" to send to, not just one. This should improve your performance because batching requires fewer Lambda invocations.
Lambda handles SQS messages (probably more than one), and each SQS message results in sending many push notifications. In the case of Firebase I believe there is a way to send batches, which is even better. Even without that you can send several messages at once using a Promise.all type logic.
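The sending side of these steps could look roughly like this in Python. The client is passed in, so anything exposing boto3's `send_message_batch(QueueUrl=..., Entries=...)` works. The `{"user_ids": [...]}` message body and the function names are assumptions for illustration. Note that SendMessageBatch accepts at most 10 entries per call.

```python
import json
from typing import Iterator, Sequence


def chunks(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_users(sqs_client, queue_url: str, user_ids: Sequence[str],
                  users_per_message: int = 100,
                  messages_per_call: int = 10) -> int:
    """Send user IDs to SQS; returns the number of SendMessageBatch calls made."""
    # Inner batch: many users packed into one message body (assumed format).
    messages = [json.dumps({"user_ids": list(group)})
                for group in chunks(user_ids, users_per_message)]
    calls = 0
    # Outer batch: up to 10 messages per SendMessageBatch call.
    for batch in chunks(messages, messages_per_call):
        entries = [{"Id": str(i), "MessageBody": body}
                   for i, body in enumerate(batch)]
        # Real code should also inspect the "Failed" list in the response.
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        calls += 1
    return calls
```

With a real queue you would pass `boto3.client("sqs")` as `sqs_client`. Keep an eye on the message size limit when choosing how many users go into one message.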
With this structure you can send a very large number of messages really quickly, and probably a lot cheaper. Imagine you need to send to 1M users. If you put 100 users in each message and send messages in batches of 10 (the maximum SendMessageBatch accepts), then each call to SQS covers 1,000 users. That would mean 1,000 calls to SQS, far better than the 100K you'd have to make if you sent single-user messages in batches of 10.
On the receiving side, even if you throttled the SQS integration to 1 message per invocation, you'd have 10,000 Lambda invocations. If you assume even 1s per invocation and 1,000 concurrent invocations, it would take 10 seconds (likely less). If you send one message per user, you'd have to make 1M Lambda invocations. If you assume each invocation takes 100ms, then each concurrent execution can handle 10 per second, so with 1,000 concurrent executions it would take 100 seconds. In reality the numbers are probably even better than that for the batch version, especially if you don't limit it to 1 message at a time.
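The receiving-side estimate can be reproduced with a few lines of Python. This is a simplified model that assumes invocations run in full "waves" of 1,000 and ignores cold starts and polling delays:

```python
import math


def drain_time_s(invocations: int, seconds_per_invocation: float,
                 concurrency: int) -> float:
    """Time to finish when `concurrency` invocations run at once, wave by wave."""
    waves = math.ceil(invocations / concurrency)
    return waves * seconds_per_invocation


users = 1_000_000

# 100 users per message, 1 message per invocation, 1 s each.
batched = drain_time_s(users // 100, 1.0, 1000)

# 1 user per message, 1 message per invocation, 100 ms each.
single = drain_time_s(users, 0.1, 1000)
```

This gives about 10 seconds for the batched version and about 100 seconds for one message per user, matching the figures above.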
Edit
Based on the comments the question seemed to be a bit more about the first part of the process. With that in mind I'd suggest the following options.
If you find yourself needing to address the same large groups repeatedly, most messaging services (Firebase and SNS for sure) support some sort of topic subscription model. Given that these are push notifications, you can subscribe a device to the topic in code. What this ultimately leads to is one message sent from your code to the messaging service. The service handles the rest. This is probably the preferred solution for anything that has mass recipients, especially if you can know the recipients up front. This even works for dynamic topics. For example, consider a situation where a person comments on a post. Any new comment on that post should send a message to everyone who has commented on that post. You can create a topic on the fly when the post is created, and add recipients to the topic as they comment. If a user wishes to stop receiving messages, you can remove the user from the topic.
If you don't know the recipients up front, the batching approach from before the edit is still a solid solution. However, if you are concerned about Lambda timeouts on the first two steps, I'd modify it slightly. I would take advantage of AWS Step Functions and page the data in the Lambda. Lambda will tell you, via the context object supplied in the invocation, how much time is remaining. You can check that periodically to determine whether you should exit the Lambda and pass the current paging information to the step function. The step function can pass that paging information back into the Lambda, which should be coded to accept the paging information as part of the request, and continue from that point if supplied.
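A rough Python sketch of that paging loop. `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object. The `fetch_page` and `process` callables, the `cursor` and `done` field names, and the 10-second safety margin are all assumptions for illustration.

```python
from typing import Callable, Optional

SAFETY_MARGIN_MS = 10_000  # assumed buffer; stop paging when less time remains


def make_paging_handler(
        fetch_page: Callable[[Optional[str]], tuple[list, Optional[str]]],
        process: Callable[[list], None]):
    """fetch_page(cursor) returns (users, next_cursor); both helpers are hypothetical."""

    def handler(event: dict, context) -> dict:
        # Paging information handed back in by the state machine, if any.
        cursor = event.get("cursor")
        while True:
            users, cursor = fetch_page(cursor)
            process(users)
            if cursor is None:
                return {"done": True, "cursor": None}
            if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
                # Exit early; the next invocation resumes from this cursor.
                return {"done": False, "cursor": cursor}

    return handler
```

In the state machine, a Choice state could loop back to the same task while `done` is false, feeding the returned `cursor` into the next invocation.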
I would suggest an additional piece in your application architecture. I personally prefer to avoid using the primary database for heavy querying, assuming you have a large user base.
I suggest maintaining your user list in a search engine like Elasticsearch or CloudSearch, or in a simple table holding just the user list in AWS DynamoDB, or creating a read replica of your DB.
To keep it simple: use a search engine (first choice) or AWS DynamoDB.
This avoids putting pressure on your primary database, because the queries go to the specialized read store, and it won't affect other modules in operation. It is also much faster to query this way.
Step 2: loop over all of them
Step 3: batch send messages to SQS using its SendMessageBatch method, as Jason suggests
Step 4: depending on your SQS settings, you may process multiple messages per invocation of your Lambda function
I have defined a Lambda function that is invoked from API Gateway with proxy integration. Thus, I have defined a greedy resource path for it:
And referenced my lambda function:
My Lambda is able to process requests such as GET /myresource and POST /myresource.
I have tried this strategy to keep it warm, described in acloudguru. It consists of setting up a CloudWatch event rule that invokes the lambda every 5 minutes to keep it warm. Unfortunately it isn't working.
This is the behaviour I have seen:
After some period, let's say 20 minutes, I call GET /myresource from API Gateway and it takes around 15 seconds. Subsequent requests last ~30ms. The CloudWatch event is making no difference...
Let's suppose another long period without calling the gateway. If I go to the Lambda console and invoke it directly (test button) it answers right away (less than 1ms) with a 404 (that's normal because my lambda expects GET /myresource or POST /myresource).
Immediately after this Lambda console execution I call GET /myresource from API Gateway and it still takes ~20 seconds. That is to say, the function was still cold despite having been invoked from the Lambda console. This might explain why the CloudWatch event doesn't work, since it calls the Lambda without setting the method/resource URL.
So, how can I make this particular case, API Gateway with proxy integration + Lambda, stay warm to prevent the slow first request?
As of now (2019-02-27) [1], a periodic CloudWatch event rule does not deterministically solve the cold start issue, but it will reduce the probability of cold starts.
The reason is that it's up to the Lambda service to decide whether to use a new Lambda container instead of an existing one to process an incoming request. Some of the details on how Lambda containers are reused are explained in [2].
To reduce the cold start time (as opposed to the number of cold starts), can you try the following? 1. Increase the memory allocated to the function, 2. reduce the deployment package size (e.g. remove unnecessary dependencies), and 3. use a language like Node.js or Python instead of Java or .NET.
[1] According to a re:Invent session (39:50 at https://www.youtube.com/watch?v=QdzV04T_kec), the Lambda team expects to improve the VPC cold start latency in Lambda.
[2] https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
Denis is quite right about the non-deterministic Lambda behaviour regarding the number of containers hit by CloudWatch events. I'll follow his advice to improve the startup time.
On the other hand I have managed to make my CloudWatch events hit the lambda function properly, reducing (in many cases) the number of cold starts.
I just had to add an additional controller mapped to "/" with a hardcoded response:
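// Assumes Micronaut (@Controller, @Get) with SLF4J logging.
import io.micronaut.http.annotation.Controller
import io.micronaut.http.annotation.Get
import org.slf4j.LoggerFactory

@Controller("/")
class WarmUpController {
    private val logger = LoggerFactory.getLogger(javaClass)

    // Answers the default "/" invocation with a fixed JSON body.
    @Get
    fun warmUp(): String {
        logger.info("Warming up")
        return """{"message" : "warming up"}"""
    }
}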
With this in place the default (/) invocation from CloudWatch does keep the container warm most of the time.
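For anyone handling the raw event instead of going through a web framework, a similar effect can be sketched in Python by recognising the scheduled event directly. This is not the controller approach above, just an illustration. It relies on scheduled CloudWatch events arriving with `"source": "aws.events"` and on API Gateway proxy events carrying `httpMethod` and `path`. The route check is a simplified stand-in for a real router.

```python
def is_scheduled_ping(event: dict) -> bool:
    """True for a CloudWatch Events scheduled invocation."""
    return event.get("source") == "aws.events"


def handler(event: dict, context: object = None) -> dict:
    if is_scheduled_ping(event):
        # No HTTP method or path in this event, so answer before any routing.
        return {"statusCode": 200, "body": '{"message": "warming up"}'}

    # API Gateway proxy events carry the HTTP method and path.
    method, path = event.get("httpMethod"), event.get("path")
    if path == "/myresource" and method in ("GET", "POST"):
        return {"statusCode": 200, "body": "{}"}
    return {"statusCode": 404, "body": "{}"}
```

As with the controller, this only keeps warm whichever container happens to receive the ping. It does not guarantee that the next real request lands on that container.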
I've scoured for an answer, but everything I've read is about concurrent Lambda executions and async keyword syntax in Node; I can't find information about Lambda instance execution.
The genesis of this was that I was at a meetup and someone mentioned that Lambda instances (i.e. an ephemeral container hosted by AWS containing my code) can only execute one request at a time. This means that if I had 5 requests come in (for the sake of simplicity let's say to an already warm instance) they would all run in a separate instance, i.e. in 5 separate containers.
The bananas thing to me is that this undermines years of development in async programming. Starting back in 2009, Node.js popularized programming with I/O in mind, given that for a boring run-of-the-mill CRUD app most of your request time is spent waiting on external DB calls or something. Writing async code allowed a single thread of execution to seemingly execute many simultaneous requests. While Node didn't invent it, I think it's fair to say it popularized it and has been a massive driver of backend technology development over the last decade. Many languages have added features to make async programming easier (callbacks/tasks/promises/futures or whatever you want to call them) and web servers have shifted to event-loop-based designs (node, vertx, kestrel etc.) away from the single-thread-per-request models of yesteryear.
Anyway, enough with the history lesson. My point is that if what I heard is true, then developing with Lambdas throws most of that out the window. If the Lambda runtime will never send multiple requests through my running instance, then programming in an async style will just waste resources. Say for example I'm talking C# and my Lambda is for retrieving widgets. Then this code var response = await db.GetWidgets() is actually inefficient, because it pushes the current thread context onto the stack so it can allow other code to execute while it waits for that call to come back. Since no other request will be invoked until the original one completes, it makes more sense to program in a synchronous style, save for places where parallel calls can be made.
Is this correct?
If so, I'm honestly shocked it's not discussed more. Async programming has been the paradigm shift I've seen in the last few years, and this totally changes that.
TL;DR: does Lambda really only allow one request execution at a time per instance? If so, this upends the major shift in server development towards asynchronous code.
Yes, you are correct - Lambda will spin up multiple containers to handle concurrent requests even if your Lambda does some work asynchronously (I have confirmed this through my own experience, and many other people have observed the same behavior - see this answer to a similar question). This is true for every supported runtime (C#, Node.js, etc).
This means that async code in your Lambda functions won't allow one Lambda container to handle multiple requests at once, as you stated. That being said, you still get all the other benefits of async code and you could still potentially improve your Lambda's performance by, for example, making many web service or database calls at once asynchronously - so this property of Lambda does not make async programming useless on the platform.
Your question is:
Since no other request will be invoked until the original one completes it makes more sense to program in a synchronous style save for places where parallel calls can be made.
No, because you no longer have to wait for the answer as you would with a synchronous process. The trigger itself should exit after the call so that it frees memory. Either the Lambda sends a notification or triggers a new service once it completes, or a watcher checks the result value (it is possible to wait for the answer with a synchronous Lambda, but it is not reliable because of the asynchronous process underlying the Lambda system itself). As an Android developer, you can compare this to intents and broadcasts, which are completely asynchronous.
It is a completely different way to design a solution, because the async mechanism must be managed in the workflow layer itself and no longer in the core of the app. The solution becomes an aggregation of notifiers/watchers that trigger microservices; it is no longer a single binary with thousands of lines of code.
Each Lambda function should be an individual microservice.
Coming back to handling heavy traffic: you can run millions of Lambdas in parallel, and as long as each microservice finishes quickly, it won't cost much.
To ensure that your workflow is not dropping anything, you can add SQS (queue messaging) in the solution.
Further to the above answer, please see here. From what I understand, it's a synchronous loop. So, the only way to make things async from a request-handling perspective is to delegate the work to a message queue, e.g. SQS, as written here. I think this is similar to how Celery is used to make Django asynchronous. Lastly, if you truly want async handling of requests in line with async/await in Node.js/Python/C#/C++, you may need to use AWS Fargate or EC2 instead of Lambda. Otherwise, in Lambda, as you mentioned yourself, it's bananas indeed. On the other hand, for heavy traffic, where async/await shows its benefits, Lambda is not a good fit. There is a break-even analysis here of the three services: EC2, Lambda and Fargate.
AWS Lambda functions are supposed to respond quickly to events. I would like to create a function that fires off a quick request to a slow API, and then terminates without waiting for a response. Later, when a response comes back, I would like a different Lambda function to handle the response. I know this sounds kind of crazy, when you think about what AWS would have to do to hang on to an open connection from one Lambda function and then send the response to another, but this seems to be very much in the spirit of how Lambda was designed to be used.
Ideas:
Send messages to an SQS queue that represent a request to be made. Have some kind of message/HTTP proxy type service on an EC2 / EB cluster listen to the queue and actually make the HTTP requests. It would put response objects on another queue, tagged to identify the associated request, if necessary. This feels like a lot of complexity for something that would be trivial for a traditional service.
Just live with it. Lambda functions are allowed to run for 60 seconds, and these API calls that I make don't generally take longer than 10 seconds. Not sure how costly it would be to have LFs spend 95% of their running time waiting on a response, but "waiting" isn't what LFs are for.
Don't use Lambda for anything that interacts with 3rd party APIs that aren't lightning fast :( That is what most of my projects do these days, though.
It depends on how many calls this Lambda will execute monthly, and how much memory you are allocating to it. The new timeout for Lambda is 5 minutes, which should (hopefully :p) be more than enough for an API to respond. I think you should let Lambda deal with all of it so you don't overcomplicate the workflow. Lambda pricing is generally really cheap.
E.g. a Lambda executed 1 million times with 128 MB allocated, running 10 seconds each time, would cost approximately $20, without considering the potential free tier.
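The estimate can be reproduced with a short Python calculation. The two rates below are assumptions based on commonly quoted on-demand prices. They vary by region and architecture, so check the current pricing page before relying on them.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed on-demand compute rate in USD
PRICE_PER_MILLION_REQUESTS = 0.20   # assumed request charge in USD


def monthly_cost(invocations: int, memory_mb: int, duration_s: float) -> float:
    """Compute charge (GB-seconds) plus request charge, ignoring the free tier."""
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


# 1 million invocations, 128 MB, 10 seconds each.
cost = monthly_cost(1_000_000, 128, 10)
```

With these rates the total comes to roughly $21, almost all of it compute time spent waiting.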