I've scoured for an answer, but everything I've read is about concurrent Lambda executions or the async keyword syntax in Node; I can't find information about how a single Lambda instance handles execution.
The genesis of this was that I was at a meetup and someone mentioned that Lambda instances (i.e. an ephemeral container hosted by AWS containing my code) can only execute one request at a time. This means that if I had 5 requests come in (for the sake of simplicity let's say to already warm instances) they would each run in a separate instance, i.e. in 5 separate containers.
The bananas thing to me is that this undermines years of development in async programming. Starting back in 2009, Node.js popularized programming with I/O in mind, given that for a boring, run-of-the-mill CRUD app most of your request time is spent waiting on external DB calls or something. Writing async code allowed a single thread of execution to seemingly serve many simultaneous requests. While Node didn't invent it, I think it's fair to say it popularized it, and it has been a massive driver of backend technology development over the last decade. Many languages have added features to make async programming easier (callbacks/tasks/promises/futures or whatever you want to call them), and web servers have shifted to event-loop-based designs (Node, Vert.x, Kestrel, etc.) away from the thread-per-request models of yesteryear.
Anyway, enough with the history lesson. My point is that if what I heard is true, then developing with Lambdas throws most of that out the window. If the Lambda runtime will never send multiple requests through my running instance, then programming in an async style just wastes resources. Say, for example, I'm talking C# and my Lambda retrieves widgets. Then this code, var response = await db.GetWidgets(), is actually inefficient, because it pushes the current thread context onto the stack so that other code can execute while it waits for that call to come back. Since no other request will be invoked until the original one completes, it makes more sense to program in a synchronous style, save for places where parallel calls can be made.
Is this correct?
If so, I'm honestly shocked it's not discussed more. Async programming is the biggest paradigm shift I've seen in the last few years, and this totally changes that.
TL;DR: does Lambda really only allow one request execution at a time per instance? If so, this upends the major shift in server development towards asynchronous code.
Yes, you are correct - Lambda will spin up multiple containers to handle concurrent requests even if your Lambda does some work asynchronously (I have confirmed this through my own experience, and many other people have observed the same behavior - see this answer to a similar question). This is true for every supported runtime (C#, Node.js, etc).
This means that async code in your Lambda functions won't allow one Lambda container to handle multiple requests at once, as you stated. That being said, you still get all the other benefits of async code and you could still potentially improve your Lambda's performance by, for example, making many web service or database calls at once asynchronously - so this property of Lambda does not make async programming useless on the platform.
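For example, here's a minimal sketch in C# of the "many calls at once" case (IWidgetDb, GetWidgetsAsync, and GetGadgetsAsync are hypothetical stand-ins for whatever independent I/O your handler does):

```csharp
using System.Threading.Tasks;

// Hypothetical data-access layer, standing in for the question's db object.
public interface IWidgetDb
{
    Task<string[]> GetWidgetsAsync();
    Task<string[]> GetGadgetsAsync();
}

public class WidgetHandler
{
    private readonly IWidgetDb db;
    public WidgetHandler(IWidgetDb db) => this.db = db;

    public async Task<(string[] Widgets, string[] Gadgets)> LoadAsync()
    {
        // Start both I/O calls before awaiting either, so they overlap.
        Task<string[]> widgets = db.GetWidgetsAsync();
        Task<string[]> gadgets = db.GetGadgetsAsync();

        // Total wait is roughly the slower call, not the sum of both -
        // so async still pays off inside a single-request Lambda instance.
        await Task.WhenAll(widgets, gadgets);
        return (widgets.Result, gadgets.Result);
    }
}
```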
Your question is:
Since no other request will be invoked until the original one completes it makes more sense to program in a synchronous style save for places where parallel calls can be made.
No, because you no longer have to wait for the answer the way you would with a synchronous process. The trigger itself should die right after the call so its memory is freed. Either the Lambda sends a notification or triggers a new service once it has completed, or a watcher looks at the result value (it is possible to wait for the answer with a synchronous Lambda invocation, but it is not reliable, due to the asynchronous machinery underneath the Lambda system itself). If you're an Android developer, you can compare this to intents and broadcasts: it is completely async.
It is a completely different way to design a solution, because the async mechanism must be managed at the workflow layer itself and no longer in the core of the app. The solution becomes an aggregation of notifiers/watchers that trigger microservices; it is no longer a single binary of thousands of lines of code.
Each Lambda function should be an individual microservice.
Coming back to handling heavy traffic: you can run millions of Lambdas in parallel, and as long as each microservice finishes quickly, it won't cost much.
To ensure that your workflow is not dropping anything, you can add SQS (queue messaging) to the solution, as in the sketch below.
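A minimal sketch of that idea in C# (the queue URL and message shape are made up; this uses the AWSSDK.SQS package):

```csharp
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

public class WorkflowNotifier
{
    private static readonly AmazonSQSClient Sqs = new AmazonSQSClient();

    // Hypothetical queue URL - replace with your own.
    private const string QueueUrl =
        "https://sqs.us-east-1.amazonaws.com/123456789012/workflow-results";

    // Instead of blocking until the next step finishes, the Lambda just
    // drops a message on the queue and dies; another micro-service
    // (or another Lambda) picks up the work from there.
    public Task NotifyAsync(string resultJson) =>
        Sqs.SendMessageAsync(new SendMessageRequest
        {
            QueueUrl = QueueUrl,
            MessageBody = resultJson
        });
}
```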
Further to the above answer, please see here. From what I understand, it's a synchronous loop, so the only way to make things async from a request-handling perspective is to delegate the work to a message queue, e.g. SQS, as written here. I think this is similar to how Celery is used to make Django asynchronous. Lastly, if you truly want async handling of requests in line with async/await in Node.js/Python/C#/C++, you may need to use AWS Fargate / EC2 instead of Lambda. Otherwise in Lambda, as you have mentioned yourself, it's bananas indeed. On the other hand, for heavy traffic, which is where async/await shows its benefits, Lambda is not a good fit. There is a break-even analysis here about the three services: EC2, Lambda, and Fargate.
Related
I'm not sure if this would be better served on ServerFault or Software Engineering; I'm willing to move this post if appropriate.
We have somewhat recently started to move some of our data processing pipeline to use queues to manage individual bits of data, whereas previously we had timed lambdas that would pull all data since last change.
While making this change, we noticed that queues didn't work quite as we had anticipated. First of all, we thought Lambda would just pull items off the queue as the Lambdas had availability. Instead, it seems the AWS-managed Lambda trigger grabs a chunk of messages (up to ten) and throws it at the Lambda service. If Lambda doesn't have availability, the message gets throttled, then replayed after a backoff time, up to our configured replay "error" limit (five). After that, it's thrown into our dead-letter queue.
We see a handful of messages per day end up in the dead-letter queue as a result of throttling. We then throw these back into the main queue (we have a process that does so every few hours). However, we weren't 100% sure throttling was the reason for things being pushed over, since nothing indicates why the messages are moved - we just assumed as much because we weren't getting any error logs for those messages. We contacted Amazon support to ask about this, and they were able to confirm the messages were in fact "errored" as a result of throttling.
We asked further about their recommendations for this - it must be a common problem, right? They first suggested upping our replay limit, which seemed an obvious no-go: replays occur for any failure, so that would just hammer our Lambdas with bad requests when they came through. We also asked if there's any way to differentiate the errors, because we don't care about throttling - we'd happily let those retry a dozen times if needed - but no. The other suggestion they had was to manage the queue ourselves from our Lambdas: build our own code within our Lambdas to pull messages and then delete them after processing. This seems really counter-intuitive, though - why would every AWS consumer build their own infrastructure?
So I guess my question is: is this what others are doing? Are you using the built-in Lambda triggers? Are you creating your own code for managing queue consumption? Do you see these sorts of throttles, or is there anything we could do differently? Are there any differences with other services to manage this?
Best practice is to handle errors in your code and manually delete messages that have succeeded. That allows you to handle poison messages without reprocessing the good ones again. Throttles shouldn't be ending up in a DLQ that often. This video from re:Invent 2020 has a good explanation of how this works: Scalable serverless event-driven architectures with SNS, SQS & Lambda. Start at about the 20-minute mark to get into SQS error handling.
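A rough sketch of that pattern in C# (the queue URL is hypothetical; this uses the Amazon.Lambda.SQSEvents and AWSSDK.SQS packages): delete each message as it succeeds, then fail the invocation at the end if anything in the batch failed, so only the undeleted messages get redelivered.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;
using Amazon.SQS;

public class QueueConsumer
{
    private static readonly AmazonSQSClient Sqs = new AmazonSQSClient();

    // Hypothetical queue URL - replace with your own.
    private const string QueueUrl =
        "https://sqs.us-east-1.amazonaws.com/123456789012/data-pipeline";

    public async Task Handler(SQSEvent evt, ILambdaContext context)
    {
        var failures = 0;
        foreach (var record in evt.Records)
        {
            try
            {
                await ProcessAsync(record.Body);

                // Delete only the messages that succeeded, so a poison
                // message in the batch doesn't drag good ones back
                // through reprocessing on retry.
                await Sqs.DeleteMessageAsync(QueueUrl, record.ReceiptHandle);
            }
            catch (Exception e)
            {
                failures++;
                context.Logger.LogLine($"Failed {record.MessageId}: {e.Message}");
            }
        }

        // Throwing marks the invocation as failed, so the messages we did
        // NOT delete are redelivered (and eventually hit the DLQ); the
        // deleted ones are gone for good.
        if (failures > 0)
            throw new Exception($"{failures} message(s) failed; retrying those.");
    }

    private Task ProcessAsync(string body) => Task.CompletedTask; // your logic here
}
```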
I am using provisioned lambdas for running my application. The initialisation time for my function is quite high. I was wondering if there was a way to check at runtime if the lambda under consideration was a provisioned one or not. If it isn't a provisioned lambda I would like to handle the request in a very lean manner cutting out a lot of activities done during initialisation and during the actual request handling to provide a degraded experience.
Also, it seems unlikely, but could I essentially spin off a background thread during the initialisation phase to take care of the heavy activities, and have a flag in my code that indicates whether initialisation is complete? If it isn't when I start processing the request, I go ahead with the degraded experience; otherwise I fulfil the request normally. I am not sure how a background thread will behave in Lambda.
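For what it's worth, a sketch of both ideas in C#. Lambda sets an AWS_LAMBDA_INITIALIZATION_TYPE environment variable whose value is provisioned-concurrency for provisioned instances, so you can branch on that at runtime. The background-init part is just a Task started from static initialisation; note that on-demand environments are frozen between invocations, so that task may only make progress while a request is actually running.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Lambda.Core;

public class Function
{
    // "provisioned-concurrency" vs "on-demand"; set by the Lambda runtime.
    private static readonly bool IsProvisioned =
        Environment.GetEnvironmentVariable("AWS_LAMBDA_INITIALIZATION_TYPE")
            == "provisioned-concurrency";

    // Kick off the heavy initialisation without blocking the constructor.
    private static readonly Task HeavyInit = Task.Run(HeavyInitialisationAsync);

    public async Task<string> Handler(string input, ILambdaContext context)
    {
        if (HeavyInit.IsCompleted && !HeavyInit.IsFaulted)
        {
            await HeavyInit;                 // already finished; no wait
            return FullExperience(input);
        }

        // Init not done yet (likely a cold on-demand instance):
        // serve the lean/degraded path instead of blocking.
        context.Logger.LogLine($"Degraded path (provisioned={IsProvisioned})");
        return DegradedExperience(input);
    }

    // Placeholders for the real heavy init and the two request paths.
    private static Task HeavyInitialisationAsync() => Task.Delay(5000);
    private static string FullExperience(string input) => $"full: {input}";
    private static string DegradedExperience(string input) => $"lean: {input}";
}
```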
I'm trying to build a chatbot on AWS Lambda.
However, 90% of my Lambda duration is lost waiting on requests.
For each interaction a user has with my chatbot, I send approximately 3 requests (1 to Dialogflow and 2 to Messenger). I have to wait until those requests are completed because:
for Dialogflow, I need the answer
for Messenger, I need to make sure the previous message has been sent before sending the next one
Requests take approximately 400ms each, so for every API call to my Lambda function, I "lose" most of my duration time waiting...
Do you have any hints about how I can avoid waiting 4000ms each time?
Maybe I should move to a more common EC2 instance.
I was initially really interested in stateless Lambda because I thought it would make sense for a chatbot, but the more features I add to my project, the more problems I get (establishing the database connection takes really long...)
It sounds like you're mostly stuck. Maybe one thing you could do is try to make as many async calls as you can in parallel. It sounds like your flow is currently like this:
Event -> Dialogflow -> Messenger -> Messenger -> Finish
You could try and combine some of these calls and execute them in parallel:
Event -> Messenger -> Messenger -> Finish
     \-> Dialogflow ------------/
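A minimal sketch of that combined flow in C# (the endpoint URLs are placeholders, and I'm assuming the first Messenger call doesn't depend on Dialogflow's answer):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class ChatbotHandler
{
    private static readonly HttpClient Http = new HttpClient();

    public async Task HandleEventAsync(string userMessage)
    {
        // Hypothetical endpoints standing in for the real APIs.
        Task<string> dialogflow =
            Http.GetStringAsync("https://dialogflow.example/query?q=" + userMessage);
        Task<HttpResponseMessage> firstMessage =
            Http.PostAsync("https://messenger.example/send",
                           new StringContent("typing_on"));

        // Both requests are in flight at once, so the wait is ~400ms
        // total instead of ~800ms back to back.
        await Task.WhenAll(dialogflow, firstMessage);

        // The second Messenger message still has to wait for the first
        // (ordering requirement) and for Dialogflow's answer.
        await Http.PostAsync("https://messenger.example/send",
                             new StringContent(await dialogflow));
    }
}
```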
AWS Lambda may not be cost-effective in cases like that.
To optimize the cost you can consider:
Use async requests as much as possible.
Reduce the Lambda's memory size. This will also make it run slower, so the optimal value can usually be found by trial and error; in your case, reducing it to the minimum possible may be the best fit. Check this example.
Batch multiple events into a single invocation and process them asynchronously. For example, in your case you could aggregate multiple interactions from different users using services such as Kinesis Data Streams or SQS, handle them in the same invocation, and send a separate response for each of them (see the sketch after this list).
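A sketch of the batching idea (assuming an SQS event source carrying one user interaction per record; uses the Amazon.Lambda.SQSEvents package):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Amazon.Lambda.SQSEvents;

public class BatchChatHandler
{
    // One invocation receives a whole batch of interactions and works on
    // them concurrently, so fixed per-invocation overhead (cold start,
    // DB connection, etc.) is paid once per batch rather than per user.
    public Task Handler(SQSEvent evt) =>
        Task.WhenAll(evt.Records.Select(r => RespondToUserAsync(r.Body)));

    // Hypothetical per-interaction logic: call Dialogflow/Messenger and
    // send that user's response.
    private Task RespondToUserAsync(string interactionJson) => Task.CompletedTask;
}
```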
AWS Lambda functions are supposed to respond quickly to events. I would like to create a function that fires off a quick request to a slow API, and then terminates without waiting for a response. Later, when a response comes back, I would like a different Lambda function to handle the response. I know this sounds kind of crazy, when you think about what AWS would have to do to hang on to an open connection from one Lambda function and then send the response to another, but this seems to be very much in the spirit of how Lambda was designed to be used.
Ideas:
Send messages to an SQS queue that represent a request to be made. Have some kind of message/HTTP proxy service on an EC2 / EB cluster listen to the queue and actually make the HTTP requests. It would put response objects on another queue, tagged to identify the associated request if necessary (see the sketch after this list). This feels like a lot of complexity for something that would be trivial for a traditional service.
Just live with it. Lambda functions are allowed to run for 60 seconds, and the API calls I make don't generally take longer than 10 seconds. I'm not sure how costly it would be to have Lambda functions spend 95% of their running time waiting on a response, but "waiting" isn't what they're for.
Don't use Lambda for anything that interacts with 3rd party APIs that aren't lightning fast :( That is what most of my projects do these days, though.
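If you went the SQS-proxy route from the first idea, the message shapes could be as simple as this (a sketch; the names are made up), with a correlation id to tie responses back to requests:

```csharp
// A request the proxy service should perform on our behalf.
public class OutboundRequest
{
    public string CorrelationId { get; set; } // ties the response back to us
    public string Method { get; set; }        // e.g. "POST"
    public string Url { get; set; }
    public string Body { get; set; }
}

// What the proxy puts on the response queue when the slow API answers;
// the second Lambda function is triggered by this message.
public class InboundResponse
{
    public string CorrelationId { get; set; }
    public int StatusCode { get; set; }
    public string Body { get; set; }
}
```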
It depends on how many calls this Lambda will execute monthly and how much memory you allocate for it. The new timeout for Lambda is 5 minutes, which should (hopefully :p) be more than enough for an API to respond. I think you should let Lambda deal with all of it so as not to overcomplicate the workflow. Lambda pricing is generally really cheap.
E.g.: a Lambda executed 1 million times with 128 MB allocated, running 10 seconds per invocation, would cost approximately $20 - and this without considering the potential free tier.
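Back-of-the-envelope, at the published rate of roughly $0.0000166667 per GB-second plus $0.20 per million requests (rates change, so check current pricing):

```csharp
using System;

double gbSeconds   = 0.125 * 10 * 1_000_000;   // 128 MB x 10 s x 1M invocations
double computeCost = gbSeconds * 0.0000166667; // ~ $20.83 of compute
double requestCost = 0.20;                     // 1M requests at $0.20/million
Console.WriteLine($"~${computeCost + requestCost:F2}"); // ~ $21.03
```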
Lambda has some concurrency limits that, when hit, cause subsequent invocations to get throttled.
This makes sense, but is it possible to detect this situation ahead of time and start applying backpressure?
The problem is that (according to the docs) the concurrency limit is per-account, which means a single runaway microservice can block ALL unrelated services.
For example: a Lambda fn with an S3 event source could easily lead to API Gateway handlers being throttled and unhappy API users.
Is there any QoS for lambda functions? It'd be great to be able to give public-facing functions priority. (I know the answer is no, but I wish there were.)
Short of that, is it possible to detect that you're nearing this concurrency limit and build backpressure in?
I'm not seeing anything, and the only solution I can think of at the moment is to create a metric that watches for Throttles and, as soon as one happens, toggles some flag somewhere? That adds significant complexity, though...
Any ideas?
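For completeness, here's a rough sketch of that metric-watching idea in C# (the 80% threshold is arbitrary; uses the AWSSDK.Lambda and AWSSDK.CloudWatch packages): compare the recent account-level ConcurrentExecutions metric against the account limit and start shedding or delaying non-critical work as you approach it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;
using Amazon.Lambda;

public class BackpressureProbe
{
    private static readonly AmazonLambdaClient Lambda = new AmazonLambdaClient();
    private static readonly AmazonCloudWatchClient Cw = new AmazonCloudWatchClient();

    // Returns true when recent concurrency is close to the account limit,
    // so callers can start applying backpressure.
    public async Task<bool> NearLimitAsync()
    {
        var limit = (await Lambda.GetAccountSettingsAsync(
                new Amazon.Lambda.Model.GetAccountSettingsRequest()))
            .AccountLimit.ConcurrentExecutions;

        // Account-level concurrency over the last five minutes.
        var stats = await Cw.GetMetricStatisticsAsync(new GetMetricStatisticsRequest
        {
            Namespace = "AWS/Lambda",
            MetricName = "ConcurrentExecutions",
            StartTimeUtc = DateTime.UtcNow.AddMinutes(-5),
            EndTimeUtc = DateTime.UtcNow,
            Period = 60,
            Statistics = new List<string> { "Maximum" }
        });

        double recentMax = 0;
        foreach (var dp in stats.Datapoints)
            recentMax = Math.Max(recentMax, dp.Maximum);

        return recentMax > limit * 0.8; // 80% threshold is a made-up knob
    }
}
```

Note the trade-off this question already hints at: the metric lags reality, so by the time the probe fires you may have been throttling for a minute or more. It can soften the problem, but it can't prevent it.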