I'm using a Google Cloud Function to run a function off an HTTP trigger. The code itself is non-idempotent, which is expected and desired for us as it's using an external API.
However, quick and repeated triggers of the cloud function yields the same output repeatedly (generally while there is 1 active instance), and we need the output to be different every time the function is triggered.
Unsure if this is due to instance caching or something else, but is there any known workaround to ensure that every time the cloud function is HTTP triggered we get a new output, even when triggered seconds apart?
Thanks.
Related
AWS Lambda initialization issue:
We got an app (lambda based architecture) that uses an excel Spreadsheet (10MB size) which is loaded using apache POI XSSFWorkbook instance. Here what we have tried:
First we tried to load the spreadsheet at the init phase, but this was problematic since the startup of the lambda was taking long and generating the lambda to restart multiple times to finally get it up. (This is due to the init time limit of lambdas of 10 seconds. See this). Some times the response was just a timeout error (the api gateway limit).
We moved the load of the spreadsheet to the function so this was not longer part of the init phase. (Once the instance is created, that's the instance to be used while the lambda is up). This solved the time out error and the init restart. But the lambda takes close to 15 seconds (which is not acceptable) to return the response in the first call, then it behaves normally.
We tried provisioned concurrency, which allows to have lambdas 'warm' but this feature depends on the requests, so it works fine when lambda is frequently demanded, but when it is not, (two requests within 20 minutes or less), the lambda will need to go through the whole spreadsheet upload process (only init phase will keep warm) meaning, we will get again a 16 seconds request.
Is there any advice or any possible solution you can think of to manage this situation in order to never get this >16 secs request? (get the spreadsheet available in memory even if the lambda has not received requests).
Thanks in advance.
i have gone through the site but unable get the root cause of my issue.
we have a lambda that will run for every 50 seconds. the first run of lambda is a cold start. during the start all the necessary dependencies for the lambda are prepared ( all the interfaces ).. Lambda handler will have its own code to interact with SQS and SWF. during the first run from the cloud watch logs it is clear that it is reading the base file to get all the services. then lambda handler will start. from second run only lambda handler will get invoked after 50th second. So far everything is going smooth.
All of sudden we noticed the lambda took more than 50 seconds ( in general it finishes below 10s). log shows that lambda got timed out and freshly it started to initializing all the dependencies again.
This is not giving any clue to us as after the timeout the subsequent run works smooth. Its not good to see lambda timed out. Definitely lambda code is without errors.
Could this be any container issue? Does the container have any time period that it will keep data active till it reaches the expiry time out.
Can we able to access the container object to find out more information? we have 2 or more dev environments. this behavior is different for different environments. for some it happens for every 3 days. some time in a day it happens thrice.
if we want to understand the properties of the container object how can we do it? Is it a grey zone that only AWS can access it? Lambda code is written in c# using net core App 2.0. thought of checking the cloud trail log for this lambda during the invocation. there too i am not able to find the reason behind the timeout.
we have more than 20 lambda's for dev and 10 for test in each different regions. its not getting clear to us which lambda will time out.
Any suggestions or idea's will help me a lot???????
thankyou.
Lambda containers will not live indefinitely. If you are seeing occasional "cold starts" then that is normal behavior. If you're running only 1 invocation at a time (i.e. you only have a single lambda instance) you can still expect to see the container recycled every few hours. In general, I understand AWS is trying to give us fewer cold starts but you can still expect to get a new container and new cold start from time to time.
When I try to invoke a method that has a HTTP event it results in 500 Internal server error.
On CloudWatch logs it shows Recoverable error occurred (Rate Exceeded.)
When I try invoke a function without lambda it executes with response.
Here is my serverless config:
You have set your Lambda's reservedConcurrency to 0. This will prevent your Lambda from ever being invoked. Setting it to 0 is usually useful when your functions are getting invoked but you're not sure why and you want to stop it right away.
If you want to have it invoked, change reservedConcurrency to a positive integer (by default, it can be a positive integer <= 1000, but you can increase this limit by contacting AWS) or simply remove the reservedConcurrency attribute from your .yml file as it will use the default values.
Why would one ever use reservedConcurrency anyways? Well, let's say your Lambda functions are triggered by requests from API Gateway. Let's say you get 400 (peak hours) requests/second and, upon every request, two other Lambda functions are triggered, one to generate a thumbnail for a given image and one to insert some metadata in DynamoDB. You'd have, in theory, 1200 Lambda functions running at the same time (given all of your Lambda functions finish their execution in less than a second). This would lead to throttling as the default concurrent execution for Lambda functions is 1000. But is the thumbnail generation as important as the requests coming from API Gateway? Very likely not as it's naturally an eventually consistent task, so you could set reservedConcurrency on the thumbnail Lambda to only 200, so you wouldn't use up your concurrency, meaning other functions would be able to spin up to do something more useful at a given point in time (in our example, receiving HTTP requests is more important than generating thumbnails). The other 800 left concurrency could then be split between the function triggered from API Gateway and the one that inserts data into DynamoDB, thus preventing throttling for the important stuff and keeping the not-so-important-stuff eventually consistent.
I have defined a lambda function that is invoked from API Gateway with proxy integration. Thus, I have defined an eager resource path for it:
And referenced my lambda function:
My lambda is able to process request like GET /myresource, POST /myresource.
I have tried this strategy to keep it warm, described in acloudguru. It consists of setting up a CloudWatch event rule that invokes the lambda every 5 minutes to keep it warm. Unfortunately it isn't working.
This is the behaviour I have seen:
After some period, let's say 20 minutes, I call GET /myresource from API Gateway and it takes around 15 seconds. Subsequent requests last ~30ms. The CloudWatch event is making no difference...
Let's suppose another long period without calling the gateway. If I go to the Lambda console and invoke it directly (test button) it answers right away (less than 1ms) with a 404 (that's normal because my lambda expects GET /myresource or POST /myresource).
Immediately after this lambda console execution I call GET /myresource from API Gateway and it still takes ~20 seconds. That is to say, the function was still cold despite having being invoked from the Lambda console. This might explain why the CloudWatch event doesn't work since it calls the lambda without setting the method/resource-url.
So, how can I make this particular case with API Gateway with proxy integration + Lambda stay warm to prevent those slow first request?
As of now (2019-02-27) [1], A periodic CloudWatch event rule does not deterministically solve the cold start issue. But a periodic CloudWatch event rule will reduce the probability of cold starts.
The reason is it's upto the Lambda server to decide whether to use a new Lambda container instead of an existing container to process an incoming request. Some of the related details regarding how Lambda containers are reused is explained in [1]
In order to reduce the cold start time (not to reduce the number cold starts), can you try followings? 1. increasing the memory allocated to the function, 2. reduce the deployment package size (eg- remove unnecessary dependencies), and 3. use a language like NodeJS, Python instead of Java, .Net
[1]According to reinvent session, (39:50 at https://www.youtube.com/watch?v=QdzV04T_kec), the Lambda team expects to improve the VPC cold start latency in Lambda.
[2] https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
Denis is quite right about the non deterministic lambda behaviour regarding the number of containers hit by CloudWatch events. I'll follow his advice to improve the startup time.
On the other hand I have managed to make my CloudWatch events hit the lambda function properly, reducing (in many cases) the number of cold starts.
I just had to add an additional controller mapped to "/" with a hardcoded response:
#Controller("/")
class WarmUpController {
private val logger = LoggerFactory.getLogger(javaClass)
#Get
fun warmUp(): String {
logger.info("Warming up")
return """{"message" : "warming up"}"""
}
}
With this in place the default (/) invocation from CloudWatch does keep the container warm most of the time.
I trigger a lambda function via API gateway and everything works perfectly with the one exception that the very first time I trigger it on a given day it fails.
Strangely, the lambda function logs don't show any errors. I get my usual START log statement and then the request and context of the trigger, then after 5s, it ends unexpectedly.
When I look into the API gateway logs this is the error it returns:
Lambda execution failed with status 200 due to customer function error: 2018-12-10T11:00:31.208Z cc233168-fc9n-11fc-a05a-577bb4sd2b2ccc Task timed out after 5.01 seconds.
Has anyone encountered a similar problem? What is customer function error and how may I resolve this?
without knowing much of the background code you are using, i would termed this a Cold Start. Cold start happens for the first request where your function has not be called for a very long time. If you notice error message says "Time Out after 5.01 seconds. which is default set. you can increase a time out.
Alternatively, you could consider reducing the impact of cold starts by reducing the length of cold starts reference :
by authoring your Lambda functions in a language that doesn’t incur a high cold start time — i.e. Node.js, Python, or Go
choose a higher memory setting for functions on the critical path of handling user requests (i.e. anything that the user would have to wait for a response from, including intermediate APIs)
optimizing your function’s dependencies, and package size
You can also explore by putting a cron job through Cloud Watch after every specific interval to call your API through PING
Adding to Yash's answer:
I've only seen Lambda execution failed with status 200 in API Gateway execution logs, though in case it can manifest in other ways: ensure you have execution logging enabled for the endpoint. If you didn't already have it enabled you'll need to wait for the problem to manifest again.
You can verify it's a cold start problem as follows:
In the log entry with the error grab the #logStream value and the timestamp for the event; it'll be a long string of alphanumerics like a4f8115980dc83a511eeedc493a78741
Open the log group for that endpoint's execution log -> find the log stream with the identifier you just grabbed
Narrow the date/time range to a window around the time where the event occurred
If you chose a narrow window and if it's a cold start problem: I would expect the offending request to be the first one in the list. Click the There are older events to load. Load more. at the top of the list.
You should now see a gap of time between the last request received and the offending request.
In my case the error says connection reset by peer which leads me to think it's behaving as though a virtual machine were put to sleep then awoken in the sense that it believes TCP connections it previously had open are still valid.
In the short term the solution we're going with is to implement a retry strategy.
Besides the cold-start problem, there's another potential aspect of this problem: your API Gateway access log format.
Do the following:
Find the access log entries that correspond to the offending request in the execution log.
Is the HTTP status == 502?
502s in the API Gateway access log usually (always?) indicate the Lambda responded with malformed JSON.
The most obvious reason for it returning malformed JSON is a bug in your code. One of the less obvious reasons: a mistake in the access log format.
If you suspect that's the case, look for the following:
Quoted fields that shouldn't be; eg $context.error.messageString
Un-quoted fields that should be. A common idiom is to leave numeric fields un-quoted because it makes insights queries like this work: | filter #status >= 500. As convenient as that is, if the field isn't guaranteed to produce a numeric result then the JSON response will be malformed.
Trailing commas in {} bodies
Here's the documentation for many of the the context variables, though one thing to keep in mind: the context variables that are available differ between the different API Gateway endpoint types (lambda, websocket, etc).