I'm building the Alexa skill that sends the request to my web server,
then web server will do some process and upload a file to Amazon S3.
During the period of web server process, I make skill keep getting the file from Amazon S3 per 10 seconds till get the file. And the response is based on the file content.
But unfortunately, the web server process takes more than 1 minute. That means skill must stay more than 1 minute to get the file to response.
For now, I used progressive response with async await in my code,
and skill did keep waiting for the file on S3.
But I found that the skill will send the second request to Lambda after 50 seconds automatically. That means for the same skill, i got the two lambda function running at the same time.
And the execution result is : After the first response that progressive response made, 50 seconds later will hear another response that also made by the progressive response which belongs to the second request.
And nothing happened till the end.
I know it is bad to let skill waits this long, but i still want to figure out the executable way if skill needs to wait this long.
There are some points I want to figure out.
Is there anyway to prevent the skill to send the second
requests to Lambda?
Is there another way I can try to accomplish the goal?
Thanks
Eventually, I found that the second invoke of Lambda is not from Alexa, is from AWS Lambda itself. Refer to the following artical
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
So you have to deal with this kind of situation in your Lambda code. One thing can be used is these two times invoke's request id is the same. So you can tell if this is the first time execution by checking your storage for the same request id which you store at the first time execution.
Besides, I also found that once the Alexa Skill waits for more than 1 minutes, it will crash and return the error by speaking (test by Amazon Echo). And there is nothing different in the AWS Lambda log compare to the normal execution one. That meaning the Log seems to be fine but actually the execution result is not.
Hope this can help someone is also struggled at this problem.
Related
If I understood it well, Google Cloud Run will make an API publicly available. Once a request is received, an instance is started and the job is processed. Once the job done, the instance will be terminated. Is this right?
If so, I presume that Google determine when the instance should be shutdown when the HTTP response in sent back to the client. Is that also right?
In my case the process will run from 10 to 20 Minutes. Can I still send the HTTP response after so much time? Any Advice on how to implement that?
Frankly, all of this is well documented in the cloud run docs:
Somewhat, but this depends on how you configured your scaling https://cloud.google.com/run/docs/about-instance-autoscaling
Also see above, but a request is considered "done" when the HTTP connection is closed (either by you or the client), yes
60 mins is the limit, see:
https://cloud.google.com/run/docs/configuring/request-timeout
Any Advice on how to implement that?
You just keep the connection open for 20mins, but do note the remark on long living connections in the link above.
I have a simple problem. I want to create some kind of pipeline. First we have a function that does a request to some external resource (that answer takes a long time - sometimes is more than 60 seconds!). So first function just does a request and go down. And second function handles response of this function.
I currently try to run one lambda which do request, and second which handle response of this request.
I found a repo that has implementation of lambda messaging which is close to my idea https://github.com/pj0616/serverless-async-lambda-api. In this approach we dont wait for the results.
I was trying to run code from here too: https://docs.aws.amazon.com/lambda/latest/dg/nodejs-handler.html, but it doesnt wait asynchronously for response.
Is it possible to create this flow? Maybe with lambda or step functions?
Below I put a diagram that shows what I'm trying to do.
I want to change the default failure message in Alexa, Sorry, I'm having trouble accessing your {} skill right now.
You cannot change that prompt but you can code to avoid that as much as possible. The error happens when Alexa is not able to get a valid response from your skill endpoint. There can be multiple reasons to that as mentioned here
1. Your endpoint is giving an invalid response
This can be due to the errors/exceptions happening in your endpoint code. You can make sure that error/exceptions don't occur and if they occur, thre is code to catch them and provide a valid response back to Alexa, with an error message of your choice.
2. Your endpoint availability
Make sure that your endpoints are available all the time if you have configured them as an endpoint. This is pretty much guaranteed if you are using Lambda endpoints. But if you are your own hosted web service endpoint, then you must put in all the measures to keep it available for Alexa to communicate with it.
3. Your endpoint response time
Make sure that your endpoint gives back the response within the time period that Alexa expects it to get(guess its 10 seconds). Also make sure if you are using Lambda functions, you have configured them with reasonable execution time to avoid timeout errors.
If you cover the exception/error/availability scenarios well then you can avoid the default error message as much as possible.
We've got a little java scheduler running on AWS ECS. It's doing what cron used to do on our old monolith. it fires up (fargate) tasks in docker containers. We've got a task that runs every hour and it's quite important to us. I want to know if it crashes or fails to run for any reason (eg the java scheduler fails, or someone turns the task off).
I'm looking for a service that will alert me if it's not notified. I want to call the notification system every time the script runs successfully. Then if the alert system doesn't get the "OK" notification as expected, it shoots off an alert.
I figure this kind of service must exist, and I don't want to re-invent the wheel trying to build it myself. I guess my question is, what's it called? And where can I go to get that kind of thing? (we're using AWS obviously and we've got a pagerDuty account).
We use this approach for these types of problems. First, the task has to write a timestamp to a file in S3 or EFS. This file is the external evidence that the task ran to completion. Then you need an http based service that will read that file and calculate if the time stamp is valid ie has been updated in the last hour. This could be a simple php or nodejs script. This process is exposed to the public web eg https://example.com/heartbeat.php. This script returns a http response code of 200 if the timestamp file is present and valid, or a 500 if not. Then we use StatusCake to monitor the url, and notify us via its Pager Duty integration if there is an incident. We usually include a message in the response so a human can see the nature of the error.
This may seem tedious, but it is foolproof. Any failure anywhere along the line will be immediately notified. StatusCake has a great free service level. This approach can be used to monitor any critical task in same way. We've learned the hard way that critical cron type tasks and processes can fail for any number of reasons, and you want to know before it becomes customer critical. 24x7x365 monitoring of these types of tasks is necessary, and helps us sleep better at night.
Note: We always have a daily system test event that triggers a Pager Duty notification at 9am each day. For the truly paranoid, this assures that pager duty itself has not failed in some way eg misconfiguratiion etc. Our support team knows if they don't get a test alert each day, there is a problem in the notification system itself. The tech on duty has to awknowlege the incident as per SOP. If they do not awknowlege, then it escalates to the next tier, and we know we have to have a talk about response times. It keeps people on their toes. This is the final piece to insure you have robust monitoring infrastructure.
OpsGene has a heartbeat service which is basically a watch dog timer. You can configure it to call you if you don't ping them in x number of minutes.
Unfortunately I would not recommend them. I have been using them for 4 years and they have changed their account system twice and left my paid account orphaned silently. I have to find a new vendor as soon as I have some free time.
Since Api Gateway time limit is 10 seconds to execute any request I'm trying to deal with timeout errors, but a haven't found a way to catch and respond a custom message.
Context of the problem: I have a function that takes less than 2 seconds to execute, but when the function performs a cold start sometimes it takes more than 10 seconds creating a connection with DynamoDB in Java. I've already optimize my function using threads but I still cannot keep between the 10-seconds limit for the initial call.
I need to find a way to deliver a response model like this:
{
"error": "timeout"
}
To find a solution I created a function in Lambda that intentionally responds something after 10 seconds of execution. Doing the integration with Api Gateway I'm getting this response:
Request: /example/lazy
Status:
Latency: ms
Response Body
{
"logref": "********-****-****-****-1d49e75b73de",
"message": "Timeout waiting for endpoint response"
}
In documentation I found that you can catch this errors using HTTP status regex in Integration Response. But I haven't find a way to do so, and it seems that nobody on the Internet is having my same problem, as I haven't find this specific message in any forum.
I have tried with these regex:
.*"message".*
Timeout.*
.*"status":400.*
.*"status":404.*
.*"status":504.*
.*"status":500.*
Anybody knows witch regex I should use to capture this "message": "Timeout... ?
You are using Test Invoke feature from console which has a timeout limit of 10 seconds. But, the deployed API's timeout is 30 seconds as mentioned here. So, that should be good enough to handle Lambda cold start case. Please deploy and then test using the api link. If that times out because your endpoint takes more than 30 seconds, the response would be:
{"message": "Endpoint request timed out"}
To clarify, you can configure your method response based on the HTTP status code of integration response. But in case of timeout, there is no integration response. So, you cannot use that feature to configure the method response during timeout.
You can improve the cold start time by allocating more memory to your Lambda function. With the default 512MB, I am seeing cold start times of 8-9 seconds for functions written in Java. This improves to 2-3 seconds with 1536MB of memory.
Amazon says that it is the CPU allocation that is really important, but there is not way to directly increase it. CPU allocation increases proportionately to memory.
And if you want close to zero cold start times, keeping the function warm is the way to go, as described here.