Debugging "read time out" for AWS Lambda function in Alexa Skill

I am using an AWS Lambda function to serve the Node.js codebase for an Alexa skill.
The skill makes external API calls to a custom API as well as the Amazon GameOn API; it also uses URLs that serve audio files and images from an S3 bucket.
The issue I am having is intermittent and affects about 20% of users. At random points in the skill, the user request will produce an invalid response from the skill, with the following error:
{
    "Request": {
        "type": "System.ExceptionEncountered",
        "requestId": "amzn1.echo-api.request.ab35c3f1-b8e6-4478-945c-16f644359556",
        "timestamp": "2020-05-16T19:54:24Z",
        "locale": "en-US",
        "error": {
            "type": "INVALID_RESPONSE",
            "message": "Read timed out for requestId amzn1.echo-api.request.323b1fbb-b4e8-4cdf-8f31-30c9b67e4a5d"
        },
        "cause": {
            "requestId": "amzn1.echo-api.request.323b1fbb-b4e8-4cdf-8f31-30c9b67e4a5d"
        }
    }
}
I have looked this issue up and believe something is wrong with the Lambda function configuration, but I can't figure out where!
I've tried increasing the memory the function uses (now 256 MB).
It should be noted that the function timeout is 8000ms, since this is the max time you are allowed for an Alexa response.
What causes this Read timeout issue, and what measures can I take to debug and resolve it?

Take a look at AWS X-Ray. By using it with your Lambda you should be able to identify the source of these timeouts.
The AWS documentation on instrumenting Lambda with X-Ray explains how to apply it.
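For context, a minimal sketch of what that instrumentation might look like in Node.js, assuming the aws-xray-sdk-core package is bundled with the deployment and active tracing is enabled on the function:
const AWSXRay = require('aws-xray-sdk-core');

// Wrap the https module so every outbound call (the custom API, GameOn,
// and the S3-hosted audio/image URLs) shows up as a timed subsegment.
AWSXRay.captureHTTPsGlobal(require('https'));
const https = require('https');

exports.handler = async (event) => {
  // ...existing skill logic; slow external calls will now be visible
  // in the X-Ray trace timeline and service map.
};
With that in place, the trace timeline should show which downstream call is eating the 8-second budget.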

We found that this occurred when the skill tried to access a resource stored on our Azure website.
The CPU and memory allocation for the Azure site was too low, and it would fail when facing a large number of requests.
To fix this, we upgraded the plan the App Service was running on.

Related

Agora cloud recording with Google cloud storage

I want to record my voice call with Agora cloud recording. I'm using a Postman collection provided by Agora, and I haven't changed anything except StorageConfig. I successfully get the resourceID and sid, but when I stop the recording I receive the error "Failed to find worker." According to the Agora documentation, for Google Cloud the region parameter has no effect whether it is set or not, so I'm setting it to 0. Here is a list of solutions I've tried with no success:
Changing the region
Making sure both users have different UIDs, and that the recording UID differs from both
Verifying the access key and secret key are correct
I don't know what I did wrong. Please help me. Thanks all.
{
    "resourceId": "nUwUbQf9Zg6tsgtLslGnDg0lk8RYaUE09pqOuSIgwfwi6-n9kITolzw3vvIFHMfm2VZsOrLd9fk9kMzos8Y_D-2Z2fFtUu_1BD2_pKJEZ-jTgXPe--K6Ua7TpSNY0pLd4zzyZV6iXCndqZvHmfsZloox0y-UZgs-r2_zBR2Gor05YCP0HuusWF8Kv1StAYabr1HJykw7RorDYnUIzzry6p6LRfvlq2zJVyVxvzVRVmoeMPYX-cVKyhNDuI2ct9a9aPdi8jCwDUzRbYimVVAnJBRYppTH012Xt6DnnMBkskJsbK0-CK3IaipQA9Gu2RmIJxSowuZbHspwA2lpwpzre-aNG6NlXk95hZgthOfNUVE",
    "sid": "edd35b65ec496e43aa502cad99bbdb27",
    "code": 404,
    "serverResponse": {
        "command": "StopCloudRecorder",
        "payload": {
            "message": "Failed to find worker."
        },
        "subscribeModeBitmask": 1,
        "vid": "1020399"
    }
}

GCP Snapshot create API failing through C++ code, but not through Postman

I've been trying to make API calls to create a snapshot of a GCP disk from my code, but it keeps giving me this error:
Failed to check snapshot status before creating GCPDisk swift-snapshot-gkeos-dkdwl-dynamic-pvc-1b13097f-7-1630579922. HTTP Error: GET request to the remote host failed [HTTP-Code: 404]: {
    "error": {
        "code": 404,
        "message": "The resource 'projects/rw-migration-dev/global/snapshots/swift-snapshot-gkeos-dkdwl-dynamic-pvc-1b13097f-7-1630579922' was not found",
        "errors": [
            {
                "message": "The resource 'projects/rw-migration-dev/global/snapshots/swift-snapshot-gkeos-dkdwl-dynamic-pvc-1b13097f-7-1630579922' was not found",
                "domain": "global",
                "reason": "notFound"
            }
        ]
    }
}
My program worked fine for a considerable amount of time, but now it sometimes gives errors.
I tried passing the same query through Postman and it works fine; sometimes it also works fine from the code.
The main problem is with the snapshot creation API:
https://compute.googleapis.com/compute/v1/projects/{projectName}/zones/{diskLocation}/disks/{diskName}/createSnapshot
This URL works fine in Postman; after creation you can see the snapshot when you list them. But when the same API is called from code, it returns 200 OK and no snapshot is created.
Can someone tell me why this is happening?
I think you are trying to use this operation:
https://cloud.google.com/compute/docs/reference/rest/v1/disks/createSnapshot
This is a long-running operation: a 200 response indicates that the snapshot operation has started, but not that it has finished.
The documentation points to:
https://cloud.google.com/compute/docs/api/how-tos/api-requests-responses#handling_api_responses
You may need to poll the operation until it completes before trying to use the snapshot.
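As a rough sketch of that polling step (shown in Node.js 18+ for brevity, though the same flow applies to a C++ client; the project, zone, operation name, and access token are placeholders):
const BASE = 'https://compute.googleapis.com/compute/v1';

// Poll the zonal operation returned by createSnapshot until it is DONE.
async function waitForOperation(project, zone, operationName, accessToken) {
  const url = `${BASE}/projects/${project}/zones/${zone}/operations/${operationName}`;
  for (;;) {
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    const op = await res.json();
    if (op.status === 'DONE') {
      // A DONE operation can still carry an error object.
      if (op.error) throw new Error(JSON.stringify(op.error));
      return op;
    }
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait, then re-poll
  }
}
Only after the operation reports DONE (with no error) should the snapshot appear in the list.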

GCP Stackdriver for OnPrem

From Stackdriver, I want to send notifications to my Centreon monitoring (based on Nagios) for workflow reasons. Do you have any idea how to do so?
Thank you
Stackdriver alerting supports webhook notifications, so you can run a server that forwards the notifications anywhere you need (including Centreon), and point a Stackdriver alerting notification channel at that server.
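A bare-bones sketch of such a forwarder, assuming Node.js with Express; the incident field names follow the webhook payload format, and forwardToCentreon is a placeholder to implement with whichever Centreon ingestion path you pick:
const express = require('express');
const app = express();
app.use(express.json());

// Stackdriver's webhook notification channel POSTs incident JSON here.
app.post('/stackdriver-webhook', (req, res) => {
  const incident = req.body.incident || {};
  forwardToCentreon({
    host: 'stackdriver',
    service: incident.policy_name || 'unknown',
    status: incident.state === 'open' ? '2' : '0', // CRITICAL vs OK
    output: incident.summary || 'Stackdriver notification',
  });
  res.sendStatus(200);
});

// Placeholder: implement with the Centreon REST submit endpoint
// described below, or an SNMP trap for the DSM route.
function forwardToCentreon(event) {
  console.log('would forward', event);
}

app.listen(8080);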
There are two ways to send external information into the Centreon queue without a traditional passive agent mode.
First, you can use the Centreon DSM (Dynamic Services Management) addon.
It is interesting because you don't have to register a dedicated, already-known service in your configuration to match the notification.
With Centreon DSM, Centreon can receive events such as SNMP traps resulting from the detection of a problem and assign the event dynamically to a slot defined in Centreon, like a trap event.
A resource has a set number of "slots" to which alerts are assigned (stored). Until an event has been taken into account by human action, it remains visible in the Centreon web frontend. When the event is acknowledged, the slot becomes available for new events.
The event must be transmitted to the server via an SNMP Trap.
All the configuration is made through Centreon web interface after the module installation.
Complete explanations, screenshots, and tips are described on the online documentation: https://documentation.centreon.com/docs/centreon-dsm/en/latest/user.html
Secondly, the Centreon developers added a Centreon REST API you can use to submit information to the monitoring engine.
This feature is easier to use than the SNMP trap approach.
In that case, you have to create both host and service objects before any API use.
To send a status, use the following URL with the POST method:
api.domain.tld/centreon/api/index.php?action=submit&object=centreon_submit_results
Headers:
Content-Type: application/json
centreon-auth-token: the value of authToken you got in the authentication response
Example of a service status submission: the body is JSON with the parameters described above, formatted as below:
{
    "results": [
        {
            "updatetime": "1528884076",
            "host": "Centreon-Central",
            "service": "Memory",
            "status": "2",
            "output": "The service is in CRITICAL state",
            "perfdata": "perf=20"
        },
        {
            "updatetime": "1528884076",
            "host": "Centreon-Central",
            "service": "fake-service",
            "status": "1",
            "output": "The service is in WARNING state",
            "perfdata": "perf=10"
        }
    ]
}
Example of the response body: the response is JSON with an HTTP return code and a message for each submission:
{
    "results": [
        {
            "code": 202,
            "message": "The status send to the engine"
        },
        {
            "code": 404,
            "message": "The service is not present."
        }
    ]
}
More information is available in the online documentation: https://documentation.centreon.com/docs/centreon/en/19.04/api/api_rest/index.html
The Centreon REST API also lets you get real-time status for hosts and services, and manage object configuration.
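To make the flow concrete, here is a hedged end-to-end sketch in Node.js (18+, for global fetch): authenticate, then submit a result with the body format shown above. The base URL, username, and password are placeholders:
const BASE = 'https://api.domain.tld/centreon/api/index.php';

async function submitStatus() {
  // 1. Authenticate to obtain the authToken used by later calls.
  const auth = await fetch(`${BASE}?action=authenticate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ username: 'api-user', password: 'secret' }),
  }).then((r) => r.json());

  // 2. Submit a service status using the body format shown above.
  const result = await fetch(`${BASE}?action=submit&object=centreon_submit_results`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'centreon-auth-token': auth.authToken,
    },
    body: JSON.stringify({
      results: [
        {
          updatetime: String(Math.floor(Date.now() / 1000)),
          host: 'Centreon-Central',
          service: 'Memory',
          status: '2',
          output: 'The service is in CRITICAL state',
          perfdata: 'perf=20',
        },
      ],
    }),
  }).then((r) => r.json());

  console.log(result); // expect per-entry codes such as 202 or 404
}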

Cognito User Migration Trigger - Exception during user migration - Exception Location

We're using a Lambda function to respond to the 'User Migration' trigger in AWS Cognito. When something like a syntax error occurs, you can see it in the CloudWatch logs. However, the "Exception during user migration" errors seen on the login page are nowhere to be found in the CloudWatch logs.
Where are we supposed to look for these? I can't find anything in the documentation and assumed they would go to CloudWatch.
I can't test it in the Lambda interface because one of the parameters passed into the function has a function nested within the object, and I can't create a test JSON setup that reproduces that. There's also no pre-built test trigger for user migration.
Any ideas as to why I can't see this in cloud watch or where the exceptions would be shown would be greatly appreciated.
Unfortunately, Cognito doesn't expose any logs (or metrics, for that matter!).
The closest you can get is to view the Lambda's logs in CloudWatch. If you log your response and watch your Lambda's error metric, you should mostly be able to debug issues internal to the Lambda.
This does leave a few edge cases:
You won't see anything if the Lambda can't be invoked (this would only happen under heavy concurrent load, either on that single Lambda or on all Lambdas across your account).
If you return a bad response, the Lambda will succeed but the trigger action will fail, and Cognito will give you a fairly generic message. At this point you're at the mercy of AWS's documentation to work out what's wrong (which can be a bit hit and miss, although Stack Overflow always helps!).
You can find an example payload for the lambda in the trigger documentation:
{
    "userName": "THE USERNAME",
    "request": {
        "password": "THE PASSWORD"
    },
    "response": {
        // it is your responsibility to fill this bit in and return the completed object back:
        "userAttributes": {
            "string": "string",
            ...
        },
        "finalUserStatus": "string",
        "messageAction": "string",
        "desiredDeliveryMediums": [ "string", ... ],
        "forceAliasCreation": boolean
    }
}
N.B. As an aside, which you may already know: Lambda payloads always have to be JSON, and JSON cannot store functions. So you should always be able to derive a test payload to use in the console.
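As a sketch of the log-your-response approach, where authenticateAgainstLegacyStore is a hypothetical helper and the response fields mirror the payload shape above:
// Sketch of a User Migration handler that logs its response before
// returning, so failures surface in CloudWatch next to the generic
// "Exception during user migration" message shown at login.
exports.handler = async (event) => {
  console.log('trigger:', event.triggerSource);

  if (event.triggerSource === 'UserMigration_Authentication') {
    const user = await authenticateAgainstLegacyStore(
      event.userName,
      event.request.password
    );
    event.response.userAttributes = {
      email: user.email,
      email_verified: 'true',
    };
    event.response.finalUserStatus = 'CONFIRMED';
    event.response.messageAction = 'SUPPRESS';
  }

  // Logging the completed response makes bad-response failures easy
  // to correlate with the Lambda's CloudWatch logs.
  console.log('response:', JSON.stringify(event.response));
  return event;
};

// Hypothetical helper: replace with a real lookup against your
// legacy user store.
async function authenticateAgainstLegacyStore(userName, password) {
  return { email: `${userName}@example.com` };
}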

Lambda function not working upon Alexa Skill invocation

I've just created my first (custom) skill. I set the function up in Lambda by uploading a zip file containing my index.js and all the necessary code, including node_modules and the base Alexa skill that mine is a child of (as per the tutorials). I made sure I zipped up the files and sub-folders, not the folder itself (as I can see this is a common cause of similar errors), but when I create the skill and test it in the web harness with a sample utterance I get:
remote endpoint could not be called, or the response it returned was
invalid.
I'm not sure how to debug this as there's nothing logged in CloudWatch.
I can see in the Lambda request that my slot value is translated/parsed successfully and the intent name is correct.
In AWS Lambda I can invoke the function successfully both with a LaunchRequest and another named intent. From the developer console though, I get nothing.
I've tried copying the JSON from the Lambda test (that works) into the developer portal, and I get the same error. Here is a sample of the JSON I'm putting in the dev portal (that works in Lambda):
{
    "session": {
        "new": true,
        "sessionId": "session1234",
        "attributes": {},
        "user": {
            "userId": null
        },
        "application": {
            "applicationId": "amzn1.echo-sdk-ams.app.149e75a3-9a64-4224-8bcq-30666e8fd464"
        }
    },
    "version": "1.0",
    "request": {
        "type": "LaunchRequest",
        "requestId": "request5678"
    }
}
The first step in pursuing this problem is probably to test your lambda separate from your skill configuration.
When looking at your Lambda function in the AWS console, note the 'Test' button at the top; next to it there is a drop-down with an option to configure a test event. If you select that option, you will find there are preset test events for Alexa. Choose 'Alexa Start Session' and then choose the 'Save and test' button.
This will give you more detailed feedback about the execution of your lambda.
If your Lambda works fine here, then the problem probably lies in your skill configuration, so I would go back through whatever tutorial and documentation you were using to configure your skill and make sure you did it right.
When you write that the Lambda request looks fine, I assume you are talking about the service simulator, so that's a good start, but there could still be a problem on the configuration tab.
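One more thing worth checking while you're in the simulator: compare what your code returns against the minimal response envelope Alexa expects. A hand-rolled sketch of a valid LaunchRequest response (the speech text is a placeholder):
// Minimal shape of a valid Alexa response envelope; anything that
// deviates from this structure tends to produce the "response it
// returned was invalid" message in the test harness.
exports.handler = async (event) => {
  if (event.request.type === 'LaunchRequest') {
    return {
      version: '1.0',
      sessionAttributes: {},
      response: {
        outputSpeech: {
          type: 'PlainText',
          text: 'Welcome to the skill.', // placeholder speech
        },
        shouldEndSession: false,
      },
    };
  }
};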
We built a tool for local skill development and testing: BST Tools.
Requests and responses from Alexa will be sent directly to your local server, so that you can quickly code and debug without having to do any deployments. I have found this to be very useful for our own development.
Let me know if you have any questions.
It's open source: https://github.com/bespoken/bst