So my problem is that DynamoDB is taking quite some time to return a single object. I'm using Node.js and the AWS DocumentClient. The weird thing is that it takes from 100ms to 200ms to "select" a single item from the DB.
Is there any way to make it faster?
Example code:
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

console.time("user get");
var params = {
    TableName: 'User',
    Key: {
        "id": "2f34rf23-4523452-345234"
    }
};
docClient.get(params, function(err, data) {
    if (err) {
        callback(err);
    }
    else {
        console.timeEnd("user get");
    }
});
And the average for this simple piece of code in Lambda is 130ms. Any idea what I could do to make it faster? The User table has only a primary partition key "id" and a global secondary index with primary key "email". When I try this from my console it takes even more time.
Any help will be much appreciated!
I faced exactly the same issue using Lambda@Edge. Responses from DynamoDB took 130-140ms on average while the DynamoDB latency graph showed 10-20ms latency.
I managed to improve response times to ~30ms on average by disabling SSL, parameter validation, and convertResponseTypes:
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    sslEnabled: false,
    paramValidation: false,
    convertResponseTypes: false
});
Most likely the cause of the issue was CPU/network throttling in the Lambda itself. A Lambda@Edge viewer-request function can have at most 128 MB of memory, which makes for a pretty slow Lambda, so disabling the extra checks and SSL validation made things a lot faster.
If you are running just a regular Lambda, increasing memory should fix the issue.
Have you warmed up your Lambda function? If you are only running it ad-hoc, and not running a continuous load, the function might not be available yet on the container running it, so additional time might be taken there. One way to support or refute this theory would be to look at latency metrics for the GetItem API. Finally, you could try using AWS X-Ray to find other spots of latency in your stack.
The DynamoDB SDK could also be retrying, adding to your perceived latency in the Lambda function. Given that your items are around 10 KB, it is possible you are getting throttled. Have you provisioned enough read capacity? You can verify both your read latency and read throttling metrics in the DynamoDB console for your table.
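If retries turn out to be the culprit, here is a minimal sketch (assuming the JavaScript SDK v2; the values are illustrative, not recommendations) of capping retries and HTTP timeouts so slow calls fail fast instead of silently inflating latency:

const AWS = require("aws-sdk");

// Cap retries and tighten HTTP timeouts so retry-induced latency shows up
// as errors rather than quietly adding 100ms+ to the response time.
const docClient = new AWS.DynamoDB.DocumentClient({
  maxRetries: 2,                     // fail fast instead of retrying many times
  retryDelayOptions: { base: 50 },   // shorter base delay between retries (ms)
  httpOptions: {
    connectTimeout: 1000,            // time allowed to establish the connection (ms)
    timeout: 2000                    // time allowed for the whole request (ms)
  }
});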
I know this is a little old, but for anyone finding this question now: instantiation of the client can be extremely slow. Local testing was fast, yet accessing DynamoDB from an Elastic Beanstalk instance in the same region was extremely slow!
Accessing DynamoDB from a single, reused client instance improved the speeds significantly.
Reusing the connection helped speed up my calls from ~120ms to ~35ms.
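For reference, a minimal sketch of that pattern in a Lambda handler (table and key names are placeholders taken from the question): create the client once at module scope so warm invocations reuse it instead of constructing a new client inside the handler.

const AWS = require("aws-sdk");

// Created once per container, outside the handler, so warm invocations reuse it.
const docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const data = await docClient.get({
    TableName: "User",            // table name from the question above
    Key: { id: event.id }
  }).promise();
  return data.Item;
};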
Reusing Connections with Keep-Alive in Node.js
By default, the Node.js HTTP/HTTPS agent creates a new TCP connection for every new request. To avoid the cost of establishing a new connection, you can reuse an existing connection.
For short-lived operations, such as DynamoDB queries, the latency overhead of setting up a TCP connection might be greater than the operation itself. Additionally, since DynamoDB encryption at rest is integrated with AWS KMS, you may experience latencies from the database having to re-establish new AWS KMS cache entries for each operation.
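A sketch of enabling keep-alive with the JavaScript SDK v2; alternatively, setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 environment variable achieves the same without code changes:

const https = require("https");
const AWS = require("aws-sdk");

// Reuse TCP connections across requests instead of opening a new one each time.
const agent = new https.Agent({ keepAlive: true });

const docClient = new AWS.DynamoDB.DocumentClient({
  httpOptions: { agent }
});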
I am attempting to introduce DAX into our architecture, but so far with no success. The connection to DAX happens through Lambdas, and the setup follows the examples in the AWS documentation. Lambda and DAX are in the same VPC, they can see each other most of the time, and DAX is returning responses. DAX also has port 8111 open.
However, after running our regression tests a few times, errors start popping up in CloudWatch. The most frequent ones are:
"Failed to pull from [daxurlhere] (10.0.1.177,10.0.1.25,10.0.2.11):
TimeoutError: Connection timeout after 10000ms"
Error: NoRouteException: not able to resolve address:
[{"host":"[daxurlhere]","port":8111}]
ERROR caught exception during cluster refresh: DaxClientError:
NoRouteException: not able to resolve address:[{"host":"[daxurlhere]","port":8111}]
ERROR Failed to resolve [daxurl]: Error: queryA ECONNREFUSED [daxurl]
When those errors happen they break a few of our regression tests. The funny thing is that they are not persistent, which makes the issue very hard to track down.
Any suggestions would be more than welcome!
Your configuration seems fine. Check the steps below:
1. Make sure you are not doing strongly consistent reads
From the AWS doc:
DAX can't serve strongly consistent reads by itself because it's not tightly coupled to DynamoDB. For this reason, any subsequent reads from DAX would have to be eventually consistent reads
For example, see how this code results in requests that can never be served from the cache and makes the connection unstable:
const AWS = require('aws-sdk');               // assuming aws-sdk v2
const AmazonDaxClient = require('amazon-dax-client');

const parameters = {
    TableName: 'Travels',
    ConsistentRead: false,
    ExpressionAttributeNames: {
        '#createdAt': 'createdAt',
    },
    ExpressionAttributeValues: {
        ':createdAt': Date.now(), // <-- look at this
    },
    KeyConditionExpression: '#createdAt >= :createdAt',
};

const endpoint = DAX_CLUSTER_ENDPOINT;
const daxService = new AmazonDaxClient({ endpoints: [endpoint], region });
const daxClient = new AWS.DynamoDB.DocumentClient({ service: daxService });
response = await daxClient.query(parameters).promise();
Date.now() won't generate the same value every time. If a request does not exactly match a previous request, it won't be a cache hit. Check the parameters on your large requests as well, such as Limit, ProjectionExpression, and ExclusiveStartKey.
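One possible way to make such queries cache-friendly (an illustrative sketch, not from the original code; the window size is arbitrary) is to round the timestamp so that all requests within the same window send identical parameters:

// Round the timestamp down to the nearest minute so that all queries issued
// within the same minute carry identical parameters and can hit the DAX cache.
const WINDOW_MS = 60 * 1000;
const roundedNow = Math.floor(Date.now() / WINDOW_MS) * WINDOW_MS;

const parameters = {
  TableName: 'Travels',
  ConsistentRead: false,
  ExpressionAttributeNames: { '#createdAt': 'createdAt' },
  ExpressionAttributeValues: { ':createdAt': roundedNow },
  KeyConditionExpression: '#createdAt >= :createdAt',
};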
2. Check the cluster's monitoring in CloudWatch (query/scan cache hits) to confirm the cluster is caching the data.
3. Other helpful links:
https://forums.aws.amazon.com/thread.jspa?messageID=896762
AWS DAX cluster has zero cache hits and cache miss
Be aware that although DAX distributes reads among the nodes in the cluster, all writes go through the primary node. We have seen cascading failover of nodes during write-intensive periods: the primary node gets overwhelmed and reboots, another node becomes primary and reboots in turn, and so on.
Considering this code:
QuerySpec spec = new QuerySpec()
        .withKeyConditionExpression("#1 = :v1")
        .withNameMap(new NameMap().with("#1", "tableKey"))
        .withValueMap(new ValueMap().withString(":v1", "none.json"));

// connect DynamoDB instance over AWS
DynamoDB dynamoDB = new DynamoDB(Regions.US_WEST_2);

// get the table instance
String tableName = "WFMHistoricalProcessedFiles";
Table table = dynamoDB.getTable(tableName);

ItemCollection<QueryOutcome> items = table.query(spec);

// iterating over the results
Iterator<Item> it = items.iterator();
Item item = null;
while (it.hasNext()) {
    item = it.next();
    System.out.println(item.toJSONPretty());
}
When using DynamoDB to make any Query or Scan like in the example above, is there an actual need to call shutdown() in order to close the connection?
The documentation seems pretty clear.
shutdown
void shutdown()
Shuts down this client object, releasing any resources that might be held open. This is an optional method, and callers are not expected to call it, but can if they want to explicitly release any open resources. Once a client has been shutdown, it should not be used to make any more requests.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#shutdown--
But to clarify what you specifically asked about:
in order to close the connection?
There is not exactly a "connection" to DynamoDB. It's accessed over HTTPS, statelessly, when requests are sent... so where your code says // connect DynamoDB instance over AWS, that really isn't accurate. You're constructing an object that will not actually connect until around the time you call table.query().
That connection might later be kept-alive for a short time for reuse, but even if true, it isn't "connected to" DynamoDB in any meaningful sense. At best, it's connected to a front-end system inside AWS that is watching for the next request and will potentially forward that request to DynamoDB if it's syntactically valid and authorized.
But this idle connection, if it exists, isn't consuming any DynamoDB resources in a way that should degrade performance of your application or others accessing the same DynamoDB table.
Good practice, of course, suggests that if you have the option of cleaning something up, it's potentially a good idea to do so, but it seems clearly optional.
As I am aware of the limitations listed here, I need some clarification on the quota limits.
I'm using the Node.js library to make a simple asynchronous speech-to-text API call, using a .raw file which is stored in my bucket.
After the request is done, when I check the API Manager traffic, the daily requests counter has increased by 50 to 100 requests.
I am not using any kind of request libraries or other frameworks, just the code from the gcloud docs.
var file = URL.bucket + "audio.raw"; // require from upload.

speech.startRecognition(file, config).then((data) => {
    var operation = data[0];
    operation.on('complete', function(transcript) {
        console.log(transcript);
    });
});
I believe this has to do with the operation.on call, which registers a listener for the operation to complete, but continues to poll the service until it does finally finish.
I think, based on looking at some code, that you can change the interval at which the service will poll using the longrunning.initialRetryDelayMillis setting, which should reduce the number of requests you see in your quota consumption.
Some places to look:
Speech client constructor: https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/speech/src/index.js#L70
GAX Speech Client's constructor: https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/speech/src/v1/speech_client.js#L67
Operations client: https://github.com/googleapis/gax-nodejs/blob/master/lib/operations_client.js#L74
GAX's Operation: https://github.com/googleapis/gax-nodejs/blob/master/lib/longrunning.js#L304
I have some issues with API Gateway. I made a few API methods; sometimes they run longer than 10 seconds and Amazon returns a 504 error. Here is a screenshot below:
Please help! How can I increase the timeout?
Thanks!
Right now the default limit for Lambda invocation or HTTP integration is 30s according to http://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html and this limit is not configurable.
As of Dec 2017, the maximum value is still 29 seconds, but you can now customize the timeout value (up to that limit):
https://aws.amazon.com/about-aws/whats-new/2017/11/customize-integration-timeouts-in-amazon-api-gateway/
This can be set in "Integration Request" of each method in APIGateway.
Finally, in 2022 we have a workaround. Unfortunately AWS did not change API Gateway, so that's still 29 seconds, but you can use a built-in HTTPS endpoint on the Lambda itself: Built-in HTTPS Endpoints for Single-Function Microservices
which is confirmed to have no timeout, so essentially you can have the full 15-minute window of the Lambda timeout: https://twitter.com/alex_casalboni/status/1511973229740666883
For example, this is how you define a function with an HTTP endpoint using aws-cdk and TypeScript:
// assuming aws-cdk-lib v2
import * as path from 'path';
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Architecture } from 'aws-cdk-lib/aws-lambda';

const backendApi = new lambda.Function(this, 'backend-api', {
  memorySize: 512,
  timeout: cdk.Duration.seconds(40),
  runtime: lambda.Runtime.NODEJS_16_X,
  architecture: Architecture.ARM_64,
  handler: 'lambda.handler',
  code: lambda.Code.fromAsset(path.join(__dirname, '../dist')),
  environment: {
    ...parsedDotenv
  }
})

backendApi.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.NONE,
  cors: {
    // Restrict this to the sites that should call the function,
    // or use ['*'] to allow all domains.
    allowedOrigins: ['*']
  }
})
You can't increase the timeout, at least not now. Your endpoints must complete in 10 seconds or less. You need to work on improving the speed of your endpoints.
http://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html
Lambda functions will time out after a maximum of 5 minutes; API Gateway requests will time out after 29 seconds. You can't change that, but you can work around it with an asynchronous execution pattern, which I wrote a blog post about:
https://joarleymoraes.com/serverless-long-running-http-requests/
I wanted to comment on joarleymoraes' post but don't have enough reputation. The only thing to add is that you don't HAVE to refactor to use async; it just depends on your backend, how you can split it up, and your client-side retries.
If you aren't seeing a high percentage of 504s and you aren't ready for async processing, you can implement client-side retries with exponential backoff so they aren't permanent failures.
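For illustration, a rough sketch of client-side retries with exponential backoff and jitter; callBackend is a placeholder for whatever HTTP call your client actually makes:

// Retry 5xx responses (including API Gateway 504s) with exponential backoff.
async function withRetries(callBackend, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await callBackend();
    if (response.status < 500) {
      return response;                 // success or client error: don't retry
    }
    if (attempt < maxAttempts - 1) {
      const delay = Math.random() * 200 * Math.pow(2, attempt);  // jittered backoff
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error("Request failed after " + maxAttempts + " attempts");
}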
The AWS SDK automatically implements retries with backoff, so it can help to make it easier, especially since Lambda Layers will allow you to maintain the SDK for your functions without having to constantly update your deployment packages.
Once you do that, it will result in less visibility into those timeouts, since they are no longer permanent failures. This can buy you some time to deal with the core problem, which is that you are seeing 504s in the first place. That can certainly mean refactoring your code to be more responsive, splitting up large functions into more "microservice"-type concepts, and reducing external network calls.
The other benefit to retries is that if you retry all 5xx responses from an application, it can cover a lot of different issues which you might see during normal execution. It is generally considered in all applications that these issues are never 100% avoidable so it's best practice to go ahead and plan for the worst!
All of that being said, you should still work on reducing the lambda execution time or going async. This will allow you to set your timeout values to a much smaller number, which allows you to fail faster. This helps a lot for reducing the impact on the front end, since it doesn't have to wait 29 seconds to retry a failed request.
The timeout can be decreased but cannot be increased beyond 29 seconds. The backend of your method should return a response within 29 seconds, otherwise API Gateway will throw a 504 timeout error.
Alternatively, as suggested in some answers above, you can change the backend to send status code 202 (Accepted), meaning the request has been received successfully, and the backend then continues processing. Of course, you need to consider the use case and its requirements before implementing this workaround.
Lambda functions have a maximum execution time of 15 minutes, but since API Gateway has a strict 29-second timeout policy, you can do the following things to overcome this.
For an immediate fix, try increasing your Lambda function's memory size. E.g.: if your Lambda function has 128 MB of memory, you can increase it to 256 MB. More memory helps the function execute faster.
OR
You can use the Lambda invoke() API, which is part of the aws-sdk. With invoke(), instead of going through API Gateway, you can call the function directly. But this is useful on the server side only.
OR
The best method to tackle this is: make the request to API Gateway -> inside the function, push the received data into an SQS queue -> immediately return the response -> have a Lambda function ready that triggers when data is available in this SQS queue -> inside this triggered function, do your actual time-intensive processing -> save the result to a data store -> if the call comes from the client side (browser/mobile app), implement long-polling to get the final processed result from the same data store. A sketch of the front half of this flow is shown below.
Since the API now returns the response immediately after pushing the data to SQS, your main function's execution time will be much lower, which resolves the API Gateway timeout issue.
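A sketch of the front Lambda in that flow (assuming the JavaScript SDK v2; the queue URL environment variable and payload shape are placeholders):

const AWS = require("aws-sdk");
const sqs = new AWS.SQS();

exports.handler = async (event) => {
  // Hand the work off to the queue and return immediately,
  // so API Gateway never waits for the slow processing.
  await sqs.sendMessage({
    QueueUrl: process.env.QUEUE_URL,   // URL of the work queue (assumed env var)
    MessageBody: event.body
  }).promise();

  // The worker Lambda subscribed to this queue does the time-consuming work
  // and saves its result to a data store that the client polls.
  return { statusCode: 202, body: JSON.stringify({ status: "accepted" }) };
};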
There are other methods, like using WebSockets or writing event-driven code, but the methods above are much simpler to implement and manage.
While you cannot increase the timeout, you can link Lambdas together if the work is something that can be split up.
Using the aws-sdk:
var aws = require('aws-sdk');

var lambda = new aws.Lambda({
    region: 'us-west-2' // change to your region
});

lambda.invoke({
    FunctionName: 'name_of_your_lambda_function',
    Payload: JSON.stringify(event, null, 2) // pass params
}, function(error, data) {
    if (error) {
        context.done('error', error);
    }
    if (data.Payload) {
        context.succeed(data.Payload);
    }
});
Source: Can an AWS Lambda function call another
AWS Documentation: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html
As of May 21, 2021, this is still the same. The hard limit for the maximum time is 30 seconds. Below is the official document on quotas for API Gateway.
https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html#http-api-quotas
The timeout limit cannot be increased, so a response should be returned within 30 seconds. The workarounds I usually use:
1. Send the result in an async way. The Lambda function should trigger another process and send a response to the client saying "Successfully started process X", and the other process should notify the client asynchronously once it finishes (hit an endpoint, send a Slack notification or an email, ...). You can find a lot of interesting resources on this topic.
2. Utilize the full potential of multiprocessing in your Lambda function and increase the memory for faster computing.
3. Eventually, if you need to return the result synchronously and one Lambda function cannot do the job, you could integrate API Gateway directly with Step Functions so you would have multiple Lambda functions working in parallel. It may seem complicated, but in fact it is quite simple.
Custom timeout between 50 and 29,000 milliseconds for WebSocket APIs and between 50 and 30,000 milliseconds for HTTP APIs. The default timeout is 29 seconds for WebSocket APIs and 30 seconds for HTTP APIs
I have a web application running 24/7 on an AWS micro instance and it works just fine.
Occasionally (10 to 50 times a day) I need to process a big amount of data (stored on RDS) in a CPU-intensive task. That's overkill for my micro instance.
Starting an EC2 server for these tasks doesn't seem like a good idea, because they must be executed on demand when a user asks for them, and I need low latency (less than 10 seconds).
Is there any Amazon service where I can submit my task and take advantage of higher CPU capacity?
Keep in mind that my task needs to read a big amount of data from RDS.
You can queue up those 10 to 50 tasks until a particular cut-off time during the day, then launch an instance to process them and terminate it when you are done. The scheduling part can be done by the micro instance.
Once the micro instance starts the high-compute instance, the rest can be carried out by that instance; once the queue of tasks to be processed is empty, you can terminate it.
This is like scaling from 0 instances to 1 instance on a schedule.
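A rough sketch of that idea (assuming the JavaScript SDK v2; the AMI ID, instance type, and region are placeholders), where the micro instance launches a worker when tasks are queued and terminates it when the queue is empty:

const AWS = require("aws-sdk");
const ec2 = new AWS.EC2({ region: "us-east-1" });

// Launch a compute-optimized worker only when there is queued work.
async function startWorkerIfNeeded(queuedTaskCount) {
  if (queuedTaskCount === 0) return;
  await ec2.runInstances({
    ImageId: "ami-xxxxxxxx",       // placeholder: AMI with your processing code baked in
    InstanceType: "c5.2xlarge",    // placeholder: pick a size that fits the workload
    MinCount: 1,
    MaxCount: 1
  }).promise();
}

// Terminate the worker once the queue is drained.
async function terminateWorker(instanceId) {
  await ec2.terminateInstances({ InstanceIds: [instanceId] }).promise();
}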
It depends on the cost that you are willing to pay, the complexity of the processing, the future scale of your service, the ability to pre-calculate your results and so on.
One option is to have a larger instance (or a pool of instances, if your service scales), ready for processing, that you can trigger. You can lower the cost of this machine by using Reserved Instance pricing (http://aws.amazon.com/ec2/purchasing-options/reserved-instances/), or even better, by using Spot Instances (http://aws.amazon.com/ec2/purchasing-options/spot-instances/). With Spot you face the risk that at times you won't be able to have the instances up and running, so you need to back it with on-demand instance(s).
This is probably a more expensive solution, but as you run more and more jobs like that, the "cost per job" decreases dramatically.
Another option is to off-load the processing to a different service. If you can express your calculation in the query syntax of external services like DynamoDB or Redis, you can keep using your micro instance to trigger the query. For example, Redis with ElastiCache can handle complex data manipulations like sorted set intersections. You do need to make sure that your data is also in the other data store, and write the query.
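For example, a sketch of pushing a sorted set intersection into Redis (assuming the ioredis client; the endpoint and key names are placeholders):

const Redis = require("ioredis");
const redis = new Redis({ host: "my-cluster.cache.amazonaws.com", port: 6379 });

async function intersectUserSets() {
  // Compute the intersection of two sorted sets server-side in Redis...
  await redis.zinterstore("result:active_buyers", 2, "users:active", "users:buyers");
  // ...then read back only the top 10 entries with their scores.
  return redis.zrevrange("result:active_buyers", 0, 9, "WITHSCORES");
}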
Another option is to have these calculations pre-calculated. It really depends on the type of jobs you need to run. If you can prepare these outputs in advance and only update them with the data that arrived between the time they were calculated and the time they are requested, it might also be easier on your machines than these unpredictable CPU peaks.
We used AWS Batch for this, basically because it stands out above the other options. Besides AWS Batch, we considered additional servers, AWS Lambda, using a separate server for each task, and so on.
But we finally chose AWS Batch for a number of reasons:
In AWS Batch, all processes are completely isolated. This means that the tasks will not impact each other and break the workflow.
You can set the minimum and maximum RAM you want to use.
AWS Batch supports containers, which makes integration easier if you already use containers.
You can also create queues for different tasks to gain more control over your resources and expenses.
AWS Batch is very cost-effective, meaning you pay for what you use.
It's also pretty easy to set up. Here's a quick code snippet on how to launch rockets for each individual user. For more info go here: https://fulcrum.rocks/blog/cpu-intensive-tasks
const AWS = require("aws-sdk");   // assuming aws-sdk v2
const batch = new AWS.Batch();

const command = "npm run rocket";

const newJob = await new Promise((resolve, reject) => {
    batch.submitJob(
        {
            jobName: "your_important_job",
            jobDefinition: "killer_process",
            jobQueue: "night_users",
            timeout: {
                attemptDurationSeconds: 600
            },
            retryStrategy: {
                attempts: 1
            },
            containerOverrides: {
                vcpus: 2,
                memory: 2048,
                command: [command]
            }
        },
        (err, data) => {
            if (err) {
                console.error(err.message);
                return reject(err);
            }
            resolve(data);
        }
    );
});