DynamoDB - is there a need to call shutdown()?

Considering this code:
QuerySpec spec = new QuerySpec()
        .withKeyConditionExpression("#1 = :v1")
        .withNameMap(new NameMap().with("#1", "tableKey"))
        .withValueMap(new ValueMap().withString(":v1", "none.json"));

//connect DynamoDB instance over AWS
DynamoDB dynamoDB = new DynamoDB(Regions.US_WEST_2);

//get the table instance
String tableName = "WFMHistoricalProcessedFiles";
Table table = dynamoDB.getTable(tableName);

ItemCollection<QueryOutcome> items = table.query(spec);

//iterate over the results
Iterator<Item> it = items.iterator();
Item item = null;
while (it.hasNext()) {
    item = it.next();
    System.out.println(item.toJSONPretty());
}
When using DynamoDB to make a Query or a Scan like in the example above, is there an actual need to call shutdown() in order to close the connection?

The documentation seems pretty clear.
shutdown
void shutdown()
Shuts down this client object, releasing any resources that might be held open. This is an optional method, and callers are not expected to call it, but can if they want to explicitly release any open resources. Once a client has been shutdown, it should not be used to make any more requests.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#shutdown--
But to clarify what you specifically asked about:
in order to close the connection?
There is not exactly a "connection" to DynamoDB. It's accessed over HTTPS, statelessly, when requests are sent... so where your code says // connect DynamoDB instance over AWS, that really isn't accurate. You're constructing an object that will not actually connect until around the time you call table.query().
That connection might later be kept-alive for a short time for reuse, but even if true, it isn't "connected to" DynamoDB in any meaningful sense. At best, it's connected to a front-end system inside AWS that is watching for the next request and will potentially forward that request to DynamoDB if it's syntactically valid and authorized.
But this idle connection, if it exists, isn't consuming any DynamoDB resources in a way that should degrade performance of your application or others accessing the same DynamoDB table.
Good practice, of course, suggests that if you have the option of cleaning something up, it's potentially a good idea to do so, but it seems clearly optional.
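For completeness, if you do want to release resources explicitly, a minimal sketch (reusing the table and query from the question, and assuming the SDK v1 document API) might look like this:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.NameMap;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class QueryThenShutdown {
    public static void main(String[] args) {
        DynamoDB dynamoDB = new DynamoDB(Regions.US_WEST_2);
        try {
            QuerySpec spec = new QuerySpec()
                    .withKeyConditionExpression("#1 = :v1")
                    .withNameMap(new NameMap().with("#1", "tableKey"))
                    .withValueMap(new ValueMap().withString(":v1", "none.json"));
            Table table = dynamoDB.getTable("WFMHistoricalProcessedFiles");
            for (Item item : table.query(spec)) {
                System.out.println(item.toJSONPretty());
            }
        } finally {
            // Optional: releases any pooled HTTP connections held by the underlying client.
            dynamoDB.shutdown();
        }
    }
}

Either way, shutdown() is a cleanup convenience, not something the query depends on.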

Related

Is the below DynamoDB scenario valid?

Can we get exceptions because of network failures while writing data to DynamoDB, even after the data has been written to the tables?
Yes. This can happen:
You post the data to dynamo successfully
Dynamo successfully writes the data
Dynamo servers return a 200 OK
A network error causes the TCP connection to time out or fail in some other way before the response makes it back from the Dynamo servers to your client
In this case, the data exists happily on the server, but your code was never notified about it.
See the Two Generals' Problem: https://en.wikipedia.org/wiki/Two_Generals%27_Problem
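One common mitigation (a sketch, not something from the answer above) is to make the write idempotent so the client can safely retry when it never sees the response. For example, using a client-generated id as the partition key and a conditional put; the table and attribute names here are hypothetical:

import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;

public class IdempotentWrite {
    // "Notifications", "id", and "payload" are illustrative names only.
    public static void saveOnce(DynamoDB dynamoDB, String clientGeneratedId, String payloadJson) {
        Table table = dynamoDB.getTable("Notifications");
        Item item = new Item()
                .withPrimaryKey("id", clientGeneratedId)   // the same id is reused on every retry
                .withString("payload", payloadJson);
        try {
            // Succeeds only the first time; a retry after a lost response is a no-op.
            table.putItem(item, "attribute_not_exists(id)", null, null);
        } catch (ConditionalCheckFailedException alreadyWritten) {
            // The earlier attempt did reach DynamoDB; treat this as success.
        }
    }
}

If the 200 OK is lost on the way back, retrying the same call neither duplicates nor corrupts the item.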

DynamoDB slow response

So my problem is that DynamoDB is taking quite some time to return a single object. I'm using Node.js and the AWS DocumentClient. The weird thing is that it takes from 100ms to 200ms to "select" a single item from the DB.
Is there any way to make it faster?
Example code:
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

console.time("user get");

var params = {
    TableName: 'User',
    Key: {
        "id": "2f34rf23-4523452-345234"
    }
};

docClient.get(params, function(err, data) {
    if (err) {
        callback(err);
    }
    else {
        console.timeEnd("user get");
    }
});
And the average for this simple piece of code in Lambda is 130ms. Any idea what I could do to make it faster? The User table has only the primary partition key "id" and a global secondary index with primary key email. When I try this from my console it takes even more time.
Any help will be much appreciated!
I faced exactly the same issue using Lambda@Edge. Responses from DynamoDB took 130-140ms on average while the DynamoDB latency graph showed 10-20ms latency.
I managed to improve response times to ~30ms on average by disabling SSL, parameter validation, and convertResponseTypes:
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    sslEnabled: false,
    paramValidation: false,
    convertResponseTypes: false
});
Most likely the cause of the issue was CPU/network throttling in the Lambda itself. A Lambda@Edge viewer-request function can have at most 128MB of memory, which makes for a pretty slow Lambda. So disabling the extra checks and SSL validation made things a lot faster.
If you are running just a regular Lambda, increasing memory should fix the issue.
Have you warmed up your Lambda function? If you are only running it ad-hoc, and not running a continuous load, the function might not be available yet on the container running it, so additional time might be taken there. One way to support or refute this theory would be to look at latency metrics for the GetItem API. Finally, you could try using AWS X-Ray to find other spots of latency in your stack.
The DynamoDB SDK could also be retrying, adding to your perceived latency in the Lambda function. Given that your items are around 10 KB, it is possible you are getting throttled. Have you provisioned enough read capacity? You can verify both your read latency and read throttling metrics in the DynamoDB console for your table.
I know this is a little old, but for anyone finding this question now: instantiating the client can be extremely slow. Local testing was fast, yet accessing DynamoDB from an Elastic Beanstalk instance in the same region was extremely slow!
Accessing Dynamo from a single client instance improved the speeds significantly.
Reusing the connection helped speed up my calls from ~120ms to ~35ms.
Reusing Connections with Keep-Alive in Node.js
By default, the Node.js HTTP/HTTPS agent creates a new TCP connection for every new request. To avoid the cost of establishing a new connection, you can reuse an existing connection.
For short-lived operations, such as DynamoDB queries, the latency overhead of setting up a TCP connection might be greater than the operation itself. Additionally, since DynamoDB encryption at rest is integrated with AWS KMS, you may experience latencies from the database having to re-establish new AWS KMS cache entries for each operation.

strongloop/loopback - Change connection string based on route value

My application's users are geographically dispersed and data is stored in various regions. Each region has its own data center and database server.
I would like to include a route value to indicate the region that the user wants to access and connect to, as follows:
/api/region/1/locations/
/api/region/2/locations/
/api/region/3/locations/
Depending on the region passed in, I would like to change the connection string being used. I assume this can be performed somewhere in the middleware chain, but don't know where/how. Any help is appreciated!
What should not be done
Loopback provides a method MyModel.attachTo (it doesn't seem to be documented, but a reference to it is made there).
But since it is a static method, it affects the entire Model, not a single instance.
So for this to work on a per-request basis, you must switch the DB right before the call to the datasource method, to make sure nothing async starts in between. I don't think this is possible.
This is an example using an operation hook (define all the datasources, including dbRegion1 below, in datasources.json).
Bad, don't do this. It's shown here just for reference:
Region.observe('loaded', function filterProperties(ctx, next) {
    app.models.Region.attachTo(app.dataSources.dbRegion1);
    next();
});
But then you will most likely face concurrency issues when your API receives multiple requests in a short time.
(Another way to see it is that the server is no longer truly stateless; execution no longer depends only on inputs but also on shared state.)
The hook may set region2 for request 2 while the method called after the hook was expecting to use region1 for request 1. This will be the case if something async is triggered between the hook and the actual call to the datasource method.
So ultimately, I don't think you should do that. I'm just putting it there because some people have recommended it in other SO posts, but it's just bad.
Potential option 1
Build an external re-routing server, that will re-route the requests from the API server to the appropriate region database.
Use the loopback-connector-rest in your API server to consume this microservice, and use it as a single datasource for all your models. This provides abstraction over database selection.
Then of course there is still the matter of implementing the microservice, but maybe you can find some other ORM than loopback's that will support database sharding, and use it in that microservice.
Potential option 2
Create a custom loopback connector that will act as router for MySQL queries. Depending on region value passed inside the query, re-route the query to the appropriate DB.
Option 3
Use a more distributed architecture.
Write a region-specific server to persist region-specific data.
Run, for instance, 3 different servers, each one configured for a region, plus 1 common server for routing.
Then build a routing middleware for your single user-facing REST api server.
Basic example:
var express = require('express');
var request = require('request');

var app = express();
var ips = { 1: 'http://127.0.0.1', 2: 'http://127.0.0.2' };

app.all('/api/region/:id', function (req, res, next) {
    console.log('Reroute to region server ' + req.params.id);
    request(ips[req.params.id], function (error, response, body) {
        if (error) return next(error);
        res.send(body);
    });
});
This option is probably the easiest to implement.

How we can use JDBC connection pooling with AWS Lambda?

Can we use JDBC connection pooling with AWS Lambda? Since an AWS Lambda function gets called on a specific event, does its lifetime persist even after it finishes one of its calls?
No. Technically, you could create a connection pool outside of the handler function, but since you can only make use of a single connection per invocation, all you would be doing is tying up database connections and allocating a pool of which you could only ever use one.
After uploading your Lambda function to AWS, the first time it is invoked AWS will create a container and run the setup code (the code outside of your handler function that creates the pool - let's say N connections) before invoking the handler code.
When the next request arrives, AWS may re-use the container again (or may not. It usually does, but that's down to AWS and not under your control).
Assuming it reuses the container, your handler function will be invoked (the setup code will not be run again) and your function would use one of the N connections to your database from the pool (held at the container level). This is most likely the first connection from the pool, number 1, as it is guaranteed not to be in use, since it's impossible for two functions to run at the same time within the same container. Read on for an explanation.
If AWS does not reuse the container, it will create a new container and your code will allocate another pool of N connections. Depending on the turnover of containers, you may exhaust the database pool entirely.
If two requests arrive concurrently, AWS cannot invoke the same handler at the same time. If this were possible, you'd have a shared state problem with the variables defined at the container scope level. Instead, AWS will use two separate containers and these will both allocate a pool of N connections each, i.e. 2N connections to your database.
It's never necessary for a single invocation of a function to require more than one connection (unless of course you need to communicate with two independent databases within the same context).
The only time a connection pool would be useful is if it were at one level above the container scope, that is, handed down by the AWS environment itself to the container. This is not possible.
The best case you can hope for is to have a single connection per container. Even then you would have to manage this single connection to ensure the database server hasn't disconnected or rebooted. If it has, your container's connection will die and your handler will never be able to connect again (until the container dies), unless you write some code in your function to check for dropped connections. On a busy server, the container might take a long time to die.
Also keep in mind that if your handler function fails, for example half way through a transaction or having locked a table, the next request invocation will get the dirty connection state from the container. The first invocation may have opened a transaction and died. The second invocation may commit and include all the previous queries up to the failure.
I recommend not managing state outside of the handler function at all, unless you have a specific need to optimise. If you do, then use a single connection, not a pool.
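As a rough sketch of that "single connection, checked before use" approach (the environment variable names and JDBC details are assumptions for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class SingleConnectionHandler {
    // Held at container scope so warm invocations can reuse it across calls.
    private static Connection connection;

    private static Connection getConnection() throws SQLException {
        // isValid() pings the server; reconnect if the server has dropped us.
        if (connection == null || !connection.isValid(2)) {
            connection = DriverManager.getConnection(
                    System.getenv("JDBC_URL"),
                    System.getenv("DB_USER"),
                    System.getenv("DB_PASSWORD"));
        }
        return connection;
    }

    public String handleRequest(String request) throws SQLException {
        Connection conn = getConnection();
        // ... run queries with conn here; do not close it, so the warm container keeps it.
        return request;
    }
}

Note that this still inherits the dirty-state risk described above if an invocation dies mid-transaction.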
Yes, the lambda is mostly persistent, so JDBC connection pooling should work. The first time a lambda function is invoked, the environment will be created and it may or may not get reused. But in practice, subsequent invocations will often reuse the same lambda process along with all program state if your triggering events occur often.
This short lambda function demonstrates this:
package test;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
public class TestLambda implements RequestHandler<String, String> {
private int invocations = 0;
public String handleRequest(String request, Context context) {
invocations++;
System.out.println("invocations = " + invocations);
return request;
}
}
Invoke this from the AWS console with any string as the test event. In the CloudWatch logs, you'll see the invocations number increment each time.
Kudos to AWS RDS Proxy: you can now use pooled MySQL and PostgreSQL connections without any AWS Lambda-specific configuration in your Java (or any other) code. All you need to do is create a database proxy and add it to the AWS Lambda function whose connections you want to reuse/pool. See the how-to here.
Note: AWS RDS Proxy is not included in the Free Tier (more here).
It has caveats:
There is no destroy method that ensures the pool is closed. One may argue that the DB connection idle timeout would handle that.
What if the same DB is also used by other consumers, such as a pool maintained on a regular machine like EC2?
As many have said, a sudden spike in requests can create chaos for the DB, since there will always be some maximum-connections setting per user on the database side.

Web Services design

Company A has an async polling-based webservice for notifications. Company B checks for notifications. Every time B reads new notifications, A deletes them from the system, so subsequent read requests return only new notifications. There is also a requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: due to unexpected slowness it is possible for A to receive the request, delete a notification, and send the response back while B has already interrupted the connection. Under this scenario the notification gets lost. Now one can argue that the core problem lies within the operational realm (the HTTP response must be delivered within 20 sec), but in practice that is not always feasible.
How to design B (the client) to avoid this problem?
One way I can see is to not have A delete the notifications, and to make B aware of its own state, so that it knows from which ID it needs to process notifications. But that presumes the IDs will be sequential, which is controlled by A. Even if B defines its own sequence, A still has to be altered to return it back.
Are there any other approaches?
Thanks!
Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
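A minimal sketch of that client-side polling loop (the endpoint, the since parameter, and the response parsing are all assumptions about what Company A could expose):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class NotificationPoller {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        long lastSeenId = 0;  // B persists this between polls
        while (true) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://a.example.com/notifications?since=" + lastSeenId))
                    .timeout(Duration.ofSeconds(30))  // B's 30-second limit
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Process the notifications in response.body() and remember the highest ID seen.
            // Nothing is deleted on A's side, so a timeout or dropped connection only means
            // the same notifications come back on the next poll.
            lastSeenId = highestIdIn(response.body());
            Thread.sleep(5_000);  // poll interval (arbitrary for the sketch)
        }
    }

    private static long highestIdIn(String body) {
        // Placeholder: real code would parse A's response format here.
        return 0L;
    }
}

Because reads no longer delete anything, a lost response costs a duplicate read rather than a lost notification.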
It really seems like Company A's webservice should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send an "ACK"-like request to a new Company A webservice endpoint, indicating that a specific notification was received by Company B and can be deleted.