DynamoDB batchGet vs multiple getItem - amazon-web-services

Is there any difference in terms of cost or performance between using multiple getItem calls:
Promise.all([
client.getItem({ TableName, Key }).promise(),
client.getItem({ TableName, Key }).promise(),
client.getItem({ TableName, Key }).promise(),
])
to one batchGet call:
const params = {
RequestItems: {
'TABLE_NAME': {
Keys: [
{'KEY_NAME': {N: 'KEY_VALUE_1'}},
{'KEY_NAME': {N: 'KEY_VALUE_2'}},
{'KEY_NAME': {N: 'KEY_VALUE_3'}}
]
}
}
};
db.batchGetItem(params).promise()

In terms of cost: No. Both operations consume the same number of read capacity units.
In terms of performance: Yes. Multiple GetItem calls send a separate network request for each item, whereas BatchGetItem sends only one request, which usually means less overhead.
There is no real downside to BatchGetItem except for a slight increase in complexity: items that aren't found are simply missing from the response, a single call is limited to 100 items, and under throttling some keys can come back in UnprocessedKeys and have to be retried.
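The extra complexity mostly comes from retrying keys that BatchGetItem reports back in UnprocessedKeys. Below is a minimal sketch of such a loop, assuming `client` is an AWS SDK v2 DynamoDB client (anything exposing `batchGetItem(params).promise()`). The helper name `batchGetAll`, the attempt limit, and the backoff delays are illustrative choices, not part of the original answer.

```javascript
// Sketch: run BatchGetItem and retry whatever DynamoDB reports as unprocessed.
// `client` is assumed to expose batchGetItem(params).promise() (AWS SDK v2 style).
async function batchGetAll(client, requestItems, maxAttempts = 5) {
  const found = {}; // table name -> array of returned items
  let pending = requestItems;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await client.batchGetItem({ RequestItems: pending }).promise();
    for (const [table, items] of Object.entries(res.Responses || {})) {
      found[table] = (found[table] || []).concat(items);
    }
    pending = res.UnprocessedKeys || {};
    if (Object.keys(pending).length === 0) return found;
    // Simple exponential backoff before retrying the leftover keys.
    await new Promise((resolve) => setTimeout(resolve, 50 * 2 ** attempt));
  }
  throw new Error('BatchGetItem still had unprocessed keys after retries');
}
```

Items that do not exist never show up in UnprocessedKeys; they are just absent from the result, so the caller still has to compare the returned items with the requested keys.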

Related

AWS AppSync & DynamoDB Single Table Design: PutItem based on another item's field value

I'm new to AWS AppSync and I have adopted the single-table pattern in DynamoDB. Now I am trying to create an item based on a particular field value of an existing item in the same table. For example, I have a table called transaction which holds two types of records:
Request
Response
As you can see in the table above, I can insert (PutItem) multiple responses for a particular request. Before I insert a new response, I need to validate that the request (RequestID) already exists. Is there any way to do this via a condition expression in the resolver? Below is my current request resolver code, which is not working as expected.
#set( $Id = $util.autoId() )
{
"version" : "2017-02-28",
"operation" : "PutItem",
"key" : {
"PK": $util.dynamodb.toDynamoDBJson("USER#$ctx.args.input.UserId"),
"SK": $util.dynamodb.toDynamoDBJson("RESPONSE#$Id")
},
"attributeValues" : $util.dynamodb.toMapValuesJson($ctx.args.input),
"condition": {
"expression": "SK = :SK",
"expressionValues" : {
":SK" : {
"S" : "REQUEST#${ctx.args.input.RequestId}"
}
}
}
}
You could do this in one request mapping template by using DynamoDB transactions; see https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-transact.html.
In one of the "transactWriteItems" entries, you update an arbitrary value (or perhaps the number of responses) in the request item, with a condition that checks whether a request item with that requestId in the SK exists. If the condition succeeds, the request item is updated.
Also make sure you have the response item in your "transactWriteItems" so that the response item is also written if the request condition passes.
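To make the shape of such a transaction concrete, here is a hedged sketch that builds the parameters for the low-level DynamoDB TransactWriteItems API (the SDK shape, not the AppSync mapping template syntax). It swaps the "update an arbitrary value" step for a ConditionCheck, which only asserts that the request item exists. The function name `buildResponseTransaction` and the key layout (PK = USER#..., SK = REQUEST#... or RESPONSE#...) are assumptions taken from the resolver in the question.

```javascript
// Sketch: TransactWriteItems parameters that write a response only if the request exists.
// The key layout is assumed from the question; adjust it to the real table design.
function buildResponseTransaction(tableName, userId, requestId, responseId, attributes) {
  const pk = { S: `USER#${userId}` };
  return {
    TransactItems: [
      {
        // Cancels the whole transaction when the request item is missing.
        ConditionCheck: {
          TableName: tableName,
          Key: { PK: pk, SK: { S: `REQUEST#${requestId}` } },
          ConditionExpression: 'attribute_exists(PK)',
        },
      },
      {
        // Written only if the condition check above passes.
        Put: {
          TableName: tableName,
          Item: { ...attributes, PK: pk, SK: { S: `RESPONSE#${responseId}` } },
        },
      },
    ],
  };
}
```

The returned object could be passed to `transactWriteItems` on an SDK v2 DynamoDB client; the AppSync resolver expresses the same two operations in its own template format, described in the linked tutorial.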
No, you won't be able to do a conditional PutItem based on another item. In this case you'd want to use pipeline resolvers. In your first function you'd fetch the request item, and in the second you can then do your PutItem with a condition; the previous GetItem result will be available as $ctx.prev.result.

How to specify attributes to return from DynamoDB through AppSync

I have an AppSync pipeline resolver. The first function queries an ElasticSearch database for the DynamoDB keys. The second function queries DynamoDB using the provided keys. This was all working well until I ran into the 1 MB limit of AppSync. Since most of the data is in a few attributes/columns I don't need, I want to limit the results to just the attributes I need.
I tried adding AttributesToGet and ProjectionExpression (from here) but both gave errors like:
{
"data": {
"getItems": null
},
"errors": [
{
"path": [
"getItems"
],
"data": null,
"errorType": "MappingTemplate",
"errorInfo": null,
"locations": [
{
"line": 2,
"column": 3,
"sourceName": null
}
],
"message": "Unsupported element '$[tables][dev-table-name][projectionExpression]'."
}
]
}
My DynamoDB function request mapping template looks like (returns results as long as data is less than 1 MB):
#set($ids = [])
#foreach($pResult in ${ctx.prev.result})
#set($map = {})
$util.qr($map.put("id", $util.dynamodb.toString($pResult.id)))
$util.qr($map.put("ouId", $util.dynamodb.toString($pResult.ouId)))
$util.qr($ids.add($map))
#end
{
"version" : "2018-05-29",
"operation" : "BatchGetItem",
"tables" : {
"dev-table-name": {
"keys": $util.toJson($ids),
"consistentRead": false
}
}
}
I contacted the AWS people who confirmed that ProjectionExpression is not supported currently and that it will be a while before they will get to it.
Instead, I created a lambda to pull the data from DynamoDB.
To limit the results from DynamoDB I used $ctx.info.selectionSetList in AppSync to get the list of requested columns, then used that list to specify the data to pull from DynamoDB. I needed to get multiple results while maintaining order, so I used BatchGetItem and then merged the results with the original list of IDs using LINQ (which put the DynamoDB results back in the correct order, since BatchGetItem in C# does not preserve sort order like the AppSync version does).
Because I was using C# with a number of libraries, the cold start time was a little long, so I used Lambda Layers pre-JITed to Linux, which got the cold start time down from ~1.8 seconds to ~1 second (when using 1024 MB of RAM for the Lambda).
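The two steps described in this answer (turning the selection set into a projection, then putting the batch results back into the original order) can be sketched as follows. The answer used C# and LINQ; this is a JavaScript approximation, and the helper names `buildProjection` and `restoreOrder`, the single `id` key, and the plain-object item shape are all assumptions for illustration.

```javascript
// Sketch: derive a projection from the GraphQL selection set.
function buildProjection(selectionSetList) {
  // selectionSetList can contain nested paths such as "owner/name";
  // only the top-level attribute names are kept here.
  const fields = [...new Set(selectionSetList.map((path) => path.split('/')[0]))];
  const names = {};
  fields.forEach((field, i) => { names[`#f${i}`] = field; });
  return {
    // Placeholders avoid clashes with DynamoDB reserved words such as "name".
    ProjectionExpression: Object.keys(names).join(', '),
    ExpressionAttributeNames: names,
  };
}

// Sketch: put unordered batch results back into the order of the requested ids.
function restoreOrder(ids, items, keyName = 'id') {
  const byKey = new Map(items.map((item) => [item[keyName], item]));
  // Ids with no matching item are dropped rather than returned as undefined.
  return ids.filter((id) => byKey.has(id)).map((id) => byKey.get(id));
}
```

The projection object can be spread into the per-table entry of a BatchGetItem request. The table in the question uses a composite key (id plus ouId), so a real version of `restoreOrder` would need to build its lookup key from both attributes.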
AppSync doesn't support projection but you can explicitly define what fields to return in the response template instead of returning the entire result set.
{
"id": "$ctx.result.get('id')",
"name": "$ctx.result.get('name')",
...
}

Not able to solve throttlingException in DynamoDB

I have a lambda function which does a transaction in DynamoDB similar to this.
try {
const reservationId = genId();
await transactionFn();
return {
statusCode: 200,
body: JSON.stringify({id: reservationId})
};
async function transactionFn() {
try {
await docClient.transactWrite({
TransactItems: [
{
Put: {
TableName: ReservationTable,
Item: {
reservationId,
userId,
retryCount: Number(retryCount),
}
}
},
{
Update: {
TableName: EventDetailsTable,
Key: {eventId},
ConditionExpression: 'available >= :minValue',
UpdateExpression: `set available = available - :val, attendees= attendees + :val, lastUpdatedDate = :updatedAt`,
ExpressionAttributeValues: {
":val": 1,
":updatedAt": currentTime,
":minValue": 1
}
}
}
]
}).promise();
return true
} catch (e) {
const transactionConflictError = e.message.search("TransactionConflict") !== -1;
// const throttlingException = e.code === 'ThrottlingException';
console.log("transactionFn:transactionConflictError:", transactionConflictError);
if (transactionConflictError) {
retryCount += 1;
await transactionFn();
return;
}
// if(throttlingException){
//
// }
console.log("transactionFn:e.code:", JSON.stringify(e));
throw e
}
}
} catch (e) {
// Outer error handling is not shown in the original snippet.
throw e;
}
It just updates two tables per API call. If it encounters a transaction conflict error, it simply retries the transaction by calling the function recursively.
The eventDetails table is getting too many updates (checked with AWS Contributor Insights), so I raised its provisioned capacity above the earlier value.
For reservationTable, the capacity mode is on-demand.
When I load test this API with 400 (or more) users using JMeter (master-slave configuration), some API calls get a throttling error and some take more than 20 seconds to respond.
When I checked X-Ray for this API, I found that DynamoDB takes too long on this transaction for the slower API calls.
Even with much higher fixed provisioning (I tried on-demand scaling too), I am getting throttling exceptions for API calls:
ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded.
Consider increasing your provisioning level with the UpdateTable API.
UPDATE
One more thing: when I do the load testing, I always use the same eventId, which means I am always updating the same row for all the API requests. I found this article, which says that a single partition can only have up to 1000 WCU. Since I am always updating the same row in the eventDetails table during load testing, is that causing this issue?
I had this exact error and it helped me to change On Demand to Provisioned under Read/write capacity mode. Try changing that; if that doesn't help, we'll go from there.
From the link you cite in your update, also described in an AWS help article here, it sounds like the issue is that all of your load testers are writing to the same entry in the table, which is going to be in the same partition, subject to the hard limit of 1,000 WCU.
Have you tried repeating this experiment with the load testers writing to different partitions?
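A rough back-of-the-envelope check shows why a single hot item hits this limit quickly. The sketch below assumes the 1,000 WCU per-partition limit mentioned above, that a standard write costs 1 WCU per 1 KB of item size (rounded up), and that a transactional write costs twice that. The function name and constant are illustrative.

```javascript
// Sketch: upper bound on writes per second against a single partition.
const PARTITION_WCU_LIMIT = 1000;

function maxWritesPerSecond(itemSizeBytes, transactional) {
  // Item size is rounded up to the next 1 KB unit.
  const sizeUnits = Math.max(1, Math.ceil(itemSizeBytes / 1024));
  // Transactional writes consume twice the capacity of standard writes.
  const wcuPerWrite = sizeUnits * (transactional ? 2 : 1);
  return Math.floor(PARTITION_WCU_LIMIT / wcuPerWrite);
}

console.log(maxWritesPerSecond(500, true)); // 500
console.log(maxWritesPerSecond(500, false)); // 1000
```

So with every request updating the same eventId inside a transaction, the ceiling is around 500 updates per second for a small item regardless of how much capacity is provisioned on the table, and transaction conflicts on that one item reduce the usable rate further.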

Map different Sort Key responses to Appsync Schema values

So here is my schema:
type Model {
PartitionKey: ID!
Name: String
Version: Int
FBX: String
# ms since epoch
CreatedAt: AWSTimestamp
Description: String
Tags: [String]
}
type Query {
getAllModels(count: Int, nextToken: String): PaginatedModels!
}
type PaginatedModels {
models: [Model!]!
nextToken: String
}
I would like to call 'getAllModels' and have all of its data and all of its tags filled in.
But here is the thing: tags are stored via sort keys, like so:
PartitionKey | SortKey
Model-0 | Model-0
Model-0 | Tag-Tree
Model-0 | Tag-Building
Is it possible to transform the 'Tag' sort keys into the Tags: [String] array in the schema via a DynamoDB resolver? Or must I do something extra fancy through a lambda? Or is there a smarter way to do this?
To clarify, are you storing objects like this in DynamoDB:
{ PartitionKey (HASH), Tag (SortKey), Name, Version, FBX, CreatedAt, Description }
and using a DynamoDB Query operation to fetch all rows for a given HashKey.
Query #PartitionKey = :PartitionKey
and getting back a list of objects some of which have a different "Tag" value and one of which is "Model-0" (aka the same value as the partition key) and I assume that record contains all other values for the record. E.G.
[
{ PartitionKey, Tag: 'ValueOfPartitionKey', Name, Version, FBX, CreatedAt, ... },
{ PartitionKey, Tag: 'Tag-Tree' },
{ PartitionKey, Tag: 'Tag-Building' }
]
You can definitely write resolver logic without too much hassle that reduces the list of model objects into a single object with a list of "Tags". Let's start with a single item and see how to implement a getModel(id: ID!): Model query:
First define the request mapping template that will get all rows for a partition key:
{
"version" : "2017-02-28",
"operation" : "Query",
"query" : {
"expression": "#PartitionKey = :id",
"expressionValues" : {
":id" : {
"S" : "${ctx.args.id}"
}
},
"expressionNames": {
"#PartitionKey": "PartitionKey" ## whatever the table hash key is
}
},
## The limit will have to be sufficiently large to get all rows for a key
"limit": $util.defaultIfNull(${ctx.args.limit}, 100)
}
Then to return a single model object that reduces "Tag" to "Tags" you can use this response mapping template:
#set($tags = [])
#set($result = {})
#foreach( $item in $ctx.result.items )
#if($item.PartitionKey == $item.Tag)
#set($result = $item)
#else
$util.qr($tags.add($item.Tag))
#end
#end
$util.qr($result.put("Tags", $tags))
$util.toJson($result)
This will return a response like this:
{
"PartitionKey": "...",
"Name": "...",
"Tags": ["Tag-Tree", "Tag-Building"]
}
Fundamentally I see no problem with this but its effectiveness depends upon your query patterns. Extending this to the getAll use is doable but will require a few changes and most likely a really inefficient Scan operation due to the fact that the table will be sparse of actual information since many records are effectively just tags. You can alleviate this with GSIs pretty easily but more GSIs means more $.
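If the getAll case ends up in a Lambda (or any JavaScript code) rather than a VTL template, the same reduction can be applied per partition key. This is a minimal sketch under the item shape assumed in this answer (a `Tag` sort key attribute whose value equals `PartitionKey` on the main model row); `groupModels` is a hypothetical helper name.

```javascript
// Sketch: fold a flat list of rows into one object per model, collecting tags.
function groupModels(items) {
  const models = new Map(); // PartitionKey -> model object under construction
  for (const item of items) {
    const key = item.PartitionKey;
    if (!models.has(key)) models.set(key, { PartitionKey: key, Tags: [] });
    const model = models.get(key);
    if (item.Tag === key) {
      // Main row: copy the model attributes, dropping the sort key itself.
      const { Tag, ...attributes } = item;
      Object.assign(model, attributes);
    } else {
      // Tag row: only the sort key value matters.
      model.Tags.push(item.Tag);
    }
  }
  return [...models.values()];
}
```

Rows can arrive in any order, for example from a paginated Scan, but a model whose rows are split across two pages would need the pages merged before grouping.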
As an alternative approach, you can store your tags in a separate "Tags" table. This way you only store model information in the Model table and tag information in the Tag table, and leverage GraphQL to perform the join for you. In this approach, have Query.getAllModels perform a Scan (or Query) on the Model table and then have a Model.Tags resolver that performs a Query against the Tag table (HK: ModelPartitionKey, SK: Tag). You could then get all tags for a model and later create a GSI to get all models for a tag. You do need to consider that the nested Model.Tags resolver will now be called once per model, but Query operations are fast and I've seen this work well in practice.
Hope this helps :)

Implementation of Atomic Transactions in dynamodb

I have a table in DynamoDB where I need to update multiple related items at once (I can't put all the data in one item because of the 400 KB size limit).
How can I make sure that either all of the rows are updated successfully or none are?
The end goal is to read consistent data after the update.
On November 27th, 2018, transactions for DynamoDB were announced. From the linked article:
DynamoDB transactions provide developers atomicity, consistency, isolation, and durability (ACID) across one or more tables within a single AWS account and region. You can use transactions when building applications that require coordinated inserts, deletes, or updates to multiple items as part of a single logical business operation. DynamoDB is the only non-relational database that supports transactions across multiple partitions and tables.
The new APIs are:
TransactWriteItems, a batch operation that contains a write set, with one or more PutItem, UpdateItem, and DeleteItem operations. TransactWriteItems can optionally check for prerequisite conditions that must be satisfied before making updates. These conditions may involve the same or different items than those in the write set. If any condition is not met, the transaction is rejected.
TransactGetItems, a batch operation that contains a read set, with one or more GetItem operations. If a TransactGetItems request is issued on an item that is part of an active write transaction, the read transaction is canceled. To get the previously committed value, you can use a standard read.
The linked article also has a JavaScript example:
data = await dynamoDb.transactWriteItems({
TransactItems: [
{
Update: {
TableName: 'items',
Key: { id: { S: itemId } },
ConditionExpression: 'available = :true',
UpdateExpression: 'set available = :false, ' +
'ownedBy = :player',
ExpressionAttributeValues: {
':true': { BOOL: true },
':false': { BOOL: false },
':player': { S: playerId }
}
}
},
{
Update: {
TableName: 'players',
Key: { id: { S: playerId } },
ConditionExpression: 'coins >= :price',
UpdateExpression: 'set coins = coins - :price, ' +
'inventory = list_append(inventory, :items)',
ExpressionAttributeValues: {
':items': { L: [{ S: itemId }] },
':price': { N: itemPrice.toString() }
}
}
}
]
}).promise();
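For the read side of the question (reading the related items back consistently), the TransactGetItems API described above takes a list of Get operations. Here is a small sketch of building those parameters; `buildTransactGet` is a hypothetical helper, and the table and key names in the usage note are made up.

```javascript
// Sketch: build TransactGetItems parameters for several keys in one table.
function buildTransactGet(tableName, keys) {
  return {
    // Each entry wraps one Get; the items are read together as one transaction.
    TransactItems: keys.map((Key) => ({ Get: { TableName: tableName, Key } })),
  };
}
```

For example, `buildTransactGet('items', [{ id: { S: 'a' } }, { id: { S: 'b' } }])` could be passed to `transactGetItems` on the same SDK v2 client as in the write example. If one of the items is part of an active write transaction, the read is canceled and can be retried.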
You can use an API like this one for Java, http://aws.amazon.com/blogs/aws/dynamodb-transaction-library/. The transaction library API will help you manage atomic transactions.
If you're using node.js, there are other solutions for that using an atomic counter or conditional writes. See answer here, How to support transactions in dynamoDB with javascript aws-sdk?.