DynamoDB: How to perform a conditional write to enforce a unique Hash + Range key

I am using DynamoDB to store events.
They are stored in one event table with a hash key 'Source ID' and a range key 'version'. Every time a new event occurs for a source, I want to add a new item with the source ID and an incremented version number.
Is it possible to specify a conditional write so that a duplicate item (same hash key and same range key) can never exist? And if so, how would you do this?
I have done this successfully for tables with just a Hash Key:
Map<String, ExpectedAttributeValue> expected = new HashMap<String, ExpectedAttributeValue>();
expected.put("key", new ExpectedAttributeValue().withExists(false));
But I am not sure how to handle hash + range keys.

I don't know the Java SDK well, but you can specify "Exists=False" on both the range_key and the hash_key.
Maybe a better idea would be to use a timestamp instead of a version number? Otherwise, there are also techniques to generate unique IDs.

I was trying to enforce a unique combination of hash and range keys and came across this post. I found that it didn't completely answer my question but certainly pointed me in the right direction. This is an attempt to tidy up the loose ends.
It seems that DynamoDB actually enforces a unique combination of hash and range key by design. I quote
"All items in the table must have a value for the primary key attribute and Amazon DynamoDB ensures that the value for that name is unique"
from http://aws.amazon.com/dynamodb/ under the section with the heading Primary Key.
In my own tests using putItem with the aws-sdk for nodejs I was able to post two identical items without generating an error. When I checked the database, only one item was actually inserted. It seems that the second call to putItem with the same hash and range key combination is treated like an update to the original item.
I too received the error "Cannot expect an attribute to have a specified value while expecting it to not exist" when I tried to set the exist=false option on the hash key and range key with the values set. To resolve this error, I removed the value under the expected hash and range key and it started to generate a validation error when I tried to insert the same key twice.
So, my insert command looks like this (will be different for Java, but hopefully you get the idea)
{
  "TableName": "MyTableName",
  "Item": {
    "HashKeyFieldName": {
      "S": HashKeyValue
    },
    "RangeKeyFieldName": {
      "N": currentTime.getTime().toString()
    },
    "OtherField": {
      "N": "61404032632"
    }
  },
  "Expected": {
    "HashKeyFieldName": { "Exists": false },
    "RangeKeyFieldName": { "Exists": false }
  }
}
Whereas originally I was trying to do a conditional insert to check if there was a hash value and range value the same as what I was trying to insert, now I just need to check if the HashField and RangeField exist at all. If they exist, that means I am updating an item rather than inserting.
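On newer SDKs the same guard is usually written as a ConditionExpression rather than the legacy Expected map. The following is a minimal sketch, assuming boto3's low-level client; the helper name build_unique_put and the table and attribute names are hypothetical. The function only builds the request, and the commented lines show how it might be sent.

```python
def build_unique_put(table_name, item, hash_key_name, range_key_name):
    """Build put_item arguments that fail if the hash + range key already exists."""
    return {
        "TableName": table_name,
        "Item": item,
        # The condition is evaluated against the existing item (if any) that has
        # the same primary key, so it fails when that key is already present.
        "ConditionExpression": "attribute_not_exists(#h) AND attribute_not_exists(#r)",
        "ExpressionAttributeNames": {"#h": hash_key_name, "#r": range_key_name},
    }


# Hypothetical table and attribute names, for illustration only.
params = build_unique_put(
    "events",
    {"SourceID": {"S": "source-1"}, "version": {"N": "7"}},
    "SourceID",
    "version",
)

# Sending it (assumes boto3 is installed and credentials are configured):
# import boto3
# from botocore.exceptions import ClientError
# try:
#     boto3.client("dynamodb").put_item(**params)
# except ClientError as err:
#     if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
#         print("an item with this hash + range key already exists")
#     else:
#         raise
```

A failed condition surfaces as a ConditionalCheckFailedException, which the caller can treat as "this version was already written".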

Related

Why does AWS Scan LastEvaluatedKey return values even when the key does not exist

Why does AWS Scan LastEvaluatedKey return values even when the key does not exist?
My scan request is
{
"TableName": "tks-processtracker-dumper",
"ExclusiveStartKey": {
"Mykey": {
"S": "AKeyThatDoesntExists"
}
},
"Limit": 2000
}
Even when I pass a key that does not exist in the table, the scan still returns values.
My question is: should it return values even when the key doesn't exist, and why?
It's an EXCLUSIVE start key...
It doesn't have to exist, as DDB starts reading at whatever item has the next higher value.
In SQL it'd look something like
select *
from table
where tableKey > :exclusiveStartKey
ExclusiveStartKey is essentially a pointer to a location on the storage medium.
It doesn't care whether the item is there. It hashes the value you submit, that hash points to a location on disk, and the Scan proceeds from that position.
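The behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not DynamoDB's actual storage layout, MD5 is a stand-in for whatever internal hash the service uses, and the names position and toy_scan are hypothetical.

```python
import bisect
import hashlib


def position(key):
    """Stand-in hash: map a key to a position in the toy keyspace."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def toy_scan(keys, exclusive_start_key=None, limit=None):
    """Return keys in hash order, starting strictly after the start key's position."""
    ordered = sorted(keys, key=position)
    positions = [position(k) for k in ordered]
    start = 0
    if exclusive_start_key is not None:
        # The start key only has to hash to a position; it need not be stored.
        start = bisect.bisect_right(positions, position(exclusive_start_key))
    end = len(ordered) if limit is None else start + limit
    return ordered[start:end]


stored = ["alpha", "beta", "gamma", "delta"]
# A start key that was never stored still selects a valid resume point.
after_missing = toy_scan(stored, exclusive_start_key="AKeyThatDoesntExists")
```

In this model, a start key that was never stored still maps to a position, so the scan simply returns whatever comes after it.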

How to query a DynamoDB table to retrieve only items which have a specific value inside a string-set attribute?

The items in my table have an attribute of type string set. I'll stick to the example from the documentation and call the set "colors". As the name indicates, the set holds various strings representing colors in each item. This would look like this.
Now I want to query the table so that I retrieve all items where a specific color is within the set. So with regard to the attached picture, I would like to query for the color "Green" and receive the items Picture2 and Picture3.
Is there a way to do this?
Since the number of possible colors and items is huge, and only a very small number of colors are associated with any one item, a scan would be very inefficient. So far I have tried to create a global secondary index (GSI), but it seems that it's not possible in the way I want. Or am I wrong?
Unless the field you are searching for is built into the primary key or secondary index, scan will be your only option.
The scan operation will allow you to use the contains keyword to search the set
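const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

// contains() on a string set checks whether the set has this exact member.
const params = {
  TableName: 'TABLE_NAME',
  FilterExpression: 'contains(#color, :color)',
  ExpressionAttributeNames: {
    '#color': 'color',
  },
  ExpressionAttributeValues: {
    ':color': 'Blue',
  },
};

documentClient.scan(params, function (err, data) {
  if (err) {
    console.error(err);
    return;
  }
  console.log(data);
});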
According to the docs on secondary indexes, you cannot build an index using a set as a key attribute:
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Other data types, including documents and sets, are not allowed.

Best method to extract data from dynamoDb and move it to another table

I have a table of 500 GB. I want to transfer the data to another table based on the timestamps.
There are several entries for each item in the table, and I want only the latest entry of every item in the other table.
Considering the size of the table, can anyone recommend the best AWS service to get this done quickly and easily?
I have come across AWS Glue and HiveCopyActivity. Are these the best solutions, or is there another service I can use?
(This assumes you can still add a global secondary index (GSI) to that table, that is, you are below the per-table GSI limit, which was 5 at the time of writing.)
Define a new GSI on your table. The GSI's partition key will be x. The GSI's sort key will be timestamp. Once you have that GSI defined, you can query that index with ScanIndexForward set to false to get the most recent item first. You need to supply the value of x you are interested in. In the following example request it is simply set to 'abc':
{
"TableName": "<your-table-name>",
"IndexName": "<your-GSI-name>",
"KeyConditionExpression": "x = :argx",
"ExpressionAttributeValues": {
":argx": {"S": "abc"}
},
"ScanIndexForward": false,
"Limit": 1
}
This query looks at items with a given x value (as set in the ExpressionAttributeValues field) sorted in descending order (by the GSI's sort key, which is the timestamp field) and picks the first one (Limit is set to 1). As long as you do not need filtering (the FilterExpression field is empty) then you will get the result that you need by issuing a single Query request.
If you do want to use filtering you will need to do multiple requests and unset the Limit field (i.e., use its default value). See this answer for further details on those subtleties.
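To copy the latest entry of every item, the single-item query above would be repeated per x value. The following is a minimal sketch, assuming a boto3-style client object with a query method and assuming the caller already has the list of distinct x values; the helper names latest_item_query and latest_items are hypothetical.

```python
def latest_item_query(table_name, index_name, x_value):
    """Build Query arguments that return the newest item for one x value."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "x = :argx",
        "ExpressionAttributeValues": {":argx": {"S": x_value}},
        "ScanIndexForward": False,  # descending by the GSI sort key (timestamp)
        "Limit": 1,                 # keep only the first, i.e. the most recent
    }


def latest_items(client, table_name, index_name, x_values):
    """Return {x: newest item} by issuing one Query per x value."""
    latest = {}
    for x_value in x_values:
        response = client.query(**latest_item_query(table_name, index_name, x_value))
        items = response.get("Items", [])
        if items:
            latest[x_value] = items[0]
    return latest
```

Each returned item could then be written to the target table, for example with put_item or a batch write.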

dynamodb - scan items where map contains a key

I have a table that contains a field (not a key field), called appsMap, and it looks like this:
appsMap = { "qa-app": "abc", "another-app": "xyz" }
I want to scan all rows whose appsMap contains the key "qa-app" (the value is not important, just the key). I tried something like this but it doesn't work in the way I need:
FilterExpression = '#appsMap.#app <> :v',
ExpressionAttributeNames = {
"#app": "qa-app",
"#appsMap": "appsMap"
},
ExpressionAttributeValues = {
":v": { "NULL": True }
},
ProjectionExpression = "deviceID"
What's the correct syntax?
Thanks.
There is a discussion on the subject here:
https://forums.aws.amazon.com/thread.jspa?threadID=164470
You might be missing this part from the example:
ExpressionAttributeValues: {":name":{"S":"Jeff"}}
However, I just wanted to echo what has already been said: a scan is an expensive operation that goes through every item, which makes your database hard to scale.
Unlike with other databases, you have to do plenty of setup with DynamoDB to get it to perform at its best. Here is a suggestion:
1) Promote this into a top-level attribute, for example qaExist, with possible values 0|1 (index key attributes must be String, Number, or Binary, so a Boolean true|false will not work as a key).
2) Create a secondary index on the newly created attribute.
3) Query the new index, specifying the flag value you are looking for (1 for items that have the key, under the convention above).
This will make your system very fast and very scalable regardless of how many records you get in there later on.
If I understand the question correctly, you can do the following:
FilterExpression = 'attribute_exists(#0.#1)',
ExpressionAttributeNames = {
"#0": "appsMap",
"#1": "qa-app"
},
ProjectionExpression = "deviceID"
Since you're being a bit vague about your expectations and what's happening ("I tried something like this but it doesn't work in the way I need"), I'd like to mention that a scan with a filter is very different from a query.
Filters are applied on the server, but only after the scan has read the data. The scan still iterates over all data in your table; instead of returning every item, it applies the filter to each page of results. That saves you some network bandwidth, but can return empty pages as you page through your entire table.
You could look into creating a GSI on the table if this is a query you expect to have to run often.
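Because a filtered scan can return empty pages, the caller has to keep following LastEvaluatedKey until it is absent. The following is a minimal sketch of that loop, assuming a boto3-style client object with a scan method; the helper name scan_all and the table name devices are hypothetical.

```python
def scan_all(client, **params):
    """Yield every matching item, following LastEvaluatedKey across pages."""
    while True:
        page = client.scan(**params)
        # A page may be empty even though later pages contain matches.
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        params["ExclusiveStartKey"] = last_key


scan_params = {
    "TableName": "devices",  # hypothetical table name
    "FilterExpression": "attribute_exists(#0.#1)",
    "ExpressionAttributeNames": {"#0": "appsMap", "#1": "qa-app"},
    "ProjectionExpression": "deviceID",
}

# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# for item in scan_all(boto3.client("dynamodb"), **scan_params):
#     print(item["deviceID"])
```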

Dynamodb scan in sorted order

Hi, I have a DynamoDB table. I want the service to return all the items in this table, sorted by one attribute.
Do I need to create a global secondary index for this? If so, what should the hash key be, and what should the range key be?
(Note that query on gsi must specify a "EQ" comparator on the hash key of GSI.)
Thanks a lot!
Erben
If you know the HashKey, then any query will return the items sorted by Range key. From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order. Otherwise, the results are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.
Now, if you need to return all the items, you should use a scan. You cannot order the results of a scan.
Another option is to use a GSI (example). Here, you see that the GSI contains only HashKey. The results I guess will be in sorted order of this key (I didn't check this part in a program yet!).
As of now the dynamoDB scan cannot return you sorted results.
You need to use a query with a new global secondary index (GSI) with a hashkey and range field. The trick is to use a hashkey which is assigned the same value for all data in your table.
I recommend making a new field for all data and calling it "Status" and set the value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
  TableName: "YourTable",
  IndexName: "Status-YourRange-index",
  KeyConditions: {
    Status: {
      ComparisonOperator: "EQ",
      AttributeValueList: [
        "OK"
      ]
    }
  },
  ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
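The same request can also be expressed with the newer KeyConditionExpression syntax instead of KeyConditions. The following is a minimal sketch, assuming boto3's low-level client; the helper name sorted_all_query is hypothetical, and the index and attribute names are taken from the example above. The attribute name goes through a placeholder because, as far as I know, Status is on DynamoDB's reserved word list.

```python
def sorted_all_query(table_name, index_name, descending=True):
    """Build Query arguments for a GSI whose hash key is the constant 'OK'."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Every item shares the same hash key value, so one Query covers the
        # whole table and results come back ordered by the GSI range key.
        "KeyConditionExpression": "#s = :ok",
        "ExpressionAttributeNames": {"#s": "Status"},
        "ExpressionAttributeValues": {":ok": {"S": "OK"}},
        "ScanIndexForward": not descending,
    }


query_params = sorted_all_query("YourTable", "Status-YourRange-index")

# Usage (assumes boto3 is installed and credentials are configured):
# import boto3
# response = boto3.client("dynamodb").query(**query_params)
```

One design caveat: with a single hash key value, all reads and writes for the index go to one partition key, so this trades write scalability for sorted reads.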
The approach I followed to solve this problem was to create a Global Secondary Index as below. I'm not sure it is the best approach, but I'm posting it in case it is useful to someone.
Hash Key | Range Key
------------------------------------
Date value of CreatedAt | CreatedAt
The limitation is that the HTTP API user has to specify the number of days of data to retrieve, which defaults to 24 hours.
This way, I can always specify the hash key as the current date's day, and the range key can use the > and < operators while retrieving. The data is also spread across multiple shards.
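The date-bucket scheme above means a request for the last N days turns into one Query per calendar day. The following is a minimal sketch, assuming CreatedAt is stored as an ISO-8601 UTC string (so string order matches time order) and that the GSI hash key attribute is called CreatedDate; both attribute names and the helper name day_bucket_queries are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_queries(table_name, index_name, now, days=1):
    """Build one Query per calendar day covering the window [now - days, now]."""
    start = now - timedelta(days=days)
    queries = []
    day = start.date()
    while day <= now.date():
        queries.append({
            "TableName": table_name,
            "IndexName": index_name,
            # Hash key pins the day; range key narrows to the requested window.
            "KeyConditionExpression": "#d = :day AND #c BETWEEN :from AND :to",
            "ExpressionAttributeNames": {"#d": "CreatedDate", "#c": "CreatedAt"},
            "ExpressionAttributeValues": {
                ":day": {"S": day.isoformat()},
                ":from": {"S": start.isoformat()},
                ":to": {"S": now.isoformat()},
            },
        })
        day += timedelta(days=1)
    return queries


example_now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
example_queries = day_bucket_queries("YourTable", "CreatedDate-CreatedAt-index", example_now)
```

The default 24-hour window usually spans two calendar days, so it produces two queries whose results are concatenated in day order.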