Can I query DynamoDBMapper with partition key only?

I've seen this page about how to query with a partition key only. However, I am using the DynamoDBMapper class to make the query, and what seemed to work there does not apply.
Here's a part of my code:
private final DynamoDBMapper mapper;
List<QueryResult> queryResult = mapper.query(QueryResult.class, queryExpression);
The table I query has a primary partition key id and primary sort key timestamp.
I want to query all the rows with a designated id. The expression attribute values (eav) look like:
{:id={S: 0123456,}}
but if the id has duplicates (which makes sense because it's the partition key), it always gives me
"The provided key element does not match the schema"
Not sure how to resolve this. Due to sharing code with other tables, DynamoDBMapper class is a must.
Any help appreciated! Thanks.

Does the below work?
final DynamoDBQueryExpression<QueryResult> queryExpression = new DynamoDBQueryExpression<>();
queryExpression.setKeyConditionExpression("id = :id");
queryExpression.withExpressionAttributeValues(ImmutableMap.of(":id", new AttributeValue("0123456")));

Here is a working example:
final MyItem hashKeyValues = MyItem.builder()
    .hashKeyField("abc")
    .build();
final DynamoDBQueryExpression<MyItem> queryExpression = new DynamoDBQueryExpression<>();
queryExpression.withHashKeyValues(hashKeyValues);
queryExpression.setConsistentRead(false); // or true
final PaginatedQueryList<MyItem> response = dynamoDBMapper.query(MyItem.class, queryExpression);

Related

Query GSI in DynamoDb with a substring of the key (begins_with)

Let's take the best practices for sort keys official documentation of DynamoDb as an example: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
Imagine we have a table like the documentation mentions, where our sort key is a composite that looks like:
[country]#[region]#[city]#[neighborhood]
For example, something like this:
Partition Key (Employee Name) | Sort Key                          | Other columns...
Antonio                       | Spain#Madrid#Getafe#Whatever      | ...
Maria                         | Spain#Andalucia#Sevilla#Whatever2 | ...
Mike                          | Spain#Madrid#Alcorcon#Whatever    | ...
And I'd like to get all the records from a specific country + region, so we have a partial sort key:
[country]#[region] like Spain#Madrid to get Antonio and Mike.
I know it's not possible to query by sort key directly, so I created a GSI with the inverted index (like mentioned here https://stackoverflow.com/a/64141405)
Partition Key                     | Sort Key | Other columns...
Spain#Madrid#Getafe#Whatever      | Antonio  | ...
Spain#Andalucia#Sevilla#Whatever2 | Maria    | ...
Spain#Madrid#Alcorcon#Whatever    | Mike     | ...
But it still looks like it's not possible to query using the begins_with operator.
var request = new QueryRequest
{
    IndexName = "GSI_Name",
    KeyConditionExpression = "begins_with(SortKey, :v_SortKey)",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue> {
        {":v_SortKey", new AttributeValue { S = sortKey }},
    },
};
My question is: is there any way to achieve this without using the Scan operation which is not ideal? Or any suggestion to change my table definition to achieve this? I've been trying to think of ways of restructuring the table to accomplish this behavior, but I'm not fluent enough with DynamoDB.
Use the country as the PK and the rest as the SK. That spreads the data nicely across partitions while also enabling your access pattern.
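To make that suggestion concrete, here is a minimal sketch of how the composite value could be split into a country partition key and a remaining sort key, and what the key condition might look like. The class LocationKeys, the placeholders #country and #rest, and the value names :country and :prefix are hypothetical and only for illustration; the expression string is meant to be passed as a KeyConditionExpression together with matching name and value maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any AWS SDK: derives key values for a table
// keyed by country (partition key) and region#city#neighborhood (sort key).
final class LocationKeys {
    // #country and #rest are placeholders to be mapped to the real attribute
    // names through ExpressionAttributeNames.
    static final String KEY_CONDITION =
            "#country = :country AND begins_with(#rest, :prefix)";

    // Splits "Spain#Madrid#Getafe#Whatever" into {"Spain", "Madrid#Getafe#Whatever"}.
    static String[] split(String composite) {
        int i = composite.indexOf('#');
        if (i < 0) {
            throw new IllegalArgumentException("expected country#rest: " + composite);
        }
        return new String[] {composite.substring(0, i), composite.substring(i + 1)};
    }

    // Builds the placeholder values for a country + region lookup such as "Spain#Madrid".
    static Map<String, String> valuesFor(String countryAndRegion) {
        String[] parts = split(countryAndRegion);
        Map<String, String> values = new LinkedHashMap<>();
        values.put(":country", parts[0]);
        // The trailing '#' keeps "Madrid" from also matching a longer region name.
        values.put(":prefix", parts[1] + "#");
        return values;
    }

    public static void main(String[] args) {
        System.out.println(KEY_CONDITION);
        System.out.println(valuesFor("Spain#Madrid"));
    }
}
```

With this layout, the equality test on the partition key satisfies Query's requirement, and begins_with is applied to the sort key, where it is allowed.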

DynamoDB Query with filter

I'd like to write a DynamoDB query that filters on a certain field. Sounds simple.
All the examples I find always include the partition key value, which really confuses me, since it is a unique value, but I want a list.
I have id as the partition key and no sort key or any other index. I tried adding partner as an index, but it did not make any difference.
AttributeValue attribute = AttributeValue.builder()
    .s(partner)
    .build();
Map<String, AttributeValue> expressionValues = new HashMap<>();
expressionValues.put(":value", attribute);
Expression expression = Expression.builder()
    .expression("partner = :value")
    .expressionValues(expressionValues)
    .build();
QueryConditional queryConditional = QueryConditional
    .keyEqualTo(Key.builder()
        .partitionValue("id????")
        .build());
Iterator<Product> results = productTable.query(r -> r.queryConditional(queryConditional))
    .items()
    .iterator();
Would appreciate any help. Is there a misunderstanding on my side?
DynamoDB has two distinct, but similar, operations - Query and Scan:
Scan is for reading the entire table, including all partition keys.
Query is for reading a specific partition key - and all sort keys in it (or a contiguous range of sort keys - hence the nickname "range key" for that key).
If your data model does not have a range key, a Query on the table can return at most one item, so Query is not relevant for you - to find items by a non-key attribute you should use Scan.
However, this means that each time you run this scan, the entire table will be read. Unless your table is tiny, this doesn't make economic sense, and you should reconsider your data model. For example, if you frequently look up results by the "partner" attribute, you can consider creating a GSI (global secondary index) with "partner" as its partition key, allowing you to quickly and cheaply fetch the list of items with a given "partner" value without scanning the entire table.
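The cost difference between the two approaches can be illustrated with a toy in-memory model. This is only a sketch, not the DynamoDB API: the class PartnerLookupModel, its Product type and the sample data are assumptions made for illustration. It counts how many items each approach has to touch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, not the DynamoDB API: contrasts a filtered scan with a
// lookup through an index keyed by the "partner" attribute.
final class PartnerLookupModel {
    // Minimal stand-in for a table item; the fields are assumptions.
    static final class Product {
        final String id;
        final String partner;

        Product(String id, String partner) {
            this.id = id;
            this.partner = partner;
        }
    }

    // Scan-style: every item is read and then filtered.
    // itemsRead[0] is incremented once per item examined.
    static List<Product> scanByPartner(List<Product> table, String partner, int[] itemsRead) {
        List<Product> matches = new ArrayList<>();
        for (Product p : table) {
            itemsRead[0]++;
            if (p.partner.equals(partner)) {
                matches.add(p);
            }
        }
        return matches;
    }

    // Index-style: items are grouped by partner ahead of time, which is roughly
    // the role a GSI with "partner" as its partition key plays.
    static Map<String, List<Product>> buildPartnerIndex(List<Product> table) {
        Map<String, List<Product>> index = new HashMap<>();
        for (Product p : table) {
            index.computeIfAbsent(p.partner, k -> new ArrayList<>()).add(p);
        }
        return index;
    }

    // Query-style: only the items stored under the requested partner are touched.
    static List<Product> queryByPartner(Map<String, List<Product>> index, String partner, int[] itemsRead) {
        List<Product> matches = index.getOrDefault(partner, new ArrayList<>());
        itemsRead[0] += matches.size();
        return matches;
    }

    public static void main(String[] args) {
        List<Product> table = Arrays.asList(
                new Product("1", "acme"), new Product("2", "globex"),
                new Product("3", "acme"), new Product("4", "initech"));
        int[] scanned = {0};
        int[] queried = {0};
        scanByPartner(table, "acme", scanned);
        queryByPartner(buildPartnerIndex(table), "acme", queried);
        System.out.println("scan read " + scanned[0] + " items, index lookup read " + queried[0]);
    }
}
```

In the model, the scan touches every item regardless of how many match, while the index lookup touches only the matching ones; that is the property the GSI suggestion relies on.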

What does DynamoDB scan return if item with Exclusive Start Key does not exist in the table?

I'm trying to implement pagination for my API. I have a DynamoDB table with a simple primary key.
Since the ExclusiveStartKey in a DynamoDB scan() operation is nothing but the primary key of the last item fetched in the scan operation before, I was wondering what would DynamoDB return if I perform a scan() with an ExclusiveStartKey that does not exist in the table?
# Here response contains the same list of items for the same
# primary key passed to the scan operation
response = table.scan(ExclusiveStartKey=NonExistentPrimaryKey)
I expected DynamoDB to return no items (correct me if this assumption of mine is what's wrong), i.e the scanning should resume from the ExclusiveStartKey, if it exists in the table. If not, it should return no items.
But what I do see happening is, the scan() still returns items. When I give the same non-existent primary key, it keeps returning me a list starting from the same item.
Does DynamoDB simply apply the hash function on the ExclusiveStartKey and from the result of this hash decide from which partition it has to start returning items or something?
# My theory as to what DynamoDB does in a paginated scan operation
partitionId = dynamodbHashFunction(NonExistentPrimaryKey)
return fetchItemsFromPartition(partitionId)
My end goal is that when an invalid ExclusiveStartKey is provided by the user (i.e a non-existent primary key), I want to return nothing or even better, return a message that the ExclusiveStartKey is invalid.
It looks like you want to return items based on a value and, if that value does not exist, get an empty result set. This is possible with the Java V2 DynamoDbTable object's scan method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/enhanced/dynamodb/DynamoDbTable.html
One way is to scan an Amazon DynamoDB table and return a result set based on the value of a specific column (including the key). You can use an Expression object, which lets you set the value that items must match to be included in the result set.
For example, here is Java logic that returns all items where a date column is 2013-11-15. If no items meet this condition, no items are returned. There is no need for a pre-check; you only need to set up the ScanEnhancedRequest properly.
public static void scanIndex(DynamoDbClient ddb, String tableName, String indexName) {
    System.out.println("\n***********************************************************\n");
    System.out.print("Select items for " + tableName + " where createDate is 2013-11-15!");
    try {
        // Create a DynamoDbEnhancedClient and use the DynamoDbClient object.
        DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
            .dynamoDbClient(ddb)
            .build();
        // Create a DynamoDbTable object based on Issues.
        DynamoDbTable<Issues> table = enhancedClient.table("Issues", TableSchema.fromBean(Issues.class));
        // Set up the scan based on the index.
        // Compare strings with equals(); == compares object references in Java.
        if ("CreateDateIndex".equals(indexName)) {
            System.out.println("Issues filed on 2013-11-15");
            AttributeValue attVal = AttributeValue.builder()
                .s("2013-11-15")
                .build();
            // Get only items in the Issues table for 2013-11-15.
            Map<String, AttributeValue> myMap = new HashMap<>();
            myMap.put(":val1", attVal);
            Map<String, String> myExMap = new HashMap<>();
            myExMap.put("#createDate", "createDate");
            Expression expression = Expression.builder()
                .expressionValues(myMap)
                .expressionNames(myExMap)
                .expression("#createDate = :val1")
                .build();
            ScanEnhancedRequest enhancedRequest = ScanEnhancedRequest.builder()
                .filterExpression(expression)
                .limit(15)
                .build();
            // Get items in the Issues table.
            Iterator<Issues> results = table.scan(enhancedRequest).items().iterator();
            while (results.hasNext()) {
                Issues issue = results.next();
                System.out.println("The record description is " + issue.getDescription());
                System.out.println("The record title is " + issue.getTitle());
            }
        }
    } catch (DynamoDbException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
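As for the behavior observed in the question, it is consistent with ExclusiveStartKey being treated as a position in the scan order rather than as an item that must exist. The following is a toy model of that idea, not DynamoDB's actual implementation: the class StartKeyModel is hypothetical, and a TreeMap's natural ordering stands in for DynamoDB's internal ordering, which is based on a hash of the partition key. The strictScan variant sketches one possible way to reject unknown keys, assuming an existence check is done before scanning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for the table's internal ordering.
final class StartKeyModel {
    // Returns up to `limit` keys strictly after startKey in the store's order.
    // A null startKey starts from the beginning. The key itself need not exist.
    static List<String> scan(TreeMap<String, String> store, String startKey, int limit) {
        Set<String> keys = (startKey == null)
                ? store.keySet()
                : store.tailMap(startKey, false).keySet();
        List<String> page = new ArrayList<>();
        for (String k : keys) {
            if (page.size() == limit) {
                break;
            }
            page.add(k);
        }
        return page;
    }

    // Variant that rejects a start key that is not present in the store.
    static List<String> strictScan(TreeMap<String, String> store, String startKey, int limit) {
        if (startKey != null && !store.containsKey(startKey)) {
            throw new IllegalArgumentException("Invalid ExclusiveStartKey: " + startKey);
        }
        return scan(store, startKey, limit);
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("a", "1");
        store.put("c", "2");
        store.put("e", "3");
        // "b" is not in the store, yet the scan still resumes after its position.
        System.out.println(scan(store, "b", 10));
    }
}
```

One caveat with the strict variant: an item that was legitimately returned as the last key of a previous page may be deleted before the next page is requested, so a strict existence check can reject tokens that were valid when issued.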

Querying a Global Secondary Index of a DynamoDB table without using the partition key

I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. (From the AWS Query docs:)
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is by doing a Scan operation (but this won't be sorted by your sort key!)
If you want to get all the items sorted, you would have to create a GSI with a partition key that is the same for all the items you need (e.g. create a new attribute on all items, such as "type": "item"). You can then query the GSI and specify #type = :item
QuerySpec querySpec = new QuerySpec()
    // "timestamp" is a DynamoDB reserved word, so it is referenced through the #ts placeholder.
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")
        .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));
A good solution for any customised querying requirement with DynamoDB is to design the right primary key schema for the GSI.
In designing a DynamoDB primary key, the main principle is that the hash key should partition the items, and the sort key should sort the items within each partition.
Having said that, I recommend using the year of the timestamp as the hash key, and the month and date as the sort key.
In this case you need at most 2 queries, as long as the range spans no more than two calendar years.
You are right, you should avoid filtering or scanning as much as you can.
So, for example, if the start date and the end date fall in the same year, you need only one query. A key condition allows only one condition on the sort key, so use BETWEEN (which is inclusive) rather than two separate comparisons:
.withKeyConditionExpression("#year = :year and #monthDate between :startMonthDate and :endMonthDate")
Otherwise, you need two queries, one like this:
.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")
and
.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")
Finally, you should union the result sets from both queries.
This takes at most 2 Query requests; the read capacity consumed depends on how much data each query reads.
For easier comparison of sort key values, you might want to use a UNIX timestamp.
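The per-year splitting described above can be sketched as a small helper that turns a date range into one slice per calendar year, each of which would become one Query. This is only an illustration under stated assumptions: the class YearRangeSplitter and its YearSlice type are hypothetical, the sort key is assumed to be a zero-padded "MM-dd" string, and the bounds are inclusive so that each slice maps onto a BETWEEN condition on the sort key.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits [start, end] into one slice per calendar year.
final class YearRangeSplitter {
    // One slice corresponds to one Query: partition key = year,
    // sort key BETWEEN fromMonthDay AND toMonthDay (both inclusive).
    static final class YearSlice {
        final int year;
        final String fromMonthDay;
        final String toMonthDay;

        YearSlice(int year, String fromMonthDay, String toMonthDay) {
            this.year = year;
            this.fromMonthDay = fromMonthDay;
            this.toMonthDay = toMonthDay;
        }
    }

    static List<YearSlice> split(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<YearSlice> slices = new ArrayList<>();
        for (int y = start.getYear(); y <= end.getYear(); y++) {
            // Only the first and last years are partial; years in between are full.
            String from = (y == start.getYear()) ? monthDay(start) : "01-01";
            String to = (y == end.getYear()) ? monthDay(end) : "12-31";
            slices.add(new YearSlice(y, from, to));
        }
        return slices;
    }

    // Zero-padded so that string comparison matches chronological order.
    static String monthDay(LocalDate d) {
        return String.format(Locale.ROOT, "%02d-%02d", d.getMonthValue(), d.getDayOfMonth());
    }

    public static void main(String[] args) {
        for (YearSlice s : split(LocalDate.of(2023, 11, 15), LocalDate.of(2024, 2, 3))) {
            System.out.println(s.year + ": " + s.fromMonthDay + " .. " + s.toMonthDay);
        }
    }
}
```

The results of the per-slice queries would then be concatenated in year order. A range that spans more than two years simply yields more slices, and therefore more queries.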
Thanks

Can I use the DynamoDB .NET Object Persistence Model when I have a Global Secondary Index?

I have a table in DynamoDB with a hash and range key plus a global secondary index with just a hash key. If I try to Query or Save an object, I get the following error:
Number of hash keys on table TableName does not match number of hash keys on type ObjectModelType
(Replacing TableName and ObjectModelType with the actual table and model type)
I have the hash properties (both the primary and secondary) decorated with DynamoDBHashKey.
Googling the error turns up exactly zero results.
Update: OK, so not exactly zero, obviously it now returns this question!
Update the second: I've tried using the helper API and it works just fine, so I am assuming at this point that the Object Persistence Model doesn't support Global Secondary Indexes.
I encountered the same problem and found FromQuery worked, although QueryFilter is actually from the DocumentModel namespace:
var queryFilter = new QueryFilter(SecondaryIndexHashKeyColumn, QueryOperator.Equal, "xxxx");
queryFilter.AddCondition(SecondaryIndexRangeKeyColumn, QueryOperator.LessThan, DateTime.Today);
var items = context.FromQuery<MyItem>(new QueryOperationConfig { IndexName = SecondaryIndexName, Filter = queryFilter }).ToList();
Thank you Brendon for that tip! I was able to adapt it to get deserialization of a multiple result set.
var queryFilter = new QueryFilter(indexedColumnName, QueryOperator.Equal, targetValue);
Table table = Table.LoadTable(client, Configuration.Instance.DynamoTable);
var search = table.Query(new QueryOperationConfig { IndexName = indexName, Filter = queryFilter });
List<MyObject> objects = new List<MyObject>();
List<Document> documentSet = new List<Document>();
do
{
    documentSet = search.GetNextSetAsync().Result;
    foreach (var document in documentSet)
    {
        var record = JsonConvert.DeserializeObject<MyObject>(document.ToJson());
        objects.Add(record);
    }
} while (!search.IsDone);
Thanks x1000!! :)