Let's take the official DynamoDB documentation on best practices for sort keys as an example: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
Imagine we have a table like the documentation mentions, where our sort key is a composite that looks like:
[country]#[region]#[city]#[neighborhood]
For example, something like this:
Partition Key (Employee Name) | Sort Key | Other columns...
Antonio | Spain#Madrid#Getafe#Whatever | ...
Maria | Spain#Andalucia#Sevilla#Whatever2 | ...
Mike | Spain#Madrid#Alcorcon#Whatever | ...
And I'd like to get all the records from a specific country + region, so we have a partial sort key:
[country]#[region] like Spain#Madrid to get Antonio and Mike.
I know it's not possible to query by sort key directly, so I created a GSI with the inverted index (like mentioned here https://stackoverflow.com/a/64141405)
Partition Key | Sort Key | Other columns...
Spain#Madrid#Getafe#Whatever | Antonio | ...
Spain#Andalucia#Sevilla#Whatever2 | Maria | ...
Spain#Madrid#Alcorcon#Whatever | Mike | ...
But it still looks like it's not possible to query using the begins_with operator.
var request = new QueryRequest
{
    IndexName = "GSI_Name",
    KeyConditionExpression = "begins_with(SortKey, :v_SortKey)",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue> {
        {":v_SortKey", new AttributeValue { S = sortKey }},
    },
};
My question is: is there any way to achieve this without using the Scan operation which is not ideal? Or any suggestion to change my table definition to achieve this? I've been trying to think of ways of restructuring the table to accomplish this behavior, but I'm not fluent enough with DynamoDB.
Use the country as the PK and the rest as the SK. That spreads the data nicely across partitions while also enabling your access pattern.
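A minimal sketch of what that query could look like, written as the parameter dictionary accepted by the low-level Query API. The table name "Employees" and the attribute names "Country" and "Location" are hypothetical, not taken from the question. Since a primary key must be unique, the sort key would probably also need the employee name appended (or this layout could live on a GSI, where uniqueness is not required).

```python
def build_region_query(table_name, country, region_prefix):
    """Build Query parameters: equality on the partition key (country)
    plus begins_with on the sort key (region#city#neighborhood)."""
    return {
        "TableName": table_name,
        # The partition key must always be matched with "=";
        # begins_with is only allowed on the sort key.
        "KeyConditionExpression": "#pk = :country AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "Country", "#sk": "Location"},
        "ExpressionAttributeValues": {
            ":country": {"S": country},
            ":prefix": {"S": region_prefix},
        },
    }


# Hypothetical names; the trailing "#" avoids matching regions that merely
# start with "Madrid" (e.g. "MadridNorth").
params = build_region_query("Employees", "Spain", "Madrid#")
print(params["KeyConditionExpression"])
```

With boto3 installed, the dictionary could be passed as boto3.client("dynamodb").query(**params).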
I have a dynamodb table on which a GSI is defined with a partition key and sort key.
Let's say the partition key is name and the sort key is ssn for the GSI.
I have to fetch based upon a name and ssn, below is the query I am using and it works fine.
table.query(IndexName='lookup-by-name',KeyConditionExpression=Key('name').eq(name)\
& Key('ssn').eq(ssn))
Now, I have to query based upon a name and a list of ssns.
For example:
ssns=['ssn1','ssn2','ssn3','ssn4']
name='Alex'
Query all records that have the name 'Alex' and whose ssn is present in the ssns list.
How do I implement something like this?
While the native DynamoDB Query API cannot do this in a single call, you can achieve it using PartiQL, which provides a SQL-like interface for interacting with DynamoDB.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-gettingstarted.html
import boto3

client = boto3.client('dynamodb', region_name="eu-west-1")
name = 'Alex'
ssns = ['ssn1', 'ssn2', 'ssn3', 'ssn4']
# Formatting the Python list with %s yields ['ssn1', 'ssn2', ...], which PartiQL accepts as a list literal.
response = client.execute_statement(
    Statement="SELECT * FROM \"MyTableTest\".\"lookup-by-name\" WHERE \"name\" = '%s' AND \"ssn\" IN %s" % (name, ssns)
)
print(response['Items'])
It would also require you to use the lower level Client instead of the Table level resource which you are using above.
You would have to do multiple queries.
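As a rough sketch of the multiple-queries approach, the helper below builds one set of Query parameters per ssn. The table and index names are borrowed from the other answer in this thread, and the helper name is hypothetical. Each dictionary would be sent as its own Query request and the results concatenated.

```python
def build_queries_per_ssn(table_name, index_name, name, ssns):
    """Return one Query parameter dict per distinct ssn."""
    unique_ssns = list(dict.fromkeys(ssns))  # drop duplicates, keep order
    return [
        {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "#n = :name AND #s = :ssn",
            # "name" is a DynamoDB reserved word, so it needs a placeholder.
            "ExpressionAttributeNames": {"#n": "name", "#s": "ssn"},
            "ExpressionAttributeValues": {
                ":name": {"S": name},
                ":ssn": {"S": ssn},
            },
        }
        for ssn in unique_ssns
    ]


queries = build_queries_per_ssn(
    "MyTableTest", "lookup-by-name", "Alex", ["ssn1", "ssn2", "ssn2", "ssn3"]
)
print(len(queries))
```

With boto3 installed, each entry could be passed as client.query(**q) and the returned Items lists merged.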
I ended up using just the name as the key condition and then filtering by ssn in Python code.
The code below worked for me because the number of records was small.
response = table.query(IndexName='lookup-by-name', KeyConditionExpression=Key('name').eq(name))
ssns = ['ssn1', 'ssn2', 'ssn3', 'ssn4']
data = response['Items']
data = [record for record in data if record['ssn'] in ssns]
return data
I'd like to write a DynamoDB query that filters on a certain field, which sounds simple.
All the examples I find always include the partition key value, which really confuses me, since it is a unique value, but I want a list.
I have id as the partition key and no sort key or any other index. I tried adding partner as an index, but it did not make any difference.
AttributeValue attribute = AttributeValue.builder()
        .s(partner)
        .build();
Map<String, AttributeValue> expressionValues = new HashMap<>();
expressionValues.put(":value", attribute);
Expression expression = Expression.builder()
        .expression("partner = :value")
        .expressionValues(expressionValues)
        .build();
QueryConditional queryConditional = QueryConditional
        .keyEqualTo(Key.builder()
                .partitionValue("id????")
                .build());
Iterator<Product> results = productTable.query(r -> r.queryConditional(queryConditional)
        .filterExpression(expression)).items().iterator();
Would appreciate any help. Is there a misunderstanding on my side?
DynamoDB has two distinct, but similar, operations - Query and Scan:
Scan is for reading the entire table, including all partition keys.
Query is for reading a specific partition key and all the sort keys in it (or a contiguous range of sort keys, hence the nickname "range key" for that key).
If your data model does not have a range key, Query is not relevant for you - you should use Scan.
However, this means that each time you run this scan, the entire table will be read. Unless your table is tiny, this doesn't make economic sense, and you should reconsider your data model. For example, if you frequently look up results by the "partner" attribute, you can consider creating a GSI (global secondary index) with "partner" as its partition key, allowing you to quickly and cheaply fetch the list of items with a given "partner" value without scanning the entire table.
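To make the difference concrete, here is a small sketch of the two request shapes as low-level parameter dictionaries. The table name "Products" and the index name "partner-index" are hypothetical. The Scan reads every item and applies the filter afterwards, while the Query on the GSI reads only the items stored under the given partner value.

```python
def build_partner_scan(table_name, partner):
    """Scan: the whole table is read, then the filter discards non-matches."""
    return {
        "TableName": table_name,
        "FilterExpression": "#p = :partner",
        "ExpressionAttributeNames": {"#p": "partner"},
        "ExpressionAttributeValues": {":partner": {"S": partner}},
    }


def build_partner_gsi_query(table_name, index_name, partner):
    """Query on a GSI whose partition key is 'partner': only matching
    items are read."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#p = :partner",
        "ExpressionAttributeNames": {"#p": "partner"},
        "ExpressionAttributeValues": {":partner": {"S": partner}},
    }


scan_params = build_partner_scan("Products", "acme")
query_params = build_partner_gsi_query("Products", "partner-index", "acme")
print(sorted(query_params))
```

With boto3 installed, these could be passed as client.scan(**scan_params) and client.query(**query_params) respectively.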
I have the following JSON in dynamo:
{
cdItem: "123456",
dtItem: "2021-03-01"
}
My hashkey is cdItem.
I need dtItem to be part of the key as well, so that if I send an item with the same cdItem but a different dtItem, it creates a new record instead of updating the existing one.
How can I do this? Or, is it possible to do this?
There are multiple ways you can implement this and they depend on your access patterns.
If you only want to request an item for which you know both the cdItem and the dtItem values, you could just overload the partition key by concatenating them, e.g. 123456#2021-03-01. That way you could keep your existing table.
A more flexible solution would be using a composite primary key, which is a combination of a partition and a sort key. This requires you to create a new table.
I'd set it up like this:
cdItem (Partition Key) | dtItem (Sort Key)
123456 | 2021-02-27
123456 | 2021-02-28
123456 | 2021-03-01
654321 | 2021-03-01
You'll have to provide both of those attributes on each PutItem request.
You can also call GetItem with both values to retrieve a single item and you can select all dtItem values for a given cdItem value using the Query API as well as do some filtering on the value of dtItem.
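As an illustration, a table definition with such a composite key could look like the dictionary below (the table name "Items" and the billing mode are assumptions). The small in-memory simulation after it is not DynamoDB itself; it only mimics the rule that an item is identified by the (cdItem, dtItem) pair, so a different dtItem creates a new record while an identical pair overwrites.

```python
# Hypothetical definition; with boto3 it could be passed to create_table(**table_definition).
table_definition = {
    "TableName": "Items",
    "AttributeDefinitions": [
        {"AttributeName": "cdItem", "AttributeType": "S"},
        {"AttributeName": "dtItem", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "cdItem", "KeyType": "HASH"},   # partition key
        {"AttributeName": "dtItem", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def put_item(store, item):
    """Simulate PutItem: the (cdItem, dtItem) pair identifies the item."""
    store[(item["cdItem"], item["dtItem"])] = item


store = {}
put_item(store, {"cdItem": "123456", "dtItem": "2021-02-28"})
put_item(store, {"cdItem": "123456", "dtItem": "2021-03-01"})
# Same pair as the previous call: replaces it instead of adding a record.
put_item(store, {"cdItem": "123456", "dtItem": "2021-03-01", "note": "updated"})
print(len(store))  # prints 2
```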
I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
I am querying the index with Java SDK using the following code:
String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
        .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
        .withValueMap(new ValueMap()
                .withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error that userID is required. From the AWS Query docs:
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is by doing a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you would have to create a GSI whose partition key is the same for all the items you need (e.g. add a new attribute to all items, such as "type": "item"). You can then query the GSI and specify #type = :item:
QuerySpec querySpec = new QuerySpec()
        .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
        .withNameMap(new NameMap()
                .with("#type", "type")
                .with("#ts", "timestamp")) // "timestamp" is a DynamoDB reserved word
        .withValueMap(new ValueMap()
                .withString(":v_timestampLowerBound", lastWeekDateString)
                .withString(":item", "item"));
A good solution for any custom querying requirement in DynamoDB is to design the right primary key schema for a GSI.
When designing a DynamoDB primary key, the main principle is that the hash key should partition the items, and the sort key should order the items within a partition.
Having said that, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
In that case you need at most two queries, as long as the range spans no more than two calendar years.
You are right that you should avoid filtering or scanning as much as you can.
For example, if the start date and the end date fall in the same year, you need only one query. A key condition allows only one condition on the sort key, so use BETWEEN (its bounds are inclusive):
.withKeyConditionExpression("#year = :year and #monthDate between :startMonthDate and :endMonthDate")
Otherwise, query the first year like this:
.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")
and the second year like this:
.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")
Finally, you should union the result sets from both queries.
This takes at most two Query requests; the read capacity consumed depends on the amount of data read.
For easier comparison of sort keys, you might want to use a UNIX timestamp.
Thanks
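A small sketch of how the date range could be split into one key condition per calendar year. The attribute names "year" and "monthDate" and the MM-DD sort key format are assumptions based on the answer above. BETWEEN is used, so both bounds are inclusive, and the loop also covers ranges longer than two years by issuing one query per year.

```python
from datetime import date


def build_year_queries(start, end):
    """Return one set of key-condition parameters per calendar year
    touched by the inclusive range [start, end]."""
    queries = []
    for year in range(start.year, end.year + 1):
        # Clamp the range to the current calendar year.
        lo = start if year == start.year else date(year, 1, 1)
        hi = end if year == end.year else date(year, 12, 31)
        queries.append({
            "KeyConditionExpression": "#year = :year AND #monthDate BETWEEN :lo AND :hi",
            "ExpressionAttributeNames": {"#year": "year", "#monthDate": "monthDate"},
            "ExpressionAttributeValues": {
                ":year": {"S": str(year)},
                ":lo": {"S": lo.strftime("%m-%d")},
                ":hi": {"S": hi.strftime("%m-%d")},
            },
        })
    return queries


# A one-week range that crosses a year boundary needs two queries.
queries = build_year_queries(date(2020, 12, 28), date(2021, 1, 3))
print(len(queries))  # prints 2
```

The results of the individual queries would then be concatenated in year order.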
I've seen this page about how to query with partition keys only. However, I am using the DynamoDBMapper class to make the query, and what seemed to work there does not apply.
Here's a part of my code:
private final DynamoDBMapper mapper;
List<QueryResult> queryResult = mapper.query(QueryResult.class, queryExpression);
The table I query has a primary partition key id and a primary sort key timestamp.
I wanted to query all the rows with a designated id; eav looks like:
{:id={S: 0123456,}}
but if the id has duplicates (which makes sense because it's the partition key), it always gives me
"The provided key element does not match the schema"
I'm not sure how to resolve this. Because the code is shared with other tables, using the DynamoDBMapper class is a must.
Any help appreciated! Thanks.
Does the below work?
final DynamoDBQueryExpression<QueryResult> queryExpression = new DynamoDBQueryExpression<>();
queryExpression.setKeyConditionExpression("id = :id");
queryExpression.withExpressionAttributeValues(ImmutableMap.of(":id", new AttributeValue("0123456")));
Here is a working example:
final MyItem hashKeyValues = MyItem.builder()
.hashKeyField("abc")
.build();
final DynamoDBQueryExpression<MyItem> queryExpression = new DynamoDBQueryExpression<>();
queryExpression.withHashKeyValues(hashKeyValues);
queryExpression.setConsistentRead(false); //or true
final PaginatedQueryList<MyItem> response = dynamoDBMapper.query(MyItem.class, queryExpression);