AWS DynamoDB: Why does "Keys Only" projection not include the Sort Key of the Table?

I was reading the official documentation for AWS's GSIs. In the documentation, they indicate that the GameScores table has a partition key (UserID) and a sort key (GameTitle). They then create a GSI called GameTitleIndex on GameTitle and TopScore with a KEYS ONLY projection, and they mention that the new GSI will have GameTitle and TopScore AS WELL AS the primary key attributes projected.
But they ONLY indicate that UserID (and not GameTitle) is projected. They even show a diagram of the GSI where only UserID is shown and not GameTitle.
Isn't GameTitle a key attribute (since it's part of a COMPOSITE primary key)? Shouldn't BOTH UserID and GameTitle have been projected into the GSI?

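GameTitle is already there as the GSI's partition key, so it isn't listed a second time. With KEYS ONLY, every index item contains the index key attributes plus the table's key attributes; an attribute that plays both roles, as GameTitle does here, is still present, just once.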
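To make the "it won't appear twice" point concrete, here is a toy model in plain Java (not the AWS SDK; the class and method names are hypothetical) of which attribute names a KEYS ONLY index item carries: the union of the index key attributes and the table key attributes, with duplicates collapsed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model (NOT the AWS SDK) of the attributes projected into a KEYS_ONLY GSI.
public class KeysOnlyProjection {

    // Index key attributes first, then the base table's key attributes.
    // The set drops duplicates, so an attribute that is both a table key and
    // an index key (GameTitle in this example) appears only once.
    public static Set<String> projectedAttributes(List<String> tableKeys, List<String> indexKeys) {
        Set<String> attrs = new LinkedHashSet<>(indexKeys);
        attrs.addAll(tableKeys);
        return attrs;
    }

    public static void main(String[] args) {
        // GameScores: partition key UserID, sort key GameTitle
        List<String> tableKeys = List.of("UserID", "GameTitle");
        // GameTitleIndex: partition key GameTitle, sort key TopScore
        List<String> indexKeys = List.of("GameTitle", "TopScore");
        System.out.println(projectedAttributes(tableKeys, indexKeys)); // [GameTitle, TopScore, UserID]
    }
}
```

So the diagram only needs to show UserID as the "extra" projected attribute; GameTitle is already covered by the index key.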

Related

Oracle APEX 21.2.0 display image Primary Key

I want to display an image in a report column. My table has a composite primary key: ID and DATE.
When I add these columns in the BLOB attributes as Primary Key Column 1 and Primary Key Column 2, the report cannot find the data because of the DATE column. Is it a problem with the date format, or something else?
I'd suggest you use only one column as the primary key column (a sequence or, if your database version supports it, an identity column).
The combination of [ID, DATE] you currently have can then be set as a unique key (set both columns NOT NULL to "mimic" what a primary key would do).
Why? Although your data model is probably just fine, certain APEX features "suffer" from such things and prefer single-column primary keys.

User ID vs ID for primary key dynamoDB

I have a table which has a "userId" column (set as the partition key) and a "createdAt" column (set as the sort key), so together they form a composite primary key.
I also need to find the exact row when I don't have the user ID available, so I made another column "id" and made it the key of a global secondary index.
In my case, should I make the "id" column the primary key and remove "userId" as the partition key, or would that defeat the purpose of DynamoDB's partitioning?
Similarly, if I need to delete a row from the table, should I send the "createdAt" field from the front end so the exact row can be found? Does this make sense? Sending the "id" of the row seems better to me for deleting it.
You probably don't want to put a timestamp in your user primary keys. Why? You'd need to know the exact time the user was created to fetch a user, which is probably not what you want.
Consider using a partition key of USER#<user_id> and a sort key of something predictable, like A or METADATA or USER#<user_id>. This allows you to fetch/delete a user by their ID.
If you have access patterns around fetching users in order of account creation, you can create a GSI with the sort key set to the createdAt attribute.
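As a rough illustration of the key design above, the following is an in-memory stand-in (plain Java, not the AWS SDK; all class and method names are hypothetical) for a table keyed by (PK = USER#<user_id>, SK = METADATA). It shows that fetching and deleting need only the user ID, while createdAt becomes an ordinary attribute that a GSI could sort on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a DynamoDB table with a composite key (PK, SK).
// NOT the AWS SDK; names are hypothetical.
public class UserKeyDesign {
    // Predictable sort key, so an item can be addressed without knowing createdAt
    public static final String SORT_KEY = "METADATA";

    // Base table: "PK|SK" -> item attributes
    private final Map<String, Map<String, String>> table = new HashMap<>();

    public static String partitionKey(String userId) {
        return "USER#" + userId;
    }

    private static String primaryKey(String userId) {
        return partitionKey(userId) + "|" + SORT_KEY;
    }

    public void putUser(String userId, String createdAt) {
        Map<String, String> item = new HashMap<>();
        item.put("PK", partitionKey(userId));
        item.put("SK", SORT_KEY);
        item.put("createdAt", createdAt); // ordinary attribute, not part of the table key
        table.put(primaryKey(userId), item);
    }

    // Fetch and delete need only the user ID.
    public Map<String, String> getUser(String userId) {
        return table.get(primaryKey(userId));
    }

    public boolean deleteUser(String userId) {
        return table.remove(primaryKey(userId)) != null;
    }

    // Mimics reading users from a GSI whose sort key is createdAt.
    // Assumes createdAt values are same-format UTC ISO-8601 strings, which sort chronologically.
    public List<String> usersByCreatedAt() {
        List<Map<String, String>> items = new ArrayList<>(table.values());
        items.sort(Comparator.comparing((Map<String, String> i) -> i.get("createdAt")));
        List<String> result = new ArrayList<>();
        for (Map<String, String> i : items) {
            result.add(i.get("PK"));
        }
        return result;
    }
}
```

The front end can then send just the user ID for a delete; createdAt never has to round-trip.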

Querying a Global Secondary Index of a DynamoDB table without using the partition key

I have a DynamoDB table with partition key as userID and no sort key.
The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions).
After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table.
Hence, I created a GSI with the following keys:
Partition Key: userID
Sort Key: timestamp
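I am querying the index with the Java SDK using the following code:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Index;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;
import java.util.Iterator;

String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString));
ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}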
I am getting the following error on executing this code:
Query condition missed key schema element: userID
From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.
Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, for querying the entire table by a range on an attribute? One suggestion I found in that thread was to use the year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will receive insertions for a full year. Please suggest any alternatives.
When using the DynamoDB Query operation, you must specify the partition key. That is why you get the error saying userID is required. From the AWS Query documentation:
The condition must perform an equality test on a single partition key value.
The only way to get items without the partition key is a Scan operation (but the results won't be sorted by your sort key!).
If you want to get all the items sorted, you have to create a GSI whose partition key has the same value for all the items you need (e.g. add a new attribute to every item, such as "type": "item"). You can then query the GSI and specify #type = :item
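QuerySpec querySpec = new QuerySpec()
    // "#..." are attribute-name placeholders, ":..." are value placeholders;
    // "type" and "timestamp" are DynamoDB reserved words, so both are aliased
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
        .with("#type", "type")
        .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
        .withString(":v_timestampLowerBound", lastWeekDateString)
        .withString(":item", "item"));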
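The reason the constant "type" partition key works can be sketched with an in-memory model (plain Java, not the AWS SDK; class and method names are hypothetical). A Query always needs an equality match on the partition key, and within that one partition the sort key is kept ordered, so a range condition such as timestamp > lowerBound is a cheap ordered slice. One trade-off worth keeping in mind: every item then shares a single GSI partition key value, which concentrates writes on that key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT the AWS SDK) of a GSI whose partition key is the constant "type" = "item".
public class ConstantPartitionIndex {
    // partition key value -> (sort key "timestamp" -> userIDs with that timestamp)
    private final Map<String, TreeMap<String, List<String>>> partitions = new HashMap<>();

    public void put(String type, String timestamp, String userId) {
        partitions.computeIfAbsent(type, k -> new TreeMap<>())
                  .computeIfAbsent(timestamp, k -> new ArrayList<>())
                  .add(userId);
    }

    // Like a Query: equality on the partition key is mandatory,
    // then a range condition (timestamp > lowerBound) on the sort key.
    public List<String> query(String type, String lowerBoundExclusive) {
        List<String> result = new ArrayList<>();
        TreeMap<String, List<String>> partition = partitions.get(type);
        if (partition == null) {
            return result;
        }
        for (List<String> ids : partition.tailMap(lowerBoundExclusive, false).values()) {
            result.addAll(ids);
        }
        return result; // ascending by timestamp, like a Query's default order
    }
}
```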
The best solution for any custom query requirement in DynamoDB (DDB) is usually a well-designed primary key schema for the GSI.
When designing a primary key in DDB, the main principle is that the hash key should be designed to partition the whole set of items, and the sort key should be designed to sort items within a partition.
With that in mind, I recommend using the year of the timestamp as the hash key and the month-date as the sort key.
In this case you need at most 2 queries (as long as the range spans no more than two calendar years).
You are right, you should avoid filtering or scanning as much as you can.
For example, if the start date and the end date fall in the same year, you need only one query (a key condition allows only one condition on the sort key, so use BETWEEN, which is inclusive):
.withKeyConditionExpression("#year = :year and #monthDate BETWEEN :startMonthDate AND :endMonthDate")
Otherwise, use two queries:
.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")
and
.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")
Finally, take the union of the result sets of both queries.
This takes at most 2 Query requests; the read capacity consumed still depends on the size of the data each query reads.
For easier sort key comparisons, you might want to use UNIX timestamps.
Thanks
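The year/month-date split can be planned mechanically. The sketch below (plain Java; the class, method and field names are hypothetical) turns an inclusive date range into one Query per calendar year, each of the form "#year = :year AND #monthDate BETWEEN :from AND :to", assuming the sort key is a zero-padded "MM-dd" string so lexicographic order matches date order. It uses inclusive BETWEEN bounds rather than the strict > and < above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch: split [start, end] into per-year Queries for a GSI with
// hash key = year and sort key = "MM-dd" (an assumed key format).
public class YearRangePlanner {
    private static final DateTimeFormatter MONTH_DAY = DateTimeFormatter.ofPattern("MM-dd");

    // One planned Query: "#year = :year AND #monthDate BETWEEN :from AND :to"
    public static final class YearQuery {
        public final int year;
        public final String from; // inclusive lower bound, "MM-dd"
        public final String to;   // inclusive upper bound, "MM-dd"

        YearQuery(int year, String from, String to) {
            this.year = year;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return year + ": " + from + " .. " + to;
        }
    }

    public static List<YearQuery> plan(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<YearQuery> queries = new ArrayList<>();
        for (int year = start.getYear(); year <= end.getYear(); year++) {
            // Clamp the first and last year to the requested dates; full years in between.
            LocalDate from = (year == start.getYear()) ? start : LocalDate.of(year, 1, 1);
            LocalDate to = (year == end.getYear()) ? end : LocalDate.of(year, 12, 31);
            queries.add(new YearQuery(year, from.format(MONTH_DAY), to.format(MONTH_DAY)));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Prints "2022: 11-20 .. 12-31" then "2023: 01-01 .. 02-05"
        for (YearQuery q : plan(LocalDate.of(2022, 11, 20), LocalDate.of(2023, 2, 5))) {
            System.out.println(q);
        }
    }
}
```

Note that a range spanning more than two years yields more than two queries, which matches the concern raised in the question's edit.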

How to structure table in DynamoDB?

I have four fields: payload, receivedOn, topic, uuid.
I have used date [milliseconds] as the primary key (partition key).
I want to write a query that returns results based on the receivedOn field.
I have tried scanning the table, but it does not return results in ascending or descending order.
When I use a query, I have to use both date (the partition key) and receivedOn.
As you can see, I have to give a value for the partition key, but all my partition keys are different. So how should I structure my table so that I can query on the receivedOn field and get the data in descending order?
Please help.
Thank you in advance.
Answer:
In DynamoDB, the primary key can be a combination of two keys: the partition key and the sort key. You can sort your data using the sort key. Multiple items can share the same partition key, but the combination of partition key and sort key must be unique. So I put the receivedOn value in the sort key and store it with millisecond precision so that values stay unique.
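A minimal in-memory sketch of that layout (plain Java, not the AWS SDK; names are hypothetical). It assumes topic is used as the shared partition key, which the answer does not specify, and receivedOn in epoch milliseconds as the sort key. Reading newest-first mirrors a Query with ScanIndexForward set to false (in the Java document API, QuerySpec.withScanIndexForward(false)).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT the AWS SDK): partition key = topic (an assumption),
// sort key = receivedOn in epoch milliseconds.
public class ReceivedOnTable {
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    // Like PutItem: an item with the same (topic, receivedOn) key replaces the old one,
    // which is why the sort key must be unique within a partition.
    public void put(String topic, long receivedOnMillis, String payload) {
        partitions.computeIfAbsent(topic, k -> new TreeMap<>()).put(receivedOnMillis, payload);
    }

    // Like a Query on one partition; newestFirst mirrors ScanIndexForward = false.
    public List<String> query(String topic, boolean newestFirst) {
        TreeMap<Long, String> partition = partitions.getOrDefault(topic, new TreeMap<>());
        return new ArrayList<>(newestFirst ? partition.descendingMap().values() : partition.values());
    }
}
```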

Can you query by a range in a GSI in DynamoDB

Suppose I had users and images tables in DynamoDB.
users table
userId (hash) name email
images table
imagesId (hash) userId filename
How should I set up the range key or GSI if I wanted to get all images for a single userId? Should the images table have a composite key with imageId (hash) and userId (range) and be searched by range (if possible), or should userId be a GSI and the query be on userId alone?
It looks like you could add a GSI on your Images table with the reverse key schema of your table:
hash key - userId
range key - imagesId
filename - additional property
You can then query this GSI with a userId to get all of its associated imagesId.
You cannot query using just a range key in DynamoDB.
Unless you want your query results from the images table to be sorted by userId, there's no point in using it as a range key. If you want to retrieve all of the images with a certain userId from the images table using a query, you need a GSI on userId.
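The reversed-key GSI suggested above can be sketched as an in-memory model (plain Java, not the AWS SDK; class and method names are hypothetical): the base table is keyed by imagesId, and the "index" is keyed by userId (hash) and imagesId (range) with filename as a projected attribute, so a lookup by userId alone returns that user's image IDs in range-key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT the AWS SDK) of the images table plus a GSI with the reversed key schema.
public class ImagesTable {
    // Base table: imagesId (hash key) -> {userId, filename}
    private final Map<String, String[]> base = new HashMap<>();
    // GSI: userId (hash key) -> imagesId (range key) -> filename (projected attribute)
    private final Map<String, TreeMap<String, String>> byUser = new HashMap<>();

    public void put(String imagesId, String userId, String filename) {
        String[] old = base.put(imagesId, new String[] {userId, filename});
        if (old != null) {
            // Keep the index consistent if an existing image is overwritten
            byUser.get(old[0]).remove(imagesId);
        }
        byUser.computeIfAbsent(userId, k -> new TreeMap<>()).put(imagesId, filename);
    }

    // Query the GSI with only the hash key: all imagesIds for one user, in range-key order.
    public List<String> imagesForUser(String userId) {
        return new ArrayList<>(byUser.getOrDefault(userId, new TreeMap<>()).keySet());
    }
}
```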