Enforcing uniqueness with conditional writes in DynamoDB - amazon-web-services

I am new to DynamoDB. I have a table: post_reaction (it stores a user's reaction on a post)
PK: postId
SK: reactionType#timestamp#userId
userId: userId
| PK  | SK               | userId |
| --- | ---------------- | ------ |
| 123 | 1#1676444573#324 | 324    |
Since my SK is non-deterministic (because of the timestamp), is there any way to avoid duplicates?
The same user cannot give the same reactionType on the same post.
I have tried using a ConditionExpression with attribute_not_exists, but it is not working.

The conditions for conditional item changes are only enforced on the item being changed, not across other items, as that wouldn't scale well. The exception is transactions, which are a different topic.
I assume you have the timestamp for sorting purposes or for making the record unique.
What you want to achieve can be done by using a global secondary index, I'm going to call it GSI1 with GSI1PK as its partition and GSI1SK as its sort key:
| PK       | SK                  | GSI1PK   | GSI1SK                 | DATA            |
| -------- | ------------------- | -------- | ---------------------- | --------------- |
| POST#123 | USER#123#REACTION#1 | POST#123 | REACTION#1#TS#20230215 | reactionType: 1 |
| POST#123 | USER#123#REACTION#2 | POST#123 | REACTION#2#TS#20230213 | reactionType: 2 |
| POST#123 | USER#456#REACTION#2 | POST#123 | REACTION#2#TS#20230214 | reactionType: 2 |
Your business rule that you need to enforce appears to be:
A user can only react to a post with a given reaction type exactly once.
Given this table structure, you try to put the item into the base table only if it doesn't already exist. This will enforce your constraint, as the combination of partition and sort key needs to be unique in the base table.
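As a rough boto3 sketch of that conditional put (the table name and item values are placeholders taken from the example above):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("post_reaction")  # table name assumed

try:
    table.put_item(
        Item={
            "PK": "POST#123",
            "SK": "USER#123#REACTION#1",
            "GSI1PK": "POST#123",
            "GSI1SK": "REACTION#1#TS#20230215",
            "reactionType": 1,
        },
        # Fails if an item with this exact PK/SK combination already exists.
        ConditionExpression="attribute_not_exists(PK)",
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("User 123 already reacted to post 123 with reaction type 1")
    else:
        raise
```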
You can use the global secondary index to get all reactions for a post. Partition and sort key combinations in GSIs don't need to be unique, so you can query GSI1 with the post id and get all the reactions. You could still append a timestamp to GSI1SK if you need sorting.
The drawback is higher costs because of the GSI.
An alternative (simpler) setup looks like this and omits the GSI, at the cost of not being able to sort reactions by timestamp:
| PK       | SK                  | DATA            |
| -------- | ------------------- | --------------- |
| POST#123 | REACTION#1#USER#123 | reactionType: 1 |
| POST#123 | REACTION#1#USER#456 | reactionType: 1 |
| POST#123 | REACTION#2#USER#123 | reactionType: 2 |
You can again do your conditional PutItem requests as above to enforce the constraint.
To get all reactions for a post you do: Query: PK = POST#<id> & SK starts_with(REACTION#)
To get all reactions of type x for a post you do: Query: PK = POST#<id> & SK starts_with(REACTION#<x>#)
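In boto3 terms (the operator is actually called begins_with; the table name is assumed), those two queries might look like this:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("post_reaction")  # table name assumed

# All reactions for post 123
all_reactions = table.query(
    KeyConditionExpression=Key("PK").eq("POST#123") & Key("SK").begins_with("REACTION#")
)["Items"]

# Only reactions of type 1 for post 123
type_one = table.query(
    KeyConditionExpression=Key("PK").eq("POST#123") & Key("SK").begins_with("REACTION#1#")
)["Items"]
```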

Related

Single table design of a chat app on DynamoDB, checking if on the right direction

I am all new to NoSQL and specifically DynamoDB single table design. Have been going through a lot of videos and articles on the internet regarding the single-table design and finally I have put together a small design for a chat application which I am planning to build in the future.
The access patterns I have so far thought about are -
Get user details by User Id.
Get list of conversations the user is part of.
Get list of messages the user has created
Get all members of a conversation
Get all messages of a conversation
Also want to access messages of a conversation by a date range, so far I haven't figured out that one.
As per the below design, if I were to pull all messages of a conversation, is that going to pull the actual message in the message attribute which is in the message partition?
Here is the snip of the model I have created with some sample data on. Please let me know if I am in the right direction.
As per the below design, if I were to pull all messages of a conversation, is that going to pull the actual message in the message attribute which is in the message partition?
No, it will only return the IDs of the messages, as the actual content is in a separate partition.
I'd propose a different model - it consists of a table with a Global Secondary Index (GSI1). The layout is like this:
Base Table:
Partition Key: PK
Sort Key: SK
Global Secondary Index GSI1:
Partition Key: GSI1PK
Sort Key: GSI1SK
[Images of the Base Table and GSI 1 layouts with sample data.]
Access Patterns
1.) Get user details by User Id.
GetItem on Base Table with Partition Key = PK = U#<id> and Sort Key SK = USER
2.) Get list of conversations the user is part of.
Query on Base Table with Partition Key = PK = U#<id> and Sort Key SK = starts_with(CONV#)
3.) Get list of messages the user has created
Query on GSI1 with Partition Key GSI1PK = U#<id>
4.) Get all members of a conversation
Query on Base Table with Partition Key = PK = CONV#<id> and Sort Key SK starts_with(U#)
5.) Get all messages of a conversation
Query on Base Table with Partition Key PK = CONV#<id> and Sort Key SK starts_with(MSG#)
6.) Also want to access messages of a conversation by a date range, so far I haven't figured out that one.
DynamoDB does Byte-Order Sorting in a partition - if you format all dates according to ISO 8601 in the UTC timezone, you can make the range query, e.g.:
Query on Base Table with Partition Key PK = CONV#<id> and Sort Key SK between(MSG#2021-09-20, MSG#2021-09-30)
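As a rough boto3 sketch of that range query (the table name and conversation id are assumptions; the key names follow the layout above):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("chat_app")  # table name assumed

# Access pattern 6: messages of a conversation within a date range
messages = table.query(
    KeyConditionExpression=(
        Key("PK").eq("CONV#42")
        & Key("SK").between("MSG#2021-09-20", "MSG#2021-09-30")
    )
)["Items"]
```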

How to structure DynamoDB index to allow retrieval by multiple fields

I'm new to DynamoDB and trying to figure out how to structure my data/table/index. My schema includes an itemid (unique) and an orderid (multiple items per order), along with some other arbitrary attributes. I want to be able to retrieve a single item by its itemid, but also retrieve a set of items by their OrderId.
My initial instinct was to set the itemid as the primary key and the orderid as the sort key, but that didn't allow me to query by orderid only. However the same problem occurs if I reverse those.
Example data:
| ItemId  | OrderId |
| ------- | ------- |
| abc-123 | 1234    |
| def-345 | 1234    |
| ghi-678 | 5678    |
| jkl-901 | 5678    |
I think I may need a Global Secondary Index, but I'm not quite understanding where those fit.
If your question is really whether you "are able" to do this, then with ItemId as the partition key, you can still retrieve by OrderId, with the Scan operation, which will let you filter by any attribute.
However Scan will perform full table scans, so the real question is probably whether you can retrieve by OrderId efficiently. In that case, you would indeed need a Global Secondary Index with OrderId and ItemId as the composite attribute key.
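For illustration, a boto3 sketch of querying such a GSI (the table name and the index name "OrderId-index" are assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("items")  # table name assumed

# Query the GSI that has OrderId as its partition key (and ItemId as its sort key)
items_in_order = table.query(
    IndexName="OrderId-index",  # hypothetical index name
    KeyConditionExpression=Key("OrderId").eq("1234"),
)["Items"]
```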
This is typically achieved using what's called a "single table design". What this means is that you store all your data in one table, and store it denormalized, i.e. duplicate your data so that it fits your access patterns.
Generally speaking, if you do not know your access patterns beforehand, DynamoDB might not be a good fit. For many systems, a good solution is to have the "main" access patterns in DynamoDB and then offload some not-so-performance-critical ad-hoc queries by replicating data to something like Elasticsearch.
If you have a table with the hash key PK (String) and the sort key SK (String), you can store your data like this. Use transactions to keep the multiple items up to date and consistent etc.
| PK           | SK           | shippingStatus | totalPrice | cartQuantity |
| ------------ | ------------ | -------------- | ---------- | ------------ |
| order_1234   | order_status | PENDING        | 123123     |              |
| order_1234   | item_abc-123 |                |            | 3            |
| order_1234   | item_def-345 |                |            | 1            |
| order_5678   | order_status | SHIPPED        | 54321      |              |
| order_5678   | item_jkl-901 |                |            | 5            |
| item_abc-123 | order_1234   |                |            |              |
| item_abc-123 | order_9876   |                |            |              |
| item_abc-123 | order_5656   |                |            |              |
This table illustrates the schemaless nature of a DynamoDB table (except for the PK/SK). With this setup, you can store "metadata" about the order in the order_1234/order_status item. Then, you can query for items with PK order_1234 and SK starting with "item_" to get all the items for that order. You can do the same to get all the orders for an item - query for PK item_abc-123 and SK starting with "order_" to get all the orders.
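A boto3 sketch of those two queries (the table name is assumed; begins_with is the actual operator name):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")  # table name assumed

# All items for order 1234
order_items = table.query(
    KeyConditionExpression=Key("PK").eq("order_1234") & Key("SK").begins_with("item_")
)["Items"]

# All orders that contain item abc-123
item_orders = table.query(
    KeyConditionExpression=Key("PK").eq("item_abc-123") & Key("SK").begins_with("order_")
)["Items"]
```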
I highly recommend this talk by Rick Houlihan to get into single table design and data modelling in dynamo :)
https://www.youtube.com/watch?v=HaEPXoXVf2k

DynamoDB query using DynamoDBMapper

Say if I had a DynamoDB table:
UserId (hash): S
BookName (range): S
BorrowedTime: S
ReturnedTime: S
UserId is the primary key (hash), and I needed to set BookName as sort key (range) because another item being added to the database was overwriting the previous with the same UserId.
How would I go about creating a query using DynamoDBMapper, but the fields being queried are the time fields (which are non-keys)? For instance, say if I wanted to return the UserId and BookName of any book borrowed over 2 weeks ago that hasn't been returned yet?
Do I need to setup a GSI on both BorrowedTime and ReturnedTime fields?
Yes, you can make a GSI using BorrowedTime and ReturnedTime, or you can use a Scan instead of a Query. If you use Scan you don't need to make a GSI, but Scan operations read the whole table, so they are not really recommended on a large table or for frequent use.

Dynamodb schema design for optimal queries

I am new to NoSQL and, after reading up on DynamoDB, I have some confusion about how to model my use case. I've got an app that has users and clubs. The users can belong to 0-many clubs. Users can also be an owner of a club, which grants them enhanced privileges.
I was thinking of having 2 tables to manage the club/user relationship: one with a partition key of user_name and a sort key of club_name, the other with a partition key of club_name and a sort key of user_name. These should allow me to efficiently query for all users in my club and all clubs I'm a member of.
How would I efficiently query for all clubs I'm not a member of and all users who are not in my club?
Maybe you should use a relational database or Hive.
With DynamoDB, you can use 3 tables:
Table 1: a User table, with user_id as the hash key and a user_name field.
Table 2: a Club table, with club_id as the hash key and a club_name field.
Table 3: a Relationship table, with user_id as the hash key and club_id as the range key.
But you can't get that result with one query.
I don't recommend doing this query with DynamoDB.

3 fields composite primary key (unique item) in Dynamodb

I am trying to create a table to store invoice line items in DynamoDB. Let's say the item is defined by CompanyCode, InvoiceNumber and LineItemId, amount and other line item details.
A unique item is defined by the combination of the first 3 attributes. Any 2 of those attributes can be same for the different items. What should I select as the Hash Attribute and the Range Attribute?
Some Intro
For efficiency I would propose a totally different design. With NoSQL databases (and DynamoDB is no different) we always need to consider the access patterns first. Also, if possible, we should strive to fit all our data within the same table and several indexes. From what we have from the OP and his comments, these are the two access patterns:
For a company X, get complete invoice Y (including all items or a range of items) [based on this comment]
Get all invoices for company X [based on this comment]
We now wonder: what is a good Primary Key? This translates to the questions of what is a good Partition Key (PK), what is a good Sort Key (SK), and which secondary indexes we need to create and of what kind (local or global). Some reminders:
Primary Key can be on one column or composite
Composite primary key consists of Partition Key and Sort Key
Partition key is used as input to the hashing function that will determine partition of the items
Sort key can also be composite, which allows us to model one-to-many relationships in DynamoDB as given in one of the comments links: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
When creating a query on the table or an index, you always need to use the '=' operator on the Partition Key
When querying ranges on the Sort Key you have the option of a KeyConditionExpression, which provides you with a set of operators for sorting and everything in between (one of them being the function begins_with(a, substr))
You are also allowed to use FilterExpression if you need to further refine the Query results (filter on the projected attributes)
Local Secondary Indexes (LSI) have the same Partition Key but a different Sort Key than your original table and give you a different view of your data, organized according to an alternative Sort Key
Global Secondary Indexes (GSI) have a different Partition Key and a different Sort Key than your original table and give you a completely different view of the data
All items with the same partition key are stored together, and for composite Primary keys, are ordered by the sort key value. DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.
Back To Modeling
It is obvious that we are dealing with multiple entities that need to be modeled and fit into the same table. To satisfy the condition of the Partition Key being unique on the table, CompanyCode comes as a natural Partition Key - so I would ensure that it is unique. If not, then you need to ask yourself how you can model the second access pattern.
Assuming we have established uniqueness on the CompanyCode let's simplify and say that it comes in the form of an e-mail (or could be domain or just a code, but I will use email for demonstration).
Relationship between Company and Invoices is always 1:many.
Relationship between Invoice and Items is always 1:many.
I propose design as in the image below:
With PK being CompanyCode and SK being InvoiceNumber, I can store all attributes about that invoice for that company.
Nothing prevents me from also adding a record where the SK is Customer, which allows me to store all attributes about the company.
With GSI1, we will create a reverse lookup where GSI1PK is my table's SK (InvoiceNumber) and my GSI1SK is my table's PK (CompanyCode).
I am using the same table to store line items, with PK being LineItemId and SK being CompanyCode (still unique).
For Item entity items, my GSI1PK is still InvoiceNumber and my GSI1SK is LineItemId, which is the table's PK, so it is the same as for Invoice entity items.
Now the access patterns supported with this:
If I want to get invoice Y for company X and all its items (access pattern 1): query the table where CompanyCode = X and use a KeyConditionExpression with the = operator on the Sort Key InvoiceNumber. If I want to get all the items tied to that invoice, I will project the Items attribute using a ProjectionExpression.
By retrieving all the items with the previous query for company X and invoice Y, I can now run a BatchGetItem API call (using my unique composite key LineItemId+CompanyCode) on the table to get all items belonging to that particular invoice of that particular customer. (This comes with some constraints of the BatchGetItem API.)
To support access pattern 2, I will do a query with CompanyCode = X on the PK and use a KeyConditionExpression on the SK with the begins_with(a, substr) function/operator to get only invoices for company X and not the metadata about that company. That will give me all invoices for the given company/customer.
Additionally, with the above GSI1, for any given InvoiceNumber I can easily select all the line items that belong to that particular invoice. REMEMBER: the key values in a global secondary index do not need to be unique - so in my GSI1 I could easily have had invoice_1 -> (item_1, item_2) for two different companies, and the difference between those items in the GSI would be in the SK (each associated with a different CompanyCode); for demonstration purposes I used invoice_1 and invoice_2.
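A rough boto3 sketch of access patterns 1 and 2 plus the GSI1 reverse lookup (the table name, key values, and invoice/item prefixes are illustrative assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("invoices")  # table name assumed

# Access pattern 1: invoice Y for company X (PK = CompanyCode, SK = InvoiceNumber)
invoice = table.query(
    KeyConditionExpression=Key("PK").eq("company-x@example.com") & Key("SK").eq("invoice_1")
)["Items"]

# Access pattern 2: all invoices for company X, skipping the Customer metadata item
invoices = table.query(
    KeyConditionExpression=Key("PK").eq("company-x@example.com") & Key("SK").begins_with("invoice_")
)["Items"]

# GSI1 reverse lookup: everything associated with a given InvoiceNumber
related = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("invoice_1"),
)["Items"]
```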
I believe the first option offered by @georgeaf99 won't work, because if you do it that way, then CompanyCode has to be unique in the table. Therefore, there would only be one item allowed per company. I think the second solution is the only real way to do it.
You can use CompanyCode as the Hash Key, and then all other fields that combine to make the item unique (in this case InvoiceNumber and LineItemId) need to be somehow combined into one value (such as concatenation with a field delimiter), which would be your Range Key. Unfortunately that is kind of ugly, but that's the nature of a NoSQL database like DynamoDB. However, it will allow you to successfully store records with the correct uniqueness. When reading the records back, if you don't want to parse the combined field back out to its individual parts, then you'll have to add additional separate fields for InvoiceNumber and LineItemID.
If you don't have a large number of invoices per company, you can query by only the Hash Key and do the filtering on the client side. If you have a large number of invoices per company and need to be able to query only the items for a single invoice, then I would create a secondary index on CompanyCode and InvoiceNumber.
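A hedged boto3 sketch of that layout (all names, the "#" delimiter, and the sample values are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("invoice_line_items")  # table name assumed

# Hash key: CompanyCode; range key: InvoiceNumber and LineItemId concatenated with a delimiter
table.put_item(
    Item={
        "CompanyCode": "ACME",
        "InvoiceLineItem": "INV-001#42",  # concatenated range key
        "InvoiceNumber": "INV-001",       # stored again as plain attributes so the
        "LineItemId": "42",               # combined key never has to be parsed back out
        "Amount": 100,
    }
)

# Small/medium data sets: query by the hash key only and filter on the client side
items = table.query(KeyConditionExpression=Key("CompanyCode").eq("ACME"))["Items"]
invoice_items = [i for i in items if i["InvoiceNumber"] == "INV-001"]
```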
As I'm sure you have figured out, you cannot have more than two attributes forming your primary key (hash + range). Thus, depending on the type of queries you will be performing and the size of your data, you can structure your table in different ways.
(Optimized for the query types you mentioned above: only CompanyCode, and all 3 attributes.)
Best solution for small/medium-size data sets:
Hash Key: CompanyCode
Perform the query using only CompanyCode and then filter your results on the other two attributes.
Optimal solution for large data sets:
Hash Key: CompanyCode
Range Key: InvoiceNumber+LineItemId
This allows you to query only on an index, but the table structure is pretty ugly.