DynamoDB schema design for optimal queries

I am new to NoSQL, and after reading up on DynamoDB I am confused about how to model my use case. I've got an app that has users and clubs. Users can belong to zero or more clubs. Users can also be the owner of a club, which grants them enhanced privileges.
I was thinking of having 2 tables to manage the club/user relationship: one with a partition key of user_name and a sort key of club_name, the other with a partition key of club_name and a sort key of user_name. These should allow me to efficiently query for all users in my club and all clubs I'm a member of.
How would I efficiently query for all clubs I'm not a member of, and all users who are not in my club?

Maybe you should use a relational database or Hive.
With DynamoDB, you can use 3 tables.
Table 1: User table, with user_id as the hash key and a user_name attribute.
Table 2: Club table, with club_id as the hash key and a club_name attribute.
Table 3: Relationship table, with user_id as the hash key and club_id as the range key.
But you can't get the result with a single query.
I don't recommend doing this kind of query with DynamoDB.
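To see why the "not a member of" question can't be answered with a single query, here is a minimal in-memory Python sketch; the club and membership data are hypothetical. DynamoDB key conditions only allow equality on the partition key and =, <, <=, >, >=, BETWEEN or begins_with on the sort key, so there is no "NOT IN". The negation has to be computed in the client as a set difference between all clubs (a Scan) and the user's clubs (a Query).

```python
from typing import Dict, List, Set

# Hypothetical contents of the Club table (listing every club needs a Scan).
clubs: List[Dict[str, str]] = [
    {"club_id": "c1", "club_name": "Chess"},
    {"club_id": "c2", "club_name": "Hiking"},
    {"club_id": "c3", "club_name": "Photography"},
]

# Hypothetical contents of the Relationship table: hash key user_id, range key club_id.
memberships: List[Dict[str, str]] = [
    {"user_id": "u1", "club_id": "c1"},
    {"user_id": "u1", "club_id": "c3"},
    {"user_id": "u2", "club_id": "c2"},
]


def clubs_of(user_id: str) -> Set[str]:
    """Stands in for a Query on the Relationship table with user_id = :u."""
    return {m["club_id"] for m in memberships if m["user_id"] == user_id}


def clubs_not_joined(user_id: str) -> Set[str]:
    """Every club (Scan) minus the user's clubs (Query), computed in the client."""
    all_club_ids = {c["club_id"] for c in clubs}
    return all_club_ids - clubs_of(user_id)


print(sorted(clubs_not_joined("u1")))  # ['c2']
```

The mirror question (users not in my club) works the same way, but it needs the members of a club, i.e. a GSI on the Relationship table with club_id as its hash key. Either way, the cost grows with the total number of clubs or users, which is why the answer advises against doing this in DynamoDB.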

Related

Data modelling in dynamo db for Ticket Management System

I am working with DynamoDB for the first time. My assignment is a Ticket Management System with 3 entities: Department, User and Ticket. The relationships between the entities are:
I have identified the following access patterns:
1. Fetch a Department.
2. Fetch all users in a Department.
3. Fetch a given user in a Department.
4. Fetch all Tickets belonging to a Department.
5. Fetch all Tickets assigned to a User.
For these, I defined the following data model. I am thinking of creating a GSI with Ticket as the PK and User as the SK to handle patterns 4 and 5.
At a higher level, I need to perform 2 updates: I can update the User to whom a ticket is assigned, and I can update the ticket status to inprogress or resolved. In the table, I store the Ticket details as a JSON object, as below.
I need help from experienced people on whether my understanding and approach are efficient.

I think you're on the right track. I'd design it as a table with two Global Secondary Indexes. The base table looks like this:
The first Global Secondary Index (GSI1) looks like this:
The second Global Secondary Index (GSI2) looks like this:
Now for the why:
This design allows you to easily update the following things:
A user's department
A ticket's status if you know the ticket Id
A ticket's user if you know the ticket Id
A ticket's department if you know the ticket Id
You can get a lot of information out of this model:
Fetch a Department.
Query the base table with the department name, or list all departments.
Fetch all users in a Department.
Query GSI1 with the department name as the partition key and the key condition begins_with "USER#" on the sort key.
Fetch a given user in a Department.
It sounds like you know the UserId, so do a GetItem on the base table. If that's not the case, run the query described under "Fetch all users in a Department".
Fetch all Tickets belonging to a Department.
Query GSI1 with the department name as the partition key and the key condition begins_with "Ticket#" on the sort key.
Fetch all Tickets assigned to a User.
Query GSI2 with the user id as the partition key and the key condition begins_with "Ticket#" on the sort key.
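The table layouts in this answer were images that did not survive, so the following is only a hypothetical in-memory Python sketch of one layout consistent with the access patterns described: the base table is keyed by an entity id (PK), GSI1 is keyed by department name with USER#/Ticket# sort keys, and GSI2 is keyed by user id with Ticket# sort keys. All attribute names (PK, GSI1PK, GSI1SK, GSI2PK, GSI2SK) are assumptions. It also shows why the updates listed above are cheap: changing a ticket's status or assignee touches a single base-table item, and DynamoDB propagates the change to the GSIs itself (eventually consistent).

```python
from typing import Dict, List

# Hypothetical base table keyed by PK (entity id). An item only appears in a GSI
# if it carries that index's key attributes (a "sparse" index).
table: Dict[str, Dict[str, str]] = {
    "DEPT#Support": {"PK": "DEPT#Support", "name": "Support"},
    "USER#u1": {"PK": "USER#u1", "GSI1PK": "Support", "GSI1SK": "USER#u1"},
    "USER#u2": {"PK": "USER#u2", "GSI1PK": "Support", "GSI1SK": "USER#u2"},
    "Ticket#t1": {"PK": "Ticket#t1", "status": "inprogress",
                  "GSI1PK": "Support", "GSI1SK": "Ticket#t1",
                  "GSI2PK": "USER#u1", "GSI2SK": "Ticket#t1"},
    "Ticket#t2": {"PK": "Ticket#t2", "status": "resolved",
                  "GSI1PK": "Support", "GSI1SK": "Ticket#t2",
                  "GSI2PK": "USER#u2", "GSI2SK": "Ticket#t2"},
}


def query_gsi(index: str, pk: str, sk_prefix: str) -> List[Dict[str, str]]:
    """Mimics Query on GSI1/GSI2: partition-key equality plus begins_with on the sort key."""
    pk_attr, sk_attr = f"{index}PK", f"{index}SK"
    hits = [item for item in table.values()
            if pk_attr in item and sk_attr in item
            and item[pk_attr] == pk and item[sk_attr].startswith(sk_prefix)]
    return sorted(hits, key=lambda item: item[sk_attr])


def get_item(pk: str) -> Dict[str, str]:
    """Mimics GetItem on the base table."""
    return table[pk]


def update_ticket(ticket_id: str, **changes: str) -> None:
    """Mimics UpdateItem on one base-table item; the GSIs follow automatically."""
    table[ticket_id].update(changes)


users = [i["PK"] for i in query_gsi("GSI1", "Support", "USER#")]             # pattern 2
dept_tickets = [i["PK"] for i in query_gsi("GSI1", "Support", "Ticket#")]    # pattern 4
update_ticket("Ticket#t1", status="resolved", GSI2PK="USER#u2")             # reassign t1
u2_tickets = [i["PK"] for i in query_gsi("GSI2", "USER#u2", "Ticket#")]      # pattern 5
print(users, dept_tickets, u2_tickets)
```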

Single table design of a chat app on DynamoDB, checking if on the right direction

I am completely new to NoSQL, and specifically to DynamoDB single-table design. I have been going through a lot of videos and articles on single-table design, and I have finally put together a small design for a chat application that I am planning to build in the future.
The access patterns I have thought about so far are:
Get user details by User Id.
Get list of conversations the user is part of.
Get list of messages the user has created
Get all members of a conversation
Get all messages of a conversation
I also want to access messages of a conversation by date range, but so far I haven't figured that one out.
As per the design below, if I pull all messages of a conversation, will that also pull the actual message text in the message attribute, which is in the message partition?
Here is a snippet of the model I have created, with some sample data. Please let me know if I am heading in the right direction.

As per the design below, if I pull all messages of a conversation, will that also pull the actual message text in the message attribute, which is in the message partition?
No, it will only return the IDs of a message as the actual content is in a separate partition.
I'd propose a different model: a table with one Global Secondary Index (GSI1). The layout is like this:
Base Table:
Partition Key: PK
Sort Key: SK
Global Secondary Index GSI1:
Partition Key: GSI1PK
Sort Key: GSI1SK
Base Table
GSI 1
Access Patterns
1.) Get user details by User Id.
GetItem on the base table with partition key PK = U#<id> and sort key SK = USER
2.) Get list of conversations the user is part of.
Query on the base table with partition key PK = U#<id> and sort key condition begins_with(SK, CONV#)
3.) Get list of messages the user has created
Query on GSI1 with partition key GSI1PK = U#<id>
4.) Get all members of a conversation
Query on the base table with partition key PK = CONV#<id> and sort key condition begins_with(SK, U#)
5.) Get all messages of a conversation
Query on the base table with partition key PK = CONV#<id> and sort key condition begins_with(SK, MSG#)
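Access patterns 1 to 5 map onto one GetItem and four Query requests. Here is a minimal Python sketch that only builds the keyword arguments for the low-level boto3 DynamoDB client's get_item and query calls, so it runs without AWS access; the table name ChatTable is a hypothetical placeholder, and the key formats follow the answer's U#/CONV#/MSG# prefixes.

```python
from typing import Any, Dict, Optional

TABLE_NAME = "ChatTable"  # hypothetical table name


def get_user_params(user_id: str) -> Dict[str, Any]:  # pattern 1
    """Keyword arguments for client.get_item() on the user's profile item."""
    return {"TableName": TABLE_NAME,
            "Key": {"PK": {"S": f"U#{user_id}"}, "SK": {"S": "USER"}}}


def query_params(pk: str, sk_prefix: Optional[str] = None,
                 index: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for client.query(): PK equality, optional begins_with on the sort key."""
    pk_attr, sk_attr = ("GSI1PK", "GSI1SK") if index == "GSI1" else ("PK", "SK")
    condition = f"{pk_attr} = :pk"
    values: Dict[str, Any] = {":pk": {"S": pk}}
    if sk_prefix is not None:
        condition += f" AND begins_with({sk_attr}, :prefix)"
        values[":prefix"] = {"S": sk_prefix}
    params: Dict[str, Any] = {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
    }
    if index is not None:
        params["IndexName"] = index
    return params


def conversations_of_user(user_id: str) -> Dict[str, Any]:     # pattern 2
    return query_params(f"U#{user_id}", "CONV#")


def messages_by_user(user_id: str) -> Dict[str, Any]:          # pattern 3
    return query_params(f"U#{user_id}", index="GSI1")


def members_of_conversation(conv_id: str) -> Dict[str, Any]:   # pattern 4
    return query_params(f"CONV#{conv_id}", "U#")


def messages_of_conversation(conv_id: str) -> Dict[str, Any]:  # pattern 5
    return query_params(f"CONV#{conv_id}", "MSG#")
```

With boto3 installed and credentials configured, these would be passed as, for example, boto3.client("dynamodb").query(**messages_of_conversation("42")). Results come back in pages, so large partitions require following LastEvaluatedKey.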
6.) I also want to access messages of a conversation by date range, but so far I haven't figured that one out.
DynamoDB sorts the items within a partition by the byte order of their sort key. If you format all dates according to ISO 8601 in the UTC time zone, you can run a range query, e.g.:
Query on the base table with partition key PK = CONV#<id> and sort key condition SK BETWEEN MSG#2021-09-20 AND MSG#2021-09-30
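To illustrate the byte-order point, here is a small in-memory Python sketch; the sort keys are hypothetical and assume each message's SK is MSG# followed by a full ISO 8601 UTC timestamp. Python compares strings by code point, which gives the same order as DynamoDB's byte-wise comparison of UTF-8, so bisecting a sorted list mimics an inclusive BETWEEN on the sort key. It also shows a pitfall under that assumption: BETWEEN is inclusive and a longer key sorts after its own prefix, so an upper bound of MSG#2021-09-30 excludes messages sent later on September 30, and an end-of-day bound is safer.

```python
import bisect
from typing import List

# Hypothetical sort keys in partition CONV#<id>; DynamoDB keeps them in this order.
sort_keys: List[str] = sorted([
    "MSG#2021-09-19T23:59:00Z",
    "MSG#2021-09-20T08:15:00Z",
    "MSG#2021-09-25T12:00:00Z",
    "MSG#2021-09-30T09:30:00Z",
    "MSG#2021-10-01T00:00:01Z",
    "U#alice",
])


def between(keys: List[str], low: str, high: str) -> List[str]:
    """Inclusive range over sorted keys, like SK BETWEEN :low AND :high."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return keys[start:end]


# Date-only upper bound: misses the message sent at 09:30 on the 30th.
print(between(sort_keys, "MSG#2021-09-20", "MSG#2021-09-30"))
# End-of-day upper bound includes it.
print(between(sort_keys, "MSG#2021-09-20", "MSG#2021-09-30T23:59:59Z"))
```

In an actual Query, the same range would be expressed as KeyConditionExpression "PK = :pk AND SK BETWEEN :low AND :high".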

How to structure DynamoDB index to allow retrieval by multiple fields

I'm new to DynamoDB and trying to figure out how to structure my data/table/index. My schema includes an itemid (unique) and an orderid (multiple items per order), along with some other arbitrary attributes. I want to be able to retrieve a single item by its itemid, but also retrieve a set of items by their OrderId.
My initial instinct was to set the itemid as the partition key and the orderid as the sort key, but that didn't allow me to query by orderid alone. The same problem occurs if I reverse them.
Example data:
ItemId     OrderId
abc-123    1234
def-345    1234
ghi-678    5678
jkl-901    5678
I think I may need a Global Secondary Index, but I don't quite understand where those fit.

If your question is really whether you "are able" to do this, then with ItemId as the partition key you can still retrieve by OrderId using the Scan operation, which lets you filter on any attribute.
However, a Scan reads the entire table, so the real question is probably whether you can retrieve by OrderId efficiently. In that case you would indeed need a Global Secondary Index, with OrderId as its partition key and ItemId as its sort key.
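As a sketch of that GSI, here are the request parameters for the low-level boto3 client's create_table, get_item and query calls, built as plain Python dictionaries so they can be inspected without AWS access. The table name Items, the index name OrderId-ItemId-index, on-demand billing and storing OrderId as a string are all assumptions. Note that GetItem only works on the base table; a GSI is read with Query (or Scan), and GSI reads are eventually consistent.

```python
from typing import Any, Dict

TABLE = "Items"                        # hypothetical table name
ORDER_INDEX = "OrderId-ItemId-index"   # hypothetical index name

create_table_params: Dict[str, Any] = {
    "TableName": TABLE,
    # Only key attributes (of the table and its indexes) are declared here.
    "AttributeDefinitions": [
        {"AttributeName": "ItemId", "AttributeType": "S"},
        {"AttributeName": "OrderId", "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "ItemId", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [{
        "IndexName": ORDER_INDEX,
        "KeySchema": [
            {"AttributeName": "OrderId", "KeyType": "HASH"},
            {"AttributeName": "ItemId", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},  # copy every attribute into the index
    }],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand, so no ProvisionedThroughput needed
}


def get_item_params(item_id: str) -> Dict[str, Any]:
    """A single item by its unique ItemId (base table)."""
    return {"TableName": TABLE, "Key": {"ItemId": {"S": item_id}}}


def items_in_order_params(order_id: str) -> Dict[str, Any]:
    """All items of one order (Query on the GSI)."""
    return {
        "TableName": TABLE,
        "IndexName": ORDER_INDEX,
        "KeyConditionExpression": "OrderId = :oid",
        "ExpressionAttributeValues": {":oid": {"S": order_id}},
    }
```

With a configured client, client.query(**items_in_order_params("1234")) would return the abc-123 and def-345 items from the example data.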

This is typically achieved using what's called a "single-table design". This means that you store all your data in one table and store it denormalized, i.e. you duplicate your data so that it fits your access patterns.
Generally speaking, if you do not know your access patterns beforehand, DynamoDB might not be a good fit. For many systems, a good solution is to serve the "main" access patterns from DynamoDB and offload less performance-critical ad-hoc queries by replicating the data to something like Elasticsearch.
If you have a table with the hash key PK (String) and the sort key SK (String), you can store your data like this. Use transactions to keep the duplicated items up to date and consistent.
PK            SK            shippingStatus  totalPrice  cartQuantity
order_1234    order_status  PENDING         123123
order_1234    item_abc-123                              3
order_1234    item_def-345                              1
order_5678    order_status  SHIPPED         54321
order_5678    item_jkl-901                              5
item_abc-123  order_1234
item_abc-123  order_9876
item_abc-123  order_5656
This table illustrates the schemaless nature of a DynamoDB table (apart from PK/SK). With this setup, you can store "metadata" about the order in the order_1234/order_status item. Then you can query for items with PK order_1234 and SK begins_with "item_" to get all the items for that order. You can do the same to get all the orders for an item: query for PK item_abc-123 and SK beginning with "order_".
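The two lookups just described can be mimicked in a few lines of Python over the example rows. This is an in-memory stand-in, not a DynamoDB client: a Query is a partition-key equality plus an optional begins_with on the sort key, and the results come back sorted by sort key.

```python
from typing import Any, Dict, List

# The example rows from the table above; attributes a row doesn't have are simply absent.
rows: List[Dict[str, Any]] = [
    {"PK": "order_1234", "SK": "order_status", "shippingStatus": "PENDING", "totalPrice": 123123},
    {"PK": "order_1234", "SK": "item_abc-123", "cartQuantity": 3},
    {"PK": "order_1234", "SK": "item_def-345", "cartQuantity": 1},
    {"PK": "order_5678", "SK": "order_status", "shippingStatus": "SHIPPED", "totalPrice": 54321},
    {"PK": "order_5678", "SK": "item_jkl-901", "cartQuantity": 5},
    {"PK": "item_abc-123", "SK": "order_1234"},
    {"PK": "item_abc-123", "SK": "order_9876"},
    {"PK": "item_abc-123", "SK": "order_5656"},
]


def query(pk: str, sk_prefix: str) -> List[Dict[str, Any]]:
    """Mimics Query with PK = :pk AND begins_with(SK, :prefix), in sort-key order."""
    hits = [r for r in rows if r["PK"] == pk and r["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda r: r["SK"])


print([r["SK"] for r in query("order_1234", "item_")])     # items of order 1234
print([r["SK"] for r in query("item_abc-123", "order_")])  # orders containing abc-123
```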
I highly recommend this talk by Rick Houlihan to get into single-table design and data modelling in DynamoDB :)
https://www.youtube.com/watch?v=HaEPXoXVf2k
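Since the answer recommends transactions to keep the duplicated items consistent, here is a minimal Python sketch of the parameters for the low-level boto3 client's transact_write_items call, writing both the order-side and the item-side copy of an order line in one all-or-nothing request. The helper name, the table name Orders and the attribute_not_exists guard (which makes the whole transaction fail if either copy already exists) are illustrative assumptions.

```python
from typing import Any, Dict


def add_item_to_order_params(table: str, order_id: str, item_id: str,
                             quantity: int) -> Dict[str, Any]:
    """Both copies of an order line in one TransactWriteItems request (hypothetical helper)."""
    order_pk, item_pk = f"order_{order_id}", f"item_{item_id}"
    return {
        "TransactItems": [
            {"Put": {  # order side: PK order_..., SK item_...
                "TableName": table,
                "Item": {"PK": {"S": order_pk}, "SK": {"S": item_pk},
                         "cartQuantity": {"N": str(quantity)}},
                "ConditionExpression": "attribute_not_exists(PK)",
            }},
            {"Put": {  # item side: PK item_..., SK order_...
                "TableName": table,
                "Item": {"PK": {"S": item_pk}, "SK": {"S": order_pk}},
                "ConditionExpression": "attribute_not_exists(PK)",
            }},
        ]
    }


params = add_item_to_order_params("Orders", "1234", "ghi-678", 2)
# With credentials configured: boto3.client("dynamodb").transact_write_items(**params)
```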

DynamoDB query using DynamoDBMapper

Say if I had a DynamoDB table:
UserId (hash): S
BookName (range): S
BorrowedTime: S
ReturnedTime: S
UserId is the partition key (hash), and I needed to set BookName as the sort key (range) because each new item with the same UserId was overwriting the previous one.
How would I go about creating a query with DynamoDBMapper when the fields being queried are the time fields (which are non-key attributes)? For instance, what if I wanted to return the UserId and BookName of any book borrowed more than 2 weeks ago that hasn't been returned yet?
Do I need to set up a GSI on both the BorrowedTime and ReturnedTime fields?

Yes, you can create a GSI using BorrowedTime and ReturnedTime, or you can use a Scan instead of a Query. With a Scan you don't need a GSI, but a Scan reads the whole table, so it isn't recommended for large tables or frequent use.
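For the Scan route, here is a minimal sketch of the request, written as Python parameters for the low-level boto3 client's scan call rather than Java DynamoDBMapper code. It assumes BorrowedTime is stored as an ISO 8601 UTC string (so string comparison is chronological) and that an unreturned book simply has no ReturnedTime attribute; the table name Loans is hypothetical. The filter is applied after items are read, so the Scan still consumes read capacity for the whole table, and large results are paged via LastEvaluatedKey.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def overdue_scan_params(table: str, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Books borrowed more than two weeks ago that have no ReturnedTime yet."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = (now - timedelta(weeks=2)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "TableName": table,
        "FilterExpression": "BorrowedTime < :cutoff AND attribute_not_exists(ReturnedTime)",
        "ExpressionAttributeValues": {":cutoff": {"S": cutoff}},
        "ProjectionExpression": "UserId, BookName",  # only the attributes asked for
    }


params = overdue_scan_params("Loans", datetime(2021, 10, 1, tzinfo=timezone.utc))
```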

DynamoDB: Global Secondary Index utilisation in queries

I am coming from an RDBMS background and started using DynamoDB recently.
I have the following DynamoDB table with three Global Secondary Indexes (GSIs):
Id (primary key), user_id (GSI), event_type (GSI), product_id (GSI), rate, create_date
I have the following four query patterns:
a) WHERE event_type=?
b) WHERE event_type=? AND product_id=?
c) WHERE product_id=?
d) WHERE product_id=? AND user_id=?
I know that in MySQL I would create the following indexes to optimize the above queries:
composite index (event_type,product_id) : for queries "a" and "b"
composite index (product_id,user_id) : for queries "c" and "d"
My question is: if I create three GSIs for the 'event_type', 'product_id' and 'user_id' fields in DynamoDB, will query patterns "b" and "d" make use of these three independent GSIs?

Firstly, unlike an RDBMS, DynamoDB doesn't choose a GSI based on the fields used in a filter expression (there is no SQL optimizer that picks the appropriate index based on the fields used in the query).
You have to query the GSI directly, by name, to get the data. You can refer to the GSI query documentation page to understand more about this.
You can create two GSIs:
1) Event type
2) Product id
Make sure to project the other required attributes into each GSI, especially product id, user id and any other fields you need. This way, when you query the GSI, you get all the fields required to fulfil the use case. As long as your key condition uses the GSI's key attribute, you can include other attributes in a filter expression to narrow the results. This avoids creating unnecessary GSIs, which require additional storage and cost.
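Here is a minimal sketch of that approach, as Python parameters for the low-level boto3 client's query call: the index is named explicitly, the key condition uses the GSI's key attribute, and the second attribute goes into a filter expression. The index names event_type-index and product_id-index, the table name Events, the sample values and storing the attributes as strings are all assumptions. Because a filter expression is applied after items are read, pattern "b" still reads (and is billed for) every item with that event_type; if that is too costly, a GSI with event_type as partition key and product_id as sort key would let the key condition do all the work, much like the MySQL composite index.

```python
from typing import Any, Dict, Optional


def gsi_query_params(table: str, key_attr: str, key_value: str,
                     filter_attr: Optional[str] = None,
                     filter_value: Optional[str] = None) -> Dict[str, Any]:
    """Query a single-attribute GSI by name, optionally filtering on a second attribute."""
    params: Dict[str, Any] = {
        "TableName": table,
        "IndexName": f"{key_attr}-index",  # hypothetical naming convention
        "KeyConditionExpression": f"{key_attr} = :k",
        "ExpressionAttributeValues": {":k": {"S": key_value}},
    }
    if filter_attr is not None and filter_value is not None:
        # Evaluated after the key condition has read the items.
        params["FilterExpression"] = f"{filter_attr} = :f"
        params["ExpressionAttributeValues"][":f"] = {"S": filter_value}
    return params


pattern_a = gsi_query_params("Events", "event_type", "click")
pattern_b = gsi_query_params("Events", "event_type", "click", "product_id", "p1")
pattern_d = gsi_query_params("Events", "product_id", "p1", "user_id", "u1")
```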
