Handling multiple many-to-many relationships in a DynamoDB single table

I am designing a single DynamoDB table for an institute. It has entities like institute, students, subjects and teachers. The relationships between them are like this:
institute - students many to many
institute - teachers many to many
institute - subjects many to many
I selected institute id as PK and location as SK. But there are several many-to-many relationships in this scenario. How do I handle this kind of situation in AWS DynamoDB?
Thanks in advance.

DynamoDB is a NoSQL database, where you usually denormalize and duplicate data to fit your access patterns. You are thinking in SQL terms and trying to normalize your data; you need to switch your mindset.
Create an item for each institute, student, teacher and subject.
Create a students map or list (I prefer a map with an ID as the key) inside institute, and an institutes map or list inside students. Those maps or lists are made of copies of your original items:
inst-0 | USA | Institute name | { stud-0: { name: John }, stud-1: { name: Matt } }
-----------------------------------------------------------------------------------------------------------------
stud-0 | John | { inst-0: { name: Institute name } } |
-----------------------------------------------------------------------------------------------------------------
stud-1 | Matt | { inst-0: { name: Institute name } } |
The downside here is that you need to update each copy when the original changes. But it is generally not a problem unless your data changes very frequently.
If your copied data changes frequently, you can create an intermediate relational item, but then maybe a relational database is more appropriate for your project.
You could also create an item for each student and just store a reference to those students in the institute items. Here again you are not denormalizing your data, but it could be a viable solution if your data changes frequently or if you have a large number of students in each institute.
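The duplication trade-off described above can be sketched in a few lines. This is only an in-memory illustration, not DynamoDB code: the dictionary `table`, the attribute names `location`, `students` and `institutes`, and the helper `rename_institute` are all assumptions made for the example.

```python
# In-memory stand-in for the table, keyed by item id (no AWS calls are made).
table = {
    "inst-0": {
        "location": "USA",
        "name": "Institute name",
        "students": {"stud-0": {"name": "John"}, "stud-1": {"name": "Matt"}},
    },
    "stud-0": {"name": "John", "institutes": {"inst-0": {"name": "Institute name"}}},
    "stud-1": {"name": "Matt", "institutes": {"inst-0": {"name": "Institute name"}}},
}


def rename_institute(table, inst_id, new_name):
    """Update the original institute item, then every copy held by its students."""
    institute = table[inst_id]
    institute["name"] = new_name
    for student_id in institute["students"]:
        table[student_id]["institutes"][inst_id]["name"] = new_name


rename_institute(table, "inst-0", "New institute name")
```

One rename touches one item per enrolled student, which is why this layout suits data that is read far more often than it changes.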

Related

How can I model this data in DynamoDB for a Library App

I have two entities, Books and Authors with a strict one-to-many relationship (many-to-many relationship not required for my use case)
The access patterns I want to satisfy are:
Get Author Info by Author Name
Get Book Info By just ISBN
Get all Books records by an Author using Author Name.
Do I need any GSI given the constraint that I can make only a single request to DB when adding a Book or an Author, and fulfill above three access patterns also with a single request?
If my Author Entity uses this key schema:
Partition Key: AUTHOR#XYZ
Sort Key: AUTHOR#XYZ
and for Book Entity I use
Partition Key: BOOK#123
Sort Key: BOOK#123
I can get author info by name and book info by ISBN easily. How do I get the 3rd access pattern, entire book data by author name?
Two approaches I thought of:
Have a third entity in the table with PK AUTHOR#XYZ, SK BOOK#123, and use BEGINS_WITH(SK, 'BOOK') but in this approach, when adding a book to DB, I will have to write two items, PK BOOK#, SK BOOK# for getting book by just ISBN and PK AUTHOR#, SK BOOK# for getting all books by author, and the book info will be duplicated in both items.
Add an attribute GSIAuthorName to Book entity when adding a book, and create a GSI with PK GSIAuthorName (AUTHOR#XYZ) and SK being PK of Book entity (BOOK#123). But in this the issue is, in projections I will have to select ALL, since I want all book info attributes by author name, and need to fetch in single query to the GSI, so entire Book Entity will be duplicated in this GSI.
Is there an easier way to model this data?
Since you're trying to support two access patterns for a single entity that require different partition key values, there are basically only the two options you have correctly identified.
Your design seems to only work for books that have a single author. In the real world that's not sufficient. There are plenty of books with multiple authors, such as "The Dictator's Handbook" by Bruce Bueno de Mesquita and Alastair Smith, and your data model might want to account for that. Author <-> Book isn't One-to-Many, it's Many-to-Many.
I'd go for something like this which uses a Global Secondary Index. It's very close to your second suggestion.
PK                             | SK                             | GSI1PK                         | GSI1SK              | type   | attributes
-----------------------------------------------------------------------------------------------------------------------------------------------------------
AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          |                                |                     | author | name, birthdate, ...
AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA |                                |                     | author | name, birthdate, ...
BOOK#978-1610391849            | AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          | BOOK#978-1610391849 | book   | title, publisher, author,...
BOOK#978-1610391849            | AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | BOOK#978-1610391849 | book   | title, publisher, author,...
Does this introduce data duplication? - Yes
Does this introduce complexity on writes? - Yes
Does it work in the real world? - Yes
The model I've chosen allows you to fulfill the requirements:
1) Get Author Info by Author Name: GetItem on the primary index with PK=AUTHOR#... and SK=AUTHOR#...
2) Get Book Info by just ISBN: Query on the primary index with PK=BOOK#... and limit 1
3) Get all books for an Author: Query on GSI1 with GSI1PK=AUTHOR#...
When you write a book, you need to add a book record for each author and potentially the author entries. For updates to a book's info (which should be very rare), you first do the query as in 2) without the limit and then update each item that comes back.
Update
To address the requests for clarification in the comments:
If you require a strict One-to-Many relationship, I'd pick the second approach
Frequent writes are typically not a problem in your one-to-many case as long as you don't exceed the write throughput of a single partition, which is unlikely given the data. I don't see why you'd need frequent writes though.
The extra complexity is typically only a one-time penalty when you create your data access layer. The code for update_book_by_isbn will have to include the steps I outlined above and the create_book might store multiple records.
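As a rough sketch of what such a data access layer would produce, the snippet below builds the items for this model and answers the book and author-listing lookups against a plain Python list. Nothing here talks to DynamoDB: `author_item`, `book_items` and `query` are hypothetical helpers, and `query` only imitates a partition-key lookup on either the table (`PK`) or the index (`GSI1PK`).

```python
def author_item(name):
    """Author record: PK and SK are both AUTHOR#<NAME>."""
    key = f"AUTHOR#{name.upper()}"
    return {"PK": key, "SK": key, "type": "author", "name": name}


def book_items(isbn, title, authors):
    """One book record per author, each carrying the GSI1 keys."""
    return [
        {
            "PK": f"BOOK#{isbn}",
            "SK": f"AUTHOR#{author.upper()}",
            "GSI1PK": f"AUTHOR#{author.upper()}",
            "GSI1SK": f"BOOK#{isbn}",
            "type": "book",
            "title": title,
            "authors": list(authors),
        }
        for author in authors
    ]


def query(items, key_attr, key_value, limit=None):
    """Imitate a Query: items lacking key_attr never match (sparse index)."""
    hits = [item for item in items if item.get(key_attr) == key_value]
    return hits if limit is None else hits[:limit]


authors = ["Alastair Smith", "Bruce Bueno de Mesquita"]
table = [author_item(a) for a in authors]
table += book_items("978-1610391849", "The Dictator's Handbook", authors)

# Book info by ISBN: any one of the per-author copies will do.
book = query(table, "PK", "BOOK#978-1610391849", limit=1)[0]
# All books by an author: look up the author key on the index.
books_by_smith = query(table, "GSI1PK", "AUTHOR#ALASTAIR SMITH")
# Updating a book means touching every copy returned without the limit.
copies_to_update = query(table, "PK", "BOOK#978-1610391849")
```

Author items carry no GSI1PK, so they never show up in the index lookup even though their PK has the same AUTHOR# value.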

How to design a DynamoDB table schema

I am doing my best to understand DynamoDB data modeling but I am struggling. I am looking for some help to build off what I have now. I feel like I have fairly simple data but it's not coming to me on what I should do to fit into DynamoDB.
I have two different types of data. I have a game object and a team stats object. A Game represents all of the data about the game that week and team stats represents all of the stats about a given team per week.
A timeId is in the format of year-week (ex. 2020-9)
My Access patterns are
1) Retrieve all games per timeId
2) Retrieve all games per timeId and by TeamName
3) Retrieve all games per timeId and if value = true
4) Retrieve all teamStats per timeId
5) Retrieve all teamStats by timeId and TeamName
My attempt at modeling so far is:
PK: TeamName
SK: TimeId
This is leading me to have 2 copies of games since there is a copy for each team. It also only allows me to scan for all teamStats by TimeId. Would something like a GSI help here? I've thought about changing the PK to something like
PK: GA-${gameId} / TS-${teamId}
SK: TimeId
I'm just very confused and the docs aren't helping me much.
Looking at your access patterns, this is a possible table design. I'm not sure if it's going to really work with your TimeId, especially for the Local Secondary Index (see note below), but I hope it's a good starting point for you.
# Table
-----------------------------------------------------------
pk | sk | value | other attributes
-----------------------------------------------------------
TimeId | GAME#TEAM{teamname} | true | ...
TimeId | STATS#TEAM{teamname} | | ...
GameId | GAME | | general game data (*)
TeamName | TEAM | | general team data (*)
# Local Secondary Index
-------------------------------------------------------------------------------
pk from Table as pk | value from Table as sk | sk from Table + other attributes
-------------------------------------------------------------------------------
TimeId | true | GAME#TEAM{teamname} | ...
With this Table and Local Secondary Index you can satisfy all access patterns with the following queries:
Retrieve all games per timeId
Query table with pk: {timeId}, sk: begins with 'GAME'
Retrieve all games per timeId and by TeamName
Query table with pk: {timeId}, sk: GAME#TEAM{teamname}
Retrieve all games per timeId and if value = true
Query LSI with pk: {timeId}, sk: true
Retrieve all teamStats per timeId
Query table with pk: {timeId}, sk: begins with 'STATS'
Retrieve all teamStats by timeId and TeamName
Query table with pk: {timeId}, sk: STATS#TEAM{teamname}
*: I've also added the following two items, as I assume that there are cases where you want to retrieve general information about a specific game or team as well. This is just an assumption based on my experience and might be unnecessary in your case:
Retrieve general game information
Query table with pk: {GameId}
Retrieve general team information
Query table with pk: {TeamName}
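Note: I don't know what value = true stands for, but for the secondary index to work in my model, value has to be stored as a String, Number, or Binary attribute (for example the string "true"), because DynamoDB does not allow a Boolean as an index key. The combination of pk = TimeId and value does not have to be unique in the index; only the table's own combination of pk and sk does.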
To learn more about single-table design on DynamoDB, please read Alex DeBrie's excellent article The What, Why, and When of Single-Table Design with DynamoDB.
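The key layout can be tried out without DynamoDB at all. The following is a minimal in-memory sketch, assuming made-up team names (`lions`, `bears`) and two hypothetical helpers, `query` and `query_lsi`, that imitate a key-condition query on the table and on the local secondary index. Because game and stats items share the same pk, a begins-with condition on sk is what separates them.

```python
# Items for two weeks; `value` is stored as a string and only on some games.
table = [
    {"pk": "2020-9", "sk": "GAME#TEAMlions", "value": "true"},
    {"pk": "2020-9", "sk": "GAME#TEAMbears"},
    {"pk": "2020-9", "sk": "STATS#TEAMlions"},
    {"pk": "2020-9", "sk": "STATS#TEAMbears"},
    {"pk": "2020-10", "sk": "GAME#TEAMlions"},
]


def query(items, pk, sk=None, sk_prefix=None):
    """Imitate a table Query: exact pk, optional exact sk or sk prefix."""
    hits = [item for item in items if item["pk"] == pk]
    if sk is not None:
        hits = [item for item in hits if item["sk"] == sk]
    if sk_prefix is not None:
        hits = [item for item in hits if item["sk"].startswith(sk_prefix)]
    return sorted(hits, key=lambda item: item["sk"])


def query_lsi(items, pk, value):
    """Imitate the index: only items that carry `value` are present in it."""
    return [item for item in items if item["pk"] == pk and item.get("value") == value]


games = query(table, "2020-9", sk_prefix="GAME#")
lions_game = query(table, "2020-9", sk="GAME#TEAMlions")
flagged_games = query_lsi(table, "2020-9", "true")
all_stats = query(table, "2020-9", sk_prefix="STATS#")
lions_stats = query(table, "2020-9", sk="STATS#TEAMlions")
```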

One to Many relationship real life example

I am trying to design the schema. I am confused about whether I should use a one-to-many or a many-to-one relationship.
My use case is somewhat like customers ordering the food.
There are 2 customers and 5 food items
Customers: [John, Alice]
Food: [Rice, Noodle, Chicken, Bacon, Ice-cream]
Use case: One customer can order many items, but once one customer has ordered an item, it cannot be ordered by anyone else.
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Bacon, Ice-cream
**This is valid, both customers ordered unique food.**
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Bacon, Ice-cream, Chicken
**This is invalid, because Chicken is being ordered twice. John already ordered Chicken, so Alice cannot order it.**
Note: I am trying to do this in MongoDB documents and trying to establish the relationship using Django models.
One way to handle this would be to create a junction table CustomerFood which looks something like this:
CREATE TABLE CustomerFood (
Customer varchar(255) NOT NULL,
Food varchar(255) NOT NULL,
PRIMARY KEY(Customer, Food)
);
The above table definition alone would only ensure that each customer can be related to each food at most once. To enforce the additional restriction that a given food can be associated with only one customer, we can add a unique constraint on the Food column:
ALTER TABLE CustomerFood ADD CONSTRAINT food_unique UNIQUE (Food);
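To see the constraint reject a duplicate order, here is a small sketch using Python's built-in sqlite3 module. Two things are assumptions of the example: SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the UNIQUE (Food) constraint is declared inside CREATE TABLE instead, and `place_order` is a hypothetical helper that inserts one customer's whole order in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CustomerFood (
        Customer varchar(255) NOT NULL,
        Food varchar(255) NOT NULL,
        PRIMARY KEY (Customer, Food),
        UNIQUE (Food)
    )
    """
)


def place_order(conn, customer, foods):
    """Insert the whole order atomically; return False if any food is taken."""
    try:
        with conn:  # commits on success, rolls back if an insert fails
            conn.executemany(
                "INSERT INTO CustomerFood (Customer, Food) VALUES (?, ?)",
                [(customer, food) for food in foods],
            )
        return True
    except sqlite3.IntegrityError:
        return False


john_ok = place_order(conn, "John", ["Rice", "Noodle", "Chicken"])
alice_ok = place_order(conn, "Alice", ["Ice-cream", "Chicken"])  # Chicken is taken
rows = conn.execute(
    "SELECT Customer, Food FROM CustomerFood ORDER BY Food"
).fetchall()
```

Because Alice's order is rolled back as a whole, her Ice-cream row is not kept either; whether a partially valid order should be accepted is a separate design decision.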
Using Django templates:
You could use a many-to-many field in Django (less code, but a bit more complex to understand) or the "table in the middle" approach (a more manual approach that needs more model code). Django many to many documentation
Secondly, you should use validators to enforce your logic that one person can only order one dish and that the dishes will sell out. This is more programming logic and can be part of a validator. Django validators documentation

Qt C++ - Displaying data in one view from multiple SQLite tables

Qt version: 5.8
Let's say I have the following SQL tables
-- People
person_id | first_name | last_name | age
-- Cars, person_id is a foreign key to show that this person owns this car
car_id | car_year | car_make | car_model | person_id
Let's say I want to populate the following Table View or Table Widget with a mixture of that data like so
// Table that the user sees. Notice that not all the information from the tables is shown.
first_name | last_name | car_year | car_make | car_model
What is the best/recommended way to do this? I can see the following two ways, but I feel neither is the best way to do this:
Use a Table Widget, which is an item-based table view with a default model. To do this, I'm guessing I would need to make QSqlQuerys to get the data from my QSqlDatabase and just populate the Table Widget that way.
Use a Table View, which would require me to create my own QSqlTableModel for the data model of the view. According to the documentation for QSqlTableModel, it is a high-level interface for reading and writing database records from a single table. This means I would need two QSqlTableModels, one for each of my tables above. However, the Table View can only use one model, and it will show all the data from that model. I think the only way this would work is to combine the tables into one table with only the information I want the user to see. I feel like that would be very ugly but possible. In that case, should I have three tables total - the two above plus the combined one for the users to see?
I feel like #1 is the better of those two, but I'm wondering if there's still a better way than both of those.
If person_id is the primary key of the table people, you can use QSqlRelationalTableModel to show data from several tables in a QTableView. Here is your example:
// Needs <QSqlRelationalTableModel>, <QSqlRelation>, <QTableView> and <QHeaderView>
QSqlRelationalTableModel *rm = new QSqlRelationalTableModel(parentObject, database);
rm->setTable("cars");
// column 4 of cars is person_id; show first_name and last_name from people instead
rm->setRelation(4, QSqlRelation("people", "person_id", "first_name, last_name"));
rm->select();
QTableView *tv = new QTableView();
tv->setModel(rm);
tv->hideColumn(0); // hide column car_id
QHeaderView *hh = tv->horizontalHeader();
hh->moveSection(4, 0); // change order of columns
hh->moveSection(5, 1);

Cassandra, schema and process design for concurrent writes

This is a long-winded question. It is about Cassandra schema design. I'm here to get input from you respected experts on a use case I'm working on. All input, suggestions, and criticism are welcome. Here goes my question.
We would like to collect REVIEWS from our USERS about some PAPERS we are about to publish. For each paper we seek 3 reviews, but we send out review invites to 3*2 = 6 users. All 6 users can submit their reviews to our system, but only the first 3 count, and these first 3 reviewers will be rewarded for their work.
In our Cassandra DB, there are three tables: USER, PAPER and REVIEW. The USER and PAPER tables are simple: each user corresponds to a row in the USER table with an unique USER_ID; similarly, each paper has a unique PAPER_ID in the PAPER table.
The REVIEW table looks like this
CREATE TABLE REVIEW(
PAPER_ID uuid,
USER_ID uuid,
REVIEW_CONTENT text,
PRIMARY KEY(PAPER_ID, USER_ID)
);
We use PAPER_ID as the partition key of the REVIEW table so that all reviews of a given paper are stored in a single Cassandra row. For each paper we have, we pick 6 users, insert 6 entries into the REVIEW table and send out 6 invites to those users. So, for paper "P1", there are 6 entries in the REVIEW table that look like this
----------------------------------------------------
PAPER_ID | USER_ID | REVIEW_CONTENT |
----------------------------------------------------
P1 | U1 | null |
----------------------------------------------------
P1 | U2 | null |
----------------------------------------------------
P1 | U3 | null |
----------------------------------------------------
P1 | U4 | null |
----------------------------------------------------
P1 | U5 | null |
----------------------------------------------------
P1 | U6 | This paper ... |
---------------------------------------------------
... | ... | ... |
Users submit reviews via a web browser using HTTP. At the backend, we use the following process to handle submitted reviews (using paper "P1" as an example):
1) Use partition key "P1" to get all 6 entries out of the REVIEW table.
2) Find out how many of these 6 entries have non-null values in the REVIEW_CONTENT column (a non-null value indicates that the corresponding user has already submitted a review. For example, in the above table, user "U6" has submitted a review, while the other 5 have not yet).
3) If this number >= 3, we already have enough reviews; return to the current reviewer with a message like "Thanks, we already have enough reviews."
4) If this number < 2, save the current review to the corresponding entry in the REVIEW table and return to the reviewer with a message like "Your review has been accepted." (E.g. if the current reviewer is "U1", then fill the REVIEW_CONTENT column of the "P1, U1" entry with the current review content.)
5) If this number = 2, this is the most complicated case, as the current submission is the last one we'll accept. In this case, we first save the current review to the REVIEW table, then we find the ids of all three users that have submitted reviews (including the current user) and record their ids in a transaction table to pay them rewards later.
But this process does not work. The problem is that it does not handle concurrent submissions correctly. Consider the following case: two users have already submitted their reviews, and meanwhile 3 other users are submitting their reviews via three concurrent runs of the process shown above. At step 5, each of the three will think he is the 3rd and last submitter and insert new records into the transaction table. This leads to double counting: a single user may be rewarded more than once for the same review he submitted.
Another problem with this process is that it may never reach step 5. Let's say there is no submission in the REVIEW table, and 4 users submit their reviews at the same time. All of them save their reviews at step 4. After this, later submitters will always be rejected as there are 4 accepted reviews already. But since we never reach step 5, no ids will be recorded in the transaction table and users will never get any rewards.
So here comes my question: How should I handle my use case using Cassandra as the back-end DB? Will Cassandra COUNTER help? If so, how? I have not thought through how to use COUNTER yet, but this blog (http://aphyr.com/posts/294-call-me-maybe-cassandra) warned that Cassandra COUNTER is not safe (quote "Consequently, Cassandra counters will over- or under-count by a wide range during a network partition.") Will Cassandra's Compare and Set (CAS) feature help? If so, how? Again, the same blog warned that "Cassandra lightweight transactions are not even close to correct."
Rather than creating empty entries in your review table, I would consider leaving it empty and only filling it as the reviews are submitted. To handle concurrency, add a timeuuid field as a sorting key:
CREATE TABLE review(
paper_id uuid,
submission_time timeuuid,
user_id uuid,
content text,
PRIMARY KEY (paper_id, submission_time)
);
When a user makes their submission, add the entry to the table. Then AFTER the write is successful, query the table (on only the paper_id) and find out if the user's id is one of the first three. Respond to the user accordingly. Since you're committed to a small set of reviewers, the extra overhead of fetching all the reviews should be minimal (especially since you wouldn't need to include the content column in the query).
If you need to track who's reviewing the papers, add a set of user ids to the paper table and write the six user ids there.
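The write-then-read check can be illustrated with a short in-memory sketch. It does not use a Cassandra driver: `rows` stands in for the review table, the integer `submission_time` values stand in for timeuuids, and `first_reviewers` is a hypothetical helper. It also ignores consistency levels and replication delays, which a real deployment would have to consider.

```python
def first_reviewers(rows, paper_id, limit=3):
    """Return the user ids of the earliest `limit` submissions for a paper."""
    for_paper = [row for row in rows if row["paper_id"] == paper_id]
    for_paper.sort(key=lambda row: row["submission_time"])
    return [row["user_id"] for row in for_paper[:limit]]


# Four submissions for P1 that arrived in arbitrary order, plus one for P2.
rows = [
    {"paper_id": "P1", "submission_time": 105, "user_id": "U4"},
    {"paper_id": "P1", "submission_time": 101, "user_id": "U6"},
    {"paper_id": "P1", "submission_time": 107, "user_id": "U2"},
    {"paper_id": "P1", "submission_time": 103, "user_id": "U1"},
    {"paper_id": "P2", "submission_time": 102, "user_id": "U1"},
]

winners = first_reviewers(rows, "P1")
# Each submitter runs the same read after their own write has succeeded.
accepted = {
    row["user_id"]: row["user_id"] in winners
    for row in rows
    if row["paper_id"] == "P1"
}
```

Every submitter derives the same ordering from the same stored timestamps, so no reviewer has to decide whether they were "the third one" at write time.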