How can I model this data in DynamoDB for a Library App

I have two entities, Books and Authors, with a strict one-to-many relationship (a many-to-many relationship is not required for my use case).
The access patterns I want to satisfy are:
Get Author Info by Author Name
Get Book Info By just ISBN
Get all Books records by an Author using Author Name.
Do I need any GSI given the constraint that I can make only a single request to DB when adding a Book or an Author, and fulfill above three access patterns also with a single request?
If my Author Entity uses this key schema:
Partition Key: AUTHOR#XYZ
Sort Key: AUTHOR#XYZ
and for Book Entity I use
Partition Key: BOOK#123
Sort Key: BOOK#123
I can get author info by name and book info by ISBN easily. How do I get the 3rd access pattern, entire book data by author name?
Two approaches I thought of:
1. Have a third entity in the table with PK AUTHOR#XYZ and SK BOOK#123, and query it with begins_with(SK, 'BOOK'). With this approach, adding a book means writing two items: one with PK BOOK#, SK BOOK# to get a book by ISBN alone, and one with PK AUTHOR#, SK BOOK# to get all books by an author. The book info is duplicated in both items.
2. Add an attribute GSIAuthorName to the Book entity when adding a book, and create a GSI with PK GSIAuthorName (AUTHOR#XYZ) and SK set to the PK of the Book entity (BOOK#123). The issue here is that the projection has to be ALL, since I want every book attribute by author name in a single query against the GSI, so the entire Book entity is duplicated in the GSI.
Is there an easier way to model this data?

Since you're trying to support two different access patterns for a single entity, each requiring a different partition key value, there are basically only the two options you have correctly identified.
Your design seems to only work for books that have a single author. In the real world that's not sufficient. There are plenty of books with multiple authors, such as "The Dictator's Handbook" by Bruce Bueno de Mesquita and Alastair Smith, and your data model might want to account for that. Author <-> Book isn't One-to-Many, it's Many-to-Many.
I'd go for something like this which uses a Global Secondary Index. It's very close to your second suggestion.
| PK | SK | GSI1PK | GSI1SK | type | attributes |
|---|---|---|---|---|---|
| AUTHOR#ALASTAIR SMITH | AUTHOR#ALASTAIR SMITH | | | author | name, birthdate, ... |
| AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | | | author | name, birthdate, ... |
| BOOK#978-1610391849 | AUTHOR#ALASTAIR SMITH | AUTHOR#ALASTAIR SMITH | BOOK#978-1610391849 | book | title, publisher, author,... |
| BOOK#978-1610391849 | AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | BOOK#978-1610391849 | book | title, publisher, author,... |
Does this introduce data duplication? - Yes
Does this introduce complexity on writes? - Yes
Does it work in the real world? - Yes
The model I've chosen allows you to fulfill the requirements:
1. Get Author Info by Author Name: GetItem on the primary index with PK=AUTHOR#... and SK=AUTHOR#...
2. Get Book Info by just ISBN: Query on the primary index with PK=BOOK#... and Limit 1
3. Get all books for an Author: Query on GSI1 with GSI1PK=AUTHOR#...
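As a rough sketch, the three reads above could be expressed as request parameters like this. The helper names are hypothetical, the index name GSI1 and the key formats come from the table in this answer, and upper-casing the author name is an assumption based on the sample rows.

```python
# Hypothetical helpers that only build request parameters; nothing here talks to AWS.

def author_key(name):
    """Key for a GetItem call: author items use the same value for PK and SK."""
    pk = f"AUTHOR#{name.upper()}"  # assumption: names are stored upper-cased
    return {"PK": pk, "SK": pk}


def book_by_isbn_query(isbn):
    """Query parameters: any one of the per-author copies of the book is enough."""
    return {
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": f"BOOK#{isbn}"},
        "Limit": 1,
    }


def books_by_author_query(name):
    """Query parameters for the GSI: all book items written for this author."""
    return {
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk",
        "ExpressionAttributeValues": {":pk": f"AUTHOR#{name.upper()}"},
    }
```

With a boto3 Table resource these would presumably be passed as `table.get_item(Key=author_key(...))` and `table.query(**books_by_author_query(...))`; each access pattern stays a single request.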
When you write a book, you need to add a book record for each author and potentially the author entries. For updates to a book's info (which should be very rare), you first run the Query from 2) (Get Book Info by ISBN) without the limit and then update each item that comes back.
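The write path described above can be sketched as a function that fans one book out into one item per author. The function name and the upper-casing of author names are assumptions for illustration; the key layout is the one from the table in this answer.

```python
def book_items(isbn, authors, attributes):
    """Return one item per author for a single book (hypothetical helper)."""
    book = f"BOOK#{isbn}"
    items = []
    for name in authors:
        author = f"AUTHOR#{name.upper()}"  # assumption: names are stored upper-cased
        items.append({
            "PK": book,        # all copies of a book share the partition key
            "SK": author,      # the author makes each copy unique
            "GSI1PK": author,  # GSI1 inverts the keys for "books by author"
            "GSI1SK": book,
            "type": "book",
            **attributes,      # title, publisher, ... duplicated on every copy
        })
    return items
```

Because all items are known up front, they can be sent together in a single BatchWriteItem or TransactWriteItems request, which keeps the "one request per write" constraint from the question.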
Update
To address the requests for clarification in the comments:
If you require a strict One-to-Many relationship, I'd pick the second approach
Frequent writes are typically not a problem in your one-to-many case as long as you don't exceed the write throughput of a single partition, which is unlikely given the data. I don't see why you'd need frequent writes though.
The extra complexity is typically only a one-time penalty when you create your data access layer. The code for update_book_by_isbn will have to include the steps I outlined above and the create_book might store multiple records.

Related

DynamoDB LSI or GSI to query on composite attribute

I'm trying to use DynamoDB for my Java project and I have this (from my point of view) strange scenario that I have to cover. Let me explain how I organize my table:
Suppose that I have to store these info related to Books:
book_id (UUID), autogenerated and used as PK
author_id (UUID)
type (String)
book_code (UUID), conceptually different from book_id
publishing_house_id (String)
book_gender (String)
And additional dynamic attributes that are not queryable, which I'm thinking of storing as a Document (JSON)
Now, the queries that I need are:
1. Insert/Get/Update/Delete book by book_id
2. Get all books by author_id
3. Get all books by author_id and type
4. Get book by book_code, publishing_house_id, book_gender (I would like to highlight that this tuple will be unique)
Using book_id as the PK, I'll be able to cover the first query set (CRUD using the book id).
For queries #2 and #3, the idea is to create a GSI where author_id is the PK and type is the SK.
To cover query #4, I'm thinking of the following:
Create a dedicated attribute book_sk where I'll store:
book_gender#publishing_house_id#book_code
Create a Local Secondary Index using this book_sk as SK
I could probably move book_code, publishing_house_id and book_gender into the Document field, rather than keeping them as separate top-level attributes that are never queried on their own.
I'm not very sure about this design.
What do you think?
In that case, is it better to use a LSI or GSI for the query #4?
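For #4, if you're always getting a book by those three together, then make an attribute with that concatenated value and use it as the PK of a GSI, making it easy to look up directly. An LSI would not help here: it shares the table's partition key (book_id), so it cannot serve a lookup that does not already know the book_id.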
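A minimal sketch of building that concatenated value, using the book_gender#publishing_house_id#book_code order from the question. The function name is hypothetical, and rejecting values that contain the separator is an added safeguard, not something the question requires.

```python
def book_lookup_key(book_gender, publishing_house_id, book_code):
    """Build the composite value to store in the GSI partition key attribute."""
    parts = (book_gender, publishing_house_id, book_code)
    # Guard against ambiguous keys: "a#b" + "c" must not collide with "a" + "b#c".
    if any("#" in part for part in parts):
        raise ValueError("key parts must not contain '#'")
    return "#".join(parts)
```

The same function would be used both when writing the item and when building the GSI query, so the two sides cannot drift apart.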

django query with filtered annotations from related table

Take the books and authors models, for example, with books having one or more authors. Books have a cover_type and authors have a country of origin.
How can I list all the books with hard cover, and authors only if they're from France?
Books.objects.filter(cover_type='hard', authors__origin='france')
This query doesn't retrieve books that have a hard cover but no French author.
I want all the books with hard cover, this is predicate #1.
And if their authors are from France, I want them annotated, otherwise authors field may be empty or 'None'.
e.g.:
`
Bookname, covertype, origin
The Trial, hardcover, none
Madam Bovary, hardcover, France
`
I tried many options (annotate, Q, Value, Subquery, When, Case, Exists) but couldn't come up with a solution.
With sql this is so easy:
select * from books b left join authors a on a.bookref=b.id and a.origin='france' where b.covertype='hard'
(My models are not books and authors; I picked them because they're the Django docs' example models. My models are building and buildingtype, where I want building.id=454523 with its buildingtype if that buildingtype is active. A building might have no buildingtype, or only 1 active one and zero or more passive ones.)
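To make the target behaviour concrete, here is the SQL above run against an in-memory SQLite database. The table layout follows the query in the question; the sample rows (including the 'Other' origin) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, covertype TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, bookref INTEGER, origin TEXT);
    INSERT INTO books VALUES (1, 'The Trial', 'hard'),
                             (2, 'Madam Bovary', 'hard'),
                             (3, 'Some Paperback', 'soft');
    INSERT INTO authors VALUES (1, 1, 'Other'), (2, 2, 'France');
""")

# The origin filter lives in the ON clause, so books without a French
# author are kept and simply get NULL for the author columns.
rows = conn.execute("""
    SELECT b.name, b.covertype, a.origin
    FROM books b
    LEFT JOIN authors a ON a.bookref = b.id AND a.origin = 'France'
    WHERE b.covertype = 'hard'
    ORDER BY b.id
""").fetchall()

print(rows)  # [('The Trial', 'hard', None), ('Madam Bovary', 'hard', 'France')]
```

Moving `a.origin = 'France'` into the WHERE clause would drop 'The Trial', which is the same effect as the `filter(...)` call shown earlier.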
You should use the Book id in the Author table. Then your query will look like this: Author.objects.filter(origin="france",book__cover_type="hard")
I think I solved it with Subquery, OuterRef, Exists, Case, When, CharField... too many imports for a simple SQL query.
`
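from django.db.models import Case, CharField, Exists, OuterRef, Subquery, Value, When

# Origin of a French author of the outer book, if any.
author = Authors.objects.filter(bookref=OuterRef('id'), origin='France').values('origin')
# [:1] keeps the subquery scalar; Value() makes 'none' a literal (a bare string would be read as a field name).
books = Books.objects.filter(cover_type='hard').annotate(
    author=Case(When(Exists(author), then=Subquery(author[:1])), default=Value('none'), output_field=CharField())
).distinct().values('name', 'cover_type', 'author')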
`

One to Many relationship real life example

I am trying to design the schema. I am confused about whether I should use a one-to-many or a many-to-one relationship.
My use case is somewhat like customers ordering the food.
There are 2 customers and 5 food items
Customers: [John, Alice]
Food: [Rice, Noodle, Chicken, Beacon, Ice-cream]
Use case: One customer can order many items, but once a customer has ordered an item, it cannot be ordered by anyone else.
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Beacon, Ice-cream
**This is valid, both customers ordered unique food.**
Example:
John orders -> Rice, Noodle, Chicken
Alice orders -> Beacon, Ice-cream, Chicken
**This is invalid, because Chicken is being ordered twice. John Already ordered chicken so Alice can not order it.**
Note: I am trying to do this with MongoDB documents and to establish the relationship using Django models.
One way to handle this would be to create a junction table CustomerFood which looks something like this:
CREATE TABLE CustomerFood (
Customer varchar(255) NOT NULL,
Food varchar(255) NOT NULL,
PRIMARY KEY(Customer, Food)
);
The above table definition alone would only ensure that each customer can be related to each food at most once. To enforce the additional restriction that a given food can be associated with only one customer, we can add a unique constraint on the Food column:
ALTER TABLE CustomerFood ADD CONSTRAINT food_unique UNIQUE (Food);
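A quick way to see the constraint at work is an in-memory SQLite database. This is only a sketch: SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so the unique constraint is declared inside CREATE TABLE here instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CustomerFood (
        Customer varchar(255) NOT NULL,
        Food varchar(255) NOT NULL,
        PRIMARY KEY (Customer, Food),
        CONSTRAINT food_unique UNIQUE (Food)
    )
""")

conn.execute("INSERT INTO CustomerFood VALUES ('John', 'Chicken')")

try:
    # Alice tries to order a food that John already ordered.
    conn.execute("INSERT INTO CustomerFood VALUES ('Alice', 'Chicken')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM CustomerFood").fetchone()[0]
```

After this runs, `rejected` is True and the table still holds only John's order.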
Using Django templates:
You could use a many-to-many relationship in Django (less code, but a bit more complex to understand) or take the "table in the middle" approach (more manual, and it needs more model code).
Django many to many documentation
Secondly, you should use validators to enforce your rule that a dish can only be ordered by one person and is then sold out. This is programming logic rather than schema design, and it can be part of a validator. Django validators documentation

Dynamodb schema design (map relational data to nosql)

I'm trying to get my head around this example from AWS that maps a relational model to NoSQL:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
A key concept highlighted there is:
Important
.... Most well-designed applications require only one table. ...
Given that, the example table is as follows
It explains,
You define the following entities, which support the relational order entry schema:
HR-Employee - PK: EmployeeID, SK: Employee Name
HR-Region - PK: RegionID, SK: Region Name
...
However, the entity HR-Employee - PK: EmployeeID, SK: Employee Name in the example table has SK values which are not Employee Names.
Also, it suggests the following query
but GSI-1 doesn't have PK of Employee Name.
I understand this could be a discrepancy in the AWS documentation and that I should raise it with them (which I have, and they are notoriously bad at following up). What I'm not sure about is whether the documentation is correct and my understanding is wrong (I'm inclined to believe the latter, as AWS documentation is generally accurate).
Can someone guide me in the right direction in terms of a NoSQL schema mapping? A correct example (with sample records of the DynamoDB table) for the schema in the above link would be much valued.
So I'll try and make this clearer for you, let me know if something still doesn't make sense.
To start, you mention the fact that:
However, the entity HR-Employee - PK: EmployeeID, SK: Employee Name in the example table has SK values which are not Employee Names.
The reason there are SK values that are not "Employee Names" is because the SK isn't only for "Employee Names", it is also used by other queries (such as Region Name, Country Name, etc.). Think of the SK as exactly what it stands for, a sort key. The documentation seems to miss the explanation of the extra SK they have, so let me summarize what you're looking at.
You have HR-Employee1, with Employee Name = Employee1, QuotaID (guessing what this key is) = QUOTA-2017-Q1, Some Other Key = HR-CONFIDENTIAL
These key names are not actually defined in the table, they all go under the sort key, and are only implicitly "employee name" or "quota id" or "region name".
What this allows you to do is query the employee's data, using employeeID as PK and employee name as SK, but it also lets you query the employee's Quota data (or whatever it is) by using employeeID as PK and quotaID as SK.
The same applies to your second question, concerning GSI-1. In essence, the way they have designed the table in this scenario is you have an SK "SortKey" where you can have various types of values to sort on, if that makes sense.
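To illustrate the idea of one generic sort key holding different kinds of values, here is a small in-memory stand-in for the table. It is not DynamoDB code: the list and the `query` helper are hypothetical, and only the three key values are taken from the example above.

```python
# Items that share one partition key but carry different kinds of sort key values.
items = [
    {"PK": "HR-Employee1", "SK": "Employee1"},        # implicitly "employee name"
    {"PK": "HR-Employee1", "SK": "QUOTA-2017-Q1"},    # implicitly "quota id"
    {"PK": "HR-Employee1", "SK": "HR-CONFIDENTIAL"},  # some other kind of record
]


def query(pk, sk_prefix=""):
    """Mimic a Query with an optional begins_with condition on the sort key."""
    return [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]


everything = query("HR-Employee1")         # all records for the employee
quotas = query("HR-Employee1", "QUOTA-")   # only the quota records
```

The attribute is always called SK; what it means depends on the item, and the prefix is what lets a query pick one kind of record.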

Grouping Custom Attributes in a Query

I have an application that allows "contacts" to be completely customized. My method of doing that is letting the administrator set up all of the fields allowed for the contact. My database is as follows:
Contacts: id, active, lastactive, created_on
Fields: id, label
FieldValues: id, fieldid, contactid, response
So the Contacts table only tells whether they are active and their identifier; the Fields table only holds the label of the field and its identifier; and the FieldValues table is what actually holds the data for contacts (name, address, etc.)
So this setup has worked just fine for me up until now. The client would like to be able to pull a cumulative report, for example the number of contacts in each city, grouped by state. Effectively the data would have to look like the following:
California (from fields table)
    Costa Mesa - (from fields table) 5 - (counted in fieldvalues table)
    Newport 2
Connecticut
    Wallingford 2
    Clinton 2
    Berlin 5
The state field might be id 6 and the city field might be id 4. I don't know if I have just been looking at this code way too long to figure it out or what.
The SQL to create those three tables can be found at https://s3.amazonaws.com/davejlong/Contact.sql
You've got an Entity Attribute Value (EAV) model. Use the field and fieldvalue tables for searching only, i.e. in the WHERE clause. Then make life easier by keeping the full entity's data in a CLOB off the main table (e.g. Contacts.data) in a serialized format (WDDX is good for this). Read the data column out, deserialize it, and work with it on the server side. This is much easier than the myriad of joins you'd otherwise need to reproduce the fully hydrated entity from an EAV setup.
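A rough sketch of the "deserialize and work on the server side" step for the state/city report. JSON stands in for WDDX here, the `state` and `city` keys are assumed names for the serialized fields, and the sample rows are made up.

```python
import json
from collections import Counter


def city_counts(serialized_rows):
    """Count contacts per (state, city) from serialized Contacts.data values."""
    counts = Counter()
    for raw in serialized_rows:
        contact = json.loads(raw)  # one fully hydrated contact per row
        counts[(contact["state"], contact["city"])] += 1
    return counts


# Made-up sample of what the data column might contain.
rows = [
    '{"state": "California", "city": "Costa Mesa"}',
    '{"state": "California", "city": "Costa Mesa"}',
    '{"state": "California", "city": "Newport"}',
    '{"state": "Connecticut", "city": "Berlin"}',
]
report = city_counts(rows)
```

Grouping the resulting counts by state for display is then ordinary application code, with no joins against the field tables.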