I'm trying to get my head around this example from AWS on mapping a relational model to NoSQL:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
A key concept highlighted there is:
Important
.... Most well-designed applications require only one table. ...
Given that, the example table is as follows.
It explains:
You define the following entities, which support the relational order
entry schema:
HR-Employee - PK: EmployeeID, SK: Employee Name
HR-Region - PK: RegionID, SK: Region Name
...
However, the entity HR-Employee - PK: EmployeeID, SK: Employee Name in the example table has SK values which are not Employee Names.
Also, it suggests the following query
but GSI-1 doesn't have a PK of Employee Name.
I understand this could be a discrepancy in the AWS documentation and that I should raise it with them (which I have, though they are notoriously slow to follow up). What I'm not sure of is whether the documentation is correct and my understanding is wrong (I'm inclined to believe the latter, as AWS documentation is generally accurate).
Can someone guide me in the right direction in terms of a NoSQL schema mapping? A correct example (with sample records of the DynamoDB table) for the schema in the above link would be much appreciated.
So I'll try to make this clearer for you; let me know if something still doesn't make sense.
To start, you mention the fact that:
However, the entity HR-Employee - PK: EmployeeID, SK: Employee Name in the example table has SK values which are not Employee Names.
The reason there are SK values that are not "Employee Names" is that the SK isn't only for "Employee Names"; it is also used by other queries (such as Region Name, Country Name, etc.). Think of the SK as exactly what it stands for: a sort key. The documentation seems to skip over explaining the extra SK values, so let me summarize what you're looking at.
You have HR-Employee1 with:
Employee Name = Employee1
QuotaID (guessing what this key is) = QUOTA-2017-Q1
Some Other Key = HR-CONFIDENTIAL
These key names are not actually defined in the table; they all go under the sort key, and are only implicitly "employee name" or "quota id" or "region name".
What this allows you to do is query the employee's data, using EmployeeID as PK and employee name as SK, but it also lets you query the employee's quota data (or whatever it is) by using EmployeeID as PK and QuotaID as SK.
The same applies to your second question, concerning GSI-1. In essence, the way they have designed the table in this scenario, you have a single SK ("SortKey") that can hold various types of values to sort on, if that makes sense.
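To make that concrete, here is a minimal Python/boto3 sketch of this adjacency pattern, assuming generic attribute names PK and SK and a made-up table name (the docs page doesn't give exact ones):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("HR-Table")  # hypothetical table name

# Everything stored under one employee partition comes back together:
# the name item, the quota item, the confidential-data item, etc.
resp = table.query(KeyConditionExpression=Key("PK").eq("HR-EMPLOYEE1"))
for item in resp["Items"]:
    print(item["SK"], item)

The SK of each returned item tells you what kind of data that item holds (a name, a quota, confidential data, and so on).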
I'm trying to use DynamoDB for my Java project and I have this (from my point of view) strange scenario that I have to cover. Let me explain how I organize my table:
Suppose that I have to store this info related to Books:
book_id (UUID), used as PK, auto-generated
author_id (UUID)
type (String)
book_code (UUID), conceptually different from book_id
publishing_house_id (String)
book_gender (String)
And additional dynamic attributes that are not queryable, which I'm thinking of storing as a Document (JSON)
Now, the queries that I need are:
Insert/Get/Update/Delete book by book_id
Get all books by author_id
Get all books by author_id and type
Get book by book_code, publishing_house_id, book_gender (I would like to highlight that this tuple will be unique)
Using the book_id as PK, I'll be able to cover the first query set (CRUD using the book_id).
For queries #2 and #3, the idea is to create a GSI where author_id is the PK and type is the SK.
In order to cover query #4, I'm thinking to:
Create a dedicated attribute book_sk where I'll store:
book_gender#publishing_house_id#book_code
Create a Local Secondary Index using this book_sk as SK
Probably I could move book_code, publishing_house_id, and book_gender into a Document field instead of having these unqueryable attributes here.
I'm not very sure about this design.
What do you think?
In that case, is it better to use an LSI or a GSI for query #4?
For #4, if you're always getting a book by those three attributes together, then make an attribute with that concatenated value and use it as the PK of a GSI, making it easy to look the book up directly.
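As a rough sketch of that approach in Python/boto3 (the table name Books and index name book_sk-index are assumptions for illustration):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
books = dynamodb.Table("Books")  # hypothetical table name

def put_book(book):
    # Maintain the concatenated lookup key on every write.
    book["book_sk"] = (
        f'{book["book_gender"]}#{book["publishing_house_id"]}#{book["book_code"]}'
    )
    books.put_item(Item=book)

def get_book_by_tuple(book_gender, publishing_house_id, book_code):
    resp = books.query(
        IndexName="book_sk-index",  # hypothetical GSI with book_sk as its PK
        KeyConditionExpression=Key("book_sk").eq(
            f"{book_gender}#{publishing_house_id}#{book_code}"
        ),
    )
    items = resp["Items"]
    return items[0] if items else None  # the tuple is unique, so at most one match

Because the concatenated value is a GSI partition key, the lookup doesn't need book_id at all, and the uniqueness of the tuple guarantees at most one result.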
I'm new to DynamoDB and trying to figure out how to structure my data/table/index. My schema includes an itemid (unique) and an orderid (multiple items per order), along with some other arbitrary attributes. I want to be able to retrieve a single item by its itemid, but also retrieve a set of items by their OrderId.
My initial instinct was to set the itemid as the partition key and the orderid as the sort key, but that didn't allow me to query by orderid only. However, the same problem occurs if I reverse them.
Example data:
ItemId  | OrderId
abc-123 | 1234
def-345 | 1234
ghi-678 | 5678
jkl-901 | 5678
I think I may need a Global Secondary Index, but I'm not quite understanding where those fit in.
If your question is really whether you "are able" to do this, then with ItemId as the partition key, you can still retrieve by OrderId, with the Scan operation, which will let you filter by any attribute.
However, Scan performs a full table scan, so the real question is probably whether you can retrieve by OrderId efficiently. In that case, you would indeed need a Global Secondary Index with OrderId and ItemId as its composite key.
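Sketched in Python/boto3, with made-up table and index names and ItemId alone as the base table's key:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
items_table = dynamodb.Table("Items")  # hypothetical: base table keyed by ItemId only

# Direct lookup of a single item by its ItemId on the base table.
item = items_table.get_item(Key={"ItemId": "abc-123"}).get("Item")

# Efficient lookup of all items in an order via the GSI.
resp = items_table.query(
    IndexName="OrderId-ItemId-index",  # hypothetical GSI: PK OrderId, SK ItemId
    KeyConditionExpression=Key("OrderId").eq("1234"),  # assumes OrderId stored as a string
)
order_items = resp["Items"]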
This is typically achieved using what's called a "single-table design". What this means is that you store all your data in one table, and store it denormalized, i.e. duplicate your data so that it fits your access patterns.
Generally speaking, if you do not know your access patterns beforehand, DynamoDB might not be a good fit. For many systems, a good solution is to serve the "main" access patterns from DynamoDB and then offload the less performance-critical ad-hoc queries by replicating the data to something like Elasticsearch.
If you have a table with the hash key PK (String) and the sort key SK (String), you can store your data like this. Use transactions to keep the multiple items up to date and consistent, etc.
PK           | SK           | shippingStatus | totalPrice | cartQuantity
order_1234   | order_status | PENDING        | 123123     |
order_1234   | item_abc-123 |                |            | 3
order_1234   | item_def-345 |                |            | 1
order_5678   | order_status | SHIPPED        | 54321      |
order_5678   | item_jkl-901 |                |            | 5
item_abc-123 | order_1234   |                |            |
item_abc-123 | order_9876   |                |            |
item_abc-123 | order_5656   |                |            |
This table illustrates the schemaless nature of a DynamoDB table (apart from the PK/SK). With this setup, you can store "metadata" about the order in the order_1234/order_status item. Then, you can query for items with PK order_1234 and SK begins_with "item_" to get all the items for that order. You can do the same to get all the orders for an item: query for PK item_abc-123 and SK begins_with "order_" to get all the orders.
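In boto3 those queries would look roughly like this (assuming the PK/SK attribute names from the table above and a hypothetical table name):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")  # hypothetical table name

# All items belonging to order 1234 (skips the order_status metadata item).
items = table.query(
    KeyConditionExpression=Key("PK").eq("order_1234") & Key("SK").begins_with("item_")
)["Items"]

# The order metadata on its own.
status = table.get_item(Key={"PK": "order_1234", "SK": "order_status"})["Item"]

# All orders that contain a given item (the inverted item_... partitions).
orders = table.query(
    KeyConditionExpression=Key("PK").eq("item_abc-123") & Key("SK").begins_with("order_")
)["Items"]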
I highly recommend this talk by Rick Houlihan to get into single table design and data modelling in dynamo :)
https://www.youtube.com/watch?v=HaEPXoXVf2k
I have two entities, Books and Authors, with a strict one-to-many relationship (a many-to-many relationship is not required for my use case).
The access patterns I want to satisfy are:
Get Author Info by Author Name
Get Book Info by just ISBN
Get all Book records by an Author using Author Name
Do I need any GSI, given the constraint that I can make only a single request to the DB when adding a Book or an Author, and must fulfill the above three access patterns with a single request each?
If my Author Entity uses this key schema:
Partition Key: AUTHOR#XYZ
Sort Key: AUTHOR#XYZ
and for Book Entity I use
Partition Key: BOOK#123
Sort Key: BOOK#123
I can get author info by name and book info by ISBN easily. How do I get the 3rd access pattern, entire book data by author name?
Two approaches I thought of:
Have a third entity in the table with PK AUTHOR#XYZ, SK BOOK#123, and use BEGINS_WITH(SK, 'BOOK'). But in this approach, when adding a book to the DB, I will have to write two items: PK BOOK#, SK BOOK# for getting the book by just ISBN, and PK AUTHOR#, SK BOOK# for getting all books by author; the book info will be duplicated in both items.
Add an attribute GSIAuthorName to the Book entity when adding a book, and create a GSI with PK GSIAuthorName (AUTHOR#XYZ) and SK being the PK of the Book entity (BOOK#123). But here the issue is that for projections I will have to select ALL, since I want all book info attributes by author name and need to fetch them in a single query to the GSI, so the entire Book entity will be duplicated in the GSI.
Is there an easier way to model this data?
Since you're trying to serve two different access patterns for a single entity that require different partition key values, there are basically only the two options you have correctly identified.
Your design seems to only work for books that have a single author. In the real world that's not sufficient. There are plenty of books with multiple authors, such as "The Dictator's Handbook" by Bruce Bueno de Mesquita and Alastair Smith; your data model might want to account for that. Author <-> Book isn't One-to-Many, it's Many-to-Many.
I'd go for something like this which uses a Global Secondary Index. It's very close to your second suggestion.
PK                             | SK                             | GSI1PK                         | GSI1SK              | type   | attributes
AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          |                                |                     | author | name, birthdate, ...
AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA |                                |                     | author | name, birthdate, ...
BOOK#978-1610391849            | AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          | BOOK#978-1610391849 | book   | title, publisher, author, ...
BOOK#978-1610391849            | AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | BOOK#978-1610391849 | book   | title, publisher, author, ...
Does this introduce data duplication? - Yes
Does this introduce complexity on writes? - Yes
Does it work in the real world? - Yes
The model I've chosen allows you to fulfill the requirements:
Get Author Info by Author Name: GetItem on the primary index with PK=AUTHOR#... and SK=AUTHOR#...
Get Book Info by just ISBN: Query on primary index with PK=BOOK#... and limit 1
Get all books for an Author: Query on GSI1 with PK=AUTHOR#
When you write a book, you need to add a book record for each author and potentially the author entries. For updates to a book's info (which should be very rare) you first do the query as in 2) without the limit and then update each item that comes back.
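A sketch of what the write could look like in Python/boto3, assuming the table layout above (the table name Library is made up):

import boto3

client = boto3.client("dynamodb")
TABLE = "Library"  # hypothetical table name

def create_book(isbn, authors, attributes):
    # One book item per author, written in a single transaction.
    puts = []
    for author in authors:
        item = {
            "PK": {"S": f"BOOK#{isbn}"},
            "SK": {"S": f"AUTHOR#{author}"},
            "GSI1PK": {"S": f"AUTHOR#{author}"},
            "GSI1SK": {"S": f"BOOK#{isbn}"},
            "type": {"S": "book"},
            **{k: {"S": v} for k, v in attributes.items()},
        }
        puts.append({"Put": {"TableName": TABLE, "Item": item}})
    client.transact_write_items(TransactItems=puts)

create_book(
    "978-1610391849",
    ["ALASTAIR SMITH", "BRUCE BUENO DE MESQUITA"],
    {"title": "The Dictator's Handbook"},
)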
Update
To address the requests for clarification in the comments:
If you require a strict One-to-Many relationship, I'd pick the second approach.
Frequent writes are typically not a problem in your one-to-many case as long as you don't exceed the write throughput of a single partition, which is unlikely given the data. I don't see why you'd need frequent writes though.
The extra complexity is typically only a one-time penalty when you create your data access layer. The code for update_book_by_isbn will have to include the steps I outlined above and the create_book might store multiple records.
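For illustration, a hypothetical update_book_by_isbn along those lines (attribute names assumed, error handling omitted):

from boto3.dynamodb.conditions import Key

def update_book_by_isbn(table, isbn, new_title):
    # Query as in 2) without the limit, then update every copy that comes back.
    copies = table.query(
        KeyConditionExpression=Key("PK").eq(f"BOOK#{isbn}")
    )["Items"]
    for copy in copies:
        table.update_item(
            Key={"PK": copy["PK"], "SK": copy["SK"]},
            UpdateExpression="SET #t = :t",
            ExpressionAttributeNames={"#t": "title"},
            ExpressionAttributeValues={":t": new_title},
        )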
Say I have two tables that can have 'tags' associated with them, with potentially more such tables in the future.
tracks -
id, title, artist, etc...
artists -
id, name, description, etc...
I want to be able to have a general table called 'tags'
tags -
id, title, description
How would I construct the joining table to create the relationship? Is it possible to have it such that the foreign keys are applicable to both the artists and tracks tables?
I was thinking of a structure similar to:
tag_relations -
tag_id (foreign key to tags.id), item_id (either artists.id or tracks.id)
Is this a bad design, not having any foreign key integrity on item_id?
Laravel supports polymorphic relationships, which I believe suit the purpose you require. You can read up on them here:
http://four.laravel.com/docs/eloquent#polymorphic-relations
I'm working on simple ratings and comments apps to add to my project and am looking for advice on creating the models.
Normally, I'd create these database schemas like this:
comment_
id - primary key
type - varchar (buyer_item, buyer_vendor, vendor_buyer)
source_id - int (primary key of the table based on the type)
target_id - int (primary key of the table based on the type)
timestamp - timestamp
subject - varchar
comment - text
rating_
id - primary key
type - varchar (buyer_item, buyer_vendor, vendor_buyer)
source_id - int (primary key of the table based on the type)
target_id - int (primary key of the table based on the type)
timestamp - timestamp
rating - int (the score given, ie: 1-5 stars)
This would let me have simple methods that would allow me to apply comments or ratings to any type of thing by setting the proper type and setting the ids of who submitted it (source_id) and what it applies to (target_id), like:
add_comment('user_product', user.pk, product.pk, now, subject, comment)
add_comment('user_vendor', user.pk, vendor.pk, now, subject, comment)
I know that in the models you define the relationships to other tables as part of the model. How would I define the relationship in these types of tables, where the TYPE field determines which table SOURCE_ID and TARGET_ID link to?
Or should I omit the relationships from the model and set the joins up when I get the QuerySets?
Or should I just trash the whole common table idea and make a bunch of different tables to be used for each relationship (eg: user_ratings, product_ratings, transaction_ratings, etc)?
What's the best practice here? My DBA senses say use common tables, but Django newbie me isn't sure what the natives do.
Thanks!
I think what you are looking for is a Generic Relation, and you can find this type of thing in the contenttypes framework: https://docs.djangoproject.com/en/1.0/ref/contrib/contenttypes/#generic-relations
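A rough sketch of your rating_ table as a Django model with two generic relations (this uses the modern import path django.contrib.contenttypes.fields; the 1.0-era docs linked above spell it django.contrib.contenttypes.generic):

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Rating(models.Model):
    # Who submitted the rating (user, vendor, ...): replaces type + source_id.
    source_content_type = models.ForeignKey(
        ContentType, on_delete=models.CASCADE, related_name="+"
    )
    source_id = models.PositiveIntegerField()
    source = GenericForeignKey("source_content_type", "source_id")

    # What the rating applies to (item, vendor, ...): replaces type + target_id.
    target_content_type = models.ForeignKey(
        ContentType, on_delete=models.CASCADE, related_name="+"
    )
    target_id = models.PositiveIntegerField()
    target = GenericForeignKey("target_content_type", "target_id")

    timestamp = models.DateTimeField(auto_now_add=True)
    rating = models.IntegerField()

With this in place, Rating.objects.create(source=user, target=product, rating=5) works for any pair of models, and the ContentType rows play the role of your TYPE column.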