DynamoDB LSI or GSI to query on a composite attribute

I'm trying to use DynamoDB in my Java project, and I have a scenario that (from my point of view) is strange. Let me explain how I've organized my table.
Suppose that I have to store the following information about books:
book_id (UUID), auto-generated and used as the PK
author_id (UUID)
type (String)
book_code (UUID), which is a different concept from book_id
publishing_house_id (String)
book_gender (String)
Plus additional dynamic attributes that don't need to be queryable, which I'm thinking of storing as a document (JSON)
Now, the queries that I need are:
1. Insert/Get/Update/Delete a book by book_id
2. Get all books by author_id
3. Get all books by author_id and type
4. Get a book by book_code, publishing_house_id, and book_gender (note that this tuple is unique)
Using book_id as the PK, I can cover the first set of queries (CRUD by book_id).
For queries #2 and #3, the idea is to create a GSI with author_id as the PK and type as the SK.
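As a rough illustration of queries #2 and #3 against such a GSI, here is a minimal Python sketch that only builds the parameter dictionaries for DynamoDB's low-level Query operation; it does not call AWS. The table name Books and the index name author_type_index are hypothetical. Attribute names are aliased through ExpressionAttributeNames because type is on DynamoDB's reserved-word list.

```python
from typing import Optional


def author_query_params(author_id: str, book_type: Optional[str] = None) -> dict:
    """Build low-level Query parameters for the (hypothetical) author GSI."""
    names = {"#a": "author_id"}
    values = {":a": {"S": author_id}}
    condition = "#a = :a"  # query #2: equality on the GSI partition key only
    if book_type is not None:
        # query #3: additionally match the GSI sort key exactly
        names["#t"] = "type"  # 'type' is reserved, so it must be aliased
        values[":t"] = {"S": book_type}
        condition += " AND #t = :t"
    return {
        "TableName": "Books",              # hypothetical table name
        "IndexName": "author_type_index",  # hypothetical index name
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


print(author_query_params("a-1")["KeyConditionExpression"])
print(author_query_params("a-1", "novel")["KeyConditionExpression"])
```

With boto3 these dictionaries could be passed as `client.query(**params)`; the AWS SDK for Java exposes the same fields on its QueryRequest. Putting type in the sort key means one index serves both queries: leave the sort-key condition off for #2, add it for #3.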
To cover query #4, I'm thinking of:
Creating a dedicated attribute book_sk that stores:
book_gender#publishing_house_id#book_code
Creating a Local Secondary Index with book_sk as the SK
I could then probably move book_code, publishing_house_id, and book_gender into the document field instead of keeping them here as unqueryable attributes.
I'm not sure about this design.
What do you think?
In that case, is an LSI or a GSI better for query #4?

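For #4, if you're always getting a book by those three values together, create an attribute holding the concatenated value and use it as the partition key (PK) of a GSI, which makes a direct lookup easy. An LSI won't help here: an LSI always shares the table's partition key (book_id), so you could only query it if you already knew the book_id.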
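A minimal Python sketch of that suggestion, assuming the concatenated value is stored in the question's book_sk attribute (here acting as the GSI partition key rather than a sort key). The table and index names are hypothetical, and the code only builds low-level Query parameters. GetItem reads only the base table, so looking a book up through the GSI is a Query with an equality condition; a GSI also does not enforce key uniqueness, so the application has to keep the tuple unique.

```python
import uuid


def book_lookup_key(book_gender: str, publishing_house_id: str, book_code: str) -> str:
    """Build the composite value in the order used in the question."""
    parts = (book_gender, publishing_house_id, book_code)
    if any("#" in p for p in parts):
        # A '#' inside a component would make the concatenated key ambiguous
        raise ValueError("key components must not contain '#'")
    return "#".join(parts)


def lookup_query_params(key: str) -> dict:
    """Low-level Query parameters for a GSI whose partition key is book_sk."""
    return {
        "TableName": "Books",              # hypothetical table name
        "IndexName": "book_lookup_index",  # hypothetical index name
        "KeyConditionExpression": "book_sk = :k",
        "ExpressionAttributeValues": {":k": {"S": key}},
    }


key = book_lookup_key("fantasy", "ph-42", str(uuid.uuid4()))
print(lookup_query_params(key)["KeyConditionExpression"])
```

Because the Query returns a list even for a unique key, the caller should treat more than one result as a data error rather than pick the first item silently.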

Related

DynamoDB schema design (mapping relational data to NoSQL)

I'm trying to get my head around this AWS example of mapping a relational model to NoSQL:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
A key concept highlighted there is:
"Important: ... Most well-designed applications require only one table. ..."
Given that, the example table is as follows
It explains:
You define the following entities, which support the relational order entry schema:
HR-Employee - PK: EmployeeID, SK: Employee Name
HR-Region - PK: RegionID, SK: Region Name
...
However, for the entity HR-Employee (PK: EmployeeID, SK: Employee Name), the example table has SK values that are not employee names.
Also, it suggests the following query, but GSI-1 doesn't have Employee Name as its PK.
I understand this could be a discrepancy in the AWS documentation that I should raise with them (I have, and they are notoriously bad at following up), but I'm not sure whether the documentation is correct and my understanding is wrong (I'm inclined to believe the latter, as AWS documentation is generally accurate).
Can someone point me in the right direction on NoSQL schema mapping? A correct example (with sample records from the DynamoDB table) for the schema in the link above would be much appreciated.
I'll try to make this clearer for you; let me know if something still doesn't make sense.
To start, you mention the fact that:
However, the entity HR-Employee - PK: EmployeeID, SK: Employee Name in the example table has SK values which are not Employee Names.
The reason there are SK values that are not "Employee Names" is that the SK isn't only for employee names; it is also used by other queries (for Region Name, Country Name, etc.). Think of the SK as exactly what it stands for: a sort key. The documentation doesn't really explain the extra SK values it uses, so let me summarize what you're looking at.
You have HR-Employee1, with Employee Name = Employee1, QuotaID (I'm guessing what this key is) = QUOTA-2017-Q1, and some other key = HR-CONFIDENTIAL.
These key names are not actually defined in the table; they all go under the sort key and are only implicitly "employee name", "quota ID", or "region name".
This lets you query the employee's data using EmployeeID as the PK and the employee name as the SK, but it also lets you query the employee's quota data (or whatever it is) using EmployeeID as the PK and the quota ID as the SK.
The same applies to your second question, about GSI-1. In essence, the table in this scenario is designed with one generic sort key attribute ("SortKey") that holds different kinds of values to sort on, depending on the item, if that makes sense.
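To make the overloaded sort key concrete, here is a small in-memory Python sketch. The items are reconstructed from the values named in this answer, not copied from the AWS example table, and the query function only mimics a DynamoDB Query: equality on PK, an optional prefix match on SK (what `begins_with(SK, :prefix)` does in a key condition), and results ordered by SK.

```python
from typing import Dict, List

# Hypothetical items: one PK, several kinds of values in the same SK attribute
ITEMS: List[Dict[str, str]] = [
    {"PK": "HR-Employee1", "SK": "Employee1"},        # implicitly "employee name"
    {"PK": "HR-Employee1", "SK": "QUOTA-2017-Q1"},    # implicitly "quota id"
    {"PK": "HR-Employee1", "SK": "HR-CONFIDENTIAL"},  # some other key
    {"PK": "HR-Employee2", "SK": "Employee2"},
]


def query(pk: str, sk_prefix: str = "") -> List[Dict[str, str]]:
    """Mimic Query: PK equality, optional begins_with on SK, sorted by SK."""
    hits = [i for i in ITEMS if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])


print(query("HR-Employee1"))            # every item for the employee
print(query("HR-Employee1", "QUOTA-"))  # only the quota item(s)
```

The prefix convention (QUOTA-, HR-, and so on) is what lets one generic SK attribute serve several access patterns; the same trick applies to the key attributes of GSI-1.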

DynamoDB query using DynamoDBMapper

Say I have a DynamoDB table:
UserId (hash): S
BookName (range): S
BorrowedTime: S
ReturnedTime: S
UserId is the partition (hash) key, and I had to add BookName as the sort (range) key because adding another item with the same UserId was overwriting the previous one.
How would I create a query with DynamoDBMapper when the fields being queried are the time fields, which are not keys? For instance, say I want to return the UserId and BookName of every book borrowed more than two weeks ago that hasn't been returned yet.
Do I need to setup a GSI on both BorrowedTime and ReturnedTime fields?
Yes, you can create a GSI on BorrowedTime and ReturnedTime, or you can use a scan instead of a query. A scan doesn't need a GSI, but it reads every item in the table, so it isn't recommended for large tables or frequent use.
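For the scan route, a minimal Python sketch of the filter, assuming BorrowedTime is stored as a fixed-width UTC ISO-8601 string and that ReturnedTime is absent (not an empty string) until the book is returned. It builds low-level Scan parameters (not DynamoDBMapper code) plus a local function that applies the same condition, for illustration. The table name Borrowings is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"  # assumed storage format for the time attributes


def cutoff_string(now: datetime, days: int = 14) -> str:
    # `now` should be UTC. Fixed-width ISO-8601 strings sort in time order,
    # so '<' on a string attribute behaves like a date comparison.
    return (now - timedelta(days=days)).strftime(ISO_FMT)


def overdue_scan_params(now: datetime, days: int = 14) -> dict:
    """Low-level Scan parameters: borrowed more than `days` ago, never returned."""
    return {
        "TableName": "Borrowings",  # hypothetical table name
        "ProjectionExpression": "UserId, BookName",
        "FilterExpression": "BorrowedTime < :cutoff AND attribute_not_exists(ReturnedTime)",
        "ExpressionAttributeValues": {":cutoff": {"S": cutoff_string(now, days)}},
    }


def is_overdue(item: Dict[str, str], cutoff: str) -> bool:
    # Local equivalent of the FilterExpression above
    return item["BorrowedTime"] < cutoff and "ReturnedTime" not in item


print(overdue_scan_params(datetime.now(timezone.utc))["ExpressionAttributeValues"])
```

The filter is applied after items are read, so the scan still consumes read capacity for the whole table, and results arrive in pages that must be followed via LastEvaluatedKey. DynamoDBMapper's DynamoDBScanExpression should accept an equivalent filter expression.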

Which is better? city.state.id or city.state_id

I have two tables with a relation:
State
id
name
City
id
name
state
Which is better for performance?
city.state.id or city.state_id
city.state_id is better either way. city.state does another fetch from the database. You can avoid that with select_related, but if you only need the ID of the foreign key, you don't need select_related at all: just use city.state_id, since the foreign key ID is already fetched by the query that returned the city object.
city.state_id is better than city.state.id because it makes one query instead of two.
BTW, you can use Django Debug Toolbar for debugging queries.
The <field>_id field you see is the database column name. From the docs:
Behind the scenes, Django appends "_id" to the field name to create its database column name. In the above example, the database table for the Car model will have a manufacturer_id column
So this means it doesn't need to make a separate query to retrieve the foreign key instance (See Select a single field from a foreign key for more details).
Note that city.state.id only costs an extra query if you haven't used select_related or prefetch_related.
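The difference can be sketched without Django. This toy Python model (not Django's actual code; City, State, STATES, and QueryCounter are hypothetical) mimics roughly what Django's foreign-key descriptor does: the FK column arrives with the row, while the related object is fetched on first access and then cached on the instance.

```python
class QueryCounter:
    count = 0  # number of simulated SELECTs issued


STATES = {1: "Texas"}  # hypothetical "state" table: id -> name


class State:
    def __init__(self, pk: int, name: str):
        self.id = pk
        self.name = name


class City:
    """Toy stand-in for a model with a ForeignKey named `state`."""

    def __init__(self, pk: int, name: str, state_id: int):
        self.id = pk
        self.name = name
        self.state_id = state_id  # the FK column, loaded with the city row
        self._state_cache = None

    @property
    def state(self) -> State:
        # Related row is fetched lazily on first access, then cached
        if self._state_cache is None:
            QueryCounter.count += 1  # simulated extra query on the state table
            self._state_cache = State(self.state_id, STATES[self.state_id])
        return self._state_cache


city = City(1, "Austin", 1)
print(city.state_id, QueryCounter.count)  # no extra query yet
print(city.state.id, QueryCounter.count)  # one extra query
```

In a real project, Django's `TestCase.assertNumQueries` can confirm the same behavior against the actual models.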

Creating 'taggable' relationships

Say I have two tables that can have 'tags' associated with them, with potentially more in the future:
tracks -
id, title, artist, etc...
artists -
id, name, description, etc...
I want to be able to have a general table called 'tags'
tags -
id, title, description
How would I construct the joining table to create the relationship? Is it possible to set it up so that foreign keys apply to both the artists and tracks tables?
I was thinking of a structure similar to:
tag_relations -
tag_id (foreign key to tags.id), item_id (either artists.id or tracks.id)
Is it bad design to have no foreign key integrity on item_id?
Laravel supports polymorphic relationships, which I believe suit this purpose. You can read about them here:
http://four.laravel.com/docs/eloquent#polymorphic-relations
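To show what the schema side of such a relationship looks like, here is a sketch using SQLite through Python's sqlite3 (Laravel itself is PHP and is not shown). It follows the `*_id` plus `*_type` column-pair naming that Laravel's polymorphic relations use, but the table contents and the 'track'/'artist' type labels are made up. The type column tells you which table taggable_id points to, which is exactly why taggable_id cannot carry a real foreign key; a CHECK constraint at least limits the allowed types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE tracks  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags    (id INTEGER PRIMARY KEY, title TEXT);
-- Polymorphic pivot: tag_id is a real FK, taggable_id is not
CREATE TABLE taggables (
    tag_id        INTEGER NOT NULL REFERENCES tags(id),
    taggable_id   INTEGER NOT NULL,
    taggable_type TEXT    NOT NULL CHECK (taggable_type IN ('track', 'artist')),
    PRIMARY KEY (tag_id, taggable_id, taggable_type)
);
INSERT INTO tracks  VALUES (1, 'Song A');
INSERT INTO artists VALUES (1, 'Band B');
INSERT INTO tags    VALUES (1, 'rock'), (2, 'live');
INSERT INTO taggables VALUES (1, 1, 'track'), (1, 1, 'artist'), (2, 1, 'track');
""")


def tags_for(item_type: str, item_id: int) -> list:
    """Tag titles for one tagged item, identified by (type, id)."""
    rows = conn.execute(
        """SELECT t.title FROM tags t
           JOIN taggables x ON x.tag_id = t.id
           WHERE x.taggable_type = ? AND x.taggable_id = ?
           ORDER BY t.title""",
        (item_type, item_id),
    )
    return [r[0] for r in rows]


print(tags_for("track", 1))
print(tags_for("artist", 1))
```

Note that track 1 and artist 1 share an id; only the type column keeps their tags apart. Deleting a track does not clean up its taggables rows, so that has to happen in application code.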

Model linking to other models based on field values

I'm working on simple ratings and comments apps to add to my project and am looking for advice on creating the models.
Normally, I'd create these database schemas like this:
comment_
id - primary key
type - varchar (buyer_item, buyer_vendor, vendor_buyer)
source_id - int (primary key of the table based on the type)
target_id - int (primary key of the table based on the type)
timestamp - timestamp
subject - varchar
comment - text
rating_
id - primary key
type - varchar (buyer_item, buyer_vendor, vendor_buyer)
source_id - int (primary key of the table based on the type)
target_id - int (primary key of the table based on the type)
timestamp - timestamp
rating - int (the score given, ie: 1-5 stars)
This would let me have simple methods for applying comments or ratings to any type of thing, by setting the proper type and the IDs of who submitted it (source_id) and what it applies to (target_id), like:
add_comment('user_product', user.pk, product.pk, now, subject, comment)
add_comment('user_vendor', user.pk, vendor.pk, now, subject, comment)
I know that in the models you define the relationships to other tables as part of the model. How would I define the relationship in these kinds of tables, where the TYPE field determines which table SOURCE_ID and TARGET_ID link to?
Or should I omit the relationships from the model and set up the joins when I get the QuerySets?
Or should I just trash the whole common-table idea and make a bunch of different tables, one for each relationship (e.g. user_ratings, product_ratings, transaction_ratings, etc.)?
What's the best practice here? My DBA senses say use common tables, but Django newbie me isn't sure what the natives do.
Thanks!
I think what you are looking for is a Generic Relation, and you can find this type of thing in the contenttypes framework: https://docs.djangoproject.com/en/1.0/ref/contrib/contenttypes/#generic-relations
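In the contenttypes framework, a generic relation is a ForeignKey to ContentType plus an object-ID field, wrapped by GenericForeignKey(ct_field, fk_field). For these comment/rating tables you would need two such pairs, one for the source and one for the target, instead of a single combined type string. The framework-free Python sketch below only illustrates that idea; REGISTRY, the type labels, and the Comment class are hypothetical and are not Django's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

# Hypothetical stand-in for model tables: type label -> {pk: object}
REGISTRY: Dict[str, Dict[int, str]] = {
    "user": {1: "alice"},
    "product": {7: "kettle"},
    "vendor": {3: "acme"},
}


@dataclass
class Comment:
    # Each end stores (type, id), like a content type + object id pair
    source_type: str
    source_id: int
    target_type: str
    target_id: int
    timestamp: datetime
    subject: str
    comment: str

    def __post_init__(self) -> None:
        # No database FK can enforce this, so both ends are checked here
        _ = (self.source, self.target)

    @staticmethod
    def _resolve(kind: str, pk: int) -> str:
        return REGISTRY[kind][pk]  # KeyError if the referenced row is missing

    @property
    def source(self) -> str:
        return self._resolve(self.source_type, self.source_id)

    @property
    def target(self) -> str:
        return self._resolve(self.target_type, self.target_id)


c = Comment("user", 1, "product", 7, datetime(2024, 1, 1), "Works", "Boils fast")
print(c.source, "->", c.target)
```

Splitting the type into one label per end keeps the lookup generic: any registered model can be a source or a target without adding buyer_item-style combinations.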