AWS DynamoDB NoSQL: many-to-many with composite keys

I'm getting started with NoSQL databases, so I began a simple dictionary project to practice.
I am working on Amazon Web Services DynamoDB.
My dictionary needs to store words with their language, and their translations.
So in SQL I would have two tables, one for the words, one for the mapping of translations.
1. Many to many
According to Amazon's video (here), to model an N-to-M relationship, we just need to create a table with a composite primary key:
Partition key: the word
Sort key: its translation
And a secondary index in which the PK and SK of the table are swapped:
Partition key: the sort key of the table (the translation)
Sort key: the partition key of the table (the word)
It makes sense.
2. Composite primary key
My words have a language, and it needs to be part of the primary key; otherwise I will have collisions when a user enters a word that exists in two languages. So my word table's primary key looks like this:
Partition key: language
Sort key: word
3. And... The problem
Now I want to apply the N-to-M mapping strategy (1) to my table (2), and here is my problem: my table has a composite key, so I need a way to "merge" my language/word pair, and I don't have a good feeling about my options:
Using a concatenation of language and word is a solution, but I don't think it's OK for the partition key (for the sort key it is, according to the video).
Abandoning the translation table and putting all the translations in an array as a third attribute of the word table. This implies that I duplicate everything and that my queries will only work in one direction.
Creating one table per combination of languages, which doesn't sound very elegant either.
So now I think I have obviously missed something about NoSQL, or that my schema is wrong somewhere. I just need a fresh pair of eyes to spot my mistakes :)

I would design my key to concatenate the language and the word and then follow your approach to create a Global Secondary Index on the translated word. For example:
"en:vie" to represent the word "vie" in English and "fr:vie" to represent the word "vie" in French.
Why do you say that this is not an OK approach?
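To make that concrete, here is a minimal sketch of the concatenated-key idea in Python with boto3. The table name ("Dictionary"), the attribute names ("word", "translation"), the index name ("translation-index"), and the sample items are only illustrative assumptions, not anything taken from the question:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Dictionary")  # assumed table name

# Store one translation pair; both sides embed the language in the key.
table.put_item(Item={"word": "fr:vie", "translation": "en:life"})

# Forward lookup: every translation of the French word "vie".
fr_to_en = table.query(KeyConditionExpression=Key("word").eq("fr:vie"))

# Reverse lookup through the GSI that swaps the two attributes.
en_to_fr = table.query(
    IndexName="translation-index",  # assumed GSI: partition key "translation", sort key "word"
    KeyConditionExpression=Key("translation").eq("en:life"),
)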

Related

How to design a schema in DynamoDB for a reading comprehension quiz application where the data would be heavy?

Please check the UML diagram.
What I want to know is: if there are 30 questions and their options in section 1, 20 questions in section 2, and 30 questions in section 3, how should I store them in the table? The RC passages would have 300-400 words, and with the questions and options it would be around 700-800 words per question.
So should each question have one row in the table, or should I, per test, have different columns for the sections, with all the questions and options saved in JSON format in one column (item, for DynamoDB)?
I would follow these rules for DynamoDB table design:
Definitely keep everything in one table. It's rare for one application to need multiple tables. It is OK to have different items (rows) in DynamoDB represent different kinds of objects.
Start by identifying your access patterns, that is, what are the questions you need to ask of your data? This will determine your choice of partition key, sort key, and indexes.
Try to pick a partition key that will result in object accesses being spread somewhat evenly over your different partitions.
If you will have lots of different tests, with accesses spread somewhat evenly over the tests, then TestID could be a good partition key. You will probably want to pull up all the tests for a given instructor, so you could have a column InstructorID with a global secondary index pointing back to the primary key attributes.
Your sort key could be heterogeneous--it could be different depending on whether the item is a question or a student's answer. For questions, the sort key could be QuestionID, with the content of the question stored as other attributes. For question options it could be QuestionID#OptionID, with something like an OptionDescription attribute for the content of the option. Keep in mind that it's OK to have sparse attributes--not every item needs something populated for every attribute, and it's OK to have attributes that are meaningless for many items. For answers, your sort key could be QuestionID#OptionID#StudentID, with the content of the student's answer stored as a StudentAnswer attribute.
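As a rough illustration of those heterogeneous sort keys, here is a hedged Python/boto3 sketch. The table name ("Quiz"), the generic sort-key attribute name ("SK"), and the sample values are assumptions for illustration, not part of the answer above:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Quiz")  # assumed table: partition key TestID, sort key SK

# A question item: the sort key is just the QuestionID.
table.put_item(Item={
    "TestID": "test-1",
    "SK": "q-17",
    "QuestionText": "Which statement best summarizes the passage?",
})

# An option item: the sort key is QuestionID#OptionID.
table.put_item(Item={
    "TestID": "test-1",
    "SK": "q-17#opt-b",
    "OptionDescription": "The author argues that remote work raises productivity.",
})

# A student's answer: the sort key is QuestionID#OptionID#StudentID.
table.put_item(Item={
    "TestID": "test-1",
    "SK": "q-17#opt-b#student-42",
    "StudentAnswer": "opt-b",
})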
Here is a guide on DynamoDB best practices. For something more digestible, search in YouTube for "aws reinvent dynamo rick houlihan." Rick Houlihan has some good talks about data modeling in DynamoDB. Here are a couple, and one more on data modeling:
https://www.youtube.com/watch?v=6yqfmXiZTlM&list=PL_EDAAla3DXWy4GW_gnmaIs0PFvEklEB7
https://www.youtube.com/watch?v=HaEPXoXVf2k
https://www.youtube.com/watch?v=DIQVJqiSUkE
The better approach is to store each question and its options as a row in the DynamoDB table. I definitely will not suggest the second approach: storing the questions and answers as JSON is not advisable, as the maximum size of a DynamoDB item is 400 KB. In such scenarios, using a document database is much more helpful.
Also try to come up with the types of queries that you will be running. Some typical ones are:
Get all questions in a section by SectionID
Get the details of a Question by Question Id
Get all questions
If you can provide some more information, I can guide you on the data modelling.
Also, I did not see the UML diagram.
The following is my suggestion. Create the DynamoDB table as follows (sketched in code below):
Store each SectionID, question, and its options as a row in the DynamoDB table.
Partition key: SectionID, sort key: QuestionId
Create a GSI on the table with partition key QuestionId and sort key OptionId.
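A minimal Python/boto3 sketch of that table definition, assuming a table name of "QuizQuestions", an index name of "QuestionId-OptionId-index", string-typed keys, and on-demand billing (all of which are illustrative choices, not requirements):

import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="QuizQuestions",  # assumed name
    AttributeDefinitions=[
        {"AttributeName": "SectionID", "AttributeType": "S"},
        {"AttributeName": "QuestionId", "AttributeType": "S"},
        {"AttributeName": "OptionId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "SectionID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "QuestionId", "KeyType": "RANGE"},  # sort key
    ],
    GlobalSecondaryIndexes=[{
        "IndexName": "QuestionId-OptionId-index",  # assumed name
        "KeySchema": [
            {"AttributeName": "QuestionId", "KeyType": "HASH"},
            {"AttributeName": "OptionId", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)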

What should be DynamoDB key schema for time sliding window data get scenario?

What I never understood about DynamoDB is how to design a table to efficiently get all data with one particular field lying in some range. For example, a time range: we would like to get all data created from timestamp1 up to timestamp2. According to the key design, only the sort key can be used for such a purpose. However, that automatically means the partition key would have to be the same for all the data, and according to the documentation that is an anti-pattern of DynamoDB usage. How do I deal with this situation? Could creating an evenly distributed primary key, and then a secondary index whose partition key is the same for all items but whose sort key differs for each of them, be a better solution?
You can use a Global Secondary Index, which in essence is:
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes that are unique.
In other words, in case it is not clear what I meant: you can choose something else that can be unique as the table's primary key, and use a repetitive attribute as the GSI partition key on which you base your query.
NOTE: One of the most common uses of NoSQL databases is storing time series, where you cannot expect to have a unique identifier as the PK unless you include the timestamp.
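For example, here is a hedged boto3 sketch of that idea. The table name ("Events"), the index name ("day-timestamp-index"), and the choice of a coarse "day" bucket as the repeated GSI partition key are assumptions for illustration only:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Events")  # assumed: base table keyed by a unique event ID

# Assumed GSI: partition key "day" (a repeated bucket such as "2016-03-14"),
# sort key "timestamp". The range condition goes on the GSI sort key.
resp = table.query(
    IndexName="day-timestamp-index",
    KeyConditionExpression=Key("day").eq("2016-03-14")
    & Key("timestamp").between(1457913600, 1457956800),
)
items = resp["Items"]

Note that a window spanning several day buckets would take one query per bucket, so the bucket size should roughly match the typical window you expect to query.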

Dynamodb global/local secondary index, searching via category and rating

I have a table of songs in Dynamodb that looks like this:
I wish to return to my app a list of songs filtered by two conditions, "Category" and "UserRating".
At present my hash key is "Artist" and my range key is "Songtitle".
I think that if I made a secondary key on "Category" I could search for all the songs in a particular category, and similarly I could do this for rating, but I don't know how to do this for both.
I also believe I understand the difference between a global and a local index.
So what I am thinking (which is probably not correct) is that I need to create a global secondary index on "Category" and do a query on the attribute "UserRating".
Will this work? And even if this works is this the correct way to be doing it?
Thanks
With a query you can only search on the hash key (now called the partition key) and optionally the range key (now called the sort key). This has to drive your table and index design.
In your case, if you wish to query Category on its own, you'd create a new GSI with Category as the partition key. If you want to search within a Category for songs with a certain rating, you'd create that index with a partition key of Category and a sort key of Rating.
If you need to query by rating alone, then you'd have to create a GSI with rating as the partition key. Bear in mind, however, that you can't do anything like "greater than" or "between" on the partition key: you can only do that on the sort key.
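As a rough illustration, a Category plus UserRating query against such an index could look like the boto3 sketch below. The table name ("Songs"), the index name ("Category-UserRating-index"), and the sample values are assumptions:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Songs")  # assumed table name

# Assumed GSI: partition key "Category", sort key "UserRating" (a Number).
# Equality on the partition key, a range condition on the sort key.
resp = table.query(
    IndexName="Category-UserRating-index",
    KeyConditionExpression=Key("Category").eq("Rock") & Key("UserRating").gte(4),
)
songs = resp["Items"]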
One other factor to consider is expected performance. Amazon advise that partition keys have high cardinality. It is called the partition key because it is the means by which the data is physically organised into partitions. If you have an index with x number of rows across only a few categories, then your data will not be well distributed, which causes a potential performance bottleneck. For non-serious projects this won't be noticeable however.
Hope this helps somewhat.

How to perform a range query over AWS dynamoDB

I have an AWS DynamoDB table storing book information; the hash key is the book ID. There is an attribute for the book price.
Now I want to perform a query that returns all the books whose price is lower than a certain value. How can I do this efficiently, without scanning the whole table?
A query on a secondary index seems to only return the set of entries whose index key equals a certain value, so I am confused about how to perform a range query efficiently. Thank you very much!
There are two things that you may be confusing: the range key, and a range condition on an attribute.
To clarify, in this case you would need a secondary index, and when querying the index you would specify a key condition (assuming Java and a secondary index on the price value; this works in pretty much any SDK-supported language).
See http://docs.amazonaws.cn/en_us/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/services/dynamodbv2/model/QueryRequest.html with a BETWEEN condition.
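The same key-condition idea, sketched with Python/boto3 instead of the Java SDK. The table name ("Books"), the index name ("genre-price-index"), and the "genre" grouping attribute used as the index partition key are assumptions; a key condition still needs an equality on some partition key:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Books")  # assumed table name

# Assumed GSI: partition key "genre", sort key "price" (a Number attribute).
# BETWEEN / "lower than" style conditions are only allowed on the sort key.
resp = table.query(
    IndexName="genre-price-index",
    KeyConditionExpression=Key("genre").eq("fantasy") & Key("price").lt(20),
)
cheap_books = resp["Items"]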
You can't do a query of that kind. DynamoDB is sharded across many nodes by hash key, so doing a query without a hash key (i.e. across all hash keys) is essentially a full scan.
A hack for your case would be to have a hash key with only one value for the whole table, but this is fundamentally wrong because you lose all the advantages of using DynamoDB. See the hot hash key issue for more info: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html

SSMS Replace Unique Text in Constraint

Good day to everyone!
My question is simple to ask, but I haven't been able to solve it.
I've generated a script for a database and its contents; now I would like to compare what I generated from one DB against another.
Using WinMerge, I've had difficulty, since there are items like:
CONSTRAINT [PK__onepk__321403CF014925CB] PRIMARY KEY CLUSTERED
Where one script has 321403CF014925CB and another has 321403CF07820F21.
How can I replace all of these so that each one just becomes
CONSTRAINT [PK__onepk__] PRIMARY KEY CLUSTERED
Of course, there are about a hundred primary keys affected by this.
Can anyone help?
Try this regular expression:
CONSTRAINT \[PK__onepk__([\w^\]]+)\] PRIMARY KEY CLUSTERED
Given this input string:
CONSTRAINT [PK__onepk__321403CF014925CB] PRIMARY KEY CLUSTERED
This part of the expression
([\w^\]]+)
will match
321403CF014925CB
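If it helps, here is a small Python sketch that applies that expression to normalize a script before diffing; running it in Python (rather than in an editor's regex find-and-replace, which would work just as well) is purely an assumption:

import re

# The pattern from the answer; the captured group is the generated hex suffix.
pattern = r"CONSTRAINT \[PK__onepk__([\w^\]]+)\] PRIMARY KEY CLUSTERED"

line = "CONSTRAINT [PK__onepk__321403CF014925CB] PRIMARY KEY CLUSTERED"

# Replace the whole match, dropping the suffix, so both scripts compare equal.
normalized = re.sub(pattern, "CONSTRAINT [PK__onepk__] PRIMARY KEY CLUSTERED", line)
print(normalized)  # CONSTRAINT [PK__onepk__] PRIMARY KEY CLUSTERED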