Relational DB to Single DynamoDB Table

I've read a lot of AWS documentation and watched re:Invent videos. They say that almost any set of relational tables can be stored in a single DynamoDB table (with a few exceptions).
I have the schema below in a relational DB and want to convert it to a single DynamoDB table, but I'm scratching my head over what it should look like.
My use cases are:
Get all attributes by product / item number
Get a specific attribute for a product / item number
Get all items / products by attribute name and attribute value (e.g. get me all the items where size is 45)
Get attribute information by attribute name (e.g. get me details about the Color attribute)

Your use cases are a better fit for a relational database than for a NoSQL database.
A NoSQL database is excellent for storing and retrieving data based on a primary key. For example, "store record #12", or "retrieve record #12". The stored item is a JSON-like document and can contain a lot of information. DynamoDB can also provide predictable performance for such requests, making it ideal for speed-critical applications (e.g. retrieving user profiles in a popular web application).
However, NoSQL is not ideal if you wish to search for data such as "Get me all the Items where size is 45". You can achieve some of this by adding additional indexes, but it can become complex and is not as flexible as a relational database.
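To make the "additional indexes" point concrete, here is a minimal in-memory sketch of how three of the question's use cases might map onto keys. The key formats (`PRODUCT#<id>`, `ATTR#<name>`, `ATTR#<name>#<value>`), the `gsi1pk` / `gsi1sk` attribute names, and the helper functions are all illustrative assumptions; the list comprehensions only stand in for DynamoDB `GetItem` and `Query` calls.

```python
def attribute_item(product_id, name, value):
    """Build one item per (product, attribute) pair; key formats are assumptions."""
    return {
        "pk": f"PRODUCT#{product_id}",        # table partition key
        "sk": f"ATTR#{name}",                 # table sort key
        "gsi1pk": f"ATTR#{name}#{value}",     # hypothetical index partition key
        "gsi1sk": f"PRODUCT#{product_id}",    # hypothetical index sort key
        "value": value,
    }


table = [
    attribute_item("P1", "size", 45),
    attribute_item("P1", "color", "red"),
    attribute_item("P2", "size", 45),
    attribute_item("P3", "size", 42),
]


def attributes_of(items, product_id):
    """Stand-in for a Query on pk = PRODUCT#<id>: all attributes of one product."""
    pk = f"PRODUCT#{product_id}"
    return {i["sk"].split("#", 1)[1]: i["value"] for i in items if i["pk"] == pk}


def one_attribute(items, product_id, name):
    """Stand-in for a GetItem on (pk, sk); returns None when the item is absent."""
    key = (f"PRODUCT#{product_id}", f"ATTR#{name}")
    for i in items:
        if (i["pk"], i["sk"]) == key:
            return i["value"]
    return None


def products_with(items, name, value):
    """Stand-in for a Query on the hypothetical gsi1 index (name and value)."""
    gsi1pk = f"ATTR#{name}#{value}"
    return sorted(i["gsi1sk"].split("#", 1)[1] for i in items if i["gsi1pk"] == gsi1pk)
```

Note that every new query shape (for example, "size greater than 45", or attribute metadata by name) needs yet another key or index designed in advance, which is the complexity and loss of flexibility described above.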
Yes, you can "store" relational tables in a NoSQL database, but you can't access them in the way you desire.
Your examples and your diagram would be better suited for a relational database. I would recommend Amazon RDS for MySQL or PostgreSQL.

Related

Should DynamoDB use a single-table design instead of a multi-table design when the entities are not relational

Let’s assume there are mainly 3 tables for the current database.
Pkey = partition key
Admin
-id(Pkey), username, email, createdAt, UpdatedAt
Banner
-id(Pkey), isActive, createdAt, caption
News
-id(Pkey), createdAt, isActive, title, message
None of the above tables is related to any other table, and more tables will be required in the future (I think most of them won't be related to other tables either).
According to the AWS documentation:
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering the need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?
DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB Table, you should start from the access patterns of your data that would in turn inform the relation (or lack thereof) among them.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
Update: adding an example table design to get you started:
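As one possible illustration (not the answerer's original design), the Admin, Banner, and News entities from the question could share a table as sketched below. The key formats (`ADMIN#<id>`, `METADATA`), the `gsi1pk` / `gsi1sk` attributes, the sample values, and the helper names are all assumptions made for this sketch; `list_by_type` is an in-memory stand-in for a `Query` on a hypothetical index.

```python
def make_item(entity_type, entity_id, created_at, **attrs):
    """Build one item for a shared table; the key formats are illustrative."""
    return {
        "pk": f"{entity_type.upper()}#{entity_id}",  # e.g. BANNER#7
        "sk": "METADATA",                            # single item per entity
        "gsi1pk": entity_type.upper(),               # groups items by entity type
        "gsi1sk": created_at,                        # ISO 8601 strings sort by time
        "createdAt": created_at,
        **attrs,
    }


def list_by_type(items, entity_type):
    """Stand-in for a Query on the hypothetical gsi1 index, oldest first."""
    matches = [i for i in items if i["gsi1pk"] == entity_type.upper()]
    return sorted(matches, key=lambda i: i["gsi1sk"])


items = [
    make_item("admin", "1", "2021-01-05T10:00:00Z",
              username="alice", email="alice@example.com"),
    make_item("banner", "7", "2021-02-01T09:00:00Z",
              isActive=True, caption="Spring sale"),
    make_item("news", "3", "2021-01-20T12:00:00Z",
              isActive=True, title="Launch", message="We are live"),
    make_item("banner", "8", "2021-01-15T09:00:00Z",
              isActive=False, caption="Winter sale"),
]

banner_keys = [i["pk"] for i in list_by_type(items, "banner")]
```

Whether this is worth doing for unrelated entities depends on the access patterns: if each entity is only ever read by its id, separate tables behave the same way and may be simpler to operate.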

NoSQL encourages designing a database based on access patterns. What to do when the patterns change?

NoSQL encourages designing a database around its access patterns, and it can run the queries it was designed for very fast. For other queries, performance is not as good. But for software, change is the norm. So when new requirements come in and we have to add new features, how can NoSQL databases adapt? Or better yet, how can I design NoSQL databases (preferably DynamoDB) so that they can adapt to new features?
The first approach that comes to mind is to design a new table and migrate all the existing data to it. But considering that the table has millions of records, it's probably not very cost-effective.
References:
Rick Houlihan talking about designing a DynamoDB table based on access patterns
DynamoDB design best practices from the AWS documentation
DynamoDB is schema-less, so you can add a new attribute at any time without having to do any backfill or migration. Just make sure your application knows what to do if the attribute is not present.
If you need to query that attribute, you can add a new GSI on the attribute. DynamoDB has an initial quota of 20 GSIs per table, but you can request a quota increase if you need more.
If your new use case isn’t satisfied by a GSI, you can create a new table containing your new attribute(s) to use alongside the existing table. If you need a guarantee of consistency between those tables, you can use DynamoDB transactions to keep them in sync.
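A minimal sketch of the first two points, assuming a hypothetical `genre` attribute added after launch. `build_add_gsi_request` only assembles the keyword arguments for the DynamoDB `UpdateTable` operation; the table name, index name, and default value are made up for illustration.

```python
def genre_of(item, default="unknown"):
    """Read a late-added attribute, tolerating older items that lack it."""
    return item.get("genre", default)


def build_add_gsi_request(table_name, attribute_name, index_name):
    """Assemble UpdateTable arguments that create one GSI on a string attribute.

    Assumes an on-demand table; a provisioned table would also need a
    ProvisionedThroughput entry inside "Create".
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": attribute_name, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": attribute_name, "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


old_item = {"pk": "BOOK#1", "title": "Old record"}
new_item = {"pk": "BOOK#2", "title": "New record", "genre": "fiction"}
request = build_add_gsi_request("Books", "genre", "genre-index")
```

With boto3 installed and credentials configured, the request could then be sent with `boto3.client("dynamodb").update_table(**request)`. Items that lack the indexed attribute simply do not appear in the new index.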
One way to minimize full table migrations when requirements change is to use generic names for the key and index attributes. In DynamoDB, that means naming the partition key pk and the sort key sk, alongside all the other attributes of the item. The values of pk and sk are derived from other attributes. More importantly, we add 5 LSIs when the table is created (LSIs cannot be added later, and 5 is the per-table maximum) and use them as needed. For example, to store data about a book, an item in the table will have the following fields:
pk, sk, ISBN, data_type, author, created_at, ...other data, lsi1, lsi2, lsi3, lsi4, lsi5
The values for the fields:
pk->ISBN, sk->data_type, ISBN->ISBN, ...., lsi1->data_type#created_at , lsi(2-5)->empty
This way, unless the requirements change drastically, the table structure is unlikely to need changing. Note that an index only adds write and storage cost for items that contain its key attribute, so the unused lsi attributes cost nothing until they are populated.
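The book example can be sketched as follows. The function names and sample values are hypothetical. The unused `lsi2` to `lsi5` attributes are omitted rather than stored as empty strings, on the assumption that leaving an index key attribute out is how an item is kept out of that index. `lsi_definitions` builds the `LocalSecondaryIndexes` argument that `CreateTable` expects.

```python
def book_item(isbn, data_type, author, created_at, **other):
    """Build a book item with generic pk/sk names derived from real attributes."""
    item = {
        "pk": isbn,                  # partition key derived from ISBN
        "sk": data_type,             # sort key derived from data_type
        "ISBN": isbn,
        "data_type": data_type,
        "author": author,
        "created_at": created_at,
        **other,
    }
    # Only lsi1 is populated; lsi2..lsi5 are left out until a use case needs them.
    item["lsi1"] = f"{data_type}#{created_at}"
    return item


def lsi_definitions(count=5):
    """Build LocalSecondaryIndexes entries lsi1..lsiN for a CreateTable call."""
    return [
        {
            "IndexName": f"lsi{n}",
            "KeySchema": [
                {"AttributeName": "pk", "KeyType": "HASH"},        # same as table
                {"AttributeName": f"lsi{n}", "KeyType": "RANGE"},  # alternate sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
        for n in range(1, count + 1)
    ]


book = book_item("978-0-00-000000-0", "book", "Jane Doe", "2020-05-01", title="Example")
indexes = lsi_definitions()
```

A later requirement, such as sorting a book's items by author, could then populate `lsi2` on new writes without recreating the table, although existing items would still need to be updated to appear in that index.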

Data Model in DynamoDB

When building a DynamoDB table with AWS Mobile Hub, there is at some point an option to download the data model for the table. As far as I know, this option is not available without Mobile Hub. So the question is: is there a way to get the data model for a table when not using Mobile Hub?
Just to clarify, DynamoDB doesn't have a full data model like an RDBMS. However, a table does have a hash (partition) key, a range (sort) key if one is defined, and all the index details.
You can get this information using the DescribeTable API, which returns its output in JSON format. Kindly look at the link for more information.
Please note that non-key attributes are not included in the data model. This is a basic concept of NoSQL databases, and it is where their flexibility comes from when compared to an RDBMS.
The item structure (non-key attributes) need not be defined while creating the table. In fact, DynamoDB doesn't allow you to define non-key attributes while creating the table.
The non-key attributes in one item need not be the same as those in another item.
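A minimal sketch of reading that information, assuming boto3 is installed and credentials are configured. `summarize_table` and `describe` are hypothetical helper names; the response keys (`Table`, `KeySchema`, `AttributeDefinitions`, `GlobalSecondaryIndexes`, `LocalSecondaryIndexes`) are those of the DescribeTable output. The sample response below is hand-written for illustration, not real output.

```python
def summarize_table(description):
    """Extract the key schema and index names from a DescribeTable response."""
    table = description["Table"]
    keys = {k["KeyType"]: k["AttributeName"] for k in table["KeySchema"]}
    types = {
        a["AttributeName"]: a["AttributeType"]
        for a in table.get("AttributeDefinitions", [])
    }
    return {
        "partition_key": keys.get("HASH"),
        "sort_key": keys.get("RANGE"),   # None when the table has no sort key
        "key_attribute_types": types,    # only key attributes are ever listed
        "gsi": [g["IndexName"] for g in table.get("GlobalSecondaryIndexes", [])],
        "lsi": [l["IndexName"] for l in table.get("LocalSecondaryIndexes", [])],
    }


def describe(table_name):
    """Call DescribeTable for a real table and summarize the result."""
    import boto3  # third-party; imported lazily so the summary works without it
    client = boto3.client("dynamodb")
    return summarize_table(client.describe_table(TableName=table_name))


sample_response = {
    "Table": {
        "TableName": "Orders",
        "KeySchema": [
            {"AttributeName": "customerId", "KeyType": "HASH"},
            {"AttributeName": "orderDate", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "customerId", "AttributeType": "S"},
            {"AttributeName": "orderDate", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexes": [{"IndexName": "status-index"}],
    }
}
summary = summarize_table(sample_response)
```

To see the non-key attributes, you would have to read actual items, since they can differ from one item to the next.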

Amazon Dynamo Table Schema

I have been trying to create a schema for my Android application in Amazon DynamoDB. I have very little experience with NoSQL databases.
I have created a survey-based Android application, and now I have certain tables to store in Amazon DynamoDB.
The tables are the Employees table, the Survey table, the Question table and the Response table.
The Employees table stores information about all the employees; the Survey table holds the names of the surveys and the employees who have taken each survey.
My issue is with the Question table and the Response table. The questions are dynamic and depend on the employee who is taking the survey.
The answers in the Response table depend on the number of questions asked in the survey.
I want to know what my @DynamoDBHashKey, @DynamoDBIndexHashKey and @DynamoDBIndexRangeKey should be in the Question and Response tables, so that I can map questions to responses, and what the @DynamoDBAttribute fields should be in both tables.
A use case could be: an employee of the company posts 12 questions for all the other employees of the company.
The image was taken earlier; later on I added the Survey table as well.
DynamoDB is good at some things and bad at others. It's great at scaling massively and horizontally while maintaining low read/write latency, but it introduces eventual consistency and forces you to make major schema decisions up front, such as what should be in a table, how the data should be partitioned, and what the indexes should be. It demands that you adopt its horizontal scaling model by partitioning your tables into pieces via the partition key.
From the way you have phrased the question, it is clear that you are more comfortable with a relational database. DynamoDB is not a great place to start learning about NoSQL schema(less) design. I would have found it quite unforgiving if it had been my first NoSQL database, especially when trying to model a domain such as the one you describe. It's simply not a great domain modelling environment, full stop; it's all about horizontal scaling and performance.
If you are more comfortable with a relational database, then use a relational database. If you want to try NoSQL, you need to adopt a different mindset when it comes to modelling your domain as persistable entities. For example, you might include closely related child objects within the schema of a parent record. In your example, you might include Questions as children of a Survey and store them in one record, unlike in relational modelling, where you would put them in separate tables.
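As a rough sketch of that embedding idea, a Survey item could carry its questions as a nested list, with each employee's answers stored as a separate item under the same partition key. The key formats (`SURVEY#<id>`, `RESPONSE#<employee>`) and attribute names are assumptions made for illustration, not a recommended schema.

```python
def survey_item(survey_id, name, created_by, questions):
    """Build one Survey item with its questions embedded as a list of maps."""
    return {
        "pk": f"SURVEY#{survey_id}",
        "sk": "SURVEY",
        "name": name,
        "createdBy": created_by,
        "questions": [
            {"id": number, "text": text}
            for number, text in enumerate(questions, start=1)
        ],
    }


def response_item(survey_id, employee_id, answers):
    """Build one Response item per employee, keyed by question id."""
    return {
        "pk": f"SURVEY#{survey_id}",          # same partition as the survey
        "sk": f"RESPONSE#{employee_id}",      # one item per responding employee
        "answers": {str(question_id): answer for question_id, answer in answers.items()},
    }


survey = survey_item("S1", "Workplace feedback", "E42",
                     ["How is your workload?", "Would you recommend us?"])
response = response_item("S1", "E7", {1: "Manageable", 2: "Yes"})
```

Embedding works while a survey's questions stay small enough to fit within DynamoDB's 400 KB item size limit; the 12-question use case from the question fits comfortably.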

A simple file for saving a class of vectors or a SQL database

I have a database made up of sorted data from user activities. I want to keep track of which records belong to which user (like a class holding a vector of numbers for each user). What is the best type of database to use here? Speed is important and the database is very large (9 GB, ~700 million records). The number of users is around 2 million, so I don't think a relational approach in SQL would be a good fit. (The code is in C++.)
I am going to provide an answer now based on our conversation in the comments as I have too much to write in a comment.
First of all, I would use a full RDBMS for this rather than SQLite. The Lite part of the name should serve as an indicator that it isn't trying to be a full strength database. I am just saying this because if SQLite does not perform well enough on your large database, I don't want you to blame it on RDBMS technology, but on the weak database that you are using. Choose PostgreSQL or MySQL as they have better optimizers (you don't have to code it).
Second, your database should provide the ability to join the tables together. It would look something like:
SELECT *
FROM users
JOIN activity ON users.id = activity.user_id
WHERE users.id = ###
That combined with the appropriate indexes should give you what you need.
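As far as indexes go, the primary key on users.id is indexed automatically. You will also want an index on activity.user_id; some databases (such as MySQL's InnoDB) create one automatically for a foreign key, while others (such as PostgreSQL) do not. You can also create a foreign key definition so that the database knows the relationship between the tables and can enforce it. Some databases do not support foreign key constraints, but that is not critical.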
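The join, the index, and the foreign key can be tried out end to end with a small sketch. It uses Python's built-in `sqlite3` module purely because it needs no server; the answer's recommendation of PostgreSQL or MySQL for the real data set still stands. The column names beyond `users.id` and `activity.user_id`, and the sample rows, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE activity (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    value   REAL NOT NULL
);
-- Index on the join column so lookups by user do not scan the whole table.
CREATE INDEX idx_activity_user_id ON activity(user_id);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO activity (user_id, value) VALUES (?, ?)",
                 [(1, 0.5), (2, 1.5), (1, 2.5)])


def activity_for(connection, user_id):
    """Return one user's activity values in insertion order via the join."""
    rows = connection.execute(
        """
        SELECT activity.value
        FROM users
        JOIN activity ON users.id = activity.user_id
        WHERE users.id = ?
        ORDER BY activity.id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The same schema and query carry over to PostgreSQL or MySQL with minor type changes, and the bound parameter takes the place of the `###` placeholder in the query above.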
A relational SQL database can handle this just fine.
Use PostgreSQL.
You can use ODBC from C; that way you can change the database should the need arise.
If your data is not really relational, you can also use Redis.
http://code.google.com/p/credis/
Since it's a sorted set of data, you can even go for a NoSQL or Bigtable-style database. HBase, Hadoop, etc. are open-source options available to you.