Should DynamoDB use a single-table design instead of multiple tables when the entities are not relational?

Let's assume the current database has three main tables (Pkey = partition key):
Admin
- id (Pkey), username, email, createdAt, UpdatedAt
Banner
- id (Pkey), isActive, createdAt, caption
News
- id (Pkey), createdAt, isActive, title, message
None of these tables is related to any other, and more tables will be needed in the future (most of which, I think, will not be related to other tables either).
According to the AWS documentation:
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering the need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?

DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB table, start from the access patterns of your data; these in turn inform the relationships (or lack thereof) among your entities.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
Update, add example table design to get you started:
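The example table from the original answer is not reproduced here. As a stand-in, here is a minimal in-memory sketch of how sort key overloading could place the three entities in one table. The key layout (`pk` holding the entity type, `sk` combining `createdAt` and `id`), the helper names, and the sample data are all assumptions for illustration, not the answerer's design; the right layout depends on your access patterns.

```python
def make_item(entity, item_id, created_at, **attrs):
    """Build an item for a hypothetical single table keyed by pk/sk."""
    return {
        "pk": entity.upper(),             # assumed layout: "NEWS", "BANNER", "ADMIN"
        "sk": f"{created_at}#{item_id}",  # ISO timestamps sort chronologically
        "id": item_id,
        "createdAt": created_at,
        **attrs,
    }


def query(table, pk, sk_prefix=""):
    """Mimic a DynamoDB Query in memory: one partition key, ordered by sort key."""
    matches = [i for i in table if i["pk"] == pk and i["sk"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["sk"])


# Sample data (made up for the sketch).
table = [
    make_item("News", "n2", "2023-02-01T09:30:00Z", isActive=False,
              title="Maintenance", message="Planned downtime"),
    make_item("News", "n1", "2023-01-05T10:00:00Z", isActive=True,
              title="Launch", message="We are live"),
    make_item("Banner", "b1", "2023-01-20T08:00:00Z", isActive=True,
              caption="Welcome"),
    make_item("Admin", "a1", "2022-12-01T12:00:00Z",
              username="root", email="root@example.com"),
]

all_news = query(table, "NEWS")                        # all news, oldest first
january_news = query(table, "NEWS", sk_prefix="2023-01")  # news created in January 2023
```

With this layout each entity type lives in its own partition, which is fine for small administrative data but could concentrate traffic on one partition at high volume.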

Related

Is NoSQL just a marketing buzz on top of RDBMS from Software Design Perspective?

From an architectural perspective: could you please help me understand why NoSQL, and DynamoDB in particular, is so hyped?
DynamoDB supports some of the world’s largest scale applications by
providing consistent, single-digit millisecond response times at any
scale.
I'm trying to critique this in order to understand the WHY part of the question.
With DynamoDB, we always have to specify the partition key and key attributes when retrieving data to get millisecond performance.
Suppose I design an RDBMS where:
- a primary key or alternate key (indexed) must always be specified in the query
- a partition key is used to find out which database my data is stored in
- JOINs are never used
Isn't that the same as a NoSQL architecture, without any marketing buzz around it?
We're moving to DynamoDB anyway, but I'm genuinely curious: there must be a strong reason, something an RDBMS can't do. Let's set aside backup and maintenance advantages.
You are conflating two different things.
The definition of NoSQL
There isn't one, at least not one that can apply in all cases.
In most uses, NoSQL databases don't force your data into the fixed-schema "rows and columns" of a relational database, although modern relational databases, such as Postgres, support data types such as JSONB that would have E. F. Codd spinning in his grave.
DynamoDB is a document database: it is optimized for retrieving and updating single documents based on a unique key, and it does not restrict the fields that those documents contain (other than requiring the ones used for a key).
Distributed Databases
A distributed database stores data on multiple nodes, and is able to perform parallel queries on those nodes and combine the results.
There are distributed SQL databases: Redshift and BigQuery are optimized for queries against large datasets that may include joins, while MySQL (and no doubt others) can run multiple engines and distribute queries between them. It is possible for SQL databases to perform joins, including joins that cross nodes, but such joins generally perform poorly.
DynamoDB distributes items on shards based on their partition key. This makes it very fast for retrieving single items, because the query can be directed to a single shard. It is much less performant when scanning for items that reside on multiple shards.
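To make the routing argument concrete, here is a toy sketch of hash-based sharding. It is not DynamoDB's actual implementation: the partition count, the use of MD5 as the hash function, and the `ShardedStore` class are all assumptions chosen for illustration.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages its partitions internally


def partition_for(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a shard index with a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class ShardedStore:
    """Toy key-value store that spreads items over several in-memory shards."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key, item):
        self.partitions[partition_for(key, len(self.partitions))][key] = item

    def get(self, key):
        """Key lookup: the hash tells us the one shard to ask."""
        return self.partitions[partition_for(key, len(self.partitions))].get(key)

    def scan(self, predicate):
        """Scan: no key to route on, so every shard and every item is visited."""
        return [item
                for shard in self.partitions
                for item in shard.values()
                if predicate(item)]


store = ShardedStore()
for n in range(10):
    store.put(f"user#{n}", {"id": n, "active": n % 2 == 0})

one_user = store.get("user#3")                    # touches a single shard
active_users = store.scan(lambda i: i["active"])  # touches all shards
```

The lookup cost stays constant as shards are added, while the scan grows with the total data size, which is the asymmetry described above.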
As you note in your question, you can implement a sharded document DB on top of a relational database (either using native JSON columns or storing everything in a CLOB that is parsed for each access). But enough other people have done this (including DynamoDB) that it doesn't make sense (to me, at least) to re-implement.

NoSQL encourages designing database based on access patterns. What to do when the patterns change?

NoSQL encourages designing the database around access patterns, and it can perform the queries it was designed for very fast. For other queries, performance is not so good. But in software, change is the norm. So when new requirements come in and we have to add new features, how can NoSQL databases adapt? Or better yet, how can I design NoSQL databases (preferably DynamoDB) so that I can adapt to new feature additions?
The first approach that comes to mind is to design a new table and migrate all the existing data to it. But considering the table has millions of records, that's probably not very cost-effective.
References:
Rick Houlihan talking about designing dynamodb table based on access patterns
Dynamodb design best practices from aws documentation
DynamoDB is schema-less, so you can add a new attribute at any time without having to do any backfill or migration. Just make sure your application knows what to do if the attribute is not present.
If you need to query that attribute, you can add a new GSI on the attribute. DynamoDB has an initial quota of 20 GSIs per table, but you can request a quota increase if you need more.
If your new use case isn’t satisfied by a GSI, you can create a new table containing your new attribute(s) to use alongside the existing table. If you need a guarantee of consistency between those tables, you can use DynamoDB transactions to keep them in sync.
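As a small illustration of "make sure your application knows what to do if the attribute is not present", here is a sketch of a tolerant read. It assumes items arrive as plain Python dicts; the attribute names `nickname` and `username` and the fallback rule are hypothetical.

```python
def display_name(item):
    """Return the newer `nickname` attribute, falling back for older items.

    Items written before `nickname` was introduced simply lack the attribute,
    so no backfill is needed as long as every reader handles its absence.
    """
    return item.get("nickname") or item["username"]


old_item = {"id": "u1", "username": "alice"}                    # written before the change
new_item = {"id": "u2", "username": "bob", "nickname": "Bobby"}  # written after the change

names = [display_name(old_item), display_name(new_item)]
```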
One way to minimize full-table migrations when adapting to changes is to use generic names for the key and index attributes. In DynamoDB, we would use pk as the partition key and sk as the sort key, alongside all the other attributes of the item. The values of pk and sk are derived from other attributes. More importantly, we add 5 LSIs when the table is created and use them as the need arises. For example, to store data about a book, a row in the table will have the following fields:
pk, sk, ISBN, data_type, author, created_at, ...other data, lsi1, lsi2, lsi3, lsi4, lsi5
The values for the fields:
pk->ISBN, sk->data_type, ISBN->ISBN, ...., lsi1->data_type#created_at, lsi(2-5)->empty
This way, unless the requirements change drastically, the table structure is unlikely to change. One thing to note is that an index incurs no computational or storage cost in DynamoDB unless the item being added, deleted, or updated contains an attribute that belongs to that index.
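The mapping above can be sketched as a small item builder. The function name and sample values are assumptions; the sketch also leaves `lsi2` through `lsi5` out of the item rather than storing empty strings, since DynamoDB does not accept empty strings for key attributes and an item without the attribute simply does not appear in that index.

```python
def book_item(isbn, data_type, author, created_at, **other):
    """Build a book item using the generic pk/sk/lsi attribute names."""
    return {
        "pk": isbn,                           # pk -> ISBN
        "sk": data_type,                      # sk -> data_type
        "ISBN": isbn,
        "data_type": data_type,
        "author": author,
        "created_at": created_at,
        **other,
        "lsi1": f"{data_type}#{created_at}",  # lsi1 -> data_type#created_at
        # lsi2..lsi5 are deliberately absent until a new access pattern needs them.
    }


# Sample values are made up for the sketch.
item = book_item(
    "978-0-00-000000-2",
    "book",
    "Jane Doe",
    "2021-06-01T00:00:00Z",
    title="An Example Title",
)
```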

Why do well-designed DynamoDB applications require only one table?

In the Amazon DynamoDB help center I read that
You should maintain as few tables as possible in a DynamoDB
application. Most well designed applications require only one table.
Sorry, but what does this mean? Should I design a database with just ONE table, or should my application (say, in PHP) work with just one table while the database may contain several?
Thank you!
I think the one-table concept means this: if you draw a relational database diagram of your models and associations, all the tables that are connected by associations should be merged, or designed, into a single NoSQL table. If the same relational database has two sets of tables with no association between them, you can group them into two separate NoSQL tables.

Amazon Dynamo Table Schema

I have been trying to create a schema for my Android application in Amazon DynamoDB. I have very little experience with NoSQL databases.
I have created a survey-based Android application, and I now have certain tables to store in Amazon DynamoDB.
The tables are the Employees table, the Survey table, the Question table, and the Response table.
The Employees table stores information about all the employees; the Survey table holds the names of the surveys and the employees who have taken each survey.
My issue is with the Question table and the Response table. The questions are dynamic and depend on the employee who is taking the survey.
The answers in the Answer table depend on the number of questions asked in the survey.
I want to know what my @DynamoDBHashKey, @DynamoDBIndexHashKey, and @DynamoDBIndexRangeKey should be in the Question and Response tables so that I can map questions to responses, and what the @DynamoDBAttribute fields should be in both tables.
An example use case: an employee of the company posts 12 questions for all the other employees of the company.
(The image was taken earlier; I added the Survey table later on.)
DynamoDB is good at some things and bad at others. It's great at scaling massively and horizontally while maintaining low read/write latency, but it introduces eventual consistency and forces you to make major schema decisions up front, such as what should be in a table, how the data should be partitioned, and what the indexes should be. It demands that you adopt its horizontal scaling model by partitioning your tables into pieces via the partition key.
From the way you have phrased the question, it is clear that you are more comfortable with a relational database. DynamoDB is not a great place to start learning about NoSQL schema(less) design. I would have found it quite unforgiving if it had been my first NoSQL database, especially when trying to model a domain such as the one you describe. It's simply not a great domain modelling environment, full stop. It's all about horizontal scaling and performance.
If you are more comfortable in a relational database, then use a relational database. If you want to try NoSQL it is necessary to adopt a different mindset when it comes to modelling your domain into persist-able entities. For example, you might include closely related child objects within the schema of a parent record - in your example you might include Questions as children of a Survey and store them in one record, unlike in relational modelling where you would put these in separate tables.
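A minimal sketch of that embedding idea, applied to the 12-question use case from the question. The attribute names, ID formats, and the choice to key responses by survey and employee are assumptions for illustration only.

```python
# One Survey item embeds its questions as a list of child maps,
# instead of keeping them in a separate Question table.
survey = {
    "survey_id": "survey-001",
    "created_by": "emp-42",
    "questions": [
        {"question_id": f"q{n}", "text": f"Question {n}?"}
        for n in range(1, 13)  # the 12 questions from the use case
    ],
}

# One Response item per (survey, employee); answers are keyed by question_id.
response = {
    "survey_id": "survey-001",  # assumed partition key
    "employee_id": "emp-07",    # assumed sort key
    "answers": {"q1": "Yes", "q2": "No"},
}


def unanswered(survey_item, response_item):
    """List the question_ids this employee has not answered yet."""
    answered = response_item["answers"]
    return [q["question_id"]
            for q in survey_item["questions"]
            if q["question_id"] not in answered]


remaining = unanswered(survey, response)
```

Embedding works while the parent item stays small; DynamoDB caps a single item at 400 KB, so a survey with very many or very large questions would need a different layout.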

DynamoDb table design: Single table or multiple tables

I'm quite new to NoSQL and DynamoDB, and I'm used to RDBMS. I'm designing the database for a game, and we're using DynamoDB and AWS Lambda for our backend. I created a table named "Users" for player profiles, which contains user information and resources. Because the game has an inventory system, I also created a table named "UserItems".
It was all good until I realized that DynamoDB doesn't have transactions, so any operation that touches both tables (for example, using an item that increases a resource) may fail on one table while succeeding on the other, causing missing data that affects our customers.
So I was thinking that maybe my multiple-table design is not good, since designing multiple tables is a habit from working with RDBMS. That led me to consider storing the entire "UserItems" as a hash in "Users", but I'm not sure this is good practice: a single row in the Users table will be really big (we may have 500 unique items per user), and each time I read or write "Users" (most of the time without needing the "UserItems" data) the read/write throughput consumed will also be really large.
What should I do: keep the multiple-table design and handle transactions manually, or switch to a single-table design? Or is there a third option?
Updated: more information about my use case
Currently I have 2 tables
Users: UserId (key), Username, Gold
UserItems: UserId (partition key), ItemId (sort key), Name, GoldValue
Scenarios:
User buys an item: Users.Gold is deducted, and a new UserItem is added to the UserItems table.
User sells an item: Users.Gold is increased, and the item is deleted from the UserItems table.
In both scenarios I have to perform two update operations on two tables, and without transactions there is a chance that one of them fails.
To solve that, I'm considering a single-table solution: a single Users table with four columns, UserId (key), Username, Gold, and UserItems. However, there are two things I'm worried about:
1. The data in UserItems might become too big for a single cell, because one user could have up to 500 items.
2. To add or delete an item, I have to pull UserItems from DynamoDB, add or delete the item, and then put it back into Users. So I have to do 1 read and 1 write operation for 1 action, and because of issue (1) the read/write data size could become really big.
FWIW, the AWS documentation on NoSQL Design for DynamoDB suggests using a single table:
As a general rule, you should maintain as few tables as possible in a
DynamoDB application. As emphasized earlier, most well designed
applications require only one table, unless there is a specific reason
for using multiple tables.
Exceptions are cases where high-volume time series data are involved,
or datasets that have very different access patterns—but these are
exceptions. A single table with inverted indexes can usually enable
simple queries to create and retrieve the complex hierarchical data
structures required by your application.
A NoSQL database is best suited to non-transactional data. If you bring normalization (splitting your data into multiple tables) into NoSQL, you defeat its whole purpose. If performance is what matters most, you should consider having only a single table for your use case. DynamoDB supports range keys and also secondary indexes. For your use case, it would be better to redesign your table to use range keys.
If you can share more details about your current table, maybe I can help with more input.
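As a sketch of the range-key redesign applied to the "user buys an item" scenario: a user's profile and items can share one partition, and DynamoDB's transactional write API (`TransactWriteItems`) can change both atomically. The key names `PK`/`SK`, the `USER#`/`ITEM#`/`PROFILE` value formats, the table name `Game`, and the helper function are assumptions; the function only builds the request and does not call AWS.

```python
def buy_item_request(table_name, user_id, item_id, name, gold_value):
    """Build TransactWriteItems parameters for buying an item.

    Assumed single-table layout: the profile and each owned item share the
    partition key "USER#<id>" and are told apart by the sort key.
    """
    pk = {"S": f"USER#{user_id}"}
    price = {"N": str(gold_value)}  # the low-level API sends numbers as strings
    return {
        "TransactItems": [
            {   # 1) Deduct gold, but only if the user can afford the item.
                "Update": {
                    "TableName": table_name,
                    "Key": {"PK": pk, "SK": {"S": "PROFILE"}},
                    "UpdateExpression": "SET Gold = Gold - :price",
                    "ConditionExpression": "Gold >= :price",
                    "ExpressionAttributeValues": {":price": price},
                }
            },
            {   # 2) Add the item, but only if the user does not already own it.
                "Put": {
                    "TableName": table_name,
                    "Item": {
                        "PK": pk,
                        "SK": {"S": f"ITEM#{item_id}"},
                        "Name": {"S": name},
                        "GoldValue": price,
                    },
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
        ]
    }


request = buy_item_request("Game", "u1", "sword-01", "Iron Sword", 150)
# With credentials configured, this could be sent with boto3:
#   boto3.client("dynamodb").transact_write_items(**request)
```

Either both writes succeed or neither does, so gold cannot be deducted without the item being added. Because each item is its own row under the user's partition, adding or removing one item does not require reading or rewriting the other 499.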