I have been trying to create a Schema for my Android application in Amazon Dynamo DB. I have very less experience with NoSQL Databases.
I have created a Survey based Android Application, now I have certain tables to store in Amazon Dynamo DB.
The Tables are Employees table, Survey table, Question table and the Response table.
The Employees table stores the information for all the employees, the survey table holds the name of the surveys and the employees, who has taken the survey.
My Issue is with the Question table and the Response table. The Questions are dynamic and are based on the employees who is taking the survey.
The answers in the Answer table depends on the number of Questions asked in the survey.
I want to know what should be my #DynamoDBHashKey, #DynamoDBIndexHashKey, #DynamoDBIndexRangeKey in the Question and the Response table, so that I can map question to respones and what should be the #DynamoDBAttribute in both the tables.
The Use Case can be: An Employee of the company posted 12 questions for all the other employees of the company.
Image was taken earlier, later on I added survey table as well
DynamoDB is good at some things and bad at others. Its great at scaling massively and horizontally while maintaining low read/write latency, but introduces eventual consistency, and forces you to make major schema decisions up front, such as what should be in a table, and how should data be partitioned, and what should be the indexes. It demands that you adopt its horizontal scaling model by partitioning your tables into pieces via the partition key.
From the way you have phrased the question it is clear that you are more comfortable in a relational database. DynamoDB is not a great place to start learning about NoSQL schema(less) design - I would have found it quite unforgiving if it was my first NoSQL database, especially trying to model a domain such as the one you describe. Its simply not a great domain modelling environment full stop - its all about horizontal scaling and performance.
If you are more comfortable in a relational database, then use a relational database. If you want to try NoSQL it is necessary to adopt a different mindset when it comes to modelling your domain into persist-able entities. For example, you might include closely related child objects within the schema of a parent record - in your example you might include Questions as children of a Survey and store them in one record, unlike in relational modelling where you would put these in separate tables.
Related
I preface all of this to say I’m still actively learning DynamoDB, and I think an answer to my question will help me understand a few things.
I have an analytics microservice that I’m pushing custom (internal) analytics events into a DynamoDB table. Columns in our Dynamo rows/items include data like:
User ID
IP Address
Event Action
Timestamp
Split Test ID
Split Test Value
One of the main questions we want to pull from this db is:
"How many users saw split test x with values y?"
I’m struggling to understand how I should index my database to account for this kind of requests? I set up a “Keys Only” index targeting Split Test ID, and the query to gather these are fairly efficient, but it only pulls UserID and Split Test ID. Ideally I want an efficient query that returns multiple other associated values as well…
How do I achieve this? Do I need to be doing something much differently? Additionally, if any of my understanding of Dynamo, based on my explanations, sounds completely lacking in some regard, please point me in the right direction!
You're thinking of DynamoDB as a schema-less database, which it obviously is. However, that does not mean that a schema is not important. Schemas in NoSQL databases are usually more important than they are in SQL databases and they are usually less straightforward.
The most important thing to determine how you will store your data is how you will access it. You will have to take into account all the ways that you will want to access your data and ensure it is possible by creating the necessary data columns and necessary indexes. In this case, if you want to know how many times two values are combined in a certain way, you could easily add a column that has these combined values (e.g., splitId#splitValue ) and use that in your indexes.
If you want to know more about advanced patterns and such, I advise you to watch this pretty famous re:invent talk by Rick Houlihan or to read the DynamoDB book.
As a last note, I want to add that switching to a SQL server usually is not the solution. Picking NoSQL over SQL is usually based on non-functional requirements. There is a reason NoSQL databases are used in applications that require very low-latency retrieval of data in huge datasets, but as with everything, trade-offs are the name of the game.
Let’s assume there are mainly 3 tables for the current database.
Pkey = partition key
Admin
-id(Pkey), username, email, createdAt,UpdatedAt
Banner
-id(Pkey), isActive, createdAt, caption
News
-id(Pkey), createdAt, isActive, title, message
None of the above tables have relation with other tables, and more tables will be required in the future(I think most of it also don’t have the relation with other tables).
According to the aws document
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering the need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?
DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB Table, you should start from the access patterns of your data that would in turn inform the relation (or lack thereof) among them.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
Update, add example table design to get you started:
NoSQL encourages designing database based on access patterns and it can perform those queries it was designed for very fast. For other queries, the performance is not so good. But for software, change is the norm. So when new requirements come in and we have to add new features, how can nosql databases adapt? Or better yet, how can I design nosql databases(preferably dynamodb) that will allow me to adapt to new feature additions.
The first approach that comes to my mind will be to design a new table and migrate all the previous data to the new table. But considering the table has millions of records, its probably not very cost effective
References:
Rick Houlihan talking about designing dynamodb table based on access patterns
Dynamodb design best practices from aws documentation
DynamoDB is schema-less, so you can add a new attribute at any time without having to do any backfill or migration. Just make sure your application knows what to do if the attribute is not present.
If you need to query that attribute, you can add a new GSI on the attribute. DynamoDB has an initial quota of 20 GSIs per table, but you can request a quota increase if you need more.
If your new use case isn’t satisfied by a GSI, you can create a new table containing your new attribute(s) to use alongside the existing table. If you need a guarantee of consistency between those tables, you can use DynamoDB transactions to keep them in sync.
One way to minimize full table migrations in order to adapt to new changes would be to use generic names for indexes. In the case of dynamodb, we would have pk as partition_key and sk for sort_key as well as all the attributes of the item. The value of pk and sk will actually be a derived value from other attributes. More importantly, we will add 5 LSIs during table creation and use them when necessary. For example, to store data about a book, a row in the table will have the following fields:
pk, sk, ISBN, data_type, author, created_at, ...other data, lsi1, lsi2, lsi3, lsi4, lsi5
The values for the fields:
pk->ISBN, sk->data_type, ISBN->ISBN, ...., lsi1->data_type#created_at , lsi(2-5)->empty
This way, unless there are drastic changes in the requirements the table structure of our table is unlikely to change. One thing to note here is that unless an item that is added, deleted or updated contains an attribute that belongs to an index, no computational or storage cost is incurred in dynamodb.
On Amazon DynamoDB help center I've read that
You should maintain as few tables as possible in a DynamoDB
application. Most well designed applications require only one table.
Sorry guys, but what does it mean? Whether should I design a database with just ONE table, or should I work with just one table in my (let it be php) application (but a database may contain several tables)?
Thank you!
I think this One Table concept means if you draw a relational database diagram of your models and associations, then all associated tables that connected should be able to be merged, or designed into one single NoSQL table. If you got two sets of tables in the same relational database that have no association between them, then you can group them into two separate NoSQL tables.
I’m quite new to NoSQL and DynamoDB and I used to RDBMS. I’m designing database for a game and we're using DynamoDB and AWS Lambda for our backend. I created a table name “Users” for player profile that contains the user information and resources. Because the game has inventory system I also created a table name “UserItems”.
It’s all good until I realized DynamoDB don’t have transaction and any operation that is executed on both table (for example using an item that increase resource) has a chance of failure on one table while success on other and will cause missing data which affect our customers.
So I was thinking maybe my multiple tables design is not good since it’s a habit of me to design multiple table when I’m working with RDBMS. Which let me to think of storing the entire “UserItems” as hash in “Users” but I’m not sure this is a good practice because the size of a single row in Users table will be really big (we may have 500 unique items per users) and each time I pull or put data from/to “Users” (most of the time don’t need “UserItems” data) the read/write throughput will be also really large.
What should I do, keep the multiple tables design and handle transaction manually or switch to single table design? Or maybe there is a 3rd option?
Updated: more information about my use case
Currently I have 2 tables
Users: UserId (key), Username, Gold
UserItems: UserId (partition key), ItemId (sort key), Name, GoldValue
Scenarios:
User buy an item: Users.Gold will be deduced, new UserItem will be add to UserItems table.
User sell an item: Users.Gold will be increased, the Item will be deleted from UserItems table.
In both scenarios above I will have to do 2 update operation for 2 tables which without transaction there is a chance one of them failed.
To solve that I consider using single table solution which is a single Users table with 4 columns UserId(key), Username, Gold, UserItems. However there are two things I'm worried about:
Data in UserItems might be come to big for a single cell because one user could have up to 500 items.
To add/delete item I have to pull the UserItems from dynamodb, add/delete item and then put it back into Users. So I have to do 1 read and 1 write operation for 1 action. And because of issue (1) the read/write data size could become really big.
FWIW, the AWS documentation on NoSQL Design for DynamoDB suggests to use a single table:
As a general rule, you should maintain as few tables as possible in a
DynamoDB application. As emphasized earlier, most well designed
applications require only one table, unless there is a specific reason
for using multiple tables.
Exceptions are cases where high-volume time series data are involved,
or datasets that have very different access patterns—but these are
exceptions. A single table with inverted indexes can usually enable
simple queries to create and retrieve the complex hierarchical data
structures required by your application.
NoSql database is best suited for non-trasactional data. If you bring normalization(splitting your data into multiple tables) into noSQL, then you are beating the whole purpose of it. If performance is what matters most, then you should consider only having a single table for your use case. DynamoDB supports Range Keys, and also supports Secondary Indices. For your usecase, it would be better to redesign your table to use Range Keys.
If you can share more details about your current table, maybe i can help you with more inputs.