NoSQL encourages designing database based on access patterns. What to do when the patterns change?

NoSQL encourages designing a database around its access patterns, and it can serve the queries it was designed for very quickly. Other queries perform poorly. But in software, change is the norm. So when new requirements come in and we have to add new features, how can NoSQL databases adapt? Or better yet, how can I design a NoSQL database (preferably DynamoDB) so that I can adapt to new features?
The first approach that comes to mind is to design a new table and migrate all the existing data to it. But given that the table has millions of records, that's probably not very cost-effective.
References:
Rick Houlihan talking about designing DynamoDB tables based on access patterns
DynamoDB design best practices from the AWS documentation

DynamoDB is schema-less, so you can add a new attribute at any time without having to do any backfill or migration. Just make sure your application knows what to do if the attribute is not present.
If you need to query that attribute, you can add a new GSI on the attribute. DynamoDB has an initial quota of 20 GSIs per table, but you can request a quota increase if you need more.
If your new use case isn’t satisfied by a GSI, you can create a new table containing your new attribute(s) to use alongside the existing table. If you need a guarantee of consistency between those tables, you can use DynamoDB transactions to keep them in sync.
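As a rough illustration of the first two points, here is a minimal Python sketch. It only builds plain dictionaries and never calls AWS. The table name `Books`, the attribute `genre`, and the index name `genre-index` are hypothetical. The request is intended to follow the shape of the `UpdateTable` parameters accepted by boto3's `update_table`, assuming an on-demand table (a provisioned table would also need `ProvisionedThroughput` in the index definition).

```python
from typing import Any, Dict


def read_genre(item: Dict[str, Any], default: str = "unknown") -> str:
    """Return the newer 'genre' attribute, tolerating older items without it."""
    return item.get("genre", default)


def build_gsi_create_request(table: str, attribute: str, index_name: str) -> Dict[str, Any]:
    """Build keyword arguments for an UpdateTable call that adds a GSI.

    Assumes a string-typed attribute and an on-demand table.
    """
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": attribute, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [{"AttributeName": attribute, "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


# Hypothetical items: one written before 'genre' existed, one after.
old_item = {"pk": "978-0", "title": "Old book"}
new_item = {"pk": "978-1", "title": "New book", "genre": "fiction"}

request = build_gsi_create_request("Books", "genre", "genre-index")
```

Older items that lack `genre` simply do not appear in the new index, which is why the application still needs a default like the one in `read_genre`.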

One way to minimize full table migrations when adapting to changes is to use generic names for the key and index attributes. In DynamoDB, we would name the partition key pk and the sort key sk, and store them alongside all the other attributes of the item. The values of pk and sk are derived from other attributes. More importantly, we add 5 LSIs at table creation (LSIs cannot be added later, and 5 is the per-table maximum) and start using them when necessary. For example, to store data about a book, an item in the table will have the following fields:
pk, sk, ISBN, data_type, author, created_at, ...other data, lsi1, lsi2, lsi3, lsi4, lsi5
The values for the fields:
pk->ISBN, sk->data_type, ISBN->ISBN, ..., lsi1->data_type#created_at, lsi(2-5)->empty
This way, unless the requirements change drastically, the table structure is unlikely to change. One thing to note here is that an item is only written to an index if it contains that index's key attribute, so the unused index attributes incur no extra write or storage cost in DynamoDB.
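The field mapping above could be sketched in Python as follows. This is only an illustration: the function name `build_book_item` and the sample values are made up, and it assumes that "empty" means the unused lsi2 to lsi5 attributes are left out of the item, since DynamoDB does not accept empty strings for key attributes.

```python
from typing import Dict


def build_book_item(isbn: str, data_type: str, author: str, created_at: str) -> Dict[str, str]:
    """Derive the generic key and index attributes from the item's own data."""
    return {
        "pk": isbn,                              # partition key derived from ISBN
        "sk": data_type,                         # sort key derived from data_type
        "ISBN": isbn,
        "data_type": data_type,
        "author": author,
        "created_at": created_at,
        "lsi1": f"{data_type}#{created_at}",     # first LSI sort key
        # lsi2..lsi5 are deliberately omitted until a new access pattern needs them
    }


# Hypothetical sample values.
item = build_book_item("9780000000001", "book", "Jane Doe", "2021-05-01T00:00:00Z")
```

When a new access pattern appears, only this derivation function needs to start populating, say, `lsi2`; existing items would still need that attribute written to them before they show up in the index.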

Related

Any downside of having SortKey for GSI in dynamoDB

I want to create a DynamoDB table with a GSI. For this GSI, I currently need only a partition key because I want to query by only one attribute. But in the future I may need to query by other attributes. So I am thinking of adding a sort key in case I need to query by another attribute later. For now it can be empty or can have the same value as the PK.
In GSI, are there any drawbacks of adding SortKey, if I am not planning to use it in foreseeable future? Thank you.
The (probably very) minor downsides of this approach are:
marginally higher storage costs, and
the added developer overhead of having to provide the (currently meaningless) index SK as part of the primary key in CRUD operations.

Should Dynamodb apply single table design instead of multiple table design when the entities are not relational

Let’s assume there are mainly 3 tables for the current database.
Pkey = partition key
Admin
-id(Pkey), username, email, createdAt,UpdatedAt
Banner
-id(Pkey), isActive, createdAt, caption
News
-id(Pkey), createdAt, isActive, title, message
None of the above tables is related to any other table, and more tables will be required in the future (I think most of those won't be related to other tables either).
According to the AWS documentation:
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering the need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?
DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB Table, you should start from the access patterns of your data that would in turn inform the relation (or lack thereof) among them.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
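To make the composite sort key idea concrete, here is a small Python sketch for the Admin, Banner and News entities from the question. It is one possible layout, not a recommendation: the attribute names `pk` and `sk`, the function `to_single_table_item`, and the assumed access pattern ("list all items of one type in creation order") are all hypothetical. It also places every item of a type in the same partition, which is only reasonable for small datasets.

```python
from typing import Any, Dict, List


def to_single_table_item(entity_type: str, attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Map an entity to a shared table using a composite sort key."""
    item = dict(attrs)
    item["pk"] = entity_type.upper()                      # e.g. "NEWS"
    item["sk"] = f"{attrs['createdAt']}#{attrs['id']}"    # sorts by creation date
    return item


# Hypothetical sample data.
items: List[Dict[str, Any]] = [
    to_single_table_item("News", {"id": "n2", "createdAt": "2021-03-01", "isActive": True, "title": "Second"}),
    to_single_table_item("News", {"id": "n1", "createdAt": "2021-01-15", "isActive": True, "title": "First"}),
    to_single_table_item("Banner", {"id": "b1", "createdAt": "2021-02-01", "isActive": False, "caption": "Sale"}),
]

# What a query on pk = "NEWS" would return, in sort key order.
news_by_date = sorted((i for i in items if i["pk"] == "NEWS"), key=lambda i: i["sk"])
```

If the real access patterns turn out to be "get one item by id" only, a plain partition key per entity would be enough and the composite sort key adds nothing.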
Update, add example table design to get you started:

Amazon Dynamo Table Schema

I have been trying to create a schema for my Android application in Amazon DynamoDB. I have very little experience with NoSQL databases.
I have created a survey-based Android application, and now I have certain tables to store in Amazon DynamoDB.
The tables are the Employees table, Survey table, Question table and Response table.
The Employees table stores the information for all the employees; the Survey table holds the names of the surveys and the employees who have taken each survey.
My issue is with the Question table and the Response table. The questions are dynamic and depend on the employee who is taking the survey.
The answers in the Response table depend on the number of questions asked in the survey.
I want to know what my @DynamoDBHashKey, @DynamoDBIndexHashKey and @DynamoDBIndexRangeKey should be in the Question and Response tables, so that I can map questions to responses, and what the @DynamoDBAttribute fields should be in both tables.
The use case can be: an employee of the company posts 12 questions for all the other employees of the company.
Image was taken earlier; later on I added the Survey table as well.
DynamoDB is good at some things and bad at others. It's great at scaling massively and horizontally while maintaining low read/write latency, but it introduces eventual consistency and forces you to make major schema decisions up front, such as what should be in a table, how the data should be partitioned, and what the indexes should be. It demands that you adopt its horizontal scaling model by partitioning your tables into pieces via the partition key.
From the way you have phrased the question it is clear that you are more comfortable in a relational database. DynamoDB is not a great place to start learning about NoSQL schema(less) design - I would have found it quite unforgiving if it had been my first NoSQL database, especially trying to model a domain such as the one you describe. It's simply not a great domain modelling environment, full stop - it's all about horizontal scaling and performance.
If you are more comfortable in a relational database, then use a relational database. If you want to try NoSQL it is necessary to adopt a different mindset when it comes to modelling your domain into persist-able entities. For example, you might include closely related child objects within the schema of a parent record - in your example you might include Questions as children of a Survey and store them in one record, unlike in relational modelling where you would put these in separate tables.
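A rough Python sketch of that embedding idea follows. All attribute names (`surveyId`, `questions`, `answers` and so on) and helper functions are hypothetical. It assumes the questions of one survey stay small enough to fit in a single record (a DynamoDB item is limited to 400 KB) and that responses are stored as separate records keyed by survey and employee.

```python
from typing import Any, Dict


def new_survey(survey_id: str, created_by: str) -> Dict[str, Any]:
    """Create a survey record that will hold its questions as children."""
    return {"surveyId": survey_id, "createdBy": created_by, "questions": []}


def add_question(survey: Dict[str, Any], text: str) -> str:
    """Append a question to the survey record and return its generated id."""
    question_id = f"q{len(survey['questions']) + 1}"
    survey["questions"].append({"questionId": question_id, "text": text})
    return question_id


def new_response(survey_id: str, employee_id: str, answers: Dict[str, str]) -> Dict[str, Any]:
    """One record per (survey, employee), mapping question ids to answers."""
    return {"surveyId": survey_id, "employeeId": employee_id, "answers": dict(answers)}


# Hypothetical sample data.
survey = new_survey("survey-1", "emp-7")
q1 = add_question(survey, "How satisfied are you with your team?")
q2 = add_question(survey, "Would you recommend working here?")
response = new_response("survey-1", "emp-9", {q1: "Very", q2: "Yes"})
```

With this shape, reading a survey and all of its questions is a single lookup, and the question ids inside the survey record are what tie each answer back to its question.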

DynamoDb table design: Single table or multiple tables

I’m quite new to NoSQL and DynamoDB, and I’m used to RDBMS. I’m designing a database for a game, and we're using DynamoDB and AWS Lambda for our backend. I created a table named “Users” for the player profile, which contains the user information and resources. Because the game has an inventory system, I also created a table named “UserItems”.
It was all good until I realized DynamoDB doesn’t have transactions, so any operation that touches both tables (for example, using an item that increases a resource) could fail on one table while succeeding on the other, causing missing data that affects our customers.
So I was thinking maybe my multiple-table design is not good, since designing multiple tables is a habit from working with RDBMS. That led me to think of storing the entire “UserItems” as a hash in “Users”, but I’m not sure this is good practice, because a single row in the Users table will be really big (we may have 500 unique items per user), and each time I read or write “Users” (most of the time I don’t need the “UserItems” data) the consumed read/write throughput will also be really large.
What should I do: keep the multiple-table design and handle transactions manually, or switch to a single-table design? Or maybe there is a third option?
Updated: more information about my use case
Currently I have 2 tables
Users: UserId (key), Username, Gold
UserItems: UserId (partition key), ItemId (sort key), Name, GoldValue
Scenarios:
User buys an item: Users.Gold will be deducted, and a new UserItem will be added to the UserItems table.
User sells an item: Users.Gold will be increased, and the item will be deleted from the UserItems table.
In both scenarios above I have to do 2 update operations on 2 tables, and without transactions there is a chance that one of them fails.
To solve that, I am considering a single-table solution: a single Users table with 4 columns, UserId (key), Username, Gold, UserItems. However, there are two things I'm worried about:
1. The data in UserItems might become too big for a single cell, because one user could have up to 500 items.
2. To add or delete an item I have to pull UserItems from DynamoDB, add or delete the item, and then put it back into Users. So I have to do 1 read and 1 write operation for 1 action. And because of issue (1), the read/write data size could become really big.
FWIW, the AWS documentation on NoSQL Design for DynamoDB suggests using a single table:
As a general rule, you should maintain as few tables as possible in a
DynamoDB application. As emphasized earlier, most well designed
applications require only one table, unless there is a specific reason
for using multiple tables.
Exceptions are cases where high-volume time series data are involved,
or datasets that have very different access patterns—but these are
exceptions. A single table with inverted indexes can usually enable
simple queries to create and retrieve the complex hierarchical data
structures required by your application.
A NoSQL database is best suited to non-transactional data. If you bring normalization (splitting your data into multiple tables) into NoSQL, you are defeating the whole purpose of it. If performance is what matters most, then you should consider having only a single table for your use case. DynamoDB supports range keys and also supports secondary indexes. For your use case, it would be better to redesign your table to use range keys.
If you can share more details about your current table, maybe I can help you with more input.
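As a sketch of how the "buy an item" scenario might look with range keys, the Python below builds the item list for a transactional write. It never calls AWS. It assumes a single hypothetical table `Game` with generic `pk`/`sk` keys, where the profile lives under `sk = "PROFILE"` and each item under `sk = "ITEM#<id>"`. DynamoDB has since added transactions (they are mentioned near the top of this page), and the dictionaries are intended to follow the `TransactItems` shape of the `TransactWriteItems` API as exposed by boto3's low-level client.

```python
from typing import Any, Dict, List


def build_buy_item_transaction(
    table: str, user_id: str, item_id: str, name: str, gold_value: int
) -> List[Dict[str, Any]]:
    """Deduct gold and add the item in one all-or-nothing write."""
    user_pk = {"S": f"USER#{user_id}"}
    return [
        {
            "Update": {
                "TableName": table,
                "Key": {"pk": user_pk, "sk": {"S": "PROFILE"}},
                "UpdateExpression": "SET Gold = Gold - :price",
                # Reject the whole transaction if the user cannot afford it.
                "ConditionExpression": "Gold >= :price",
                "ExpressionAttributeValues": {":price": {"N": str(gold_value)}},
            }
        },
        {
            "Put": {
                "TableName": table,
                "Item": {
                    "pk": user_pk,
                    "sk": {"S": f"ITEM#{item_id}"},
                    "Name": {"S": name},
                    "GoldValue": {"N": str(gold_value)},
                },
                # Reject if the user already owns this item.
                "ConditionExpression": "attribute_not_exists(sk)",
            }
        },
    ]


# Hypothetical sample purchase.
transact_items = build_buy_item_transaction("Game", "u1", "sword", "Sword", 50)
```

Because the profile and the items share a partition key but have different sort keys, reading the profile alone does not pull the 500 items, and adding or removing one item never rewrites the others.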

DynamoDB dynamic schema

I'd like to use AWS DynamoDB as a datastore for a data-collection application, where the data schema may vary over time.
For example, initially an Item may represent attributes of people e.g. {name, age}. However, later the schema may be modified to contain {name, age, gender}.
Each schema modification will be tracked and versioned and older data won't need to be migrated - but it may still need to be queried alongside newer data.
Is it an acceptable pattern to store each data-schema change in its own table? Is there a straightforward mechanism to query aggregated data across tables?
Schemas for DynamoDB tables are dynamic in nature. The only thing that needs to be set up front is the key name and type. You can also add global indexes (indexes with a different partition key) at any time. Local indexes, however (those with the same partition key but a different sort key), can only be added at table creation time. Because of this dynamic schema, you can start or stop writing new fields at any time.
You need to design tables knowing how you will query them. Queries are quite restricted: you can filter, but that's not a fast or cheap operation. Fast queries rely on existing indexes. A query can only fetch from a single table. Joins and unions aren't available.
A table scan takes no key criteria; only filters are available. With filters, the data is still fetched from disk and is only then removed from the returned set. That makes a scan an expensive operation in both cost and time. Queries that pass a key are faster because they fetch data from a single partition. So you might want to design a key with both a partition key (userId, for instance) and a sort key (item id). Compound keys are common in DynamoDB.
It is also important to avoid hot spots inside a table. That is, data needs to be distributed fairly evenly across partition keys.
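The difference between the two read paths can be shown with a toy in-memory model in Python. This is not the DynamoDB API, just a sketch of the idea: the names `table`, `put`, `query` and `scan_with_filter` are invented, and "items read" stands in for the work you would be billed for.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]

# Toy model of a table: partition key -> {sort key -> item}
table: Dict[str, Dict[str, Item]] = defaultdict(dict)


def put(user_id: str, item_id: str, name: str) -> None:
    table[user_id][item_id] = {"userId": user_id, "itemId": item_id, "name": name}


def query(user_id: str) -> Tuple[List[Item], int]:
    """Read only the partition for user_id; return (items, number of items read)."""
    items = list(table.get(user_id, {}).values())
    return items, len(items)


def scan_with_filter(keep: Callable[[Item], bool]) -> Tuple[List[Item], int]:
    """Read every item, then discard the ones the filter rejects."""
    read = [item for partition in table.values() for item in partition.values()]
    return [item for item in read if keep(item)], len(read)


for user in ("u1", "u2", "u3"):
    put(user, "sword", "Sword")
    put(user, "shield", "Shield")

query_items, query_reads = query("u1")
scan_items, scan_reads = scan_with_filter(lambda item: item["userId"] == "u1")
```

Both calls return the same two items for `u1`, but the query touches 2 items while the filtered scan touches all 6, and that gap grows with the size of the table.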
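One commonly described way to spread a hot key is to append a calculated suffix to it, which the AWS best-practices guide refers to as write sharding. The Python sketch below is only an illustration under assumptions: the function names, the `#` separator, the default of 10 shards, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from typing import List


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Append a suffix derived from item_id so writes spread over several partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> List[str]:
    """Every key that must be queried to read back all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


# Hypothetical example: many items written under the same date.
key = sharded_partition_key("2021-05-01", "order-123")
```

The trade-off is on the read side: fetching everything for `2021-05-01` now takes one query per key in `all_shard_keys("2021-05-01")`, so the shard count should stay as small as the write load allows.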
Reference: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html