In the Amazon DynamoDB help center I read that:
You should maintain as few tables as possible in a DynamoDB
application. Most well designed applications require only one table.
Sorry guys, but what does that mean? Should I design a database with just ONE table, or should my application (PHP, let's say) work with just one table while the database may contain several tables?
Thank you!
I think the One Table concept means that if you draw a relational database diagram of your models and associations, all the tables that are connected by associations should be merged into, or designed as, one single NoSQL table. If you have two sets of tables in the same relational database with no association between them, you can group them into two separate NoSQL tables.
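A minimal sketch of that idea, using plain Python dictionaries rather than DynamoDB itself. The `USER#`/`ORDER#` key prefixes and the `put_item`/`query` helpers are hypothetical names chosen for illustration: a user and that user's orders, which would be two joined tables in a relational design, live in one table under the same partition key.

```python
from collections import defaultdict

# Hypothetical single table: partition key -> sort key -> item attributes.
table = defaultdict(dict)

def put_item(pk, sk, attrs):
    """Store one item under its partition key and sort key."""
    table[pk][sk] = dict(attrs)

def query(pk, sk_prefix=""):
    """Return the items of one partition whose sort key starts with
    sk_prefix, ordered by sort key."""
    partition = table.get(pk, {})
    return [
        {**partition[sk], "sk": sk}
        for sk in sorted(partition)
        if sk.startswith(sk_prefix)
    ]

# A user and its orders share one partition instead of two joined tables.
put_item("USER#1", "PROFILE", {"name": "Ann"})
put_item("USER#1", "ORDER#2024-01-05", {"total": 30})
put_item("USER#1", "ORDER#2024-02-11", {"total": 12})

orders = query("USER#1", "ORDER#")   # only the orders
everything = query("USER#1")         # profile and orders in one read
```

Reading the whole partition returns the profile and the orders together, which is what replaces the join.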
From an architectural perspective: could you please help me understand why NoSQL DynamoDB is so hyped?
DynamoDB supports some of the world’s largest scale applications by
providing consistent, single-digit millisecond response times at any
scale.
I'm trying to critique this in order to understand the WHY part of the question.
To get millisecond performance, we always have to specify the partition key and key attributes when retrieving data.
Suppose I design an RDBMS where:
a primary key or alternate key (INDEXED) always has to be specified in the query,
I can use a partition key to find out which database my data is stored in,
and JOINs are never done.
Isn't that the same as a NoSQL kind of architecture, without any marketing buzz around it?
We're shifting to DynamoDB anyway, but this is my innocent curiosity: there must be a strong reason, something an RDBMS can't do. Let's skip backup and maintenance advantages etc.
You are conflating two different things.
The definition of NoSQL
There isn't one, at least not one that can apply in all cases.
In most uses, NoSQL databases don't force your data into the fixed-schema "rows and columns" of a relational database, although modern relational databases such as Postgres support data types such as JSONB that would have E. F. Codd spinning in his grave.
DynamoDB is a document database: it is optimized for retrieving and updating single documents based on a unique key, and it does not restrict the fields that those documents contain (other than requiring the ones used for a key).
Distributed Databases
A distributed database stores data on multiple nodes, and is able to perform parallel queries on those nodes and combine the results.
There are distributed SQL databases: Redshift and BigQuery are optimized for queries against large datasets that may include joins, while MySQL (and no doubt others) can run multiple engines and distribute the queries between them. It is possible for SQL databases to perform joins, including joins that cross nodes, but such joins generally perform poorly.
DynamoDB distributes items on shards based on their partition key. This makes it very fast for retrieving single items, because the query can be directed to a single shard. It is much less performant when scanning for items that reside on multiple shards.
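To make the difference concrete, here is a small simulation in plain Python. It is not DynamoDB's actual partitioning scheme: the shard count, the SHA-256 based routing, and the function names are all assumptions made for illustration. A key lookup is routed to one shard, while a scan has to visit all of them.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(partition_key):
    """Map a partition key to a shard index with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def put(partition_key, item):
    shards[shard_for(partition_key)][partition_key] = item

def get(partition_key):
    """Key lookup: touches exactly one shard."""
    return shards[shard_for(partition_key)].get(partition_key)

def scan(predicate):
    """Scan: visits every shard and filters the items it finds."""
    visited = 0
    matches = []
    for shard in shards:
        visited += 1
        matches.extend(item for item in shard.values() if predicate(item))
    return matches, visited

for n in range(10):
    put(f"sensor-{n}", {"id": n, "hot": n % 2 == 0})

one_item = get("sensor-3")
hot_items, shards_visited = scan(lambda item: item["hot"])
```

The cost of `get` stays the same as shards are added, whereas `scan` grows with the number of shards.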
As you note in your question, you can implement a sharded document DB on top of a relational database (either using native JSON columns or storing everything in a CLOB that is parsed for each access). But enough other people have done this (including DynamoDB) that it doesn't make sense (to me, at least) to re-implement.
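For illustration only, a rough sketch of what such a home-grown layer could look like. Two in-memory SQLite databases stand in for separate relational servers, and the `docs` table, the CRC32 routing, and the helper names are assumptions, not anything prescribed by the answer.

```python
import json
import sqlite3
import zlib

# Two in-memory SQLite databases stand in for two relational servers.
SHARDS = [sqlite3.connect(":memory:") for _ in range(2)]
for conn in SHARDS:
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def shard_of(doc_id):
    """Pick the database that owns this document id."""
    return SHARDS[zlib.crc32(doc_id.encode("utf-8")) % len(SHARDS)]

def put_doc(doc_id, doc):
    """Serialize the document to JSON and store it on its shard."""
    shard_of(doc_id).execute(
        "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(doc)),
    )

def get_doc(doc_id):
    """Fetch and parse one document by primary key, or return None."""
    row = shard_of(doc_id).execute(
        "SELECT body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Documents do not need to share the same fields.
put_doc("user-1", {"name": "Ann", "tags": ["admin"]})
put_doc("banner-7", {"caption": "Spring sale", "isActive": True})
```

Even this small version leaves out resharding, replication, and secondary indexes, which is the part that makes re-implementing it unattractive.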
Let’s assume there are mainly 3 tables for the current database.
Pkey = partition key
Admin
- id (Pkey), username, email, createdAt, UpdatedAt
Banner
- id (Pkey), isActive, createdAt, caption
News
- id (Pkey), createdAt, isActive, title, message
None of the above tables has a relation to the other tables, and more tables will be required in the future (I think most of them won't have relations to other tables either).
According to the AWS documentation:
You should maintain as few tables as possible in a DynamoDB application.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
So I was considering the need to combine these 3 tables into a single table.
Should I start to use a single table from now on, or keep using multiple tables for the database?
If using a single table, how should I design the table schema?
DynamoDB is a NoSQL database, hence you design your schema specifically to make the most common and important queries as fast and as inexpensive as possible. Your data structures are tailored to the specific requirements of your business use cases.
When designing a data model for your DynamoDB Table, you should start from the access patterns of your data that would in turn inform the relation (or lack thereof) among them.
Two interesting resources that would help you get started are From SQL to NoSQL and NoSQL Design for DynamoDB, both part of the AWS Developer Documentation of DynamoDB.
In your specific example, based on the questions you're trying to answer (i.e. use case & access patterns), you could either work with only the Partition Key or more likely, benefit from the usage of composite Sort Keys / Sort Key overloading as described in Best Practices for Using Sort Keys to Organize Data.
Update, add example table design to get you started:
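The example design from the original answer is not reproduced here. As a stand-in, the following is a minimal sketch of one possible single-table layout for the Admin, Banner, and News entities from the question. It assumes two access patterns that the question does not state (fetch an admin by id, list news or banners by creation time), and the `PK`/`SK` values and helper names are hypothetical.

```python
def admin_item(admin_id, username, email, created_at):
    # One partition per admin, fetched directly by id.
    return {"PK": f"ADMIN#{admin_id}", "SK": "PROFILE",
            "username": username, "email": email, "createdAt": created_at}

def banner_item(banner_id, is_active, created_at, caption):
    # All banners share a partition and are sorted by creation time.
    return {"PK": "BANNER", "SK": f"{created_at}#{banner_id}",
            "isActive": is_active, "caption": caption}

def news_item(news_id, created_at, is_active, title, message):
    # ISO 8601 timestamps sort correctly as plain strings.
    return {"PK": "NEWS", "SK": f"{created_at}#{news_id}",
            "isActive": is_active, "title": title, "message": message}

def query(items, pk, newest_first=False):
    """Return the items of one partition ordered by sort key."""
    matching = [item for item in items if item["PK"] == pk]
    return sorted(matching, key=lambda item: item["SK"], reverse=newest_first)

table = [
    admin_item("a1", "root", "root@example.com", "2024-01-01T00:00:00Z"),
    news_item("n1", "2024-02-01T09:00:00Z", True, "Launch", "We are live"),
    news_item("n2", "2024-03-01T09:00:00Z", True, "Update", "New features"),
    banner_item("b1", True, "2024-02-15T12:00:00Z", "Spring sale"),
]

latest_news = query(table, "NEWS", newest_first=True)
```

Different access patterns would lead to different keys, so the layout should follow from the queries you actually need.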
I'm currently working with a client on an IoT project involving sensors. At the moment all of their data is put into one table. The data comes from multiple sensor nodes, and they want one table for every sensor node. I want to know whether it is possible in AWS DynamoDB to split the data into multiple separate tables using the hash key from an existing table. I have looked into GSIs and LSIs, but they still aren't exactly what my client wants. Also, would having multiple tables even be more effective than using an LSI or GSI? I am new to NoSQL and DynamoDB, so all help is much appreciated.
DynamoDB does not support splitting data into multiple tables - in the sense that DynamoDB operations themselves, including the atomic conditional checks, can't be performed across table boundaries. But that doesn't mean that splitting data across tables is incompatible with DynamoDB - just that you have to add the logic in your application.
You can definitely do so as long as the data from the different sensors is isolated enough. A more common scenario would be to split data into multiple tables across time boundaries in order to discard/archive old data, since DynamoDB already makes it possible and convenient to handle partitioning your data with hash keys and global secondary indexes.
In the end I would say that there is no need to split data into multiple tables on the hash key, and it doesn't make sense to, but it can be done. A more useful case is to split data into multiple tables on some other attribute of the data that is not part of the hash or range key (such as in the time-series example).
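The application-side logic for the time-series case can be quite small. The sketch below assumes one table per calendar month and a hypothetical `sensor_readings` name prefix; it only computes table names and does not call DynamoDB.

```python
from datetime import datetime, timezone

TABLE_PREFIX = "sensor_readings"  # hypothetical table name prefix

def table_for(timestamp):
    """Name of the monthly table that a reading with this timestamp goes to."""
    return f"{TABLE_PREFIX}_{timestamp:%Y_%m}"

def tables_for_range(start, end):
    """All monthly table names covering start..end, oldest first."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{TABLE_PREFIX}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names

reading_time = datetime(2024, 3, 15, 8, 30, tzinfo=timezone.utc)
write_target = table_for(reading_time)
read_targets = tables_for_range(
    datetime(2023, 11, 20, tzinfo=timezone.utc),
    datetime(2024, 2, 3, tzinfo=timezone.utc),
)
```

With this layout, archiving or discarding old data amounts to exporting or deleting a whole table instead of deleting items one by one.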
I have been trying to create a schema for my Android application in Amazon DynamoDB. I have very little experience with NoSQL databases.
I have created a survey-based Android application, and now I have certain tables to store in Amazon DynamoDB.
The tables are the Employees table, the Survey table, the Question table and the Response table.
The Employees table stores the information for all the employees; the Survey table holds the names of the surveys and the employees who have taken each survey.
My issue is with the Question table and the Response table. The questions are dynamic and depend on the employee who is taking the survey.
The answers in the Response table depend on the number of questions asked in the survey.
I want to know what my #DynamoDBHashKey, #DynamoDBIndexHashKey and #DynamoDBIndexRangeKey should be in the Question and the Response tables, so that I can map questions to responses, and what the #DynamoDBAttribute should be in both tables.
The Use Case can be: An Employee of the company posted 12 questions for all the other employees of the company.
The image was taken earlier; I added the Survey table later on.
DynamoDB is good at some things and bad at others. It's great at scaling massively and horizontally while maintaining low read/write latency, but it introduces eventual consistency and forces you to make major schema decisions up front, such as what should be in a table, how the data should be partitioned, and what the indexes should be. It demands that you adopt its horizontal scaling model by partitioning your tables into pieces via the partition key.
From the way you have phrased the question, it is clear that you are more comfortable with a relational database. DynamoDB is not a great place to start learning about NoSQL schema(less) design. I would have found it quite unforgiving if it had been my first NoSQL database, especially when trying to model a domain such as the one you describe. It's simply not a great domain modelling environment, full stop: it's all about horizontal scaling and performance.
If you are more comfortable in a relational database, then use a relational database. If you want to try NoSQL it is necessary to adopt a different mindset when it comes to modelling your domain into persist-able entities. For example, you might include closely related child objects within the schema of a parent record - in your example you might include Questions as children of a Survey and store them in one record, unlike in relational modelling where you would put these in separate tables.
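As a rough illustration of that mindset, the record shapes below embed the questions inside the survey record. The attribute names, and the choice of survey id plus employee id to identify a response, are assumptions made for this sketch and not a recommendation from the answer.

```python
# One survey record carries its questions as embedded children.
survey = {
    "survey_id": "survey-1",
    "created_by": "employee-7",
    "questions": [
        {"question_id": "q1", "text": "How satisfied are you with your team?"},
        {"question_id": "q2", "text": "What would you change?"},
    ],
}

def new_response(survey, employee_id, answers):
    """Build one response record, rejecting answers to unknown questions."""
    known = {q["question_id"] for q in survey["questions"]}
    unknown = set(answers) - known
    if unknown:
        raise ValueError(f"unknown question ids: {sorted(unknown)}")
    return {
        "survey_id": survey["survey_id"],  # assumed hash key
        "employee_id": employee_id,        # assumed range key
        "answers": dict(answers),          # question_id -> answer text
    }

response = new_response(survey, "employee-12", {"q1": "Very", "q2": "Nothing"})
```

Because the answers are stored as a map keyed by question id, a survey with 12 questions needs no schema change compared with one that has 2.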
I have a database made of sorted data from user activities. If I want to keep track of which records belong to which user (like a class holding vectors of numbers for each user), what is the best type of database to use? Speed is important and the database is very large (9 GB, ~700 million records). The number of users is around 2 million, so I don't think a relational connection in SQL would be a good choice. (The code is in C++.)
I am going to provide an answer now based on our conversation in the comments as I have too much to write in a comment.
First of all, I would use a full RDBMS for this rather than SQLite. The Lite part of the name should serve as an indicator that it isn't trying to be a full strength database. I am just saying this because if SQLite does not perform well enough on your large database, I don't want you to blame it on RDBMS technology, but on the weak database that you are using. Choose PostgreSQL or MySQL as they have better optimizers (you don't have to code it).
Second, your database should provide the features to join the tables together. It would look something like:
Select *
From users
Join activity on users.id = activity.user_id
Where users.id = ###
That combined with the appropriate indexes should give you what you need.
As far as indexes go, the primary key on users.id is indexed automatically. The activity.user_id column may need an explicit index, since some databases (PostgreSQL, for example) do not create one for a foreign key column. You can also create a foreign key definition so that the database knows the relationship between the tables and can enforce it. Some databases do not support foreign key constraints, but that is not critical.
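A runnable version of the join is sketched below. It uses Python's built-in sqlite3 module purely so that the example works without a database server; the answer itself recommends PostgreSQL or MySQL for the real data. The column names other than `users.id` and `activity.user_id`, and the index name, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        value REAL NOT NULL
    );
    -- Index the join column so a lookup by user does not scan the whole table.
    CREATE INDEX idx_activity_user_id ON activity(user_id);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO activity (user_id, value) VALUES (?, ?)",
                 [(1, 0.5), (1, 1.5), (2, 9.0)])

def activity_for(user_id):
    """Return (name, value) rows for one user via the join from the answer."""
    return conn.execute(
        "SELECT users.name, activity.value FROM users "
        "JOIN activity ON users.id = activity.user_id "
        "WHERE users.id = ? ORDER BY activity.id",
        (user_id,),
    ).fetchall()

rows = activity_for(1)
```

The `?` placeholder takes the place of the `###` in the query above and lets the driver pass the user id safely.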
A relational SQL database can handle this just fine.
Use PostgreSQL.
You can use ODBC from C, that way you can change the database should the need arise.
If your data is not really relational, you can also use redis.
http://code.google.com/p/credis/
Since it's a sorted set of data, you can even go for a NoSQL or Bigtable-style database. HBase, Hadoop, etc. are available to you as open-source resources.