DynamoDB one-to-many relation: denormalization or adjacency?

I am designing a table for a data structure that represents a business operation that can be performed either ad hoc or as part of a batch. Operations performed together as a batch must be linked and queryable, and there is metadata on the batch that will be persisted.
The table must support two queries: retrieving the history of both ad hoc operations and batch instances.
Amazon suggests two approaches: adjacency lists and denormalization.
I am not sure which approach is best. Speed is the priority, cost secondary.
This will be a multi-tenant database serving multiple organizations, each with a million or more operations. (The organization will be part of the partition key to segregate tenants across nodes.)
Here are the ideas I've come up with:
Denormalized, non-adjacency: a single root wrapper object with the data of one (ad hoc) or more (batch) operations.
Denormalized, adjacency: top-level keys consist of operation instances (ad hoc) as well as parent objects containing a collection of operation instances (batch).
Normalized, non-adjacency, duplicated data: the top level consists of operation instances, with or without a batch key, and the batch information is duplicated among all members of the batch.
Is there a standard best practice? Any advice on setting up/generating keys?

Honestly, these relational terms confuse me when applied to NoSQL, and to DynamoDB specifically. For me it is hard to design a DynamoDB table piece by piece instead of from the whole business process. And frankly, I worry more about data size than about request speed in DynamoDB, because of the 1 MB limit per request. In other words, forget everything about relational DB concepts and see the data as JSON objects when working with DynamoDB.
That said, for a very simple one-to-many case (i.e. a person loves some fruits), my best schema choice is a plain string partition key. So my table will look like this:
|--------------|---------------------------------------|
| PartitionKey | Infos                                 |
|--------------|---------------------------------------|
| PersonID     | {name:String, age:Number, loveTo:map} |
|--------------|---------------------------------------|
| FruitID      | {name:String, otherProps}             |
|--------------|---------------------------------------|
The sample data :
|--------------|---------------------------------------|
| PartitionKey | Infos                                 |
|--------------|---------------------------------------|
| person123    | {                                     |
|              |   name:"Andy",                        |
|              |   age:24,                             |
|              |   loveTo:["fruit123","fruit432"]      |
|              | }                                     |
|--------------|---------------------------------------|
| personABC    | {                                     |
|              |   name:"Liza",                        |
|              |   age:20,                             |
|              |   loveTo:["fruit432"]                 |
|              | }                                     |
|--------------|---------------------------------------|
| fruit123     | {                                     |
|              |   name:"Apple",                       |
|              |   ...                                 |
|              | }                                     |
|--------------|---------------------------------------|
| fruit432     | {                                     |
|              |   name:"Mango",                       |
|              |   ...                                 |
|              | }                                     |
|--------------|---------------------------------------|
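As a concrete illustration of this simple design, here is a minimal boto3 sketch (the table name PeopleAndFruits is my assumption) that fetches a person and then batch-gets the fruit items referenced by loveTo:

import boto3

# Assumption: a table named "PeopleAndFruits" whose only key is the
# string partition key "PartitionKey", holding both person and fruit
# items exactly as in the sample data above.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("PeopleAndFruits")

# 1. Fetch the person item by its partition key.
person = table.get_item(Key={"PartitionKey": "person123"})["Item"]

# 2. Batch-get every fruit item referenced in loveTo.
keys = [{"PartitionKey": fruit_id} for fruit_id in person["loveTo"]]
resp = dynamodb.batch_get_item(RequestItems={"PeopleAndFruits": {"Keys": keys}})
fruits = resp["Responses"]["PeopleAndFruits"]

print(person["name"], "loves", [f["name"] for f in fruits])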
But let's look at a more complex case, say a chat app: each channel allows many users, and each user can join any channel. Should that be one-to-many or many-to-many, and how do you model the relation? I would say: don't worry about it. If we think like a relational DB, what a headache! In this case I would have a composite sort key, and even a secondary index, to speed up the specific queries (one possible layout for the original batch/operation question is sketched below).
So, the question is: what is the whole business process you are working on? That is what will help us design the table, not piece-by-piece modeling.
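To make that concrete for the original batch/operation question, here is one way the adjacency-list pattern could be laid out and queried with boto3. All key shapes and names here are illustrative assumptions, not a prescription:

import boto3
from boto3.dynamodb.conditions import Key

# Assumed single-table, adjacency-list layout:
#   PK = "ORG#<orgId>"                -- tenant isolation, per the question
#   SK = "BATCH#<batchId>"            -- batch metadata item
#   SK = "BATCH#<batchId>#OP#<opId>"  -- operation belonging to a batch
#   SK = "OP#<opId>"                  -- ad hoc operation
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Operations")

# A batch's metadata and all of its member operations share a partition
# key and a sort-key prefix, so they come back in a single query.
batch_items = table.query(
    KeyConditionExpression=Key("PK").eq("ORG#acme")
    & Key("SK").begins_with("BATCH#b-42")
)["Items"]

# Full history for a tenant (ad hoc and batch) is a query on the
# partition key alone; a GSI on a timestamp attribute could order it.
history = table.query(KeyConditionExpression=Key("PK").eq("ORG#acme"))["Items"]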

Related

DynamoDB Unique list of elements across all records

I have a simple table that stores a list of names for each record. I want to ensure that a name can never be used by more than one record. The Names column should also never be empty; there must always be at least one name.
| ID  | Names                             |
|-----|-----------------------------------|
| 111 | [john, bob]                       |
| 222 | [tim]                             |
| 333 | [bob] (invalid: bob already used) |
The easiest solution for this case, I believe, is simply to use a second table whose primary key is the value you want to keep unique. Then, in application code, check that table to decide whether you should create a new record in the primary table. For a list attribute, this avoids having to traverse every list of every record to determine whether a particular scalar value already exists. Credit to Tamas:
https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/
Here's a blog post describing how to enforce uniqueness constraints using transactions: https://aws.amazon.com/blogs/database/simulating-amazon-dynamodb-unique-constraints-using-transactions/
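Combining the two ideas, here is a minimal boto3 sketch (the table names Records and Names, and the key attributes, are assumptions) that writes the record and one marker item per name in a single transaction, so a duplicate name rejects the whole write:

import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")

def create_record(record_id, names):
    # Enforce the "at least one name" rule up front.
    if not names:
        raise ValueError("at least one name is required")
    items = [{
        "Put": {
            "TableName": "Records",
            "Item": {"ID": {"S": record_id},
                     "Names": {"L": [{"S": n} for n in names]}},
        }
    }]
    for name in names:
        items.append({
            "Put": {
                "TableName": "Names",
                "Item": {"Name": {"S": name}},
                # Fails the whole transaction if this name already exists.
                "ConditionExpression": "attribute_not_exists(#n)",
                "ExpressionAttributeNames": {"#n": "Name"},
            }
        })
    try:
        client.transact_write_items(TransactItems=items)
    except ClientError as e:
        if e.response["Error"]["Code"] == "TransactionCanceledException":
            raise ValueError("a name is already in use") from e
        raise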

Fastest way to search a varchar column in MySQL

I want to implement a search query for a bookshop. I use MySQL, and I have a varchar column that contains the name, author, and other details, such as The Tragedy of Hamlet, Prince of Denmark, by William Shakespeare. I want a search like shakespeare tragedy or denmark tragedy to return the books that have all of those words in that one column.
I have three ways to implement this, but I want to know how they compare in performance.
LIKE %%
My first way is to split the search text into words and build the query dynamically based on the word count:
SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%tragedy%'
But I was told that LIKE is a slow operator, especially with two %, because it cannot use an index.
TAG table and relational division
My second way is to have another table which contains tags like:
-------------------------
| book_id | tag         |
|-----------------------|
| 1       | Tragedy     |
| 1       | Hamlet      |
| 1       | Prince      |
| 1       | Denmark     |
| 1       | William     |
| 1       | Shakespeare |
-------------------------
And build a dynamic relational-division query:
SELECT DISTINCT book_id FROM booktag AS b1
WHERE ((SELECT 'shakespeare' as tag UNION SELECT 'tragedy' as tag)
EXCEPT
SELECT tag FROM booktag AS b2 WHERE b1.book_id = b2.book_id) IS NULL
But I was told that relational division is also very slow.
REGEXP
My third way is to use regular expressions:
SELECT * FROM books
WHERE name REGEXP '(?=.*shakespeare)(?=.*tragedy)'
But someone told me that it is slower than LIKE.
Please help me decide which way is fastest.
Surely LIKE, which is a built-in operator, is more optimized than a regular expression. But there is an important point here: you cannot really compare the two recipes, because LIKE just adds wildcards to a string, while regex matches a string against a pattern that can be much more complex.
Anyway, the best approaches that come to mind for this goal would be one of the following:
Use LIKE on a properly indexed column.1 (A sketch of building this query dynamically follows below.)
Use a technology optimized for search, such as Elasticsearch.
Implement a multithreaded algorithm,2 which performs very well for IO-bound tasks. For this you can use tricks such as defining an offset and dividing the table among the threads.
Also, for some alternative approaches, read this article: https://technet.microsoft.com/en-us/library/aa175787%28v=sql.80%29.aspx
1. Be careful about how you put indexes on your columns. Read this answer for more info: https://stackoverflow.com/a/10354292/2867928 and this post: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
2. Read this answer for more info: Multi Thread in SQL?
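For option 1, here is a minimal sketch of the dynamic query building described in the question, in Python with MySQL Connector/Python assumed, using placeholders so the user input is escaped rather than concatenated:

import mysql.connector  # assumption: MySQL Connector/Python is installed

def search_books(cursor, search_text):
    # One AND-ed LIKE per search word, built dynamically from the word count.
    words = search_text.split()
    if not words:
        return []
    where = " AND ".join(["name LIKE %s"] * len(words))
    params = ["%" + w + "%" for w in words]
    cursor.execute("SELECT * FROM books WHERE " + where, params)
    return cursor.fetchall()

conn = mysql.connector.connect(user="root", database="bookshop")
cur = conn.cursor()
rows = search_books(cur, "shakespeare tragedy")

Keep in mind that a leading % still prevents an index range scan, which is exactly why the links in footnote 1 on LIKE performance are worth reading.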

Visual C++: how to find the name of a column in MySQL

I am currently using the following code to fill a combo box with column information from a MySQL database:
private: void Fillcombo1(void) {
    String^ constring = L"datasource=localhost;port=3307;username=root;password=root";
    MySqlConnection^ conDataBase = gcnew MySqlConnection(constring);
    MySqlCommand^ cmdDataBase = gcnew MySqlCommand("select * from database.combinations;", conDataBase);
    MySqlDataReader^ myReader;
    try {
        conDataBase->Open();
        myReader = cmdDataBase->ExecuteReader();
        // Add the value of the "OD" column of every row to the combo box.
        while (myReader->Read()) {
            String^ vName = myReader->GetString("OD");
            comboBox1->Items->Add(vName);
        }
    }
    catch (Exception^ ex) {
        MessageBox::Show(ex->Message);
    }
}
Is there any simple method for finding the name of a column and placing it in a combo box?
Also, I am adding small details to my app, such as a news feed, which will need updating every so often. Will I have to dedicate a whole new database table to this single news-feed text so that I can update it, or is there a simpler alternative?
Thanks.
An alternative is to use the DESCRIBE statement:
mysql> describe rcp_categories;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| ID_Category   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Category_Text | varchar(32)      | NO   | UNI | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
2 rows in set (0.20 sec)
There may be an easier way without launching another query: the reader you already have exposes the column names directly, via MySqlDataReader's standard GetName(i) and FieldCount members. But you could also use the SHOW COLUMNS query:
SHOW COLUMNS FROM combinations FROM database
or
SHOW COLUMNS FROM database.combinations
Both will work.
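If it helps to see that query in use from code, here is a small sketch (Python with MySQL Connector/Python assumed, standing in for the C++ connector) that pulls just the column names into a list you could feed to the combo box:

import mysql.connector  # assumption: MySQL Connector/Python

conn = mysql.connector.connect(host="localhost", port=3307,
                               user="root", password="root")
cur = conn.cursor()
cur.execute("SHOW COLUMNS FROM database.combinations")
# Each result row is (Field, Type, Null, Key, Default, Extra);
# the first element is the column name.
column_names = [row[0] for row in cur.fetchall()]
print(column_names)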

django south migration: reset the schema of only a few tables

I am new to django south migrations. I have my main application, and most of its additional functions are built as sub-applications of the main application. What I want to do now is reset the tables specific to one sub-application of the main application. I don't want to lose any data from the other tables.
This is what my database looks like:
public | tos_agreement          | table | g_db_admin
public | tos_agreementversion   | table | g_db_admin
public | tos_signature          | table | g_db_admin
public | userclickstream_click  | table | g_db_admin
public | userclickstream_stream | table | g_db_admin
public | vote                   | table | g_db_admin
(80 rows)
I only want to rebuild (dropping all the data of):
public | userclickstream_click  | table | g_db_admin
public | userclickstream_stream | table | g_db_admin
How can I do this using South migrations?
In my south_migrationhistory table I have the following:
15 | userclickstream | 0001_initial | 2013-12-10 13:26:15.684678-06
16 | userclickstream | 0002_auto__del_field_stream_auth_user | 2013-12-10 13:26:15.693485-06
17 | userclickstream | 0003_auto__del_field_stream_username__add_field_stream_user | 2013-12-10 13:26:15.721449-06
I assume these records were created when I initially wired the app up with South.
I was also thinking: what if I delete the above records from south_migrationhistory and re-run the migrations for this app, which would regenerate the tables?
./manage.py schemamigration userclickstream --initial
./manage.py migrate userclickstream
Do it this way:
Open up your terminal and run manage.py dumpdata > backup.json. It will create a JSON fixture with all the data currently in the database. That way, if you mess anything up, you can always reload the data with manage.py loaddata backup.json (note that all tables need to be empty for this to work).
Optional: load the data into a new development database using the aforementioned loaddata command.
Write your own migration, and don't worry about breaking anything, because, hey, you have a backup. It might take some learning, but the basic idea is that you create a migration class with two functions, forwards and backwards (a sketch follows below). Check out the South documentation and pick it up slowly from there.
Come back to SO with any more specific questions and troubles you have along the way.
This isn't a coded "here's the solution" answer, but I hope it helps nonetheless.
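That said, here is a minimal sketch of what such a hand-written South migration could look like for the two userclickstream tables. The column definitions are placeholder assumptions; generate the real ones with the --initial command from the question. (South can also reverse an app's migrations entirely with ./manage.py migrate userclickstream zero, which drops that app's tables.)

# migrations/0004_rebuild_click_tables.py (file name is illustrative)
from south.db import db
from south.v2 import SchemaMigration

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Drop and recreate only the two tables; all others are untouched.
        db.delete_table('userclickstream_click')
        db.delete_table('userclickstream_stream')
        db.create_table('userclickstream_stream', (
            ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
            # ... real column definitions go here ...
        ))
        db.create_table('userclickstream_click', (
            ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
            # ... real column definitions go here ...
        ))

    def backwards(self, orm):
        db.delete_table('userclickstream_click')
        db.delete_table('userclickstream_stream')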

Error: "Index '' does not exist on table" when trying to create entities in Doctrine 2.0 CLI

I have a MySQL database, and I am trying to get Doctrine 2 to create entities from the MySQL schema. I tried this with our production database and got the following error:
[Doctrine\DBAL\Schema\SchemaException] Index '' does not exist on table user
I then created a simple test database with only one table and four fields: an auto-increment primary key field and three varchar fields. When attempting to have Doctrine create entities from this database, I got the same error.
Here is the table I was trying to create an entity for (it should have been simple):
mysql> desc user;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| iduser    | int(11)     | NO   | PRI | NULL    | auto_increment |
| firstname | varchar(45) | YES  |     | NULL    |                |
| lastname  | varchar(45) | YES  |     | NULL    |                |
| username  | varchar(45) | YES  |     | NULL    |                |
+-----------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Here is the command that I used in an attempt to get said entities created:
./doctrine orm:convert-mapping --from-database test ../models/test
I am running:
MySQL server 5.1.49-1ubuntu8.1 (Ubuntu)
mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (i686) using readline 6.1
Doctrine 2.0.1
I am facing the same problem right now. I have traced it back to the primary key not being identified/set correctly. The default value is boolean(false), which is cast to the string ''; Doctrine subsequently fails to find an index by that name. ;-)
Solution: define a PRIMARY KEY.