I'm trying to use Google Spanner's interleaving approach to define a mechanism that keeps rows from several tables in the same split. According to the documentation (https://cloud.google.com/spanner/docs/schema-and-data-model#database-splits), rows that share the same primary key prefix are placed in the same split. But what defines the "same primary key prefix"? Here is an example. I have three tables with primary keys:
table A has PK (C1, CA)
table B has PK (C1, CB)
table C has PK (C1, CC)
These three tables share the first element of their primary key, column C1. I would like all rows with the same value of C1 to go to the same split.
Can I define table A as the parent table for B and C?
Do I need to create a dummy table with PK (C1)?
Any other approach?
The database will have lots of reads and many updates, but few inserts.
Any suggestions will be highly appreciated.
Thanks
In order to define a child table, the child table's primary key must contain the entire primary key of the parent as a prefix.
In your example, you would need to create a table with primary key C1. You could then interleave tables A, B, and/or C in this parent table.
The parent table can't quite be a dummy table: to insert a row into a child table, the corresponding parent row must already exist. So in your example, you would need to ensure that the parent table has a row for each value of C1 that you want to add to any of the child tables.
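A minimal sketch of the DDL this takes (the INT64 column types are assumptions; adjust to your real types):

-- Parent table: one row per C1 value.
CREATE TABLE Parent (
  C1 INT64 NOT NULL
) PRIMARY KEY (C1);

-- Each child repeats the parent's key as a prefix and declares the interleave.
CREATE TABLE A (
  C1 INT64 NOT NULL,
  CA INT64 NOT NULL
) PRIMARY KEY (C1, CA),
  INTERLEAVE IN PARENT Parent ON DELETE CASCADE;

CREATE TABLE B (
  C1 INT64 NOT NULL,
  CB INT64 NOT NULL
) PRIMARY KEY (C1, CB),
  INTERLEAVE IN PARENT Parent ON DELETE CASCADE;

Table C would follow the same pattern, and all rows sharing a C1 value (parent and children) would then be co-located in the same split.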
Related
I want to merge two or more tables into one. For example, I have table1.csv and table2.csv; they come from different MySQL servers but have the same structure: [A, B, C, datatime].
If two records differ in any of A, B, or C, treat them as distinct records; if A, B, and C are all the same, keep only the record with the latest datatime.
If I first select the useful records locally in a program and then insert them into MySQL in one batch, will that be faster than selecting and inserting them one by one?
You can do this easily with a composite unique key over the three fields on the table you want to insert into.
This statement adds a unique key, so the same (a, b, c) combination can't be inserted twice:
ALTER TABLE `table1` ADD UNIQUE `unique_index`(`a`, `b`, `c`);
This statement then appends only the records that don't already exist:
INSERT IGNORE INTO table1 SELECT * FROM table2;
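Note that INSERT IGNORE keeps whichever row is already in table1 when the unique key collides, so on its own it won't guarantee that the latest datatime wins. If you need that, a sketch along these lines (assuming the column names from the question) should work:

INSERT INTO table1 (a, b, c, datatime)
SELECT a, b, c, datatime FROM table2
-- On a key collision, keep whichever datatime is newer.
ON DUPLICATE KEY UPDATE datatime = GREATEST(table1.datatime, VALUES(datatime));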
If I have a DynamoDB table with pk and sk where pk is such that I can query the table for a given pk and get all items in a given category, how does this differ from scanning a sparse secondary index that contains only items from said category? I know GSI read/write units are separate from the main table, but I'm wondering if there is a latency or other benefit to be had from doing one over the other.
AFAIK, in theory there shouldn't be any performance difference between them. First, the primary table and the GSI use the same underlying storage nodes, so the I/O performance should be the same. Second, whether you query the primary table or scan the sparse GSI, the partition key of the records you are retrieving is the same, which means all of those records reside in the same partition (not split across shards).
Some benefits I can think of for querying the primary table instead:
You save the RCU, WCU, and storage costs of the GSI
You can do strongly consistent reads, which GSIs don't support
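As a minimal boto3 sketch of the two access paths (table, index, and key names are hypothetical):

import boto3

client = boto3.client("dynamodb")

# Query the base table for one category; strongly consistent reads are allowed here.
client.query(
    TableName="Items",
    KeyConditionExpression="pk = :pk",
    ExpressionAttributeValues={":pk": {"S": "CATEGORY#books"}},
    ConsistentRead=True,
)

# Scan the sparse GSI that only materializes items of that category;
# GSI reads are always eventually consistent.
client.scan(
    TableName="Items",
    IndexName="category-sparse-index",
)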
I'm attempting to create multiple joins between two tables (Table A and Table B) using the same key from Table A. The key on Table A is "Name", and there are multiple columns in Table B that I need to join it with. Any ideas on the best way to do this?
Just go ahead and do it. Power BI allows multiple relationships between tables, but only one of them can be "active":
Multiple Relationships Between Tables
To use inactive relationships, you will have to refer to them in DAX using the USERELATIONSHIP function.
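For example, a measure along these lines (the table, column, and measure names here are hypothetical):

// Activates the inactive relationship just for this calculation.
Amount By Second Key =
CALCULATE (
    SUM ( TableB[Amount] ),
    USERELATIONSHIP ( TableA[Name], TableB[SecondName] )
)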
Alternatively, you can replicate Table A as many times as you need and set up regular relationships. In my opinion, that's a better data model: it's more intuitive and easier to use.
I am trying to solve an Informatica problem.
I have two tables, Table A and Table B, with the following structure:
Table A
A_Key
A_Name
A_Address
A_PostalCode
A_Country
A_Latitude
A_Longitude
Table B
B_Key
B_Name
B_PostalCode
B_Latitude
B_Longitude
I need to combine A and B into one output table that contains all the attributes of both A and B.
Since I am new to the Informatica Data Quality tool, I am trying to work out the logic for implementing this.
Does anyone have a good solution?
You can use a Joiner Transformation to do this.
It has two groups, Master and Detail. Ideally, you should connect the table with less data as the Master and the table with more data as the Detail.
Ensure your table data is sorted before it reaches the Joiner, and enable Sorted Input in the advanced section of the Joiner Transformation.
Again, for PowerCenter, this scenario sounds more like a union to me, with the columns missing from group B set to NULL.
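Roughly the SQL equivalent of that union (which of Table B's columns map to which of Table A's, and therefore where the NULLs go, is an assumption based on the column lists above):

-- A's rows pass through unchanged; B's rows fill the attributes B lacks with NULL.
SELECT A_Key, A_Name, A_Address, A_PostalCode, A_Country, A_Latitude, A_Longitude
FROM TableA
UNION ALL
SELECT B_Key, B_Name, NULL, B_PostalCode, NULL, B_Latitude, B_Longitude
FROM TableB;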
Coming off an NLTK NER problem, I have PERSONS and ORGANIZATIONS, which I need to store in a sqlite3 db. The received wisdom is that I need to create separate TABLEs to hold these sets. How can I create a TABLE when len(PERSONS) can vary for each id? It can even be zero. The normal use of
insert into table_name values (?), executed with (t[0]) as the parameter, will fail.
Thanks to CL.'s comment, I figured out that the best way is to think in terms of rows in a two-column table, where the first column is an id INT and the second column contains the person names. This way, there is no issue with the varying length of the PERSONS list. Of course, to link the main table with the persons table, the id field has to be a foreign key that REFERENCES the story_id of the main table.
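A minimal sketch of that schema (the stories table name, the person_name column, and the sample row are assumptions):

-- Main table: one row per story.
CREATE TABLE stories (
    story_id INTEGER PRIMARY KEY
);

-- One row per extracted name; a story can have zero, one, or many persons.
CREATE TABLE persons (
    id INTEGER NOT NULL REFERENCES stories(story_id),
    person_name TEXT NOT NULL
);

INSERT INTO persons (id, person_name) VALUES (1, 'Ada Lovelace');

An analogous two-column table handles ORGANIZATIONS the same way.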