Merging 2 tables in a single table with different schema - microsoft-sync-framework

I would like to sync a single table with the result of a join between two tables. I designed these databases as an exercise (the equi-join is between PERSON.AddressId and ADDRESS.Id):
How can I provision the databases and perform the synchronization?
So far I have built examples for common scenarios, such as different table names, different column names, or removed columns.
How can this be achieved?
-->FIRST DB<--
PERSON:
->Id(PK, int, not null)
->Name(nchar(10), not null)
->Surname(nchar(10), not null)
->AddressId(FK, int, not null)
ADDRESS:
->Id(PK, int, not null)
->Street(nchar(10), not null)
->City(nchar(10), not null)
->Country(nchar(10), not null)
-->SECOND DB<--
CUSTOMER:
->Id(PK, int, not null)
->Name(nchar(10), not null)
->Surname(nchar(10), not null)
->Address(nchar(10), not null)
->City(nchar(10), not null)

As far as I can tell from the documentation, the schemas need to match for the sync to work. You can sync between tables with the same schema but different names using the GlobalName property of DbSyncTableDescription. See JuneT's post:
http://jtabadero.wordpress.com/2011/05/08/synching-tables-with-different-table-names/
For this particular situation, I think you'd be best served by creating Person and Address on Second DB, then creating "Customer" as a view (also on Second DB) to present the two tables in a consolidated manner.
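The view idea can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module rather than SQL Server and Sync Framework, so the column types and sample rows are assumptions; the point is just that Customer becomes a read-only projection over the two synced tables.

```python
import sqlite3

# In-memory stand-in for "Second DB" (assumption: SQLite instead of SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        Id      INTEGER PRIMARY KEY,
        Street  TEXT NOT NULL,
        City    TEXT NOT NULL,
        Country TEXT NOT NULL
    );
    CREATE TABLE Person (
        Id        INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Surname   TEXT NOT NULL,
        AddressId INTEGER NOT NULL REFERENCES Address(Id)
    );
    -- Customer is a view over the two synced tables, not a table of its own.
    CREATE VIEW Customer AS
        SELECT p.Id, p.Name, p.Surname, a.Street AS Address, a.City
        FROM Person AS p
        JOIN Address AS a ON a.Id = p.AddressId;
""")

# Made-up sample rows, standing in for data that arrived through the sync.
conn.execute("INSERT INTO Address VALUES (1, 'Main St', 'Rome', 'Italy')")
conn.execute("INSERT INTO Person VALUES (10, 'Ada', 'Rossi', 1)")

customers = conn.execute("SELECT * FROM Customer").fetchall()
print(customers)
```

With this layout the sync itself only ever deals with Person and Address, whose schemas match on both sides.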

Netsuite suiteql how to get all available tables to query?

I am using Postman and NetSuite's SuiteQL to query some tables. I would like to write two queries. The first should return all items (fulfillment items) for a given sales order. The second should return all sales orders that contain a given item. I am not sure which tables to use.
I can retrieve the sales order with something like this:
"q": "SELECT * FROM transaction WHERE Type = 'SalesOrd' and id = '12345'"
I can retrieve the item with this:
"q": "SELECT * FROM item WHERE id = 1122"
I can join transaction and transactionline for the sales order, but that does not give me the items.
"q": "SELECT * from transactionline tl join transaction t on tl.transaction = t.id where t.id in ('12345')"
The best reference I have found is the Analytics Browser, https://system.netsuite.com/help/helpcenter/en_US/srbrowser/Browser2021_1/analytics/record/transaction.html, but it does not show relationships like an ERD diagram.
What tables do I need to join to say, given this item id 1122, return me all sales orders (transactions) that have this item?
You are looking for TransactionLine.item. That will allow you to query transaction lines whose item is whatever internal id you specify.
{
"q": "SELECT Transaction.ID FROM Transaction INNER JOIN TransactionLine ON TransactionLine.Transaction = Transaction.ID WHERE type = 'SalesOrd' AND TransactionLine.item = 1122"
}
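If you want to parameterize this from a script instead of editing the body in Postman, something like the sketch below could build the request bodies. The function names are hypothetical, and the second query (items on a given sales order) is my assumption: it simply reverses the same join and has not been checked against a real account.

```python
import json


def orders_containing_item(item_id: int) -> dict:
    """Request body for: sales orders that have a line for the given item."""
    query = (
        "SELECT Transaction.ID FROM Transaction "
        "INNER JOIN TransactionLine "
        "ON TransactionLine.Transaction = Transaction.ID "
        # int() keeps anything but a plain number out of the query text
        f"WHERE type = 'SalesOrd' AND TransactionLine.item = {int(item_id)}"
    )
    return {"q": query}


def items_on_order(order_id: int) -> dict:
    """Request body for: item ids on the lines of one sales order (assumed query)."""
    query = (
        "SELECT TransactionLine.item FROM TransactionLine "
        "INNER JOIN Transaction "
        "ON TransactionLine.Transaction = Transaction.ID "
        f"WHERE type = 'SalesOrd' AND Transaction.ID = {int(order_id)}"
    )
    return {"q": query}


# JSON text to paste into (or send as) the SuiteQL request body.
body = json.dumps(orders_containing_item(1122))
print(body)
```

The second query may also return lines without an item (for example the main line), so you might need to filter those out.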
If you are serious about getting all available tables to query take a look at the metadata catalog. It's not technically meant to be used for learning SuiteQL (supposed to make the normal API Calls easier to navigate), but I've found the catalog endpoints are the same as the SuiteQL tables for the most part.
https://{{YOUR_ACCOUNT_ID}}.suitetalk.api.netsuite.com/services/rest/record/v1/metadata-catalog/
Headers:
Accept: application/schema+json
You can review all the available records, fields and joins in the Record Catalog page (Customization > Record Catalog).

Redshift: need help optimizing a WHERE IN (SELECT *) subquery

I have the following Redshift query:
SELECT contributor_user_id,
device_id_source,
device_os,
device_model,
device_design,
device_serial,
device_carrier,
device_os_version,
device_manufacturer,
device_current_app_build,
device_current_app_version
FROM all_values
WHERE all_values.device_id_source :: VARCHAR NOT IN (SELECT device_id_source FROM table WHERE device_id_source IS NOT NULL)
AND all_values.device_os :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_os IS NOT NULL)
AND all_values.device_model :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_model IS NOT NULL)
AND all_values.device_design :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_design IS NOT NULL)
AND all_values.device_serial :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_serial IS NOT NULL)
AND all_values.device_carrier :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_carrier IS NOT NULL)
AND all_values.device_os_version :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_os_version IS NOT NULL)
AND all_values.device_manufacturer :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_manufacturer IS NOT NULL)
AND all_values.device_current_app_build :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_current_app_build IS NOT NULL)
AND all_values.device_current_app_version :: VARCHAR NOT IN (SELECT device_os FROM table WHERE device_current_app_version IS NOT NULL)
As far as I know, WHERE IN (SELECT) is slower than a JOIN, and the subqueries repeat many nearly identical requests, which I don't think is good. But I'm new to SQL and don't know how to rewrite the code above with a JOIN. Could you help me?
Thanks!
The "WHERE NOT IN (SELECT ..." can be very expensive as the list can be very long and take a lot of comparisons to determine if the value is not in the list. A somewhat less expensive way to do this is with "WHERE NOT EXISTS (SELECT ..." which is more of a JOIN structure internally but still may not be fast enough for your case.
Note these are just guesses based on your SQL and past experience. Given how simple the rest of the query looks it is a good bet. You may still want to look at the EXPLAIN plan for the query and see where the cost is increasing the most.
The best answer is to rethink this query and remove the negative logic. If I'm reading this right, you want to find all the contributor_user_id rows where the corresponding column value in "table" for ANY of the listed columns is NULL. To do this you are performing a subtraction algorithm using "WHERE NOT IN". I don't know your data model, so I'm not sure whether this logic is correct.
The difficulty here is that I don't know your data and data model. The query will flag any row that has any column NULL in "table", but only if there are no repeats of device_os in "table". For example, a row in "table" with NULL for device_model will not be flagged if another row with the same device_os value has a non-NULL device_design. It all depends on what the legal patterns are in your data. Are multiple rows with the same device_os legal in your data?
A better way is to make this into an additive algorithm which may greatly reduce the work needed to get the desired answer. Not understanding the data and the desired logic it is impossible for me to propose a solution. Example data and expected results would help in making a different solution proposal.
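To make the NOT EXISTS suggestion above concrete, here is a minimal sketch for a single column. It runs on Python's built-in sqlite3 module rather than Redshift, the lookup table is renamed to a hypothetical known_devices (since "table" is a reserved word), and the rows are invented, so treat it as an illustration of the query shape and not as a performance test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_values (contributor_user_id INTEGER, device_os TEXT);
    CREATE TABLE known_devices (device_os TEXT);  -- stands in for "table"
    INSERT INTO all_values VALUES (1, 'ios'), (2, 'android'), (3, 'kaios');
    INSERT INTO known_devices VALUES ('ios'), ('android'), (NULL);
""")

# Original shape: NOT IN needs the IS NOT NULL guard, because a NULL
# in the list would make the predicate unknown for every row.
not_in = conn.execute("""
    SELECT contributor_user_id FROM all_values
    WHERE device_os NOT IN (
        SELECT device_os FROM known_devices WHERE device_os IS NOT NULL)
    ORDER BY contributor_user_id
""").fetchall()

# Rewritten shape: a correlated NOT EXISTS, which engines can usually
# plan as an anti-join. No NULL guard is needed here.
not_exists = conn.execute("""
    SELECT a.contributor_user_id FROM all_values AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM known_devices AS k WHERE k.device_os = a.device_os)
    ORDER BY a.contributor_user_id
""").fetchall()

print(not_in, not_exists)
```

Both queries return the same rows on this sample; whether the second is actually faster on your cluster is something only the EXPLAIN plan can tell you.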

Is it possible to query all records in Cassandra based on a condition?

I have a table which contains a list of user records. I need to query all records based on some condition. The use case is that I have about 30 million records in the user table and my condition would match about 3 million.
I have gone through https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause; however, I couldn't find any real solution.
Is it even possible to query a Cassandra table based on a condition?
I need to query and paginate the records just like in a traditional RDBMS or document store.
Cassandra has a concept of paging where you can specify the fetch size and then iterate over the results page by page. The code below applies if you are using the DataStax Java Driver to query data, but drivers for other languages should have something similar.
final int RESULTS_PER_PAGE = 100;
Statement st = new SimpleStatement("your query");
st.setFetchSize(RESULTS_PER_PAGE);
String requestedPage = extractPagingStateStringFromURL();
// This will be absent for the first page
if (requestedPage != null) {
    st.setPagingState(
        PagingState.fromString(requestedPage));
}
ResultSet rs = session.execute(st);
PagingState nextPage = rs.getExecutionInfo().getPagingState();
// Note that we don't rely on RESULTS_PER_PAGE, since Cassandra might
// have not respected it, or we might be at the end of the result set
int remaining = rs.getAvailableWithoutFetching();
for (Row row : rs) {
    renderInResponse(row);
    if (--remaining == 0) {
        break;
    }
}
// This will be null if there are no more pages
if (nextPage != null) {
    renderNextPageLink(nextPage.toString());
}
More details can be found in the DataStax Java Driver documentation on paging.

SqlAlchemy: Join Two Tables With Not Equal Condition

I have two tables (Loan_Contract and Loan_Amend) that share the column LoanID. I want to get all rows from Loan_Contract only if they do not exist in Loan_Amend.
So I tried my query as below:
db.session.query(
    Loan_Contract.ID,
    Loan_Contract.Currency,
    Loan_Contract.DisbursedAmount
).\
    join(Loan_Amend, Loan_Amend.LoanID != Loan_Contract.ID).\
    all()
And
db.session.query(
    Loan_Contract.ID,
    Loan_Contract.Currency,
    Loan_Contract.DisbursedAmount
).\
    join(Loan_Amend, Loan_Amend.LoanID == Loan_Contract.ID).\
    filter(Loan_Contract.ID != Loan_Amend.LoanID).\
    all()
However, both queries above returned all records from Loan_Contract even when the LoanID exists in Loan_Amend.
What is the correct way to achieve the expected result? Thanks.
To get all Loan_Contract rows that don't have any Loan_Amend referring to it, you need to use a LEFT JOIN:
SELECT * FROM Loan_Contract LEFT JOIN Loan_Amend ON Loan_Contract.ID = Loan_Amend.LoanID
WHERE Loan_Amend.LoanID IS NULL;
Using SQLAlchemy:
session.query(Loan_Contract) \
.outerjoin(Loan_Amend, Loan_Contract.ID == Loan_Amend.LoanID) \
.filter(Loan_Amend.LoanID.is_(None))
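To see why the != join misbehaves and the LEFT JOIN does not, here is a small sketch that runs the raw SQL on Python's built-in sqlite3 module. The sample rows and column types are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Loan_Contract (ID INTEGER PRIMARY KEY, Currency TEXT,
                                DisbursedAmount REAL);
    CREATE TABLE Loan_Amend (LoanID INTEGER);
    INSERT INTO Loan_Contract VALUES
        (1, 'USD', 100.0), (2, 'EUR', 250.0), (3, 'USD', 75.0);
    INSERT INTO Loan_Amend VALUES (2), (3);  -- loans 2 and 3 were amended
""")

# Joining on != pairs each contract with every amendment of a *different*
# loan, so an amended contract still shows up as long as any other loan
# has an amendment.
inequality_ids = conn.execute("""
    SELECT DISTINCT c.ID
    FROM Loan_Contract AS c
    JOIN Loan_Amend AS a ON a.LoanID != c.ID
    ORDER BY c.ID
""").fetchall()

# Anti-join: keep only contracts for which the LEFT JOIN found no amendment.
anti_join = conn.execute("""
    SELECT c.ID, c.Currency, c.DisbursedAmount
    FROM Loan_Contract AS c
    LEFT JOIN Loan_Amend AS a ON c.ID = a.LoanID
    WHERE a.LoanID IS NULL
    ORDER BY c.ID
""").fetchall()

print(inequality_ids)  # every contract, amended or not
print(anti_join)       # only the contract that was never amended
```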

MapReduce to replicate self join

With a traditional database, I can use joins to find, say, a list of users who visited 'pageA' but not 'pageB'.
Here's how I'm doing it:
Table Schema:
t_user_actions {
user_id,
action,
page
}
Sample Data:
user_id, action, page
111, visit, pageX
222, visit, pageA
222, visit, pageB
333, visit, pageA
I can write this SQL to find the list of all users who visited pageA but not pageB:
SELECT DISTINCT u1.user_id AS user_id
FROM t_user_actions u1
LEFT JOIN t_user_actions u2 ON u2.user_id = u1.user_id AND u2.page = 'pageB'
WHERE u1.page = 'pageA' AND u2.user_id IS NULL
How do I achieve the same with MapReduce if I'm working on a large data set assuming I can import/insert the raw data into some NOSQL db?
I notice there are ways to do union and intersection, but I'm trying to figure out how to do a relative complement (difference) of tuples.
Depending on which database you are actually using there might be much better ways to do this than MapReduce. But you asked specifically for MapReduce, so...
The Map phase would check all documents for action == "visit" && (page == "pageA" || page == "pageB"). When this is true, it would emit a document with the user_id as key and page as value.
The Reduce phase would iterate all values it receives per user. When there is at least one value with "pageB" it returns "pageB", otherwise it returns "pageA".
When you examine the result set, ignore all returned values with page == "pageB". Those users visited pageB at least once (but not necessarily pageA as well). Those with page == "pageA" are the ones you are searching for: the users who visited pageA but never pageB.
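The three steps above can be sketched in plain Python to check the logic before porting it to a real MapReduce engine. This is a single-process simulation, not any particular database's MapReduce API; the function names and the shuffle helper are assumptions for illustration, and the records are the sample data from the question.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (user_id, page) for visits to pageA or pageB only."""
    for user_id, action, page in records:
        if action == "visit" and page in ("pageA", "pageB"):
            yield user_id, page


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(user_id, pages):
    """Return "pageB" if the user ever visited pageB, otherwise "pageA"."""
    return "pageB" if "pageB" in pages else "pageA"


records = [
    (111, "visit", "pageX"),
    (222, "visit", "pageA"),
    (222, "visit", "pageB"),
    (333, "visit", "pageA"),
]

reduced = {
    user_id: reduce_phase(user_id, pages)
    for user_id, pages in shuffle(map_phase(records)).items()
}

# Final filter: keep users whose reduced value is "pageA".
only_page_a = sorted(uid for uid, page in reduced.items() if page == "pageA")
print(only_page_a)
```

The reduce function returns a value of the same kind it receives, so it still works if the engine re-reduces partial results.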