SQL update and join three tables based on rows in one table and not another

I have a bit of a complicated sql query I need to do, and I'm a bit stuck. I'm using SQLite if that changes anything.
I have the following table structure:
Table G
---------
G_id (primary key) | Other cols ...
====================================
21
22
23
24
25
26
27 (no g_to_s_map)
28
.
Table S
---------
S_id (primary key) | S_num | Other cols.....
====================================
1 1101
2 1102
3 1103
4 1104
5 1105
6 1106
7 1107 (no g_to_s_map, no s_to_t_map)
8 1108 (no g_to_s_map, there IS an s_to_t_map)
9 1109 (there is an g_to_s_map, but no s_to_t map)
.
Table T
---------
T_id (primary key) | Other cols...
==================================
1
2
Then I also have two mapping tables:
Table G_to_S_Map (1:1 mapping, unique values of both g_id and s_id)
----------
G_id (foreign key ref g)| S_id (foreign key ref s)
===================================================
21 1
22 2
23 3
24 4
25 5
26 6
28 9
.
Table S_to_T_Map (many:1 mapping, many unique s_id to a t_id)
----------
S_id (foreign key ref s) | T_id (foreign key ref t)
===================================================
1 1
2 1
3 1
4 2
5 2
6 2
8 2
Given only a T_id and a G_id, I need to update the G_to_S_Map with the first S_id that corresponds to the specified T_id (in the S_to_T_Map) and is NOT already in the G_to_S_Map.
The first thing I was thinking of was just getting any S_id's that corresponded to the T_id in the S_to_T_Map:
SELECT S_id FROM S_to_T_Map where T_id = GIVEN_T_ID;
Then presumably I would join those values somehow with the G_to_S_Map using a left/right join maybe, and then look for the first value which doesn't exist on one of the sides? Then I'd need to do an insert into the G_to_S_Map based on that S_id and the GIVEN_G_ID value or something.
Any suggestions on how to go about this? Thanks!
Edit: Added some dummy data:

I believe this should work:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
LEFT JOIN G_To_S_Map as b
ON b.S_id = a.S_id
AND b.G_id = :inputGId
WHERE a.T_id = :inputTId
AND b.G_id IS NULL
ORDER BY a.S_id
LIMIT 1);
EDIT:
If you're wanting to do the order by a different table, use this version:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
JOIN S as b
ON b.S_id = a.S_id
LEFT JOIN G_To_S_Map as c
ON c.S_id = a.S_id
AND c.G_id = :inputGId
WHERE a.T_id = :inputTId
AND c.G_id IS NULL
ORDER BY b.S_num
LIMIT 1);
(As an aside, I really hope your tables aren't actually named like this, because that's a terrible thing to do. The use of Map, especially, should probably be avoided)
EDIT:
Here's some example test data. Have I missed something? Did I conceptualize the relationships incorrectly?
S_To_T_Map
================
S_ID T_ID
1 1
2 1
3 1
1 2
1 3
3 3
G_To_S_Map
==================
G_ID S_ID
1 1
3 1
2 1
3 2
2 3
3 3
Resulting joined data:
(CTEs used to generate cross-join test data)
Results:
=============================
G_TEST T_TEST S_ID
1 1 3
2 1 2
1 3 3
EDIT:
Ah, okay, now I get the problem. My issue was that I was assuming there was some sort of many-one relationship between S and G. As this is not the case, use this amended statement:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
JOIN S as b
ON b.S_id = a.S_id
LEFT JOIN G_To_S_Map as c
ON c.S_id = a.S_id
OR c.G_id = :inputGId
WHERE a.T_id = :inputTId
AND c.G_id IS NULL
ORDER BY b.S_num
LIMIT 1);
Specifically, the line checking G_To_S_Map for a row containing the G_Id needed to be switched from AND to OR. The key requirement, which had not been specified previously, was that both G_Id and S_Id are unique in G_To_S_Map.
This statement will not insert a line if either the provided G_Id has been mapped previously, or if all S_Ids mapped to the given T_Id have been mapped.
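As a rough check of the amended statement, here is a minimal sketch that runs it against the question's sample data using Python's built-in sqlite3 module. The helper names (build_db, assign) and the UNIQUE constraints on the mapping table are assumptions made for illustration, and the SELECT is written without the surrounding parentheses, which is the INSERT ... SELECT form SQLite documents.

```python
import sqlite3

def build_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE S (S_id INTEGER PRIMARY KEY, S_num INTEGER);
        -- Assumption: the 1:1 mapping is enforced with UNIQUE constraints.
        CREATE TABLE G_To_S_Map (G_id INTEGER UNIQUE, S_id INTEGER UNIQUE);
        CREATE TABLE S_To_T_Map (S_id INTEGER UNIQUE, T_id INTEGER);
    """)
    conn.executemany("INSERT INTO S VALUES (?, ?)",
                     [(i, 1100 + i) for i in range(1, 10)])
    conn.executemany("INSERT INTO G_To_S_Map VALUES (?, ?)",
                     [(21, 1), (22, 2), (23, 3), (24, 4), (25, 5), (26, 6), (28, 9)])
    conn.executemany("INSERT INTO S_To_T_Map VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (8, 2)])
    return conn

ASSIGN_SQL = """
INSERT INTO G_To_S_Map (G_id, S_id)
SELECT :inputGId, a.S_id
FROM S_To_T_Map AS a
JOIN S AS b ON b.S_id = a.S_id
LEFT JOIN G_To_S_Map AS c
       ON c.S_id = a.S_id OR c.G_id = :inputGId
WHERE a.T_id = :inputTId
  AND c.G_id IS NULL
ORDER BY b.S_num
LIMIT 1
"""

def assign(conn, g_id, t_id):
    """Run the insert, then return the S_id now mapped to g_id (None if unmapped)."""
    conn.execute(ASSIGN_SQL, {"inputGId": g_id, "inputTId": t_id})
    row = conn.execute("SELECT S_id FROM G_To_S_Map WHERE G_id = ?",
                       (g_id,)).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = build_db()
    print(assign(conn, 27, 2))  # G 27 is unmapped; S 8 is the only free S for T 2
    print(assign(conn, 29, 1))  # every S for T 1 is already mapped
```

With the sample data, G_id 27 and T_id 2 should pick up S_id 8, while a G_id that is already mapped, or a T_id whose S rows are all taken, leaves the table unchanged.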

Hmm, the following seems to work nicely, although I haven't combined an "insert" with it yet.
Select s.S_ID from S as s
inner join(
Select st.S_ID from s_to_t_map as st
where st.T_ID=???? AND not exists
(Select * from g_to_s_Map as gs where gs.S_ID = st.S_ID)
) rslt on s.S_ID=rslt.S_ID ORDER BY s.s_Num ASC limit 1;
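Combining that select with an insert could look roughly like the sketch below, again using Python's sqlite3 module. The function name assign_first_free, the UNIQUE constraints, and the extra NOT EXISTS guard on G_ID are assumptions added for illustration; only the rows relevant to T_ID 2 are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (S_ID INTEGER PRIMARY KEY, s_Num INTEGER);
    CREATE TABLE g_to_s_Map (G_ID INTEGER UNIQUE, S_ID INTEGER UNIQUE);
    CREATE TABLE s_to_t_map (S_ID INTEGER, T_ID INTEGER);
    INSERT INTO S VALUES (4, 1104), (5, 1105), (6, 1106), (8, 1108);
    INSERT INTO g_to_s_Map VALUES (24, 4), (25, 5), (26, 6);
    INSERT INTO s_to_t_map VALUES (4, 2), (5, 2), (6, 2), (8, 2);
""")

INSERT_SQL = """
INSERT INTO g_to_s_Map (G_ID, S_ID)
SELECT :g_id, s.S_ID
FROM S AS s
INNER JOIN (
    SELECT st.S_ID FROM s_to_t_map AS st
    WHERE st.T_ID = :t_id
      AND NOT EXISTS (SELECT 1 FROM g_to_s_Map AS gs WHERE gs.S_ID = st.S_ID)
) AS rslt ON s.S_ID = rslt.S_ID
-- Assumed guard: skip the insert when this G_ID already has a mapping.
WHERE NOT EXISTS (SELECT 1 FROM g_to_s_Map AS gs WHERE gs.G_ID = :g_id)
ORDER BY s.s_Num ASC
LIMIT 1
"""

def assign_first_free(conn, g_id, t_id):
    """Insert a mapping if possible; return the S_ID mapped to g_id, or None."""
    conn.execute(INSERT_SQL, {"g_id": g_id, "t_id": t_id})
    row = conn.execute("SELECT S_ID FROM g_to_s_Map WHERE G_ID = ?",
                       (g_id,)).fetchone()
    return row[0] if row else None
```

Without the guard on G_ID, a second call for the same G_ID would try to insert a duplicate and, under the assumed UNIQUE constraint, raise an integrity error instead of doing nothing.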

Related

How to write a foreach loop statement in SAS?

I'm working in SAS as a novice. I have two datasets:
Dataset1
Unique ID | ColumnA
===================
1         | 15
1         | 39
2         | 20
3         | 10
Dataset2
Unique ID | ColumnB
===================
1         | 40
2         | 55
2         | 10
For each UniqueID, I want to subtract each value of ColumnA from all values of ColumnB, and I would like to create a NewColumn that is 1 whenever 1 < ColumnB - ColumnA < 30. For the first row of Dataset1, where UniqueID = 1, I want SAS to go through all the rows in Dataset2 that also have UniqueID = 1 and determine whether any of them gives a difference between ColumnB and ColumnA that is greater than 1 and less than 30. For the first row of Dataset1, NewColumn should be assigned a value of 1 because 40 - 15 = 25. For the second row of Dataset1, NewColumn should be assigned a value of 0 because 40 - 39 = 1 (which is not greater than 1). For the third row of Dataset1, I again want SAS to go through every row of ColumnB in Dataset2 that has the same UniqueID as in Dataset1, so 55 - 20 = 35 (which is greater than 30), but NewColumn would still be assigned a value of 1 because (moving to row 3 of Dataset2, which has UniqueID = 2) 20 - 10 = 10, which satisfies the if statement.
So I want my output to be:
Unique ID | ColumnA | NewColumn
===============================
1         | 15      | 1
1         | 30      | 0
2         | 20      | 1
I have tried concatenating Dataset1 and Dataset2 into a FullDataset. Then I tried using a do loop statement but I can't figure out how to do the loop for each value of UniqueID. I tried using BY but that of course produces an error because that is only used for increments.
DATA FullDataset;
set Dataset1 Dataset2; /*Concatenate datasets*/
do i=ColumnB-ColumnA by UniqueID;
if 1<ColumnB-ColumnA<30 then NewColumn=1;
output;
end;
RUN;
I know I'm probably way off but any help would be appreciated. Thank you!

So, the way that answers your question most directly is the keyed set. This isn't necessarily how I'd do this, but it is fairly simple to understand (as opposed to a hash table, which is what I'd use, or a SQL join, probably what most people would use). This does exactly what you say: grabs a row of A, says for each matching row of B check a condition. It requires having an index on the datasets (well, at least on the B dataset).
data colA(index=(id));
input ID ColumnA;
datalines;
1 15
1 39
2 20
3 10
;;;;
data colB(index=(id));
input ID ColumnB;
datalines;
1 40
2 55
2 30
;;;;
run;
data want;
*base: the colA dataset - you want to iterate through that once per row;
set colA;
*now, loop while the check variable shows 0 (match found);
do while (_iorc_ = 0);
*bring in other dataset using ID as key;
set colB key=ID ;
* check to see if it matches your requirement, and also only check when _IORC_ is 0;
if _IORC_ eq 0 and 1 lt ColumnB-ColumnA lt 30 then result=1;
* This is just to show you what is going on, can remove;
put _all_;
end;
*reset things for next pass;
_ERROR_=0;
_IORC_=0;
run;
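For comparison, the SQL-style approach mentioned above can be sketched outside SAS. The example below is only an illustration of the logic, not SAS code: it loads the same colA and colB rows into an in-memory SQLite database through Python's sqlite3 module and uses a correlated EXISTS subquery to flag each row of colA. Ordering by rowid is an assumption used to keep the rows in input order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colA (ID INTEGER, ColumnA INTEGER);
    CREATE TABLE colB (ID INTEGER, ColumnB INTEGER);
    INSERT INTO colA VALUES (1, 15), (1, 39), (2, 20), (3, 10);
    INSERT INTO colB VALUES (1, 40), (2, 55), (2, 30);
""")

# For each row of colA, result is 1 when at least one colB row with the same ID
# satisfies 1 < ColumnB - ColumnA < 30, and 0 otherwise (including no colB rows).
rows = conn.execute("""
    SELECT a.ID, a.ColumnA,
           EXISTS (SELECT 1 FROM colB AS b
                   WHERE b.ID = a.ID
                     AND b.ColumnB - a.ColumnA > 1
                     AND b.ColumnB - a.ColumnA < 30) AS result
    FROM colA AS a
    ORDER BY a.rowid
""").fetchall()

for row in rows:
    print(row)
```

Unlike the keyed-set step, this returns an explicit 0 rather than a missing value when no row of colB qualifies.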

Replace a row value with previous by group in SAS

Is there a way I could replace a row's value with the value from an earlier row within each group?
Below are the before and after data sets. For each customer, the Product on the type C row needs to be changed to the Product of the type L row that has the same LINK_ID and the highest amount.
Before
Obs | Cust | LINK_ID | Type | Product | Amount
1   | 1    | 12432   | L    | A       | 23
2   | 1    | 12432   | C    | B       | 0
3   | 2    | 23213   | L    | C       | 234
4   | 2    | 23145   | L    | D       | 25
5   | 2    | 23145   | C    | E       | 0
6   | 3    | 21311   | L    | F       | 34
7   | 3    | 21324   | L    | G       | 45
8   | 3    | 21324   | L    | H       | 35
9   | 3    | 21324   | C    | I       | 0
After
Cust | LINK_ID | Type | Product | Amount
1    | 12432   | L    | A       | 234
1    | 12432   | C    | A       | -
2    | 23213   | L    | C       | 23,212
2    | 23145   | L    | D       | 335
2    | 23145   | C    | D       | -
3    | 21311   | L    | F       | 323
3    | 21324   | L    | G       | 2,344
3    | 21324   | L    | H       | 34
3    | 21324   | C    | G       | -
Thank you!

If I understand correctly, you want the Product value for the C type to be the product associated with the highest amount among the L types. If so, one possible way is the following. First, the product with the highest amount for type L within each group of customer and LINK_ID is calculated as follows.
Note that the original dataset is assumed to be named "example".
proc sql;
create table L_Type as
select cust, LINK_ID, product, amount
from example
where type = 'L'
group by cust, LINK_ID
/* summary functions are not allowed in WHERE, so the max is tested in HAVING */
having amount = max(amount)
;
quit;
Then the product calculated above is assigned to the C type rows of the original example; all other rows keep their own product.
proc sql;
select
e.cust
, e.LINK_ID
, e.type
, case when e.type = 'C' then b.product else e.product end as product
, e.amount
from example e left join L_Type b
on e.cust = b.cust and e.LINK_ID = b.LINK_ID
;
quit;
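The same idea can be checked quickly outside SAS. The sketch below is an illustration only: it loads the "Before" rows into an in-memory SQLite database through Python's sqlite3 module and looks up, for each type C row, the product of the highest-amount type L row with the same cust and LINK_ID. The obs column and the COALESCE fallback (keep the original product when no L row exists) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE example
                (obs INTEGER, cust INTEGER, LINK_ID INTEGER,
                 type TEXT, product TEXT, amount INTEGER)""")
conn.executemany("INSERT INTO example VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 12432, "L", "A", 23), (2, 1, 12432, "C", "B", 0),
    (3, 2, 23213, "L", "C", 234), (4, 2, 23145, "L", "D", 25),
    (5, 2, 23145, "C", "E", 0), (6, 3, 21311, "L", "F", 34),
    (7, 3, 21324, "L", "G", 45), (8, 3, 21324, "L", "H", 35),
    (9, 3, 21324, "C", "I", 0),
])

rows = conn.execute("""
    SELECT e.cust, e.LINK_ID, e.type,
           CASE WHEN e.type = 'C' THEN
                -- product of the highest-amount L row in the same group
                COALESCE((SELECT l.product FROM example AS l
                          WHERE l.cust = e.cust AND l.LINK_ID = e.LINK_ID
                            AND l.type = 'L'
                          ORDER BY l.amount DESC LIMIT 1),
                         e.product)
           ELSE e.product END AS product,
           e.amount
    FROM example AS e
    ORDER BY e.obs
""").fetchall()

products = [r[3] for r in rows]
print(products)
```

If two L rows tie for the highest amount, LIMIT 1 picks one of them arbitrarily, so a tie-breaking rule would still need to be chosen.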

So you have a couple of processing tasks to do. Have you considered all the edge cases?
For a customer, find the row(s) with the maximum amount.
  Is one of them type L?
    No: do nothing.
    Yes: track the Product and LinkId as follows.
      Is there more than one 'maximal' row?
        No: track the Product and LinkId from the one row.
        Yes: is there more than one Product in the rows?
          No: track the Product value.
            Is there more than one LinkId?
              No: track the LinkId.
              Yes: which LinkIds?
                Track all the different LinkIds.
                Track one of these: first, lowest, highest, last LinkId.
          Yes: now what?
            Log an error?
            Track one of the Product values because only one can be used. Which one?
              First occurring?
              Lowest value?
              Highest value?
              Last occurring?
For the tracked LinkIds (there might not be any), apply the tracked Product to the rows that are type C (or perhaps type not L).

I want to know how to execute CONNECT BY REGEXP in Google Big query

I have the following statement in Oracle SQL that I want to run in Google BigQuery:
CONNECT BY REGEXP_SUBSTR(VALUE, '[^,]+', 1, LEVEL) IS NOT NULL)
How can I run the above code in BigQuery?

I am guessing here, but this construct is usually used for so-called string decomposition.
In BigQuery you can use SPLIT(value) or REGEXP_EXTRACT_ALL(value, r'[^,]+') for this, as in the examples below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, SPLIT(value) value
FROM `project.dataset.table`
or
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, REGEXP_EXTRACT_ALL(value, r'[^,]+') value
FROM `project.dataset.table`
Both queries above return:
Row id value
1 1 1
2
3
4
5
6
7
2 2 a
b
c
d
As you can see, the value in each row is split into an array of elements, but the elements stay in the same row.
To flatten the result, you can additionally use UNNEST(), as in the examples below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, value
FROM `project.dataset.table`,
UNNEST(SPLIT(value)) value
or
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, value
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(value, r'[^,]+')) value
Both return the result below (with each extracted element in a separate row):
Row id value
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7
8 2 a
9 2 b
10 2 c
11 2 d
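The two decompositions only agree when no element is empty. The sketch below is plain Python, not BigQuery, and is meant purely as an analogue: str.split stands in for SPLIT and re.findall with the same pattern stands in for REGEXP_EXTRACT_ALL. The flatten helper is a hypothetical name playing the role of UNNEST.

```python
import re

rows = [(1, "1,2,3,4,5,6,7"), (2, "a,b,c,d")]

def flatten(rows, splitter):
    """Emit one (id, element) pair per extracted element, like UNNEST does."""
    return [(row_id, item) for row_id, value in rows for item in splitter(value)]

by_split = flatten(rows, lambda v: v.split(","))
by_regex = flatten(rows, lambda v: re.findall(r"[^,]+", v))

print(by_split == by_regex)         # identical for the sample data

# With an empty element the two approaches differ:
print("a,,b".split(","))            # keeps the empty string between the commas
print(re.findall(r"[^,]+", "a,,b")) # skips it, because + needs at least one character
```

If the source strings can contain consecutive delimiters, it is worth checking which of the two behaviours the original Oracle query relied on before choosing between SPLIT and REGEXP_EXTRACT_ALL.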

How to combine two queries and use results of first one as output for the second

Tables
I have the following tables:
entite
ID | NAME
1  | x1
2  | x2
3  | x3
var_entite
ID_entite | id_var
1         | 1
1         | 2
1         | 3
2         | 1
2         | 4
3         | 2
3         | 5
variation
ID | NAME
1  | y1
2  | y2
3  | y3
4  | y4
5  | y5
Schema
CREATE TABLE ENTITE (ID PRIMARY KEY NOT NULL, NAME STRING);
CREATE TABLE VAR_ENTITE(ID_ENTITE INTEGER NOT NULL,
ID_VARIATION INTEGER NOT NULL,
FOREIGN KEY (ID_ENTITE) REFERENCES ENTITE(ID),
FOREIGN KEY (ID_VARIATION) REFERENCES VARIATIONS(ID));
CREATE TABLE VARIATIONS (ID PRIMARY KEY NOT NULL, NAME STRING);
Problem
I am using sqlite3. The ? placeholders represent the input variables.
1- For every entity x in table entite, we want to select its variations y
SELECT e.id, e.name, v.name FROM var_entite ve
JOIN entite as e ON e.id = ve.id_entite
JOIN variations as v ON v.id = ve.id_variation;
2- For every entity selected in the previous query, we select all the other entities having the same variations.
SELECT e.id, e.name, v.name FROM var_entite ve
JOIN entite as e ON e.id = ve.id_entite
JOIN variations as v ON v.id = ve.id_variation
where e.id <> ? and v.name = ?;
e.id <> ? excludes the entity from query 1, so only the other entities are returned
v.name = ? keeps an entity only if it has the same variation as one returned in query 1
My questions are the following:
How can we combine these 2 queries in order to get the same results?
If we combine the queries will the querying become faster?
Example
Query 1:
1,x1,y1
1,x1,y2
1,x1,y3
2,x2,y1
2,x2,y4
3,x3,y2
3,x3,y5
Query 2 outputs for entity with id = 1
2,x2,y1
3,x3,y2
Query 2 output for entity with id = 2
1,x1,y1
etc.

The results of queries can be reused by using subqueries.
In this case, the subquery does not need any names, so it does not need to do any joins:
SELECT e.id, e.name, v.name
FROM var_entite ve
JOIN entite as e ON e.id = ve.id_entite
JOIN variations as v ON v.id = ve.id_variation
WHERE e.id <> ?
AND v.id IN (SELECT id_variation
FROM var_entite
WHERE id_entite = ?)
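As a quick check, the combined query can be run against the question's sample rows with Python's sqlite3 module. This is a minimal sketch: the function name others_sharing_variations and the ORDER BY clause (added only to make the output order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entite (id INTEGER PRIMARY KEY NOT NULL, name TEXT);
    CREATE TABLE variations (id INTEGER PRIMARY KEY NOT NULL, name TEXT);
    CREATE TABLE var_entite (
        id_entite    INTEGER NOT NULL REFERENCES entite(id),
        id_variation INTEGER NOT NULL REFERENCES variations(id));
    INSERT INTO entite VALUES (1, 'x1'), (2, 'x2'), (3, 'x3');
    INSERT INTO variations VALUES (1, 'y1'), (2, 'y2'), (3, 'y3'), (4, 'y4'), (5, 'y5');
    INSERT INTO var_entite VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 2), (3, 5);
""")

def others_sharing_variations(conn, entity_id):
    """Return (id, name, variation) for other entities sharing a variation with entity_id."""
    return conn.execute("""
        SELECT e.id, e.name, v.name
        FROM var_entite ve
        JOIN entite AS e ON e.id = ve.id_entite
        JOIN variations AS v ON v.id = ve.id_variation
        WHERE e.id <> ?
          AND v.id IN (SELECT id_variation
                       FROM var_entite
                       WHERE id_entite = ?)
        ORDER BY e.id, v.id
    """, (entity_id, entity_id)).fetchall()

print(others_sharing_variations(conn, 1))
print(others_sharing_variations(conn, 2))
```

The same entity id is bound to both placeholders, so one call replaces running query 1 and then query 2 once per returned variation.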

Left join of tables with specific instructions

Would you kindly be able to assist me with writing SAS script for a specific type of left join as described below?
I’m looking to do a left join of Table A to Table B (given below). A full match on all identifying fields counts as a match, and so does a partial match (at least one field) where the remaining fields in Table B are missing/null. However, any partial or full match where at least one field is populated in Table B but null/missing in Table A is treated as a non-match.
Here’s an example of input tables [A and B] and output matching analysis/ results below:
TABLE - A
S/N COL_1 COL_2 COL_3 COL_4
-----------------------------------
1 A p ii
2 A
3 B r
TABLE - B
S/N COL_1 COL_2 COL_3 COL_4
-----------------------------------
1 A p ii
2 A q
3 A
4 A p 7 ii
5 B
6 B r n
OUTPUT/ MATCHING ANALYSIS
TABLE - A | TABLE - B | MATCH | NO MATCH
----------------------------------------
1         | 1         | Y     |
1         | 2         |       | N
1         | 3         | Y     |
1         | 4         |       | N
2         | 1         |       | N
2         | 2         |       | N
2         | 3         | Y     |
2         | 4         |       | N
3         | 5         | Y     |
3         | 6         |       | N

I've decided not to use a join, as there could be more than 4 columns to join on.
First, let's find the rows that are equal:
proc sql;
create table Equals as
select a.*,'Y' as Match, '' as No_Match from table_a as a
intersect
select b.*,'Y' as Match, '' as No_Match from table_b as b ;
quit;
Now, let's find the rows that are not equal:
proc sql;
create table Not_Equals as
select a.*,'' as Match, 'N' as No_Match from table_a as a
except
select b.*,'' as Match, 'N' as No_Match from table_b as b
union
select b.*,'' as Match, 'N' as No_Match from table_b as b
except
select a.*,'' as Match, 'N' as No_Match from table_a as a ;
quit;
and finally - let's merge the 2 data sets:
data All;
set Equals Not_Equals;
run;
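The matching rule from the question (a field that is missing in Table B is ignored, while a field populated in Table B must equal the same field in Table A) can also be written as a row-level predicate. The sketch below is plain Python rather than SAS, is_match is a hypothetical name, and the column positions of the sample values are an assumption, because the blank cells of the original tables did not survive.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[str]]

def is_match(a: Row, b: Row) -> bool:
    """True when every populated field of b equals a's field and at least one agrees."""
    agreed = 0
    for a_val, b_val in zip(a, b):
        if b_val is None:
            continue        # missing in Table B: ignored
        if a_val != b_val:
            return False    # populated in B but missing or different in A
        agreed += 1
    return agreed > 0

# Assumed layout of COL_1..COL_4 for a few rows of each table.
a1 = ("A", "p", None, "ii")
a2 = ("A", None, None, None)
b1 = ("A", "p", None, "ii")
b3 = ("A", None, None, None)
b4 = ("A", "p", "7", "ii")

print(is_match(a1, b1), is_match(a1, b3), is_match(a1, b4), is_match(a2, b1))
```

In SAS the same test would have to be expressed per column in the join condition, which is why the number of columns matters for how practical a join is.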
