Recursive CTE gives me duplicates of 3 rows

I am trying to find a group account and populate it with the aggregation of its child accounts' QTY and market value using a CTE and a recursive CTE. It gives me the correct result 3 times. Not sure what I am missing here.
Scenario:
Example
Composite account CMP_1 contains the following account memberships.
DIM_ACCOUNT_CONSTITUENT
PARENT_ACCT_CD CHILD_ACCT_CD
CMP_1 FND_A
CMP_1 FND_B
CMP_1 FND_C
The holdings for each account as of 11/13/2022 for all sources of data are shown below.
FCT_POSITION_SECURITY_LEVEL
SERVICE_ID POSITION_DATE ACCT_CD SEC_ID LONG_SHT_CD STRATEGY_ID QTY
1111 11/13/2022 FND_A 101 L ~NA~ 1000
1111 11/13/2022 FND_A 201 S ~NA~ 2000
1111 11/13/2022 FND_A 301 L ~NA~ 3000
1111 11/13/2022 FND_B 201 L ~NA~ 2000
1111 11/13/2022 FND_B 301 L ~NA~ 3000
1111 11/13/2022 FND_C 101 L ~NA~ 1000
1111 11/13/2022 FND_D 401 S ~NA~ 4000
2222 11/13/2022 FND_A 401 L ~NA~ 4000
2222 11/13/2022 FND_A 501 S ~NA~ 5000
2222 11/13/2022 FND_A 601 L ~NA~ 6000
2222 11/13/2022 FND_C 401 L ~NA~ 4000
2222 11/13/2022 FND_D 501 S ~NA~ 5000
When aggregation is applied, the following new data is created for the composite account. Notice the aggregation is based on the position business key POSITION_ID which is POSITION_DATE, ACCT_CD, SEC_ID, LONG_SHT_CD, and STRATEGY_ID. Not shown in this example is aggregation across any FCT_POSITION_SECURITY_LEVEL extension (_EXT) tables. Aggregation would work in the same way.
SERVICE_ID POSITION_DATE ACCT_CD SEC_ID LONG_SHT_CD STRATEGY_ID QTY
1111 11/13/2022 CMP_1 101 L ~NA~ 2000
1111 11/13/2022 CMP_1 201 L ~NA~ 2000
1111 11/13/2022 CMP_1 201 S ~NA~ 2000
1111 11/13/2022 CMP_1 301 L ~NA~ 6000
1111 11/13/2022 CMP_1 401 S ~NA~ 4000
2222 11/13/2022 CMP_1 401 L ~NA~ 8000
2222 11/13/2022 CMP_1 501 S ~NA~ 10000
2222 11/13/2022 CMP_1 601 L ~NA~ 6000
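The intended roll-up is a plain group-and-sum over the position business key. As a sanity check, here is an illustrative pandas sketch of that aggregation (not the OP's SQL), restricted to CMP_1's listed constituents FND_A/B/C and omitting POSITION_DATE and STRATEGY_ID, which are constant in the example:

```python
import pandas as pd

# Child memberships of the composite account (from DIM_ACCOUNT_CONSTITUENT)
children = {"FND_A", "FND_B", "FND_C"}

# Constituent positions (subset of FCT_POSITION_SECURITY_LEVEL above)
pos = pd.DataFrame(
    [(1111, "FND_A", 101, "L", 1000), (1111, "FND_A", 201, "S", 2000),
     (1111, "FND_A", 301, "L", 3000), (1111, "FND_B", 201, "L", 2000),
     (1111, "FND_B", 301, "L", 3000), (1111, "FND_C", 101, "L", 1000),
     (2222, "FND_A", 401, "L", 4000), (2222, "FND_A", 501, "S", 5000),
     (2222, "FND_A", 601, "L", 6000), (2222, "FND_C", 401, "L", 4000)],
    columns=["SERVICE_ID", "ACCT_CD", "SEC_ID", "LONG_SHT_CD", "QTY"],
)

# Roll the children up to the composite: group on the position business key
agg = (pos[pos["ACCT_CD"].isin(children)]
       .groupby(["SERVICE_ID", "SEC_ID", "LONG_SHT_CD"], as_index=False)["QTY"]
       .sum()
       .assign(ACCT_CD="CMP_1"))
print(agg)
```

For example, SEC_ID 101/L under service 1111 sums FND_A's 1000 and FND_C's 1000 to 2000, matching the expected composite row.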
Query:
WITH CTE AS (
    SELECT
        PS.SERVICE_ID,
        PS.POSITION_DATE,
        AC.PARENT_ACCT_CD AS ACCT_CD,
        PS.SEC_ID,
        PS.LONG_SHT_CD,
        PS.STRATEGY_ID,
        PS.QTY,
        PS.MKT_VAL
    FROM DIM_ACCOUNT_CONSTITUENT AC
    INNER JOIN FCT_POSITION_SECURITY_LEVEL PS
        ON AC.CHILD_ACCT_CD = PS.ACCT_CD
    WHERE AC.PARENT_ACCT_CD = 'CMP_1'
      AND PS.POSITION_DATE = CURRENT_DATE()
),
REC_CTE AS (
    SELECT
        SERVICE_ID, POSITION_DATE, ACCT_CD, SEC_ID,
        LONG_SHT_CD, STRATEGY_ID, QTY, MKT_VAL
    FROM CTE
    UNION ALL
    SELECT
        CTE.SERVICE_ID,
        CTE.POSITION_DATE,
        DIM_ACCOUNT_CONSTITUENT.PARENT_ACCT_CD AS ACCT_CD,
        CTE.SEC_ID,
        CTE.LONG_SHT_CD,
        CTE.STRATEGY_ID,
        CTE.QTY,
        CTE.MKT_VAL
    FROM CTE
    INNER JOIN DIM_ACCOUNT_CONSTITUENT
        ON CTE.ACCT_CD = DIM_ACCOUNT_CONSTITUENT.CHILD_ACCT_CD
    WHERE DIM_ACCOUNT_CONSTITUENT.PARENT_ACCT_CD <> 'CMP_1'
      AND CTE.POSITION_DATE = CURRENT_DATE()
)
SELECT *
FROM REC_CTE;

I think I found the issue: my dimension table contains duplicated (precursory) rows, so the join fans out. Adding a WHERE clause filter on the dimension fixes it.
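To illustrate the failure mode (assuming the duplicated rows are repeated parent/child pairs in the dimension): each duplicate membership row multiplies every joined fact row, and deduplicating or filtering the dimension side removes the repeats. A minimal sqlite3 sketch, with hypothetical table names mirroring the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE dim(parent TEXT, child TEXT)")
# The dimension accidentally stores the same membership three times
cur.executemany("INSERT INTO dim VALUES (?, ?)", [("CMP_1", "FND_A")] * 3)
cur.execute("CREATE TABLE fct(acct TEXT, qty INT)")
cur.execute("INSERT INTO fct VALUES ('FND_A', 1000)")

# The join fans out: one fact row becomes three result rows
rows_dup = cur.execute(
    "SELECT f.qty FROM dim d JOIN fct f ON f.acct = d.child").fetchall()

# Deduplicating the dimension side restores a single row
rows_ok = cur.execute("""
    SELECT f.qty
    FROM (SELECT DISTINCT parent, child FROM dim) d
    JOIN fct f ON f.acct = d.child
""").fetchall()
print(len(rows_dup), len(rows_ok))
```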


Minimum per subgroup in Stata

In Stata, I want to calculate the minimum and maximum for subgroups per country and year, with the result repeated in every observation.
Ultimately, I want the difference between min and max as a separate variable.
Here is an example for my dataset:
country year oranges type
USA    2021 100 1
USA    2021 200 0
USA    2021 900 0
USA    2022 500 1
USA    2022 300 0
Canada 2022 300 0
Canada 2022 400 1
The results should look like this:
country year oranges type min(type=1) max(type=0) distance
USA    2021 100 1 100 900  800
USA    2021 200 0 100 900  800
USA    2021 900 0 100 900  800
USA    2022 500 1 500 300 -200
USA    2022 300 0 500 300 -200
Canada 2022 300 0 400 300 -100
Canada 2022 400 1 400 300 -100
So far, I tried the following code:
bysort year country: egen smalloranges = min(oranges) if type == 1
bysort year country: egen bigoranges = max(oranges) if type == 0
gen distance = bigoranges - smalloranges
I would approach this directly, as follows:
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
egen min = min(cond(type == 1, oranges, .)), by(country year)
egen max = max(cond(type == 0, oranges, .)), by(country year)
gen wanted = max - min
list, sepby(country year)
+------------------------------------------------------+
| country year oranges type min max wanted |
|------------------------------------------------------|
1. | USA 2021 100 1 100 900 800 |
2. | USA 2021 200 0 100 900 800 |
3. | USA 2021 900 0 100 900 800 |
|------------------------------------------------------|
4. | USA 2022 500 1 500 300 -200 |
5. | USA 2022 300 0 500 300 -200 |
|------------------------------------------------------|
6. | Canada 2022 300 0 400 300 -100 |
7. | Canada 2022 400 1 400 300 -100 |
+------------------------------------------------------+
For more discussion, see Section 9 of https://www.stata-journal.com/article.html?article=dm0055
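For comparison, the cond() trick translates to pandas `where` plus a grouped transform; this is an illustrative sketch with the same toy data, not part of the original Stata answer:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["USA"] * 5 + ["Canada"] * 2,
    "year":    [2021, 2021, 2021, 2022, 2022, 2022, 2022],
    "oranges": [100, 200, 900, 500, 300, 300, 400],
    "type":    [1, 0, 0, 1, 0, 0, 1],
})

# where() plays the role of cond(): keep oranges only on qualifying rows,
# then take the group min/max and broadcast it back to every row
df["min"] = (df["oranges"].where(df["type"] == 1)
             .groupby([df["country"], df["year"]]).transform("min"))
df["max"] = (df["oranges"].where(df["type"] == 0)
             .groupby([df["country"], df["year"]]).transform("max"))
df["wanted"] = df["max"] - df["min"]
print(df)
```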
I am not sure if I understand the purpose of type 1 and 0, but this generates the exact result you describe in the tables. It might seem convoluted to create temporary files like this, but I think it modularizes the code into clean blocks.
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
tempfile min1 max0
* Get min values for type 1 in each country-year
preserve
keep if type == 1
collapse (min) min_type_1=oranges , by(country year)
save `min1'
restore
* Get max values for type 0 in each country-year
preserve
keep if type == 0
collapse (max) max_type_0=oranges , by(country year)
save `max0'
restore
* Merge the min and the max
merge m:1 country year using `min1', nogen
merge m:1 country year using `max0', nogen
* Calculate distance
gen distance = max_type_0 - min_type_1
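The tempfile/collapse/merge pattern above maps onto pandas groupby-aggregate plus merge; again an illustrative sketch with the same toy data, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["USA"] * 5 + ["Canada"] * 2,
    "year":    [2021, 2021, 2021, 2022, 2022, 2022, 2022],
    "oranges": [100, 200, 900, 500, 300, 300, 400],
    "type":    [1, 0, 0, 1, 0, 0, 1],
})

# collapse (min) min_type_1=oranges, by(country year), on type == 1 rows
min1 = (df[df["type"] == 1]
        .groupby(["country", "year"], as_index=False)["oranges"]
        .min().rename(columns={"oranges": "min_type_1"}))
# collapse (max) max_type_0=oranges, by(country year), on type == 0 rows
max0 = (df[df["type"] == 0]
        .groupby(["country", "year"], as_index=False)["oranges"]
        .max().rename(columns={"oranges": "max_type_0"}))

# m:1 merge the group-level values back onto the full data
out = df.merge(min1, on=["country", "year"]).merge(max0, on=["country", "year"])
out["distance"] = out["max_type_0"] - out["min_type_1"]
print(out)
```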

Operations with reference cells in PROC SQL?

I have this table, call it "pre_report":
initial_balance deposit withdrawal final_balance
1000 50  0 .
1000  0 25 .
1000 45  0 .
1000 30  0 .
1000  0 70 .
I want to write SAS code that updates the "final_balance" field (the "deposit" field adds to the balance and "withdrawal" subtracts) and at the same time carries the result forward into the "initial_balance" field of the next row, so that my desired output is this:
initial_balance deposit withdrawal final_balance
1000 50  0 1050
1050  0 25 1025
1025 45  0 1070
1070 30  0 1100
1100  0 70 1030
I tried this:
proc sql;
select initial_balance format=dollar32.2,
deposit format=dollar32.2,
withdrawal format=dollar32.2,
sum(initial_balance,deposit,-withdrawal) as final_balance,
calculated final_balance as initial_balance
from work.pre_report;
quit;
But it doesn't work properly. This code creates two fields, "final_balance" and "initial_balance", but both contain the same quantities.
code for creating "pre_report" table
data work.pre_report;
input initial_balance deposit withdrawal final_balance;
datalines;
1000 50 0 .
1000 0 25 .
1000 45 0 .
1000 30 0 .
1000 0 70 .
run;
I would really appreciate it if you could help me.
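PROC SQL has no notion of row order, so the carry-forward cannot be expressed there directly; in SAS this is normally done with a DATA step using RETAIN. The row-by-row logic being asked for can be sketched in Python (illustrative only, with the deposits and withdrawals hard-coded from the example):

```python
# assumed starting balance taken from the first row of pre_report
start = 1000
deposits    = [50, 0, 45, 30, 0]
withdrawals = [0, 25, 0, 0, 70]

initial, final = [], []
bal = start
for dep, wd in zip(deposits, withdrawals):
    initial.append(bal)        # this row's initial_balance is the running balance
    bal = bal + dep - wd       # deposit adds, withdrawal subtracts
    final.append(bal)          # this row's final_balance seeds the next row

print(initial)  # [1000, 1050, 1025, 1070, 1100]
print(final)    # [1050, 1025, 1070, 1100, 1030]
```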

DISTINCTCOUNT with multiple criteria in Power BI

I have two tables: Data and Report.
Data table: it has three columns, Item, Qty, and Order. The Item column contains both text and numeric values; Qty and Order are numeric (Qty is currently stored as text, see the note below).
The Item column repeats across orders, and the same item can carry two different Qty values depending on the Order column.
Report table: it has a unique Item column.
The Data and Report tables look like this:
Data
ITEM QTY ORDER
123 200 1
123 210 0
5678 220 1
5678 230 0
5555 240 1
6666 250 1
9876 260 1
2345 270 1
901 280 1
901 280 1
902 300 1
902 300 1
123456 200 1
123456 200 1
123456 210 1
123456 210 1
123456 0 1
567 200 1
567 210 1
567 210 1
567 0 1
453 5000 1
453 5000 1
453 5000 1
453 5000 1
112 5000 1
112 5000 1
112 5000 1
112 5000 1
116 5000 1
116 5001 1
116 0 1
116 0 1
116 5000 0
116 5001 0
116 0 0
116 0 0
Report
ITEM DESIRED RESULT (QTY)
123 200
5678 220
5555 240
6666 250
9876 260
2345 270
901 280
902 300
123456 MIXED
567 MIXED
4444 NA
12 NA
10 NA
453 5000
112 5000
116 MIXED
Desired Result
I would like to pull the qty against the order “1” from the data table into the report table according to the item.
If the item is found in the Data table, return its Qty in the Report table. {Please refer to the Data and Report tables for items 123, 5678, etc.}
If an item is not found in the Data table, return "NA" in the Report table. {Please refer to the Data and Report tables for items 4444, 12, and 10.}
If the same item carries two different Qty values, return the text "Mixed" in the Report table. {Please refer to the Data and Report tables for items 123456, 116 & 567.}
Currently I am using the following calculated column:
CURRENT DAX FOR QTY = LOOKUPVALUE(DATA[QTY], DATA[ITEM], 'DESIRED RESULT'[ITEM], DATA[ORDER], 1, "NA")
It almost works, but it gives the wrong result "NA" where there are two different Qty values for the same item across orders (0,1), (1), or (0) {please refer to the Data and Report tables for items 123456, 116 & 567}; the desired result for those three items is "Mixed".
Note: I converted the Qty column from number to text, otherwise it gives an error. Is there any alternative option to achieve my result?
Herewith attached the PBI file for your reference https://www.dropbox.com/s/hf40q27pvn3ij2g/DAX-LOOKUPVALUE%20FILTER%20BY.pbix?dl=0.
If I'm understanding correctly, this can be done with the method I suggested previously with the addition of a filter for DATA[ORDER] = 1.
IF (
CALCULATE ( DISTINCTCOUNT ( DATA[QTY] ), DATA[ORDER] = 1 ) > 1,
"MIXED",
CALCULATE ( SELECTEDVALUE ( DATA[QTY], "NA" ), DATA[ORDER] = 1 )
)
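To see why the DISTINCTCOUNT/SELECTEDVALUE test above yields "MIXED" and "NA", the same logic can be sketched in pandas on a subset of the data (illustrative only; item and column names mirror the question):

```python
import pandas as pd

data = pd.DataFrame({
    "ITEM":  ["123", "123", "5678", "5678", "123456", "123456", "123456"],
    "QTY":   [200, 210, 220, 230, 200, 210, 0],
    "ORDER": [1, 0, 1, 0, 1, 1, 1],
})
report_items = ["123", "5678", "123456", "4444"]

def qty_for(item):
    # distinct QTY values for this item where ORDER = 1
    qtys = data.loc[(data["ITEM"] == item) & (data["ORDER"] == 1), "QTY"].unique()
    if len(qtys) == 0:
        return "NA"        # item not present with ORDER = 1
    if len(qtys) > 1:
        return "MIXED"     # more than one distinct qty
    return qtys[0]         # exactly one distinct qty

result = {item: qty_for(item) for item in report_items}
print(result)
```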

Merging the two datasets

Dataset A:
Company_Name Match Sales EMPS
1234 0 0 0
1234 0 0 0
1234 0 0 0
5678 0 0 0
5678 0 0 0
5678 0 0 0
9123 9123 500 2
9123 9123 500 2
9123 9123 500 2
Dataset B:
Company_Name Match Sales EMPS
1234 1234 600 10
1234 1234 600 10
1234 1234 600 10
5678 5678 900 56
5678 5678 900 56
5678 5678 900 56
I am trying to merge the above tables using proc sql, and here is the desired output
Dataset A:
Company_Name Match Sales EMPS
1234 1234 600 10
1234 1234 600 10
1234 1234 600 10
5678 5678 900 56
5678 5678 900 56
5678 5678 900 56
9123 9123 500 2
9123 9123 500 2
9123 9123 500 2
However, when I try to do a join, it only takes the first table's values. I know I should use a CASE statement somewhere, but I'm not sure how. For example, since dataset B has values for company_name=1234, the final output should capture them; where B has no values, it should take the column values from the first table.
proc sql;
create table merge_table as
select a.*, b.*
from dataseta as a
inner join datasetb as b
on (a.company_name = b.company_name);
quit;
Use the COALESCE() function to code your preference for B values over A values.
create table merge_table as
select a.company_name
, coalesce(b.match,a.match) as match
, coalesce(b.sales,a.sales) as sales
, coalesce(b.EMPS,a.EMPS) as EMPS
from dataseta as a
inner join datasetb as b
on (a.company_name=b.company_name)
;
But your example has repeats for COMPANY_NAME in both datasets. How do you want to handle that? Currently it will match each of the three records from A for company 1234 with each of the three records from B for company 1234 and produce 9 records for that company in the result set. You need some other variable(s) in the join condition so that it performs a one-to-one match (or at least a one-to-many match) instead of the current many-to-many match.
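The COALESCE idea can be sketched in pandas (illustrative only; it uses a left join so unmatched rows from A survive, and one row per company for simplicity):

```python
import pandas as pd

a = pd.DataFrame({"company_name": ["1234", "9123"],
                  "match": [0, 9123], "sales": [0, 500], "emps": [0, 2]})
b = pd.DataFrame({"company_name": ["1234"],
                  "match": [1234], "sales": [600], "emps": [10]})

# left-join B onto A, then prefer B's value and fall back to A's (COALESCE)
m = a.merge(b, on="company_name", how="left", suffixes=("_a", "_b"))
for col in ["match", "sales", "emps"]:
    m[col] = m[f"{col}_b"].fillna(m[f"{col}_a"])
out = m[["company_name", "match", "sales", "emps"]]
print(out)
```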
Assuming same number of zero rows and non-zero rows to replace, consider a union query to stack non-zeros and other dataset:
proc sql;
create table merge_table as
select b.Company_Name, b.Match, b.Sales, b.EMPS
from datasetb as b
union
select a.Company_Name, a.Match, a.Sales, a.EMPS
from dataseta as a
where (a.Match + a.Sales + a.EMPS) ^= 0;
quit;
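The union approach can likewise be sketched in pandas with `concat` (illustrative only, one row per company):

```python
import pandas as pd

a = pd.DataFrame({"company_name": ["1234", "9123"],
                  "match": [0, 9123], "sales": [0, 500], "emps": [0, 2]})
b = pd.DataFrame({"company_name": ["1234"],
                  "match": [1234], "sales": [600], "emps": [10]})

# keep only the A rows that actually carry values (the WHERE clause),
# then stack them with B (the UNION of the two SELECTs)
nonzero_a = a[(a["match"] + a["sales"] + a["emps"]) != 0]
out = (pd.concat([b, nonzero_a], ignore_index=True)
       .drop_duplicates()
       .sort_values("company_name", ignore_index=True))
print(out)
```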

Populate df row value based on column header

Appreciate any help. Basically, I have a poorly structured data set and am trying to make it more useful.
Below is a representation
df = pd.DataFrame({'State': ("Texas","California","Florida"),
'Q1 Computer Sales': (100,200,300),
'Q1 Phone Sales': (400,500,600),
'Q1 Backpack Sales': (700,800,900),
'Q2 Computer Sales': (200,200,300),
'Q2 Phone Sales': (500,500,600),
'Q2 Backpack Sales': (800,800,900)})
I would like to have a df that creates separate columns for the Quarters and Sales for the respective state.
I think perhaps regex, str.contains, and loops?
IIUC, you can use:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(list(zip(*df_a.columns.str.split(' ', n=1))))
df_a.stack(0).reset_index()
Output:
State level_1 Backpack Sales Computer Sales Phone Sales
0 Texas Q1 700 100 400
1 Texas Q2 800 200 500
2 California Q1 800 200 500
3 California Q2 800 200 500
4 Florida Q1 900 300 600
5 Florida Q2 900 300 600
Or we can go further:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(list(zip(*df_a.columns.str.split(' ', n=1))), names=['Quarters','Items'])
df_a = df_a.stack(0).reset_index()
df_a['Quarters'] = df_a['Quarters'].str.extract(r'(\d+)')
print(df_a)
Output:
Items State Quarters Backpack Sales Computer Sales Phone Sales
0 Texas 1 700 100 400
1 Texas 2 800 200 500
2 California 1 800 200 500
3 California 2 800 200 500
4 Florida 1 900 300 600
5 Florida 2 900 300 600