Merging two datasets in SAS

Dataset A:
Company_Name Match Sales EMPS
1234 0 0 0
1234 0 0 0
1234 0 0 0
5678 0 0 0
5678 0 0 0
5678 0 0 0
9123 9123 500 2
9123 9123 500 2
9123 9123 500 2
Dataset B:
Company_Name Match Sales EMPS
1234 1234 600 10
1234 1234 600 10
1234 1234 600 10
5678 5678 900 56
5678 5678 900 56
5678 5678 900 56
I am trying to merge the above tables using PROC SQL, and here is the desired output:
Dataset A:
Company_Name Match Sales EMPS
1234 1234 600 10
1234 1234 600 10
1234 1234 600 10
5678 5678 900 56
5678 5678 900 56
5678 5678 900 56
9123 9123 500 2
9123 9123 500 2
9123 9123 500 2
However, when I try a join, it only takes the first table's values. I know I should use a CASE expression somewhere, but I'm not sure how. For example, since dataset B has values for Company_Name=1234, the final output should capture those values; where B has no values, the output should take the column values from the first table.
proc sql;
create table merge_table as
select a.*, b.* from dataseta as a inner join datasetb as b on (a.company_name=b.company_name);
quit;

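Use the COALESCE() function to code your preference for B values over A values. COALESCE returns its first non-missing argument, so B's value is used whenever a matching B row exists. Use a LEFT JOIN so that companies with no rows in B (such as 9123) keep their A values instead of being dropped, as an inner join would do.
proc sql;
create table merge_table as
select a.company_name
, coalesce(b.match,a.match) as match
, coalesce(b.sales,a.sales) as sales
, coalesce(b.EMPS,a.EMPS) as EMPS
from dataseta as a
left join datasetb as b
on (a.company_name=b.company_name)
;
quit;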
But your example has repeats for COMPANY_NAME in both datasets. How do you want to handle that? Currently each of the three records from A for company 1234 will match each of the three records from B for company 1234, producing 9 records for that company in the result set. You need some other variable(s) to include in the join condition so that it performs a 1-to-1 match (or at least a 1-to-N match) instead of the current N-to-M match.
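The following is a minimal Python sketch (not SAS) of what a left join with COALESCE does on the sample data. The in-memory lists and helper names are hypothetical. The sketch keeps company 9123, which an inner join would drop. It also shows the N-to-M problem: the three A rows for company 1234 each match the three B rows, giving 9 result rows.

```python
# Hypothetical in-memory copies of the example datasets: (company, match, sales, emps).
dataset_a = [(1234, 0, 0, 0)] * 3 + [(5678, 0, 0, 0)] * 3 + [(9123, 9123, 500, 2)] * 3
dataset_b = [(1234, 1234, 600, 10)] * 3 + [(5678, 5678, 900, 56)] * 3


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def left_join_prefer_b(a_rows, b_rows):
    """Join on company name, keeping every A row and preferring B's columns."""
    result = []
    for a in a_rows:
        matches = [b for b in b_rows if b[0] == a[0]]
        if not matches:
            # No B row for this company: treat B's columns as missing.
            matches = [(None, None, None, None)]
        for b in matches:
            merged_cols = tuple(coalesce(bv, av) for bv, av in zip(b[1:], a[1:]))
            result.append((a[0],) + merged_cols)
    return result


merged = left_join_prefer_b(dataset_a, dataset_b)
print(sum(1 for row in merged if row[0] == 1234))  # 3 A rows x 3 B rows = 9
print(sorted(set(merged)))
```

To get one output row per input row, the join needs an extra key, as the answer above says.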

Assuming each group of zero rows in A has the same number of replacement rows in B, consider a UNION ALL query that stacks dataset B with the non-zero rows of dataset A. A plain UNION would collapse the repeated rows into one.
proc sql;
create table merge_table as
select b.Company_Name, b.Match, b.Sales, b.EMPS
from datasetb as b
union all
select a.Company_Name, a.Match, a.Sales, a.EMPS
from dataseta as a
where (a.Match + a.Sales + a.EMPS) ^= 0;
quit;
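In SQL, UNION removes duplicate rows while UNION ALL keeps them. This matters here because the desired output repeats each row three times. The following small Python sketch shows the difference on the sample data; the lists are hypothetical stand-ins for the SAS datasets.

```python
# Hypothetical in-memory copies of the example datasets: (company, match, sales, emps).
dataset_a = [(1234, 0, 0, 0)] * 3 + [(5678, 0, 0, 0)] * 3 + [(9123, 9123, 500, 2)] * 3
dataset_b = [(1234, 1234, 600, 10)] * 3 + [(5678, 5678, 900, 56)] * 3

# Rows of A whose Match + Sales + EMPS is non-zero (the WHERE clause).
nonzero_a = [row for row in dataset_a if sum(row[1:]) != 0]

# UNION ALL: stack the two row sets, duplicates included.
union_all = dataset_b + nonzero_a

# UNION: same stack, but duplicate rows collapse (a set has no order, so sort it).
union_distinct = sorted(set(union_all))

print(len(union_all))       # 9
print(len(union_distinct))  # 3
```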


Flag every time an order's shipping date changes (DAX)

I have a table with orders, the articles belonging to those orders, and their shipping dates. I want to flag every row where the shipping date changes within an order, or (when all dates for an OrderID are the same) flag the order only once.
I tried calculated columns written in DAX, such as nextdate, prevdate, nextorder and prevorder, and referring to them, but it doesn't work.
I would appreciate any tips on how to solve my problem. Thanks!
OrderID | Article ID | Shipping date | Flag
123 | 1 | 01.01.2012 | 1
123 | 2 | 01.01.2012 | 0
123 | 1 | 02.01.2012 | 1
1234 | 12 | 15.03.2012 | 1
678 | 12 | 25.05.2014 | 1
678 | 345 | 25.05.2014 | 0
678 | 567 | 25.05.2014 | 0
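One reading of the expected Flag column is that a row gets 1 the first time its (OrderID, shipping date) pair appears, and 0 for later rows with the same pair. Below is a minimal Python sketch of that rule. It assumes the rows are processed in the order shown, and the function name is hypothetical. In DAX you would need a deterministic definition of "first" instead of row order. For example, taking the lowest Article ID within each (OrderID, shipping date) pair reproduces the expected flags for this sample. That is an assumption, not the asker's stated rule.

```python
# Rows copied from the question: (OrderID, ArticleID, ShippingDate).
rows = [
    (123, 1, "01.01.2012"),
    (123, 2, "01.01.2012"),
    (123, 1, "02.01.2012"),
    (1234, 12, "15.03.2012"),
    (678, 12, "25.05.2014"),
    (678, 345, "25.05.2014"),
    (678, 567, "25.05.2014"),
]


def flag_first_date_per_order(rows):
    """Flag 1 the first time an (OrderID, ShippingDate) pair appears, else 0."""
    seen = set()
    flags = []
    for order_id, _article_id, ship_date in rows:
        key = (order_id, ship_date)
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags


print(flag_first_date_per_order(rows))  # [1, 0, 1, 1, 1, 0, 0]
```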

Operations with reference cells in PROC SQL?

I have this table, call it "pre_report":
initial_balance deposit withdrawal final_balance
1000 50 0 .
1000 0 25 .
1000 45 0 .
1000 30 0 .
1000 0 70 .
I want to write SAS code that fills in the "final_balance" field, where "deposit" adds to the balance and "withdrawal" subtracts from it. At the same time, it should update the "initial_balance" field so that each row starts from the previous row's final balance. My desired output is this:
initial_balance deposit withdrawal final_balance
1000 50 0 1050
1050 0 25 1025
1025 45 0 1070
1070 30 0 1100
1100 0 70 1030
I tried this:
proc sql;
select initial_balance format=dollar32.2,
deposit format=dollar32.2,
withdrawal format=dollar32.2,
sum(initial_balance,deposit,-withdrawal) as final_balance,
calculated final_balance as initial_balance
from work.pre_report;
quit;
But it doesn't work properly. This code creates the two fields "final_balance" and "initial_balance", but both contain the same values.
Code for creating the "pre_report" table:
data work.pre_report;
input initial_balance deposit withdrawal final_balance;
datalines;
1000 50 0 .
1000 0 25 .
1000 45 0 .
1000 30 0 .
1000 0 70 .
;
run;
I would really appreciate your help.
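The desired output is a running balance: each row's initial balance is the previous row's final balance. PROC SQL evaluates each row independently and cannot carry a result from one row to the next. In SAS this kind of carry-forward is usually done in a DATA step with RETAIN or a sum statement. The following minimal Python sketch shows only the carry-forward logic; the function name is hypothetical.

```python
# Rows copied from the question: (deposit, withdrawal); the opening balance is 1000.
transactions = [(50, 0), (0, 25), (45, 0), (30, 0), (0, 70)]


def running_balance(opening, transactions):
    """Carry each row's final balance forward as the next row's initial balance."""
    rows = []
    balance = opening
    for deposit, withdrawal in transactions:
        final = balance + deposit - withdrawal
        rows.append((balance, deposit, withdrawal, final))
        balance = final  # the row-to-row carry-over a plain SELECT cannot express
    return rows


for row in running_balance(1000, transactions):
    print(row)
```

The printed rows match the desired output table, from (1000, 50, 0, 1050) to (1100, 0, 70, 1030).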

DISCOUNT with multiple criteria in Power BI

I have two tables: Data and Report.
Data table: it has three columns: Item, Qty and Order. The Item column contains values stored as text and numbers, and the Qty column is a number column stored as text and numbers.
Items repeat according to the Order column, and the same item can have two different quantities depending on the Order value.
Report table:
It has a column of unique items.
The Data and Report tables look like this:
Data
ITEM QTY ORDER
123 200 1
123 210 0
5678 220 1
5678 230 0
5555 240 1
6666 250 1
9876 260 1
2345 270 1
901 280 1
901 280 1
902 300 1
902 300 1
123456 200 1
123456 200 1
123456 210 1
123456 210 1
123456 0 1
567 200 1
567 210 1
567 210 1
567 0 1
453 5000 1
453 5000 1
453 5000 1
453 5000 1
112 5000 1
112 5000 1
112 5000 1
112 5000 1
116 5000 1
116 5001 1
116 0 1
116 0 1
116 5000 0
116 5001 0
116 0 0
116 0 0
Report
ITEM DESIRED RESULT (QTY)
123 200
5678 220
5555 240
6666 250
9876 260
2345 270
901 280
902 300
123456 MIXED
567 MIXED
4444 NA
12 NA
10 NA
453 5000
112 5000
116 MIXED
Desired Result
I would like to pull the Qty for Order “1” from the Data table into the Report table, by item.
If the item is found in the Data table, return its Qty in the Report table. {Please refer to the “Data” and “Report” tables for items 123, 5678, etc.}
If the item is not found in the Data table, return “NA” in the Report table. {Please refer to the “Data” and “Report” tables for items 10, 12 and 4444.}
If the same item has two different quantities, return the text “MIXED” in the Report table. {Please refer to the “Data” and “Report” tables for items 123456, 116 and 567.}
Currently I am using the following calculated column: CURRENT DAX FOR QTY = LOOKUPVALUE(DATA[QTY],DATA[ITEM],'DESIRED RESULT'[ITEM],DATA[ORDER],1,"NA")
It’s almost working, but it returns the wrong result “NA” where the same item has two different quantities, whether across two orders (0, 1) or within a single order (1) or (0). {Please refer to the “Data” and “Report” tables for items 123456, 116 and 567.} The desired result for those three items is “MIXED”.
Note: I converted the Qty column from number to text because otherwise it gives an error. Is there an alternative way to achieve this result?
The PBI file is attached for reference: https://www.dropbox.com/s/hf40q27pvn3ij2g/DAX-LOOKUPVALUE%20FILTER%20BY.pbix?dl=0.
If I'm understanding correctly, this can be done with the method I suggested previously with the addition of a filter for DATA[ORDER] = 1.
IF (
CALCULATE ( DISTINCTCOUNT ( DATA[QTY] ), DATA[ORDER] = 1 ) > 1,
"MIXED",
CALCULATE ( SELECTEDVALUE ( DATA[QTY], "NA" ), DATA[ORDER] = 1 )
)
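The IF above works on the set of distinct Order = 1 quantities per item. One value returns that quantity, more than one returns "MIXED", and none falls back to "NA". Returning text in one branch and a number in another is also likely why the Qty column had to be converted to text. The following Python sketch reproduces that decision rule on the question's Data table, so it can be checked against the Report table. The function name is hypothetical, and quantities are returned as text to mirror the text column.

```python
# Rows copied from the question's Data table: (item, qty, order).
data = [
    (123, 200, 1), (123, 210, 0), (5678, 220, 1), (5678, 230, 0),
    (5555, 240, 1), (6666, 250, 1), (9876, 260, 1), (2345, 270, 1),
    (901, 280, 1), (901, 280, 1), (902, 300, 1), (902, 300, 1),
    (123456, 200, 1), (123456, 200, 1), (123456, 210, 1), (123456, 210, 1),
    (123456, 0, 1), (567, 200, 1), (567, 210, 1), (567, 210, 1), (567, 0, 1),
    (453, 5000, 1), (453, 5000, 1), (453, 5000, 1), (453, 5000, 1),
    (112, 5000, 1), (112, 5000, 1), (112, 5000, 1), (112, 5000, 1),
    (116, 5000, 1), (116, 5001, 1), (116, 0, 1), (116, 0, 1),
    (116, 5000, 0), (116, 5001, 0), (116, 0, 0), (116, 0, 0),
]


def report_qty(item, data):
    """Distinct ORDER = 1 quantities for the item: one value, "MIXED", or "NA"."""
    qtys = {qty for it, qty, order in data if it == item and order == 1}
    if len(qtys) > 1:
        return "MIXED"
    if len(qtys) == 1:
        return str(qtys.pop())
    return "NA"


for item in [123, 5678, 901, 123456, 567, 116, 453, 4444, 12, 10]:
    print(item, report_qty(item, data))
```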

How to sum by group in Power Query Editor?

My table looks like this:
Serial WO# Value Indicator
A 333 10 333-1
A 333 4 333-2
B 456 5 456-1
A 334 1 334-1
A 334 5 334-2
I want to create a new column that sums up the Values based on WO#. It should look like this:
Serial WO# Value Indicator SumValue
A 333 10 333-1 14
A 333 4 333-2 14
B 456 5 456-1 5
A 334 1 334-1 6
A 334 5 334-2 6
Eventually I will remove duplicates on WO# and remove the Value and Indicator columns from the data. I can't seem to find a function in M that sums by group. Thanks in advance!
If you load the data with Power Query, there is a Group By command on the ribbon that does just that.
Make sure to use the Advanced option and add all the columns you want to retain to the grouping section.
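In M, the Group By dialog generates a Table.Group step. The following minimal Python sketch shows the sum-by-group logic the question asks for. It totals Value per WO# and then attaches that total to every row as SumValue. The variable names are hypothetical.

```python
# Rows copied from the question: (Serial, WO#, Value, Indicator).
rows = [
    ("A", 333, 10, "333-1"),
    ("A", 333, 4, "333-2"),
    ("B", 456, 5, "456-1"),
    ("A", 334, 1, "334-1"),
    ("A", 334, 5, "334-2"),
]

# First pass: total Value per WO#.
totals = {}
for _serial, wo, value, _indicator in rows:
    totals[wo] = totals.get(wo, 0) + value

# Second pass: append the group total to every row (the SumValue column).
with_sum = [row + (totals[row[1]],) for row in rows]

for row in with_sum:
    print(row)
```

The next step in the question is to remove duplicates on WO# and drop Value and Indicator. That leaves the same result as grouping on Serial and WO# and summing Value, which is why the Group By command alone is enough.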

Summarizing data in SAS across groups

My data set has the following variables:
NEWID, Age, H_PERS, Income, OCCU, FAMTYPE, REGION, Metro (Yes/No), Exp_alcohol, population sample (the weighted population each NEWID represents), etc.
I would like to generate a summarized view like below:
average expenditure value (This should be sum of (exp_alcohol/population sample))
% of population sample across Region, Metro and each demographic variable
Please help me with your ideas.
Since I can't see your data set and your description was not very clear, I'm going to guess that you have data that looks something like this and that you would like to add some new variables that summarize your data:
data alcohol;
input NEWID Age H_PERS Income OCCU $ FAMTYPE $ REGION $ Metro $
Exp_alcohol population_sample;
datalines;
1234 32 4 65000 abc m CA Yes 2 4
5678 23 5 35000 xyz s WA Yes 3 6
9923 34 3 49000 def d OR No 3 9
8844 26 4 54000 gdp m CA No 1 5
;
run;
data summar;
set alcohol;
retain TotalAvg_expend metro_count total_pop;
Divide = exp_alcohol/population_sample;
TotalAvg_expend + Divide;
total_pop + population_sample;
if metro = 'Yes' then metro_count + population_sample;
percent_metro = (metro_count/total_pop)*100;
drop NEWID Age H_PERS Income OCCU FAMTYPE REGION Divide;
run;
Output:
Metro Exp_alcohol population_sample TotalAvg_expend metro_count total_pop percent_metro
Yes 2 4 0.50000 4 4 100.000
Yes 3 6 1.00000 10 10 100.000
No 3 9 1.33333 10 19 52.632
No 1 5 1.53333 10 24 41.667
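The DATA step accumulates its totals row by row, so the summary values sit on the last row. The following minimal Python sketch computes the same totals directly. It also extends the population share to any grouping column, since the question asks for REGION and the demographic variables as well as Metro. It uses the answer's sample data, and the helper name is hypothetical.

```python
# Rows copied from the answer's sample data: (REGION, Metro, Exp_alcohol, population_sample).
rows = [
    ("CA", "Yes", 2, 4),
    ("WA", "Yes", 3, 6),
    ("OR", "No", 3, 9),
    ("CA", "No", 1, 5),
]

# Sum of exp_alcohol / population_sample over all rows (TotalAvg_expend on the last row).
total_avg_expend = sum(exp / pop for _region, _metro, exp, pop in rows)


def percent_of_population(rows, key_index):
    """Share of population_sample (in %) for each value of the column at key_index."""
    total = sum(row[3] for row in rows)
    shares = {}
    for row in rows:
        shares[row[key_index]] = shares.get(row[key_index], 0) + row[3]
    return {key: 100 * pop / total for key, pop in shares.items()}


print(round(total_avg_expend, 5))      # 1.53333
print(percent_of_population(rows, 1))  # by Metro
print(percent_of_population(rows, 0))  # by REGION
```

The Metro share of "Yes" (10 of 24) is the 41.667 shown as the final percent_metro value above.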