SAS COLUMN MERGING WITH UNEQUAL ROWS

I need to create a 3rd table which will look like the following:
Data1:
policy Risk Premium
KOK1 002 150
KOK2 003 130
Data2:
Source policy Risk Item1
ageofbuild KOK1 002 3
yearofbuild KOK1 002 5
Discount KOK1 002 10%
Discount KOK1 002 5%
ageofbuild KOK2 003 4
yearofbuild KOK2 003 6
Discount KOK2 003 15%
Discount KOK2 003 7%
Discount KOK2 003 3%
How can I use Data1 and Data2 to create Data3?
Data3: (an extension of Data1 with a Discount column)
policy Risk Premium Discount
KOK1 002 150 10%*5%
KOK2 003 130 15%*7%*3%
How do I match policy and risk in Data1 against policy and risk in Data2, grab the discounts (multiplying them), and create a new table called Data3 that extends Data1 with an additional column at the end called Discount? Can someone please write the code to achieve that?
I tried merging and using hash tables, but it did not work. I don't have much knowledge of SAS.

The first thing is to convert your listings into actual datasets so we can see what TYPE of variables you have. So let's assume you meant you had data like this:
data policy ;
input policy $ Risk $ Premium ;
cards;
KOK1 002 150
KOK2 003 130
;
data rules ;
/* The 8.2 informat treats a value with no decimal point as hundredths, so 10 is stored as 0.10 */
input Source :$20. policy $ Risk $ Item1 :8.2;
cards;
ageofbuild KOK1 002 3
yearofbuild KOK1 002 5
Discount KOK1 002 10
Discount KOK1 002 5
ageofbuild KOK2 003 4
yearofbuild KOK2 003 6
Discount KOK2 003 15
Discount KOK2 003 7
Discount KOK2 003 3
;
Now we just need to merge the two and concatenate the ITEM1 values for the observations where source is DISCOUNT. To make it LOOK like your output I made the new DISCOUNT variable a character string.
proc sort data=rules ;
by policy risk;
run;
proc sort data=policy;
by policy risk;
run;
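data want;
  merge policy (in=in1) rules (in=in2 where=(source='Discount'));
  by policy risk;
  length discount $30;
  retain discount;   /* carry the string across rows of the same policy/risk */
  if in1;            /* keep only policy/risk pairs present in POLICY */
  if first.risk then discount=' ';
  /* append each discount, formatted as a percentage, separated by '*' */
  if in2 then discount=catx('*',discount,put(item1,percent5.));
  if last.risk;      /* output one row per policy/risk */
  keep policy risk premium discount;
run;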
Result
Obs policy Risk Premium discount
1 KOK1 002 150 10%*5%
2 KOK2 003 130 15%*7%*3%
You do not explain what multiplying discounts means. (I doubt that combining a 5% discount with a 3% discount means you want the 0.15% discount that the multiplication symbol in your listing implies.) Do you want to add them, so that 5% + 3% yields 8%? Or compound them in some other way?
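If you need a single numeric discount rather than a display string, here is a rough sketch of three possible readings: the literal product, the simple sum, and a compounded discount (each discount applied to what is left of the premium). It is only an illustration, not a statement of what your business rule should be. It assumes the POLICY and RULES datasets above, already sorted by policy and risk, with ITEM1 stored as a fraction (0.10 for 10%) because of the 8.2 informat. The names want_numeric, product, total, remaining and compounded are made up for this example.

```sas
/* Sketch only: assumes POLICY and RULES exist as created above, both sorted
   by policy and risk, with ITEM1 stored as a fraction (0.10 for 10%). */
data want_numeric;
  merge policy (in=in1) rules (in=in2 where=(source='Discount'));
  by policy risk;
  retain product total remaining;
  if in1;
  if first.risk then do;
    product = 1;     /* literal product of the discounts */
    total = 0;       /* simple sum of the discounts */
    remaining = 1;   /* share of the premium left after each discount */
  end;
  if in2 then do;
    product = product * item1;
    total = total + item1;
    remaining = remaining * (1 - item1);
  end;
  if last.risk;
  if not in2 then product = .;   /* no discount rows: a product is meaningless */
  compounded = 1 - remaining;
  format product total compounded percent8.2;
  keep policy risk premium product total compounded;
run;
```

For KOK1 (10% and 5%) this gives a product of 0.5%, a sum of 15% and a compounded discount of 14.5%, so the three readings differ enough that it is worth confirming which one you actually want.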

Related

Add and update flags based on dates

I have the following data set:
ID Start Stop
001 01JAN2013 31JAN2013
001 01FEB2013 31DEC2013
002 01MAR2013 31DEC2013
003 01JAN2013 31DEC2013
I need the following output:
ID Start Stop Start_flag End_flag
001 01JAN2013 31JAN2013 1 2
001 01FEB2013 31DEC2013 2 3
002 01MAR2013 31DEC2013 1 2
003 01JAN2013 31DEC2013 1 2
In other words I need to add a flag for the start and end with the exception that for consecutive periods the end flag of the previous period will become the start flag of the subsequent period and the remaining end flag will be increased by 1.
Can anyone help me please?
Thank you in advance.
Use the LAG() function:
proc sort data=have; by id start; run;
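data want(drop=lag_stop);
  set have;
  by id start notsorted;
  lag_stop = lag(stop);   /* stop date of the previous row */
  if first.id then do;    /* first period of an ID gets flags 1 and 2 */
    start_flag=1;
    end_flag=start_flag+1;
  end;
  else if lag_stop+1 = start then do;   /* period directly follows the previous one */
    start_flag+1;
    end_flag+1;
  end;
run;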
want
id start stop start_flag end_flag
001 01JAN2013 31JAN2013 1 2
001 01FEB2013 31DEC2013 2 3
002 01MAR2013 31DEC2013 1 2
003 01JAN2013 31DEC2013 1 2
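The comparison lag_stop+1 = start is date arithmetic, so it only works if start and stop are numeric SAS date values rather than character strings. A minimal sketch of how the HAVE dataset could be built that way is shown below; it assumes the 31DC2013 entry in the question was meant to be 31DEC2013 and that id is a character variable.

```sas
/* Sketch only: builds the HAVE dataset assumed by the answer above.
   START and STOP are read as numeric SAS dates with the DATE9. informat. */
data have;
  input id $ start :date9. stop :date9.;
  format start stop date9.;
  cards;
001 01JAN2013 31JAN2013
001 01FEB2013 31DEC2013
002 01MAR2013 31DEC2013
003 01JAN2013 31DEC2013
;
run;
```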

How to count rows in a connected table as a column in Power BI?

I have two tables:
table1
Client Client#
A 001
B 002
C 003
D 004
table2
Client# Machine
001 A
001 B
002 A
002 B
002 C
003 A
004 A
The tables are connected on Client#. I want to be able to create a column in table1 that counts the number of machines for that Client# in table2. So it would look like this:
table1
Client Client# Machines
A 001 2
B 002 3
C 003 1
D 004 1
Thanks in advance!
This can be done by adding a calculated column to table1 like this:
Machines =
CALCULATE(
COUNTROWS(table2)
)
This works because CALCULATE converts the current row context into a filter context (context transition), so COUNTROWS counts only the table2 rows related to the current table1 row.

How to reference the correct value in a many to many relationship in Power BI?

I have two tables connected by Serial Number.
SAMPLES
SERIAL# SMU SAMPLE
001 52 GREEN
002 25 GREEN
001 124 YELLOW
003 41 RED
001 266 GREEN
001 280 GREEN
WARRANTY
SERIAL# SMUSTART SMUEND LIFE
001 1 100 1
002 5 105 1
003 1 100 1
001 101 200 2
001 201 300 3
I am trying to be able to create a slicer on LIFE that will show me only the SAMPLES where the SMU is within the SMUSTART and SMUEND range. I've tried pulling the LIFE column into the SAMPLES table, concatenating SERIAL# and LIFE and then connecting the tables on my new concatenated columns. But I was never able to get LIFE successfully brought over to the SAMPLES table and didn't know if I was even heading down the right path. Any and all help is appreciated. Thank you. Let me know if I need to clarify anything.
This is possible if you transform your tables.
Add a new column to the samples table that is a combination of the serial no and the SMU.
SAMPLES
SERIAL# SMU SAMPLE SER_SMU
001 52 GREEN 001_52
002 25 GREEN 002_25
001 124 YELLOW 001_124
003 41 RED 003_41
001 266 GREEN 001_266
001 280 GREEN 001_280
The warranty table first has to be "expanded": create a row for every possible SMU, and then again add a new column that is the combination of serial no and SMU.
Like this:
SERIAL# SMU LIFE SER_SMU
001 1 1 001_1
001 2 1 001_2
...
001 100 1 001_100
This way you can join the tables on the SER_SMU.
What is the source of your data? If you can, transform the tables when importing them; otherwise you have to do it in Power Query, which might be a bit challenging.
EDIT: example of SQL query:
SELECT s.serial, s.SMU, s.SAMPLE,w.LIFE
FROM samples s
LEFT JOIN warranty w
ON s.serial = w.serial
AND s.SMU BETWEEN w.SMUSTART AND w.SMUEND

SAS - How to skip a record when reading in a data file

I have data that looks something like this:
ID Test Date
001 A 9/1/2011
001 A 10/2/2011
001 A 9/12/2012
001 A 10/10/2013
001 B 10/1/2011
001 B 1/1/2012
002 A 10/12/2014
002 A 10/13/2014
002 A 2/2/2015
002 A 11/15/2015
What I would like to do is read in the first record of ID/Test, and then compare it to the next record of the same ID/Test. If that test date is NOT at least 365 days later then delete it. And then re-test the next record. If it is at least 365 days later, then I will keep it, and use it as the new comparison date within that ID/Test group for the next records. But each ID/Test combination will have a varying number of records and dates.
I would like it to end up like this:
ID Test Date
001 A 9/1/2011
001 A 9/12/2012
001 A 10/10/2013
001 B 10/1/2011
002 A 10/12/2014
002 A 11/15/2015
Thanks for any help -
ETA: Code I have tried:
data want; set have;
lagid=lag(id); lagtest=lag(test); lagdate=lag(date);
if id=lagid AND test=lagtest then days=date-lagdate;
if 1 le days le 365 then delete;
run;
This code only compares records that are next to each other. For my sample data it would give the following incorrect results:
ID Test Date
001 A 9/1/2011
001 A 10/10/2013
001 B 10/1/2011
002 A 10/12/2014
ETA: I found a solution using RETAIN and set by ID and Test.
data begin;
input ID Test $ date mmddyy10.;
cards;
001 A 09/01/2011
001 A 10/02/2011
001 A 09/12/2012
001 A 10/10/2013
001 B 10/01/2011
001 B 01/01/2012
002 A 10/12/2014
002 A 10/13/2014
002 A 02/02/2015
002 A 11/15/2015
;
run;
proc sort data=begin; by id test date; run;
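data processed;
  retain days_since;   /* despite its name, holds the date of the last kept record */
  set begin;
  by id test;
  if first.test then do; /* Prime the flow variable and output the base values */
    days_since=date;
    output;
  end;
  /* keep a record only if it is at least 365 days after the last kept one */
  if (date-days_since)>=365 then do;
    days_since = date;
    output;
  end;
  format date yymmdd10.;
run;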

How to flag or separate data based on a condition within an ID variable sas

I would like to pick values within an ID variable which are within 10% of each other.
For example, my data looks like this:
ID Var1
001 100
001 109
001 200
001 210
001 220
001 300
001 310
002 500
002 510
My desired output is some way to flag this so that I can separate this into groups:
ID Var1 Flag
001 100 1
001 109 1
001 200 2
001 210 2
001 220 2
001 300 3
001 310 3
002 500 1
002 510 1
I tried using a lag function and flagging data but it only flags the second row in a pair; I am not able to pull both the values in a pair that are within 10 percent of each other.
Here's how to flag whether consecutive records are within 10% of each other. You can compute the relative difference by dividing the current value by the previous one, subtracting 1 and taking the absolute value. This assumes your data is sorted by ID and ascending var1 value.
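data want;
  set have;
  by ID;
  retain group;
  lagv1=lag(var1);      /* previous row's value */
  if first.id then do;
    lagv1=.;            /* no previous value within a new ID */
    group=1;
  end;
  else do;
    diff = abs(var1/lagv1-1);    /* relative difference from the previous value */
    if diff >0.1 then group+1;   /* more than 10% apart: start a new group */
  end;
run;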
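Note that the step above compares each value with the one immediately before it, so a slowly rising run (say 100, 109, 118) stays in one group even though the last value is 18% above the first. If you would rather measure against the first value of each group, a possible variation is sketched below. It is an alternative reading of the requirement, not the method above; base and want_anchored are hypothetical names, and HAVE is still assumed to be sorted by ID and ascending var1.

```sas
/* Sketch only: groups values relative to the first value of each group.
   Assumes HAVE is sorted by ID and ascending VAR1, as in the answer above. */
data want_anchored;
  set have;
  by id;
  retain group base;
  if first.id then do;
    group = 1;
    base = var1;        /* first value of the first group in this ID */
  end;
  else if var1 > base * 1.1 then do;
    group + 1;          /* more than 10% above the group's first value */
    base = var1;        /* this value starts the new group */
  end;
  drop base;
run;
```

Values that sit exactly on the 10% boundary are subject to floating-point rounding, so round the comparison if that edge case matters for your data.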