SAS: Adding an observation and filling forward

I want to add an observation in SAS per group at a certain time and fill forward all values (except the time). I don't want to do it manually with datalines and proc append. Is there another way?
In the example: always insert a row per security at exactly 10:00am and use the value from the one above:
Security Time Value
ABC 9:59 2
ABC 10:01 3
.
.
.
DCE 9:58 9
DCE 10:01 3
.
.
Output:
Security Time Value
ABC 9:59 2
ABC 10:00 2
ABC 10:01 3
.
.
.
DCE 9:58 9
DCE 10:00 9
DCE 10:01 3
.
.
Thankful for any help!
Best

You can also use PROC SQL to insert a row:
PROC SQL;
INSERT INTO table_name
VALUES (value1,value2,value3,...);
QUIT;
OR
PROC SQL;
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
QUIT;
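If typing the values by hand is what you want to avoid, a data step approach may fit better. The following is only a sketch under assumptions: the dataset names have, stamps, combined and want are made up for illustration, Time is assumed to be a numeric SAS time value, and no security already has a row at exactly 10:00. It builds one 10:00 row per security, interleaves it with the data, and then relies on the UPDATE statement ignoring missing transaction values to carry the previous row's values forward.

```sas
/* Sample data from the question; dataset and variable names are assumptions */
data have;
  input Security $ Time :time5. Value;
  format Time time5.;
  datalines;
ABC 9:59 2
ABC 10:01 3
DCE 9:58 9
DCE 10:01 3
;
run;

proc sort data=have;
  by Security Time;
run;

/* One row per security stamped at 10:00, all other variables absent */
data stamps;
  set have(keep=Security);
  by Security;
  if first.Security;
  Time = '10:00't;
run;

/* Interleave the 10:00 rows into the data in time order */
data combined;
  set have stamps;
  by Security Time;
run;

/* UPDATE does not overwrite with missing transaction values,     */
/* so the inserted rows inherit the values of the row above them. */
data want;
  update combined(obs=0) combined;
  by Security;
  output;
run;
```

Note that this also fills any genuinely missing values in the original rows, and a security with no row before 10:00 keeps missing values on its inserted row.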


Transposing table while collapsing duplicate observations per BY group

I have a dataset of diagnosis records in which a patient can have one or more records, even for the same code. I am unable to transpose on the variable 'code' because I get an error like: The ID value "code_v58" occurs twice in the same BY group.
data have;
input id rand found code $;
datalines;
1 101 1 001
2 102 1 v58
2 103 0 v58 /* second diagnosis record for patient 2 */
3 104 1 v58
4 105 1 003
4 106 1 003 /* second diagnosis record for patient 4 */
5 107 0 v58
;
Desired output:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
When I tried the LET option, like this:
proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
I got output as below:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 0 .
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
I then modified PROC TRANSPOSE to use id and count in the BY statement:
proc transpose data=temp out=want(drop=_name_) prefix=code_;
by id count;
id code; * column name becomes <prefix><code>;
var found;
run;
and got output like below:
Obs id count code_001 code_v58 code_003
1 1 1 1 . .
2 2 1 . 1 .
3 2 2 . 0 .
4 3 1 . 1 .
5 4 1 . . 1
6 4 2 . . 1
7 5 1 . 0 .
May I know how to remove duplicate patient ids and update the code to 1 if found in any records?
You can transpose a group aggregate view.
proc sql;
create view have_v as
select id, code, max(found) as found
from have
group by id, code
order by id, code
;
proc transpose data=have_v out=want prefix=code_;
by id;
id code;
var found;
run;
Follow up with PROC STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0:
proc stdize data=want out=want missing=0 reponly;
var code_:;
run;
It seems to me that you want something like this: first preprocess the data to get the value you want for FOUND, then transpose (if you actually need to). PROC TABULATE takes the maximum of FOUND for each id and code (1 if any record has 1, 0 if only 0s are present, missing if the combination does not occur), and PROC TRANSPOSE then reshapes that result the same way you were doing before.
proc tabulate data=have out=tab;
class id code;
var found;
tables id,code*found*max;
run;
proc transpose data=tab out=want prefix=code_;
by id;
id code;
var found_max;
run;

SAS Finding Last Year data from archive

Good day, I am looking through an archive of policies and want to create a variable (column) that shows the price of the policy from 1 year ago.
Every policy has a Policy ID, and the archive has every policy (including renewals). So the same Policy ID can appear more than once in the archive but have different values in every other column. For example, say I have this
Policy_ID Start_Date End_Date Premium LYPremium15 LYPremium16
1 01/01/2015 31/12/2015 500 . .
2 04/03/2015 03/03/2016 450 . .
3 03/02/2015 02/02/2016 600 . .
4 07/04/2015 06/04/2016 470 . .
5 01/01/2015 31/12/2015 500 . .
2 04/03/2016 03/03/2017 510 . .
I would like to fill the columns LYPremium15, LYPremium16, LYPremium17 with the premium from the year before. So it will look like this,
Policy_ID Start_Date End_Date Premium LYPremium15 LYPremium16
1 01/01/2015 31/12/2015 500 . .
2 04/03/2015 03/03/2016 450 . .
3 03/02/2015 02/02/2016 600 . .
4 07/04/2015 06/04/2016 470 . .
5 01/01/2015 31/12/2015 500 . .
2 04/03/2016 03/03/2017 510 450 .
Policy ID 2 is a renewal, so it does have data from last year.
I am new to SAS and not sure how to code this. I was thinking of using WHERE combined with IF and CONTAINS, but I am not sure that is an option.
Can I use the standard way of creating a variable?
data mylib.van_LYprem;
set mylib.van_combined_total;
LYPrem15=...;
run;
Or will I have to approach this in a more advanced way?
SAS processes your dataset record by record, so you will have to carry the previous year's values forward yourself.
I assume start_date is what determines the year.
If we sort the dataset like this:
proc sort data=work.van_combined_total;
by Policy_ID start_date;
run;
we can use a BY statement and RETAIN the values:
data work.van_LYprem;
  set work.van_combined_total;
  by Policy_ID start_date;
  retain LYPrem15 LYPrem16 LYPrem17;
  if (first.Policy_ID) then do;
    LYPrem15=.;
    LYPrem16=.;
    LYPrem17=.;
  end;
  output;
  if (year(start_date) eq 2015) then do;
    LYPrem15=Premium;
  end;
  if (year(start_date) eq 2016) then do;
    LYPrem16=Premium;
  end;
  if (year(start_date) eq 2017) then do;
    LYPrem17=Premium;
  end;
run;
After this, each record will have Premium and the LYPremiumXX values. If a policy has more than one renewal in the same year, LYPremiumXX will only hold the last value.
You could make it more dynamic using macros.
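If a single "last year" column is enough, a different technique avoids the per-year variables altogether: the LAG function. This is only a sketch, and it swaps in LAG for the RETAIN logic above. It assumes every record of a policy is one yearly term, so that the previous record of the same Policy_ID is last year's policy; the names work.van_sorted, work.van_LYprem_single and LYPremium are made up for illustration.

```sas
/* Sort so that each policy's records are in chronological order */
proc sort data=work.van_combined_total out=work.van_sorted;
  by Policy_ID start_date;
run;

data work.van_LYprem_single;
  set work.van_sorted;
  by Policy_ID;
  /* LAG must run on every record so its queue stays in step with the data */
  LYPremium = lag(Premium);
  /* The first record of a policy has no previous term */
  if first.Policy_ID then LYPremium = .;
run;
```

Calling LAG unconditionally and resetting the result afterwards matters: placing LAG inside an IF block would make it return the value from the last record on which that block executed, not from the previous record.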

Why many to many merge doesn't do cartesian product

data jul11.merge11;
input month sales ;
datalines ;
1 3123
1 1234
2 7482
2 8912
3 1284
;
run;
data jul11.merge22;
input month goal ;
datalines;
1 4444
1 5555
1 8989
2 9099
2 8888
3 8989
;
run;
data jul11.merge1;
merge jul11.merge11 jul11.merge22 ;
by month;
difference =goal - sales ;
run;
proc print data=jul11.merge1 noobs;
run;
output:
month sales goal difference
1 3123 4444 1321
1 1234 5555 4321
1 1234 8989 7755
2 7482 9099 1617
2 8912 8888 -24
3 1284 8989 7705
Why didn't it match every observation in table 1 with every observation in table 2 for the common months?
The PDV retains the data of an observation while SAS checks whether any more observations are left in that BY group before reinitialising it, so in that case it should have produced a cartesian product.
It gives a perfect cartesian product for one-to-many merging, but not for many-to-many.
This is because of how SAS processes the data step. A merge is never a true cartesian product (ie, all records are searched and matched up against all other records, like a SQL comma join might ); what SAS does (in the case of two datasets) is it follows down one dataset (the one on the left) and advances to the next particular by-group value; then it looks over on the right dataset, and advances until it gets to that by group value. If there are other records in between, it processes those singly. If there are not, but there is a match, then it matches up those records.
Then it looks on the left to see if there are any more in that by group, and if so, advances to the next. It does the same on the right. If only one of these has a match then it will only bring in those values; hence if it has 1 element on the left and 5 on the right, it will do 1x5 or 5 rows. However, if there are 2 on the left and 3 on the right, it won't do 2x3=6; it does 1:1, 2:2, and 2:3, because it's advancing record pointers sequentially.
The following example is a good way to see how this works. If you really want to see it in action, throw in the data step debugger and play around with it interactively.
data test1;
input x row1;
datalines;
1 1
1 2
1 3
1 4
2 1
2 2
2 3
3 1
;;;;
run;
data test2;
input x row2;
datalines;
1 1
1 2
1 3
2 1
3 1
3 2
3 3
;;;;
run;
data test_merge;
merge test1 test2;
by x;
put x= row1= row2=;
run;
If you do want to do a cartesian join in SAS datastep, you have to do nested SET statements.
data want;
set test1;
do _n_ = 1 to nobs_2;
set test2 point=_n_ nobs=nobs_2;
output;
end;
run;
That's the true cartesian, you can then test for by group equality; but that's messy, really. You could also use a hash table lookup, which works better with BY groups. There are a few different options discussed here.
SAS doesn't handle many-to-many merges very well within the data step. Use PROC SQL if you want to do a many-to-many merge.
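As a sketch of that PROC SQL route, using the test1 and test2 datasets defined earlier (the output name want_sql is an assumption), an inner join on the BY variable pairs every left row with every right row that shares the same x:

```sas
proc sql;
  create table want_sql as
  select a.x, a.row1, b.row2
  from test1 as a
       inner join test2 as b
       on a.x = b.x
  order by a.x, a.row1, b.row2;
quit;
```

With those inputs, x=1 contributes 4x3 rows, x=2 contributes 3x1 and x=3 contributes 1x3, so the join yields 18 rows, where the data step MERGE yields only as many rows per BY group as the longer of the two sides.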

Shift columns to the right

I have a SAS dataset which looks like this:
Month Col1 Col2 Col3 Col4
200801 11 2 3 20
200802 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 3 34 1 0
I want to create a dataset by shifting each row's Col1-Col4 values to the right, so that the result looks diagonally shifted.
Month Col1 Col2 Col3 Col4 Col5 Col6 Col7 . . . . . . . Coln
200801 11 2 3 20
200802 . 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 . . . . . . . . . 3 34 1 0
Can someone suggest how I can do it?
Thanks!
First off, if you can avoid doing so, do. This is a pretty sparse way to store data, and will involve large datasets (definitely use OPTIONS COMPRESS at least), and usually can be worked around with good use of CLASS variables.
If you really must do this, PROC TRANSPOSE is your friend. While this is possible in the data step, it's less messy and more flexible in PROC TRANSPOSE.
First, make a totally vertical dataset (month+colname+colvalue):
data pre_t;
  set have;
  array cols col1-col4;
  do _t = 1 to dim(cols);
    colname = cats("col",((_N_-1) + _t)); *shifting here, edit this logic as needed;
    value = cols[_t];
    output;
  end; *closes the DO loop;
  keep colname value month;
run;
In that data step, you are creating the eventual column name in colname and setting it up for transpose. If your data is not identical to the above (in particular, if you have data grouped by something else), _N_ may not work and you may need to do some logic (such as figuring out the difference from 200801) to calculate the col#.
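One possible way to compute that difference from 200801, offered as a sketch only: it assumes Month is stored as a numeric YYYYMM value and that 200801 is the first month, and the helper variables _date and _offset are made up for illustration.

```sas
data pre_t;
  set have;
  array cols col1-col4;
  /* Assumption: Month is numeric YYYYMM, e.g. 200801 */
  _date = mdy(mod(month, 100), 1, int(month / 100));
  /* Whole months elapsed since January 2008: 0 for 200801, 1 for 200802, ... */
  _offset = intck('month', '01JAN2008'd, _date);
  do _t = 1 to dim(cols);
    colname = cats("col", _offset + _t);
    value = cols[_t];
    output;
  end;
  keep colname value month;
run;
```

Unlike the _N_ version, the shift here depends only on the month itself, so it should hold up when months are missing or the rows arrive in a different order.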
Then, proc transpose:
proc transpose data=pre_t out=want;
by month;
id colname;
var value;
run;
And voilà, you should have what you were looking for. Make sure it's sorted properly in order to get the output in the expected order.

Combining SAS Data Sets with different no. of columns

I am having a problem combining two tables with different numbers of columns.
Say my first table is table1:
table1
t1_col_1 t1_col_2 t1_col_3 ... t1_col_13
and my second table is table2:
table2
t2_col_1 t2_col2 t2_col3 t2_col4
Now if I run this step:
data table3;
set table1 table2;
run;
What will the output table3 be?
The SAS documentation says this step does a concatenation:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001107839.htm
Since the numbers of columns are different, I would expect concatenation to cause a problem.
So how exactly does this step work, and what will its output be in this case?
Appending (concatenating) two or more data sets is basically just stacking the data sets together with values in variables of the same name being stacked together. Unique variables in each data set will form their own variables in the new combined data set. Right now we have different number of variables. This article explains how concatenation works between data sets with different variables: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001312944.htm
For example, suppose we have:
data work.table1;
input col1 $ col2 col3 col4;
datalines;
George 10 10 10
Lucy 10 10 10
;
run;
data work.table2;
input col1 $ col2;
datalines;
Shane 3
Peter 3
;
run;
data work.table3;
set table1 table2;
run;
OUTPUT:
col1    col2  col3  col4
George  10    10    10
Lucy    10    10    10
Shane   3     .     .     <== These entries are
Peter   3     .     .         missing.
col1 and col2 are present in both sets, so their values are stacked, with the rows of table1 first and the rows of table2 after them in their original order. col3 and col4 are only present in table1, so they are missing for the rows that come from table2.