Shift columns to the right - sas

I have a SAS dataset which looks like this:
Month Col1 Col2 Col3 Col4
200801 11 2 3 20
200802 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 3 34 1 0
I want to create a dataset by shifting each row's Col1-Col4 values to the right, one more column for each successive row, so that the result looks diagonally shifted.
Month Col1 Col2 Col3 Col4 Col5 Col6 Col7 . . . . . . . Coln
200801 11 2 3 20
200802 . 5 9 4 10
. . . . .
. . . . .
. . . . .
201212 . . . . . . . . . 3 34 1 0
Can someone suggest how I can do it?
Thanks!

First off, if you can avoid doing so, do. This is a pretty sparse way to store data, and will involve large datasets (definitely use OPTIONS COMPRESS at least), and usually can be worked around with good use of CLASS variables.
If you really must do this, PROC TRANSPOSE is your friend. While this is possible in the data step, it's less messy and more flexible in PROC TRANSPOSE.
First, make a totally vertical dataset (month+colname+colvalue):
data pre_t;
  set have;
  array cols col1-col4;
  do _t = 1 to dim(cols);
    colname = cats("col",((_N_-1) + _t)); *shifting here, edit this logic as needed;
    value = cols[_t];
    output;
  end; *closes the DO loop (missing in the original post);
  keep colname value month;
run;
In that data step, you are creating the eventual column name in colname and setting it up for the transpose. If your data is not identical to the above (in particular, if it is grouped by something else), _N_ may not work and you may need some extra logic (such as computing the difference in months from 200801) to calculate the column number.
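As a rough sketch of that "difference from 200801" logic, the offset can be computed with INTCK instead of _N_. This assumes month is a numeric yyyymm value and that 200801 is the first month; the variable name _offset is made up for illustration. If month is a character variable, input(month, yymmn6.) would replace the input(put(...)) conversion.

```sas
data pre_t;
  set have;
  array cols col1-col4;
  /* months elapsed since 200801 (assumed to be the first month) */
  _offset = intck('month',
                  input('200801', yymmn6.),
                  input(put(month, 6.), yymmn6.));
  do _t = 1 to dim(cols);
    colname = cats("col", _offset + _t); /* shift by elapsed months */
    value = cols[_t];
    output;
  end;
  keep colname value month;
run;
```

With this version the shift depends only on the month value, so the result does not change if rows are missing or the dataset is reordered.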
Then, proc transpose:
proc transpose data=pre_t out=want;
by month;
id colname;
var value;
run;
And voilà, you should have what you were looking for. Make sure it's sorted properly in order to get the output in the expected order.

How to remove missing value in SAS by a sequence of variables

Here is the demonstration data.
data faminc;
input famid faminc1-faminc12;
cards;
1 3281 3413 3114 2500 2700 . 3114 3319 3514 1282 2434 2818
2 4042 . . . . . 1531 2914 3819 4124 4274 4471
3 6015 . . . . . . . . . . .
;
run;
I would like to create an indicator variable called fam_indicator. If variables faminc2-faminc12 are all missing, then fam_indicator=1. Otherwise fam_indicator=0.
I tried the code below but it didn't work.
data fam;
set faminc;
if missing(faminc2-faminc12) then fam_indicator=1;
else fam_indicator=0;
run;
You can do this a bunch of different ways. If the variables are all numeric, then n will do it for you.
data fam;
set faminc;
if n(of faminc2-faminc12) eq 0 then fam_indicator=1;
else fam_indicator=0;
run;
cmiss and nmiss also could work; cmiss is generic regardless of type, while nmiss is only for numerics. They would count the number of missings, so you'd want if cmiss(of faminc2-faminc12) eq 11 or similar.
The other thing you needed was the of. n(faminc2-faminc12) would just subtract the one from the other. of says "the next thing here is a variable list" and it will then expand the list out.
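A minimal sketch of the missing-count variant, assuming all of faminc2-faminc12 are numeric. The array name inc is made up for illustration; using dim(inc) avoids hard-coding the count of 11.

```sas
data fam;
  set faminc;
  array inc faminc2-faminc12;
  /* the comparison evaluates to 1 when every element is missing, else 0 */
  fam_indicator = (nmiss(of inc[*]) = dim(inc));
run;
```

On the sample data only famid 3 has all eleven values missing, so it is the only row that gets fam_indicator=1.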
The NMISS function could be used directly. The SUM function is another option, because the sum of values that are all missing is itself missing.
fam_indicator=ifn(sum(of faminc2-faminc12)=.,1,0);

SAS: Adding observation and fill forward

I want to add an observation in SAS per group at a certain time and fill forward all values (except the time). I don't want to do it manually with datalines and proc append. Is there another way?
In the example: always insert a row per security at exactly 10:00am and use the value from the one above:
Security Time Value
ABC 9:59 2
ABC 10:01 3
.
.
.
DCE 9:58 9
DCE 10:01 3
.
.
Output:
Security Time Value
ABC 9:59 2
ABC 10:00 2
ABC 10:01 3
.
.
.
DCE 9:58 9
DCE 10:00 9
DCE 10:01 3
.
.
Thankful for any help!
Best
You can also use PROC SQL to insert a row:
PROC SQL;
INSERT INTO table_name
VALUES (value1,value2,value3,...);
QUIT;
OR
PROC SQL;
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
QUIT;
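If you would rather not type the values in at all, a data step can generate the 10:00 row while reading the data. This is only a sketch under assumptions: the dataset is called have, it is sorted by security and time, time is a numeric SAS time value, and each security has at least one row before and one row after 10:00. The underscore-prefixed variable names are made up for illustration.

```sas
data have;
  input security $ time :time5. value;
  format time time5.;
  datalines;
ABC 9:59 2
ABC 10:01 3
DCE 9:58 9
DCE 10:01 3
;
run;

data want;
  set have;
  by security;
  /* LAG is called on every row so its queue stays in step with the data */
  _prev_time  = lag(time);
  _prev_value = lag(value);
  /* first row after 10:00 within a security: write the extra row first */
  if not first.security and _prev_time < '10:00't < time then do;
    _cur_time  = time;
    _cur_value = value;
    time  = '10:00't;
    value = _prev_value;   /* carry forward the value from the row above */
    output;
    time  = _cur_time;     /* restore the current row */
    value = _cur_value;
  end;
  output;
  drop _:;
run;
```

A security that already has a row at exactly 10:00, or has no row after 10:00, gets no extra row with this logic, so those cases would need their own handling.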

Using the UPDATE statement in SAS to carry forward the last observation by group

I have a dataset with observations of patients and their diagnoses at multiple points in time. The values of the dummy variables for diagnosis are sometimes missing. Here is an example:
data have ;
infile datalines dsd delimiter=' ';
input patient $ year $ K50 $ K51 $ K52 $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . . .
2 2009 1 . .
2 2010 . . .
2 2013 . 1 .
2 2015 . . .
;
run;
If the values of the dummy variables are missing in the current observation, I want to carry forward the values of the dummy variables in the previous observation, provided that the patient ID is the same. To achieve this, I have experimented with the following code:
data master_dt;
if 0 then set have;
if 1 then delete;
run;
data master_dt;
update master_dt have;
by patient;
output;
run;
Unfortunately, the code above does not achieve quite what I am looking for. It carries forward the value of a dummy variable to the next observation if the value of that variable is missing in the next observation, regardless of whether any of the other variables in the observation are present. I only want to carry forward values when all dummy values are missing in the next observation.
Any ideas how I can modify my code to achieve this?
Use data set options. The separate step that creates a master data set with 0 observations is not needed. Also, the INFILE statement in your data have step is unnecessary and is causing problems.
data have ;
input patient $ year $ K50 $ K51 $ K52 $ ;
datalines;
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . . .
2 2009 1 . .
2 2010 . . .
2 2013 . 1 .
2 2015 . . .
;
run;
proc print;
run;
data want;
if 0 then set have;
update have(obs=0 keep=patient) have(drop=year);
by patient;
set have(keep=year);
output;
run;
proc print;
run;
So if you want the missing values to overwrite the previous values, you need to give them the special missing value ._ (special missing values exist only for numeric variables, so this assumes K50-K52 are numeric).
data fix_missing ;
set have ;
array x k50-k52 ;
if 0 < N(of x(*)) < dim(x) then do _n_=1 to dim(x);
if x(_n_)=. then x(_n_)=._;
end;
run;
data want;
update have(obs=0) fix_missing;
by patient;
output;
run;
Which yields this list of values:
1 2010 . . .
1 2011 . 1 .
1 2012 . 1 1
1 2014 . 1 1
2 2009 1 . .
2 2010 1 . .
2 2013 . 1 .
2 2015 . 1 .
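For comparison, the same rule can be sketched without UPDATE by keeping the last non-empty row in a temporary array. This assumes K50-K52 are numeric (unlike the character definition in the question's INPUT statement) and that have is sorted by patient. The names cur, last and _i are made up for illustration.

```sas
data want;
  set have;
  by patient;
  array cur k50-k52;
  array last[3] _temporary_;   /* temporary arrays keep their values across rows */
  if first.patient then call missing(of last[*]);
  if n(of cur[*]) = 0 then do _i = 1 to dim(cur);
    cur[_i] = last[_i];        /* all dummies missing: carry the last row forward */
  end;
  else do _i = 1 to dim(cur);
    last[_i] = cur[_i];        /* otherwise remember this row as it is */
  end;
  drop _i;
run;
```

On the sample data this should give the same values as the list above, since a row is only filled in when every dummy is missing.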

Combining SAS Data Sets with different no. of columns

I am having a problem combining two tables with different numbers of columns.
Say my first table is table1:
table1
t1_col_1 t1_col_2 t1_col_3 ... t1_col_13
and my second table is table2:
table2
t2_col_1 t2_col2 t2_col3 t2_col4
Now if I type command:
data table3;
set table1 table2;
run;
What will be the output in table3?
The SAS documentation says this command does a concatenation:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001107839.htm
Since the numbers of columns are different, I would expect concatenation to cause a problem.
So how exactly does this command work? And what will its output be in this case?
Appending (concatenating) two or more data sets is basically just stacking the data sets together, with values in variables of the same name being stacked together. Variables that exist in only one data set still become variables in the new combined data set. In your case the data sets have different numbers of variables. This article explains how concatenation works between data sets with different variables: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001312944.htm
For example, suppose we have:
data work.table1;
input col1 $ col2 col3 col4;
datalines;
George 10 10 10
Lucy 10 10 10
;
run;
data work.table2;
input col1 $ col2;
datalines;
Shane 3
Peter 3
;
run;
data work.table3;
set table1 table2;
run;
OUTPUT:
col1    col2  col3  col4
George  10    10    10
Lucy    10    10    10
Shane   3     .     .     <== These entries are
Peter   3     .     .         missing.
col1 and col2 are present in both sets, so the values inside them will be stacked, in the order the data sets are listed on the SET statement. col3 and col4 are only present in table1, so the rows that come from table2 will have missing values for them in the new combined set.
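If it helps to see which rows came from which table, the IN= data set option can flag the source of each row. This is just a sketch on the example above; the names from1, from2 and source are made up for illustration.

```sas
data work.table3;
  set table1(in=from1) table2(in=from2);
  length source $6;
  /* from1 and from2 are temporary 0/1 flags and are not written to table3 */
  if from1 then source = 'table1';
  else source = 'table2';
run;
```

The rows flagged as coming from table2 are the ones where col3 and col4 are missing.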

Generating Interdependent Data in SAS

I am trying to compute a column in SAS, that has dependency on itself. For example, I have the following list of initial values
ID Var_X Var_Y Var_Z
1 2 3 .
2 . 2 .
3 . . .
4 . . .
5 . . .
6 . . .
7 . . .
I need to fill up the blank spaces. The formulae are as follows:
Var_Z = 0.1 + 4*Var_x + 5*Var_Y
Var_X = lag1(Var_Z)
Var_Y = lag2(Var_Z)
As we can see, the values of Var_X, Var_Y and Var_Z are interdependent, so the computation needs to follow a specific order.
First we compute when ID = 1, Var_Z = 0.1 + 4*2 + 5*3 = 23.1
Next, when ID = 2, Var_X = lag1(Var_Z) = 23.1
Var_Y does not need computation at ID = 2 as we already have the initial value here. So, we have
ID Var_X Var_Y Var_Z
1 2 3 23.1
2 23.1 2 102.5 (= 0.1 + 4*23.1 +5*2)
3 . . .
4 . . .
5 . . .
6 . . .
7 . . .
We keep repeating this procedure until all values are calculated.
Is there a way SAS can handle this? I tried a DO loop, but I guess I did not code it right. It just stops after ID = 2.
I am new to SAS, so I don't know whether there is a way SAS can handle this easily. I look forward to your suggestions.
You don't need to use LAG or RETAIN, if you're just doing this in a single data step. DO loop by itself will handle things nicely. RETAIN would only be needed if we were doing something involving a pre-existing data set, but there's really no reason to use one.
I'm using a shortcut here - while you describe VAR_Y in terms of VAR_Z, you really mean that after one iteration, VAR_Z moves to VAR_X and VAR_X moves to VAR_Y, so I do that (in the proper order to not mix things up).
data test_data;
if _n_ = 1 then do;
var_x=2;
var_y=3;
end;
do _iter = 1 to 7;
var_z = 0.1+4*var_x+5*var_y;
output;
var_y=var_x;
var_x=var_z;
end;
run;
proc print data=test_data;
run;
I believe you can do this within a DO loop. The key is making SAS remember the last values of your variables. My suggestion is to start from a simple "counter" program, something like:
data counter;
count = 0;
do i = 1 to 100;
count = count + 1;
end;
run;
and build up from there. I suspect your problem is that you're not using the RETAIN statement alongside your DO loop. Check the SAS documentation for that and see if it fixes your problem.