SAS retain statement not working as I hoped - sas

I have the following dataset
data have;
input SUBJID VISIT$ PARAMN ABLF$ AVAL;
cards;
1 screen 1 . 151
1 random 1 YES .
1 visit1 1 . .
1 screen 2 . 65.5
1 random 2 YES 65
1 visit1 2 . .
1 screen 3 . .
1 random 3 YES 400
1 visit1 3 . 420
;
run;
I want to create another variable called BASE that captures the value of AVAL (when there is an actual value in place) when ABLF=YES and and then drag it down until a new PARAMN is encountered.
Basically I want the output to look like this
SUBJID VISIT$ PARAMN ABLF$ AVAL BASE;
1 screen 1 . 151 .
1 random 1 YES . .
1 visit1 1 . . .
1 screen 2 . 65.5 65
1 random 2 YES 65 65
1 visit1 2 . . 65
1 screen 3 . . 400
1 random 3 YES 400 400
1 visit1 3 . 420 400
I used the the following code
data want;
set have;
by SUBJID PARAMN;
if first.PARAMN and ABLF=' ' then BASE=.;
if ABLF='YES' then BASE=AVAL;
retain BASE;
run;
however when I run this I don't the data to look exactly as I want above

RETAIN does not look like the right tool for this. RETAIN can only move data forward in the file. It cannot move it backwards.
Looks like there is just one observation with the "BASE" value. So just merge it back onto the data.
data want;
merge have
have(keep=subjid paramn aval ablf rename=(aval=BASE ablf=xx)
where=(xx='YES'))
;
by SUBJID PARAMN;
drop xx;
run;

Pro SQL:
proc sql;
select a.*,b.aval as BASE from have a left join have(drop=visit where=(ablf='YES')) b
on a.subjid=b.subjid and a.paramn=b.paramn;
quit;
Double do loop:
data want;
do until(last.visit);
set have;
retain temp;
by subjid paramn notsorted;
if ablf='YES' then temp=aval;
end;
do until(last.visit);
set have;
by subjid paramn notsorted;
base=temp;
end;
drop temp;
run;

Related

Copy value from observation below in SAS

I want to copy an observation to the lines above and below in variables with a specific characteristic (in this case the ID).If the variable is missing, no actions are required. I really have no clue how to do this. Any help would be greatly appreciated. Thanks in advance!
Have:
ID var
1 .
1 1
1 1
1 1
1 .
1 .
1 .
2 .
2 .
2 .
2 .
2 .
3 .
3 .
3 1
3 .
4 .
4 .
4 .
4 .
Want:
ID var
1 1
1 1
1 1
1 1
1 1
1 1
1 1
2 .
2 .
2 .
2 .
2 .
3 1
3 1
3 1
3 1
4 .
4 .
4 .
4 .
Just merge the data with itself.
data want;
merge have(drop=var) have(keep=id var where=(var=1));
by id;
run;
Another method: proc timeseries if you have SAS/ETS.
proc timeseries data=have out=want(drop=time);
by id;
var var / setmissing=maximum;
run;
But if it absolutely has to be the next value, you can run through proc timeseries twice: once to get the next value in-line and another to get the rest.
proc timeseries data=have out=have2;
by id;
var var / setmissing=next;
run;
proc timeseries data=have2 out=want(drop=time);
by id;
var var / setmissing=previous;
run;
Simplest way is SQL.
take aggregate statistic of column and assign to new value
Must select all variables so that a summary table is not generated
proc sql;
create table want as
select ID, var, max(var) as filled_var
from have
group by ID
from have;
quit;
Or in a data step.
sort by ID and descending VAR, any empty values are now at the end
Use RETAIN to hold value across rows
Drop old variable
proc sort data=have;
by id descending var;
run;
data want;
set have;
retain filled_var;
if first.id then filled_var = var;
keep id filled_var;
run;
data have;
input ID var;
datalines;
1 .
1 1
1 1
1 1
1 .
1 .
1 .
2 .
2 .
2 .
2 .
2 .
3 .
3 .
3 1
3 .
4 .
4 .
4 .
4 .
;
data want(drop = v);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
v = max(v, var);
end;
do _N_ = 1 to _N_;
set have;
var = v;
output;
end;
run;

SAS: assign a value to a variable based on characteristics of other variables

I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
input ID date $10. type typea $10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date $10. type typea $10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried with something like that based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use UPDATE statement to help you carry-forward the DATE values for a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;

List frequency of presence of each variable using loop in SAS

I tried some solutions already here and I am still unable to get a desired output.
The data I have is given below (ID is unique):
data have;
input id code_1 code_2 code_3 code_4 randa randb randc$;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;;;;;
run
I need to get the frequency of only the presence of various codes. (code1, code2 etc..)
The desired output:
Variable Frequency
code_1 3
code_2 1
code_3 2
code_4 3
I tried the solution in this and the code is given below:
ods output onewayfreqs=preds;
proc freq data=have;
tables _all_;
run;
ods output close;
proc tabulate data=preds;
class table frequency;
tables table,frequency;
run;
Output:
Frequenza
1 2 3
N N N
Table 1 . 1
Tabella code_1
Tabella code_2 1 . 1
Tabella code_3 . 2 .
Tabella code_4 1 . 1
Tabella id 4 . .
Tabella randa 4 . .
Tabella randb 4 . .
Tabella randc 4 . .
Also I tried as the code below:
proc freq data=have order=freq;
array codes code_:;
do _n_ = 1 to dim(codes);
table codes(_n_)/list missing out=var1_freq;
end;
run;
But I donot know how to write the code properly.
I am getting output for the code below (only for one code at a time):
proc freq data=have order=freq ;
tables code_1/list missing out=var1_freq;
run;
But how to get for multiple codes? Many thanks for your help..!
The out= option for the tables statement will only produce output for the last variable listed, so you won't get all 4 codes.
You can count the 1 valued code_* variables after transposition.
data have;
input id code_1 code_2 code_3 code_4 randa randb randc $ ;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;
data idcodes / view=idcodes;
set have;
array codes code_1-code_4;
do _n_ = 1 to dim (codes);
variable = vname(codes(_n_));
flag = codes(_n_);
output;
end;
keep id variable flag;
run;
proc freq data=idcodes;
where flag;
table variable / out=freqs(keep=variable count);
run;
Presuming codes are only 0/1, you could also sum the codes and transpose the result.
proc means noprint data=have;
var code_:;
output out=flagsum sum=;
run;
proc transpose data=flagsum out=want(rename=(_name_=variable col1=frequency));
var code_:;
run;

SAS Comparing values across multiple columns of the same observation?

I have observations with column ID, a, b, c, and d. I want to count the number of unique values in columns a, b, c, and d. So:
I want:
I can't figure out how to count distinct within each row, I can do it among multiple rows but within the row by the columns, I don't know.
Any help would be appreciated. Thank you
********************************************UPDATE*******************************************************
Thank you to everyone that has replied!!
I used a different method (that is less efficient) that I felt I understood more. I am still going to look into the ways listed below however to learn the correct method. Here is what I did in case anyone was wondering:
I created four tables where in each table I created a variable named for example ‘abcd’ and placed a variable under that name.
So it was something like this:
PROC SQL;
CREATE TABLE table1_a AS
SELECT
*
a as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table2_b AS
SELECT
*
b as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table3_c AS
SELECT
*
c as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table4_d AS
SELECT
*
d as abcd
FROM table_I_have_with_all_columns
;
QUIT;
Then I stacked them (this means I have duplicate rows but that ok because I just want all of the variables in 1 column and I can do distinct count.
data ALL_STACK;
set
table1_a
table1_b
table1_c
table1_d
;
run;
Then I counted all unique values in ‘abcd’ grouped by ID
PROC SQL ;
CREATE TABLE count_unique AS
SELECT
My_id,
COUNT(DISTINCT abcd) as Count_customers
FROM ALL_STACK
GROUP BY my_id
;
RUN;
Obviously, it’s not efficient to replicate a table 4 times just to put a variables under the same name and then stack them. But my tables were somewhat small enough that I could do it and then immediately delete them after the stack. If you have a very large dataset this method would most certainly be troublesome. I used this method over the others because I was trying to use Procs more than loops, etc.
A linear search for duplicates in an array is O(n2) and perfectly fine for small n. The n for a b c d is four.
The search evaluates every pair in the array and has a flow very similar to a bubble sort.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
run;
The linear search for duplicates will occur on every row, and the count_distinct will be initialized automatically in each row to a missing (.) value. The sum function is used to increment the count when a non-missing value is not found in any prior array indices.
* linear search O(N**2);
data want;
set have;
array x a b c d;
do i = 1 to dim(x) while (missing(x(i)));
end;
if i <= dim(x) then count_distinct = 1;
do j = i+1 to dim(x);
if missing(x(j)) then continue;
do k = i to j-1 ;
if x(k) = x(j) then leave;
end;
if k = j then count_distinct = sum(count_distinct,1);
end;
drop i j k;
run;
Try to transpose dataset, each ID becomes one column, frequency each ID column by option nlevels, which count frequency of value, then merge back with original dataset.
Proc transpose data=have prefix=ID out=temp;
id ID;
run;
Proc freq data=temp nlevels;
table ID:;
ods output nlevels=count(keep=TableVar NNonMisslevels);
run;
data count;
set count;
ID=compress(TableVar,,'kd');
drop TableVar;
run;
data want;
merge have count;
by id;
run;
one more way using sortn and using conditions.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
77 . 3 . 4
88 . 9 5 .
99 . . 2 2
76 . . . 2
58 1 1 . .
50 2 . 2 .
66 2 . 7 .
89 1 1 1 .
75 1 2 3 .
76 . 5 6 7
88 . 1 1 1
43 1 . . 1
31 1 . . 2
;
data want;
set have;
_a=a; _b=b; _c=c; _d=d;
array hello(*) _a _b _c _d;
call sortn(of hello(*));
if a=. and b = . and c= . and d =. then count=0;
else count=1;
do i = 1 to dim(hello)-1;
if hello(i) = . then count+ 0;
else if hello(i)-hello(i+1) = . then count+0;
else if hello(i)-hello(i+1) = 0 then count+ 0;
else if hello(i)-hello(i+1) ne 0 then count+ 1;
end;
drop i _:;
run;
You could just put the unique values into a temporary array. Let's convert your photograph into data.
data have;
input id a b c d;
datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
;
So make an array of the input variables and another temporary array to hold the unique values. Then loop over the input variables and save the unique values. Finally count how many unique values there are.
data want ;
set have ;
array unique (4) _temporary_;
array values a b c d ;
call missing(of unique(*));
do _n_=1 to dim(values);
if not missing(values(_n_)) then
if not whichn(values(_n_),of unique(*)) then
unique(_n_)=values(_n_)
;
end;
count=n(of unique(*));
run;
Output:
Obs id a b c d count
1 11 2 3 4 4 3
2 22 1 8 1 1 2
3 33 6 . 1 2 3
4 44 . 1 1 . 1

how to extract specific variable to to fill another cell value?

I have the following dataset created for this example
/*sample data*/
data have;
input subj param value visit$ base;
cards;
1 1 50 scr .
1 1 55 rand 55
1 1 . 1 55
1 1 . 2 55
1 2 120 scr .
1 2 125 rand 125
1 2 . 1 125
1 2 . 2 125
;
run;
I want to make sure that the base value of scr is the same as 'base' when visit='rand' so that it looks like the following
/*sample data*/
data want;
set have;
input subj param value visit$ base;
cards;
1 1 50 scr 55
1 1 55 rand 55
1 1 . 1 55
1 1 . 2 55
1 2 120 scr 125
1 2 125 rand 125
1 2 . 1 125
1 2 . 2 125
;
run;
Double do-loop:
data want;
do until(last.param);
set have;
by subj param notsorted;
if visit='rand' then temp=base;
end;
do until(last.param);
set have;
by subj param notsorted;
if visit='scr' then base=temp;
output;
end;
drop temp;
run;
Here's a different approach, which uses SQL and joins the table with itself.
Not quite sure how to get back your original order, if that's important you may need to add a sort variable ahead of time.
proc sql;
create table want as
select t1.subj, t1.param, t1.value, t1.visit,
case when visit='scr' then t2.base
else t1.base
end as base
from have as t1
left join (select subj, param, base from have where visit = 'rand') as t2
on t1.subj=t2.subj
and t1.param=t2.param
order by 1, 2, 3, 4;
quit;
Lots of ways to do this. Assuming the data is not huge in size, use a hash table to update the values.
Get the %create_hash() macro from here
data have;
input subj param value visit$ base;
cards;
1 1 50 scr .
1 1 55 rand 55
1 1 . 1 55
1 1 . 2 55
1 2 120 scr .
1 2 125 rand 125
1 2 . 1 125
1 2 . 2 125
;
run;
data want;
set have;
if _n_ =1 then do;
%create_hash(lk,subj param,base,"have(where=(visit='rand'))");
/* The macro above generates the following code
declare hash lk(dataset:"have(where=(visit='rand'))");
rc = lk.definekey( "subj", "param" );
rc = lk.definedata("base");
rc = lk.definedone();
*/
end;
rc = lk.find();
drop rc;
run;
This reads the data set into memory and gives you a hash based look up into the values. Use a subsetting where clause to only load the rand records into the hash.