I am trying to create a simple dataset with only one column. The values in the column should be "Name1", "Name2", "Name3", and so on up to "Name15". This is the code I have tried:
data names;
drop i;
length Name $15;
do i=1 to 15;
Name=cats("Name",i);
end;
run;
But all this gives me is my final name, Name15. I know this is a simple fix, but for some reason I just can't figure it out. Any help would be appreciated.
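In your case, you need to OUTPUT explicitly. Without an OUTPUT statement, the DATA step writes an observation only at the end of each iteration (implicit output), which here is after the DO loop has finished, so only Name15 survives. Put OUTPUT inside the loop: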
data names;
drop i;
length Name $15;
do i=1 to 15;
Name=cats("Name",i);
OUTPUT;
end;
run;
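Alternatively, create a dataset with zero observations whose variables are Name1-Name15, then let PROC TRANSPOSE turn the variable names into the values of a column called NAME: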
data name;
retain Name1-Name15 1;
stop;
run;
proc transpose name=name data=name out=name;
run;
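The zero-observation trick used in this answer is handy in general for listing the variable names of a dataset. A minimal sketch, assuming the SASHELP.CLASS sample dataset that ships with SAS is available; CLASS_VARS is just an illustrative output name:

```sas
/* OBS=0 reads no rows, yet PROC TRANSPOSE still writes one row per variable */
proc transpose data=sashelp.class(obs=0) out=class_vars(keep=_name_);
  var _all_;   /* character and numeric variables alike */
run;

proc print data=class_vars;
run;
```

Each row of CLASS_VARS holds one variable name in _NAME_; add NAME=NAME to the PROC TRANSPOSE statement if you prefer that column to be called NAME, as in the answer above.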
data names;
drop i;
length Name $15;
do i=1 to 15;
Name=cats("Name",i);
output;
end;
run;
hmmmmm ... same answer as Haikuo Bian :-(
Related
I am trying to collapse multiple rows of binary variables (the year columns 2018, 2019 and 2020) into a single row per patient ID. Could someone please help me with the SAS code to do this? Thanks
If the rule is to set a variable to 1 if it is ever 1, take the MAX. If the rule is to set it to 1 only if all of its values are 1, take the MIN.
proc summary data=have nway;
by id;
var '2018'n '2019'n '2020'n; /* without a VAR statement PROC SUMMARY only counts rows */
output out=want max= ;
run;
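Update trick (keeps the last non-missing value of each variable within an ID, so it gives the same result as MAX only when the columns contain just 1s and missing values):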
data want;
update have(obs=0) have;
by id;
run;
Or
proc sql;
create table want as
select ID, max('2018'n) as Y2018, max('2019'n) as Y2019, max('2020'n) as Y2020
from have
group by ID
order by ID;
quit;
Untested, because you provided the data as images; please post data as text, preferably as a DATA step.
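A sketch comparing these answers on hypothetical data (the DEMO dataset and its values are made up; the year columns are assumed to be literally named 2018-2020, which needs OPTIONS VALIDVARNAME=ANY and name literals such as '2018'n). MAX implements the "ever 1" rule, MIN the "always 1" rule, and the UPDATE trick returns the last non-missing value per ID:

```sas
options validvarname=any;   /* allow column names such as 2018 */

/* Hypothetical demo data: two rows per ID, already sorted by ID */
data demo;
  input id '2018'n '2019'n '2020'n;
  datalines;
1 1 0 0
1 0 0 1
2 1 1 0
2 1 1 1
;
run;

/* MAX = "ever 1", MIN = "always 1" (both ignore missing values) */
proc sql;
  create table demo_rules as
  select id,
         max('2018'n) as ever_2018, min('2018'n) as all_2018,
         max('2019'n) as ever_2019, min('2019'n) as all_2019,
         max('2020'n) as ever_2020, min('2020'n) as all_2020
  from demo
  group by id
  order by id;
quit;

/* UPDATE keeps the last non-missing value per ID, not the largest */
data demo_update;
  update demo(obs=0) demo;
  by id;
run;
```

For ID 1 the 2018 column is 1 under MAX but 0 under UPDATE, because the last row for that ID holds a 0.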
Here is a DATA step solution. It is more complex than the answers above, but it shows how to use arrays, FIRST./LAST. processing, and the RETAIN statement.
Use a retained temporary array to hold the values of 2018-2020 until the last observation of each ID group. On the last observation of each ID, check whether each held value is 1 and set the corresponding year variable to 1 or 0.
data want;
set have;
by id;
array year[3] '2018'n--'2020'n;
array hold[3] _TEMPORARY_;
retain hold;
if(first.id) then call missing(of hold[*]);
do i = 1 to dim(year);
if(year[i] = 1) then hold[i] = 1;
end;
if(last.id) then do;
do i = 1 to dim(year);
year[i] = (hold[i] = 1);
end;
output;
end;
drop i;
run;
I have the following problem:
I want to fill missing values with PROC EXPAND by simply taking the value from the next data row.
My data looks like this:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
As you can see for some dates the index is missing. I want to achieve the following:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;-1688
05.Jul09;-1688
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;-1683
12.Jul09;-1683
13.Jul09;-1683
As you can see, the missing values were taken from the next row (11.Jul09 and 12.Jul09 got the value from 13.Jul09).
So PROC EXPAND seemed to be the right approach, and I started with this code:
PROC EXPAND DATA=DUMMY
OUT=WORK.DUMMY_TS
FROM = DAY
ALIGN = BEGINNING
METHOD = STEP
OBSERVED = (BEGINNING, BEGINNING);
ID date;
CONVERT index /;
RUN;
QUIT;
This filled the gaps, but with the value from the previous row, and whatever I set for ALIGN or OBSERVED, or even when sorting the data in descending order, I could not get the behavior I want.
If you know how to get this right, I would be grateful for a hint. Good papers on PROC EXPAND are appreciated as well.
Thanks for your help and kind regards
Stephan
I don't know about PROC EXPAND, but this can be done in a few steps.
Read the dataset and create a new variable that holds the observation number (_N_).
data have;
set have;
pos = _n_;
run;
Sort this dataset by this new variable, in descending order.
proc sort data=have;
by descending pos;
run;
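Use RETAIN to carry the last non-missing value forward. Because the order is now reversed, this fills each missing value from the original "next" row. (LAG on its own would not handle runs of two or more missing values.)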
data want;
set have (rename=(index=index_old));
retain index;
if not missing(index_old) then index = index_old;
run;
Sort back if needed.
proc sort data=want;
by pos;
run;
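The same "take the next non-missing value" logic can also be written as a single DATA step that avoids both sorts. This is a swapped-in technique (a double DOW-loop, not PROC EXPAND): the first SET reads ahead to the next non-missing INDEX, and a second SET on the same dataset, which keeps its own read position, re-reads those rows and fills the gaps. A sketch using the sample data from the question; the names GAPS, FILLED and NEXT_INDEX are illustrative. DATALINES4 is used because the data lines contain semicolons:

```sas
/* Sample data from the question (';'-delimited, '.' = missing) */
data gaps;
  infile datalines4 dsd dlm=';';
  input date :date11. index;
  format date date11.;
  datalines4;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
;;;;
run;

data filled;
  /* 1st loop: read ahead until a non-missing INDEX (or the last row) */
  do _n_ = 1 by 1 until (not missing(index) or last_row);
    set gaps end=last_row;
  end;
  next_index = index;
  /* 2nd loop: re-read the same rows and fill gaps with that value */
  do _n_ = 1 to _n_;
    set gaps;
    if missing(index) then index = next_index;
    output;
  end;
  drop next_index;
run;
```

Trailing missing values with no later non-missing row stay missing, which matches the sort-and-RETAIN approach.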
I'm no PROC EXPAND expert, but this is what I came up with: create LEADs for up to the longest run of missing values (2 here), then COALESCE them into INDEX.
data index;
infile cards dsd dlm=';';
input date:date11. index;
format date date11.;
cards4;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
;;;;
run;
proc print;
run;
PROC EXPAND DATA=index OUT=index2 method=none;
ID date;
convert index=lead1 / transform=(lead 1);
CONVERT index=lead2 / transform=(lead 2);
RUN;
QUIT;
proc print;
run;
data index3;
set index2;
pocb = coalesce(index,lead1,lead2);
run;
proc print;
run;
Modified to work for any reasonable gap size.
data index;
infile cards dsd dlm=';';
input date:date11. index;
format date date11.;
cards4;
27.Jun09;
28.Jun09;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
14.Jul09;
15.Jul09;
16.Jul09;
17.Jul09;-1694
;;;;
run;
proc print;
run;
/* find the largest gap */
data gapsize(keep=n);
set index;
by index notsorted;
if missing(index) then do;
if first.index then n=0;
n+1;
if last.index then output;
end;
run;
proc summary data=gapsize;
output out=maxgap(drop=_:) max(n)=maxgap;
run;
/* Gen the convert statement for LEADs */
filename FT67F001 temp;
data _null_;
file FT67F001;
set maxgap;
do i = 1 to maxgap;
put 'Convert index=lead' i ' / transform=(lead ' i ');';
end;
stop;
run;
proc expand data=index out=index2 method=none;
id date;
%inc ft67f001;
run;
quit;
data index3;
set index2;
pocb = coalesce(index,of lead:);
drop lead:;
run;
proc print;
run;
Is there a more elegant way than the one presented below for the following task:
creating indicator variables (MAX_X1 and MAX_X2 below) within each group (KEY1 below) of multiple observations (KEY2 below), with value 1 if the observation has the maximum value of the variable in its group and 0 otherwise?
data have;
call streaminit(4321);
do key1=1 to 10;
do key2=1 to 5;
do x1=rand("uniform");
x2=rand("Normal");
output;
end;
end;
end;
run;
proc means data=have noprint;
by key1;
var x1 x2;
output out=max
max= / autoname;
run;
data want;
merge have max;
by key1;
drop _:;
run;
proc sql;
title "MAX";
select name into :MAXvars separated by ' '
from dictionary.columns
WHERE LIBNAME="WORK" AND MEMNAME="WANT" AND NAME like "%_Max"
order by name;
quit;
title;
data want; set want;
array MAX (*) &MAXvars;
array XVars (*) x1 x2;
array Indicators (*) MAX_X1 MAX_X2;
do i=1 to dim(MAX);
if XVars[i]=MAX[i] then Indicators[i]=1; else Indicators[i]=0;
end;
drop i;
run;
Thanks for any suggestions for optimization.
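PROC SQL with a GROUP BY clause lets you compare each row against a summary function computed over its group. SAS remerges the group statistics back onto the detail rows (and notes this in the log), so every row gets its own flag: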
data have;
call streaminit(4321);
do key1=1 to 10;
do key2=1 to 5;
do x1=rand("uniform");
x2=rand("Normal");
output;
end;
end;
end;
run;
proc sql;
create table want
as select
key1,
key2,
x1,
x2,
case
when x1 = max(x1) then 1
else 0 end as max_x1,
case
when x2 = max(x2) then 1
else 0 end as max_x2
from have
group by key1
order by key1, key2;
quit;
It is also possible to do this in a single data step, provided that you read the input dataset twice - this is an example of a double DOW-loop.
data have;
call streaminit(4321);
do key1=1 to 10;
do key2=1 to 5;
do x1=rand("uniform");
x2=rand("Normal");
output;
end;
end;
end;
run;
/*Sort by key1 (or generate index) if not already sorted*/
proc sort data = have;
by key1;
run;
data want;
if 0 then set have;
array xvars[3,2] x1 x2 x1_max_flag x2_max_flag t_x1_max t_x2_max;
/*1st DOW-loop*/
do _n_ = 1 by 1 until(last.key1);
set have;
by key1;
do i = 1 to 2;
xvars[3,i] = max(xvars[1,i],xvars[3,i]);
end;
end;
/*2nd DOW-loop*/
do _n_ = 1 to _n_;
set have;
do i = 1 to 2;
xvars[2,i] = (xvars[1,i] = xvars[3,i]);
end;
output;
end;
drop i t_:;
run;
This may be a bit complicated to understand, so here's a rough explanation of how it flows:
Read one BY group with the first DOW-loop, updating the running max variables as each row is read in. Don't output anything yet.
Read the same BY group again with the second DOW-loop, checking whether each row equals the running max, and output each row.
Go back to the first DOW-loop, read the next BY group, and repeat.
I was hoping to write a macro using IML that would be able to extract the column names of the dataset to use as names later.
Some pseudocode:
proc iml;
read all dataset into matrix_a [colname = varnames];
(...)
names = varnames;
create new_data_set [rownames = names];
quit;
Is this possible?
Sure is.
data test;
array x[10];
do i=1 to 10;
do j=1 to 10;
x[j] = (i-1)*10 + j;
end;
output;
end;
drop i j;
run;
proc iml;
use test;
read all var _num_ into test[colname=names];
close test;
test = test`;
names = names`;
create test_t from test[rowname=names];
append from test[rowname=names];
close test_t;
quit;
I need help with PROC TRANSPOSE in SAS. My code initially was:
proc transpose data=temp out=temp1;
by patid;
var text;
Id datanumber;
run;
This gave me the error "The ID value " " occurs twice in the same BY group". I modified the code to this:
proc sort data = temp;
by patid text datanumber;
run;
data temp;
set temp by patid text datanumber;
if first.datanunmber then n = 0;
n+1;
run;
proc sort data = temp;
by patid text datanumber n;
run;
proc transpose out=temp1 (drop=n) let;
by patid;
var text;
id datanumber;
run;
This gives me the error "variable n is not recognized". Adding the LET option produces a lot of "occurs twice in the same BY group" errors. I want to keep all ID values.
Please help me with this.
Data Example:
Patid Text
That error means that, within one BY group, more than one observation has the same ID value, so PROC TRANSPOSE would have to put several values into the same new variable. If you add the LET option to the PROC TRANSPOSE statement, SAS transposes anyway but keeps only the last observation for each duplicated ID value; the other values are discarded.
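A small sketch with hypothetical data (the DUP dataset and its values are made up for illustration) shows what LET does: the ID value D1 appears twice for P1, and only the last occurrence survives the transpose.

```sas
/* Hypothetical data: D1 occurs twice for P1 */
data dup;
  input patid $ datanumber $ text $;
  datalines;
P1 D1 first
P1 D1 second
P1 D2 third
;
run;

/* LET allows the duplicate ID value; the last occurrence of D1 is kept */
proc transpose data=dup out=dup_t(drop=_name_) let;
  by patid;
  var text;
  id datanumber;
run;
```

In DUP_T, column D1 holds "second" and D2 holds "third"; the value "first" is gone, and the log contains a warning about the duplicate ID value.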
Perhaps your data is not unique by PATID and DATANUMBER? I created a dataset with unique combinations of PATID and DATANUMBER, and the transpose works:
data temp (drop=x y);
do x=1 to 4;
PATID='PATID'||left(x);
do y=1 to 3;
DATANUMBER='DATA'||left(y);
TEXT='TEXT'||left(x*y);
output;
end;
end;
run;
proc sort data=temp; by _all_; run;
proc transpose data=temp out=temp2 (drop=_name_);
by patid;
var text;
id datanumber;
run;
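My recommendation would be to forget the 'n' fix and focus on making the data unique by PATID and DATANUMBER. A quick-and-dirty approach is to add this at the start of your code (NODUPKEY keeps only the first observation for each PATID/DATANUMBER combination and discards the rest):
proc sort data = temp nodupkey;
by patid datanumber;
run;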
Try sorting your dataset by patid text n datanumber (n before datanumber).
Try sorting your dataset by patid n datanumber (n before datanumber), and use "by patid n;" in PROC TRANSPOSE.
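If the goal is to keep every value rather than drop duplicates, one option is to number the repeats within each PATID/DATANUMBER pair and build a unique ID value from both pieces, so no two rows in a BY group collide. A sketch with hypothetical data; TEMP_DEMO, the counter N and NEW_ID are illustrative names, not from the question:

```sas
/* Hypothetical data: D1 occurs twice for P1 */
data temp_demo;
  input patid $ datanumber $ text $;
  datalines;
P1 D1 a
P1 D1 b
P1 D2 c
P2 D1 d
;
run;

proc sort data=temp_demo;
  by patid datanumber;
run;

data temp_demo_n;
  set temp_demo;
  by patid datanumber;
  if first.datanumber then n = 0;
  n + 1;                              /* 1, 2, ... within each pair */
  new_id = catx('_', datanumber, n);  /* e.g. D1_1, D1_2, D2_1 */
run;

proc transpose data=temp_demo_n out=temp_demo_t(drop=_name_);
  by patid;
  var text;
  id new_id;
run;
```

Every TEXT value ends up in its own column (D1_1, D1_2, D2_1), and PATIDs without a repeat simply have a blank in the extra columns.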