I'm trying to reproduce the code found here, specifically on page 7:
http://www.nesug.org/proceedings/nesug04/pm/pm13.pdf
/* set up example*/
%let var_1 = 'abc';
%let var_2 = 'def';
%let var_3 = 'ghi';
%let val_1 = 1.5;
%let val_2 = 3;
%let val_3 = 4.5;
/* use symget to create a list of var names and values */
data scores;
length var_name $32 value 8.;
do _N_ = 1 to 3;
var_name = symget('var_' || left(_N_));
value = symget('val_' || left(_N_));
end;
run;
However, the end result that I'm getting is only the last variable, not all 3:
var_name value
ghi 4.5
I want:
var_name value
abc 1.5
def 3
ghi 4.5
Why isn't this working?
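You are missing an OUTPUT statement to write each row. Without it, the data step writes a single observation when it reaches the end of the step, so only the values from the last loop iteration are kept. Insert it here: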
do _N_ = 1 to 3;
var_name = symget('var_' || left(_N_));
value = symget('val_' || left(_N_));
output;
end;
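For reference, a complete version of the step might look like the sketch below. It makes a few assumptions that are not in the original code: the macro variables are defined without quotes (with the quoted definitions from the question, the quotes are part of the macro variable value and end up in var_name), the loop counter is a hypothetical variable named i rather than _N_, and the value is converted explicitly with INPUT so the log does not show a character-to-numeric conversion note.

```sas
/* Assumption: values are defined without surrounding quotes */
%let var_1 = abc;
%let var_2 = def;
%let var_3 = ghi;
%let val_1 = 1.5;
%let val_2 = 3;
%let val_3 = 4.5;

data scores;
  length var_name $32 value 8;
  do i = 1 to 3;
    /* cats() builds the macro variable name, e.g. var_1 */
    var_name = symget(cats('var_', i));
    /* symget() returns text, so convert it to a number explicitly */
    value = input(symget(cats('val_', i)), best32.);
    output;  /* write one observation per loop iteration */
  end;
  drop i;
run;
```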
I have a SAS data set t3. I want to run a data step inside a loop through a set of variables to create additional sets based on the variable value = 1, and rank two variables bal and otheramt in each subset, and then merge the ranks for each subset onto the original data set. Each rank column needs to be dynamically named so I know what subset is getting ranked. I know how to do proc rank and macros basically but do not know how to do this in the most dynamic way inside of a macro. Can you assist?
ID bal otheramt firstvar secondvar lastvar
444 581 100 1 1
555 255 200 1 1 1
666 255 300 -------------- 1 --------------
%macro dog();
data new;
set t3;
ARRAY Indicators(5) FirstVar--LastVar;
/*create data set for each of the subsets if firstvar = 1, secondvar = 1 ... lastvar = 1 */
/*for each new data set, rank by bal and otheramt*/
/*name the new rank columns [FirstVar]BalRank, [FirstVar]OtherAmtRank; */
/*merge the new ranks onto the original data set by ID*/
%mend;
%dog()
The Proc rank section would be something like this, but I would need the rank columns to have information about what subset I am ranking.
proc rank data=subset1 out=subset1ranked;
var bal otheramt;
ranks bal_rank otheramt_rank;
run;
Instead of using a macro, transform and reshape the data so that simpler steps can be written.
Example:
Rows are split into multiple rows based on the flags so that BY-group processing in PROC RANK can occur. Two transposes are then required to reshape the results back to a single row per id.
data have;
call streaminit(20230216);
do id = 1 to 100;
foo = rand('integer', 50,150);
bar = rand('integer', 100,200);
flag1 = rand('integer', 0, 1);
flag2 = rand('integer', 0, 1);
flag3 = rand('integer', 0, 1);
output;
end;
run;
data step1;
set have;
/* important: the group value becomes part of the variable name later */
if flag1 then do; group='flag1_'; output; end;
if flag2 then do; group='flag2_'; output; end;
if flag3 then do; group='flag3_'; output; end;
drop flag:;
run;
proc sort data=step1;
by group;
run;
proc rank data=step1 out=step2;
by group;
var foo bar;
ranks foo_rank bar_rank;
run;
proc sort data=step2;
by id group;
run;
* pivot (reshape) so there is one row per ranked var;
proc transpose data=step2 out=step3(drop=_label_);
by id foo bar group;
var foo_rank bar_rank;
run;
* pivot again so there is one row per id;
proc transpose data=step3 out=step4(drop=_name_);
by id;
var col1;
id group _name_;
run;
* merge so those 0 0 0 flag rows remain intact;
data want;
merge have step4;
by id;
run;
Since we don't have much sample data, I created test data from sashelp.class with some indicator variables like yours.
data have;
set sashelp.class;
firstvar=round(rand('uniform',1));
secondvar=round(rand('uniform',1));
thirdvar=round(rand('uniform',1));
drop sex weight;
run;
Partial output:
Name Age Height firstvar secondvar thirdvar
Alfred 14 69 1 0 1
Alice 13 56.5 0 1 1
Barbara 13 65.3 1 0 0
Carol 14 62.8 0 0 0
To dynamically rank data based on indicator variables, I created a macro that accepts a list of indicators and rank variables. The 2 lists help to create the specific variable names you requested. Here's the macro call:
%rank(indicators=firstvar secondvar thirdvar,
rank_vars=age height);
Here's part of the final output. Notice the indicators in the sample output above coincide with the ranks in this output. Also note that Carol is not in the output because she had no indicators set to 1.
Name Age Height firstvar_age_rank firstvar_height_rank secondvar_age_rank secondvar_height_rank thirdvar_age_rank thirdvar_height_rank
Alfred 14 69 8 11 . . 6.5 10
Alice 13 56.5 . . 3.5 2 4.5 2
Barbara 13 65.3 6.5 8 . . . .
Henry 14 63.5 . . 5.5 5 . .
The full macro is listed below. It has 3 parts:
1. Create a temp data set with a group variable that contains the number of the indicator variable based on the order of the variable in the list. Whenever an indicator = 1 the obs is output. If an obs has all 3 indicators set to 1 then it will be output 3 times with the group variable set to the number of each indicator variable. This step is important because proc rank will rank groups independently.
2. Generate the rankings on the temp data set. Each group will be ranked independently of the other groups and can be done in one step.
3. Construct the final data set by essentially transposing the ranked data into columns.
%macro rank(indicators=, rank_vars=);
%let cnt_ind = %sysfunc(countw(&indicators));
%let cnt_vars = %sysfunc(countw(&rank_vars));
data temp;
set have;
array indicators(*) &indicators;
do i = 1 to dim(indicators);
if indicators(i) = 1 then do;
group = i; * create a group based on order of indicators;
output; * an obs can be output multiple times;
end;
end;
drop i &indicators;
run;
proc sort data=temp;
by group;
run;
* Generate rankings by group;
proc rank data=temp out=ranks;
by group;
var &rank_vars;
ranks
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
run;
proc sort data=ranks;
by name group;
run;
* Construct final data set by transposing the ranks into columns;
data want;
set ranks;
by name;
* retain statement to declare new variables and retain values;
retain
%let vars = ;
%do i = 1 %to &cnt_ind;
%let ivar = %scan(&indicators, &i);
%do j = 1 %to &cnt_vars;
%let jvar = %scan(&rank_vars, &j);
%let vars = &vars &ivar._&jvar._rank;
%end;
%end;
&vars;
if first.name then call missing (of &vars);
* option 1: build series of IF statements;
%let vars = ;
%do i = 1 %to &cnt_ind;
%let ivar = %scan(&indicators, &i);
%str(if group = &i then do;)
%do j = 1 %to &cnt_vars;
%let jvar = %scan(&rank_vars, &j);
%let newvar = &ivar._&jvar._rank;
%str(&newvar = &jvar._rank;)
%end;
%str(end;)
%end;
if last.name then output;
drop group
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
run;
%mend;
When constructing the final data set and transposing the rank variables, there are a couple of options. The first option shown above is to dynamically build a series of if statements. Here is what the code generates:
MPRINT(RANK): * option 1: build series of IF statements;
MPRINT(RANK): if group = 1 then do;
MPRINT(RANK): firstvar_age_rank = age_rank;
MPRINT(RANK): firstvar_height_rank = height_rank;
MPRINT(RANK): end;
MPRINT(RANK): if group = 2 then do;
MPRINT(RANK): secondvar_age_rank = age_rank;
MPRINT(RANK): secondvar_height_rank = height_rank;
MPRINT(RANK): end;
MPRINT(RANK): if group = 3 then do;
MPRINT(RANK): thirdvar_age_rank = age_rank;
MPRINT(RANK): thirdvar_height_rank = height_rank;
MPRINT(RANK): end;
The 2nd option is to use an array and mathematically calculate the index into the array by the group number and variable number. Here is the snippet of macro code to replace the if series code:
* option 2: create arrays and calculate index into array
* by group number and variable number;
array ranks(*) &vars;
array rankvars(*)
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
%str(idx = dim(rankvars) * (group - 1);)
%str(do i = 1 to dim(rankvars);)
%str(ranks(idx + i) = rankvars(i);)
%str(end;)
Here is the generated code:
MPRINT(RANK): * option 2: create arrays and calculate index into array * by group number and variable number;
MPRINT(RANK): array ranks(*) firstvar_age_rank firstvar_height_rank secondvar_age_rank secondvar_height_rank thirdvar_age_rank
thirdvar_height_rank;
MPRINT(RANK): array rankvars(*) age_rank height_rank;
MPRINT(RANK): idx = dim(rankvars) * (group - 1);
MPRINT(RANK): do i = 1 to dim(rankvars);
MPRINT(RANK): ranks(idx + i) = rankvars(i);
MPRINT(RANK): end;
It takes a minute to understand the array option, but once you do, it is preferable to generating IF statements. As the number of variables increases, the loop generated by the array option stays the same and operates more efficiently than a growing series of IF statements.
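The index arithmetic in the array option can be checked in isolation. The sketch below is a standalone illustration, not part of the macro: the array sizes, the variable names r1-r6, and the sample values 10 and 20 are all made up for the example, assuming 3 groups and 2 rank variables.

```sas
data _null_;
  /* 3 groups x 2 rank variables = 6 target slots (hypothetical names) */
  array ranks(6) r1-r6;
  array rankvars(2) age_rank height_rank;

  age_rank = 10;
  height_rank = 20;
  group = 2;

  /* offset of the first slot belonging to this group: 2 * (2 - 1) = 2 */
  idx = dim(rankvars) * (group - 1);
  do i = 1 to dim(rankvars);
    ranks(idx + i) = rankvars(i);  /* fills r3 and r4 for group 2 */
  end;

  /* print all six slots to the log as name=value pairs */
  put (r1-r6) (=);
run;
```

With group = 1 the offset is 0 and the first two slots are filled; with group = 3 the offset is 4 and the last two are filled.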
Given a dataset 'temp' that looks like this:
index code1 code2 code3
A P1 P2 P3
B P1 P3 P4
C P2 P4 N1
I want to make a new dataset like this:
index P1 P2 P3 P4 n1
A 1 1 1 0 0
B 1 0 1 1 0
C 0 1 0 1 1
My code is here...
%macro freq;
%do i = 1 %to 3;
%do j = 1 %to 5;
if substr(code&i.,1,1) = "P" then
if input(substr(code&i.,2,1),1.) = &j. then p&j. = 1;
if substr(code&i.,1,1) = "N" then
if input(substr(code&i.,2,1),1.) = &j. then n&j. = 1;
%end;
%end;
%mend;
But it's not cool :(
How can I create a new column whose name is the value of variables(code1, code2,...)?
Is there any other simple way?
How about
data have;
input (index code1 code2 code3)($);
datalines;
A P1 P2 P3
B P1 P3 P4
C P2 P4 N1
;
data temp;
set have;
array c code:;
do over c;
v = c;
d = 1;
output;
end;
run;
proc transpose data = temp out = want(drop = _:);
by index;
id v;
var d;
run;
You can achieve this without a macro by using ARRAY and the VNAME function in a DATA step.
data want;
set have;
/* Initialize flag variables. */
length P1-P4 3 N1 3;
/* Define arrays. */
array code [*] code1-code3;
array flags [*] P1-P4 N1;
/* Loop over the arrays. */
do i = 1 to dim(flags);
flags[i] = 0;
do j = 1 to dim(code);
if vname(flags[i]) = code[j] then flags[i] = 1;
end;
end;
keep index P1-P4 N1;
run;
The simplest way to convert values into variable names is via PROC TRANSPOSE. So first convert your wide dataset into a tall dataset. You could use PROC TRANSPOSE to do that, but to make your target dataset PROC TRANSPOSE will need some numeric variable to transpose. So why not use a data step to make the tall dataset and include a numeric variable that is set to 1.
The PROC TRANSPOSE step will give you a dataset with either a 1 or a missing value for the new variables. You can use PROC STDIZE to change the missing values into zeros.
data have;
input index $ (code1-code3) (:$32.) ;
cards;
A P1 P2 P3
B P1 P3 P4
C P2 P4 N1
;
data tall;
set have ;
array code code1-code3;
length _name_ $32 dummy 8;
retain dummy 1;
do column=1 to dim(code);
_name_=code[column];
if not missing(_name_) then output;
end;
run;
proc transpose data=tall out=want(drop=_name_);
by index ;
id _name_;
var dummy;
run;
proc stdize reponly missing=0 data=want out=want ;
var _numeric_;
run;
One more alternative:
proc transpose data=have out=long;
by index;
var code:;
run;
data long2;
set long;
value = 1;
run;
proc transpose data=long2 out=wide;
by index;
id col1;
var value;
run;
/* Convert missing to zeroes */
data want;
set wide;
array vars _NUMERIC_;
do over vars;
if(vars = .) then vars = 0;
end;
drop _NAME_;
run;
Output:
index P1 P2 P3 P4 N1
A 1 1 1 0 0
B 1 0 1 1 0
C 0 1 0 1 1
I have an HCC dataset DATA_HCC with a member ID and 79 binary variables:
Member_ID HCC1 HCC2 HCC6 HCC8 ... HCC189
XXXXXXX1 1 0 1 0 ... 0
XXXXXXX2 0 0 1 0 ... 0
XXXXXXX3 0 1 0 0 ... 1
I am trying to create an output dataset with a new binary variable for every pairwise combination of those 79 variables. Each new variable indicates whether a member has both variables set to 1.
%LET hccList = HCC1 HCC2 HCC6 HCC8 HCC9 HCC10 HCC11 HCC12 HCC17 HCC18 HCC19 HCC21 HCC22 HCC23 HCC27
HCC28 HCC29 HCC33 HCC34 HCC35 HCC39 HCC40 HCC46 HCC47 HCC48 HCC54 HCC55 HCC57 HCC58
HCC70 HCC71 HCC72 HCC73 HCC74 HCC75 HCC76 HCC77 HCC78 HCC79 HCC80 HCC82 HCC83 HCC84
HCC85 HCC86 HCC87 HCC88 HCC96 HCC99 HCC100 HCC103 HCC104 HCC106 HCC107 HCC108 HCC110
HCC111 HCC112 HCC114 HCC115 HCC122 HCC124 HCC134 HCC135 HCC136 HCC137 HCC157 HCC158
HCC161 HCC162 HCC166 HCC167 HCC169 HCC170 HCC173 HCC176 HCC186 HCC188 HCC189;
DATA COUNT_HCC; SET DATA_HCC;
ARRAY HCC [*] &hccList.;
DO i = 1 TO DIM(HCC);
DO j = i+1 TO DIM(HCC);
%LET HCC_COMBO = CATX('_', VARNAME(HCC[i]), VARNAME(HCC[j]));
&HCC_COMBO. = MIN(HCC[i], HCC[j]);
END;
END;
RUN;
I tried to use CATX function to just concat the two variable names but it didn't work.
Here is the log error that I got:
ERROR: Undeclared array referenced: CATX.
ERROR: Variable CATX has not been declared as an array.
ERROR 71-185: The VARNAME function call does not have enough arguments.
And the results output sample would like this:
Member_ID HCC1_HCC2 HCC1_HCC6 HCC1_HCC8 ... HCC188_HCC189
XXXXXXX1 0 1 0 ... 0
XXXXXXX2 0 0 0 ... 0
XXXXXXX3 0 0 0 ... 1
To achieve dynamic variable name generation, use a macro to create the variables that you need. The below code generates dynamic variable names and generates data step code to create the variables.
%macro get_hcc_combo_mins;
%do i = 1 %to %sysfunc(countw(&hccList.));
%do j = %eval(&i.+1) %to %sysfunc(countw(&hccList.));
%let hcc1 = %scan(&hccList., &i.);
%let hcc2 = %scan(&hccList., &j.);
&hcc1._&hcc2. = min(&hcc1., &hcc2.);
%end;
%end;
%mend;
DATA COUNT_HCC; SET DATA_HCC;
ARRAY HCC [*] &hccList.;
%get_hcc_combo_mins;
RUN;
The macro %get_hcc_combo_mins generates this code in the data step:
HCC1_HCC2 = min(HCC1, HCC2);
HCC1_HCC6 = min(HCC1, HCC6);
HCC1_HCC8 = min(HCC1, HCC8);
...
There may be other ways to do this all within one data step that I'm not aware of, but macros can get the job done.
A DATA step with CALL LEXCOMB can generate the variable name pairs, and CALL EXECUTE can submit a statement built from those names.
Example:
Presume the variable names start with HCC, and which specific ones exist is not known a priori.
data have;
call streaminit(1234);
do id = 1 to 100;
array hcc hcc1 hcc3 hcc5 hcc7 hcc10-hcc79 hcc150 hcc155 hcc180 hcc190-hcc191;
do over hcc;
hcc = rand('uniform', dim(hcc)) < _i_;
end;
output;
end;
run;
data _null_;
set have;
array hcc hcc:;
do _n_ = 1 to dim(hcc);
hcc(_n_) = _n_;
end;
call execute("data pairwise; set have;");
do _n_ = 1 to comb(dim(hcc),2);
call lexcomb(_n_, 2, of hcc(*));
index1 = hcc(1);
index2 = hcc(2);
name1 = vname(hcc(index1));
name2 = vname(hcc(index2));
put name1=;
call execute (cats(
catx( '_',name1,name2),
'=',
catx(' and ',name1,name2),
';'
));
end;
call execute('run;');
stop;
run;
See if you can use this as a template.
/* Example data */
data have (drop = i j);
array h {*} HCC1 HCC2 HCC6 HCC8 HCC9 HCC10 HCC11 HCC12 HCC17 HCC18 HCC19 HCC21 HCC22 HCC23 HCC27
HCC28 HCC29 HCC33 HCC34 HCC35 HCC39 HCC40 HCC46 HCC47 HCC48 HCC54 HCC55 HCC57 HCC58
HCC70 HCC71 HCC72 HCC73 HCC74 HCC75 HCC76 HCC77 HCC78 HCC79 HCC80 HCC82 HCC83 HCC84
HCC85 HCC86 HCC87 HCC88 HCC96 HCC99 HCC100 HCC103 HCC104 HCC106 HCC107 HCC108 HCC110
HCC111 HCC112 HCC114 HCC115 HCC122 HCC124 HCC134 HCC135 HCC136 HCC137 HCC157 HCC158
HCC161 HCC162 HCC166 HCC167 HCC169 HCC170 HCC173 HCC176 HCC186 HCC188 HCC189;
do i = 1 to 10;
do j = 1 to dim (h);
h [j] = rand('uniform') > .5;
end;
output;
end;
run;
/* Create long version of output data */
data temp (drop = i j);
set have;
array a {*} HC:;
do i = 1 to dim (a)-1;
do j = i+1 to dim (a);
v = catx('_', vname (a[i]), vname (a[j]));
d = a [i] * a [j];
n = _N_;
output;
end;
end;
run;
/* Transpose to wide format */
proc transpose data=temp out=temp2 (drop=_: n);
by n;
id v;
var d;
run;
/* Merge back with original data */
data want;
merge have temp2;
run;
Hi, I am trying to use a DATA _NULL_ step to assign a value to a variable based on different criteria. The variable from the DATA _NULL_ step will then be used in a WHERE statement in the following DATA step.
Ideally, if I run it today (Thursday, which is weekday 5), the code should return 30APR2019 for both variables. But my code only returns the values from the last IF branch.
data _null_;
if weekday(today()) = 5 then do;
%let exc_st_day = '30APR2019'd;
%let exc_en_day = '30APR2019'd;
end;
else if weekday(today()) = 6 then do;
%let exc_st_day = '01MAY2019'd;
%let exc_en_day = '01MAY2019'd;
end;
else if weekday(today()) = 2 then do;
%let exc_st_day = '02MAY2019'd;
%let exc_en_day = '02MAY2019'd;
end;
else if weekday(today()) = 3 then do;
%let exc_st_day = '03MAY2019'd;
%let exc_en_day = '03MAY2019'd;
end;
else if weekday(today()) = 4 then do;
%let exc_st_day = '04MAY2019'd;
%let exc_en_day = '06MAY2019'd;
end;
%put &exc_st_day &exc_en_day;
run;
You need to use CALL SYMPUTX() to create macro variables, not %LET within a data step.
if weekday(today()) = 5 then do;
call symputx('exc_st_day', '30APR2019'd);
call symputx('exc_en_day', '30APR2019'd);
end;
Macro code is evaluated BEFORE the SAS code that it generates runs. So you told SAS to run this code:
%let exc_st_day = '30APR2019'd;
%let exc_en_day = '30APR2019'd;
%let exc_st_day = '01MAY2019'd;
%let exc_en_day = '01MAY2019'd;
%let exc_st_day = '02MAY2019'd;
%let exc_en_day = '02MAY2019'd;
%let exc_st_day = '03MAY2019'd;
%let exc_en_day = '03MAY2019'd;
%let exc_st_day = '04MAY2019'd;
%let exc_en_day = '06MAY2019'd;
%put &exc_st_day &exc_en_day;
data _null_;
if weekday(today()) = 5 then do;
end;
else if weekday(today()) = 6 then do;
end;
else if weekday(today()) = 2 then do;
end;
else if weekday(today()) = 3 then do;
end;
else if weekday(today()) = 4 then do;
end;
run;
If you want to create macro variable values from a data step, use the CALL SYMPUTX() routine. Or, if you really need to keep leading and/or trailing spaces in the macro variable value, use the older CALL SYMPUT() routine.
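Putting this together, the whole step could be written along the lines of the sketch below. It uses a SELECT block instead of the IF/ELSE chain, which is a stylistic swap and not something the original code does; the variables st and en are hypothetical helper names, and the OTHERWISE branch (missing values on weekdays the question does not cover) is an assumption.

```sas
data _null_;
  /* weekday() returns 1 for Sunday through 7 for Saturday */
  select (weekday(today()));
    when (5) do; st = '30APR2019'd; en = '30APR2019'd; end;
    when (6) do; st = '01MAY2019'd; en = '01MAY2019'd; end;
    when (2) do; st = '02MAY2019'd; en = '02MAY2019'd; end;
    when (3) do; st = '03MAY2019'd; en = '03MAY2019'd; end;
    when (4) do; st = '04MAY2019'd; en = '06MAY2019'd; end;
    otherwise do; st = .; en = .; end;  /* assumption: no dates for other days */
  end;

  /* runs at data step execution time, after the branch has been chosen */
  call symputx('exc_st_day', st);
  call symputx('exc_en_day', en);
run;

/* the macro variables hold the numeric SAS date values as text */
%put &=exc_st_day &=exc_en_day;
```

Because the values are stored as plain numbers, they can be dropped into a later WHERE statement as they are, for example `where some_date between &exc_st_day and &exc_en_day;` with some_date standing in for your own date variable.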
I have two different formats defined.
proc format;
value fmtA
1 = 3
2 = 5
;
value fmtB
1 = 2
2 = 4
;
run;
The function myfun returns the formatted value:
proc fcmp outlib=WORK.pac.funcs;
function myfun(n);
val = put(n,fmta.);
return (val);
endsub;
run;
I want to make this a bit more dynamic - val will be based on function input.
EDIT
proc fcmp outlib=WORK.pac.funcs;
function myfun(n,myfmt $);
if myfmt = 'fmtA' then val = put(n,fmtA.);
else if myfmt = 'fmtB' then val = put(n,fmtB.);
else val = n;
return (val);
endsub;
run;
data test;
n = 2;
myfmt = 'fmtA';
output;
myfmt = 'fmtB';
output;
myfmt = 'fmtC';
output;
run;
data test2;
set test;
/* try to do sth like this */
value = myfun(n,myfmt);
run;
This solution works. However, it requires a long list of checks when I have many different formats, and I cannot write that list without first looking at the format names in the input test dataset.
Just use the PUTN() function.
proc format;
value fmtA 1 = '3' 2 = '5' ;
value fmtB 1 = '2' 2 = '4' ;
value fmtc 1 = '1' 2 = '3' ;
run;
data test;
do myfmt = 'fmtA.','fmtB.','fmtC.';
do n= 1,2;
str = putn(n,myfmt);
value = input(str,32.);
output;
end;
end;
run;
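Applied to the test dataset from the question, where myfmt holds the format name without a trailing period, the same idea might look like the sketch below. It assumes every name stored in myfmt is a numeric format that has already been defined (fmtA, fmtB, and fmtC as in the PROC FORMAT step above); fmtspec is a hypothetical helper variable.

```sas
data test2;
  set test;
  length fmtspec $33;
  /* append the period that a format specification normally carries */
  fmtspec = cats(myfmt, '.');
  /* putn() applies the format named at run time; input() converts the text back to a number */
  value = input(putn(n, fmtspec), 32.);
  drop fmtspec;
run;
```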