I'm looking for a way to use a normal variable value as a macro variable in a data step.
For example I have macro variable &statesList_Syphilis = AAA
and another macro variable &statesList_Giardia = BBB
And in a data step I have a variable Germ which contains 2 rows: "Syphilis" and "Giardia".
In my data step I need to find AAA when iterating over the first row when Germ="Syphilis"
and BBB when iterating over the second row, when Germ="Giardia"
An attempt would look like this:
%let statesList_Syphilis = AAA;
%let statesList_Giardia = BBB;
data test;
set mytablewithgerms; * contains variable Germ ;
* use germ and store it in &germ macro variable ;
* something like %let germ = germ; or call symput ('germ',germ);
* I want to be able to do this;
xxx = "&&statesList_&germ"; * would give xxx = "AAA" or xxx = "BBB";
* or this;
&&statesList_&germ = "test"; * would give AAA = "test" or BBB = "test";
run;
I don't think this is possible, but I figured I would ask just to be sure.
Thanks!
EDIT (Following questions in the comments, I'm adding context to my specific problem, but I feel this is making things more complicated):
This was an attempt to simplify the problem.
In reality AAA and BBB are long lists of words
like
"asymptomatic_1 fulminant_1 chronic_1 chronic_1 fatalFulminant_1 hepatocellular_1 compensated_1 hepatocellular_2 decompensated_1 fatalHepatocellular_1 fatalHepatocellular_2 fatalDecompensated_1"
And I don't want to store this long string in a variable, I want to iterate each word of this string in a do loop with something like:
%do k=1 %to %sysfunc(countw(&&statesList_&germ));
%let state = %scan(&&statesList_&germ, &k);
* some other code here ;
%end;
EDIT2:
here is a more complete view of my problem:
%macro dummy();
data DALY1;
* set lengths ;
length Germ $10 Category1 $50 Category2 $50 AgeGroupDALY $10 Gender $2 value 8 stateList$999;
* make link to hash table ;
if _n_=1 then do;
*modelvalues ----------------;
declare hash h1(dataset:'modelData');
h1.definekey ('Germ', 'Category1', 'Category2', 'AgeGroupDALY', 'Gender') ;
h1.definedata('Value');
h1.definedone();
call missing(Germ, Value, Category1, Category2);
* e.g.
rc=h1.find(KEY:Germ, KEY:"ssssssssss", KEY:"ppppppppppp", KEY:AgeGroupDALY, KEY:Gender);
*states ---------------------;
declare hash h2(dataset:'states');
h2.definekey ('Germ') ;
h2.definedata('stateList');
h2.definedone();
end;
set DALY_agregate;
put "°°°°° _n_=" _n_;
DALY=0; * addition of terms ;
rc2=h2.find(KEY:Germ); * this creates the variable statesList;
put "statesList =" statesList;
* here i need statesList as a macro variable,;
%do k=1 %to %sysfunc(countw(&statesList)); *e.g. acute_1 asymptomatic_1 ...;
%let state = %scan(&statesList, &k);
put "=== &k &state";
&state = 1; * multiplication of terms ;
* more code here;
%end;
run;
%mend dummy;
%dummy;
EDIT3:
The input dataset looks like this
Germ AgeGroup1 AgeGroup2 Gender Cases Year
V_HBV 15-19 15-19 M 12 2015
V_HBV 15-19 15-19 M 8 2016
V_HBV 20-24 20-24 F 37 2011
V_HBV 20-24 20-24 F 46 2012
V_HBV 20-24 20-24 F 66 2013
The output dataset will add variables contained in the string defined by the macro variable which depends on the Germ.
e.g. for V_HBV it will create these variables: asymptomatic_1 fulminant_1 chronic_1 chronic_1 fatalFulminant_1 hepatocellular_1 compensated_1 hepatocellular_2 decompensated_1 fatalHepatocellular_1 fatalHepatocellular_2 fatalDecompensated_1
I'm not following the big picture, but one of the previous iterations of your question had some code (pseudo code) that illustrates possible confusion about how the macro language works. Consider this step:
data _null_;
germ="Syph";
call symput('germ',germ);
%let Germ=%sysfunc(cats(germ));
put "germ = &germ";
run;
%put &germ;
The log from executing that in a fresh SAS session shows:
1 data _null_;
2 germ="Syph";
3 call symput('germ',germ);
4 %let Germ=%sysfunc(cats(germ));
5 put "germ = &germ";
6 run;
germ = germ
7 %put &germ;
Syph
Now let's talk about what's happening. I'll use the line numbers from the log.
Line 2 assigns text string Syph to data step variable germ. Nothing special.
Line 3 creates a macro variable named Germ, and assigns it the value of the data step variable germ. So it assigns it the value Syph. This CALL SYMPUT statement executes when the data step executes.
Line 4 is a macro %let statement. It creates a macro variable named Germ, and assigns it the value germ. Because this is a macro statement, it executes before any of the DATA step code has executed. It does not know about data step variables. Line 4 is equivalent to %let Germ=germ. To the macro language, the right hand side is just a four-character string germ. It is not the name of a data step variable. %sysfunc(cats()) is doing nothing, because there is no list of items to concatenate.
Line 5 is a data step PUT statement. The macro reference &germ is resolved while the data step is compiling. At this point the macro variable Germ resolves to germ, because the %LET statement has already executed (the CALL SYMPUT statement has not executed yet). That is why the log shows germ = germ.
Line 7 is a %PUT statement that executes after the DATA _NULL_ step has completed (and after the CALL SYMPUT has written the value Syph to macro variable Germ).
As a general principle, it is difficult (and unusual) to have a single data step in which you are using data to create a macro variable (e.g. via call symput) and using that macro variable in the same step (i.e. referencing the macro variable). Macro references are resolved before any of the data step code executes.
Typically if your data are already in a dataset, you can get what you want with data step statements (DO loops rather than %DO loops, etc). Or alternatively you can use one DATA step to generate your macro variables, and a second DATA step can reference them.
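As a rough sketch of the "data step statements" route for the simplified version of the question: SYMGET() looks up a macro variable at execution time, so its name can be built from the current value of Germ, and an ordinary DO loop with SCAN() can then walk the words. The sample rows in mytablewithgerms and the variable names xxx, state, and k are assumptions made up for illustration. Note that this reads values per row; it cannot create new variable names per row, because variable names are fixed when the step compiles.

```sas
%let statesList_Syphilis = AAA;
%let statesList_Giardia  = BBB;

/* Hypothetical input table, for illustration only */
data mytablewithgerms;
  length Germ $10;
  input Germ $;
  datalines;
Syphilis
Giardia
;
run;

data test;
  set mytablewithgerms;
  length xxx $200 state $32;
  /* SYMGET runs when the step executes, so the macro variable
     name can be assembled from the data step variable Germ */
  xxx = symget(cats('statesList_', Germ));
  /* A data step DO loop (not %DO) iterates over the words */
  do k = 1 to countw(xxx, ' ');
    state = scan(xxx, k, ' ');
    put Germ= k= state=;
  end;
  drop k;
run;
```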
Hope that helps.
I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
data _null_;
  set sashelp.class;
  array _n[*] _numeric_;
  array _f[3] _temporary_;
  put _n_ @;
  do _n_ = 1 to dim(_f);
    _f[_n_] = log(_n[_n_]);
    put _f[_n_]= @;
  end;
  put ;
run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until it gets a valid SAS varname as output from the md5() function. You could even then test the target dataset to make sure the variable doesn't already exist. If it does, build that logic into the above macro.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
How do I get the values of var1 and var2 inside the loop? I know that dereferencing j like this does not work, but it makes it clearer what I want to do (see below).
%let var1 = apple;
%let var2 = pear;
data _null_;
do j=1 to j=2;
put &var&j; //<---?
end;
run;
Desired output in the log:
apple
pear
As noted above, J is not a macro variable so you cannot use it as such. You can use the SYMGET function to retrieve the value though. Assuming you want data step logic for some reason:
data _null_;
do i=1 to 2;
x= symget(catt('var', i));
put x;
end;
run;
Sounds like you want to resolve a macro variable whose name you are creating by appending the value of another macro variable to some constant prefix.
If you try to use code like this:
%let var1 = apple;
%let var2 = pear;
%let j=1 ;
%put &var&j;
You will get a warning that the macro variable named VAR could not be resolved.
You need to signal to the macro processor that it needs to delay trying to evaluate &var until after the suffix has been appended. The way to do this is to double the first &.
%put &&var&j;
The presence of double &'s will cause the macro processor to replace them with a single & and set a reminder to itself to re-scan the result for more macro variable references.
So the first pass will replace && with & and replace &j with 1. Then the second pass will replace &var1 with apple.
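The same two-pass resolution works inside a %DO loop, where the suffix changes on every iteration. A minimal sketch, assuming a hypothetical macro named show_vars:

```sas
%let var1 = apple;
%let var2 = pear;

%macro show_vars(n);
  %local j;
  %do j = 1 %to &n;
    /* Pass 1: && -> & and &j -> 1, giving &var1      */
    /* Pass 2: &var1 resolves to apple (then pear)    */
    %put &&var&j;
  %end;
%mend show_vars;

%show_vars(2)
```

This writes apple and then pear to the log.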
I have 5 separate datasets (actually many more, but I want to shorten the code) named dk33, dk34, dk35, dk51, dk63. Each dataset contains a numeric field: surv_probs. I would like to load the values into 5 arrays and then use the arrays in a data step (result). However, I need advice on the best way to do it.
I am getting an error when I use the macro setarrays (code below):
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to Tom's question:
* The macro getlength, when executed, creates 5 macro variables named dk33_rows, dk34_rows, dk35_rows, dk51_rows, dk63_rows.
* The macro readin, when executed, creates 5 macro variables dk33_list, dk34_list, dk35_list, dk51_list, dk63_list, each containing a comma-separated string of the values from the column, e.g. 0.99994,0.1999,0.1111
* The macro setarrays, when executed, creates 5 arrays dk33_arr, dk34_arr, ... holding the parsed values from the macro variables created by readin.
I find that "macro arrays" like VAR1,VAR2,.... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that or, if the list is short enough, put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
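A minimal sketch of the single-macro-variable approach, assuming a hypothetical macro variable dslist and a hypothetical macro show_datasets. Here the loop only writes each name to the log, but the body could just as well generate a step per dataset:

```sas
%let dslist = dk33 dk34 dk35 dk51 dk63;

%macro show_datasets(list);
  %local i ds;
  /* COUNTW gives the number of space-delimited names in the list */
  %do i = 1 %to %sysfunc(countw(&list, %str( )));
    /* %SCAN pulls out the i-th name; no VAR1, VAR2, ... needed */
    %let ds = %scan(&list, &i, %str( ));
    %put Dataset &i is &ds;
  %end;
%mend show_datasets;

%show_datasets(&dslist)
```

Adding or removing a dataset then means editing one %LET statement rather than renumbering a family of macro variables.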
But either way it is also better to avoid trying to write macro code that needs more than three &'s. Build up the reference in multiple steps. Build a macro variable that has the name of the macro variable you want to reference and then pull the value of that into another macro variable. It might take more lines of code, but you can more easily understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
Before you begin using macro code (or other code generation techniques) make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need for macro code, or for copying values, or even counts, into macro variables. For example, instead of getting the count of the observations you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* What ever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;
I have a few columns in SAS whose names start with 100_Section_xxx. So the first part of the name is the same, while the second part (xxx) is different.
I want to write a condition for every column, such as: if 100_Section_xxx > 1 then error_100_Section_xxx = "yes". How do I write it so that SAS takes the second part (xxx) of the name 100_Section_xxx and appends it to the name of the matching column error_100_Section_xxx?
You can do this with a simple macro and data step with an array:
This macro loops through a list of names (the XXX values) and lists variables that end with it.
%macro var_names(names, prefix=);
%local i n var;
%let n=%sysfunc(countw(&names));
%do i=1 %to &n;
%let var=%scan(&names,&i);
&prefix.&var
%end;
%mend;
Now use that, a data step, and 2 arrays.
data have;
_100_Section_a = 1;
_100_Section_b = 0;
_100_Section_c = 10;
run;
data want;
set have;
array vars[*] %var_names(a b c, prefix=_100_Section_);
format %var_names(a b c, prefix=error_100_Section_) $8.;
array errors[*] %var_names(a b c, prefix=error_100_Section_);
do i=1 to dim(vars);
if vars[i] > 1 then
errors[i] = "yes";
end;
drop i;
run;
Note, I added a _ to the variable names. SAS variables cannot normally start with a number.
I have a list of values defined in a macro variable, e.g.,
%let datelist = 20100614 20120309 20151215;
Now, I want to put these values into the corresponding number of macro variables. In this case, I want to put them into Date1, Date2, Date3.
Of course, I could manually type out:
%let Date1 = 20100614;
%let Date2 = 20120309;
%let Date3 = 20151215;
How can I do that in a dynamic way so that if there were 25 dates, or 2, it would still work?
Ok, I'll suggest a data step, because I prefer that over macro loops any day.
Use COUNTW() to count the number of loops required and use CALL SYMPUTX to create the macro variables. You should look into the third argument of CALL SYMPUTX if you want to control the scope of the macro variable.
%let datelist = 20100614 20120309 20151215;
data _null_;
word = "&datelist";
n=countw(word);
do i=1 to n;
call symputx('date'||Put(i, 8. -l), scan(word, i));
end;
run;
%put &date1.;
%put &date2.;
%put &date3.;