SAS: dim and macro variables - sas

data example1;
input var1 var2 var3;
datalines;
10 11 14
3 5 8
0 1 2
;
data example2;
input var;
datalines;
1
2
8
;
Let's say that the number of var variables depends on the input data. I want to put that number into a macro variable and use it in another data step, for example:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
%let array_dim = dim(var_array);
do i = 1 to &array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim; /* doesn't work */
run;
%mend;
%m(example1);
%let array_dim = dim(var_array); doesn't work in the second data step, because dim(var_array) isn't evaluated, and using %eval or %sysevalf when declaring the macro variable doesn't work here either. How can I do this correctly?

You are mixing macro code and data step code in a way that SAS does not support. Macro statements such as %let are processed while the data step is being compiled, before any data step code runs, so %let array_dim = dim(var_array); just stores the text dim(var_array). If you want to assign a macro variable a value that is generated as part of a data step, you need to use call symput (or call symputx).
Also, if you create a macro variable during a data step, you cannot resolve it with &name in that same data step (unless you use the resolve function...). It's easier to use a data set variable for this.
So here's a fixed version of your code that I think does what you want:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
array_dim = dim(var_array);
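/* Only export the macro variable once, for the first row. */
/* call symputx converts the number to text and strips blanks. */
if _n_ = 1 then call symputx('array_dim_mvar', array_dim);
do i = 1 to array_dim;
var_array[i] = var_array[i] + 1;
end;
/* Keep the helper variables out of the output data set. */
drop i array_dim;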
run;
data example2;
set example2;
var2 = var * &array_dim_mvar;
run;
%mend;
%m(example1);
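If the only goal is to get the number of var variables into a macro variable, a separate step that reads just the data set header may be enough. This is a minimal sketch, not part of the answer above; the macro variable name n_vars is made up for illustration, and it assumes example1 and example2 exist as defined in the question.

```sas
data _null_;
  /* "if 0" is never true, so no rows are read, but the compiler still
     adds the variables of example1 to the program data vector */
  if 0 then set example1;
  array var_array[*] var:;
  /* call symputx converts the number to text and strips blanks */
  call symputx('n_vars', dim(var_array));
  stop;
run;

%put NOTE: n_vars=&n_vars;

data example2;
  set example2;
  /* the macro variable was created by a step that has already finished */
  var2 = var * &n_vars;
run;
```

Because the macro variable is created in a step that has already run, the later data step can reference &n_vars directly. For example1 from the question, which contains var1, var2 and var3, it resolves to 3.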

Related

put values to a file using functions without creating new variables

I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
data _null_;
set sashelp.class;
array _n[*] _numeric_;
array _f[3] _temporary_;
put _n_ @;
do _n_ = 1 to dim(_f);
_f[_n_] = log(_n[_n_]);
put _f[_n_]= @;
end;
put ;
run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until the md5() function returns a valid SAS varname. You could even test the target dataset to make sure the variable doesn't already exist, and build that logic into the above function.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
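The existence check mentioned above could be sketched with the data set I/O functions open, varnum and close, called through %sysfunc. The macro name var_exists and its parameters ds and var are hypothetical; this is one possible shape for the check, not the answerer's code.

```sas
%macro var_exists(ds=, var=);
  %local dsid rc result;
  %let result = 0;
  /* open returns 0 if the data set cannot be opened */
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    /* varnum returns the variable's position, or 0 if it is absent */
    %if %sysfunc(varnum(&dsid, &var)) > 0 %then %let result = 1;
    %let rc = %sysfunc(close(&dsid));
  %end;
  &result
%mend var_exists;

/* Usage: only use the generated name if it is not already taken */
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put NOTE: exists=%var_exists(ds=sashelp.class, var=&my_var);
```

A result of 1 would mean the name is already in use, in which case a different seed could be tried.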

How to get array size during compile time?

I have a dataset holding parameters, like this:
Parameters
year threshold1 threshold2
1 100 200
2 150 300
....
7 200 390
I can do
data output;
set input;
if 0 then set parameters;
array thresholds [2] threshold:;
%do year = 1 %to 7;
year = &year.;
set parameters point=year;
array my_thresholds&year. [2] _temporary_;
do i = 1 to 2;
my_thresholds&year.[i] = thresholds[i];
end;
%end;
This would, for every observation in INPUT, load threshold1 and threshold2 for each year as variables and set up an array my_thresholds&year. holding them.
The problem, however, is what to do if the number of thresholds is unknown. I can't use dim(thresholds) or *.
How can I get SAS to know at compile time how to set up the array?
To my knowledge, you cannot set the size of the array dynamically at compile time.
One way to get this done is to use proc contents and proc sql to figure out how many threshold parameters there are in the parameters data set, and then pass that information to the data step via a macro variable.
data parameters;
do year=1 to 7;
threshold1 = 1;
threshold2 = 2;
threshold3 = 3;
output;
end;
run;
proc contents data=parameters out=cont noprint;
run;
proc sql noprint;
select count(*) into :thr_count
from cont
where name like "threshold%";
quit;
%put &thr_count.;
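Once the count is in a macro variable, it can be used wherever the array declaration needs a constant. The following is a minimal sketch under the assumption that the variables are named threshold1 to thresholdN with no gaps; the data set name check and the variable total are made up for illustration.

```sas
/* Re-assigning the value removes the leading blanks that
   SELECT ... INTO leaves in front of a numeric result */
%let thr_count = &thr_count;

data check;
  set parameters;
  /* the macro variable resolves before the step is compiled,
     so the array size is a constant by the time SAS sees it */
  array thresholds[&thr_count] threshold1-threshold&thr_count;
  total = sum(of thresholds[*]);
run;
```

The re-assignment matters here because threshold&thr_count would otherwise resolve to a name with embedded blanks.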

How to calculate a mean for the non zero values using proc means or proc summary

I want to calculate a mean based on the nonzero values of the given variables, using proc means only.
I know we can calculate this using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables. How can I calculate the mean of the nonzero values when all of them are listed in the var statement, as below?
proc means data = xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition to exclude zeros, it works. But is there something that would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step? The following computes, for each row, the mean of the nonzero, nonmissing values across var1 to var8:
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
/* avoid dividing by zero when every value is zero or missing */
if qty > 0 then mean = mean/qty;
else mean = .;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.

How can I use %SCAN within a macro variable name?

I'm trying to write robust code to assign values to macro variables. I want the names of the macro variables to depend on values coming from the variable 'subgroup'. So subgroup could equal 1, 2, or 45 etc., giving macro variable names trta_1, trta_2, trta_45 etc.
Where I am having difficulty is calling the macro variable name. So instead of calling e.g. &trta_1 I want to call &trta_%SCAN(&subgroups, &k), which resolves to trta_1 on the first iteration. I've used a %SCAN function in the macro variable name, which is throwing up a warning 'WARNING: Apparent symbolic reference TRTA_ not resolved.'. However, the macro variables have been created with values assigned.
How can I resolve the warning? Is there a function I could run with the %SCAN function to get this to work?
data data1 ;
input subgroup trta trtb ;
datalines ;
1 30 58
2 120 450
3 670 3
run;
%LET subgroups = 1 2 3 ;
%PUT &subgroups;
%MACRO test;
%DO k=1 %TO 3;
DATA test_&k;
SET data1;
WHERE subgroup = %SCAN(&subgroups, &k);
CALL SYMPUTX("TRTA_%SCAN(&subgroups, &k)", trta, 'G');
CALL SYMPUTX("TRTB_%SCAN(&subgroups, &k)", trtb, 'G');
RUN;
%PUT "&TRTA_%SCAN(&subgroups, &k)" "&TRTB_%SCAN(&subgroups, &k)";
%END;
%MEND test;
%test;
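Using the structure you've provided, the following will achieve the result you're looking for. The %SCAN result is first stored in a macro variable X, and the double ampersand in &&TRTA_&X makes the macro processor resolve &X first and then rescan the resulting name.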
data data1;
input subgroup trta trtb;
datalines;
1 30 58
2 120 450
3 670 3
;
run;
%LET SUBGROUPS = 1 2 3;
%PUT &SUBGROUPS;
%MACRO TEST;
%DO K=1 %TO 3;
%LET X = %SCAN(&SUBGROUPS, &K) ;
data test_&k;
set data1;
where subgroup = &X ;
call symputx(cats("TRTA_",&X), trta, 'g');
call symputx(cats("TRTB_",&X), trtb, 'g');
run;
%PUT "&&TRTA_&X" "&&TRTB_&X";
%END;
%MEND TEST;
%TEST;
However, I'm not sure this approach is particularly robust. If your list of subgroups changes, you'd need to change the 'K' loop manually. Instead, you can determine the upper bound of the loop by dynamically counting the elements in your subgroup list.
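Counting the elements could look something like the sketch below, which calls the countw function through %sysfunc. The names n_subgroups and loop_subgroups are made up for illustration, and the list is assumed to be blank-delimited.

```sas
%let subgroups = 1 2 3 45;

/* countw counts the words in the list; %str( ) makes a blank the only delimiter */
%let n_subgroups = %sysfunc(countw(&subgroups, %str( )));

%macro loop_subgroups;
  %local k x;
  %do k = 1 %to &n_subgroups;
    %let x = %scan(&subgroups, &k, %str( ));
    %put NOTE: element &k of &n_subgroups is &x;
  %end;
%mend loop_subgroups;
%loop_subgroups
```

With this in place, adding or removing a subgroup only requires changing the list itself.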
If you want to call the macro variables you've created later in your code, you could use a similar method.
data data2;
input subgroup value;
datalines;
1 20
2 25
3 15
45 30
;
run ;
%MACRO TEST2;
%DO K=1 %TO 3;
%LET X = %SCAN(&SUBGROUPS, &K) ;
data data2 ;
set data2 ;
if subgroup = &X then percent = value/&&TRTB_&X ;
format percent percent9.2 ;
run ;
%END;
%MEND TEST2;
%TEST2 ;
Effectively, you're re-writing data2 on each iteration of the loop.
This should cover your requirements. You can load and unload an array of macro variables without a macro. I have included an alternate method of unloading a macro variable array with a macro for comparison.
Load values into macro variables including Subgroup number within macro variable name e.g. TRTA_45.
data data1;
input subgroup trta trtb;
call symput ('TRTA_'||compress (subgroup), trta);
call symput ('TRTB_'||compress (subgroup), trtb);
datalines;
1 30 58
2 120 450
3 670 3
45 999 111
;
run;
No need for macro to load or refer to macro variables.
%put TRTA_45: &TRTA_45.;
%let Subgroup_num = 45;
%put TRTB_&subgroup_num.: &&TRTB_&subgroup_num.;
If you need to loop through the macro variables then you can use Proc SQL to generate a list of subgroups.
proc sql noprint;
select subgroup
, count (*)
into :subgroups separated by ' '
, :No_Subgroups
from data1
;
quit;
%put Subgroups: &subgroups.;
%put No_Subgroups: &No_Subgroups.;
Use a macro to loop through the macro variable array and populate a table.
%macro subgroups;
data subgroup_data_macro;
%do i = 1 %to &no_subgroups.;
%PUT TRTA_%SCAN(&subgroups, &i ): %cmpres(&TRTA_%SCAN(&subgroups, &i ));
%PUT TRTB_%SCAN(&subgroups, &i ): %cmpres(&TRTB_%SCAN(&subgroups, &i ));
subgroup = %SCAN(&subgroups, &i );
TRTA = %cmpres(&TRTA_%SCAN(&subgroups, &i ));
TRTB = %cmpres(&TRTB_%SCAN(&subgroups, &i ));
output;
%end;
run;
%mend subgroups;
%subgroups;
Or use a data step (outside a macro) to loop through the macro variable array and populate a table.
data subgroup_data_sans_macro;
do i = 1 to &no_subgroups.;
subgroup = SCAN("&subgroups", i );
TRTA = input (symget (compress ('TRTA_'||subgroup)),20.);
TRTB = input (symget (compress ('TRTB_'||subgroup)),20.);
output;
end;
run;
Ensure both methods (within and without a macro) produce the same result.
proc compare
base = subgroup_data_sans_macro
compare = subgroup_data_macro
;
run;

do loop on sas but not with a macro

It is a simple one, but I'm struggling a bit.
What I have :
What I want :
I want to remove the v0 , v1 and etc.
I'm using this piece of code
data IndieDay20140704;
set IndieDay20140704;
do i=1 to 5;
VAR1=tranwrd(var1,"v&i","");
end;
run;
It is not working correctly as it is giving me this instead (see below) plus the error
WARNING: Apparent symbolic reference I not resolved.
Questions:
1) Do I need a macro?
2) Why the error?
Many thanks for your insights.
The warning appears because "v&i" inside double quotes is (unintentionally) a reference to a macro variable i, which you never defined. The data step variable i is not a macro variable.
I guess the idea of tranwrd is to remove the words in VAR2, VAR3, ... from VAR1.
The logical error is doing this for VAR1 itself as well.
Check if this helps (using array):
data IndieDay20140704;
length VAR1 VAR2 VAR3 VAR4 VAR5 $10;
VAR1 = 'TEST IT';VAR5 = 'TEST';
output;
VAR1 = 'STEST IT';VAR5 = 'TEST';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
array vals VAR1 - VAR5;
do i=1 to dim(vals);
if i ne 1 then VAR1=tranwrd(var1,trim(vals(i)),"");
end;
drop i;
run;
Here I'm creating a SAS view on top of the table (it's not a good idea to overwrite the source).
Also, I think you should trim() the values from VAR2, VAR3, ..., depending on what you want to achieve and what's in the data.
EDIT:
here the version with 'v0', 'v1'...'v5' strings:
data IndieDay20140704;
length VAR1$10;
VAR1 = 'TEST v0';
output;
VAR1 = 'TEST v11';
output;
VAR1 = 'TEST v1';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
org_var1 = var1;
do i=0 to 5;
var1 =tranwrd(var1, catt('v', put(i, 1. -L)),"");
end;
run;
catt('v', put(i, 1. -L)) concatenates string 'v' and the result of put.
put(i, 1. -L) converts the numeric variable i to text using the plain numeric format w.d (1. here, which is enough for single-digit numbers); -L left-aligns the result.
Here's one way, there are many others and this may not work if your data has a lot of variability.
data have;
length VAR1 $11;
VAR1 = 'fic19v0.csv';
output;
VAR1 = 'fic19v1.csv';
output;
run;
data want ;
set have;
original_var=var1;
var1=substr(var1, 1, index(var1, ".")-3)||".csv";
run;