How can I peform same datastep across many variables in SAS? - sas

I have data that looks like this and has 500 variables with a target:
var1 var2 var3 var4 ... var500 target
The names of the variables are not sequential as above so I don't think I can use something like var1:var500. I want to loop through the variables to create graphs. Some of the variables are continous and some are nominal.
for var1 through var500
if nominal then create graphtypeA var[i] * target
else if continous then create graphtypeB var[i] * target
end;
I can easily create a second table that has the data type in it to check against. Arrays seem like they might be useful to peform this task of looping through variables. Something like:
data work.mydata;
set archive.mydata;
array myarray{501] myarray1 - myarray501
do i=1 to 500;
proc sgpanel;
panelby myarray[501];
histogram myarray[i];
end;
run;
This doesn't work though and it doens't check to see what type of variable it is. If we assume I have another sas.dataset that has varname and vartype (continuous, nominal) how can I loop through to create the desired graphs for the given vartype? Thanks in advance.

Basically, you need to loop over some variables, apply some logic to determine the variable type, then produce output depending on the variable type. While there are many approaches to this problem, one solution is to select your variables into a macro variable, loop over this "list" (not a formal data structure) of variables, and use macro control logic to designate different subroutines for numeric and character variables.
I'll use the sashelp.cars data set to illustrate. In this example the variable origin is your 'Target' variable and the variables Make, Type, Horsepower, and Cylinders are the numeric and character variables.
* get some data;
data set1 (keep = Make Type Origin Horsepower Cylinders);
set sashelp.cars;
run;
* create dataset of variable names and types;
proc contents data = set1
out = vars
noprint;
run;
* get variable names and variable types (1=numeric, 2=character)
* into two macro variable "lists" where each entry is seperated
* by a space;
proc sql noprint;
select name, type
into :varname separated by ' ', :vartype separated by ' '
from vars
where name <> "Make";
quit;
* put the macro variables to the log to confirm they are what
* you expect
%put &varname;
%put &vartype;
Now, use a macro to loop over the values in the macro variable list. The countw function counts the number of variables, and uses this number as the loop iterator limit. The scan function reads in each variable name and type by its relative position in the respective macro variable lists. For each variable the type is then evaluated and a plot is produced depending on whether it is character or numeric. In this example, a histogram with density plot is produced for numeric variables and a bar chart of frequency counts is produced for character variables.
The loop logic is general, and Proc sgpanel and Proc sgplot cab be modified or replaced with other desired data step processing or procedures.
* turn on options that are useful for
* macro debugging, turn them off
* when using in production;
options mlogic mprint symbolgen;
%macro plotter;
%do i = 1 %to %sysfunc(countw(&varname));
%let nextvar = %scan(&varname, &i, %str( ));
%let nextvartype = %scan(&vartype, &i, %str( ));
%if &nextvartype. = 1 %then %do;
proc sgpanel data=set1 noautolegend;
title "&nextvar. Distribution";
panelby Origin;
histogram &nextvar.;
density &nextvar.;
run;
%end;
%if &nextvartype. = 2 %then %do;
proc sgplot data=set1;
title "&nextvar. Count by Origin";
vbar &nextvar. /group= origin;
run;
%end;
%end;
%mend plotter;
*call the macro;
%plotter;

Unfortunately it is not possible to use arrays outside a data step in the way that you propose here, at least not in any very efficient way. However, there are quite a few options available to you. One would be just to call your graphing proc once and tell it to graph every numeric variable in your dataset, e.g. like so:
proc univariate data = sashelp.class;
var _NUMERIC_;
histogram;
run;
If the variables you want to graph that are of the same type are adjacent in the column order of your dataset, you can use a double-dash list, e.g.
proc univariate data = sashelp.class;
var age--weight;
histogram;
run;
In general you should seek to avoid calling procs or running data steps separately for every variable - it is nearly always more efficient to call them just once and process everything in one go.

Related

loop a list of variables in SAS

I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I'm plan to use proc sgplot and proc mixed functions to do analysis. However, putting all variables one by one in the same function will be really time consuming. I'm pretty new to SAS, is there a way to create a loop with dependent variables and put them into the function.
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (ie 'read','math', ....) as separate analyses with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this. This creates a loop around your proc mixed -call. I didn't take a look at the proc mixed -specification, but that may not work as described in your example.
The loop works however, and loops through whatever you put in the place of the proc mixed -call and the loop is dynamically sized based on the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* this checks the nr of variables in y_var_list;
%let len_var_list = %eval(%sysfunc(count(&y_variables., %quote( )))+1);
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);

Using SAS SET statement with numbered macro variables

I'm trying to create a custom transformation within SAS DI Studio to do some complicated processing which I will want to reuse often. In order to achieve this, as a first step, I am trying to replicate the functionality of a simple APPEND transformation.
To this end, I've enabled multiple inputs (max of 10) and am trying to leverage the &_INPUTn and &_INPUT_count macro variables referenced here. I would like to simply use the code
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set &_INPUT0 - &&_INPUT&max_input_index;
keep col1 col2 col3;
run;
However, I receive the following error:
ERROR: Missing numeric suffix on a numbered data set list (WORK.SOME_INPUT_TABLE-WORK.ANOTHER_INPUT_TABLE)
because the macro variables are resolved to the names of the datasets they refer to, whose names do not conform to the format required for the
SET dataset1 - dataset9;
statement. How can I get around this?
Much gratitude.
You need to create a macro that loops through your list and resolves the variables. Something like
%macro list_tables(n);
%do i=1 %to &n;
&&_INPUT&i
%end;
%mend;
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set %list_tables(&max_input_index);
keep col1 col2 col3;
run;
The SET statement will need a list of the actual dataset names since they might not form a sequence of numeric suffixed names.
You could use a macro %DO loop if are already running a macro. Make sure to not generate any semi-colons inside the %DO loop.
set
%do i=1 %to &_inputcount ; &&_input&i %end;
;
But you could also use a data step to concatenate the names into a single macro variable that you could then use in the SET statement.
data _null_;
call symputx('_input1',symget('_input'));
length str $500 ;
do i=1 to &_inputcount;
str=catx(' ',str,symget(cats('_input',i)));
end;
call symputx('_input',str);
run;
data .... ;
set &_input ;
...
The extra CALL SYMPUTX() at the top of the data step will handle the case when count is one and SAS only creates the _INPUT macro variable instead of creating the series of macro variables with the numeric suffix. This will set _INPUT1 to the value of _INPUT so that the DO loop will still function.

Text manipulation of macro list variables to stack datasets with automated names

I have written a macro that accepts a list of variables, runs a proc mixed model using each variable as a predictor, and then exports the results to a dataset with the variable name appended to it. I am trying to figure out how to stack the results from all of the variables in a single data set.
Here is the macro:
%macro cogTraj(cog,varlist);
%let j = 1;
%let var = %scan(&varlist, %eval(&j));
%let solution = sol;
%let outsol = &solution.&var.;
%do %while (&var ne );
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &outsol;
run;
%let j = %eval(&j + 1);
%let var = %scan(&varlist, %eval(&j));
%let outsol = &solution.&var.;
%end;
%mend;
/* Example */
%cogTraj(mmmscore, varlist = bio1 bio2 bio3);
The result would be the creation of Solbio1, Solbio2, and Solbio3.
I have created a macro variable containing the "varlist" (Ideally, I'd like to input a macro variable list as the argument but I haven't figured out how to deal with the scoping):
%let biolist = bio1 bio2 bio3;
I want to stack Solbio1, Solbio2, and Solbio3 by using text manipulation to add "Sol" to the beginning of each variable. I tried the following, outside of any data step or macro:
%let biolistsol = %add_string( &biolist, Sol, location = prefix);
without success.
Ultimately, I want to do something like this;
data Solbio_stack;
set %biolistsol;
run;
with the result being a single dataset in which Solbio1, Solbio2, and Solbio3 are stacked, but I'm sure I don't have the right syntax.
Can anyone help me with the text string/dataset stacking issue? I would be extra happy if I could figure out how to change the macro to accept %biolist as the argument, rather than writing out the list variables as an argument for the macro.
I would approach this differently. A good approach for the problem is to drive it with a dataset; that's what SAS is good at, really, and it's very easy.
First, construct a dataset that has a row for each variable you're running this on, and a variable name that contains the variable name (one per row). You might be able to construct this using PROC CONTENTS or sashelp.vtable or dictionary.tables, if you're using a set of variables from one particular dataset. It can also come from a spreadsheet you import, or a text file, or anything else really - or just written as datalines, as below.
So your example would have this dataset:
data vars_run;
input name $ cog $;
datalines;
bio1 mmmscore
bio2 mmmscore
bio3 mmmscore
;;;;
run;
If your 'cog' is fairly consistent you don't need to put it in the data, if it is something that might change you might also have a variable for it in the data. I do in the above example include it.
Then, you write the macro so it does one pass on the PROC MIXED - ie, the inner part of the %do loop.
%macro cogTraj(cog=,var=, sol=sol);
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &sol.&var.;
run;
%mend cogTraj;
I put the default for &sol in there. Now, you generate one call to the macro from each row in your dataset. You also generate a list of the sol sets.
proc sql;
select cats('%cogTraj(cog=',cog,',var=',name,',sol=sol)')
into :callList
sepearated by ' '
from have;
select cats('sol',name') into :solList separated by ' '
from have;
quit;
Next, you run the macro:
&callList.
And then you can do this:
data sol_all;
set &solList.;
run;
All done, and a lot less macro variable parsing which is messy and annoying.

SAS - Creating variables from macro variables

I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like a dynamic code to create variables called Adam_ref, Bob_ref etc.. which will work even if there a different dataset with different names (i.e. don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in a macro variables you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variable, there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Refs" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while th sashelp alias can be used anywhere. I tend to use sashelp alias since I work in windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space delimited macro vaiable with the desired names.
Step 2 To build the empty data set then this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $")||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||"= "||compress(_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;

Categorical variables with macro

I am trying to create categorical variables in sas. I have written the following macro, but I get an error: "Invalid symbolic variable name xxx" when I try to run. I am not sure this is even the correct way to accomplish my goal.
Here is my code:
%macro addvars;
proc sql noprint;
select distinct coverageid
into :coverageid1 - :coverageid9999999
from save.test;
%do i=1 %to &sqlobs;
%let n=coverageid&i;
%let v=%superq(&n);
%let f=coverageid_&v;
%put &f;
data save.test;
set save.test;
%if coverageid eq %superq(&v)
%then &f=1;
%else &f=0;
run;
%end;
%mend addvars;
%addvars;
You're combining macro code with data step code in a way that isn't correct. %if = macro language, meaning you are actually evaluating whether the text "coverageid" is equal to the text that %superq(&v) evaluates to, not whether the contents of the coverageid variable equal the value in &v. You could just convert %if to if, but even if you got that to work properly it would be hideously inefficient (you're rewriting the dataset N times, so if you have 1500 values for coverageID you rewrite the entire 500MB dataset or whatnot 1500 times, instead of just once).
If what you want to do is take the variable 'coverageid' and convert it to a set of variables that consist of all possible values of coverageid, 1/0 binary, for each, there are a nubmer of ways to do it. I'm fairly sure the ETS module has a procedure that just does this, but I don't recall it off the top of my head - if you were to post this to the SAS mailing list, one of the guys there would undoubtedly have it quickly.
The simple way for me, is to do this with entirely datastep code. First determine how many potential values there are for COVERAGEID, then assign each to a direct value, then assign the value to the correct variable.
If the COVERAGEID values are consecutive (ie, 1 to some number, no skips, or you don't mind skipping) then this is easy - set up an array and iterate over it. I will assume they are NOT consecutive.
*First, get the distinct values of coverageID. There are a dozen ways to do this, this works as well as any;
proc freq data=save.test;
tables coverageid/out=coverage_values(keep=coverageid);
run;
*Then save them into a format. This converts each value to a consecutive number (so the lowest value becomes 1, the next lowest 2, etc.) This is not only useful for this step, but it can be useful in the future in converting back.;
data coverage_values_fmt;
set coverage_values;
start=coverageid;
label=_n_;
fmtname='COVERAGEF';
type='i';
call symputx('CoverageCount',_n_);
run;
*Import the created format;
proc format cntlin=coverage_values_fmt;
quit;
*Now use the created format. If you had already-consecutive values, you could skip to this step and skip the input statement - just use the value itself;
data save.test_fin;
set save.test;
array coverageids coverageid1-coverageid&coveragecount.;
do _t = 1 to &coveragecount.;
if input(coverageid,COVERAGEF.) = _t then coverageids[_t]=1;
else coverageids[_t]=0;
end;
drop _t;
run;
Here's another way that doesn't use formats, and may be easier to follow.
First, just make some test data:
data test;
input coverageid ##;
cards;
3 27 99 105
;
run;
Next, create a data set with no observations but one variable for each level of coverageid. Note that this approach allows arbitrary values here.
proc transpose data=test out=wide(drop=_name_);
id coverageid;
run;
Finally, create a new data set that combines the initial data set and the wide one. Then, for each level of x, look at each categorical variable and decide whether to turn it "on".
data want;
set test wide;
array vars{*} _:;
do i=1 to dim(vars);
vars{i} = (coverageid = substr(vname(vars{i}),2,1));
end;
drop i;
run;
The line
vars{i} = (coverageid = substr(vname(vars{i}),2));
may require more explanation. vname returns the name of the variable, and since we didn't specify a prefix in proc transpose, all variables are named something like _1, _2, etc. So we take the substring of the variable name that starts in the second position, and compare it to coverageid; if they're the same, we set the variable to 1; otherwise it evaluates to 0.