Using a vector generated in SAS/IML as a macro variable

I am writing a macro that will run PROC MIXED with the level-1 residual variance fixed to a near-zero value using the PARMS statement. I am trying to generate the bulk of the starting values for the PARMS statement using SAS/IML, something like:
%macro test (dataset= , classroom= , preds= , outcome=);
proc iml;
/*count number of variables*/
%let nvars = 0;
%do %while(%qscan(&preds,&nvars+1,%str( )) ne %str());
%let nvars = %eval(&nvars+1);
%end;
/*determine location of level-1 residual in the start value vector*/
%let error_location = %eval(((&nvars*(&nvars-1))/2)+&nvars+1);
/*create vector of start values from lower triangle of identity matrix*/
start_vector = symsqr(I(&nvars));
%let starts = %str(start_vector[label=""]);
/*analyze data*/
proc mixed data=&dataset noprofile method=ml;
class &classroom;
model &outcome = &preds /noint;
random &preds /type=un sub=&classroom g;
parms
&starts
.00000001 /hold= &error_location;
run;
quit;
%mend;
The code works fine without the PARMS statement in the PROC MIXED code. When I run the code as is, however, SAS apparently puts the literal string 'start_vector[label=""]' after PARMS rather than listing the values generated by IML.
How can I avoid this error and have SAS specify the values contained in START_VECTOR as starting values for the PARMS statement?

You should use the SYMPUT or SYMPUTX routines in SAS/IML to convert a vector to a macro variable.
This is one way to get a vector into a single string in a macro variable.
proc iml;
start = {"Hi","Bye"};
call symput("start",rowcat(start`));
%put &start;
quit;
With a numeric vector, you need to use char to convert it:
proc iml;
start_vector = j(1,5); /* 1 x 5 row vector of ones; j(5) alone would create a 5 x 5 matrix */
call symputx("start_vector",rowcat(char(start_vector)));
%put &start_vector;
quit;
With a numeric matrix, you need to use SHAPE to flatten it:
proc iml;
start_vector = j(5,5);
call symputx("start_vector",rowcat(shape(char(start_vector),1)));
%put &start_vector;
quit;
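Applied to the question's start values, the same idea can be sketched as below. This is a minimal sketch, not the asker's final macro: the value 3 for nvars is a made-up example, and the blank appended with CONCAT is an added safeguard so the numbers stay space-separated whatever width CHAR uses.

```sas
proc iml;
nvars = 3;                             /* hypothetical number of predictors */
start_vector = symsqr(I(nvars));       /* lower triangle of the identity matrix, as a column vector */
/* transpose to a row, convert to character, append a blank to each element, then join */
starts = rowcat(concat(char(t(start_vector)), " "));
call symputx("starts", starts);        /* SYMPUTX strips leading and trailing blanks */
quit;
%put &=starts;
```

The resulting &starts macro variable can then be placed after PARMS in the PROC MIXED step, where the question currently has the literal IML expression.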

Your problem and two solutions are discussed in the article "Passing values from PROC IML into SAS Procedures."
Do you have to wrap this in a macro? If so, the SUBMIT and ENDSUBMIT statements won't work, since they can't be called from a macro. However, since SAS/IML enables you to define and call modules with arguments, I usually avoid the macro language and define a module that takes arguments, then call the module directly.

Related

how to generate unique random vector on each iteration?

I'm new to SAS, and I would like to produce a plot for each random numeric vector.
To do that, I wrapped my proc iml step in a macro and tried to invoke it before calling the macro generate_scatter_plot, but I get the same set of points on each iteration.
Can somebody please explain the proper way to do this in SAS?
%MACRO generate_random_points();
proc iml;
N = 6;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
submit rands;
data my_data;
input x y ##;
datalines;
&rands
;
run;
endsubmit;
%MEND;
%MACRO generate_scatter_plot();
/* call execute('%generate_random_points();'); */
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
data _null_;
do i = 1 to 20;
call execute('%generate_scatter_plot();');
end;
run;
I find SAS different from the rest of languages out there.
Thank you in advance to all who are willing to help!
IML is not needed; a data step loop can generate the random values.
Assuming you're looking at learning macro programming:
CALL EXECUTE is needed to call a macro from inside a data step, but not outside one, where you can call the macro directly.
CALL EXECUTE can also generate code, much as a macro does.
The MPRINT and MLOGIC options help when debugging macro code; without them, the generated code is not written to the log.
The following expands a bit on your logic to demonstrate the functionality of macros.
options mprint mlogic;
%macro generate_random_points(Num=);
*Macro to generate random numbers;
*number of points generated are equal to the NUM=parameter;
data my_data;
do i=1 to &num.;
x=rand('uniform');
y=rand('uniform');
output;
end;
run;
%mend;
%macro generate_scatter_plot(Num_Points=);
*create random data with specified points;
%generate_random_points(Num=&Num_Points);
*graph data;
proc sgplot data=my_data;
scatter x=x y=y;
run;
%MEND;
*Run macro with different parameters in loop;
data _null_;
do i = 3 to 5;
call execute(catt('%generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
option nomprint nomlogic;
And a slight variation on your process:
data _null_;
do i = 3 to 5;
call execute(catt("Title 'Num Points = ", i, " '; ", ' %generate_scatter_plot(Num_Points=', i, ');'));
end;
run;
If you are working in IML you should not have any need to use the SAS macro language to generate code.
You already showed how you can generate the random numbers into an IML matrix.
And you can use the SUBMIT/ENDSUBMIT block to call your PROC SGPLOT code.
What you seem to be missing is the IML syntax for converting a matrix into a dataset. https://blogs.sas.com/content/iml/2011/04/18/writing-data-from-a-matrix-to-a-sas-data-set.html
proc iml;
N = 6;
x = t(1:N);
y = j(N,1);
call randgen(y, 'Uniform');
create my_data var {x y};
append;
close my_data;
submit;
proc sgplot data=my_data;
scatter x=x y=y;
run;
endsubmit;
quit;
Although you are using IML to pass data as text into a datalines statement, you really do not need to do this. There are simpler ways of achieving your goal.
SAS does everything through datasets. They're analogous to Data Frames in Pandas. If you want to create a random vector of data, you'll create it within a dataset and use that within other procedures. datalines should be avoided in production whenever possible. There are some very special cases where it is useful, but it's mainly used for sample data or prototyping.
SAS will randomly generate data based on the system clock unless you set a seed through call streaminit(). You should always get new points. A much simpler way to achieve your results is shown below. The below macro will generate a new random dataset and plot it each time you call it.
%macro generate_scatter_plot(n=100);
data random;
do i = 1 to &n;
x = rand('uniform');
y = rand('uniform');
output;
end;
drop i;
run;
proc sgplot data=random;
scatter x=x y=y;
run;
%mend;
%generate_scatter_plot(n=100);
%generate_scatter_plot(n=1000);
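For the opposite case, where you want the same points on every run, the seed mentioned above can be fixed with call streaminit(). A minimal sketch, in which the dataset name random_fixed and the seed 12345 are arbitrary choices made for illustration:

```sas
data random_fixed;
   call streaminit(12345);   /* arbitrary fixed seed: every run produces the same values */
   do i = 1 to 5;
      x = rand('uniform');
      y = rand('uniform');
      output;
   end;
   drop i;
run;

proc print data=random_fixed;
run;
```

Removing the streaminit call returns to the default behaviour described above, where each run gives new points.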

loop a list of variables in SAS

I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I plan to use proc sgplot and proc mixed to do the analysis. However, putting the variables into the same procedure one by one would be really time consuming. I'm pretty new to SAS; is there a way to create a loop over the dependent variables and put them into the procedure?
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (i.e., 'read', 'math', ...) as a separate analysis with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this, which wraps your PROC MIXED call in a macro loop. I didn't check the PROC MIXED specification itself, so it may not work exactly as written in your example.
The loop itself works, though: it runs whatever you put in place of the PROC MIXED call once per dependent variable, and the number of iterations is determined by the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* this checks the nr of variables in y_var_list;
%let len_var_list = %eval(%sysfunc(count(&y_variables., %quote( )))+1);
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);
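As a side note on the variable count used in the macro above: counting blanks and adding one over-counts when the list contains consecutive blanks. A small sketch of an alternative using the COUNTW function, where the macro variable name n_vars is made up for this illustration:

```sas
%let y_var_list = read math science english spanish;

/* COUNTW counts words directly, so repeated blanks between names do not matter */
%let n_vars = %sysfunc(countw(&y_var_list, %str( )));

%put NOTE: n_vars=&n_vars;   /* writes NOTE: n_vars=5 to the log */
```

The result could be used as the upper bound of the %do loop in place of len_var_list.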

Using SAS SET statement with numbered macro variables

I'm trying to create a custom transformation within SAS DI Studio to do some complicated processing which I will want to reuse often. In order to achieve this, as a first step, I am trying to replicate the functionality of a simple APPEND transformation.
To this end, I've enabled multiple inputs (max of 10) and am trying to leverage the &_INPUTn and &_INPUT_count macro variables referenced here. I would like to simply use the code
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set &_INPUT0 - &&_INPUT&max_input_index;
keep col1 col2 col3;
run;
However, I receive the following error:
ERROR: Missing numeric suffix on a numbered data set list (WORK.SOME_INPUT_TABLE-WORK.ANOTHER_INPUT_TABLE)
because the macro variables are resolved to the names of the datasets they refer to, whose names do not conform to the format required for the
SET dataset1 - dataset9;
statement. How can I get around this?
Much gratitude.
You need to create a macro that loops through your list and resolves the variables. Something like
%macro list_tables(n);
%do i=0 %to &n; %* start at 0 because the inputs are numbered from _INPUT0;
&&_INPUT&i
%end;
%mend;
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set %list_tables(&max_input_index);
keep col1 col2 col3;
run;
The SET statement will need a list of the actual dataset names since they might not form a sequence of numeric suffixed names.
You could use a macro %DO loop if you are already running a macro. Make sure not to generate any semicolons inside the %DO loop.
set
%do i=1 %to &_inputcount ; &&_input&i %end;
;
But you could also use a data step to concatenate the names into a single macro variable that you could then use in the SET statement.
data _null_;
call symputx('_input1',symget('_input'));
length str $500 ;
do i=1 to &_inputcount;
str=catx(' ',str,symget(cats('_input',i)));
end;
call symputx('_input',str);
run;
data .... ;
set &_input ;
...
The extra CALL SYMPUTX() at the top of the data step will handle the case when count is one and SAS only creates the _INPUT macro variable instead of creating the series of macro variables with the numeric suffix. This will set _INPUT1 to the value of _INPUT so that the DO loop will still function.
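To try the looping approach outside DI Studio, the input macro variables can be faked. The sketch below assumes the zero-based naming used in the question (_INPUT0 through _INPUT2 plus _INPUT_count); the %let statements and the macro name list_inputs are hypothetical stand-ins for what the transformation would supply.

```sas
/* Hypothetical stand-ins for the macro variables DI Studio would create */
%let _INPUT_count = 3;
%let _INPUT0 = sashelp.class;
%let _INPUT1 = sashelp.class;
%let _INPUT2 = sashelp.class;

%macro list_inputs;
   %local i;
   /* emit each dataset name; no semicolon is generated inside the loop */
   %do i = 0 %to %eval(&_INPUT_count - 1);
      &&_INPUT&i
   %end;
%mend list_inputs;

data work.append_data;
   set %list_inputs;   /* resolves to the three dataset names, separated by blanks */
run;
```

With MPRINT turned on, the log shows the generated SET statement, which is a quick way to confirm the list resolved as intended.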

How can I perform the same data step across many variables in SAS?

I have data that looks like this and has 500 variables with a target:
var1 var2 var3 var4 ... var500 target
The names of the variables are not sequential as above, so I don't think I can use something like var1:var500. I want to loop through the variables to create graphs. Some of the variables are continuous and some are nominal.
for var1 through var500
if nominal then create graphtypeA var[i] * target
else if continuous then create graphtypeB var[i] * target
end;
I can easily create a second table that has the data type in it to check against. Arrays seem like they might be useful to perform this task of looping through variables. Something like:
data work.mydata;
set archive.mydata;
array myarray{501] myarray1 - myarray501
do i=1 to 500;
proc sgpanel;
panelby myarray[501];
histogram myarray[i];
end;
run;
This doesn't work, though, and it doesn't check what type of variable it is. If we assume I have another SAS dataset that has varname and vartype (continuous, nominal), how can I loop through to create the desired graphs for the given vartype? Thanks in advance.
Basically, you need to loop over some variables, apply some logic to determine the variable type, then produce output depending on the variable type. While there are many approaches to this problem, one solution is to select your variables into a macro variable, loop over this "list" (not a formal data structure) of variables, and use macro control logic to designate different subroutines for numeric and character variables.
I'll use the sashelp.cars data set to illustrate. In this example the variable origin is your 'Target' variable and the variables Make, Type, Horsepower, and Cylinders are the numeric and character variables.
* get some data;
data set1 (keep = Make Type Origin Horsepower Cylinders);
set sashelp.cars;
run;
* create dataset of variable names and types;
proc contents data = set1
out = vars
noprint;
run;
* get variable names and variable types (1=numeric, 2=character)
* into two macro variable "lists" where each entry is separated
* by a space;
proc sql noprint;
select name, type
into :varname separated by ' ', :vartype separated by ' '
from vars
where name <> "Make";
quit;
* put the macro variables to the log to confirm they are what
* you expect;
%put &varname;
%put &vartype;
Now, use a macro to loop over the values in the macro variable list. The countw function counts the number of variables, and uses this number as the loop iterator limit. The scan function reads in each variable name and type by its relative position in the respective macro variable lists. For each variable the type is then evaluated and a plot is produced depending on whether it is character or numeric. In this example, a histogram with density plot is produced for numeric variables and a bar chart of frequency counts is produced for character variables.
The loop logic is general, and Proc sgpanel and Proc sgplot can be modified or replaced with other desired data step processing or procedures.
* turn on options that are useful for
* macro debugging, turn them off
* when using in production;
options mlogic mprint symbolgen;
%macro plotter;
%do i = 1 %to %sysfunc(countw(&varname));
%let nextvar = %scan(&varname, &i, %str( ));
%let nextvartype = %scan(&vartype, &i, %str( ));
%if &nextvartype. = 1 %then %do;
proc sgpanel data=set1 noautolegend;
title "&nextvar. Distribution";
panelby Origin;
histogram &nextvar.;
density &nextvar.;
run;
%end;
%if &nextvartype. = 2 %then %do;
proc sgplot data=set1;
title "&nextvar. Count by Origin";
vbar &nextvar. /group= origin;
run;
%end;
%end;
%mend plotter;
*call the macro;
%plotter;
Unfortunately it is not possible to use arrays outside a data step in the way that you propose here, at least not in any very efficient way. However, there are quite a few options available to you. One would be just to call your graphing proc once and tell it to graph every numeric variable in your dataset, e.g. like so:
proc univariate data = sashelp.class;
var _NUMERIC_;
histogram;
run;
If the variables you want to graph that are of the same type are adjacent in the column order of your dataset, you can use a double-dash list, e.g.
proc univariate data = sashelp.class;
var age--weight;
histogram;
run;
In general you should seek to avoid calling procs or running data steps separately for every variable - it is nearly always more efficient to call them just once and process everything in one go.

Text manipulation of macro list variables to stack datasets with automated names

I have written a macro that accepts a list of variables, runs a proc mixed model using each variable as a predictor, and then exports the results to a dataset with the variable name appended to it. I am trying to figure out how to stack the results from all of the variables in a single data set.
Here is the macro:
%macro cogTraj(cog,varlist);
%let j = 1;
%let var = %scan(&varlist, %eval(&j));
%let solution = sol;
%let outsol = &solution.&var.;
%do %while (&var ne );
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &outsol;
run;
%let j = %eval(&j + 1);
%let var = %scan(&varlist, %eval(&j));
%let outsol = &solution.&var.;
%end;
%mend;
/* Example */
%cogTraj(mmmscore, varlist = bio1 bio2 bio3);
The result would be the creation of Solbio1, Solbio2, and Solbio3.
I have created a macro variable containing the "varlist" (Ideally, I'd like to input a macro variable list as the argument but I haven't figured out how to deal with the scoping):
%let biolist = bio1 bio2 bio3;
I want to stack Solbio1, Solbio2, and Solbio3 by using text manipulation to add "Sol" to the beginning of each variable. I tried the following, outside of any data step or macro:
%let biolistsol = %add_string( &biolist, Sol, location = prefix);
without success.
Ultimately, I want to do something like this;
data Solbio_stack;
set %biolistsol;
run;
with the result being a single dataset in which Solbio1, Solbio2, and Solbio3 are stacked, but I'm sure I don't have the right syntax.
Can anyone help me with the text string/dataset stacking issue? I would be extra happy if I could figure out how to change the macro to accept %biolist as the argument, rather than writing out the list variables as an argument for the macro.
I would approach this differently. A good approach for the problem is to drive it with a dataset; that's what SAS is good at, really, and it's very easy.
First, construct a dataset that has a row for each variable you're running this on, and a variable name that contains the variable name (one per row). You might be able to construct this using PROC CONTENTS or sashelp.vtable or dictionary.tables, if you're using a set of variables from one particular dataset. It can also come from a spreadsheet you import, or a text file, or anything else really - or just written as datalines, as below.
So your example would have this dataset:
data vars_run;
input name $ cog $;
datalines;
bio1 mmmscore
bio2 mmmscore
bio3 mmmscore
;;;;
run;
If your 'cog' is the same for every run, you don't need to put it in the data; if it might change, add a variable for it, as I do in the example above.
Then, you write the macro so it does one pass on the PROC MIXED - ie, the inner part of the %do loop.
%macro cogTraj(cog=,var=, sol=sol);
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &sol.&var.;
run;
%mend cogTraj;
I put the default for &sol in there. Now, you generate one call to the macro from each row in your dataset. You also generate a list of the sol sets.
proc sql;
select cats('%cogTraj(cog=',cog,',var=',name,',sol=sol)')
into :callList
separated by ' '
from vars_run;
select cats('sol',name) into :solList separated by ' '
from vars_run;
quit;
Next, you run the macro:
&callList.
And then you can do this:
data sol_all;
set &solList.;
run;
All done, and a lot less macro variable parsing which is messy and annoying.
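If you would rather keep the list in a macro variable, as in the question, the "Sol" prefix can also be added with a regular expression instead of the %add_string macro. This is only a sketch: it assumes the names in the list are separated by blanks, and it uses PRXCHANGE through %SYSFUNC to put Sol in front of every run of non-blank characters.

```sas
%let biolist = bio1 bio2 bio3;

/* -1 means replace every match; $1 is the matched name */
%let biolistsol = %sysfunc(prxchange(s/(\S+)/Sol$1/, -1, &biolist));

%put &=biolistsol;

/* Once the Sol datasets exist, they could be stacked with:
   data Solbio_stack;
      set &biolistsol;
   run;
*/
```

Note that the stacking step refers to &biolistsol, a macro variable, rather than %biolistsol, which SAS would read as a macro call.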