I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I'm plan to use proc sgplot and proc mixed functions to do analysis. However, putting all variables one by one in the same function will be really time consuming. I'm pretty new to SAS, is there a way to create a loop with dependent variables and put them into the function.
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (ie 'read','math', ....) as separate analyses with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this. This creates a loop around your proc mixed -call. I didn't take a look at the proc mixed -specification, but that may not work as described in your example.
The loop works however, and loops through whatever you put in the place of the proc mixed -call and the loop is dynamically sized based on the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* this checks the nr of variables in y_var_list;
%let len_var_list = %eval(%sysfunc(count(&y_variables., %quote( )))+1);
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);
Related
Suppose there are ten datasets with same structure: date and price, particularly they have same time period but different price
date price
20140604 5
20140605 7
20140607 9
I want to combine them and create a panel dataset. Since there is no name in each datasets, I attempt to add a new variable name into each data and then combine them.
The following codes are used to add name variable into each dataset
%macro name(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
%do i=1 %to &obs.;
data
&going.&&memname&i;
set
&from.&&memname&i;
name=&&memname&i;
run;
%end;
%mend;
So, is this strategy correct? Whether are there a different way to creating a panel data?
There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.
You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.
data want ;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
panel = scan(dsn,-1,'.') ;
run;
And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.
set in1.panel1-in1.panel10 indsname=dsn ;
or perhaps names that all start with a particular prefix.
set in1.panel: indsname=dsn ;
If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:
data want ;
merge in1.panel1 (rename=(price=price1))
in1.panel2 (rename=(price=price2))
...
;
by date;
run;
Or perhaps it would be easier to add a BY statement to the data set that makes the TALL dataset and then transpose it into the WIDE format.
data tall;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
by date ;
panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
by date;
id panel;
var price ;
run;
I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.
That is a valid way to achieve what you are looking for.
You are going to need 2 . in between the macros for library.data syntax. The first . is used to concatenate. The second shows up as a ..
I assume you will want to append all of these data sets together. You can add
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i
%end;
;
run;
You can combine your loop that adds the names and that data step like this:
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i (in=d&i)
%end;
;
%do i=1 %to &obs;
if d&i then
name = &&memname&i;
%end;
run;
I have data that looks like this and has 500 variables with a target:
var1 var2 var3 var4 ... var500 target
The names of the variables are not sequential as above so I don't think I can use something like var1:var500. I want to loop through the variables to create graphs. Some of the variables are continous and some are nominal.
for var1 through var500
if nominal then create graphtypeA var[i] * target
else if continous then create graphtypeB var[i] * target
end;
I can easily create a second table that has the data type in it to check against. Arrays seem like they might be useful to peform this task of looping through variables. Something like:
data work.mydata;
set archive.mydata;
array myarray{501] myarray1 - myarray501
do i=1 to 500;
proc sgpanel;
panelby myarray[501];
histogram myarray[i];
end;
run;
This doesn't work though and it doens't check to see what type of variable it is. If we assume I have another sas.dataset that has varname and vartype (continuous, nominal) how can I loop through to create the desired graphs for the given vartype? Thanks in advance.
Basically, you need to loop over some variables, apply some logic to determine the variable type, then produce output depending on the variable type. While there are many approaches to this problem, one solution is to select your variables into a macro variable, loop over this "list" (not a formal data structure) of variables, and use macro control logic to designate different subroutines for numeric and character variables.
I'll use the sashelp.cars data set to illustrate. In this example the variable origin is your 'Target' variable and the variables Make, Type, Horsepower, and Cylinders are the numeric and character variables.
* get some data;
data set1 (keep = Make Type Origin Horsepower Cylinders);
set sashelp.cars;
run;
* create dataset of variable names and types;
proc contents data = set1
out = vars
noprint;
run;
* get variable names and variable types (1=numeric, 2=character)
* into two macro variable "lists" where each entry is seperated
* by a space;
proc sql noprint;
select name, type
into :varname separated by ' ', :vartype separated by ' '
from vars
where name <> "Make";
quit;
* put the macro variables to the log to confirm they are what
* you expect
%put &varname;
%put &vartype;
Now, use a macro to loop over the values in the macro variable list. The countw function counts the number of variables, and uses this number as the loop iterator limit. The scan function reads in each variable name and type by its relative position in the respective macro variable lists. For each variable the type is then evaluated and a plot is produced depending on whether it is character or numeric. In this example, a histogram with density plot is produced for numeric variables and a bar chart of frequency counts is produced for character variables.
The loop logic is general, and Proc sgpanel and Proc sgplot cab be modified or replaced with other desired data step processing or procedures.
* turn on options that are useful for
* macro debugging, turn them off
* when using in production;
options mlogic mprint symbolgen;
%macro plotter;
%do i = 1 %to %sysfunc(countw(&varname));
%let nextvar = %scan(&varname, &i, %str( ));
%let nextvartype = %scan(&vartype, &i, %str( ));
%if &nextvartype. = 1 %then %do;
proc sgpanel data=set1 noautolegend;
title "&nextvar. Distribution";
panelby Origin;
histogram &nextvar.;
density &nextvar.;
run;
%end;
%if &nextvartype. = 2 %then %do;
proc sgplot data=set1;
title "&nextvar. Count by Origin";
vbar &nextvar. /group= origin;
run;
%end;
%end;
%mend plotter;
*call the macro;
%plotter;
Unfortunately it is not possible to use arrays outside a data step in the way that you propose here, at least not in any very efficient way. However, there are quite a few options available to you. One would be just to call your graphing proc once and tell it to graph every numeric variable in your dataset, e.g. like so:
proc univariate data = sashelp.class;
var _NUMERIC_;
histogram;
run;
If the variables you want to graph that are of the same type are adjacent in the column order of your dataset, you can use a double-dash list, e.g.
proc univariate data = sashelp.class;
var age--weight;
histogram;
run;
In general you should seek to avoid calling procs or running data steps separately for every variable - it is nearly always more efficient to call them just once and process everything in one go.
I'm working with a rather large several dataset that are provided to me as a CSV files. When I attempt to import one of the files the data will come in fine but, the number of variables in the file is too large for SAS, so it stops reading the variable names and starts assigning them sequential numbers. In order to maintain the variable names off of the data set I read in the file with the data row starting on 1 so it did not read the first row as variable names -
proc import file="X:\xxx\xxx\xxx\Extract\Live\Live.xlsx" out=raw_names dbms=xlsx replace;
SHEET="live";
GETNAMES=no;
DATAROW=1;
run;
I then run a macro to start breaking down the dataset and rename the variables based on the first observations in each variable -
%macro raw_sas_datasets(lib,output,start,end);
data raw_names2;
raw_names;
if _n_ ne 1 then delete;
keep A -- E &start. -- &end.;
run;
proc transpose data=raw_names2 out=raw_names2;
var A -- &end.;
run;
data raw_names2;
set raw_names2;
col1=compress(col1);
run;
data raw_values;
set raw;
keep A -- E &start. -- &end.;
run;
%macro rename(old,new);
data raw_values;
set raw_values;
rename &old.=&new.;
run;
%mend rename;
data _null_;
set raw_names2;
call execute('%rename('||_name_||","||col1||")");
run;
%macro freq(var);
proc freq data=raw_values noprint;
tables &var. / out=&var.;
run;
%mend freq;
data raw_names3;
set raw_names2;
if _n_ < 6 then delete;
run;
data _null_;
set raw_names3;
call execute('%freq('||col1||")");
run;
proc sort data=raw_values;
by StudySubjectID;
run;
data &lib..&output.;
set raw_values;
run;
%mend raw_sas_datasets;
The problem I'm running into is that the variable names are now all set properly and the data is lined up correctly, but the labels are still the original SAS assigned sequential numbers. Is there any way to set all of the labels equal to the variable names?
If you just want to remove the variable labels (at which point they default to the variable name), that's easy. From the SAS Documentation:
proc datasets lib=&lib.;
modify &output.;
attrib _all_ label=' ';
run;
I suspect you have a simpler solution than the above, though.
The actual renaming step needs to be done differently. Right now it's rewriting the entire dataset over and over again - for a lot of variables that is a terrible idea. Get your rename statements all into one datastep, or into a PROC DATASETS, or something else. Look up 'list processing SAS' for details on how to do that; on this site or on google you will find lots of solutions.
You likely can get SAS to read in the whole first line. The number of variables isn't the problem; it is probably the length of the line. There's another question that I'll find if I can on this site from a few months ago that deals with this exact problem.
My preferred option is not to use PROC IMPORT for CSVs anyway; I would suggest writing a metadata table that stores the variable names and the length/types for the variables, then using that to write import code. A little more work at first, but only has to be done once per study and you guarantee PROC IMPORT isn't making silly decisions for you.
In the library sashelp is a table vcolumn. vcolumn contains all the names of your variables for each library by table. You could write a macro that puts all your variable names into macro variables and then from there set the label.
Here's some code that I put together (not very pretty) but it does what you're looking for:
data test.label_var;
x=1;
y=1;
label x = 'xx';
label y = 'yy';
run;
proc sql noprint;
select count(*) into: cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%let cnt = &cnt;
proc sql noprint;
select name into: name1 - :name&cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%macro test;
%do i = 1 %to &cnt;
proc datasets library=test nolist;
modify label_var;
label &&name&i=&&name&i;
quit;
%end;
%mend test;
%test;
I am writing a macro that will run PROC MIXED with the level-1 residual variance fixed to a near-zero value using the PARMS statement. I am trying to generate the bulk of the starting values for the PARMS statement using SAS/IML, something like:
%macro test (dataset= , classroom= , preds= , outcome=);
proc iml;
/*count number of variables*/
%let nvars = 0;
%do %while(%qscan(&preds,&nvars+1,%str( )) ne %str());
%let nvars = %eval(&nvars+1);
%end;
/*determine location of level-1 residual in the start value vector*/
%let error_location = %eval(((&nvars*(&nvars-1))/2)+&nvars+1);
/*create vector of start values from lower triangle of identity matrix*/
start_vector = symsqr(I(&nvars));
%let starts = %str(start_vector[label=""]);
/*analyze data*/
proc mixed data=&dataset noprofile method=ml;
class &classroom;
model &outcome = &preds /noint;
random &preds /type=un sub=&classroom g;
parms
&starts
.00000001 /hold= &error_location;
run;
quit;
%mend;
The code works fine without the PARMS statement in the PROC MIXED code. When I run the code as is, however, SAS apparently puts the literal string 'start_vector[label=""]' after PARMS rather than listing the values generated by IML.
How can I avoid this error and have SAS specify the values contained in START_VECTOR as starting values for the PARMS statement?
You should use the SYMPUT or SYMPUTX routines in SAS/IML to convert a vector to a macro variable.
This is one way to get a vector into a single string in a macro variable.
proc iml;
start = {"Hi","Bye"};
call symput("start",rowcat(start`));
%put &start;
quit;
With a numeric vector, you need to use char to convert it:
proc iml;
start_vector = j(5);
call symputx("start_vector",rowcat(char(j)));
%put &start_vector;
quit;
With a numeric matrix, you need to use SHAPE to flatten it:
proc iml;
start_vector = j(5,5);
call symputx("start_vector",rowcat(shape(char(start_vector),1)));
%put &start_vector;
quit;
Your problem and two solutions are discussed in the article "Passing values from PROC IML into SAS Procedures."
Do you have to wrap this in a macro? If so, the SUBMIT and ENDSUBMIT statements won't work, since they can't be called form a macro. However, since SAS/IML enables you to define and call modules with arguments, I usually avoid the macro language and define a module that takes arguments, then call the module directly.
I have written a macro that accepts a list of variables, runs a proc mixed model using each variable as a predictor, and then exports the results to a dataset with the variable name appended to it. I am trying to figure out how to stack the results from all of the variables in a single data set.
Here is the macro:
%macro cogTraj(cog,varlist);
%let j = 1;
%let var = %scan(&varlist, %eval(&j));
%let solution = sol;
%let outsol = &solution.&var.;
%do %while (&var ne );
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &outsol;
run;
%let j = %eval(&j + 1);
%let var = %scan(&varlist, %eval(&j));
%let outsol = &solution.&var.;
%end;
%mend;
/* Example */
%cogTraj(mmmscore, varlist = bio1 bio2 bio3);
The result would be the creation of Solbio1, Solbio2, and Solbio3.
I have created a macro variable containing the "varlist" (Ideally, I'd like to input a macro variable list as the argument but I haven't figured out how to deal with the scoping):
%let biolist = bio1 bio2 bio3;
I want to stack Solbio1, Solbio2, and Solbio3 by using text manipulation to add "Sol" to the beginning of each variable. I tried the following, outside of any data step or macro:
%let biolistsol = %add_string( &biolist, Sol, location = prefix);
without success.
Ultimately, I want to do something like this;
data Solbio_stack;
set %biolistsol;
run;
with the result being a single dataset in which Solbio1, Solbio2, and Solbio3 are stacked, but I'm sure I don't have the right syntax.
Can anyone help me with the text string/dataset stacking issue? I would be extra happy if I could figure out how to change the macro to accept %biolist as the argument, rather than writing out the list variables as an argument for the macro.
I would approach this differently. A good approach for the problem is to drive it with a dataset; that's what SAS is good at, really, and it's very easy.
First, construct a dataset that has a row for each variable you're running this on, and a variable name that contains the variable name (one per row). You might be able to construct this using PROC CONTENTS or sashelp.vtable or dictionary.tables, if you're using a set of variables from one particular dataset. It can also come from a spreadsheet you import, or a text file, or anything else really - or just written as datalines, as below.
So your example would have this dataset:
data vars_run;
input name $ cog $;
datalines;
bio1 mmmscore
bio2 mmmscore
bio3 mmmscore
;;;;
run;
If your 'cog' is fairly consistent you don't need to put it in the data, if it is something that might change you might also have a variable for it in the data. I do in the above example include it.
Then, you write the macro so it does one pass on the PROC MIXED - ie, the inner part of the %do loop.
%macro cogTraj(cog=,var=, sol=sol);
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &sol.&var.;
run;
%mend cogTraj;
I put the default for &sol in there. Now, you generate one call to the macro from each row in your dataset. You also generate a list of the sol sets.
proc sql;
select cats('%cogTraj(cog=',cog,',var=',name,',sol=sol)')
into :callList
sepearated by ' '
from have;
select cats('sol',name') into :solList separated by ' '
from have;
quit;
Next, you run the macro:
&callList.
And then you can do this:
data sol_all;
set &solList.;
run;
All done, and a lot less macro variable parsing which is messy and annoying.