I started learning SAS fairly recently and am getting the basics down pretty well, but I have a question about something a little outside my current knowledge. Does anyone know of a way to cycle through all variables in a SAS dataset? I know how to run a do loop/array on variables in a range (x1-x99), but ideally I would like to look at every variable without having to rename any of them. Basically, I'm looking to run through a dataset and change variable values when the current value = 'True'/'False'. My guess is that I'll need to use proc contents in some way here, but I'm not really sure how to use it correctly. Any tips/insight would be greatly appreciated. Thanks!
You can create an array of non-similarly-named variables. You're on the right track with PROC CONTENTS, although you also can use dictionary.columns or sashelp.vcolumn, which contain basically the same information.
proc sql;
select name into :collist separated by ' '
from dictionary.columns
where memname='DATASETNAME' and libname='LIBNAME' and <other criteria>;
quit;
The variables have to be all of the same type (char/numeric) so you may want to include a criterion of variable type in your query, plus any other limiting factor you may need.
That creates a space-separated list in the macro variable &collist., which you can use in your array statement:
array vars &collist.;
and now you can loop over the array.
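Putting those pieces together, a minimal sketch might look like the following. The dataset WORK.HAVE, its variable names, and the replacement values 'Y'/'N' are made up for illustration; the query keeps only character variables so the array has a single type.

```sas
/* Hypothetical sample data: two character flags and one numeric id */
data have;
  length flag_a other_flag $5;
  input id flag_a $ other_flag $;
  datalines;
1 True False
2 False True
;
run;

/* Collect the names of the character variables only */
proc sql noprint;
  select name into :collist separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='HAVE' and type='char';
quit;

/* Loop over every collected variable and recode True/False */
data want;
  set have;
  array vars {*} &collist.;
  do i = 1 to dim(vars);
    if vars{i} = 'True' then vars{i} = 'Y';
    else if vars{i} = 'False' then vars{i} = 'N';
  end;
  drop i;
run;
```

Note that libname and memname are stored in upper case in dictionary.columns, so the literals in the where clause need to be upper case too.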
You may also be able to cheat a bit if all of your variables are the same type and you know their order is fixed. The double-dash list (x1--x99) means 'all variables from x1 to x99, in variable order' and doesn't require numeric suffixes or anything like that.
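As a sketch of the double-dash list, assume a hypothetical dataset whose character variables sit next to each other in creation order (all names here are made up):

```sas
/* Hypothetical data: three adjacent character variables */
data have2;
  length first_flag middle_flag last_flag $5;
  input first_flag $ middle_flag $ last_flag $;
  datalines;
True False True
;
run;

data want2;
  set have2;
  /* every variable positioned from first_flag through last_flag */
  array vars {*} first_flag--last_flag;
  do i = 1 to dim(vars);
    if vars{i} = 'True' then vars{i} = 'Y';
    else if vars{i} = 'False' then vars{i} = 'N';
  end;
  drop i;
run;
```

If every character variable in the dataset should be included, the special name list _character_ can be used in the array statement instead of a double-dash range.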
Finally, you also might be able to write a format in PROC FORMAT to accomplish what you need, depending on what you are intending to do (mapping TRUE to 1 and FALSE to 0 or something like that).
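For the PROC FORMAT route, one possible sketch is an informat that turns the text into numbers. The informat name truefalse, the dataset demo, and the choice to map anything else to missing are assumptions for illustration, not part of the original question.

```sas
/* Informat mapping text to numbers; UPCASE upper-cases the input before matching */
proc format;
  invalue truefalse (upcase)
    'TRUE'  = 1
    'FALSE' = 0
    other   = .;
run;

/* Hypothetical data: apply the informat with the INPUT function */
data demo;
  length flag $5;
  input flag $;
  flag_num = input(flag, truefalse.);
  datalines;
True
False
maybe
;
run;
```

This leaves the original character variable untouched and creates a numeric companion, which avoids looping over variables at all when only a few columns are involved.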
Adding to Joe's answer: you can get around the requirement that all variables be of the same type by using a macro loop instead of an array. First, define the macro:
%macro loop;
%do i=1 %to %sysfunc(countw(&collist));
....
<here goes your code for changing values, where instead of a variable name
you use macro function %scan(&collist,&i)>
....
%end;
%mend loop;
and now you can paste %loop into the DATA step where you're going to process all variables.
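To make the placeholder concrete, here is a minimal sketch with the body filled in. The dataset have3, its variables, and the counter n_missing are hypothetical; the generated statement uses MISSING() because that function accepts both character and numeric arguments, so it works on a mixed-type list.

```sas
/* Hypothetical data with mixed variable types */
data have3;
  input id name $ score;
  datalines;
1 Ann 10
2 . .
;
run;

/* All variable names, regardless of type */
proc sql noprint;
  select name into :collist separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='HAVE3';
quit;

%macro loop;
  %local i;
  %do i = 1 %to %sysfunc(countw(&collist));
    /* one statement is generated per variable name in the list */
    if missing(%scan(&collist,&i)) then n_missing + 1;
  %end;
%mend loop;

data want3;
  set have3;
  n_missing = 0;
  %loop
run;
```

Because the macro only generates text, the data step ends up containing one IF statement per variable, exactly as if they had been typed by hand.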
I'm trying to convert a SAS dataset column to a list of macro variables but am unsure of how indexing works in this language.
DATA _Null_;
do I = 1 to &num_or;
set CondensedOverrides4 nobs = num_or;
call symputx("Item" !! left(put(I,8.))
,"Rule", "G");
end;
run;
Right now this code creates a list of macro variables Item1,Item2,..ItemN etc. and assigns the entire column called "Rule" to each new variable. My goal is to put the first observation of "Rule" in Item1, the second observation in that column in Item2, etc.
I'm pretty new to SAS and understand you can't brute force logic in the same way as other languages but if there's a way to do this I would appreciate the guidance.
Much easier to create a series of macro variables using PROC SQL's INTO clause. You can save the number of items into a macro variable.
proc sql noprint;
select rule into :Item1-
from CondensedOverrides4
;
%let num_or=&sqlobs;
quit;
If you want to use a data step there is no need for a DO loop. The data step iterates over the inputs automatically. Put the code to save the number of observations into a macro variable BEFORE the set statement in case the input dataset is empty.
data _null_;
if eof then call symputx('num_or',_n_-1);
set CondensedOverrides4 end=eof ;
call symputx(cats('Item',_n_),rule,'g');
run;
SAS does not need loops to access each row; the data step does that automatically. So your code is really close. Instead of I, use the automatic variable _n_, which can function as a row counter though it's actually a step counter.
DATA _Null_;
set CondensedOverrides4;
call symputx("Item" || put(_n_,8. -l) , Rule, "G");
run;
To be honest though, if you're new to SAS using macro variables to start isn't recommended, there are usually multiple ways to avoid it anyways and I only use it if there's no other choice. It's incredibly powerful, but easy to get wrong and harder to debug.
EDIT: I modified the code to remove the LEFT() function since you can use the -l option in the PUT function to left align the results directly.
EDIT2: Removing the quotes around RULE since I suspect it's a variable you want to store the value of, not the text string 'RULE'. If you want the macro variables to resolve to a string you would add back the quotes but that seems incorrect based on your question.
I am using this code:
LibraryName.Bla_&SomeDate._&AnotherDate.;
to create a dynamic dataset name. The code produces for example:
LibraryName.Bla_2016-10-29_2016-11-12
which SAS does not like. What can I do to fix this? I guess this would be a valid name:
LibraryName.Bla_2016_10_29_2016_11_12
One option is a name literal:
LibraryName."Bla_&SomeDate._&AnotherDate."n;
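That should allow you to use it, provided the session runs with options validmemname=extend; so that member names may contain special characters. May or may not be a good idea, but it's possible.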
If you prefer to use normal SAS names, you can process it in a %sysfunc call. If you only ever have - and you want them to be _ that's easy:
%let somedate=2016-10-29;
%let anotherdate=2016-11-12;
%let datasetvar = %sysfunc(translate(Bla_&somedate._&anotherdate.,_,-));
%put &=datasetvar.;
If you have other characters it could be more complex, depending on the situation. For example, you could use the SAS function NVALID to check whether the result would be a legal variable name (the rules are more or less the same as those for dataset, or 'member', names).
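A small sketch of that check, reusing the example dates from the question. The macro variable names candidate, is_valid, fixed and fixed_valid are made up here; NVALID returns 1 when the string is a valid name under the given rule (v7 in this sketch) and 0 otherwise.

```sas
%let candidate = Bla_2016-10-29_2016-11-12;

/* check the raw name against the V7 naming rules */
%let is_valid = %sysfunc(nvalid(&candidate,v7));
%put &=is_valid;

/* replace the hyphens with underscores, then check again */
%let fixed = %sysfunc(translate(&candidate,_,-));
%let fixed_valid = %sysfunc(nvalid(&fixed,v7));
%put &=fixed &=fixed_valid;
```

A macro could branch on the result and fall back to a name literal or a further clean-up step when the check fails.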
proc means data=tableepisodes noprint;
output out=tableepisodes
mean(%ratings %dummies)=%ratings %dummies;
by ProgCodeID ProgSeasonCodeID year week
I was reading through some SAS code and I am not sure what the mean part of the code does.
Does it take the mean of only the %ratings variables and attach the %dummies variables to the output?
I would really appreciate help in understanding this code snippet.
That isn't a complete code snippet, and no.
It calculates the mean of the variables listed in %ratings AND %dummies, assuming of course that's what is included in those macros.
Without seeing the macro definitions we can't be sure of what it is actually doing.
As written, the code calculates the means of the variables that the macros %ratings and %dummies expand to. (A leading % calls a macro; a macro variable would be referenced as &ratings instead.) Taking ratings as an example, we're assuming it was defined earlier on as something like:
%macro ratings; good bad ugly %mend ratings;
So, when the PROC MEANS step runs, %ratings resolves to good bad ugly and SAS takes the means of all three variables.
You could have written the PROC MEANS step as:
proc means data = tableepisodes noprint;
by ProgCodeID ProgSeasonCodeID year week;
var good bad ugly;
output out = tableepisodes mean= / autoname;
run;
instead. (Also, note that you're overwriting your original dataset here, which you may want to avoid.)
So I have created a macro, which works perfectly fine. Within the macro, I set where the observation will begin reading, and then how many observations it will read.
But, in my proc print call, I am not able to simply do:
(firstobs=&start obs=&obs)
Because although firstobs will correctly start where I want it, obs does not cooperate, as it must be a higher number than firstobs. For example,
%testmacro(start=5, obs=3)
Does not work, because it is reading in the first 3 observations, but trying to start at observation 5. What I want the macro to do is, start at observation 5, and then read the next 3. So what I did is this:
(firstobs=&start obs=%eval((&obs-1)+&start))
This works perfectly fine when I use it. But I am just wondering if there is a simpler way to do this, rather than having to use the whole %eval... call. Is there one simple call, something like numberofobservations=...?
I don't think there is. You can only simplify the expression inside %eval() a little:
%let start=5;
%let obs=3;
data want;
set sashelp.class (firstobs=&start obs=%eval(&obs-1+&start));
run;
Data set options are listed here:
http://support.sas.com/documentation/cdl/en/ledsoptsref/68025/HTML/default/viewer.htm#p0h5nwbig8mobbn1u0dwtdo0c0a0.htm
You could count the obs inside the data step using a counter and only outputting the records desired, but that won't work on something like proc print and isn't efficient for larger data steps.
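A minimal sketch of that counter approach, using the automatic variable _n_ as the counter. The output dataset name want_counter is made up; start and obs mirror the macro parameters from the question.

```sas
%let start=5;
%let obs=3;

data want_counter;
  set sashelp.class;
  /* _n_ counts data step iterations, which matches the row number here */
  if _n_ >= &start + &obs then stop;    /* past the window: end the step */
  if _n_ >= &start then output;         /* inside the window: keep the row */
run;
```

The step still reads the rows before the starting point, which is the inefficiency mentioned above.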
You could try the point= option, but I'm not familiar with that method, and again I don't think it will work with proc print.
As @Reeza said, there is no dataset option that will do what you are looking for. Unfortunately you need to calculate the ending observation yourself, and %eval() is about as good a way to do it as any.
On a side-note, I would recommend making your macro parameter more flexible. Rather than this:
%testmacro(start=5, obs=3)
Change it to take a single parameter which will be the list of data-set options to apply:
%macro testmacro(iDsOptions);
data want;
set sashelp.class (&iDsOptions);
run;
%mend;
/* %str() masks the = signs so the value is not parsed as a keyword parameter */
%testmacro(%str(firstobs=3 obs=7));
This provides more flexibility if you need to add in additional options later, which means fewer future code changes, and it's simpler to call the macro. You also defer figuring out the observation counts in this case to the calling program which is a good thing.
I have a large table with several variables which will be input to a statistical analysis. As a statistician, I prefer all factors to be numeric, so that they work predictably in regression models, with formats to show labels for the numeric values, e.g. race, sex.
When I set the original data, I rename all the character variables that I want to recode, adding a suffix of c (for character values). I then use an input() call for each of them, with best1., best8., or another appropriate informat.
The problem is that the log becomes cluttered when missing values are coded as actual missing values, e.g. a period (.). I could add a line of if not missing(varc) then var = input(varc, best8.); for each variable, but this seems inefficient and hard to read.
Is there a better way to handle this?
If you're okay with eliminating the missing values message entirely (including things that 'could' be an issue) you can prepend ?? to the informat to tell it to not give you that warning.
var=input(varc,??8.);
Create an informat like this and use it instead of best8.:
proc format library=work;
invalue myform
'' = .
other=[best8.]
;
run;
options fmtsearch=(work);
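You could also try PROC STDIZE, which with REPONLY only replaces missing values (here with 0) rather than standardizing: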
PROC STDIZE ..
REPONLY MISSING=0;
RUN;