I just want to print the output of a simple operation to the SAS log.
For instance, how can I see the value of quantile('T', .75, 1000)? Is there a smarter way to check what does quantile('T', .75, 1000) equal to than printing it to the log?
%let t_value = quantile('T', .75, 1000);
%put &t_value.;
Others use a proc procedure as here but I cannot believe I need to create a data set to check a value...
Use the macro function %SYSFUNC to invoke non-macro (i.e. DATA step) functions in macro.
Example:
%put NOTE: quantile('T', .75, 1000) is %SYSFUNC(quantile(T, .75, 1000));
will log
11854 %put NOTE: quantile('T', .75, 1000) is %SYSFUNC(quantile(T, .75, 1000));
NOTE: quantile('T', .75, 1000) is 0.67473516460692
Tip: Literal arguments you might use in DATA step do not need to be so in %SYSFUNC invocations. Carefully examine %SYSFUNC(quantile(T, .75, 1000))
You don't need to create a data set, but another option is a data step with the _null_ data set.
data _null_;
x = quantile('T', 0.75, 1000);
put x;
run;
Related
SAS is interpreting in_year_tolerance and abs_cost as text vars in the below if statement. Therefore the if statement is testing the alphabetical order of the vars rather than the numerical order. How do I get SAS to treat them as numbers? I have tried sticking the if condition, and the macro vars, in %sysevalf but that made no difference. in_year_tolerance is 10,000,000 and cost can vary, but in the test run starts around 20,000,000 before dropping to 9,000,000 at which point it should exit the loop but doesn't.
%macro set_downward_caps(year, in_year_tolerance, large, small, start, end, increment);
%do c = &start. %to &end. %by &increment.;
%let nominal_down_large_&year. = %sysevalf(&large. + (&c. / 1000));
%let nominal_down_small_&year. = %sysevalf(&small. + (&c. / 100));
%let real_down_large_&year. = %sysevalf((1 - &&nominal_down_large_&year.) * &&rpi&year.);
%let real_down_small_&year. = %sysevalf((1 - &&nominal_down_small_&year.) * &&rpi&year.);
%rates(&year.);
proc means data = output.s_&scenario. noprint nway;
var transbill&year.;
output out = temporary (drop = _type_ _freq_) sum=cost;
run;
data _null_;
set temporary;
call symputx('cost', put(cost,best32.));
run;
data temp;
length scenario $ 30;
scenario = "&scenario.";
large = &&real_down_large_&year.;
small = &&real_down_small_&year.;
cost = &cost.;
run;
data output.summary_of_caps;
set output.summary_of_caps temp;
run;
%let abs_cost = %sysevalf(%sysfunc(abs(&cost)));
%if &in_year_tolerance. > &abs_cost. %then %return;
%end;
%mend set_downward_caps;
Use %sysevalf(&in_year_tolerance > &abs_cost,boolean).
As you have seen, compares are text based. If you put it in an %eval() or %sysevalf() the values will be interpreted as numbers.
The ,boolean option lets it know you want a TRUE/FALSE.
I'm looking for a macro or something in SAS that can help me in isolating the outliers from a dataset. I define an outlier as: Upperbound: Q3+1.5(IQR) Lowerbound: Q1-1.5(IQR). I have the following SAS code:
title 'Fall 2015';
proc univariate data = fall2015 freq;
var enrollment_count;
histogram enrollment_count / vscale = percent vaxis = 0 to 50 by 5 midpoints = 0 to 300 by 5;
inset n mean std max min range / position = ne;
run;
I would like to get rid of the outliers from fall2015 dataset. I found some macros, but no luck in working the macro. Several have a class variable which I don't have. Any ideas how to separate my data?
Here's a macro I wrote a while ago to do this, under slightly different rules. I've modified it to meet your criteria (1.5).
Use proc means to calculate Q1/Q3 and IQR (QRANGE)
Build Macro to cap based on rules
Call macro using call execute and boundaries set, using the output from step 1.
*Calculate IQR and first/third quartiles;
proc means data=sashelp.class stackods n qrange p25 p75;
var weight height;
ods output summary=ranges;
run;
*create data with outliers to check;
data capped;
set sashelp.class;
if name='Alfred' then weight=220;
if name='Jane' then height=-30;
run;
*macro to cap outliers;
%macro cap(dset=,var=, lower=, upper=);
data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;
%mend;
*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p25-1.5*qrange;
upper=p75+1.5*qrange;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;
Is it possible to repeat a data step a number of times (like you might in a %do-%while loop) where the number of repetitions depends on the result of the data step?
I have a data set with numeric variables A. I calculate a new variable result = min(1, A). I would like the average value of result to equal a target and I can get there by scaling variable A by a constant k. That is solve for k where target = average(min(1,A*k)) - where k and target are constants and A is a list.
Here is what I have so far:
filename f0 'C:\Data\numbers.csv';
filename f1 'C:\Data\target.csv';
data myDataSet;
infile f0 dsd dlm=',' missover firstobs=2;
input A;
init_A = A; /* store the initial value of A */
run;
/* read in the target value (1 observation) */
data targets;
infile f1 dsd dlm=',' missover firstobs=2;
input target;
K = 1; * initialise the constant K;
run;
%macro iteration; /* I need to repeat this macro a number of times */
data myDataSet;
retain key 1;
set myDataSet;
set targets point=key;
A = INIT_A * K; /* update the value of A /*
result = min(1, A);
run;
/* calculate average result */
proc sql;
create table estimate as
select avg(result) as estimate0
from myDataSet;
quit;
/* compare estimate0 to target and update K */
data targets;
set targets;
set estimate;
K = K * (target / estimate0);
run;
%mend iteration;
I can get the desired answer by running %iteration a few times, but Ideally I would like to run the iteration until (target - estimate0 < 0.01). Is such a thing possible?
Thanks!
I had a similar problem to this just the other day. The below approach is what I used, you will need to change the loop structure from a for loop to a do while loop (or whatever suits your purposes):
First perform an initial scan of the table to figure out your loop termination conditions and get the number of rows in the table:
data read_once;
set sashelp.class end=eof;
if eof then do;
call symput('number_of_obs', cats(_n_) );
call symput('number_of_times_to_loop', cats(3) );
end;
run;
Make sure results are as expected:
%put &=number_of_obs;
%put &=number_of_times_to_loop;
Loop over the source table again multiple times:
data final;
do loop=1 to &number_of_times_to_loop;
do row=1 to &number_of_obs;
set sashelp.class point=row;
output;
end;
end;
stop; * REQUIRED BECAUSE WE ARE USING POINT=;
run;
Two part answer.
First, it's certainly possible to do what you say. There are some examples of code that works like this available online, if you want a working, useful-code example of iterative macros; for example, David Izrael's seminal Rakinge macro, which performs a rimweighting procedure by iterating over a relatively simple process (proc freqs, basically). This is pretty similar to what you're doing. In the process it looks in the datastep at the various termination criteria, and outputs a macro variable that is the total number of criteria met (as each stratification variable separately needs to meet the termination criterion). It then checks %if that criterion is met, and terminates if so.
The core of this is two things. First, you should have a fixed maximum number of iterations, unless you like infinite loops. That number should be larger than the largest reasonable number you should ever need, often by around a factor of two. Second, you need convergence criteria such that you can terminate the loop if they're met.
For example:
data have;
x=5;
run;
%macro reduce(data=, var=, amount=, target=, iter=20);
data want;
set have;
run;
%let calc=.;
%let _i=0;
%do %until (&calc.=&target. or &_i.=&iter.);
%let _i = %eval(&_i.+1);
data want;
set want;
&var. = &var. - &amount.;
call symputx('calc',&var.);
run;
%end;
%if &calc.=&target. %then %do;
%put &var. reduced to &target. in &_i. iterations.;
%end;
%else %do;
%put &var. not reduced to &target. in &iter. iterations. Try a larger number.;
%end;
%mend reduce;
%reduce(data=have,var=x,amount=1,target=0);
That is a very simple example, but it has all of the same elements. I prefer to use do-until and increment on my own but you can do the opposite also (as %rakinge does). Sadly the macro language doesn't allow for do-by-until like the data step language does. Oh well.
Secondly, you can often do things like this inside a single data step. Even in older versions (9.2 etc.), you can do all of what you ask above in a single data step, though it might look a little clunky. In 9.3+, and particularly 9.4, there are ways to run that proc sql inside the data step and get the result back without waiting for another data step, using RUN_MACRO or DOSUBL and/or the FCMP language. Even something simple, like this:
data have;
initial_a=0.3;
a=0.3;
target=0.5;
output;
initial_a=0.6;
a=0.6;
output;
initial_a=0.8;
a=0.8;
output;
run;
data want;
k=1;
do iter=1 to 20 until (abs(target-estimate0) < 0.001);
do _n_ = 1 to nobs;
if _n_=1 then result_tot=0;
set have nobs=nobs point=_n_;
a=initial_a*k;
result=min(1,a);
result_tot+result;
end;
estimate0 = result_tot/nobs;
k = k * (target/estimate0);
end;
output;
stop;
run;
That does it all in one data step. I'm cheating a bit because I'm writing my own data step iterator, but that's fairly common in this sort of thing, and it is very fast. Macros iterating multiple data steps and proc sql steps will be much slower typically as there is some overhead from each one.
Does anybody know how to find the non-zero minimum in a row using the min function in SAS? Or any other option in SAS code?
Current code:
PIP_factor = `min(PIPAllAutos, PIPNotCovByWC, PIPCovByWC, PIPNotPrincOpByEmpls);
I think you need to use an array solution, ie
array pipArray pip:; *or whatever;
PIP_factor=9999;
do _n = 1 to dim(pipArray);
if pipArray[_n] > 0 then
PIP_factor = min(PIP_factor,pipArray[_n]);
end;
Or somesuch.
Here is another way, using the IFN function:
data null_;
PIPAllAutos = 2;
PIPNotCovByWC = .;
PIPCovByWC = 0;
PIPNotPrincOpByEmpls = 1;
PIP_factor = min(ifn(PIPAllAutos=0, . ,PIPAllAutos)
, ifn(PIPNotCovByWC=0, . ,PIPNotCovByWC)
, ifn(PIPCovByWC=0, . ,PIPCovByWC)
, ifn(PIPNotPrincOpByEmpls=0, . ,PIPNotPrincOpByEmpls)
);
put PIP_factor=;
run;
Note the min function ignores missing values; the ifn function sets zero values to missing.
Might be more typing than it's worth; offered only as an alternative. There are many ways to skin the cat.
This one doesn't suffer from 9999 limitation of the approved answer.
%macro minnonzero/parmbuff;
%local _argn _args _arg;
/* get rid of external parenthesis */
%let _args=%substr(%bquote(&syspbuff),2,%length(%bquote(&syspbuff))-2);
%let _argn=1;
min(
%do %while (%length(%scan(%bquote(&_args),&_argn,%str(|))) ne 0);
%let _arg=%scan(%bquote(&_args),&_argn,%str(|));
%if &_argn>1 %then %do;
,
%end;
ifn(&_arg=0,.,&_arg)
%let _argn=%eval(&_argn+1);
%end;
);
%mend;
You call it with pipe-separated list of arguments, e.g.
data piesek;
a=3;
b="kotek";
c=%minnonzero(a|findc(b,"z");
put c; /* 3, "kotek" has no "z" in it, so findc returns 0 */
run;
/* For each row, find the variable name corresponding to the minimum value */
proc iml;
use DATASET; /* DATASET is your dataset name of interest */
read all var _NUM_ into X[colname=VarNames]; /* read in only numerical columns */
close DATASET;
idxMin = X[, >:<]; /* find columns for min of each row */
varMin = varNames[idxMin]; /* corresponding var names */
print idxMin varMin;
For Max:
idxMax = X[, <:>];
I wasn't familiar with the operator above, SAS provides a helpful table for IML operators:
In PROC IML you can also create new datasets/append the results to your old one if you need them later on.
Full blog post: source, all credit goes to Rick Wicklin at SAS
edit: For the non-zero part, I would just do a PROC SQL using a WHERE variable is not 0 to filter before feeding it in the PROC IML. I am sure it can be done within PROC IML, but I just started using it myself. So, please comment if you know a way around it in PROC IML and I will include the fix.
I created the following macro. Proc power returns table pw_cout containing column Power. The data _null_ step assigns the value in column Power of pw_out to macro variable tpw. I want the macro to return the value of tpw, so that in the main program, I can call it in DATA step like:
data test;
set tmp;
pw_tmp=ttest_power(meanA=a, stdA=s1, nA=n1, meanB=a2, stdB=s2, nB=n2);
run;
Here is the code of the macro:
%macro ttest_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw'=&power);
run;
&tpw
%mend ttest_power;
#itzy is correct in pointing out why your approach won't work. But there is a solution maintaing the spirit of your approach: you need to create a power-calculation function uisng PROC FCMP. In fact, AFAIK, to call a procedure from within a function in PROC FCMP, you need to wrap the call in a macro, so you are almost there.
Here is your macro - slightly modified (mostly to fix the symput statement):
%macro ttest_power;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw', power);
run;
%mend ttest_power;
Now we create a function that will call it:
proc fcmp outlib=work.funcs.test;
function ttest_power_fun(meanA, stdA, nA, meanB, stdB, nB);
rc = run_macro('ttest_power', meanA, stdA, nA, meanB, stdB, nB, tpw);
if rc = 0 then return(tpw);
else return(.);
endsub;
run;
And finally, we can try using this function in a data step:
options cmplib=work.funcs;
data test;
input a s1 n1 a2 s2 n2;
pw_tmp=ttest_power_fun(a, s1, n1, a2, s2, n2);
cards;
0 1 10 0 1 10
0 1 10 1 1 10
;
run;
proc print data=test;
You can't do what you're trying to do this way. Macros in SAS are a little different than in a typical programming language: they aren't subroutines that you can call, but rather just code that generate other SAS code that gets executed. Since you can't run proc power inside of a data step, you can't run this macro from a data step either. (Just imagine copying all the code inside the macro into the data step -- it wouldn't work. That's what a macro in SAS does.)
One way to do what you want would be to read each observation from tmp one at a time, and then run proc power. I would do something like this:
/* First count the observations */
data _null_;
call symputx('nobs',obs);
stop;
set tmp nobs=obs;
run;
/* Now read them one at a time in a macro and call proc power */
%macro power;
%do j=1 %to &nobs;
data _null_;
nrec = &j;
set tmp point=nrec;
call symputx('meanA',meanA);
call symputx('stdA',stdA);
call symputx('nA',nA);
call symputx('meanB',meanB);
call symputx('stdB',stdB);
call symputx('nB',nB);
stop;
run;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
proc append base=pw_out_all data=pw_out; run;
%end;
%mend;
%power;
By using proc append you can store the results of each round of output.
I haven't checked this code so it might have a bug, but this approach will work.
You can invoke a macro which calls procedures, etc. (like the example) from within a datastep using call execute(), but it can get a bit messy and difficult to debug.