How do I use the output variable of one PROC in another PROC?
I'm new to SAS and have spent many hours trying to solve this problem in the program below.
DATA FA2;
SET FA2;
proc iml;
start main;
use FA2;
read all var {Close};
s = Close;
u = j(nrow(s)-1,1,0);
do i=2 to nrow(s);
u[i-1]=log(s[i]/s[i-1]);
end;
n=nrow(s)-1;
rsigma=sqrt(252/n*(u'*u));
mu = mean(u);
call qntl(q, u);
print q[rowname={"P05", "P95"}];
s = quartile(u);
PRINT mu, rsigma, s;
finish;
run;
PROC UNIVARIATE DATA = FA2;
var u;
run;
ERROR
PROC UNIVARIATE DATA = FA2;
var u;
Variable U not found.
run;
Your first two lines accomplish nothing and are not useful - remove them.
data fa2;
set fa2;
You are not creating any output data set within the IML statement, and you really shouldn't reuse the same data set name (FA2) so many times. It makes it harder to debug your code. See the example below.
proc iml;
/** create SAS data set from vectors **/
y = {1,0,3,2,0,3}; /** 6 x 1 vector **/
z = {8,7,6,5,6}; /** 5 x 1 vector **/
c = {A A A B B B}; /** 1 x 6 character vector **/
create MyData var {y z c}; /** create data set **/
append; /** write data in vectors **/
close MyData; /** close the data set **/
quit;
IML ends with a QUIT; at the end.
PROC UNIVARIATE is referencing your FA2 data set which does not have U, because you did not save the variable to the data set so it doesn't exist. So it is correctly generating an error. Fixing the above issues should resolve this.
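Putting those fixes together, a minimal sketch might look like the following. The output data set name LogReturns is an assumption chosen for illustration; the rest follows the question's own variables. It computes the log returns as a vector, writes them to a new data set, and only then runs PROC UNIVARIATE against that data set.

```sas
proc iml;
   use FA2;
   read all var {Close};
   close FA2;

   s = Close;
   n = nrow(s) - 1;
   /* log returns: element-wise log of consecutive price ratios (n x 1) */
   u = log( s[2:nrow(s)] / s[1:n] );

   /* LogReturns is a hypothetical name; any new data set name works */
   create LogReturns var {u};
   append;
   close LogReturns;
quit;

proc univariate data=LogReturns;
   var u;
run;
```

Writing to a separate data set, rather than back to FA2, keeps the input untouched and makes it obvious which step produced U.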
I can't find a way to summarize the same variable using different weights.
I try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I have a huge amount of data to store and little disk space, and I would otherwise have to construct about 120 fields (a0-a6, b0-b6, etc.) that are the same variables with fixed weights applied (wgt0-wgt5).
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and, on demand, run a summary without an intermediate data step that forces me to create 120 fields.
Because of the volume of data (roughly 55 GB every month), I would also prefer not to use a PROC SQL statement like this:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
Is there a "super PROC SUMMARY" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
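As a rough sketch of the second option (one PROC SUMMARY run per weight, then combining the results), something like the macro below could work on the pippo example from the question. The macro name sum_by_weight, the intermediate data sets _sum1, _sum2, ... and the output names a_wgt1, a_wgt2, ... are all assumptions made for illustration. Note that this reads the input once per weight, which may matter at 55 GB.

```sas
%macro sum_by_weight(nweights=3);
   %local k;
   /* one weighted summary per weight variable */
   %do k = 1 %to &nweights;
      proc summary data=pippo;
         var a / weight=wgt&k;
         output out=_sum&k (drop=_type_ _freq_) sum=a_wgt&k;
      run;
   %end;

   /* each _sumK has a single row, so a merge without BY lines them up */
   data pluto;
      merge
         %do k = 1 %to &nweights;
            _sum&k
         %end;
      ;
   run;
%mend sum_by_weight;

%sum_by_weight(nweights=3)
```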
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
I have a process flow in SAS Enterprise Guide which is comprised mainly of Data views rather than tables, for the sake of storage in the work library.
The problem is that I need to calculate percentiles (using proc univariate) from one of the data views and left join this to the final table (shown in the screenshot of my process flow).
Is there any way that I can specify the outfile in the univariate procedure as being a data view, so that the procedure doesn't calculate everything prior to it in the flow? When the percentiles are left joined to the final table, the flow is calculated again so I'm effectively doubling my processing time.
Please find the code for the univariate procedure below
proc univariate data=WORK.QUERY_FOR_SGFIX noprint;
var CSA_Price;
by product_id;
output out= work.CSA_Percentiles_Prod
pctlpre= P
pctlpts= 40 to 60 by 10;
run;
In SAS, my understanding is that procs such as proc univariate cannot generally produce views as output. The only workaround I can think of would be for you to replicate the proc logic within a data step and produce a view from the data step. You could do this e.g. by transposing your variables into temporary arrays and using the pctl function.
Here's a simple example:
data example /view = example;
array _height[19]; /*Number of rows in sashelp.class dataset*/
/*Populate array*/
do _n_ = 1 by 1 until(eof);
set sashelp.class end = eof;
_height[_n_] = height;
end;
/*Calculate quantiles*/
array quantiles[3] q40 q50 q60;
array points[3] (40 50 60);
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/*Keep only the quantiles we calculated*/
keep q40--q60;
run;
With a bit more work, you could also make this approach return percentiles for individual by groups rather than for the whole dataset at once. You would need to write a double-DOW loop to do this, e.g.:
/* Assumes CLASS is a copy of sashelp.class sorted by sex */
data example;
array _height[19];
array quantiles[3] q40 q50 q60;
array points[3] _temporary_ (40 50 60);
/*Clear heights array between by groups*/
call missing(of _height[*]);
/*Populate heights array*/
do _n_ = 1 by 1 until(last.sex);
set class end = eof;
by sex;
_height[_n_] = height;
end;
/*Calculate quantiles*/
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/* Output all rows from input dataset, with by-group quantiles attached*/
do _n_ = 1 to _n_;
set class;
output;
end;
keep name sex q40--q60;
run;
I am attempting to replicate Excel's Goal Seek in SAS.
I would like to find a constant number that when added to the initial data the overall average of the data equals the target. This gets a bit tricky when a transformation is involved.
So my three data points (var1) are 0.78, 0.8, 0.85. The target is 0.87.
I would like to find x where AVERAGE(1/(1+EXP(-(LN(var1/(1-var1)) + x)))) = 0.87
This is the code I currently have, but it gets x = 0.4803 when it should be 0.4525 (found via Excel).
data aa;
input var1 target;
datalines;
0.78 0.87
0.8 0.87
0.85 0.87
;
run;
proc model data=aa outparms=parm;
target = 1/(1+EXP(-(log(var1/(1-var1)) + x)));
fit target;
run;
I think this isn't working because it doesn't include an average of all 3 data points. I'm not sure how to do this. Ideally I'd just be able to change the second line in the proc model node to this:
target = Avg(1/(1+EXP(-(log(var1/(1-var1)) + x))));
But that doesn't work.
proc model is primarily designed for time-series, and doesn't do well with using summary functions vertically; however, it does great when doing it horizontally. One way to resolve it would be by transposing the problem:
proc transpose data=aa out=aa_trans;
by target;
var var1;
run;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
solve / out=solution solveprint ;
run;
We get an answer of 0.4531398172. This can be checked by directly plugging in the value:
data _null_;
set aa_trans;
x = 0.4531398172;
check = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
put '*********** ' check;
run;
This method requires additional macro programming to generalize, and may be very computationally expensive if you have many observations to transpose. To generalize it for any given number of columns, you could use the following macro program:
%macro generateEquation;
%global eq;
%let eq = ;
proc sql noprint;
select count(*)
into :total
from aa
;
quit;
%do i = 1 %to &total.;
%let eq = %cmpres(&eq 1/(1+EXP(-(log(COL&i/(1-COL&i))+x))));
%end;
%let eq = mean(%sysfunc(tranwrd(&eq, %str( ), %str(,) ) ) );
%put &eq;
%mend;
%generateEquation;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = &eq.;
solve / out=solution solveprint ;
run;
Instead, you might want to reframe this problem as an optimization problem with no objective function. proc optmodel, if available at your site, lets you do this matrix manipulation. The resulting code is more complex and manual, but will give you a more generalized and computationally feasible result.
You will need to add two new variables and separate the target to a new dataset.
data aa;
input targetid obs var1;
datalines;
1 1 0.78
1 2 0.8
1 3 0.85
;
run;
data bb;
input targetid target;
datalines;
1 0.87
;
run;
proc optmodel;
set id;
set obs;
set <num,num> id_obs;
/* Constants */
number target{id};
number var1{id_obs};
read data bb into id=[targetid]
target;
read data aa into id_obs=[targetid obs]
var1;
/* Parameter of interest */
var x{id};
/* Force the solver to seek the required goal */
con avg {i in id}: target[i] = sum{<j,n> in id_obs: j=i} (1/(1+EXP(-(log(var1[j, n]/(1-var1[j, n])) + x[i]))) )
/ sum{<j,n> in id_obs: j=i} 1;
/* Check if it's the equation that we want */
expand;
/* Solve using the non-linear programming solver with no objective */
solve with nlp noobjective;
/* Output */
create data solution from [targetid] = {i in id}
x[i];
quit;
optmodel returns a similar answer: 0.4531395426, which differs from the proc model result by 0.0000002746. The answers are not identical due to differing methods and optimality tolerances; however, the solution checks out.
proc sql;
select Avg(1/(1+EXP(-(log(var1/(1-var1)) + 0.4531395426))))
from aa;
quit;
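If neither proc model nor proc optmodel is available, a plain data step can also do the goal seek, because the average is strictly increasing in x. The sketch below uses bisection, which is a different technique from the two above; the search bracket of -10 to 10, the tolerance, and all variable names are assumptions chosen for this example.

```sas
data _null_;
   array p[3] _temporary_ (0.78 0.8 0.85);   /* the three var1 values */
   target = 0.87;

   lo = -10;   /* assumed bracket: the average is below target here */
   hi = 10;    /* ... and above target here */

   do iter = 1 to 100 until (hi - lo < 1e-10);
      x = (lo + hi) / 2;

      /* average of the transformed values at the current x */
      avg_p = 0;
      do k = 1 to dim(p);
         avg_p = avg_p + 1 / (1 + exp(-(log(p[k] / (1 - p[k])) + x)));
      end;
      avg_p = avg_p / dim(p);

      /* keep the half of the bracket that still contains the root */
      if avg_p < target then lo = x;
      else hi = x;
   end;

   put x= avg_p= iter=;
run;
```

The log should show x close to the values reported above and avg_p close to 0.87.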
Is there an equivalent of R's function predict(model, data) in SAS?
For example, how would you apply the model below to a large test data set where the response variable "Age" is unknown?
proc reg data=sashelp.class;
model Age = Height Weight ;
run;
I understand you can extract the formula Age = Intercept + Height(Estimate_height) + Weight(Estimate_weight) from the results window and manually predict "Age" for unknown observations, but that's not very efficient.
SAS does this by itself. As long as the model has enough data points to go on, it will output the predicted value. I've used proc glm, but you can use any model procedure to create this kind of output.
/* this is a sample dataset */
data mydata;
input age weight dataset $;
cards;
1 10 mydata
2 11 mydata
3 12 mydata
4 15 mydata
5 12 mydata
;
run;
/* this is a test dataset. It needs to have all of the variables that you'll use in the model */
data test;
input weight dataset $;
cards;
6 test
7 test
10 test
;
run;
/* append (add to the bottom) the test to the original dataset */
proc append data=test base=mydata force; run;
/* you can look at mydata to see if that worked, the dependent var (age) should be '.' */
/* do the model */
proc glm data=mydata;
model age = weight/p clparm; /* these options after the '/' are to show predicted values in the results screen - you don't need them */
output out=preddata predicted=pred lcl=lower ucl=upper; /* this line creates a dataset with the predicted value for all observations */
run;
quit;
/* look at the dataset (preddata) for the predicted values */
proc print data=preddata;
where dataset='test';
run;
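A different route, closer in spirit to R's predict(model, data), is to save the fitted coefficients and apply them to new data with PROC SCORE. This is a sketch of that alternative rather than the method used above; the data set names est, newdata and pred are assumptions, and newdata is built here by simply dropping Age from sashelp.class to mimic an unknown response.

```sas
/* Fit the model and store the parameter estimates */
proc reg data=sashelp.class outest=est noprint;
   model Age = Height Weight;
run;
quit;

/* Hypothetical "test" data: same predictors, no response */
data newdata;
   set sashelp.class(drop=Age);
run;

/* Apply the stored coefficients to the new data */
proc score data=newdata score=est out=pred type=parms predict;
   var Height Weight;
run;

proc print data=pred;
run;
```

The predicted values land in a column named after the model label (typically MODEL1 when the MODEL statement has no label). This avoids refitting the model each time new observations arrive.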
Does anybody know how to find the non-zero minimum in a row using the min function in SAS? Or any other option in SAS code?
Current code:
PIP_factor = min(PIPAllAutos, PIPNotCovByWC, PIPCovByWC, PIPNotPrincOpByEmpls);
I think you need to use an array solution, ie
array pipArray pip:; *or whatever;
PIP_factor=9999;
do _n = 1 to dim(pipArray);
if pipArray[_n] > 0 then
PIP_factor = min(PIP_factor,pipArray[_n]);
end;
Or somesuch.
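A small variation on the same idea, offered as a sketch: start PIP_factor at missing instead of 9999. Because min ignores missing arguments, no sentinel value is needed, and the result stays missing if no positive value exists. The test values below are made up for illustration.

```sas
data _null_;
   /* made-up test values */
   PIPAllAutos = 2;
   PIPNotCovByWC = .;
   PIPCovByWC = 0;
   PIPNotPrincOpByEmpls = 1;

   array pipArray[*] PIPAllAutos PIPNotCovByWC PIPCovByWC PIPNotPrincOpByEmpls;

   PIP_factor = .;
   do _n = 1 to dim(pipArray);
      /* missing values compare lower than any number, so they are skipped too */
      if pipArray[_n] > 0 then
         PIP_factor = min(PIP_factor, pipArray[_n]);
   end;

   put PIP_factor=;   /* writes PIP_factor=1 to the log */
run;
```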
Here is another way, using the IFN function:
data _null_;
PIPAllAutos = 2;
PIPNotCovByWC = .;
PIPCovByWC = 0;
PIPNotPrincOpByEmpls = 1;
PIP_factor = min(ifn(PIPAllAutos=0, . ,PIPAllAutos)
, ifn(PIPNotCovByWC=0, . ,PIPNotCovByWC)
, ifn(PIPCovByWC=0, . ,PIPCovByWC)
, ifn(PIPNotPrincOpByEmpls=0, . ,PIPNotPrincOpByEmpls)
);
put PIP_factor=;
run;
Note the min function ignores missing values; the ifn function sets zero values to missing.
Might be more typing than it's worth; offered only as an alternative. There are many ways to skin the cat.
This one doesn't suffer from the 9999 limitation of the accepted answer.
%macro minnonzero/parmbuff;
%local _argn _args _arg;
/* get rid of external parenthesis */
%let _args=%substr(%bquote(&syspbuff),2,%length(%bquote(&syspbuff))-2);
%let _argn=1;
min(
%do %while (%length(%scan(%bquote(&_args),&_argn,%str(|))) ne 0);
%let _arg=%scan(%bquote(&_args),&_argn,%str(|));
%if &_argn>1 %then %do;
,
%end;
ifn(&_arg=0,.,&_arg)
%let _argn=%eval(&_argn+1);
%end;
);
%mend;
You call it with pipe-separated list of arguments, e.g.
data piesek;
a=3;
b="kotek";
c=%minnonzero(a|findc(b,"z"));
put c; /* 3, "kotek" has no "z" in it, so findc returns 0 */
run;
/* For each row, find the variable name corresponding to the minimum value */
proc iml;
use DATASET; /* DATASET is your dataset name of interest */
read all var _NUM_ into X[colname=VarNames]; /* read in only numerical columns */
close DATASET;
idxMin = X[, >:<]; /* find columns for min of each row */
varMin = varNames[idxMin]; /* corresponding var names */
print idxMin varMin;
For Max:
idxMax = X[, <:>];
I wasn't familiar with the operators above; the SAS documentation provides a helpful table of IML operators.
In PROC IML you can also create new datasets/append the results to your old one if you need them later on.
Full blog post: source, all credit goes to Rick Wicklin at SAS
edit: For the non-zero part, I would just do a PROC SQL using a WHERE variable is not 0 to filter before feeding it in the PROC IML. I am sure it can be done within PROC IML, but I just started using it myself. So, please comment if you know a way around it in PROC IML and I will include the fix.