SAS "Goal Seek" with Data Transformations

I am attempting to replicate Excel's Goal Seek in SAS.
I would like to find a constant such that, when it is added to the initial data, the overall average of the data equals the target. This gets a bit tricky when a transformation is involved.
So my three data points (var1) are 0.78, 0.8, 0.85. The target is 0.87.
I would like to find x where AVERAGE(1/(1+EXP(-(LN(var1/(1-var1)) + x)))) = 0.87
This is the code I currently have, but it gets x = 0.4803 when it should be 0.4525 (found via Excel).
data aa;
input var1 target;
datalines;
0.78 0.87
0.8 0.87
0.85 0.87
;
run;
proc model data=aa outparms=parm;
target = 1/(1+EXP(-(log(var1/(1-var1)) + x)));
fit target;
run;
I think this isn't working because it doesn't include an average of all 3 data points. I'm not sure how to do this. Ideally I'd just be able to change the second line in the proc model step to this:
target = Avg(1/(1+EXP(-(log(var1/(1-var1)) + x))));
But that doesn't work.

proc model is primarily designed for time-series, and doesn't do well with using summary functions vertically; however, it does great when doing it horizontally. One way to resolve it would be by transposing the problem:
proc transpose data=aa out=aa_trans;
by target;
var var1;
run;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
solve / out=solution solveprint ;
run;
We get an answer of 0.4531398172. This can be checked by directly plugging in the value:
data _null_;
set aa_trans;
x = 0.4531398172;
check = mean(1/(1+EXP(-(log(COL1/(1-COL1)) + x)))
, 1/(1+EXP(-(log(COL2/(1-COL2)) + x)))
, 1/(1+EXP(-(log(COL3/(1-COL3)) + x))) );
put '*********** ' check;
run;
This method requires additional macro programming to generalize, and may be very computationally expensive if you have many observations to transpose. To generalize it for any given number of columns, you could use the following macro program:
%macro generateEquation;
%global eq;
%let eq = ;
proc sql noprint;
select count(*)
into :total
from aa
;
quit;
%do i = 1 %to &total.;
%let eq = %cmpres(&eq 1/(1+EXP(-(log(COL&i/(1-COL&i))+x))));
%end;
%let eq = mean(%sysfunc(tranwrd(&eq, %str( ), %str(,) ) ) );
%put &eq;
%mend;
%generateEquation;
proc model data=aa_trans;
endo x;
exo COL1-COL3 target;
target = &eq.;
solve / out=solution solveprint ;
run;
Instead, you might want to reframe this problem as an optimization problem with no objective function. proc optmodel, if available at your site, lets you do this matrix manipulation. The resulting code is more complex and manual, but will give you a more generalized and computationally feasible result.
You will need to add two new variables and separate the target to a new dataset.
data aa;
input targetid obs var1;
datalines;
1 1 0.78
1 2 0.8
1 3 0.85
;
run;
data bb;
input targetid target;
datalines;
1 0.87
;
run;
proc optmodel;
set id;
set obs;
set <num,num> id_obs;
/* Constants */
number target{id};
number var1{id_obs};
read data bb into id=[targetid]
target;
read data aa into id_obs=[targetid obs]
var1;
/* Parameter of interest */
var x{id};
/* Force the solver to seek the required goal */
con avg {i in id}: target[i] = sum{<j,n> in id_obs: j=i} (1/(1+EXP(-(log(var1[j, n]/(1-var1[j, n])) + x[i]))) )
/ sum{<j,n> in id_obs: j=i} 1;
/* Check if it's the equation that we want */
expand;
/* Solve using the non-linear programming solver with no objective */
solve with nlp noobjective;
/* Output */
create data solution from [targetid] = {i in id}
x[i];
quit;
optmodel returns a similar answer: 0.4531395426, which differs from the proc model result by 0.0000002746. The answers are not identical due to differing methods and optimality tolerances; however, the solution checks out.
proc sql;
select Avg(1/(1+EXP(-(log(var1/(1-var1)) + 0.4531395426))))
from aa;
quit;
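Because the averaged expression increases monotonically in x, a plain bisection in a data step is another way to mimic Goal Seek. This is only a minimal sketch and not one of the methods used above: the search bracket [-10, 10], the iteration count, and the hard-coded data values are assumptions made for illustration.

```sas
/* Sketch: bisection search for x (assumed bracket and iteration count) */
data _null_;
  array v[3] _temporary_ (0.78 0.80 0.85);   /* var1 values from the question */
  target = 0.87;
  lo = -10;                                  /* assumed lower bound for x */
  hi = 10;                                   /* assumed upper bound for x */
  do iter = 1 to 60;
    x = (lo + hi) / 2;
    total = 0;
    do i = 1 to dim(v);
      total + 1/(1+exp(-(log(v[i]/(1-v[i])) + x)));
    end;
    /* the mean rises with x, so move the bound on the side that misses */
    if total/dim(v) < target then lo = x;
    else hi = x;
  end;
  put x= 12.10;
run;
```

The value written to the log should land close to the proc model and optmodel results above. For data held in a table rather than hard-coded, the values would first need to be loaded into the array.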

Related

Finding specific values for all variables in a table using SAS EG

I have a table which contains one key id and 100 variables (x1, x2, x3, ..., x100), and I need to check every variable for values stored as -9999, -8888, -7777, or -6666.
For one variable I use
proc sql;
select keyid, x1
from mytable
where x1 in(-9999,-8888,-7777,-6666);
quit;
This is the data I am trying to get, but it is just for one variable.
I do not have time to copy and paste all 100 variables into this basic query.
I have searched the forum, but the answers I have found are a bit far from what I actually need,
and since I am new to SAS I cannot write a macro.
Can you help me please?
Thanks.
Try this. Just made up some sample data that resembles what you describe :-)
data have;
do key = 1 to 1e5;
array x x1 - x100;
do over x;
x = rand('integer', -10000, -5000);
end;
output;
end;
run;
data want;
set have;
array x x1 - x100;
do over x;
if x in (-9999, -8888, -7777, -6666) then do;
output;
leave;
end;
end;
run;
Don't use SQL. Instead use normal SAS code so you can take advantage of SAS syntax like ARRAYs and variable lists.
So make an array containing the variables you want to look at. Then loop over the array. There is no need to keep looking once you find a match.
data want;
set mytable;
array list var1 varb another_var x1-x10 Z: ;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999 -8888 -7777 -6666) );
end;
if found;
run;
And if you want to search all of the numeric variables you can even use the special variable list _NUMERIC_ when defining the array:
array list _numeric_;
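A tiny self-contained run of the array search above may make the behaviour easier to see. The three-row table and its values are made up for illustration; only the sentinel values come from the question.

```sas
/* Made-up sample: rows 1 and 3 contain sentinel values, row 2 does not */
data mytable;
  input keyid x1 x2 x3;
  datalines;
1 5 -9999 7
2 1 2 3
3 -6666 4 -7777
;
run;

data want;
  set mytable;
  array list x1-x3;
  found = 0;
  /* stop scanning a row as soon as one sentinel value is seen */
  do index = 1 to dim(list) until (found);
    found = ( list[index] in (-9999 -8888 -7777 -6666) );
  end;
  if found;            /* keep only rows with at least one match */
  drop found index;
run;
```

Note that with `array list _numeric_;` a numeric key id would be scanned as well, which is harmless as long as the key never takes one of the sentinel values.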
Thank you for your help. I have found a solution and wanted to share it with you.
It has some points that need to be evaluated, but it is fine for me now (it gets the job done).
%LET LIB = 'LIBRARY';
%LET MEM = 'GIVENTABLE';
%PUT &LIB &MEM;
PROC SQL;
SELECT
NAME INTO :VARLIST SEPARATED BY ' '
FROM DICTIONARY.COLUMNS
WHERE
LIBNAME=&LIB
AND
MEMNAME=&MEM
AND
TYPE='num';
QUIT;
%PUT &VARLIST;
%MACRO COUNTS(INPUT);
%LOCAL i NEXT_VAR;
%DO i=1 %TO %SYSFUNC(COUNTW(&VARLIST));
%LET NEXT_VAR = %SCAN(&VARLIST, &i);
PROC SQL;
CREATE TABLE &NEXT_VAR AS
SELECT
COUNT(ID) AS NUMBEROFDESIREDVALUES
FROM &INPUT
WHERE
&NEXT_VAR IN (-6666, -7777, -8888, -9999)
GROUP BY
&NEXT_VAR;
QUIT;
%END;
%MEND;
%COUNTS(GIVENTABLE);
The answer you provided to your own question gives more insight into what you really wanted. However, the solution you offered, while it works, is not very efficient. The SQL statement runs 100 times, once for each variable in the source data. That means the source table is read 100 times. Another problem is that it creates 100 output tables. Why?
A better solution is to create 1 table that contains the counts for each of the 100 variables. Even better is to do it in 1 pass of the source data instead of 100.
data sum;
set have end=eof;
array x(*) x:;
array csum(100) _temporary_;
do i = 1 to dim(x);
x(i) = (x(i) in (-9999, -8888, -7777, -6666)); * flag (0 or 1) those meeting criteria;
csum(i) + x(i); * cumulative count;
if eof then do;
x(i) = csum(i); * move the final total to the orig variable;
end;
end;
if eof then output; * only output the final obs which has the totals;
drop key i;
run;
Partial result:
x1 x2 x3 x4 x5 x6 x7 x8 ...
90 84 88 85 81 83 59 71 ...
You can keep it in that form or you can transpose it.
proc transpose data=sum out=want (rename=(col1=counts))
name=variable;
run;
Partial result:
variable counts
x1 90
x2 84
x3 88
x4 85
x5 81
... ...

SAS 9.4: How to use different weights on the same variable without datastep or proc sql

I can't find a way to summarize the same variable using different weights.
I'll try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I have to save a huge amount of data with limited physical space, and I would otherwise have to construct about 120 fields (a0-a6, b0-b6, etc.) that are the same variables with fixed weights applied (wgt0-wgt5).
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and, on demand, run a "summary" without an intermediate data step that obliges me to create 120 fields.
Due to the huge amount of data (about 55 GB every month) I would also prefer not to use a proc sql statement:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
Is there a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
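The data step view option can be sketched as follows. This is an illustration against the three-record pippo example from the question, not the exact code referred to above; the view name pippo_v and the derived names a_w1-a_w3 are hypothetical. A view computes the products on the fly, so the extra columns are never stored on disk.

```sas
/* Sketch: a view that derives one weighted column per weight */
data pippo_v / view=pippo_v;
  set pippo;
  array w[3] wgt1-wgt3;
  array aw[3] a_w1-a_w3;      /* hypothetical names for a*wgt1 ... a*wgt3 */
  do _i = 1 to dim(w);
    aw[_i] = a * w[_i];
  end;
  drop _i;
run;

/* One pass over the view; plain sums of the products are the weighted sums */
proc summary data=pippo_v missing nway;
  var a_w1-a_w3;
  output out=pluto (drop=_freq_ _type_) sum=;
run;
```

For the sample data the three sums work out to 15.68, 12.67 and 3.89 (for example 10*0.5 + 3*0 + 8.9*1.2 = 15.68 for wgt1).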
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;

SAS Macro: How can I record proc means output in one dataset?

I have this piece of code. However, the macro generates too many separate datasets because proc univariate runs inside a loop over t from 1 to 310. How can I modify this code to collect all the proc univariate output in one dataset, and then modify the rest of the code to run more efficiently?
%let L=10; %* 10th percentile *;
%let H=%eval(100 - &L); %* 90th percentile*;
%let wlo=V1&L V2&L V3&L ;
%let whi=V1&H V2&H V3&H ;
%let wval=wV1 wV2 wV3 ;
%let val=V1 V2 V3;
%macro winsorise();
%do v=1 %to %sysfunc(countw(&val));
%do t=1 %to 310;
proc univariate data=regressors noprint;
var &val;
output out=_winsor&t._V&v pctlpts=&H &L
prtlpre=&val&t._V&v;
where time_count<=&t;run;
%end;
data regressors (drop=__:);
set regressors;
if _n_=1 then set _winsor&t._V&v;
&wval&t._V&v=min(max(&val&t._V&v,&wlo&t._V&v),&whi&t._V&v);
run;
%end;
%mend;
Thank you.
Presume you have data time_count, x1, x2, x3 with samples at every 0.5 time unit.
data regressors;
call streaminit(123);
do time_count = 0 to 310 by .5;
x1 = 2 ** (sin(time_count/6) * log(time_count+1));
x2 = log2 (time_count+1) + log(time_count/10+.1);
x3 = rand('normal'); /* arguments were truncated in the original; a standard normal is assumed here */
output;
end;
format x: 7.3;
run;
Stack the data into groups based on integer time_count levels. The stack is constructed from a full outer self-join with a less-than-or-equal (<=) criterion. Each group is identified by the top time_count in the group.
proc sql;
create table stack as
select
a.time_count
, a.x1
, a.x2
, a.x3
, b.time_count as time_count_group /* save top value in group variable */
from regressors as a
full join regressors as b /* self full join */
on a.time_count <= b.time_count /* triangular criteria */
where
int(b.time_count)=b.time_count /* select integer top values */
order by
b.time_count, a.time_count
;
quit;
Now compute ALL your stats for ALL your variables for ALL your groups in one go. No macro, no muss, no fuss.
proc univariate data=stack noprint;
by time_count_group;
var x1 x2 x3;
output out=_winsor n=group_size pctlpts=90 10 pctlpre=x1_ x2_ x3_;
run;
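The winsorising step itself is not shown above. One way to finish, sketched under the assumption that the percentile columns come out as x1_10, x1_90 and so on (the pctlpre= prefix followed by the pctlpts= value), is to merge the per-group percentiles back onto the stack and clamp each value. The output name winsorised and the wx1-wx3 variables are hypothetical.

```sas
/* Sketch: clamp each value to its group's 10th/90th percentiles */
data winsorised;
  merge stack _winsor;            /* both are ordered by time_count_group */
  by time_count_group;
  wx1 = min(max(x1, x1_10), x1_90);
  wx2 = min(max(x2, x2_10), x2_90);
  wx3 = min(max(x3, x3_10), x3_90);
run;
```

Each row of stack picks up the percentiles of its own group, so the clamp uses only the data with time_count at or below the group's top value, which mirrors the where time_count<=&t filter in the question's macro.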

How to write a concise list of variables in table of a freq when the variables are differentiated only by a suffix?

I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1  S2  S3  S4  Frequency
A                      10
A   E                 100
A   E   J   F         300
B                      10
B   E                 100
B   E   J   F         300
but with an instruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is PROC FREQ code.
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is way to get counts using PROC SUMMARY
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages you can instead use the WAYS statement to get both the overall and the individual cell counts. And then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
But here is an easy way to make such a macro variable from ANY variable list not just those with numeric suffixes.
First get the variables names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimension.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to &top;
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to &top;
end;
%end;
run;
%mend;
%have;
As you probably noticed, tables s: creates one frequency table per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimension.
NOTE: If you specify out=, the column names in the output data set will be the last variable in the level. So for above, the out= table will have a column "s5", but contain counts corresponding to combinations for each s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE, I don't trust eyeballs. If zero rows with differences (per noequal) then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)

How to perform likelihood ratio test on logistic regression in SAS?

I want to perform the standard likelihood ratio test in logistic regression using SAS. I will have a full logistic model, containing all variables, named A and a nested logistic model B, which is derived by dropping one variable from A.
If I want to test whether that dropped variable is significant or not, I need to perform a likelihood ratio test of models A and B. Is there an easy way to perform this test (essentially a chi-square test) in SAS using a PROC? Thank you very much for the help.
If you want to perform likelihood ratio tests of the full model vs. the model with one variable dropped, you can use the GENMOD procedure with the type3 option.
Script:
data d1;
do z = 0 to 2;
do y = 0 to 1;
do x = 0 to 1;
input n @@;
output;
end; end; end;
cards;
100 200 300 400
50 100 150 200
50 100 150 200
;
proc genmod data = d1;
class y z;
freq n;
model x = y z / error = bin link = logit type3;
run;
Output:
LR Statistics For Type 3 Analysis

Source    DF    Chi-Square    Pr > ChiSq
y          1         16.09        <.0001
z          2          0.00        1.0000
I'm not sure about a PROC statement that can specifically perform LRT but you can compute the test for nested models.
Script
proc logistic data = full_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_full;
run;
data _null_;
set GlobalTests_full;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_full", ChiSq);
call symput("DF_full", DF);
end;
run;
proc logistic data = reduced_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_reduced;
run;
data _null_;
set GlobalTests_reduced;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_reduced", ChiSq);
call symput("DF_reduced", DF);
end;
run;
data LRT_result;
LR = &ChiSq_full - &ChiSq_reduced;
DF = &DF_full - &DF_reduced;
p = 1 - probchi(LR,DF); /* p-value from the LR statistic computed above */
run;
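The reason the subtraction above gives the nested-model test can be written out. Each global likelihood ratio chi-square compares a fitted model with the same intercept-only model, so the intercept-only term cancels in the difference. The symbols below are notation introduced here for illustration, and the identity assumes both models are fitted to exactly the same observations.

```latex
\[
\chi^2_{\text{full}} = -2\ln L_0 + 2\ln L_{\text{full}},
\qquad
\chi^2_{\text{reduced}} = -2\ln L_0 + 2\ln L_{\text{reduced}}
\]
\[
\mathrm{LR} = \chi^2_{\text{full}} - \chi^2_{\text{reduced}}
            = -2\left(\ln L_{\text{reduced}} - \ln L_{\text{full}}\right),
\qquad
\mathrm{DF} = \mathrm{DF}_{\text{full}} - \mathrm{DF}_{\text{reduced}}
\]
```

Here L_0 is the likelihood of the intercept-only model. Under the null hypothesis that the dropped variable has no effect, LR is referred to a chi-square distribution with DF degrees of freedom, which is what the probchi call evaluates.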
I'm no expert on logistic regression, but I think what you are trying to accomplish can be done with PROC LOGISTIC, using the "SELECTION=SCORE" option on the MODEL statement. There are other SELECTION options available, such as STEPWISE, but I think SCORE matches closest to what you are looking for. I would suggest reading up on it though, because there are some associated options (BEST=, START= STOP=) that you might benefit from too.