Conditional statement with sequential values in SAS

I am writing an IF statement to filter out some values which are sequential. Is there a way to write an IF statement that selects the sequential values?
data H;
input HH $;
cards;
Y1
Y2
Y3
Y4
Y5
; run;
data t;
set H;
if hh in ('Y2' -'Y4');
run;

Use the SCAN function to extract the numeric part, then filter on the numbers you want:
if 2 <= input(scan(hh,1,'Y'), 8.) <= 4;
New data step:
data t;
set H;
* SCAN returns a character value, so convert it with INPUT before comparing;
if 2 <= input(scan(hh,1,'Y'), 8.) <= 4;
run;

You can take advantage of the fact that < and > work with character variables and sort order:
data t;
set H;
if 'Y2' <= hh <= 'Y4';
run;
However, Y22 would also be sorted between Y2 and Y4.
data H;
input HH $;
cards;
Y1
Y2
Y3
Y4
Y5
Y22
; run;
data t;
set H;
if 'Y2' <= hh <= 'Y4';
run;
So you would need to add additional logic in that case.
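One possible form of that additional logic, as a minimal sketch: keep the character comparison but also require the value to be exactly two characters long. This assumes every value you want is a 'Y' followed by a single digit; for multi-digit ranges you would have to compare the numeric part instead.

```sas
data t;
set H;
* LENGTH ignores trailing blanks, so Y22 (length 3) is rejected;
if 'Y2' <= hh <= 'Y4' and length(hh) = 2;
run;
```

With the six-row example above, this should keep Y2, Y3 and Y4 and drop Y22.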

Related

How to add a custom fitted line to SAS SGplot Scatter

I have a simple SAS data set I am plotting as a scatter plot. My two questions are:
1. I am trying to adjust the y-axis without excluding the (0.02, 51) data point, but I need the y-axis to show only 60 to 160 by 20. When I define this, it excludes that specific data point and I don't know how to fix it.
2. I cannot figure out how to add a custom fitted curve and display the formula. Here is my line: Y=(160.3*x)/(0.0477+x)
Here is my code:
proc sgplot data=work.sas1;
title 'Puromycin Uptake Experiments';
scatter x=x y=y/ markerattrs=(color=black);
xaxis Label='Reactant Concentration X (mg/l)';
yaxis Label='Reaction Velocity Y (mg/s)' values=(60 to 160 by 20);
run;
Can anyone please help?
1. Try using OFFSETMIN= to extend the y-axis beyond your values.
2. Add a new variable, y_hat, with the values of your formula. Plot that and label it appropriately.
data sas1;
x=.02; y=67; output;
x=.02; y=51; output;
x=.06; y=84; output;
x=.06; y=86; output;
x=.11; y=98; output;
x=.11; y=115; output;
x=.22; y=131; output;
x=.22; y=124; output;
x=.56; y=144; output;
x=.56; y=158; output;
x=1.1; y=160; output;
run;
data sas1;
set sas1;
Y_hat=(160.3*x)/(0.0477+x);
run;
proc sgplot data=work.sas1;
title 'Puromycin Uptake Experiments';
scatter x=x y=y/ markerattrs=(color=black);
series x=x y=y_hat / curvelabel="Y=(160.3*x)/(0.0477+x)";
xaxis Label='Reactant Concentration X (mg/l)';
yaxis Label='Reaction Velocity Y (mg/s)' offsetmin=.1 values=(60 to 160 by 20);
run;
Produces:
y axis
There are a couple of y-axis options that can affect the axis rendering. Consider OFFSETMIN= or a tweaked list in VALUES=.
formula line
There is no formula statement in SGPLOT, so you have to create an auxiliary column for drawing the formula as a series. Sometimes you can align the x's of the data with the x's of the formula. However, if you want a higher density of x's for the formula, you stack the scatter data and the formula data. Don't get hung up on the chunks of missing values or any feelings of wastefulness.
I am not sure where your curve fit comes from, but statistical graphics (the SG in SGPLOT) has many features for fitting data built into it.
* make some example data that looks something like the fit curve;
data have;
do x = 0.03 to 1 by 0.0125;
y = ( 160.3 * x ) / ( 0.0477 + x ) ;
y + round ( 4 * ranuni(123) - 8, 0.0001);
output;
x = x * ( 1 + ranuni(123) );
end;
x = 0.02;
y = 51;
output;
run;
* generate the series data for drawing the fit curve;
* for complicated formula you may want to adjust step during iteration;
data fit;
step = 0.001;
x = 0;
* DO WHILE is used so that step alone controls the increment of x;
do while (x <= 1);
y = ( 160.3 * x ) / ( 0.0477 + x ) ;
output;
* step = step + smartly-adjusted-x-increment;
x + step;
end;
keep x y;
rename x=xfit y=yfit;
run;
* stack the scatter data and the curve fit data;
data have_stack_fit;
set have fit;
run;
proc sgplot data=have_stack_fit;
scatter x = x y = y;
series x = xfit y = yfit / legendlabel="( 160.3 * x ) / ( 0.0477 + x )";
yaxis values = (0 60 to 160 by 20) ;
run;
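As an illustration of the built-in fitting features mentioned above, here is a minimal sketch that overlays a LOESS smoother on the example data set have. Note that this is a generic nonparametric smoother, not the ( 160.3 * x ) / ( 0.0477 + x ) curve, so it only shows the kind of statement available rather than reproducing your fit.

```sas
proc sgplot data=have;
scatter x = x y = y;
* LOESS draws a smoothed fit and NOMARKERS avoids plotting the points twice;
loess x = x y = y / nomarkers;
run;
```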

Counting number of different fields in SAS

I have a table in SAS (using WPS Workbench) that looks like this.
ID   Band_1  Band_2  Band_3  ...  Band_160
1    Y       Y       N       ...  Y
2    N       N       N       ...  N
3    Y       N       N       ...  Y
4    N       Y       Y       ...  Y
..
200  Y       N       N       ...  Y
I want to summarise the table as follows: For each Band, I want a count of the number of Y and N values, with the table transposed (optional).
So down the left will consist of each band, and across the top will be a Y count and an N count. Or the bands can be across the top I don't mind.
Array processing is one (of several) ways to obtain your summary counts.
data have;
do id = 1 to 200;
array band(160) $1;
do _n_ = 1 to dim(band);
band(_n_) = substr('YN', 1+(ranuni(123)<0.4));
end;
output;
end;
run;
data want1(keep=column yes_n no_n);
set have end=last;
array band(160);
array Yes(160) _temporary_ (160*0);
array No(160) _temporary_ (160*0);
* accumulate counts;
do _n_ = 1 to dim(band);
if band(_n_) = 'Y' then Yes(_n_)+1; else
if band(_n_) = 'N' then No(_n_)+1;
end;
* emit counts;
if last then
do _n_ = 1 to dim(band);
column = vname(band(_n_));
yes_n = Yes(_n_);
no_n = No(_n_);
output;
end;
run;
The same 'want' data could be obtained from other techniques that use:
• Hash object
• Transpose / Report
• Transpose / Tabulate
• Transpose / Freq
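A rough sketch of the Transpose / Freq route, assuming the example data set have from above (variables band1-band160, already in id order). The names long, column and value are only illustrative.

```sas
* one row per id and band, with the Y/N flag in VALUE;
proc transpose data=have out=long(rename=(_name_=column col1=value));
by id;
var band1-band160;
run;
* cross-tabulate band by flag to get the Y and N counts;
proc freq data=long;
tables column*value / norow nocol nopercent;
run;
```

If your real columns are named Band_1 to Band_160, the VAR list would need to change accordingly.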

SAS Scenario Generation

Can someone please help with the scenario below? I am very new to SAS and am not sure how to get this to work.
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code but I am not sure it's completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
Have a look in the SAS help for the topics "rannor", "ranuni", and "generating random numbers".
rannor: generates standard normal random variables.
ranuni: generates uniformly distributed random variables.
The argument to rannor is the seed, not the expected value.
If N(x,y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance?), then the code could be as follows. Note the changed order of the statements: Y has to be computed after the random numbers are generated.
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
X1=1+4*Rannor(seed);
X2=3+rannor(seed);
Noise=rannor(seed);
Y=alpha+beta1*X1+beta2*X2+Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor". There are different concepts for dealing with seed numbers in SAS. See the SAS help for these topics.
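If N(1, 4) is instead meant as mean 1 and variance 4, the standard deviation is sqrt(4) = 2. A minimal sketch under that assumption, using the RAND function with CALL STREAMINIT instead of RANNOR (the seed 1234 is arbitrary):

```sas
data calc;
call streaminit(1234);
alpha = 1; beta1 = 2; beta2 = -1.5;
do i = 1 to 200;
* RAND('NORMAL', mean, sd) takes the standard deviation, not the variance;
x1 = rand('NORMAL', 1, sqrt(4));
x2 = rand('NORMAL', 3, 1);
noise = rand('NORMAL', 0, 1);
y = alpha + beta1*x1 + beta2*x2 + noise;
output;
end;
run;
```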

In PROC LOGISTIC which value of the parameter is modelled?

My colleague and I are running exactly the same SAS PROC LOGISTIC, but with different input files.
SAS models ooX = 1 when I do it, and ooX = 0 when he does it.
We've checked record counts and FREQ counts for the main variables. They are the same.
Type 3 analysis of effects are the same. MLE estimates are the same, except for the intercept.
Does SAS require input to be sorted a certain way?
PROC LOGISTIC data = TTTT;
class ooX Y1 Y2 Y3 Y4;
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
If your data are not sorted you can specify the order of your outcome variable right after calling PROC LOGISTIC.
I don't have the data, but assuming that ooX is a binary outcome variable with levels 0 and 1, the model will default to modeling ooX = 0 unless you specify that you want it in descending order.
PROC LOGISTIC data = TTTT descending; /* will model ooX = 1 */
class ooX Y1 Y2 Y3 Y4; /* Not sure if it makes sense to have your outcome in the class statement */
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
As explained in the SAS manual (http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect030.htm):
For binary response data with event and nonevent categories, if your event category has a higher Ordered Value, then by default the nonevent is modeled.
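Another way to remove the ambiguity, sketched here on the assumption that ooX is coded 0/1, is to name the event level explicitly with the EVENT= response option. The modeled level then no longer depends on the default ordering. In this sketch ooX is left out of the CLASS statement.

```sas
proc logistic data = TTTT;
class Y1 Y2 Y3 Y4;
/* EVENT= names the response level whose probability is modeled */
model ooX(event='1') = Y1 Y2 Y3 q1 q2 q3;
run;
```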

sas- compare components of a vector without a loop inside proc iml

I'm writing a code in proc iml and I want to run an if statement that evaluates each component of a vector and returns another vector but in one step. Is there any function to do so? Here's the code:
proc iml;
use chap0; read all var{X} into X;
read all var{t} into t;
count=0;/*init count number*/
W=1;
s= exp(X*w)/(1+ exp(X*w));
s1=j(5,1,10);
do step = 1 to 6;
count=count+1;
s= exp(X*w)/(1+ exp(X*w));
if s <0.5 then s1= 0; /**in this part I need to get a vector with 0 and 1**/
if s >0.5 then s1= 1; /*I need to evaluate each component of the vector in this step*/
print s s1;
e = ssq(s - t);
g=2*(s-t)*s`*(1-s);
h=2 * s * (1 - s)` * (s * (1 - s)` + (s - t) * (1 - 2 * s)`);
o=j(1,5,1);
gg=(o*g);
hh=((o*h)*o`);
gi=gg/hh;
w1=w-gi;
s= exp(X*w)/(1+ exp(X*w));
if s <0.5 then s1= 0; /**here again:
in this part I need to get a vector with 0 and 1**/
if s >0.5 then s1= 1;
print s1;
e = ssq(s - t);
e1 = ssq(s1 - t);
w=w1;
print w w1 e e1 count;
end;
Thanks!
It's just as easy as it seems.
proc iml;
s = 1:5;
s1 = (s>3); *this assigns 1 (true) or 0 (false), for each element, based on relation to 3.;
print s1;
quit;
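Applied to the logistic scores in the question, the same idea might look like the sketch below. The x values are made up for illustration and stand in for the column read from chap0.

```sas
proc iml;
x = {-2, -1, 0, 1, 2};          /* hypothetical inputs */
w = 1;
s = exp(x*w) / (1 + exp(x*w));  /* elementwise logistic transform */
s1 = (s > 0.5);                 /* 1 where s exceeds 0.5 and 0 elsewhere */
print s s1;
quit;
```

Note that an element exactly equal to 0.5 gets 0 here; use s >= 0.5 if you want it to map to 1.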