Counting number of different fields in SAS

I have a table in SAS (using WPS Workbench) that looks like this.
ID Band_1 Band_2 Band_3 ... Band_160
1 Y Y N Y
2 N N N N
3 Y N N Y
4 N Y Y Y
..
200 Y N N Y
I want to summarise the table as follows: For each Band, I want a count of the number of Y and N values, with the table transposed (optional).
So each band would be listed down the left, with a Y count and an N count across the top. The bands could equally go across the top; I don't mind.

Array processing is one of several ways to obtain your summary counts.
data have;
do id = 1 to 200;
array band(160) $1;
do _n_ = 1 to dim(band);
band(_n_) = substr('YN', 1+(ranuni(123)<0.4));
end;
output;
end;
run;
data want1(keep=column yes_n no_n);
set have end=last;
array band(160);
array Yes(160) _temporary_ (160*0);
array No(160) _temporary_ (160*0);
* accumulate counts;
do _n_ = 1 to dim(band);
if band(_n_) = 'Y' then Yes(_n_)+1; else
if band(_n_) = 'N' then No(_n_)+1;
end;
* emit counts;
if last then
do _n_ = 1 to dim(band);
column = vname(band(_n_));
yes_n = Yes(_n_);
no_n = No(_n_);
output;
end;
run;
The same 'want' data could be obtained from other techniques that use:
• Hash object
• Transpose / Report
• Transpose / Tabulate
• Transpose / Freq
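As a rough illustration of the Transpose / Freq route, here is a minimal sketch. The dataset names (have_small, long, counts, want2) are made up for the example, and it uses only the four sample rows from the question rather than the full 200 x 160 table.

```sas
* small stand-in for the real table: four ids and four bands;
data have_small;
  input id band1 $ band2 $ band3 $ band4 $;
  datalines;
1 Y Y N Y
2 N N N N
3 Y N N Y
4 N Y Y Y
;

* one row per id and band, with the band name in COLUMN and Y or N in VALUE;
proc transpose data=have_small out=long(rename=(_name_=column col1=value));
  by id;
  var band1-band4;
run;

* count each value within each band;
proc freq data=long noprint;
  tables column*value / out=counts(drop=percent);
run;

* back to one row per band, with the counts in Y_n and N_n;
proc transpose data=counts out=want2(drop=_name_ _label_) suffix=_n;
  by column;
  id value;
  var count;
run;
```

If a band never contains one of the two values, its count comes out as missing rather than 0 in this sketch, so a follow-up step may be needed to fill in zeros.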

How to add a custom fitted line to SAS SGplot Scatter

I have a simple SAS data set I am plotting as a scatter plot, my two questions are:
I am trying to adjust the y-axis without excluding the (0.02,51) data point but I need the y-axis to only show 60 to 160 by 20. When I define this it excludes that specific data point and I don't know how to fix it.
I cannot figure out how to add a custom fitted curve and display the formula. Here is my line: Y=(160.3*x)/(0.0477+x)
Here is my code:
proc sgplot data=work.sas1;
title 'Puromycin Uptake Experiments';
scatter x=x y=y/ markerattrs=(color=black);
xaxis Label='Reactant Concentration X (mg/l)';
yaxis Label='Reaction Velocity Y (mg/s)' values=(60 to 160 by 20);
run;
Can anyone please help?
For the first question, try using OFFSETMIN= to extend the y-axis beyond your values.
For the second, add a new variable, y_hat, with the values of your formula. Plot that and label it appropriately.
data sas1;
x=.02; y=67; output;
x=.02; y=51; output;
x=.06; y=84; output;
x=.06; y=86; output;
x=.11; y=98; output;
x=.11; y=115; output;
x=.22; y=131; output;
x=.22; y=124; output;
x=.56; y=144; output;
x=.56; y=158; output;
x=1.1; y=160; output;
run;
data sas1;
set sas1;
Y_hat=(160.3*x)/(0.0477+x);
run;
proc sgplot data=work.sas1;
title 'Puromycin Uptake Experiments';
scatter x=x y=y/ markerattrs=(color=black);
series x=x y=y_hat / curvelabel="Y=(160.3*x)/(0.0477+x)";
xaxis Label='Reactant Concentration X (mg/l)';
yaxis Label='Reaction Velocity Y (mg/s)' offsetmin=.1 values=(60 to 160 by 20);
run;
y axis
A couple of y-axis options can affect the axis rendering. Consider offsetmin= or a tweaked list in values=.
formula line
There is no formula statement in SGPLOT, so you have to create an auxiliary column for drawing the formula as a series. Sometimes you can align the x's of the data with the x's of the formula. However, when you want a higher density of x's for the formula, stack the scatter and formula data. Don't get hung up on the chunks of missing values or any feelings of wastefulness.
I am not sure where your curve fit comes from, but statistical graphics (the SG in SGPLOT) has many features for fitting data built into it.
* make some example data that looks something like the fit curve;
data have;
do x = 0.03 to 1 by 0.0125;
y = ( 160.3 * x ) / ( 0.0477 + x ) ;
y + round ( 4 * ranuni(123) - 8, 0.0001);
output;
x = x * ( 1 + ranuni(123) );
end;
x = 0.02;
y = 51;
output;
run;
* generate the series data for drawing the fit curve;
* for complicated formula you may want to adjust step during iteration;
data fit;
step = 0.001;
x = 0;
* loop with DO WHILE so that x advances only by step on each pass;
do while (x <= 1);
y = ( 160.3 * x ) / ( 0.0477 + x ) ;
output;
* step = smartly-adjusted-x-increment;
x + step;
end;
keep x y;
rename x=xfit y=yfit;
run;
* stack the scatter data and the curve fit data;
data have_stack_fit;
set have fit;
run;
proc sgplot data=have_stack_fit;
scatter x = x y = y;
series x = xfit y = yfit / legendlabel="( 160.3 * x ) / ( 0.0477 + x )";
yaxis values = (0 60 to 160 by 20) ;
run;
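On where a curve like this could come from: one possibility is a nonlinear least-squares fit. The sketch below assumes PROC NLIN is available in your installation and that the model has the same form as the asker's formula. The dataset name puromycin, the parameter names vmax and km, and the starting values are all assumptions made for the example; the data points are the ones listed earlier in this thread.

```sas
* data points taken from the thread;
data puromycin;
  input x y;
  datalines;
0.02 67
0.02 51
0.06 84
0.06 86
0.11 98
0.11 115
0.22 131
0.22 124
0.56 144
0.56 158
1.1 160
;

* fit y = vmax*x / (km + x), with starting values near the asker's formula;
proc nlin data=puromycin;
  parameters vmax=160 km=0.05;
  model y = (vmax * x) / (km + x);
  output out=fitted predicted=y_hat;
run;
```

The fitted dataset then carries a y_hat column that can be drawn with a series statement in the same way as above. The estimates may not match the asker's coefficients exactly.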

SAS Scenario Generation

Can someone please help with the scenario below? I am very new to SAS and am not sure how to get this to work.
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code, but I am not sure it's completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
Have a look in the SAS help for the topics "rannor", "ranuni", and "generating random numbers".
rannor: generates standard normal distributed RVs.
ranuni: generates uniform distributed RVs.
The argument of rannor is the seed number, not the expected value.
If N(x,y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance?), then the code could be as follows. Note the changed order of the statements: Y has to be defined after the random numbers are generated.
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
X1=1+4*Rannor(seed);
X2=3+rannor(seed);
Noise=rannor(seed);
Y=alpha+beta1*X1+beta2*X2+Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor". There are different concepts to deal with seed numbers in SAS. See the SAS help for these topics.
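If N(1, 4) is instead meant as mean 1 and variance 4, the standard deviation of X1 is 2 rather than 4. A minimal sketch under that assumption, using the rand function with call streaminit instead of rannor (the dataset name calc2 and the seed are arbitrary choices for the example):

```sas
data calc2;
  call streaminit(1234);                /* arbitrary seed, set once */
  alpha = 1;
  beta1 = 2;
  beta2 = -1.5;
  do i = 1 to 200;
    x1 = rand('normal', 1, sqrt(4));    /* assumes the 4 in N(1,4) is the variance */
    x2 = rand('normal', 3, 1);
    noise = rand('normal', 0, 1);
    y = alpha + beta1*x1 + beta2*x2 + noise;
    output;
  end;
run;
```

The third argument of rand('normal', ...) is a standard deviation, which is why the variance is passed through sqrt.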

How to do multiplication between two matrix using IML in SAS

I have a data set named input_data, shown below, imported from Excel.
0.353481635 0.704898683 0.078640917 0.813815803 0.510842666 0.240912872 0.986312218 0.781868961 0.682272971
0.443441526 0.653187181 0.753981865 0.34909803 0.84215961 0.793863082 0.047816942 0.176759112 0.54213244
0.21443281 0.142501578 0.927011587 0.407251043 0.290280445 0.90730524 0.677030212 0.770541244 0.915728969
0.583493041 0.685127614 0.119042255 0.067769934 0.795793907 0.405029459 0.817724346 0.594170688 0.345660875
0.816193304 0.636823417 0.036348358 0.027985453 0.117027493 0.436516667 0.593191955 0.916981676 0.574223091
0.766842249 0.743249552 0.400052263 0.809650253 0.683610082 0.42152573 0.050520292 0.329441952 0.868549022
0.112847881 0.462579082 0.526220066 0.320851313 0.944585551 0.233027402 0.66141107 0.8380858 0.120044416
0.873949265 0.118525986 0.590234323 0.481974796 0.668976582 0.466558592 0.934633956 0.643438048 0.053508922
And I have another data set called p below
data p;
input p;
datalines;
0.12
0.23
0.11
0.49
0.52
0.78
0.8
0.03
0.02
run;
proc transpose data = p out=p2;
run;
What I want to do is matrix manipulation in SAS using IML.
I have some code already, but the final calculation gives an error. Can someone give me a hand?
proc iml;
use input_data;
read all var _num_ into x;
print x;
proc iml;
use p2;
read all var _num_ into k;
print k;
proc iml;
Value1 = k * x;
print Value1;
quit;
You have several problems here.
First off, you have three PROC IML statements. PROC IML only persists values while it's running; once it quits (and a new PROC IML statement ends the previous one), all of the matrices go away. So remove the second and third PROC IML statements.
Second, you need to make sure your matrices are correctly ordered and structured. Matrix multiplication works by the following:
m x n * n x p = m x p
Where both N's must be the same. This is rows x columns, so the left-side matrix must have the same number of columns as the right-side matrix has rows. (This is because each element of each row on the left-side matrix is multiplied by the corresponding element in the column on the right-side matrix and then summed, so if the numbers don't match it's not possible to do.)
So you have 8x9 and 9x1, which you transpose to 1x9. First, don't transpose p; leave it 9x1. Then make sure you have the order right (matrix multiplication is NOT commutative; the order matters). k * x means 9x1 * 8x9, which doesn't work because the inner two numbers, 1 and 8, aren't the same. x * k does work, since that is 8x9 * 9x1 and the two 9s match, giving an 8x1 result.
Final output:
proc iml;
use input_data;
read all var _num_ into x;
print x;
use p;
read all var _num_ into k;
print k;
Value1 = x * k;
print Value1;
quit;
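To make the dimension rule concrete, the following sketch checks conformability before multiplying. It uses placeholder matrices built with the j function rather than the asker's data, so the values 0.5 and 0.1 are purely illustrative.

```sas
proc iml;
  x = j(8, 9, 0.5);   /* placeholder 8x9 matrix, every element 0.5 */
  k = j(9, 1, 0.1);   /* placeholder 9x1 column vector, every element 0.1 */

  /* show the shapes: the columns of x must equal the rows of k */
  print (nrow(x)) (ncol(x)) (nrow(k)) (ncol(k));

  if ncol(x) = nrow(k) then do;
    value1 = x * k;   /* 8x9 times 9x1 gives 8x1 */
    print value1;
  end;
  else print "x and k are not conformable for x * k";
quit;
```

With the real data, the two j calls would be replaced by the use / read statements shown above.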

SAS generate normal Y~N(...)

For my SAS project I have to generate pairs of (X,Y) with a distribution Y ~ N(3 + X + .5X^2, sd = 2). I have looked at all of the SAS documentation for normal() and I see absolutely no way to do this. I have tried many different methods and am very frustrated.
I believe this is an example of what the asker wants to do:
data sample;
do i = 1 to 1000;
x = ranuni(1);
y = rand('normal', 3 + x + 0.5*x**2, 2);
output;
end;
run;
proc summary data = sample;
var x y;
output out = xy_summary;
run;
Joe is already more or less there - I think the only key point that needed addressing was making the mean of each y depend on the corresponding x, rather than using a single fixed mean for all the pairs. So rather than 1000 samples from the same Normal distribution, the above generates 1 sample from each of 1000 different Normal distributions.
I've used a uniform [0,1] distribution for x, but you could use any distribution you like.
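One way to convince yourself that each y really has standard deviation 2 around its own mean is to look at the deviations from the conditional mean. The sketch below regenerates the pairs (here with call streaminit and rand for both draws, which is a variation on the code above, not the same random stream) and summarises those deviations. The dataset name sample_check and the seed are arbitrary.

```sas
data sample_check;
  call streaminit(2024);             /* arbitrary seed */
  do i = 1 to 1000;
    x = rand('uniform');
    mu = 3 + x + 0.5*x**2;           /* conditional mean of y given x */
    y = rand('normal', mu, 2);
    resid = y - mu;                  /* deviation of y from its own mean */
    output;
  end;
run;

* resid should have a mean near 0 and a standard deviation near 2;
proc means data=sample_check mean std;
  var resid;
run;
```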
You generate random numbers in SAS using the rand function. It has all sorts of distributions available; read the documentation to fully understand.
I'm not sure if you can directly use your PDF, but if you're able to use it with a regular normal distribution, you can do that. On top of that, most of the Univariate DFs SAS supports start out with the Uniform distribution and then apply their formula (Discrete or continuous) to that, so that might be the right way to go. That's heading into stat-land which is somewhere I'm averse to going. There isn't a direct way to simply pass a function for X as far as I know, however.
To generate [numsamp] normals with mean M and standard deviation SD:
%let m=0;
%let sd=2;
%let numsamp=100;
data want;
call streaminit(7);
do id = 1 to &numsamp;
y = rand('Normal',&m.,&sd.);
output;
end;
run;
So if I understand what you want right, this might work:
%let m=0;
%let sd=2;
%let numsamp=1000;
data want;
call streaminit(7);
do id = 1 to &numsamp;
x = rand('Normal',&m.,&sd.);
y = 0.5*x**2 + x + 3;
output;
end;
run;
proc means data=want;
var x y;
run;
X has mean 0.5 with SD 1.96 (roughly what you ask for). Y has mean 5 with SD 3.5. If you're asking for Y to have an SD of 2, I'm not sure how to do that.

Compute percentages in a PROC REPORT

PRODUCT          CODE  Quantity
A                1     100
A                2     150
A                3     50
total product A        300
B                1     10
B                2     15
B                3     5
total product B        30
I made a proc report and the break after product gives me the total quantity for each product. How can I compute an extra column on the right to calculate the percent quantity of product based on the subtotal?
SAS has a good example of this in their documentation. I reproduce a portion of it below with some additional comments. See the documentation for the initial datasets and formats (or create basic ones yourself).
proc report data=test nowd split="~" style(header)=[vjust=b];
format question $myques. answer myyn.;
column question answer,(n pct cum) all;
/* Since n/pct/cum are nested under answer, they are columns 2,3,4 and 5,6,7 */
/* and must be referred to as _c2_ _c3_ etc. rather than by name */
/* in the OP example this may not be the case, if you have no across nesting */
define question / group "Question";
define answer / across "Answer";
define pct / computed "Column~Percent" f=percent8.2;
define cum / computed "Cumulative~Column~Percent" f=percent8.2;
define all / computed "Total number~of answers";
/* Sum total number of ANSWER=0 and ANSWER=1 */
/* Here, _c2_ refers to the 2nd column; den0 and den1 store the sums for those. */
/* compute before would be compute before <variable> if there is a variable to group by */
compute before;
den0 = _c2_;
den1 = _c5_;
endcomp;
/* Calculate percentage */
/* Here you divide the value by its denominator from before */
compute pct;
_c3_ = _c2_ / den0;
_c6_ = _c5_ / den1;
endcomp;
/* This produces a summary total */
compute all;
all = _c2_ + _c5_;
/* Calculate cumulative percent */
temp0 + _c3_;
_c4_ = temp0;
temp1 + _c6_;
_c7_ = temp1;
endcomp;
run;
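For the simpler layout in the question, with no across variable, the same idea can be sketched with named columns instead of _cN_ references. This is an untested sketch rather than the documented example: the dataset name sales, the computed column name pct, and the temporary variable prod_total are all made up for the illustration.

```sas
data sales;
  input product $ code quantity;
  datalines;
A 1 100
A 2 150
A 3 50
B 1 10
B 2 15
B 3 5
;

proc report data=sales nowd;
  column product code quantity pct;
  define product  / order "Product";
  define code     / display "Code";
  define quantity / analysis sum "Quantity";
  define pct      / computed "Percent of product" format=percent8.1;

  /* at the start of each product, quantity.sum holds that product's subtotal */
  compute before product;
    prod_total = quantity.sum;
  endcomp;

  /* divide each row's quantity by the stored subtotal */
  compute pct;
    if prod_total > 0 then pct = quantity.sum / prod_total;
  endcomp;

  break after product / summarize;
run;
```

The temporary variable prod_total is not in the COLUMN statement, so it keeps its value from row to row until the next product starts. On the summary line the percentage should come out as 100%.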