PRODUCT          CODE  Quantity
A                1     100
A                2     150
A                3     50
total product A        300
B                1     10
B                2     15
B                3     5
total product B        30
I built a PROC REPORT, and the break after product gives me the total quantity for each product. How can I compute an extra column on the right that shows each row's quantity as a percentage of its product subtotal?
SAS has a good example of this in their documentation, here. I reproduce a portion of this with some additional comments below. See the link for the initial datasets and formats (or create basic ones yourself).
proc report data=test nowd split="~" style(header)=[vjust=b];
format question $myques. answer myyn.;
column question answer,(n pct cum) all;
/* Since n/pct/cum are nested under answer, they are columns 2,3,4 and 5,6,7 */
/* and must be referred to as _c2_ _c3_ etc. rather than by name */
/* in the OP example this may not be the case, if you have no across nesting */
define question / group "Question";
define answer / across "Answer";
define pct / computed "Column~Percent" f=percent8.2;
define cum / computed "Cumulative~Column~Percent" f=percent8.2;
define all / computed "Total number~of answers";
/* Sum total number of ANSWER=0 and ANSWER=1 */
/* Here, _c2_ refers to the 2nd column; den0 and den1 store the sums for those. */
/* compute before would be compute before <variable> if there is a variable to group by */
compute before;
den0 = _c2_;
den1 = _c5_;
endcomp;
/* Calculate percentage */
/* Here you divide the value by its denominator from before */
compute pct;
_c3_ = _c2_ / den0;
_c6_ = _c5_ / den1;
endcomp;
/* This produces a summary total */
compute all;
all = _c2_ + _c5_;
/* Calculate cumulative percent */
temp0 + _c3_;
_c4_ = temp0;
temp1 + _c6_;
_c7_ = temp1;
endcomp;
run;
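For the layout in the question, which has no ACROSS variable, the columns can be referenced by name rather than by _c#_. A minimal sketch, assuming a dataset named have with variables product, code and quantity (the dataset name, pct and den are names made up here for illustration):

```sas
/* Hypothetical input matching the table in the question */
data have;
  input product $ code quantity;
  datalines;
A 1 100
A 2 150
A 3 50
B 1 10
B 2 15
B 3 5
;
run;

proc report data=have nowd;
  column product code quantity pct;
  define product  / order "Product";
  define code     / order "Code";
  define quantity / analysis sum "Quantity";
  define pct      / computed "Percent of product total" format=percent8.1;

  /* At the break before each product, quantity.sum holds that product's subtotal */
  compute before product;
    den = quantity.sum;
  endcomp;

  /* Divide each row's quantity by the subtotal stored above */
  compute pct;
    pct = quantity.sum / den;
  endcomp;

  /* Subtotal line per product; pct evaluates to 100% on that line */
  break after product / summarize;
run;
```

den is a temporary variable, so it keeps its value from the COMPUTE BEFORE block until the next product starts.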
How can I get the odds ratio and 95% confidence interval from a mixed-effects logistic regression in SAS?
I am aware that the odds ratio can be derived by exponentiating the obtained estimate.
I saw this link for getting the odds ratio in R, but I need it in SAS.
A sample dataset and code:
data herd;
call streaminit(1);
do herd = 1 to 10;
do testyear = 2005, 2015;
do Time = 1 to 6;
eta = -1 + 0.1*herd + 0.5*Time - 2*(testyear=2015);
mu = logistic(eta);
mpd = rand("Bernoulli", mu);
output;
end;
end;
end;
run;
proc GLIMMIX data = herd;
class testyear TIME;
model MPD = TESTYEAR TIME / s dist=binary;
RANDOM HERD;
RUN;
Appreciate any advice.
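One option that may cover this is the ODDSRATIO option on the PROC GLIMMIX MODEL statement, which requests odds ratio estimates together with confidence limits for logit-link models. A sketch against the sample data above; treating herd as a CLASS variable with a random intercept and modeling event='1' are assumptions about the intended model, not something the question states:

```sas
proc glimmix data=herd;
  class herd testyear time;
  /* event='1' models the probability that mpd = 1 (assumed intent) */
  /* oddsratio requests odds ratios comparing levels of each CLASS effect */
  model mpd(event='1') = testyear time / solution dist=binary link=logit oddsratio;
  /* random intercept for each herd (assumed intent of RANDOM HERD) */
  random intercept / subject=herd;
run;
```

The confidence level follows the ALPHA= option of the MODEL statement, which defaults to 0.05.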
For the following data, I want to create several new variables that combine Australia and Canada with different weights. In total, I would like to examine 10 different weight combinations.
Is there a way to do this where I can use the one formula and just change the weight values?
For example, rather than calculate Weight_1 to Weight_etc, can I just list the weights I want and then create the variables based on this list?
data Weighted_returns; set returns;
Weight_1 = (Australia*0.6)+(Canada*0.4);
Weight_2 = (Australia*0.5)+(Canada*0.5);
Weight_3 = (Australia*0.4)+(Canada*0.6);
run;
DATA Step does not have any sort of vector math syntax. You can use one array to arrange and reference the target variables and another to hold the weights.
Your result variables weight* would be a little conflicting with an array of weights, so I named the result variables result*
data have;
input australia canada;
datalines;
0.07 0.08
0.02 -0.001
0.05 0.01
run;
data want;
set have;
array results result_1-result_3;
array weights (3) _temporary_ (0.6 0.5 0.4);
do _n_ = 1 to dim(results);
results(_n_) = australia * weights(_n_) + canada * (1 - weights(_n_));
end;
run;
Use two weight arrays if the weights applied to the two variables do not sum to 1.
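When the weights do not sum to 1, the same loop can read from two arrays. A sketch reusing the have dataset above; the array names w_aus and w_can and their values are made up for illustration:

```sas
data want2;
  set have;
  array results result_1-result_3;
  /* Illustrative weights only; each pair need not sum to 1 */
  array w_aus (3) _temporary_ (0.6 0.5 0.4);
  array w_can (3) _temporary_ (0.5 0.5 0.7);
  do _n_ = 1 to dim(results);
    results(_n_) = australia * w_aus(_n_) + canada * w_can(_n_);
  end;
run;
```

To examine 10 combinations, extend the result list to result_1-result_10 and give each weight array 10 elements.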
I have a table in SAS (using WPS Workbench) that looks like this.
ID Band_1 Band_2 Band_3 ... Band_160
1 Y Y N Y
2 N N N N
3 Y N N Y
4 N Y Y Y
..
200 Y N N Y
I want to summarise the table as follows: For each Band, I want a count of the number of Y and N values, with the table transposed (optional).
So down the left will consist of each band, and across the top will be a Y count and an N count. Or the bands can be across the top I don't mind.
Array processing is one (of several) ways to obtain your summary counts.
data have;
do id = 1 to 200;
array band(160) $1;
do _n_ = 1 to dim(band);
band(_n_) = substr('YN', 1+(ranuni(123)<0.4));
end;
output;
end;
run;
data want1(keep=column yes_n no_n);
set have end=last;
array band(160);
array Yes(160) _temporary_ (160*0);
array No(160) _temporary_ (160*0);
* accumulate counts;
do _n_ = 1 to dim(band);
if band(_n_) = 'Y' then Yes(_n_)+1; else
if band(_n_) = 'N' then No(_n_)+1;
end;
* emit counts;
if last then
do _n_ = 1 to dim(band);
column = vname(band(_n_));
yes_n = Yes(_n_);
no_n = No(_n_);
output;
end;
run;
The same 'want' data could be obtained with other techniques that use:
• Hash object
• Transpose / Report
• Transpose / Tabulate
• Transpose / Freq
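As an illustration of the Transpose / Freq route, here is a sketch that assumes the have dataset generated above (sorted by id, with variables band1-band160). The names long, counts and value are made up for this example, and the result is in long form (one row per band and value) rather than the yes_n / no_n layout:

```sas
/* One row per id and band; NAME= stores the former variable name in COLUMN */
proc transpose data=have out=long(rename=(col1=value)) name=column;
  by id;
  var band:;
run;

/* Count the Y and N values within each band */
proc freq data=long noprint;
  tables column*value / out=counts(drop=percent);
run;
```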
Can someone please help with the scenario below? I am very new to SAS and am not sure how to get this to work.
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code, but I'm not sure it's completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
Have a look in the SAS help for the topics "rannor", "ranuni", and "generating random numbers".
rannor: generates standard normal random variates.
ranuni: generates uniformly distributed random variates.
The argument to rannor is the seed, not the expected value.
If N(x,y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance?), then the code could look like the following. Note the changed order of the statements: Y has to be computed after the random numbers are generated. Noise is also drawn with rannor rather than ranuni, since it is N(0,1).
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
X1=1+4*Rannor(seed);
X2=3+rannor(seed);
Noise=rannor(seed);
Y=alpha+beta1*X1+beta2*X2+Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor", and SAS has different ways of handling seed values. See the SAS help for these topics.
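The same simulation can also be written with the RAND function and CALL STREAMINIT. A sketch, assuming that the second parameter of N(1,4) is the variance, so the standard deviation passed to RAND is 2; the dataset name calc2 and the seed are arbitrary:

```sas
data calc2;
  call streaminit(1234);            /* seed for the RAND stream */
  alpha = 1; beta1 = 2; beta2 = -1.5;
  do i = 1 to 200;
    x1    = rand('Normal', 1, 2);   /* assumes N(1,4) states the variance, so sd = 2 */
    x2    = rand('Normal', 3, 1);
    noise = rand('Normal', 0, 1);
    y     = alpha + beta1*x1 + beta2*x2 + noise;
    output;
  end;
run;
```

If 4 is meant as the standard deviation instead, change the third argument for x1 to 4.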
I am trying to create a table like this:
Here's my code which is not working:
proc tabulate data=temp out = t1;
class age gender ethnic height TRT TREATGR;
table ethnic * (N) gender * (N) age * (n mean median min max) height * (n mean median min max),
TREATGR*TRT*N;
run;
Here's the log:
127 proc tabulate data=temp out = t1;
128 class age gender ethnic height TRT TREATGR;
129 table ethnic * (N) gender * (N) age * (n mean median min max) height * (n mean median min
129! max),
130 TREATGR*TRT*N;
131 run;
ERROR: There are multiple statistics associated with a single table cell in the following nesting :
ETHNIC * N * TREATGR * TRT * N.
ERROR: There are multiple statistics associated with a single table cell in the following nesting :
GENDER * N * TREATGR * TRT * N.
ERROR: There are multiple statistics associated with a single table cell in the following nesting :
AGE * N * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting : AGE
* Mean * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting : AGE
* Median * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting : AGE
* Min * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting : AGE
* Max * TREATGR * TRT * N.
ERROR: There are multiple statistics associated with a single table cell in the following nesting :
HEIGHT * N * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting :
HEIGHT * Mean * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting :
HEIGHT * Median * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting :
HEIGHT * Min * TREATGR * TRT * N.
ERROR: Statistic other than N was requested without analysis variable in the following nesting :
HEIGHT * Max * TREATGR * TRT * N.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.T1 may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.T1 was not replaced because this step was stopped.
NOTE: PROCEDURE TABULATE used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
But this works
proc tabulate data=temp out = t1;
class age gender ethnic height TRT TREATGR;
table ethnic gender age height ,
TREATGR*TRT*N;
run;
But it lists all the ages and heights.
CLASS variables are only used for 'cuts' of the data, i.e., what defines the rows/columns. If you want mean/median/etc., i.e., the contents of the 'middle' of the table, then you have two choices:
1. Use n or pctn (or similar). Then you get effectively a 'dummy' variable that is just 1 for every row to add.
2. Add a var variable, which is an analysis variable and can be used for mean/median/etc.
Class variables can also be analysis variables, but they have to be declared as such (and it often doesn't do exactly what you want due to the interaction between class and analysis variables).
In your case, age and height are clearly not intended to be classification variables; they're analysis variables. You're not getting counts for each unique value, but summary statistics.
To your larger problem, you are missing something fundamental about PROC TABULATE tables that's too long to go into here; go read some tutorials. At minimum, you're confused about how rows, columns, and interactions work; all those * lead to a very different table than you're looking for. Space separates things concatenated together on the same axis, while comma separates rows from columns, and asterisk nests inside a dimension. So leaving aside the other issues, you need something like
table (ethnic gender age height)*(n pctn),treatgr;
Order is (table/page),(row),(col).
To get the mean/median, I don't think you can do exactly that; but if you could, it would be something like
table (age*mean age*median age*n age*min age*max),treatgr;
An example of a table not too far from yours:
proc tabulate data=sashelp.class;
var height weight;
class sex age;
table age,sex*(n colpctn);
table (height*n height*mean height*median height*min height*max)
(weight*n weight*mean weight*median weight*min weight*max),sex;
run;
That's not perfect, and I suspect it's not possible to do exactly what you want in one TABULATE table (or two like above); you'd have to use PROC REPORT likely to get it to look exactly like that.