In SAS PROC REPORT, a computed column is calculated row by row.
This applies to the summary lines too, but that is not always what you want.
As an example, take this study of the Body Mass Index in SASHELP.CLASS:
title Study Body Mass Index (BMI) by sex in class;
title2 Erroneously calculate average BMI from the average weight and height;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then name = 'mean';
endcomp;
break after sex /summarize;
run;
This is wrong because the BMI on the summary line (labelled 'mean') is not the mean of the BMIs above it; it is calculated from the mean height and mean weight to its left.
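The difference can be checked outside PROC REPORT. The following is a minimal sketch, assuming the same WHERE filter and conversion factors as the report above; the column names mean_of_bmi and bmi_of_means are made up for illustration.

```sas
/* Compare the mean of the per-student BMIs with the BMI computed
   from the mean weight and mean height, per sex.
   mean_of_bmi and bmi_of_means are illustrative names only. */
proc sql;
  select sex,
         /* what the summary line should show */
         mean((weight * 0.45359237) / ((height * 0.0254) ** 2))
           as mean_of_bmi format=6.2,
         /* what the row-by-row computation shows on the summary line */
         mean(weight * 0.45359237) / (mean(height * 0.0254) ** 2)
           as bmi_of_means format=6.2
    from sashelp.class
    where name contains 'J'
    group by sex;
quit;
```

The two columns generally differ because a mean of ratios is not the ratio of the means.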
This is a correct calculation, summing the BMIs and counting the students manually.
title2 manually : summing BMI and counting students;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'weight*(kg)';
define m / computed format = 6.2 'height*(meter)';
define BMI / computed format = 6.2 'body mass*(kg/m²)';
* initialize the sum and counter *;
compute before sex;
sumBMI = 0;
count = 0;
endcomp;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
if name eq '' then do;
name = 'mean';
* use the sum and counter *;
BMI = sumBMI / count;
end;
else do;
BMI = kg/m/m;
* increase the sum and counter *;
sumBMI = sumBMI + BMI;
count = count + 1;
end;
endcomp;
break after sex /summarize;
run;
Is there a way to let proc report itself do the averaging correctly?
You might want to run an analysis on a computed column, but a column can only be defined as an analysis column if it exists in the input data set. A workaround:
1. Create an alias column of an existing data set column.
2. Redo the BMI computation for the alias column.
3. In the summary line, apply the alias column mean to the BMI column.
In this example the column alias weight=bmiX is used.
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg weight=bmiX BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean ;
define weight / analysis mean ;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
* define bmiX / noprint;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then do;
name = 'mean';
BMI = bmiX;
end;
endcomp;
compute bmiX;
bmiX = kg/m/m;
endcomp;
break after sex /summarize;
run;
In my PROC TABULATE output, the class headings are above the class levels. Is there a way to move the class headings into a column of their own that sits next to the class levels? In the desired output in the image, the class heading 'Education' is in its own cell next to the class levels. How can I accomplish this?
[Image: Class Headings example]
PROC FORMAT;
PICTURE PCTF (ROUND) OTHER='009.9%';
RUN;
ODS HTML PATH="%SYSFUNC(GETOPTION(WORK) )" STYLE=JOURNAL1A;
TITLE "Question 21x";
PROC TABULATE DATA = 208s;
CLASS EDUC
AREA
AGE
SEX
CENRACE
POVERTY
EDUC
INSURE
HEALTH
Q21x;
CLASSLEV EDUC AREA AGE SEX CENRACE POVERTY EDUC INSURE HEALTH Q21x ;
TABLE AREA = 'Area in Region' * (ROWPCTN=' '*f=PCTF.)
AGE = 'Age' * (ROWPCTN=' '*f=PCTF.)
SEX * (ROWPCTN=' '*f=PCTF.)
CENRACE = 'Race' * (ROWPCTN=' '*f=PCTF.)
POVERTY = 'Poverty Status' * (ROWPCTN=' '*f=PCTF.)
EDUC * (ROWPCTN=' '*f=PCTF.)
INSURE * (ROWPCTN=' '*f=PCTF.)
HEALTH * (ROWPCTN=' '*f=PCTF.) , Q21x = ' ';
RUN;
You can transpose the existing data into a categorical form, which gives you greater control over the row dimension layout. Move ROWPCTN into the column dimension to eliminate the blank column (in the row header) that would otherwise appear if ROWPCTN were in the row dimension. Use NOCELLMERGE to prevent merged cells in the first data row.
For example, start with:
data have;
do personid = 1 to 1000;
area = cats('area_',0 + floor(5 * ranuni(123)));
age = cats('age_',13 + floor(7 * ranuni(123)));
sex = cats('sex_',1 + floor( 2 * ranuni(123)));
q21x = byte(65+(5*ranuni(123)));
output;
end;
label area = 'Area Label';
run;
proc tabulate data=have;
class area age sex q21x;
table
( area age sex ) * (rowpctn=' '), q21x
/ nocellmerge;
run;
And the transposed data version:
proc transpose data=have out=have_for_table;
by personid q21x notsorted;
var area age sex;
run;
proc tabulate data=have_for_table missing;
class _name_ _label_ col1 q21x;
table
_name_='' * _label_='' * col1=''
,
q21x * (rowpctn='')
/
nocellmerge
;
run;
I have this dataset (a made-up example, but with the same structure):
data have;
infile datalines delimiter=',';
length country city measure $50.;
input country $ city $ level measure $ mdate total;
informat mdate date9.;
format mdate date9.;
datalines;
England,London,1,Red doors opened,24MAR2014,4
England,London,1,Green doors opened,24MAR2014,6
England,London,2,Doors closed,24MAR2014,7
England,London,1,Red doors opened,25MAR2014,5
England,London,1,Blue doors opened,25MAR2014,4
England,London,1,Green doors opened,25MAR2014,3
England,London,2,Doors closed,25MAR2014,6
England,Manchester,1,Red doors opened,24MAR2014,3
England,Manchester,2,Doors closed,24MAR2014,1
England,Manchester,2,Doors closed,25MAR2014,4
Scotland,Glasgow,1,Red doors opened,24MAR2014,4
Scotland,Glasgow,1,Red doors opened,25MAR2014,3
Scotland,Glasgow,1,Green doors opened,25MAR2014,2
Scotland,Glasgow,2,Doors closed,25MAR2014,4
;;;;
run;
I want to output the 'doors opened' per country/city per day, then subtotal the doors opened, then output the doors closed, then subtract the doors opened from the doors closed to give a 'balance' (per country/city). At the end of each country, I want one line summing the balance (per day) for each country.
So the above would give something like:
Country + City + Measure + 24MAR2014 + 25MAR2014
---------+------------+--------------------+-----------+----------
England + London + Red doors opened + 4 + 5
+ + Green doors opened + 6 + 3
+ + Blue doors opened + . + 4
+ + TOTAL DOORS OPENED + 10 + 12
+ + Doors closed + 7 + 6
+ + BALANCE + -3 + -6
+ Manchester + Red doors opened + 3 + .
+ + TOTAL DOORS OPENED + 3 + .
+ + Doors closed + 1 + 4
+ + BALANCE + -2 + 4
+ ALL + BALANCE + -5 + -2
Scotland + Glasgow + Red doors opened + 4 + 3
+ + Green doors opened + . + 2
+ + TOTAL DOORS OPENED + 4 + 5
+ + Doors closed + . + 4
+ + BALANCE + -4 + -1
+ ALL + BALANCE + -4 + -1
I've deliberately left it so not every measure appears for each instance and the Doors Closed total is sometimes missing. The rows in CAPS are those I want to add with PROC REPORT, i.e. not in the original data.
I've got the basic layout using PROC REPORT, but don't really have an idea where to go to start inserting subtotals on demand. I've added a 'level' variable, to try and give me something to order/group on.
I need one country per output page and the rows kept in that order per grouping, i.e. XXX Doors Opened, TOTAL DOORS OPENED, Doors Closed, BALANCE, so I think maybe the extra columns are needed.
So far, this is what I have done:
proc report data=have out=proc;
by country;
columns city level measure mdate,total;
define city / group;
define level / group noprint;
define measure / group;
define mdate / across;
define total / analysis sum;
compute before level;
endcomp;
compute after level;
if level = 2 and break = '_level_' then do;
measure = 'TOTAL DOORS OPENED';
end;
endcomp;
run;
I know I should be able to do something using the level variable, so I've added some compute blocks before and after it and examined the output dataset. I've tried to add a value of 'TOTAL DOORS OPENED', but this isn't working.
To be honest, I've only just started using PROC REPORT, so this is a bit out of my comfort zone.
Thanks for any help. Please let me know if the question isn't clear.
Sometimes (often for my field of work) it is better to regard PROC REPORT as a fancy PROC PRINT and make your calculations in the dataset.
I would add a variable such as TYPE that marks whether the entry is about opened or closed doors, then calculate the sums by country/city/level/type/day. I would also duplicate all observations with level=3 (meaning BALANCE in your table), negate the totals where TYPE=open so that the sum gives closed minus opened as in your table, and calculate the sums by country/city/day. Then stack all the results together in one dataset with proper keys and transpose with ID=day. PROC REPORT can take it from there. Do not trust COMPUTE blocks too much; they are often useful but hell to debug. Just make a dataset that looks like your desired table and throw it at PROC REPORT.
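One way this could look is sketched below. It is only an illustration under some assumptions: it reuses the existing level variable (1 = opened, 2 = closed) instead of a new TYPE variable, it lets PROC REPORT spread the days with ACROSS instead of transposing first, and the names report_rows, cityorder and roworder are made up for the example.

```sas
/* Build every row of the finished table, including the added rows.
   report_rows, cityorder and roworder are hypothetical names. */
proc sql;
  create table report_rows as
  /* detail rows as supplied; level doubles as the row order */
  select country, 0 as cityorder, city, level as roworder,
         measure, mdate, total
    from have
  union all
  /* TOTAL DOORS OPENED per city and day, between level 1 and level 2 */
  select country, 0, city, 1.5, 'TOTAL DOORS OPENED', mdate, sum(total)
    from have
    where level = 1
    group by country, city, mdate
  union all
  /* BALANCE per city and day: closed minus opened */
  select country, 0, city, 3, 'BALANCE', mdate,
         sum(case when level = 2 then total else -total end)
    from have
    group by country, city, mdate
  union all
  /* BALANCE per country and day, shown under the pseudo-city ALL */
  select country, 1, 'ALL', 3, 'BALANCE', mdate,
         sum(case when level = 2 then total else -total end)
    from have
    group by country, mdate
  order by country, cityorder, city, roworder;
quit;

/* PROC REPORT only lays out the prepared rows; no COMPUTE blocks needed. */
proc report data=report_rows nowindows;
  by country;
  columns cityorder city roworder measure mdate,total;
  define cityorder / group noprint;   /* keeps ALL after the real cities */
  define city      / group;
  define roworder  / group noprint;   /* opened, subtotal, closed, balance */
  define measure   / group;
  define mdate     / across order=internal;
  define total     / analysis sum ' ';
run;
```

Within one roworder value the measures appear in PROC REPORT's default group order, so the individual 'doors opened' rows may not come out in the same order as in the input data.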
I have the following sample data, which I'm creating a crosstab for:
data have1;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout groups=2;
var stake;
ranks stakeRank;
run;
PROC TABULATE DATA=ranksout NOSEPS;
VAR stake;
class stakerank;
TABLE stakerank, stake*N;
TABLE stakerank, stake*(N Mean Skewness);
RUN;
I want to replicate what I'm doing in PROC TABULATE in PROC REPORT as I need to add p-values for a Difference in Means test and a few other things. However, it seems that Skewness is not a built-in function in Proc Report. How can I calculate this?
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake, (n mean);
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS '';
define n/format=8. ;
RUN;
Thanks for any help at all on this
It can be done as follows.
Add an extra intermediate variable to the ranksout1 table:
proc sql;
create table withCubedDeviations as
select *,
((stake - (select avg(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank))/(select std(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank)) **3 format=8.2 as cubeddeviations
from ranksout1 main;
quit;
PROC REPORT DATA=withCubedDeviations NOWINDOWS out=report;
COLUMN stakerank winnerrank, ( N stake=avg cubeddeviations skewness);
DEFINE stakerank / GROUP ORDER=INTERNAL '';
DEFINE winnerrank / ACROSS ORDER=INTERNAL '';
DEFINE cubeddeviations / analysis 'SumCD' noprint;
DEFINE N / 'Bettors';
DEFINE avg / analysis mean 'Avg' format=8.2;
DEFINE skewness / computed format=8.2 'Skewness';
COMPUTE skewness;
_C5_ = _C4_ * (_C2_ / ((_C2_ -1) * (_C2_ - 2)));
_C9_ = _C8_ * (_C6_ / ((_C6_ -1) * (_C6_ - 2)));
ENDCOMP;
RUN;
Why didn't they just add Skewness to the list of statistics that are allowed in a PROC REPORT?
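The COMPUTE block above multiplies the sum of the cubed standardized deviations by n / ((n - 1) * (n - 2)), which is the sample skewness formula. A quick way to sanity-check that arithmetic is to compare it with the SKEWNESS statistic of PROC MEANS. The sketch below does this on SASHELP.CLASS rather than on the betting data so that it runs on its own; it relies on PROC SQL remerging the group statistics instead of the correlated subqueries used above, and the names cubed, cubeddeviation and skew_manual are made up for illustration.

```sas
/* Step 1: cubed standardized deviation of height within each sex.
   PROC SQL remerges the group mean and std back onto each row. */
proc sql;
  create table cubed as
  select sex,
         ((height - mean(height)) / std(height)) ** 3 as cubeddeviation
    from sashelp.class
    group by sex;

  /* Step 2: apply n / ((n-1)(n-2)) to the sum of cubed deviations. */
  select sex,
         count(*) as n,
         sum(cubeddeviation) * count(*)
           / ((count(*) - 1) * (count(*) - 2)) as skew_manual format=8.4
    from cubed
    group by sex;
quit;

/* Step 3: the same statistic computed directly by PROC MEANS. */
proc means data=sashelp.class n skewness maxdec=4;
  class sex;
  var height;
run;
```

Note that the factor is undefined for groups with fewer than three observations, so small cells in the report will show missing values or trigger division-by-zero notes.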
I have a PROC REPORT output and want to add an asterisk based on the value of the cell being less than 1.96. I don't want colours, just an asterisk after the number. Can this be done with a format, or do I need an 'IF/ELSE' clause in the COMPUTE block?
data have1;
input username $ betdate : datetime. stake winnings winner;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 0
player1 04NOV2008:09:03:44 100 40 1
player2 07NOV2008:14:03:33 120 -120 0
player1 05NOV2008:09:00:00 50 15 1
player1 05NOV2008:09:05:00 30 5 1
player1 05NOV2008:09:00:05 20 10 1
player2 09NOV2008:10:05:10 10 -10 0
player2 09NOV2008:10:05:40 15 -15 0
player2 09NOV2008:10:05:45 15 -15 0
player2 09NOV2008:10:05:45 15 45 1
player2 15NOV2008:15:05:33 35 -35 0
player1 15NOV2008:15:05:33 35 15 1
player1 15NOV2008:15:05:33 35 15 1
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout1 groups=2;
var stake winner;
ranks stakeRank winnerRank;
run;
proc sql;
create table withCubedDeviations as
select *,
((stake - (select avg(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank))/(select std(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank)) **3 format=8.2 as cubeddeviations
from ranksout1 main;
quit;
PROC REPORT DATA=withCubedDeviations NOWINDOWS out=report;
COLUMN stakerank winnerrank, ( N stake=avg cubeddeviations skewness);
DEFINE stakerank / GROUP ORDER=INTERNAL '';
DEFINE winnerrank / ACROSS ORDER=INTERNAL '';
DEFINE cubeddeviations / analysis 'SumCD' noprint;
DEFINE N / 'Bettors';
DEFINE avg / analysis mean 'Avg' format=8.2;
DEFINE skewness / computed format=8.2 'Skewness';
COMPUTE skewness;
_C5_ = _C4_ * (_C2_ / ((_C2_ -1) * (_C2_ - 2)));
_C9_ = _C8_ * (_C6_ / ((_C6_ -1) * (_C6_ - 2)));
ENDCOMP;
RUN;
This is just an example, so it won't make statistical sense, but if the value of SKEWNESS is greater than 1 I need to append a single asterisk, two asterisks if it's greater than 5, and three asterisks if it's greater than 10. It would be even better if the asterisks could be in superscript.
I've been testing the following, but to no avail:
PROC FORMAT;
picture onestar . = " " low - high = "9.9999^{super *}";*^{super***};
picture twostar . = " " low - high = "9.9999^{super **}";*^{super***};
picture threestar . = " " low - high = "9.9999^{super ***}";*^{super***};
run;
PROC REPORT DATA=withCubedDeviations NOWINDOWS out=report;
COLUMN stakerank winnerrank, ( N stake=avg cubeddeviations);
DEFINE stakerank / GROUP ORDER=INTERNAL '';
DEFINE winnerrank / ACROSS ORDER=INTERNAL '';
DEFINE cubeddeviations / analysis 'SumCD' noprint;
DEFINE N / 'Bettors';
DEFINE avg / mean 'Avg' format=8.2;
compute avg;
if _C3_ > 1.96 then call define('_C3_','format','onestar.');
endcomp;
RUN;
Thanks for any help.
I think this will do what you need:
proc format;
picture skewaskf
-1 <-<0 = '00009.99' (mult=100 prefix='-')
0-<1 = '00009.99' (mult=100)
1-<5 = '00009.99*'(mult=100)
5-<10= '00009.99**'(mult=100)
10-high='00009.99***'(mult=100);
run;
Extend the ranges further if you need to cover more negative values.
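To see what the picture format produces before wiring it into the report, it can be tried in a DATA step first. This is a small sketch that repeats the format definition so it runs on its own; the test values are arbitrary.

```sas
/* Same picture format as above, repeated so the example is self-contained. */
proc format;
  picture skewaskf
    -1 <-< 0  = '00009.99'    (mult=100 prefix='-')
     0 -<  1  = '00009.99'    (mult=100)
     1 -<  5  = '00009.99*'   (mult=100)
     5 -<  10 = '00009.99**'  (mult=100)
    10 - high = '00009.99***' (mult=100);
run;

/* Write a few arbitrary test values to the log with and without the format. */
data _null_;
  do skewness = 0.5, 1.96, 7.25, 12;
    put skewness= 8.2 +1 skewness skewaskf.;
  end;
run;

/* In the report, the format would then replace 8.2 on the computed column:
   DEFINE skewness / computed format=skewaskf. 'Skewness';                 */
```

Picture formats truncate rather than round unless the (ROUND) option is added after the format name, so that option may be worth considering here.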