How to average a computed column in SAS `proc report`

In SAS proc report, a computed column is calculated row by row.
This applies to the summary lines too, but that is not always what you want.
As an example, take this study of the Body Mass Index in SASHELP.CLASS:
title Study Body Mass Index (BMI) by sex in class;
title2 Erroneously calculate average BMI from the average weight and height;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then name = 'mean';
endcomp;
break after sex /summarize;
run;
This is wrong: the BMI on the summary line (the 'mean' row) is not the mean of the BMIs above it; it is calculated from the mean height and mean weight to its left.
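To see the difference between the two numbers outside of proc report, a minimal sketch is to compute the BMI per student in a data step and average it per sex. The data set name work.bmi_check is made up for this illustration; the conversion factors are the ones used in the report above.

```sas
* Reference calculation: BMI per student first, then the mean per sex ;
data work.bmi_check;
    set sashelp.class;
    where Name contains 'J';
    m   = height * 0.0254;       * inches to meters ;
    kg  = weight * 0.45359237;   * pounds to kilograms ;
    BMI = kg / m / m;
run;

* mean(BMI) is the average of the individual BMIs ;
* mean(kg) / mean(m)**2 is what the summary line above shows instead ;
proc means data=work.bmi_check mean maxdec=2;
    class sex;
    var m kg BMI;
run;
```

The mean of BMI printed here is the value the summary line should show; dividing the mean of kg by the squared mean of m reproduces the erroneous summary value.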
This is a correct calculation, summing the BMIs and counting the students manually:
title2 manually : summing BMI and counting students;
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean noprint;
define weight / analysis mean noprint;
define kg / computed format = 6.2 'weight*(kg)';
define m / computed format = 6.2 'height*(meter)';
define BMI / computed format = 6.2 'body mass*(kg/m²)';
* initialize the sum and counter *;
compute before sex;
sumBMI = 0;
count = 0;
endcomp;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
if name eq '' then do;
name = 'mean';
* use the sum and counter *;
BMI = sumBMI / count;
end;
else do;
BMI = kg/m/m;
* increase the sum and counter *;
sumBMI = sumBMI + BMI;
count = count + 1;
end;
endcomp;
break after sex /summarize;
run;
Is there a way to let proc report itself do the averaging correctly?
You could say I want to do analysis on a computed column, but you can only define a column as an analysis column if it is in the input data set.

Create an alias column of an existing data set column.
Redo the BMI computation for the alias column.
In the summary line, apply the alias column mean to the BMI column.
In this example the column alias weight=bmiX is used.
proc report data=sasHelp.class nowindows headline headskip split='*'
style(summary) = {font_style=italic foreground=blue};
where Name contains 'J'; * reduce the size to facilitate manual calculation ;
columns sex name age height m weight kg weight=bmiX BMI ;
define sex / group ;
define age / analysis mean format = 6.2;
define height / analysis mean ;
define weight / analysis mean ;
define kg / computed format = 6.2 'Weight*(kg)';
define m / computed format = 6.2 'Height*(meter)';
define BMI / computed format = 6.2 'BMI*(kg/m²)';
* define bmiX / noprint;
compute m;
m = height.mean * .02540;
endcomp;
compute kg;
kg = weight.mean * 0.45359237;
endcomp;
compute BMI;
BMI = kg/m/m;
if name eq '' then do;
name = 'mean';
BMI = bmiX;
end;
endcomp;
compute bmiX;
bmiX = kg/m/m;
endcomp;
break after sex /summarize;
run;
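A different route, not the alias technique above, is to take the constraint from the question literally: if BMI has to be in the input data set before it can be an analysis column, compute it in a data step first. The sketch below assumes that an extra step before proc report is acceptable; the data set name work.class_bmi is made up for the illustration.

```sas
* Put the per-student BMI into the input data ;
data work.class_bmi;
    set sashelp.class;
    m   = height * 0.0254;       * inches to meters ;
    kg  = weight * 0.45359237;   * pounds to kilograms ;
    BMI = kg / m / m;
run;

proc report data=work.class_bmi nowindows headline headskip split='*'
    style(summary) = {font_style=italic foreground=blue};
    where Name contains 'J';
    columns sex name age m kg BMI;
    define sex / group;
    define age / analysis mean format = 6.2;
    define m   / analysis mean format = 6.2 'Height*(meter)';
    define kg  / analysis mean format = 6.2 'Weight*(kg)';
    * BMI is now an input variable, so MEAN averages the individual BMIs ;
    define BMI / analysis mean format = 6.2 'BMI*(kg/m²)';
    compute BMI;
        if name eq '' then name = 'mean';
    endcomp;
    break after sex / summarize;
run;
```

With this layout proc report does the averaging itself on the summary line, at the cost of one extra pass over the data.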

Related

Create grid data in SAS using info from another dataset

I need to get a dataset of a uniform grid 20x20 using info from SASHELP.CARS so that x and y variables are obtained as follows:
do y = min(weight) to max(weight) by (min(weight)+max(weight))/20;
do x = min(horsepower) to max(horsepower) by (min(horsepower)+max(horsepower))/20;
output;
end;
end;
Weight and HorsePower are variables of SASHELP.CARS. Furthermore, the grid dataset has to have two more columns, EngineSizeMean and LengthMean, with the same value in each row, equal to mean(EngineSize) and mean(Length) from SASHELP.CARS (I need all this to build a dependency graph for a regression model).
First calculate the statistics you need to use.
proc summary data=sashelp.cars ;
var weight horsepower enginesize length ;
output out=stats
min(weight horsepower)=
max(weight horsepower)=
mean(enginesize length)=
/ autoname
;
run;
Then use those values to generate your "grid".
data want;
set stats;
do y = 1 to 20 ;
weight= weight_min + (y-1)*(weight_min+weight_max)/20;
do x = 1 to 20 ;
horsepower = horsepower_min + (x-1)*(horsepower_min+horsepower_max)/20;
output;
end;
end;
run;
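The step in the code above follows the formula from the question, (min+max)/20. If the intent is instead a 20x20 grid that starts at the minimum and ends exactly at the maximum, which is an assumption about the requirement and not something the question states, a sketch of that variant divides the range by 19. It reuses the stats data set and the AUTONAME variable names from the proc summary step; the output name want_span is made up.

```sas
data want_span;
    set stats;
    do y = 1 to 20;
        * 20 points need 19 equal steps from min to max ;
        weight = weight_min + (y-1)*(weight_max - weight_min)/19;
        do x = 1 to 20;
            horsepower = horsepower_min + (x-1)*(horsepower_max - horsepower_min)/19;
            output;
        end;
    end;
run;
```

As in the original step, enginesize_mean and length_mean are carried along from stats with the same value on every row.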

Proc Report-Assigning a different format for different rows

I have a table that is currently laid out the way I want. The only issue is that when I assigned the format, it was applied to all values. I have a row that should be a total, but I'm unsure how to strip the formatting on this row only in proc report.
Want: the total line to show no decimals, because the values are counts, while the rest of the table keeps the same format.
%let gray=CXBFBFBF;
%let blue=CX13478C;
%let purple=CXDEDDED;
title j=c h=10pt f='Calibri' color=black "Table 1-Distribution, CY2016-CY2018";
options orientation = landscape nonumber nodate leftmargin=0.05in rightmargin=0.05in;
ods noproctitle noresults escapechar='^';
ods rtf file = "path.rtf";
proc report data= work.temp nowd spanrows style(report)={width=100%}
style(header)=[vjust=b font_face = Calibri fontsize=9pt font_weight=bold background=&blue. foreground=white borderrightcolor=black];
/*List variables in order to select order of columns in table*/
col ( m_type
("^S={borderbottomcolor=&blue. vjust=b borderbottomwidth=0.02 }"("^S={borderbottomcolor=&blue. vjust=b borderbottomwidth=0.01 cellheight=0.20in}Age in Years" d_char_desc))
('^S={cellheight=0.20in}Missing Information'
("^S={borderbottomcolor=&blue. borderbottomwidth=0.02 cellheight=0.18in}" percentage16_1)
("^S={borderbottomcolor=&blue. borderbottomwidth=0.02 cellheight=0.18in}" percentage17_1)
("^S={borderbottomcolor=&blue. borderbottomwidth=0.02 cellheight=0.18in}" percentage18_1))
);
define m_type /order=data group noprint style = [vjust=b just=left cellwidth=0.60in font_face='Times New Roman' fontsize=9pt];
define d_char_desc / order=data display style = [vjust=b just=left cellwidth=0.60in font_face='Times New Roman' fontsize=9pt]
'' style(header)=[vjust=b just=left cellheight=0.18in] style(column)=[vjust=b just=left cellheight=0.35in cellwidth=0.60in];
define percentage16_1 /display style = [vjust=b just=center cellwidth=0.60in cellheight=0.05in font_face='Times New Roman' fontsize=9pt]
'CY2016' style(header)=[vjust=b just=center cellheight=0.18in] style(column)=[vjust=b just=center cellheight=0.20in cellwidth=0.40in];
define percentage17_1 /display style = [vjust=b just=center cellwidth=0.45in cellheight=0.05in font_face='Times New Roman' fontsize=9pt]
'CY2017' style(header)=[vjust=b just=center cellheight=0.18in] style(column)=[vjust=b just=center cellheight=0.20in cellwidth=0.40in];
define percentage18_1 /display style = [vjust=b just=center cellwidth=0.45in cellheight=0.05in font_face='Times New Roman' fontsize=9pt]
'CY2018' style(header)=[vjust=b just=center cellheight=0.18in] style(column)=[vjust=b just=center cellheight=0.20in cellwidth=0.40in];
compute m_type;
if m_type = 'm_tot' then
call define (_row_, 'style', "style=[fontweight=bold background=&gray. font_face=Times]");
endcomp;
run;
ods rtf close;
You will have to explicitly format the numeric values in the respective compute blocks. How a numeric value is referenced depends on the analysis statistic; the syntax is variable.statistic.
Your data (not shown) appears to be some form of pre-computed aggregation, judging by the m_type = 'm_tot' test in the source code. In that case the reference would be something like percentage16_1.sum (sum is the default analysis statistic for numeric variables when a grouping is specified).
Example:
Summarize some SASHELP.CARS variables and change the format for the Ford make.
proc report data=sashelp.cars;
column (
make
horsepower mpg_city mpg_highway
horsepower_custom
mpg_city_custom
mpg_highway_custom
);
define make / group ;*noprint;
define horsepower / analysis noprint mean ;
define mpg_city / analysis noprint mean ;
define mpg_highway / analysis noprint mean ;
define horsepower_custom / computed style=[textalign=right];
define mpg_city_custom / computed style=[textalign=right];
define mpg_highway_custom / computed style=[textalign=right];
compute horsepower_custom / character length=10;
if make = 'Ford'
then horsepower_custom = put (horsepower.mean, 10.4);
else horsepower_custom = put (horsepower.mean, 8.1);
endcomp;
compute mpg_city_custom / character length=10;
if make = 'Ford'
then mpg_city_custom = put (mpg_city.mean, 10.5);
else mpg_city_custom = put (mpg_city.mean, 8.2);
endcomp;
compute mpg_highway_custom / character length=10;
if make = 'Ford'
then mpg_highway_custom = put (mpg_highway.mean, 10.6);
else mpg_highway_custom = put (mpg_highway.mean, 8.3);
endcomp;
run;
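A possible alternative to the character columns above, offered as a sketch and not as part of the answer, is to keep the column numeric and override its format for selected rows with CALL DEFINE and the 'format' attribute. The example again uses SASHELP.CARS and the Ford row; the formats 8.2 and 8.0 are arbitrary choices for the illustration.

```sas
proc report data=sashelp.cars;
    column make mpg_city;
    define make     / group;
    define mpg_city / analysis mean format=8.2;
    compute mpg_city;
        * override the column format for this one row only ;
        if make = 'Ford' then call define(_col_, 'format', '8.0');
    endcomp;
run;
```

The cell stays numeric, so the alignment and style of the column are untouched; only the displayed precision changes for the chosen row.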

Merge cells horizontally in RTF output using proc report

I am trying to create a summary row above each group in my data. I have 2 questions:
How do I merge the first 2 cells horizontally (the ones in red below) in the summary rows?
How do I remove the duplicated F and M in the Sex column? (At the moment I can work around this by changing only those cells' text colours to white, but hopefully there's a better way.)
The output is an RTF file, and I'm using SAS 9.4 - the desktop version.
Is this possible using proc report?
Code:
options missing=' ';
proc report data=sashelp.class nowd;
columns sex name age weight;
define sex / order;
break before sex / summarize;
run;
I don't think you can merge cells in the summarize line.
Some trickery with compute blocks and call define can alter the cell values and appearances.
For example (Just J names for smaller image):
proc report data=sashelp.class nowd;
where name =: 'J';
columns sex name age weight;
define sex / order;
define age / sum;
define weight / sum;
break before sex / summarize style=[verticalalign=bottom];
compute name;
* the specification of / order for sex sets up conditions in the name value
* that can be leveraged in the compute block;
if name = ' ' then do;
* a blank name means the current row the compute is acting on
* is the summarization row;
* uncomment if stat is not obvious or stated in title;
* name = 'SUM';
* 'hide' border for appearance of merged cell;
call define (1, 'style', 'style=[fontsize=18pt borderrightcolor=white]');
end;
else do;
* a non-blank name means one of the detail rows is being processed;
* blank out the value in the sex column of the detail rows;
* the value assignment can only be applied to current column or those
* to the left;
sex = ' ';
end;
endcomp;
compute after sex;
* if you want more visual separation add a blank line;
* line ' ';
endcomp;
run;

proc report: proportion of group sum

I have the following proc report
proc report data=sashelp.class;
col
sex
age
weight
;
define sex / group;
define age / group;
define weight / analysis sum;
run;
However I do not want to show the sum of weight. Instead I would like to have the proportion of the grouped sum. So first row should be 6.23%. How can I achieve this?
Now I have found a workaround:
proc sql noprint;
CREATE TABLE class AS
SELECT a.*
,b.sumweight
FROM sashelp.class a
LEFT JOIN (SELECT sex, sum(weight) as sumweight
FROM sashelp.class
GROUP BY sex
) b
ON a.sex=b.sex
;
quit;
proc report data=class;
col
sex
age
weight
sumweight
perc
;
define sex / group;
define age / group;
define weight / analysis sum;
define sumweight / analysis mean noprint;
define perc / computed format=percent6.2;
compute perc;
perc = weight.sum/sumweight.mean;
endcomp;
run;
But maybe there is a solution without additional proc sql step...
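One way this might be done inside proc report alone, sketched here under the assumption that the proportion is wanted within each sex as in the workaround above, is to capture the group total in a COMPUTE BEFORE block. The temporary variable grp_total is a name made up for this sketch; temporary variables in compute blocks keep their value across rows until they are reassigned.

```sas
proc report data=sashelp.class;
    col sex age weight perc;
    define sex    / group;
    define age    / group;
    define weight / analysis sum;
    define perc   / computed format=percent8.2;
    compute before sex;
        * at this break, weight.sum is the total for the current sex ;
        grp_total = weight.sum;
    endcomp;
    compute perc;
        perc = weight.sum / grp_total;
    endcomp;
run;
```

For the first row (sex F, age 11) this should give 50.5 / 811, the 6.23% the question asks for. Adding noprint to the weight definition hides the raw sum if only the proportion is wanted.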

PROC Report, multiple columns with same statistic

I'm using PROC REPORT to generate a report of weighted sums. There are 2 columns that need to be summarized, both with the MEAN statistic. On top of that, I want to output the total weight.
I have 2 issues.
I cannot seem to get the title on each sum to reflect the variable being summed.
I need a different format for each column.
Here is some sample data:
data test;
format lev1-lev3 $3. weight percent10.2 duration 6.2 convexity 6.4;
informat weight percent10.2 duration 6.2 convexity 6.4;
input lev1 lev2 lev3 weight duration convexity;
datalines;
A C H 16.11% 3.21 0.6182
A C I 3.83% 9.06 1.2244
A D J 7.67% 2.21 3.4010
A D K 16.90% 3.98 0.0303
B E L 2.68% 1.88 1.9515
B E M 16.68% 4.36 3.1851
B F N 20.79% 2.64 0.1145
B F O 15.34% 5.55 2.4408
;
run;
I've tried a number of ways to define things in PROC REPORT. Here is one of many:
proc report data=test nowd out=report;
column lev1 lev2 lev3 duration,(SUMWGT MEAN) convexity,(Mean);
weight weight;
define lev1 / group;
define lev2 / group;
define lev3 / group;
define duration / 'Duration' ;
define sumwgt / 'Weight' format=percent10.2;
define mean / '' format=6.2;
define convexity / 'Convexity';
*define mean / 'Convexity' format=6.4;
break before lev1 / summarize ;
break before lev2 / summarize ;
rbreak before / summarize;
run;
My ultimate goal would be something like:
Lev1  Lev2  Lev3   Weight  Duration  Convexity
                  100.00%      3.88     1.3943
A                  44.51%      3.83     0.9267
...
I've also played with PROC TABULATE but I am less of a fan of the tables it presents.
Example TABULATE mess:
PROC TABULATE DATA=WORK.test;
VAR duration convexity;
CLASS LEV1 / ORDER=UNFORMATTED MISSING;
CLASS LEV2 / ORDER=UNFORMATTED MISSING;
CLASS LEV3 / ORDER=UNFORMATTED MISSING;
TABLE
/* Row Dimension */
ALL={LABEL="+"}
LEV1*(
ALL={LABEL="+"}
LEV2*(
ALL={LABEL="+"}
LEV3 ) )
,
/* Column Dimension */
duration={LABEL="Weight"}*SumWgt={LABEL=""}*f=percent10.2
duration={LABEL="Duration"}*Mean={LABEL=""}*f=6.2
convexity={LABEL="Convexity"}*Mean={LABEL=""}*f=6.4;
WEIGHT weight;
RUN;
I think you'll have challenges getting exactly what you want from PROC REPORT. Maybe Cynthia@SAS could figure it out, I don't know, but getting the row headers right in particular will be extremely challenging.
I would suggest pre-processing the means (using PROC MEANS or similar) and then REPORTing that result. Very easy to do.
This may be close to what you want, for example:
proc means data=test;
class lev1 lev2 lev3;
var duration convexity;
weight weight;
types () lev1 lev1*lev2 lev1*lev2*lev3;
output out=test_out
sumwgt(duration)=sumwgt mean(duration)= mean(convexity)=;
run;
proc report data=test_out;
columns lev1-lev3 sumwgt duration convexity;
define lev1/order missing;
define lev2/order missing;
define lev3/order missing;
define sumwgt/display format=percent9.2;
define duration/display format=6.2;
define convexity/display format=6.4;
run;