I am trying to learn SAS, specifically PROC REPORT, using the SASHELP.CARS dataset.
In the 6th column of the output, labelled 'Number of Cars > Mean(Invoice)', I want to compute the number of cars whose Invoice is greater than the group's mean Invoice. I am using the code below.
PROC REPORT DATA=sashelp.CARS NOWD OUT=learning.MyFirstReport;
COLUMNS Type Origin INVOICE=Max_INVOICE INVOICE=Mean_Invoice
INVOICE=Count_Invoice TEST DriveTrain;
DEFINE Type / Group 'Type of Car' CENTER;
DEFINE Origin / Group 'Origin of Car' CENTER;
DEFINE Max_Invoice / ANALYSIS MAX 'Max of Invoice';
DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice';
DEFINE Count_Invoice / ANALYSIS N FORMAT=5.0 'Total Number of Cars' center;
DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
DEFINE TEST / COMPUTED 'Number of Cars > Mean(Invoice)' center;
COMPUTE TEST;
TEST=N(_c7_>Mean_Invoice);
ENDCOMP;
RUN;
The Output that I am getting is in the image below.
I don't think that is the correct output since all the rows in the column show a value of 1. How do I get the desired output in the 6th column of the output?
The non-group columns are defined as ANALYSIS columns, which compute aggregate statistics. One way to count the rows that satisfy a logical condition is to prepare the data with an individual 0/1 flag per row, so that a SUM aggregation of the flag is the count of rows where the condition is true.
Prepare
proc sql;
create view cars_v as
select *
, mean(invoice) as invoice_mean_over_type_origin
, (invoice > calculated invoice_mean_over_type_origin) as flag_50
from sashelp.cars
group by type, origin
;
quit;
Report
PROC REPORT DATA=CARS_V OUT=work.MyFirstReport;
COLUMNS
Type
Origin
INVOICE/*=Max_INVOICE */
INVOICE=INVOICE_use_2/*=Mean_Invoice */
flag_50
flag_50=flag_50_use_2
flag_50_other
DriveTrain
;
DEFINE Type / Group 'Type of Car' CENTER;
DEFINE Origin / Group 'Origin of Car' CENTER;
DEFINE Invoice / ANALYSIS MAX 'Max of Invoice';
DEFINE Invoice_use_2 / ANALYSIS MEAN 'Mean of Invoice';
DEFINE flag_50 / analysis sum 'Number of Cars > Mean ( Invoice )' center;
DEFINE flag_50_use_2 / noprint analysis N ;
* noprint makes a hidden column whose value is available to compute blocks;
DEFINE flag_50_other / computed 'Number of Cars <= Mean ( Invoice )' center;
DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
compute flag_50_other;
flag_50_other = flag_50_use_2 - flag_50.sum;
endcomp;
RUN;
In newer versions of SAS, NOWD is the default, so new PROC REPORT code does not need to specify it explicitly.
Reusing a variable with an alias such as invoice=mean_invoice is OK, but a future reader of the code might be confused by the line DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice'; is the define for the mean, or for the mean of a mean?
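As a rough cross-check of the flag approach, the same counts can be computed directly in PROC SQL. This is only a sketch: the names invoice_mean, n_cars and n_above_mean are made up for illustration, and it assumes the same Type/Origin grouping as the report.

```sas
/* Cross-check: count cars above their group's mean invoice.            */
/* invoice_mean, n_cars and n_above_mean are illustrative names only.   */
proc sql;
  select type, origin,
         count(*)                    as n_cars,
         sum(invoice > invoice_mean) as n_above_mean  /* true = 1, false = 0 */
  from (select type, origin, invoice,
               mean(invoice) as invoice_mean          /* remerged group mean */
        from sashelp.cars
        group by type, origin)
  group by type, origin;
quit;
```

The n_above_mean values should agree with the 'Number of Cars > Mean ( Invoice )' column of the report, since both sum the same 0/1 comparison within each group.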
Related
I want to compute multiple sums on the same column based on some criteria. Here is a small example using the sashelp.cars dataset.
The code below somewhat achieves what I want to do in three (3) different ways, but there is always a small problem.
proc report data=sashelp.cars out=test2;
column make type,invoice type,msrp;
define make / group;
define type / across;
define invoice / analysis sum;
define msrp / analysis sum;
title "Report";
run;
proc print data=test2;
title "Out table for the report";
run;
proc summary data=sashelp.cars nway missing;
class make type;
var invoice msrp;
output out=sumTest(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=sumTest out=test3;
by make;
var invoice msrp;
id type;
run;
proc print data=test3;
title "Table using proc summary followed by proc transpose";
run;
proc sql undo_policy=none;
create table test4 as select
make,
sum(case when type='Sedan' then invoice else 0 end) as SedanInvoice,
sum(case when type='Wagon' then invoice else 0 end) as WagonInvoice,
sum(case when type='SUV' then invoice else 0 end) as SUVInvoice,
sum(case when type='Sedan' then msrp else 0 end) as Sedanmsrp,
sum(case when type='Wagon' then msrp else 0 end) as Wagonmsrp,
sum(case when type='SUV' then msrp else 0 end) as SUVmsrp
from sashelp.cars
group by make;
quit;
run;
proc print data=test4;
title "Table using SQL queries and CASE/WHEN to compute new columns";
run;
Here is the result I get when I run the presented code.
The first two tables represent the result and the out table of the report procedure. The problem I have with this approach is the column names produced by proc report. I would love to be able to define them myself, but I don't see how I can do this. It is important for further referencing.
The third table represents the result of the proc summary/proc transpose portion of the code. The problem I have with this approach is that Invoice and MSRP appear as rows in the table, instead of columns. For that reason, I think the proc report is better.
The last table represents the use of an SQL query. The result is exactly what I want, but the code is heavy. I have to do a lot of similar computation on my dataset and I believe this approach is cumbersome.
Could you help improve one of these methods?
You can just use two PROC TRANSPOSE steps:
proc summary data=sashelp.cars nway missing;
where make=:'V';
class make type;
var invoice msrp;
output out=step1(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=step1 out=step2;
by make type ;
var invoice msrp;
run;
proc transpose data=step2 out=step3(drop=_name_);
by make;
id type _name_ ;
var col1 ;
run;
proc print data=step3;
title "Table using proc summary followed by 2 proc transpose steps";
run;
Results:
                                            Sedan       Sedan       Wagon       Wagon
Obs    Make        SUVInvoice     SUVMSRP     Invoice        MSRP     Invoice        MSRP
  1    Volkswagen     $32,243     $35,515    $335,813    $364,020     $77,184     $84,195
  2    Volvo          $38,851     $41,250    $313,990    $333,240     $57,753     $61,280
Use PROC TABULATE. It offers very succinct expressions for specifying row and column dimensions as a hierarchy of class variables.
The intersection of these dimensions is a cell; each cell represents the combination of class values that selects the rows contributing to the statistic displayed there.
In your case the statistic is a SUM of dollars, which might not make sense when a cell has more than one contributing value.
For example: does it make sense to show that the invoice sum for 11 Volkswagen sedans is $335,813?
Also note the 'inverted' hierarchy used to show the number of contributing values.
Example:
proc tabulate data=sashelp.cars;
class make type;
var invoice msrp;
table
make=''
,
type * invoice * sum=''*f=dollar9.
type * msrp * sum=''*f=dollar9. /* this is an adjacent dimension */
(invoice msrp) * type * n='' /* specify another adjacent dimension, with inverted hierarchy */
/
box = 'Make'
;
where make =: 'V';
run;
I need to produce a report and am using PROC TABULATE in SAS. My code produces the report with Sub_LOB, Group and Mat_Month, plus a totals column. Within Mat_Month there are three sub-columns (Dec 16, Jan17 and Feb17). The code produces the columns in the order Dec 16, Feb17, Jan17, which is not what I want. Also, I need an empty row for the group named "CAROLINA GROUP", but the whole row disappears because there is no data in it. Is there any way to produce the sub-columns in the order I want? And is it possible to keep the row even though it has no values now (it may have values in the future)? The code I am using is:
PROC Tabulate
DATA= T_Final_Summary Format=Comma12. ;
VAR Comm Net_Bal;
Class Mat_Month / ORDER=Unformatted MISSING;
Class Sub_LOB /ORDER=Unformatted MISSING;
Class Group /ORDER= Unformatted MISSING;
TABLE /*Row Dimension*/
Sub_LOB={LABEL=" "} *
(Group={LABEL=" "}
ALL={LABEL="Grand Total"})
ALL={LABEL="Grand Total"},
/*Column Dimension*/
Mat_Month *(
Comm={LABEL="Count of Comm"} *N={LABEL=" "}
Comm={LABEL="Sum of Comm"} *Sum={LABEL=" "}
Net_Bal={LABEL="Count of Net Bal"}*N={LABEL=" "}
Net_Bal ={LABEL="Sum of Net Bal"}*Sum={LABEL="Sum of Net Bal"})
ALL={LABEL="Grand Total"}*(
Comm={LABEL="Total Count of Comm"} *N={LABEL=" "}
Comm={LABEL="Total Sum of Comm"} *Sum={LABEL=" "}
Net_Bal={LABEL="Total Count of Net Bal"}*N={LABEL=" "}
Net_Bal ={LABEL="Total Sum of Net Bal"}*Sum={LABEL="Sum of Net Bal"})
/*Table Options*/
/BOX={LABEL="Sub Lob/Group"} MISSTEXT="0";
RUN;
Any help will be very much appreciated.
Regarding the order of the columns: they are sorting alphabetically. The values in MAT_MONTH need to be actual SAS dates to sort chronologically, which means a numeric variable with a date format (MONYY5.). You'll need to do the conversion before the PROC TABULATE step.
Then replace mat_month in your proc tabulate with the mat_month_date variable.
data want;
set have;
mat_month_date=input(mat_month, anydtdte.);
format mat_month_date monyy5.;
run;
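To make the effect on column order concrete, here is a small self-contained sketch. The data set have, its three rows, and the comm values are hypothetical stand-ins for the real T_Final_Summary data, and it assumes the month text looks like Dec16 so that the MONYY5. informat can read it (the real values may need ANYDTDTE. or some cleanup first).

```sas
/* Hypothetical toy data: month stored as text, as in the question */
data have;
  length mat_month $5;
  input mat_month $ comm;
  datalines;
Dec16 10
Jan17 20
Feb17 30
;
run;

/* Convert the text month to a numeric SAS date with a date format */
data want;
  set have;
  mat_month_date = input(mat_month, monyy5.);
  format mat_month_date monyy5.;
run;

/* ORDER=UNFORMATTED sorts on the underlying date value, */
/* so the columns come out in calendar order             */
proc tabulate data=want;
  class mat_month_date / order=unformatted;
  var comm;
  table mat_month_date=' ' * comm * sum=' ';
run;
```

Because the class variable is now numeric, the columns are ordered by the date value rather than by the month text, while the MONYY5. format keeps the same style of label on display.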
I have a mixed model with the following parameters:
A slope and intercept term for group 1
A different slope and intercept term for group 2
A random effect which is indexed by group/subject within group
Is there a way to model this using proc mixed? I can't seem to figure out how to get different slopes/intercepts for the two groups.
This shows a simple model with a separate intercept and slope per group: first fitted BY GROUP, then with GROUP as a factor and a pooled estimate of error. Maybe if you show some example data we can figure out the RANDOM part.
data group;
do group=1,2;
do x = 1 to 10;
y = rannor(1);
output;
end;
end;
Run;
ods select SolutionF;
proc mixed;
by group;
model y = x / solution;
run;
ods select SolutionF;
proc mixed;
class group;
model y = group x(group) / noint solution;
run;
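Without the real data this is only a guess at the RANDOM part. The sketch below assumes the random effect is a random intercept for each subject nested within group; the data set group_subj, the variable subject, and the simulated coefficients are all hypothetical.

```sas
/* Hypothetical simulated data: 2 groups, 5 subjects per group, 10 x values each */
data group_subj;
  call streaminit(1);
  do group = 1 to 2;
    do subject = 1 to 5;
      u = rand('normal');               /* subject-level random intercept */
      do x = 1 to 10;
        y = group + 0.5*group*x + u + rand('normal');
        output;
      end;
    end;
  end;
  drop u;
run;

ods select SolutionF CovParms;
proc mixed data=group_subj;
  class group subject;
  /* separate intercept (group) and slope (x(group)) for each group */
  model y = group x(group) / noint solution;
  /* random intercept for each subject within group (an assumption) */
  random intercept / subject=subject(group);
run;
```

If the slopes should also vary by subject, the RANDOM statement could list x as well, but whether that is appropriate depends on the actual design.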
I am creating a report using PROC REPORT. My syntax runs fine, but the results of the RBREAK and BREAK AFTER statements do not show in the output report. Thanks in advance.
ods pdf file = "D:\New folder (2)\Assignment\Case_Study_1\Detail_Report.pdf";
proc report data = Cs1.Detailed_Report headline nowd ls = 256 ps = 765;
Title 'Olympic Pipeline (LONDON) - by Probability As of 17th November 2012';
column Probability Account_Name Opportunity_Owner Last_Modified_Date Total_Media_Value Digital_Total_Media_Value Deal_Comments;
where Probability > 0;
define Probability/group Descending 'Probability';
define Account_Name/order 'Client';
define Opportunity_Owner/order 'Champ';
define Last_Modified_Date/order format = MMDDYY. 'Modified';
define Total_Media_Value/order format = dollar25. 'Tot_Budget';
define Digital_Total_Media_Value/order format = dollar25. 'Digital_Bugt';
define Deal_Comments/order 'Deal_Comments';
break after Probability/ summarize suppress ol ul;
rbreak after / summarize ol ul;
run;
ods listing close;
ods pdf close;
Your main problem is that you don't have anything for the summarization to do. All of your columns are "ORDER" columns, which is probably not what you want. This is a common confusion in PROC REPORT; ORDER actually can be used in two different ways.
ORDER column type (vs. ANALYSIS, GROUP, ACROSS, COMPUTED, etc.)
ORDER= instruction for how to order data in a column (ORDER=DATA, ORDER=FORMATTED, etc.)
You can instruct SAS how to order a column without having to make it an ORDER column (which is basically similar to GROUP except it doesn't condense extra copies of a value if there are more than one).
If you want RBREAK or BREAK to do anything, you need one or more ANALYSIS variables; those are the variables that summaries (and other math) work on.
Here is an example of this working correctly, with analysis variables. You also need to tell SAS which statistic to use when summarizing them (mean, sum, etc.), depending on your desired result.
ods pdf file = "c:\temp\test.pdf";
proc report data = sashelp.cars headline nowd ls = 256 ps = 765;
column cylinders make model invoice mpg_highway mpg_city;
where cylinders > 6;
define cylinders/group Descending;
define make/order;
define model/order;
define invoice/analysis sum;
define mpg_highway/analysis mean;
define mpg_city/analysis mean;
break after cylinders/ summarize suppress ol ul;
rbreak after / summarize ol ul;
run;
ods pdf close;
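To illustrate the distinction between the ORDER usage and the ORDER= option, here is a minimal sketch in which a GROUP column is sorted with ORDER=FREQ, with no ORDER-usage column involved. The choice of variables and labels is arbitrary and only for illustration.

```sas
proc report data=sashelp.cars nowd;
  column type invoice;
  /* GROUP usage, with ORDER= controlling how the groups are sorted */
  define type / group order=freq descending 'Type';
  /* ANALYSIS usage gives the summary line something to compute */
  define invoice / analysis mean format=dollar10. 'Mean Invoice';
  rbreak after / summarize;
run;
```

Here the rows are sorted by how many cars fall in each Type, and the RBREAK line shows the overall mean because invoice is an ANALYSIS variable.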
The following code will give a TYPE count for each MAKE group:
proc report data=sashelp.cars nowd;
column make type;
define make / group;
define type / across;
run;
How can a format be applied to the across columns created?
In the code above, displaying a count for the ACROSS variable is implied. It can be made explicit by adding a comma after the ACROSS variable in the COLUMN statement, followed by N. The N column can then be formatted in a DEFINE statement.
proc report data=sashelp.cars nowd;
column make type,n;
define make / group;
define type / across;
define n / '' format=comma10.1;
run;
When there are multiple across columns, formatting each column individually can be accomplished in a COMPUTE block. To review how the columns will be named, use the OUT= option on the PROC REPORT statement to generate a data set. Setting the MISSING= system option can replace the missing dots with zeros. Art Carpenter's book is an excellent guide to PROC REPORT, and it is where I got this tip.
Options missing=0;
proc report data=sashelp.cars nowd out=work.report;
column make type,n;
define make / group;
define type / across;
define n / '';
compute n;
call define('_c4_','format','dollar10.');
endcomp;
run;
Any time an absolute column reference (e.g. _c4_) is used, there is the potential for an error when that column does not exist. Creating a user format and using PRELOADFMT on the DEFINE statement for the ACROSS variable will force all format values to appear and guarantee that _c4_ exists. See this question for more info.
options missing=.;
Proc format;
value $type
'Hybrid'='Hybrid' 'SUV'='SUV' 'Sedan'='Sedan'
'Sports'='Sports' 'Truck'='Truck' 'Wagon'='Wagon';
Run;
Proc Report data=sashelp.cars(where=(make='Buick')) nowd;
column make type,n;
define make / group;
define type / across format=$type. preloadfmt;
define n / '';
compute n;
call define('_c4_','format','dollar10.');
endcomp;
run;
One further edit that a co-worker showed me: by "blanking" all labels on the DEFINE statements, the empty space below the across variables can be removed. In this example, since the group variable (MAKE) now has no label, its label needs to go in the COLUMN statement.
options missing=.;
Proc format;
value $type
'Hybrid'='Hybrid' 'SUV'='SUV' 'Sedan'='Sedan'
'Sports'='Sports' 'Truck'='Truck' 'Wagon'='Wagon';
Run;
proc report data=sashelp.cars(where=(make='Buick')) nowd;
column ('Make' make) type,n;
define make / '' group;
define type / '' across format=$type. preloadfmt;
define n / '';
compute n;
call define('_c4_','format','dollar10.');
endcomp;
run;