Below is an example of the data in a SAS Dataset.
I would like to create a report that transposes the data (a pivot, in Excel terms) so that it looks like the 2nd image.
Could someone help me get started?
Try Proc Tabulate:
PROC TABULATE DATA=WORK.HAVE;
    VAR amount calculated members;
    CLASS year_month / ORDER=UNFORMATTED MISSING;
    CLASS category1 / ORDER=UNFORMATTED MISSING;
    CLASS category2 / ORDER=UNFORMATTED MISSING;
    TABLE
        /* Row Dimension */
        year_month * category1,
        /* Column Dimension */
        category2 * (amount*Sum calculated*Sum members*Sum);
RUN;
The syntax is always confusing to me. Try using Enterprise Guide and the "Summary Tables" task to create the code for you!
I need to create a two-way table without row and column totals.
Here is the code:
proc freq data = freq_table1 ;
table c * x / norow nocol nopercent ;
title "x";
run;
the output is:
I actually want:
Also, does anyone have an idea how to re-order the frequency table from 0-1 to 1-0?
Thank you very much!
JH
The best choice here is probably to use proc tabulate, as that gives you more control over the layout.
proc tabulate data = sashelp.class ;
class age height;
tables age,height*n;
run;
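To flip the level order from 0-1 to 1-0, one option is the DESCENDING option on the CLASS statement. This is a minimal sketch, assuming the freq_table1 data set and the variables c and x from the question. Because the TABLE statement does not request ALL, no row or column totals are printed.

```sas
proc tabulate data=freq_table1;
  /* DESCENDING reverses the sort order of the class levels (1 before 0) */
  class c x / descending;
  /* rows: c, columns: x, cell: frequency count; no ALL, so no totals */
  table c, x*n;
  title "x";
run;
```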
I'm trying to make a proc tabulate table with statistics like "mean" and "standard deviation." However, when I make the table, the statistics are listed on top of each other (shown in the attached picture), and I'd like them side-by-side to make the table shorter. Is there a way to do this? Thank you!
Yes, there is. Everything you specify before the comma goes in the rows, and everything after it goes in the columns.
proc tabulate data=sashelp.bweight;
class Married MomEdLevel Boy;
var MomWtGain Weight;
table Married * MomEdLevel * (min p10 mean p90 max) , Boy * (MomWtGain Weight);
run;
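To get the statistics side by side, they can be moved from the row expression to the column expression. The following is a sketch of that rearrangement using the same sashelp.bweight variables; the nesting order of the columns is a choice, not a requirement.

```sas
proc tabulate data=sashelp.bweight;
  class Married MomEdLevel Boy;
  var MomWtGain Weight;
  /* The statistics now come after the comma, so they appear as columns */
  table Married * MomEdLevel,
        Boy * (MomWtGain Weight) * (min p10 mean p90 max);
run;
```

This makes the table shorter but wider, since each analysis variable gets one column per statistic.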
I want to compute multiple sums on the same column based on some criteria. Here is a small example using the sashelp.cars dataset.
The code below somewhat achieves what I want to do in three (3) different ways, but there is always a small problem.
proc report data=sashelp.cars out=test2;
column make type,invoice type,msrp;
define make / group;
define type / across;
define invoice / analysis sum;
define msrp / analysis sum;
title "Report";
run;
proc print data=test2;
title "Out table for the report";
run;
proc summary data=sashelp.cars nway missing;
class make type;
var invoice msrp;
output out=sumTest(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=sumTest out=test3;
by make;
var invoice msrp;
id type;
run;
proc print data=test3;
title "Table using proc summary followed by proc transpose";
run;
proc sql undo_policy=none;
create table test4 as select
make,
sum(case when type='Sedan' then invoice else 0 end) as SedanInvoice,
sum(case when type='Wagon' then invoice else 0 end) as WagonInvoice,
sum(case when type='SUV' then invoice else 0 end) as SUVInvoice,
sum(case when type='Sedan' then msrp else 0 end) as Sedanmsrp,
sum(case when type='Wagon' then msrp else 0 end) as Wagonmsrp,
sum(case when type='SUV' then msrp else 0 end) as SUVmsrp
from sashelp.cars
group by make;
quit;
run;
proc print data=test4;
title "Table using SQL queries and CASE/WHEN to compute new columns";
run;
Here is the result I get when I run the presented code.
The first two tables represent the result and the out table of the report procedure. The problem I have with this approach is the column names produced by proc report. I would love to be able to define them myself, but I don't see how I can do this. It is important for further referencing.
The third table represents the result of the proc summary/proc transpose portion of the code. The problem I have with this approach is that Invoice and MSRP appear as rows in the table, instead of columns. For that reason, I think the proc report is better.
The last table represents the use of an SQL query. The result is exactly what I want, but the code is heavy. I have to do a lot of similar computation on my dataset and I believe this approach is cumbersome.
Could you help improve one of these methods?
You can just use two PROC TRANSPOSE steps:
proc summary data=sashelp.cars nway missing;
where make=:'V';
class make type;
var invoice msrp;
output out=step1(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=step1 out=step2;
by make type ;
var invoice msrp;
run;
proc transpose data=step2 out=step3(drop=_name_);
by make;
id type _name_ ;
var col1 ;
run;
proc print data=step3;
title "Table using proc summary followed by 2 proc transpose steps";
run;
Results:
Obs  Make        SUVInvoice  SUVMSRP  SedanInvoice  SedanMSRP  WagonInvoice  WagonMSRP
  1  Volkswagen     $32,243  $35,515      $335,813   $364,020       $77,184    $84,195
  2  Volvo          $38,851  $41,250      $313,990   $333,240       $57,753    $61,280
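If you want more control over the generated column names, the DELIMITER= option of PROC TRANSPOSE places a separator between the values of the ID variables. This is a sketch that reuses the step2 data set from the code above; the underscore is just one possible choice and should give names such as SUV_Invoice and SUV_MSRP.

```sas
proc transpose data=step2 out=step3(drop=_name_) delimiter=_;
  by make;
  /* Column names are built as <type><delimiter><_name_> */
  id type _name_;
  var col1;
run;
```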
Use PROC TABULATE. It offers very succinct expressions for specifying row and column dimensions, each defined by the desired hierarchy of class variables.
The intersection of these dimensions is a cell, which represents the combination of class values that selects the observations for which a statistic is displayed.
In your case the SUM is a sum of dollars, which might not make sense when the cell has more than one contributing value.
For example, does it make sense to show that the invoice sum for 11 Volkswagen sedans is $335,813?
Also note the 'inverted' hierarchy used to show the number of contributing values.
Example:
proc tabulate data=sashelp.cars;
class make type;
var invoice msrp;
table
make=''
,
type * invoice * sum=''*f=dollar9.
type * msrp * sum=''*f=dollar9. /* this is an adjacent dimension */
(invoice msrp) * type * n='' /* specify another adjacent dimension, with inverted hierarchy */
/
box = 'Make'
;
where make =: 'V';
run;
Output
I need to produce a report and used PROC TABULATE in SAS. The code I used produces the report with Sub_LOB, Group, and Mat_Month, plus the totals column. Within Mat_Month there are three sub-columns (Dec 16, Jan17 and Feb17). The code produces these columns in the order Dec 16, Feb17, Jan17, which is not what I want. Also, I need an empty row for the group named "CAROLINA GROUP", but the whole row disappears because there is currently no data for it. Is there any way to produce the sub-columns in the order I want? And is it possible to keep that row even though it has no values now, since it may have values in the future? The code I used is:
PROC TABULATE DATA=T_Final_Summary FORMAT=COMMA12.;
    VAR Comm Net_Bal;
    CLASS Mat_Month / ORDER=UNFORMATTED MISSING;
    CLASS Sub_LOB / ORDER=UNFORMATTED MISSING;
    CLASS Group / ORDER=UNFORMATTED MISSING;
    TABLE
        /* Row Dimension */
        Sub_LOB={LABEL=" "} *
            (Group={LABEL=" "}
             ALL={LABEL="Grand Total"})
        ALL={LABEL="Grand Total"},
        /* Column Dimension */
        Mat_Month * (
            Comm={LABEL="Count of Comm"} * N={LABEL=" "}
            Comm={LABEL="Sum of Comm"} * Sum={LABEL=" "}
            Net_Bal={LABEL="Count of Net Bal"} * N={LABEL=" "}
            Net_Bal={LABEL="Sum of Net Bal"} * Sum={LABEL="Sum of Net Bal"})
        ALL={LABEL="Grand Total"} * (
            Comm={LABEL="Total Count of Comm"} * N={LABEL=" "}
            Comm={LABEL="Total Sum of Comm"} * Sum={LABEL=" "}
            Net_Bal={LABEL="Total Count of Net Bal"} * N={LABEL=" "}
            Net_Bal={LABEL="Total Sum of Net Bal"} * Sum={LABEL="Sum of Net Bal"})
        /* Table Options: MISSTEXT= sets the text shown in empty cells */
        / BOX={LABEL="Sub Lob/Group"} MISSTEXT="0";
RUN;
Any help will be very much appreciated.
Regarding the column order: the Mat_Month values are sorting alphabetically. To sort chronologically, Mat_Month needs to be an actual SAS date, that is, a numeric variable with a date format (MONYY5.). You'll need to do the conversion before the PROC TABULATE step.
Then replace Mat_Month in your PROC TABULATE with the mat_month_date variable.
data want;
    set have;
    /* Convert the character month to a numeric SAS date */
    mat_month_date = input(Mat_Month, anydtdte.);
    format mat_month_date monyy5.;
run;
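For the second part of the question (keeping a row for a group that has no data yet), one common approach is to preload the class levels from a format and request PRINTMISS. The following is a simplified sketch, not a drop-in replacement for the full table. The format name $grpfmt and the value 'OTHER GROUP' are hypothetical placeholders, and the format would need to list every group that should appear.

```sas
/* Hypothetical format listing every group that should appear in the report */
proc format;
  value $grpfmt
    'CAROLINA GROUP' = 'CAROLINA GROUP'
    'OTHER GROUP'    = 'OTHER GROUP';   /* placeholder; list the real groups here */
run;

proc tabulate data=T_Final_Summary format=comma12.;
  /* PRELOADFMT loads all levels defined in the format, even those without data */
  class Group / preloadfmt;
  format Group $grpfmt.;
  var Comm;
  /* PRINTMISS prints the preloaded levels that have no observations */
  table Group='' all='Grand Total',
        Comm*(n sum)
        / printmiss misstext='0';
run;
```

Note that when class variables are nested (for example Sub_LOB*Group), PRINTMISS prints every combination of their levels, which may produce more empty rows than you want.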
I've been trying to make my code more efficient and this is the original code, but I think it can be written in one step.
data TABLE;
    set ORIGINAL_DATA;
    Multi = percent * total_units;
    keep Multi Type;
run;
proc sort data=TABLE;
    by Type;
run;
/* Sum of Multi within each Type */
proc means noprint data=TABLE;
    by Type;
    var Multi;
    output out=TABLE2(drop=_type_ _freq_) sum=Multi;
run;
/* Overall sum of Multi */
proc means noprint data=TABLE;
    var Multi;
    output out=TABLE3(drop=_type_ _freq_) sum=total;
run;
proc sql;
    create table TABLE4 as
    select a.Type, a.Multi label="Multi", b.total label="total"
    from TABLE2 a, TABLE3 b
    order by Type;
quit;
data TABLE5;
    set TABLE4;
    pct = (Multi / total) * 100;
run;
I am able to split up part of it, but I can't figure out how to get the PCT part in my code. This is what I have.
proc sql;
create table TABLE1 as
select distinct type, sum(percent*total_units) as MULTI label "MULTI",
MULTI/(percent*total_units)) as PCT
from ORIGINAL_DATA
group by type;
quit;
I had to edit some of the code but I think the general idea should make sense.
The main problem is that I cannot reference the MULTI column, because it is only just being created in the same query, but I want to compute each type's percentage of the total.
The "SAS" way to do something like this is to use a CLASS statement with PROC MEANS. That will calculate statistics on all the interaction levels in the data (identified by the automatic _TYPE_ variable, not to be confused with your own Type variable). The row where _TYPE_=0 will be the "total" value, representing the value of that statistic for the entire data set.
In your case, we can take advantage of the fact that PROC MEANS will create the output data set sorted by _TYPE_ and by the variables listed in the CLASS statement. That means we can just read the first observation and save its value for calculating percentages.
It's probably easier to just show some code:
data TABLE;
set ORIGINAL_DATA;
Multi = percent * total_units;
keep Multi Type;
run;
proc means noprint data=TABLE;
class Type;
var multi;
output out=next sum=;
run;
data want;
retain total;
set next;
if _n_ = 1 then do;
/* The first obs will be the _TYPE_=0 record */
total = multi;
delete;
end;
pct = (multi / total) * 100;
drop total _freq_ _type_;
run;
Notice that you do not need to sort the data before using PROC MEANS. That's because we are using a CLASS statement rather than a BY statement. The data step is using the first observation in the data set created by MEANS (the _TYPE_=0 record) to retain the total sum of your variable. The delete statement keeps it out of the result.
CLASS statements with PROC MEANS are very useful. Take a few minutes to read up on how the _TYPE_ variable is calculated, especially if you try using more than one class variable.
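As a small illustration of how _TYPE_ behaves with two class variables, here is a sketch using sashelp.cars; the output data set name types_demo is arbitrary.

```sas
proc means noprint data=sashelp.cars;
  class make type;
  var invoice;
  output out=types_demo sum=;
run;

/* _TYPE_=0: overall total
   _TYPE_=1: one row per TYPE (the rightmost class variable)
   _TYPE_=2: one row per MAKE
   _TYPE_=3: one row per MAKE*TYPE combination */
proc freq data=types_demo;
  tables _type_;
run;
```

Adding the NWAY option to the PROC MEANS statement would keep only the highest _TYPE_ value (here 3).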
You can skip the initial data step by using the WEIGHT= option in the VAR statement of PROC MEANS (this will effectively do the multiplication for you). You can also use PROC TABULATE instead of PROC MEANS, as TABULATE can calculate the percentage. I believe the following code will produce your required output in one go.
ods noresults;
proc tabulate data=have out=want (drop=_: rename=(total_units_sum=total total_units_pctsum_0=pct));
class type;
var total_units / weight=percent;
table type, total_units*(sum pctsum);
run;
ods results;
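For comparison, the WEIGHT= idea applied to PROC MEANS might look like the sketch below. It assumes percent is never negative or missing, since weights like that are handled specially and would not match a plain percent*total_units product. The output data set name next and the variable name multi are chosen to match the data step shown earlier.

```sas
proc means noprint data=ORIGINAL_DATA;
  class Type;
  /* With WEIGHT=, the SUM statistic is the sum of percent*total_units */
  var total_units / weight=percent;
  output out=next sum=multi;
run;
```

The resulting data set still contains the _TYPE_=0 record first, so the same retain-and-delete data step can compute the percentages.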
If you need one step, maybe this will work, but it's not actually efficient, since it processes data twice, once for detail by TYPE, once for total.
proc sql;
create table TABLE1 as
select
d.type
, sum(d.percent*d.total_units) as MULTI label "MULTI"
, calculated MULTI/s.total as PCT
from ORIGINAL_DATA d,
( select sum(percent*total_units) as total
from ORIGINAL_DATA) s
group by d.type, s.total /* s.total is included so the summary is not remerged with the detail rows */
;
quit;
For more efficiency, but in more than one step, you could simply replace tables with views in your original code:
data TABLE; => data TABLE / view=TABLE;
create table TABLE4 => create view TABLE4