How do I stack many frequency tables in SAS - sas

I have a dataset of about 800 observations. I want to get the frequency of 14 variables. I want to get the frequency of these variable by shape (an example). There are 3 different shapes.
An example of doing this one time would obviously be:
proc freq; tables color; by shape;run;
However, I do not want 42 frequency tables. I want one frequency table that has the list of 14 variables on the left side. The top heading will have shape1 shape2 shape3 with the frequencies of each variable underneath them.
It would look like I transposed the data sets by percentage and then stacked them on top of each other.
I have several sets of combinations where I need to do this. I have about 5 different groups of variables and I need to make tables using 3 different by groups (necessitating about 15 tables). The first example I discussed is one example of such groups.
Any help would be appreciated!

Using proc means and proc transpose. I give you some example. You can add more categories.
proc means data=sashelp.class nway n;
class sex age;
output out=class(drop=_freq_ _type_) n=freq;
run;
proc transpose data=class out=class(drop=_name_) prefix=AGE;
by sex;
var freq;
id age;
run;
data class_sum;
set class;
array a(*) age:;
age_sum = sum(of age:);
do i = 1 to dim(a);
a(i) = a(i) / age_sum;
end;
drop i;
run;

Related

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the emtpy string to "Total" could be accomplished in a separate datastep, but I would like to do it directly in the Proc Means-step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the emtpy string has to be changed, or it will be dropped in the proc transpose.
I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
quit;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
For example. I'd also recommend using PROC TABULATE instead of PROC MEANS if you're just going for counts, though in this case it doesn't really make much difference.
The problem here is that if the variable in the class statement is numeric, then the resultant column will be numeric, therefore you can't add the word Total (unless you use a format, similar to the answer from #Joe). This will be why the value is missing, as the class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it, however the output dataset will have the same missing values, so won't really help. Here's a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

How can I create pivot table in SAS?

I have three columns in a dataset: spend, age_bucket, and multiplier. The data looks something like...
spend age_bucket multiplier
10 18-24 2x
120 18-24 2x
1 35-54 3x
I'd like a dataset with the columns as the age buckets, the rows as the multipliers, and the entries as the sum (or other aggregate function) of the spend column. Is there a proc to do this? Can I accomplish it easily using proc SQL?
There are a few ways to do this.
data have;
input spend age_bucket $ multiplier $;
datalines;
10 18-24 2x
120 18-24 2x
1 35-54 3x
10 35-54 2x
;
proc summary data=have;
var spend;
class age_bucket multiplier;
output out=temp sum=;
run;
First you can use PROC SUMMARY to calculate the aggregation, sum in this case, for the variable in question. The CLASS statement gives you things to sum by. This will calculate the N-Way sums and the output data set will contain them all. Run the code and look at data set temp.
Next you can use PROC TRANSPOSE to pivot the table. We need to use a BY statement so a PROC SORT is necessary. I also filter to the aggregations you care about.
proc sort data=temp(where=(_type_=3));
by multiplier;
run;
proc transpose data=temp out=want(drop=_name_);
by multiplier;
var spend;
id age_bucket;
idlabel age_bucket;
run;
In traditional mode 35-54 is not a valid SAS variable name. SAS will convert your columns to proper names. The label on the variable will retain the original value. Just be aware if you need to reference the variable later, the name has changed to be valid.

SAS Find Top Combinations in Dataset

Hell everyone --
I have some sales data which looks like this:
data have;
input order_id item $;
cards;
1 A
1 B
2 A
2 C
3 B
4 A
4 B
;
run;
What I'm trying to find out is what are the most popular combinations of items ordered. For example in the above case, there were 2 orders that contained items A&B, 1 order of A&C, and 1 order of B. What would be the best way to output the different combinations along with the numbers of orders placed?
It seems there is no permutation issue, you could try this:
proc sort data=have;
by order_id item;
run;
data temp;
set have;
by order_id;
retain comb;
length comb $4;
comb=cats(comb,item);
if last.order_id then do;
output;
call missing(comb);
end;
run;
proc freq data=temp;
table comb/norow nopercent nocol nocum;
run;
There are many possible approaches to this problem, and I would not presume to say which is the best. Here's a fairly simple method you could use:
Transpose your data so that you only have 1 row for each order, with an indicator variable for each product.
Feed the transposed dataset into proc corr to produce a correlation matrix for the indicator variables, and look for the strongest correlations.

Using Tabulate for 3-way table

I am trying to output a three way frequency table. I am able to do this (roughly) with proc freq, but would like the control for variable to be joined. I thought proc tabulate would be a good way to customize the output. Basically I want to fill in the cells with frequency, and then customize the percents at a later time. So, have count and column percent in each cell. Is that doable with proc tabulate?
Right now I have:
proc freq data=have;
table group*age*level / norow nopercent;
run;
that gives me e.g.:
What I want:
Here is the code I am using:
proc tabulate data=ex1;
class age level group;
var age;
table age='Age Category',
mean=' '*group=''*level=''*F=10./ RTS=13.;
run;
Thanks!
You can certainly get close to that. You can't really get in 'one' cell, it needs to write each thing out to a different cell, but theoretically with some complex formatting (probably using CSS) you could remove the borders.
You can't use VAR and CLASS together, but since you're just doing percents, you don't need to use MEAN - you should just use N and COLPCTN. If you're dealing with already summarized data, you may need to do this differently - if so then post an example of your dataset (but that wouldn't work in PROC FREQ either without a FREQ statement).
data have;
do _t = 1 to 100;
age = ceil(3*rand('Uniform'));
group = floor(2*rand('Uniform'));
level = floor(5*rand('Uniform'));
output;
end;
drop _t;
run;
proc tabulate data=have;
class age level group;
table age='Age Category',
group=''*level=''*(n='n' colpctn='p')*F=10./ RTS=13.;
run;
This puts N and P (n and column %) in separate adjacent cells inside a single level.

Is there a way to name proc rank groups based on values within the group?

So I have multiple continuous variables that I have used proc rank to divide into 10 groups, ie for each observation there is now a "GPA" and a "GRP_GPA" value, ditto for Hmwrk_Hrs and GRP_Hmwrk_Hrs. But for each of the new group columns the values are between 1 - 10. Is there a way to change that value so that rather than 1 for instance it would be 1.2-2.8 if those were the min and max values within the group? I know I can do it by hand using proc format or if then or case in sql but since I have something like 40 different columns that would be very time intensive.
It's not clear from your question if you want to store the min-max values or just format the rank columns with them. My solution below formats the rank column and utilises the ability of SAS to create formats from a dataset. I've obviously only used 1 variable to rank, for your data it will be a simple matter to wrap a macro around the code and run for each of your 40 or so variables. Hope this helps.
/* create ranked dataset */
proc rank data=sashelp.steel groups=10 out=want;
var steel;
ranks steel_rank;
run;
/* calculate minimum and maximum values per rank */
proc summary data=want nway;
class steel_rank;
var steel;
output out=want_min_max (drop=_:) min= max= / autoname;
run;
/* create dataset with formatted values */
data steel_rank_fmt;
set want_min_max (rename=(steel_rank=start));
retain fmtname 'stl_fmt' type 'N';
label=catx('-',steel_min,steel_max);
run;
/* create format from previous dataset */
proc format cntlin=steel_rank_fmt;
run;
/* apply formatted value to rank column */
proc datasets lib=work nodetails nolist;
modify want;
format steel_rank stl_fmt10.;
quit;
In addition to Keith's good answer, you can also do the following:
proc rank data = sashelp.cars groups = 10 out = test;
var enginesize;
ranks es;
run;
proc sql ;
select *, catx('-',min(enginesize), max(enginesize)) as esrange, es from test
group by es
order by make, model
;
quit;