Weight in Proc TABULATE Has No Effect - sas

I have a table of frequencies that I'm feeding into a Proc Tabulate step. The data come with a weight variable, and I want to include the weighted results in the generated table. Whether I name the weight variable in the VAR statement or the WEIGHT statement, it has no effect on the output table. I've also tried using the weight variable in the TABLE statement for the analysis variables, but again, no effect.
PROC FORMAT; PICTURE PCTF (ROUND) OTHER='009.9%'; RUN;
ODS HTML PATH="%SYSFUNC(GETOPTION(WORK) )" STYLE=JOURNAL1A;
PROC TABULATE DATA = CHSS2017_s1 f=10.2 S=[just=c cellwidth=75];
CLASS AGE SEX Q21;
CLASSLEV AGE / style=[font_weight=medium];
CLASSLEV SEX / style=[font_weight=medium];
CLASSLEV Q21;
WEIGHT REGIONWT ;
*VAR REGIONWT ;
TABLE ALL = 'Greater Cincinnati Residents' * (ROWPCTN=' '*f=PCTF.)
AGE = 'Age' * (ROWPCTN=' '*f=PCTF.)
SEX * (ROWPCTN=' '*f=PCTF.)
, Q21;
RUN;
The expected result is a Proc Tabulate output whose values reflect the weight variable, REGIONWT.

From my reading of the docs, in PROC TABULATE the WEIGHT statement specifies weights for analysis variables, i.e. variables listed on a VAR statement.
You don't have any analysis variables, you only have class variables.
You might want to look into the FREQ statement as it will impact counts and %, but note that it will treat all the weights as integers.
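An alternative to FREQ that keeps fractional weights is to treat the weight itself as the analysis variable and request percentages of its sum. The following is only a minimal sketch on made-up data: the dataset `survey` and its values are hypothetical, and only the variable names SEX, Q21 and REGIONWT are borrowed from the question.

```sas
/* Hypothetical toy data: one row per respondent with a fractional weight */
data survey;
  input sex $ q21 $ regionwt;
  datalines;
F Yes 1.5
F No 0.5
M Yes 2.0
M No 2.0
;
run;

proc tabulate data=survey;
  class sex q21;
  var regionwt;                      /* the weight is the analysis variable */
  table all='Overall' sex='Sex',
        q21='Q21' * regionwt=' ' * rowpctsum=' ' * f=6.1;
  /* ROWPCTSUM = each cell's share of the row's summed weight,
     e.g. for sex=F: Yes 1.5 of 2.0 = 75.0, No 0.5 of 2.0 = 25.0 */
run;
```

Because the percentages are built from sums of REGIONWT rather than from row counts, no WEIGHT or FREQ statement is needed in this variant.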

Related

How to sum distinct subsets of rows into distinct new columns in SAS?

I want to compute multiple sums on the same column based on some criteria. Here is a small example using the sashelp.cars dataset.
The code below somewhat achieves what I want to do in three (3) different ways, but there is always a small problem.
proc report data=sashelp.cars out=test2;
column make type,invoice type,msrp;
define make / group;
define type / across;
define invoice / analysis sum;
define msrp / analysis sum;
title "Report";
run;
proc print data=test2;
title "Out table for the report";
run;
proc summary data=sashelp.cars nway missing;
class make type;
var invoice msrp;
output out=sumTest(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=sumTest out=test3;
by make;
var invoice msrp;
id type;
run;
proc print data=test3;
title "Table using proc summary followed by proc transpose";
run;
proc sql undo_policy=none;
create table test4 as select
make,
sum(case when type='Sedan' then invoice else 0 end) as SedanInvoice,
sum(case when type='Wagon' then invoice else 0 end) as WagonInvoice,
sum(case when type='SUV' then invoice else 0 end) as SUVInvoice,
sum(case when type='Sedan' then msrp else 0 end) as Sedanmsrp,
sum(case when type='Wagon' then msrp else 0 end) as Wagonmsrp,
sum(case when type='SUV' then msrp else 0 end) as SUVmsrp
from sashelp.cars
group by make;
quit;
run;
proc print data=test4;
title "Table using SQL queries and CASE/WHEN to compute new columns";
run;
Here is the result I get when I run the presented code.
The first two tables represent the result and the out table of the report procedure. The problem I have with this approach is the column names produced by proc report. I would love to be able to define them myself, but I don't see how I can do this. It is important for further referencing.
The third table represents the result of the proc summary/proc transpose portion of the code. The problem I have with this approach is that Invoice and MSRP appear as rows in the table, instead of columns. For that reason, I think the proc report is better.
The last table represents the use of an SQL query. The result is exactly what I want, but the code is heavy. I have to do a lot of similar computation on my dataset and I believe this approach is cumbersome.
Could you help improve one of these methods?
You can just use two PROC TRANSPOSE steps:
proc summary data=sashelp.cars nway missing;
where make=:'V';
class make type;
var invoice msrp;
output out=step1(drop= _Freq_ _TYPE_) sum=;
run;
proc transpose data=step1 out=step2;
by make type ;
var invoice msrp;
run;
proc transpose data=step2 out=step3(drop=_name_);
by make;
id type _name_ ;
var col1 ;
run;
proc print data=step3;
title "Table using proc summary followed by 2 proc transpose steps";
run;
Results:
Obs  Make        SUVInvoice  SUVMSRP  SedanInvoice  SedanMSRP  WagonInvoice  WagonMSRP
1    Volkswagen  $32,243     $35,515  $335,813      $364,020   $77,184       $84,195
2    Volvo       $38,851     $41,250  $313,990      $333,240   $57,753       $61,280
Use Proc TABULATE. Very succinct expressions for specifying row and column dimensions defined by desired hierarchy of class variables.
The intersection of these dimensions is a cell and represents a combination of values that select the values for which a statistical measure is displayed in the cell.
In your case the SUM is a sum of dollars, which might not make sense when the cell has more than one contributing value.
For example: does it make sense to show that the invoice sum for 11 Volkswagen Sedans is $335,813?
Also note the 'inverted' hierarchy used to show the number of contributing values.
Example:
proc tabulate data=sashelp.cars;
class make type;
var invoice msrp;
table
make=''
,
type * invoice * sum=''*f=dollar9.
type * msrp * sum=''*f=dollar9. /* this is an adjacent dimension */
(invoice msrp) * type * n='' /* specify another adjacent dimension, with inverted hierarchy */
/
box = 'Make'
;
where make =: 'V';
run;

proc tabulate remove default borders and color

I'm trying to generate a simple Proc Tabulate output. I don't want the default borders and coloring in the cells.
proc format;
PICTURE PCTF (ROUND) OTHER='009.9%';
run;
PROC TABULATE DATA = X017;
CLASS EDUC
AREA
AGE
SEX
CENRACE
POVERTY
EDUC
INSURE
HEALTH
Q21 / style=[background=lightgreen];
TABLE AREA * (ROWPCTN*f=PCTF.)
AGE * (ROWPCTN*f=PCTF.)
SEX * (ROWPCTN*f=PCTF.)
CENRACE * (ROWPCTN*f=PCTF.)
POVERTY * (ROWPCTN*f=PCTF.)
EDUC * (ROWPCTN*f=PCTF.)
INSURE * (ROWPCTN*f=PCTF.)
HEALTH * (ROWPCTN*f=PCTF.) , Q21 / BOX = "Question 21" ;
RUN;
example of current output
example of desired table output
SAS has a variety of pre-defined ODS styles. Try using style JOURNAL1A. It appears to be what you are looking for. Check out this example. Note the path statement was only included because I have write access issues in the default location. This changes the path to WORK.
ods html path="%sysfunc(getoption(work) )" style=JOURNAL1A;
proc tabulate data=sashelp.cars;
class origin type;
table origin, type;
run;

How to rename total count across class variable in Proc Means

I'm doing a simple count of occurrences of a by-variable within a class variable, but cannot find a way to rename the total count across class variables. At the moment, the output dataset includes counts for all cluster2 within each group as well as the total count across all groups (i.e. the class variable used). However, the counts within classes are named, while the total is shown by an empty string.
Code:
proc means data=seeds noprint;
class group;
by cluster2;
id label2;
output out=seeds_counts (drop= _type_ _freq_) n(id)=count;
run;
Example of output file:
cluster2 group label2 count
7 area 1 20
7 sa area 1 15
7 sb area 1 5
15 area 15 42
15 sa area 15 18
....
Naturally, renaming the empty string to "Total" could be accomplished in a separate data step, but I would like to do it directly in the Proc Means step. It should be simple and trivial, but I haven't found a way so far. Afterwards, I want to transpose the dataset, which means that the empty string has to be changed, or it will be dropped in the proc transpose.
I don't know of a way to do it directly, but you can sort-of-cheat: you can tell SAS to show "Total" instead of missing.
proc format;
value $MissTotalF
' ' = 'Total'
other = [$CHAR12.];
run;
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _type_ _freq_) n(age)=count;
format sex $MissTotalF.;
run;
For example. I'd also recommend using PROC TABULATE instead of PROC MEANS if you're just going for counts, though in this case it doesn't really make much difference.
The problem here is that if the variable in the class statement is numeric, then the resultant column will be numeric, so you can't store the word Total in it (unless you use a format, similar to the answer from @Joe). That is why the value is missing: a class variable can be either numeric or character.
Here's an example of a numeric class variable.
proc sort data=sashelp.class out=class;
by sex;
run;
proc means data=class noprint;
class age;
by sex;
output out=class_counts (drop= _:) n=count;
run;
Using proc tabulate can display the result pretty much how you want it, however the output dataset will have the same missing values, so won't really help. Here's a couple of examples.
proc tabulate data=class out=class_tabulate1 (drop=_:);
class sex age;
table sex*(age all='Total'),n='';
run;
proc tabulate data=class out=class_tabulate2 (drop=_:);
class sex age;
table sex,age*n='' all='Total';
run;
I think the best option to achieve your final goal is to add the nway option to proc means, which will remove the subtotals, then transpose the data and finally write a data step that creates the Total column by summing each row. It's 3 steps, but doesn't involve much coding.
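The three steps could be sketched roughly as follows, reusing the sashelp.class example from above. The output names `class_wide` and `Total` and the `age_` prefix are placeholders chosen for illustration.

```sas
proc sort data=sashelp.class out=class;
  by sex;
run;

/* Step 1: NWAY keeps only the age-level rows, no subtotal rows */
proc means data=class noprint nway;
  class age;
  by sex;
  output out=class_counts (drop= _:) n=count;
run;

/* Step 2: one row per sex, one column per age (age_11, age_12, ...) */
proc transpose data=class_counts out=class_wide (drop=_name_) prefix=age_;
  by sex;
  id age;
  var count;
run;

/* Step 3: build the Total column by summing across the age columns */
data class_wide;
  set class_wide;
  Total = sum(of age_:);   /* SUM ignores missing cells */
run;
```

Since the total is computed after the transpose, there is no missing-valued class level left for PROC TRANSPOSE to drop.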
Here is one method you could use by taking advantage of the _TYPE_ variable so that you can process the totals and details separately. You will still have trouble with PROC TRANSPOSE if there is a class with missing values (separate from the overall summary record).
proc means data=sashelp.class noprint;
class sex;
id age;
output out=sex_counts (drop= _freq_ ) n(age)=count;
run;
proc transpose data=sex_counts out=transpose prefix=count_ ;
where _type_=1 ;
id sex ;
var count;
run;
data transpose ;
merge transpose sex_counts(where=(_type_=0) keep=_type_ count);
rename count=count_Total;
drop _type_;
run;

Using Tabulate for 3-way table

I am trying to output a three way frequency table. I am able to do this (roughly) with proc freq, but would like the control for variable to be joined. I thought proc tabulate would be a good way to customize the output. Basically I want to fill in the cells with frequency, and then customize the percents at a later time. So, have count and column percent in each cell. Is that doable with proc tabulate?
Right now I have:
proc freq data=have;
table group*age*level / norow nopercent;
run;
that gives me e.g.:
What I want:
Here is the code I am using:
proc tabulate data=ex1;
class age level group;
var age;
table age='Age Category',
mean=' '*group=''*level=''*F=10./ RTS=13.;
run;
Thanks!
You can certainly get close to that. You can't really get in 'one' cell, it needs to write each thing out to a different cell, but theoretically with some complex formatting (probably using CSS) you could remove the borders.
You can't list the same variable (here, age) in both VAR and CLASS, but since you're just doing percents, you don't need to use MEAN - you should just use N and COLPCTN. If you're dealing with already summarized data, you may need to do this differently - if so then post an example of your dataset (but that wouldn't work in PROC FREQ either without a FREQ statement).
data have;
do _t = 1 to 100;
age = ceil(3*rand('Uniform'));
group = floor(2*rand('Uniform'));
level = floor(5*rand('Uniform'));
output;
end;
drop _t;
run;
proc tabulate data=have;
class age level group;
table age='Age Category',
group=''*level=''*(n='n' colpctn='p')*F=10./ RTS=13.;
run;
This puts N and P (n and column %) in separate adjacent cells inside a single level.

how to customize proc freq to deal with missing values

I have the following code
data work.customBins;
retain fmtname 'bins' type 'n';
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
proc format library=work cntlin=work.customBins; run;
proc freq data=work.myData;
table variable /missing;
format variable bins.;
run;
This code works properly. My only issue is that if a bin, for example -1.45 to -1.40, doesn't contain any values, proc freq disregards it. I want the cumulative frequency of the previous bin to be displayed for bins that have no values, for example:
-1.50 to -1.45 cumulative Freq = 2%
-1.45 to -1.40 has no values, but the cumulative Freq for this should also be 2%
I have also tried doing this
data work.combined;
set work.myData (in=a) work.customBins (in=b);
if a then cont=1;
if b then cont=0;
run;
proc freq data=work.combined;
table variable /missing;
format variable bins.;
weight cont/zeros;
run;
But this also does not work.
myData just contains a single variable called variable, which holds decimal numbers in the range of -2.45 to 2.45.
Here is a working variant:
data work.customBins;
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
proc sql;
create table want as
select difference, count(variable) as count
from customBins left join mydata
on binStart < variable <= binEnd
group by difference
order by binStart;
quit;
proc freq data=want order=data;
tables difference;
weight count / zeros;
run;
Regarding your first variant: are you sure that your PROC FORMAT works as expected? The dataset used in the CNTLIN= option should have variables named START, END and LABEL, not arbitrarily named ones. Anyway, it wouldn't work, because PROC FREQ uses only the values that actually occur in the mydata dataset, no matter how many other labels you defined in your format.
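For reference, a control dataset in the shape CNTLIN= expects might look like the sketch below. It rebuilds the question's bins and renames the columns on output. The EEXCL column and the integer loop counter are my additions, not part of the original code: EEXCL='Y' excludes each range's end value so that adjacent bins do not overlap, and the counter avoids accumulating floating-point error in the bin boundaries.

```sas
/* Same bins as in the question, renamed to START / END / LABEL on output */
data work.customBins (rename=(binStart=start binEnd=end difference=label));
  retain fmtname 'bins' type 'n' eexcl 'Y';   /* eexcl='Y': range is start <= x < end */
  do i = 0 to 99;
    binStart = round(-2.5 + i*0.05, 0.01);
    binEnd = round(binStart + 0.05, 0.01);
    difference = catx(' to ', put(binStart, 5.2), put(binEnd, 5.2));
    output;
  end;
  drop i;
run;

proc format library=work cntlin=work.customBins;
run;
```

This only makes the format itself well defined. Empty bins will still be missing from the PROC FREQ output, which is why the join-based variant above is needed.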