I have the following proc report
proc report data=sashelp.class;
col
sex
age
weight
;
define sex / group;
define age / group;
define weight / analysis sum;
run;
However I do not want to show the sum of weight. Instead I would like to have the proportion of the grouped sum. So first row should be 6.23%. How can I achieve this?
Now I have found a workaround:
proc sql noprint;
CREATE TABLE class AS
SELECT a.*
,b.sumweight
FROM sashelp.class a
LEFT JOIN (SELECT sex, sum(weight) as sumweight
FROM sashelp.class
GROUP BY sex
) b
ON a.sex=b.sex
;
quit;
proc report data=class;
col
sex
age
weight
sumweight
perc
;
define sex / group;
define age / group;
define weight / analysis sum;
define sumweight / analysis mean noprint;
define perc / computed format=percent6.2;
compute perc;
perc = weight.sum/sumweight.mean;
endcomp;
run;
But maybe there is a solution without additional proc sql step...
Related
I'm attempting to generate an automated report that combines counts, row percentages, and chi-squared p-values from two-way proc freq output for a series of variables.
I'd like the counts and percentages for Variable A to be displayed under separate headers for each category of Variable B.
I'm almost there, but running the following test code on the sashelp.cars dataset produces a report that has offset rows.
Is it possible to consolidate the rows by Cylinder values so I don't have so many empty cells in the table?
proc freq data=sashelp.cars;
tables origin*cylinders / chisq totpct outpct list out=freqpct(keep=cylinders origin count pct_row);
output out=chisq(keep=N P_PCHI) all;
run;
data freqpct;
set freqpct;
var=1;
run;
data chisq;
set chisq;
var=1;
run;
proc sql;
create table chisq_freqpct
as select *
from freqpct a
inner join
chisq b
on a.var=b.var;
quit;
proc report data=chisq_freqpct;
column cylinders origin,(count pct_row) N P_PCHI;
define cylinders / display;
define origin / across;
define count / display;
define pct_row / display;
define N / group;
define P_PCHI / group;
run;
You can use / group for cylinders.
Example:
data chisq_freqpct;
if _n_ = 1 then set chisq;
set freqpct;
run;
title "sashelp.cars";
proc format;
value blank low-high = ' ';
proc report data=chisq_freqpct split=' ';
column cylinders origin,(count pct_row) N p_pchi;
define cylinders / group ;
define origin / across;
define N / across;
define p_pchi / across;
compute n; call define (8, 'format', 'blank.'); endcomp;
compute p_pchi; call define (9, 'format', 'blank.'); endcomp;
run;
The across for N and P_PCHI places their values in the header.
You could instead have placed the values in macro variables and resolved those in a title statement or grouped header text.
Use GROUP for cylinder and MAX or MIN for N and P_PCHI.
Only attach the N and P_CHI values to the first observation. Which means you either need to exclude the missing values of CYLINDERS and ORIGIN in the PROC FREQ step or add the MISSING keyword to the PROC REPORT step.
proc freq data=sashelp.cars noprint;
* where 0=cmiss(origin,cylinders);
tables origin*cylinders / chisq outpct out=freqpct(keep=cylinders origin count pct_row);
output out=chisq(keep=N P_PCHI ) all;
run;
data chisq_freqpct;
if _n_ = 1 then set chisq;
else call missing(of _all_);
set freqpct;
run;
options missing=' ';
proc report data=chisq_freqpct split=' ' missing;
column cylinders origin,(count pct_row) n p_pchi;
define cylinders / group ;
define origin / across;
define n / max;
define p_pchi / max;
run;
options missing='.';
I am a beginner so my knowledge is lacking. I have a data set consisting of the following columns:
Subject
Age
Height
Weight
I wish to create a table such that for every 1 person i have three rows called Age, Height, Weight.
I have tried to use Proc Tabulate :
proc tabulate data=new;
class person;
var NEWCOLUMN;
table person,NEWCOLUMN;
run;
However i am getting an error because the new column is not the correct type.
You can pivot the data per person and REPORT or TABULATE the cell values.
Example:
proc transpose data=sashelp.class out=pivoted;
by name;
var age height weight;
where name <= 'C';
run;
ods html file='output.html' style=plateau;
options nodate nonumber nocenter;
title; footnote;
proc report data=pivoted;
column name _name_ col1;
define name / ' ' order order=data;
define _name_ / ' ';
define col1 / ' ';
run;
proc tabulate data=pivoted;
class name _name_ / order=data;
var col1;
table name*_name_='', col1=''*min='' / nocellmerge;
run;
ods html close;
Output
proc tabulate data=D.Arena out=work.Arena ;
class Row1 Column1/ order=freq ;
table Row1,Column1 ;
run;
after running this i received these results and now i want to restrict the columns to only 5 variables
Use a 'where' statement to restrict the col1 values being tabulated.
You can restrict based on a value property such as starts with the letter A
where col1 =: 'A';
You can restrict based on a value list:
where col1 in ('Apples', 'Lentils', 'Oranges', 'Sardines', 'Cucumber');
Sample data:
data have;
call streaminit(123);
array col1s[50] $20 _temporary_ (
"Apples" "Avocados" "Bananas" "Blueberries" "Oranges" "Strawberries" "Eggs" "Lean beef" "Chicken breasts" "Lamb" "Almonds" "Chia seeds" "Coconuts" "Macadamia nuts" "Walnuts" "Asparagus" "Bell peppers" "Broccoli" "Carrots" "Cauliflower" "Cucumber" "Garlic" "Kale" "Onions" "Tomatoes" "Salmon" "Sardines" "Shellfish" "Shrimp" "Trout" "Tuna" "Brown rice" "Oats" "Quinoa" "Ezekiel bread" "Green beans" "Kidney beans" "Lentils" "Peanuts" "Cheese" "Whole milk" "Yogurt" "Butter" "Coconut oil" "Olive oil" "Potatoes" "Sweet potatoes" "Vinegar" "Dark chocolate"
);
do row1 = 1 to 20;
do _n_ = 1 to 1000;
col1 = col1s[ceil(rand('uniform',50))];
x = ceil(rand('uniform',250));
output;
end;
end;
run;
Frequency tabulation, also showing ALL counts
* col1 values shown in order by value;
proc tabulate data=have;
class row1 col1;
table ALL row1,col1;
run;
* col1 values shown in order by ALL frequency;
proc tabulate data=have;
class row1;
class col1 / order=freq;
table ALL row1,col1;
run;
* Letter T col1 values shown in order by ALL frequency;
proc tabulate data=have;
where col1 =: 'T';
class row1;
class col1 / order=freq;
table ALL row1,col1;
run;
A top 5 only list of Col1s would require a step that determines which col1s meet that criteria. A list of those col1s can be used as part of a where in clause.
* determine the 5 col1s with highest frequency count;
proc sql noprint outobs=5;
select
quote(col1) into :top5_col1_list separated by ' '
from
( select col1, count(*) as N from have
group by col1
)
order by N descending;
quit;
proc tabulate data=have;
where col1 in (&top5_col1_list);
class row1;
class col1 / order=freq;
table ALL row1,col1;
run;
Col1s in order of value
Col1s in order of frequency
T Col1s
Top 5 Col1s
I have been trying to create a demographic table like below this but I can't seem append the different tables. Please advise on where I can make adjustments in the code.
Group A Group B
chort 1 cohort 2 cohort 3 subtotal cohort 4 cohort 5 cohort 6 subtotal
Age
n
mean
sd
median
min
Gender
n
female
male
Race
n
white
asian
hispanic
black
My Code:
PROC FORMAT;
value content
1=' '
2='Age'
3='Gender'
4='Race'
value sex
1=' n'
2=' female'
3=' male';
value race
1=' n'
2=' white'
3=' asian'
4=' hispanic'
5=' black';
value stat
1=' n'
2=' Mean'
3=' Std. Dev.'
4=' Median'
5=' Minimum';
RUN;
DATA testtest;
SET test.test(keep = id group cohort age gender race);
RUN;
data tottest;
set testtest;
output;
if prxmatch('m/COHORT 1|COHORT 2|COHORT 3/oi', cohort) then do;
cohort='Subtotal';
output;
end;
if prxmatch('m/COHORT 4|COHORT 5|COHORT 6/oi', cohort) then do;
cohort='Subtotal';
output;
end;
run;
data count;
if 0 then set testtest nobs=npats;
call symput('npats',put(npats,1.));
stop;
run;
proc freq data=tottest;
tables cohort /out=patk0 noprint;
tables cohort*sex /out=sex0 noprint;
tables cohort*race /out=race0 noprint;
run;
PROC MEANS DATA = testtest n mean std min median;
class cohort;
VAR age;
RUN;
I know that I would have to transpose it and out it in a report. But before I do that, how do I get the variable out of my proc means, proc freq, etc?
I'm using PROC REPORT to generate a report of weighted sums. There are 2 columns that need to be summarized, both with the MEAN statistic. On top of that, I want to output the total weight.
I have 2 issues.
I cannot seem to get the title on each sum to reflect the variable
being summed.
I need a different format for each column.
Here is some sample data:
data test;
format lev1-lev3 $3. weight percent10.2 duration 6.2 convexity 6.4;
informat weight percent10.2 duration 6.2 convexity 6.4;
input lev1 lev2 lev3 weight duration convexity;
datalines;
A C H 16.11% 3.21 0.6182
A C I 3.83% 9.06 1.2244
A D J 7.67% 2.21 3.4010
A D K 16.90% 3.98 0.0303
B E L 2.68% 1.88 1.9515
B E M 16.68% 4.36 3.1851
B F N 20.79% 2.64 0.1145
B F O 15.34% 5.55 2.4408
;
run;
I've tried a number of ways to define things in PROC REPORT. Here is one of many:
proc report data=test nowd out=report;
column lev1 lev2 lev3 duration,(SUMWGT MEAN) convexity,(Mean);
weight weight;
define lev1 / group;
define lev2 / group;
define lev3 / group;
define duration / 'Duration' ;
define sumwgt / 'Weight' format=percent10.2;
define mean / '' format=6.2;
define convexity / 'Convexity';
*define mean / 'Convexity' format=6.4;
break before lev1 / summarize ;
break before lev2 / summarize ;
rbreak before / summarize;
run;
My ultimate goal would be something like:
Lev1 Lev2 Lev3 Weight Duration Convextiy
100.00% 3.88 1.3943
A 44.51% 3.83 0.9267
...
I've also played with PROC TABULATE but I am less of a fan of the tables it presents.
Example TABULATE mess:
PROC TABULATE DATA=WORK.test;
VAR duration convexity;
CLASS LEV1 / ORDER=UNFORMATTED MISSING;
CLASS LEV2 / ORDER=UNFORMATTED MISSING;
CLASS LEV3 / ORDER=UNFORMATTED MISSING;
TABLE
/* Row Dimension */
ALL={LABEL="+"}
LEV1*(
ALL={LABEL="+"}
LEV2*(
ALL={LABEL="+"}
LEV3 ) )
,
/* Column Dimension */
duration={LABEL="Weight"}*SumWgt={LABEL=""}*f=percent10.2
duration={LABEL="Duration"}*Mean={LABEL=""}*f=6.2
convexity={LABEL="Convexity"}*Mean={LABEL=""}*f=6.4;
WEIGHT weight;
RUN;
I think you'll have challenges getting exactly what you want from PROC REPORT. Maybe Cynthia#SAS could figure it out, I don't know, but getting the row headers right in particular will be extremely challenging.
I would suggest pre-processing the means (using PROC MEANS or similar) and then REPORTing that result. Very easy to do.
This may be close to what you want, for example:
proc means data=test;
class lev1 lev2 lev3;
var duration convexity;
weight weight;
types () lev1 lev1*lev2 lev1*lev2*lev3;
output out=test_out
sumwgt(duration)=sumwgt mean(duration)= mean(convexity)=;
run;
proc report data=test_out;
columns lev1-lev3 sumwgt duration convexity;
define lev1/order missing;
define lev2/order missing;
define lev3/order missing;
define sumwgt/display format=percent9.2;
define duration/display format=6.2;
define convexity/display format=6.4;
run;