I am trying to stack multiple variables vertically in a PROC REPORT. I am tied to PROC REPORT over TABULATE or FREQ, so a solution using REPORT would be preferable.
I've tested out other solutions, but unable to find success using my data.
proc format library = library ;
value AGE
1 = '18 to 29'
2 = '30 to 45'
3 = '46 to 64'
4 = '65 and over'
9 = 'NA' ;
value SEX
1 = 'Male'
2 = 'Female'
9 = 'NA' ;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
DATA CHSS2017_sashelp (keep = q16 sex age);
SET CHSS2017.CHSS2017_sashelp;
FORMAT q16 q16f.;
FORMAT sex SEX.;
FORMAT age AGE.;
RUN;
proc report data = CHSS2017_sashelp nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
The expected result would be a stacked REPORT table with multiple variables:
expected output
If you are fine with repeated headings/variable names then you can use two report procedures. See code below, I have used sample sas data and customized the formats a bit:
proc format ;
value AGE
1-10 = '1 to 10'
11-12 = '11 to 12'
13-High = '13 and over'
;
value $SEXv
'M' = 'Male'
'F' = 'Female'
;
value Q16F
1 = 'EXCELLENT'
2 = 'VERY GOOD'
3 = 'GOOD'
4 = 'FAIR'
5 = 'POOR'
8 = 'DON''T KNOW'
9 = 'NA/REFUSED' ;
run;
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data class;
set sashelp.class;
q16 = %RandBetween(1, 9);
FORMAT q16 q16f.;
FORMAT sex $SEXv.;
FORMAT age AGE.;
run;
proc report data = class nowindows headline;
columns age n, (q16);
define age/ group;
define q16 / across;
run;
proc report data = class nowindows headline;
columns sex n, (q16);
define sex / group;
define q16 / across;
run;
the macro RandBetween is only for this code, you don't have to use it
Related
I am attempting to determine the population number and frequency of total caloric intake among two groups of individuals (males equal to or above 3000 calories and those males equal to or below 1800 calories). However, when I attempt to break up my dataset using SAS it has an issue computing separate categories for the two groups even though in my code I request that the male variables break up between TOTALcat=1 and TOTALcat=2. Below is my code. That said, what further steps do I need to take to determine both the population number and frequency of these groups within my larger dataset? Thanks
libname lab "C:\Users\14015\Pictures\BST #1";
data nutrition;
set 'C:\Users\14015\Pictures\BST #1\nutrition';
run;
proc contents data = nutrition;
run;
proc format;
value sex
1 = 'MALE'
2 = 'FEMALE';
run;
data nutrition;
set nutrition;
CARBS = CARBS*4;
FAT = FAT*9;
PROTEIN = PROTEIN*4;
run;
data nutrition;
set nutrition;
TOTAL = sum(of CARBS FAT PROTEIN);
run;
proc print data = nutrition;
var TOTAL CARBS FAT PROTEIN SEX;
run;
data nutrition;
set nutrition;
if sex=1 and TOTAL>=3000 then TOTALcat=1;
else sex=1 and TOTAL<=1800 then TOTALcat=2;
run;
proc freq data=nutrition;
table TOTALcat;
run;
You can combine all of your steps into a single data step. Your else statement has a typo and should be:
else if(sex=1 and TOTAL<=1800)
data nutrition;
set lab.nutrition;
carbs = carbs*4;
fat = fat*9;
protein = protein*4;
total = sum(carbs, fat, protein);
if(sex=1 and TOTAL>=3000) then TOTALcat = 1;
else if(sex=1 and TOTAL<=1800) then TOTALcat = 2;
run;
proc freq data=nutrition;
where sex = 1;
table TOTALcat;
run;
Using sample data, this is the output:
The total population size is 34, with 21 in Group 1 and 13 in Group 2.
I want to use proc tabulate to display the proportions/ total count of some variables conditioned on their values - if I only want to access one variable at once I was able to achieve this by setting a specific format (please check the MWE). However, if I want to access two variables at once (like score_1 >= x OR score_2 >=y) I get nowhere.
MWE:
data have;
input score_1 score_2;
datalines;
2 7
4 4
7 2
;
proc format;
value cut_fmt
low - 2 = 'non-critical'
3 - high = 'critical'
;
run;
proc tabulate data = have missing;
format score_1 score_2 cut_fmt.;
class score_1 score_2;
keylabel N = ' ' ColPCTN = '%' all = 'Total';
table score_1 * N score_2 * N, all;
run;
I now would like to have an extra row where it says:
score_1 OR score_2
non-critical 0
critical 3
Is there a way to achieve this in proc tabulate (this shouldn't be possible with a format statement I guess) or would I need to create a new variable before and just use this one then?
Create a new variable and add it to the table statement.
You will also need classdata to ensure 0 counts are presented.
Example:
data have;
input score_1 score_2;
worst_case = max(score_1, score_2);
label worst_case = 'score_1 or score_2';
datalines;
2 7
4 4
7 2
;
proc format;
value cut_fmt
low - 2 = 'non-critical'
3 - high = 'critical'
;
data combinations;
score_1 = 2;
score_2 = 2;
worst_case = 2; output;
score_1 = 3;
score_2 = 3;
worst_case = 3; output;
run;
options missing='0';
proc tabulate data = have classdata=combinations;
class score_1 score_2 worst_case;
format score_1 score_2 worst_case cut_fmt.;
keylabel N = ' ' ColPCTN = '%' all = 'Total';
table score_1 score_2 worst_case, all;
run;
options missing='.';
I have a dataset in SAS in which the months would be dynamically updated each month. I need to calculate the sum vertically each month and paste the sum below, as shown in the image.
Proc means/ proc summary and proc print are not doing the trick for me.
I was given the following code before:
`%let month = month name;
%put &month.;
data new_totals;
set Final_&month. end=end;
&month._sum + &month._final;
/*feb_sum + &month._final;*/
output;
if end then do;
measure = 'Total';
&month._final = &month._sum;
/*Feb_final = feb_sum;*/
output;
end;
drop &month._sum;
run; `
The problem is this has all the months hardcoded, which i don't want. I am not too familiar with loops or arrays, so need a solution for this, please.
enter image description here
It may be better to use a reporting procedure such as PRINT or REPORT to produce the desired output.
data have;
length group $20;
do group = 'A', 'B', 'C';
array month_totals jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over month_totals;
month_totals = 10 + floor(rand('uniform', 60));
end;
output;
end;
run;
ods excel file='data_with_total_row.xlsx';
proc print noobs data=have;
var group ;
sum jan2020--dec2019;
run;
proc report data=have;
columns group jan2020--dec2019;
define group / width=20;
rbreak after / summarize;
compute after;
group = 'Total';
endcomp;
run;
ods excel close;
Data structure
The data sets you are working with are 'difficult' because the date aspect of the data is actually in the metadata, i.e. the column name. An even better approach, in SAS, is too have a categorical data with columns
group (categorical role)
month (categorical role)
total (continuous role)
Such data can be easily filtered with a where clause, and reporting procedures such as REPORT and TABULATE can use the month variable in a class statement.
Example:
data have;
length group $20;
do group = 'A', 'B', 'C';
do _n_ = 0 by 1 until (month >= '01feb2020'd);
month = intnx('month', '01jan2018'd, _n_);
total = 10 + floor(rand('uniform', 60));
output;
end;
end;
format month monyy5.;
run;
proc tabulate data=have;
class group month;
var total;
table
group all='Total'
,
month='' * total='' * sum=''*f=comma9.
;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
proc report data=have;
column group (month,total);
define group / group;
define month / '' across order=data ;
define total / '' ;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
Here is a basic way. Borrowed sample data from Richard.
data have;
length group $20;
do group = 'A', 'B';
array months jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over months;
months = 10 + floor(rand('uniform', 60, 1));
end;
output;
end;
run;
proc summary data=have;
var _numeric_;
output out=temp(drop=_:) sum=;
run;
data want;
set have temp (in=t);
if t then group='Total';
run;
I have a SAS code (SQL) that has to repeat for 25 times; for each month/year combination (see code below). How can I use a macro in this code?
proc sql;
create table hh_oud_AUG_17 as
select hh_key
,sum(RG_count) as RG_count_aug_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_aug_17
from basis_RG_oud
where valid_from_dt <= "01AUG2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
proc sql;
create table hh_oud_SEP_17 as
select hh_key
,sum(RG_count) as RG_count_sep_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_sep_17
from basis_RG_oud
where valid_from_dt <= "01SEP2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
If you use a data step to do this, you can put all the desired columns in the same output dataset rather than using a macro to create 25 separate datasets:
/*Generate lists of variable names*/
data _null_;
stem1 = "RG_count_";
stem2 = "loyabo_recht_";
month = '01aug2017'd;
length suffix $4 vlist1 vlist2 $1000;
do i = 0 to 24;
suffix = put(intnx('month', month, i, 's'), yymmn4.);
vlist1 = catx(' ', vlist1, cats(stem1,suffix));
vlist2 = catx(' ', vlist2, cats(stem2,suffix));
end;
call symput("vlist1",vlist1);
call symput("vlist2",vlist2);
run;
%put vlist1 = &vlist1;
%put vlist2 = &vlist2;
/*Produce output table*/
data want;
if 0 then set have;
start_month = '01aug2017'd;
array rg_count[2, 0:24] &vlist1 &vlist2;
do _n_ = 1 by 1 until(last.hh_key);
set basis_RG_oud;
by hh_key;
do i = 0 to hbound2(rg_count);
if valid_from_dt <= intnx('month', start_month, i, 's') <= valid_to_dt
then rg_count[1,i] = sum(rg_count[1,i],1);
end;
end;
do _n_ = 1 to _n_;
set basis_RG_oud;
do i = 0 to hbound2(rg_count);
rg_count[2,i] = rg_count[1,i] >= 2;
end;
end;
run;
Create a second data set that enumerates (is a list of) the months to be examined. Cross Join the original data to that second data set. Create a single output table (or view) that contains the month as a categorical variable and aggregates based on that. You will be able to by-group process, classify or subset based on the month variable.
data months;
do month = '01jan2017'd to '31dec2018'd;
output;
month = intnx ('month', month, 0, 'E');
end;
format month monyy7.;
run;
proc sql;
create table want as
select
month, hh_key,
sum(RG_count) as RG_count,
case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht
from
basis_RG_oud
cross join
months
where
valid_from_dt <= month <= valid_to_dt
group
by month, hh_key
order
by month, hh_key
;
…
/* Some analysis */
BY MONTH;
…
/* Some tabulation */
CLASS MONTH;
TABLE … MONTH …
WHERE year(month) = 2018;
This is a follow-up to my previous post on SO.
I am trying to produce a frequency table of demographics, including race, sex, and ethnicity. One table is a crosstab of race by sex for Hispanic participants in a study. However, there are no Hispanic participants thus far. So, the table will be all zeroes, but we still have to report it.
This can be done in R, but so far, I have found no solution for SAS. Example data is below.
data race;
input race eth sex ;
cards;
1 2 1
1 2 1
1 2 2
2 2 1
2 2 2
2 2 1
3 2 2
3 2 2
3 2 1
4 2 2
4 2 1
4 2 2
run;
data class;
do race = 1,2,3,4,5,6,7;
do eth = 1,2,3;
do sex = 1,2;
output;
end;
end;
end;
run;
proc format;
value frace 1 = "American Indian / AK Native"
2 = "Asian"
3 = "Black or African American"
4 = "Native Hawiian or Other PI"
5 = "White"
6 = "More than one race"
7 = "Unknown or not reported" ;
value feth 1 = "Hispanic or Latino"
2 = "Not Hispanic or Latino"
3 = "Unknown or Not reported" ;
value fsex 1 = "Male"
2 = "Female" ;
run;
***** ethnicity by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table eth, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex ;
proc tabulate data = race missing classdata=class ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
***** race by sex, for Hispanic only ;
***** log indicates that a logical page with only missing values has been deleted ;
***** Thanks SAS, you're a big help... ;
proc tabulate data = race missing classdata=class ;
where eth = 1 ;
class race eth sex ;
table race, sex / misstext = '0' printmiss;
format race frace. eth feth. sex fsex. ;
run;
I understand that the code really can't work because I'm selecting where eth is equal to 1 (there are no cases satisfying the condition...). Specifying the command to be run by eth doesn't work either.
Any guidance is greatly appreciated...
I think the easiest way is to create a row in the data that has the missing value. You could look at the following paper for suggestions as to how to do this on a larger scale:
http://www.nesug.org/Proceedings/nesug11/pf/pf02.pdf
PROC FREQ has the SPARSE option, which gives you all possible combinations of all variables in the table (including missing ones), but it doesn't look like that gives you exactly what you need.
Looks like our good friends at Westat have worked with this issue. A description of there solution is shown here.
The code is shown below for convenience, but please cite the original when referenced
PROC FORMAT;
value ethnicf
1 = 'Hispanic or Latino'
2 = 'Not Hispanic or Latino'
3 = 'Unknown (Individuals Not Reporting Ethnicity)';
value racef
1 = 'American Indian or Alaska Native'
2 = 'Asian'
3 = 'Native Hawaiian or Other Pacific Islander'
4 = 'Black or African American'
5 = 'White'
6 = 'More Than One Race'
7 = 'Unknown or Not Reported';
value gndrf
1 = 'Male'
2 = 'Female'
3 = 'Unknown or Not Reported';
RUN;
DATA shelldata;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
do ethcat = 1 to 2;
do ethlbl = 1 to 3;
do racelbl = 1 to 7;
do gender = 1 to 3;
output;
end;
end;
end;
end;
RUN;
DATA test;
input pt $ 1-3 ethlbl gender racelbl ;
cards;
x1 2 1 5
x2 2 1 5
x3 2 1 5
x4 2 1 5
x5 2 1 5
x6 2 2 2
x7 2 2 2
x8 2 2 5
x9 2 2 4
x10 2 2 4
RUN;
DATA enroll;
set test;
if ethlbl = 1 then ethcat = 1;
else ethcat = 2;
format ethlbl ethnicf. racelbl racef. gender gndrf.;
label ethlbl = 'Ethnic Category'
racelbl = 'Racial Categories'
gender = 'Sex/Gender';
RUN;
%MACRO TAB_WHERE;
/* PROC SQL step creates a macro variable whose */
/* value will be the number of observations */
/* meeting WHERE clause criteria. */
PROC SQL noprint;
select count(*)
into :numobs
from enroll
where ethcat=1;
QUIT;
/* PROC FORMAT step to display all numeric values as zero. */
PROC FORMAT;
value allzero low-high=' 0';
RUN;
/* Conditionally execute steps when no observations met criteria. */
%if &numobs=0 %then
%do;
%let fmt = allzero.; /* Print all cell values as zeroes */
%let str = ; /*No Cases in Subset - WHERE cannot be used */
%end;
%else
%do;
%let fmt = 8.0;
%let str = where ethcat = 1;
%end;
PROC TABULATE data=enroll classdata=shelldata missing format=&fmt;
&str;
format racelbl racef. gender gndrf.;
class racelbl gender;
classlev racelbl gender;
keyword n pctn all;
tables (racelbl all='Racial Categories: Total of Hispanic or Latinos'),
gender='Sex/Gender'*N=' ' all='Total'*n='' / printmiss misstext='0'
box=[LABEL=' '];
title1 font=arial color=darkblue h=1.5 'Inclusion Enrollment Report';
title2 ' ';
title3 font=arial color=darkblue h=1' PART B. HISPANIC ENROLLMENT REPORT:
Number of Hispanic or Latinos Enrolled to Date (Cumulative)';
RUN;
%MEND TAB_WHERE;
%TAB_WHERE
I found this paper to be very informative:
Oh No, a Zero Row: 5 Ways to Summarize Absolutely Nothing
The preloadfmt option in proc means (Method 5) is my favorite. Once you create the necessary formats it's not necessary to add dummy data. It's odd that they haven't yet added this option to proc freq.