Incorrect SAS freq table output with ODS Excel - sas

I'm trying to export SAS PROC FREQ results to an Excel file (xlsx), using Enterprise Guide 7.12 with SAS 9.4 on Windows.
The following code example:
ODS EXCEL
file='C:\Download\example.xlsx'
STYLE=HtmlBlue
OPTIONS ( sheet_interval="none" sheet_name="Results" );
data example;
input ins_cd$ 1-2 decl_aatrim $ 4-8 prog $ 10-13 compt $ 15-18;
cards;
02 20153 7646 XC12
02 20153 7646 AB02
02 20153 7646 CC13
02 20153 9999
02 20153 7595 PS03
02 20153 7595 PS04
02 20153 6080 XC12
02 20153 6080 XC15
02 20153 6080 CC18
02 20153 6080 DC08
;
proc sort data=example;
by ins_cd decl_aatrim prog compt;
run;
data example2;
set example;
by ins_cd decl_aatrim prog compt;
retain rank 1;
if first.prog=1 then do;
test=first.prog;
rank=1;
end;
else rank=rank+1;
run;
proc freq data=example2;
tables prog*compt;
run;
ods EXCEL close;
outputs the frequency table as expected in the Results Viewer, with four rows per prog value, like so (truncated to save space; the row labels are rough translations):
compt
AB02 CC13 CC18 [...]
prog
6080 Freq 0 0 1 1 0 0 1 1
Pct 0.00 0.00 11.11 11.11 0.00 0.00 11.11 11.11
row pct 0.00 0.00 25.00 25.00 0.00 0.00 25.00 25.00
col.pct 0.00 0.00 100.00 100.00 0.00 0.00 50.00 100.00
7595 Freq 0 0 0 0 0 [...]
[...]
but when the xlsx file produced by ODS is opened in Excel, the frequency table looks like this:
prog compt
Freq
Pct
row pct
col.pct AB02 CC13 CC18 DC08 PS03 PS04 XC12 XC15 Total
6080 0 0 1 1
0.00 0 11.11 [...]
0.00 0.00 25.00
0.00 0.00 100.00
7595 0
0.00 [...]
That is, the four cells with the frequency statistics are merged into a single cell (and a single row) for each prog value.
This note (http://support.sas.com/kb/32/115.html) seems to be related to my problem, but its proposed CROSSLIST solution does not give the desired output in Excel either.
Any ideas? Thanks!

This is caused by how PROC FREQ builds its crosstabulation table, and the ODS HTML output (what you refer to as the Results Viewer) is no different. Notice that it has:
<td class="r t stacked_cell data"><table width="100%" border="0" cellpadding="7" cellspacing="0">
<tr>
<td class="r t data top_stacked_value">1</td>
</tr>
<tr>
<td class="r t data bottom_stacked_value">11.11</td>
</tr>
</table></td>
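inside each cell. So each main table cell contains a nested mini-table holding the frequency, row percent, column percent and total percent (in the snippet above, which appears to come from the Total row at the bottom of the table, only two values are stacked). That nested table is why the statistics end up in a single cell in Excel.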
You can solve this a number of ways. One option is, as Reeza notes in another answer, to use PROC TABULATE.
Another option is to write your own table template with PROC TEMPLATE. That is how PROC FREQ's crosstabulation table is produced, after all, so you could look at how its template is defined and perhaps modify it.
A third option would be to postprocess this output; since the resulting table has all of the data you want, just not in rows, you could easily write a VBA routine to change the format to the desired one.

If you can, use PROC TABULATE instead. You have more control over the table and its appearance anyway.
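A minimal sketch of the PROC TABULATE route, assuming the question's sample data (only prog and compt are kept) and its output path; the statistic labels and formats are illustrative choices, and this has not been checked in Enterprise Guide 7.12. Because the statistics are crossed with prog in the row dimension, each one should land in its own row rather than in a stacked cell, and PRINTMISS with MISSTEXT='0' fills empty combinations with 0, as PROC FREQ does.

```sas
/* Sample data from the question; only prog and compt are needed here */
data example;
  infile datalines truncover;
  input prog $ 1-4 compt $ 6-9;
  datalines;
7646 XC12
7646 AB02
7646 CC13
9999
7595 PS03
7595 PS04
6080 XC12
6080 XC15
6080 CC18
6080 DC08
;
run;

ods excel file='C:\Download\example.xlsx' style=HtmlBlue
  options(sheet_interval="none" sheet_name="Results");

/* Crossing prog with the statistics puts each statistic on its own row */
proc tabulate data=example;
  class prog compt;
  table prog*(n='Frequency'*f=8.
              pctn='Percent'*f=8.2
              rowpctn='Row Pct'*f=8.2
              colpctn='Col Pct'*f=8.2),
        compt all='Total'
        / printmiss misstext='0';
run;

ods excel close;
```

As with PROC FREQ, the observation with a missing compt is excluded, so percentages are based on the nine complete rows.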

Related

PROC TABULATE with ALL and row percentage

I am not able to get a row with ALL using row percentages. I would like the first row to give the count and percentage for the column totals. So the percentage under Borderline for ALL should display 1861*100/5049 = 36.9%, and under Desirable 1399*100/5049 = 27.7%. Currently it displays 100%, and I need to change that.
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table (all smoking_status sex),
(all chol_status)*(n*f=8. colpctn) ;
run;
The output is
                  All            Cholesterol Status
                                 Borderline     Desirable      High
                       N  ColPctN      N  ColPctN      N  ColPctN      N  ColPctN
All                 5049   100.00   1861   100.00   1399   100.00   1789   100.00  <- change the cholesterol % to denominator 5049
Smoking Status
Heavy (16-25)       1029    20.38    383    20.58    285    20.37    361    20.18
Light (1-5)          563    11.15    192    10.32    174    12.44    197    11.01
Moderate (6-15)      563    11.15    217    11.66    170    12.15    176     9.84
Non-smoker          2436    48.25    886    47.61    655    46.82    895    50.03
Very Heavy (> 25)    458     9.07    183     9.83    115     8.22    160     8.94
Sex
Female              2770    54.86    959    51.53    803    57.40   1008    56.34
Male                2279    45.14    902    48.47    596    42.60    781    43.66
I think the closest you can get is this:
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table all*rowpctn=' ' (smoking_status sex)*(n=' '*f=8. colpctn=' '),
(all) (chol_status) ;
run;
That's not what you want, and it doesn't look very good. It is the only option PROC TABULATE offers, though, because TABULATE won't let you put statistics in both the row and the column dimensions; you have to pick one.
PROC REPORT will do what you want, with some effort. Alternatively, you could use a two-step process: output the TABULATE results to a data set, fix the percentages there, then print the data set again with either PROC REPORT or PROC TABULATE, this time without asking it to compute percentages.
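A sketch of the first step of that two-step approach, assuming TABULATE's default behaviour of dropping any observation with a missing CLASS value (which is why the grand total is 5049). It computes the ALL-row counts and their percentage of the grand total in a data set; PROC FREQ is used here in place of TABULATE's OUT= data set to avoid relying on the variable names TABULATE generates. The data set names heart_nomiss and all_row are illustrative.

```sas
/* TABULATE drops any observation with a missing CLASS variable, so keep
   only complete rows to reproduce the 5049 grand total */
data heart_nomiss;
  set sashelp.heart(keep=chol_status smoking_status sex);
  if cmiss(chol_status, smoking_status, sex) = 0;
run;

/* ALL-row counts (COUNT) and their percentage of the grand total (PERCENT) */
proc freq data=heart_nomiss noprint;
  tables chol_status / out=all_row;
run;

/* These values would replace the 100.00 entries before re-printing */
proc print data=all_row noobs label;
  var chol_status count percent;
  format percent 8.2;
run;
```

With the counts from the question, this should give 1861 (36.86), 1399 (27.71) and 1789 (35.43).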

Applying cutoff to data set with IDs

I am using SAS and managed to run PROC LOGISTIC, which gives me a classification table like this:
Classification Table
Prob    Correct          Incorrect        Percentages
Level   Event  Non-Event Event  Non-Event Correct Sensitivity Specificity False POS False NEG J
0       33     0         328    0         9.1     100         0           90.9      .         99
0.02    33     62        266    0         26.3    100         18.9        89        0         117.9
0.04    31     162       166    2         53.5    93.9        49.4        84.3      1.2       142.3
0.06    26     209       119    7         65.1    78.8        63.7        82.1      3.2       141.5
How do I include IDs for the rows in lib.POST_201505_PRED (created below) that have a predicted probability of at least 0.6?
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
model BUYER =
age
tenure
usage
payment
loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505 out=lib.POST_201505_PRED outroc=lib.POST_201505_ROC;
run;
I've been reading the documentation and searching online but haven't found anything on it. I must be searching for the wrong keywords, as I presume this is a frequently used process.
You just need an ID statement to tell SAS which variable identifies your observations:
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
id ID;
model BUYER = age tenure usage payment loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505
out=lib.POST_201505_PRED
outroc=lib.POST_201505_ROC;
run;
Now your output contains all you need.
For instance, to print the IDs that were assigned a probability of at least 0.6 of being a BUYER:
proc print data=lib.POST_201505_PRED (where=(P_1 GE 0.6));
var ID P_1;
run;
You will find ID statements like id yourKey; throughout the statistical procedures in SAS, for instance:
proc univariate data=psydata.stroop;
id Subject;
var ReadTime;
run;
** will report the most extreme values of ReadTime, identified by Subject;
It turns out I just had to include the IDs in lib.POST_201505.
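A small self-contained sketch of that resolution: the SCORE statement's OUT= data set carries over the variables of the data set being scored, so an ID present there appears alongside P_1. The data sets train and score_in and the predictor x are hypothetical stand-ins for lib.POST_201503 and lib.POST_201505, with simulated values.

```sas
/* Hypothetical training data with an ID variable and a 0/1 response */
data train;
  call streaminit(201503);
  do ID = 1 to 200;
    x = rand('normal');
    BUYER = (rand('uniform') < 1 / (1 + exp(-(-1 + 2*x))));
    output;
  end;
run;

/* Hypothetical data to score: it has ID but no BUYER */
data score_in;
  call streaminit(201505);
  do ID = 1001 to 1050;
    x = rand('normal');
    output;
  end;
run;

proc logistic data=train descending;
  model BUYER = x;
  /* OUT= keeps every variable from score_in, including ID */
  score data=score_in out=scored;
run;

/* IDs with a predicted probability of at least 0.6 of BUYER=1 */
proc print data=scored(where=(P_1 ge 0.6));
  var ID x P_1;
run;
```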

How to subset automatically in SAS?

I am new to SAS, so this might be a silly question.
Assume there are several datasets with a similar structure but different column names. I want to get new datasets with the same number of rows but only a subset of the columns.
In the following example, A_auto and B_auto are the original datasets and SubA and SubB are what I want. What is an efficient way of deriving SubA and SubB?
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord       4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer         4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox          6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica     5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla    3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco       6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
DATA SubA;
set A_auto;
keep A_make A_price;
RUN;
DATA SubB;
set B_auto;
keep B_make B_price;
RUN;
Here's my new answer. This introduces quite a few concepts, but all are necessary to complete this task.
First, I would store the required partial variable names (the suffixes common to all datasets) in a new dataset. This keeps them all in one place and makes them easier to change if required.
The next step is to create a regular expression (regex) search string that combines all the names, separated by a pipe (|), which is the regex alternation ("or") operator. I've also appended a $ to each name, which ensures that only variables ending with those partial names are selected.
SELECT ... INTO :macroname is how you create macro variables within PROC SQL.
Then I set up a macro that extracts the specific variable names for the current dataset and uses them to create a view (as in my original answer).
The DICTIONARY.COLUMNS table referenced in the PROC SQL step is part of a metadata library that contains information on all active libraries, tables, columns and so on, so it is a good way to find out what the actual variable names are (based on the regex search string created earlier).
You won't need the PROC PRINT in your code; I just put it in to show that everything works as expected.
Let me know if this works for you.
/* create initial datasets */
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord       4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer         4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox          6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica     5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla    3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco       6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
/* create dataset containing partial name of variables to keep */
data keepvars;
input part_name :$20.;
datalines;
_make
_price
;
run;
/* create regular expression search string from partial names */
proc sql noprint;
select
cats(part_name,'$') /* '$' matches end of string */
into
:name_str separated by '|' /* '|' is an 'or' search operator in regular expressions */
from
keepvars;
quit;
%put &name_str.; /* print search string to log */
/* macro to create views from datasets */
%macro create_views (dsname, vwname); /* inputs are dataset name being read in and view name being created */
/* extract specific variable names to be kept, based on search string */
proc sql noprint;
select
name
into
:vars separated by ' '
from
dictionary.columns
where
libname = 'WORK'
and memname = upper("&dsname.")
and prxmatch("/&name_str./",strip(name))>0; /* prxmatch is regular expression search function */
quit;
%put &vars.; /* print variables to keep to log */
/* create views */
data &vwname. / view=&vwname.;
set &dsname. (keep=&vars.);
run;
/* test view by printing */
proc print data=&vwname.;
run;
%mend create_views;
/* run macro for each dataset */
%create_views(A_auto, SubA);
%create_views(B_auto, SubB);
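To see what the combined search string does, here is a small check that hard-codes the pattern the PROC SQL step above should build from keepvars (_make$|_price$) and tests it against a few variable names. B_pricex is a made-up name included only to show the effect of the $ anchor; regex_check is an illustrative data set name.

```sas
/* Test the combined pattern against a few candidate variable names */
data regex_check;
  length name $32;
  do name = 'A_make', 'A_price', 'A_mpg', 'B_price', 'B_pricex';
    /* 1 if the name ends in _make or _price, else 0 */
    matched = (prxmatch('/_make$|_price$/', strip(name)) > 0);
    output;
  end;
run;

proc print data=regex_check noobs;
run;
```

A_make, A_price and B_price should match; A_mpg and B_pricex should not.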

SAS transpose using values as column names and summarize

I'm trying to transpose data using values as variable names and to summarize numeric data by group. I tried PROC TRANSPOSE and PROC REPORT (ACROSS) but couldn't get it to work. The only way I know to do this is with a DATA step using IF/ELSE and SUM, but then the result doesn't adapt automatically when the data changes.
For example, I have this data set:
school name subject picked saving expenses
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
and I need it in one line: the sum of picked for each student name, then the sum of picked for each subject, and finally three columns with the totals of picked, saving and expenses:
school john ruby peter noname math spanish geography nosubject picked saving expenses
raget 15 15 2 0 13 5 2 12 32 22700 8200
Is it possible for this to adjust automatically if a new student or subject is added?
It's a little difficult because you're summarising at more than one level, so I've used PROC SUMMARY and chosen different _TYPE_ values. See below:
data have;
infile datalines;
input school $ name $ subject : $10. picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;
run;
proc summary data=have;
class school name subject;
var picked saving expenses;
output out=want1 sum(picked)=picked sum(saving)=saving sum(expenses)=expenses;
run;
proc transpose data=want1 (where=(_type_=5)) out=subs (where=(_NAME_='picked'));
by school;
id subject;
run;
proc transpose data=want1 (where=(_type_=6)) out=names (where=(_NAME_='picked'));
by school;
id name;
run;
proc sql;
create table want (drop=_TYPE_ _FREQ_ name subject) as
select
n.*,
s.*,
w.*
from want1 (where=(_TYPE_=4)) w,
names (drop=_NAME_) n,
subs (drop=_NAME_) s
where w.school = n.school
and w.school = s.school;
quit;
I've also tested this code by adding new schools, names and subjects and they do appear in the final table. You'll note that I haven't hardcoded anything (e.g. no reference to math or John), so the code is dynamic enough.
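The _TYPE_ codes used above come from the order of the CLASS variables: with CLASS school name subject, school contributes 4, name 2 and subject 1, so 4 is school alone, 6 is school*name and 5 is school*subject. A sketch on the same sample data, in which a TYPES statement asks PROC SUMMARY for only those three combinations and sum= with no names keeps the original variable names; want_types is an illustrative data set name.

```sas
data have;
  input school $ name $ subject :$10. picked saving expenses;
  datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;
run;

/* Only school (4), school*name (6) and school*subject (5) are output */
proc summary data=have;
  class school name subject;
  types school school*name school*subject;
  var picked saving expenses;
  output out=want_types sum=;
run;

proc print data=want_types;
run;
```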
PROC REPORT is an interesting alternative, particularly if you want the printed output rather than as a dataset. You can use ODS OUTPUT to get the output dataset, but it's messy as the variable names aren't defined for some reason (they're "C2" etc.). The printed output of this one is a little messy also as the header rows don't line up, but that can be fixed with some finagling if that's desired.
data have;
input school $ name $ subject :$10. picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;;;;
run;
ods output report=want;
proc report nowd data=have;
columns school (name subject),(picked) picked=picked2 saving expenses;
define picked/analysis sum ' ';
define picked2/analysis sum;
define saving/analysis sum ;
define expenses/analysis sum;
define name/across;
define subject/across;
define school/group;
run;

How to use proc transpose on variables that contain numbers separated with _?

Hi, I am new to SAS and have a question about PROC TRANSPOSE.
I have this data
Input
School Name State School Code 26/07/2009 02/08/2009 09/08/2009 16/08/2009
Northwest High IL 14556 06 06 06 06
Georgia High GA 147 05 05 05 06
Macy Hgh TX 45456 NA NA NA NA
The desired output is
School Name State School Code Date Absent
Northwest High IL 14556 26/07/2009 6
Northwest High IL 14556 02/08/2009 6
Northwest High IL 14556 09/08/2009 6
Northwest High IL 14556 16/08/2009 6
Georgia High GA 147 26/07/2009 5
Georgia High GA 147 02/08/2009 5
Georgia High GA 147 09/08/2009 5
Georgia High GA 147 16/08/2009 6
Macy Hgh TX 45456 26/07/2009 NA
Macy Hgh TX 45456 02/08/2009 NA
Macy Hgh TX 45456 09/08/2009 NA
Macy Hgh TX 45456 16/08/2009 NA
This is the code I have written:
proc sort data=work.input;
by School_Name State School_Code;
run;
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
run;
I get a message saying "No variables to transpose". I think the issue is that the column names are dates: when the data is imported into SAS, a column such as 26/07/2009 becomes the variable _26_07_2009, and SAS does not seem to recognize these.
So I don't get the desired output. Note that there are about 185 date columns, and they are all separate variables.
Thanks
The following transpose does the job:
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
var _:;
run;
Notice the _: notation: it selects all variables whose names start with an underscore and transposes them.
As I mentioned in the link in my comments earlier, if you do not explicitly specify the variables you want to transpose, PROC TRANSPOSE by default transposes the numeric variables that are not in the BY list. However, since your date variables were read in as character (because of the NAs), it reported NOTE: No variables to transpose.
You can use the following to convert the date and absent columns into numeric columns.
data inputModified2;
set inputModified;
format date date9.;
date = input(compress(tranwrd(_name_,'_','')), ddmmyy8.);
if col1 NE 'NA' then absent = input(col1, 8.);
else absent=.;
drop _name_ col1;
run;
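An end-to-end sketch of the answer on the three example rows. The layout of the imported data (character date columns named like _26_07_2009, the lengths, and the comma-delimited datalines) is an assumption about what the import produced; compress(_name_, '_') is used as a shorter equivalent of the compress/tranwrd combination above, and long and long2 are illustrative data set names.

```sas
/* Hypothetical reconstruction of the imported data: the date columns are
   character because some values are 'NA' */
data input;
  length School_Name $20 State $2 School_Code 8
         _26_07_2009 _02_08_2009 _09_08_2009 _16_08_2009 $2;
  infile datalines dlm=',';
  input School_Name State School_Code
        _26_07_2009 _02_08_2009 _09_08_2009 _16_08_2009;
  datalines;
Northwest High,IL,14556,06,06,06,06
Georgia High,GA,147,05,05,05,06
Macy Hgh,TX,45456,NA,NA,NA,NA
;
run;

proc sort data=input;
  by School_Name State School_Code;
run;

/* var _: selects every variable whose name starts with an underscore */
proc transpose data=input out=long;
  by School_Name State School_Code;
  var _:;
run;

data long2;
  set long;
  format date ddmmyy10.;
  /* '_26_07_2009' -> '26072009' -> SAS date value */
  date = input(compress(_name_, '_'), ddmmyy8.);
  if col1 ne 'NA' then absent = input(col1, 8.);
  else absent = .;
  drop _name_ col1;
run;
```

The result has one row per school and date (12 rows here), with absent set to missing where the source value was NA.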