I have a simple dataset that I would like in a single output table. I would like the variables 'age' and 'sex' stacked against a third variable, 'q16'. An example of the expected/needed output is attached below. I also need to weight the table values using the field 'weight'.
Have tried various versions of proc tabulate, freq, report, but have not come up with a solution. What I'm hoping to get out of this post is a fresh look on my problem and see if the community has any other solutions that I can try.
data survey;
infile datalines dsd;
input age : $20. sex : $10. q16 : $20. weight;
datalines;
18 to 29,Male,VERY GOOD, 0.3984
46 to 64,Male,POOR, 1.6694
18 to 29,Female,POOR, 0.9696
46 to 64,Female,POOR, 0.6078
65 and over,Female,EXCELLENT, 1.0301
65 and over,Female,POOR, 0.7763
;
needed layout
As you can see in the attached image, it's two variables stacked vertically, but I need those two by a third variable called 'q16'. At this point, I'm not looking for design as much as replicating the table in the image with weighted values.
TABULATE Procedure can produce all the numbers, however, there are no features for reporting multiple statistics in a single cell corresponding to a dimensional crossing -- each number gets it's own cell.
A variable that has statistics computed has to be specified in the VAR statement, and counts or percents are for CLASS variables.
For example:
data have;
do person = 1 to 3218 + 1991;
length status $5;
status = ifc (ranuni(123) < 1991/(1991+3218), 'Dead', 'Alive');
if ranuni(123) < 0.001 then age = .; else age = floor(28+35*ranuni(123));
length gender $6;
if status = 'Dead'
then gender = ifc(ranuni(123) < 896 /(896+1095), 'Female', 'Male');
else gender = ifc(ranuni(123) < 1977/(1977+1241), 'Female', 'Male');
output;
end;
run;
proc tabulate data=have;
class status gender;
var age;
table
age * (N NMISS MEAN STD MIN MAX MEDIAN QRANGE)
gender * (N COLPCTN)
,
status;
run;
To get the exact table in your image you could compute the results via one or more statistics procedures and produce the output table via data _null_ and the ODSOUT component object.
Related
I want to use SAS and eg. proc report to produce a custom table within my workflow.
Why: Prior, I used proc export (dbms=excel) and did some very basic stats by hand and copied pasted to an excel sheet to complete the report. Recently, I've started to use ODS excel to print all the relevant data to excel sheets but since ODS excel would always overwrite the whole excel workbook (and hence also the handcrafted stats) I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
ID NO-ID Total
Count 3 2 5
Age (mean) 27 45.5 34.4
Count by Center:
A 2 1 3
B 0 1 1
A 1 0 1
It seems, proc report only takes variables as columns but not a subsetted data set (ID NE .; ID =''). Of course I could just produce three reports with three subsetted data sets and print them all separately but I hope there is a way to put this in one table.
Is proc report the right tool for this and if so how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to achieve an almost match to what I wanted. First if all, I had to introduce a new variable vID (valid ID, 0 not valid, 1 valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate as suggested by #Reeza in the comments to build a table which pretty much resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way without introducing the new variable at all and just use the SAS counters for missing and non-missing observations.
UPDATE:
#Reeza pointed out to use the proc format to assign a value to missing/non-missing ID data. In combination with the missing option (prints missing values) in proc tabulate this delivers the output without introducing a new variable:
proc format;
value $ id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;
I have a dataset as following
AGE GENDER
11 F
12 M
13
15
now I want to create a dataset as following
Basically I want to have the variable names in another column.
or may be in one column like
VAR Value
AGE 11
AGE 12
AGE 13
AGE 15
GENDER F
GENDER M
I have tried normal proc transpose, but looks like it doesnt give the desired result.
This is not a strictly speaking a transpose. Transpose implies that you want to transform some columns into rows or vice-versa, which is not the case here. That sample data transposed would look like:
VAR VALUE1 VALUE2 VALUE3 VALUE4
----------------------------------
AGE 11 12 13 14
GENDER F M
What you're trying to do here instead is have all your variables in the same column and add a 'label' column.
You could have your desired result with a data step:
data have;
infile datalines missover
;
input age $ gender $;
datalines;
11 F
12 M
13
15
;
run;
data want;
length var $6;
set have(keep=age rename=(age=value) in=a)
have(keep=gender rename=(gender=value) where=(value is not missing) in=b);
if b then var='GENDER';
else if a then var='AGE';
run;
Note the where= dataset option on the second part of the set statement since your desired result does not include the missing values that you have for gender in your sample data.
Alternatively, you could do it with two proc transpose:
proc transpose data=have out=temp name=VAR;
var age gender;
run;
proc transpose data=temp out=want(drop=_name_ rename=(col1=VALUE) where=(VALUE is not missing));
var col1 col2 col3 col4;
by var;
run;
One solution is to introduce a new unique row identifier and use that in a BY statement. This will let TRANSPOSE pivot the data values in each row.
data have;
rownum + 1; * new variable for pivoting by row via BY statement;
input AGE GENDER $;
datalines;
11 F
12 M
13 .
15 .
run;
proc transpose data=have out=want(drop=_name_ rename=(col1=value) where=(value ne ''));
by rownum;
var age gender;
run;
In Proc TRANPOSE the default new column names are prefixed with COL and indexed by the number of occurrences of a value 1..n in the incoming rows. The artificial rownum and BY statement ensure the pivoted data has only one data column. Note: the prefix can be specified with option PREFIX=, and additionally the pivoted data column names can come from the data itself if you use the ID statement.
Mixed data types can be a problem because the new column will use character representation of underlying data values. So dates will come out as numbers and numeric that were initially formatted will lose their format.
If you are trying to make a JSON transmission I would recommend researching the JSON library engine or the JSON package of Proc DS2.
If you are looking to create a report with the data in this transposed shape I would recommend Proc TABULATE.
I have a list of financial advisors and I need to pull 4 samples per advisor but catch is in those 4 samples I need to force 2 mortgages, 1 loan, 1 credit card lets say.
Is there a way in the Survey select statement to set the specific number of samples to pull per stratum? I know you can stratify on 1 category and set it as a equal number. I was hoping I could use a mapping of employee names + the number of samples left to pull for each category and have survey select utilize that to pull in a dynamic way.
I'm using this as an example but this only stratifies on employee first and gives me 4 per employee. I would need to further stratify on Product type and set that to a specific sample size per product.
proc surveyselect data=work.Emp_Table_Final
method=srs n=4 out=work.testsample SELECTALL;
strata Employee_No;
run;
Thanks i know it might sound complicated, but if i know its possible then i can google the rest
Yes, you can have a dataset be the target of the n option. That dataset must:
Contain the strata variables as well as a variable SAMPSIZE or _NSIZE_ with the number to select
Have the same type and length as the strata variables
Be sorted by the strata variables
Have an entry for every strata variable value
See the documentation for more details.
data sample_counts;
length sex $1;
input sex $ _NSIZE_;
datalines;
F 5
M 3
;;;;
run;
proc sort data=sashelp.class out=class;
by sex;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex;
run;
For two variables it's the same, you just need two variables in the sample_counts. Of course it makes it a lot more complicated, and you may want to produce this in an automated fashion.
proc sort data=sashelp.class out=class;
by sex age;
run;
data sample_counts;
length sex $1;
input sex $ age _NSIZE_;
datalines;
F 11 1
F 12 1
F 13 1
F 14 1
F 15 1
M 11 1
M 12 1
M 13 1
M 14 1
M 15 1
M 16 0
;;;;
run;
/* or do it in an automated way*/
data sample_counts;
set class;
by sex age; *your strata;
if first.age then do; *do this once per stratum level;
if age le 15 then _NSIZE_ = 1; *whatever your logic is for defining _NSIZE_;
else _NSIZE_=0;
output;
end;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex age;
run;
I have a dataset of about 800 observations. I want to get the frequency of 14 variables. I want to get the frequency of these variable by shape (an example). There are 3 different shapes.
An example of doing this one time would obviously be:
proc freq; tables color; by shape;run;
However, I do not want 42 frequency tables. I want one frequency table that has the list of 14 variables on the left side. The top heading will have shape1 shape2 shape3 with the frequencies of each variable underneath them.
It would look like I transposed the data sets by percentage and then stacked them on top of each other.
I have several sets of combinations where I need to do this. I have about 5 different groups of variables and I need to make tables using 3 different by groups (necessitating about 15 tables). The first example I discussed is one example of such groups.
Any help would be appreciated!
Using proc means and proc transpose. I give you some example. You can add more categories.
proc means data=sashelp.class nway n;
class sex age;
output out=class(drop=_freq_ _type_) n=freq;
run;
proc transpose data=class out=class(drop=_name_) prefix=AGE;
by sex;
var freq;
id age;
run;
data class_sum;
set class;
array a(*) age:;
age_sum = sum(of age:);
do i = 1 to dim(a);
a(i) = a(i) / age_sum;
end;
drop i;
run;
I need a column a total as an observation.
Input Dataset Output Dataset
------------- --------------
data input; Name Mark
input name$ mark; a 10
datalines; b 20
a 10 c 30
b 20 Total 60
c 30
;
run;
The below code which I wrote is working fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest if there is any better way of doing this.
PROC REPORT is a good solution for doing this. This summarizes the entire report - other options give you the ability to summarize in groups.
proc report out=outds data=input nowd;
columns name mark;
define name/group;
define mark/analysis sum;
compute after;
name = "Total";
line "Total" mark.sum;
endcomp;
run;
Your code is fine in general, however the issue might be in terms of performance. If the input table is huge, you end up rewriting full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you are reading full table but writing only a single row to existing table.