Rolling up data in SAS - sas

Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!

You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;

Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;

Related

Proc transpose in SAS with multiple observations in var variable

I have a dataset that I want to tranpose from long to wide. I have:
**ID **Question** Answer**
1 Referral to a
1 Referral to b
1 Referral to d
2 Referral to a
2 Referral to c
4 Referral to a
6 Referral to a
6 Referral to c
6 Referral to d
What I want the tranposed dataset to look like:
**ID **Referral to**
1 a, b, d
2 a, c
4 a
6 a, c, d
I've tried to transpose the data, but the resulting dataset only contains 1 of the responses from the answer column, not all of them.
Code I've been using:
proc transpose data=test out=test2 let;
by ID;
id Question;
var Answer; run;
The dataset has hundreds of thousands of rows with dozens of variables that are exactly the same as the 'Referral to' example. How can make it so the tranposed wide dataset contains all of the answers to the Question in the same cell and not just one? Any help would be appreciated.
Thank you.
Here's two methods you can use in this case.
The first uses a data step approach, which is a single step. The second is more dynamic and uses a PROC TRANSPOSE + CATX() after the fact to create the combined variable. Note the use of PREFIX option in the transpose procedure to make the variables easier to identify and concatenate.
*create sample data for demonstration;
data have;
infile cards dlm='09'x;
input OrgID Product $ States $;
cards;
1 football DC
1 football VA
1 football MD
2 football CA
3 football NV
3 football CA
;
run;
*Sort - required for both options;
proc sort data=have;
by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
set have;
by orgID;
length combined $100.;
retain combined;
if first.orgID then
combined=states;
else
combined=catx(', ', combined, states);
if last.orgID then
output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
by orgID;
var states;
run;
data want_option2;
set wide;
length combined $100.;
combined=catx(', ', of state_:);
run;

Need same output in sas

So I have the following data:
Data Cricket;
input match $;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
; run;
Need Output:
Match Count
IndVsPak 3
NezVsAus 2
WesVsPak 1
WesVsAus 1
Please help with code how many ways we get the above output?
Try this:
Data Cricket;
input match $;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
;
run;
/*standardise team order within each match - easier to do in data step*/
data temp /view = temp;
set cricket;
team1 = substr(match,1,3);
team2 = substr(match,6,3);
call sortc(of team:);
match_sorted = cats(team1,'Vs',team2);
run;
proc sql noprint;
create table want as
select match_sorted, count(match_sorted) as freq
from temp
group by match_sorted
order by freq descending
;
quit;
Output:
match_
sorted freq
IndVsPak 3
AusVsNez 2
AusVsWes 1
PakVsWes 1
Here's my attempt at doing this entirely in proc sql:
proc sql noprint;
create table want as
select
ifc(
team1 < team2,
cats(team1, 'Vs', team2),
cats(team2, 'Vs', team1)
) as match_sorted length=8,
count(calculated match_sorted) as freq
from (
select
substr(match,1,3) as team1,
substr(match,6,3) as team2
from cricket
)
group by match_sorted
order by freq descending
;
quit;
N.B. this uses a calculated field - a bit of SAS-specific sql functionality. You could eliminate this by setting the whole thing up as a sub-query that produces match_sorted, or you could flatten the query and use calculated fields for everything.
Good day, In SAS (almost) everything is done via PROCS. Kind of macros performing actions.
In this case I suggest using Proc freq
Data Cricket;
input match $10.;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
; run;
proc freq data=Cricket noprint;
table match / out= freqs ;
run;
You can see the output by removing the noprint-option.
This will also work if you are more comfortable using SQL:
PROC SQL;
SELECT match, count(*) AS cnt FROM cricket GROUP BY match;
QUIT;

How to transpose my data on sas by observation on data step

I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.

SAS - Survey Select - Selecting Different Sample Size per Stratum

I have a list of financial advisors and I need to pull 4 samples per advisor but catch is in those 4 samples I need to force 2 mortgages, 1 loan, 1 credit card lets say.
Is there a way in the Survey select statement to set the specific number of samples to pull per stratum? I know you can stratify on 1 category and set it as a equal number. I was hoping I could use a mapping of employee names + the number of samples left to pull for each category and have survey select utilize that to pull in a dynamic way.
I'm using this as an example but this only stratifies on employee first and gives me 4 per employee. I would need to further stratify on Product type and set that to a specific sample size per product.
proc surveyselect data=work.Emp_Table_Final
method=srs n=4 out=work.testsample SELECTALL;
strata Employee_No;
run;
Thanks i know it might sound complicated, but if i know its possible then i can google the rest
Yes, you can have a dataset be the target of the n option. That dataset must:
Contain the strata variables as well as a variable SAMPSIZE or _NSIZE_ with the number to select
Have the same type and length as the strata variables
Be sorted by the strata variables
Have an entry for every strata variable value
See the documentation for more details.
data sample_counts;
length sex $1;
input sex $ _NSIZE_;
datalines;
F 5
M 3
;;;;
run;
proc sort data=sashelp.class out=class;
by sex;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex;
run;
For two variables it's the same, you just need two variables in the sample_counts. Of course it makes it a lot more complicated, and you may want to produce this in an automated fashion.
proc sort data=sashelp.class out=class;
by sex age;
run;
data sample_counts;
length sex $1;
input sex $ age _NSIZE_;
datalines;
F 11 1
F 12 1
F 13 1
F 14 1
F 15 1
M 11 1
M 12 1
M 13 1
M 14 1
M 15 1
M 16 0
;;;;
run;
/* or do it in an automated way*/
data sample_counts;
set class;
by sex age; *your strata;
if first.age then do; *do this once per stratum level;
if age le 15 then _NSIZE_ = 1; *whatever your logic is for defining _NSIZE_;
else _NSIZE_=0;
output;
end;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex age;
run;

How to sort by formatted values

proc sort data=sas.mincome;
by F3 F4;
run;
Proc sort doesn't sort the dataset by formatted values, only internal values. I need to sort by two variables prior to a merge. Is there anyway to do this with proc sort?
I don't think you can sort by formatted values in proc sort, but you can definitely use a simple proc SQL procedure to sort a dataset by formatted values. proc SQL is similar to the data step and proc sort, but is more powerful.
The general syntax of proc sql for sorting by formatted values will be:
proc sql;
create table NewDataSet as
select variable(s)
from OriginalDataSet
order by put(variable1, format1.), put(variable2, format2.);
quit;
For example, we have a sample data set containing the names, sex and ages of some people and we want to sort them:
proc format;
value gender 1='Male'
2='Female';
value age 10-15='Young'
16-24='Old';
run;
data work.original;
input name $ sex age;
datalines;
John 1 12
Zack 1 15
Mary 2 18
Peter 1 11
Angela 2 24
Jack 1 16
Lucy 2 17
Sharon 2 12
Isaac 1 22
;
run;
proc sql;
create table work.new as
select name, sex format=gender., age format=age.
from work.original
order by put(sex, gender.), put(age, age.);
quit;
Output of work.new will be:
Obs name sex age
1 Mary Female Old
2 Angela Female Old
3 Lucy Female Old
4 Sharon Female Young
5 Jack Male Old
6 Isaac Male Old
7 John Male Young
8 Zack Male Young
9 Peter Male Young
If we had used proc sort by sex, then Males would have been ranked first because we had used 1 to represent Males and 2 to represent Females which is not what we want. So, we can clearly see that proc sql did in fact sort them according to the formatted values (Females first, Males second).
Hope this helps.
Because of the nature of formats, SAS only uses the underlying values for the sort. To my knowledge, you cannot change that (unless you want to build your own translation table via PROC TRANTAB).
What you can do is create a new column that contains the formatted value. Then you can sort on that column.
proc format library=work;
value $test 'z' = 'a'
'y' = 'b'
'x' = 'c';
run;
data test;
format val $test.;
informat val $1.;
input val $;
val_fmt = put(val,$test.);
datalines;
x
y
z
;
run;
proc print data=test(drop=val_fmt);
run;
proc sort data=test;
by val_fmt;
run;
proc print data=test(drop=val_fmt);
run;
Produces
Obs val
1 c
2 b
3 a
Obs val
1 a
2 b
3 c