Create table with frequency buckets in Base SAS

Below is a sample of my dataset:
City Days
Atlanta 10
Tampa 95
Atlanta 100
Charlotte 20
Charlotte 31
Tampa 185
I would like to break down "Days" into buckets of 0-30, 30-90, 90-180, 180+, such that the "buckets" are along the x-axis of the table, and the cities are along the y-axis.
I tried using PROC FREQ, but I don't have SAS/STAT. Is there any way to do this in base SAS?

I believe this is what you want. This is most certainly a "brute force" approach, but I think it outlines the concept correctly.
data have;
length city $9;
input city dayscount;
cards;
Atlanta 10
Tampa 95
Atlanta 100
Charlotte 20
Charlotte 31
Tampa 185
;
run;
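options validvarname=any; /* needed for name literals such as '0-30'n */
data want;
set have;
/* else-if with half-open ranges so a boundary value (30, 90, 180) lands in exactly one bucket */
if 0 <= dayscount < 30 then '0-30'n = dayscount;
else if 30 <= dayscount < 90 then '30-90'n = dayscount;
else if 90 <= dayscount < 180 then '90-180'n = dayscount;
else if dayscount >= 180 then '180+'n = dayscount;
drop dayscount;
run;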

One way to solve this is to use PROC FORMAT to assign each value to a bucket, and then PROC TRANSPOSE to produce the desired layout:
data city_day_split;
length city $12.;
input city dayscount;
cards;
atlanta 10
tampa 95
atlanta 100
charlotte 20
charlotte 31
tampa 185
;
run;
/****Assigning the buckets****/
proc format;
value buckets
0 - <30 = '0-30'
30 - <90 = '30-90'
90 - <180 = '90-180'
180 - high = 'gte180'
;
run;
data city_day_split;
set city_day_split;
day_bucket = put(dayscount,buckets.);
run;
proc sort data=city_day_split out=city_day_split;
by city;
run;
/****Making the Buckets as columns, City as rows and daycount as Value****/
proc transpose data=city_day_split out=city_day_split_1(drop=_name_);
by city;
id day_bucket;
var dayscount;
run;
My Output:
city       0-30  90-180  30-90  GTE180
Atlanta    10    100     .      .
Charlotte  20    .       31     .
Tampa      .     95      .      185
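Since the question asks for counts per bucket rather than the raw values, here is a minimal sketch of a crosstab. It relies on PROC FREQ being part of Base SAS (it does not need SAS/STAT). The data set name city_days and the format name daybkt are made up for this illustration, and the half-open bucket boundaries are an assumption about how the edges should be handled.

```sas
/* Sample data from the question (city_days is a hypothetical name) */
data city_days;
  length city $12;
  input city dayscount;
  datalines;
Atlanta 10
Tampa 95
Atlanta 100
Charlotte 20
Charlotte 31
Tampa 185
;
run;

/* Half-open ranges: 30 falls in '30-90', 90 in '90-180', 180 in '180+' */
proc format;
  value daybkt
    0   -< 30  = '0-30'
    30  -< 90  = '30-90'
    90  -< 180 = '90-180'
    180 - high = '180+';
run;

/* Cities as rows, formatted buckets as columns, cell counts only */
proc freq data=city_days;
  tables city * dayscount / norow nocol nopercent;
  format dayscount daybkt.;
run;
```

Applying the format inside PROC FREQ groups the values by bucket without creating a new variable, so no sort or transpose step is needed.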


Sgplot for multiple response variables

I have a dataset with over 50,000 records that looks like the following; ID, season(either high or low), bed_time and triage_time in minutes:
ID Season Bed_time Triage_time
1 high 34 68
2 low 44 20
3 high 90 14
4 low 71 88
5 low 27 54
I would like to create a GROUPED VERTICAL BAR CHART where both Bed_time and Triage_time are shown as medians and grouped by season on the X-axis:
| | |
| | | |
| | | | |
|-------------------------------
BT TT BT TT
High Low
I reckon I have to transpose the data and then plug into an SGPLOT, but I'm not quite sure how to do that to ensure that data can then be graphed.
proc sgplot data=mysas.projects;
vbar season/ stat=median
group=[Bed_time Triage_time] /*NEED FROM TRANSPOSED DATA*/
groupdisplay=cluster;
run;
quit;
Indeed, you will need to reshape the data from wide to long to use the group= option. You will also need to supply a response= option. Consider PROC TRANSPOSE for the reshaping:
*** POSTED DATA;
data time_data;
length Season $ 5;
input ID Season $ Bed_time Triage_time;
cards;
1 high 34 68
2 low 44 20
3 high 90 14
4 low 71 88
5 low 27 54
;
run;
*** RESHAPE WIDE TO LONG;
proc transpose
data = time_data
out = time_data_long
name = time_group;
by ID Season;
run;
*** CLEAN UP OUTPUT;
data time_data_long;
set time_data_long;
label time_group = "Time Group";
rename col1 = value;
run;
proc sgplot data=time_data_long;
vbar season / response=value stat=median
group = time_group
groupdisplay=cluster;
run;

Doing Principal Components in SAS Using a Holdout and to Score New Data

I am performing Principal Components Analysis in SAS Enterprise Guide and wish to compute factor/component scores on some holdout.
KeepCombinedLR is my primary source of truth. I have another dataset, with the exact same variables, that I would like to be scored without including it in the actual factor analyses.
proc factor data = KeepCombinedLR
simple
method = prin
priors = one
rotate = varimax reorder
mineigen = 1
nfactors = 25
out = FactorScores;
var var1--var40;
run;
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;
PROC SCORE will score your data for you, using your 'holdout' data set.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_score_examples01.htm&docsetVersion=14.3&locale=en
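Applied to the question's own setup, the same pattern might look like the sketch below. The names Holdout, FactorStats, and HoldoutScores are hypothetical (the question does not name the second data set), and the PROC FACTOR options are copied from the question with only SCORE and OUTSTAT= added.

```sas
/* Fit on the primary data only. SCORE asks PROC FACTOR to write the
   scoring coefficients into the OUTSTAT= data set. */
proc factor data=KeepCombinedLR
    method=prin
    priors=one
    rotate=varimax reorder
    mineigen=1
    nfactors=25
    score
    outstat=FactorStats;   /* hypothetical name */
  var var1--var40;
run;

/* Apply the stored coefficients to the holdout data.
   Holdout and HoldoutScores are hypothetical names. */
proc score data=Holdout score=FactorStats out=HoldoutScores;
  var var1--var40;
run;
```

The holdout rows never enter the factor analysis; they are only multiplied by the coefficients estimated from KeepCombinedLR. By default PROC SCORE also standardizes the new data with the means and standard deviations stored in the SCORE= data set, which here come from the primary data.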

SAS, transpose a table

I want to transform my SAS table from data Have to data want.
I feel I need to use PROC TRANSPOSE but could not figure out how to do it.
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
data Want;
input Variable $11.0 MAX MIN SUM;
datalines;
Variable_1 6 0 29
Variable_2 7 1 87
Variable_3 11 3 30
Variable_4 23 5 100
;
You are right, PROC TRANSPOSE is the solution:
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
/*sort it by the stat var*/
proc sort data=Have; by Stat; run;
/*the ID statement names the new columns after the values of Stat*/
proc transpose data=have out=want name=Variable;
id stat;
run;
proc print data=want; run;

SAS PROC REPORT how to display analysis variables as rows?

I don't know where to start with this. I've tried listing the columns in every possible order but they are always listed horizontally. The dataset is:
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;
run;
Here's an example of the proc report code for just one analysis variable:
proc report data = job2;
columns apply_count year;
define year / across " ";
define apply_count / analysis "Applied" format = comma8.;
run;
Ideally the final report would look like this:
2012 2013 2014
Applied 349 338 354
Interv. 52 69 70
Hired 12 20 18
Inter % 15% 20% 20%
Hired % 23% 29% 26%
I don't know if this is the best way to do this.
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;;;;
run;
proc transpose data=job2 out=job3;
by year;
run;
data job3;
set job3;
length y atype $8;
y = propcase(scan(_name_,1,'_'));
atype = scan(_name_,-1,'_');
if atype eq 'mean' then substr(y,8,1)='%';
run;
proc print;
run;
proc report data=job3 list;
columns atype y year, col1 dummy;
define atype / group noprint;
define y / group order=data ' ';
define year / across ' ';
define dummy / noprint;
define col1 / format=12. ' ';
compute before atype;
xatype = atype;
endcomp;
compute after atype;
line ' ';
endcomp;
compute col1;
if xatype eq 'mean' then do;
call define('_C3_','format','percent12.');
call define('_C4_','format','percent12.');
call define('_C5_','format','percent12.');
end;
endcomp;
run;

Problems aggregating data by variable in SAS

I have data that looks like this:
ID FileSource Age MamUlt ProcDate Name
223 Facility 35 M 19591 SWEDISH
223 Facility 35 M 19592 SWEDISH
223 Facility 35 U 19592 SWEDISH
223 Facility 35 U 19593 SWEDISH
223 Non-Facility 35 M 19594 RADIA
223 Non-Facility 35 U 19594 RADIA
What I am trying to do is to combine that data (for each ID in the data set) to look like this:
ID Age MAMs ULTs SameDate
223 35 3 3 2
So, for each ID, I need the total times "M" and "U" show up and how many times they show up on the same date; twice in this sample.
Here is what I have so far:
data ImageTotals;
set ImageClaims;
by ID;
retain ID MAMs ULTs SameDate;
if first.ID then do;
MAMs = 0;
ULTs = 0;
MamDate = .;
UltDate = .;
SameDate = 0;
end;
if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
if MamDate = UltDate and MamDate ^= . then do; SameDate = SameDate+1; end;
if last.ID;
keep ID MAMs ULTs SameDate;
run;
Any advice? This solves the count problems but not the SameDate problem (still coming up as zero for this instance).
You can use a DOW loop to do the aggregation in a data step. The data must be sorted by ID and PROCDATE. Within each date, count how many times M or U appears. Then use those per-date counts to aggregate at the ID level and to test whether both appeared on the same date. The AGE variable is simply kept, so it will have the value from the last record for that ID. (The code tests a variable named PROC, which holds the M/U flag; in the question's data that variable is MamUlt.)
data counts ;
do until (last.id);
m=0;
u=0;
do until (last.procdate);
set imageclaims;
by id procdate;
m= sum(m,proc='M');
u= sum(u,proc='U');
end;
MAMs=sum(mams,m);
ULTs=sum(ults,u);
SameDate=sum(samedate,m and u);
end;
keep id age mams ults samedate ;
run;
I think this is probably a SQL problem (not my specialty), but since you started on a DATA step solution I took a stab at both. I also added more test data.
data ImageClaims;
input id age Proc $1. ProcDate;
cards;
223 35 M 19591
223 35 M 19592
223 35 U 19592
223 35 U 19593
223 35 M 19594
223 35 U 19594
224 35 M 19591
224 35 M 19592
224 35 M 19593
224 35 M 19593
224 35 M 19594
224 35 U 19595
225 35 M 19592
225 35 U 19592
225 35 U 19593
225 35 M 19593
225 35 M 19594
225 35 U 19594
;
run;
For the DATA step approach, create counters for MAMs, ULTs, and MAMULTs (Mam and Ult on the same day). Note that because I use a sum statement for these counters (MAMs++1), they are implicitly retained.
data ImageTotals (keep=id Age MAMs ULTs MAMULTs);
set ImageClaims;
by ID ProcDate;
retain HaveMam HaveUlt; *Count vars are implicitly retained by sum statement;
if first.ID then do;
MAMs=0; *count of mammograms;
ULTs=0; *count of ultrasounds;
MAMULTs=0; *count of mammograms and ultrasounds on same date;
end;
if first.ProcDate then do;
HaveMam=0; *indicator for have a mammogram or not on that date;
HaveUlt=0; *indicator for have an ultrasound or not on that date;
end;
if Proc='M' then do;
HaveMam=1; *set mammogram indicator (for that date);
MAMs++1; *increment counter;
end;
else if Proc='U' then do;
HaveUlt=1; *set ultrasound indicator (for that date);
ULTs++1; *increment counter;
end;
if last.ProcDate then do;
MAMULTs++(HaveMam=1 and HaveUlt=1); *increment MamUlts counter if had both on same date;
end;
if last.id;
run;
For the SQL solution I use a subquery that counts MAMs, ULTs, and MAMULTs by ID and ProcDate, and an outer query that then sums these by ID. There is probably a better SQL solution, but I think this works.
proc sql;
create table ImageTotals as
select id
,max(age) as age /*arbitrary use of max; age is constant within id*/
,sum(MAMs) as MAMs
,sum(ULTs) as ULTs
,sum(MAMULTs) as MAMULTs
from (
select id
,procdate
,max(age) as age
,sum(Proc='M') as MAMs
,sum(Proc='U') as ULTs
,count(distinct(Proc))=2 as MAMULTs
from ImageClaims
group by id,ProcDate
)
group by id
;
quit;
proc print;
run;
Work.ImageTotals I get from both steps is:
Obs id age MAMs ULTs MAMULTs
1 223 35 3 3 2
2 224 35 5 1 0
3 225 35 3 3 3
Thinking this could be solved with proc sql (count/group by) once you take Q's suggestion, unless I am misinterpreting the complexity here...was going to post some code, but will let you take a crack at it first...
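As for why the original attempt always reports SameDate = 0: MamDate and UltDate are not in the RETAIN statement, so they are reset to missing on every row and can never both hold a date at the same time. A minimal sketch of that one-line fix follows. It assumes the data are sorted by ID and ProcDate and that each date has at most one M and one U per ID; with repeated M or U rows on one date it can overcount, which the per-date approaches above avoid. The input step only recreates the question's sample with the columns that matter.

```sas
/* Question's sample, reduced to the columns used here */
data ImageClaims;
  input ID MamUlt $ ProcDate;
  datalines;
223 M 19591
223 M 19592
223 U 19592
223 U 19593
223 M 19594
223 U 19594
;
run;

data ImageTotals;
  set ImageClaims;
  by ID;
  /* MamDate and UltDate must be retained as well; otherwise they are
     set back to missing at the start of every iteration */
  retain MAMs ULTs SameDate MamDate UltDate;
  if first.ID then do;
    MAMs = 0;
    ULTs = 0;
    SameDate = 0;
    MamDate = .;
    UltDate = .;
  end;
  if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
  if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
  /* Counts a match on the row that completes an M/U pair for a date */
  if MamDate = UltDate and MamDate ^= . then SameDate = SameDate + 1;
  if last.ID;
  keep ID MAMs ULTs SameDate;
run;
```

On the six sample rows this yields MAMs = 3, ULTs = 3, and SameDate = 2, matching the desired output in the question.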