Say I have data called data and a variable age
and I am looking for a way to divide the variable representing age, in to a new variable showing the age group meaning,18-21,21-24 and so on.
I know how to do it with proc sql and if, but I understand there is an efficient way to do it using the round function.
Does anybody know how you can divide a numeric variable using round?
thanks.
Method1: Using If/Else
Method2: Using Proc format
I have shared the code for method2 as you already know method1
proc format;
value age
low - 9 = '0 - 9'
10 - 19 = '10 - 19'
20 - 29 = '20 - 29'
30 - 39 = '30 - 39'
40 - 49 = '40 - 49'
50 - 59 = '50 - 59'
60 - high = '60 +++'
;
run;
/**1) If you just want to look at age in bucket form **/
data abc;
set abc;
format age age.;
run;
/**2) If you want to create another variable(say age_bucket) **/
data abc;
set abc;
age_bucket = put(age,age.);
run;
Let me know in case of any queries.
Related
I have a column for dollar-amount that I need to break apart into $1000 segments - so $0-$999, $1,000-$1,999, etc.
I could use Case/When, but there are an awful lot of groups I would have to make.
Is there a more efficient way to do this?
Thanks!
You could just use arithmetic. For example you could convert them to upper limit of the $1,000 range.
up_to = 1000*ceil(dollar/1000);
Let's make up some example data:
data test;
do dollar=0 to 5000 by 500 ;
up_to = 1000*ceil(dollar/1000);
output;
end;
run;
Results:
Obs dollar up_to
1 0 0
2 500 1000
3 1000 1000
4 1500 2000
5 2000 2000
6 2500 3000
7 3000 3000
8 3500 4000
9 4000 4000
10 4500 5000
11 5000 5000
Absolutely. This is a great use case for user-defined formats.
proc format;
value segment
0-<1000 = '0-1000'
1000-<2000 = '1000s'
2000-<3000 = '2000s'
;
quit;
If the number is too high to write out, do it with code!
data segments;
retain
fmtname 'segment'
type 'n' /* numeric format */
eexcl 'Y' /* exclude the "end" match, so 0-1000 excluding 1000 itself */
;
do start = 0 to 1e6 by 1000;
end = start + 1000;
label = catx('- <',start,end); * what you want this to show up as;
output;
end;
run;
proc format cntlin=segments;
quit;
Then you can use segment = put(dollaramt,segment.); to assign the value of segment, or just apply the format format dollaramt segment.; if you're just using it in PROC SUMMARY or somesuch.
And you can combine the two approaches above to generate a User Defined Format that will bin the amounts for you.
Create bins to set up a user defined format. One drawback of this method is that it requires you to know the range of data ahead of time.
Use a user defined function via PROC FCMP.
Use a manual calculation
I illustrate version of the solution for 1 & 3 below. #2 requires PROC FCMP but I think using it a plain data step can be simpler.
data thousands_format;
fmtname = 'thousands_fmt';
type = 'N';
do Start = 0 to 10000 by 1000;
END = Start + 1000 - 1;
label = catx(" - ", put(start, dollar12.0), put(end, dollar12.0));
output;
end;
run;
proc format cntlin=thousands_format;
run;
data demo;
do i=100 to 10000 by 50;
custom_format = put(i, thousands_fmt.);
manual_format = catx(" - ", put(floor(i/1000)*1000, dollar12.0), put((ceil(i/1000))*1000-1, dollar12.0));
output;
end;
run;
I'd like to calculate the number of patients currently within an Emergency Room by hour and I'm having trouble conceptualizing an efficient code.
I have two time variables, 'Check In Time' and 'Release Time'. These date/time variables are obviously arbitrary and the 'release time' variable will come after the 'check in time variable'.
I would like the output for a given day to look something like this:
Hour Midnight 1am 2am 3am 4am.....
# of Pts 34 56 89 23 29
So for example, at 1am there were 56 patients currently in the ED -when considering both checkin and release times.
My initial thought is to:
1) round the time variables
2) Write a code a code the looks something like this...
data EDTimesl;
set EDDATA;
if checkin = '1am' and release = '2am' then OneAMToTwoAM = 1;
if checkin = '1am' and release = '3am' then OneAMToTwoAM = 1;
if checkin = '1am' and release = '3am' then TwoAMToThreeAM = 1;
....
run;
This, however, gives me pause because I feel there is a more efficient method!
Thanks in advance!
I found a code online that might answer the question! Please see below:
data have (keep=admitdate disdate);
/* generate some admission and discharge date time variables*/
year=2015; /* for example all of the admits are in 2015*/
format admitdate disdate datetime20.;
do day= 1 to 20;
do month=1 to 12;
hour = floor(24*ranuni(4445));
min = floor(50*ranuni(1234));
date = mdy(month,day,2015);
admitdate=dhms(date,hour,min,0);
/* random duration of stay*/
duration = 60 + floor(3000*ranuni(7777));
disdate = intnx('minute',admitdate,duration);
output;
end;
end;
run;
data occupancy;
set have;
format admitdate disdate datetime20.;
Do Occupanthour = (dhms(datepart(admitdate),hour(admitdate),0,0)) to
dhms(datepart(disdate),hour(disdate),0,0) by 3600;
HourOfDay = hour(OccupantHour);
DayOfWeek = Weekday(datepart(OccupantHour));
output;
end;
format OccupantHour datetime20.;
run;
Proc freq data=occupancy;
Tables HourOfDay;
run;
proc tabulate data=occupancy;
class DayOfWeek;
class HourOfDay;
tables HourOfDay,
(DayOfWeek All)*n;
run;
Consider following exemplary SAS dataset with following layout.
Price Num_items
100 10
120 15
130 20
140 25
150 30
I want to group them into 4 categories by defining a new variable called cat such that the new dataset looks as follows:
Price Num_items Cat
100 10 1
120 15 1
130 20 2
140 25 3
150 30 4
Also I want to group them so that they have about equal number of items (For example in above grouping Group 1 has 25, Group 2 has 20 ,Group 3 has 25 and Group 4 has 30 observations). Note that the price column is sorted in ascending order (that is required).
I am struggling to start with SAS for above. So any help would be appreciated. I am not looking for a complete solution but pointers towards preparing a solution would help.
Cool problem, subtly complex. I agree with #J_Lard that a data step with some retainment would likely be the quickest way to accomplish this. If I understand your problem correctly, I think the code below would give you some ideas as to how you want to solve it. Note that depending on the num_items, and group_target, your mileage will vary.
Generate similar, but larger data set.
data have;
do price=50 to 250 by 10;
/*Seed is `_N_` so we'll see the same random item count.*/
num_items = ceil(ranuni(_N_)*10)*5;
output;
end;
run;
Categorize.
/*Desired group size specification.*/
%let group_target = 50;
data want;
set have;
/*The first record, initialize `cat` and `cat_num_items` to 1 with implicit retainment*/
if _N_=1 then do;
cat + 1;
cat_num_items + num_items;
end;
else do;
/*If the item count for a new price puts the category count above the target, apply logic.*/
if cat_num_items + num_items > &group_target. then do;
/*If placing the item into a new category puts the current cat count closer to the `group_target` than would keeping it, then put into new category.*/
if abs(&group_target. - cat_num_items) < abs(&group_target. - (cat_num_items+num_items)) then do;
cat+1;
cat_num_items = num_items;
end;
/*Otherwise keep it in the currnet category and increment category count.*/
else cat_num_items + num_items;
end;
/*Otherwise keep the item count in the current category and increment category count.*/
else cat_num_items + num_items;
end;
drop cat_num_items;
run;
Check.
proc sql;
create table check_want as
select cat,
sum(num_items) as cat_count
from want
group by cat;
quit;
I have a data set that contains quarterly data value. But now I want to sum the quarterly values which have the same year.
Data h :
time value
01JAN90 23
01APR90 31
01JUL90 13
01OCT90 45
01JAN91 11
01APR91 4
01JUL91 1
01OCT91 17
I want my result data like this:
time value
1990 53
1991 35
If your time variable is numeric, you can use a FORMAT statement within PROC SUMMARY to automatically extract the year as the PROC runs. (Thanks to #Joe for showing this in comments to my original answer.)
PROC SUMMARY NWAY DATA = h;
CLASS time;
FORMAT time YEAR. ;
OUTPUT
OUT = result (
KEEP = year value
)
SUM (value) =
;
RUN;
This may be a simple question but I want to subtract 6 days from 01/09/2012 and keep the
format of DD/MM/YYYY how would I do this. Also if I compare this with another date in the same format does SAS actually compare the dates so if I said
If (Date1<Date2) /*Does this work in SAS */
SAS dates are simply stored as the number of days since 01JAN1960 - so just subtract six :-)
See my log:
44 data _null_;
45 date1 = '01SEP2012'd;
46 date2 = date1 - 6;
47 put date2= ddmmyys10.; /* the format you need */
48 if (date1 < date2) then put 'false'; /* this DOES work in SAS */
49 else put date1= date2=; /* unformatted - num of days*/
50 run;
date2=26/08/2012
date1=19237 date2=19231