What's the SAS code to stack data?
For the purpose of example, let's say I have this dataset:
DATA test.one;
INPUT Name $ Y1996 Y1997 Y1998 Y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
Running this would give me output like this:
Name Y1996 Y1997 Y1998 Y1999
Dan 5 10 40 20
Derek 10 12 10 10
However, I would want my data to look like this:
Name Year Income
Dan 1996 5
Dan 1997 10
Dan 1998 40
Dan 1999 20
Derek 1996 10
Derek 1997 12
Derek 1998 10
Derek 1999 10
It would create a new variable, Income, corresponding to the stacking of the data as shown above.
Are you asking how to read the raw data directly into that form?
DATA want;
INPUT Name $ @; /* trailing @ holds the current line for the next INPUT */
do year=1996 to 1999;
input income @;
output;
end;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
;
PROC TRANSPOSE can solve this:
DATA test.one;
INPUT Name $ y1996 y1997 y1998 y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
proc transpose data=test.one out=long1;
by name;
run;
data test2;
set long1 (rename=(col1=Income));
Year = input(substr(_NAME_, 2), 4.); /* Y1996 -> 1996 */
drop _NAME_;
RUN;
It will then transform the dataset into a stacked version.
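As an alternative sketch, the same reshaping can be done in a single DATA step with an array. This assumes the year columns are named exactly Y1996 through Y1999 as in the example; the output dataset name long is only illustrative.

```sas
/* Wide-to-long reshape with an array; "long" is a hypothetical output name */
data long;
    set test.one;
    /* index the array by year so yrs[1996] refers to Y1996, and so on */
    array yrs[1996:1999] Y1996-Y1999;
    do Year = 1996 to 1999;
        Income = yrs[Year];
        output;              /* one output row per name and year */
    end;
    keep Name Year Income;
run;
```

This avoids the separate rename step, at the cost of hard-coding the range of years.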
This is my code:
DATA sales;
INFILE 'D:\Users\...\Desktop\Onions.dat';
INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
PROC PRINT DATA = sales;
TITLE 'SAS Data Set Sales';
RUN;
This is the data, but the spacing may be incorrect.
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 . 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
;
I need to add or delete a blank column at the 19th column. Can someone help?
Just open the dataset and look at what the variable name is. Then do:
Data Want (drop=variable_name_you_are_dropping); /*This is your output dataset*/
Set have; /*this is your dataset you have*/
Run;
I'm struggling to conceptualize code that would output the average number of patients seen per provider. Here is a snippet of my dataset, which spans 3 years' worth of data. I have three variables: the patient ID, the provider name, and the time at which the provider saw the patient, stored in a date/time format:
patient_fin first_Md_seen Provider_Seen_Date_Time
1 Bob 5/1/2018 4:19:00 AM
2 Bob 5/1/2018 4:29:00 AM
3 Bob 5/1/2018 4:30:00 PM
4 Sally 5/1/2018 7:39:00 AM
5 Sally 5/1/2018 7:49:00 AM
6 Sally 5/1/2018 8:55:00 PM
7 Bubba 5/3/2018 12:19:00 AM
8 Bob 5/3/2018 4:10:00 AM
....
To calculate the number of patients seen by a provider, I wrote the following code:
data ED_TAT3;
SET ED_TAT2;
if patient_fin ne . then Patient_fin_count=1;
run;
proc means data = ED_TAT3;
class first_Md_seen;
var Patient_fin_count;
run;
Now I need to figure out how many hours a provider worked so I can divide the number of patients seen by the number of hours worked.
I think I can use the Provider_Seen_Date_Time variable as a proxy after running the following code to get the hour: 'hour = hour (datepart(Provider_Seen_Date_Time))'.
Would code like this give me the correct number of hours a provider worked?
data new1;
set new;
hour = hour (datepart(Provider_Seen_Date_Time));
if Provider_Name = 'Bob' and hour ne . then hour_worked = 1;
run;
Is there:
1) a more accurate or efficient way (there are hundreds of different providers) to figure out the total number of hours worked per provider?
OR
2) a better way to simply figure out the number of patients per hour a provider saw?
Desired output:
Provider Avg Patients Seen per Hour
Bob 5
Sally 4
Bubba 6
Thanks in advance!
Based on what is given, you can try the following code. However, I still have concerns about the data.
data ed_tat2;
input patient_fin first_Md_seen$ Provider_Seen_Date_Time mdyampm25.2;
format Provider_Seen_Date_Time mdyampm25.;
hour = hour (Provider_Seen_Date_Time);
date_seen=datepart(Provider_Seen_Date_Time);
format date_seen date9.;
datalines;
1 Bob 5/1/2018 4:19:00 AM
2 Bob 5/1/2018 4:30:00 PM
3 Sally 5/1/2018 7:39:00 AM
4 Sally 5/1/2018 7:59:00 PM
5 Bubba 5/3/2018 12:19:00 AM
6 Bob 5/3/2018 4:10:00 AM
7 Bob 5/3/2018 4:30:00 AM
8 Bob 5/3/2018 5:10:00 AM
run;
proc sort data=ed_tat2; by first_Md_seen date_seen hour; run;
data ed_tat3;
set ed_tat2;
by first_Md_seen date_seen hour;
/* first.hour is 1 on the first record of each provider/date/hour group and 0 otherwise, */
/* so each distinct hour is counted once without calling LAG inside a condition */
hour = first.hour;
run;
proc sql;
select first_Md_seen, date_seen, count(patient_fin) as number_of_patients_seen, sum(hour) as number_of_hours, count(patient_fin)/sum(hour) as patients_seen_per_hour
from ed_tat3
where hour ne .
group by first_Md_seen, date_seen;
select first_Md_seen, count(patient_fin) as number_of_patients_seen, sum(hour) as number_of_hours, count(patient_fin)/sum(hour) as patients_seen_per_hour
from ed_tat3
where hour ne .
group by first_Md_seen;
quit;
You can do this easily with two PROC FREQ steps.
The first will calculate the number of patients seen by doctor per hour and the second uses the first output to calculate the number of hours worked per doctor, per day. You can easily modify these by modifying the TABLE statements.
data ed_tat2;
input patient_fin first_Md_seen $ Provider_Seen_Date_Time mdyampm25.2;
format Provider_Seen_Date_Time mdyampm25.;
hour=hour (Provider_Seen_Date_Time);
date_seen=datepart(Provider_Seen_Date_Time);
format date_seen date9.;
datalines;
1 Bob 5/1/2018 4:19:00 AM
2 Bob 5/1/2018 4:30:00 PM
3 Sally 5/1/2018 7:39:00 AM
4 Sally 5/1/2018 7:59:00 PM
5 Bubba 5/3/2018 12:19:00 AM
6 Bob 5/3/2018 4:10:00 AM
7 Bob 5/3/2018 4:30:00 AM
8 Bob 5/3/2018 5:10:00 AM
;
run;
*counts per hour;
proc freq data=ed_tat2 noprint;
table first_Md_seen*date_seen*hour / out=provider_counts;
run;
*hours worked per doctor;
proc freq data=provider_counts noprint;
table first_Md_seen*date_seen / out=provider_hours;
run;
title 'Number of patients seen';
proc print data=provider_counts label;
label count='# of patients per hour';
run;
title 'Number of hours worked';
proc print data=provider_hours label;
label count='# of hours worked in a day';
run;
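To get from these two tables to the patients-per-hour figure the question asks for, one possible sketch is a single PROC SQL step over provider_counts. It assumes that an hour counts as "worked" only if at least one patient was seen in it, which is the same proxy used above; provider_rate, n_patients and the other output column names are illustrative.

```sas
/* One row of provider_counts = one provider/date/hour with at least one patient */
proc sql;
    create table provider_rate as
    select first_Md_seen,
           sum(n_patients)            as patients_seen,
           count(*)                   as hours_worked,
           sum(n_patients) / count(*) as patients_per_hour
    /* rename COUNT so it cannot be confused with the COUNT() function */
    from provider_counts(rename=(count=n_patients))
    group by first_Md_seen;
quit;
```

Hours in which a provider was on shift but saw nobody are invisible to this approach, so the rate will be overstated for providers with idle hours.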
I have an example table as below
id term subj prof hour
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
I need to divide hour when id, term and subj are the same. There are 2 different profs with the same id (20), term and subj, so I divided hour by 2.
There are 3 different profs with the same id (30), term and subj, so I divided hour by 3.
So the output should be like this:
id term subj prof hour
20 2016 COM James 2
20 2016 COM Henrey 2
30 2016 HUM Nelly 1
30 2016 HUM John 1
30 2016 HUM Jimmy 1
45 2016 CGS Tim 3
In SAS you can use a double DOW loop to achieve this, once the data has been sorted in the correct order. The first loop counts how many profs there are with the same id, term and subj. The second loop divides hour by the number of profs. The loops are performed at each change of id, term or subj.
I've created a new_hour variable and kept the temporary _counter variable just so you can see the code working; you can overwrite the hour variable and drop the _counter variable if you wish.
/* create initial dataset */
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
/* sort data */
proc sort data=have;
by id term subj prof;
run;
/* create output dataset */
data want;
do until(last.subj); /* 1st loop*/
set have;
by id term subj prof;
if first.subj then _counter=0; /* reset counter when id, term or subj change */
_counter+first.prof; /* count number of times prof changes */
end;
do until(last.subj); /* 2nd loop */
set have;
by id term subj;
new_hour=hour / _counter; /* divide hour by number of profs from 1st loop */
output; /* output record */
end;
run;
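For reference, a sketch of the variant mentioned above, which overwrites hour in place and drops the helper variable. It assumes have is already sorted by id, term, subj and prof as in the PROC SORT step; want_clean is a hypothetical dataset name.

```sas
data want_clean;
    /* 1st pass over the group: count distinct profs */
    do until(last.subj);
        set have;
        by id term subj prof;
        if first.subj then _counter = 0;
        _counter + first.prof;
    end;
    /* 2nd pass over the same group: rewrite hour and output each row */
    do until(last.subj);
        set have;
        by id term subj;
        hour = hour / _counter;
        output;
    end;
    drop _counter;
run;
```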
Assuming your problem is as simple as the one you gave as an example, one proc sql should suffice. If it is more complicated, please explain how so we can be more helpful!
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
proc sql;
create table want as select
*, hour / count(prof) as hour_adj
from have
group by id, term, subj;
quit;
Say I have two rows of data that I am trying to read in.
cody: 10 9 20 18
john: 4 5 1 2
and I want to read them in from two rows of datalines, like this:
input cody john @@;
datalines;
10 9 20 18
4 5 1 2
run;
But this reads it in like cody: 10 20 4 1 john: 9 18 5 2
How do I fix this?
You'd need to read in the CODY lines all at once, then the JOHN lines all at once. It's unclear what the final data structure should look like, but this is one possibility, and then you can restructure this how you wish, perhaps with PROC TRANSPOSE.
Basically, I assign name to the proper name (using an array here, but you can do this in better ways, data-driven ways, depending on your data). Then I loop and tell SAS to keep reading in data until it is unable to read any more, using the truncover option (or missover is also fine) to make sure it doesn't skip to the next line, and output a new row for each value.
data want;
array names[2] $ _temporary_ ("Cody","John") ;
infile datalines truncover;
do _name = 1 to 2;
name = names[_name];
do _i = 1 by 1 until (missing(value));
input value @;
if not missing(value) then output;
end;
input;
end;
drop _:;
datalines;
10 9 20 18
4 5 1 2
run;
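If the goal is one column per name, the long dataset want from the step above could be restructured roughly as follows. This is only a sketch: indexed, wide and seq are hypothetical names, and it assumes both names have the same number of values.

```sas
/* Number the values within each name: 1, 2, 3, ... */
data indexed;
    set want;
    by name notsorted;
    if first.name then seq = 0;
    seq + 1;
run;

/* PROC TRANSPOSE with BY needs the data ordered by the BY variable */
proc sort data=indexed;
    by seq;
run;

/* One row per position, one column per name */
proc transpose data=indexed out=wide(drop=_name_);
    by seq;
    id name;
    var value;
run;
```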
I think that the solution to your problem is to use the names as another column, not as variables, like this:
data foo;
input var1 $ var2 var3 var4 var5;
datalines;
cody 10 9 20 18
john 4 5 1 2
;
run;
I have a table with four variables, and I want a table with the combinations of all their values. Here is a smaller table as an example.
NAME AMOUNT COUNT
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
The following output shows the values only for RAJ; the real output should cover all names.
NAME AMOUNT COUNT
RAJ 90 1
RAJ 90 4
RAJ 90 5
RAJ 90 3
RAJ 20 1
RAJ 20 4
RAJ 20 5
RAJ 20 3
RAJ 30 1
RAJ 30 4
RAJ 30 5
RAJ 30 3
RAJ 40 1
RAJ 40 4
RAJ 40 5
RAJ 40 3
...
There are a couple of useful options in SAS to do this; both create a table with all possible combinations of variables, and then you can just drop the summary data that you don't need. Given your initial dataset:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
;;;;
run;
There is PROC FREQ with SPARSE.
proc freq data=have noprint;
tables name*amount*count/sparse out=want(drop=percent);
run;
There is also PROC TABULATE.
proc tabulate data=have out=want(keep=name amount count);
class name amount count;
tables name*amount,count /printmiss;
run;
This has the advantage of not conflicting with the name for the COUNT variable.
Try
PROC SQL;
CREATE TABLE tbl_out AS
SELECT a.name AS name
,b.amount AS amount
,c.count AS count
FROM tbl_in AS a, tbl_in AS b, tbl_in AS c
;
QUIT;
This performs a double self-join and should have the desired effect.
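One caveat worth sketching: if any value repeats in the input, the plain self-join repeats the corresponding combinations too. A possible variant, under the same assumption that the input table is called tbl_in, takes distinct values first:

```sas
proc sql;
    create table tbl_out as
    select a.name, b.amount, c.count
    /* each inline view contributes every distinct value exactly once */
    from (select distinct name   from tbl_in) as a,
         (select distinct amount from tbl_in) as b,
         (select distinct count  from tbl_in) as c;
quit;
```

With the example data every value is already unique, so both versions return the same 64 rows.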
Here's a variation on @JustinJDavies's answer, using an explicit CROSS JOIN clause:
data have;
input NAME $ AMOUNT COUNT;
datalines;
RAJ 90 1
RAVI 20 4
JOHN 30 5
JOSEPH 40 3
run;
PROC SQL;
create table combs as
select *
from have(keep=NAME)
cross join have(keep=AMOUNT)
cross join have(keep=COUNT)
order by name, amount, count;
QUIT;
Results:
NAME AMOUNT COUNT
JOHN 20 1
JOHN 20 3
JOHN 20 4
JOHN 20 5
JOHN 30 1
JOHN 30 3
JOHN 30 4
JOHN 30 5
...