Divide variable if the rest is same - sas

I have an example table as below
id term subj prof hour
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
I need to divide hour when id, term and subj are the same. There are 2 different profs with the same id (20), term and subj, so hour is divided by 2.
There are 3 different profs with the same id (30), term and subj, so hour is divided by 3.
So the output should look like this:
id term subj prof hour
20 2016 COM James 2
20 2016 COM Henrey 2
30 2016 HUM Nelly 1
30 2016 HUM John 1
30 2016 HUM Jimmy 1
45 2016 CGS Tim 3

In SAS you can use a double DOW loop to achieve this, once the data has been sorted in the correct order. The first loop counts how many profs there are with the same id, term and subj. The second loop divides hour by the number of profs. The loops are performed at each change of id, term or subj.
I've created a new_hour variable and kept the temporary _counter variable so you can see the code working. You can overwrite the hour variable and drop the _counter variable if you wish.
/* create initial dataset */
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
/* sort data */
proc sort data=have;
by id term subj prof;
run;
/* create output dataset */
data want;
do until(last.subj); /* 1st loop*/
set have;
by id term subj prof;
if first.subj then _counter=0; /* reset counter when id, term or subj change */
_counter+first.prof; /* count number of times prof changes */
end;
do until(last.subj); /* 2nd loop */
set have;
by id term subj;
new_hour=hour / _counter; /* divide hour by number of profs from 1st loop */
output; /* output record */
end;
run;
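If the intermediate variables are not needed, the same double DOW loop can overwrite hour directly. This is a minimal sketch, assuming the sorted have dataset from above; want_final is a hypothetical output name used only for illustration.

```sas
/* Same double DOW loop, but overwriting hour and dropping the counter. */
/* want_final is a hypothetical dataset name.                           */
data want_final (drop=_counter);
  do until(last.subj);               /* 1st pass: count profs in the group */
    set have;
    by id term subj prof;
    if first.subj then _counter=0;   /* reset at each new id/term/subj */
    _counter+first.prof;             /* add 1 each time prof changes */
  end;
  do until(last.subj);               /* 2nd pass: re-read the same group */
    set have;
    by id term subj;
    hour=hour/_counter;              /* replace hour with the per-prof share */
    output;
  end;
run;
```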

Assuming your problem is as simple as the one you gave as an example, one proc sql should suffice. If it is more complicated, please explain how so we can be more helpful!
data have;
input id term subj $ prof $ hour;
datalines;
20 2016 COM James 4
20 2016 COM Henrey 4
30 2016 HUM Nelly 3
30 2016 HUM John 3
30 2016 HUM Jimmy 3
45 2016 CGS Tim 3
;
run;
proc sql;
create table want as select
*, hour / count(prof) as hour_adj
from have
group by id, term, subj;
quit;
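Note that count(prof) counts rows, so a prof listed twice in the same group would be counted twice. If that can happen in your data (an assumption, since the example has no such duplicates), a sketch that counts distinct profs and groups by all three keys from the question could look like this; want_distinct is a hypothetical table name.

```sas
proc sql;
  /* want_distinct is a hypothetical table name */
  create table want_distinct as
  select *,
         hour / count(distinct prof) as hour_adj  /* divide by number of distinct profs */
  from have
  group by id, term, subj;
quit;
```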

Related

Count amount within a year

I have a dataset which looks like
ID STATUS YEAR AMOUNT DT_1
. OPEN 2010 12 12
. OPEN 2009 24 10
. OPEN 2008 32 1
AA CLOSE 2015 150 12
AA CLOSE 2014 200 10
AA CLOSE 2010 10 8
AA CLOSE 2009 20 7
AA CLOSE 2008 18 5
AA OPEN 2012 21 8
AA OPEN 2001 20 7
AA OPEN 2000 18 5
Column DT_1 takes values from a minimum of 1 to a maximum of 12.
I would like to calculate the amount within this range each time. In other words, each row should be assigned the amount from the previous year.
I would expect something like this:
ID STATUS YEAR AMOUNT DT_1
. OPEN 2010 12 24
. OPEN 2009 24 32
. OPEN 2008 32 .
AA CLOSE 2015 150 200
AA CLOSE 2014 200 10
AA CLOSE 2010 10 20
AA CLOSE 2009 20 18
AA CLOSE 2008 18 .
AA OPEN 2012 21 20
AA OPEN 2001 20 18
AA OPEN 2000 18 .
I have tried as follows
proc sql;
create table tab1 as
select ID, status, year, sum(amount) as tot_amount, dt_1
from tab
group by 1,2,3;
quit;
but it does not give me the expected output.
EDIT: I had to edit the question as the expected output was different.
So DT_1 is the amount from the previous year? If so, it would be a lot easier if the data were sorted by increasing YEAR instead of decreasing, as displayed in the question. Then you can just use the LAG() function.
proc sort data=HAVE out=WANT ;
by id status year ;
run;
data WANT;
set want ;
by id status year;
dt_1 = lag(amount);
if first.status then dt_1=.;
run;
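The question shows the years in descending order within each group. If that layout matters, the result can be sorted back afterwards. A small sketch, assuming the WANT dataset produced by the step above:

```sas
/* Restore the descending-year layout used in the question */
proc sort data=want;
  by id status descending year;
run;
```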
See if this is what you want
data have;
input ID $ STATUS $ YEAR AMOUNT;
datalines;
. OPEN 2010 12
. OPEN 2009 24
. OPEN 2008 32
AA CLOSE 2015 150
AA CLOSE 2014 200
AA CLOSE 2010 10
AA CLOSE 2009 20
AA CLOSE 2008 18
AA OPEN 2012 21
AA OPEN 2001 20
AA OPEN 2000 18
;
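/* Look-ahead merge: the second copy of HAVE starts at row 2, */
/* so each row sees the values of the row that follows it.    */
data want(drop = s _id);
merge have
have(firstobs = 2 keep = ID amount STATUS
rename = (ID = _id amount = DT_1 STATUS = s));
/* blank out DT_1 when the next row belongs to a different group */
if ID ne _id or STATUS ne s then DT_1 = .;
run;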

Creating combinations of observations in SAS

I need to figure out how to tabulate all possible combinations of data in a dataset. I have a dataset where each person has 2 rows, one row for an activity score and one row for a total score on a test. There are variables for the score at each visit. A person may have between 1 and 5 visits. I am looking for all possible pairs of visit scores for a given person, separately for each score type.
For example, here is code to generate the sample data structure.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . .
;
run;
I would like to have a dataset that would give me a structure as follows:
Bob activity 10 13
Bob activity 10 16
Bob activity 13 16
Bob total 13 19
Bob total 13 17
Bob total 19 17
John (rows for all possible combinations)
Steve - would have no rows, since he only has one visit (no combinations possible)
Any suggestions?
For N choose 2 and the output structure you want, a couple of nested DO loops will suffice. This assumes the non-missing visits come first, as they do in the sample data.
data example;
input name $ type $ visit1-visit5;
datalines;
Bob activity 10 13 16 . .
Bob total 13 19 17 . .
John activity 11 20 25 20 21
John total 13 15 17 19 22
Steve activity 6 . . . .
Steve total 9 . . . .
;;;;
run;
data by2;
set example;
array v[*] visit:;
n=n(of v[*]);
do i = 1 to n;
col1 = v[i];
do j = i + 1 to n;
col2 = v[j];
output;
end;
end;
drop i j visit:;
run;
proc print;
run;

Stacking a dataset

What SAS code can I use to stack data?
For the purpose of example, let's say I have this dataset:
DATA test.one;
INPUT Name $ Y1996 Y1997 Y1998 Y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
Running this set would give me an output like this:
Name Y1996 Y1997 Y1998 Y1999
Dan 5 10 40 20
Derek 10 12 10 10
However, I would want my data to look like this:
Name Year Income
Dan 1996 5
Dan 1997 10
Dan 1998 40
Dan 1999 20
Derek 1996 10
Derek 1997 12
Derek 1998 10
Derek 1999 10
That is, stacking the data as shown above would create a new variable, Income, holding the value for each year.
Are you asking how to read the raw data directly into that form?
DATA want;
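INPUT Name $ @; /* trailing @ holds the current line for the next INPUT */
do year=1996 to 1999;
input income @; /* read the next value from the same held line */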
output;
end;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
;
PROC TRANSPOSE can solve this:
DATA test.one;
INPUT Name $ y1996 y1997 y1998 y1999;
cards;
Dan 5 10 40 20
Derek 10 12 10 10
run;
proc print data = test.one;
run;
proc transpose data=test.one out=long1;
by name;
run;
data test2;
set long1 (rename=(col1=Income));
Year = input(substr(_name_, 2), 4.); /* "y1996" -> 1996 */
drop _name_;
RUN;
It will then transform the dataset into a stacked version.
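A data step with an array is another way to do the same stacking. This is only a sketch, assuming the test.one dataset from the question; long_array is a hypothetical output name.

```sas
/* long_array is a hypothetical output name. */
data long_array;
  set test.one;
  array y[1996:1999] Y1996-Y1999;   /* index the array by calendar year */
  do Year = 1996 to 1999;
    Income = y[Year];
    output;                         /* one row per Name and Year */
  end;
  keep Name Year Income;
run;
```

Giving the array bounds of 1996:1999 lets the loop variable double as the Year column, so no name parsing is needed.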

Proc report- grouping

I have a simple table, and I need to create a complicated report from it. I tried to do it with proc report using lots of grouping, but it didn't give me the right result. Here is my example table:
campus id year gender
West 35 2013 F
West 35 2014 F
West 35 2015 F
West 38 2014 M
West 38 2015 M
East 48 2014 -
East 48 2015 -
East 55 2013 F
East 55 2014 F
And this is the report I need to create:
        west        east
        2014  2015  2014  2015
total   2     2     2     1
Gender  2     2     2     1
F       1     1     1     -
M       1     1     -     -
none    -     -     1     1
So I have 4 different groups. I worked on this code:
proc tabulate data=a ;
class gender year ;
table gender, year*n*f=4. ;
by id;
run ;
Do you think I can do the total first, then gender, and then append them?
This doesn't quite match your requested output, but I'm not sure having the total repeated makes sense either. Proc Tabulate works well here:
proc tabulate data=have;
class campus year gender/missing;
table (all='Total' gender='Gender'), campus=''*year=''*n='';
run;

Data transformation with if then else

I have a table as below:
id term subj degree
18 2007 ww Yes
32 2015 AA Yes
32 2016 AA No
25 2011 NM No
25 2001 ts No
18 2009 ww Yes
18 2010 ww No
I need another variable, term2. If degree is Yes, term2 should hold the next term for the same id and subj; otherwise it should be 0. That is:
id term subj degree term2
18 2007 ww Yes 2009
32 2015 AA Yes 2016
32 2016 AA No 0
25 2011 NM No 0
25 2001 ts No 0
18 2009 ww Yes 2010
18 2010 ww No 0
What I did with if then else doesn't work. Any ideas? Thank you.
This is the code I used:
data have;
merge aa aa (rename=(id=id1 subj=subj1
term=term1);
term2=0;
if id=id1 and subj=subj1 and degree="Yes" then
term2=term1
run;
data have;
input id term subj $ degree $;
cards;
32 2015 AA Yes
32 2016 AA No
25 2011 NM No
25 2001 ts No
18 2007 ww Yes
18 2010 ww No
;
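data want;
/* look-ahead merge: the second copy starts at row 2, so each row sees the next row */
merge have have(firstobs=2 keep=id subj term rename=(id=_id subj=_subj term=_term));
term2=0;
/* take the next row's term only when it has the same id and subj */
if id=_id and subj=_subj and degree='Yes' then term2=_term;
drop _:;
run;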
Some important information is missing. For example, when an id has a degree = Yes row, is there always a degree = No row with the same id?
What should be done if an id with a degree = Yes row has more than one degree = No row with different terms? Why do you want to solve this with an if-else statement?
Assuming there is always exactly one id-matching degree = No row for each degree = Yes row, you can use this:
proc sql;
/* "table" stands for your input dataset */
select a.*, case when a.degree = "Yes" then b.term else 0 end as term2
from table as a
left outer join table as b on a.id = b.id and b.degree = "No" and a.degree = "Yes";
quit;
This uses neither an IF statement nor a data step, but you must provide more information if you want a more specific solution.