SAS Program to add salary - sas

I want to sum the salary after first two observation of the table.
1st two observation will remain same, rest should be added as shown below
Name salary Name salary
subrat 10 subrat 10
abhi 20 abhi 20
milan 100 other 1000
sam 200
sudhir 300
muna 400

data want;
set have end = eof;
if _N_ in (1 2 3) then new_salary = salary;
else if _N_ > 3 then new_salary + salary;
if eof then name = "Other";
if _N_ in (1 2) or eof;
drop salary;
rename new_salary = salary;
run;

Related

Using do loops in sas

Assume you have a data file called VIRUS_PROLIF from an infectious disease research center. Each observation has 3 variables COUNTRY START_DATE, and DOUBLE_RATE, where START_DATE is the date that the Country registered its 100th case of COVID-19. For each country, DOUBLE_RATE is the number of days it takes for the number of cases to double in that country. Write the SAS code using DO UNTIL to calculate the date at which that Country would be predicted to register 200,000 cases of COVID-19.
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate ;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
;
run;
data VIRUS_PROLIF1 (drop=start_date);
set VIRUS_PROLIF;
do until (num_of_cases>200000);
double_rate+1;
num_of_cases+ (num_of_cases*1);
end;
run;
proc print data=VIRUS_PROLIF1;
run;
The key concept you're missing here is how to employ the growth rate. That would be using the following formula, similar to interest growth for money.
If you have one dollar today and you get 100% interest it becomes
StartingAmount * (1 + interestRate) where the interest rate here is 100/100 = 1.
*fake data;
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;
data VIRUS_PROLIF1;
set VIRUS_PROLIF;
*assign date to starting date so both are in output;
date=start_date;
*save record to data set;
output;
do until (num_of_cases>200000);
*increment your day;
date=date+1;
;
*doubling rate is represented as a percent so add it to 1 to show the rate;
num_of_cases=num_of_cases*(1+double_rate/100);
*save record to data set;
output;
end;
*control date display;
format date start_date date9.;
run;
*check results;
proc print data=VIRUS_PROLIF1;
run;
The problem 200,000 < N0 (1+R/100) k can be solved for integer k without iterations
day_of_200K = ceil (
LOG ( 200000 / NUM_OF_CASES )
/ LOG ( 1 + R / 100 )
);

Getting next observation per group

I am working on a dataset in SAS to get the next observation's score should be the current observation's value for the column Next_Row_score. If there is no next observation then the current observation's value for the column Next_Row_score should be 'null'per group(ID). For better illustration i have provided the sample below dataset :
ID Score
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
Resultant output should be like -
ID Salary Next_Row_Salary
10 1000 1500
10 1500 2000
10 2000 .
20 3000 4000
20 4000 .
30 2500 2500
Thank you in advance for your help.
data want(drop=_: flag);
merge have have(firstobs=2 rename=(ID=_ID Score=_Score));
if ID=_ID then do;
Next_Row_Salary=_Score;
flag+1;
end;
else if ID^=_ID and flag>=1 then do;
Next_Row_Salary=.;
flag=.;
end;
else Next_Row_Salary=score;
run;
Try this :
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
proc sql noprint;
select count(*) into :obsHave
from have;
quit;
data want2(rename=(id1=ID Score1=Salary) drop=ID id2 Score);
do i=1 to &obsHave;
set have point=i;
id1=ID;
Score1=Score;
j=i+1;
set have point=j;
id2=ID;
if id1=id2 then do;
Next_Row_Salary = Score;
end;
else Next_Row_Salary=".";
output;
end;
stop;
;
run;
There is a simpler (in my mind, at least) proc sql approach that doesn't involve loops:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
/*count each observation's place in its ID group*/
data have2;
set have;
count + 1;
by id;
if first.id then count = 1;
run;
/*if there is only one ID in a group, keep original score, else lag by 1*/
proc sql;
create table want as select distinct
a.id, a.score,
case when max(a.count) = 1 then a.score else b.score end as score2
from have2 as a
left join have2 (where = (count > 1)) as b
on a.id = b.id and a.count = b.count - 1
group by a.id;
quit;

SAS,DATA PREPARATION

I have 5 columns .The columns are
date
stock[a,b,c,d,.]
qty_in[fixed number as in 10 qty came in for the stock on 1/1/2015]
qty_out[ went out /or got sold]
final_qty(qty_in -qty_out)
There are over 100 stocks and transaction for over 6 months duration,thus for the stocks on each day[for example,qty_in on 2/1/2015 is 10 then it should display the value of qty_in as sum of qty_in on 2/1/2015 +final_qty on 1/1/2015]for the same stock ] . How can i achieve this with sas.
Run this in sas
data testfile;
input date $ 1-10 stock $ 11-16 qty_in $17-20 qty_out $21-23 final_qty $24-26;
datalines;
1/1/2015 a 10 0 10
1/1/2015 b 20 4 16
1/1/2015 c 32 23 9
2/1/2015 a 10 /*this value should be= qty_in(2/1/2015 + final_qty 1/1/2015 i.e. 10+10=20*/
2/1/2015 b 20 /*this should be 20+16=36*/
2/1/2015 c 32
;
if you want to do this in a data step you first need to sort the data set by stock and by date. Also, start with just 4 columns and will compute the final col in the data set:
data stockout5;
set stockin4;
retain FIN_QTY;
by stock date;
if (first.stock) then FIN_QTY = INQTY - OUTQTY;
else FIN_QTY = FIN_QTY + INQTY - OUTQTY;
run;
let me know if this works for you. If you supply some test data with what you are starting with and what you want to end up with it would help. Your question is fine but it's not very clear unless you've worked with financial data before (imo)
From start to finish this should do what you're looking for. It's pretty straight forward let me know if you don't understand something. Note that 0 is added in for missing out values.
Data stock4;
format date date9.;
date = '1jan2015'd;
stock = "a";
in = 10;
out = 0 ;
output;
date = "1jan2015"d;
stock = "b";
in = 20;
out = 4;
output;
date = "1jan2015"d;
stock ="c";
in =32;
out=23;
output;
date="2jan2015"d;
stock = "a";
in = 10;
out=0;
output ;
date="2jan2015"d;
stock ="b";
in = 20;
out=0;
output;
date ="2jan2015"d;
stock = "c";
in=32;
out=0;
output;
run;
proc sort data=stock4;
by stock date;
run;
data stock5;
set stock4;
retain FIN_QTY;
by stock date;
if (first.stock) then FIN_QTY = IN - OUT;
else FIN_QTY = FIN_QTY + IN - OUT;
run;

Calculate maximum difference between grouped rows

I have the following data where people in households are sorted by age (oldest to youngest):
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
I would like to calculate for each household the maximum age difference between consecutively aged people. So this example would give values of 5 (=25-20), 16 (=32-16) and 32 (=42-10) for households 1, 2 and 3 consecutively.
I could do this using lots of merges (i.e. extract person 1, merge with extract of person 2, and so on), but as there can be upto 20+ people in a household I'm looking for a much more direct method.
Here's a two pass solution. Same first step as the two solutions above, sort by age. In the second step keep track of max_diff per row, at the last record of HouseID output the results. This results in only two passes through the data.
proc sort data=houses; by houseid age;run;
data want;
set houses;
by houseID;
retain max_diff 0;
diff = dif1(age)*-1;
if first.HouseID then do;
diff = .; max_diff=.;
end;
if diff>max_diff then max_diff=diff;
if last.houseID then output;
keep houseID max_diff;
run;
proc sort data=houses; by houseid personid age;run;
data _t1;
set houses;
diff = dif1(age) * (-1);
if personid = 1 then diff = .;
run;
proc sql;
create table want as
select houseid, max(diff) as Max_Diff
from _t1
group by houseid;
proc sort data = house;
by houseid descending age;
run;
data house;
set house;
by houseid;
lag_age = lag1(age);
if first.houseid then age_diff = 0;
age_diff = lag_age - age;
run;
proc sql;
select houseid,max(age_diff) as max_age_diff
from house
group by houseid;
quit;
Working:
First sort the data set using houseid and descending Age.
Second data step will calculate difference between current age value (in PDV) and previous age value in PDV. Then, using sql procedure, we can get the max age difference for each houseid.
Just throwing one more into the mix. This one is a condensed version of Reeza's response.
/* No need to sort by PersonID as age is the only concern */
proc sort data = houses;
by HouseID Age;
run;
data want;
set houses;
by HouseID;
/* Keep the diff when a new row is loaded */
retain diff;
/* Only replace the diff if it is larger than previous */
diff = max(diff, abs(dif(Age)));
/* Reset diff for each new house */
if first.HouseID then diff = 0;
/* Only output the final diff for each house */
if last.HouseID;
keep HouseID diff;
run;
Here is an example using FIRST. and LAST. with one pass (after sort) through the data.
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
Proc sort data=HOUSES;
by houseid descending age ;
run;
Data WANT(keep=houseid max_diff);
format houseid max_diff;
retain max_diff age1 age2;
Set HOUSES;
by houseid descending age ;
if first.houseid and last.houseid then do;
max_diff=0;
output;
end;
else if first.houseid then do;
call missing(max_diff,age1,age2);
age1=age;
end;
else if not(first.houseid or last.houseid) then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
age1=age;
end;
else if last.houseid then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
output;
end;
Run;

Sum Vertically for a By Condition

I checked out this previous post (LINK) for potential solution, but still not working. I want to sum across rows using the ID as the common identifier. The num variable is constant. The id and comp the two variables I want to use to creat a pct variable, which = sum of [comp = 1] / num
Have:
id Comp Num
1 1 2
2 0 3
3 1 1
2 1 3
1 1 2
2 1 3
Want:
id tot pct
1 2 100
2 3 0.666666667
3 1 100
Currently have:
proc sort data=have;
by id;
run;
data want;
retain tot 0;
set have;
by id;
if first.id then do;
tot = 0;
end;
if comp in (1) then tot + 1;
else tot + 0;
if last.id;
pct = tot / num;
keep id tot pct;
output;
run;
I use SQL for things like this. You can do it in a Data Step, but the SQL is more compact.
data have;
input id Comp Num;
datalines;
1 1 2
2 0 3
3 1 1
2 1 3
1 1 2
2 1 3
;
run;
proc sql noprint;
create table want as
select id,
sum(comp) as tot,
sum(comp)/count(id) as pct
from have
group by id;
quit;
Hi there is a much more elegant solution to your problem :)
proc sort data = have;
by id;
run;
data want;
do _n_ = 1 by 1 until (last.id);
set have ;
by id ;
tot = sum (tot, comp) ;
end ;
pct = tot / num ;
run;
I hope it is clear. I use sql too because I am new and the DOW loop is rather complicated but in your case its pretty straightforward.