SAS, DATA PREPARATION - sas

I have 5 columns. The columns are:
date
stock (a, b, c, d, ...)
qty_in (the quantity that came in, e.g. 10 units came in for the stock on 1/1/2015)
qty_out (the quantity that went out or got sold)
final_qty (qty_in - qty_out)
There are over 100 stocks and transactions covering more than 6 months. For each stock on each day, qty_in should include the previous day's final_qty for the same stock. For example, if qty_in on 2/1/2015 is 10, it should display qty_in on 2/1/2015 + final_qty on 1/1/2015. How can I achieve this with SAS?
Run this in SAS:
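data testfile;
  /* TRUNCOVER keeps short lines (no qty_out/final_qty yet) from wrapping */
  infile datalines truncover;
  /* dates are day/month/year, so 2/1/2015 is 2 January 2015 */
  input date :ddmmyy10. stock $ qty_in qty_out final_qty;
  format date date9.;
  /* On 2/1/2015, qty_in for a should be qty_in + previous final_qty = 10 + 10 = 20, */
  /* and for b it should be 20 + 16 = 36.                                            */
  datalines;
1/1/2015 a 10 0 10
1/1/2015 b 20 4 16
1/1/2015 c 32 23 9
2/1/2015 a 10
2/1/2015 b 20
2/1/2015 c 32
;
run;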

If you want to do this in a DATA step, you first need to sort the data set by stock and date. Also, start with just the 4 input columns and compute the final column in the DATA step:
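data stockout5;
  set stockin4;                                   /* must already be sorted BY stock date */
  by stock date;
  retain FIN_QTY;                                 /* keep the running balance across rows */
  if (first.stock) then FIN_QTY = INQTY - OUTQTY; /* first day of a stock: start fresh */
  else FIN_QTY = FIN_QTY + INQTY - OUTQTY;        /* later days: add today's net movement */
run;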
Let me know if this works for you. If you supply some test data showing what you are starting with and what you want to end up with, that would help. Your question is fine, but it's not very clear unless you've worked with financial data before (IMO).

From start to finish, this should do what you're looking for. It's pretty straightforward; let me know if you don't understand something. Note that 0 is filled in for missing out values.
Data stock4;
format date date9.;
date = '1jan2015'd;
stock = "a";
in = 10;
out = 0 ;
output;
date = "1jan2015"d;
stock = "b";
in = 20;
out = 4;
output;
date = "1jan2015"d;
stock ="c";
in =32;
out=23;
output;
date="2jan2015"d;
stock = "a";
in = 10;
out=0;
output ;
date="2jan2015"d;
stock ="b";
in = 20;
out=0;
output;
date ="2jan2015"d;
stock = "c";
in=32;
out=0;
output;
run;
proc sort data=stock4;
by stock date;
run;
data stock5;
set stock4;
retain FIN_QTY;
by stock date;
if (first.stock) then FIN_QTY = IN - OUT;
else FIN_QTY = FIN_QTY + IN - OUT;
run;
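The question literally asks for qty_in itself to carry the previous day's final_qty, rather than a separate running total. A minimal sketch of that variant, assuming the dates are day/month/year (as the 2jan2015 dates above suggest) and treating a missing qty_out as 0. The data set names HAVE, SORTED and CARRIED and the helper variable PREV_FINAL are illustrative only:

```sas
/* Sample data from the question; TRUNCOVER leaves qty_out missing on short lines */
data have;
  infile datalines truncover;
  input date :ddmmyy10. stock $ qty_in qty_out;
  format date date9.;
  datalines;
1/1/2015 a 10 0
1/1/2015 b 20 4
1/1/2015 c 32 23
2/1/2015 a 10
2/1/2015 b 20
2/1/2015 c 32
;
run;

proc sort data=have out=sorted;
  by stock date;
run;

data carried;
  set sorted;
  by stock date;
  retain prev_final;                          /* previous day's closing quantity */
  if first.stock then prev_final = 0;         /* new stock: nothing to carry in */
  qty_in = qty_in + prev_final;               /* today's receipts + yesterday's final_qty */
  final_qty = qty_in - coalesce(qty_out, 0);  /* missing qty_out counts as 0 */
  prev_final = final_qty;                     /* carry forward to the next day */
  drop prev_final;
run;

proc print data=carried;
run;
```

For stock a this gives qty_in = 20 on 02JAN2015 and for stock b qty_in = 36, matching the comments in the question's sample data.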

Related

Using DO loops in SAS

Assume you have a data file called VIRUS_PROLIF from an infectious disease research center. Each observation has 3 variables, COUNTRY, START_DATE and DOUBLE_RATE, where START_DATE is the date that the country registered its 100th case of COVID-19. For each country, DOUBLE_RATE is the number of days it takes for the number of cases to double in that country. Write the SAS code using DO UNTIL to calculate the date at which that country would be predicted to register 200,000 cases of COVID-19.
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate ;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
;
run;
data VIRUS_PROLIF1 (drop=start_date);
set VIRUS_PROLIF;
do until (num_of_cases>200000);
double_rate+1;
num_of_cases+ (num_of_cases*1);
end;
run;
proc print data=VIRUS_PROLIF1;
run;
The key concept you're missing here is how to apply the growth rate. Use the following formula, which is similar to interest growth on money.
If you have one dollar today and earn 100% interest, it becomes
StartingAmount * (1 + interestRate), where the interest rate here is 100/100 = 1, so $1 becomes $2.
*fake data;
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;
data VIRUS_PROLIF1;
set VIRUS_PROLIF;
*assign date to starting date so both are in output;
date=start_date;
*save record to data set;
output;
do until (num_of_cases>200000);
*increment your day;
date=date+1;
*doubling rate is represented as a percent so add it to 1 to show the rate;
num_of_cases=num_of_cases*(1+double_rate/100);
*save record to data set;
output;
end;
*control date display;
format date start_date date9.;
run;
*check results;
proc print data=VIRUS_PROLIF1;
run;
The problem $200{,}000 < N_0 (1 + R/100)^k$, with $N_0$ = NUM_OF_CASES and $R$ = DOUBLE_RATE (as a percent), can be solved for the integer $k$ without iterations:
day_of_200K = ceil (
LOG ( 200000 / NUM_OF_CASES )
/ LOG ( 1 + DOUBLE_RATE / 100 )
);
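A sketch of the closed-form version as a complete DATA step, assuming (as the answer's fake data does) that DOUBLE_RATE is a percent daily growth rate. The data set name VIRUS_CLOSED and the variables DAYS_TO_200K and DATE_200K are illustrative names, not part of the original question:

```sas
/* Closed-form alternative to the DO UNTIL loop.                    */
/* Assumes DOUBLE_RATE is a percent daily growth rate (100 = double */
/* every day), as in the answer's fake data.                        */
data virus_closed;
  input country $ start_date :mmddyy10. num_of_cases double_rate;
  format start_date date_200k date9.;
  /* smallest whole k with num_of_cases*(1+double_rate/100)**k >= 200000 */
  days_to_200k = ceil(log(200000 / num_of_cases) / log(1 + double_rate / 100));
  date_200k = start_date + days_to_200k;
  datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;

proc print data=virus_closed;
run;
```

With these two rows it gives 11 days (24MAR2020) for US and 42 days (28APR2020) for AB, the same dates the DO UNTIL loop reaches on its last iteration, since 100 * 2**11 = 204,800 and 100 * 1.2**42 is the first value above 200,000.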

How do I pass values through variables to a macro in SAS

Here is my code
%macro redemptions1(startdate, enddate, sd, ed, sunday1, sunday2);
data _null_;
%put &startdate;
run;
%mend redemptions1;
data _null_;
format tday date9.;
format sd date9.;
format ed date9.;
tday=today();
if weekday(tday) = 1 then do; ed = intnx('day',tday,-9); sd = intnx('day',tday,-15);end;
if weekday(tday) = 2 then do; ed = intnx('day',tday,-3); sd = intnx('day',tday,-9);end;
if weekday(tday) = 3 then do; ed = intnx('day',tday,-4); sd = intnx('day',tday,-10);end;
if weekday(tday) = 4 then do; ed = intnx('day',tday,-5); sd = intnx('day',tday,-11);end;
if weekday(tday) = 5 then do; ed = intnx('day',tday,-6); sd = intnx('day',tday,-12);end;
if weekday(tday) = 6 then do; ed = intnx('day',tday,-7); sd = intnx('day',tday,-13);end;
if weekday(tday) = 7 then do; ed = intnx('day',tday,-8); sd = intnx('day',tday,-14);end;
startdate = (year(sd) - 1900) * 10000 + month(sd) * 100 + day(sd);
enddate = (year(ed) - 1900) * 10000 + month(ed) * 100 + day(ed);
sunday1 = year(intnx('day',sd,-6))*10000+month(intnx('day',sd,-6))*100+day(intnx('day',sd,-6));
sunday2 = year(intnx('day',sd,1))*10000+month(intnx('day',sd,1))*100+day(intnx('day',sd,1));
%redemptions1(startdate,enddate,sd,ed,sunday1,sunday2);
run;
If I pass values through the variables startdate, enddate, etc., the redemptions1 macro just prints 'startdate' instead of the value of startdate. How do I get it to print the value contained in the variable(s)?
Thanks!
You need to construct a call to the macro as a text string and then tell SAS to execute it using either CALL EXECUTE or the DOSUBL function.
The parameters on a macro call need to be literal text giving the values you want. Your call in the data step starts %redemptions1(startdate,... so the first parameter is the literal text startdate and that's what the macro prints. Instead, you could do something like:
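myCall = cats('%redemptions1(', startdate, ')');
call execute(myCall);
This constructs the necessary call, something like %redemptions1(1170309), and then executes it. The single quotes keep the macro processor from running %redemptions1 while the DATA step is being compiled, and CATS avoids the leading blanks (and log notes) that || produces when it converts a number to text. You could of course do this in one line:
call execute(cats('%redemptions1(', startdate, ')'));
You'll need to fill in the values of the other parameters, of course.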
Your date calculations look a bit sketchy, by the way: startdate and enddate may not contain the values you think they do. Please look up the dhms function to see if that might help. You're creating a number like 1170309 for today's date: 1 million, 170 thousand, 3 hundred and 9. The year value is very odd; do you really want 117 (2017 - 1900)? If you ask SAS to handle that value as a date, it will treat it as a number of days since 01JAN1960, which would be a date far in the future.
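A minimal end-to-end sketch of the CALL EXECUTE approach. The macro is cut down to two parameters and the dates are hard-coded for illustration (in the question they come from the INTNX logic); passing them as DATE9. text via PUT is an assumption about what the macro should receive:

```sas
/* Cut-down version of the question's macro: just echo the parameters */
%macro redemptions1(startdate, enddate);
  %put startdate=&startdate enddate=&enddate;
%mend redemptions1;

data _null_;
  /* Hypothetical dates; in the question these come from the INTNX logic */
  sd = '06MAR2017'd;
  ed = '12MAR2017'd;
  /* Single quotes stop the macro from running at compile time;  */
  /* PUT(..., DATE9.) turns the numeric dates into readable text. */
  call execute(cats('%redemptions1(', put(sd, date9.), ',', put(ed, date9.), ')'));
run;
```

This should write startdate=06MAR2017 enddate=12MAR2017 to the log, because the values are substituted into the text of the call before the macro ever sees it.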

How can I select the first and last week of each month in SAS?

I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month; i.e., for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this, so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values, then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY, you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Subtract 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Create some data with the desired structure:
Data inData (drop=_:); * drop all variables starting with an underscore;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month:
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days:
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
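You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then, to test if it is within the first 5 weekdays or the last 5 weekdays of the month, you could do something like this. The FIRST5 test starts from the last day of the previous month and steps forward, because the WEEKDAY interval counts Saturday and Sunday as part of the preceding Friday; starting from the 1st itself would push the window back into the previous month whenever a month begins on a weekend:
first5 = intnx('weekday',intnx('month',today,0,'B')-1,1) <= today
<= intnx('weekday',intnx('month',today,0,'B')-1,5) ;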
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include weekends, but it shouldn't matter if your data doesn't have those dates.
However, you might have issues if your data skips holidays.
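A runnable sketch of the INTNX test, assuming numeric YEAR, MONTH and DAY variables as described in the question. Because the WEEKDAY interval treats Saturday and Sunday as part of the preceding Friday, the first-week window is measured from the last day of the previous month so that it begins on the month's first weekday. February 2014 is used because it starts on a Saturday; the data set name FLAGGED and the helper variables PREV_EOM and EOM are illustrative. Holidays are not accounted for:

```sas
data flagged;
  input year month day;
  date = mdy(month, day, year);
  prev_eom = intnx('month', date, 0, 'B') - 1;  /* last day of the previous month */
  eom      = intnx('month', date, 0, 'E');      /* last day of this month */
  /* first 5 weekdays: step forward 1..5 weekdays from the previous month-end */
  first5 = intnx('weekday', prev_eom, 1) <= date <= intnx('weekday', prev_eom, 5);
  /* last 5 weekdays: the last weekday of the month and the 4 before it */
  last5  = intnx('weekday', eom, -4) <= date <= intnx('weekday', eom, 0);
  format date date9.;
  drop prev_eom eom;
  datalines;
2014 2 3
2014 2 7
2014 2 10
2014 2 21
2014 2 24
2014 2 28
;
run;

proc print data=flagged;
run;
```

Here 03FEB2014 and 07FEB2014 are flagged FIRST5, 24FEB2014 and 28FEB2014 are flagged LAST5, and 10FEB2014 and 21FEB2014 get neither flag.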

How to update previous retained rows in SAS / if condition?

I have a database like this. This corresponds to a single person and I have this type of data for multiple persons.
data test;
input date YYMMDD10. real_length min_length;
format date YYMMDD10.;
cards;
2000-02-23 1 7
2000-02-24 12 15
2000-03-07 15 7
2000-03-22 7 15
2000-03-29 13 7
2000-04-11 17 7
2000-04-28 . 7
;
run;
What I am looking for is: if the interval between 2 dates in consecutive lines (real_length) is less than a certain length (min_length), I want to replace the date in the next line by the previous date + min_length. So far this is not a problem, and here is the code I used to achieve it:
data test2;
set test;
format lagdate min_date YYMMDD10.;
retain lagmin lagdate;
if lag(real_length) < lag(min_length) and lag(real_length) ~= . then min_date = lagdate + lagmin;
else min_date = date;
lagdate = min_date;
lagmin = min_length;
run;
Which gives :
date min_date min_length
2000-02-23 2000-02-23 7
2000-02-24 2000-03-01 15
2000-03-07 2000-03-16 7
2000-03-22 2000-03-22 15
...
The problem is that now the interval between 2 consecutive dates could become less than the minimal length, e.g. 2000-03-22 - 2000-03-16 = 6 days < min_length = 7. I would like to have 2000-03-23 = 2000-03-16 + 7 (= min_length) instead of 2000-03-22, like this:
date min_date min_length
2000-02-23 2000-02-23 7
2000-02-24 2000-03-01 15
2000-03-07 2000-03-16 7
2000-03-22 2000-03-23 15
...
So I've tried this code, but it does not work... I believe the problem could be in the if condition.
data test2;
set test;
format lagdate min_date YYMMDD10.;
retain lagmin lagdate;
if (lag(real_length) < lag(min_length) and lag(real_length) ~= .) or (adjust_length < lag(min_length) and adjust_length ~=.) then min_date = lagdate + lagmin;
else min_date = date;
adjust_length = min_date - lagdate;
lagdate = min_date;
lagmin = min_length;
run;
Does anybody see why this isn't working, or do you have another way of doing this?
Thank you!
The problem is that each time you adjust one date, you have to move all the subsequent dates as well if they're bunched up together. I think you can do this by keeping a running total of how many days you've added on to all the previous rows and then adding on only what's needed after that to get to the min_length between dates:
data want;
set test;
format t_min_date min_date yymmdd10.;
if _n_ = 1 then total_adj = 0;
t_min_date = date + min_length + total_adj;
min_date = lag1(t_min_date);
total_adj + max(0,min_length - real_length);
run;
Is that what you were aiming for?
N.B. you'll need to replace the if _n_ = 1 with some first.id and last.id logic to make this work for multiple individuals in the same dataset.
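One caveat with the running total: TOTAL_ADJ only ever grows, so a later gap that is longer than MIN_LENGTH never absorbs an earlier shift. With the test data, the fifth row gets 2000-04-15 (2000-03-22 + 15 + 9), whereas the rule in the question gives 2000-03-23 + 15 = 2000-04-07. A sketch of a simpler alternative, assuming each adjusted date should be the later of its own date and the previous adjusted date plus the previous row's MIN_LENGTH: retain the earliest allowed date and take MAX(). PREV_END is a helper name introduced here, and the ID variable in the comment is hypothetical:

```sas
/* The question's test data (one person) */
data test;
  input date :yymmdd10. real_length min_length;
  format date yymmdd10.;
  datalines;
2000-02-23 1 7
2000-02-24 12 15
2000-03-07 15 7
2000-03-22 7 15
2000-03-29 13 7
2000-04-11 17 7
2000-04-28 . 7
;
run;

data want;
  set test;
  /* For several people (hypothetical ID variable): by id; if first.id then prev_end = .; */
  retain prev_end;                   /* earliest date allowed for the current row */
  format min_date yymmdd10.;
  min_date = max(date, prev_end);    /* MAX ignores missing, so row 1 keeps its own date */
  prev_end = min_date + min_length;  /* the next row must be at least this late */
  drop prev_end;
run;
```

For this data MIN_DATE comes out as 2000-02-23, 2000-03-01, 2000-03-16, 2000-03-23, 2000-04-07, 2000-04-14 and 2000-04-28.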

Which months are included in a date range?

I have a dataset with from and to dates of registration for a group of users. I would like to programmatically find which months lie in between those dates for each user, without having to hard code in any months, etc. I only want a summary of numbers registered in each month, so if that makes it quicker, so much the better.
E.g. I have something like
User | From      | To
A    | 11JAN2011 | 15MAR2011
A    | 16JUN2011 | 17AUG2011
B    | 10FEB2011 | 12FEB2011
C    | 01AUG2011 | 05AUG2011
And I want something like
Month   | Registrations
JAN2011 | 1 (A)
FEB2011 | 2 (AB)
MAR2011 | 1 (A)
APR2011 | 0
MAY2011 | 0
JUN2011 | 1 (A)
JUL2011 | 1 (A)
AUG2011 | 2 (AC)
Note I don't need the bit in brackets; that was just to try and clarify my point.
Thanks for any help.
One easy way is to construct an intermediate dataset and then PROC FREQ.
data have;
informat from to DATE9.;
format from to DATE9.;
input user $ from to;
datalines;
A 11JAN2011 15MAR2011
A 16JUN2011 17AUG2011
B 10FEB2011 12FEB2011
C 01AUG2011 05AUG2011
;;;;
run;
data int;
set have;
_mths=intck('month',from,to,'d'); *number of months after the current one (0=current one). 'd'=discrete=count 1st of month as new month;
do _i = 0 to _mths; *start with current month, iterate over months;
month = intnx('month',from,_i,'b');
output;
end;
format month MONYY7.;
run;
proc freq data=int;
tables month/out=want(keep=month count rename=(count=registrations));
run;
You can eliminate the _mths variable by calling INTCK directly in the DO loop, e.g. do _i = 0 to intck('month',from,to,'d');
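PROC FREQ only outputs months that actually occur, so APR2011 and MAY2011 are missing from WANT above, although the question lists them with 0. A sketch that fills those gaps by building every month between the earliest and latest registration month and merging the counts onto it. It also counts each user at most once per month (an assumption about what "numbers registered" means); the data set names INT_U, COUNTS and ALL_MONTHS and the macro variables LO and HI are illustrative:

```sas
data have;
  informat from to date9.;
  format from to date9.;
  input user $ from to;
  datalines;
A 11JAN2011 15MAR2011
A 16JUN2011 17AUG2011
B 10FEB2011 12FEB2011
C 01AUG2011 05AUG2011
;
run;

/* One row per user and month covered by each registration */
data int;
  set have;
  do _i = 0 to intck('month', from, to);
    month = intnx('month', from, _i, 'b');
    output;
  end;
  format month monyy7.;
  keep user month;
run;

/* Earliest and latest month, saved as macro variables */
data _null_;
  set int end=last;
  retain lo hi;
  lo = min(lo, month);
  hi = max(hi, month);
  if last then do;
    call symputx('lo', lo);
    call symputx('hi', hi);
  end;
run;

/* Every month in the range, including empty ones */
data all_months;
  do _i = 0 to intck('month', &lo, &hi);
    month = intnx('month', &lo, _i, 'b');
    output;
  end;
  format month monyy7.;
  drop _i;
run;

/* Count each user once per month */
proc sort data=int out=int_u nodupkey;
  by month user;
run;

proc freq data=int_u noprint;
  tables month / out=counts(keep=month count rename=(count=registrations));
run;

/* Months with no registrations get 0 */
data want;
  merge all_months counts;
  by month;
  if missing(registrations) then registrations = 0;
run;
```

For the sample data this yields 8 rows, JAN2011 through AUG2011, with 0 for APR2011 and MAY2011 and 2 for FEB2011 and AUG2011.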