SAS: Logic to subtract two numbers on a FIFO basis

I have a problem that I need to implement in SAS PROC SQL, and I have thought about implementing it with macros.
I will try to break the problem down into phases. For the first phase, I am stuck on how to iterate over rows, access each cell value, and do some manipulations and calculations on those values. An example:
For client "A", a financial institution buys USD 100, USD 400, and USD 500, and then sells USD 350 for the same client. Starting with the USD 350 sale (say it happened on date xx), my calculation would be:
USD 350 - USD 100 = USD 250 (not zero, so we proceed to the next step)
USD 250 - USD 400 = USD -150 (less than zero, so we stop here, log this value along with the date, and then calculate aging as the difference between the date of the sale and the date of this purchase).
So I am guessing that I have to iterate over the values, which I am currently unable to do. Can someone point me to a sample of iteration (a for loop)?
Edit
We do this exercise at month end to calculate the total unutilized stock. Hence, on a FIFO (first in, first out) basis, the first sell amount should be subtracted from the first buy amount, ordered by buy_date, for each client. Any residual amount of the first buy is carried forward to the calculation for the next sell transaction (again based on buy date). The code is given below. The problem is that the BUY of 01-MAR-2018 (100000) is not exhausted by the sell amounts of 50000 on 18-MAR-2018 and 19-MAR-2018; instead, the logic moves on to the next buy amount, which is 50000 on 02-MAR-2018.
data want;
  set sample_2;
  by SECURITY_ID;
  array d{99999} _temporary_; /* buy dates */
  array t{99999} _temporary_; /* buy amounts */
  retain count;
  if first.SECURITY_ID then do;
    k=0; SELL=0; BUY=0; count=0;
    call missing(of d{*}, of t{*});
  end;
  if B_S='buy' then do;
    k+1;
    d{k}=buy_date;
    t{k}=stock;
  end;
  if B_S='sell' then do;
    SELL+stock;
    FIFO=d{ifn(count=0,1,count)};
    do i=count+1 to k;
      BUY+t{i};
      if SELL lt BUY then do;
        count=i;
        leave;
      end;
    end;
  end;
  format FIFO date11.;
  REM_QTY = SELL - BUY;
  drop BUY SELL i k count REM_QTY;
run;

You haven't really explained what you want. If you want to process records in order, use a normal DATA step.
Let's make some sample data.
data have ;
length client $5 date 8 type $4 amount 8 ;
input client date type amount ;
informat date yymmdd.;
format date yymmdd10.;
cards;
A 2018-03-01 BUY 100
A 2018-03-02 BUY 400
A 2018-03-03 SELL 350
A 2018-03-04 BUY 500
;
Now let's process each client's records in order.
data want ;
set have ;
by client date ;
if first.client then on_hand =0;
if type='BUY' then on_hand+amount;
if type='SELL' then on_hand+-amount;
run;
Result
Obs client date type amount on_hand
1 A 2018-03-01 BUY 100 100
2 A 2018-03-02 BUY 400 500
3 A 2018-03-03 SELL 350 150
4 A 2018-03-04 BUY 500 650
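The running balance above does not record which purchase each sale consumes. A minimal sketch of FIFO matching follows, assuming the same have layout sorted by client and date. The names fifo, lot_date, lot_left, head, tail, to_sell, matched, and age_days are hypothetical, and the array size of 1000 is an assumed upper bound on the number of purchases per client.

```sas
data fifo (keep=client sell_date buy_date matched age_days);
  set have;
  by client date;
  /* Queue of open purchase lots; 1000 is an assumed maximum per client */
  array lot_date{1000} _temporary_;
  array lot_left{1000} _temporary_;
  retain head tail;
  format sell_date buy_date yymmdd10.;
  if first.client then do;
    head = 1;  /* oldest lot that still has stock */
    tail = 0;  /* most recent lot added */
  end;
  if type = 'BUY' then do;
    tail + 1;
    lot_date{tail} = date;
    lot_left{tail} = amount;
  end;
  else if type = 'SELL' then do;
    to_sell = amount;
    /* Consume the oldest lots first until the sale is covered */
    do while (to_sell > 0 and head <= tail);
      matched   = min(to_sell, lot_left{head});
      sell_date = date;
      buy_date  = lot_date{head};
      age_days  = sell_date - buy_date;
      output;  /* one row per (sale, lot) pairing */
      to_sell = to_sell - matched;
      lot_left{head} = lot_left{head} - matched;
      if lot_left{head} = 0 then head + 1;
    end;
  end;
run;
```

With the sample data, the sale of 350 on 2018-03-03 yields two rows: 100 matched against the 2018-03-01 purchase (age 2 days) and 250 against the 2018-03-02 purchase (age 1 day), leaving 150 of that lot open. A sale larger than the stock on hand is only matched up to the available amount in this sketch.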

Related

SAS - Cumulative sum with date range and conditions

The following is an example of the data I have:
startdate  enddate   amount
1/1/2010   2/2/2020  10
1/5/2011   2/3/2015  10
1/3/2012   2/2/2023  10
1/4/2013   2/2/2014  10
5/5/2015   2/2/2028  10
1/6/2016   2/2/2032  10
I want to calculate the sum of all existing amounts as of each start date, so it should look like this:
startdate  amount
1/1/2010   10
1/5/2011   20
1/3/2012   30
1/4/2013   40
5/5/2015   30
1/6/2016   40
How do I do this in SAS?
Essentially what I want to do is for each of the start dates, calculate the cumulative sum of any amounts that haven't expired. So for the first four dates, it is just a running cumulative sum because none of the amounts have expired. But at 5/5/2015, two of the previous amounts have expired hence a cumulative sum of 30. Same for the last date, where the same two have previously expired and you have the additional amount as of 1/6/2016 therefore 40.
One way to accomplish this is with a self-join via Proc SQL:
proc sql;
create table out_dset as
select a.startdate, sum(b.amount) as amount /* sum the amounts of the matching, unexpired rows */
from in_dset as a left join in_dset as b
on a.startdate >= b.startdate and a.startdate < b.enddate
group by a.startdate
order by a.startdate;
quit;
For each observation in the original dataset, this code will find observations in the same dataset that meet the date range criteria and will sum up the amount column.
You can change the second comparison operator from < to <= if you want to include situations when a previous amount expired on the same date as a given startdate.
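To try the query, the sample rows from the question can be loaded as follows. This is only a sketch: the question does not say whether the dates are month/day/year or day/month/year, so the mmddyy10. informat here is an assumption.

```sas
/* Sample rows from the question; mmddyy10. is an assumed reading of the dates */
data in_dset;
  input startdate :mmddyy10. enddate :mmddyy10. amount;
  format startdate enddate mmddyy10.;
  datalines;
1/1/2010 2/2/2020 10
1/5/2011 2/3/2015 10
1/3/2012 2/2/2023 10
1/4/2013 2/2/2014 10
5/5/2015 2/2/2028 10
1/6/2016 2/2/2032 10
;
run;
```

Because the dates are stored as numeric SAS dates, the comparisons in the join condition work on them directly.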

Summing a Column By Group In a Dataset With Macros

I have a dataset that looks like:
Month Cost_Center Account Actual Annual_Budget
June 53410 Postage 13 234
June 53420 Postage 0 432
June 53430 Postage 48 643
June 53440 Postage 0 917
June 53710 Postage 92 662
June 53410 Phone 73 267
June 53420 Phone 103 669
June 53430 Phone 90 763
...
I would like to first sum the Actual and Annual_Budget columns, respectively, and then create a variable that flags whether the Actual, extrapolated for the entire year, is greater than the Annual_Budget total.
I have the following code:
Data Test;
set Combined;
%All_CC; /*MACRO TO INCLUDE ALL COST CENTERS*/
%Total_Other_Expenses;/*MACRO TO INCLUDE SPECIFIC Account Descriptions*/
Sum_Actual = sum(Actual);
Sum_Annual = sum(Annual_Budget);
Run_Rate = Sum_Actual*12;
if Run_Rate > Sum_Annual then Over_Budget_Alarm = 1;
run;
However, when I run this code, it does not sum by group, for example, this is the output I get:
Account_Description Sum_Actual Sum_Annual Run_Rate Over_Budget_Alarm
Postage 13 234 146
Postage 0 432 0
Postage 48 643 963 1
Postage 0 917 0
Postage 92 662 634 1
I'm looking for output where all the 'postage' are summed for Actual and Annual, leaving just one row of data.
Use PROC MEANS to summarize the data
Use a data step and IF/THEN statement to create your flags.
proc means data=have N SUM NWAY STACKODS;
class account;
var actual annual_budget;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
run;
data want;
set summary_stats2; /* AUTONAME names the sums actual_Sum and annual_budget_Sum */
if actual_Sum > annual_budget_Sum then flag=1;
else flag=0;
run;
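SAS DATA step behavior is quite complex (see "About DATA Step Execution" in SAS Language Reference: Concepts). The default behavior, which you're seeing, is that at the end of each iteration (i.e. for each input row) the row is written to the output data set, and variables created in the step (such as Sum_Actual) are reset to missing, so nothing accumulates across rows. In addition, sum(Actual) in a DATA step only sees the current row, so it simply returns that row's Actual.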
You can't expect to write Base SAS "intuitively" without spending a few days learning it first, so I recommend using PROC SQL, unless you have a reason not to.
If you really want to aggregate in a DATA step, you have to use BY-group processing: after ensuring the input data set is sorted by the BY variables, you can use something like the following:
data Test (keep = Month Account Sum_Actual Sum_Annual /*...your Run_Rate and Over_Budget_Alarm...*/);
set Combined; /* the input table */
by Month Account; /* must be sorted by these */
retain Sum_Actual Sum_Annual; /* don't clobber for each input row */
if first.account then do; /* instead do it manually for each group */
Sum_Actual = 0;
Sum_Annual = 0;
end;
/* accumulate the values from each row */
Sum_Actual = sum(Sum_Actual, Actual);
Sum_Annual = sum(Sum_Annual, Annual_Budget);
/* Note that Sum_Actual = Sum_Actual+Actual; will not work if any of the input values is 'missing'. */
if last.account then do;
/* The group has been processed.
Do any additional processing for the group as a whole, e.g.
calculate Over_Budget_Alarm. */
output; /* write one output row per group */
end;
run;
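The PROC SQL route recommended above might look like the following sketch. It assumes the Combined data set from the question, that the data covers a single month (so the run rate is the monthly total times 12, as in the question's own code), and that one row per Account is wanted.

```sas
proc sql;
  create table Test as
  select Account,
         sum(Actual)        as Sum_Actual,
         sum(Annual_Budget) as Sum_Annual,
         sum(Actual) * 12   as Run_Rate,
         /* a comparison evaluates to 1 (true) or 0 (false) */
         (sum(Actual) * 12 > sum(Annual_Budget)) as Over_Budget_Alarm
  from Combined
  group by Account;
quit;
```

Adding Month to both the select list and the group by clause would give one row per month and account instead.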
Proc SQL can be very effective for examining aggregate data. Without seeing what the macros do, I would say perform the run rate checks after outputting the data set test.
You don't show rows for other months, but I must presume the annual_budget values are constant across all months -- if so, I don't see a reason to ever sum annual_budget; comparing anything to sum(annual_budget) is probably at the incorrect time scale and not useful.
From the data shown, it's hard to tell whether you want to know either of these:
which months (if any) had a run_rate that exceeded the annual_budget
which months (if any) had a run_rate that exceeded the balance of the annual_budget (i.e. the annual_budget less the prior months' expenditure)
Presume each row in test is for a single year/month/costCenter/account -- if not the underlying data would have to be aggregated to that level.
Proc SQL;
* retrieve presumed constant annual_budget values from data;
* this information might (should) already exist in another table;
* presume constant annual budget value at each cost center | account combination;
* distinct because there are multiple months with the same info;
create table annual_budgets as
select distinct Cost_Center, Account, Annual_Budget
from test;
create table account_budgets as
select account, sum(annual_budget) as annual_budget
from annual_budgets
group by account;
* flag for some run rate condition;
create table annual_budget_mon_runrate_check as
select
2019 as year,
account,
sum(actual) as yr_actual, /* across all month/cost center */
min ((
select annual_budget from account_budgets as b
where b.account = t.account
)) as account_budget,
max (
case when actual * 12 > annual_budget then 1 else 0 end
) as
excessive_runrate_flag label="At least one month had a cost center run rate that would exceed its annual_budget"
from
test as t /* INNER and OUTER are reserved words in PROC SQL, so they cannot be table aliases */
group by
year, account;
quit;
You can add a where clause to restrict the accounts processed.
Changing the max to sum in the flag computation would return the number of cost center months with excessive run rates.

Computing moving average in SAS

I'm trying to use SAS to compute a moving average for x number of periods that uses forecasted values in the calculation. For example if I have a data set with ten observations for a variable, and I wanted to do a 3-month moving average. The first forecast value should be an average of the last 3 observations, and the second forecast value should be an average of the last two observations, and the first forecast value.
If you have for example data like this:
data input;
infile datalines;
length product $10 period value 8;
informat period yymmdd10.;
format period yymmdd10.;
input product $ period value;
datalines;
car 2016-01-01 10
car 2015-12-01 20
car 2015-11-01 30
car 2015-10-01 40
car 2015-09-01 30
car 2015-08-01 15
;
run;
You can left join the input table to itself with a condition:
input t1 left join input t2
on t1.product = t2.product
and t2.period between intnx('month',t1.period,-2,'b') and t1.period
group by t1.product, t1.period, t1.value
With this you have t1.value as the current value and avg(t2.value) as the 3-month average. To compute the 2-month average, use the ifn() function to set every value older than the previous period to missing:
avg(ifn( t2.period >= intnx('month',t1.period,-1,'b'),t2.value,. ))
The full code could look like this:
proc sql;
create table want as
select t1.product, t1.period, t1.value as currentValue,
ifn(count(t2.period)>1,avg(ifn( t2.period >= intnx('month',t1.period,-1,'b'),t2.value,. )),.) as twoMonthsAVG,
ifn(count(t2.period)>2,avg(t2.value),.) as threeMonthsAVG
from input t1 left join input t2
on t1.product = t2.product
and t2.period between intnx('month',t1.period,-2,'b') and t1.period
group by t1.product, t1.period, t1.value
;
quit;
I've also added a count(t2.period) condition to return missing values when there aren't enough records to compute the measure.
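The question also asks for forecasts that feed back into later averages, which the join does not cover. A minimal DATA step sketch of that recursion follows, assuming the input data set above, a 3-period window, and a forecast horizon of 3 months. The names sorted, forecast, v1 to v3, h, and is_forecast are hypothetical.

```sas
proc sort data=input out=sorted;
  by product period;
run;

data forecast (keep=product period value is_forecast);
  set sorted;
  by product;
  retain v1 v2 v3;  /* last three values, v3 is the most recent */
  if first.product then call missing(v1, v2, v3);
  is_forecast = 0;
  v1 = v2; v2 = v3; v3 = value;
  output;  /* pass the observed row through */
  /* After the last observed row, append forecasts (3 = assumed horizon) */
  if last.product then do h = 1 to 3;
    period = intnx('month', period, 1, 'b');
    value = mean(v1, v2, v3);  /* average of the last three values, actual or forecast */
    is_forecast = 1;
    output;
    v1 = v2; v2 = v3; v3 = value;  /* the forecast joins the window */
  end;
run;
```

For the sample data, the first forecast (2016-02-01) is the mean of 30, 20, and 10, which is 20; the second is the mean of 20, 10, and that forecast of 20.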

Extracting data in SAS

I have a column Test_Name with a total of 15000 records. The column holds 14 distinct values with different counts. For example, test name A has 347 records, B has 1500, C has 233, D has 40, E has 12, and so on.
Wherever the count is greater than 100, I want 100 random records for that test; getting the first 100 records for each such test would be fine too.
How can I do that in SAS? An early response would be appreciated.
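Here is a way to get just the first 100 records of each test into the keep data set:
proc sort data=test;
by test_Name;
run;
data keep (drop=n);
set test;
by Test_Name;
if first.Test_Name then n=0; /* restart the counter for each test */
n+1;
if n<=100 then output keep; /* tests with fewer than 100 records are kept whole */
run;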
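For the random variant the question asks about, a sketch with PROC SURVEYSELECT (part of SAS/STAT, so this assumes that product is licensed) could look like this. The output name sample100 and the seed value are arbitrary choices for illustration.

```sas
proc sort data=test;
  by Test_Name;
run;

/* Simple random sample of 100 rows per Test_Name.
   SELECTALL keeps every row of a test that has fewer than 100 rows. */
proc surveyselect data=test out=sample100
                  method=srs sampsize=100 selectall seed=20240101;
  strata Test_Name;
run;
```

Fixing the seed makes the sample reproducible between runs.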

sas aggregate weekly, monthly

For the data set below (the actual one is several thousand rows long), I would like SAS to aggregate the income daily (there are many income lines every day per machine), weekly, and monthly (the week starts on Monday, the month starts on the 1st in any given year) by machine. Is there straightforward code for this? Any help is appreciated.
MachineNo Date income
1 01Jan2012 1500
1 02Jan2012 2000
1 27Aug2012 300
2 02Jan2012 1200
2 15Jun2012 50
3 03Mar2012 1000
4 08Apr2012 500
proc expand and proc timeseries are excellent tools for accumulation and aggregation to different frequencies of series. You can combine both with by-group processing to convert to any time period that you need.
Step 1: Sort by MachineNo and Date
proc sort data=have;
by MachineNo Date;
run;
Step 2: Find the min/max end dates of your series for date alignment
The format=date9. option is important. For whatever reason, some SAS/ETS and HPF procedures require date literals for certain arguments.
proc sql noprint;
select min(date) format=date9.,
max(date) format=date9.
into :min_date,
:max_date
from have;
quit;
Step 3: Align each MachineNo by start/end date, and accumulate days per MachineNo
The below code will get you aligned daily accumulation, remove duplicate days per machine, and set Income on any missing days to 0. This step will also guarantee that your series has equal time intervals per by-group, allowing you to run hierarchical time-series analyses without violating the equal-spaced interval assumption.
proc timeseries data=have
out=want_day;
by MachineNo;
id date interval=day
align=both
start="&min_date"d
end="&max_date"d;
var income / accumulate=total setmiss=0;
run;
Step 4: Aggregate aligned Daily to Weekly shifted by 1 day, Monthly
SAS time intervals can be both multiplied and shifted. Since the standard week starts on Sunday, we want to shift by 1 day to have it start on Monday.
Standard Week
2 3 4 5 6 7 1
Mon Tue Wed Thu Fri Sat Sun
Shifted
1 2 3 4 5 6 7
Mon Tue Wed Thu Fri Sat Sun
Intervals follow the format:
TimeInterval<Multiplier>.<Shift>
The standard shift interval is 1. For all intents and purposes, consider 1 as 0: 1 means it's unshifted. 2 means it's shifted by 1 period. Thus, for a week to start on a Monday, we want to use the interval Week.2.
proc expand data=want_day
out=want_week
from=day
to=week.2;
by MachineNo; /* want_day holds one series per machine */
id date;
convert income / method=aggregate observed=total;
run;
Step 5: Convert Week to Month
proc expand data=want_week
out=want_month
from=week.2
to=month;
by MachineNo; /* want_week holds one series per machine */
id date;
convert income / method=aggregate observed=total;
run;
In case you don't have a license for SAS/ETS here's another way.
For the monthly data you can format the date in a proc means output.
I think WeekW. starts on Monday but it may not be in a format you want, so you'll need to create a new variable for week first if you wanted to use this method.
proc means data=have nway noprint;
class machineno date;
format date monyy7.;
var income;
output out=want sum(income)=income;
run;
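For the weekly totals without SAS/ETS, one option is to derive a week-start variable first, as suggested above. The sketch below uses the shifted interval WEEK.2 with INTNX to get the Monday of each date's week; the names have_wk, week_start, and want_week_alt are hypothetical.

```sas
data have_wk;
  set have;
  /* Monday that begins the week containing date */
  week_start = intnx('week.2', date, 0, 'b');
  format week_start date9.;
run;

proc means data=have_wk nway noprint;
  class machineno week_start;
  var income;
  output out=want_week_alt sum(income)=income;
run;
```

Unlike the PROC TIMESERIES approach, this only produces rows for weeks that actually contain income lines; weeks with no data are not filled with zeros.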