Find difference of amount between each date - sas

I have a set of data where I need to calculate the difference between each month but I am not sure where to start or how to do it. The as of dates will constantly change. At the end of this month March would be added and so on and so forth. Any help would be greatly appreciated.
First image is the data. Second is the output I need to achieve.

Use DIF()
data want;
set have;
difference = dif(total_amount);
run;
See the documentation here for further information.

Alternately:
data want;
set have;
retain prevmonth;
difference = total_amount - prevmonth;
output;
prevmonth = total_amount;
run;
These kinds of data step exercises are good to practice alongside knowing useful wrappers like dif. retain and output are general and powerful tools in the SAS programmer's library.

Related

How to categorize a date by decade in SAS

I have a dataset with many dates in them. I want to categorize these dates into a new column that organizes them by decade (1980s, 1990s, etc).
I have a good idea on how to use IF, AND, and ELSE statements to accomplish this, but I don't know how to have SAS extract the year and only the year from the date to apply it to the conditional logic.
You could always use multipliers and the intnx() function as well.
Using Allan's sample data...
data want;
set have;
decade=year(intnx('year10.',dateval,0,'beginning'));
run;
The intnx() part of the code returns the date corresponding to the start of the decade, then we just take the year portion from it.
The year10. parameter tells it we want to work with decades, the 0 parameter means shift the date supplied to the current decade, and the beginning parameter tells it to return the date corresponding to the beginning of the decade.
If you're not familiar with using intnx() to perform date calculations in SAS see here for a quick primer: https://stackoverflow.com/a/11211180/214994
No need for conditional logic - can use a combination of the year() and floor() functions with some simple arithmetic:
data have;
infile cards;
input dateval date9.;
cards;
01JAN2004
08FEB1996
07MAR1987
14SEP1982
;run;
data want;
set have;
decade=floor(year(dateval)/10)*10;
run;
Which gives:

using by group processing First. and Last

I just start learning sas and would like some help with understanding the following chunk of code. The following program computes the annual payroll by department.
proc sort data = company.usa out=work.temp;
by dept;
run;
data company.budget(keep=dept payroll);
set work.temp;
by dept;
if wagecat ='S' then yearly = wagrate *12;
else if wagecat = 'H' then yearly = wagerate *2000;
if first.dept then payroll=0;
payroll+yearly;
if last.dept;
run;
Questions:
What does out = work.temp do in the first line of this code?
I understand the data step created 2 temporary variables for each by variable (first.varibale/last.variable) and the values are either 1 or 0, but what does first.dept and last.dept exactly do here in the code?
Why do we need payroll=0 after first.dept in the second to the last line?
This code takes the data for salaries and calculates the payroll amount for each department for a year, assuming salary is the same for all 12 months and that an hourly worker works 2000 hours.
It creates a copy of the data set which is sorted and stored in the work library. RTM.
From the docs
OUT= SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.
CAUTION:
Use care when you use PROC SORT without OUT=.
Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.
Default Without OUT=, PROC SORT overwrites the original data set.
Tips With in-database sorts, the output data set cannot refer to the input table on the DBMS.
You can use data set options with OUT=.
See SAS Data Set Options: Reference
Example Sorting by the Values of Multiple Variables
First.DEPT is an indicator variable that indicates the first observation of a specific BY group. So when you encounter the first record for a department it is identified. Last.DEPT is the last record for that specific department. It means the next record would the first record for a different department.
It sets PAYROLL to 0 at the first of each record. Since you have if last.dept; that means that only the last record for each department is outputted. This code is not intuitive - it's a manual way to sum the wages for people in each department. The common way would be to use a summary procedure, such as MEANS/SUMMARY but I assume they were trying to avoid having two passes of the data. Though if you're not sorting it may be just as fast anyways.
Again, RTM here. The SAS documentation is quite thorough on these beginner topics.
Here's an alternative method that should generate the exact same results but is more intuitive IMO.
data temp;
set company.usa;
if wagecat='S' then factor=12; *salary in months;
else if wagecat='H' then factor=2000; *salary in hours;
run;
proc means data=temp noprint NWAY;
class dept;
var wagerate;
weight factor;
output out=company.budget sum(wagerate)=payroll;
run;

convert character variable to time in SAS

I have a variable called post_time which is of character type
Type : Character
Length : 5
Foramt : $5.
Informat : $5.
eg: 00300 ,01250
How do I get time from this? can someone pls help me out?
need the time to look like 03:00AM 12:50PM
Need to display all the time in standard time zone
SAS will work directly with this via the HHMMSS informat.
data _null_;
x = input('02100',HHMMSS5.);
put x= timeampm9.;
run;
Any time zone concerns can be handled using either a time zone sensitive format, such as NLDATMTZ., and/or the TZONES2U or TZONEU2S functions, which work with DATETIME but may be able to work with your time values if you never go across a date boundary (though using timezones that's risky).
See the SAS Documentation page on Timezones for a more detailed explanation.
Assuming that the first three digits are hours and last two are minutes this may work:
data have;
input time $;
cards;
00300
00400
00234
;
run;
data want;
set have;
time_var=hms(
input(substr(time, 1,3), 3.),
input(substr(time, 4,2), 2.),
0);
format time_var timeampm8.;
run;

editing categorical data for uniformity

i have 1 million + rows of data and on of the columns is channel_name. The people collecting the data didn't seem to care that they entered one channel in about 10 different variations, lots of which contain the # symbol. Google search isn't giving me any decent documentation, can anyone direct me to something useful?
To some extent the answer has to be, "it depends". Your actual data will determine the best solution to this; and there may not be one true solution - you may have to try a few things, and there may well be more manual work than you'd like.
One option is to build a format based on what you see. That format can either convert various values to one consistent value, or convert to a numeric category (which is then overlaid with a format that shows the consistent value).
For example, you might have 'channel' as retail store:
data have;
infile datalines truncover;
input #1 channel $8.;
datalines;
Best Buy
BestBuy
BB
;;;;
run;
So you can do one of two things:
proc format;
value $channel
"Best Buy","BB","BestBuy" = "Best Buy";
quit;
data want;
set have;
channel_coded = put(channel,$channel.);
run;
Or you can do:
proc format;
invalue channeli
"Best Buy", "BB","BestBuy" = 1
;
value channelf
1 = "Best Buy"
;
quit;
data want;
set have;
channel_coded = input(channel,CHANNELI.);
format channel_coded channelf.;
run;
Which you do is largely up to you - the latter gives you more flexibility in the long run, for example when Sears and K-Mart merged, it would be somewhat to take 2 and 16 and format then as Sears, than to change the stored values for the character format - and even easier to roll back if/when KMart splits off again.
This does require some manual work, though; you have to code things by hand here, or develop some method for figuring out what the coding is. You can use the other option in proc format to easily identify new values and add them to the format (which can be derived from a dataset, instead of hand written code), but at the end of the day the actual values you have determine what solution is best for the actual work of determining what is "Best Buy", and a by-hand solution (each time a new value comes in, it is looked at by a person and coded) may ultimately be the best.

How to rename variables dynamically in sas?

I have a sas data set. In it i have some variables following a pattern
-W 51 Sales
-W 52 Sales
-W 53 Sales
and so on.
Now i want to rename all of these variables dynamically such that W 51 is replaced by starting date of that week and the new name becomes - 5/2/2013 Sales?
The reason i want to rename them is that i have sales data of all the 53 weeks in an year and the data set would be eassier for me to understand if i had the starting date of a week instead of W(week_no) Sales as a variable name
Is there any way i can do that in sas?
You really don't want to rename your variables. You may think you do, but it'll just bite you eventually.
What you can do instead is give them descriptive labels. This can be done via proc datasets.
proc datasets library=<lib>;
modify <dataset>;
label <variable> '5/2/2013 sales';
run;
Just for fun lets assume you want to do this anyway -- Safest thing to do is just create a copy of the dataset for your output...
this code assumes your variable names are named like w1_sales and output names are going to be renamed to 03JAN2013_sale or something like that.
data newDataSet;
set oldDataSet;
%MACRO rename_vars(mdataset,year);
data &mdataset.;
set &mdataset.;
%do i = 1 %to 53;
%let weekStartDate = %sysfunc(intnx('week&i','01jan&year.'d,0)); %*returns the starting day of week(i) uses sunday as starting date. If you want monday use 0.1 as last param;
%let weekstartDateFormatted = %sysfunc(putn(&weekStartDate.,DATE.)) %*formats into ddMONyyy. substitute whatever format you want;
rename w&i._Sale = &weekstartDateFormatted ._SALES;
%end;
run;
%MEND rename_vars;
%rename_vars(newDataSet,2013);
I don't have time to test this right now, so sommebody let me know if I screwed it up somewhere. This should at least get you going though. Or you can send me or post some code to read a small sample dataset (obviously if this is possible without having to share some proprietary info. You might have to genericize it a bit) with those vars like that and I'll debug it.