I have a dataset with many dates in it. I want to categorize these dates into a new column that groups them by decade (1980s, 1990s, etc.).
I have a good idea of how to use IF, AND, and ELSE statements to accomplish this, but I don't know how to have SAS extract the year, and only the year, from the date so I can apply it in the conditional logic.
You could also use the intnx() function with a multiplied interval.
Using Allan's sample data...
data want;
set have;
decade=year(intnx('year10',dateval,0,'beginning'));
run;
The intnx() part of the code returns the date at the start of the decade; then we just take the year portion of it.
The year10 interval tells it we want to work with decades, the increment of 0 means stay within the decade that contains the supplied date, and 'beginning' tells it to return the date at the beginning of that decade.
If you're not familiar with using intnx() to perform date calculations in SAS, see here for a quick primer: https://stackoverflow.com/a/11211180/214994
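To see the intermediate value the intnx() call produces, here is a small sketch. The dataset name decade_check and the single sample date are illustrative only, and it assumes (as the answer above does) that YEAR10 intervals are aligned so that each decade starts in a year ending in 0.

```sas
data decade_check;
  dateval = '07MAR1987'd;
  /* date at the start of the 10-year interval containing dateval */
  decade_start = intnx('year10', dateval, 0, 'beginning');
  /* the year of that start date is the decade */
  decade = year(decade_start);
  format dateval decade_start date9.;
  put dateval= decade_start= decade=;
run;
```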
There's no need for conditional logic; you can use a combination of the year() and floor() functions with some simple arithmetic:
data have;
infile cards;
input dateval date9.;
cards;
01JAN2004
08FEB1996
07MAR1987
14SEP1982
;
run;
data want;
set have;
decade=floor(year(dateval)/10)*10;
run;
Which gives decade values of 2000, 1990, 1980 and 1980 for the four sample dates.
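Since the question asks for labels such as 1980s, one way to turn the numeric decade into that text is the CATS function. This is a sketch only: the dataset names decades_in/decades_out and the variable decade_label are hypothetical, and the sample dates are the ones used above.

```sas
/* sample dates, same values as the answer above */
data decades_in;
  input dateval :date9.;
  format dateval date9.;
  datalines;
01JAN2004
08FEB1996
07MAR1987
14SEP1982
;
run;

data decades_out;
  set decades_in;
  /* fixed length; CATS would otherwise default to $200 */
  length decade_label $5;
  decade = floor(year(dateval)/10)*10;
  /* e.g. 1980 -> '1980s' */
  decade_label = cats(decade, 's');
run;
```

Keeping the numeric decade alongside the label makes sorting and grouping easier than sorting on the text.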
Related
I have a set of data where I need to calculate the difference from one month to the next, but I am not sure where to start or how to do it. The as-of dates will keep changing: at the end of this month March will be added, and so on. Any help would be greatly appreciated.
The first image shows the data; the second shows the output I need to achieve.
Use the DIF() function:
data want;
set have;
difference = dif(total_amount);
run;
See the documentation here for further information.
Alternatively:
data want;
set have;
retain prevmonth;
difference = total_amount - prevmonth;
output;
prevmonth = total_amount;
run;
These kinds of data step exercises are worth practising alongside knowing useful wrappers like DIF; RETAIN and OUTPUT are general and powerful tools in the SAS programmer's toolkit.
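A quick way to convince yourself the two approaches agree is to run both on the same rows. This sketch uses a hypothetical dataset (monthly, with made-up asof_month and total_amount values) and assumes the rows are already sorted by month; in both versions the first row has no previous month, so its difference is missing.

```sas
/* hypothetical monthly totals, one row per as-of month, sorted by month */
data monthly;
  input asof_month :yymmdd10. total_amount;
  format asof_month yymmdd10.;
  datalines;
2019-01-31 100
2019-02-28 130
2019-03-31 125
;
run;

data compare;
  set monthly;
  retain prevmonth;
  /* DIF(x) is x minus LAG(x); call it on every row, never conditionally */
  diff_dif = dif(total_amount);
  /* RETAIN version: prevmonth still holds the previous row's value here */
  diff_retain = total_amount - prevmonth;
  prevmonth = total_amount;
  drop prevmonth;
run;
```

New months simply become new rows, so nothing in the code needs to change when March is added.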
I have to run a number of processing steps in SAS for every month over the past 2 years (2017-2019).
I have a job that uses a YYMMDD parameter to indicate which data should be used from the data warehouse.
Let's say I have a table with JOB_NAME and JOB_DATE columns, along with conditions (based on the rc value, the next job will start or not).
How can I tell SAS to take the date parameter from a certain column in a certain table?
You can just apply a format after loading the dataset:
data Want;
set have;
format JOB_DATE YYMMDD10. /*or use YYMMDD8.*/;
run;
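If the goal is to feed a date from the table into the job as a parameter, one common pattern is PROC SQL SELECT ... INTO, which copies a value into a macro variable. This is a hedged sketch: the table job_control, the job name MONTHLY_LOAD and the macro variable job_param are all hypothetical, and yymmddn8. assumes the job expects an eight-digit YYYYMMDD string (swap in another format if it expects something else).

```sas
/* hypothetical control table: one row per job run with its date */
data job_control;
  input job_name :$20. job_date :date9.;
  format job_date yymmdd10.;
  datalines;
MONTHLY_LOAD 31JAN2017
MONTHLY_LOAD 28FEB2017
;
run;

/* copy one date into a macro variable as YYYYMMDD text */
proc sql noprint;
  select put(job_date, yymmddn8.)
    into :job_param trimmed
    from job_control
    where job_name = 'MONTHLY_LOAD'
      and job_date = '31JAN2017'd;
quit;

%put NOTE: job parameter is &job_param;
```

The macro variable can then be referenced as &job_param wherever the job reads its date parameter.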
I have a variable called post_time which is of character type
Type : Character
Length : 5
Format : $5.
Informat : $5.
e.g.: 00300, 01250
How do I get a time from this? Can someone please help me out?
I need the times to look like 03:00AM and 12:50PM.
I need to display all the times in the standard time zone.
SAS will work directly with this via the HHMMSS informat.
data _null_;
x = input('02100',HHMMSS5.);
put x= timeampm9.;
run;
Any time zone concerns can be handled using a time-zone-sensitive format such as NLDATMTZ., and/or the TZONES2U or TZONEU2S functions. Those functions work with DATETIME values, but they may also work with your time values as long as you never cross a date boundary (which is risky when shifting between time zones).
See the SAS Documentation page on Timezones for a more detailed explanation.
Assuming that the first three digits are hours and the last two are minutes, this may work:
data have;
input time $;
cards;
00300
00400
00234
;
run;
data want;
set have;
time_var=hms(
input(substr(time, 1,3), 3.),
input(substr(time, 4,2), 2.),
0);
format time_var timeampm8.;
run;
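Under the same assumption (leading digits are hours, last two are minutes), an arithmetic variant avoids SUBSTR by reading the whole string as a number and splitting it with INT and MOD. This is a sketch, not the answer's own method; the dataset names are hypothetical and the two values are the examples from the question.

```sas
/* sample values taken from the question */
data times_in;
  input post_time $5.;
  datalines;
00300
01250
;
run;

data times_out;
  set times_in;
  /* read the 5 characters as a number: '01250' -> 1250 */
  hhmm = input(post_time, 5.);
  /* hundreds and above are hours, the last two digits are minutes */
  time_var = hms(int(hhmm/100), mod(hhmm, 100), 0);
  format time_var timeampm8.;
run;
```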
I have the number 90724, which is actually 24/07/2009. How can I output the number in that date format? Another example: 100821 should be 21/08/2010.
data want;
set testData;
format date ddmmyy10.;
run;
Cheers
You really need to learn about informats. Another good introductory source is this UCLA site.
What you really needed to specify is the form in which your dates are stored, using the yymmdd6. informat. It uses the YEARCUTOFF= system option to determine which century a two-digit year falls into; see Adjusting Dates in a New Century and YEARCUTOFF= System Option.
Note: The default was 1920 in earlier releases (the 100-year range 1920 to 2019); in SAS 9.4 it is 1926 (1926 to 2025). If your dates fall outside this range, set an appropriate cutoff value using OPTIONS YEARCUTOFF=nnnn;
data test;
dateNumber=100821; ProperDate=input(put(dateNumber,z6.), yymmdd6.); output;/*ProperDate=21AUG2010*/
dateNumber=90724; ProperDate=input(put(dateNumber,z6.), yymmdd6.); output;/*ProperDate=24JUL2009; z6. keeps the leading zero (090724)*/
format ProperDate date9.;
run;
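The question asks for the dd/mm/yyyy layout rather than DATE9., which only needs a different display format. A minimal sketch, assuming the same conversion as above: the dataset name test_ddmmyy is hypothetical, and the OPTIONS statement sets YEARCUTOFF= explicitly (a session-wide setting) so the two-digit year is interpreted predictably.

```sas
/* two-digit years 26-99 -> 19xx, 00-25 -> 20xx */
options yearcutoff=1926;

data test_ddmmyy;
  dateNumber = 90724;
  /* z6. pads to 090724 so yymmdd6. always sees six digits */
  ProperDate = input(put(dateNumber, z6.), yymmdd6.);
  /* ddmmyy10. displays the stored SAS date as dd/mm/yyyy */
  format ProperDate ddmmyy10.;
  put ProperDate=;
run;
```

The stored value is still an ordinary SAS date; only the format changes how it prints.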
I have over 1 million rows of data, and one of the columns is channel_name. The people collecting the data didn't seem to care that they entered one channel in about 10 different variations, many of which contain the # symbol. A Google search isn't giving me any decent documentation; can anyone point me to something useful?
To some extent the answer has to be, "it depends". Your actual data will determine the best solution to this; and there may not be one true solution - you may have to try a few things, and there may well be more manual work than you'd like.
One option is to build a format based on what you see. That format can either convert various values to one consistent value, or convert to a numeric category (which is then overlaid with a format that shows the consistent value).
For example, you might have 'channel' as a retail store:
data have;
infile datalines truncover;
input #1 channel $8.;
datalines;
Best Buy
BestBuy
BB
;;;;
run;
So you can do one of two things:
proc format;
value $channel
"Best Buy","BB","BestBuy" = "Best Buy";
quit;
data want;
set have;
channel_coded = put(channel,$channel.);
run;
Or you can do:
proc format;
invalue channeli
"Best Buy", "BB","BestBuy" = 1
;
value channelf
1 = "Best Buy"
;
quit;
data want;
set have;
channel_coded = input(channel,CHANNELI.);
format channel_coded channelf.;
run;
Which you do is largely up to you. The latter gives you more flexibility in the long run: for example, when Sears and K-Mart merged, it would be somewhat easier to take codes 2 and 16 and format them both as Sears than to change the stored values for the character format, and even easier to roll back if/when K-Mart splits off again.
This does require some manual work, though: you have to code things by hand, or develop some method for working out what the coding should be. You can use the OTHER keyword in PROC FORMAT to easily identify new values and add them to the format (which can be built from a dataset via CNTLIN= instead of hand-written code), but at the end of the day the actual values you have determine which solution is best for deciding what counts as "Best Buy", and a by-hand solution (each time a new value comes in, a person looks at it and codes it) may ultimately be the best.
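One way to combine these ideas is sketched below: first strip characters such as # with COMPRESS (keeping letters only), then map the cleaned keys through a format built from a dataset with CNTLIN=, with an OTHER row (HLO='O') that flags anything not yet coded. Everything here is hypothetical: the dataset names, the format name CHAN, the sample values, and the choice to keep only letters, which may be too aggressive for your real channel names.

```sas
/* hypothetical raw values, including stray # characters */
data channels_raw;
  infile datalines truncover;
  input channel $20.;
  datalines;
Best Buy
#BestBuy
BB
Sears#
;
run;

/* mapping table for PROC FORMAT CNTLIN=; keys are the cleaned values */
data chan_map;
  length fmtname $8 type $1 hlo $1 start $20 label $8;
  fmtname = 'CHAN';
  type = 'C';           /* character format, used as $CHAN. */
  hlo = ' ';
  start = 'BESTBUY'; label = 'Best Buy'; output;
  start = 'BB';      label = 'Best Buy'; output;
  /* HLO='O' is the OTHER range: anything not listed is flagged */
  hlo = 'O'; start = ' '; label = 'UNMAPPED'; output;
run;

proc format cntlin=chan_map;
run;

data channels_coded;
  set channels_raw;
  length channel_key $20 channel_coded $8;
  /* keep letters only (drops # and spaces), then uppercase */
  channel_key = upcase(compress(channel, , 'ka'));
  channel_coded = put(channel_key, $chan.);
run;
```

Rows coded UNMAPPED are the ones a person still needs to review; once coded, they become new rows in chan_map rather than edits to hand-written PROC FORMAT code.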