In test_1 table, the my_date field is a "DATE9." format.
I would like to convert it to a pure numeric format (number length 8) which is of the form YYYYMMDD.
I would also like to do this in a proc sql statement ideally.
Here's what I have so far.
Clearly I need something to manipulate the my_date field.
rsubmit;
proc sql;
CREATE TABLE test_2 AS
SELECT
my_date
FROM
test_1
;
quit;
endrsubmit;
FYI: I am finding it quite difficult to understand the various methods in SAS.
To clarify, the field should actually be a number, not a character field, nor a date.
If you want the field to store the value 20141231 for 31DEC2014, you can do this:
proc sql;
create table want as
select input(put(date,yymmddn8.),8.) as date_num
from have;
quit;
input(..) turns something into a number, put(..) turns something into a string. In this case, we first put it with your desired format (yymmddn8. is YYYYMMDD with no separator), and then input it with 8., which is the length of the string we are reading in.
In general, this should not be done; storing dates as numerics of their string representation is a very bad idea. Try to stay within the date formats, as they are much easier to work with once you learn them, and SAS will happily work with other databases to use their date types as well. If you want the "20141231" representation (to put it to a text file, for example), make it a character variable.
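For comparison, here is a minimal sketch of the two alternatives mentioned above. It assumes a hypothetical table named have with a numeric SAS date column named date; the output column names date_fmt and date_char are made up for illustration.

```sas
/* Hypothetical input: one row holding a SAS date value */
data have;
    date = '31DEC2014'd;
    format date date9.;
run;

proc sql;
    create table want as
    select date as date_fmt format=yymmddn8.,   /* still a SAS date, only displayed as YYYYMMDD */
           put(date, yymmddn8.) as date_char    /* character column holding the YYYYMMDD text */
    from have;
quit;
```

The first column keeps working with date functions and date comparisons; the second is only suitable for export or display.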
Don't.
You lose the ability to use built in SAS functions for date calculations.
SAS stores dates as numbers: 0 is Jan 1, 1960, and the value increments by one for each day from there. Formats are used to display the values as desired for reporting and presentation.
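A small sketch of what is lost by converting, assuming nothing beyond base SAS; the variable names are illustrative. With a real date value, interval functions work directly, while a number such as 20141231 would have to be converted back first.

```sas
data _null_;
    d = '31DEC2014'd;                          /* stored as a count of days since 01JAN1960 */
    put d=;                                    /* raw stored number */
    put d= date9.;                             /* same value, shown through a format */

    one_month_on = intnx('month', d, 1, 'same');   /* shift by one month, same day of month */
    days_between = intck('day', d, '15JAN2015'd);  /* count day boundaries between two dates */
    put one_month_on= date9. days_between=;
run;
```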
Related
I'm very new to SAS and I'm trying to read a txt file that contains date and time. The file is shown in the following figure
I believe I have tried all the possible options that I can think of to read the file but the output is still in numeric form. The following is the code I'm using
data wb_bg_1619;
infile "C:\Users\daizh\Desktop\Ren\SAS\wb_bg_0215.txt" firstobs=3 missover;
informat DATE DATE7. TIME TIME5. ;
input DATE TIME BG;
run;
proc print data=wb_bg_1619;
run;
The output looks like this
You've used an informat to automatically convert a date stored as text into a numeric SAS date value, which is the number of days since Jan 1 1960. To display that in a human readable form, you need to use a regular format. Add the following to the top or bottom of your data step:
format date date9.
time time.
;
This changes how the data is displayed to you, but does not change how SAS works with it. As far as SAS is concerned, a date is only a number. You could run the rest of your program without ever using a format and get the right numbers and calculations with it if you wanted to, but that sure makes troubleshooting hard.
To remember the difference between a format and informat:
informats are for inputs
formats are for you
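Putting the two together, a minimal sketch with inline data. The two sample rows and the dataset name demo are invented for illustration, since the original file layout is not shown, so the informats here may differ from what the real file needs.

```sas
data demo;
    /* informats in the INPUT statement: how to read the text */
    input date :date9. time :time5. bg;
    /* formats: how to display the stored numbers */
    format date date9. time time5.;
    datalines;
01JAN2016 08:30 5.6
02JAN2016 21:15 7.1
;
run;

proc print data=demo;
run;
```

Removing the format statement leaves the stored values unchanged; only the printed columns revert to raw numbers.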
I have a table in SAS which contains the format information I want. I want to bin this data into the categories given.
What I don't know how to do is create either an xform or a format file from the data.
An example table looks like this:
TxtLabel  Type  FmtName  label  Hlo  count
.         I     FAC1f    0      O    1
1996      I     FAC1f    1           2
1997      I     FAC1f    2           3
I want to date all years in a different data set as after 1997 OR before 1996.
The problem is that I know how to do this by hard coding it, but the numbers in these files change each time, so I'm hoping to use the information in the table to generate the bins rather than hard code them.
How do I go about binning my data using a column from another dataset for my categorization?
Edit
I have two data sets, one which looks like the one I have included and one which has a column titled "YEAR". I want to bin the second data set using the categories from the first. In this case there are two available years in TxtLabel. There are multiple tables like this, I'm looking at how to generate PROC Format code from the table, rather than hard coding the values.
This should run to create the desired format
Proc FORMAT CNTLIN=MyCustomFormatControlData;
run;
You can then use it in a DATA Step, or apply it to a column in a data set.
Binning the data might be construed as 'data set splitting' but your question does not make it clear if that is so. Generic arbitrary splitting is often done with one of these techniques:
wall paper source code resolved from macro variables populated from information garnered in a Proc SQL or Proc FREQ step
dynamic data splitting using hash object for grouping records in memory, and saved to a data set with an .output() call.
Sample code for explicit binning
data want0 want1 want2 want3 want4 want5 wantOther;
set have;
* explicit wall paper;
select (put(year,FAC1f.));
when ('0') output want0;
when ('1') output want1;
when ('2') output want2;
when ('3') output want3;
when ('4') output want4;
when ('5') output want5;
otherwise output wantOther;
end;
run;
This is the construct that source code generated by macro can produce, and requires
one pass to determine the when/output lines that are to be generated
a second pass to apply the lines of code that were generated.
If this is the data processing that you are attempting:
do some research (plenty of info out there)
write some code
make a new question if you get errors you can't resolve
Proc FORMAT
Proc FORMAT has a CNTLIN option for specifying a data set containing the format information. The structure and values expected of the Input Control Data Set (the data set named in CNTLIN) are described in the Output Control Data Set documentation. Some of the important control data columns are:
FMTNAME
specifies a character variable whose value is the format or informat name.
LABEL
specifies a character variable whose value is associated with a format or an informat.
START
specifies a character variable that gives the range's starting value.
END
specifies a character variable that gives the range's ending value.
As the requirements of the custom format to be created get more sophisticated you will need to have more information variables in the input control data set.
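As a rough sketch of how these columns fit together, the following builds a small control data set by hand and loads it with CNTLIN. The format name YEARBIN, the data set names ctrl, have and binned, and the labels are all assumptions made for illustration; in practice the rows would be derived from your own table rather than typed in.

```sas
/* Control data set: one row per range, plus an "other" row */
data ctrl;
    length fmtname $8 type $1 start end $8 label $12 hlo $1;
    fmtname = 'YEARBIN';      /* hypothetical format name */
    type    = 'N';            /* numeric format */
    start = '1996'; end = '1996'; label = '1996';  hlo = ' '; output;
    start = '1997'; end = '1997'; label = '1997';  hlo = ' '; output;
    start = ' ';    end = ' ';    label = 'Other'; hlo = 'O'; output;  /* O = other */
run;

proc format cntlin=ctrl;
run;

/* Hypothetical data to bin */
data have;
    input year;
    datalines;
1995
1996
1997
2001
;
run;

data binned;
    set have;
    year_bin = put(year, yearbin.);   /* category label as a character value */
run;
```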
I have a dataset where date_occur is in MMDDYY10 format (looks like 10/23/2014). I want all dates in FY17 which is 10/01/2016-09/30/2017. The following code is not working for some reason. Why!!
I'm not getting any error or warning messages I'm just getting an empty table.
Thanks in advance
data fy17;
set y16_17;
where date_occur between 10/01/2016 and 09/30/2017;
run;
The Log says:
NOTE: There were 0 observations read from the data set WORK.Y16_17.
WHERE (date_occur>=0.0001487357 and date_occur<=0.0049603175);
I'm thinking SAS is not understanding that date_occur is in mmddyy10 format. The internet has code like
where date_occur between '10/01/2016'd and '09/30/2017'd;
I tried this and it did not work either.
Your 'dates' are being interpreted as numeric expressions, and none of the date values in your table lie in the resulting range. Use date literals instead:
data fy17;
set y16_17;
where date_occur between '01oct2016'd and '30sep2017'd;
run;
Date literals must be written in a form the date. informat can read (ddmmmyyyy or ddmmmyy), for example '01oct2016'd. A string such as '10/01/2016'd is not a valid date literal.
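A short sketch of why the original WHERE clause matched nothing: the unquoted values are evaluated as division. The variable names are illustrative. The last assignment shows one way to get a date value from mm/dd/yyyy text if you prefer to keep that notation, using the input function with the mmddyy10. informat.

```sas
data _null_;
    a = 10/01/2016;    /* arithmetic: 10 divided by 1 divided by 2016 */
    b = 09/30/2017;    /* arithmetic: 9 divided by 30 divided by 2017 */
    put a= best12. b= best12.;

    start = '01oct2016'd;                       /* date literal */
    same  = input('10/01/2016', mmddyy10.);     /* text converted with an informat */
    put start= date9. same= date9.;
run;
```

The two small fractions printed for a and b are the bounds that appear in the WHERE line of the log above.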
I started out formatting my variables using PROC FORMAT. Later on I found that I had to change some of my variables in my dataset. I want to maintain the formatting I originally created, but I don't think I can do this if I recode. Am I correct in assuming this? I think I will have to just change some of my formats to accommodate my new variables, but is there a way
I'm not quite sure I understand your question, but I think I can still answer your question by giving you an understanding of the difference between recoding variables in SAS and using formatted values.
If you have originally created a format, that format is applied to the values in the SAS dataset at the time that your analysis is run. So, if you have a value of "Block A" in a character variable in your dataset and you have a format that maps "Block A" to the formatted value of 1, then if you go in and later change the value of "Block A" to something else and rerun your analysis, "Block A" will no longer be printed in your output or used in your analysis as the formatted value. Formats work independently of the underlying values in your datasets. When you run an analysis, SAS essentially looks through your datasets at run-time and maps each of the values to the formatted values as you've specified in your proc format step and then performs the analysis using the formatted values.
If you want to keep the original formatting, you can use two separate formats: one for the old format and one for the new formatting and call the appropriate format into your procedures depending on when you want to use which format.
You can also use the put function in a data step to convert the previously formatted value and "hard code" the formatted value as an actual value in your dataset. For example, if you have a format called "blockno" that you used with a variable called "block" then, using your old format, you could create a variable called block_old and set it to the old formatted value with:
block_old=put(block, $blockno.);
You could then modify block with your new values. You would then have two variables in your dataset: block_old, which would contain the original formatted values of your variable, and block, which, after your changes, would contain the new values.
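A minimal end-to-end sketch of that approach. The format contents, the data set names blocks and recoded, and the recoding rule are invented for illustration.

```sas
/* Hypothetical original format */
proc format;
    value $blockno
        'Block A' = '1'
        'Block B' = '2';
run;

/* Hypothetical data */
data blocks;
    length block $10;
    block = 'Block A'; output;
    block = 'Block B'; output;
run;

data recoded;
    set blocks;
    /* freeze the old formatted value as a real stored value */
    block_old = put(block, $blockno.);
    /* now the underlying value can change without losing the old coding */
    if block = 'Block A' then block = 'Block C';
run;

proc print data=recoded;
run;
```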
Proc Format is not a format statement
With proc format, you create formats, you do not assign them to variables. That you can do for instance with a format statement.
The format of a variable is not its internal length
A SAS variable can only have two types: numeric (which non-SAS programmers call double) or character (which non-SAS programmers call fixed-length character). It can, however, have hundreds of different formats. The format just determines the way the variable is represented in a report.
You can perfectly well change the format of a variable without changing its length.
Try this:
proc format;
value myFormat
0-10 = 'small'
10-20 ='medium'
20-100='large' ;
run;
data test1;
infile datalines;
length myVar 8.;
input myVar;
format myVar 6.2;
datalines;
1
2.1
9.12
10.123
15.1234
22.12345
50.123456
;
data test2;
set test1;
format myVar myFormat.;
data test3;
set test2;
format myVar 12.6;
run;
title 'In test1, myVar has format 6.2';
proc print data=test1;
run;
title 'In test2, myVar has format myFormat';
proc print data=test2;
run;
title 'In test3, myVar has format 12.6';
proc print data=test3;
run;
You can create a format in a format catalog and store it for any future reference. It always happens that the dataset has new variables and updated variables with new data. So having a format catalog to accommodate the new and old changes will actually help to maintain history of the original and current values.
I have a need to combine two sas datasets having the same column names but one of the datasets will have a numeric value where the same name in the other dataset are character. I was thinking to evaluate each field with the %isnum function and based on this convert the number to character:
char_id = put(id, 7.) ;
drop id ;
rename char_id=id ;
What I need to know is how do I determine the length of the variable to use in the PUT and what would I do for date fields?
Sounds like you need to analyze your data and see how long things are. Use an obviously too long format (best32.) and then see how long the actual results are, or use max.
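One way this could look, as a sketch only: measure the longest converted value with PROC SQL, store it in a macro variable, and use it as the character length. The names have, want and id_len are assumptions, and the sample ids are made up.

```sas
/* Hypothetical data with a numeric id */
data have;
    input id;
    datalines;
7
12345
1234567
;
run;

/* Longest text representation when written with a deliberately wide format */
proc sql noprint;
    select max(length(strip(put(id, best32.))))
        into :id_len trimmed
    from have;
quit;

data want;
    set have;
    length char_id $&id_len;               /* length taken from the data */
    char_id = strip(put(id, best32.));     /* left-aligned character version */
    drop id;
    rename char_id = id;
run;
```

If the two data sets are combined repeatedly, it may be safer to pick a fixed length that is at least as large as the character column on the other side, so that values are not truncated when the data sets are stacked.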
For date fields, you need to decide how you want your date fields to look.
date_c = put(date_n,date9.);
That would be the default, but there are literally hundreds of date formats you can choose from.
You can also use proc contents data=myDataSets out=VarDatasets; run; to get the list of variables with their type, length, format, informat and so on.