PROC FORMAT;
VALUE $Gender 'M'='Male'
'F'='Female';
In the above I am passing the value for processing format as 'm'= 'male'
And 'f'='female' ... The same way I need that values to be passed from a file and the values of process format comes dynamically. How do I do that.
Like I need to pass the above mapping m=male, f=female from a file and read the file and pass that mapping to proceed format dynamically.
To do what you are asking, is done if you can place your data into Data set, two steps process
1- Get your raw data into data set
2- Use above data set to get desired format
so let do this-
*step 1-;
DATA fmt;
Infile "Textfile.txt" DSD ;
Retain fmtname '$myfmt'; /*myfmt is what your format name*/;
Length start $2 label $50;
Input start label ;
RUN;
Now since above code will create a dataset with Male female information use same dataset to create your format.
*Step2:
PROC FORMAT CNTLIN=fmt;
RUN;
The simplest way to create a format from a data set is to use the CNTLIN= option in PROC FORMAT.
REQUIRED VARIABLES IN THE FORMAT DATASET (FMTNAME, START, AND LABEL)
Variable Used for
FMTNAME- The format name
START - The left side of the formatting = sign (assumed character) –
must be unique unless defining a Multi-Label format
LABEL- The right side of the formatting = sign
Related
I am working to merge two data sets and get the following error:
Variable DOB has been defined as both character and numeric.
Here is my code. I know I need a set statement to change the character to numeric. I was thinking:
DATA Merged1;
SET Aug21 Aug22;
RUN;
set (rename=(DOB=DOBnum));
length DOB $ 10.;
DOB= put(DOBnum,f10. -L);
drop DOBnum;
Would this be placed before my Set statement to merge to Aug 21 Aug 22?
Thank you!
I tried to run the code but it would not merge, unsure if where the Set statement for DOB would go
You do not need the second SET statement. You need to add the RENAME= dataset option to the dataset where it is mentioned in the first SET statement.
So something like:
DATA BOTH;
SET Aug21 Aug22(in=in2 rename=(DOB=DOBnum));
if in2 then DOB= put(DOBnum,f10. -L);
drop DOBnum;
RUN;
To get a more detailed answer provide more details about the variables and the types of values they contain. For example if DOB means Date of Birth then it does not make much sense to use the F format. If DOB should be an actual DATE then it should be numeric and not character. And if the version that is numeric has actual date values then converting them to text using the F format is going to generate strings that will be confusing for humans.
If you're a beginner I recommend two steps so you can trace the work.
Convert dob from character to numeric
Append the two datasets together (assume you're stacking the data sets)
Use format to control how the date is displayed
*convert character to numeric SAS date;
data aug21_convert2num;
set aug21(rename=dob=dobchar);
dob = input(dob, anydtdte.);
drop dobchar;
run;
*append the two data sets;
data want;
set aug21_convert2num aug22;
format dob yymmdd10.;
run;
Now the question I have is I have a bigger problem as I am getting "this range is repeated or overlapped"... To be specific my values of label are repeating I mean my format has repeated values like a=aa b=aa c=as kind of. How do I resolve this error. When I use the hlo=M as muntilqbel option it gives double the data...
I am mapping like below.
Santhan=Santhan
Chintu=Santhan
Please suggest a solution.
To convert data to a FORMAT use the CNTLIN= option on PROC FORMAT. But first make sure the data describes a valid format. So read the data from the file.
data myfmt ;
infile 'myfile.txt' dsd truncover ;
length fmtname $32 start $100 value $200 ;
fmtname = '$MYFMT';
input start value ;
run;
Make sure to set the lengths of START and VALUE to be long enough for any actual values your source file might have.
Then make sure it is sorted and you do not have duplicate codes (START values).
proc sort data=myfmt out=myfmt_clean nodupkey ;
by start;
run;
The SAS log will show if any observations were deleted because of duplicate START values.
If you do have duplicate values then examine the dataset or original text file to understand why and determine how you want to handle the duplicates. The PROC SORT step above will keep just one of the duplicates. You might just has exact duplicates, in which case keeping only one is fine. Or you might want to collapse the duplicate observations into a single observation and concatenate the multiple decodes into one long decode.
If you want you can add a record that will add the functionality of the OTHER keyword of the VALUE statement in PROC FORMAT. You can use that to set a default value, like 'Value not found', to decode any value you might encounter that was not in your original source file.
data myfmt_final;
set myfmt_clean end=eof;
output;
if eof then do;
start = ' ';
label = 'Value not found';
hlo = 'O' ;
output;
end;
run;
Then use PROC FORMAT to make the format from the cleaned up data file.
proc format cntlin = myfmt_final;
run;
To convert a FORMAT to a dataset use the CNTLOUT= option on PROC FORMAT.
For example if you had created this format previously.
proc format ;
value $myfmt 'ABC'='ABC' 'BCD'='BCD' 'BCD1'='BCD' 'BCD2'='BCD' ;
run;
then you can use another PROC FORMAT step to make a dataset. Use the SELECT statement if you format catalog has more than one format defined and you just want one (or some) of them.
proc format cntlout=myfmt ;
select $myfmt ;
run;
Then you can use that dataset to easily make a text file. For example a comma delimited file.
data _null_;
set myfmt ;
file 'myfmt.txt' dsd ;
put start label;
run;
The result would be a text file that looks like this:
ABC,ABC
BCD,BCD
BCD1,BCD
BCD2,BCD
You get this error because you have the same code that maps to two different categories. I'm going to guess you likely did not import your data correctly from your text file and ended up getting some values truncated but without the full process it's an educated guess.
This will work fine:
proc format;
value $ test
'a'='aa' 'b'='aa' 'c'='as'
;
run;
This version will not work, because a is mapped to two different values, so SAS will not know which one to use.
proc format;
value $ badtest
'a'='aa'
'a' = 'ba'
'b' = 'aa'
'c' = 'as';
run;
This generates the error regarding overlaps in your data.
The way to fix this is to find the duplicates and determine which code they should actually map to. PROC SORT can be used to get your duplicate records.
I want to extend the question asked at Sum consecutive observations by some variable.
When I define the following macro variable
%let varnames=id dose supply date;
Since I don't know the fomat of the original data, how do I get the format for the rename automatically to apply it in the following extract of the above link?
data want(drop=id_i dose_i supply_i date_i);
format id dose supply 8. date mmddyy10.;
retain id dose supply date;
set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;
I am submitting the following SAS code:
proc format;
picture mysdt
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
DATA _NULL_;
call symput("Today", Put(datetime(),mysdt.));
run;
%put t_&today;
The resulting log shows 2 spaces before the datetime:
t_ 201504240150
The problem here is when my macro is resolved it is creating leading space. Why is it creating spaces?
My output should be:
t_201504240150
I know the solution but just wanted to know the reason.
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt.)));
run;
The reason for this is that your format is set up with a default length of 14. Therefore, when you go to put your value into &today it is stored with leading blanks to fill out the length to 14. From the SAS documentation:
DEFAULT=length
specifies the default length of the picture. The value for DEFAULT= becomes the length of picture if you do not give a specific length when you associate the format with a variable.
So, there are a number of options:
Set the default format length to match the expected length of your datetime value (using DEFAULT=12):
proc format;
picture mysdt (default=12)
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
Specify the width in your format:
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt12.)));
run;
Or, as previously answered, use call symputx to trim whitespace:
DATA _NULL_;
call symputx("Today", strip(Put(datetime(),mysdt.)));
run;
Personally, I'd fix the format to default to 12, then you won't need to remember to specify width or use call symputx each time.
call symputx remove leading and tailing space.
DATA _NULL_;
call symputx("Today", Put(datetime(),mysdt.));
run;
I've got a dataset that's full of data all in character format.
Now I want to create another dataset from this one, put put everything it it's correct decimal or date or character format.
Here's what I'm trying.
data work.testout;
attrib account_open_date informat = mmddyy10.;
do i = 1 to nobs;
set braw.accounts point = i nobs = nobs;
output;
end;
stop;
run;
this gives me:
Variable 'account_open_date' from data set braw.accounts (at line 7 column 21) has a different type (character) to the variable type on the data vector (numeric)
What's the best way of doing this?
You cannot use an informat to convert a variable directly from character to numeric. At least in SAS proper, you cannot convert a variable from character to numeric, period, without using an intermediary. You must do something along the lines of the following:
data want;
set have(rename=varwant=temp);
varwant=input(temp,MMDDYY10.);
drop temp;
run;
There you rename the (character) variable to a temporary name, then convert it to numeric using INPUT.