How to assign a custom-created format in PROC FORMAT - SAS

This one is a hard one. This is the custom format I have created:
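data work.myBins;
/* CNTLIN needs FMTNAME (bin, as referenced below) and TYPE; EEXCL='Y' makes each range [start, end) so adjacent bins do not overlap */
retain fmtname 'bin' type 'N' eexcl 'Y';
/* integer loop avoids floating-point drift from repeatedly adding 0.05 */
do i = 0 to 99;
start = round(-2.5 + i*0.05, 0.01);
end = round(start + 0.05, 0.01);
label = catx(' ',put(start,8.2),'to',put(end,8.2));
output;
end;
drop i;
run;
proc format cntlin=work.myBins; run;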
Then I created a second format with PROC FORMAT that falls back to bin for all other values:
proc format;
value customFormat
2.5 - high = 'Higher than 2.5'
low -< -2.5 = 'Lower than -2.5'
other = [bin.];
run;
Will this work?
Thanks

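Referencing another format inside a VALUE statement requires square brackets (other=[bin.]), and the low range should be written low -< -2.5. A simpler route is to add 2 extra records to your myBins data set, one for the 'low -< -2.5' range and another for the '2.5 - high' range (the HLO variable marks a range end as LOW or HIGH). Then use that single format to cover all values.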
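A minimal sketch of the "2 extra records" approach, assuming the format should still be called bin and that every range except the top one is closed at the start and open at the end. It relies on the CNTLIN control variables HLO ('L' = start is LOW, 'H' = end is HIGH), SEXCL and EEXCL; the test values in the last step are arbitrary.

```sas
/* One CNTLIN data set: open-ended low range, 100 bins of width 0.05, open-ended high range */
data work.myBins;
  retain fmtname 'bin' type 'N';
  length label $40 hlo sexcl eexcl $1;

  /* everything below -2.5: HLO='L' makes START = LOW, EEXCL='Y' excludes -2.5 itself */
  start = .; end = -2.5; hlo = 'L'; sexcl = 'N'; eexcl = 'Y';
  label = 'Lower than -2.5';
  output;

  /* the regular bins, each [start, end) */
  hlo = ' ';
  do i = 0 to 99;
    start = round(-2.5 + i*0.05, 0.01);
    end   = round(start + 0.05, 0.01);
    label = catx(' ', put(start, 8.2), 'to', put(end, 8.2));
    output;
  end;

  /* 2.5 and above: HLO='H' makes END = HIGH */
  start = 2.5; end = .; hlo = 'H'; eexcl = 'N';
  label = 'Higher than 2.5';
  output;

  drop i;
run;

proc format cntlin=work.myBins;
run;

/* quick check: write a few values with the format to the log */
data _null_;
  do x = -3, -2.5, -0.01, 0, 2.49, 2.5, 10;
    put x= x bin.;
  end;
run;
```

Because all ranges live in one format, no nested [bin.] reference is needed.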

Related

SAS: Converting numeric to character values

I am trying to convert a datetime20. value from numeric to character.
Currently I have numeric values like 01JAN2020:00:00:00, and I need to convert them to character values that look like: 2020-01-01 00:00:00.0
What format and informat should I use for this?
I have tried using the PUT function to convert numeric to character with many options, but each time I get a different format. Should I also use the DHMS function before PUT?
There is no native format that produces that string exactly. But it is not hard to build it in steps using existing formats. Or you could use the PICTURE statement in PROC FORMAT to build your own format.
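For the PICTURE route, a minimal sketch could look like the following. The format name dtspace is hypothetical, and the trailing ".0" is literal text, so this assumes whole-second datetime values (any fractional seconds are not shown). The picture is in single quotes so the % directives are not treated as macro calls.

```sas
proc format;
  /* %Y = 4-digit year; %0m, %0d, %0H, %0M, %0S = zero-padded month, day, hour, minute, second */
  picture dtspace (default=21)
    other = '%Y-%0m-%0d %0H:%0M:%0S.0' (datatype=datetime);
run;

data _null_;
  length s $21;
  dt = '01jan2020:01:02:03'dt;
  s = put(dt, dtspace21.);
  put s=;
run;
```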
If you don't really care about the time-of-day part of the datetime value, this is an easy and clear way to convert the numeric variable DT (a number of seconds) into a new character variable in that style: use DATEPART() to get the date (number of days) from the datetime value, use the YYMMDD10. format to generate the 10-character date string, and then append the constant string of formatted zeros.
length dt_string $21;
dt_string = put(datepart(dt),yymmdd10.)||' 00:00:00.0';
If you need the time of day part then you could also use the TOD format.
dt_string = put(datepart(dt),yymmdd10.)||put(dt,tod11.1);
Or you could use the format E8601DT21.1 and then change the letter T between the date and time to a space instead.
dt_string = translate(put(dt,E8601DT21.1),' ','T');
If you want to find out which formats exist for datetime values and what their formatted results look like, you could run a little program that pulls the formats from the metadata and applies them to a specific datetime value.
data datetime_formats;
length format $50 string $80 ;
set sashelp.vformat;
where fmttype='F';
where also fmtinfo(fmtname,'cat')='datetime';
keep format string fmtname maxw minw maxd ;
format=cats(fmtname,maxw,'.','-L');
string=putn('01Jan2020:01:02:03'dt,format);
run;
A custom format can also be defined to return the result of a user-defined function (a function-based format, described in the PROC FORMAT documentation). The general form is:
proc format;
value <format-name> (default=<width>)
other = [<function-name>()]
;
run;
Example:
options cmplib=(sasuser.functions);
proc fcmp outlib=sasuser.functions.temporal;
function E8601DTS (datetime) $21;
return (
translate (putn(datetime,'E8601DT21.1'),' ','T')
);
endsub;
run;
proc format;
value E8601DTS (default=21)
other = [E8601DTS()]
;
run;
data have;
do dt = '01jan2020:0:0'dt to '10jan2020:0:0'dt by '60:00't;
output;
end;
format dt datetime16.;
run;
ods html file='function-based-format.html';
proc print data=have(obs=4); title 'stock E8601DT';
format dt E8601DT21.1;
run;
proc print data=have(obs=4); title 'custom E8601DTS';
format dt E8601DTS.;
run;
ods html close;

Proc Freq Combine Results Rows based on Contents

I am doing a Proc Freq on a large amount of user-entered data, and I would like to know whether I can combine the result rows based on the contents of the first column.
You appear to want to perform a frequency of the first word (or 1st scanned part of a column). Such a case will require data manipulation to reduce the longer value to the desired shortened value, in a different variable, to be frequency binned.
data have;
input;
user_entered_data = _infile_;
datalines;
Nyfaria - January
Nyfaria - Febuary
Michelangelo - January
Michelangelo - Feburary
;
run;
data have_for_freq;
set have;
item = scan (user_entered_data,1,' ');
run;
options nocenter;
ods noproctitle;
proc freq data=have_for_freq;
title "Freq of raw data";
table user_entered_data;
run;
proc freq data=have_for_freq;
title "Freq of raw data formatted as $4.";
table user_entered_data;
format user_entered_data $4.;
run;
proc freq data=have_for_freq;
title "Freq of raw data - item scanned out";
table item;
run;
Note: In some cases you can use a format to control the mapping of a raw value to a reported value. However, no built-in format returns the first 'word' of a value (as SCAN does).
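As a hedged aside, the function-based format technique from the datetime answer above could fill that gap, assuming your SAS release supports function-based formats with a character argument. The library work.funcs, the function firstword and the format $firstword are hypothetical names. PROC FREQ groups on formatted values, so rows with the same first word are combined without creating a new variable.

```sas
options cmplib=(work.funcs);

proc fcmp outlib=work.funcs.text;
  /* return the first blank-delimited word of a string */
  function firstword(s $) $40;
    return (scan(s, 1, ' '));
  endsub;
run;

proc format;
  /* character format whose label is whatever firstword() returns */
  value $firstword (default=40)
    other = [firstword()];
run;

proc freq data=have;
  title "Freq of raw data formatted to its first word";
  table user_entered_data;
  format user_entered_data $firstword.;
run;
```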

Append of Tables with the Same Variables but Differing Attributes

My question is about the append of two different tables that are supposed to have the same name/format/type/length variables.
I am trying to add a step to my SAS program that stops it from running if same-named variables differ in format, type or length between the two tables.
For example, one table stores a date as a string "dd-mm-yyyy" while the other has "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables did not work as expected: sometimes the values came up missing or out of order, because the formats are different.
I tried using the PROC COMPARE procedure, which let me check which variables have differing attributes (Type, Length, Format, Informat and Label).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, I only got information about which variables have differing attributes (the listing of common variables with differing attributes) and could not do anything with it.
Instead, I would like to know whether I can get a structured output table with this information, so that I can use it as a control statement.
Automating this check would save me a lot of time.
You can use Proc CONTENTS to get information about a data set's variables. Do that for both data sets, and then use Proc COMPARE to create a data set that tells you the differences in variable attributes.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
There is also an ODS table, CompareVariables, that you can capture directly without running Proc CONTENTS, but the captured table is not very 'data-rific':
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;
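To use this as a control step, one option is to query the two PROC CONTENTS output data sets directly and append only when nothing differs. This is a minimal sketch: ATTR_DIFF, N_DIFF, the macro APPEND_IF_SAME and WORK.ALL_CARS are hypothetical names, and only type, length and format name are checked (not format width, informat or label). The %IF logic sits inside a macro so it also works on releases that do not allow %IF in open code.

```sas
/* one row per common variable whose type, length or format name differs */
proc sql noprint;
  create table attr_diff as
  select a.name,
         a.type   as type1, b.type   as type2,
         a.length as len1,  b.length as len2,
         a.format as fmt1,  b.format as fmt2
  from cars1_contents as a
       inner join cars2_contents as b
       on upcase(a.name) = upcase(b.name)
  where a.type ne b.type
     or a.length ne b.length
     or a.format ne b.format;
  select count(*) into :n_diff trimmed from attr_diff;
quit;

%macro append_if_same(base=, data=);
  %if &n_diff > 0 %then %do;
    /* stop here: list the offending variables in ATTR_DIFF instead of appending */
    %put ERROR: &n_diff variable(s) differ in type, length or format. &data was NOT appended to &base..;
  %end;
  %else %do;
    proc append base=&base data=&data;
    run;
  %end;
%mend append_if_same;

%append_if_same(base=work.all_cars, data=cars2)
```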

SAS proc append - variable in wrong format

In table b, I have the ID column in INTEGER format.
I use PROC APPEND, but when I check the table database.aw_1234, ID is stored as DOUBLE (or FLOAT). How can I fix this?
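data a (KEEP = ID ACC_NO PERIOD_DTE);
infile "/root/dirs/files." dlm=";";
length ACC_NO $16;
ID=_n_;
format ID 8.;
/* read the account number as text so its digits and any leading zeros survive */
input ACC_NO_VAR :$16. PERIOD_DTE :$10.;
/* right-align in 16 characters, then turn the leading blanks into zeros */
ACC_NO = translate(right(put(ACC_NO_VAR, $16.)), '0', ' ');
run;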
DATA b(KEEP = ID ACC_NO PERIOD_DTE);
RETAIN ID ACC_NO PERIOD_DTE;
SET a;
RUN;
proc delete data = database.aw_1234;
run;
proc append base=database.aw_1234 data=b force;
run;
SAS has only two data types: character strings and numbers, which are stored as floating-point doubles. A format is just an instruction for how SAS displays the variable to the user, so your ID was always a double.
If you are creating a table in an RDBMS, you will probably see a note in the log along the lines of "SAS Formats are not translated". This means that the RDBMS doesn't know what a SAS format is, so SAS just writes your double as a double.
To fix this, create the table in the RDBMS once, with ID defined as INTEGER. After that, use SAS to delete the records from the table and append into it; don't delete and re-create the table.
Change your code to something like this:
proc sql noprint;
delete from database.aw_1234;
quit;
proc append base=database.aw_1234 data=b force;
run;
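For the one-time creation of the table with ID as INTEGER, a sketch using the SAS/ACCESS DBTYPE= data set option could look like this. It assumes DATABASE is a SAS/ACCESS libref and that the target RDBMS has an INTEGER type; run it once, and use the DELETE plus APPEND code above for the daily loads.

```sas
/* One-time setup: build an empty DBMS table from B's columns, forcing ID to INTEGER */
/* (assumes the DATABASE libref points to an RDBMS that supports INTEGER)           */
data database.aw_1234 (dbtype=(ID='INTEGER'));
  set b (obs=0);   /* copy the column structure of B, but no rows */
run;
```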

Is there a way to name proc rank groups based on values within the group?

So I have multiple continuous variables that I have divided into 10 groups with proc rank, i.e., for each observation there is now a "GPA" and a "GRP_GPA" value, ditto for Hmwrk_Hrs and GRP_Hmwrk_Hrs. But in each of the new group columns the values run from 1 to 10. Is there a way to change that value so that, instead of 1 for instance, it would be 1.2-2.8 if those were the min and max values within the group? I know I can do it by hand with proc format, IF-THEN, or CASE in SQL, but since I have something like 40 different columns that would be very time-consuming.
It's not clear from your question if you want to store the min-max values or just format the rank columns with them. My solution below formats the rank column and utilises the ability of SAS to create formats from a dataset. I've obviously only used 1 variable to rank, for your data it will be a simple matter to wrap a macro around the code and run for each of your 40 or so variables. Hope this helps.
/* create ranked dataset */
proc rank data=sashelp.steel groups=10 out=want;
var steel;
ranks steel_rank;
run;
/* calculate minimum and maximum values per rank */
proc summary data=want nway;
class steel_rank;
var steel;
output out=want_min_max (drop=_:) min= max= / autoname;
run;
/* create dataset with formatted values */
data steel_rank_fmt;
set want_min_max (rename=(steel_rank=start));
retain fmtname 'stl_fmt' type 'N';
label=catx('-',steel_min,steel_max);
run;
/* create format from previous dataset */
proc format cntlin=steel_rank_fmt;
run;
/* apply formatted value to rank column */
proc datasets lib=work nodetails nolist;
modify want;
format steel_rank stl_fmt10.;
quit;
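The answer suggests wrapping a macro around this code for the 40 or so variables. One possible sketch follows; the macro name RANK_AND_FORMAT, the GRP_<var> rank columns and the GRPnF format names are hypothetical, OUT= is assumed to be a one-level WORK data set name, and each variable name is assumed to be short enough (28 characters or fewer) to take the GRP_ prefix. PROC RANK numbers the groups 0 to 9.

```sas
%macro rank_and_format(data=, out=, vars=, groups=10);
  %local i n var ranknames fmtstmt;
  %let n = %sysfunc(countw(&vars, %str( )));

  /* rank column names: GRP_<var> for each input variable */
  %do i = 1 %to &n;
    %let ranknames = &ranknames GRP_%scan(&vars, &i, %str( ));
  %end;

  proc rank data=&data groups=&groups out=&out;
    var &vars;
    ranks &ranknames;
  run;

  %do i = 1 %to &n;
    %let var = %scan(&vars, &i, %str( ));

    /* min and max of the original variable within each group */
    proc summary data=&out nway;
      class GRP_&var;
      var &var;
      output out=_minmax (drop=_:) min=lo max=hi;
    run;

    /* CNTLIN data set: group number -> 'min-max' label */
    data _fmt;
      set _minmax (rename=(GRP_&var = start));
      retain fmtname "GRP&i.F" type 'N';
      label = catx('-', lo, hi);
    run;

    proc format cntlin=_fmt;
    run;

    %let fmtstmt = &fmtstmt GRP_&var GRP&i.F.;
  %end;

  /* attach all the formats to the ranked data set in one pass */
  proc datasets lib=work nolist;
    modify &out;
    format &fmtstmt;
  quit;
%mend rank_and_format;

%rank_and_format(data=sashelp.cars, out=cars_ranked, vars=enginesize horsepower mpg_city)
```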
In addition to Keith's good answer, you can also do the following:
proc rank data = sashelp.cars groups = 10 out = test;
var enginesize;
ranks es;
run;
proc sql ;
select *, catx('-',min(enginesize), max(enginesize)) as esrange from test
group by es
order by make, model
;
quit;