SAS proc format statement - sas

I want to create a format on a numeric variable (say, age) to see the result as ">10". I tried as:
PROC FORMAT;
VALUE agefmt
>10 - high = '> 10' /*10 to be excluded.*/
other = '<= 10'
;
RUN;
But it does not work. Please help.

You made just one small mistake: the exclusion operator must be < rather than >, and it goes between the two values, next to the endpoint being excluded:
PROC FORMAT;
VALUE agefmt
10 <- high = '> 10' /*10 to be excluded.*/
other = '<= 10'
;
RUN;
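To see the corrected range in action, here is a minimal sketch. The data set name `ages`, the variable `age_group`, and the sample values are made up for illustration; only the format definition comes from the answer above.

```sas
proc format;
  value agefmt
    10 <- high = '> 10'    /* 10 itself is excluded from this range */
    other      = '<= 10';  /* everything else, including missing values */
run;

data ages;                 /* hypothetical sample data */
  input age @@;
  age_group = put(age, agefmt.);
  datalines;
9 10 10.5 11
;
run;

proc print data=ages noobs;
run;
```

With these values, 9 and 10 should be labelled `<= 10`, while 10.5 and 11 should be labelled `> 10`. Note that a missing age also falls into `other`, so add a separate `. = '...'` range if missing values need their own label.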


How to use proc format with regexp option?

I stumbled upon the regexp option in proc format, so I gave it a try and ended up puzzled.
proc format;
invalue test
'/n\(.*\)/i'(regexp) = 1
;
run;
data _null_;
x = 'n(ADT,TRTDT)';
y = input(x,test.);
z = prxmatch('/n\(.*\)/i',x)^=0;
put y = z = ;
run;
I had thought that the regexp option would behave like prxmatch() in a data step, but apparently I was wrong.
NOTE: Invalid argument to function INPUT at row 466 column 9.
y=. z=1
x=n(ADT,TRTDT) y=. z=1 _ERROR_=1 _N_=1
I have searched the help documentation and found nothing really helpful.
How does the regexp option in proc format work? Feel free to share your opinion, thanks.
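You defined an informat with a default width of 10 and tried to read a string of length 12. Only the first 10 characters are read, so the closing parenthesis is cut off and the pattern cannot match.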
data _null_;
x = 'n(ADT,TRTDT)';
y1 = input(x,??test.);
y2 = input(x,??test20.);
z = prxmatch('/n\(.*\)/i',x)^=0;
put (_all_) (=);
run;
Results:
x=n(ADT,TRTDT) y1=. y2=1 z=1
You can add the DEFAULT= option to the INVALUE statement to change the default width.
proc format;
invalue test (default=40)
'/n\(.*\)/i'(regexp) = 1
;
run;

SAS / PROC FREQ TABLES - can I suppress frequencies and percents if frequency is less than a given value?

I'm using tagsets.excelxp in SAS to output dozens of two-way tables to an .xml file. Is there syntax that will suppress rows (frequencies and percents) if the frequency in that row is less than 10? I need to apply that in order to de-identify the results, and it would be ideal if I could automate the process rather than use conditional formatting in each of the outputted tables. Below is the syntax I'm using to create the tables.
ETA: I need those suppressed values to be included in the computation of column frequencies and percents, but I need them to be invisible in the final table (examples of options I have considered: gray out the entire row, turn the font white so it doesn't show for those cells, replace those values with an asterisk).
Any suggestions would be greatly appreciated!!!
Thanks!
dr j
%include 'C:\Users\Me\Documents\excltags.tpl';
ods tagsets.excelxp file = "C:\Users\Me\Documents\Participation_rdg_LSS_3-8.xml"
style = MonoChromePrinter
options(
convert_percentages = 'yes'
embedded_titles = 'yes'
);
title1 'Participation';
title2 'LSS-Level';
title3 'Grades 3-8';
title4 'Reading';
ods noproctitle;
proc sort data = part_rdg_3to8;
by flag_accomm flag_participation lss_nm;
run;
proc freq data = part_rdg_3to8;
by flag_accomm flag_participation;
tables lss_nm*grade_p / crosslist nopercent;
run;
ods tagsets.excelxp close;
D.Jay: Proc FREQ does not have any options for conditionally masking cells of its output. You can use the output data capture capability of ODS together with a follow-up Proc REPORT to produce the desired masked output.
I am guessing that lss and grade_p are a skill level and a student grade level, respectively.
Generate some sample data
data have;
do student_id = 1 to 10000;
flag1 = ranuni(123) < 0.4;
flag2 = ranuni(123) < 0.6;
lss = byte(65+int(26*ranuni(123)));
grade = int(6*ranuni(123));
* at every third lss force data to have a low percent of grades < 3;
if mod(rank(lss),3)=0 then
do until (grade > 2 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
else if mod(rank(lss),7)=0 then
do until (grade < 3 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
output;
end;
run;
proc sort data=have;
by flag1 flag2;
*where lss in ('A' 'B') and flag1 and flag2; * remove comment to limit amount of output during 'learning the code' phase;
run;
Perform the Proc FREQ
Only capture the data corresponding to the output that would have been generated
ods _all_ close;
* ods trace on;
/* trace will log the Output names
* that a procedure creates, and thus can be captured
*/
ods output CrossList=crosslist;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
ods output close;
ods trace off;
Now generate output to your target ODS destination (be it ExcelXP, html, pdf, etc)
Reference output, for which an equivalent with masked values needs to be produced:
* regular output of FREQ, to be compared to the masked output
* of the same information via REPORT;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
Proc REPORT has great features for producing conditional output. The compute block is used to select either a value or a masked value indicator for output.
options missing = ' ';
proc format;
value $lss_report ' '= 'A0'x'Total';
value grade_report . = 'Total';
value blankfrq .b = '*masked*' ._=' ' other=[best8.];
value blankpct .b = '*masked*' ._=' ' other=[6.2];
proc report data=CrossList;
by flag1 flag2;
columns
('Table of lss by grade'
lss grade
Frequency RowPercent ColPercent
FreqMask RowPMask ColPMask
)
;
define lss / order order=formatted format=$lss_report. missing;
define grade / display format=grade_report.;
define Frequency / display noprint;
define RowPercent / display noprint;
define ColPercent / display noprint;
define FreqMask / computed format=blankfrq. 'Frequency' ;
define RowPMask / computed format=blankpct. 'Row/Percent';
define ColPMask / computed format=blankpct. 'Column/Percent';
compute FreqMask;
if 0 <= RowPercent < 10
then FreqMask = .b;
else FreqMask = Frequency;
endcomp;
compute RowPMask;
if 0 <= RowPercent < 10
then RowPMask = .b;
else RowPMask = RowPercent;
endcomp;
compute ColPMask;
if 0 <= RowPercent < 10
then ColPMask = .b;
else ColPMask = ColPercent;
endcomp;
run;
ods html close;
If you have to produce lots of cross listings for different data sets, the code is easily macro-ized.
When I've done this in the past, I've first generated the frequency to a dataset, then filtered out the N, then re-printed the dataset (using tabulate usually).
If you can't recreate the frequency table perfectly from the freq output, you can do a simple frequency, check which IDs or variables or what have you to exclude, and then filter them out from the input dataset and rerun the whole frequency.
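The first approach described above (write the frequencies to a data set, then filter before printing) might look like the following minimal sketch. It uses `sashelp.cars` as stand-in data and the question's threshold of 10; the data set name `freq_out` is made up for illustration.

```sas
/* Step 1: write the cell counts to a data set instead of printing them */
proc freq data=sashelp.cars noprint;
  tables origin*type / out=freq_out;   /* OUT= holds COUNT and PERCENT per cell */
run;

/* Step 2: print only the cells that are large enough to show */
proc print data=freq_out noobs;
  where count >= 10;
run;
```

Because PERCENT is computed before the filter is applied, the suppressed cells still contribute to the percentages of the rows that remain visible.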
I don't believe that you can with PROC FREQ, but you can easily replicate your code with PROC TABULATE and you can use a custom format there to mask the numbers. This example sets it to M for missing and N for less than 5 and with one decimal place for the rest of the values. You could also replace the M/N with a space (single space) to have no values shown instead.
*Create a format to mask values less than 5;
proc format;
value mask_fmt
. = 'M' /*missing*/
low - < 5='N' /*less than 5 */
other = [8.1]; /*remaining values with one decimal place*/
run;
*sort data for demo;
proc sort data=sashelp.cars out=cars;
by origin;
run;
ods tagsets.excelxp file='/folders/myfolders/demo.xml';
*values partially masked;
proc tabulate data=cars;
where origin='Asia';
by origin;
class make cylinders;
table make, cylinders*n*f=mask_fmt. ;
run;
ods tagsets.excelxp close;
This was tested on SAS UE.
EDIT: Forgot the percentage piece, so this likely will not work for that, primarily because I don't think you'll get the percentages the same as in PROC FREQ (appearance) so it depends on how important that is to you. The other possibility to accomplish this would be to modify the PROC FREQ template to use the custom format as above. Unfortunately I do not have time to mock this up for you but maybe someone else can. I'll leave this here to help get you started and delete it later on.

SAS way of R's cut function [duplicate]

I have the numeric salaries of different employees. I want to break them up into ranges. However, I do not want a new column; rather, I just want to format the existing salary column using these ranges:
At least $20,000 but less than $100,000 - <$100,000
At least $100,000 and up to $500,000 - >$100,000
Missing - Missing salary
Any other value - Invalid salary
I've done something similar with gender. I just want to use the proc print and format command to show salary and gender.
DATA Work.nonsales2;
SET Work.nonsales;
RUN;
PROC FORMAT;
VALUE $Gender
'M'='Male'
'F'='Female'
'O'='Other'
other='Invalid Code';
PROC FORMAT;
VALUE salrange
'At least $20,000 but less than $100,000 '=<$100,000
other='Invalid Code';
PROC PRINT;
title 'Salary and Gender';
title2 'for Non-Sales Employees';
format gender $gender.;
RUN;
Proc Format is the correct method and you need a numeric format:
proc format;
value salfmt
20000 - <100000 = "At least $20,000 but less than $100,000"
100000 - 500000 = "100,000 +"
. = 'Missing'
other = 'Other';
Then in your print apply the format, similar to what you did for gender.
format salary salfmt.;
This should help get you started.
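Putting the pieces together, a minimal sketch of the complete step could look like this. It assumes that `Work.nonsales` has numeric `salary` and character `gender` columns and that the `$gender` format from the question has already been defined; the labels are the ones listed in the question.

```sas
proc format;
  value salfmt
    20000 -< 100000 = '<$100,000'        /* at least 20,000, less than 100,000 */
    100000 - 500000 = '>$100,000'        /* at least 100,000, up to 500,000 */
    .               = 'Missing salary'
    other           = 'Invalid salary';
run;

proc print data=Work.nonsales;
  var salary gender;
  format salary salfmt. gender $gender.; /* stored values are unchanged */
  title 'Salary and Gender';
  title2 'for Non-Sales Employees';
run;
```

The format only changes how the values are displayed, so no new column is created.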
I created a little function that mimics R's cut function:
options cmplib=work.functions;
proc fcmp outlib=work.functions.test;
function cut2string(var, cutoffs[*], values[*] $) $;
if var <cutoffs[1] then return (values[1]);
if var >=cutoffs[dim(cutoffs)] then return (values[dim(values)]);
do i=1 to dim(cutoffs)-1; /* stop one short so cutoffs[i+1] stays within bounds */
if var >=cutoffs[i] & var <cutoffs[i+1] then return (values[i+1]);
end;
return ("Error, this shouldn't ever happen");
endsub;
run;
Then you can use it like this :
data Work.nonsales2;
set Work.nonsales;
array cutoffs[3] _temporary_ (20000 100000 500000);
array valuesString[4] $10 _temporary_ ("<20k " "20k-100k" "100k-500k" ">500k");
salary_string = cut2string(salary ,cutoffs,valuesString);
run;

SAS Second smallest value

The following code, built using the Summary Statistics task from SAS Enterprise Guide, finds the min of each column of a table.
How can I find the second smallest value?
I tried replacing MIN with SMALLEST(2), but it doesn't work.
Thank you.
TITLE;
TITLE1 "Summary Statistics";
TITLE2 "Results";
FOOTNOTE;
FOOTNOTE1 "Generated by the SAS System (&_SASSERVERNAME, &SYSSCPL) on
%TRIM (%QSYSFUNC(DATE(), NLDATE20.)) at
%TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC MEANS DATA=WORK.SORTTempTableSorted
NOPRINT
CHARTYPE
MIN NONOBS ;
VAR A B C;
OUTPUT OUT=WORK.MEANSummaryStats(LABEL="Summary Statistics for
WORK.QUERY_FOR_TRNSTRANSPOSEDPD__0001")
MIN()=
/ AUTONAME AUTOLABEL INHERIT
;
RUN;
Use the ExtremeValues table from PROC UNIVARIATE:
ods select none;
ods output ExtremeValues=ExtremeValues(where=(loworder=2) drop=high:);
proc univariate data=sashelp.class NEXTRVAL=2;
run;
ods select all;
proc print;
run;
I don't think there's any way to accomplish this within proc means. There are ways using a variety of other procs. The Univariate procedure highlights one method using the Extreme Observations.
https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect058.htm
title 'Extreme Blood Pressure Observations';
ods select ExtremeObs;
ods output ExtremeObs=ExtremeObs; /* save the table so PROC PRINT can read it */
proc univariate data=BPressure;
var Systolic Diastolic;
id PatientID;
run;
proc print data=ExtremeObs;
run;
I will assume you are interested in all numeric columns. If the IFLIST macro variable is longer than 64k bytes in length due to the number of numeric variables and the length of their names, this code will fail. It should work for all reasonably narrow data sets.
UNTESTED CODE
We get a list of the variables in your data set.
PROC CONTENTS DATA=SORTTempTableSorted OUT=md NOPRINT ;
RUN ;
We use that list to create statements and expressions.
IFLIST is a block of statements to store the minimum value of fieldname in fieldname_1 and the second lowest in fieldname_2. If the comparison is LT, then we keep distinct values, not necessarily the order statistics. If the comparison is LE and there are multiple observations with the minimum value, fieldname_1 and fieldname_2 will be equal to each other. I will assume you want distinct values.
MAXLIST is an expression that will resolve to the largest numeric value in the data set :)
MINLIST and MINLIST2 are created for use in RETAIN and KEEP statements.
PROC SQL STIMER NOPRINT EXEC ;
/* TRIM() removes the trailing blanks that NAME carries from PROC CONTENTS */
SELECT 'IF ' || trim(name) || ' LT ' || trim(name) || '_1 THEN DO;' ||
trim(name) || '_2=' || trim(name) || '_1;' ||
trim(name) || '_1=' || trim(name) || ';END;ELSE IF ' ||
trim(name) || ' LT ' || trim(name) || '_2 THEN ' ||
trim(name) || '_2=' || trim(name),
'MAX(' || trim(name) || ')',
trim(name) || '_1',
trim(name) || '_2'
INTO :iflist SEPARATED BY '; ',
:maxlist SEPARATED BY ',',
:minlist SEPARATED BY ' ',
:min2list SEPARATED BY ' '
FROM md
WHERE type EQ 1
;
Now we get the largest numeric value from the data set:
/* in PROC SQL, <> means "not equal", so combine the column maxima with MAX() instead */
SELECT MAX(&maxlist, .)
INTO :maxval
FROM SORTTempTableSorted
;
QUIT ;
Now we do the work. The END option sets "eof" to 1 on the last observation, which is the only time we want to write a record to the output data set.
DATA min2 ;
SET SORTTempTableSorted END=eof;
RETAIN &minlist &min2list &maxval;
KEEP &minlist &min2list ;
&iflist ;
IF eof THEN
OUTPUT ;
RUN ;
A data step solution that should work for any number of columns without ever running into macro limitations:
proc sql noprint;
select count(*) into :NUM_COUNT from dictionary.columns
where LIBNAME='SASHELP' and MEMNAME = 'CLASS' and TYPE = 'num';
quit;
data class_min2;
do until(eof);
set sashelp.class end = eof;
array min2[&NUM_COUNT,2] _temporary_;
array nums[*] _numeric_;
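do _n_ = 1 to &NUM_COUNT;
if missing(nums[_n_]) then continue;
if missing(min2[_n_,1]) or nums[_n_] < min2[_n_,1] then do;
/* new minimum: the old minimum becomes the second smallest */
min2[_n_,2] = min2[_n_,1];
min2[_n_,1] = nums[_n_];
end;
else if nums[_n_] > min2[_n_,1] then min2[_n_,2] = min(min2[_n_,2],nums[_n_]);
end;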
end;
do _iorc_ = 1 to 2;
do _n_ = 1 to &NUM_COUNT;
nums[_n_] = min2[_n_,_iorc_];
end;
output;
end;
keep _NUMERIC_;
run;
This outputs the two lowest distinct values of each numeric variable, without transposing the data in the same way that proc univariate does. You can easily allow for duplicate minimum values with a bit of tweaking.
Sort the values in ascending order. Delete the first value. This would be the minimum value. Now the value left at the first position is your second minimum.
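The sort-based idea can be sketched as follows for a single variable. This is a minimal illustration using `sashelp.class` and its `height` column as stand-in data; the output data set names are made up. It handles one variable at a time, so it would need to be repeated (or wrapped in a macro) for each column.

```sas
/* Keep one row per distinct, non-missing value, in ascending order */
proc sort data=sashelp.class(keep=height where=(not missing(height)))
          out=heights nodupkey;
  by height;
run;

/* The second row of the sorted data is the second-smallest distinct value */
data second_smallest;
  set heights(firstobs=2 obs=2);
run;

proc print data=second_smallest noobs;
run;
```

Drop the NODUPKEY option if a tied minimum should count as the second-smallest value. Missing values are filtered out first because they would otherwise sort to the top.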

SAS proc format for background colours for dates including blank values

I have this code where I create a proc format statement for dates based on today's date.
Any date prior to today is red and any date in the future is green. However this is a proc report I am calling this statement in and there are blank values for date in some cases. Therefore, I want fields that don't contain a date to be white.
data _null_;
sdate = date ();
format sdate date9.;
call symput('sdate',sdate);
run;
proc format;
value closefmt
low - &sdate ='red'
' ' = 'white'
&sdate - high = 'green';
run;
It doesn't like ' ' = white and doesn't accept null.
Any help would be appreciated.
Thanks in advance.
Use a single dot for missing values of a numeric variable:
proc format;
value closefmt
low - &sdate ='red'
. = 'white'
&sdate - high = 'green';
run;
/* test code */
data indata;
d = .; output;
do i=-3 to 3;
d = today()+i;
output;
end;
run;
data _null_;
set indata;
format d yymmdds10.;
length color $10;
color = put(d, closefmt. - L);
putlog d color;
run;
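For completeness, here is a hedged sketch of how such a format might be attached to a date column in PROC REPORT. It reuses the `indata` test data set from above and rebuilds `sdate` with `%sysfunc(today())` instead of CALL SYMPUT. As an assumption on my part, today is counted as green here (using `-<`), since the question says only dates prior to today should be red.

```sas
%let sdate = %sysfunc(today());   /* today's date as a SAS date number */

proc format;
  value closefmt
    low -< &sdate = 'red'      /* strictly before today */
    .             = 'white'    /* missing dates */
    &sdate - high = 'green';   /* today and later (assumption) */
run;

proc report data=indata;
  column d;
  define d / display format=date9.
             style(column)={background=closefmt.};  /* colour looked up per cell */
run;
```

For numeric formats, LOW does not include missing values, so the `.` range is what picks up the blank dates.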