I am trying to use a format table to do look-ups for ICD codes.
I am importing a CSV format table with two columns, icd_code and cancer_type,
and my main dataset has two columns, patient_id and icd.
I can use the format to look up the ICD codes and return the cancer type in the CANCER column. But when an ICD code is not in the format table, the CANCER column just returns the ICD value itself. So I want the equivalent of other = '*' in the VALUE statement of PROC FORMAT. Since I have more than 200 different cancer types, it would not be practical to add them one by one in a VALUE statement. Is there a way to make sure that an ICD code that is not in the format table does not return its own value?
Thank you for helping!
DATA FORMAT; SET FORMAT_TABLE(RENAME = (ICD_CODE = START CANCER_TYPE = LABEL));
RETAIN FMTNAME '$CANCER_TYPE';
RUN;
PROC FORMAT LIB = LIBRARY
CNTLIN = CAL.FMT;
RUN;
DATA LOOKUP_DATA; SET DATA;
CANCER = PUT(ICD, $CANCER_TYPE.);
RUN;
The CNTLIN data set will need one additional row to specify the label for other values.
DATA MY_CNTLIN;
SET
FORMAT_TABLE(RENAME = (ICD_CODE = START CANCER_TYPE = LABEL))
END = LAST_CODE
;
RETAIN FMTNAME '$CANCER_TYPE';
OUTPUT;
IF LAST_CODE THEN DO;
hlo='O';
label='*';
OUTPUT;
END;
RUN;
See the SAS documentation for more details:
PROC FORMAT, CNTLIN=, Input Control Data Set
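To illustrate what the extra HLO='O' row does: it has the same effect as an OTHER range in a hand-written VALUE statement. The sketch below is only an illustration. The two codes and their labels are made-up placeholders rather than values from the original table, and the format is written to WORK instead of a permanent library.

```sas
/* Hand-written equivalent of a CNTLIN data set that ends with an HLO='O' row. */
/* The two codes and labels below are made-up placeholders.                    */
proc format;
  value $cancer_type
    'C50' = 'Breast'
    'C61' = 'Prostate'
    other = '*';
run;

/* List the stored format to check that an OTHER range is present. */
proc format fmtlib;
  select $cancer_type;
run;

/* A code in the table returns its label; an unknown code returns '*'. */
data _null_;
  length cancer $20;
  do icd = 'C50', 'Z99';
    cancer = put(icd, $cancer_type.);
    put icd= cancer=;
  end;
run;
```

The log should show cancer=Breast for C50 and cancer=* for Z99, instead of the ICD value being echoed back.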
I have many different datasets within a particular library, and I'm wondering whether there is a way to find the minimum and maximum date associated with a particular unique ID across ALL datasets in the library.
Currently, I can find the local minimum and local maximum date associated with an ID within a single dataset, but the same ID shows up again in other datasets and has its own minimum and maximum date in each of them. I want to compare the dates for this ID across the entire library so that I can find the global minimum and global maximum date, but I do not know how to search the whole library.
Currently my code looks like the following:
DATA SUBSET_MIN_MAX (keep= MIN_DATE MAX_DATE UNIQUEID);
DO UNTIL (LAST.UNIQUEID);
set LIBRARY.&SAS_FILE_N;
BY UNIQUEID;
MIN_DATE = MIN(MIN_DATE,DATE);
MAX_DATE = MAX(MAX_DATE,DATE);
if last.UNIQUEID then output;
END;
format MIN_DATE MAX_DATE date9.;
RUN;
Thanks so much for any assistance.
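Consider this approach, which uses a view and PROC SUMMARY. The view stacks all of the data sets and uses INDSNAME= to record which data set each row came from, and the IDGROUP options return the extreme date for each ID together with its source data set.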
data d1; set sashelp.class; date=height+ranuni(4); run;
data d2; set sashelp.class; date=height-rannor(5); run;
data d3; set sashelp.class; date=height-ranuni(3); run;
data alld/view=alld;
length indsname $64;
set work.d:(keep=name date) indsname=indsname;
source=indsname;
run;
proc summary data=alld nway missing;
class name;
var date;
output out=want(drop=_type_)
idgroup(max(date) out(source date)=source1 globalmax)
idgroup(min(date) out(source date)=source2 globalmin)
;
run;
proc print;
run;
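The example above stacks data sets that share the prefix d in WORK. To cover every data set in a library, one option is to build the member list from DICTIONARY.TABLES. This is a sketch under assumptions: the libref LIBRARY and the variables UNIQUEID and DATE are taken from the question, the macro variable name all_members is made up for illustration, and every member is assumed to contain both variables with compatible types.

```sas
/* Collect every data set in the library into one space-separated list. */
/* all_members is a hypothetical macro variable name.                   */
proc sql noprint;
  select catx('.', libname, memname)
    into :all_members separated by ' '
    from dictionary.tables
    where libname = 'LIBRARY' and memtype = 'DATA';
quit;

/* Stack all members in a view and remember each row's source data set. */
data alld / view=alld;
  length indsname source $41;
  set &all_members indsname=indsname;
  source = indsname;
  keep uniqueid date source;
run;
```

The view can then be passed to the same PROC SUMMARY step, with UNIQUEID as the CLASS variable.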
My question is about appending two different tables whose variables are supposed to have the same names, formats, types, and lengths.
I am trying to create a step in my SAS program that stops the program from executing if variables with the same name do not have the same format, type, and length.
For example, in one table a date is stored as a string in the form "dd-mm-yyyy", and in the other table it is stored as "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables didn't work as expected. Sometimes the values come up as missing or out of order, since the formats are different.
I tried using PROC COMPARE, which allowed me to check which variables have differing attributes (type, length, format, informat, and label).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, I only got a printed listing of the common variables with differing attributes, and I could not do anything with it programmatically.
I would like to know whether there is a way to get this information as a structured output table, so that I can use it as a control check.
Automating this would save me a lot of time.
You can use PROC CONTENTS to get information about a data set's variables. Do that for both data sets, and then use PROC COMPARE to create a data set that reports the differences in variable attributes.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
There is also an ODS table you can capture directly without having to run PROC CONTENTS, but the capture is not very 'data-rific' (it is laid out for reading rather than for further processing):
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;
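To turn the comparison into a control check, the two PROC CONTENTS output data sets built above (cars1_contents and cars2_contents, already sorted by upName) can also be merged directly. This is a minimal sketch, not the only way to do it. The names attr_diffs, n_diffs, and check_attributes are made up for illustration, and the TRIMMED keyword assumes SAS 9.3 or later.

```sas
/* Keep only common variables whose type, length, or format differ. */
data attr_diffs;
  merge cars1_contents(keep=upName type length format in=in1
                       rename=(type=type1 length=length1 format=format1))
        cars2_contents(keep=upName type length format in=in2
                       rename=(type=type2 length=length2 format=format2));
  by upName;
  if in1 and in2
     and (type1 ne type2 or length1 ne length2 or format1 ne format2);
run;

/* Store the number of mismatching variables in a macro variable. */
proc sql noprint;
  select count(*) into :n_diffs trimmed from attr_diffs;
quit;

/* Hypothetical gate: only continue when no mismatch was found. */
%macro check_attributes;
  %if &n_diffs > 0 %then %do;
    %put ERROR: &n_diffs common variable(s) have differing attributes. Append skipped.;
  %end;
  %else %do;
    %put NOTE: Attributes match. The append step could run here.;
  %end;
%mend check_attributes;
%check_attributes
```

The attr_diffs data set also serves as the structured report of which variables differ and how.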
In table b, I have the ID column in INTEGER format.
I use PROC APPEND, but when I check the table database.aw_1234, ID is stored as a double or float. How can I fix this?
data a (KEEP = ID ACC_NO PERIOD_DTE);
infile "/root/dirs/files." dlm=";";
ID=_n_;
format ID 8.;
input ACC_NO_VAR PERIOD_DTE $10.;
leading_zeros = 16 - length(ACC_NO_VAR);
cat = repeat('0', leading_zeros);
ACC_NO = catt(cat, ACC_NO_VAR);
run;
DATA b(KEEP = ID ACC_NO PERIOD_DTE);
RETAIN ID ACC_NO PERIOD_DTE;
SET a;
RUN;
proc delete data = database.aw_1234;
proc append BASE=database.aw_1234. FORCE;
SAS has only two data types: character strings and numerics, which are stored as double-precision floating point. A format is just an instruction telling SAS how to display the variable to the user, so your number was always a double.
If you are creating a table in an RDBMS, you will probably see a note in the log that says something along the lines of "SAS formats are not translated". This means that the RDBMS doesn't know what a SAS format is, so SAS writes your double as a double.
To fix this, create the table in the RDBMS with the column type INTEGER. Then use SAS to delete the records from the table and append into it. Don't delete and recreate the table.
Change your code to something like this:
proc sql noprint;
delete from database.aw_1234;
quit;
proc append base=database.aw_1234 data=b force;
run;
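If the table does have to be created from SAS, another option (not the approach described above) is the SAS/ACCESS DBTYPE= data set option, which requests a specific DBMS column type when the table is created. This is a sketch under assumptions: it presumes that the database libref points to a SAS/ACCESS engine that supports DBTYPE=, that the target table does not exist yet, and that 'integer' is a valid type name in that particular DBMS.

```sas
/* Create the table and ask the DBMS to store ID as an integer.        */
/* Assumes database.aw_1234 does not already exist and that 'integer'  */
/* is a type name the target DBMS accepts.                             */
data database.aw_1234(dbtype=(ID='integer'));
  set b;
run;
```

After the table exists with the right column type, the delete-and-append pattern can be used for later loads.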
This is a hard one. I have created the following format.
This is my custom format:
data work.myBins;
do start = -2.5 to 2.45 by 0.05;
end=start+0.05;
label=catx(' ',put(start,8.2),'to',put(end,8.2));
output;
end;
run;
proc format cntlin=work.myBins; run;
Now I have created a second format using PROC FORMAT:
proc format;
value customFormat
2.5-high='Higher then 2.5'
low-2.5='Lower then -2.5'
other=bin.;
run;
Will this work?
Thanks
You need to create 2 extra records in your myBins dataset, one for the 'low -< -2.5' range and another for the '2.5 - high' range. Then use that single format to cover all values.
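One way the single control data set could look is sketched below. Several details are assumptions rather than part of the original post: the format name bin, the use of an integer loop with ROUND to avoid floating-point drift in the bin edges, the choice to exclude each bin's upper end so that ranges do not overlap, and the wording of the two outer labels.

```sas
data work.myBins;
  length fmtname $32 label $40 hlo $1 eexcl $1;
  retain fmtname 'bin' type 'N';

  /* Extra record 1: everything below -2.5 (HLO='L' means START is LOW). */
  start = .; end = -2.5; hlo = 'L'; eexcl = 'Y';
  label = 'Lower than -2.5';
  output;

  /* Regular bins of width 0.05 from -2.5 up to, but not including, 2.5. */
  hlo = ' ';
  do i = -50 to 49;
    start = round(i * 0.05, 0.01);
    end   = round((i + 1) * 0.05, 0.01);
    label = catx(' ', put(start, 8.2), 'to', put(end, 8.2));
    output;
  end;

  /* Extra record 2: 2.5 and above (HLO='H' means END is HIGH). */
  start = 2.5; end = .; hlo = 'H'; eexcl = 'N';
  label = 'Higher than 2.5';
  output;

  drop i;
run;

proc format cntlin=work.myBins;
run;
```

With this in place, a value can be formatted with bin. directly and no second, nested format is needed.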
I'm having trouble with merging two datasets. I'm using SAS 9.2 and when importing several datasets they get corrupted and I can only open the last imported set.
DATA my_set1;
SET my_library.my_set1;
OPTIONS FMTSEARCH = (my_library.labels_my_set1);
RUN;
DATA my_set2;
SET my_library.my_set2;
OPTIONS FMTSEARCH = (my_library.labels_my_set2);
RUN;
The labels are set like this:
DATA labels;
SET formatted;
LABEL var_1 = 'label1'
var_2 = 'label2';
RUN;
DATA labels2;
SET labels;
PROC FORMAT LIBRARY = my_library.my_set1;
VALUE missing_num_labels . = 'Missing';
VALUE $missing_char_labels ' ' = 'Missing';
VALUE yes_no_labels 0 = 'No'
1 = 'Yes'
. = 'Missing';
RUN;
DATA labels2;
SET labels2;
OPTIONS FMTSEARCH = (my_library.my_set1);
FORMAT var_1 yes_no_labels.;
RUN;
I then do the exact same but for my_library.my_set2 instead of my_library.my_set1.
Thanks!
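Here's the solution that worked for me. As @CarolinaJay65 suggested, OPTIONS is a global statement and is not specific to a dataset, so each later FMTSEARCH= setting replaced the previous one. Listing both catalogs in a single statement fixes that.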
OPTIONS FMTSEARCH = (my_library.labels_my_set1 my_library.labels_my_set2);
DATA my_set1;
SET my_library.my_set1;
RUN;
DATA my_set2;
SET my_library.my_set2;
RUN;
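A short way to see this behaviour is to set FMTSEARCH= twice and then print its current value. The sketch below reuses the catalog names from the question, and the value is only printed, so the catalogs do not need to exist for the demonstration.

```sas
/* OPTIONS is global: the second statement replaces the first one, */
/* even when it is written inside a different DATA step.           */
options fmtsearch=(my_library.labels_my_set1);
options fmtsearch=(my_library.labels_my_set2);

/* Print the current value; only the second catalog should be listed. */
proc options option=fmtsearch;
run;
```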