Check whether a SAS date is valid before using MDY()? - sas

Is there a way to check whether three variables (month, day, year) can actually form a valid SAS date before handing them to MDY(), other than checking every possible case by hand?
Right now I have a couple of thousand input variables and let SAS put them together. Many of the date components cannot work, such as month=0, day=33, year=10, and I'd like to catch them. Otherwise I get far too many Notes like
NOTE: Invalid argument to function MDY(13,12,2014)
which then eventually culminate in Warnings like
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
I would really like to prevent those Warnings, and I thought the best way would be to check the validity of the date first. Any recommendations?

Use an INFORMAT instead; then you can use the ?? modifier to suppress the errors.
month=0;
day=33;
year=10;
date = input(cats(put(year,z4.),put(month,z2.),put(day,z2.)),??yymmdd8.);
SAS documentation: ? or ?? (Format Modifiers for Error Reporting)
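To catch the bad rows rather than only silence the notes, one option is to flag observations where the conversion returns a missing value. This is a minimal sketch, assuming numeric month, day and year variables; the dataset name dates and the flag bad_date are made up for illustration.

```sas
data dates;
  input month day year;
  /* Build a yyyymmdd string and read it back; ?? suppresses the notes and keeps _ERROR_ at 0 */
  date = input(cats(put(year, z4.), put(month, z2.), put(day, z2.)), ?? yymmdd8.);
  /* bad_date is 1 when the three parts did not form a valid date */
  bad_date = missing(date);
  format date date9.;
  datalines;
12 13 2014
13 12 2014
0 33 10
;
run;
```

Rows with bad_date = 1 can then be listed or filtered out in a later step. Note that the flag is also set when any of the three inputs is itself missing.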

Related

Put with macro variable as format

I have a dataset with a variable called pt with values such as 8.1, 8.2, 8.3 and a variable called mean with values such as 8.24, 8.1, 8.234; the two are paired with each other.
I want to be able to set the format in my put call from the values of the variable num.
I get the errors "Expecting an arithmetic expression", "the symbol is not recognized and will be ignored" and "syntax error" from my code (the &fmt. part is underlined):
if pt=&type;
call symput("fmt",pt);
fmt_mean = putn(mean,&fmt.);
Thanks in advance for your help.
The macro processor's work is done before SAS compiles and runs the data step. So trying to place the value into a macro variable and then use it immediately to generate and execute SAS code will not work.
But since you are using the PUTN() function it can use the value of an actual variable, so there is no need to put the format into a macro variable.
fmt_mean = putn(mean,pt);
Please post your data set and data step; your description is hard to understand.
However, the solution seems simple: do not use macro variables, because you don't need them here. The put() function expects a format that is known at compile time (which is where macro variables can help), whereas its analog putn() takes the format as its second argument at run time, so that argument can be a variable. It is a little slower because of that flexibility. Your code could look like this:
data ...;
set ...(keep=mean pt);
fmt_mean = putn(mean, pt);
run;
where the pt variable may be numeric, e.g. 8.2, or character, e.g. '8.2'.
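A minimal sketch of the putn() approach, assuming data shaped like the values described in the question. The dataset names have and want are made up for illustration, and pt is read as a character variable here so that it can be passed directly as the format specification.

```sas
data have;
  input pt $ mean;   /* pt holds the format to apply, e.g. 8.1 */
  datalines;
8.1 8.24
8.2 8.1
8.3 8.234
;
run;

data want;
  set have;
  /* The format is taken from pt on each row at run time, so no macro variable is needed */
  fmt_mean = putn(mean, pt);
run;
```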
If you want to understand how SAS macro works and what call symput does look here:
https://stackoverflow.com/a/69979074/7864377

SAS Data integration studio

Long time reader first-time questioner.
Using SAS Data Integration studio, when you create a summary transformation in the table options advanced tab you can add a where statement to your code automatically. Unfortunately, it adds some code that makes this resolve incorrectly. Putting the following in the where text box:
TESTFIELD = "TESTVALUE"
creates
%let _INPUT_options = %nrquote(WHERE = %(TESTFIELD = %"TESTVALUE%"%));
which is then used in the code as
proc tabulate data = &_INPUT (&_INPUT_options)
but this resolves to
WHERE = (TESTFIELD = "TESTVALUE")
_
22
ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, a missing value, (, *, +, -, :, INPUT, NOT, PUT, ^, ~.
My question is this: Is there a way to add a function to the where statement box that would allow this quotation mark to be properly added here?
Note that all functions get the preceding % when added to the where statement automatically and I have no control over that. This seems like something that should be relatively easy to fix but I haven't found a simple way yet.
The % are simply escaping the " and () characters; they're perfectly harmless, really. The bigger problem is the %NRQUOTE "quotes" (which are nonprinting characters that tell SAS this is macro-quoted); they mess up the WHERE processing.
Use %UNQUOTE( ... ) to remove these.
Example:
data have;
testfield="TESTVALUE";
output;
testfield="AMBASDF";
output;
run;
%let _INPUT_options = %nrquote(WHERE = %(TESTFIELD = %"TESTVALUE%"%));
%put &=_input_options;
data want;
set have(%unquote(&_INPUT_options.));
run;
Thank you all for your responses. Long story short, I ended up creating a SAS Troubleshooting ticket. The analyst told me that they have now documented the issue, and it should be resolved in a future iteration of DI.
The temporary solution was to create a new transformation with a slight alteration, adding an %UNQUOTE (as mentioned above by Joe) to the source code around the input options:
proc tabulate data = &_INPUT (%unquote(&_INPUT_options)) %unquote(&procOptions);
For those interested you will need to create the transformation in a public subfolder of your project so others can use it. Not what I was hoping for, but a workable solution while waiting for the version update.

How to convert imported date variable to the original format in Stata?

My original date variable is like this 19jun2015 16:52:04. After importing, it looks like this: 1.77065e+12
The storage type for the new imported variable is str11 and display format is %11s
I wonder how I can restore it back to date format?
William Lisowski gives excellent advice in his comment. For anyone using date-times in Stata, there is a minimal level of understanding without which confusion and outright error are unavoidable. Only study of the help so that your specific needs are understood can solve your difficulty.
There is a lack of detail in the question which makes precise advice difficult (imported -- from what kind of file? using which commands and/or third party programs?), except to diagnose that your dates are messed up and can only be corrected by going back to the original source.
Date strings such as "19jun2015 16:52:04" can be held in Stata as strings but to be useful they need to be converted to double numeric variables which hold the number of milliseconds since the beginning of 1960. This is a number that people cannot interpret, but Stata provides display formats so that displayed dates are intelligible.
Your example, when converted, is a number of the order of a trillion. If it is held as a string with only 6 significant figures, you have, at a minimum, lost detail irretrievably.
These individual examples make my points concrete. di is an abbreviation for the display command.
clock() (and also Clock(), not shown or discussed here: see the help) converts string dates to milliseconds since Stata's origin. With a variable, you would use generate double.
. di %23.0f clock("19jun2015 16:52:04", "DMY hms")
1750351924000
If displayed with a specific format, you can check that Stata is interpreting your date-times correctly. There are also many small variations on the default %tc format to control precise display of date-time elements.
. di %tc clock("19jun2015 16:52:04", "DMY hms")
19jun2015 16:52:04
The first example shows that even date-times which are recent dates (~2016) and in integer seconds need 10 significant figures to be accurate; the default display gives 4; somehow you have 6, but that is not enough.
. di clock("19jun2015 16:52:04", "DMY hms")
1.750e+12
You need to import the dates again. If you import them exactly as shown, the rest can be done in Stata.
See https://en.wikipedia.org/wiki/Significant_figures if that phrase is unfamiliar.

Warn if column is missing in a data step [duplicate]

I had a recent problem where I was using a data step to create an output file and one of the columns had been renamed. The data step executed normally filling in the now missing column with nulls without any errors or warnings. It did add a note in the log saying that a variable was undefined but otherwise there was no indication that anything was wrong.
Is there any way to force the data step to error out, or at least give a more noticeable warning, in such a situation?
There is an undocumented system option which turns problematic notes into errors, including the uninitialized note. I find it very handy.
options dsoptions=note2err;
data a;
y=x;
run;
ERROR: Variable x is uninitialized.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.A may be incomplete. When this step was stopped there were 0 observations and 2 variables.
Put a list of the expected variables in the output file in a keep statement in your data step (or a keep= clause on the output dataset), and set option dkrocond = error; before running your data step. This is quite an old option (it goes back to at least SAS 9.1.3) so it should work in your scenario.
You can also trigger similar error messages if variables are missing from an input dataset by setting option dkricond = error;.
You can also set either of these to warn if you prefer.
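A minimal sketch of the keep= approach with dkrocond, using sashelp.class as a stand-in input. The variable renamed_col is hypothetical and deliberately does not exist, so the step should stop with an error instead of silently creating an empty column.

```sas
options dkrocond=error;   /* problems in an output KEEP=/DROP=/RENAME= list become errors */

data want(keep=name age renamed_col);   /* renamed_col is never created in this step */
  set sashelp.class;
run;

options dkrocond=warn;    /* switch back to warnings afterwards */
```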
Also, if you want a more general method of detecting whether a variable is present in a data set, you could try something like this:
data _null_;
dsid = open('sashelp.class');
vnum1 = varnum(dsid,'varname');
vnum2 = varnum(dsid,'sex');
rc = close(dsid);
put vnum1= vnum2=;
run;
The crucial behaviour here is that the varnum function returns 0 for variables that are not present in the opened dataset. All of the functions used above can be used with %sysfunc, so it's even possible to do this sort of check in pure macro code, i.e. without actually running a data step or proc.
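The pure macro version of that check could look something like the sketch below. The macro name var_exists is made up for illustration; it returns the variable's position in the dataset, or 0 if the variable (or the dataset) cannot be found.

```sas
%macro var_exists(ds, var);
  %local dsid vnum rc;
  %let vnum = 0;
  %let dsid = %sysfunc(open(&ds));          /* 0 if the dataset cannot be opened */
  %if &dsid %then %do;
    %let vnum = %sysfunc(varnum(&dsid, &var));
    %let rc   = %sysfunc(close(&dsid));     /* always release the dataset handle */
  %end;
  &vnum
%mend var_exists;

/* Usage: a non-zero result means the variable is present */
%put sex:     %var_exists(sashelp.class, sex);
%put varname: %var_exists(sashelp.class, varname);
```

Because the macro resolves to a plain number, it can be used inside a %if condition to abort a job or write an error message before the data step runs.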

SAS - Keeping only observations with all variables

I have a dataset of membership information, and I want to keep only the people who have been continuously enrolled for the entire year. There are 12 variables for each person, one for each month of the year with how many days during that month they were enrolled. Is there a way to make a subset of the data for just those with a value >1 for each of the month variables?
Thanks!
SAS has various summary functions that might well be what you're looking for. See min() (minimum) in particular, as it will allow you to find the minimum of several variables. You may also want to consider nmiss() (number of missing values) and n() (number of non-missing values) if you have to deal with missing values in your data.
Summary functions can be passed lists of variables like this (in a data step):
minimum = min(var1, var2, var3);
However, that can become long-winded if you need to use a lot of variables. Fortunately, SAS provides several ways to reference lists of variables to make things neater; these variable lists are described in the SAS documentation. To use them in a summary function, use the of qualifier:
minimum = min(of var1-var12);
maximum = max(of var:);
blanks = nmiss(of _NUMERIC_);
Finally, you will want to use your newly found value to decide which data to include. To do this in a data step, look at the output statement (user guide):
if min(of var:) > 1 then output;
Or, if you feel like learning a bit more about SAS syntax, you could use an implicit output instead; the documentation for the output statement covers that as well.
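Put together, a version with implicit output might look like the sketch below. The dataset names members and continuous and the variables month1-month12 are assumptions, since the question does not give the real names. Because min() ignores missing values, the sketch also checks nmiss() so that a person with a missing month is not kept by accident.

```sas
data continuous;
  set members;   /* hypothetical input dataset with variables month1-month12 */
  /* Subsetting IF: the row is written only when no month is missing and every month exceeds 1 */
  if nmiss(of month1-month12) = 0 and min(of month1-month12) > 1;
run;
```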
In general it's preferred to ask specific questions and show your current work on SO, and I'd advise using google to answer your basic questions while you're learning the fundamentals. There is plenty of great documentation available to help you out there.