I am working with data in a SAS environment. I'm trying to replace missing values with blanks for an entire table. Missing values occur in both character and numeric columns.
I tried using the following:
RSUBMIT; proc stdize data=Final_note out=final_note_blanks reponly missing=''; run; quit; ENDRSUBMIT;
but since some of the columns are numeric, this doesn't work because '' is a character value. Below is the error message I received:
Syntax error, expecting one of the following: a numeric constant, a datetime constant, ABW, AGK, AHUBER, AWAVE, EUCLEN, IQR, LEAST, MAD, MAXABS, MEAN, MEDIAN, MIDRANGE, RANGE, SPACING, STD, SUM, USTD. The symbol is not recognized and will be ignored
You cannot replace a missing numeric value with a blank. This can only be done with character values. However, you can represent missing values as blank for reporting purposes.
options missing = '';
This only changes the way it is presented. SAS still sees it as a missing numeric value.
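As a minimal sketch of that behavior, assuming a small hypothetical dataset named demo (not part of the original question), the MISSING= system option changes only how the missing numeric value prints:

```sas
/* Hypothetical two-row dataset with one missing numeric value */
data demo;
  x = .; name = 'A'; output;
  x = 5; name = 'B'; output;
run;

/* Display missing numeric values as a blank instead of a period */
options missing=' ';

proc print data=demo;
run;

/* Restore the default display character */
options missing='.';
```

The stored value of x in the first row is still a numeric missing value, so functions such as MISSING() keep treating it as missing.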
Related
I am working on a SAS macro to validate whether a macro variable is a valid SAS number. My solution is based on the prxmatch() function:
%macro IsSASnumber(number);
%sysfunc(prxmatch(/^-?(?:\d+|\d*\.\d+)(?:e-?\d+)?|\.[a-z]?$/i,&number));
%mend;
There are several examples:
%put %IsSASnumber(123);
1
%put %IsSASnumber(1.23);
1
%put %IsSASnumber(-.12e-3);
1
%put %IsSASnumber(.N);
1
%put %IsSASnumber(.tryme);
0
My question is:
Does this regular expression cover all conditions?
Is there a shorter or faster way to achieve this?
P.S. Assume the input is not empty.
If the goal is to use the INPUT() function without generating error messages when the strings do not represent numbers, then just use the ? or ?? modifiers to suppress the errors.
Since the INPUT() function does not care if the width used on the informat specification is larger than the length of the string being read, just use the maximum width the informat supports. So just use:
number = input(variable,??32.);
You might also want to test the length of VARIABLE, because the numeric informat can only handle strings up to 32 bytes long. You might also want to remove any leading spaces.
if length(left(variable)) <= 32 then number=input(left(variable),??32.);
If you want strings like "N" or "X" to be treated as meaning the special missing values .N and .X then make sure to tell SAS that in advance by using the global MISSING statement. To support all 27 special missing values use a missing statement like this:
missing abcdefghijklmnopqrstuvwxyz_ ;
If you want to treat '.N' as meaning .N instead of . then you will need to test for that string. To test all of them you could use something like:
if missing(number) and length(variable)=2 and char(variable,1)='.'
then number=input(char(variable,2),??32.)
;
Note: make sure to use the name of an INFORMAT when using the INPUT() function. BEST is the name of a FORMAT (the name makes no sense as a name for an informat since there is only one way to represent a number as a number). If you use BEST as an INFORMAT SAS will just treat it as an alias for the normal numeric informat.
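The INPUT() approach above can be sketched as a small DATA step. The dataset name check, the flag is_number, and the sample values are assumptions made for illustration only:

```sas
/* Hypothetical test data: one candidate string per line */
data check;
  input variable $32.;
  /* ?? suppresses both the log note and the _ERROR_ flag on bad input */
  number = input(left(variable), ??32.);
  /* Flag is 1 when the string converted to a non-missing number */
  is_number = not missing(number);
  put variable= number= is_number=;
datalines;
123
1.23
-.12e-3
abc
;
```

Note that with this flag a string that legitimately represents a missing value, such as a lone period, is reported as not numeric, so that case would still need the separate test described above.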
The %datatyp macro can determine all of these, but it fails at .N. You can simplify your use case this way:
%macro IsSASnumber(number);
%sysevalf(%datatyp(&number) = NUMERIC OR %sysfunc(prxmatch(/^\.[A-Z_]$|^\.$/i, &number)));
%mend;
This will match your numeric cases, and then you can match the . cases.
I have a problem with numeric and character values.
I ran proc contents, which shows that the variable Poids is character.
I want to use the following code, but it does not convert the variable to numeric. best32. is used because the problem statement requires it. Am I making a mistake?
data X;
set Y;
Poids=input(Poids,best32.);
run;
Okay, I found the problem. The same variable cannot be defined as both character and numeric. To fix this, I have to rename the initial variable with a dataset option, as shown below, and then drop the renamed variable.
data X(drop=Poids_char);
set Y(rename=(Poids=Poids_char));
Poids=input(Poids_char,best32.);
run;
I am trying to reformat my variables in SAS using the put statement and a user defined format. However, I can't seem to get it to work. I want to make the value "S0001-001" convert to "S0001-002". However, when I use this code:
put("S0001-001",$format.)
it returns "S0001-001". I double-checked my format and it is mapped correctly. I import it from Excel, convert it to a SAS table, and convert the SAS table to a SAS format.
Am I misunderstanding what the put statement is supposed to be doing?
Thanks for the help.
Assuming that you tried something like this it should work as you intended.
proc format ;
value $format 'S0001-001' = 'S0001-002' ;
run;
data want ;
old= 'S0001-001';
new=put(old,$format.);
put (old new) (=:$quote.);
run;
Make sure that you do not have leading spaces or other invisible characters in either the variable value or the START value of your format. Similarly make sure that your hyphens are actual hyphens and not em-dash characters.
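One way to look for such invisible characters is to print the value in hexadecimal. This is only a sketch: the variable len is introduced here for illustration, and the width 18 assumes the 9-character value from the question.

```sas
data _null_;
  old = 'S0001-001';
  /* Length excluding trailing blanks */
  len = lengthn(old);
  /* Show the quoted value, its length, and its bytes in hex */
  put old= $quote. len= old= $hex18.;
run;
```

In an ASCII-based session a plain hyphen appears as 2D and a space as 20 in the hex output, so a leading space or a dash-like character from Excel stands out as a different byte sequence.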
Is there an alternative to the SQL LIKE operator in a SAS DATA step?
I am using the code below for my requirement, but it is not working.
IF var1 ne : 'ABC' then new_var=XYZ;
Can anyone tell me what is wrong with this, or suggest the correct usage for this situation?
Thanks.
In a DATA step, IF can be combined with the INDEX, FIND, or FINDW functions. If you want to use LIKE, you must use it in a WHERE statement.
data want;
set sashelp.class;
where name like 'A%';
run;
You can use the FIND function, e.g.:
data want;
set sashelp.class;
if find(name,'e') then new_var='Y';
run;
The colon modifier, as you've used it, only tests whether a value begins with the quoted string 'ABC'. Essentially, SAS compares the two values after truncating the longer one to the length of the shorter. So if all the values in var1 are longer than 3 characters, SAS truncates them to 3 characters before comparing them with 'ABC'.
It therefore differs from the LIKE operator in SQL, which is used with the % wildcard to specify whether to match at the beginning, at the end, or anywhere in the string.
To replicate LIKE, you need to use a function such as FIND, as recommended by @Amir, or INDEX, which is also commonly used in this situation.
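As a rough sketch of how the three common LIKE patterns could be reproduced in a DATA step, using sashelp.class as in the answers above (the flag variable names are made up for illustration):

```sas
data want;
  set sashelp.class;
  length starts_a has_e ends_e $1;
  /* LIKE 'A%': value begins with A */
  if name =: 'A' then starts_a = 'Y';
  /* LIKE '%e%': value contains e anywhere */
  if find(name, 'e') > 0 then has_e = 'Y';
  /* LIKE '%e': last non-blank character is e */
  if substr(name, length(name), 1) = 'e' then ends_e = 'Y';
run;
```

These comparisons are case-sensitive, like LIKE itself; FIND accepts an 'i' modifier as a third argument if a case-insensitive search is wanted.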
I'd like to use the following syntax
data new;
set old (where=(mystring in ('string1','string2',...,'string500')));
run;
in order to filter a very large input data set. The 500 strings at first are contained as numeric values in the variable "bbb" in the dataset "aux". So far I have created a macro variable which contains the required list of the 500 strings the following way:
proc sql noprint;
select bbb into :StringList1 separated by "',' "
from work.aux;
quit;
data _null_; call symputx('StringList2',compress("'&StringList1'")); run;
data new;
set old (where=(mystring in (&StringList2)));
run;
... which seems to work. But there is a warning telling me that
The quoted string currently being processed has become more than 262
characters long. You might have unbalanced quotation marks.
Results still seem to be plausible. Should I be worried that one day results might become wrong?
More importantly, I am trying to find a way to avoid using the compress function by setting up the
separated by "',' "
option in a way that does not contain blanks in the first place. Unfortunately the following seems not to work:
separated by "','"
It doesn't give me an error message, but when I look at the macro variable there is a multipage mess of red line numbers (the color that usually denotes error messages), empty rows, minus signs, and so on. That log output appears after running this code:
proc sql noprint;
select vnr into :StringVar1 separated by "','"
from work.var_nr_import;
quit;
%put &StringVar1.;
I have already tried to make use of the %STR() function, but with no success so far.
I cannot replicate your error messages in SAS 9.3.
If your variable is numeric you don't need quotes in the macro variable.
If it is character try using the QUOTE() function.
proc sql noprint;
select quote(bbb) into :StringList1 separated by " "
from work.aux;
quit;
A macro variable can only contain 65,534 characters. So if there are too many values of BBB then your macro variable value will be truncated. This could lead to unbalanced quotes. That is most likely the source of your errors.
Note that you can turn off the warning about the length of the quoted strings by using the NOQUOTELENMAX system option, but in this application you wouldn't want to because the individual quoted strings are not that long.
You will be better served to use another method to subset your data if lists this long are required.
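One such alternative, sketched here with the dataset and variable names from the question, is to let PROC SQL filter against the lookup table directly so that no macro variable is needed. The conversion with put() and strip() is an assumption: it presumes bbb is numeric and that its BEST32. text form matches the strings stored in mystring.

```sas
proc sql;
  create table new as
  select *
  from old
  /* Convert numeric bbb to left-aligned text before comparing */
  where mystring in
    (select strip(put(bbb, best32.)) from work.aux);
quit;
```

If bbb is already character, the subquery can select bbb as is. This avoids both the macro variable length limit and the quoted-string warning, since the list of values never passes through the macro processor.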
This will work.
For double quotation marks:
proc sql noprint;
select quote(bbb) into :StringList1 separated by ","
from work.aux;
quit;
For single quotation marks:
proc sql noprint;
select "'"||bbb||"'" into :StringList1 separated by ","
from work.aux;
quit;