Is there an alternative to the SQL LIKE operator in a SAS data step?
I am using the code below for my requirement, but it is not working.
IF var1 ne : 'ABC' then new_var=XYZ;
Could anyone suggest what is wrong with this, or what the correct usage is for this situation?
Thanks,
In a data step, IF can be used with index/find/findw, but if you want to use LIKE, you must use it in a WHERE statement.
data want;
set sashelp.class;
where name like 'A%';
run;
You can use the find function, e.g.:
data want;
set sashelp.class;
if find(name,'e') then new_var='Y';
run;
The colon modifier as you've used it tests whether values begin with the quoted string 'ABC'. Essentially, SAS compares the two values after truncating the longer one to the length of the shorter one. So if the values in var1 are longer than 3 characters, SAS truncates them to 3 characters before comparing with 'ABC'.
It therefore differs from the LIKE operator in SQL, which is used together with the % wildcard to control whether the match is at the beginning, at the end, or anywhere in the string.
To replicate like, you need to use a function such as find as recommended by #Amir, or index which is also commonly used in this situation.
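As a rough sketch of how the three common LIKE patterns could be reproduced in a data step, using sashelp.class as in the answers above. The output variable names (starts_with_A, contains_e, ends_with_y) are made up for illustration.

```sas
data want;
  set sashelp.class;
  /* like 'A%' : colon modifier compares only the leading characters */
  starts_with_A = (name =: 'A');
  /* like '%e%' : FIND returns the position of the match, or 0 if absent */
  contains_e = (find(name, 'e') > 0);
  /* like '%y' : LENGTH ignores trailing blanks, so this picks the last character */
  ends_with_y = (substr(name, length(name), 1) = 'y');
run;
```

Each flag is 1 when the pattern matches and 0 otherwise, so any of them can also be used directly in an IF condition.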
I am working on a SAS macro to validate whether a macro variable is a valid SAS number or not. My solution is based on the prxmatch() function:
%macro IsSASnumber(number);
%sysfunc(prxmatch(/^-?(?:\d+|\d*\.\d+)(?:e-?\d+)?|\.[a-z]?$/i,&number));
%mend;
There are several examples:
%put %IsSASnumber(123);
1
%put %IsSASnumber(1.23);
1
%put %IsSASnumber(-.12e-3);
1
%put %IsSASnumber(.N);
1
%put %IsSASnumber(.tryme);
0
My question is:
Does this regular expression cover all conditions?
Is there a shorter or faster way to achieve this?
Ps: Assume the input is not empty.
If the goal is to support using the INPUT() function without generating error messages when the strings do not represent numbers then just use the ? or ?? modifiers to suppress the errors.
Since the INPUT() function does not care if the width used on the informat specification is larger than the length of the string being read, just use the maximum width the informat supports. So just use:
number = input(variable,??32.);
You might also want to test the length of VARIABLE, since the numeric informat can only handle strings up to 32 bytes long. You might also want to remove any leading spaces.
if length(left(variable)) <= 32 then number=input(left(variable),??32.);
If you want strings like "N" or "X" to be treated as meaning the special missing values .N and .X then make sure to tell SAS that in advance by using the global MISSING statement. To support all 27 special missing values use a missing statement like this:
missing abcdefghijklmnopqrstuvwxyz_ ;
If you want to treat '.N' as meaning .N instead of . then you will need to test for that string. To test all of them you could use something like:
if missing(number) and length(variable)=2 and char(variable,1)='.'
then number=input(char(variable,2),??32.)
;
Note: make sure to use the name of an INFORMAT when using the INPUT() function. BEST is the name of a FORMAT (the name makes no sense as a name for an informat since there is only one way to represent a number as a number). If you use BEST as an INFORMAT SAS will just treat it as an alias for the normal numeric informat.
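A small sketch of the ?? approach applied to a few test strings. The dataset name check and the flag is_numeric are made up for illustration, and the MISSING statement is left out here, so special missing values are not covered.

```sas
data check;
  length variable $32;
  /* A few test strings; 'abc' is deliberately not a number */
  do variable = '123', '1.23', '-.12e-3', 'abc';
    /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0 */
    number = input(left(variable), ??32.);
    /* A missing result means the string could not be read as a number */
    is_numeric = not missing(number);
    output;
  end;
run;
```

With this check a string such as '.' also comes back as missing, so it would be flagged as non-numeric unless it is tested for separately, as described above.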
The %datatyp macro can determine all of these, but it fails at .N. You can simplify your use case this way:
%macro IsSASnumber(number);
%sysevalf(%datatyp(&number) = NUMERIC OR %sysfunc(prxmatch(/^\.[A-Z_]$|^\.$/i, &number)));
%mend;
This will match your numeric cases, and then you can match the . cases.
Suppose I have a column called ABC that contains data like:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
I want to generate the desired value, 1122, in the next column. I have a long list of values that I need to check against column ABC; if an exact match is found, the value should be generated. However, I don't want to match something like 112233, because it does not exactly match the value I am looking for.
For example, in all three lines given above for reference, only "1122" should be picked up as the match.
I really have no clue how to overcome this problem. I tried wildcards but did not have much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. That will let you search a string for matching words, with an option to specify which characters are to be considered separators between the words. The result is the location where the word starts within the longer string. When the word is not found, the result is zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
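If the goal is to write the matched value into a new column rather than filter, something along these lines might work. It reuses the HAVE dataset from above; the variable names match and word are made up, and '66778' is only a stand-in for a second entry of the lookup list.

```sas
data want;
  set have;
  length match word $8;
  /* Assumption: the lookup list is short enough to spell out here */
  do word = '1122', '66778';
    if indexw(trim(abc), trim(word), '_') then do;
      match = word;   /* keep the first list entry found as a whole word */
      leave;
    end;
  end;
  drop word;
run;
```

Because INDEXW only matches whole words between the delimiters, a value like 112233 does not count as a match for 1122.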
I'd like to use the following syntax
data new;
set old (where=(mystring in ('string1','string2',...,'string500')));
run;
in order to filter a very large input data set. The 500 strings at first are contained as numeric values in the variable "bbb" in the dataset "aux". So far I have created a macro variable which contains the required list of the 500 strings the following way:
proc sql noprint;
select bbb into :StringList1 separated by "',' "
from work.aux;
quit;
data _null_; call symputx('StringList2',compress("'&StringList1'")); run;
data new;
set old (where=(mystring in (&StringList2)));
run;
... which seems to work. But there is a warning telling me that
The quoted string currently being processed has become more than 262
characters long. You might have unbalanced quotation marks.
Results still seem to be plausible. Should I be worried that one day results might become wrong?
More importantly: I try to find a way to avoid using the compress function by setting up the
separated by "',' "
option in a way that does not contain blanks in the first place. Unfortunately the following seems not to work:
separated by "','"
It doesn't give me an error message, but when I look at the macro variable there is a multipage mess of red line numbers (the color which usually denotes error messages), empty rows, minus signs, and so on. The following screenshot shows part of the log after running this code:
proc sql noprint;
select vnr into :StringVar1 separated by "','"
from work.var_nr_import;
quit;
%put &StringVar1.;
I have already tried to make use of the STR() function, but with no success so far.
I cannot replicate your error messages in SAS 9.3.
If your variable is numeric you don't need quotes in the macro variable.
If it is character try using the QUOTE() function.
proc sql noprint;
select quote(bbb) into :StringList1 separated by " "
from work.aux;
quit;
A macro variable can only contain 65,534 characters. So if there are too many values of BBB then your macro variable value will be truncated. This could lead to unbalanced quotes. That is most likely the source of your errors.
Note that you can turn off the warning about the length of the quoted strings by using the NOQUOTELENMAX system option, but in this application you wouldn't want to because the individual quoted strings are not that long.
You will be better served to use another method to subset your data if lists this long are required.
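One such method, sketched here under assumptions, is to skip the macro variable entirely and subset with a subquery. This assumes mystring is character and bbb is numeric, as the question describes; the BEST12. format is a guess and must produce exactly the text stored in mystring.

```sas
proc sql;
  create table new as
  select o.*
  from old as o
  /* Convert the numeric key to text so it can be compared with mystring */
  where o.mystring in
        (select strip(put(bbb, best12.)) from work.aux);
quit;
```

This avoids both the macro variable length limit and the quoting problems, since the list of values never passes through the macro processor.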
This will work.
For double quotes:
proc sql noprint;
select quote(bbb) into :StringList1 separated by ","
from work.aux;
quit;
For single quotes:
proc sql noprint;
select "'"||bbb||"'" into :StringList1 separated by ","
from work.aux;
quit;
Is there a function in SAS PROC SQL that I can use to extract the left part of a string? It would be something similar to the LEFT function in SQL Server. In SQL I have left(11111111, 4) * 9 = 9999, and I would like to do something similar in SAS PROC SQL. Any help will be appreciated.
I had the impression that you might want to repeat the substring rather than multiply, so I'm adding the REPEAT function just out of curiosity.
proc sql;
select
INPUT(SUBSTR('11111111', 1, 4), 4.) * 9 /* if source is char */
, INPUT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 4.) * 9 /* if source is number */
, REPEAT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 8) /* repeat instead of multiply: REPEAT(x, n) returns x plus n more copies, so 8 gives 9 copies */
FROM SASHELP.CLASS (obs=1)
;
quit;
substr("some text",1,4) will give you "some". This function works the same way in a lot of SQL implementations.
Also, note that this is a string function, but in your example you're applying it to a number. SAS will let you do this, but in general it's wise to control your conversion between strings and numbers with the put() and input() functions, to keep your log clean and be sure that you're only converting where you actually intend to.
You might be looking for the SUBSTRN function.
SUBSTRN(string, position <, length>)
Arguments
string specifies a character or numeric constant, variable,
or expression.
If string is numeric, then it is converted to a character value that
uses the BEST32. format. Leading and trailing blanks are removed, and
no message is sent to the SAS log.
position is an integer that specifies the position of the first
character in the substring.
length is an integer that specifies the length of the substring. If
you do not specify length, the SUBSTRN function returns the substring
that extends from the position that you specify to the end of the
string.
As others have pointed out, substr() is the function you are looking for, although I feel that a more useful answer would also 'teach you how to fish'.
A great way to find out about SAS functions is to google sas functions by category which at the time of writing this post will direct you here:
SAS Functions and CALL Routines by Category
It's worth scanning through this list at least once just to get an idea of all of the functions available.
If you're after a specific version, you may want to include the SAS version number in your search. Note that the link above is for 9.2.
If you have scanned through all the functions and still can't find what you are looking for, then your next option may be to write your own SAS function using proc fcmp. If you ever need assistance with doing this, then I suggest posting a new question.
I am stuck on one particular point. I have a character variable with observations extracted from an RTF document. I need to keep only the observations from obs A to obs B. The FIRSTOBS and OBS options are not helpful here because we do not know the observation numbers beforehand. All we know is the two unique strings. For example, in the dataset, I need to create a dataset with observations from obs 11 to 16. This is only part of the dataset; the original has over 1500 observations, which is why we use unique text to capture the range instead of observation numbers.
Thank you all in advance.
You don't explain enough, but odds are you can do something sort of like this if I understand you right (you have a "start" and a "stop" string in the document).
data want;
set have;
retain keep 0;
if strvar = "keepme" then keep=1;
if keep=1;
if strvar = "lastone" then keep=0;
run;
That is, have some condition set the keep variable to 1, then test for it, then have the off condition after that (assuming you want to keep the off condition row). Use string functions like index, find, or scan to search for your particular string if it's not the entire value. You could also use regular expressions if necessary.
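For the partial-match case, a variation of the step above could look like this. The strings 'start text' and 'stop text' are placeholders for your two unique strings, and the 'i' modifier (ignore case) is optional.

```sas
data want;
  set have;
  retain keep 0;
  /* Turn the flag on when the start marker appears anywhere in the value */
  if find(strvar, 'start text', 'i') then keep = 1;
  /* Subsetting IF: only rows inside the range continue past this point */
  if keep;
  /* Turn the flag off after the stop row, so that row is still kept */
  if find(strvar, 'stop text', 'i') then keep = 0;
  drop keep;
run;
```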