use regular expression in case statements - regex

I am trying to use a regular expression in a CASE expression in SAS as follows:
proc sql;
create table lib_name.CIS_Ser_flat_Info as
select NO, company,
case when prxmatch(prxparse("^(map)|^(mb\s?person)"), upper(company))>0 then 1 else 0 end as map_flag,
from map_info;
quit;
But it still fails with the following error:
Syntax error, expecting one of the following: !, !!, &, (, *, **, +,
',', -, /, <, <=,
<>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET,
LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
The table looks like:
No company
1 saura
2 maybe

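There are a few issues in your code.
First, the syntax error itself most likely comes from the trailing comma after map_flag: PROC SQL then reads from as another column name and fails on map_info. Remove that comma.
Second, regular expression patterns must be enclosed in delimiters, normally slashes /.
Third, the pattern is written in lowercase but is matched against an uppercased value, so it can never match. Add the i modifier to make the pattern case-insensitive; there is then no need to change the case of company. (If you do want to uppercase a value, upcase() is the usual SAS function; PROC SQL also accepts upper().)
Lastly, you don't need to use prxparse() within prxmatch().
The following should do what you want:
proc sql;
create table lib_name.CIS_Ser_flat_Info as
select NO, company,
case when prxmatch("/^(map)|^(mb\s?person)/i", company) then 1 else 0 end as map_flag
from map_info;
quit;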
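To try the pattern without touching the real table, here is a minimal sketch. The dataset work.map_info and rows 3 and 4 are made up for illustration; only the first two rows come from the question.

```sas
/* Hypothetical test data: rows 1-2 are from the question, rows 3-4 are invented */
data work.map_info;
    length company $ 20;
    no = 1; company = 'saura';     output;
    no = 2; company = 'maybe';     output;
    no = 3; company = 'Map Ltd';   output;
    no = 4; company = 'MB person'; output;
run;

proc sql;
    /* prxmatch returns the match position (0 if none); nonzero counts as true */
    select no, company,
           case when prxmatch('/^(map)|^(mb\s?person)/i', company) then 1 else 0 end as map_flag
    from work.map_info;
quit;
```

With this data, only rows 3 and 4 should be flagged: 'maybe' starts with "may", not "map".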

Related

Formula for filtering dates

I have an issue with my formula to select a date, using PROC SQL in SAS.
What am I doing wrong? Thank you all.
Regards, Geoff
I use this code:
FROM
VTXBUSS.s
WHERE
t1.ifrs_stage_date_at_start IS ’ 31DEC2017% ’
ORDER BY
t1.customer_id,
I get a syntax error back. Dates are in this format: 31DEC2017:00:00:00.000000
SAS gives as log:
ERROR: The value '’'n is not a valid SAS name. WARNING: Apparent
invocation of macro ’ not resolved. 45 WHERE
t1.ifrs_stage_at_start NOT = t1.ifrs_stage_PROV AND
t1.ifrs_stage_date_at_start = ’31DEC2017%’,
_
76 ERROR 22-322: Syntax error, expecting one of the following: ;, !,
!!, &, (, *, **, +, -, '.', /, <, <=, <>, =, >, >=, AND, EQ, EQT,
EXCEPT, GE, GET, GROUP, GT, GTT, HAVING, INTERSECT, LE, LET, LT, LTT, NE, NET, NOT, OR, ORDER, OUTER, UNION, ^, ^=,
|, ||, ~, ~=.
Try this:
FROM
VTXBUSS.s
WHERE
t1.ifrs_stage_date_at_start = '31DEC2017'd
ORDER BY
t1.customer_id,
If you're looking for a specific "datetime", then you should use, for example:
FROM
VTXBUSS.s
WHERE
t1.ifrs_stage_date_at_start = '31DEC2017:00:00:00.000000'dt
ORDER BY
t1.customer_id,
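That is because a date or datetime literal in SAS must be a quoted string followed by a suffix that states its type: d for a DATE value, dt for a DATETIME value. The quotes must also be plain straight quotes ('); the curly quotes (’) in your code are what trigger the "not a valid SAS name" error.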
If you want to filter DATETIME values by what DATE they represent, you could convert the values to a DATE value:
WHERE datepart(t1.ifrs_stage_date_at_start) = '31DEC2017'd
Or convert to a specific DATETIME value:
WHERE intnx('dtday',t1.ifrs_stage_date_at_start,0) = '31DEC2017:00:00'dt
Or use a range of DATETIME values:
WHERE t1.ifrs_stage_date_at_start between '31DEC2017:00:00'dt and '01JAN2018:00:00'dt
Or possibly convert to a character string:
WHERE put(t1.ifrs_stage_date_at_start,datetime20.-L) like '31DEC2017:%'
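One thing to watch with the range version: BETWEEN includes both endpoints, so a row stamped exactly at midnight on 01JAN2018 would also be selected. A minimal sketch of a half-open range follows; the dataset work.loans and its columns are hypothetical stand-ins for the real table.

```sas
/* Hypothetical sample data, for illustration only */
data work.loans;
    format stage_dt datetime26.6;
    customer_id = 1; stage_dt = '31DEC2017:00:00:00'dt; output;
    customer_id = 2; stage_dt = '31DEC2017:14:30:00'dt; output;
    customer_id = 3; stage_dt = '01JAN2018:00:00:00'dt; output;
run;

proc sql;
    /* Half-open range: all of 31DEC2017, excluding midnight of the next day */
    select customer_id, stage_dt
    from work.loans
    where stage_dt >= '31DEC2017:00:00:00'dt
      and stage_dt <  '01JAN2018:00:00:00'dt
    order by customer_id;
quit;
```

With this data, customers 1 and 2 are returned and customer 3 is not.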

Substring in inner join with contains sas

I'm trying to run the following code
PROC SQL;
CREATE TABLE _check AS
SELECT DISTINCT * FROM table1 as a
INNER JOIN table2 as b
ON ON UPCASE(b.Cname) contains UPCASE(SUBSTRING(CNAMN, 1, 4))
;
quit;
but I get the error
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=,
<>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, IN, IS,
LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
any clues? What is the correct syntax to handle substrings in a contains?
Thanks in advance

SAS: converting a datetime to date in where clause

I have a column in a SAS dataset that has the datetime25.6 format; let's call this column datetime. I want to convert it to the Date9 format in a where clause and check it against a certain date or date variable.
I currently have the following code:
proc sql;
Select rowid, name, dob, country
from db.testTable
where cast(datetime as date9.) eq '14sep2014'd
;
quit;
I get an error when I run the above code:
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, ), *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, GE, GET, GT, GTT, IN, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
I get the same error message if I use the following:
proc sql;
Select rowid, name, dob, country
from db.testTable
where cast(datetime as date9.) = '14sep2014'd
;
quit;
Is there a better way to cast a datetime to date9 format in SAS?
Any help on this will be greatly appreciated
In SAS you would use the datepart() function to extract the date value from the datetime value:
where datepart(datetime) = '14sep2014'd
There is no need to specify a format, as it does not affect the underlying value.
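A small sketch of why the format is irrelevant to the comparison: the datetime and the date are both plain numbers, and formats only change how they print. The variable names here are made up for illustration.

```sas
data _null_;
    dt = '14SEP2014:08:15:00'dt;    /* datetime: seconds since 01JAN1960:00:00:00 */
    d  = datepart(dt);              /* date: days since 01JAN1960 */
    put dt= d=;                     /* the raw numeric values */
    put dt= datetime20. d= date9.;  /* the same values, shown through formats */
    same_day = (d = '14SEP2014'd);  /* the comparison uses the raw values */
    put same_day=;
run;
```

The last line of the log should show same_day=1, whatever format is attached to either variable.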
A second option is to build the datetime that corresponds to your date and compare the column against that, so no function has to be called for each row. It is slightly less convenient than datepart(), but on a really large dataset it is probably a bit faster, because one side of the expression is (hopefully) a constant. You can use DHMS to make the datetime:
data _null_;
x=dhms('01JAN2010'd,0,0,0);
put x=;
run;
or even use a datetime constant explicitly:
data _null_;
x='01JAN2010:00:00:00'dt;
put x=;
run;
(I would expect them to be of identical timings, as SAS should optimize the first to the second, since it's a constant expression - but who knows).
And on a side note, SAS has two primitive types: numeric and character. input turns character into numeric, and put turns numeric into character; together they are the equivalent of cast. Anything else in SAS (e.g., dates) is simply a format applied to a number (or a character value) and has its own special functions (like datepart here).
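As a minimal sketch of that put/input pairing (the variable names and values are invented for illustration):

```sas
data _null_;
    n = 42;
    c = put(n, 8.);                  /* numeric -> character, right-aligned in 8 columns */
    m = input('42', 8.);             /* character -> numeric */
    d = input('14SEP2014', date9.);  /* character -> numeric date value */
    put c= m= d= date9.;             /* date9. applies only to d */
run;
```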

ERROR 22-322: Syntax error, expecting one of the following: using CAST

I am trying to use some SQL code in SAS within a proc SQL. The original code in DB2 had this which is working fine.
I get the syntax error below:
541 as NC_2,SUM ( CASE WHEN A.R_1='N' AND A.R_2='N' AND A.R_4='Y' then 1 else 0
541 ! end ) as NC_4 FROM ( SELECT CASE WHEN (LENGTH(TRIM(TRANSLATE(cast(ABC_CT as char(4000)), '
__
22
202
541 ! ',
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, ), *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND,
BETWEEN, CONTAINS, EQ, EQT, GE, GET, GT, GTT, IN, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ^, ^=, |, ||,
~, ~=.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
Can someone help me?
CAST is a type conversion function: it converts a value to the given data type. TRANSLATE in SAS replaces specific characters in a string, so it is not in the same category of functions.
I think you're looking for the PUT function, which converts a numeric value to character, assuming ABC_CT is numeric. Note that the numeric w. format has a maximum width of 32, so 4000. is not accepted:
put(ABC_CT, 32.)
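PUT right-aligns the result within the format width, which matters if the converted value is later measured with LENGTH or passed to TRANSLATE. A minimal sketch, assuming ABC_CT is numeric; the value and the variable as_char are made up:

```sas
data _null_;
    abc_ct = 12345;                         /* hypothetical numeric value */
    length as_char $ 32;
    as_char = strip(put(abc_ct, best32.));  /* convert, then drop the leading blanks */
    len = length(as_char);
    put as_char= len=;
run;
```

Here as_char holds 12345 and len is 5; without strip() the digits would sit at the right edge of a 32-character field.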
Or you can use SQL PASS THROUGH which will run the DB2 code directly on the DB2 server and uses DB2 syntax.
Example of the type of syntax from the documentation:
proc sql;
connect to db2 as db1 (database=sample);
select *
from connection to db1
(select
* from sasdemo.customers
where
customer like '1%');
disconnect from db1;
quit;
http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a001348730.htm
The CAST function is not available in SAS. This may be an issue if you are accessing a DB2 table directly from SAS.

Extracting strings using Oracle REGEXP_SUBSTR

I am using REGEXP_SUBSTR in Oracle 11g and I am having difficulty trying to extract the following strings.
My query is:
SELECT regexp_substr('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', '[^CN=]*\,', 1, rownum) line
FROM dual
CONNECT BY LEVEL <= length('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,') -
length(REPLACE('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', ',', ''))
With this query, I am having trouble matching on the exact string 'CN='. I need the output to appear as follows:
CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,
And in this format, with the comma at the end.
The way I am doing it at the moment is chopping off the "CN=" but I actually require this part.
I think this will return the resultset you are looking for:
SELECT REGEXP_SUBSTR(d.s,'CN=.*?,', 1, ROWNUM) line
FROM (SELECT 'CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,'
AS s FROM dual) d
CONNECT BY LEVEL <= LENGTH(d.s) - LENGTH(REPLACE(d.s,',',''))
The regular expression trick used here is to specify the ? modifier (following the .*) to make the match "non-greedy". The default match (without the ? modifier) is "greedy" in that it will match as much of the string as possible. In your case, you want the match to end at the first comma found. The intent here is to match literal string 'CN=' followed by any number of characters (zero, one or more) up to the first comma encountered.
This will work in Oracle 10g as well as 11g.
In 11g, the REGEXP_COUNT function can replace your "count of commas" calculation of occurrences.
CONNECT BY LEVEL <= REGEXP_COUNT(d.s,'CN=.*?,')
(BTW... by using a subquery to return the literal string, the literal string only has to be specified once. That makes it much easier to change the string for testing, rather than having to change it in multiple places.)
Addendum:
I can confirm that the comma is included in the returned value. Sample output:
LINE
-----------------------
CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,
I'm not an LDAP master, but will the regular expression CN=[^,]+ (C, then N, then equals sign, greedily followed by one or more non-commas) work for you?
Also, do you know about REGEXP_COUNT, new in 11g?
SQL> SELECT REGEXP_SUBSTR('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+', 1, ROWNUM) line
2 FROM dual
3 CONNECT BY LEVEL <= REGEXP_COUNT('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+')
4 /
LINE
----------------------------------------------------------------------------------------------------
CN=aTIGERAdmin-Admin
CN=D0902498
CN=ea90045052
CN=aTIGERCall-Admin
SQL>