Oracle REGEXP_LIKE: logical AND matching of substrings in a string

I have a string containing codes like 'code1 code2 code3'. It should return the string if all codes entered are contained in the string.
For example:
select * from (
select 'avs cde jkl' code from dual)
where REGEXP_LIKE(code, 'REGEX-MAGIC')
With a regex like ^(?=.*\bjkl\b)(?=.*\bavs\b).*$ the code should be returned. But this syntax does not work with Oracle regular expressions.
The logic is 'if all codes looked for are in the string (order does not matter), then return the code.'
From my research this would be achievable with a positive lookahead, but as far as I know Oracle does not support that. I am looking for a single regex, not a construct like REGEXP_LIKE(...,..) and REGEXP_LIKE(...,..) and ....
The Oracle Version is 12c.
Any help would be appreciated!

Oracle does not support look-ahead, look-behind or word boundaries in regular expressions.
If you have the sample data:
CREATE TABLE table_name (code) AS
SELECT 'avs cde jkl' FROM DUAL UNION ALL
SELECT 'avs cde' FROM DUAL UNION ALL
SELECT 'jkl avs' FROM DUAL UNION ALL
SELECT 'cde jkl' FROM DUAL;
Option 1:
The simplest query is to not use regular expressions and to look for sub-string matches using multiple LIKE conditions:
SELECT code
FROM table_name
WHERE ' ' || code || ' ' LIKE '% avs %'
AND ' ' || code || ' ' LIKE '% jkl %'
Which outputs:
CODE
avs cde jkl
jkl avs
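The padded-LIKE idea can be sketched in Python to show why the surrounding spaces matter: they turn a plain substring test into a whole-code test. This is an illustration of the logic outside the database, not Oracle code. The helper name contains_all_codes is hypothetical.

```python
def contains_all_codes(code, wanted):
    # Mimics: ' ' || code || ' ' LIKE '% avs %' AND ... for every wanted code.
    padded = " " + code + " "
    # Each code must have a space on both sides, so "avs" cannot match
    # inside a longer code such as "avsx".
    return all(" " + w + " " in padded for w in wanted)


rows = ["avs cde jkl", "avs cde", "jkl avs", "cde jkl"]
print([r for r in rows if contains_all_codes(r, ["avs", "jkl"])])
# ['avs cde jkl', 'jkl avs']
```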
Option 2:
You could use (slower) regular expressions with multiple REGEXP_LIKE conditions:
SELECT code
FROM table_name
WHERE REGEXP_LIKE(code, '(^| )avs( |$)')
AND REGEXP_LIKE(code, '(^| )jkl( |$)')
Which outputs the same as above.
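The two anchored patterns can be tried outside the database, for example with Python's re module. These patterns use only alternation, anchors and a literal space, so both engines should behave alike on them. Python's re is still a different engine from Oracle's, so treat this only as a quick check. The helper matches_all is hypothetical.

```python
import re


def matches_all(code, wanted):
    # One search per wanted code, like the chained REGEXP_LIKE conditions.
    # (^| ) and ( |$) stand in for the word boundaries Oracle lacks.
    return all(
        re.search(r"(^| )" + re.escape(w) + r"( |$)", code) for w in wanted
    )


rows = ["avs cde jkl", "avs cde", "jkl avs", "cde jkl"]
print([r for r in rows if matches_all(r, ["avs", "jkl"])])
# ['avs cde jkl', 'jkl avs']
```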
Option 3:
You could put the matches into a sub-query factoring clause and then use a LATERAL join:
WITH match_conditions (match) AS (
SELECT 'avs' FROM DUAL UNION ALL
SELECT 'jkl' FROM DUAL
)
SELECT code
FROM table_name t
CROSS JOIN LATERAL (
SELECT 1
FROM match_conditions
WHERE ' ' || code || ' ' LIKE '% ' || match || ' %'
HAVING COUNT(*) = (SELECT COUNT(*) FROM match_conditions)
)
Which outputs the same as above.
Option 4:
If you really want a single regular expression then you can generate each permutation of the codes to match and concatenate them into a single regular expression:
SELECT code
FROM table_name
WHERE REGEXP_LIKE(
code,
'(^| )avs( | .*? )jkl( |$)' -- Permutation 1
|| '|(^| )jkl( | .*? )avs( |$)' -- Permutation 2
)
Which outputs the same as above.
However, this becomes hard to maintain as the number of codes to match grows: 2 codes need 2! = 2 permutations, but 5 codes need 5! = 120 permutations.
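If you do go down this route, the pattern does not have to be written by hand. Here is a minimal Python sketch that generates one alternative per ordering, in the same shape as the Option 4 pattern. The function name permutation_regex is hypothetical. The generated string would still need to be pasted into, or bound to, the REGEXP_LIKE call.

```python
import math
import re
from itertools import permutations


def permutation_regex(codes):
    # Build the Option 4 style pattern: one alternative per ordering.
    alternatives = []
    for perm in permutations(codes):
        # Consecutive codes are separated either by a single space or by
        # " <anything> ", exactly as in ( | .*? ) above.
        body = "( | .*? )".join(re.escape(c) for c in perm)
        alternatives.append("(^| )" + body + "( |$)")
    return "|".join(alternatives)


pattern = permutation_regex(["avs", "jkl"])
print(pattern)
# (^| )avs( | .*? )jkl( |$)|(^| )jkl( | .*? )avs( |$)

# The number of alternatives is n!, which is why this does not scale.
print(len(list(permutations(["a", "b", "c", "d", "e"]))), math.factorial(5))
# 120 120
```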
Option 5:
You could declare a nested table collection:
CREATE TYPE string_list AS TABLE OF VARCHAR2(20);
Then split the string (again, you do not need slow regular expressions) and then compare it to a nested table:
WITH bounds (rid, code, spos, epos) AS (
SELECT ROWID, code, 1, INSTR(code, ' ', 1)
FROM table_name
UNION ALL
SELECT rid, code, epos + 1, INSTR(code, ' ', epos + 1)
FROM bounds
WHERE epos > 0
)
SEARCH DEPTH FIRST BY code SET order_rn
SELECT MAX(code) AS code
FROM bounds
GROUP BY rid
HAVING string_list('avs', 'jkl') SUBMULTISET OF CAST(
COLLECT(
CAST(
CASE epos
WHEN 0
THEN SUBSTR(code, spos)
ELSE SUBSTR(code, spos, epos - spos)
END
AS VARCHAR2(20)
)
)
AS string_list
);
Which outputs the same as above.
Depending on the client application you are using, you can pass the entire string_list('avs', 'jkl') collection in as a single bind variable populated from an array. Java (and some languages built on top of Java) can do this using a JDBC driver. C# cannot do it directly, but you can pass an associative array and convert it to a nested table collection with a helper function.
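SUBMULTISET OF compares multisets, so duplicates count: every element on the left must occur at least as often on the right. Here is a rough Python analogy using collections.Counter. It illustrates the semantics only and is not how Oracle evaluates the condition. The helper name is_submultiset is hypothetical.

```python
from collections import Counter


def is_submultiset(needed, have):
    # Counter subtraction drops non-positive counts, so an empty result
    # means nothing in `needed` is missing from `have`.
    return not (Counter(needed) - Counter(have))


print(is_submultiset(["avs", "jkl"], "avs cde jkl".split(" ")))  # True
print(is_submultiset(["avs", "jkl"], "avs cde".split(" ")))      # False
print(is_submultiset(["avs", "avs"], "avs cde".split(" ")))      # False
```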

I'm not good at regex magic, but see if something like this helps.
This is a table that contains those codes:
SQL> select * from codes;
ID CODE
---------- -----------
1 avs cde jkl
2 xyz avs
The query splits every code into rows (the t_split CTE) and does the same for the entered parameter value, par_string (the p_split CTE). Why? So that both act as rows in a table and you can apply the MINUS set operator: if MINUS returns nothing, there's a match; otherwise it's a mismatch.
SQL> with
  -- split code to rows
  t_split as
    (select id,
            code original_code,
            regexp_substr(code, '[^ ]+', 1, column_value) code
     from codes cross join
       table(cast(multiset(select level from dual
                           connect by level <= regexp_count(code, ' ') + 1
                          ) as sys.odcinumberlist))
     where id = &&par_id
    ),
  -- split parameter to rows
  p_split as
    (select regexp_substr('&&par_string', '[^ ]+', 1, level) code
     from dual
     connect by level <= regexp_count('&&par_string', ' ') + 1
    )
  --
  -- if all of the parameter's "pieces" are contained in the CODE value,
  -- p_split MINUS t_split returns nothing, so there's a match
  select distinct t.original_code,
         '&&par_string' par_string,
         case when (select count(*)
                    from (select code from p_split
                          minus
                          select code from t_split
                         )
                   ) = 0 then 'Match'
              else 'Mismatch'
         end result
  from t_split t
  where t.id = &&par_id;
Enter value for par_id: 1
Enter value for par_string: jkl avs cde
ORIGINAL_CO PAR_STRING RESULT
----------- ----------- --------
avs cde jkl jkl avs cde Match
SQL> undefine par_string
SQL> /
Enter value for par_string: avs jkl www
ORIGINAL_CO PAR_STRING RESULT
----------- ----------- --------
avs cde jkl avs jkl www Mismatch
SQL>
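Depending on the tool you use (this is SQL*Plus), you might need to replace the && substitution variables with bind variables such as :par_id and :par_string (the quoted '&&par_string' becomes :par_string without quotes), or convert this code into a function.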
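The set-difference idea behind this query can be sketched in Python, where MINUS corresponds to set subtraction. This mirrors the logic only, and the function name compare is hypothetical. To answer the question ("are all entered codes in the string?"), the difference that must be empty is the parameter pieces minus the code pieces. Subtracting in the other direction instead checks that every piece of the stored code appears in the parameter, so 'avs jkl' would be reported as a mismatch.

```python
def compare(original_code, par_string):
    t_split = set(original_code.split(" "))   # code split to "rows"
    p_split = set(par_string.split(" "))      # parameter split to "rows"
    # p_split MINUS t_split: the parameter pieces not found in the code.
    missing = p_split - t_split
    return "Match" if not missing else "Mismatch"


print(compare("avs cde jkl", "jkl avs cde"))  # Match
print(compare("avs cde jkl", "avs jkl www"))  # Mismatch
print(compare("avs cde jkl", "avs jkl"))      # Match
```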

Related

Check if a string has the format "at most 2 letters followed by numbers" with REGEXP_LIKE in Oracle

Could you please tell me the REGEXP_LIKE expression to check whether a string begins with one or two letters (at most) followed by numbers? For example, 'A25' or 'AB567'.
Thanks
How about
SQL> with test (col) as
  (select 'A25' from dual union all
   select 'AB567' from dual union all
   select 'A12A532A' from dual union all
   select 'ABC123' from dual union all
   select '12XYZ34' from dual
  )
select col,
       case when regexp_like(col, '^[[:alpha:]]{1,2}[[:digit:]]+$') then 'OK'
            else 'Wrong'
       end result
from test;
COL RESULT
---------- ----------
A25 OK
AB567 OK
A12A532A Wrong
ABC123 Wrong
12XYZ34 Wrong
SQL>
The regular expression would be ^[A-Z]{1,2}[0-9]+$.
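The same check can be tried quickly in Python. Python's re module has no POSIX classes like [[:alpha:]], so this sketch uses the bracket ranges from the second answer (uppercase ASCII letters only), and re.fullmatch takes the place of the ^...$ anchors. The helper name check is hypothetical.

```python
import re

# [A-Z] and [0-9] stand in for [[:alpha:]] and [[:digit:]].
FORMAT = re.compile(r"[A-Z]{1,2}[0-9]+")


def check(value):
    # fullmatch requires the whole value to match, like ^...$
    return "OK" if FORMAT.fullmatch(value) else "Wrong"


for value in ["A25", "AB567", "A12A532A", "ABC123", "12XYZ34"]:
    print(value, check(value))
```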
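This matches a string that contains only letters and digits, starts with a letter, and never has more than 2 letters in a row. Note that it also accepts several letter/digit groups, such as A1B2, so it is broader than the format in the question.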
^(([A-Za-z]){1,2}([0-9])+)*\2{0,2}$

Extract date along with AM or PM in Oracle

I want to get the time pattern along with AM or PM from the given string Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv
I tried the following:
Select regexp_substr(filename,'\d{4}',1,3)
From
(Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' filename from dual);
which only gives me the last number, e.g. 0329, but I need 0329PM.
This form of REGEXP_SUBSTR() gets what you need in one call. It returns the first capture group: one or more digits followed by A or P and then M, located between an underscore and a literal period.
with tbl(filename) as (
Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv'
from dual
)
select regexp_substr(filename, '_(\d+[AP]M)\.', 1, 1, NULL, 1)
From tbl;
To tighten up the match, you could also include the extension, and make the match case-insensitive so that values such as 0329pm or .CSV are found as well:
select regexp_substr(filename, '_(\d+[AP]M)\.csv', 1, 1, 'i', 1)
From tbl;
Note if a match is not found NULL will be returned.
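For comparison, here is a Python sketch of the same extraction. Returning group 1 of re.search roughly corresponds to the subexpression argument of REGEXP_SUBSTR, re.IGNORECASE to the 'i' match parameter, and None to Oracle's NULL. The helper name extract_time is hypothetical.

```python
import re


def extract_time(filename):
    # Group 1 captures the digits plus AM/PM between "_" and ".csv".
    match = re.search(r"_(\d+[AP]M)\.csv", filename, re.IGNORECASE)
    return match.group(1) if match else None


print(extract_time("Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv"))  # 0329PM
```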
Nested SUBSTR is one option (if the data always looks like this; you didn't say it doesn't):
SQL> with test (col) as
  (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
select substr(substr(col, -10), 1, 6) result from test
/
RESULT
------
0329PM
SQL>
the inner substr returns the last 10 characters (0329PM.csv)
the outer substr returns the first 6 characters out of it (0329PM)
Or, using regular expressions:
SQL> with test (col) as
2 (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
3 select regexp_substr(translate(col, '_.', '  '), '\S+',
4 1,
5 regexp_count(translate(col, '_.', '  '), '\S+') - 1
6 ) result
7 from test;
RESULT
------
0329PM
SQL>
line #3: TRANSLATE replaces underscores and dots with spaces
line #4: start searching from the beginning of the string
line #5: return the second-to-last substring
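The translate-then-tokenize approach can be sketched in Python as well. One difference is worth knowing. Oracle's TRANSLATE silently removes any from-string character that has no counterpart in the to-string, which is why the to-string needs one space per character. Python's str.maketrans instead raises ValueError when the two strings differ in length.

```python
filename = "Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv"

# Map "_" and "." to spaces (one replacement character per source
# character), then split on whitespace and take the token before the last.
tokens = filename.translate(str.maketrans("_.", "  ")).split()
print(tokens[-2])  # 0329PM
```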

Oracle SQL: Searching for $ in Regexp

I want to search my response field to find any instance where a dollar sign $ is not followed by a numerical value. 1 or 2 spaces before a numerical value is ok, but there shouldn't be any text values following $.
I have the following query below:
SELECT * FROM RESPONSES
WHERE (regexp_like(response, '(\$)(\s){1,2}[^0-9]'));
This should identify responses that contain "$ NA". Most responses will contain a combination of $ followed by numeric values and $ followed by text values.
I've tried a couple of variations of the above query without any success. Any thoughts?
You can use this:
SELECT *
FROM responses
WHERE
SIGN(REGEXP_INSTR (RESPONSE, '(\$)(\s){2}[^0-9]'))=0
Include the space character in your negated character set
Since a space character qualifies as a non-digit character, a second space can give a "false positive" for the data set you want to find.
SCOTT#db>WITH smple AS (
    SELECT
        '23 dkf $ 1' response
    FROM
        dual
    UNION ALL
    SELECT
        '23 dkfg gjg $ 4'
    FROM
        dual
    UNION ALL
    SELECT
        '$ NA'
    FROM
        dual
) SELECT
    s.*
  FROM
    smple s
  WHERE
    ( REGEXP_LIKE ( s.response,
                    '\$\s{1,2}[^ 0-9]+' ) );
RESPONSE
----------
$ NA
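The false positive described above can be reproduced with Python's re module, which handles \s and negated bracket sets much like Oracle for this input. The extra sample "$  4" (two spaces before a digit) is a hypothetical value added only to show the difference between the two patterns.

```python
import re

ORIGINAL = r"(\$)(\s){1,2}[^0-9]"   # pattern from the question
FIXED = r"\$\s{1,2}[^ 0-9]+"        # space added to the negated set

samples = ["23 dkf $ 1", "23 dkfg gjg $ 4", "$ NA", "$  4"]
for response in samples:
    # With two spaces, the original pattern can treat the second space
    # as its "non-digit" character; the fixed pattern cannot.
    print(repr(response),
          "original:", bool(re.search(ORIGINAL, response)),
          "fixed:", bool(re.search(FIXED, response)))
```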
You may use:
select * from responses where regexp_like(response, '^\$\s')
to get values that begin with a $ sign immediately followed by at least one whitespace character.
with t as
(
select '$ 524' as a from dual union all
select '$524' as a from dual union all
select '$ s67e' as a from dual union all
select '# 67e' as a from dual union all
select '$s67e' as a from dual union all
select '$#67e' as a from dual
)
select * from t where regexp_like(a, '^\$\s')
A
----
$ 524
$ s67e
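A quick Python check of the same anchored pattern on the sample values is below. Note that ^\$\s only tests the first two characters, so it does not look at what follows the whitespace.

```python
import re

values = ["$ 524", "$524", "$ s67e", "# 67e", "$s67e", "$#67e"]
# ^\$\s : a dollar sign at the start, immediately followed by whitespace.
result = [v for v in values if re.search(r"^\$\s", v)]
print(result)  # ['$ 524', '$ s67e']
```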

Oracle REGEXP_REPLACE replace space in middle with empty string

I'm trying to use the Oracle REGEXP_REPLACE function to replace a whitespace (which is in the middle of a string) with an empty string.
One of my columns contains strings like the following one.
[alphanumeric][space][digits][space][alpha] (eg. R4SX 315 GFX)
Now, I need to replace ONLY the second whitespace (the whitespace after the digits) with an empty string (i.e. R4SX 315 GFX --> R4SX 315GFX)
To achieve this, I tried the following code:
SELECT REGEXP_REPLACE(
'R4SX 315 GFX',
'([:alphanum:])\s(\d)\s([:alpha:])',
'\1 \2\3') "REPLACED"
FROM dual;
However, the result that I get is the same as my input (i.e. R4SX 315 GFX).
Can someone please tell me what I have done wrong and please point me in the right direction.
Thanks in advance.
[:alphanum:] is incorrect: the alphanumeric character class is [[:alnum:]].
You could use the following pattern in the REGEXP_REPLACE:
([[:alnum:]]{4})([[:space:]]{1})([[:digit:]]{3})([[:space:]]{1})([[:alpha:]]{3})
Using REGEXP
SQL> SELECT REGEXP_REPLACE('R4SX 315 GFX',
       '([[:alnum:]]{4})([[:space:]]{1})([[:digit:]]{3})([[:space:]]{1})([[:alpha:]]{3})',
       '\1\2\3\5')
     FROM DUAL;
REGEXP_REPL
-----------
R4SX 315GFX
SQL>
If you are not sure about the number of characters in each expression of the pattern, then you could do:
SQL> SELECT REGEXP_REPLACE('R4SX 315 GFX',
       '([[:alnum:]]+[[:blank:]]+[[:digit:]]+)[[:blank:]]+([[:alpha:]]+)',
       '\1\2')
     FROM dual;
REGEXP_REPL
-----------
R4SX 315GFX
SQL>
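The same two-group replacement can be sketched with Python's re.sub. Python has no POSIX classes, so this sketch assumes ASCII input and uses [A-Za-z0-9], [ \t], [0-9] and [A-Za-z] to approximate [[:alnum:]], [[:blank:]], [[:digit:]] and [[:alpha:]]. The second sample value is hypothetical.

```python
import re

PATTERN = r"([A-Za-z0-9]+[ \t]+[0-9]+)[ \t]+([A-Za-z]+)"

# \1\2 glues the two captured parts together, dropping the blank between them.
print(re.sub(PATTERN, r"\1\2", "R4SX 315 GFX"))  # R4SX 315GFX
print(re.sub(PATTERN, r"\1\2", "AB12 7 XY"))     # AB12 7XY
```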
Using SUBSTR and INSTR
The same can be done with SUBSTR and INSTR, which would be less resource-consuming than REGEXP.
SQL> WITH DATA AS
  ( SELECT 'R4SX 315 GFX' str FROM DUAL
  )
SELECT SUBSTR(str, 1, instr(str, ' ', 1, 2) -1)
       ||SUBSTR(str, instr(str, ' ', 1, 2) +1, LENGTH(str)-instr(str, ' ', 1, 2)) new_str
FROM DATA;
NEW_STR
-----------
R4SX 315GFX
SQL>
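The SUBSTR/INSTR idea translates directly to plain string slicing. Here is a minimal Python sketch, in which str.find plays the role of INSTR but returns -1 rather than 0 when nothing is found. The helper name drop_second_space is hypothetical.

```python
def drop_second_space(s):
    first = s.find(" ")
    second = s.find(" ", first + 1) if first != -1 else -1
    if second == -1:
        return s  # fewer than two spaces: leave the value as it is
    # Keep everything before the second space and everything after it.
    return s[:second] + s[second + 1:]


print(drop_second_space("R4SX 315 GFX"))  # R4SX 315GFX
```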
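Your regex contains an invalid class, alphanum (the correct name is alnum). Also, POSIX classes must be used inside bracket expressions, as in [[:alnum:]], and each of your groups matches only a single character, so you need quantifiers such as +. Instead of \s you can also use the [:blank:] class. More details on the regular expression syntax can be found in the Oracle documentation.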
I recommend using
SELECT REGEXP_REPLACE(
'R4SX 315 GFX',
'([[:alnum:]]+[[:blank:]]+[[:digit:]]+)[[:blank:]]+([[:alpha:]]+)'
, '\1\2') "REGEXP_REPLACE"
FROM dual;
This way you use just 2 capturing groups; the fewer there are, the better for performance. More details on the REGEXP_REPLACE function can be found in the Oracle documentation.

how to find consecutive repetitive characters in oracle column

Is there any way to find consecutively repeated patterns like 1414 or 200200 in a VARCHAR column of an Oracle table?
How can we achieve it with regexp?
I'm failing to achieve it with regexp.
In my example I can get a consecutive repetition of a single digit, but not of a longer pattern.
select regexp_substr('4120066' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- getting output as expected
select regexp_substr('6360360' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- i want to select this also as i have 360 followed by 360
You should be able to use something like this:
[...] WHERE REGEXP_LIKE(field, '(\d+?)\1')
if you're looking for any repetition of digits anywhere in the value, or:
[...] WHERE REGEXP_LIKE(field, '^(\d+?)\1$')
if you want to check the whole string in the field.
\d+? lazily matches one or more digits.
( ... ) stores (captures) those digits.
\1 refers to the captured digits.
Note: Change \d to . if you are not checking digits only.
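The backreference patterns can be checked in Python, whose re module supports both lazy quantifiers and \1. Here re.fullmatch takes the place of the ^...$ anchors. The helper names find_repeat and is_doubled are hypothetical. Note that in 4120066 the first repetition found is the single-digit run 00.

```python
import re


def find_repeat(value):
    # Like '(\d+?)\1': the first "X followed by X" run of digits, or None.
    match = re.search(r"(\d+?)\1", value)
    return match.group(0) if match else None


def is_doubled(value):
    # Like '^(\d+?)\1$': the whole value is some digits repeated twice.
    return re.fullmatch(r"(\d+?)\1", value) is not None


for value in ["4120066", "6360360", "1414", "200200", "1234"]:
    print(value, find_repeat(value), is_doubled(value))
```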
Try this:
SQL> WITH t AS
  (SELECT '1414,200200,11,12,33,33,1234,1234' test_string
   FROM DUAL)
SELECT LTRIM (SYS_CONNECT_BY_PATH (test_string, ','), ',') names
FROM (SELECT ROW_NUMBER () OVER (ORDER BY test_string) rno, test_string
      FROM (SELECT DISTINCT REGEXP_SUBSTR (test_string,
                                            '[^,]+',
                                            1,
                                            LEVEL
                                           ) test_string
            FROM t
            CONNECT BY LEVEL <=
                         LENGTH (test_string)
                       - LENGTH (REPLACE (test_string,
                                          ',',
                                          NULL
                                         )
                                )
                       + 1))
WHERE CONNECT_BY_ISLEAF = 1 AND ROWNUM = 1
CONNECT BY rno = PRIOR rno + 1;
NAMES
--------------------------------------------------------------------------------
11,12,1234,1414,200200,33
None of the delimited strings repeat!
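What this query returns can be sketched in a few lines of Python. It splits on commas, keeps the distinct values and joins them back in string order, as ORDER BY would sort VARCHAR2 values. In other words, it removes duplicate values across the list rather than detecting a repeated pattern inside a single value.

```python
test_string = "1414,200200,11,12,33,33,1234,1234"

# Split on commas, drop duplicates, sort as strings and join back together.
names = ",".join(sorted(set(test_string.split(","))))
print(names)  # 11,12,1234,1414,200200,33
```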