Oracle SQL Regex not returning expected results

Oracle SQL Regex not returning expected results - regex

I am using a regex that works perfectly in Java/PHP/regex testers.
\d(?:[()\s#-]*\d){3,}
Examples: https://regex101.com/r/oH6jV0/1
However, trying to use the same regex in Oracle SQL is returning no results. Take for example:
select *
from
(select column_value str from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3')))
where regexp_like(str, '\d(?:[()\s#-]*\d){3,}');
This returns no rows. Why does this act so differently? I even used a regex tester that does POSIX ERE, but that still works.

Oracle does not support non-capturing groups (?:). You will need to use a capturing group instead.
It also doesn't like the perl-style whitespace meta-character \s match inside a character class [] (it will match the characters \ and s instead of whitespace). You will need to use the POSIX expression [:space:] instead.
SQL Fiddle
Oracle 11g R2 Schema Setup:
Query 1:
select *
from (
select column_value str
from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3'))
)
where regexp_like(str, '\d([()[:space:]#-]*\d){3,}')
Results:
| STR |
|--------|
| 1234 |
| 12345 |
| 12 135 |

Related

Replacing multiple special characters in oracle

I have a requirement in oracle to replace the special characters at first and last position of the column data.
Requirement: only [][.,$'*&!%^{}-?] and alphanumberic characters are allowed to stay in the address data and rest of the characters has to be replaced with space.I have tried in below way in different probabilities but its not working as expected. Please help me in resolving this.
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^[[][.,$'\*&!%^{}-?\]]]|[^[[][.,$'\*&!%^{}-?\]]]$'
) AS simplified_emp_address
FROM table_name

As per the regular expression operators and metasymbols documentation:
Put ] as the first character of the (negated) character group;
- as the last; and
Do not put . immediately after [ or it can be matched as the start of a coalition element [..] if there is a second . later in the expression.
Also:
Double up the single quote (to escape it, so it does not terminate the string literal); and
Include the non-special characters a-zA-Z0-9 in the capture group too otherwise they will be matched.
Which gives you the regular expression:
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^][,.$''\*&!%^{}?a-zA-Z0-9-]|[^][,.$''\*&!%^{}?a-zA-Z0-9-]$'
) AS simplified_emp_address
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (emp_address) AS
SELECT '"test1"' FROM DUAL UNION ALL
SELECT '$test2$' FROM DUAL UNION ALL
SELECT '[test3]' FROM DUAL UNION ALL
SELECT 'test4' FROM DUAL UNION ALL
SELECT '|test5|' FROM DUAL;
Outputs:
EMP_ADDRESS
SIMPLIFIED_EMP_ADDRESS
"test1"
test1
$test2$
$test2$
[test3]
[test3]
test4
test4
|test5|
test5
db<>fiddle here

You do not need regular expressions, because they will have cumbersome escape sequences. Use substrings and translate function:
with a as (
select
'some [data ]' as val
from dual
union all
select '{test $' from dual
union all
select 'clean $%&* value' from dual
union all
select 's' from dual
)
select
translate(substr(val, 1, 1), q'{ [][.,$'*&!%^{}-?]}', ' ')
|| substr(val, 2, lengthc(val) - 2)
|| case
when lengthc(val) > 1
then translate(substr(val, -1), q'{ [][.,$'*&!%^{}-?]}', ' ')
end
as value_replaced
from a
| VALUE_REPLACED |
| :--------------- |
| some [data |
| test |
| clean $%&* value |
| s |
db<>fiddle here

Match a word in a list of words regex

I want the user to only be able to enter the values in the following regex:
^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$
My problem is that words like : PP AA QQ are accepted.
I am not sure how i can prevent that ? Thank you.
Site i use to verify the expression : https://regex101.com/

In most RegExp flavors, square brackets [] denotate character classes; that is, a set of individual tokens that can be matched in a specific position.
Because P is included in this character class (along with a quantifier of {2}) PP is matched.
Instead, you seem to want a group with alternatives; for that, you'd use parenthesis () (while also eliminating the whitespace, something it doesn't appear was intentional on your part):
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$
RegEx101
This matches things like ABBC, ABAB, NLBC, etc.

Regex Oracle not matching as expected

I need to match and replace string like VA123 - so two letters and 3 numbers, but this expression is not working as intended. Any idea where I am going off?
SELECT REGEXP_REPLACE ('test VA123', '^\[A-Z]{2}[0-9]{3}$', 'test')
FROM dual;
I want the output in this case to say test test

It you want white space (or start of a string) before the matched string then you can use:
SELECT REGEXP_REPLACE ('test VA123', '(^|\s)[A-Z]{2}[0-9]{3}$', '\1test')
AS replaced_value
FROM dual;
| REPLACED_VALUE |
| :------------- |
| test test |
db<>fiddle here

How to write fuzzy multiple substring matching when using RLIKE in Hive

For example:
df.select('category').show()
+---------------------------+
| category|
+---------------------------+
| money,insurance|
| life, housework|
| game,FPS,network|
| game,fight,jump|
| hotel|
| trip,hotel|
| null|
I want to use RLIKE to write a regex expression to fuzzy match one of substrings list, ['money', 'life'].
-- This is an exact match
SELECT *
FROM tb_name
WHERE col_name RLIKE '(money|life)'
-- This is a fuzzy match
SELECT *
FROM tb_name
WHERE col_name RLIKE '*.(money|life)'
BUT there is error in ast tree in the fuzzy match code snippet.
06-11 16:59:17-fatal filter ast tree
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TAB tb_name))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR "hdfs://XXXX/XX")) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (RLIKE (TOK_TABLE_OR_COL col_name ) '*.(money|life)')) (TOK_LIMIT 2000)))
06-11 16:59:17-fatal Filter feature: .TOK_TAB \S tdw_inter_db.*|.TOK_(CUBE|ROLLUP) .
So I can't see anything wrong with the fuzzy match code snippet.
So could anyone help me?
Thanks in advances.

'(?i)money|life' regexp will match strings containing any of money, life, case insensitive - (?i)

Postgres regex to delimit multiple optional matches

Suppose a text field needs to be delimited in PostgreSQL. It is formatted as 'abcd' where each variable can be any one of: 1.4, 3, 5, 10, 15, 20 or N/A. Here is a query with some examples, followed by their expected results:
WITH example AS(
SELECT '10N/AN/AN/A' AS bw
UNION SELECT '1010N/AN/A'
UNION SELECT '101020N/A'
UNION SELECT '35N/A1.4'
UNION SELECT '1010N/A10'
UNION SELECT '105N/AN/A'
UNION SELECT '1.43N/A20'
)
SELECT
bw
,regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(bw, '(1\.4)', E'\\&|', 'g')
, '(3)', E'\\&|', 'g')
, '(5)', E'\\&|', 'g')
, '(10)', E'\\&|', 'g')
, '(15)', E'\\&|', 'g')
, '(20)', E'\\&|', 'g')
, '(N/A)', E'\\&|', 'g')
FROM
example
Results:
bw:text, regexp_replace:text
'1010N/AN/A', '10|10|N/A|N/A|'
'1010N/A10', '10|10|N/A|10|'
'35N/A1.4', '3|5|N/A|1.4|'
'1.43N/A20', '1.4|3|N/A|20|'
'105N/AN/A', '10|5|N/A|N/A|'
'101020N/A', '10|10|20|N/A|'
'10N/AN/AN/A','10|N/A|N/A|N/A|'
I'm not worried about the trailing pipe '|' since I can deal with it. This gets me what I want, but I'm concerned I could be doing it more succinctly. I experimented with putting each of the capture groups in a single regexp_replace statement while scouring through the documentation, but I was unable to get these results.
Can this be achieved within a single regexp_replace statement?

You may build a (1\.4|3|5|1[50]|20|N/A) capturing group with alternation operators separating the alternatives and replace with \1|:
select regexp_replace('35N/A1.4', '(1\.4|3|5|1[50]|20|N/A)', '\1|','g');
-- 35|N/A|1.4|
See the online demo
Details
( - starting the capturing group construct
1\.4 - 1.4 substring (. must be escaped in order to be parsed as a literal dot, else, it matches any char)
| - or
3 - a 3 char
| - or
5 - a 5 char
| - or
1[50] - 1 followed with either 5 or 0 (the [...] is called a bracket expression where you may specify chars, char ranges or even character classes)
| - or
20 - a 20 substring
| - or
N/A - a N/A substring
) - end of the capturing group.
The \1 in the replacement pattern is a numbered replacement backreference (also called a (group) placeholder) that references the value captured into Group 1.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Oracle SQL Regex not returning expected results - regex

Related

Replacing multiple special characters in oracle

Match a word in a list of words regex

Regex Oracle not matching as expected

How to write fuzzy multiple substring matching when using RLIKE in Hive

Postgres regex to delimit multiple optional matches

Categories

Resources