Replacing multiple special characters in oracle - regex

I have a requirement in Oracle to replace the special characters at the first and last position of the column data.
Requirement: only [][.,$'*&!%^{}-?] and alphanumeric characters are allowed to stay in the address data; all other characters have to be replaced with a space. I have tried the query below in several variations, but it is not working as expected. Please help me resolve this.
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^[[][.,$'\*&!%^{}-?\]]]|[^[[][.,$'\*&!%^{}-?\]]]$'
) AS simplified_emp_address
FROM table_name

As per the regular expression operators and metasymbols documentation:
Put ] as the first character of the (negated) character group;
Put - as the last character; and
Do not put . immediately after [ or it can be matched as the start of a collating element [..] if there is a second . later in the expression.
Also:
Double up the single quote (to escape it, so it does not terminate the string literal); and
Include the non-special characters a-zA-Z0-9 in the negated character group too, otherwise they will be matched.
Which gives you the regular expression:
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^][,.$''*&!%^{}?a-zA-Z0-9-]|[^][,.$''*&!%^{}?a-zA-Z0-9-]$'
) AS simplified_emp_address
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (emp_address) AS
SELECT '"test1"' FROM DUAL UNION ALL
SELECT '$test2$' FROM DUAL UNION ALL
SELECT '[test3]' FROM DUAL UNION ALL
SELECT 'test4' FROM DUAL UNION ALL
SELECT '|test5|' FROM DUAL;
Outputs:
EMP_ADDRESS  SIMPLIFIED_EMP_ADDRESS
-----------  ----------------------
"test1"      test1
$test2$      $test2$
[test3]      [test3]
test4        test4
|test5|      test5
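The question asks for disallowed characters to be replaced with a space, while the query above removes them because REGEXP_REPLACE is called without a replacement string. A minimal sketch of the space-replacing variant, assuming the same table_name sample data (the column alias spaced_emp_address is made up for illustration):

```sql
-- Same negated character group as above; the third argument supplies
-- the replacement, so a disallowed first/last character becomes a space
-- instead of being deleted.
SELECT emp_address,
       REGEXP_REPLACE(
         emp_address,
         '^[^][,.$''*&!%^{}?a-zA-Z0-9-]|[^][,.$''*&!%^{}?a-zA-Z0-9-]$',
         ' '
       ) AS spaced_emp_address
FROM   table_name;
```

With the default occurrence of 0, REGEXP_REPLACE replaces every match, so both the leading and the trailing character are handled in one call.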

You do not need regular expressions here, because the pattern needs cumbersome escape sequences. Use SUBSTR and TRANSLATE instead. The query below strips the listed characters when they appear in the first or last position:
with a as (
select
'some [data ]' as val
from dual
union all
select '{test $' from dual
union all
select 'clean $%&* value' from dual
union all
select 's' from dual
)
select
translate(substr(val, 1, 1), q'{ [][.,$'*&!%^{}-?]}', ' ')
|| substr(val, 2, lengthc(val) - 2)
|| case
when lengthc(val) > 1
then translate(substr(val, -1), q'{ [][.,$'*&!%^{}-?]}', ' ')
end
as value_replaced
from a
| VALUE_REPLACED |
| :--------------- |
| some [data |
| test |
| clean $%&* value |
| s |
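The leading space in both TRANSLATE arguments above is deliberate. TRANSLATE deletes any character of the second argument that has no counterpart in the third, but it returns NULL when the third argument is an empty string, so one dummy character is mapped to itself. A small sketch of that behaviour on a made-up literal:

```sql
-- ' ' maps to ' ' (unchanged); '[' and ']' have no counterpart in the
-- third argument and are therefore removed.
SELECT TRANSLATE('[abc]', ' []', ' ') AS stripped  -- 'abc'
FROM   DUAL;
```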

Related

Capture the last group/word

I want to capture the last word from the matched regexp. Here’s my query.
SELECT REGEXP_SUBSTR(
'The;quick;brown;fox;jumps;over;the;lazy;dog','^([^;]*;){5}([^;]*)') REF
FROM
DUAL
Desired result: over
Actual Result: The;quick;brown;fox;jumps;over
I can nest a second regex, but that will affect performance if there are millions of records…
Nested Regex
SELECT REGEXP_SUBSTR(REGEXP_SUBSTR(
'The;quick;brown;fox;jumps;over;the;lazy;dog',
'^([^;]*;){5}([^;]*)'),'[^;]*$') REF
FROM
DUAL
Don't use regular expressions if you are worried about performance (as they are slow), just use normal string functions:
SELECT SUBSTR(
value,
INSTR(value, ';', 1, 5) + 1,
INSTR(value, ';', 1, 6) - INSTR(value, ';', 1, 5) - 1
) AS DATA
FROM table_name;
If you did want to use a regular expression then just extract the value of a capturing group:
SELECT REGEXP_SUBSTR(
         value,
         '(.*?);',
         1,     -- Start from
         6,     -- Occurrence
         NULL,  -- Match parameter (none)
         1      -- Capturing group to extract
       ) AS data
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'The;quick;brown;fox;jumps;over;the;lazy;dog' FROM DUAL;
Both output:
DATA
over
If you want to use REGEXP_SUBSTR then use its fourth parameter for the occurrence you are looking for:
SELECT REGEXP_SUBSTR(
'The;quick;brown;fox;jumps;over;the;lazy;dog',
'[^;]+',
1,
6) AS ref
FROM dual;
Docs: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
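One caveat with the '[^;]+' pattern is that it skips empty fields, so the occurrence number no longer lines up with the field position when the data contains consecutive delimiters. A commonly used alternative is to match a lazy group followed by the delimiter or the end of the string. The sketch below uses a made-up input with an empty second field to show the difference:

```sql
-- Hypothetical input with an empty second field.
WITH t (str) AS (
  SELECT 'a;;c' FROM DUAL
)
SELECT REGEXP_SUBSTR(str, '[^;]+', 1, 2)               AS skips_empty,  -- 'c'
       REGEXP_SUBSTR(str, '(.*?)(;|$)', 1, 2, NULL, 1) AS keeps_empty   -- NULL
FROM   t;
```

If the data never contains empty fields, the simpler '[^;]+' form is fine.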

How can I extract a word from a string using Oracle regexp_substr?

I'm trying to extract a word from a string using Oracle 12c regexp_substr, but I have had no luck understanding how it works; there is too much information on the net and I get confused.
So I want to extract tmp* tables from a string:
query_str:
select
column1 c1,
column2 c2
from tmp_123 foo1, -- some comments here
TAB1_123 TAB1
where 1=1
;
Trying to use this but no "luck":
select regexp_substr(query_str, 'TMP_[A-z]+', 1, 1, 'i');
I want to extract until the space and the tmp table name can have numbers in the middle like this: tmp_123.
Any suggestion?
You can use either of the two:
select regexp_substr(query_str, 'TMP_\w+', 1, 1, 'i');
select regexp_substr(query_str, 'TMP_\S+', 1, 1, 'i');
The \w+ will match alphanumeric or underscore chars after TMP_ and \S+ will match one or more non-whitespace chars.
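The two patterns differ when the table name is followed directly by punctuation. A short sketch on a made-up string where a comma follows the name:

```sql
-- \w+ stops at the comma; \S+ keeps going until the next whitespace.
SELECT REGEXP_SUBSTR('from tmp_123, tab1', 'TMP_\w+', 1, 1, 'i') AS word_match,      -- 'tmp_123'
       REGEXP_SUBSTR('from tmp_123, tab1', 'TMP_\S+', 1, 1, 'i') AS nonspace_match   -- 'tmp_123,'
FROM   DUAL;
```

For table names, \w+ is usually the safer choice.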
The major problem is that the SELECT statement shown is not valid in Oracle, where a FROM clause is required. Here's an example of how to make this work:
WITH cteData
AS (SELECT 'select' AS QUERY_STR FROM DUAL UNION ALL
SELECT 'column1 c1,' AS QUERY_STR FROM DUAL UNION ALL
SELECT 'column2 c2' AS QUERY_STR FROM DUAL UNION ALL
SELECT 'from tmp_123 foo1, -- some comments here' AS QUERY_STR FROM DUAL UNION ALL
SELECT 'TAB1_123 TAB1' AS QUERY_STR FROM DUAL UNION ALL
SELECT 'where 1=1' AS QUERY_STR FROM DUAL UNION ALL
SELECT ';' AS QUERY_STR FROM DUAL)
select regexp_substr(query_str, 'TMP_\w+', 1, 1, 'i') AS MATCH
FROM cteData
WHERE regexp_substr(query_str, 'TMP_\w+', 1, 1, 'i') IS NOT NULL
Here I've put your data line-for-line into a Common Table Expression (CTE) named "cteData", which the SELECT then uses as the source of its data. I've also changed [A-z]+ to \w+, because [A-z] does not include digits and so cannot match the 123 in tmp_123. This returns the match from the one line that contains it:
tmp_123

Postgres regex to delimit multiple optional matches

Suppose a text field needs to be delimited in PostgreSQL. It is formatted as 'abcd' where each variable can be any one of: 1.4, 3, 5, 10, 15, 20 or N/A. Here is a query with some examples, followed by their expected results:
WITH example AS(
SELECT '10N/AN/AN/A' AS bw
UNION SELECT '1010N/AN/A'
UNION SELECT '101020N/A'
UNION SELECT '35N/A1.4'
UNION SELECT '1010N/A10'
UNION SELECT '105N/AN/A'
UNION SELECT '1.43N/A20'
)
SELECT
bw
,regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(bw, '(1\.4)', E'\\&|', 'g')
, '(3)', E'\\&|', 'g')
, '(5)', E'\\&|', 'g')
, '(10)', E'\\&|', 'g')
, '(15)', E'\\&|', 'g')
, '(20)', E'\\&|', 'g')
, '(N/A)', E'\\&|', 'g')
FROM
example
Results:
bw:text, regexp_replace:text
'1010N/AN/A', '10|10|N/A|N/A|'
'1010N/A10', '10|10|N/A|10|'
'35N/A1.4', '3|5|N/A|1.4|'
'1.43N/A20', '1.4|3|N/A|20|'
'105N/AN/A', '10|5|N/A|N/A|'
'101020N/A', '10|10|20|N/A|'
'10N/AN/AN/A','10|N/A|N/A|N/A|'
I'm not worried about the trailing pipe '|' since I can deal with it. This gets me what I want, but I'm concerned I could be doing it more succinctly. I experimented with putting each of the capture groups in a single regexp_replace statement while scouring through the documentation, but I was unable to get these results.
Can this be achieved within a single regexp_replace statement?
You may build a (1\.4|3|5|1[50]|20|N/A) capturing group with alternation operators separating the alternatives and replace with \1|:
select regexp_replace('35N/A1.4', '(1\.4|3|5|1[50]|20|N/A)', '\1|','g');
-- 3|5|N/A|1.4|
Details
( - starting the capturing group construct
1\.4 - 1.4 substring (. must be escaped in order to be parsed as a literal dot, else, it matches any char)
| - or
3 - a 3 char
| - or
5 - a 5 char
| - or
1[50] - 1 followed with either 5 or 0 (the [...] is called a bracket expression where you may specify chars, char ranges or even character classes)
| - or
20 - a 20 substring
| - or
N/A - a N/A substring
) - end of the capturing group.
The \1 in the replacement pattern is a numbered replacement backreference (also called a (group) placeholder) that references the value captured into Group 1.
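Applied to a few of the question's sample rows, the single-call version can be sketched as below. This is PostgreSQL, it assumes the default standard_conforming_strings = on (so '\1|' needs no E prefix), and the rtrim call is an optional extra for dropping the trailing pipe:

```sql
WITH example(bw) AS (
  VALUES ('10N/AN/AN/A'), ('35N/A1.4'), ('1.43N/A20')
)
SELECT bw,
       -- One pass inserts '|' after every recognised value;
       -- rtrim then strips the final '|'.
       rtrim(
         regexp_replace(bw, '(1\.4|3|5|1[50]|20|N/A)', '\1|', 'g'),
         '|'
       ) AS delimited
FROM example;
```

For these three rows the delimited column comes out as 10|N/A|N/A|N/A, 3|5|N/A|1.4 and 1.4|3|N/A|20.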

Extract data outside of parentheses in oracle

I have this value: (203)1669
My requirement is to extract data which is outside of the parentheses.
I want to use Regular expression for this Oracle query.
Much appreciated!
You can use the Oracle REGEXP_REPLACE() function, and match the group which is outside the parentheses.
SELECT REGEXP_REPLACE(phone_number, '\([[:digit:]]+\)(.*)', '\1') AS newValue
FROM your_table
You can use the combination of SUBSTR and INSTR function.
select substr('(203)1669', instr('(203)1669',')')+1) from dual
This example uses REGEXP_SUBSTR() and the REGEX explicitly follows your spec of getting the 4 digits between the closing paren and the end of the line. If there could be a different number of digits, replace the {4} with a + for one or more digits:
SQL> with tbl(str) as (
select '(203)1669' from dual
)
select regexp_substr(str, '\)(\d{4})$', 1, 1, NULL, 1) nbr
from tbl;
NBR
----
1669
SQL>
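The variable-length variant mentioned above, with {4} replaced by +, might look like this. It is a sketch; the second sample value is borrowed from elsewhere in this thread:

```sql
WITH tbl(str) AS (
  SELECT '(203)1669'  FROM dual UNION ALL
  SELECT '(567)99084' FROM dual
)
-- \d+ accepts one or more digits between the closing paren
-- and the end of the string.
SELECT str,
       REGEXP_SUBSTR(str, '\)(\d+)$', 1, 1, NULL, 1) AS nbr
FROM   tbl;
```

This returns 1669 for the first row and 99084 for the second.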
For the pattern you mentioned, this should work.
select
rtrim(ltrim(substr(phone_number,instr(phone_number,')')+1,length(phone_number))))
as derived_phone_no
from
(select '(123)456' as phone_number from dual union all
select '(567)99084' as phone_number from dual)
Here I first get the position of ) and then take the substring from that position + 1 to the end of the string. The LTRIM and RTRIM calls strip any surrounding whitespace, which is good practice.

Oracle SQL Regex not returning expected results

I am using a regex that works perfectly in Java/PHP/regex testers.
\d(?:[()\s#-]*\d){3,}
Examples: https://regex101.com/r/oH6jV0/1
However, trying to use the same regex in Oracle SQL is returning no results. Take for example:
select *
from
(select column_value str from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3')))
where regexp_like(str, '\d(?:[()\s#-]*\d){3,}');
This returns no rows. Why does this act so differently? I even used a regex tester that does POSIX ERE, but that still works.
Oracle does not support non-capturing groups (?:). You will need to use a capturing group instead.
It also doesn't like the perl-style whitespace meta-character \s match inside a character class [] (it will match the characters \ and s instead of whitespace). You will need to use the POSIX expression [:space:] instead.
Query 1:
select *
from (
select column_value str
from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3'))
)
where regexp_like(str, '\d([()[:space:]#-]*\d){3,}')
Results:
| STR |
|--------|
| 1234 |
| 12345 |
| 12 135 |
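The difference between \s and [:space:] inside a bracket expression can be checked in isolation. The following is a minimal sketch on a made-up three-character string. Going by the behaviour described above, the first column should report no match, because [\s] is read as "a backslash or the letter s", and the second should report a match:

```sql
SELECT CASE WHEN REGEXP_LIKE('a b', 'a[\s]b')
            THEN 'match' ELSE 'no match' END AS perl_class,
       CASE WHEN REGEXP_LIKE('a b', 'a[[:space:]]b')
            THEN 'match' ELSE 'no match' END AS posix_class
FROM   DUAL;
```

Outside a bracket expression, \s is accepted as a whitespace shorthand, so only the bracketed use needs rewriting.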