Extract data outside of parentheses in oracle - regex

I have this value: (203)1669
I need to extract the data that is outside the parentheses.
I want to use a regular expression for this in an Oracle query.
Much appreciated!

You can use the Oracle REGEXP_REPLACE() function and capture the part that follows the parentheses in a group.
SELECT REGEXP_REPLACE(phone_number, '\([[:digit:]]+\)(.*)', '\1') AS newValue
FROM your_table

You can use a combination of the SUBSTR and INSTR functions.
select substr('(203)1669', instr('(203)1669',')')+1) from dual

This example uses REGEXP_SUBSTR() and the REGEX explicitly follows your spec of getting the 4 digits between the closing paren and the end of the line. If there could be a different number of digits, replace the {4} with a + for one or more digits:
SQL> with tbl(str) as (
select '(203)1669' from dual
)
select regexp_substr(str, '\)(\d{4})$', 1, 1, NULL, 1) nbr
from tbl;
NBR
----
1669
SQL>
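As a quick sanity check of the pattern outside the database, here is a minimal Python sketch. It assumes Python's re module treats \)(\d+)$ the same way Oracle does for this input, which holds for a simple pattern like this one but is not guaranteed in general. The helper name digits_after_paren is hypothetical.

```python
import re


def digits_after_paren(value):
    """Return the digits between the closing parenthesis and the end of the string, or None."""
    # \) matches the literal closing parenthesis; (\d+) captures one or more digits
    match = re.search(r'\)(\d+)$', value)
    return match.group(1) if match else None


print(digits_after_paren('(203)1669'))  # → 1669
```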

For the pattern you mentioned, this should work.
select
rtrim(ltrim(substr(phone_number,instr(phone_number,')')+1,length(phone_number))))
as derived_phone_no
from
(select '(123)456' as phone_number from dual union all
select '(567)99084' as phone_number from dual)
Here, INSTR first finds the position of ), and SUBSTR then returns everything from the next character to the end of the string. As a best practice, wrap the result in the trim functions to strip any surrounding whitespace.

Related

Regex: Get penultimate part of a "path"

I've got something like this:
>AAA>BBB>CCC>DDD
With
([^>]*$)
I get the last part DDD . How can I get the part before it, CCC?
Thanks!
You may use
REGEXP_SUBSTR('>AAA>BBB>CCC>DDD', '([^>]+)>[^>]+$', 1, 1, NULL, 1)
The ([^>]+)>[^>]+$ regex will match and capture into Group 1 any 1+ chars other than >, then will match > followed with any 1+ chars other than > up to the end of the string.
The last argument, 1, tells REGEXP_SUBSTR to return just the captured substring.
Another approach is to replace the whole string but keep the captured part of your choice:
REGEXP_REPLACE( '>AAA>BBB>CCC>DDD', '.*>([^>]+)>[^>]+$', '\1')
Here, .*> will match all the string up to the >, then ([^>]+) will capture any 1+ chars other than > and then >[^>]+$ will match and consume > and 1+ chars other than > at the end of the string.
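To see the capture-group logic in isolation, here is a small Python sketch of the first pattern. It is only an approximation of what REGEXP_SUBSTR does: it assumes Python's re module behaves the same way for this pattern, and penultimate_part is a hypothetical helper name.

```python
import re


def penultimate_part(path):
    """Return the second-to-last '>'-separated part of path, or None if there is none."""
    # ([^>]+) captures the penultimate part; >[^>]+$ consumes the last part
    match = re.search(r'([^>]+)>[^>]+$', path)
    return match.group(1) if match else None


print(penultimate_part('>AAA>BBB>CCC>DDD'))  # → CCC
```

A path with only one part, such as '>JJJJJ', yields None here.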
You don't need regular expressions for this - standard string functions suffice, and they will be much faster.
In the last example, notice that there is no "second-to-last" or penultimate part; so the output is NULL. That is indeed the correct answer in that case.
with
test_data (pth) as (
select '>AAA>BBB>CCC>DDD' from dual union all
select null from dual union all
select '>EEE>GGG' from dual union all
select '>JJJJJ' from dual
)
select pth,
substr(pth, instr(pth, '>', -1, 2) + 1,
instr(pth, '>', -1, 1) - instr(pth, '>', -1, 2) - 1) as stl
from test_data
;
PTH              STL
---------------- ----------------
>AAA>BBB>CCC>DDD CCC
>EEE>GGG         EEE
>JJJJJ
Here is a silly workaround for the lack of support for returning subexpressions in your version of Oracle. I offer this just as a curiosity; I proposed a better solution that doesn't use regular expressions at all in a separate Answer.
with
test_data (pth) as (
select '>AAA>BBB>CCC>DDD' from dual union all
select null from dual union all
select '>EEE>GGG' from dual union all
select '>JJJJJ' from dual
)
select pth,
regexp_substr(pth, '[^>]*', 1, nullif(2*regexp_count(pth, '>')-2, 0)) as stl
from test_data
;
PTH              STL
---------------- ----------------
>AAA>BBB>CCC>DDD CCC
>EEE>GGG         EEE
>JJJJJ

Oracle - Regular expression - Keep reducing a char to match to another column

I have two columns from two different tables, say columnA and columnB, that I am matching against each other. If they do not match, I want to remove the last character from columnB and try to match against columnA again. If it still does not match, remove one more character from the end of columnB and try again. Keep removing characters from columnB until there is a match (or until columnB is reduced to zero length).
Example: columnA has the value "ABC" and columnB has "ABCDEF".
Since "ABC" is not equal to "ABCDEF", try to match "ABCDE" with "ABC". That does not match, so try "ABCD". There is still no match, so try "ABC". Now there is a match, so stop.
I am unable to come up with a regular expression in Oracle to handle this. I could use substr/length and a bunch of "OR" conditions, but I would prefer to avoid that if a regular expression can do it cleanly.
Thanks in advance.
SELECT *
FROM table_name
WHERE REGEXP_LIKE( columnb, '^'||columna||'.*$' );
(However, this has issues when columna contains ^$.*+?|[]{}()\ characters).
or
SELECT *
FROM table_name
WHERE columnb LIKE columna||'%';
or
SELECT *
FROM table_name
WHERE INSTR( columnb, columna ) = 1;
or
SELECT *
FROM table_name
WHERE SUBSTR( columnb, 1, LENGTH( columna ) ) = columna;
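The metacharacter caveat on the REGEXP_LIKE version is easy to demonstrate outside the database. The Python sketch below is only an analogy: is_prefix_regex and is_prefix_plain are hypothetical names, and Python's re module stands in for Oracle's regex engine. It compares an unescaped regex prefix test with a plain prefix test.

```python
import re


def is_prefix_regex(columna, columnb):
    """Prefix test built like '^' || columna || '.*$', with no escaping of columna."""
    return re.search('^' + columna + '.*$', columnb) is not None


def is_prefix_plain(columna, columnb):
    """Prefix test comparable to the SUBSTR(columnb, 1, LENGTH(columna)) = columna check."""
    return columnb.startswith(columna)


print(is_prefix_plain('ABC', 'ABCDEF'))  # → True
print(is_prefix_regex('A.C', 'ABCDEF'))  # → True, because '.' matches any character
print(is_prefix_plain('A.C', 'ABCDEF'))  # → False
```

The second call is a false positive, which is why the non-regex variants are the safer choice when columna can contain arbitrary text.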
My guess is that you may want to find the longest common prefix of two strings.
In my opinion, it's easier to do in PL/SQL than in SQL:
create or replace function longest_prefix(a varchar2, b varchar2) return varchar2 as
l number :=least(length(a), length(b));
l_common varchar2(32767) :=substr(a,1,l);
begin
for i in 1..l loop
if substr(a,i,1)!=substr(b,i,1) then
l_common:=substr(a,1,i-1);
exit;
end if;
end loop;
return l_common;
end;
/
Test:
SQL> select longest_prefix('asdf', 'as23') from dual;
LONGEST_PREFIX('ASDF','AS23')
--------------------------------------------------------------------------------
as
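For comparison, the same character-by-character loop can be sketched in Python. This is just an illustration of the algorithm, not a replacement for the PL/SQL function. One difference to keep in mind: when there is no common prefix, this version returns an empty string, where Oracle would return NULL.

```python
def longest_prefix(a, b):
    """Return the longest common prefix of strings a and b."""
    common_length = 0
    # zip stops at the shorter string, like looping up to least(length(a), length(b))
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        common_length += 1
    return a[:common_length]


print(longest_prefix('asdf', 'as23'))  # → as
```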

How to split strings using two delimiter in Oracle 11g regexp_substr functions

I am unsure how to split a string using two delimiters.
First split on the , delimiter, then split each of the resulting strings on the - delimiter.
My original string: UMC12I-1234,CSM3-123,VQ,
Expected output:
UMC12I
CSM3
VQ
Each value should be returned as its own row.
I tried the option
WITH fab_sites AS (
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '[^,]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= regexp_count('UMC12I-1234,CSM3-123,VQ,', '[^,]+')+1
)
SELECT fab_site FROM fab_sites WHERE fab_site IS NOT NULL
-- splitted based on , delimiter
Output is:
UMC12I-1234
CSM3-123
VQ
How can I get my expected output? (I need to split again on the - delimiter.)
You may extract the "words" before the - with the regexp_substr using
([^,-]+)(-[^,-]+)?
The pattern will match and capture into Group 1 one or more chars other than , and -, then will match an optional sequence of - and 1+ chars other than , and -.
Use this regexp_substr line instead of yours with the above regex:
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '([^,-]+)(-[^,-]+)?', 1, LEVEL, NULL, 1)) fab_site
You might try this query:
WITH fab_sites AS (
SELECT TRIM(',' FROM REGEXP_SUBSTR('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+')
)
SELECT fab_site
FROM fab_sites;
We start by matching any substring that starts either with the start of the whole string ^ or with a comma ,, the delimiter. We then get all the characters that match neither a comma nor a dash -. Once we have that substring we trim any leftover commas from it.
P.S. I think the +1 in the CONNECT BY clause is extraneous, as is the WHERE NOT NULL in the "outer" query.
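The two-step split itself (first on the comma, then on the dash) can be sketched in a few lines of Python, which may help when checking the expected rows. This only illustrates the logic and is not Oracle code; fab_sites is a hypothetical function name borrowed from the CTE above.

```python
def fab_sites(raw):
    """Split raw on ',' and keep the part of each non-empty item before the first '-'."""
    sites = []
    for item in raw.split(','):
        item = item.strip()
        if item:  # skip the empty piece produced by the trailing comma
            sites.append(item.split('-', 1)[0])
    return sites


print(fab_sites('UMC12I-1234,CSM3-123,VQ,'))  # → ['UMC12I', 'CSM3', 'VQ']
```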

Oracle Substring after specific character

I already found out that I need to use substr/instr or a regex, but even after reading the documentation for those, I can't get it done...
I am on Oracle 11.2.
So here is what I have.
A list of Strings like:
743H5-34L-56
123HD34-7L
12HSS-34R
23Z67-4R-C23
What I need is the number (length 1 or 2) after the first '-', up to the following 'L' or 'R'.
Has anybody some advice?
regexp_replace(string, '^.*?-(\d+)[LR].*$', '\1')
Another version (without fancy lookarounds :-) :
with v_data as (
select '743H5-34L-56' val from dual
union all
select '123HD34-7L' val from dual
union all
select '12HSS-34R' val from dual
union all
select '23Z67-4R-C23' val from dual
)
select
val,
regexp_replace(val, '^[^-]+-(\d+)[LR].*', '\1')
from v_data
It matches
the beginning of the string "^"
one or more characters that are not a '-' "[^-]+"
followed by a '-' "-"
followed by one or more digits (capturing them in a group) "(\d+)"
followed by 'L' or 'R' "[LR]"
followed by zero or more arbitrary characters ".*"
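The breakdown above can be checked against the four sample strings with a short Python sketch. It assumes Python's re module interprets this pattern the same way Oracle does, and number_before_side is a hypothetical helper name. Unlike REGEXP_REPLACE, which returns the input unchanged when nothing matches, this helper returns None.

```python
import re

# Same pattern as in the query: non-dash prefix, a dash, captured digits, then L or R
PATTERN = re.compile(r'^[^-]+-(\d+)[LR].*')


def number_before_side(value):
    """Return the digits after the first '-' that are followed by 'L' or 'R', or None."""
    match = PATTERN.match(value)
    return match.group(1) if match else None


for sample in ['743H5-34L-56', '123HD34-7L', '12HSS-34R', '23Z67-4R-C23']:
    print(sample, number_before_side(sample))
```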

How to make regular expression correctly?

I need to get the data between the third and fourth occurrences of "*". This is what I have:
with t as (select 'T*76031*12558*test*received percents' as txt from dual)
select regexp_replace(txt, '.*(.{4})[*][^*].*$', '\1')
from t
I receive "test", which is right, but how do I get any number of characters, not just 4?
This should work given the example you have used:
REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
So the SELECT would be:
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
FROM t;
The regex looks for:
Group 1:
Start of string. Any number of characters up to a '*'. Any further characters up to another '*'. Any further characters up to the third '*'.
Group 2:
Any alphanumeric characters
Group 3:
A '*' followed by any other characters up to the end of the string.
Replace all of the above with whatever was found in Group 2.
Hope this helps.
EDIT:
Following on from a great answer from another thread by Rob van Wijk here:
Exracting substring from given string
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_SUBSTR( txt,'[^\*]+',1,4)
FROM t;
How about the following?
^([^*]*[*]){3}([^*]*)
The first part matches three runs of non-* characters, each followed by a *, and the second part matches everything until the next * or the end of the line.
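A quick way to convince yourself that the second group holds the wanted field is to try the pattern in Python. This is a sketch under the assumption that Python's re module handles this pattern like Oracle does; field_after_third_star is a hypothetical name.

```python
import re


def field_after_third_star(text):
    """Return the text between the third '*' and the next '*' (or end of string), or None."""
    # ([^*]*[*]){3} skips three star-terminated fields; ([^*]*) captures the fourth
    match = re.match(r'^([^*]*[*]){3}([^*]*)', text)
    return match.group(2) if match else None


print(field_after_third_star('T*76031*12558*test*received percents'))  # → test
```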
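You are assuming that the last * of your text is also the fourth. If that assumption holds, then this:
\b\w*\b(?=\*[^*]*$)
will get you what you want. Of course, this only matches the last word before the final *: test in this case, or whatever word characters sit between the last two stars. Note that this pattern relies on \b and a lookahead, which Oracle's regular expression functions do not support, so it only applies to engines that do (for example, Perl-compatible ones).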
Note: 10g REGEXP_SUBSTR doesn't support returning subexpressions, see comments below.
If you are really only selecting a part of the string I recommend using REGEXP_SUBSTR instead. I don't know if it's more efficient, but it will better document your intent:
SQL> select regexp_substr('T*76031*12558*test*received percents',
'^([^*]*[*]){3}([^*]*)', 1, 1, '', 2) from dual;
REGEXP_SUBST
------------
test
Above I have used the regexp provided by Pieter-Bas.
See also http://www.regular-expressions.info/oracle.html