Oracle - Regular expression - Keep reducing a char to match to another column - regex

I have 2 columns from 2 different tables - say columnA and columnB, which I am matching with each other. However, if they do not match then I want to remove last one char from columnB and again match with columnA. If it still won't match then reduce one more char at the end from columnB and try to match. Keep reducing chars from columnB till there is match ( and untill columnB turns to 0 length).
Ex - ColumnA has a value "ABC" and columnB has "ABCDEF".
Then, since "ABC" is not equal to "ABCDEF", try to match "ABCDE" with "ABC". Since it is not matching then try "ABCD" . Since there is still no match then try "ABC" . Now there is match and so stop !!
I am unable to come with a regular expression in Oracle to handle this. I can use substr/length and bunch of "OR" conditions but I will prefer to avoid that if there is regular expression, which can do it nicely.
Thanks in advance.

SELECT *
FROM table_name
WHERE REGEXP_LIKE( columnb, '^'||columna||'.*$' );
(However, this has issues when columna contains ^$.*+?|[]{}()\ characters).
or
SELECT *
FROM table_name
WHERE columnb LIKE columna||'%';
or
SELECT *
FROM table_name
WHERE INSTR( columnb, columna ) = 1;
or
SELECT *
FROM table_name
WHERE SUBSTR( columnb, 1, LENGTH( columna ) ) = columna;

My guess is may be you want to find the longest prefix of two strings.
In my opinion, it's easier to do in PL/SQL than in SQL:
create or replace function longest_prefix(a varchar2, b varchar2) return varchar2 as
l number :=least(length(a), length(b));
l_common varchar2(32767) :=substr(a,1,l);
begin
for i in 1..l loop
if substr(a,i,1)!=substr(b,i,1) then
l_common:=substr(a,1,i-1);
exit;
end if;
end loop;
return l_common;
end;
/
Test:
SQL> select longest_prefix('asdf', 'as23') from dual;
LONGEST_PREFIX('ASDF','AS23')
--------------------------------------------------------------------------------
as

Related

Oracle Database, extract string beeing between two other strings

I need a regexp that's combined with regexp_substr() would give me the word being between two other specified words.
Example:
source_string => 'First Middle Last'
substring varchar2(100);
substring := regexp_substr(source_string, 'First (.*) Last'); <===
this doesn't work :(.
dbms_output.put_line(substring) ===> output should be: 'Middle'
I know it looks simple and to be honest, at the beginning I thought the same.
But now after spending about 3h for searching for a solution I give up...
It's not working because the literal strings 'First' and 'Last' are being looked for. Assuming that the strings don't all literally begin 'First' you need to find another way to represent them. You've already done this by representing 'Middle' as (.*)
The next point is that you need to extract a sub-expression (the part in parenthesis), this is the 6th parameter of REGEXP_SUBSTR().
If you put these together then the following gives you what you want:
regexp_substr(source_string, '.*\s(.*)\s.*', 1, 1, 'i', 1)
An example of it working:
SQL> select regexp_substr('first middle last', '.*\s(.*)\s.*', 1, 1, 'i', 1)
2 from dual;
REGEXP
------
middle
You can also use an online regex tester to validate that 'middle' is the only captured group.
Depending on what your actual source strings look like you may not want to search for exactly spaces, but use \W (a non-word character) instead.
If you're expecting exactly three words I'd also anchor your expression to the start and end of the string: ^.*\s(.*)\s.*$
If source string always looks the same, i.e. consists of 3 elements (words), then such a simple regular expression does the job:
SQL> with t (str) as
2 (select 'First Middle Last' from dual)
3 select regexp_substr(str, '\w+', 1, 2) result from t;
RESULT
------
Middle
SQL>
(\S*) pattern might be used with regexp_replace and regexp_substr as in the following way to get the middle word :
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_substr(trim(regexp_replace(str, '^(\S*)', '')),'(\S*)')
as "Result String"
from t;
Result String
-------------
Middle
in the first step First, and in the second one Last words are trimmed.
Or, More directly you can figure out by using regexp_replace as
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_replace(str,'(.*) (.*) (.*)','\2')
as "Result String"
from t;
Result String
-------------
Middle

Positive Lookbehind in Postgres 9.5 using regexp_replace

I have values like this: 1ST, 2ND, FIRST, and I want to remove the 'ST' and 'ND' ONLY if what comes before is a digit.
I am running postgres 9.5 and I have a positive lookbehind working in SQL Fiddle but it only works on 9.6
SELECT ex,
regexp_replace(ex, '(?<=[0-9]+)(TH|ST|ND|RD)', '', 'gi') as test
FROM t1
Is there any other way to do this besides using a CASE statement like this:
SELECT ex,
(CASE WHEN ex ~ '\d(TH|ST|ND|RD)' THEN regexp_replace (ex, 'TH|ST|ND|RD', '','gi') ELSE ex end) as test_case
FROM t1
Any suggestions would be appreciated. Thanks!
You may match and capture the digit and replace with a backreference to the value. Also, I suggest adding a word boundary after the ordinal numeral suffixes to make sure we are matching them at the end of the word.
SELECT regexp_replace(ex, '([0-9])(?:TH|ST|ND|RD)\y', '\1', 'gi') as test_case FROM t1
See the updated SQLFiddle.
CREATE TABLE t1
(ex varchar)
;
INSERT INTO t1
(ex)
VALUES
('1ST'),
('2ND'),
('3RD'),
('4TH'),
('FIRST'),
('FOURTH')
;
SELECT regexp_replace(ex, '([0-9])(?:TH|ST|ND|RD)\y', '\1', 'gi') as test_case FROM t1

Split single row string into multiple rows by multi-chracter delimiter Oracle

I have attempted to use this question here Splitting string into multiple rows in Oracle and adjust it to my needs however I'm not very confident with regex and have not been able to solve it via searching.
Currently that questions answers it with a lot of regex_substr and so on, using [^,]+ as the pattern so it splits by a single comma. I need it to split by a multi-character delimiter (e.g. #;) but that regex pattern matches any single character to split it out so where there are #s or ;s elsewhere in the text this causes a split.
I've worked out the pattern (#;+) will match every group of #; but I cannot workout how to invert this as done above to split the row into multiple.
I'm sure I'm just missing something simple so any help would be greatly appreciated!
I think you should use:
[^#;+]+
instead of
(#;+)
As, it will be checking for any one of the characters in the range which can be # ; or + and then you can split accordingly.
You can change it according to your requirement but in the regex I
shared, I am consudering # , ; and + as delimeter
So, in end, the query would look something like this:
with tbl(str) as (
select ' My, Delimiter# Hello My; Delimiter World My Delimiter My Delimiter test My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;+]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;+]')+1\\
Output:
ELEMENT ELEMENT_VALUE
1 My, Delimiter
2 Hello My
3 Delimiter World My Delimiter My Delimiter test My Deli
-- EDIT --
In case you want to check unlimited numbers of # or ; to split and don't want to split at one existence, I found the below regex, but again that is not supported by Oracle.
(?:(?:(?![;#]+).#(?![;#]+).|(?![;#]+).;(?![;#]+).|(?![;#]+).)*)+
So, I found no easy apart from below query which will not split on single existence if there is only one such instance between two delimeters:
select ' My, Delimiter;# Hello My Delimiter ;;# World My Delimiter ; My Delimiter test#; My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;]+#?[^#;]+;?[^#;]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;]{2,}')+1\\
Output:
ELEMENT ELEMENT_VALUE
1 My, Delimiter
2 Hello My Delimiter
3 World My Delimiter ; My Delimiter test
4 My Delimiter

How to split strings using two delimiter in Oracle 11g regexp_substr functions

I have doubt to split a string using the delimiter.
First split based on , delimiter select those splitted strings should split based on - delimiter
My original string: UMC12I-1234,CSM3-123,VQ,
Expected output:
UMC12I
CSM3
VQ
Each value comes as row value
I tried the option
WITH fab_sites AS (
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '[^,]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= regexp_count('UMC12I-1234,CSM3-123,VQ,', '[^,]+')+1
)
SELECT fab_site FROM fab_sites WHERE fab_site IS NOT NULL
-- splitted based on , delimiter
Output is:
UMC12I-1234
CSM3-123
VQ
how can I get my expected output? (need to split again - delimiter)
You may extract the "words" before the - with the regexp_substr using
([^,-]+)(-[^,-]+)?
The pattern will match and capture into Group 1 one or more chars other than , and -, then will match an optional sequence of - and 1+ chars other than ,and -.
See the regex demo.
Use this regex_substr line instead of yours with the above regex:
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '([^,-]+)(-[^,-]+)?', 1, LEVEL, NULL, 1)) fab_site
See the online demo
You might try this query:
WITH fab_sites AS (
SELECT TRIM(',' FROM REGEXP_SUBSTR('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+')
)
SELECT fab_site
FROM fab_sites;
We start by matching any substring that starts either with the start of the whole string ^ or with a comma ,, the delimiter. We then get all the characters that match neither a comma nor a dash -. Once we have that substring we trim any leftover commas from it.
P.S. I think the +1 in the CONNECT BY clause is extraneous, as is the WHERE NOT NULL in the "outer" query.

Extract data outside of parentheses in oracle

I have this value: (203)1669
My requirement is to extract data which is outside of the parentheses.
I want to use Regular expression for this Oracle query.
Much appreciated!
You can use the Oracle REGEXP_REPLACE() function, and match the group which is outside the parentheses.
SELECT REGEXP_REPLACE(phone_number, '\([[:digit:]]+\)(.*)', '\1') AS newValue
FROM your_table
You can use the combination of SUBSTR and INSTR function.
select substr('(203)1669', instr('(203)1669',')')+1) from dual
This example uses REGEXP_SUBSTR() and the REGEX explicitly follows your spec of getting the 4 digits between the closing paren and the end of the line. If there could be a different number of digits, replace the {4} with a + for one or more digits:
SQL> with tbl(str) as (
select '(203)1669' from dual
)
select regexp_substr(str, '\)(\d{4})$', 1, 1, NULL, 1) nbr
from tbl;
NBR
----
1669
SQL>
For the pattern you mentioned, this should work.
select
rtrim(ltrim(substr(phone_number,instr(phone_number,')')+1,length(phone_number))))
as derived_phone_no
from
(select '(123)456' as phone_number from dual union all
select '(567)99084' as phone_number from dual)
Here first I am getting position of ) and then getting substr from the position of ) + 1 till the length of the string. As a best practice, you can use trim functions.