Positive Lookbehind in Postgres 9.5 using regexp_replace - regex

I have values like this: 1ST, 2ND, FIRST, and I want to remove the 'ST' and 'ND' ONLY if what comes before is a digit.
I am running postgres 9.5 and I have a positive lookbehind working in SQL Fiddle but it only works on 9.6
SELECT ex,
regexp_replace(ex, '(?<=[0-9]+)(TH|ST|ND|RD)', '', 'gi') as test
FROM t1
Is there any other way to do this besides using a CASE statement like this:
SELECT ex,
(CASE WHEN ex ~ '\d(TH|ST|ND|RD)' THEN regexp_replace (ex, 'TH|ST|ND|RD', '','gi') ELSE ex end) as test_case
FROM t1
Any suggestions would be appreciated. Thanks!

You may match and capture the digit and replace with a backreference to the value. Also, I suggest adding a word boundary after the ordinal numeral suffixes to make sure we are matching them at the end of the word.
SELECT regexp_replace(ex, '([0-9])(?:TH|ST|ND|RD)\y', '\1', 'gi') as test_case FROM t1
See the updated SQLFiddle.
CREATE TABLE t1
(ex varchar)
;
INSERT INTO t1
(ex)
VALUES
('1ST'),
('2ND'),
('3RD'),
('4TH'),
('FIRST'),
('FOURTH')
;
SELECT regexp_replace(ex, '([0-9])(?:TH|ST|ND|RD)\y', '\1', 'gi') as test_case FROM t1
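If you want to try the capture-and-backreference idea without a Postgres instance, here is a minimal sketch in Python's re module. It is only an analogue: Python spells the word boundary \b where Postgres uses \y, and the name strip_ordinal_suffix is made up for this illustration.

```python
import re

# Python analogue of the Postgres pattern above; \b stands in for Postgres's \y.
ORDINAL = re.compile(r'([0-9])(?:TH|ST|ND|RD)\b', re.IGNORECASE)

def strip_ordinal_suffix(value: str) -> str:
    """Drop TH/ST/ND/RD only when a digit sits immediately before it."""
    # \1 puts the captured digit back, so only the suffix is removed.
    return ORDINAL.sub(r'\1', value)

if __name__ == "__main__":
    for ex in ['1ST', '2ND', '3RD', '4TH', 'FIRST', 'FOURTH']:
        print(ex, '->', strip_ordinal_suffix(ex))
```

Because the digit is captured and written back, no lookbehind is needed, which is why this approach also works on 9.5.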

Related

Oracle Database, extract string being between two other strings

I need a regexp that, combined with regexp_substr(), would give me the word between two other specified words.
Example:
source_string => 'First Middle Last'
substring varchar2(100);
substring := regexp_substr(source_string, 'First (.*) Last'); <=== this doesn't work :(
dbms_output.put_line(substring) ===> output should be: 'Middle'
I know it looks simple and, to be honest, at the beginning I thought the same.
But now, after spending about 3 hours searching for a solution, I give up...
It's not working because the literal strings 'First' and 'Last' are part of the pattern, so they are matched (and returned) along with everything else. Assuming that the strings don't all literally begin 'First' you need to find another way to represent them. You've already done this by representing 'Middle' as (.*)
The next point is that you need to extract a sub-expression (the part in parentheses); this is the 6th parameter of REGEXP_SUBSTR().
If you put these together then the following gives you what you want:
regexp_substr(source_string, '.*\s(.*)\s.*', 1, 1, 'i', 1)
An example of it working:
SQL> select regexp_substr('first middle last', '.*\s(.*)\s.*', 1, 1, 'i', 1)
2 from dual;
REGEXP
------
middle
You can also use an online regex tester to validate that 'middle' is the only captured group.
Depending on what your actual source strings look like you may not want to search for exactly spaces, but use \W (a non-word character) instead.
If you're expecting exactly three words I'd also anchor your expression to the start and end of the string: ^.*\s(.*)\s.*$
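As a rough cross-check of the anchored pattern, here is a sketch in Python's re module, where group(1) plays the role of the sub-expression argument of REGEXP_SUBSTR(). The function name middle_word is hypothetical, and Python's regex flavour is not identical to Oracle's, so treat it as an illustration only.

```python
import re

def middle_word(source: str):
    """Return the text captured between the last two whitespace characters."""
    # Same pattern as above, anchored at both ends; re.IGNORECASE mirrors 'i'.
    match = re.search(r'^.*\s(.*)\s.*$', source, re.IGNORECASE)
    # group(1) is the first parenthesised sub-expression.
    return match.group(1) if match else None

if __name__ == "__main__":
    print(middle_word('First Middle Last'))
```

Note that the leading .* is greedy, so with more than three words this captures the second-to-last word rather than the second one.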
If the source string always looks the same, i.e. consists of 3 elements (words), then a simple regular expression like this does the job:
SQL> with t (str) as
2 (select 'First Middle Last' from dual)
3 select regexp_substr(str, '\w+', 1, 2) result from t;
RESULT
------
Middle
SQL>
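The "take the second occurrence of \w+" idea can be sketched in Python as well. This is an assumption-laden analogue: nth_word is a hypothetical helper, and re.findall stands in for the occurrence argument of regexp_substr.

```python
import re

def nth_word(source: str, n: int):
    """Return the n-th (1-based) run of word characters, or None if absent."""
    words = re.findall(r'\w+', source)
    return words[n - 1] if 1 <= n <= len(words) else None

if __name__ == "__main__":
    print(nth_word('First Middle Last', 2))
```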
The (\S*) pattern can be used with regexp_replace and regexp_substr as follows to get the middle word:
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_substr(trim(regexp_replace(str, '^(\S*)', '')),'(\S*)')
as "Result String"
from t;
Result String
-------------
Middle
The first step strips the leading word First; the second step then keeps only the next run of non-space characters, which drops Last.
Or, more directly, you can do it with a single regexp_replace:
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_replace(str,'(.*) (.*) (.*)','\2')
as "Result String"
from t;
Result String
-------------
Middle
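A minimal Python sketch of the same replace-with-group-2 approach, in case it helps to see the behaviour on non-matching input. keep_second_group is a hypothetical name, and re.sub is only an analogue of Oracle's regexp_replace.

```python
import re

def keep_second_group(source: str) -> str:
    """Replace the whole three-part match with its second captured group."""
    return re.sub(r'(.*) (.*) (.*)', r'\2', source)

if __name__ == "__main__":
    print(keep_second_group('First Middle Last'))
    print(keep_second_group('Single'))
```

In this sketch a string that does not contain two spaces comes back unchanged, so a replace-based extraction needs a separate check if "no middle word" should be reported differently.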

Postgres regex to delimit multiple optional matches

Suppose a text field needs to be delimited in PostgreSQL. It is formatted as 'abcd' where each variable can be any one of: 1.4, 3, 5, 10, 15, 20 or N/A. Here is a query with some examples, followed by their expected results:
WITH example AS(
SELECT '10N/AN/AN/A' AS bw
UNION SELECT '1010N/AN/A'
UNION SELECT '101020N/A'
UNION SELECT '35N/A1.4'
UNION SELECT '1010N/A10'
UNION SELECT '105N/AN/A'
UNION SELECT '1.43N/A20'
)
SELECT
bw
,regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(bw, '(1\.4)', E'\\&|', 'g')
, '(3)', E'\\&|', 'g')
, '(5)', E'\\&|', 'g')
, '(10)', E'\\&|', 'g')
, '(15)', E'\\&|', 'g')
, '(20)', E'\\&|', 'g')
, '(N/A)', E'\\&|', 'g')
FROM
example
Results:
bw:text, regexp_replace:text
'1010N/AN/A', '10|10|N/A|N/A|'
'1010N/A10', '10|10|N/A|10|'
'35N/A1.4', '3|5|N/A|1.4|'
'1.43N/A20', '1.4|3|N/A|20|'
'105N/AN/A', '10|5|N/A|N/A|'
'101020N/A', '10|10|20|N/A|'
'10N/AN/AN/A','10|N/A|N/A|N/A|'
I'm not worried about the trailing pipe '|' since I can deal with it. This gets me what I want, but I'm concerned I could be doing it more succinctly. I experimented with putting each of the capture groups in a single regexp_replace statement while scouring through the documentation, but I was unable to get these results.
Can this be achieved within a single regexp_replace statement?
You may build a (1\.4|3|5|1[50]|20|N/A) capturing group with alternation operators separating the alternatives and replace with \1|:
select regexp_replace('35N/A1.4', '(1\.4|3|5|1[50]|20|N/A)', '\1|','g');
-- 3|5|N/A|1.4|
See the online demo
Details
( - starting the capturing group construct
1\.4 - 1.4 substring (. must be escaped in order to be parsed as a literal dot, else, it matches any char)
| - or
3 - a 3 char
| - or
5 - a 5 char
| - or
1[50] - 1 followed with either 5 or 0 (the [...] is called a bracket expression where you may specify chars, char ranges or even character classes)
| - or
20 - a 20 substring
| - or
N/A - a N/A substring
) - end of the capturing group.
The \1 in the replacement pattern is a numbered replacement backreference (also called a (group) placeholder) that references the value captured into Group 1.
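To check the single-alternation pattern against all of the sample values from the question, here is a small sketch using Python's re module. delimit is a hypothetical name, and Python's regex engine is not the one Postgres uses, although for these inputs the alternatives never compete in a way that would matter.

```python
import re

# One capturing group with every allowed value as an alternative.
TOKENS = re.compile(r'(1\.4|3|5|1[50]|20|N/A)')

def delimit(bw: str) -> str:
    """Append a pipe after every recognised value."""
    return TOKENS.sub(r'\1|', bw)

if __name__ == "__main__":
    samples = ['10N/AN/AN/A', '1010N/AN/A', '101020N/A', '35N/A1.4',
               '1010N/A10', '105N/AN/A', '1.43N/A20']
    for bw in samples:
        print(bw, '->', delimit(bw))
```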

Oracle - Regular expression - Keep reducing a char to match to another column

I have 2 columns from 2 different tables, say columnA and columnB, which I am matching against each other. If they do not match, I want to remove the last character from columnB and match against columnA again. If it still does not match, remove one more character from the end of columnB and try again. Keep removing characters from columnB until there is a match (or until columnB reaches zero length).
Ex - ColumnA has a value "ABC" and columnB has "ABCDEF".
Since "ABC" is not equal to "ABCDEF", try to match "ABCDE" with "ABC". That does not match, so try "ABCD". There is still no match, so try "ABC". Now there is a match, so stop.
I am unable to come up with a regular expression in Oracle to handle this. I could use substr/length and a bunch of "OR" conditions, but I would prefer to avoid that if a regular expression can do it nicely.
Thanks in advance.
SELECT *
FROM table_name
WHERE REGEXP_LIKE( columnb, '^'||columna||'.*$' );
(However, this has issues when columna contains ^$.*+?|[]{}()\ characters).
or
SELECT *
FROM table_name
WHERE columnb LIKE columna||'%';
or
SELECT *
FROM table_name
WHERE INSTR( columnb, columna ) = 1;
or
SELECT *
FROM table_name
WHERE SUBSTR( columnb, 1, LENGTH( columna ) ) = columna;
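The caveat about regex metacharacters in columna can be illustrated with a short Python sketch. All three function names are hypothetical, and re.escape is Python's way of neutralising metacharacters; it is shown here only to make the failure mode visible, not as something Oracle provides.

```python
import re

def is_prefix_unescaped(columna: str, columnb: str) -> bool:
    """Naive version: columna is interpreted as a regex."""
    return re.match(columna, columnb) is not None

def is_prefix_escaped(columna: str, columnb: str) -> bool:
    """columna is escaped first, so it is matched literally."""
    return re.match(re.escape(columna), columnb) is not None

def is_prefix_plain(columna: str, columnb: str) -> bool:
    """No regex at all, comparable to the LIKE/INSTR/SUBSTR variants."""
    return columnb.startswith(columna)

if __name__ == "__main__":
    # 'A.C' is not a literal prefix of 'ABCDEF', but '.' matches 'B' as a regex.
    print(is_prefix_unescaped('A.C', 'ABCDEF'))
    print(is_prefix_escaped('A.C', 'ABCDEF'))
    print(is_prefix_plain('A.C', 'ABCDEF'))
```

The plain string comparison avoids the problem entirely, which is one reason the non-regex variants are attractive here.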
My guess is that you may want to find the longest common prefix of two strings.
In my opinion, it's easier to do in PL/SQL than in SQL:
create or replace function longest_prefix(a varchar2, b varchar2) return varchar2 as
l number :=least(length(a), length(b));
l_common varchar2(32767) :=substr(a,1,l);
begin
for i in 1..l loop
if substr(a,i,1)!=substr(b,i,1) then
l_common:=substr(a,1,i-1);
exit;
end if;
end loop;
return l_common;
end;
/
Test:
SQL> select longest_prefix('asdf', 'as23') from dual;
LONGEST_PREFIX('ASDF','AS23')
--------------------------------------------------------------------------------
as
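For comparison, the same character-by-character loop can be sketched in Python. This is an illustration of the logic only, not a replacement for the PL/SQL function, and it assumes both arguments are non-null strings.

```python
def longest_prefix(a: str, b: str) -> str:
    """Return the longest common leading substring of a and b."""
    common = []
    # zip stops at the shorter string, like least(length(a), length(b)).
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break  # first mismatch ends the common prefix
        common.append(ch_a)
    return ''.join(common)

if __name__ == "__main__":
    print(longest_prefix('asdf', 'as23'))
```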

Extract data outside of parentheses in oracle

I have this value: (203)1669
My requirement is to extract data which is outside of the parentheses.
I want to use Regular expression for this Oracle query.
Much appreciated!
You can use the Oracle REGEXP_REPLACE() function, and match the group which is outside the parentheses.
SELECT REGEXP_REPLACE(phone_number, '\([[:digit:]]+\)(.*)', '\1') AS newValue
FROM your_table
You can use the combination of SUBSTR and INSTR function.
select substr('(203)1669', instr('(203)1669',')')+1) from dual
This example uses REGEXP_SUBSTR() and the REGEX explicitly follows your spec of getting the 4 digits between the closing paren and the end of the line. If there could be a different number of digits, replace the {4} with a + for one or more digits:
SQL> with tbl(str) as (
select '(203)1669' from dual
)
select regexp_substr(str, '\)(\d{4})$', 1, 1, NULL, 1) nbr
from tbl;
NBR
----
1669
SQL>
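A quick way to experiment with the "digits after the closing paren" pattern is a Python sketch like the following. It uses the + variant mentioned above rather than {4}; digits_after_paren is a hypothetical name and Python's re is only a stand-in for REGEXP_SUBSTR().

```python
import re

def digits_after_paren(value: str):
    """Return the digits between the closing paren and the end of the string."""
    match = re.search(r'\)(\d+)$', value)
    return match.group(1) if match else None

if __name__ == "__main__":
    for value in ['(203)1669', '(567)99084', '(123)456']:
        print(value, '->', digits_after_paren(value))
```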
For the pattern you mentioned, this should work.
select
rtrim(ltrim(substr(phone_number,instr(phone_number,')')+1,length(phone_number))))
as derived_phone_no
from
(select '(123)456' as phone_number from dual union all
select '(567)99084' as phone_number from dual)
Here I first get the position of ) and then take the substring from that position + 1 to the end of the string. As a best practice, you can wrap the result in trim functions.

How to make regular expression correctly?

I need to get the data between the third and fourth occurrences of "*". This is what I do:
with t as (select 'T*76031*12558*test*received percents' as txt from dual)
select regexp_replace(txt, '.*(.{4})[*][^*].*$', '\1')
from t
I receive "test", which is right, but how do I get any number of characters, not just 4?
This should work given the example you have used:
REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
So the SELECT would be:
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
FROM t;
The regex looks for:
Group 1:
Start of string. Any number of characters up to a '*'. Any further characters up to another '*'. Any further characters up to the third '*'.
Group 2:
Any alphanumeric characters
Group 3:
A '*' followed by any other characters up to the end of the string.
Replace all of the above with whatever was found in Group 2.
Hope this helps.
EDIT:
Following on from a great answer from another thread by Rob van Wijk here:
Extracting substring from given string
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_SUBSTR( txt,'[^\*]+',1,4)
FROM t;
How about the following?
^([^*]*[*]){3}([^*]*)
The first part matches three groups that each end in *, and the second part captures everything up to the next * or the end of the line.
You are assuming that the last * of your text is also the fourth. If this assumption is true then this :
\b\w*\b(?=\*[^*]*$)
Will get you what you want. But of course this only matches the last word between * before the last star. It only matches test in this case or whatever word characters are inside the *.
Note: 10g REGEXP_SUBSTR doesn't support returning subexpressions, see comments below.
If you are really only selecting a part of the string I recommend using REGEXP_SUBSTR instead. I don't know if it's more efficient, but it will better document your intent:
SQL> select regexp_substr('T*76031*12558*test*received percents',
'^([^*]*[*]){3}([^*]*)', 1, 1, '', 2) from dual;
REGEXP_SUBST
------------
test
Above I have used regexp provided by Pieter-Bas.
See also http://www.regular-expressions.info/oracle.html
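To try the ^([^*]*[*]){3}([^*]*) idea outside the database, here is a minimal sketch in Python's re module. field_after_nth_star is a hypothetical helper, the repeated group is made non-capturing so the wanted field is group 1, and Python's flavour is only an approximation of Oracle's.

```python
import re

def field_after_nth_star(txt: str, n: int = 3):
    """Return the text between the n-th '*' and the next '*' (or end of string)."""
    # Skip n fields that each end in '*', then capture up to the next '*'.
    pattern = r'^(?:[^*]*[*]){%d}([^*]*)' % n
    match = re.match(pattern, txt)
    return match.group(1) if match else None

if __name__ == "__main__":
    txt = 'T*76031*12558*test*received percents'
    print(field_after_nth_star(txt))
    # Occurrence-based alternative, analogous to the [^*]+ approach.
    print(re.findall(r'[^*]+', txt)[3])
```

In this sketch the two approaches differ when a field is empty: for 'a**c*d*e' the anchored pattern returns 'd', while the fourth occurrence of [^*]+ is 'e', because empty fields are not counted as occurrences.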