REGEXP_REPLACE in Postgresql not substring - regex

In PostgreSQL I want to replace whole words only, not substrings. I noticed that replace and translate also replace matches inside longer words, so I tried regexp_replace instead:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-z0-9])' || UPPER('CAT') || '($|[^a-z0-9])', '\1' || UPPER('GATO') || '\2','g')
In this example, CAT should not be replaced because it is not a whole word, only a substring of another word. How can I avoid the replacement? The output should be BIG CATDOG because no substitution applies.
Thanks

The replacement happens because the class [^a-z0-9] excludes only lowercase letters and digits, so the uppercase D after CAT matches it and is treated as a word delimiter. You can resolve this either by adding A-Z to your character class:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-zA-Z0-9])' || UPPER('CAT') || '($|[^a-zA-Z0-9])', '\1' || UPPER('GATO') || '\2','g')
Or by adding the i flag to the replace call:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-z0-9])' || UPPER('CAT') || '($|[^a-z0-9])', '\1' || UPPER('GATO') || '\2','gi')
In either case you will get the desired BIG CATDOG output.
However, a better solution is to use the word boundary constraints \m (beginning of word) and \M (end of word), which match a position without consuming the neighbouring character:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '\m' || UPPER('CAT') || '\M', UPPER('GATO'),'g')
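For comparison outside the database, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not PostgreSQL itself: Python's re module has no \m or \M, so \b stands in for both, and the helper names replace_whole_word and replace_with_delimiters are made up for this example. The second helper mimics the delimiter-capturing queries and shows that, at least in Python, it consumes the separator and therefore skips the second of two adjacent words.

```python
import re

def replace_whole_word(text, word, replacement):
    """Replace `word` only where it stands alone as a whole word."""
    # \b is a zero-width word boundary; re.escape keeps `word` literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement avoids backslash processing in `replacement`.
    return re.sub(pattern, lambda m: replacement, text)

def replace_with_delimiters(text, word, replacement):
    """Same goal, but with captured delimiters as in the first two queries."""
    pattern = r"(^|[^A-Za-z0-9])" + re.escape(word) + r"($|[^A-Za-z0-9])"
    # The delimiters are part of the match, so they are put back by hand.
    return re.sub(pattern, lambda m: m.group(1) + replacement + m.group(2), text)

print(replace_whole_word("BIG CATDOG", "CAT", "GATO"))
print(replace_whole_word("BIG CAT DOG", "CAT", "GATO"))
print(replace_whole_word("CAT CAT", "CAT", "GATO"))
print(replace_with_delimiters("CAT CAT", "CAT", "GATO"))
```

The zero-width boundary leaves the space available for the next match, which is why the boundary version handles repeated words.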

Related

Regex_replace Postgres - Check if <= 2 characters length

I need a minimum of 3 characters for user accounts. I reuse existing names like
"tata-fzef - vcefv" or "kk" from the IMP_FR field to create these accounts.
In the second example, "kk" should become "k_k" because it has fewer than 3 characters.
How can I do this with PostgreSQL?
regexp_replace( IMP_FR , regexp, first_character + '_' + last character, 'g')
Regular expressions won't help much here since REGEXP_REPLACE does not support conditional replacement patterns. You need a different replacement for each case: an input of one character, of two characters, or of three or more.
So, it is better to rely on CASE ... WHEN ... ELSE here and the regular string manipulation functions:
CREATE TABLE tabl1
(s character varying)
;
INSERT INTO tabl1
(s)
VALUES
('tata-fzef - vcefv'),
('kkk'),
('kk'),
('k')
;
SELECT
CASE
WHEN LENGTH(s) = 1 THEN '_' || s || '_'
WHEN LENGTH(s) = 2 THEN LEFT(s,1) || '_' || RIGHT(s,1)
ELSE s
END AS Result
FROM tabl1;
For these rows the query leaves tata-fzef - vcefv and kkk unchanged, turns kk into k_k, and turns k into _k_.
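The same branching can be sketched outside SQL. The Python below is only an illustration of the CASE logic above; the function name pad_short_name is hypothetical, and the empty string falls through to the last branch just as it does in the ELSE of the query.

```python
def pad_short_name(s):
    """Mirror the CASE expression: pad names shorter than 3 characters."""
    if len(s) == 1:
        # One character: wrap it in underscores.
        return "_" + s + "_"
    if len(s) == 2:
        # Two characters: put an underscore between them.
        return s[0] + "_" + s[1]
    # Three or more characters (or empty): keep the value unchanged.
    return s

for name in ["tata-fzef - vcefv", "kkk", "kk", "k"]:
    print(repr(name), "->", repr(pad_short_name(name)))
```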

Need to form pattern for regexp_replace

I have input strings like:
1.2.3.4_abc_4.2.1.44_1.3.4.23
100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4
100.11.11.22_xyz_123_10.2.1.2_1.2.3.4
I have to replace the first string between two IP addresses, which are separated by _. However, in some strings the _ is part of the string to replace (xyz_123).
I have to find abc, xyz-abd and xyz_123 in the strings above, so that I can replace them with another column of that table.
_.*?_(?=\d+\.)
matches _abc_, _xyz-abd_ and _xyz_123_ in your examples. Is this working for you?
DECLARE
result VARCHAR(255);
BEGIN
result := REGEXP_REPLACE(subject, $$_.*?_(?=\d+\.)$$, $$_foo_$$);
END;
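To check the pattern against the three sample strings without a database, a small Python sketch can help. Python's re is not PostgreSQL's regex engine, so treat this as a sanity check of the pattern's logic only; the names LABEL, extract_label and replace_label are invented for the example, and a capture group is added around the lazy part so the label can be read out.

```python
import re

# Lazy match from the first "_" up to a "_" that is followed by "digits + dot".
LABEL = re.compile(r"_(.*?)_(?=\d+\.)")

def extract_label(s):
    """Return the text between the first two IP addresses, or None."""
    m = LABEL.search(s)
    return m.group(1) if m else None

def replace_label(s, new_label):
    """Replace only the first label, restoring the surrounding underscores."""
    return LABEL.sub(lambda m: "_" + new_label + "_", s, count=1)

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]
for s in samples:
    print(extract_label(s), "|", replace_label(s, "foo"))
```

In the third sample the lookahead rejects the underscore before 123 because 123 is followed by "_" rather than ".", so the lazy match extends to the next underscore.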
Probably this is enough:
_[^.]+_
and replace with
_Replacement_
[^.]+ uses a negated character class to match one or more (the + quantifier) characters other than ".".
The pattern also matches the leading and trailing "_", so you have to put them back in the replacement string.
If your PostgreSQL version supports lookbehind and lookahead assertions, you can keep the "_" out of the match and out of the replacement string:
(?<=_)[^.]+(?=_)
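Both variants can be compared side by side in a short Python sketch. This assumes Python's re semantics rather than PostgreSQL's, and the helper names replace_plain and replace_lookaround are hypothetical. Only the first match is replaced (count=1), which corresponds to calling regexp_replace without the g flag.

```python
import re

def replace_plain(s, new_label):
    """Match the underscores too, then add them back in the replacement."""
    return re.sub(r"_[^.]+_", lambda m: "_" + new_label + "_", s, count=1)

def replace_lookaround(s, new_label):
    """Assert the underscores with lookarounds so they stay out of the match."""
    return re.sub(r"(?<=_)[^.]+(?=_)", lambda m: new_label, s, count=1)

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]
for s in samples:
    print(replace_plain(s, "Replacement"), "|", replace_lookaround(s, "Replacement"))
```

The greedy [^.]+ first runs up to the next ".", then backtracks to the last "_" before it, which is why xyz_123 is matched as a whole.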
To match the text between the first two "_", the regex works as @stema and @Tim Pietzcker mentioned. Appending "_" to the column value, which is what I was struggling with, can be done with the || operator, e.g.:
update table1 set column1=regexp_replace(column1,'_.*?_(?=\d+\.)','_' || column2 || '_')
To take the replacement from another table in the update query, the example below can be helpful:
update table1 as t set column1=regexp_replace(column1,'_.*?_(?=\d+\.)','_' || column2 || '_') from table2 as t2 where t.id=t2.id [other criteria]

Oracle Substring after specific character

I already found out that I need to use substr/instr or regex, but even after reading the documentation I can't get it done.
I am on Oracle 11.2.
So here is what I have.
A list of Strings like:
743H5-34L-56
123HD34-7L
12HSS-34R
23Z67-4R-C23
What I need is the number (length 1 or 2) after the first '-', up to the following 'L' or 'R'.
Has anybody some advice?
regexp_replace(string, '^.*?-(\d+)[LR].*$', '\1')
Another version (without fancy lookarounds :-) :
with v_data as (
select '743H5-34L-56' val from dual
union all
select '123HD34-7L' val from dual
union all
select '12HSS-34R' val from dual
union all
select '23Z67-4R-C23' val from dual
)
select
val,
regexp_replace(val, '^[^-]+-(\d+)[LR].*', '\1')
from v_data
It matches
the beginning of the string "^"
one or more characters that are not a '-' "[^-]+"
followed by a '-' "-"
followed by one or more digits (capturing them in a group) "(\d+)"
followed by 'L' or 'R' "[LR]"
followed by zero or more arbitrary characters ".*"
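The breakdown above can be tried on the four sample values with a short Python sketch. It uses Python's re instead of Oracle's REGEXP_REPLACE, and the function name number_before_side is made up for illustration. One difference to keep in mind: Oracle's REGEXP_REPLACE returns the input unchanged when nothing matches, whereas this helper returns None.

```python
import re

def number_before_side(val):
    """Return the digits after the first '-' that are followed by L or R."""
    # re.match anchors at the start, so the leading "^" is implicit.
    m = re.match(r"[^-]+-(\d+)[LR]", val)
    return m.group(1) if m else None

for val in ["743H5-34L-56", "123HD34-7L", "12HSS-34R", "23Z67-4R-C23"]:
    print(val, "->", number_before_side(val))
```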

Proper way to add unescaped text from a field to a regex in postgres?

What's the proper way to add a literal text value from a field to a regex in postgres?
For example, something like this where some_field could contain invalid regex syntax if left unescaped:
where some_text ~* ('\m' || some_field || '\M');
The easiest thing to do is to use a regex to prep your string to be in a regex. Escaping non-word characters in your string should be sufficient to make it regex-safe, for example:
=> select regexp_replace('. word * and µ{', E'([^\\w\\s])', E'\\\\\\1', 'g');
regexp_replace
--------------------
\. word \* and µ\{
So something like this should work in general:
where some_text ~* (x || regexp_replace(some_field, E'([^\\w\\s])', E'\\\\\\1', 'g') || y)
where x and y are the other parts of the regex.
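The escaping step can be sketched in Python to see what it does to a value before it is embedded in a larger pattern. This is an analogue under assumptions, not the PostgreSQL call: \b replaces \m and \M, the function names escape_non_word and contains_word are hypothetical, and Python also ships re.escape for the same purpose.

```python
import re

def escape_non_word(s):
    """Backslash-escape every character that is neither a word character nor whitespace."""
    # In the replacement, "\\" emits one literal backslash and "\1" the matched character.
    return re.sub(r"([^\w\s])", r"\\\1", s)

def contains_word(text, field):
    """Case-insensitive whole-word search for the literal value of `field`."""
    pattern = r"\b" + escape_non_word(field) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(escape_non_word(". word * and µ{"))
print(contains_word("x a.c y", "a.c"))        # the dot only matches a literal dot
print(contains_word("x abc y", "a.c"))
print(contains_word("call foo(bar)", "foo(")) # unescaped, "foo(" would be an invalid pattern
```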
If the field makes up the whole pattern (i.e. no x or y above), then you could use (?q):
An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE.
and a q means that the:
rest of RE is a literal ("quoted") string, all ordinary characters
So you could use:
where some_text ~* ('(?q)' || some_field)
in this limited case, since embedded options can appear only at the start of an ARE.

searching backwards with regex

I have the following different texts
line1: SELECT column1,
line2: column2,
line3: RTRIM(LTRIM(blah1)) || ' ' || RTRIM(LTRIM(blah3)),
line4: RTRIM(LTRIM(blah3)) || ' ' || RTRIM(LTRIM(some1)) outColumn,
line5: RTRIM(LTRIM(blah3)) || ' ' || RTRIM(LTRIM(some1)) something,
line6: somelast
Below is what I want to get out of each line.
Basically, I want to start the regex search from the end of the string and keep going until the first space. I can take out the comma later on.
line1: column1
line2: column2
line3: <space> nothing found
line4: outColumn
line5: something
line6: somelast
Basically, I will be fine if I can start the regex from the end and walk towards the first space.
There will probably have to be a special case for line3, as I don't expect anything back.
I am using Groovy for this regex.
Iterate over the lines and match each line against the regex:
(?i).*(column\w+).*
The word you're looking for is captured in group 1 ($1).
I think you want:
(\w*)\s*,?$
where match group one contains the last word on the line (empty if the line does not end in a word).
Anchoring the expression to the end of the line with $ is effectively the same as starting the search from the end.
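As a quick check of the end-anchored pattern against the six sample lines, here is a small Python sketch. The question uses Groovy; this assumes only that \w, \s, ? and $ behave the same in both engines for these inputs, and the names LAST_WORD and last_word are invented for the example.

```python
import re

# Trailing word, optional whitespace, optional comma, then end of line.
LAST_WORD = re.compile(r"(\w*)\s*,?$")

def last_word(line):
    """Return the word at the end of the line, or '' if there is none."""
    m = LAST_WORD.search(line)
    return m.group(1) if m else ""

lines = [
    "SELECT column1,",
    "column2,",
    "RTRIM(LTRIM(blah1)) || ' ' || RTRIM(LTRIM(blah3)),",
    "RTRIM(LTRIM(blah3)) || ' ' || RTRIM(LTRIM(some1)) outColumn,",
    "RTRIM(LTRIM(blah3)) || ' ' || RTRIM(LTRIM(some1)) something,",
    "somelast",
]
for line in lines:
    print(repr(last_word(line)))
```

Because \w* may match nothing, line3 needs no special case: the pattern still matches at the trailing comma and group one is simply empty.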