Need to form pattern for regexp_replace - regex

I have input strings like:
1.2.3.4_abc_4.2.1.44_1.3.4.23
100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4
100.11.11.22_xyz_123_10.2.1.2_1.2.3.4
I have to replace the first substring between two IP addresses. The parts are separated by _, but in some strings the _ is also part of the text to be replaced (xyz_123).
I need to find abc, xyz-abd and xyz_123 in the strings above so that I can replace them with the value of another column in that table.

_.*?_(?=\d+\.)
matches _abc_, _xyz-abd_ and _xyz_123_ in your examples. Is this working for you?
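DO $body$
DECLARE
result varchar(255);
subject text := '100.11.11.22_xyz_123_10.2.1.2_1.2.3.4';  -- sample input
BEGIN
result := regexp_replace(subject, $$_.*?_(?=\d+\.)$$, $$_foo_$$);
RAISE NOTICE '%', result;  -- result: 100.11.11.22_foo_10.2.1.2_1.2.3.4
END
$body$;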
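To check what this pattern picks out of the three sample strings outside the database, here is a small sketch that uses Python's re module as a stand-in for PostgreSQL. This is an assumption for illustration: the lazy quantifier and the lookahead give the same matches on these inputs, but the two regex engines are not identical in general. count=1 mimics regexp_replace without the 'g' flag, which replaces only the first match.

```python
import re

# Underscore, shortest run of anything, underscore, but only when what
# follows is the start of an IP address (digits followed by a dot).
MIDDLE = re.compile(r"_.*?_(?=\d+\.)")

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]

def first_middle(s):
    """Return the first _..._ token that is followed by an IP address."""
    m = MIDDLE.search(s)
    return m.group(0) if m else None

for s in samples:
    print(first_middle(s))
# _abc_
# _xyz-abd_
# _xyz_123_

# Replace only the first match, like regexp_replace without 'g'.
print(MIDDLE.sub("_foo_", samples[2], count=1))
# 100.11.11.22_foo_10.2.1.2_1.2.3.4
```

Note how "_xyz_" is rejected for the third string: it is followed by "123_", and the lookahead needs digits followed by a dot.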

Probably this is enough:
_[^.]+_
and replace with
_Replacement_
[^.]+ uses a negated character class to match a sequence of at least one (the + quantifier) non "." characters.
I am also matching a leading and a trailing "_", so you have to put it in again in the replacement string.
PostgreSQL supports lookahead constraints and, since version 9.6, lookbehind constraints as well, so it is possible to avoid the "_" in the replacement string:
(?<=_)[^.]+(?=_)
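Both variants can be compared with a quick sketch, again using Python's re as a stand-in for PostgreSQL (an illustration only; "Replacement" is a placeholder value). The first pattern consumes the underscores and has to put them back; the second only asserts them, so they survive untouched.

```python
import re

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]

# _[^.]+_ : underscore, one or more non-dot characters, underscore.
# The greedy [^.]+ backs off until an underscore follows, so
# "xyz_123" is kept together. count=1 mirrors the missing 'g' flag.
with_underscores = [re.sub(r"_[^.]+_", "_Replacement_", s, count=1) for s in samples]

# (?<=_)[^.]+(?=_) : same middle part, but the underscores are only
# checked by lookbehind/lookahead, not included in the match.
lookaround = [re.sub(r"(?<=_)[^.]+(?=_)", "Replacement", s, count=1) for s in samples]

print(lookaround[2])  # 100.11.11.22_Replacement_10.2.1.2_1.2.3.4
```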

To match the first two "_", the regex from stema and Tim Pietzcker works. What I was struggling with was putting the "_" back around the value from the other column, which can be done with the || operator, for example:
update table1 set column1 = regexp_replace(column1, '_.*?_(?=\d+\.)', '_' || column2 || '_')
To take the replacement value from another table in the update query, the following example can help:
update table1 as t set column1 = regexp_replace(t.column1, '_.*?_(?=\d+\.)', '_' || t2.column2 || '_') from table2 as t2 where t.id = t2.id [other criteria]
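The UPDATE evaluates the replacement expression once per row. The same per-row idea can be sketched in Python, with a hypothetical list of (column1, column2) pairs standing in for table1. A function is passed as the replacement so that backslashes in column2 are inserted literally; PostgreSQL's regexp_replace also gives \1 and \& in the replacement string a special meaning, so column2 values containing backslashes could be misinterpreted there as well.

```python
import re

MIDDLE = re.compile(r"_.*?_(?=\d+\.)")

# Hypothetical rows standing in for table1: (column1, column2).
rows = [
    ("1.2.3.4_abc_4.2.1.44_1.3.4.23", "new1"),
    ("100.11.11.22_xyz_123_10.2.1.2_1.2.3.4", r"a\1b"),
]

def update_row(column1, column2):
    # Equivalent of '_' || column2 || '_' as the replacement. Using a
    # lambda keeps column2 literal, so "\1" is not read as a group reference.
    return MIDDLE.sub(lambda m: "_" + column2 + "_", column1, count=1)

updated = [update_row(c1, c2) for c1, c2 in rows]
for value in updated:
    print(value)
```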

Related

Postgres Regex Negative Lookahead

Scenario: Match any string that starts with "J01" except the string "J01FA09".
I'm baffled why the following code returns nothing:
SELECT 1
WHERE
'^J01(?!FA09).*' ~ 'J01FA10'
when I can see on regexr.com that it's working (I realize there are different flavors of regex and that could be the reason for the site working).
I have confirmed in the Postgres documentation that negative lookaheads are supported, though:
Table 9-15. Regular Expression Constraints
(?!re) negative lookahead matches at any point where no substring
matching re begins (AREs only). Lookahead constraints cannot contain
back references (see Section 9.7.3.3), and all parentheses within them
are considered non-capturing.
Match any string that starts with "J01" except the string "J01FA09".
You can do without a regex using
WHERE s LIKE 'J01%' AND s != 'J01FA09'
Here, LIKE 'J01%' requires the string to start with J01, followed by any characters, and s != 'J01FA09' filters out that one exact value.
If you want to achieve the same with a regex, use
WHERE s ~ '^J01(?!FA09$)'
The ^ matches the start of the string, J01 matches the literal J01 substring, and (?!FA09$) asserts that right after J01 there is no FA09 followed by the end of the string. If FA09 appears there and the string ends right after it, no match is returned.
Demo:
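The demo table can be mirrored in Python to compare the regex filter with the LIKE filter. This is a sketch with Python's re standing in for PostgreSQL; re.search is used because ~ also matches anywhere in the string unless the pattern is anchored. The extra "J01FA091" value is an addition of this sketch, to show why the $ inside the lookahead matters.

```python
import re

rows = ["J01NNN", "J01FFF", "J01FA09", "J02FA09"]

# ^J01 anchors at the start; (?!FA09$) rejects the string only when
# FA09 is the entire rest of it.
NOT_J01FA09 = re.compile(r"^J01(?!FA09$)")

regex_result = [s for s in rows if NOT_J01FA09.search(s)]
like_result = [s for s in rows if s.startswith("J01") and s != "J01FA09"]

print(regex_result)  # ['J01NNN', 'J01FFF']

# Without the $, (?!FA09) would also reject "J01FA091"; with it, the
# value is kept because FA09 is not followed by the end of the string.
print(NOT_J01FA09.search("J01FA091") is not None)  # True
```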
CREATE TABLE table1
(s character varying)
;
INSERT INTO table1
(s)
VALUES
('J01NNN'),
('J01FFF'),
('J01FA09'),
('J02FA09')
;
SELECT * FROM table1 WHERE s ~ '^J01(?!FA09$)';
SELECT * FROM table1 WHERE s LIKE 'J01%' AND s != 'J01FA09';
Also note that the regular expression must be the right-hand operand of ~; the query in the question has the operands swapped:
SELECT 1
WHERE 'J01FA10' ~ '^J01(?!FA09)';
?column?
----------
1
(1 row)
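Python's re.search takes (pattern, string), which makes the question's mix-up easy to reproduce. In this sketch (Python standing in for PostgreSQL), the swapped call treats 'J01FA10' as the pattern and looks for it inside the regex source text, where it does not occur.

```python
import re

pattern = r"^J01(?!FA09).*"
text = "J01FA10"

# Correct order, like 'J01FA10' ~ '^J01(?!FA09)': finds a match.
correct = re.search(pattern, text)

# Swapped order, like '^J01(?!FA09).*' ~ 'J01FA10': the literal
# "J01FA10" is searched for inside the regex text and is not there.
swapped = re.search(text, pattern)

print(correct is not None, swapped is None)  # True True
```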

replace regex does not work in postgresql

I have a table with a string column. Some of the strings contain single quotes, and I want to get rid of all of them. For example:
"''hey, hey, we're the monkees''"
My regex works perfectly and selects all the values containing single quotes:
select regexp_replace(colName, '%''%', '') from tblName;
but it does not update my table when I use it to replace the matches with nothing:
UPDATE tblName SET colName = regexp_replace(colName, '%''%', '');
I also tried this:
UPDATE tblName SET colName = replace(colName, '%''%', '');
Different functions and operators in Postgres use one of three different pattern matching languages, as described in a dedicated section of the manual.
The % form you are using here is the SQL LIKE syntax, where % represents "any number of any character". But the function you are using, regexp_replace, expects a POSIX regular expression, where the equivalent would be .* (. meaning any character, * meaning repeat zero or more times).
Also note that LIKE expressions have to match the whole string, but a POSIX regex doesn't, unless you explicitly match the start of the string with ^ and the end with $.
So the direct translation of '%''%' would be '^.*''.*$', giving you this:
UPDATE tblName SET colName = regexp_replace(colName, '^.*''.*$', '');
In practice, this would give the same effect as the simpler:
UPDATE tblName SET colname='' WHERE colname LIKE '%''%';
Your actual use case is much simpler: you want to replace all occurrences of a fixed string (', which will need to be quoted and escaped as '''') with another fixed string (the empty string, written ''). So you don't need any pattern matching at all, just straight replacement using replace:
UPDATE tblName SET colname=replace(colname, '''', '');
This will probably be faster if you limit it to rows that contain an apostrophe to begin with:
UPDATE tblName SET colname=replace(colname, '''', '') WHERE colname LIKE '%''%';
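The difference between a pattern and a fixed string is easy to see in a Python sketch, with re.sub and str.replace standing in for regexp_replace and replace (an illustration, not PostgreSQL itself). The SQL literal '%''%' is the three-character text %'%, and as a regex that only matches a literal percent sign, quote, percent sign.

```python
import re

value = "''hey, hey, we're the monkees''"

# As a regex, %'% is literal text; the value contains no "%'%",
# so nothing is replaced and the string comes back unchanged.
unchanged = re.sub(r"%'%", "", value)

# Fixed-string replacement, as in replace(colname, '''', ''):
# every apostrophe is removed.
cleaned = value.replace("'", "")

print(unchanged == value)  # True
print(cleaned)             # hey, hey, were the monkees
```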
% is not a special character in a regular expression; it just matches a literal %.
Try this:
select regexp_replace(colName, $$'$$, '','g') from tblName;
($$ is used to surround the string instead of ' to simplify the query.)
(The 'g' flag makes the function continue after the first quote is found, so every quote is removed.)
UPDATE tblName SET colName = regexp_replace(colName, $$'$$, '','g');
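Python's re.sub has the opposite default: it replaces every match unless a count is given. A short sketch of the two behaviours on the question's sample value, with count=1 playing the role of regexp_replace without 'g':

```python
import re

value = "''hey, hey, we're the monkees''"

# Every match replaced, like regexp_replace(colName, $$'$$, '', 'g').
all_removed = re.sub("'", "", value)

# Only the first match replaced, like regexp_replace without 'g'.
first_removed = re.sub("'", "", value, count=1)

print(all_removed)    # hey, hey, were the monkees
print(first_removed)  # 'hey, hey, we're the monkees''
```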

How to perform operations on a selected piece of string after regex in clojure

Base String:
SELECT (sum([column.one]) / sum([column.two])) AS [sum / sum], [column.three] AS [column.three] FROM [database.table] GROUP BY [column.three] ORDER BY [column.three] ASC
Resultant String:
SELECT (sum([column.one]) / sum([column.two])) AS [sum___sum], [column.three] AS [column.three] FROM [database.table] GROUP BY [column.three] ORDER BY [column.three] ASC
Here [sum / sum] could change to some other format like [sum * distinct] or [max + min - distinct]
What I have till now:
Replace all the values with [] around them with _:
(s/replace sql #"\[(.*?)\]" "_")
What I am trying:
If I can get the value that got matched, I can replace all special characters except dot (.) with an underscore.
(s/replace sql #"\[(.*?)\]" #(s/replace "$1" #"[\/\*\-\+\(\)\\\s]" "_"))
More clarity:
In short, anything inside [] can only be a combination of alphanumeric, dots, and underscores. Otherwise replace that character with underscore (_).
[Repeating my answer from comments]
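In this case "$1" is not valid syntax.
You are replacing something in the literal string "$1", not in the matched string. Operate on the match that the first replace passes to your function instead: just replace "$1" with (second %). Because the pattern has a capture group, % is a vector of the whole match and the group, so (second %) is the text inside the brackets. The function's result replaces the whole match, brackets included, so wrap it in "[" and "]" if you want to keep them.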
An uglier way would be plain string splitting: use subs to cut the string into a first part and a second part, then put "sum___sum" between them.
That is quite simple if the part to be replaced always follows the first "AS [" in your SQL query string. You can use that to find the right position with index-of, and then you don't need the regex at all.
As mentioned earlier, inserting strings directly into the query may open your database to SQL injection attacks.
A better way is to use parameters in your original query or to create the query as a prepared statement.
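The callback approach can be sketched in Python, where re.sub also accepts a function that receives each match (m.group(1) playing the role of (second %) in Clojure). This is a stand-in, not Clojure code; the allowed set (letters, digits, dots, underscores) comes from the question, and the brackets are added back because the callback's result replaces the whole match.

```python
import re

sql = ("SELECT (sum([column.one]) / sum([column.two])) AS [sum / sum], "
       "[column.three] AS [column.three] FROM [database.table] "
       "GROUP BY [column.three] ORDER BY [column.three] ASC")

BRACKETED = re.compile(r"\[(.*?)\]")       # lazy: one [...] at a time
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9._]")  # anything else becomes "_"

def clean_identifier(m):
    # m.group(1) is the text inside the brackets.
    inner = NOT_ALLOWED.sub("_", m.group(1))
    # The return value replaces the whole match, so re-add the brackets.
    return "[" + inner + "]"

result = BRACKETED.sub(clean_identifier, sql)
print(result)

print(BRACKETED.sub(clean_identifier, "[max + min - distinct]"))
# [max___min___distinct]
```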

Proper way to add unescaped text from a field to a regex in postgres?

What's the proper way to add a literal text value from a field to a regex in postgres?
For example, something like this where some_field could contain invalid regex syntax if left unescaped:
where some_text ~* ('\m' || some_field || '\M');
The easiest thing to do is to use a regex to prep your string to be in a regex. Escaping non-word characters in your string should be sufficient to make it regex-safe, for example:
=> select regexp_replace('. word * and µ{', E'([^\\w\\s])', E'\\\\\\1', 'g');
regexp_replace
--------------------
\. word \* and µ\{
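So something like this should work in general (the parentheses matter, because ~* and || have the same precedence and are evaluated left to right):
where some_text ~* (x || regexp_replace(some_field, E'([^\\w\\s])', E'\\\\\\1', 'g') || y)
where x and y are the other parts of the regex.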
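The same "backslash every non-word character" idea can be sketched in Python (a stand-in for illustration; Python also ships re.escape, which does this job directly). Here \b plays roughly the role of PostgreSQL's \m and \M word boundaries, and re.IGNORECASE the role of ~*.

```python
import re

def escape_non_word(text):
    # Same idea as regexp_replace(s, E'([^\\w\\s])', E'\\\\\\1', 'g'):
    # put a backslash in front of every character that is neither a
    # word character nor whitespace.
    return re.sub(r"([^\w\s])", r"\\\1", text)

def contains_word(haystack, needle):
    # needle is treated as literal text, surrounded by word boundaries.
    pattern = r"\b" + escape_non_word(needle) + r"\b"
    return re.search(pattern, haystack, re.IGNORECASE) is not None

print(escape_non_word(". word * and µ{"))  # \. word \* and µ\{
print(contains_word("see a.b here", "a.b"))  # True
print(contains_word("see axb here", "a.b"))  # False: the dot is literal now
```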
If the field is the entire pattern (no x and no y above), then you could use (?q). Embedded options can only appear at the very start of an ARE, so this does not combine with a prefix such as \m:
An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE.
and a q means that the:
rest of RE is a literal ("quoted") string, all ordinary characters
So you could use:
where some_text ~* ('(?q)' || some_field)
in this limited case.

postgres regexp_replace want to allow only a-z and A-Z

A string column in my table can contain numbers, special characters and whitespace.
I want to replace numbers, special characters and whitespace with an empty string. I see there is a function named regexp_replace, but there is not much user-friendly help on how to use it. For example, I want to use the following string:
String = 'abc$wanto&toremove#special~chars'
I want to remove all special characters and numbers from the string above and allow only a-z and A-Z; the rest of the characters should be replaced with ''. How can I do that?
SELECT regexp_replace('abc$wanto&toremove#special~chars', '[^a-zA-Z]', '', 'g');
regexp_replace
------------------------------
abcwantotoremovespecialchars
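The same negated character class can be tried outside the database with a Python sketch (Python's re.sub replaces every match by default, which matches the 'g' flag used above):

```python
import re

def letters_only(text):
    # Remove every character that is not an ASCII letter, like
    # regexp_replace(text, '[^a-zA-Z]', '', 'g') in PostgreSQL.
    return re.sub(r"[^a-zA-Z]", "", text)

print(letters_only("abc$wanto&toremove#special~chars"))  # abcwantotoremovespecialchars
print(letters_only("Well- This Did-Not work&*($%%)_"))   # WellThisDidNotwork
```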
For me the following worked:
regexp_replace(code, '[^a-zA-Z0-9]+', '','g')
The 'g' (global) flag makes the replacement repeat across the entire string instead of stopping after the first match.
Example:
SELECT regexp_replace('Well- This Did-Not work&*($%%)_', '[^a-zA-Z0-9]+', '')
Returns: "WellThis Did-Not work&*($%%)_"
SELECT regexp_replace('Well- This Did-Not work&*($%%)_', '[^a-zA-Z0-9]+', '','g')
Returns: "WellThisDidNotwork"
The second result has all the characters we don't want removed.
To make it simpler, use the POSIX class [:alpha:]:
regexp_replace('abc$wanto&toremove#special~chars', '[^[:alpha:]]', '', 'g')
If you want to replace each unwanted character with the closest ordinary character instead (for example, accented letters with plain ones), you can do something like this:
select
translate(
lower( name ), ' ''àáâãäéèëêíìïîóòõöôúùüûçÇ', '--aaaaaeeeeiiiiooooouuuucc'
) as new_name,
name
from cities;
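translate() pairs characters by position: the n-th character of the second argument replaces the n-th character of the first. Python's str.maketrans works the same way, so the mapping can be sketched and checked like this (a Python stand-in; the sample city names are made up for the example):

```python
FROM = " 'àáâãäéèëêíìïîóòõöôúùüûçÇ"
TO = "--aaaaaeeeeiiiiooooouuuucc"

# str.maketrans requires both strings to have the same length and maps
# each character in FROM to the character at the same position in TO.
TABLE = str.maketrans(FROM, TO)

def simplify(name):
    # lower() first, as in translate(lower(name), ...).
    return name.lower().translate(TABLE)

print(simplify("São Paulo"))          # sao-paulo
print(simplify("Conceição d'Abreu"))  # conceicao-d-abreu
```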
Should be (the 'g' flag is needed, otherwise only the first run of unwanted characters is removed):
regexp_replace('abc$wanto&toremove#special~chars', '[^a-zA-Z]+', '', 'g')