Postgres Query with Regex

I'm trying to create a regex to find (and then eventually replace) parts of strings in a PG DB. I'm using PSQL 9.0.4.
I've tested my regex outside of PG and it works perfectly. However, it isn't playing well with PG. If anyone can help me understand what I'm doing wrong, it would be much appreciated.
Regex:
{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}
Postgres Query:
SELECT COUNT(*) FROM (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}') as x;
Postgres Response:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...M (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: syntax error at or near "mister_xx"
LINE 1: ..."Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_x...

In SQL, a single quote inside a string literal is escaped by writing two single quotes, for example:
'Child''s play'
Applying this to your regex makes it work:
SELECT COUNT(*)
FROM "Table"
WHERE "Column" ~ '{php}.*\n.*\n.*''mister_xx'']\)\);.*\n} \n{\/php}';
Note also that the redundant subquery has been removed.

You need to double the backslashes, double the single quotes, and put an E before the string literal:
SELECT * FROM "Table" WHERE "Column" ~ E'{php}.*\\n.*\\n.*''mister_xx'']\\)\\);.*\\n} \\n{\\/php}'
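Both answers come down to two quoting rules: a single quote inside a SQL literal is doubled, and inside an E'...' literal every backslash is doubled as well. A minimal Python sketch of that transformation follows. The helper name to_pg_escape_literal is made up for illustration and is not part of PostgreSQL or any driver; in application code, bound parameters from your driver are usually the safer route.

```python
def to_pg_escape_literal(raw: str) -> str:
    """Render raw text as a PostgreSQL E'...' literal (hypothetical helper)."""
    # Inside E'...', a backslash starts an escape, so each one is doubled.
    escaped = raw.replace("\\", "\\\\")
    # A single quote inside any SQL string literal is written as two quotes.
    escaped = escaped.replace("'", "''")
    return "E'" + escaped + "'"


# The regex from the question, exactly as the regex engine should see it.
pattern = r"{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}"
print(to_pg_escape_literal(pattern))
```

The printed literal can be pasted after the ~ operator; the server undoes the doubling before the regex engine sees the pattern.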

Related

BigQuery regexp replace character between quotes

I'm trying to use the BigQuery function regexp_replace for the following scenario:
Given a string field with comma as a delimiter, I need to only remove the commas within double quotes.
I found the following regex to work on the website, but it seems that the BigQuery function doesn't support lookahead groups. Could you please help me find an equivalent expression that is supported by the BigQuery function regexp_replace?
https://regex101.com/r/nxkqtb/3
BigQuery example code (not supported):
WITH tbl AS (
SELECT 'LINE_NR="1",TXT_FIELD="Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"' as text
)
SELECT
REGEXP_REPLACE(text, r'(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)', "")
FROM tbl;
Thank you
Consider the approach below (assuming you know the keys within the text field in advance):
select text,
( select string_agg(replace(kv, ',', ''), ',' order by offset)
from unnest(regexp_extract_all(text, r'((?:LINE_NR|TXT_FIELD|CID)=".*?")')) kv with offset
) corrected_text
from tbl;
Applied to the sample data in your question, this removes the commas inside the quoted values and keeps the commas that separate the key="value" pairs.
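To check the logic outside BigQuery, the same extract, clean, and rejoin steps can be sketched in Python. This is only an illustration of the idea, not BigQuery code: re.findall stands in for regexp_extract_all, and the function name strip_quoted_commas is an assumption. Like the query, it relies on the keys being known and on values not containing double quotes.

```python
import re

# Keys are assumed to be known in advance, as in the answer above.
KEY_VALUE = re.compile(r'(?:LINE_NR|TXT_FIELD|CID)=".*?"')


def strip_quoted_commas(text: str) -> str:
    """Remove commas inside quoted values, keep the commas between pairs."""
    pairs = KEY_VALUE.findall(text)  # plays the role of regexp_extract_all
    # Drop the commas inside each pair, then rejoin with the delimiter.
    return ",".join(pair.replace(",", "") for pair in pairs)


rows = [
    'LINE_NR="1",TXT_FIELD="Some text",CID="0"',
    'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"',
    'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"',
    'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"',
]
for row in rows:
    print(strip_quoted_commas(row))
```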

Postgres search a char X but not XX

Hi, I'm trying to find strings in a Postgres table that contain '=' but not '=='. If I use the following search
SELECT * FROM someTable where someColumn ~ ' R ';
I find all strings with R. But I want to exclude the ones that contain RR; however, if a string is 'something R other RR other', I would like to have it in the result.
Can you give me some tips on how to resolve this?
Thanks.
You can try something like this: SELECT * FROM someTable where someColumn ~ ' R[^R] ';
This should match any R that is not followed by another R.
If you want to use a regex, the word-boundary escape \y can be used here:
select * from your_table where s ~ '\yR\y';
See PostgreSQL documentation:
\y matches only at the beginning or end of a word
See an online test:
CREATE TABLE table1
(s character varying)
;
INSERT INTO table1
(s)
VALUES
('R'),
('that are RR'),
('that are R')
;
select * from table1 where s ~ '\yR\y';
Output:
s
1 R
2 that are R
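The two suggested patterns behave differently, which is easy to see in a quick Python check. This is a sketch under one assumption: Python's \b is used as a stand-in for Postgres's \y, since both are word-boundary assertions but belong to different regex engines. The fourth row is the example string from the question.

```python
import re

rows = ["R", "that are RR", "that are R", "something R other RR other"]

# \b is Python's word-boundary assertion, used here in place of Postgres's \y.
standalone_r = re.compile(r"\bR\b")
boundary_matches = [s for s in rows if standalone_r.search(s)]
print(boundary_matches)

# The ' R[^R] ' pattern needs one more character plus a space after the R,
# so it misses a lone R at the end of a string or one followed by a word.
strict = re.compile(r" R[^R] ")
strict_matches = [s for s in rows if strict.search(s)]
print(strict_matches)
```

The word-boundary version keeps every row that contains a standalone R, including the mixed 'something R other RR other' case, while the character-class version matches none of these rows.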

Use Regex from a column in Redshift

I have 2 tables in Redshift; one of them has a column containing regex strings, and I want to join them like so:
select *
from one o
join two t
on o.value ~ t.regex
But this query throws an error:
[Amazon](500310) Invalid operation: The pattern must be a valid UTF-8 literal character expression
Details:
-----------------------------------------------
error: The pattern must be a valid UTF-8 literal character expression
code: 8001
context:
query: 412993
location: cgx_impl.cpp:1911
process: padbmaster [pid=5211]
-----------------------------------------------;
As far as I understood from searching in the docs, the right side of a regex operator ~ must be a string literal.
So this would work:
select *
from one o
where o.value ~ 'regex'
And this would fail:
select *
from one o
where 'regex' ~ o.value
Is there any way around this? Anything I missed?
Thanks!
Here's a workaround I am using. Maybe it's not super fast, but it works:
First create a function:
CREATE FUNCTION is_regex_match(pattern text, s text) RETURNS BOOLEAN IMMUTABLE AS $$
    # True when the pattern matches anywhere in s
    import re
    return re.search(pattern, s) is not None
$$ LANGUAGE plpythonu;
Then use it like this (o.value contains a regex pattern):
select *
from one o
where is_regex_match(o.value, 'some string');
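The UDF body is plain Python, so its behavior can be tried locally before creating the function. The sketch below reuses the same check and emulates the intended join with a nested loop; the lists one and two are made-up stand-ins for the two tables, not real data.

```python
import re


def is_regex_match(pattern: str, s: str) -> bool:
    """Same check as the UDF body: True when pattern matches anywhere in s."""
    return re.search(pattern, s) is not None


# Hypothetical stand-ins for tables "one" (values) and "two" (patterns).
one = ["abc123", "hello", "2024-01-01"]
two = [r"\d+", r"^h.*o$"]

# Nested-loop equivalent of: select * from one o join two t on o.value ~ t.regex
joined = [
    (value, regex)
    for value in one
    for regex in two
    if is_regex_match(regex, value)
]
print(joined)
```

Every value is tested against every pattern, which is why this kind of join is not fast on large tables.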
You could try using the built-in function regexp_substr()
https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_SUBSTR.html
select *
from one o
join two t
on regexp_substr(o.value, t.regex) <> ''
Edit: added an example of a raw query.
It appears that the fields must be explicitly cast to varchar when they are built.
with fake_table as (
SELECT 'sample value'::varchar as value, '[a-z]'::varchar as pattern
)
SELECT *
, regexp_substr(value, pattern)
FROM
fake_table
WHERE
regexp_substr(value, pattern) <>''

Regex Syntax Misuse

I am running this query:
Select column_name
from table
where column_name ~ '%[A-Za-z]%'
group by column_name
but I am not getting any results. What am I doing wrong?
Goal: This is a column that includes phone numbers. I am trying to find any values that contain string characters.
I don't understand why ilike does not support regex
This is what I found here
The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.
Doesn't this mean I can use ilike by using ~~*?
Edit: What I have learned so far
Don't use like, use tilde.
Where column_name ~ '%[A]%' does not work.
Where column_name ~ $$[A]$$ does work.
Theory: It has something to do with the dollar signs or the apostrophe.
Result: It was the % signs.
According to RegexBuddy, the correct syntax for the WHERE clause is:
WHERE mycolumn ~ $$[A-Z]$$
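The reason the % signs break the query is that % is a wildcard only for LIKE and ILIKE; in a regex it is an ordinary character, and an unanchored regex already means "contains". A short Python sketch shows the same effect (Python's re module is used as a stand-in for the Postgres regex engine, and the sample values are made up):

```python
import re

values = ["555-1234", "555-CALL", "ext 42", "%A%"]

# In a regex, % is an ordinary character, so this only matches a literal
# percent sign, a letter, and another percent sign.
with_percent = [v for v in values if re.search(r"%[A-Za-z]%", v)]

# Unanchored regex search already means "contains", so no wildcards are needed.
letters_only = [v for v in values if re.search(r"[A-Za-z]", v)]
print(with_percent, letters_only)
```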

escaping bracket in postgresql query

I am trying to escape a bracket in a pattern matching expression for PostgreSQL 8.2
The clause looks something like:
WHERE field SIMILAR TO '%UPC=\[ R%%(\mLE)%'
but I keep getting:
ERROR: invalid regular expression: brackets [] not balanced
Try this:
select '%UPC=\[ R%%(\mLE)%';
WARNING: nonstandard use of escape in a string literal
LINE 1: select '%UPC=\[ R%%(\mLE)%';
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
?column?
------------------
%UPC=[ R%%(mLE)%
(1 row)
You need to switch Postgres to standard-conforming strings mode instead of backward-compatible mode.
set standard_conforming_strings=1;
select '%UPC=\[ R%%(\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
Or you need to use the escape string syntax, which works regardless of the mode:
set standard_conforming_strings=1;
select E'%UPC=\\[ R%%(\\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
set standard_conforming_strings=0;
select E'%UPC=\\[ R%%(\\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
You can change this setting in postgresql.conf for all databases, with ALTER DATABASE for a single database, with ALTER USER for a single user or group of users, or with SET for the current connection.
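What happens to the backslashes can be modeled with a small Python function. This is a simplified sketch of the backslash processing applied to E'...' literals and to plain literals in backward-compatible mode, not the server's actual implementation; octal, hex, and Unicode escapes are left out, and the name decode_backslashes is made up for illustration.

```python
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def decode_backslashes(body: str) -> str:
    """Simplified model of backslash processing in E'...' or legacy strings.

    Octal, hex and Unicode escapes are deliberately left out of this sketch.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escapes map to control characters; any other character
            # after a backslash is taken literally, so the backslash vanishes.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(decode_backslashes(r"%UPC=\[ R%%(\mLE)%"))
print(decode_backslashes(r"%UPC=\\[ R%%(\\mLE)%"))
```

With single backslashes the pattern loses its escapes before it reaches the regex engine, which is what leaves the unbalanced bracket; with doubled backslashes the engine receives the pattern as intended.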