Using regex in WHERE in Postgres

I currently have the following query:
select regexp_matches(name, 'foo') from table;
How can I rewrite this so that the regex is in the WHERE clause, as below (which does not work)?
select * from table where regexp_matches(name, 'foo');
Current error message is:
ERROR: argument of WHERE must be type boolean, not type text[]
SQL state: 42804
Character: 29

Write instead:
select * from table where name ~ 'foo'
The ~ operator returns a boolean saying whether the regex matches, rather than extracting the matched substrings as regexp_matches does (regexp_matches returns text[], which WHERE cannot use as a condition).
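To see the boolean-test idea outside PostgreSQL, here is a minimal sketch using Python's sqlite3 module. This is an assumption-laden stand-in: SQLite has no ~ operator, but its "X REGEXP Y" syntax calls a user-supplied regexp(Y, X) function, so the hypothetical regexp() below (built on Python's re, whose regex flavor differs from PostgreSQL's in places) returns a boolean that can sit directly in WHERE.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" as regexp(pattern, value).
    # Returning a boolean lets the call act as a WHERE condition, like Postgres's ~.
    if pattern is None or value is None:
        return None  # propagate NULL, as SQL comparisons do
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("foobar",), ("bar",), ("afoo",)])

# Analogous to: select * from t where name ~ 'foo'
rows = conn.execute(
    "SELECT name FROM t WHERE name REGEXP 'foo' ORDER BY name"
).fetchall()
print(rows)  # [('afoo',), ('foobar',)]
```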

Related

Query to fetch data for list of regular expressions given

I have to fetch data for a given list of regular expressions. For a single regular expression the query below works, but for a list I am facing an issue:
select id from res r where
(?1 is null or CAST(r.value AS TEXT) ~ cast(?1 as TEXT));
With ?1 = [^\d{3}\d{1,}133\d{1,}$] it works fine.
But when I pass a list of regular expressions, [^\d{3}\d{1,}133\d{1,}$, 75$], it does not work.
If ?1 is of type text array (text[]), or is a string in valid text-array literal syntax, then:
select id from res r where
?1 is null
or exists (select from unnest(?1::text[]) rx where r.value::text ~ rx);
Please note that the text array of regular expressions must not be written as
[^\d{3}\d{1,}133\d{1,}$, 75$] but as {"^\\d{3}\\d{1,}133\\d{1,}$", 75$}
(inside an array literal a backslash escapes the next character, so \\d is needed to produce \d).
Edit
It might be a good idea to define a function that returns true if the first argument matches any of the regular expressions in an array, something like a non-existent but handy REGEX_IN operator.
create or replace function regex_any(needle text, haystack_rules text[])
returns boolean language sql immutable as $$
select exists (select from unnest(haystack_rules) haystack_rule where needle ~ haystack_rule);
$$;
Then your query will look like this:
select id from res r
where ?1 is null
or regex_any(r.value::text, ?1);
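The semantics of regex_any can be sketched in Python to check which values pass. This is an illustration only, assuming Python's re behaves like PostgreSQL's regex engine for these simple patterns (\d, {n,}, ^, $ behave the same in both). It mirrors the EXISTS form: a NULL needle or NULL array yields an empty set, so the result is false, and NULL rules are skipped because "needle ~ NULL" is NULL and gets filtered out by WHERE.

```python
import re


def regex_any(needle, haystack_rules):
    # EXISTS over an empty set is false, so NULL inputs give False, not NULL.
    if needle is None or haystack_rules is None:
        return False
    return any(
        re.search(rule, needle) is not None
        for rule in haystack_rules
        if rule is not None  # "needle ~ NULL" is NULL, which WHERE drops
    )


rules = [r"^\d{3}\d{1,}133\d{1,}$", r"75$"]
print(regex_any("1234133567", rules))  # True (first pattern)
print(regex_any("0075", rules))        # True (second pattern)
print(regex_any("abc", rules))         # False
```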
Per the documentation, the ~ operator matches against a single regular expression, so the query has to be rewritten to work against a list of them. For example:
select id from res r where
(?1 is null or (select bool_and(r.value::text ~ x.exp) from unnest(?1::text[]) as x(exp)));
The second part of the WHERE clause returns true only if the column matches all regular expressions in ?1; use bool_or instead if matching any one of them is enough. Depending on the size of your input, you can extract unnest(?1) into a CTE.
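One detail of the aggregate form worth checking is what bool_and and bool_or return. Both ignore NULL inputs and return NULL when nothing is left to aggregate, so an empty ?1 array makes the subquery NULL and the WHERE clause "false or NULL" filters every row out. The hypothetical helpers below emulate that in Python (again assuming re matches PostgreSQL for these patterns); they are a sketch, not PostgreSQL code.

```python
import re


def pg_bool_and(values):
    # Like bool_and(): ignore NULLs; NULL (None) when no non-NULL input remains.
    vals = [v for v in values if v is not None]
    return all(vals) if vals else None


def pg_bool_or(values):
    # Like bool_or(): same NULL handling, but true if any input is true.
    vals = [v for v in values if v is not None]
    return any(vals) if vals else None


def match_each(value, patterns):
    # One boolean per pattern, like "r.value::text ~ x.exp" per row of unnest(?1).
    return [re.search(p, value) is not None for p in patterns]


rules = [r"^\d{3}\d{1,}133\d{1,}$", r"75$"]
print(pg_bool_and(match_each("0075", rules)))  # False: only the second pattern matches
print(pg_bool_or(match_each("0075", rules)))   # True: at least one pattern matches
print(pg_bool_and(match_each("0075", [])))     # None: empty array, aggregate is NULL
```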

Use Regex from a column in Redshift

I have two tables in Redshift, one of which has a column containing regex strings, and I want to join them like so:
select *
from one o
join two t
on o.value ~ t.regex
But this query throws an error:
[Amazon](500310) Invalid operation: The pattern must be a valid UTF-8 literal character expression
Details:
-----------------------------------------------
error: The pattern must be a valid UTF-8 literal character expression
code: 8001
context:
query: 412993
location: cgx_impl.cpp:1911
process: padbmaster [pid=5211]
-----------------------------------------------;
As far as I understood from the docs, the right-hand side of the regex operator ~ must be a string literal.
So this would work:
select *
from one o
where o.value ~ 'regex'
And this would fail:
select *
from one o
where 'regex' ~ o.value
Is there any way around this? Anything I missed?
Thanks!
Here's a workaround I am using. Maybe it's not super fast, but it works:
First create a function:
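CREATE FUNCTION is_regex_match(pattern text, s text) RETURNS BOOLEAN IMMUTABLE AS $$
import re
# Return NULL for NULL inputs instead of letting re.search raise a TypeError
if pattern is None or s is None:
    return None
return re.search(pattern, s) is not None
$$ LANGUAGE plpythonu;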
Then use it like this (o.value contains a regex pattern):
select *
from one o
where is_regex_match(o.value, 'some string');
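The same UDF idea also covers the original join, where the pattern comes from a column of the other table. The sketch below tries it locally with SQLite through Python's sqlite3 module instead of Redshift (an assumption: different engine, same principle of a Python function doing the matching); the tables one(value) and two(regex) and their rows are hypothetical sample data.

```python
import re
import sqlite3


def is_regex_match(pattern, s):
    # Same logic as the plpythonu UDF: NULL in, NULL out; otherwise a boolean.
    if pattern is None or s is None:
        return None
    return re.search(pattern, s) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("is_regex_match", 2, is_regex_match)
conn.executescript("""
    CREATE TABLE one (value TEXT);
    CREATE TABLE two (regex TEXT);
    INSERT INTO one VALUES ('apple'), ('banana'), ('cherry');
    INSERT INTO two VALUES ('^a'), ('an');
""")

# The pattern is taken from t.regex, i.e. from a column, not a literal.
rows = conn.execute("""
    SELECT o.value, t.regex
    FROM one o
    JOIN two t ON is_regex_match(t.regex, o.value)
    ORDER BY o.value, t.regex
""").fetchall()
print(rows)  # [('apple', '^a'), ('banana', 'an')]
```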
You could try using the built-in function regexp_substr()
https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_SUBSTR.html
select *
from one o
join two t
on regexp_substr(o.value, t.regex) <> ''
Edit: added an example with a raw query.
It appears that the fields must be explicitly cast to varchar when the table is built.
with fake_table as (
SELECT 'sample value'::varchar as value, '[a-z]'::varchar as pattern
)
SELECT *
, regexp_substr(value, pattern)
FROM
fake_table
WHERE
regexp_substr(value, pattern) <> '';
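One caveat of using "regexp_substr(...) <> ''" as a match test: assuming, as that test implies, that regexp_substr returns an empty string when nothing matches, a pattern that can match the empty string (such as x*) also returns '', so such a row is rejected even though the pattern technically matches. The hypothetical Python emulation below is not Redshift's implementation; it only illustrates that edge case.

```python
import re


def regexp_substr(value, pattern):
    # Hypothetical emulation: the first match, or '' when there is no match.
    m = re.search(pattern, value)
    return m.group(0) if m else ""


print(regexp_substr("sample value", "[a-z]"))  # s
print(regexp_substr("SAMPLE", "[a-z]") != "")  # False: genuinely no match
print(regexp_substr("SAMPLE", "x*") != "")     # False, although x* matches (empty match)
```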

Selecting for a Jsonb array contains regex match

Given a data structure as follows:
{"single":"someText", "many":["text1", "text2"]}
I can query a regex on single with
WHERE JsonBColumn ->> 'single' ~ '^some.*'
And I can query a contains match on the Array with
WHERE JsonBColumn -> 'many' ? 'text2'
What I would like to do is a contains match with a regex on the JSON array:
WHERE JsonBColumn -> 'many' {Something} '.*2$'
I found that it is also possible to convert the entire JSONB array to a plain text string and simply run the regular expression on that. A side effect, though, is that a search for something like
xt1", "text
would also match, because the pattern can span two elements.
This approach isn't as clean since it doesn't test each element individually, but it gets the job done with a visually simpler statement.
WHERE JsonBColumn ->>'many' ~ 'text2'
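The false positive described above can be reproduced with a short Python sketch. It assumes that json.dumps with default separators produces the same text as PostgreSQL's jsonb output for this array (both put ", " between elements, as the psql output further down shows); it contrasts matching the whole serialized array with matching each element separately.

```python
import json
import re

doc = json.loads('{"single":"someText", "many":["text1", "text2"]}')
many = doc["many"]

# Approximates (JsonBColumn ->> 'many'): the array's JSON text.
as_text = json.dumps(many)
print(as_text)  # ["text1", "text2"]

pattern = r'xt1", "text'
whole_text_hit = re.search(pattern, as_text) is not None
element_hit = any(re.search(pattern, e) is not None for e in many)
print(whole_text_hit)  # True: the pattern spans two elements
print(element_hit)     # False: no single element matches
```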
Use jsonb_array_elements_text() in a lateral join.
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select distinct on (id) d.*
from
the_data d,
jsonb_array_elements_text(jsonbcolumn->'many') many(elem)
where elem ~ '^text.*';
id | jsonbcolumn
----+----------------------------------------------------
1 | {"many": ["text1", "text2"], "single": "someText"}
(1 row)
See also this answer.
If the feature is used frequently, you may want to write your own function:
create or replace function jsonb_array_regex_like(json_array jsonb, pattern text)
returns boolean language sql as $$
select bool_or(elem ~ pattern)
from jsonb_array_elements_text(json_array) arr(elem)
$$;
The function definitely simplifies the code:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select *
from the_data
where jsonb_array_regex_like(jsonbcolumn->'many', '^text.*');
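Because the function is built on bool_or, it returns NULL, not false, for an empty array (and for an array whose elements are all JSON null). In WHERE that behaves like false, but "not jsonb_array_regex_like(...)" would also drop those rows; wrapping the body in coalesce(..., false) is one way to change that. The Python sketch below mimics the function's return values under that reading; it is an illustration, not PostgreSQL code, and assumes re matches PostgreSQL for these patterns.

```python
import re


def jsonb_array_regex_like(elems, pattern):
    # Mirrors bool_or(elem ~ pattern): NULL elements are ignored, and the
    # result is NULL (None) when no non-NULL element remains.
    results = [re.search(pattern, e) is not None for e in elems if e is not None]
    if not results:
        return None
    return any(results)


print(jsonb_array_regex_like(["text1", "text2"], "^text.*"))  # True
print(jsonb_array_regex_like(["abc"], "^text.*"))             # False
print(jsonb_array_regex_like([], "^text.*"))                  # None
```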

Postgres Query with Regex

I'm trying to create a regex to find (and eventually replace) parts of strings in a PostgreSQL database. I'm using PostgreSQL 9.0.4.
I've tested my regex outside of PG and it works perfectly. However, it isn't playing well with PG. If anyone can help me understand what I'm doing wrong, it would be much appreciated.
Regex:
{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}
Postgres Query:
SELECT COUNT(*) FROM (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}') as x;
Postgres Response:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...M (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: syntax error at or near "mister_xx"
LINE 1: ..."Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_x...
In SQL, a single quote inside a string literal is escaped by doubling it, for example:
'Child''s play'
Applying this to your regex makes it work:
SELECT COUNT(*)
FROM "Table"
WHERE "Column" ~ '{php}.*\n.*\n.*''mister_xx'']\)\);.*\n} \n{\/php}';
Note also that the redundant subquery has been removed.
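The quote-doubling rule can be checked with a small hypothetical helper, sql_quote(), run against SQLite through Python's sqlite3 module (an assumption: SQLite rather than PostgreSQL, but both follow the standard rule for single quotes inside literals). In application code, bound parameters avoid hand-quoting altogether, as the last query shows.

```python
import sqlite3


def sql_quote(text):
    # Standard SQL string literal: wrap in single quotes, double embedded quotes.
    return "'" + text.replace("'", "''") + "'"


literal = sql_quote("Child's play")
print(literal)  # 'Child''s play'

conn = sqlite3.connect(":memory:")
round_trip = conn.execute("SELECT " + literal).fetchone()[0]
print(round_trip)  # Child's play

# Preferred in application code: let the driver bind the value.
bound = conn.execute("SELECT ?", ("Child's play",)).fetchone()[0]
print(bound)  # Child's play
```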
You need to double the backslashes and add an E before the string literal (keeping the doubled single quotes):
SELECT * FROM "Table" WHERE "Column" ~ E'{php}.*\\n.*\\n.*''mister_xx'']\\)\\);.*\\n} \\n{\\/php}';
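Turning the regex into an E'...' literal by hand is error-prone, so here is a hypothetical Python helper, pg_escape_literal(), that applies the two rules for PostgreSQL escape strings (each backslash becomes \\, each single quote becomes ''). It is a sketch to show the transformation; in real code, bound parameters or the driver's own quoting are the safer choice.

```python
def pg_escape_literal(text):
    # Build a PostgreSQL E'...' literal: double backslashes and single quotes.
    return "E'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


# The regex from the question, written as a Python raw string.
pattern = r"{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}"
print(pg_escape_literal(pattern))
# E'{php}.*\\n.*\\n.*''mister_xx'']\\)\\);.*\\n} \\n{\\/php}'
```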

How to compare Unicode characters in SQL server?

Hi, I am trying to find all rows in my SQL Server database whose text contains the character é, using the following queries:
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) the two queries return different numbers of rows, and (b) they return rows that do not contain the specified character.
Am I constructing the regular expression and comparing the Unicode character correctly?
EDIT:
The question column is stored using datatype nvarchar.
The following query gives the correct result though.
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
NB: LIKE and PATINDEX do not accept regular expressions.
In SQL Server pattern syntax, [\xE9] means "match any single character within the specified set", i.e. \, x, E or 9. Likewise [\u00E9] means \, u, 0, E or 9, which is why the two queries return different counts. So any of the following strings would match the first pattern:
"Elephant"
"axis"
"99.9"
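The bracket behaviour can be emulated in a few lines of Python to see which strings each pattern catches. like_bracket_matches() is a hypothetical helper that handles only a plain character set (no ranges, no ^ negation, no escape clause), and the case_insensitive flag is a rough stand-in for a case-insensitive collation, under which lowercase e and uppercase X would also match. Note that é itself is never matched.

```python
def like_bracket_matches(text, bracket_body, case_insensitive=False):
    # Emulates PATINDEX(N'%[<bracket_body>]%', text) > 0 for a plain set:
    # true if any character of text is one of the characters in the brackets.
    chars = set(bracket_body)
    if case_insensitive:
        chars = {c.lower() for c in chars}
        text = text.lower()
    return any(c in chars for c in text)


for s in ["Elephant", "axis", "99.9"]:
    print(s, like_bracket_matches(s, r"\xE9"), like_bracket_matches(s, r"\u00E9"))
# Elephant True True
# axis True False
# 99.9 True True

print(like_bracket_matches("caf\u00e9", r"\xE9"))  # False: the é itself is not in the set
```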