Use Regex from a column in Redshift

I have 2 tables in Redshift; one of them has a column containing regex strings. I want to join them like so:
select *
from one o
join two t
on o.value ~ t.regex
But this query throws an error:
[Amazon](500310) Invalid operation: The pattern must be a valid UTF-8 literal character expression
Details:
-----------------------------------------------
error: The pattern must be a valid UTF-8 literal character expression
code: 8001
context:
query: 412993
location: cgx_impl.cpp:1911
process: padbmaster [pid=5211]
-----------------------------------------------;
As far as I understand from searching the docs, the right-hand side of the regex operator ~ must be a string literal.
So this would work:
select *
from one o
where o.value ~ 'regex'
And this would fail:
select *
from one o
where 'regex' ~ o.value
Is there any way around this? Anything I missed?
Thanks!

Here's a workaround I am using. Maybe it's not super fast, but it works:
First create a function:
CREATE FUNCTION is_regex_match(pattern text, s text) RETURNS BOOLEAN IMMUTABLE AS $$
import re
return True if re.search(pattern, s) else False
$$ LANGUAGE plpythonu;
Then use it like this (o.value contains a regex pattern):
select *
from one o
where is_regex_match(o.value, 'some string');
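The body of that UDF is ordinary Python, so its logic can be tried locally before the function is created. The following is a minimal sketch of a hypothetical standalone version. The None and invalid-pattern handling is an addition for illustration, assuming SQL NULLs reach the function as None; it is not part of the workaround above.

```python
import re

def is_regex_match(pattern, s):
    """Return True if `pattern` matches anywhere in `s`, else False."""
    # Assumption: SQL NULL arguments arrive as None; treat them as "no match".
    if pattern is None or s is None:
        return False
    try:
        return re.search(pattern, s) is not None
    except re.error:
        # An invalid pattern stored in the column should not abort the query.
        return False

print(is_regex_match(r"^some.*", "some string"))
print(is_regex_match("[unclosed", "some string"))
```

Whether an invalid pattern should count as "no match" or raise an error is a design choice; swallowing the error keeps one bad row from failing the whole join, at the cost of hiding bad data.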

You could try using the built-in function regexp_substr()
https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_SUBSTR.html
select *
from one o
join two t
on regexp_substr(o.value, t.regex) <> ''
Edit: added an example of a raw query.
It appears that the fields must be explicitly cast to varchar when they are built.
with fake_table as (
SELECT 'sample value'::varchar as value, '[a-z]'::varchar as pattern
)
SELECT *
, regexp_substr(value, pattern)
FROM
fake_table
WHERE
regexp_substr(value, pattern) <>''
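One thing worth checking with the `<> ''` test is a pattern that can match the empty string. REGEXP_SUBSTR is documented to return an empty string when nothing matches, so such a pattern looks the same as "no match". The sketch below uses Python's `re` module as a stand-in for the Redshift regex engine (an assumption; the two engines differ in details), and `substr_or_empty` is a hypothetical helper that mimics the documented contract.

```python
import re

def substr_or_empty(value, pattern):
    """Mimic the documented contract: matched text, or '' when nothing matches."""
    m = re.search(pattern, value)
    return m.group(0) if m else ""

# A pattern that needs at least one character behaves as expected.
print(substr_or_empty("sample value", "[a-z]") != "")

# "[a-z]*" does match "123" (with an empty match), yet the <> '' test fails.
print(re.search("[a-z]*", "123") is not None)
print(substr_or_empty("123", "[a-z]*") != "")
```

If the patterns in the column always require at least one character, the two tests agree and the distinction does not matter.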

Related

Query to fetch data for list of regular expressions given

I have to fetch data for a given list of regular expressions. The query below works for a single regular expression, but I am facing an issue with a list:
select id from res r where
(?1 is null or CAST(r.value AS TEXT) ~ cast(?1 as TEXT));
?1 is [^\d{3}\d{1,}133\d{1,}$] and it works fine.
But when I pass a list of regular expressions, [^\d{3}\d{1,}133\d{1,}$, 75$], it does not work.
If the type of ?1 is a text array (text[]) or a valid textual representation of a text array, then:
select id from res r where
?1 is null
or exists (select from unnest(?1::text[]) rx where r.value::text ~ rx);
Please note that the list of regular expressions must not be written as
[^\d{3}\d{1,}133\d{1,}$, 75$] but in text array syntax: {"^\\d{3}\\d{1,}133\\d{1,}$", 75$}
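If the array literal is built in application code, the escaping can be done mechanically. Here is a minimal sketch, assuming the patterns arrive as a Python list of strings; `to_pg_text_array` is a hypothetical helper, it does not handle NULL elements, and many drivers can bind a list as an array directly, which avoids this step entirely.

```python
def to_pg_text_array(patterns):
    """Render a list of strings as a PostgreSQL text[] literal."""
    quoted = []
    for p in patterns:
        # Inside a double-quoted array element, backslash and double quote
        # must each be escaped with a backslash.
        escaped = p.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return "{" + ",".join(quoted) + "}"

print(to_pg_text_array([r"^\d{3}\d{1,}133\d{1,}$", "75$"]))
# prints {"^\\d{3}\\d{1,}133\\d{1,}$","75$"}
```

Quoting every element is slightly more verbose than necessary (75$ is valid unquoted), but it keeps the rule uniform.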
Edit
It might be a good idea to define a function that returns true if the first argument matches any of the regular expressions in an array, something like a REGEX_IN operator, which does not exist but would be good to have.
create or replace function regex_any(needle text, haystack_rules text[])
returns boolean language sql immutable as $$
select exists (select from unnest(haystack_rules) haystack_rule where needle ~ haystack_rule);
$$;
Then your query will look like this:
select id from res r
where ?1 is null
or regex_any(r.value::text, ?1);
From the documentation, the ~ operator works against a single regular expression. You would need to update your query to work against a list of regular expressions. For example,
select id from res r where
(?1 is null or (select bool_and(r.value::text ~ x.exp) from unnest(?1) as x(exp)))
The second part of the where returns true if the column matches all regular expressions in ?1. Depending on the size of your input, you can extract unnest(?1) into a CTE.
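The answers in this thread differ in semantics: the exists / regex_any versions are true when any pattern matches, while bool_and is true only when all of them do. The sketch below illustrates the difference, using Python's `re` as a stand-in for the PostgreSQL regex engine (an assumption; the two patterns use only constructs both engines share). The function names and sample values are hypothetical.

```python
import re

def matches_any(value, patterns):
    # Counterpart of: exists (select from unnest(...) rx where value ~ rx)
    return any(re.search(p, value) for p in patterns)

def matches_all(value, patterns):
    # Counterpart of: select bool_and(value ~ rx) from unnest(...) rx
    return all(re.search(p, value) for p in patterns)

patterns = [r"^\d{3}\d{1,}133\d{1,}$", r"75$"]

for value in ["1234133975", "99975", "abc"]:
    print(value, matches_any(value, patterns), matches_all(value, patterns))
```

Here "99975" satisfies only the second pattern, so it passes the "any" test and fails the "all" test; which behaviour is wanted depends on the requirement.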

SQL pattern matching using regular expression

Can we use regex, i.e. regular expressions, in SQL Server? I'm using SQL Server 2012 and 2014, and there is a requirement to match and return input from my stored procedure.
I can't use LIKE in this situation since LIKE only returns matching words; using regex I can match a whole range of characters such as spaces, hyphens, and numbers.
Here is my SP
--Suppose XYZ P is my Search Condition
Declare @Condition varchar(50) = 'XYZ P'
CREATE PROCEDURE [dbo].[usp_MATCHNAME]
@Condition varchar(25)
as
Begin
select * from tblPerson
where UPPER(Name) like UPPER(@Condition) + '%'
-- It should return both XYZ P and xyzp
End
Here my SP is going to return all rows matching the condition where Name = XYZ P, but how do I retrieve the other rows whose Name is one of [XYZP, XYZ-P]?
And if the search condition has an alphanumeric value like
--Suppose XYZ 1 is my Search Condition
Declare @Condition varchar(50) = 'XYZ 1'
Then my search result should also return values without the space, like [XYZ1, xyz1, Xyz -1].
I don't want to use Substring to find the space, split on it, and then match.
Note: my input condition, i.e. @Condition, can contain a space, no space, or a hyphen (-) when the stored procedure is executed.
Use the REPLACE function.
It replaces each single space with %, so the query returns your expected results:
SELECT *
FROM tblPerson
WHERE UPPER(Name) LIKE REPLACE(UPPER(@Condition), ' ', '%') + '%'
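To see which names this pattern accepts, the same expression can be tried against an in-memory SQLite database. This is a rough sketch rather than SQL Server itself: SQLite concatenates with || instead of +, the sample rows are made up, and `match_name` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPerson (Name TEXT)")
conn.executemany(
    "INSERT INTO tblPerson (Name) VALUES (?)",
    [("XYZ P",), ("XYZP",), ("XYZ-P",), ("xyz p",), ("ABC",)],
)

def match_name(condition):
    # Same expression as the answer, with || in place of T-SQL's +.
    rows = conn.execute(
        "SELECT Name FROM tblPerson "
        "WHERE UPPER(Name) LIKE REPLACE(UPPER(?), ' ', '%') || '%' "
        "ORDER BY Name",
        (condition,),
    ).fetchall()
    return [r[0] for r in rows]

print(match_name("XYZ P"))   # pattern becomes XYZ%P%
print(match_name("XYZP"))    # pattern becomes XYZP%
```

With the condition 'XYZ P' every XYZ variant is returned, but a condition typed without the space ('XYZP') only finds the space-less name, and % also accepts any other characters between XYZ and P. If that matters, the space and hyphen would need to be stripped from both sides before comparing.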

Selecting for a Jsonb array contains regex match

Given a data structure as follows:
{"single":"someText", "many":["text1", "text2"]}
I can query a regex on single with
WHERE JsonBColumn ->> 'single' ~ '^some.*'
And I can query a contains match on the Array with
WHERE JsonBColumn -> 'many' ? 'text2'
What I would like to do is a contains match with a regex on the JSON array:
WHERE JsonBColumn -> 'many' {Something} '.*2$'
I found that it is also possible to convert the entire JSONB array to a plain text string and simply perform the regular expression on that. A side effect though is that a search on something like
xt 1", "text
would end up matching.
This approach isn't as clean since it doesn't search each element individually but it gets the job done with a visually simpler statement.
WHERE JsonBColumn ->>'many' ~ 'text2'
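The difference between the two approaches can be sketched outside the database. The example below assumes the jsonb array is rendered as text in the form ["text1", "text2"], which is also what Python's `json.dumps` produces by default; `re` stands in for the PostgreSQL regex engine and the helper names are hypothetical.

```python
import json
import re

many = ["text1", "text2"]

def whole_text_matches(elements, pattern):
    # Counterpart of: JsonBColumn ->> 'many' ~ pattern
    return re.search(pattern, json.dumps(elements)) is not None

def any_element_matches(elements, pattern):
    # Counterpart of testing each array element individually
    return any(re.search(pattern, e) for e in elements)

# A pattern that spans the boundary between two elements: false positive.
print(whole_text_matches(many, r'xt1", "text'), any_element_matches(many, r'xt1", "text'))

# An anchored pattern: the whole text ends with ], so it is missed.
print(whole_text_matches(many, r"2$"), any_element_matches(many, r"2$"))
```

So the plain-text shortcut is adequate for unanchored substrings that cannot contain quotes or commas, and less so for anchored patterns like the `.*2$` in the question.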
Use jsonb_array_elements_text() in a lateral join.
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select distinct on (id) d.*
from
the_data d,
jsonb_array_elements_text(jsonbcolumn->'many') many(elem)
where elem ~ '^text.*';
id | jsonbcolumn
----+----------------------------------------------------
1 | {"many": ["text1", "text2"], "single": "someText"}
(1 row)
See also this answer.
If the feature is used frequently, you may want to write your own function:
create or replace function jsonb_array_regex_like(json_array jsonb, pattern text)
returns boolean language sql as $$
select bool_or(elem ~ pattern)
from jsonb_array_elements_text(json_array) arr(elem)
$$;
The function definitely simplifies the code:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select *
from the_data
where jsonb_array_regex_like(jsonbcolumn->'many', '^text.*');

Postgres Query with Regex

I'm trying to create a regex to find (and then eventually replace) parts of strings in a PG DB. I'm using PSQL 9.0.4
I've tested my regex outside of PG and it works perfectly. However, it isn't playing well with PG. If anyone can help me understand what I'm doing wrong, it would be much appreciated.
Regex:
{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}
Postgres Query:
SELECT COUNT(*) FROM (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}') as x;
Postgres Response:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...M (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: syntax error at or near "mister_xx"
LINE 1: ..."Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_x...
In SQL, a single quote inside a string literal is escaped by doubling it, for example:
'Child''s play'
Applying this to your regex makes it work:
SELECT COUNT(*)
FROM "Table"
WHERE "Column" ~ '{php}.*\n.*\n.*''mister_xx'']\)\);.*\n} \n{\/php}';
Note also that the redundant subquery has been removed.
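The quote-doubling rule is standard SQL, so it can be checked quickly in any engine. The sketch below uses an in-memory SQLite database instead of PostgreSQL (an assumption made only so the example is self-contained); the bound-parameter variant is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two single quotes inside the literal produce one quote in the value.
literal = conn.execute("SELECT 'Child''s play'").fetchone()[0]
print(literal)

# When the text comes from application code, a bound parameter
# avoids the manual doubling altogether.
bound = conn.execute("SELECT ?", ("Child's play",)).fetchone()[0]
print(bound == literal)
```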
You need to double the backslashes and add an E before the string literal:
SELECT * FROM "Table" WHERE "Column" ~ E'{php}.*\\n.*\\n.*''mister_xx'']\\)\\);.*\\n} \\n{\\/php}'

Using regex in WHERE in Postgres

I currently have the following query:
select regexp_matches(name, 'foo') from table;
How can I rewrite this so that the regex is in the WHERE clause, like the following (not working):
select * from table where regexp_matches(name, 'foo');
Current error message is:
ERROR: argument of WHERE must be type boolean, not type text[]
SQL state: 42804
Character: 29
Write instead:
select * from table where name ~ 'foo'
The ~ operator produces a boolean result indicating whether the regex matches, rather than extracting the matching subgroups.