How to use regular expression with ANY array operator - regex

I have a column containing an array of authors. How can I use the ~* operator to check if any of its values match a given regular expression?
The ~* operator takes the string to check on the left and the regular expression to match on the right. The documentation says the ANY operator has to be on the right side so, obviously
SELECT '^p' ~* ANY(authors) FROM book;
does not work as PostgreSQL tries to match the string ^p against expressions contained in the array.
Any idea?

The first obvious idea is to use your own regexp-matching operator with commuted arguments:
create function commuted_regexp_match(text,text) returns bool as
'select $2 ~* $1;'
language sql;
create operator ~!## (
procedure=commuted_regexp_match(text,text),
leftarg=text, rightarg=text
);
Then you may use it like this:
SELECT '^p' ~!## ANY(authors) FROM book;
Another different way of looking at it to unnest the array and formulate in SQL the equivalent of the ANY construct:
select bool_or(r) from
(select author ~* '^j' as r
from (select unnest(authors) as author from book) s1
) s2;

SELECT * FROM book where EXISTS ( SELECT * from unnest(author) as X where x ~* '^p' )

Here's an idea if you can make reasonable assumptions about the data. Just concatenate the array into a string and do a regex-search against the whole string.
select array_to_string(ARRAY['foo bar', 'moo cow'], ',') ~ 'foo'
Off the cuff, and without any measurements to back me up, I would say that most performance issues related to the regex stuff could be dealt with by smart uses of regex, and maybe some special delimiter characters. Creating the string may be a performance issue, but I wouldn't even dare to speculate on that.

I use this:
create or replace function regexp_match_array(a text[], regexp text)
returns boolean
strict immutable
language sql as $_$
select exists (select * from unnest(a) as x where x ~ regexp);
$_$;
comment on function regexp_match_array(text[], text) is
'returns TRUE if any element of a matches regexp';
create operator ~ (
procedure=regexp_match_array,
leftarg=text[], rightarg=text
);
comment on operator ~(text[], text) is
'returns TRUE if any element of ARRAY (left) matches REGEXP (right); think ANY(ARRAY) ~ REGEXP';
Then use it much like you'd use ~ with text scalars:
=> select distinct gl from x where gl ~ 'SH' and array_length(gl,1) < 7;
┌──────────────────────────────────────┐
│ gl │
├──────────────────────────────────────┤
│ {MSH6} │
│ {EPCAM,MLH1,MSH2,MSH6,PMS2} │
│ {SH3TC2} │
│ {SHOC2} │
│ {BRAF,KRAS,MAP2K1,MAP2K2,SHOC2,SOS1} │
│ {MSH2} │
└──────────────────────────────────────┘
(6 rows)

You can define your own operator to do what you want.
Reverse the order of the arguments and call the appropriate function :
create function revreg (text, text) returns boolean
language sql immutable
as $$ select texticregexeq($2,$1); $$;
(revreg ... please choose your favorite name).
Add a new operator using our revreg() function :
CREATE OPERATOR ### (
PROCEDURE = revreg,
LEFTARG = text,
RIGHTARG = text
);
Test:
test=# SELECT '^p' ### ANY(ARRAY['ika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['ika', 'chu']);
f
test=# SELECT '^p' ### ANY(ARRAY['pika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['pika', 'chu']);
t
Note that you may want to set JOIN and RESTICT clauses to the new operator to help the planner.

My solution
SELECT a.* FROM books a
CROSS JOIN LATERAL (
SELECT author
FROM unnest(authors) author
WHERE author ~ E'p$'
LIMIT 1
)b;
Use cross lateral join, subquery is evaluated for every row of table "books", if one of rows returned by unnest, meets the condition, subquery returns one row (becouse of limit).

I use a generalization of Reece's approach:
select format($$
create function %1$s(a text[], regexp text) returns boolean
strict immutable language sql as
%2$L;
create operator %3$s (procedure=%1$s, leftarg=text[], rightarg=text);
$$, /*1*/nameprefix||'_array_'||oname, /*2*/q1||o||q2, /*3*/oprefix||o
)
from (values
('tilde' , '~' ), ('bang_tilde' , '!~' ),
('tilde_star' , '~*' ), ('bang_tilde_star' , '!~*' ),
('dtilde' , '~~' ), ('bang_dtilde' , '!~~' ),
('dtilde_star', '~~*'), ('bang_dtilde_star', '!~~*')
) as _(oname, o),
(values
('any', '', 'select exists (select * from unnest(a) as x where x ', ' regexp);'),
('all', '#', 'select true = all (select x ', ' regexp from unnest(a) as x);')
) as _2(nameprefix, oprefix, q1, q2)
\gexec
Executing this in psql creates 16 functions and 16 operators that cover all applicable 8 matching operators for arrays -- plus 8 variations prefixed with # that implement the ALL equivalent.
Very handy!

Related

sqlite valid email input [duplicate]

I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end

PostgreSQL 9.4 UPDATE...FROM error with REGEX operators

I'm trying to use a regex pattern matching with PostgreSQL 9.4:
Have looked through previous answers but nothing I can find matches this particular problem
select 'apple' ~ '^.*pp.*$' returns 't' as expected
update <table> set column = 'value' where name ~* '^.*pp.*$' also works.
But:
update <table> set column = 'value' from <other_table> where name ~* '^.*pp.*$' produces an error:
The specific example:
update
members set
pattern = a.pattern
from
services a
where
organisation ~* '^.*' || replace(a.pattern, ' ', '.*') || '.*$';
ERROR: argument of WHERE must be type boolean, not type text
LINE 1: ...attern = a.pattern from services a where organisati...
It seems the where clause after the FROM table in the update is not recognising or processing the regex operator correctly.
Or, equally probably, I'm misunderstanding the UPDATE...FROM syntax
Many thanks if you can help
you are missing brackets around string expressions. These operators (~ and ||) has same priority and then are evaluated from left.
postgres=# update foo set b = a where a ~ 'ab';
UPDATE 1
postgres=# update foo set b = a where a ~ 'ab' || 'xxxx';
ERROR: argument of WHERE must be type boolean, not type text
LINE 1: update foo set b = a where a ~ 'ab' || 'xxxx';
^
postgres=# update foo set b = a where a ~ ('ab' || 'xxxx');
UPDATE 0

How to get the more than one mached keywords using regexp_matches

How to get the more than one matched keywords in a given string.
Please find the below query.
SELECT regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING');
output: "{BAKERY}"
the above scenario given string is matched with two keywords.
when i execute the above query getting only one keyword only.
How to get other matched keywords.
g is a global search flag using in regex.Is used to get all the matching strings
select regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g')
regexp_matches
text[]
--------------
{BAKERY}
{BAKING}
to get the result as a single row :
SELECT ARRAY(select array_to_string(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'),''));
array
text[]
---------------
{BAKERY,BAKING}
by using unnest - to convert the array returned to a table
select unnest(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'))
unnest
text
------
BAKERY
BAKING
accoring to: http://www.postgresql.org/docs/9.5/static/functions-string.html
SELECT regexp_matches(UPPER('bakerybaking'),'(BAKERY)(BAKING)');
Otput:)
regexp_matches
----------------- {BAKERY,BAKING} (1 row)
Oh the humanity. Please thank me.
--https://stackoverflow.com/questions/52178844/get-second-match-from-regexp-matches-results
--https://stackoverflow.com/questions/24274394/postgresql-8-2-how-to-get-a-string-representation-of-any-array
CREATE OR REPLACE FUNCTION aaa(anyarray,Integer, text)
RETURNS SETOF text
LANGUAGE plpgsql
AS $function$
DECLARE s $1%type;
BEGIN
FOREACH s SLICE 1 IN ARRAY $1[$2:$2] LOOP
RETURN NEXT array_to_string(s,$3);
END LOOP;
RETURN;
END;
$function$;
--SELECT aaa((ARRAY(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT)),1,',')
--select (array[1,2,3,4,5,6])[2:5];
SELECT aaa(array_remove(Array(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT), Null),3,',')

How to find all the source lines containing desired table names from user_source by using 'regexp'

For example we have a large database contains lots of oracle packages, and now we want to see where a specific table resists in the source code. The source code is stored in user_source table and our desired table is called 'company'.
Normally, I would like to use:
select * from user_source
where upper(text) like '%COMPANY%'
This will return all words containing 'company', like
121 company cmy
14 company_id, idx_name %% end of coding
453 ;companyname
1253 from db.company.company_id where
989 using company, idx, db_name,
So how to make this result more intelligent using regular expression to parse all the source lines matching a meaningful table name (means a table to the compiler)?
So normally we allow the matched word contains chars like . ; , '' "" but not _
Can anyone make this work?
To find company as a "whole word" with a regular expression:
SELECT * FROM user_source
WHERE REGEXP_LIKE(text, '(^|\s)company(\s|$)', 'i');
The third argument of i makes the REGEXP_LIKE search case-insensitive.
As far as ignoring the characters . ; , '' "", you can use REGEXP_REPLACE to suck them out of the string before doing the comparison:
SELECT * FROM user_source
WHERE REGEXP_LIKE(REGEXP_REPLACE(text, '[.;,''"]'), '(^|\s)company(\s|$)', 'i');
Addendum: The following query will also help locate table references. It won't give the source line, but it's a start:
SELECT *
FROM user_dependencies
WHERE referenced_name = 'COMPANY'
AND referenced_type = 'TABLE';
If you want to identify the objects that refer to your table, you can get that information from the data dictionary:
select *
from all_dependencies
where referenced_owner = 'DB'
and referenced_name = 'COMPANY'
and referenced_type = 'TABLE';
You can't get the individual line numbers from that, but you can then either look at user_source or use a regexp on the specific source code, which woudl at least reduce false positives.
SELECT * FROM user_source
WHERE REGEXP_LIKE(text,'([^_a-z0-9])company([^_a-z0-9])','i')
Thanks #Ed Gibbs, with a little trick this modified answer could be more intelligent.

How do I use regex in a SQLite query?

I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end