Related
I need to remove (by means of a function) possible non-latin characters (chinese, japanese, ...) by means of a regex expression from a Postgres database table.
I have tried all solutions I could find online, but nothing seems to work.
CREATE OR REPLACE FUNCTION public.function_104(param text)
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
EXECUTE 'UPDATE public.' || quote_ident(param) || ' SET "name" = REGEXP_REPLACE("name", [^x00-x7F]+, " ")';
END
$function$
I keep running into following error message :
psycopg2.errors.SyntaxError: syntax error at or near "["
LINE 1: ..._roads_free_1 SET "name" = REGEXP_REPLACE("name", [^x00-x7F]...
^
QUERY: UPDATE public.gis_osm_roads_free_1 SET "name" = REGEXP_REPLACE("name", [^x00-x7F]+, " ")
CONTEXT: PL/pgSQL function afri_terra_104(text) line 6 at EXECUTE
```
You must put the regex between single quotes, as well as the replacement text. Since it is a dynamic query, you must escape the single quotes by doubling them:
CREATE OR REPLACE FUNCTION public.function_104(param text)
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
EXECUTE 'UPDATE public.' || quote_ident(param) ||
' SET "name" = REGEXP_REPLACE("name", ''[^x00-x7F]+'', '' '')';
END
$function$;
insert into t104(name) values('abcé');
INSERT 0 1
select function_104('t104');
function_104
--------------
(1 row)
select * from t104;
name
------
abc
(1 row)
I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end
Experimenting with the language I've found that select is defined in the global scope and its precedence is higher than local variables.
def example(select)
puts select
end
example 3
# Syntax error in eval:3: unexpected token: end (expecting when, else or end)
So experimenting with select step by step I get this:
select 1 end
# Syntax error in eval:3: unexpected token: end (expecting when, else or end)
and then
select when 1 end
# Syntax error in eval:1: invalid select when expression: must be an assignment or call
then
select when x = 1 end
# Syntax error in eval:1: invalid select when expression: must be an assignment or call
then
select when x
# Syntax error in eval:1: unexpected token: EOF (expecting ',', ';' or '
I'll skip ahead a few steps as you should have an idea of how I've come to my question…
select when x;
else y
end
# Error in line 1: undefined local variable or method 'x_select_action'
and lastly
x_select_action = 4
select when x;
else y
end
# Error in line 3: undefined method 'x_select_action' (If you declared 'x_select_action' in a suffix if, declare it in a regular if for this to work. If the variable was declared in a macro it's not visible outside it)
So there is this keyword in the language which precedes local variables precedence and I don't know what it's for. But apparently it looks for x_select_action when x is given as a when clause. What is this select for and how is it meant to be used?
Searching online I see select defined on Enumerable, Hash, Channel, and Array… but at first glance these don't seem to be it.
Thanks for the help!
It's similar to Go's select: https://tour.golang.org/concurrency/5
But it still needs some tweaks to be finished, that's why there are no docs about it yet.
I have a column containing an array of authors. How can I use the ~* operator to check if any of its values match a given regular expression?
The ~* operator takes the string to check on the left and the regular expression to match on the right. The documentation says the ANY operator has to be on the right side so, obviously
SELECT '^p' ~* ANY(authors) FROM book;
does not work as PostgreSQL tries to match the string ^p against expressions contained in the array.
Any idea?
The first obvious idea is to use your own regexp-matching operator with commuted arguments:
create function commuted_regexp_match(text,text) returns bool as
'select $2 ~* $1;'
language sql;
create operator ~!## (
procedure=commuted_regexp_match(text,text),
leftarg=text, rightarg=text
);
Then you may use it like this:
SELECT '^p' ~!## ANY(authors) FROM book;
Another different way of looking at it to unnest the array and formulate in SQL the equivalent of the ANY construct:
select bool_or(r) from
(select author ~* '^j' as r
from (select unnest(authors) as author from book) s1
) s2;
SELECT * FROM book where EXISTS ( SELECT * from unnest(author) as X where x ~* '^p' )
Here's an idea if you can make reasonable assumptions about the data. Just concatenate the array into a string and do a regex-search against the whole string.
select array_to_string(ARRAY['foo bar', 'moo cow'], ',') ~ 'foo'
Off the cuff, and without any measurements to back me up, I would say that most performance issues related to the regex stuff could be dealt with by smart uses of regex, and maybe some special delimiter characters. Creating the string may be a performance issue, but I wouldn't even dare to speculate on that.
I use this:
create or replace function regexp_match_array(a text[], regexp text)
returns boolean
strict immutable
language sql as $_$
select exists (select * from unnest(a) as x where x ~ regexp);
$_$;
comment on function regexp_match_array(text[], text) is
'returns TRUE if any element of a matches regexp';
create operator ~ (
procedure=regexp_match_array,
leftarg=text[], rightarg=text
);
comment on operator ~(text[], text) is
'returns TRUE if any element of ARRAY (left) matches REGEXP (right); think ANY(ARRAY) ~ REGEXP';
Then use it much like you'd use ~ with text scalars:
=> select distinct gl from x where gl ~ 'SH' and array_length(gl,1) < 7;
┌──────────────────────────────────────┐
│ gl │
├──────────────────────────────────────┤
│ {MSH6} │
│ {EPCAM,MLH1,MSH2,MSH6,PMS2} │
│ {SH3TC2} │
│ {SHOC2} │
│ {BRAF,KRAS,MAP2K1,MAP2K2,SHOC2,SOS1} │
│ {MSH2} │
└──────────────────────────────────────┘
(6 rows)
You can define your own operator to do what you want.
Reverse the order of the arguments and call the appropriate function :
create function revreg (text, text) returns boolean
language sql immutable
as $$ select texticregexeq($2,$1); $$;
(revreg ... please choose your favorite name).
Add a new operator using our revreg() function :
CREATE OPERATOR ### (
PROCEDURE = revreg,
LEFTARG = text,
RIGHTARG = text
);
Test:
test=# SELECT '^p' ### ANY(ARRAY['ika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['ika', 'chu']);
f
test=# SELECT '^p' ### ANY(ARRAY['pika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['pika', 'chu']);
t
Note that you may want to set JOIN and RESTICT clauses to the new operator to help the planner.
My solution
SELECT a.* FROM books a
CROSS JOIN LATERAL (
SELECT author
FROM unnest(authors) author
WHERE author ~ E'p$'
LIMIT 1
)b;
Use cross lateral join, subquery is evaluated for every row of table "books", if one of rows returned by unnest, meets the condition, subquery returns one row (becouse of limit).
I use a generalization of Reece's approach:
select format($$
create function %1$s(a text[], regexp text) returns boolean
strict immutable language sql as
%2$L;
create operator %3$s (procedure=%1$s, leftarg=text[], rightarg=text);
$$, /*1*/nameprefix||'_array_'||oname, /*2*/q1||o||q2, /*3*/oprefix||o
)
from (values
('tilde' , '~' ), ('bang_tilde' , '!~' ),
('tilde_star' , '~*' ), ('bang_tilde_star' , '!~*' ),
('dtilde' , '~~' ), ('bang_dtilde' , '!~~' ),
('dtilde_star', '~~*'), ('bang_dtilde_star', '!~~*')
) as _(oname, o),
(values
('any', '', 'select exists (select * from unnest(a) as x where x ', ' regexp);'),
('all', '#', 'select true = all (select x ', ' regexp from unnest(a) as x);')
) as _2(nameprefix, oprefix, q1, q2)
\gexec
Executing this in psql creates 16 functions and 16 operators that cover all applicable 8 matching operators for arrays -- plus 8 variations prefixed with # that implement the ALL equivalent.
Very handy!
I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end