Is there a database that can store regex as values? - regex

I am looking for a database that can store regular expressions as values. E.g. something like this:
{:name => "Tim", :count => 3, :expression => /t+/},
{:name => "Rob", :count => 4, :expression => /a\d+/},
{:name => "Fil", :count => 1, :expression => /tt/},
{:name => "Marc", :count => 1, :expression => /bb/}
So I could return rows/documents based on whether the query matches the expression or not (e.g. "FIND rows WHERE "tt" =~ :expression") and get the Tim and Fil rows as the result. Most databases can do the exact opposite (check whether a text field matches a regex query). But unfortunately neither mongo nor postgres seems to be able to do it the other way round.
P.S. Or perhaps I am wrong and there are some extensions for postgres or mongo that allow me to store regex?

MongoDB will allow you to store actual regular expressions (i.e. not a string representing a regular expression), as shown below:
> db.mycoll.insertOne({myregex: /aa/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414249bf0898c1059b38")
}
> db.mycoll.insertOne({myregex: /a+/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414949bf0898c1059b39")
}
> db.mycoll.find()
{ "_id" : ObjectId("5826414249bf0898c1059b38"), "myregex" : /aa/ }
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
You can use this to then query for rows with a regex that matches a query, as follows:
> db.mycoll.find(function() { return this.myregex.test('a'); } )
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
Here we search for rows where the string 'a' is matched by the myregex field, resulting in the second document, with regex /a+/, being returned.
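For comparison, the same reverse lookup can be sketched outside the database in Python. This only illustrates the matching logic, not MongoDB's API; the rows list and the rows_matching helper are hypothetical names, with the data taken from the question.

```python
import re

# Sample rows from the question; each row stores its regex as a pattern string.
rows = [
    {"name": "Tim", "count": 3, "expression": r"t+"},
    {"name": "Rob", "count": 4, "expression": r"a\d+"},
    {"name": "Fil", "count": 1, "expression": r"tt"},
    {"name": "Marc", "count": 1, "expression": r"bb"},
]

def rows_matching(text, rows):
    """Return the rows whose stored pattern matches somewhere in `text`."""
    return [row for row in rows if re.search(row["expression"], text)]

names = [row["name"] for row in rows_matching("tt", rows)]
print(names)
```

As with the find() call above, every stored pattern has to be evaluated for each lookup, so the cost grows with the number of rows.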

Oracle database can do that.
Example query: WHERE REGEXP_LIKE(first_name, '^Ste(v|ph)en$')
You want to select a regexp from a column. See the SQL Fiddle example below.
SQL Fiddle
Choose Oracle database.
In schema window execute the following:
CREATE TABLE regexp (name VARCHAR2(20), count NUMBER, regexp VARCHAR2(50));
INSERT INTO regexp VALUES ('Tim', 3, 't+');
INSERT INTO regexp VALUES ('Rob', 4, 'a\d+');
INSERT INTO regexp VALUES ('Fil', 1, 'tt');
INSERT INTO regexp VALUES ('Marc', 1, 'bb');
COMMIT;
Execute an SQL statement, e.g. (as you mentioned in your question):
SELECT * FROM regexp WHERE REGEXP_LIKE('tt', regexp);
Yields:
NAME COUNT REGEXP
Tim 3 t+
Fil 1 tt
Reference here.
Excerpt:
Oracle Database implements regular expression support with a set of
Oracle Database SQL functions and conditions that enable you to search
and manipulate string data. You can use these functions in any
environment that supports Oracle Database SQL. You can use these
functions on a text literal, bind variable, or any column that holds
character data such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and
VARCHAR2 (but not LONG).
And some more info to consider:
A string literal in a REGEXP function or condition conforms to the
rules of SQL text literals. By default, regular expressions must be
enclosed in single quotes. If your regular expression includes the
single quote character, then enter two single quotation marks to
represent one single quotation mark within the expression. This
technique ensures that the entire expression is interpreted by the SQL
function and improves the readability of your code. You can also use
the q-quote syntax to define your own character to terminate a text
literal. For example, you could delimit your regular expression with
the pound sign (#) and then use a single quote within the expression.
Note: If your expression comes from a column or a bind variable, then
the same rules for quoting do not apply.
Note that there is no column type named RegEx; you need to save the pattern as a plain string in a textual column.
You can also use regular expressions in constraint checks and when you project columns.

SQL Server (and probably some other SQL databases) supports a limited form of this out of the box with LIKE patterns (wildcards such as % and _, not full regular expressions). As has been noted before, the database can only execute this as a table scan -- something to keep in mind if you have a large number of patterns. You just reverse the usual order of the LIKE operator:
create table demo.query
(
id int identity not null,
regex nvarchar(max),
primary key(id)
);
insert into demo.query (regex) values ('aa%');
select * from demo.query where 'aaaa' like regex;
Looks a little funny, but it's perfectly valid.
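The reversed LIKE can be tried without a SQL Server instance. The sketch below assumes an in-memory SQLite database as a stand-in (SQLite's LIKE also uses % and _ wildcards); the stored_patterns table and the patterns_matching helper are hypothetical names.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; only the reversed LIKE is shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stored_patterns (id INTEGER PRIMARY KEY, pattern TEXT)")
conn.executemany(
    "INSERT INTO stored_patterns (pattern) VALUES (?)",
    [("aa%",), ("b%",), ("%aa",)],
)

def patterns_matching(text):
    """Return stored LIKE patterns that match `text` (value left, column right)."""
    cur = conn.execute(
        "SELECT pattern FROM stored_patterns WHERE ? LIKE pattern ORDER BY id",
        (text,),
    )
    return [pattern for (pattern,) in cur]

print(patterns_matching("aaaa"))
```

Here 'aa%' and '%aa' match the value 'aaaa', while 'b%' does not.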

Adding to Ely's answer, thought of letting you all know that MySQL also supports this.
In http://sqlfiddle.com/, I tested with MySQL 5.6
Build schema:
CREATE TABLE rule (name VARCHAR(20), tot INT, exp VARCHAR(50));
INSERT INTO rule VALUES ('Tim', 3, 't+');
-- MySQL 5.6 regexes have no \d, and '\d' in a string literal is read as 'd', so use [0-9]
INSERT INTO rule VALUES ('Rob', 4, 'a[0-9]+');
INSERT INTO rule VALUES ('Fil', 1, 'tt');
INSERT INTO rule VALUES ('Jack', 1, '^tt$');
INSERT INTO rule VALUES ('Marc', 1, 'bb');
COMMIT;
Test:
select * from rule where 'ttt' RLIKE exp ;
Expected: rows for Tim and Fil
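The same test can be reproduced locally as a rough sketch, assuming SQLite in place of MySQL. SQLite has no built-in REGEXP implementation, so the example registers a Python function for it; the rules table, its pattern column and the names_matching helper are hypothetical names.

```python
import re
import sqlite3

def regexp(pattern, text):
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X), so the pattern arrives first.
    return 1 if re.search(pattern, text) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE rules (name TEXT, tot INTEGER, pattern TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?)",
    [
        ("Tim", 3, "t+"),
        ("Rob", 4, "a[0-9]+"),
        ("Fil", 1, "tt"),
        ("Jack", 1, "^tt$"),
        ("Marc", 1, "bb"),
    ],
)

def names_matching(text):
    """Return names whose stored pattern matches `text`, in insertion order."""
    cur = conn.execute(
        "SELECT name FROM rules WHERE ? REGEXP pattern ORDER BY rowid", (text,)
    )
    return [name for (name,) in cur]

print(names_matching("ttt"))
```

Jack's anchored pattern '^tt$' rejects 'ttt' because the whole string must be exactly 'tt'.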

Related

Selecting for a Jsonb array contains regex match

Given a data structure as follows:
{"single":"someText", "many":["text1", "text2"]}
I can query a regex on single with
WHERE JsonBColumn ->> 'single' ~ '^some.*'
And I can query a contains match on the Array with
WHERE JsonBColumn -> 'many' ? 'text2'
What I would like to do is a contains match with a regex on the JSON array
WHERE JsonBColumn -> 'many' {Something} '.*2$'
I found that it is also possible to convert the entire JSONB array to a plain text string and simply run the regular expression on that. A side effect, though, is that a search on something like
xt1", "text
would end up matching.
This approach isn't as clean, since it doesn't search each element individually, but it gets the job done with a visually simpler statement.
WHERE JsonBColumn ->>'many' ~ 'text2'
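The false positive is easy to reproduce in Python. This is a sketch that assumes json.dumps with its default separators yields the same '["text1", "text2"]' layout as the text form of the jsonb array; flattened_match is a hypothetical helper.

```python
import json
import re

doc = {"single": "someText", "many": ["text1", "text2"]}

# Assumption: default json.dumps output mirrors the text rendering of the array.
flattened = json.dumps(doc["many"])

def flattened_match(pattern):
    """Search the whole serialized array, as the ->> approach does."""
    return re.search(pattern, flattened) is not None

# A legitimate hit: the pattern lies inside a single element.
inside_element = flattened_match("text2")

# A false positive: the pattern spans the separator between two elements.
across_elements = flattened_match(re.escape('xt1", "text'))

print(flattened, inside_element, across_elements)
```

Anchors such as ^ and $ also refer to the serialized string here, not to individual elements.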
Use jsonb_array_elements_text() in a lateral join.
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select distinct on (id) d.*
from
the_data d,
jsonb_array_elements_text(jsonbcolumn->'many') many(elem)
where elem ~ '^text.*';
id | jsonbcolumn
----+----------------------------------------------------
1 | {"many": ["text1", "text2"], "single": "someText"}
(1 row)
See also this answer.
If the feature is used frequently, you may want to write your own function:
create or replace function jsonb_array_regex_like(json_array jsonb, pattern text)
returns boolean language sql as $$
select bool_or(elem ~ pattern)
from jsonb_array_elements_text(json_array) arr(elem)
$$;
The function definitely simplifies the code:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select *
from the_data
where jsonb_array_regex_like(jsonbcolumn->'many', '^text.*');
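For readers who want to check the per-element semantics outside PostgreSQL, here is a rough Python analogue of the function above. array_regex_like is a hypothetical name, and one difference is assumed and worth noting: bool_or yields NULL for an empty array, while any() yields False.

```python
import re

def array_regex_like(elements, pattern):
    """True if at least one element matches `pattern` (each element tested alone)."""
    return any(re.search(pattern, elem) is not None for elem in elements)

many = ["text1", "text2"]

ends_with_2 = array_regex_like(many, r".*2$")       # pattern from the question
starts_with_text = array_regex_like(many, r"^text.*")
spans_elements = array_regex_like(many, re.escape('xt1", "text'))

print(ends_with_2, starts_with_text, spans_elements)
```

Because each element is matched on its own, a pattern cannot match across the boundary between two elements.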

$regex to find a document in mongodb that contains a string

I am working on a db query in mongo where I need to find a document by matching a regular expression against a string field (fieldToQuery).
The data structure looks like this:
{
fieldToQuery : "4052577300",
someOtherField : "some value",
...
...
}
I have a value like "804052577300", which I have to use to query the above document.
How can I achieve this using the $regex operator?
Update:
I need something like an "ends with" regex in mongo.
You could do a sort of reverse regex query where you create a regex using the fieldToQuery value and then test that against your query value:
var value = "804052577300";
db.test.find({
$where: "new RegExp(this.fieldToQuery + '$').test('" + value + "');"
});
The $where operator allows you to execute arbitrary JavaScript against each doc in your collection; where this is the current doc, and the truthy result of the expression determines whether the doc should be included in the result set.
So in this case, a new RegExp is built for each doc's fieldToQuery field with a $ at the end to anchor the search to the end of the string. The test method of the regex is then called to test whether the value string matches the regex.
The $where string is evaluated server-side, so value's value must be evaluated client-side by directly including its value into the $where string.
Note that $where queries can be quite slow as they can't use indexes.
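The per-document check performed by that $where expression can be sketched in Python as follows. The docs list (including its second entry) and the docs_for_value helper are hypothetical; re.escape is an addition not present in the answer above, used so that field values containing regex metacharacters are treated literally.

```python
import re

docs = [
    {"fieldToQuery": "4052577300", "someOtherField": "some value"},
    {"fieldToQuery": "1234567890", "someOtherField": "other value"},  # hypothetical
]

def docs_for_value(value, docs):
    """Return docs whose fieldToQuery appears at the end of `value`."""
    return [
        doc
        for doc in docs
        # Build a regex from the stored field and anchor it to the end of value.
        if re.search(re.escape(doc["fieldToQuery"]) + "$", value)
    ]

matches = docs_for_value("804052577300", docs)
print([doc["fieldToQuery"] for doc in matches])
```

For plain strings like these, value.endswith(doc["fieldToQuery"]) gives the same answer without building a regex.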
You could do something like this:
db.collection.find({ fieldToQuery: /4052577300$/})
The regex pattern above is equivalent to SQL's LIKE operation:
select * from table where fieldToQuery like '%4052577300'
This returns documents whose fieldToQuery ends with 4052577300, for example "804052577300".

Regex to replace parameter names by values

I have a string that has a form:
UPDATE "TABLE"."ITEMS" SET ITM_DESCR=:sqldevvalue WHERE ROWID = :sqldevgridrowid AND ORA_ROWSCN = :sqldevgridrowscn
and its binding values as:
#1(11):Test Record #2(18):AAAG9IAAFAAAC0eAAB #3(7):7746161
How can I construct a regular expression to replace the parameter names (starting with :) with their corresponding values and create a combined string that has the form:
UPDATE "TABLE"."ITEMS" SET ITM_DESCR=Test Record WHERE ROWID = AAAG9IAAFAAAC0eAAB AND ORA_ROWSCN = 7746161
Here's a very simple, naive regex:
:(\w+)
Replace the match with the value, $1 contains the parameter name.
Here's a less naive attempt:
'(?:''|[^'])*'|:(\w+)
If $1 is set, replace the match with the value ($1 contains the parameter name), else do not replace. This version will let you handle situations like WHERE column = 'some text :not_a_param more text'
And... the not naive approach is to use prepared statements and SQL parameters with whatever SQL client you're using. This is the best option as it negates any risk of SQL injections if you do something wrong, and lets the DBMS cache the execution plan for your request.
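A minimal Python sketch of the "less naive" regex, for display or logging purposes only. It assumes the values are supplied as a list in the order the parameters appear in the statement; substitute is a hypothetical helper, and parsing the #1(11):... binding string is left out.

```python
import re

# Either a single-quoted SQL literal (left untouched) or a :name parameter.
TOKEN = re.compile(r"'(?:''|[^'])*'|:(\w+)")

def substitute(sql, values):
    """Replace each :name outside quoted literals with the next value."""
    remaining = iter(values)

    def repl(match):
        if match.group(1) is None:
            return match.group(0)  # quoted literal: keep as is
        return next(remaining)     # parameter: consume the next value

    return TOKEN.sub(repl, sql)

sql = (
    'UPDATE "TABLE"."ITEMS" SET ITM_DESCR=:sqldevvalue '
    "WHERE ROWID = :sqldevgridrowid AND ORA_ROWSCN = :sqldevgridrowscn"
)
combined = substitute(sql, ["Test Record", "AAAG9IAAFAAAC0eAAB", "7746161"])
print(combined)
```

The result is not valid SQL to execute (the values are not quoted), which is one more reason to prefer prepared statements for anything beyond display.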

Oracle regex separate diskgroup name and the rest

I have a column whose value is
col1
+ASM_DISK_GROUP_TIER1/mydb/data/myfile.111.326
I would like to split the string into something like this
ASM_DISK_GROUP_TIER1 /mydb/data/myfile.111.326 myfile.111.326
(without the +sign)
however
select regexp_substr(col1,'[^/]*') from dual
gives me +ASM_DISK_GROUP_TIER1
and I am clueless how to get the second and the third part, i.e.
/mydb/data/myfile.111.326 myfile.111.326
Oracle regular expression support is quite limited -- unlike other languages which let you use parentheses to get back parts of the match, in older Oracle versions you can only get the whole matched string (or its start or end position with REGEXP_INSTR); REGEXP_SUBSTR only gained a subexpression argument in Oracle 11g.
There are various ways to work around this if you have to, using regexp magic and arithmetic, but in this case you should admit that you are actually just looking for the first and last occurrence of / and code accordingly:
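SELECT SUBSTR(col1, 2, INSTR(col1, '/') - 2) "Disk Group", -- skip the leading +, stop before the first /
SUBSTR(col1, INSTR(col1, '/')) "Path", -- from the first / to the end
SUBSTR(col1, INSTR(col1, '/', -1) + 1) "File Name" -- everything after the last /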
FROM ...
(Not tested).
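The first-slash / last-slash logic can be checked quickly in Python against the sample value from the question. This is only a sketch of the string arithmetic; split_asm_path is a hypothetical helper name.

```python
def split_asm_path(value):
    """Split '+GROUP/path/file' into (disk group, path, file name)."""
    first = value.index("/")   # first occurrence of /
    last = value.rindex("/")   # last occurrence of /
    disk_group = value[1:first]   # drop the leading +
    path = value[first:]          # from the first / to the end
    file_name = value[last + 1:]  # everything after the last /
    return disk_group, path, file_name

parts = split_asm_path("+ASM_DISK_GROUP_TIER1/mydb/data/myfile.111.326")
print(parts)
```

Note that Python slices are zero-based while Oracle's SUBSTR and INSTR are one-based, so the offsets differ by one from the SQL.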

How to compare Unicode characters in SQL server?

Hi, I am trying to find all rows in my database (SQL Server) which have the character é in their text by executing the following queries.
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) the two queries return different numbers of rows and (b) they return rows which do not contain the specified character.
Is the way I am constructing the regular expression and comparing the Unicode correct?
EDIT:
The question column is stored using datatype nvarchar.
The following query gives the correct result though.
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
NB: LIKE and PATINDEX do not accept regular expressions.
In the SQL Server pattern syntax, [\xE9] means match any single character within the specified set, i.e. match \, x, E or 9. So any of the following strings would match that pattern.
"Elephant"
"axis"
"99.9"