Update: I've updated the test string to cover a case that I've missed.
I'm trying to count the number of WHERE filters in a query using regex.
So the general idea is to count the number of WHERE and AND keywords occurring in the query, while excluding any AND that comes after a JOIN and before a WHERE, and also excluding any AND inside a CASE WHEN clause.
For example, this query:
WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n, cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \nEND,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\nWHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6
-- FORMATTED FOR EASE OF READ. PLEASE USE LINE ABOVE AS REGEX TEST STRING
WITH cte AS (
SELECT a,b
FROM something
WHERE a>10
AND b<5
)
, cte2 AS (
SELECT c,d
FROM another
WHERE c>10
AND d<5
)
SELECT
CASE
WHEN c1.a=1 AND c2.c=1 THEN 'yes'
WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
ELSE 'no'
END,
c1.a,
c1.b,
c2.c,
c2.d
FROM cte c1
INNER JOIN cte2 c2
ON c1.a = c2.c
AND c1.b = c2.d
WHERE c1.a<4
AND DATE(c1)>'2022-01-01'
AND c2.c>6
should return 7, which are:
WHERE a>10
AND b<5
WHERE c>10
AND d<5
WHERE c1.a<4
AND DATE(c1)>'2022-01-01'
AND c2.c>6
The portion AND c1.b = c2.d is not counted because it happens after JOIN, before WHERE.
The portion AND c2.c=1 is not counted because it is in a CASE WHEN clause.
I eventually plan to use this on PostgreSQL queries to count the number of filters used across all queries in a certain period.
I've tried searching around for an answer and working it out myself, but to no avail, hence I'm looking for help here. Thank you in advance!
I try to stay away from lookarounds where I can, as they can be messy and painful to use, especially given the fixed-width limitation of lookbehind assertions.
My proposed solution is to capture all scenarios in different groups, and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.
Group 1 - Starts with JOIN (undesired)
Group 2 - Starts with WHERE (desired)
Group 3 - Starts with CASE (undesired)
(JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))
Note: Feel free to replace WHERE|JOIN|CASE|END with whichever keywords you want to act as the 'stopper' words.
All scenarios, including the undesired ones, will be matched, but you need to select only Group 2.
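As a rough illustration of how Group 2 could be selected and counted, here is a minimal Python sketch. It assumes Python's `re` module rather than PostgreSQL's regex engine, and the names `count_filters` and `TEST_QUERY` are made up for this example. `re.DOTALL` is needed so that `.` can cross the newlines in the test string.

```python
import re

# The same three alternatives as the pattern above; the lookahead lists the 'stopper' words.
STOPPERS = r"(?=$|WHERE|JOIN|CASE|END)"
PATTERN = re.compile(
    rf"(JOIN.*?{STOPPERS})|(WHERE.*?{STOPPERS})|(CASE.*?{STOPPERS})",
    re.DOTALL,  # let '.' match newlines so a clause may span several lines
)

TEST_QUERY = (
    "WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n"
    ", cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n"
    " SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \n"
    "END,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\n"
    "INNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\n"
    "WHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6"
)


def count_filters(sql: str) -> int:
    """Count one filter per WHERE plus one per AND inside each Group 2 match."""
    total = 0
    for match in PATTERN.finditer(sql):
        where_part = match.group(2)  # None when the JOIN or CASE alternative matched
        if where_part is None:
            continue
        total += 1 + len(re.findall(r"\bAND\b", where_part))
    return total


print(count_filters(TEST_QUERY))  # 7
```

Note that the stopper words are matched as plain substrings and case-sensitively here, so an identifier such as BACKEND would also stop a match.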
You can try something like this:
WITH DataSource (parts) AS
(
SELECT REGEXP_MATCHES(
'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
E'(?= WHERE)[^)|;]+'
,'gmi'
)
)
SELECT SUM
(
(length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
+ 1 -- for the where
)
FROM DataSource
The idea is to match the text after each WHERE clause and then simply count the ANDs, adding one for the matched WHERE itself.
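The counting arithmetic can be checked outside the database. The following is only a sketch in Python under the assumption that Python's `re` behaves like the PostgreSQL pattern for this input; `count_by_length` and `count_where_filters` are hypothetical names.

```python
import re


def count_by_length(fragment: str, word: str = "AND") -> int:
    """Occurrences of `word`, computed as (length removed by REPLACE) / len(word)."""
    return (len(fragment) - len(fragment.replace(word, ""))) // len(word)


def count_where_filters(sql: str) -> int:
    # Same pattern as in the query: start at ' WHERE', stop at ')', '|' or ';'.
    parts = re.findall(r"(?= WHERE)[^)|;]+", sql, flags=re.IGNORECASE)
    return sum(count_by_length(part) + 1 for part in parts)  # +1 for the WHERE


sample = (
    "WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n"
    ", cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n"
    " SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 "
    "ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6"
)
print(count_where_filters(sample))  # 6
```

Because the replacement is a plain substring replacement, an identifier containing AND (for example BRAND) would be counted too, and a parenthesised function call inside a WHERE clause would cut the match short at its closing parenthesis.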
So I have a query like this one
let query =
query {
for person in people do
select person
}
And I'd like to have it sequenced.
let sequence : seq<Person> = query
But I can't find any information on how to do it, maybe I've become bad at using search engines.
I'm getting unexpected type errors at compile time when using things like |> Seq.ofList and ToList().
The expression was expected to have the type seq<Person> but here has the type Generic.List<Person>.
The result of a query expression has type IQueryable<_>, which is a subtype of IEnumerable<_> (for which seq<_> is a synonym), so you can simply change the type:
let mySeq : seq<_> = myQuery
Or, if you want to avoid a type annotation, use the built-in seq function, which does the same thing:
let mySeq = seq myQuery
I have a column containing an array of authors. How can I use the ~* operator to check if any of its values match a given regular expression?
The ~* operator takes the string to check on the left and the regular expression to match on the right. The documentation says the ANY operator has to be on the right side so, obviously
SELECT '^p' ~* ANY(authors) FROM book;
does not work as PostgreSQL tries to match the string ^p against expressions contained in the array.
Any idea?
The first obvious idea is to use your own regexp-matching operator with commuted arguments:
create function commuted_regexp_match(text,text) returns bool as
'select $2 ~* $1;'
language sql;
create operator ~!## (
procedure=commuted_regexp_match(text,text),
leftarg=text, rightarg=text
);
Then you may use it like this:
SELECT '^p' ~!## ANY(authors) FROM book;
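To make the argument order concrete, here is a small Python sketch of the two readings. It uses Python's `re` as a stand-in for PostgreSQL's regex engine, and the function names and author values are invented for illustration.

```python
import re


def any_as_written(string: str, array: list[str]) -> bool:
    """'^p' ~* ANY(authors): every array element is treated as the PATTERN."""
    return any(re.search(element, string, re.IGNORECASE) for element in array)


def any_commuted(pattern: str, array: list[str]) -> bool:
    """Commuted operator: every array element is the STRING being tested."""
    return any(re.search(pattern, element, re.IGNORECASE) for element in array)


authors = ["Pratchett", "Gaiman"]  # hypothetical data
print(any_as_written("^p", authors))  # False: no author name, used as a regex, matches '^p'
print(any_commuted("^p", authors))    # True: 'Pratchett' starts with p
```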
Another way of looking at it is to unnest the array and formulate in SQL the equivalent of the ANY construct:
select bool_or(r) from
(select author ~* '^j' as r
from (select unnest(authors) as author from book) s1
) s2;
SELECT * FROM book WHERE EXISTS ( SELECT * FROM unnest(authors) AS x WHERE x ~* '^p' );
Here's an idea if you can make reasonable assumptions about the data. Just concatenate the array into a string and do a regex-search against the whole string.
select array_to_string(ARRAY['foo bar', 'moo cow'], ',') ~ 'foo'
Off the cuff, and without any measurements to back me up, I would say that most performance issues related to the regex stuff could be dealt with by smart uses of regex, and maybe some special delimiter characters. Creating the string may be a performance issue, but I wouldn't even dare to speculate on that.
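One way the "special delimiter" idea might look, sketched in Python rather than SQL: join the elements with a character assumed never to occur in the data, and let the pattern treat that character as an element boundary. The delimiter choice and the function name are assumptions made for this example.

```python
import re

DELIM = "\x1f"  # ASCII unit separator; assumed never to appear in the data


def any_element_starts_with(elements: list[str], pattern: str) -> bool:
    """Emulate a per-element '^pattern' test on the concatenated string."""
    joined = DELIM.join(elements)
    # An element starts either at the start of the string or right after a delimiter.
    return re.search(f"(?:^|{DELIM})(?:{pattern})", joined) is not None


values = ["foo bar", "moo cow"]
print(any_element_starts_with(values, "moo"))  # True
print(any_element_starts_with(values, "cow"))  # False: 'cow' is not at the start of an element
```

Without such a boundary, a plain search on the joined string cannot express per-element anchors, and a pattern could match across two neighbouring elements.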
I use this:
create or replace function regexp_match_array(a text[], regexp text)
returns boolean
strict immutable
language sql as $_$
select exists (select * from unnest(a) as x where x ~ regexp);
$_$;
comment on function regexp_match_array(text[], text) is
'returns TRUE if any element of a matches regexp';
create operator ~ (
procedure=regexp_match_array,
leftarg=text[], rightarg=text
);
comment on operator ~(text[], text) is
'returns TRUE if any element of ARRAY (left) matches REGEXP (right); think ANY(ARRAY) ~ REGEXP';
Then use it much like you'd use ~ with text scalars:
=> select distinct gl from x where gl ~ 'SH' and array_length(gl,1) < 7;
┌──────────────────────────────────────┐
│ gl │
├──────────────────────────────────────┤
│ {MSH6} │
│ {EPCAM,MLH1,MSH2,MSH6,PMS2} │
│ {SH3TC2} │
│ {SHOC2} │
│ {BRAF,KRAS,MAP2K1,MAP2K2,SHOC2,SOS1} │
│ {MSH2} │
└──────────────────────────────────────┘
(6 rows)
You can define your own operator to do what you want.
Reverse the order of the arguments and call the appropriate function:
create function revreg (text, text) returns boolean
language sql immutable
as $$ select texticregexeq($2,$1); $$;
(revreg ... please choose your favorite name).
Add a new operator using our revreg() function:
CREATE OPERATOR ### (
PROCEDURE = revreg,
LEFTARG = text,
RIGHTARG = text
);
Test:
test=# SELECT '^p' ### ANY(ARRAY['ika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['ika', 'chu']);
f
test=# SELECT '^p' ### ANY(ARRAY['pika', 'pchu']);
t
test=# SELECT '^p' ### ANY(ARRAY['pika', 'chu']);
t
Note that you may want to set the JOIN and RESTRICT clauses on the new operator to help the planner.
My solution
SELECT a.* FROM books a
CROSS JOIN LATERAL (
SELECT author
FROM unnest(authors) author
WHERE author ~ E'p$'
LIMIT 1
)b;
This uses CROSS JOIN LATERAL: the subquery is evaluated for every row of the table "books". If any of the rows returned by unnest meets the condition, the subquery returns exactly one row (because of the LIMIT), so each matching book appears once.
I use a generalization of Reece's approach:
select format($$
create function %1$s(a text[], regexp text) returns boolean
strict immutable language sql as
%2$L;
create operator %3$s (procedure=%1$s, leftarg=text[], rightarg=text);
$$, /*1*/nameprefix||'_array_'||oname, /*2*/q1||o||q2, /*3*/oprefix||o
)
from (values
('tilde' , '~' ), ('bang_tilde' , '!~' ),
('tilde_star' , '~*' ), ('bang_tilde_star' , '!~*' ),
('dtilde' , '~~' ), ('bang_dtilde' , '!~~' ),
('dtilde_star', '~~*'), ('bang_dtilde_star', '!~~*')
) as _(oname, o),
(values
('any', '', 'select exists (select * from unnest(a) as x where x ', ' regexp);'),
('all', '#', 'select true = all (select x ', ' regexp from unnest(a) as x);')
) as _2(nameprefix, oprefix, q1, q2)
\gexec
Executing this in psql creates 16 functions and 16 operators that cover all applicable 8 matching operators for arrays -- plus 8 variations prefixed with # that implement the ALL equivalent.
Very handy!
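To see where the count of 16 comes from, the cross product of the two VALUES lists can be enumerated. This Python sketch only reproduces the naming scheme from the format() call (function name and operator name); it does not generate any SQL, and `generated_names` is a made-up helper.

```python
from itertools import product

# (oname, o) pairs, as in the first VALUES list
OPERATORS = [
    ("tilde", "~"), ("bang_tilde", "!~"),
    ("tilde_star", "~*"), ("bang_tilde_star", "!~*"),
    ("dtilde", "~~"), ("bang_dtilde", "!~~"),
    ("dtilde_star", "~~*"), ("bang_dtilde_star", "!~~*"),
]
# (nameprefix, oprefix) pairs, as in the second VALUES list
VARIANTS = [("any", ""), ("all", "#")]


def generated_names() -> list[tuple[str, str]]:
    """Return (function name, operator name) for every combination."""
    return [
        (f"{nameprefix}_array_{oname}", f"{oprefix}{o}")
        for (oname, o), (nameprefix, oprefix) in product(OPERATORS, VARIANTS)
    ]


names = generated_names()
print(len(names))  # 16
print(names[:2])   # [('any_array_tilde', '~'), ('all_array_tilde', '#~')]
```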
Is there a better way to do this? Seems silly to have the same regex twice, but I want to indicate which phrase triggered the message content that was selected. Greenplum 4.2.2.4 (like PostgreSQL 8.2) on server.
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from E'(?i)phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future')
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND messages.content ~* E'phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)
Try using a basic join:
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from rgxp.rgxp )
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
join (
select E'(?i)phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future'::text
as rgxp
) rgxp
on messages.content ~* rgxp.rgxp
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)
This is a demo (for one table only): http://sqlfiddle.com/#!11/4a00d/2
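The underlying idea, defining the pattern once and using it both to filter and to report the triggering phrase, can be sketched outside SQL as well. The following Python example is only an illustration: it uses a shortened phrase list, invented message contents, and a hypothetical `tag_triggers` helper.

```python
import re

# Defined once; IGNORECASE plays the role of (?i) / ~* in the query.
TRIGGERS = re.compile(r"phrase number one|phrase\.two|another phrase", re.IGNORECASE)


def tag_triggers(contents: list[str]) -> list[tuple[str, str]]:
    """Keep only matching messages and pair each with the phrase that triggered it."""
    tagged = []
    for content in contents:
        hit = TRIGGERS.search(content)
        if hit is not None:
            tagged.append((hit.group(0), content))
    return tagged


messages = ["See Phrase.Two now", "nothing here", "ANOTHER PHRASE!"]  # made-up data
print(tag_triggers(messages))
# [('Phrase.Two', 'See Phrase.Two now'), ('ANOTHER PHRASE', 'ANOTHER PHRASE!')]
```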
I have a database filled with some codes like
EE789323
990
78000
These numbers are ALWAYS endings of a larger code. Now I have a function that needs to check if the larger code contains the subcode.
So if I have codes 90 and 990 and my full code is EX888990, it should match both of them.
However I need to do it in the following way:
SELECT * FROM tableWithRecordsWithSubcode
WHERE subcode MATCHES [reg exp with full code];
Is a regular expression like this even possible?
EDIT:
To clarify the issue I'm having, I'm not using SQL here. I just used that to give an example of the type of query I'm using.
In fact I'm using iOS with CoreData, and I need a predicate to fetch me only the records that match.
In the way that is mentioned below.
Given the observations from a comment:
Do you have two tables, one called tableWithRecordsWithSubcode and another that might be tableWithFullCodeColumn? So the matching condition is in part a join - you need to know which subcodes match any of the full codes in the second table? But you're only interested in the information in the tableWithRecordsWithSubcode table, not in which rows it matches in the other table?
and the laconic "you're correct" response, then we have to rewrite the query somewhat.
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON F.Fullcode ...ends-with... S.Subcode
or maybe using an EXISTS sub-query:
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(SELECT * FROM tableWithFullCodeColumn AS F
WHERE F.Fullcode ...ends-with... S.Subcode)
This uses a correlated sub-query but avoids the DISTINCT operation; it may mean the optimizer can work more efficiently.
That just leaves the magical 'X ...ends-with... T' operator to be defined. One possible way to do that is with LENGTH and SUBSTR. However, SUBSTR does not behave the same way in all DBMS, so you may have to tinker with this (possibly adding a third argument, LENGTH(s.subcode)). Assuming a 1-based SUBSTR, the suffix starts at position LENGTH(f.fullcode) - LENGTH(s.subcode) + 1:
LENGTH(f.fullcode) >= LENGTH(s.subcode) AND
SUBSTR(f.fullcode, LENGTH(f.fullcode) - LENGTH(s.subcode) + 1) = s.subcode
This leads to two possible formulations:
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode;
and
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(
SELECT * FROM tableWithFullCodeColumn AS F
WHERE LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode);
This is not going to be a fast operation; joins on computed results such as required by this query seldom are.
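Since the start position is easy to get wrong by one, here is a small Python sketch of the arithmetic. It assumes a 1-based, two-argument SUBSTR (which not every DBMS provides in exactly this form); `sql_substr` and `ends_with` are hypothetical helpers written only to check the index calculation.

```python
def sql_substr(s: str, start: int) -> str:
    """Emulate a 1-based SUBSTR(s, start) that returns the rest of the string."""
    return s[start - 1:]


def ends_with(fullcode: str, subcode: str) -> bool:
    """LENGTH guard plus SUBSTR comparison, mirroring the SQL condition."""
    return (
        len(fullcode) >= len(subcode)
        and sql_substr(fullcode, len(fullcode) - len(subcode) + 1) == subcode
    )


# With a 1-based SUBSTR the suffix '990' of 'EX888990' starts at 8 - 3 + 1 = 6.
print(sql_substr("EX888990", 6))       # 990
print(ends_with("EX888990", "990"))    # True
print(ends_with("EX888990", "90"))     # True
print(ends_with("EX888990", "78000"))  # False
```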
I'm not sure why you think that you need a regular expression... Just use the charindex function:
select something
from table
where charindex(subcode, code) <> 0
Edit:
To find strings at the end, you can create a pattern with the % wildcard from the subcode:
select something
from table
where code like '%' + subcode