Redundant regex in SELECT and WHERE

Is there a better way to do this? Seems silly to have the same regex twice, but I want to indicate which phrase triggered the message content that was selected. Greenplum 4.2.2.4 (like PostgreSQL 8.2) on server.
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from E'(?i)phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future')
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND messages.content ~* E'phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)

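Try a basic join: put the pattern in a one-row subquery so it is written only once, then reference it in both the SELECT list and the join condition: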
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from rgxp.rgxp )
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
join (
-- backslashes are doubled because inside E'...' a single \. collapses to a plain .
select E'(?i)phrase number one|phrase\\.two|another phrase|this list keeps going|lots\\.of\\*keyword phrases|more will be added in the future'::text
as rgxp
) rgxp
on messages.content ~* rgxp.rgxp
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)
This is a demo (for one table only): http://sqlfiddle.com/#!11/4a00d/2

Related

Count number of WHERE filters in SQL query using regex

Update: I've updated the test string to cover a case that I've missed.
I'm trying to count the number of WHERE filters in a query using regex.
So the general idea is to count the number of WHERE and AND keywords occurring in the query, while excluding any AND that appears after a JOIN and before a WHERE, and also excluding any AND that appears inside a CASE WHEN clause.
For example, this query:
WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n, cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \nEND,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\nINNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\nWHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6
-- FORMATTED FOR EASE OF READ. PLEASE USE LINE ABOVE AS REGEX TEST STRING
WITH cte AS (
SELECT a,b
FROM something
WHERE a>10
AND b<5
)
, cte2 AS (
SELECT c,d
FROM another
WHERE c>10
AND d<5
)
SELECT
CASE
WHEN c1.a=1 AND c2.c=1 THEN 'yes'
WHEN c1.a=1 AND c2.c=1 THEN 'maybe'
ELSE 'no'
END,
c1.a,
c1.b,
c2.c,
c2.d
FROM cte c1
INNER JOIN cte2 c2
ON c1.a = c2.c
AND c1.b = c2.d
WHERE c1.a<4
AND DATE(c1)>'2022-01-01'
AND c2.c>6
This should return 7, counting the following:
WHERE a>10
AND b<5
WHERE c>10
AND d<5
WHERE c1.a<4
AND DATE(c1)>'2022-01-01'
AND c2.c>6
The portion AND c1.b = c2.d is not counted because it happens after JOIN, before WHERE.
The portion AND c2.c=1 is not counted because it is in a CASE WHEN clause.
I eventually plan to use this in a PostgreSQL query to count the number of filters that appear in all queries run in a certain period.
I've tried searching around for an answer and trying it myself, but to no avail. Hence looking for help here. Thank you in advance!
I try to stay away from lookarounds as they could be messy and too painful to use, especially with the fixed-width limitation of lookbehind assertion.
My proposed solution is to capture all scenarios in different groups, and then select only the group of interest. The undesired scenarios will still be matched, but will not be selected.
Group 1 - Starts with JOIN (undesired)
Group 2 - Starts with WHERE (desired)
Group 3 - Starts with CASE (undesired)
(JOIN.*?(?=$|WHERE|JOIN|CASE|END))|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))|(CASE.*?(?=$|WHERE|JOIN|CASE|END))
Note: Feel free to replace WHERE|JOIN|CASE|END with whatever keywords you want to act as the 'stopper' words.
All scenarios, including the undesired ones, will be matched, but you need to select only Group 2.
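As a rough illustration of the grouping idea, here is a minimal Python sketch. Python, the helper name count_where_filters and the TEST_QUERY variable are assumptions made for this example and are not part of the answer. It applies the pattern above with `.` allowed to match newlines (an assumed flag choice), keeps only Group 2, and counts the WHERE and AND keywords inside each kept segment.

```python
import re

# The pattern from the answer. re.DOTALL lets "." cross line breaks so that a
# segment can run from one keyword to the next stopper word.
SEGMENT_RE = re.compile(
    r"(JOIN.*?(?=$|WHERE|JOIN|CASE|END))"
    r"|(WHERE.*?(?=$|WHERE|JOIN|CASE|END))"
    r"|(CASE.*?(?=$|WHERE|JOIN|CASE|END))",
    re.DOTALL,
)

# The one-line test string from the question.
TEST_QUERY = (
    "WITH cte AS (\nSELECT a,b\nFROM something\nWHERE a>10\n AND b<5)\n"
    ", cte2 AS (\n SELECT c,\nd FROM another\nWHERE c>10\nAND d<5)\n"
    " SELECT CASE WHEN c1.a=1\nAND c2.c=1 THEN 'yes' ELSE 'no' \n"
    "END,c1.a,c1.b,c2.c,c2.d\nFROM cte c1\n"
    "INNER JOIN cte2 c2 ON c1.a = c2.c\nAND c1.b = c2.d\n"
    "WHERE c1.a<4 AND DATE(c1)>'2022-01-01'\nAND c2.c>6"
)


def count_where_filters(sql):
    """Count WHERE/AND keywords that fall inside Group 2 (WHERE segments)."""
    total = 0
    for match in SEGMENT_RE.finditer(sql):
        where_segment = match.group(2)
        if where_segment is None:
            continue  # a JOIN or CASE segment: matched, but not selected
        total += len(re.findall(r"\b(?:WHERE|AND)\b", where_segment))
    return total
```

Calling count_where_filters(TEST_QUERY) gives 7 for the test string. The stopper words are matched case-sensitively and as plain substrings, so a column or string literal that happens to contain END or CASE would cut a segment short.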
You can try something like this:
WITH DataSource (parts) AS
(
SELECT REGEXP_MATCHES(
'WITH cte AS (SELECT a,b FROM something WHERE a>10 AND b<5)\n, cte2 AS (SELECT c,d FROM another WHERE c>10 AND d<5)\n SELECT c1.a,c1.b,c2.c,c2.d FROM cte c1 INNER JOIN cte2 c2 ON c1.a = c2.c AND c1.b = c2.d WHERE c1.a<4 AND c2.c>6',
E'(?= WHERE)[^)|;]+'
,'gmi'
)
)
SELECT SUM
(
(length(parts[1]) - length(REPLACE(parts[1], 'AND', ''))) / 3 -- counting ANDs
+ 1 -- for the where
)
FROM DataSource
The idea is to match the text that follows each WHERE, then simply count the ANDs in it and add one for the matched WHERE itself.

How to remove carriage returns and new lines on all the columns in a table using Postgresql?

I am trying to see if there is any way to remove carriage returns and newlines from all the varchar columns in a table using one statement.
I know that we can do this for a single column using something like the following:
select regexp_replace(field, E'[\\n\\r]+', ' ', 'g' )
With that approach I would need one such expression for every column, which I don't want to do unless there is no easier way.
Appreciate your help!
You can do this either by creating a plpgsql function that executes dynamic SQL, or by running it directly via DO, as in the following example (replace <mytable> with the name of your table):
do $$declare _q text; _table text = '<mytable>';
begin
select 'update '||attrelid::regclass::text||E' set\n'||
string_agg(' '||quote_ident(attname)||$q$ = regexp_replace($q$||quote_ident(attname)||$q$, '[\n\r]+', ' ', 'g')$q$, E',\n' order by attnum)
into _q
from pg_attribute
-- regtype prints varchar as 'character varying', so match on that name
where attnum > 0 and atttypid::regtype::text in ('text', 'character varying')
group by attrelid
having attrelid = _table::regclass;
raise notice E'Executing:\n\n%', _q;
-- uncomment this line when happy with the query:
-- execute _q;
end;$$;
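To make the shape of the generated statement easier to see, here is a minimal Python sketch that assembles the same kind of UPDATE from a list of column names. The helpers quote_ident and build_cleanup_update are hypothetical names for this illustration, and quote_ident here is a simplified stand-in that always quotes (PostgreSQL's own quote_ident quotes only when needed). In the DO block the column list comes from pg_attribute; here it is passed in by hand.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling embedded quotes (simplified)."""
    return '"' + name.replace('"', '""') + '"'


def build_cleanup_update(table, text_columns):
    """Return one UPDATE that replaces CR/LF runs with a space in each column."""
    if not text_columns:
        raise ValueError("no text columns to update")
    assignments = [
        "  {col} = regexp_replace({col}, '[\\n\\r]+', ' ', 'g')".format(
            col=quote_ident(column)
        )
        for column in text_columns
    ]
    return "update " + quote_ident(table) + " set\n" + ",\n".join(assignments)
```

For example, build_cleanup_update("my_table", ["name", "notes"]) produces a single statement with one regexp_replace assignment per column, which is what the notice in the DO block prints before you decide to execute it.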

VBA: Filtering by multiple criteria (more than 2) using wildcards [duplicate]

I am writing code to set a filter on a data sheet. I don't know how to post the sheet here, so I'll just type out the layout
(starting from the left is column A):
Name * BDevice * Quantity * Sale * Owner
I need to filter on two columns:
- BDevice: any value containing "M1454", "M1467" or "M1879" (so M1454A or M1467TR would still match)
- Owner: PROD or RISK
Here is the code I wrote:
Sub AutoFilter()
ActiveWorkbook.ActiveSheet..Range(B:B).Select
Selection.Autofilter Field:=1 Criteria1:=Array( _
"*M1454*", "*M1467*", "*M1879*"), Operator:=xlFilterValues
Selection.AutoFilter Field:=4 Criteria1:="=PROD" _
, Operator:=xlOr, Criteria2:="=RISK"
End Sub
When I run the code, I get error 1004, and the part that seems to be wrong is the second filter (I am not sure about the use of Field, so I cannot say for sure).
Edit; Santosh: When I try your code, I get error 9, subscript out of range. The error comes from the With statement. (The data table spans columns A to AS, so I changed the range to A:AS.)
While the AutoFilter method accepts at most two direct wildcard criteria per field, pattern matching can be used to build an array of the actual matching values, which is then passed with the Operator:=xlFilterValues option in place of the wildcards. A Select Case statement handles the wildcard matching.
The second field is a simple Criteria1 and Criteria2 direct match with an Operator:=xlOr joining the two criteria.
Sub multiWildcardFilter()
Dim a As Long, aARRs As Variant, dVALs As Object
Set dVALs = CreateObject("Scripting.Dictionary")
dVALs.CompareMode = vbTextCompare
With Worksheets("Sheet1")
If .AutoFilterMode Then .AutoFilterMode = False
With .Cells(1, 1).CurrentRegion
'build a dictionary so the keys can be used as the array filter
aARRs = .Columns(2).Cells.Value2
For a = LBound(aARRs, 1) + 1 To UBound(aARRs, 1)
Select Case True
Case aARRs(a, 1) Like "MK1454*"
dVALs.Add Key:=aARRs(a, 1), Item:=aARRs(a, 1)
Case aARRs(a, 1) Like "MK1467*"
dVALs.Add Key:=aARRs(a, 1), Item:=aARRs(a, 1)
Case aARRs(a, 1) Like "MK1879*"
dVALs.Add Key:=aARRs(a, 1), Item:=aARRs(a, 1)
Case Else
'no match. do nothing
End Select
Next a
'filter on column B if dictionary keys exist
If CBool(dVALs.Count) Then _
.AutoFilter Field:=2, Criteria1:=dVALs.keys, _
Operator:=xlFilterValues, VisibleDropDown:=False
'filter on column E
.AutoFilter Field:=5, Criteria1:="PROD", Operator:=xlOr, _
Criteria2:="RISK", VisibleDropDown:=False
'data is filtered on MK1454*, MK1467* or MK1879* (column B)
'column E is either PROD or RISK
'Perform work on filtered data here
End With
If .AutoFilterMode Then .AutoFilterMode = False
End With
dVALs.RemoveAll: Set dVALs = Nothing
End Sub
If exclusions¹ are to be added to the filtering, their logic should be placed at the top of the Select Case ... End Select statement so that they are not added through a false positive to other matching criteria.
[Image: before applying the AutoFilter method]
[Image: after applying AutoFilter with multiple wildcards]
¹ See Can Advanced Filter criteria be in the VBA rather than a range? and Can AutoFilter take both inclusive and non-inclusive wildcards from Dictionary keys? for more on adding exclusions to the dictionary's filter set.
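The underlying trick is independent of VBA: expand the wildcard patterns into the list of exact values that actually occur, then filter on that list. A minimal Python sketch of the idea follows. The function names and the row layout (dicts with "BDevice" and "Owner" keys) are assumptions for this example, and fnmatchcase is only a stand-in for VBA's Like operator, whose pattern syntax differs in details.

```python
from fnmatch import fnmatchcase


def values_matching_wildcards(column_values, patterns):
    """Collect the distinct values that match at least one wildcard pattern."""
    matched = []
    for value in column_values:
        if value not in matched and any(fnmatchcase(value, p) for p in patterns):
            matched.append(value)
    return matched


def filter_rows(rows, device_patterns, owners):
    """Keep rows whose device is in the expanded value list and whose owner is allowed."""
    allowed_devices = set(
        values_matching_wildcards([row["BDevice"] for row in rows], device_patterns)
    )
    return [
        row for row in rows
        if row["BDevice"] in allowed_devices and row["Owner"] in owners
    ]
```

The list returned by values_matching_wildcards plays the role of the dictionary keys passed as Criteria1 with xlFilterValues. Note that this sketch matches case-sensitively, whereas the VBA dictionary above uses vbTextCompare.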
For using partial strings to exclude rows while still including blanks, you could use something like this:
'From Jeeped's code
Dim dVals As Scripting.Dictionary 'early binding needs a reference to Microsoft Scripting Runtime
Set dVals = CreateObject("Scripting.Dictionary")
dVals.CompareMode = vbTextCompare
Dim col3() As Variant
Dim col3init As Long
'Read column 3 into an array; start from 1 so the index corresponds to the row
With Sheets("Sheet1")
ReDim col3(1 To .UsedRange.Rows.Count) 'size the array before writing to it
For col3init = 1 To .UsedRange.Rows.Count
col3(col3init) = .Cells(col3init, 3).Value
Next col3init
End With
Dim excludeArray() As Variant
'Partial strings in below array will be checked against rows
excludeArray = Array("MK1", "MK2", "MK3")
Dim col3check As Long
Dim excludecheck As Long
Dim violations As Long
For col3check = 1 To UBound(col3)
For excludecheck = LBound(excludeArray) To UBound(excludeArray)
If InStr(1, col3(col3check), excludeArray(excludecheck)) <> 0 Then
violations = violations + 1
'Sometimes the partial string you're filtering out for may appear more than once.
End If
Next excludecheck
'Check for blanks first; otherwise an empty cell would be added as an ordinary key
If col3(col3check) = "" Then
dVals.Item(Chr(61)) = Chr(61) 'blanks
ElseIf violations = 0 And Not dVals.Exists(col3(col3check)) Then
dVals.Add Key:=col3(col3check), Item:=col3(col3check) 'adds keys for items where the partial strings in excludeArray do NOT appear
End If
violations = 0
Next col3check
The dVals.Item(Chr(61)) = Chr(61) idea came from Jeeped's other answer here
Multiple Filter Criteria for blanks and numbers using wildcard on same field just doesn't work
Try the code below.
A maximum of two wildcard expressions for Criteria1 works. Refer to this link.
Sub AutoFilter()
With ThisWorkbook.Sheets("sheet1").Range("A:E")
.AutoFilter Field:=2, Criteria1:=Array("*M1454*", "*M1467*"), Operator:=xlFilterValues
.AutoFilter Field:=5, Criteria1:="=PROD", Operator:=xlOr, Criteria2:="=RISK"
End With
End Sub

Query with Siddhi CEP using two times windows and 2 streams (continued)

I keep trying to build complex correlations with Siddhi. This time I have two input streams: web consultations made by clients, and notices (commercial actions) sent to clients. I want to generate an alert when a client appears more than once in the first stream while no event for that client has occurred in the second stream, evaluated over two time windows and depending on the status of those events.
define stream consults (idClient string,dniClient string,codProduct string,codSubProduct string,chanel string,time string )
define stream comercialActions(idClient string, idAccionComercial string,codProduct string,codSubProduct string,chanel string,time string,status string)
from consults[codProduct=='Fondos']#window.time(50 seconds) select idClient,codProduct, codSubProduct, chanel, time, count(idClient) as visitCount group by idClient insert into consultsAvg for current-events
from consultsAvg[visitCount==1] select idClient, '' as idAccionComercial,codProduct, codSubProduct ,chanel, time, 'temp' as status insert into comercialActions for all-events
from comercialActions[status=='temp' or status == 'Lanzada' ]#window.time(5 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_generadas for all-events
from comercialActions[status=='temp' or status=='Aceptada' or status =='Rechazada'or status=='Caduca']#window.time(3 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_realizadas for all-events
from consultsAvg[visitCount>=2]#window.time(50 seconds) as c join acciones_realizadas[num_status>=1]#window.time(5 seconds) as ag on c.idClient == ag.idClient and c.codProduct==ag.codProduct select c.idClient,c.codProduct,c.codSubProduct,c.chanel, c.time, count(c.idClient) as conteo insert into posible_ac for all-events
from posible_ac#window.time(5 seconds) as pac join acciones_generadas[num_status>=1]#window.time(1 seconds) as ar on pac.idClient == ar.idClient select pac.idClient,pac.codProduct,pac.codSubProduct,pac.chanel,pac.time,conteo, count(ar.idClient) as conteo2 insert into enviar_Ac
from enviar_Ac[conteo==1 and conteo2==1] select idClient, codProduct,codSubProduct, chanel, time insert into generar_accion_comercial
What I am trying to do is use intermediate streams to count the number of website hits; when the count is greater than or equal to 2, I check through several joins whether a commercial action has already been made for that customer...
I think I have made this too complicated, and I wonder whether there is a simpler solution, given that Siddhi has neither a 'not happened' function nor other join types (such as a left join).
You can accomplish this with a pattern. In this case I assume that we wait 1 minute for an event from the second stream and, if there is none and there is more than one event from the first stream, we emit an output.
from consults#window.time(1 minute)
select idClient, count(idClient) as idCount, <select more attributes here>
insert into expiredConsultsStream for expired-events;
from expiredConsultsStream[idCount > 1]
select *
insert into filteredConsultsStream;
from firstEvent = consults ->
nonOccurringEvent = comercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
select firstEvent.idClient as id, triggerEvent.idCount as idCount, nonOccurringEvent.idClient as nid
having( not (nid instanceof string))
insert into alertStream;
These are draft queries, so they may require some modifications to get them working. The stream name comercialActions is spelled as in your stream definition. The filteredConsultsStream contains consult events with more than 1 occurrence within the last minute.
In the last query we take the or of the two conditions:
nonOccurringEvent = comercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
So the query will be triggered by either of the occurrences above. We then need to find out whether it was triggered by comercialActions. For that we use the 'having' clause and check whether nid is null (a null nid implies that the comercialActions event is null, i.e. the non-occurrence). Finally we emit the output.
You can find a better description of a somewhat similar query here (that one is for the newer 4.0.0 version, which has small syntax changes).
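The condition being detected can be stated outside Siddhi as well. The following Python sketch is only an offline, batch approximation of that logic and does not reproduce Siddhi's streaming or window semantics. The function name clients_to_alert, the (idClient, timestamp) tuple layout and the choice to start each window at the client's first consult are all assumptions made for this illustration.

```python
from collections import defaultdict


def clients_to_alert(consults, actions, window_seconds=60):
    """
    consults, actions: lists of (id_client, timestamp_in_seconds) tuples.
    Return the sorted ids of clients that have more than one consult within
    window_seconds of their first consult and no commercial action in that
    same window.
    """
    consults_by_client = defaultdict(list)
    for id_client, timestamp in consults:
        consults_by_client[id_client].append(timestamp)

    alerts = []
    for id_client, stamps in consults_by_client.items():
        start = min(stamps)
        end = start + window_seconds
        consults_in_window = [ts for ts in stamps if ts <= end]
        # The "non-occurrence" check: no action for this client in the window.
        action_occurred = any(
            client == id_client and start <= ts <= end for client, ts in actions
        )
        if len(consults_in_window) > 1 and not action_occurred:
            alerts.append(id_client)
    return sorted(alerts)
```

Here the action_occurred flag corresponds to nonOccurringEvent being non-null in the pattern query, and the length check corresponds to the idCount > 1 filter.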

oracle regular expression and MERGE

As a follow-up to my previous question:
I have some newline-separated strings.
I need to insert each of those words into a table.
The new requirement is that a word should be inserted if it does not exist yet; otherwise its count should be incremented by 1 (as with MERGE).
My current query only inserts: I used the CONNECT BY LEVEL method without checking whether the value already exists.
The logic is roughly:
if the word already EXISTS THEN
UPDATE my_table set w_count = w_count +1 where word = '...';
else
INSERT INTO my_table (word, w_count)
SELECT REGEXP_SUBSTR(i_words, '[^[:cntrl:]]+', 1 ,level),
1
FROM dual
CONNECT BY REGEXP_SUBSTR(i_words, '[^[:cntrl:]]+', 1 ,level) IS NOT NULL;
end if;
Try this
MERGE INTO my_table m
USING(WITH the_data AS (
SELECT 'a
bb
&
c' AS dat
FROM dual
)
SELECT regexp_substr(dat, '[^[:cntrl:]]+', 1 ,LEVEL) wrd
FROM the_data
CONNECT BY regexp_substr(dat, '[^[:cntrl:]]+', 1 ,LEVEL) IS NOT NULL) word_list
ON (word_list.wrd = m.word)
WHEN matched THEN UPDATE SET m.w_count = m.w_count + 1
WHEN NOT matched THEN insert(m.word,m.w_count) VALUES (word_list.wrd,1);
More details on MERGE here.
Sample fiddle
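For readers who want to see the insert-or-increment logic in isolation, here is a minimal Python sketch. A dict stands in for my_table, the function name merge_word_counts is hypothetical, and the regular expression only approximates '[^[:cntrl:]]+' for ASCII input (control characters taken as 0x00 to 0x1F plus 0x7F).

```python
import re

# Runs of non-control characters, an ASCII approximation of '[^[:cntrl:]]+'.
WORD_RE = re.compile(r"[^\x00-\x1f\x7f]+")


def merge_word_counts(table, i_words):
    """Insert each word with count 1, or add 1 to the count if it already exists."""
    for word in WORD_RE.findall(i_words):
        if word in table:
            table[word] += 1   # the WHEN MATCHED branch
        else:
            table[word] = 1    # the WHEN NOT MATCHED branch
    return table
```

With an existing table of {"bb": 2}, merging the four-line input from the answer ("a", "bb", "&", "c") leaves bb at 3 and adds a, & and c with a count of 1 each. This sketch increments once per occurrence, so a word repeated in the input is counted each time it appears.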