I keep trying to build complex correlations with Siddhi. On this occasion I have two input streams: web client consults, and commercial notices sent to clients about their visits. I want to generate an alert when the first stream is repeated more than once for a given client while no event for that client has occurred on the second stream, across two windows and depending on the status of those events.
define stream consults (idClient string,dniClient string,codProduct string,codSubProduct string,chanel string,time string )
define stream comercialActions(idClient string, idAccionComercial string,codProduct string,codSubProduct string,chanel string,time string,status string)
from consults[codProduct=='Fondos']#window.time(50 seconds) select idClient,codProduct, codSubProduct, chanel, time, count(idClient) as visitCount group by idClient insert into consultsAvg for current-events
from consultsAvg[visitCount==1] select idClient, '' as idAccionComercial,codProduct, codSubProduct ,chanel, time, 'temp' as status insert into comercialActions for all-events
from comercialActions[status=='temp' or status == 'Lanzada' ]#window.time(5 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_generadas for all-events
from comercialActions[status=='temp' or status=='Aceptada' or status =='Rechazada'or status=='Caduca']#window.time(3 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_realizadas for all-events
from consultsAvg[visitCount>=2]#window.time(50 seconds) as c join acciones_realizadas[num_status>=1]#window.time(5 seconds) as ag on c.idClient == ag.idClient and c.codProduct==ag.codProduct select c.idClient,c.codProduct,c.codSubProduct,c.chanel, c.time, count(c.idClient) as conteo insert into posible_ac for all-events
from posible_ac#window.time(5 seconds) as pac join acciones_generadas[num_status>=1]#window.time(1 seconds) as ar on pac.idClient == ar.idClient select pac.idClient,pac.codProduct,pac.codSubProduct,pac.chanel,pac.time,conteo, count(ar.idClient) as conteo2 insert into enviar_Ac
from enviar_Ac[conteo==1 and conteo2==1] select idClient, codProduct,codSubProduct, chanel, time insert into generar_accion_comercial
What I am trying to do is use intermediate streams to count the number of website hits; when this is greater than or equal to 2, I check whether a commercial action has already been made for that customer through various joins...
I think I've made this very complicated, and I don't know if there is a simpler solution, considering Siddhi has neither a 'not happened' function nor other joins (left join).
You can accomplish this with a pattern. In this case I assume that we have to wait 1 minute for an event from the second stream, and if there's none, and more than 1 event from the first, we are going to emit an output.
from consults#window.time(1 minute)
select idClient, count(idClient) as idCount, <select more attributes here>
insert into expiredConsultsStream for expired-events;
from expiredConsultsStream[idCount > 1]
select *
insert into filteredConsultsStream;
from firstEvent = consults ->
nonOccurringEvent = comercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
select firstEvent.idClient as id, triggerEvent.idCount as idCount, nonOccurringEvent.idClient as nid
having (not instanceOfString(nid))
insert into alertStream;
These are draft queries, so they may require some modifications to get them working. The filteredConsultsStream contains consult events with more than 1 occurrence within the last minute.
In the last query we take the or of the two conditions:
nonOccurringEvent = comercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
So the query will be triggered by whichever of those occurrences happens first. But then we need to find out whether the condition was triggered by comercialActions. For that we use the 'having' clause and check whether the id is null (id is null implies that the event is null, i.e. the non-occurrence). Finally we emit the output.
You can find a better description of a somewhat similar query here (that is for the new 4.0.0 version, by the way, and there are small syntax changes).
Would it be possible to construct SQL to concatenate column values from multiple rows?
The following is an example:
Table A
PID
A
B
C
Table B
PID SEQ Desc
A 1 Have
A 2 a nice
A 3 day.
B 1 Nice Work.
C 1 Yes
C 2 we can
C 3 do
C 4 this work!
Output of the SQL should be -
PID Desc
A Have a nice day.
B Nice Work.
C Yes we can do this work!
So basically the Desc column for the output table is a concatenation of the Desc values from Table B, ordered by SEQ?
Any help with the SQL?
There are a few ways depending on what version you have - see the Oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A to pick out the pids you want.
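For example, a minimal sketch of that join, using the question's table names (note that Desc is a reserved word in Oracle, so the real column may need quoting or renaming):
SELECT a.pid, agg.description
FROM a
JOIN (SELECT pid,
             LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
      FROM b
      GROUP BY pid) agg
  ON agg.pid = a.pid;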
Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.
There's also an XMLAGG function, which works on versions prior to 11.2. Because WM_CONCAT is undocumented and unsupported by Oracle, it's recommended not to use it in production systems.
With XMLAGG you can do the following:
SELECT XMLAGG(XMLELEMENT(E,ename||',')).EXTRACT('//text()') "Result"
FROM employee_names
What this does is
put the values of the ename column (concatenated with a comma) from the employee_names table in an XML element (with tag E)
aggregate the XML (concatenate the elements)
extract the text of the result
call the resulting column "Result"
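Applied to the question's Table B, the same technique might look like this (a sketch; XMLAGG accepts an ORDER BY clause, which is handy here, and getStringVal() turns the XMLType result back into a string):
SELECT pid,
       RTRIM(XMLAGG(XMLELEMENT(E, Desc || ' ') ORDER BY seq)
             .EXTRACT('//text()').getStringVal(), ' ') AS description
FROM b
GROUP BY pid;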
With SQL model clause:
SQL> select pid
2 , ltrim(sentence) sentence
3 from ( select pid
4 , seq
5 , sentence
6 from b
7 model
8 partition by (pid)
9 dimension by (seq)
10 measures (descr,cast(null as varchar2(100)) as sentence)
11 ( sentence[any] order by seq desc
12 = descr[cv()] || ' ' || sentence[cv()+1]
13 )
14 )
15 where seq = 1
16 /
P SENTENCE
- ---------------------------------------------------------------------------
A Have a nice day.
B Nice Work.
C Yes we can do this work!
3 rows selected.
I wrote about this here. If you follow the link to the OTN thread you will find some more, including a performance comparison.
The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings.
If you are using 11g Release 2 you should use this function for string aggregation.
Please refer to the URL below for more information about string concatenation techniques:
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
As most of the answers suggest, LISTAGG is the obvious option. However, one annoying aspect of LISTAGG is that if the total length of the concatenated string exceeds 4000 characters (the limit for VARCHAR2 in SQL), the error below is thrown, which is difficult to manage in Oracle versions up to 12.1:
ORA-01489: result of string concatenation is too long
A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG.
The query including this clause would look like:
SELECT pid, LISTAGG(Desc, ' ' on overflow truncate) WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.
These are some of the additional options of the ON OVERFLOW clause (a combined example follows the list):
ON OVERFLOW TRUNCATE 'Contd..' : displays 'Contd..' at the end of the string (the default is '...').
ON OVERFLOW TRUNCATE '' : displays the 4000 characters without any terminating string.
ON OVERFLOW TRUNCATE WITH COUNT : displays the total number of characters at the end, after the terminating characters, e.g. '...(5512)'.
ON OVERFLOW ERROR : fails with the ORA-01489 error (which is the default anyway).
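For instance, a sketch combining TRUNCATE with WITH COUNT on the question's tables (12cR2 or later; the '...(NNNN)' suffix only appears when truncation actually occurs):
SELECT pid,
       LISTAGG(Desc, ' ' ON OVERFLOW TRUNCATE '...' WITH COUNT)
         WITHIN GROUP (ORDER BY seq) AS description
FROM b
GROUP BY pid;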
For those who must solve this problem using Oracle 9i (or earlier), you will probably need to use SYS_CONNECT_BY_PATH, since LISTAGG is not available.
To answer the OP, the following query will display the PID from Table A and concatenate all the DESC columns from Table B:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT a.pid, seq, description
FROM table_a a, table_b b
WHERE a.pid = b.pid(+)
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
There may also be instances where keys and values are all contained in one table. The following query can be used where there is no Table A, and only Table B exists:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT pid, seq, description
FROM table_b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
All values can be reordered as desired. Individual concatenated descriptions can be reordered in the PARTITION BY clause, and the list of PIDs can be reordered in the final ORDER BY clause.
Alternatively: there may be times when you want to concatenate all the values from an entire table into one row.
The key idea here is using an artificial value for the group of descriptions to be concatenated.
In the following query, the constant string '1' is used, but any value will work:
SELECT SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY unique_id ORDER BY pid, seq) rnum, description
FROM (
SELECT '1' unique_id, b.pid, b.seq, b.description
FROM table_b b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1;
Individual concatenated descriptions can be reordered in the PARTITION BY clause.
Several other answers on this page have also mentioned this extremely helpful reference:
https://oracle-base.com/articles/misc/string-aggregation-techniques
LISTAGG delivers the best performance if sorting is a must (00:00:05.85):
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
COLLECT delivers the best performance if sorting is not needed(00:00:02.90):
SELECT pid, TO_STRING(CAST(COLLECT(Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
COLLECT with ordering is a bit slower (00:00:07.08):
SELECT pid, TO_STRING(CAST(COLLECT(Desc ORDER BY Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
All other techniques were slower.
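Note that TO_STRING and varchar2_ntt in the COLLECT queries above are not built-ins; they are helper objects from the linked article. A minimal sketch of what they might look like:
CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
/
CREATE OR REPLACE FUNCTION to_string(nt_in IN varchar2_ntt,
                                     delimiter_in IN VARCHAR2 DEFAULT ',')
  RETURN VARCHAR2
IS
  v_out VARCHAR2(32767);
BEGIN
  -- walk the collection and glue the elements together
  FOR i IN 1 .. nt_in.COUNT LOOP
    v_out := v_out || CASE WHEN i > 1 THEN delimiter_in END || nt_in(i);
  END LOOP;
  RETURN v_out;
END to_string;
/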
Before you run a select query, run this:
SET SERVEROUT ON SIZE 6000
SELECT XMLAGG(XMLELEMENT(E,SUPLR_SUPLR_ID||',')).EXTRACT('//text()') "SUPPLIER"
FROM SUPPLIERS;
Try this code:
SELECT XMLAGG(XMLELEMENT(E,fieldname||',')).EXTRACT('//text()') "FieldNames"
FROM FIELD_MASTER
WHERE FIELD_ID > 10 AND FIELD_AREA != 'NEBRASKA';
In the select where you want your concatenation, call a SQL function.
For example:
select PID, dbo.MyConcat(PID)
from TableA;
Then for the SQL function:
Function MyConcat(@PID varchar(10))
returns varchar(1000)
as
begin
declare @x varchar(1000);
select @x = isnull(@x + ',', '') + Desc
from TableB
where PID = @PID;
return @x;
end
The Function Header syntax might be wrong, but the principle does work.
Environment: WSO2 Stream Processor 4.3.0
Let's say I have two very simple streams:
Stream where newly created requests (unfulfilled) are being delivered in real time (t1)
RequestStream(requestId)
Stream where requestsIds appear when the request has been fulfilled in real time (t2)
FulfilmentStream(requestId)
It's guaranteed that t2 is always > t1
How can I implement a SiddhiQL statement to identify requestIds that appear in RequestStream (event1) and haven't appeared in FulfilmentStream (event2) after 5 minutes have elapsed since event1?
Working Siddhi App based on Tishan's answer:
#App:name('FailedToFulfillInAmountOfTime')
#source(
type="kafka",
topic.list="some_topic",
threading.option="single.thread",
group.id="some_group",
bootstrap.servers="xxx.xxx.xxx.xxx:6667",
#Map(type="json", #attributes(request_id = '$.alarm_id', severity = '$.severity', managed_object = '$.ManagedObject')))
define stream OrigAlarmStream (request_id int, severity string, managed_object string);
#sink(type='log', prefix='Got this execution request')
define stream RequestStream (request_id int, severity string, managed_object string);
#sink(type='log', prefix='Got this fulfillment confirmation:')
define stream FulfillmentStream (request_id int, severity string, managed_object string);
#sink(type='log', prefix='This fulfillment was not done within 1 min:')
define stream AlertStream(request_id int);
#info(name='getExpiredRequests')
from every e1=RequestStream -> not FulfillmentStream[e1.request_id == request_id] for 1 min
select e1.request_id
insert into AlertStream;
#info(name='CopyFulfillments')
from OrigAlarmStream[severity == 'Clear']
select request_id, severity, managed_object
insert into FulfillmentStream;
#info(name='CopyRequests')
from OrigAlarmStream[severity != 'Clear']
select request_id, severity, managed_object
insert into RequestStream;
You can use logical patterns to achieve your requirement. Please refer to the query below.
from e1=RequestStream -> not FulfilmentStream[e1.requestId == requestId] for 5 min
select e1.requestId as requestId
insert into AlertStream;
Here we have defined a pattern with a not condition. This will be triggered when an event arrives in RequestStream and no matching event arrives in FulfilmentStream within 5 minutes. Please refer to logical patterns for more information.
From the stream below, I want to alert on events which have occurred twice with a temperature beyond 90 (i.e. every 2 events with temp > 90 need to be alerted).
InputStream=[1001,91]
InputStream=[1001,86]
InputStream=[1002,70]
InputStream=[1001,85]
InputStream=[1003,70]
InputStream=[1003,85]
InputStream=[1002,70]
InputStream=[1003,70]
InputStream=[1003,87]
InputStream=[1002,70]
InputStream=[1001,95]
InputStream=[1001,96]
InputStream=[1001,97]
InputStream=[1001,98]
InputStream=[1001,98]
I have written something like this:
#Plan:name('TestExecutionPlan')
define stream InputStream (id string, temp int);
partition with (id of InputStream)
begin
from InputStream
select id, temp
having temp > 90
insert into CriticalStream
end;
from CriticalStream[count(id) == 2]
select id, temp
group by id
--having count(id) == 2
insert into EventReporter;
However, it's alerting only 1 event in the EventReporter stream.
I am expecting the EventReporter stream to have [1001,97] and [1001,98] as well; right now it has only the record for [1001,95]. Could someone please point out what I am doing wrong here? How can I loop through the events after grouping them? I tried adding window.time and window.length, but I'm not getting the desired output. Any help / guidance would be really appreciated. Thank you.
You won't be requiring a partition there. You can simply use a filter and a lengthBatch window to get your desired output. Try the execution plan below:
#Plan:name('ExecutionPlan')
#Import('InputStream:1.0.0')
define stream InputStream (id string, temp int);
/* Filter events with temp > 90 */
from InputStream[temp > 90]
insert into CriticalStream;
/* Aggregate within a lengthBatch window, while group by id*/
from CriticalStream#window.lengthBatch(2)
select id, temp, count() as count
group by id
insert into EventReporter;
/* Just for logging the result in the console */
from EventReporter#log("Logging EventReporter : ")
insert into #temp;
Is there a better way to do this? It seems silly to have the same regex twice, but I want to indicate which phrase triggered the message content that was selected. Greenplum 4.2.2.4 (like PostgreSQL 8.2) on the server.
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from E'(?i)phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future')
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND messages.content ~* E'phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)
Try using a basic join:
SELECT
to_timestamp(extrainfo.startdate/1000)
,messages.timestamp
,users.username
,substring(messages.content from rgxp.rgxp )
,messages.content
FROM users
LEFT JOIN messages ON messages.senderid = users.id
join (
select E'(?i)phrase number one|phrase\.two|another phrase|this list keeps going|lots\.of\*keyword phrases|more will be added in the future'::text
as rgxp
) rgxp
on messages.content ~* rgxp.rgxp
LEFT JOIN extrainfo ON extrainfo.username = users.username
WHERE extrainfo.type1 = 't'
AND (extrainfo.type2 = 'f' OR extrainfo.type2 IS NULL)
This is a demo (for one table only): http://sqlfiddle.com/#!11/4a00d/2
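For reference, a minimal self-contained version of the same idea (hypothetical table and data, shortened pattern list):
CREATE TABLE demo_messages (id int, content text);
INSERT INTO demo_messages VALUES
  (1, 'this contains phrase number one in the middle'),
  (2, 'nothing of interest here'),
  (3, 'ends with another phrase');

-- the pattern lives in one place; the join filters on it and
-- substring() reports which alternative matched
SELECT m.id,
       substring(m.content from rgxp.rgxp) AS matched_phrase,
       m.content
FROM demo_messages m
JOIN (SELECT E'(?i)phrase number one|another phrase'::text AS rgxp) rgxp
  ON m.content ~* rgxp.rgxp;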
I have the following Coldfusion process:
My code makes a database call to the proc CommentInsert (this inserts a comment, and then calls an event insert proc about the comment being added called EventInsert)
I then call Event.GetEventByCommentId(commentId)
The result is no records returned, as the EventInsert hasn't finished adding the event record triggered by CommentInsert in Step 1.
I know this is the case, because if I create a delay between steps 1 and 2, then a recordset IS returned in step 2.
This leads me to believe that the read in step 2 is happening too quickly, before the event insert has committed in step 1.
My question is, how do I tell the ColdFusion process to wait until Step 1 has completed before doing the read in Step 2?
Step one and step two are two totally separate methods.
Code:
<cfset MessageHandlerManager = AddComment(argumentCollection=arguments) />
<cfset qEvents = application.API.EventManager.GetEventFeed(commentId=MessageHandlerManager.GetReturnItems()) />
Also, just let me add that the commentId being passed is valid. I have checked.
Another way to look at it:
Given this code:
<!--- Calls CommentInsert proc, which inserts a comment AND inserts an
event record by calling EventInsert within the proc --->
<cfset var newCommentId = AddComment(argumentCollection=arguments) />
<cfloop from="1" to="1000000" index="i">
</cfloop>
<!--- Gets the event record inserted in the code above --->
<cfset qEvent =
application.API.EventManager.GetEventFeed(commentId=newCommentId ) />
When I run the above code, qEvent comes back with a valid record.
However, when I comment out the loop, the record is coming back
empty.
What I think is happening is that the CommentInsert returns the new
comment Id, but when the GetEventFeed function is called, the
EventInsert proc hasn't completed in time and no record is found.
Thus, by adding the loop and delaying a bit, the event insert has time
to finish and then a valid record is returned when GetEventFeed is
called.
So my question is, how do I prevent this without using the loop?
UPDATE:
Here are the two stored procs used:
DELIMITER $$
DROP PROCEDURE IF EXISTS `CommentInsert` $$
CREATE DEFINER=`root`@`%` PROCEDURE `CommentInsert`(
IN _commentParentId bigint,
IN _commentObjectType int,
IN _commentObjectId bigint,
IN _commentText text,
IN _commentAuthorName varchar(100),
IN _commentAuthorEmail varchar(255),
IN _commentAuthorWebsite varchar(512),
IN _commentSubscribe tinyint(1),
IN _commentIsDisabled tinyint(1),
IN _commentIsActive tinyint(1),
IN _commentCSI int,
IN _commentCSD datetime,
IN _commentUSI int,
IN _commentUSD datetime,
OUT _commentIdOut bigint
)
BEGIN
DECLARE _commentId bigint default 0;
INSERT INTO comment
(
commentParentId,
commentObjectType,
commentObjectId,
commentText,
commentAuthorName,
commentAuthorEmail,
commentAuthorWebsite,
commentSubscribe,
commentIsDisabled,
commentIsActive,
commentCSI,
commentCSD,
commentUSI,
commentUSD
)
VALUES
(
_commentParentId,
_commentObjectType,
_commentObjectId,
_commentText,
_commentAuthorName,
_commentAuthorEmail,
_commentAuthorWebsite,
_commentSubscribe,
_commentIsDisabled,
_commentIsActive,
_commentCSI,
_commentCSD,
_commentUSI,
_commentUSD
);
SET _commentId = LAST_INSERT_ID();
CALL EventInsert(6, Now(), _commentId, _commentObjectType, _commentObjectId, null, null, 'Comment Added', 1, _commentCSI, Now(), _commentUSI, Now());
SELECT _commentId INTO _commentIdOut ;
END $$
DELIMITER ;
DELIMITER $$
DROP PROCEDURE IF EXISTS `EventInsert` $$
CREATE DEFINER=`root`@`%` PROCEDURE `EventInsert`(
IN _eventTypeId int,
IN _eventCreateDate datetime,
IN _eventObjectId bigint,
IN _eventAffectedObjectType1 int,
IN _eventAffectedObjectId1 bigint,
IN _eventAffectedObjectType2 int,
IN _eventAffectedObjectId2 bigint,
IN _eventText varchar(1024),
IN _eventIsActive tinyint,
IN _eventCSI int,
IN _eventCSD datetime,
IN _eventUSI int,
IN _eventUSD datetime
)
BEGIN
INSERT INTO event
(
eventTypeId,
eventCreateDate,
eventObjectId,
eventAffectedObjectType1,
eventAffectedObjectId1,
eventAffectedObjectType2,
eventAffectedObjectId2,
eventText,
eventIsActive,
eventCSI,
eventCSD,
eventUSI,
eventUSD
)
VALUES
(
_eventTypeId,
_eventCreateDate,
_eventObjectId,
_eventAffectedObjectType1,
_eventAffectedObjectId1,
_eventAffectedObjectType2,
_eventAffectedObjectId2,
_eventText,
_eventIsActive,
_eventCSI,
_eventCSD,
_eventUSI,
_eventUSD
);
END $$
DELIMITER ;
Found it. Boiled down to this line in the EventManager.GetEventFeed query:
AND eventCreateDate <= <cfqueryparam cfsqltype="cf_sql_timestamp" value="#Now()#" />
What was happening was that the MySQL Now() function called in the EventInsert proc was a fraction later than the ColdFusion #Now()# being used in the query, so that line of code excluded the record. That's also why it was only happening when comments were added quickly.
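One general way to avoid that kind of clock skew is to let MySQL evaluate the cutoff itself, so the insert and the read use the same clock. A hedged sketch, assuming the feed query looks roughly like this (the eventObjectId filter is an assumption about GetEventFeed's shape):
SELECT e.*
FROM event e
WHERE e.eventObjectId = ?         -- the comment id returned by CommentInsert
  AND e.eventCreateDate <= NOW()  -- MySQL's clock, the same one EventInsert used
ORDER BY e.eventCreateDate;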
What a biatch. Thanks for the input everyone.
Let me get this straight:
You call a MySQL SP which does an insert which then calls another SP to do another insert.
There's no return to ColdFusion between those two? Is that right?
If that's the case then the chances are there's a problem with your SP not returning values correctly or you're looking in the wrong place for the result.
I'm more inclined towards there being a problem with the MySQL SPs. They aren't exactly great and don't really give you a great deal of performance benefit. Views are useful, but the SPs are, frankly, a bit rubbish. I suspect that when you call the second SP from within the first SP and it returns a value, it's not being correctly passed back out of the original SP to ColdFusion, hence the lack of result.
To be honest, my suggestion would be to write two ORM functions or simple cfqueries in a suitable DAO or service to record the result of the insert of comment first and return a value. Having returned that value, make the other call to the function to get your event based on the returned comment id. (ColdFusion 8 will give you Generated_Key, ColdFusion 9 is generatedkey, I'm not sure what it'll be in Railo, but it'll be there in the "result" attribute structure).
Thinking about it, I'm not even sure why you're getting the event based on the comment id just entered. You've just added that comment against an event, so you should already have some data on that event, even if it's just the ID, from which you can then get the full event record/object without having to go around the houses via the comment.
So over all I would suggest taking a step back and looking at the data flow you're working with and perhaps refactor it.