Siddhi CEP pattern for every event with occurrences - wso2

I am writing a pattern in Siddhi CEP like this:
"from every e1=inputStream[(e1.name == 'A')]<2> -> e2=inputStream[(e2.name == 'B')]<2:> within 10 seconds select 'rule1' as ruleId insert into outputStream"
The above query executes only once, when I pass AABB into the stream; after that nothing happens, even if I pass AABB again.
When I write this instead:
"from every e1=inputStream[(e1.name == 'A')] -> e2=inputStream[(e2.name == 'B')]<2:> within 10 seconds select 'rule2' as ruleId insert into outputStream"
The above query works well for each ABB pattern.
Is there a way to match AAB and then have the pattern start again from the next AAB once the window time expires?

Related

Siddhi complex event processing using logical AND

I am using SiddhiQL for complex event processing. My use case is to check whether two events of a particular type, each matching certain filters, occur within 15 minutes.
For example:
If event1[filter 1] and event2[filter 2] within 15 mins
Insert into output stream
In my case either of the two events can occur first, and I need to check whether, after receiving the first event, I receive the second event within 15 minutes.
Is this possible in SiddhiQL?
EDIT #1 - I have defined my streams and query below (the code below does not work):
define stream RegulatorStream(deviceID long, roomNo int, tempSet double);
@sink(type = 'log', prefix = "LOGGER")
define stream outputStream (roomNo int,rooomNo int);
from e1 = RegulatorStream[roomNo==23] and e2 = RegulatorStream[e1.deviceID == deviceID AND roomNo ==24] within 5 minutes
select e2.roomNo,123 as rooomNo
insert into outputStream;
In the above case, I need to raise an alert when RegulatorStream receives events with roomNo = 23 and roomNo = 24, in any order and with the same deviceID, within 5 minutes.
How can this be achieved in SiddhiQL?
Yes, this can be achieved with Siddhi Patterns. Please refer to the documentation on Siddhi Patterns at https://siddhi.io/en/v5.1/docs/query-guide/#pattern and the examples at https://siddhi.io/en/v5.1/docs/examples/logical-pattern/.
You can use the OR operation within the pattern to bypass the event occurrence order.

Run every N minutes or if item differs from average

I have an actor which receives WeatherConditions and pushes them to a source (using OfferAsync). Currently it is set up to run for each item it receives (it stores each one in the db).
public class StoreConditionsActor : ReceiveActor
{
    public StoreConditionsActor(ITemperatureDataProvider temperatureDataProvider)
    {
        var materializer = Context.Materializer();
        var source = Source.Queue<WeatherConditions>(10, OverflowStrategy.DropTail);
        var graph = source
            .To(Sink.ForEach<WeatherConditions>(conditions => temperatureDataProvider.Store(conditions)))
            .Run(materializer);
        Receive<WeatherConditions>(i =>
        {
            graph.OfferAsync(i);
        });
    }
}
What I would like to achieve is:
Run it only once every N minutes and store the average value of the WeatherConditions from all items received in that N-minute window.
If a received item matches a certain condition (e.g. its temperature is 30% higher than the previous item's temperature), run it immediately, even though it would otherwise be "hidden" inside the time window.
I've been trying ConflateWithSeed, Buffer and Throttle, but none of them seems to work (I'm a newbie in Akka / Akka Streams, so I may be missing something basic).
This answer uses Akka Streams and Scala, but perhaps it will inspire your Akka.NET solution.
The groupedWithin method could meet your first requirement:
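import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.duration._ // provides 1.second
// An implicit ActorSystem (or Materializer) must be in scope to run the streams.
val queue =
  Source.queue[Int](10, OverflowStrategy.dropTail)
    .groupedWithin(10, 1.second) // emit a group after 10 elements or 1 second, whichever comes first
    .map(group => group.sum / group.size) // integer division, since the elements are Ints
    .toMat(Sink.foreach(println))(Keep.left)
    .run()
Source(1 to 10000)
  .throttle(10, 1.second)
  .mapAsync(1)(queue.offer(_))
  .runWith(Sink.ignore)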
In the above example, up to 10 integers per second are offered to the SourceQueue, which groups the incoming elements in one-second bundles and calculates the respective averages of each bundle.
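To make the windowing idea concrete outside of Akka, here is a minimal Python sketch that averages timestamped readings per fixed window. It is only an illustration: the name window_averages and the (timestamp, value) sample format are assumptions, and unlike groupedWithin it buckets by timestamp rather than by a timer or an element-count cap.

```python
from collections import defaultdict


def window_averages(samples, window_seconds):
    """Average values per fixed time window.

    samples: iterable of (timestamp_in_seconds, value) pairs (hypothetical format).
    Returns a dict mapping window index -> average of the values in that window.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Readings whose timestamps fall in the same window share an index.
        buckets[int(timestamp // window_seconds)].append(value)
    return {index: sum(values) / len(values) for index, values in sorted(buckets.items())}


readings = [(0.1, 20), (0.5, 22), (1.2, 30), (1.9, 34), (2.0, 10)]
print(window_averages(readings, 1))  # {0: 21.0, 1: 32.0, 2: 10.0}
```

In a live stream you would flush each window when its time is up instead of collecting everything first; the arithmetic per window stays the same.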
As for your second requirement, you could use sliding to compare an element with the previous element. The below example passes an element downstream only if it is at least 30% greater than the previous element:
val source: Source[Int, _] = ???
source
  .sliding(2, 1)
  .collect {
    case Seq(a, b) if b >= 1.3 * a => b
  }
  .runForeach(println)
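The same pairwise comparison can be sketched in plain Python, which may help when porting the idea to Akka.NET. The function name spikes and the factor parameter are made up for this illustration; it compares each element with its predecessor, as sliding(2, 1) does.

```python
def spikes(values, factor=1.3):
    """Yield each element that is at least `factor` times its predecessor."""
    items = list(values)
    # zip(items, items[1:]) produces the overlapping pairs (a, b), like sliding(2, 1).
    for previous, current in zip(items, items[1:]):
        if current >= factor * previous:
            yield current


print(list(spikes([10, 12, 16, 17, 30])))  # [16, 30]
```

Note that the first element is never emitted, because it has no predecessor to compare against.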

Query with Siddhi CEP using two times windows and 2 streams (continued)

I am still trying to build complex correlations with Siddhi. This time I have two input streams: web consultations made by clients, and notices (commercial actions) sent to clients. I want to generate an alert if, for a given client, the first stream's event is repeated more than once while the second stream's event has not occurred; this is evaluated over two time windows and depends on the status of these events.
define stream consults (idClient string,dniClient string,codProduct string,codSubProduct string,chanel string,time string )
define stream comercialActions(idClient string, idAccionComercial string,codProduct string,codSubProduct string,chanel string,time string,status string)
from consults[codProduct=='Fondos']#window.time(50 seconds) select idClient,codProduct, codSubProduct, chanel, time, count(idClient) as visitCount group by idClient insert into consultsAvg for current-events
from consultsAvg[visitCount==1] select idClient, '' as idAccionComercial,codProduct, codSubProduct ,chanel, time, 'temp' as status insert into comercialActions for all-events
from comercialActions[status=='temp' or status == 'Lanzada' ]#window.time(5 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_generadas for all-events
from comercialActions[status=='temp' or status=='Aceptada' or status =='Rechazada'or status=='Caduca']#window.time(3 seconds) select idClient as idClient, codProduct, codSubProduct, chanel, status, count(idClient) as num_status group by idClient insert into acciones_realizadas for all-events
from consultsAvg[visitCount>=2]#window.time(50 seconds) as c join acciones_realizadas[num_status>=1]#window.time(5 seconds) as ag on c.idClient == ag.idClient and c.codProduct==ag.codProduct select c.idClient,c.codProduct,c.codSubProduct,c.chanel, c.time, count(c.idClient) as conteo insert into posible_ac for all-events
from posible_ac#window.time(5 seconds) as pac join acciones_generadas[num_status>=1]#window.time(1 seconds) as ar on pac.idClient == ar.idClient select pac.idClient,pac.codProduct,pac.codSubProduct,pac.chanel,pac.time,conteo, count(ar.idClient) as conteo2 insert into enviar_Ac
from enviar_Ac[conteo==1 and conteo2==1] select idClient, codProduct,codSubProduct, chanel, time insert into generar_accion_comercial
What I am trying to do is use intermediate streams to count the number of website hits; when the count is greater than or equal to 2, I check through various joins whether a commercial action has already been generated for that customer.
I think I have made this too complicated, and I do not know whether there is a simpler solution, considering that Siddhi has no "not happened" function and no other join types (such as left join).
You can accomplish this with a pattern. In this case I assume that we wait 1 minute for an event from the second stream; if there is none, and there is more than 1 event from the first stream, we emit an output.
from consults#window.time(1 minute)
select idClient, count(idClient) as idCount, <select more attributes here>
insert into expiredConsultsStream for expired-events;
from expiredConsultsStream[idCount > 1]
select *
insert into filteredConsultsStream;
from firstEvent = consults ->
nonOccurringEvent = commercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
select firstEvent.idClient as id, triggerEvent.idCount as idCount, nonOccurringEvent.idClient as nid
having( not (nid instanceof string))
insert into alertStream;
These are draft queries, so may require some modifications to get them working. The filteredConsultsStream contains consult events with more than 1 occurrence within the last minute.
In the last query, the two conditions are combined with 'or':
nonOccurringEvent = commercialActions[firstEvent.idClient == idClient]
or
triggerEvent = filteredConsultsStream[firstEvent.idClient == idClient]
So the query is triggered by either of the above occurrences. We then need to determine whether it was triggered by commercialActions. For that we use the 'having' clause to check whether nid is null (a null nid implies that the commercialActions event is null, i.e. the non-occurrence). Finally, we emit the output.
You can find a better description of a somewhat similar query here (that one is for the new 4.0.0 version, by the way, and there are small syntax changes).

Siddhi Query Language 'and' operator

I was testing the usage of the 'and' operator and used the example mentioned in the documentation:
from every a1 = OrderStock1[action == "buy"] and
a2 = OrderStock2[action == "buy"] ->
b1 = StockExchangeStream[price > 70] ->
b2 = StockExchangeStream[price > 75]
select a1.action as action, b1.price as priceA, b2.price as priceB
insert into StockQuote partition by stockSymbol
I have noticed that a match still happens even if no event is sent to the OrderStock2 stream.
The documentation defines 'and' as the occurrence of two events in any order. My understanding is that, for a match to happen, both OrderStock1 and OrderStock2 should receive events (in any order), followed by 2 events received in StockExchangeStream that satisfy the price conditions.
Is there any explanation of why a match happens even if no event is sent to the OrderStock2 stream?

How to do ANDing of conditions in a regular expression?

I want to match and modify part of a string if the following conditions are true:
I want to capture information regarding a project, such as project duration, client, technologies used, etc.
So, I want to select a string starting with the word "project"; the string may also start with other words, like "details of project", "project details" or "project #1".
The regex should first look for the word "project", and it should select the string only when some or all of the following words are found after the word "project".
1) client
2) duration
3) environment
4) technologies
5) role
I want to select a string if it matches at least 2 of the above words. Words can appear in any order and if the string contains ANY two or three of these words, then the string should get selected.
I have sample text given below.
Details of Projects :
*Project #1: CVC – Customer Value Creation (Sep 2007 – till now) Time
Warner Cable is the world's leading
media and entertainment company, Time
Warner Cable (TWC) makes coaxial
quiver.
Client : Time Warner Cable,US. ETL
Tool : Informatica 7.1.4
Database : Oracle 9i.
Role : ETL Developer/Team Lead.
O/S : UNIX.
Responsibilities: Created Test Plan and Test Case Book. Peer reviewed team members Mappings. Documented Mappings. Leading the Development Team. Sending Reports to onsite. Bug fixing for Defects, Data and Performance related.
Details of Project #2: MYER – Sales
Analysis system (Nov 2005 – till now)
Coles Myer is one of Australia's largest retailers with more than 2,000 stores throughout Australia,
Client : Coles Myer
Retail, Australia. ETL Tool :
Informatica 7.1.3 Database : Oracle
8i. Role : ETL Developer. O/S :
UNIX. Responsibilities: Extraction,
Transformation and Loading of the data
using Informatica. Understanding the
entire source system.
Created and Run Sessions and
Workflows. Created Sort files using
Syncsort Application.*
Does anyone know how to achieve this using regular expressions?
Any clues or regular expressions are welcome!
Many thanks!
(client|duration|environment|technologies|role).+(?!\1)(client|duration|environment|technologies|role)
I would break it down into a few simpler regexes to get these results. The first would select only the chunk of text between projects: (?=Project #).*(?<=Project #)
With the match that this produces, I would run a separate regex to ask whether it contains any of those words: client|duration|environment|technologies|role
If this comes back with 2 or more distinct matches, you know to select the original string!
Edit:
// Requires: using System.Linq; using System.Text.RegularExpressions;
string originalText = "..."; // the full text to search
MatchCollection projectDescriptions = Regex.Matches(originalText, "(?=Project #).(?:(?!Project #).)*", RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match projectDescription in projectDescriptions)
{
    MatchCollection keyWordMatches = Regex.Matches(projectDescription.Value, "client|duration|environment|technologies|role", RegexOptions.IgnoreCase);
    // Count distinct keywords, ignoring case; the question asks for at least 2.
    int distinctCount = keyWordMatches.Cast<Match>().Select(m => m.Value.ToLowerInvariant()).Distinct().Count();
    if (distinctCount >= 2)
    {
        // At this point, do whatever you need to with the original projectDescription match; the Match object gives you the index etc. of the match inside the original string.
    }
}
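For comparison, the two-step idea (split into project chunks, then count distinct keywords) might look like this in Python. This is a sketch under assumptions: the names select_projects, PROJECT and KEYWORD are invented for the example, the sample text is a shortened stand-in for the resume, and the chunk pattern is adapted from the C# one above for Python's re module.

```python
import re

# One chunk per project: "Project #" followed by everything up to the next "Project #".
PROJECT = re.compile(r"Project #(?:(?!Project #).)*", re.IGNORECASE | re.DOTALL)
KEYWORD = re.compile(r"\b(client|duration|environment|technologies|role)\b", re.IGNORECASE)


def select_projects(text, minimum=2):
    """Return the project chunks that mention at least `minimum` distinct keywords."""
    selected = []
    for match in PROJECT.finditer(text):
        chunk = match.group(0)
        # A set of lower-cased hits counts each keyword once, however often it repeats.
        distinct = {hit.group(1).lower() for hit in KEYWORD.finditer(chunk)}
        if len(distinct) >= minimum:
            selected.append(chunk)
    return selected


sample = (
    "Details of Projects :\n"
    "Project #1: CVC Client : Time Warner Cable. Role : ETL Developer.\n"
    "Project #2: MYER Client : Coles Myer. Client contact : none.\n"
)
for chunk in select_projects(sample):
    print(chunk.strip())  # only the Project #1 chunk is printed
```

Project #2 is skipped because "Client" appearing twice still counts as a single distinct keyword.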
Maybe you need to split that requirement into two steps: first, extract the key/value pairs from your string, then apply your filter.
string input = @"Project #...";
Regex projects = new Regex(@"(?<key>\S+).:.(?<value>.*?\.)");
foreach (Match project in projects.Matches(input))
{
    Console.WriteLine("{0} : {1}",
        project.Groups["key"].Value,
        project.Groups["value"].Value);
}
Try
^(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*$
One note: This will also match if only one of the terms appears twice.
In C#:
foundMatch = Regex.IsMatch(subjectString, @"\A(?:(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*)\Z", RegexOptions.Singleline | RegexOptions.IgnoreCase);
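The caveat about one term appearing twice can be checked with a short Python sketch. The pattern named loose mirrors the expression above; strict is a hypothetical variant (not from the original answer) that uses a negative lookahead on a backreference so that the second keyword must differ from the first.

```python
import re

TERMS = r"client|duration|environment|technologies|role"
FLAGS = re.IGNORECASE | re.DOTALL

# Mirrors the expression above: any two keyword hits, even the same keyword twice.
loose = re.compile(rf"\A(?:details of )?project.*?(?:(?:{TERMS}).*?){{2}}.*\Z", FLAGS)

# Hypothetical stricter variant: (?!\1) rejects a second hit that repeats the first keyword.
strict = re.compile(rf"\A(?:details of )?project.*?({TERMS}).*?(?!\1)(?:{TERMS})", FLAGS)

same_twice = "Project #1 client A client B"
two_kinds = "Project #1 client A role B"

print(bool(loose.search(same_twice)), bool(strict.search(same_twice)))  # True False
print(bool(loose.search(two_kinds)), bool(strict.search(two_kinds)))    # True True
```

The strict variant relies on none of the keywords being a prefix of another; with a different keyword list, counting distinct matches in code is the safer option.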