Writing an "arrived" and "departed" query with siddhi using timeouts - wso2

I'm looking to replace Esper with Siddhi in my application. The Esper statement right now is a "timeout" type pattern, where I need to report back when events of a unique "name" and "type" (just string values I can look for on the incoming events) arrive and depart. I know that an event has arrived when it first enters my firstunique window, and I assume it has departed if I don't see any events of the same name and type within a user-defined timeout. Here's what my Esper statements look like (note that there's a lot more going on in the actual Esper; I've simplified this for the sake of example):
create window events_1.std:firstunique(name, type) as NameEvent
insert into events_1 select * from EventCycle[events]
on pattern [every event1=events_1->(timer:interval(4.0 sec) and not events_1(name=event1.name, type=event1.type))]delete from events_1 where name = event1.name AND type=event1.type
I then select the irstream from events_1; the incoming and removed events of the window give me the "arrived" and "departed" events.
For Siddhi, the firstunique window is fairly straightforward (I think?):
from EventCycle#window.firstUnique('name')[ type=='type' ] select name, type insert into NameEvent
but I'm really drawing a blank on how to replace that Esper "on pattern" with Siddhi. Can I use a single "from every" statement for this, or will I need a different approach with Siddhi?
Any help setting me on the right path here will be appreciated!

One way of achieving your requirement is to check for the non-occurrence of an event.
I'm afraid, AFAIK, non-occurrence check is not supported in WSO2 CEP-3.1.0.
However it is supported in WSO2 CEP-4.0.0 (yet to be released as I'm writing this on 24th Aug 2015).
You may refer to non-occurrence detection sample [1].
Explanation:
Here we depart the first event if no unique event has occurred within 4 seconds (the timeout) of the latest unique event.
So we need to check for the non-occurrence of an event.
In CEP 4.0.0, you could achieve your requirement as follows:
from EventCycle#window.firstUnique(name)[ type=='type' ]
select name, type
insert into NameEvents; -- Note: I renamed NameEvent in the question to NameEvents
-- After seeing the latest unique event (Query-A), 4 seconds later (Query-B), we check whether no unique event has occurred in between (Query-C and Query-D).
-- So, we're checking the non-occurrence of an event here... See link [1] for a sample.
--Query-A
from EventCycle#window.unique(name)[ type=='type' ]
select name, type
insert into latestEvents;
-- Query-B
from latestEvents#window.time(4 seconds) -- Here, I've taken 4 seconds as the timeout.
select *
insert expired events into timedoutEvents;
-- Query-C
from every latestEvent = latestEvents[ type=='type' ] ->
keepAliveEvent = latestEvents[ latestEvent.name == keepAliveEvent.name and type=='type' ]
or timedoutEvent = timedoutEvents[ latestEvent.name == timedoutEvent.name and type=='type' ]
select latestEvent.name as name, keepAliveEvent.name as keepAliveName
insert into filteredEvents;
-- Query-D
from filteredEvents [ isNull(keepAliveName)]
select name
insert into departedLatestEvents;
-- Since we want the name from the NameEvents stream, we're joining it with the departedLatestEvents stream
from departedLatestEvents#window.length(1) as departedLatestEvent join
NameEvents#window.length(1) as NameEvent
on departedLatestEvent.name == NameEvent.name -- not checking type as both departedLatestEvents and NameEvents have events only with type 'type'
select NameEvent.name as name, 'type' as type
insert into departedFirstEvents;
Link referred in the code sample:
[1] https://docs.wso2.com/display/CEP400/Sample+0111+-+Detecting+non-occurrences+with+Patterns
Hope this helps!
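To make the intended semantics easier to check against whatever the Siddhi queries emit, here is a minimal plain-Python sketch of the arrived/departed logic. It is not Siddhi and not part of WSO2 CEP; the class name PresenceTracker and its methods are made up for illustration, and it assumes every sighting of a (name, type) pair acts as a keep-alive, which is how the question describes the timeout.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (name, type)


class PresenceTracker:
    """Hypothetical helper: 'arrived' on first sight, 'departed' after a quiet period."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[Key, float] = {}

    def on_event(self, name: str, type_: str, now: float) -> List[Tuple[str, Key]]:
        # Expire stale keys first so a late event counts as a fresh arrival.
        out = self.expire(now)
        key = (name, type_)
        if key not in self.last_seen:
            out.append(("arrived", key))
        self.last_seen[key] = now  # assumption: every sighting is a keep-alive
        return out

    def expire(self, now: float) -> List[Tuple[str, Key]]:
        # A key departs once no event for it has been seen for `timeout` seconds.
        gone = [k for k, t in self.last_seen.items() if now - t >= self.timeout]
        for k in gone:
            del self.last_seen[k]
        return [("departed", k) for k in gone]
```

Calling expire() on a timer plays the role of the 4-second time window in Query-B; on_event() covers both the firstUnique "arrived" case and the keep-alive case.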


Linking groundtruth worker metadata back to the actual task?

As far as I can tell there's no identifier being passed with the GT worker metadata (see below from documentation https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html)? How would I link this information back to the actual labeling task?
sub, I believe, is a Cognito reference to the worker, so it is not a unique identifier for the submission. As of right now, I just know that one of the tasks took a certain amount of time for a particular worker, but I can't tell which one. I also guess I have to jump through a few hoops via Cognito to get the GT worker ID from the sub?
I am looking for a way to summarize the original data shown (from the input manifest file), the label given, and the time it took to complete. As of right now, I have to make one table that has the data with their human-submitted label, and a separate table with the time it took to complete by task, but there is no way to link the two... am I missing something?
Here's the worker metadata JSON:
"submissionTime": "2020-12-28T18:59:58.321Z",
"acceptanceTime": "2020-12-28T18:59:15.191Z",
"timeSpentInSeconds": 40.543,
"workerId": "a12b3cdefg4h5i67",
"workerMetadata": {
"identityData": {
"identityProviderType": "Cognito",
"issuer": "https://cognito-idp.aws-region.amazonaws.com/aws-region_123456789",
"sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
}
}

Upserting a document in CosmosDb, but only if "newer"

I need to update documents in a CosmosDb.
{
firstname: "...",
lastname: "...",...
lastmodified: "2022-01-13T12:06:18.712Z"
}
Due to "concurrency" ( i.e. multiple concurrent "updaters" ), the update ( or insertion, hence Upsert? ) should only take place if the data I update is "newer" ( data.lastmodified ) than the one already persisted.
What is the proposed way to achieve this?
In plain SQL I'd opt for:
UPDATE address SET ... WHERE address.lastmodified < newdata.lastmodified
or INSERT ON DUPLICATE KEY UPDATE
If Upsert had the possibility of specifying constraints on when to effectively upsert ( i.e. address.lastmodified < newdata.lastmodified ), I'd use these. But I guess ItemRequestOptions is not meant for that?
"concurrency"-context: updates are being posted into a service bus queue and an event-triggered AzureFunction handles the Events. Chances are, that multiple Events for the same data end up "concurrently" in the queue and hence are being executed "concurrently"
Thanks for your advice
Clemens ( being new to CosmosDb et al )
This can be achieved using a partial document update (patch) with a conditional predicate. Use Add or Set patch operations and supply a filter predicate equivalent to WHERE address.lastmodified < newdata.lastmodified; the patch is applied only if the existing document matches the predicate.
For more information, see Partial Document Updates in Azure Cosmos DB.
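As a rough sketch of what such a conditional patch could look like, the Python helper below only builds the patch operations and the filter predicate string; it does not talk to Cosmos DB. The function name build_conditional_patch is made up for illustration, and the sketch assumes lastmodified is always an ISO 8601 UTC string in the same format, so that string comparison matches chronological order.

```python
from typing import Any, Dict, List, Tuple


def build_conditional_patch(new_doc: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], str]:
    """Hypothetical helper: (patch_operations, filter_predicate) for 'write only if newer'."""
    stamp = new_doc["lastmodified"]
    if "'" in stamp:
        # Guard against breaking out of the quoted literal in the predicate.
        raise ValueError("unexpected quote in timestamp")
    # One 'set' operation per field; 'id' identifies the item and is not patched.
    operations = [
        {"op": "set", "path": f"/{field}", "value": value}
        for field, value in new_doc.items()
        if field != "id"
    ]
    # Assumption: same-format ISO 8601 UTC timestamps compare correctly as strings.
    predicate = f"from c where c.lastmodified < '{stamp}'"
    return operations, predicate
```

With the azure-cosmos Python SDK, the two values would typically be handed to the container's patch_item call as patch_operations and filter_predicate; treat the exact call shape as something to verify against your SDK version. Also note that a patch only touches an existing document, so the "insert if missing" half of the upsert still needs separate handling, and a predicate that does not match is reported as a failed precondition rather than a silent no-op.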

Override field in the input before passing to the next state in AWS Step Function

Say I have 3 states, A -> B -> C. Let's assume inputs to A include a field called names which is of type List and each element contains two fields firstName and lastName. State B will process the inputs to A and return a response called newLastName. If I want to override every element in names such that names[i].lastName = newLastName before passing this input to state C, is there a built-in syntax to achieve that? Thanks.
You control the event passed to the next task in a Step Function with three definition attributes: ResultPath and OutputPath on leaving one task, and InputPath on entering the next one.
You first have to understand how the State Machine builds the event for the next task; each of the three attributes above changes it.
You need at least ResultPath. This is the key in the event under which the output of your lambda will be placed, so ResultPath="$.my_path" results in a json object that has a top-level key of my_path whose value is whatever the lambda returned.
If this is the only attribute, the result is tacked onto whatever the input was. So if your input event was a json object with keys original_key_1 and some_other_key, your output with just the above ResultPath would be:
{
"original_key_1": some value,
"some_other_key": some other value,
"my_path": the output of your lambda
}
Now if you add OutputPath, this cuts off everything OTHER than the path (AFTER adding the result path!) in the next output.
If you added OutputPath="$.my_path" you would end up with a json of:
{ output of your lambda }
(your output had better be a JSON-compatible object, like a Python dict!)
InputPath does the same thing ... but for the input. It cuts off everything other than the path described, and that is the only thing sent into the lambda. But it does not stop the rest of the input from being appended back afterwards, so InputPath + ResultPath results in less being sent into the lambda, but everything together on exit.
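The interplay of the three attributes can be mimicked with a few lines of Python. This is only a toy model, not how Step Functions is implemented: run_task and _pick are invented names, and the sketch handles just "$" and single top-level paths such as "$.my_path".

```python
from typing import Any, Callable, Dict


def _pick(doc: Any, path: str) -> Any:
    """Resolve '$' or a single top-level '$.key' path (toy subset of JSONPath)."""
    return doc if path == "$" else doc[path[2:]]


def run_task(
    state_input: Dict[str, Any],
    task: Callable[[Any], Any],
    input_path: str = "$",
    result_path: str = "$",
    output_path: str = "$",
) -> Any:
    task_input = _pick(state_input, input_path)  # InputPath trims what the task sees
    result = task(task_input)
    if result_path == "$":
        merged = result  # the result replaces the whole input
    else:
        # The result is placed under the key, on top of the ORIGINAL input.
        merged = {**state_input, result_path[2:]: result}
    return _pick(merged, output_path)  # OutputPath trims what the next state sees
```

For example, run_task({"original_key_1": 1, "some_other_key": 2}, lambda e: {"ok": True}, result_path="$.my_path") keeps both original keys and adds my_path, while also passing output_path="$.my_path" leaves only the lambda's output.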
There isn't really a loop logic like the one you describe however - Task and State Machine definitions are static directions, not dynamic logic.
You can simply handle it inside the lambda. This is kinda the preferred method. HOWEVER if you do this, then you should use a combination of OutputPath and ResultPath to 'cut off' the input, having replaced the various fields of the incoming event with whatever you want before returning it at the end.
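A minimal sketch of handling it inside the lambda for state B, in Python. The compute_new_last_name function is a hypothetical stand-in for whatever B really does, and the event shape (names as a list of objects with firstName and lastName) is taken from the question.

```python
import copy
from typing import Any, Dict


def compute_new_last_name(event: Dict[str, Any]) -> str:
    # Hypothetical placeholder for state B's real processing.
    return "Smith"


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Return the incoming event with every names[i].lastName overwritten."""
    output = copy.deepcopy(event)  # leave the caller's event untouched
    new_last_name = compute_new_last_name(event)
    for person in output["names"]:
        person["lastName"] = new_last_name
    output["newLastName"] = new_last_name
    return output
```

Because the handler returns the whole rewritten event, letting the result replace the state input (ResultPath of "$") hands state C exactly this object.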

Is it possible to detect unordered event patterns with WSO2?

I would like to detect some patterns using WSO2, but my current solution can only detect them if the events arrive consecutively.
Let's suppose the following pattern:
Event 1: Scanning Event from Source 1 to Target 2
Event 2: Attempt Exploit from Source 1 to Target 2
That would generate an Alert.
But in a real world scenario, the events won't come in order, there are too many computers in an enterprise.
Is there a way to detect the previous pattern with the following event sequence?
Event 1: Scanning Event from Source 1 to Target 2
Event 2: Not relevant
Event 3: Not relevant
...
Event N: Attempt Exploit from Source 1 to Target 2
My Current code is:
from every (e1=Events) -> e2=Events
within 10 min
select ...
having e1.type=='Scan' and e2.type=='attack' and e1.Source_IP4==e2.Source_IP4
insert into Alert;
I've also tried other kind of solutions like
from every e1=Events,e2=Events[Condition]
within 10 min
select ...
having e1.type=='Scan' and e2.type=='attack' and e1.Source_IP4==e2.Source_IP4
insert into Alert;
Maybe it could be done with a Partition? Partition the streams by Source_IP4.
I've finally made it work.
The problem was using "having" to detect the pattern. The conditions have to be moved to the "filter condition" section instead:
from (every)? <event reference>=<input stream>[<filter condition>] ->
(every)? <event reference>=<input stream>[<filter condition>] ->
...
(within <time gap>)?
select <event reference>.<attribute name>, <event reference>.<attribute name>, ...
insert into <output stream>
Solution:
from every (e1=Events) -> e2=Events[e1.type=='Scan' and type=='attack' and e1.Source_IP4==Source_IP4]
within 10 min
select ...
insert into Alert;
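To sanity-check what the pattern should match, the same idea can be replayed over a recorded event list in plain Python. This is only an illustration of the matching semantics, not Siddhi: find_alerts is an invented name, events are assumed to be dicts with a numeric ts field in seconds, and the list is assumed to be ordered by ts.

```python
from typing import Any, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def find_alerts(events: Iterable[Event], within: float = 600.0) -> List[Tuple[Event, Event]]:
    """Pair every 'Scan' with the first later 'attack' from the same source IP."""
    pending: List[Event] = []  # scans still waiting for an attack
    alerts: List[Tuple[Event, Event]] = []
    for ev in events:  # assumption: ordered by 'ts'
        # Drop scans whose 10-minute window has already elapsed.
        pending = [s for s in pending if ev["ts"] - s["ts"] <= within]
        if ev["type"] == "attack":
            matched = [s for s in pending if s["Source_IP4"] == ev["Source_IP4"]]
            alerts.extend((s, ev) for s in matched)
            pending = [s for s in pending if s["Source_IP4"] != ev["Source_IP4"]]
        elif ev["type"] == "Scan":
            pending.append(ev)
        # Any other event type is skipped, so unrelated events in between do not break a match.
    return alerts
```

The key point mirrors the fix above: irrelevant events are filtered out while waiting for the second event, instead of being matched first and rejected afterwards by "having".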

Time tracking when status is changed

I have a question about a specific functionality in Siebel, regarding service requests.
Is there a way to track how long a service request stays in a given status/substatus, for example "Waiting on Customer"? When the service request is changed to another status that is no longer a waiting status, I have to stop counting the time.
I don't know of any out of the box solution to your needs, however there are many ways to achieve it with a bit of customisation. For example:
Create two new fields, Waiting Time (with predefault value: 0) and Waiting Date.
Create the following BC user properties:
On Field Update Set x = "Status", "Waiting Time", "IIF([Waiting Date] IS NULL, [Waiting Time], [Waiting Time] + (Timestamp() - [Waiting Date]))"
On Field Update Set y = "Status", "Waiting Date", "IIF([Status]='Waiting on Customer',Timestamp(),NULL)"
Your Waiting Date field will store the last time the service request changed to "Waiting on Customer", or NULL if it's on another status. Then, Waiting Time will accumulate the total time the request has been in that status.
I have not tested the solution, it might need some more work, for example, it's possible that Siebel doesn't allow you to use the expression [Waiting Time] + (Timestamp() - [Waiting Date]) directly and you'll have to decompose it using auxiliary calculated fields.
Note also that the On Field Update Set user property has changed its syntax from Siebel 7.7-7.8 to Siebel 8.x.
If you're familiar with server scripting, you could implement something similar quite easily, on the BusComp_PreSetFieldValue event. If the field being changed is Status, check if you're entering or exiting (or not) the "Waiting on Customer" status, and update the two fields accordingly.
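The bookkeeping itself is small. Here is a plain-Python sketch of the logic the two fields are meant to implement; it is not Siebel script, and the class name WaitingClock and its method are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


class WaitingClock:
    """Hypothetical model of the 'Waiting Time' / 'Waiting Date' pair."""

    WAITING_STATUS = "Waiting on Customer"

    def __init__(self) -> None:
        self.waiting_time = timedelta(0)              # mirrors "Waiting Time"
        self.waiting_date: Optional[datetime] = None  # mirrors "Waiting Date"

    def on_status_change(self, new_status: str, now: datetime) -> None:
        # First fold any open waiting interval into the running total...
        if self.waiting_date is not None:
            self.waiting_time += now - self.waiting_date
        # ...then restart or stop the clock depending on the new status.
        self.waiting_date = now if new_status == self.WAITING_STATUS else None
```

Note the order: the accumulated time is updated before the date is reset, which is the same dependency the two user properties have on each other.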