I'm new to the Siddhi language. I need to get events only within the first 10 minutes of an event stream. Is there a way to do this task?
Here is the defined input stream.
define stream InputDataStream (timestamp long, obd2_MAF float, obd2_engine_rpm float);
There's no built-in way to get only the events of the first 10 minutes in CEP as of now.
A Siddhi time window slides and keeps the events of the last 10 minutes in it.
One option you have is to write a Siddhi query that ignores events whose timestamp is greater than the timestamp of the very first event plus 10 minutes; this can be achieved using a filter query.
Please refer to [1] for this.
[1] https://docs.wso2.com/display/CEP210/Filters
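A sketch of such a filter, using the stream defined in the question (startTime is a hypothetical constant holding the very first event's timestamp in milliseconds; capturing it would need a separate query or configuration, and FirstTenMinStream is an assumed output stream name):

```sql
from InputDataStream[timestamp <= (startTime + 600000)]
select timestamp, obd2_MAF, obd2_engine_rpm
insert into FirstTenMinStream;
```

Here 600000 is 10 minutes expressed in milliseconds.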
I am using Apache Beam to write some streaming pipelines. One requirement for my use case is that I want to trigger every X minutes relative to the window start or end time. How can I achieve this?
The current trigger, AfterProcessingTime.pastFirstElementInPane(), is relative to the first element's processing time in that window.
For example, I created fixed 1-minute windows, so I have window_1 (0-1 min interval), window_2 (1-2 min interval), and so on.
Now I want the results for each window to be triggered exactly once, 10 minutes after the beginning of the window, i.e. window_1 at the 10th minute (0 + 10), window_2 at the 11th minute (1 + 10). [Note: I configure the fixed windows to allow lateness of more than 10 minutes so that delayed elements are not discarded.]
Is there a way to achieve this kind of triggering for a fixed window?
I cannot just assign all elements to a global window and then trigger repeatedly every minute, because that loses all the elements' window timing information. For example, suppose there are 2 elements in my PCollection that belong to window_1 and window_2 based on their event timestamps, but were delayed by 3 and 3.2 minutes. Assigning them to a global window will generate some output at the end of the 4th minute taking both elements into account, whereas in reality I want them assigned to their actual fixed windows (as late data).
I want the elements to be assigned to window_1 and window_2 based on their event timestamps, and then window_1 to trigger at the 10th minute, outputting a result after processing the single late element for that window, and window_2 to trigger at the 11th minute with output after processing its only element, which arrived 3.2 minutes delayed.
What should my trigger setting be to achieve this kind of behavior in my streaming pipeline?
I believe the following code works for you:
# from apache_beam import WindowInto; from apache_beam.transforms import window, trigger
pcollection | WindowInto(
    window.FixedWindows(1 * 60),                  # 1-minute fixed windows
    trigger=trigger.AfterProcessingTime(9 * 60),  # fire 9 min after the first element
    accumulation_mode=trigger.AccumulationMode.DISCARDING,
    allowed_lateness=10 * 60)                     # seconds; keeps late data
The size of the window is 1 minute, and after 9 minutes it triggers the data. However, in many cases it is much faster to use a sliding window and then take care of the duplicate processed elements. As AlexAmato mentioned, watermarks and AfterWatermark event-time triggers should also work here.
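To make the intended timing concrete, here is a plain-Python sketch (no Beam APIs; the 60-second window size and 10-minute delay mirror the question) of when each fixed window should fire:

```python
WINDOW_SIZE = 60         # fixed 1-minute windows, in seconds
TRIGGER_DELAY = 10 * 60  # fire 10 minutes after the window opens

def window_start(event_ts: int) -> int:
    """Start of the fixed window that owns an event timestamp (seconds)."""
    return event_ts - (event_ts % WINDOW_SIZE)

def fire_time(event_ts: int) -> int:
    """When the window owning this event should emit its pane."""
    return window_start(event_ts) + TRIGGER_DELAY

# An element at t=30s belongs to window_1 (0-60s) and fires at t=600s
# (the 10th minute); one at t=90s belongs to window_2 and fires at t=660s.
```

This is exactly the per-window schedule the question describes: each window's firing time depends only on its own start, not on when elements arrive.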
I'm using Stream Processor for receiving events, and I need to know whether there is any way to check that some event arrived within a specified time window. Let's say we want to check that an event arrives every 5 minutes; if it doesn't, we need to publish an alert. Does Siddhi 4.0 have any function for this purpose? My idea was to count the matching events in a time window and then compare this count, but I don't know if that's the best way to deal with this problem.
You can do this using logical patterns.
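A sketch of such a pattern, using a Siddhi 4.x absence pattern that fires when no further event arrives within 5 minutes (HeartbeatStream, its id attribute, and AlertStream are assumed names):

```sql
from every e1 = HeartbeatStream ->
     not HeartbeatStream for 5 min
select e1.id as missedId
insert into AlertStream;
```

This avoids counting altogether: the alert event is emitted only when the 5 minutes elapse without a matching arrival.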
Using Siddhi 4.1.0.
Is there any possibility to apply time.windowBatch to upcoming events? I understand the time window works based on already-arrived events.
Say, for example:
I am getting multiple results while using window.timeBatch(2 min) with a group by clause.
In the given 2-minute duration I sent 50 input events periodically. The expected behavior is that all those events are put together and given as a single result (I used the count function to verify). But it gives two results, like 40 and 10. Is it that the first 40 events fall into one time-window period and the rest into the next window? In this case, how can I merge all those events into a single output for the 2 minutes?
Also, I want the time window to start once the first event has arrived.
I observed that the time window runs in the background; if events arrive in the middle of the first time window, it collects events for only 1 minute. The remaining one minute of events is collected by the next time window. So, finally, I got 2 batched results.
Please suggest whether there is any other solution to achieve this.
Use case:
My use case is based on a time duration (time.windowBatch(1 min)) for monitoring switches. I would like to implement the following:
The switch sends SNMP traps to CEP; the traps are like switchFanFailed and switchFanOk.
If I receive a switchFanFailed trap, I expect the next trap, switchFanOk, to arrive within 1 minute. In case the switchFanOk trap is not received within 1 minute, CEP should generate a notification through email; otherwise it will discard that trap.
Even though my trap generator generates the switchFanFailed and switchFanOk traps within a constant 1-minute duration, in some cases I am not able to receive the traps in the same window.
Say, for example, switchFanFailed arrives at second 50; from there I should wait 1 minute to expect the switchFanOk trap.
Sorry, I am a bit confused by your use case. :)
Is your use case based on time, or length, or both? A time batch window starts only after the 1st event comes.
If you want to wait until 50 events (or any number of events) arrive, then you have to use a lengthBatch window. If you want to process based on time and batch it, then use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait/batch indefinitely; there should be something to mark the end of a batch, shouldn't there?
I had the same issue: it always created two summarized results for any number of records sent into my grouping query. The fix in my case was that one value used in the grouping was different from the others, so I suggest you check the grouping.
If you want to merge two records, I suggest you use a time batch window, timeBatch(1 min), which will summarize the output of your current data set.
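For the switchFan use case above, a Siddhi absence pattern may fit better than a batch window, since the 1-minute wait then starts from each failure trap rather than from a fixed window boundary. A sketch, assuming a trap stream defined as TrapStream (switchId string, trapType string) and an output alert stream:

```sql
from every e1 = TrapStream[trapType == 'switchFanFailed'] ->
     not TrapStream[trapType == 'switchFanOk' and switchId == e1.switchId] for 1 min
select e1.switchId
insert into FanAlertStream;
```

This emits to FanAlertStream only when no matching switchFanOk trap arrives within 1 minute of the failure trap for the same switch.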
In a Riemann config, for a specific service, I'm trying to assign metric=1 to all its events, sum them within 5 seconds, and send the result to InfluxDB.
I tried the following:
(streams
  (where (service "offers")
    (fixed-time-window 5
      (smap folds/sum (with :metric 1 index))))
  influx)
It doesn't really work; the events stored in InfluxDB do not match this rule.
The built-in folds/count function does this:
(fixed-time-window 5
  (smap folds/count influx))
Also, the call to influx needs to be a child of the stream that does the counting, so it's the counts that get indexed.
If you want to fix your example using folds/sum, you could move the call to (with :metric 1) outside of, or upstream of, the call to sum, so the metrics are set to one and the new metrics are then summed in the call to folds/sum. Then put the calls to index and/or influx as the child streams of smap so the summed items get indexed and forwarded.
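A sketch of that rearrangement, reusing the "offers" service and the influx stream from the question:

```clojure
(streams
  (where (service "offers")
    ; set every event's metric to 1 before summing
    (with :metric 1
      (fixed-time-window 5
        ; fold the 5-second window into one summed event,
        ; then index and forward it
        (smap folds/sum
          index
          influx)))))
```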
Adding to Arthur's answer: you might want to use rate (with scale) instead of fixed-time-window (with smap and folds/count). rate is generally better than fixed-time-window because rate fires as soon as the time window has finished, while fixed-time-window has to wait until a new event arrives after the time window has finished, which may never happen or may happen too far in the future. There's an issue in Riemann about this.
There's also a comment from aphyr explaining why rate is more efficient than the windowing functions.
You just need to use it with scale, because rate measures the rate per second, while you want the rate per 5 seconds (measured over a 5-second interval).
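Put together, the rate-based version might look like this (again assuming the "offers" service and influx stream from the question):

```clojure
(streams
  (where (service "offers")
    ; each event counts as 1
    (with :metric 1
      ; per-second rate, emitted every 5 seconds
      (rate 5
        ; multiply by 5 to turn the per-second rate into a 5-second count
        (scale 5
          index
          influx)))))
```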
I have messages inside Amazon SQS. For some of the messages I need to apply a delay of six hours before I can start working on them (the delay is a given).
One solution would be to do Thread.Sleep(6h).
I don't like this solution because I'm afraid something will happen to the thread and I'll lose the data. Another solution would be to read the message, check whether 6 hours have passed, and if not, return it to the queue. Again, I don't like it because this round trip would happen a lot.
Is there any better solution?
You can create multiple individual queues and distribute the queue items among them.
Example 1:
You can have 6 queues, Queue0 through Queue5, and use a hash function like hash(x) = current-hour % 6. This function returns values from 0 to 5, so you can put each item in Queue_f(x) and read the queues individually based on the current time.
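Example 1 can be sketched in a few lines (the Queue0..Queue5 names are the hypothetical ones above):

```python
NUM_QUEUES = 6  # one queue per hour, wrapping every 6 hours

def queue_for(hour: int) -> str:
    """Queue that a given hour's messages are written to (and read from)."""
    return f"Queue{hour % NUM_QUEUES}"

# Because queue_for(h) == queue_for(h + 6), a consumer reading the
# queue for the current hour picks up messages that producers wrote
# six hours earlier.
```

Note that, since the names wrap, a consumer must still check each message's enqueue timestamp to distinguish six-hour-old items from ones written in the current hour.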
Example 2:
If the current time is 01:00 hours, you can create a separate queue like Queue0700Hours; if the current time is 02:00 hours, you can create another new queue, Queue0800Hours, and so on.
This way you decouple the need to wait or stop processing, and the producers and consumers operate independently based on the current timestamp.