Count riemann events in given time window


In my Riemann config, for one specific service, I'm trying to set metric=1 on every event, sum those metrics over a 5-second window, and send the result to InfluxDB.
I tried the following:
(streams
  (where (service "offers")
    (fixed-time-window 5
      (smap folds/sum (with :metric 1 index))))
  influx)
It doesn't really work: the events stored in InfluxDB do not match this rule.

The built-in folds/count function does this:
(fixed-time-window 5
  (smap folds/count influx))
Also, the call to influx needs to be a child of the stream that does the counting, so that the counts are what get sent to InfluxDB.
If you want to fix your example using folds/sum, move the call to (with :metric 1) outside, that is upstream of, the call to folds/sum, so that the metrics are first set to one and the new metrics are then summed by folds/sum. Then make index and/or influx the child stream of smap, so that the summed events get indexed and forwarded.
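To make the ordering concrete, here is a minimal sketch of the same data flow outside Riemann. It is plain Python, not Riemann code: the event dicts, the `time` and `metric` keys, the helper name `sum_per_window`, and the choice to align windows to multiples of 5 seconds are all assumptions made for illustration.

```python
from collections import defaultdict

def sum_per_window(events, window=5):
    """Set metric=1 on every event, then sum the metrics per fixed window."""
    totals = defaultdict(int)
    for event in events:
        # Upstream step, analogous to (with :metric 1 ...).
        tagged = {**event, "metric": 1}
        # Start of the window this event falls into (assumed alignment).
        bucket = int(tagged["time"] // window) * window
        # Downstream step, analogous to folds/sum over the window.
        totals[bucket] += tagged["metric"]
    return dict(totals)

# Hypothetical events for the "offers" service, with timestamps in seconds.
events = [{"service": "offers", "time": t} for t in (0, 1, 4, 5, 9, 12)]
counts = sum_per_window(events)
print(counts)
```

The point of the sketch is only the order of the two steps: the metric is overwritten before the summing happens, and whatever consumes the result sees the per-window sums rather than the raw events.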

Adding to the answer by Arthur, you might want to use rate (with scale) instead of fixed-time-window (with smap and folds/count). rate is generally the better choice because it fires as soon as the time window has finished, while fixed-time-window has to wait until a new event arrives after the window has finished, which may never happen or may happen far in the future. There's an issue in the Riemann project about this.
There's also a comment from aphyr explaining why rate is more efficient than the windowing functions.
You just need to combine it with scale, because rate measures the rate per second, while you want the count per 5 seconds (measured over a 5-second interval).
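The difference in when results appear can be modelled in a few lines. This is a simplified Python model of the behaviour described in this answer, not Riemann's implementation; the function names and the tuple layouts are assumptions made for the sketch.

```python
def fixed_window_flushes(event_times, window=5):
    """Model: a window is flushed only when a later event arrives."""
    flushes = []                      # (flush_time, window_start, count)
    start, count = None, 0
    for t in sorted(event_times):
        if start is None:
            start = t                 # the first event opens the first window
        while t >= start + window:    # this event lies past the open window
            flushes.append((t, start, count))
            start, count = start + window, 0
        count += 1
    return flushes                    # the last window stays open, unflushed

def rate_flushes(event_times, interval=5, until=10):
    """Model: a timer flushes every interval, whether or not events arrive."""
    flushes = []                      # (flush_time, per_second, scaled_by_interval)
    for tick in range(interval, until + 1, interval):
        n = sum(1 for t in event_times if tick - interval <= t < tick)
        per_second = n / interval     # what rate reports
        flushes.append((tick, per_second, per_second * interval))  # scale step
    return flushes

times = [0, 1, 4, 7]
print(fixed_window_flushes(times))
print(rate_flushes(times))
```

With these timestamps the windowed model only reports the first window once the event at second 7 shows up, and it never reports the window containing that last event. The timer-driven model reports at seconds 5 and 10 regardless, and multiplying the per-second rate by 5 recovers the count for each interval.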

Related

Debounce Events in Cloud

I am looking for a good cloud solution for the scenario below, where I need to wait for future events within a specific time interval before I know whether to process the current event. It's kind of like debounce (“group” multiple sequential calls within a time period into a single one), but a little more complex, because the timer needs to be reset when the next event is received.
E.g.:
I get a request for Event A at time X for a particular user (U1).
a. If I get a similar Event A from the same user within 5 minutes of time X, I need to reset the timer and keep watching.
b. If 5 minutes have passed without one, I need to process Event A.
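The rule in (a) and (b) can be pinned down with a small offline sketch. This is plain Python over a list of recorded events, not a cloud service; the `(time, user, event)` tuple layout, the function name `debounce`, and treating an event that arrives exactly 5 minutes later as the start of a new burst are assumptions.

```python
def debounce(events, quiet_period=300):
    """Return (user, event, fire_time) for each burst of similar events.

    events: iterable of (time_in_seconds, user, event_name).
    A burst is processed once quiet_period seconds pass without a similar
    event from the same user; every repeat restarts the wait.
    """
    last_seen = {}                    # (user, event) -> time of latest occurrence
    fired = []
    for t, user, event in sorted(events):
        key = (user, event)
        if key in last_seen and t - last_seen[key] >= quiet_period:
            # The previous burst went quiet for long enough, so it was processed.
            fired.append((user, event, last_seen[key] + quiet_period))
        last_seen[key] = t            # (re)start the timer for this key
    # Bursts still pending fire once their own quiet period elapses.
    fired.extend((u, e, t + quiet_period) for (u, e), t in last_seen.items())
    return sorted(fired, key=lambda item: item[2])

events = [(0, "U1", "A"), (120, "U1", "A"), (200, "U1", "A"), (600, "U1", "A")]
print(debounce(events))
```

Here the events at 0, 120 and 200 seconds collapse into one burst that is processed at 500 seconds, and the event at 600 seconds starts a new burst that is processed at 900 seconds.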

triggering at fixed intervals in apache beam streaming

I am using Apache Beam to write some streaming pipelines. One requirement for my use case is that I want to trigger every X minutes relative to the window start or end time. How can I achieve this?
The current trigger, AfterProcessingTime.pastFirstElementInPane(), is relative to the processing time of the first element in that window.
For example, I created fixed 1-minute windows, so I have window_1 (the 0-1 minute interval), window_2 (the 1-2 minute interval) and so on.
Now I want the results for each window to be triggered exactly once, 10 minutes after the beginning of the window, i.e. window_1 at 0 + 10 = the 10th minute, window_2 at the 11th minute (1 + 10). [Note: I configure the fixed windows with an allowed lateness of more than 10 minutes, so that elements are not discarded if they are delayed.]
Is there a way to achieve this kind of triggering for a fixed window?
I cannot just assign all elements to a global window and then use a repeated trigger every minute, because that loses each element's window timing information. For example, suppose there are 2 elements in my PCollection that belong to window_1 and window_2 based on their event timestamps, but that were delayed by 3 and 3.2 minutes. Assigning them to the global window would generate some output at the end of the 4th minute that takes both elements into account, whereas in reality I want them assigned to their actual fixed windows (as late data).
I want the elements to be assigned to window_1 and window_2 based on their event timestamps. Then window_1 should trigger at the 10th minute, outputting a result that covers only the 1 late element for that window, and window_2 should trigger at the 11th minute, with output that covers only the element that arrived 3.2 minutes late.
What should my trigger setting be to achieve this kind of behavior in my streaming pipeline?
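For reference, the schedule the question asks for is simple arithmetic on event timestamps. The sketch below is plain Python and does not use the Beam API; the function names and the use of seconds as the unit are assumptions.

```python
def fixed_window(event_ts, size=60):
    """Return (start, end) of the fixed window containing an event timestamp."""
    start = (event_ts // size) * size
    return start, start + size

def desired_fire_time(event_ts, size=60, delay=600):
    """Time at which the element's window should fire: window start + delay."""
    start, _ = fixed_window(event_ts, size)
    return start + delay

# Two late elements whose event timestamps fall in window_1 and window_2.
for ts in (30, 90):
    print(ts, fixed_window(ts), desired_fire_time(ts))
```

An element with event time 30 s belongs to the window starting at 0 s and should be emitted at 600 s (the 10th minute); one with event time 90 s belongs to the window starting at 60 s and should be emitted at 660 s (the 11th minute), however late each one actually arrived.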

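I believe the following code works for you:
import apache_beam as beam
from apache_beam.transforms import trigger, window
pcollection | beam.WindowInto(
    window.FixedWindows(1 * 60), trigger=trigger.AfterProcessingTime(9 * 60),
    accumulation_mode=trigger.AccumulationMode.DISCARDING,
    allowed_lateness=allowed_lateness_secs)
Here allowed_lateness_secs is a placeholder, since the value was left blank in this snippet; the question calls for more than 10 minutes. The accumulation_mode argument is included because the Python SDK expects one whenever a trigger is set.
The window size is 1 minute, and the trigger fires after a further 9 minutes. In many cases, however, it is much faster to use a sliding window and then deal with the elements that get processed more than once. As AlexAmato mentioned, watermarks and AfterWatermark event-time triggers should also work here.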

Siddhi CEP 4.x: Multiple results per group when using time batch window

Using Siddhi 4.1.0.
Is there any way to apply window.timeBatch to upcoming events only? As I understand it, the time window works on events that have already arrived.
For example, I get multiple results when using window.timeBatch(2 min) with a group by clause.
Within the 2-minute duration I sent 50 input events periodically. I expected all of those events to be batched together and emitted as a single result (I used the count function to verify). Instead I get two results, such as 40 and 10. Did the first 40 events fall into one time window and the remaining 10 into the next? If so, how can I merge them, or otherwise get all of those events as a single output for the 2 minutes?
I also want the time window to start when the first event arrives.
From what I observed, the time window runs in the background: if events start arriving in the middle of the first window, it collects only 1 minute of events, and the next window collects the remaining minute. So I end up with 2 batched results.
Please suggest whether there is another way to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches.
The switch sends SNMP traps to CEP. The traps are, for example, switchFanFailed and switchFanOk.
When I receive a switchFanFailed trap, I expect the matching switchFanOk trap within 1 min. If the switchFanOk trap is not received within 1 min, CEP should send a notification by email. Otherwise it should discard the trap.
Even though my trap generator consistently emits switchFanFailed and switchFanOk within 1 min of each other, in some cases the two traps do not land in the same window.
For example, if switchFanFailed arrives towards the end of a window (at 0.50), I should wait 1 min from that point for the switchFanOk trap.
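The intended rule measures the 1 minute from the arrival of each switchFanFailed trap, not from a window boundary. A minimal offline sketch of that rule follows. It is plain Python rather than a Siddhi query; the `(time, switch, trap_name)` tuple layout and the function name `missing_ok_alerts` are assumptions.

```python
def missing_ok_alerts(traps, timeout=60):
    """Find switchFanFailed traps not followed by switchFanOk within timeout.

    traps: iterable of (time_in_seconds, switch, trap_name).
    Returns (alerts, pending): alerts is a list of (switch, failed_time);
    pending maps switches to failures whose deadline is still undecided.
    """
    pending = {}                      # switch -> time of outstanding failure
    alerts = []
    for t, switch, name in sorted(traps):
        failed_at = pending.get(switch)
        if failed_at is not None and t - failed_at > timeout:
            # The deadline passed before this trap arrived: raise an alert.
            alerts.append((switch, failed_at))
            del pending[switch]
        if name == "switchFanFailed":
            pending.setdefault(switch, t)
        elif name == "switchFanOk":
            pending.pop(switch, None)  # recovered in time: discard
    return alerts, pending

traps = [
    (50, "sw1", "switchFanFailed"), (100, "sw1", "switchFanOk"),
    (200, "sw2", "switchFanFailed"), (300, "sw2", "switchFanOk"),
]
print(missing_ok_alerts(traps))
```

For sw1 the failure at 50 s and the recovery at 100 s fall into different clock-aligned minutes, yet they are only 50 s apart, so no alert is raised. For sw2 the recovery comes 100 s after the failure, so an alert is recorded. A live system would also need a timer to fire alerts for entries left in `pending`.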

Sorry, I am a bit confused by your use case. :)
Is your use case based on time, on length, or on both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any fixed number of events) have arrived, you have to use a lengthBatch window. If you want to process and batch based on time, use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait and batch indefinitely. There has to be something that marks the end of a batch, doesn't there?

I had the same issue: my grouping query always produced two summaries, regardless of how many records I sent in. The cause was that one of the values used in the grouping differed from the others. I suggest you check your grouping.
If you want to merge the two records, I suggest using a time batch window, timeBatch(1 min), which will summarise the output of your current data set.

Delay job in AWS

I have messages in Amazon SQS. For some of the messages I need to wait six hours before I can start working on them (the delay is a given).
One solution would be to do Thread.Sleep(6h).
I don't like this solution because I'm afraid something will happen to the thread and I'll lose the data. Another solution would be to read the message, check whether 6 hours have passed, and if not, return the message to the queue. Again, I don't like it, because this check would be repeated many times.
Is there a better solution?

Can you create multiple queues and put the items into them separately?
Example 1:
You can have 6 queues, Queue0 through Queue5, and use a hash function like hash(x) = current-hour % 6. This function returns values from 0 to 5, so you can put each item in Queue_hash(x) and read the queues individually based on the current time.
Example 2:
If the current time is 01:00 hours, you create a separate queue named Queue0700Hours; if the current time is 02:00 hours, you create another new queue named Queue0800Hours, and so on.
This way you remove the need to wait or pause processing, and producers and consumers can work independently based on the current timestamp.
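Example 2 boils down to deriving a queue name from the time at which the message becomes ready. A minimal sketch of that naming step in Python follows. It does not call any AWS API; the function name `ready_queue_name` and the choice to round up to the next full hour (so that the delay is never shorter than six hours) are assumptions that go beyond what this answer states.

```python
from datetime import datetime, timedelta

def ready_queue_name(enqueued_at, delay_hours=6):
    """Name of the hourly queue a message should be placed in."""
    ready = enqueued_at + timedelta(hours=delay_hours)
    if ready.minute or ready.second or ready.microsecond:
        # Assumption: round up to the next full hour so the wait is not cut short.
        ready = ready.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return f"Queue{ready:%H}00Hours"

print(ready_queue_name(datetime(2024, 1, 1, 1, 0)))
print(ready_queue_name(datetime(2024, 1, 1, 2, 0)))
print(ready_queue_name(datetime(2024, 1, 1, 1, 30)))
```

A consumer would then read only the queue whose name matches the current hour. Because the name carries only the hour of day, queues are reused every 24 hours in this sketch.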

Multiple Timers in C++ / MySQL

I've got a service system that gets requests from another system. A request contains information that is stored on the service system's MySQL database. Once a request is received, the server should start a timer that will send a FAIL message to the sender if the time has elapsed.
The problem is, it is a dynamic system that can get multiple requests from the same, or various sources. If a request is received from a source with a timeout limit of 5 minutes, and another request comes from the same source after only 2 minutes, it should be able to handle both. Thus, a timer needs to be enabled for every incoming message. The service is a web-service that is programmed in C++ with the information being stored in a MySQL database.
Any ideas how I could do this?

A way I've seen this often done: Use a SINGLE timer, and keep a priority queue (sorted by target time) of every timeout. In this way, you always know the amount of time you need to wait until the next timeout, and you don't have the overhead associated with managing hundreds of timers simultaneously.
Say at time 0 you get a request with a timeout of 100.
Queue: [100]
You set your timer to fire in 100 seconds.
Then at time 10 you get a new request with a timeout of 50.
Queue: [60, 100]
You cancel your timer and set it to fire in 50 seconds.
When it fires, it handles the timeout, removes 60 from the queue, sees that the next time is 100, and sets the timer to fire in 40 seconds. Say you get another request with a timeout of 100, at time 80.
Queue: [100, 180]
In this case, since the head of the queue (100) doesn't change, you don't need to reset the timer. Hopefully this explanation makes the algorithm pretty clear.
Of course, each entry in the queue will need some link to the request associated with the timeout, but I imagine that should be simple.
Note however that this all may be unnecessary, depending on the mechanism you use for your timers. For example, if you're on Windows, you can use CreateTimerQueue, which I imagine uses this same (or very similar) logic internally.
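The bookkeeping from the walkthrough above can be sketched as follows. The sketch is in Python rather than C++ for brevity, and it only tracks deadlines: arming a real timer and looking up the request in MySQL are left out. The class name `TimeoutQueue`, its method names, and the request ids are assumptions.

```python
import heapq

class TimeoutQueue:
    """One logical timer driven by a min-heap of absolute deadlines."""

    def __init__(self):
        self._heap = []               # entries are (deadline, request_id)

    def add(self, now, timeout, request_id):
        """Register a request; return True if the single timer must be re-armed."""
        deadline = now + timeout
        rearm = not self._heap or deadline < self._heap[0][0]
        heapq.heappush(self._heap, (deadline, request_id))
        return rearm

    def next_delay(self, now):
        """Seconds until the earliest deadline, or None if nothing is pending."""
        return max(0, self._heap[0][0] - now) if self._heap else None

    def pop_expired(self, now):
        """Remove and return the ids of all requests whose deadline has passed."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired

q = TimeoutQueue()
print(q.add(0, 100, "r1"), q.next_delay(0))    # first request arms the timer
print(q.add(10, 50, "r2"), q.next_delay(10))   # earlier deadline: re-arm
print(q.pop_expired(60), q.next_delay(60))     # timer fires, next wait computed
print(q.add(80, 100, "r3"), q.next_delay(80))  # head unchanged: no re-arm
```

The four calls replay the example: the timer is set for 100 seconds, re-armed for 50 seconds, re-armed for 40 seconds after the first timeout is handled, and left alone when the request at time 80 arrives.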
