Triggering at fixed intervals in Apache Beam streaming

I am using Apache Beam to write some streaming pipelines. One requirement for my use case is that I want to trigger every X minutes relative to the window start or end time. How can I achieve this?
The trigger I currently use, AfterProcessingTime.pastFirstElementInPane(), fires relative to the processing time of the first element in that window.
For example, I created fixed 1-minute windows, so I have window_1 (0-1 min interval), window_2 (1-2 min interval), and so on.
Now I want the results for each window to be triggered exactly once, 10 minutes after the beginning of the window: window_1 at the 10th minute (0 + 10), window_2 at the 11th minute (1 + 10). [Note: I configure the fixed windows with an allowed lateness of more than 10 minutes so that delayed elements are not discarded.]
Is there a way to achieve this kind of triggering for a fixed window?
I cannot just assign all elements to a global window and then use a repeated trigger every minute, because that loses the elements' window timing information. For example, suppose 2 elements in my PCollection belong to window_1 and window_2 based on their event timestamps, but were delayed by 3 and 3.2 minutes. Assigning them to the global window would generate some output at the end of the 4th minute that takes both elements into account, whereas in reality I want them to be assigned to their actual fixed windows (as late data).
I want the elements to be assigned to window_1 and window_2 based on their event timestamps. Then window_1 should trigger at the 10th minute, outputting a result that covers only the 1 late element for that window, and window_2 should trigger at the 11th minute, outputting a result that covers only the element that arrived 3.2 minutes late.
What trigger settings would give me this behavior in my streaming pipeline?

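I believe the following code works for you:
pcollection | WindowInto(
    FixedWindows(1 * 60),
    trigger=AfterProcessingTime(9 * 60),
    accumulation_mode=AccumulationMode.DISCARDING,
    allowed_lateness=ALLOWED_LATENESS_SECONDS)
Here WindowInto comes from apache_beam, FixedWindows from apache_beam.transforms.window, and AfterProcessingTime and AccumulationMode from apache_beam.transforms.trigger. ALLOWED_LATENESS_SECONDS is a placeholder for the allowed lateness you configure (more than 10 minutes, per the question).
The window size is 1 minute, and the data is triggered after 9 minutes. Keep in mind that AfterProcessingTime counts its delay from the processing time at which the first element of the pane arrives, so the firing happens roughly 9 minutes after that element rather than at a fixed offset from the window boundary. In many cases, however, it is much faster to use sliding windows and then deal with the elements that get processed more than once. As AlexAmato mentioned, watermarks and the AfterWatermark event-time trigger should also work here.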
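To make the timing requested in the question concrete, here is a minimal plain-Python sketch (it does not use Beam) of how event timestamps map to fixed 1-minute windows and when each window should fire. The function names and the use of seconds since pipeline start are assumptions made for illustration only.

```python
from collections import defaultdict

WINDOW_SIZE_S = 60        # fixed 1-minute windows, as in the question
FIRE_DELAY_S = 10 * 60    # fire 10 minutes after the window start


def window_start(event_ts_s):
    """Start (in seconds) of the fixed window that contains event_ts_s."""
    return int(event_ts_s // WINDOW_SIZE_S) * WINDOW_SIZE_S


def desired_fire_time(event_ts_s):
    """Time at which the window holding this element should fire."""
    return window_start(event_ts_s) + FIRE_DELAY_S


def group_by_window(event_timestamps_s):
    """Assign elements to windows by event time, regardless of arrival delay."""
    windows = defaultdict(list)
    for ts in event_timestamps_s:
        windows[window_start(ts)].append(ts)
    return dict(windows)


# Two late elements: event times 30 s (window_1) and 90 s (window_2).
# Their arrival delay does not change which window they belong to.
for start, elements in sorted(group_by_window([30, 90]).items()):
    print(start, elements, "fires at", start + FIRE_DELAY_S)
```

The point of the sketch is that the firing time depends only on the window boundary, not on when the first element happens to arrive, which is the difference from a trigger that is relative to the first element in the pane.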

Related

Proper use of timers in Qt

I want to program an interval timer that can be used for training. It needs a stacked widget where the user can enter the durations of the training and rest rounds and the number of repetitions. Pressing a start button then changes the page and starts the first round's countdown, which is shown in a display.
So if the user enters 20 seconds for training, 10 seconds for rest, and 3 repetitions, the numbers
20 to 0, 10 to 0, 20 to 0, 10 to 0 and 20 to 0
should be displayed one after another.
The problem I ran into:
I tried a QTimer, and a QThread with a 1-second sleep function and a signal-slot connection to the GUI, but with both options the GUI froze.

The use of a QTimer will not block the main window. This is the purpose of timers.
Moreover, you don't need to use threads at all, you only have to start a timer with the desired interval (for example, a tick every 10ms) and connect the timeout() signal to a slot that will process your application behaviour.
In this slot, you just have to handle the countdown and the state change (working time to break time if the number of repetitions is not reached, break time to working time and the finished state).
I have created such an application and it worked well. Maybe I will make it available on GitHub later. If I do, I will edit my answer to provide the link.
I hope it helps.
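The countdown and state change handled in that slot can be sketched independently of Qt. The following is a minimal Python illustration, not the application mentioned above: the class name IntervalCountdown and the idea of calling tick() once per second from the timer's timeout handler are assumptions.

```python
class IntervalCountdown:
    """Countdown state machine; call tick() once per timer timeout (1 s)."""

    def __init__(self, training_s, rest_s, repetitions):
        self.training_s = training_s
        self.rest_s = rest_s
        self.rounds_left = repetitions
        self.phase = "training"
        self.remaining = training_s
        self.finished = repetitions <= 0

    def tick(self):
        """Advance the countdown by one second and switch phase at zero."""
        if self.finished:
            return
        if self.remaining > 0:
            self.remaining -= 1
            return
        # The displayed value has reached 0: move on to the next phase.
        if self.phase == "training":
            self.rounds_left -= 1
            if self.rounds_left == 0:
                self.finished = True  # no rest after the last round
                return
            self.phase, self.remaining = "rest", self.rest_s
        else:
            self.phase, self.remaining = "training", self.training_s


countdown = IntervalCountdown(training_s=20, rest_s=10, repetitions=3)
while not countdown.finished:
    print(countdown.phase, countdown.remaining)  # what the display would show
    countdown.tick()
```

Because tick() only updates a few integers and returns immediately, nothing in the handler blocks the event loop, which is why no thread or sleep call is needed.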

I think you have designed the solution in a very complicated way. With no code it's impossible to tell you what went wrong.
If I had to develop a solution for this, it'd be in the form of interconnecting blocks, which can be delay blocks or flow control blocks (child classes of the block parent).
Each block has a next one and a trigger function. A delay block also has a time. A flow control block may have different functionalities, like pointing to a previous block only for x repetitions. You can use a global QTimer when a new delay block is triggered to trigger the next block (connect the timeout signal of the timer to the trigger function of the next block, then start the timer with the current block's time).
For instance, if you wanted to do 3 times 30s exercise, 10s rest, you'd connect two delay blocks with a repeat block.
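One possible reading of this block design, as a rough Python sketch: the class names, the successor() method, and the schedule() helper are all hypothetical, and the sketch only walks the chain to list the delays in order instead of arming a real timer for each one.

```python
class Block:
    """Base block: every block knows which block comes next."""

    def __init__(self):
        self.next = None


class DelayBlock(Block):
    """Waits for a given number of seconds before triggering the next block."""

    def __init__(self, label, seconds):
        super().__init__()
        self.label = label
        self.seconds = seconds


class RepeatBlock(Block):
    """Flow control: jumps back to target until total_passes are done."""

    def __init__(self, target, total_passes):
        super().__init__()
        self.target = target
        self.total_passes = total_passes
        self.passes_done = 0

    def successor(self):
        self.passes_done += 1
        if self.passes_done < self.total_passes:
            return self.target
        return self.next


def schedule(first_block):
    """Return the (label, seconds) delays in the order they would run."""
    delays = []
    block = first_block
    while block is not None:
        if isinstance(block, DelayBlock):
            delays.append((block.label, block.seconds))
            block = block.next
        else:
            block = block.successor()
    return delays


# 3 times: 30 s exercise followed by 10 s rest.
exercise = DelayBlock("exercise", 30)
rest = DelayBlock("rest", 10)
repeat = RepeatBlock(target=exercise, total_passes=3)
exercise.next = rest
rest.next = repeat
print(schedule(exercise))
```

In a GUI application each delay would instead start the shared timer with the block's time and continue from the timeout handler; the walking loop here just stands in for that.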

QTimer long timeout

Qt 5.7 32-bit on Windows 10 64-bit.
I need a long-period timer.
The interval of a QTimer is given in milliseconds as a signed integer, so the maximum interval that can be set is a little more than 24 days (2^31 / (1000*3600*24) = 24.85).
I need a timer with intervals going far beyond this limit.
So my question is: which alternative do you recommend? std::chrono (C++11) does not seem suitable, as it does not have an event handler.
Alain

You could always create your own class that uses multiple QTimers, each for a duration that is valid, and just counts how many have elapsed.
Pretty simple problem. If you can only count to 10 and you need to count to 100 - just count to 10 ten times.
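The "count to 10 ten times" idea can be sketched as splitting one long delay into chunks that each fit within the timer's limit. This is a minimal Python illustration; the helper name split_interval is hypothetical, and the limit of 2^31 - 1 ms is assumed from the signed-integer interval described in the question.

```python
MAX_TIMER_MS = 2**31 - 1  # assumed largest interval a single timer accepts


def split_interval(total_ms, max_chunk_ms=MAX_TIMER_MS):
    """Split total_ms into consecutive chunks of at most max_chunk_ms."""
    if total_ms < 0 or max_chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and max_chunk_ms > 0")
    full_chunks, remainder = divmod(total_ms, max_chunk_ms)
    chunks = [max_chunk_ms] * full_chunks
    if remainder:
        chunks.append(remainder)
    return chunks


sixty_days_ms = 60 * 24 * 3600 * 1000
for chunk in split_interval(sixty_days_ms):
    print(chunk)  # each value could be used as one timer interval in turn
```

A wrapper class would start a timer with the first chunk and, on each timeout, start the next one until the list is exhausted, then run the real handler.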

I would implement this in the following way:
Upon timer start, note the current time in milliseconds like this:
m_timerStartTime = QDateTime::currentMSecsSinceEpoch();
Then I would start a timer with some large interval, such as 10 hours, and attach a handler function to it that simply compares the current time with the start time to see whether we are due:
if (QDateTime::currentMSecsSinceEpoch() - m_timerStartTime > WANTED_DELAY_TIME) {
    // Execute the timer payload
    // Stop interval timer
}
This simple approach could be improved in several ways. For example, to keep the timer running even if the application is stopped and restarted, save the timer start time in a setting or other persistent storage, and read it back in at application startup.
To improve precision, change the interval from within the timer handler function for the last iteration so that it hits the intended end time exactly (instead of overshooting by up to one full timer interval, 10 hours in this example).
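The precision improvement amounts to arming the timer with either the normal polling interval or the time still remaining, whichever is smaller. A minimal Python sketch of that calculation follows; the function name and parameters are hypothetical, and all values are in milliseconds.

```python
def next_timer_interval(now_ms, start_ms, wanted_delay_ms, poll_interval_ms):
    """Return 0 if the long timer is due, else the next interval to arm."""
    remaining_ms = start_ms + wanted_delay_ms - now_ms
    if remaining_ms <= 0:
        return 0  # due: run the payload and stop the interval timer
    # Shorten the last interval so it ends exactly at the deadline.
    return min(poll_interval_ms, remaining_ms)


TEN_HOURS_MS = 10 * 3600 * 1000
WANTED_DELAY_MS = 25 * 3600 * 1000  # example: a 25-hour delay

print(next_timer_interval(0, 0, WANTED_DELAY_MS, TEN_HOURS_MS))
print(next_timer_interval(2 * TEN_HOURS_MS, 0, WANTED_DELAY_MS, TEN_HOURS_MS))
print(next_timer_interval(WANTED_DELAY_MS, 0, WANTED_DELAY_MS, TEN_HOURS_MS))
```

Because the decision is based on the stored start time rather than on a count of elapsed timeouts, the same function still gives the right answer after the application has been restarted with the start time read back from persistent storage.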

Siddhi CEP 4.x: Multiple results per group when using time batch window

Using Siddhi 4.1.0.
Is there any way to apply window.timeBatch to upcoming events? As I understand it, the time window works on events that have already arrived.
For example, I am getting multiple results when using window.timeBatch(2 min) with a group by clause.
Within a 2-minute period I sent 50 input events periodically. The expected behavior is that all those events are batched together and emitted as a single result (I used the count function to verify this). Instead I get two results, such as 40 and 10. Does that mean the first 40 events fell into one time window and the remaining 10 into the next? If so, how can I merge them, or get all those events as a single output for the 2 minutes?
I also want the time window to start when the first event arrives.
What I observed is that the time window seems to run in the background: if events start arriving in the middle of the first time window, it collects events for only 1 minute, and the remaining minute of events is collected by the next time window. So I end up with 2 batched results.
Please suggest whether there is another way to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches. I would like to implement the following.
The switch sends SNMP traps to CEP. The traps are, for example, switchFanFailed and switchFanOk.
When I receive a switchFanFailed trap, I expect the next trap, switchFanOk, to arrive within 1 minute. If the switchFanOk trap is not received within 1 minute, CEP should generate a notification by email. Otherwise it should discard the trap.
Even though my trap generator constantly produces switchFanFailed and switchFanOk within 1 minute of each other, in some cases I do not receive both traps in the same window.
For example, if switchFanFailed arrives at 0.50 sec, I should wait 1 minute from that point for the switchFanOk trap.
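The rule in this use case measures the 1-minute wait from the arrival of each switchFanFailed trap, not from a batch-window boundary. As a plain-Python illustration of that logic (it is not Siddhi code), the sketch below scans a time-ordered list of traps; the function name, the tuple layout, and the per-switch identifier are assumptions made for the example.

```python
def unresolved_failures(traps, timeout_s=60.0):
    """Return (switch_id, failed_ts) pairs that need a notification.

    traps: (timestamp_s, switch_id, trap_name) tuples sorted by timestamp.
    A failure needs a notification if no switchFanOk for the same switch
    follows within timeout_s, counted from the failure's own timestamp.
    """
    pending = {}   # switch_id -> timestamp of the outstanding failure
    alerts = []
    for ts, switch_id, name in traps:
        if name == "switchFanFailed":
            pending.setdefault(switch_id, ts)
        elif name == "switchFanOk" and switch_id in pending:
            failed_ts = pending.pop(switch_id)
            if ts - failed_ts > timeout_s:
                alerts.append((switch_id, failed_ts))
    # Failures that were never followed by a switchFanOk at all.
    alerts.extend(pending.items())
    return alerts


traps = [
    (0.5, "sw1", "switchFanFailed"),
    (30.0, "sw1", "switchFanOk"),       # recovered in time, discarded
    (100.0, "sw2", "switchFanFailed"),
    (170.0, "sw2", "switchFanOk"),      # recovered too late
    (200.0, "sw3", "switchFanFailed"),  # never recovered
]
print(unresolved_failures(traps))
```

Since each wait starts at its own failure event, two traps that straddle a batch boundary are still matched correctly, which is the case the batch window handles poorly.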

Sorry, I am a bit confused by your use case. :)
Is your use case based on time, on length, or on both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any other number of events) have arrived, you have to use a lengthBatch window. If you want to process based on time and batch the events, use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait or batch indefinitely. There has to be something that marks the end of a batch, doesn't there?

I had the same issue: it always created two summaries for any number of records sent into my grouping query. In my case the cause was that one of the values used for the grouping differed from the others, so I suggest you check your grouping.
If you are thinking of merging the two records, I suggest you use a time batch window,
timeBatch(1 min), which will summarise the output of your current data set.

Count riemann events in given time window

In my Riemann config, for a specific service, I'm trying to assign metric=1 to all of its events, sum them over 5 seconds, and send the result to InfluxDB.
I tried the following:
(streams
  (where (service "offers")
    (fixed-time-window 5
      (smap folds/sum (with :metric 1 index))))
  influx)
It doesn't really work: the events stored in InfluxDB do not match this rule.

The built-in folds/count function does this:
(fixed-time-window 5
  (smap folds/count influx))
Also, the call to influx needs to be a child of the stream that does the counting, so that it is the counts that get forwarded.
If you want to fix your example using folds/sum, you could move the call to (with :metric 1) outside, or upstream of, the call to sum, so that the metrics are set to one first and the new metrics are then summed in the call to folds/sum. Then put the call to index and/or influx as the child stream of smap so that the summed items get indexed and forwarded.

Adding to the answer by Arthur, you might want to use rate (with scale) instead of fixed-time-window (with smap and folds/count). rate is generally better than fixed-time-window because rate fires as soon as the time window has finished, while fixed-time-window has to wait until a new event arrives after the time window has finished, which may never happen or may happen too far in the future. There's an issue in Riemann about this.
There's also a comment from aphyr explaining why rate is more efficient than the windowing functions.
You just need to use it with scale because rate will measure the rate by second while you want to get the rate by 5 seconds (measured during a 5 seconds interval).
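To show what counting events per fixed 5-second window computes, here is a small plain-Python sketch (not Riemann code). The function name and the choice of keying each window by its start time in seconds are assumptions for illustration.

```python
from collections import Counter


def count_per_window(event_times_s, window_s=5):
    """Count events in consecutive fixed windows, keyed by window start."""
    counts = Counter(int(t // window_s) * window_s for t in event_times_s)
    return dict(sorted(counts.items()))


# Six events spread over three 5-second windows.
print(count_per_window([0.2, 1.0, 4.9, 5.0, 7.3, 12.0]))
```

A per-second rate for a window is its count divided by the window length, so multiplying that rate by 5 gives back the 5-second count, which is the role scale plays in the rate-based approach.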

Get events within first 10 minutes

I'm new to the Siddhi language. I need to get events only within the first 10 minutes of an event stream. Is there a way to do this task?
Here is the defined input stream.
define stream InputDataStream (timestamp long, obd2_MAF float, obd2_engine_rpm float);

There is currently no built-in way in CEP to get only the events from the first 10 minutes.
A Siddhi time window slides and keeps the events of the last 10 minutes in it.
One option is to write a Siddhi query that ignores events whose timestamp is greater than the timestamp of the very first event plus 10 minutes. This can be done with a filter query.
Please refer to [1] for this.
[1] https://docs.wso2.com/display/CEP210/Filters
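The filter condition described above can be illustrated in plain Python (this is not Siddhi syntax). The sketch assumes the timestamp attribute of InputDataStream is in milliseconds and that events arrive in order; the function name is hypothetical.

```python
TEN_MINUTES_MS = 10 * 60 * 1000


def first_ten_minutes(events, limit_ms=TEN_MINUTES_MS):
    """Yield events no later than limit_ms after the very first event.

    events: iterable of dicts with a 'timestamp' key (assumed milliseconds).
    """
    first_ts = None
    for event in events:
        if first_ts is None:
            first_ts = event["timestamp"]  # remember the first event's time
        if event["timestamp"] <= first_ts + limit_ms:
            yield event


stream = [
    {"timestamp": 1000, "obd2_MAF": 2.5, "obd2_engine_rpm": 900.0},
    {"timestamp": 300000, "obd2_MAF": 3.1, "obd2_engine_rpm": 1500.0},
    {"timestamp": 601000, "obd2_MAF": 3.0, "obd2_engine_rpm": 1400.0},
    {"timestamp": 601001, "obd2_MAF": 2.9, "obd2_engine_rpm": 1300.0},
]
for event in first_ten_minutes(stream):
    print(event["timestamp"])
```

The only state needed is the timestamp of the first event, so the equivalent query has to capture that value once and compare every later event against it.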
