I'm using Stream Processor to receive events, and I need to know whether there is any way to check that some event arrived within a specified time window. Let's say we want to check that an event arrives every 5 minutes. If it doesn't, we need to publish an alert. Does Siddhi 4.0 have any function for this purpose? My idea was to count the same events in a time window and then compare this count, but I don't know if it's the best way to deal with this problem.
You can do this using logical patterns, which can detect that an expected event did not arrive within a given time period.
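The logical pattern expresses this declaratively inside Siddhi. As an illustration only, here is the underlying "absence within a timeout" logic as a minimal Java sketch, assuming the caller passes timestamps explicitly and calls `checkAt` periodically (for example from a timer). `AbsenceDetector` is a hypothetical name, not part of Siddhi.

```java
import java.time.Duration;

// Hypothetical heartbeat watchdog (not a Siddhi API): alerts once per gap
// in which no event arrived within the timeout.
final class AbsenceDetector {
    private final long timeoutMillis;
    private long lastSeenMillis;
    private boolean alerted;

    AbsenceDetector(Duration timeout, long startMillis) {
        this.timeoutMillis = timeout.toMillis();
        this.lastSeenMillis = startMillis; // start-up is the reference point for the first gap
    }

    void onEvent(long eventMillis) {
        lastSeenMillis = eventMillis;
        alerted = false; // a new event closes the current gap
    }

    /** Returns true exactly once per gap longer than the timeout. */
    boolean checkAt(long nowMillis) {
        if (!alerted && nowMillis - lastSeenMillis > timeoutMillis) {
            alerted = true; // publish the alert here
            return true;
        }
        return false;
    }
}
```

Reporting once per gap avoids repeating the alert on every check while the stream stays silent; the next event re-arms the detector.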
I have a Dataflow job which reads JSON from 3 PubSub topics, flattens them into one, applies some transformations, and saves the result to BigQuery.
I'm using a GlobalWindow with the following configuration:
.apply(Window.<PubsubMessage>into(new GlobalWindows())
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
    .discardingFiredPanes());
The job is running with the following configuration:
Max Workers : 20
Disk Size: 10GB
Machine Type : n1-standard-4
Autoscaling Algo: Throughput Based
The problem I'm facing is that after processing a few messages (~80k), the job stops reading messages from PubSub. There is a backlog of close to 10 million messages in one of those topics, and yet the Dataflow job is neither reading the messages nor autoscaling.
I also checked the CPU usage of each worker, and it is hovering in the single digits after the initial burst.
I've tried changing the machine type and the max workers configuration, but nothing seems to work.
How should I approach this problem?
I suspect the windowing function is the culprit. GlobalWindow isn't suited to streaming jobs (which I assume this job is, given the use of PubSub), because its end-of-window trigger won't fire until all elements are present, which never happens in a streaming context.
In your situation, it looks like the window will fire early once, when it hits either that element count or duration, but after that the window will get stuck waiting for all the elements to finally arrive. A quick fix to check if this is the case is to wrap the early firings in a Repeatedly.forever trigger, like so:
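.apply(Window.<PubsubMessage>into(new GlobalWindows())
    // AfterWatermark.withEarlyFirings() only accepts a one-shot OnceTrigger,
    // so the repeating trigger is used as the window's trigger directly.
    .triggering(
        Repeatedly.forever(
            AfterFirst.of(
                AfterPane.elementCountAtLeast(20000),
                AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
    .discardingFiredPanes());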
This should allow the early firing to fire repeatedly, preventing the window from getting stuck.
However, for a more permanent solution I recommend moving away from GlobalWindow in streaming pipelines. Using fixed-time windows with early firings based on element count would give you the same behavior, but without the risk of getting stuck.
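To make the repeating behavior concrete, here is a plain-Java model (no Beam dependency) of what `Repeatedly.forever(AfterFirst.of(count, delay))` with `discardingFiredPanes()` does: a pane fires when either the element count or the processing-time delay since its first element is reached, and the trigger then re-arms for the next pane. This is a simplified simulation under stated assumptions, not Beam's implementation; `RepeatingPaneBuffer` is a hypothetical name, and processing time is passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Repeatedly.forever(AfterFirst.of(elementCount, delay))
// with discarding panes; processing time is supplied by the caller.
final class RepeatingPaneBuffer<T> {
    private final int maxCount;
    private final long maxDelayMillis;
    private final List<T> pane = new ArrayList<>();
    private final List<List<T>> firedPanes = new ArrayList<>();
    private long firstElementMillis;

    RepeatingPaneBuffer(int maxCount, long maxDelayMillis) {
        this.maxCount = maxCount;
        this.maxDelayMillis = maxDelayMillis;
    }

    void add(T element, long nowMillis) {
        if (pane.isEmpty()) {
            firstElementMillis = nowMillis; // the delay is measured from the pane's first element
        }
        pane.add(element);
        maybeFire(nowMillis);
    }

    void advanceProcessingTime(long nowMillis) {
        maybeFire(nowMillis);
    }

    private void maybeFire(long nowMillis) {
        if (pane.isEmpty()) {
            return;
        }
        boolean countReached = pane.size() >= maxCount;
        boolean delayReached = nowMillis - firstElementMillis >= maxDelayMillis;
        if (countReached || delayReached) { // AfterFirst: whichever condition is met first
            firedPanes.add(new ArrayList<>(pane));
            pane.clear(); // discarding mode; "forever" means the next pane can fire too
        }
    }

    List<List<T>> firedPanes() {
        return firedPanes;
    }
}
```

Without the "forever" part, the model would stop after the first `firedPanes` entry, which is the stuck behavior described above.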
We're working with a streaming, unbounded PCollection in Google Dataflow that originates from a Cloud PubSub subscription. We use it as a firehose to continuously deliver events to BigTable. Everything about the delivery is performing nicely.
Our problem is that we have downstream batch jobs that expect to read a day's worth of data out of BigTable once it is delivered. I would like to use windowing and triggering to implement a side effect that writes a marker row to BigTable when the watermark advances beyond the day threshold, indicating that Dataflow has reason to believe that most of the events have been delivered (we don't need strong guarantees on completeness, just reasonable ones) and that downstream processing can begin.
What we've tried is writing out the raw events as one sink in the pipeline, and then windowing into another sink, using the timing information in the pane to determine whether the watermark has advanced. The problem with this approach is that it operates on the raw events themselves again, which is undesirable since it would repeat writing the event rows. We can prevent this write, but the parallel path in the pipeline would still be operating over the windowed streams of events.
Is there an efficient way to attach a callback of sorts to the watermark, so that we can perform a single action when the watermark advances?
The general ability to set a timer in event time and receive a callback is definitely an important feature request, filed as BEAM-27, which is under active development.
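But actually your approach of windowing into FixedWindows.of(Duration.standardDays(1)) seems like it will accomplish your goal using just the features of the Dataflow Java SDK 1.x. Instead of forking your pipeline, you can maintain the "firehose" behavior with the trigger AfterWatermark.pastEndOfWindow().withEarlyFirings(AfterPane.elementCountAtLeast(1)): each element is emitted in an EARLY pane as soon as it arrives, and the pane fired when the watermark passes the end of the day is labeled ON_TIME. It does incur the cost of a GroupByKey, but does not duplicate anything.
The complete pipeline might look like this:
pipeline
    // Read your data from Cloud Pubsub and parse to MyValue
    .apply(PubsubIO.Read.topic(...).withCoder(MyValueCoder.of()))
    // You'll need some keys
    .apply(WithKeys.<MyKey, MyValue>of(...))
    // Window into daily windows, but still output as fast as possible:
    // EARLY panes per element, then an ON_TIME pane at the end of the day
    .apply(Window.<KV<MyKey, MyValue>>into(FixedWindows.of(Duration.standardDays(1)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterPane.elementCountAtLeast(1)))
        // A custom trigger on non-global windows also needs an allowed
        // lateness and an accumulation mode
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes())
    // GroupByKey adds the necessary EARLY / ON_TIME / LATE labeling
    .apply(GroupByKey.<MyKey, MyValue>create())
    // Convert KV<MyKey, Iterable<MyValue>>
    // to KV<ByteString, Iterable<Mutation>>
    // where the iterable of mutations has the "end of day" marker if
    // it was ON_TIME. This has to be a DoFn rather than MapElements,
    // because the timing is read via ProcessContext.pane().getTiming()
    .apply(ParDo.of(new MessageToMutationWithEndOfWindow()))
    // Write it!
    .apply(BigTableIO.Write.to(...));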
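The effect of the ON_TIME pane can be simulated without Beam. The sketch below is a hypothetical plain-Java model, assuming event times and watermark updates are fed in by hand: each day with data gets exactly one marker once the watermark reaches the end of that day, and data arriving for an already-completed day is dropped, mirroring `withAllowedLateness(Duration.ZERO)`. `DailyMarkerTracker` is an illustrative name only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation (no Beam): daily fixed windows whose ON_TIME firing
// records one "end of day" marker once the watermark passes the end of the day.
final class DailyMarkerTracker {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    private final TreeSet<Long> openDays = new TreeSet<>();  // days with data but no marker yet
    private final List<Long> markedDays = new ArrayList<>(); // days whose marker was recorded
    private long watermarkMillis = Long.MIN_VALUE;

    /** Returns false if the event is late (its day is already complete) and is dropped. */
    boolean onEvent(long eventTimeMillis) {
        long day = Math.floorDiv(eventTimeMillis, DAY_MILLIS);
        if ((day + 1) * DAY_MILLIS <= watermarkMillis) {
            return false; // like zero allowed lateness: late data is dropped
        }
        openDays.add(day);
        return true;
    }

    /** Advances the (monotonic) watermark and records a marker for every day it has passed. */
    void onWatermark(long newWatermarkMillis) {
        watermarkMillis = Math.max(watermarkMillis, newWatermarkMillis);
        while (!openDays.isEmpty() && (openDays.first() + 1) * DAY_MILLIS <= watermarkMillis) {
            markedDays.add(openDays.pollFirst());
        }
    }

    List<Long> markedDays() {
        return markedDays;
    }
}
```

In the real pipeline, the marker is emitted per key by the DoFn that sees an ON_TIME pane, so a downstream reader may see one marker per key rather than one per day.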
Please do comment on my answer if I have missed some detail of your use case.
Using Siddhi 4.1.0.
Is there any way to apply window.timeBatch to upcoming events? My understanding is that the time window works based on events that have already arrived.
For example, I am getting multiple results while using window.timeBatch(2 min) with a group by clause.
Within the 2-minute duration I sent 50 input events periodically. The expected behavior is that all those events are put together and given as a single result (I used the count function to verify this). But it gives two results: 40 and 10. Do the first 40 events fall into one time-window period and the remaining 10 into the next window? In that case, how can I merge them, or get all those events as a single output for the 2 minutes?
Also, I want the time window to start once the first event arrives.
In my experience, the time window runs in the background: if the events start arriving in the middle of the first time window, it collects events for only 1 minute, and the remaining minute of events is collected by the next time window. So I end up with 2 batched results.
Please suggest whether there is any other way to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches. I would like to implement the following:
The switch sends SNMP traps to CEP. The traps are switchFanFailed and switchFanOk.
If I receive a switchFanFailed trap, the next trap I expect is switchFanOk within 1 min. If the switchFanOk trap is not received within 1 min, CEP should generate an email notification. Otherwise, it discards that trap.
Even though my trap generator always generates switchFanFailed and switchFanOk within 1 min of each other, in some cases I do not receive both traps in the same window.
For example, if switchFanFailed arrives at 0:50, near the end of the window, I should wait 1 min from that point for the switchFanOk trap.
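The requirement above is a per-switch deadline that starts at the switchFanFailed trap, rather than a batch window aligned to fixed boundaries. A minimal Java sketch of that logic, assuming trap timestamps are passed in and `expire` is called periodically (`FanTrapMonitor` is a hypothetical name, not a Siddhi construct):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical per-switch deadline tracker; not a Siddhi API.
final class FanTrapMonitor {
    private final long timeoutMillis;
    // switchId -> arrival time of its unanswered switchFanFailed trap
    private final Map<String, Long> pendingFailures = new HashMap<>();

    FanTrapMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onTrap(String switchId, String trap, long tsMillis) {
        if ("switchFanFailed".equals(trap)) {
            // The 1-minute deadline starts at the failure trap itself, not at a window boundary.
            pendingFailures.putIfAbsent(switchId, tsMillis);
        } else if ("switchFanOk".equals(trap)) {
            Long failedAt = pendingFailures.get(switchId);
            if (failedAt != null && tsMillis - failedAt <= timeoutMillis) {
                pendingFailures.remove(switchId); // recovered in time: discard the failure
            }
        }
    }

    /** Returns switches with no switchFanOk within the timeout; each failure is reported once. */
    List<String> expire(long nowMillis) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pendingFailures.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > timeoutMillis) {
                alerts.add(e.getKey()); // e.g. send the email notification here
                it.remove();
            }
        }
        return alerts;
    }
}
```

With this model, a failure at 50 s and a recovery at 100 s still match, even though they straddle a 1-minute boundary.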
Sorry, I am a bit confused by your use case. :)
Is your use case based on time, length, or both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any number of events) arrive, you have to use a lengthBatch window. If you want to process based on time and batch the events, use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait or batch indefinitely; there has to be something that marks the end of a batch, doesn't there?
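The 40/10 split in the question is what you would expect if batches were aligned to fixed clock boundaries instead of starting at the first event. A hypothetical Java sketch (not Siddhi's implementation) that counts batch sizes for a given batch start, so the two alignments can be compared:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sizes of tumbling batches [start + k*period, start + (k+1)*period).
final class TimeBatching {
    static List<Integer> batchSizes(List<Long> timestamps, long start, long periodMillis) {
        Map<Long, Integer> counts = new TreeMap<>(); // batch index -> event count, in order
        for (long ts : timestamps) {
            counts.merge(Math.floorDiv(ts - start, periodMillis), 1, Integer::sum);
        }
        return new ArrayList<>(counts.values());
    }
}
```

For 50 events sent every 2 s starting at the 1-minute mark, batches aligned to clock time (`start = 0`, 2-minute period) come out as 30 and 20, while batches starting at the first event's timestamp hold all 50.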
I had the same issue: it always created two summaries for any number of records sent into my grouping query. In my case, the cause was that one value used in the grouping was different from the others. I suggest you check the grouping.
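The grouping pitfall can be shown outside Siddhi. In this hypothetical Java sketch, a count per group returns one result per distinct key value, so a single key that differs only in case splits 50 events into 40 and 10:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper mirroring count() with a group-by clause: one result per distinct key.
final class GroupCounts {
    static <E> Map<String, Long> countBy(List<E> events, Function<E, String> key) {
        return events.stream()
                .collect(Collectors.groupingBy(key, TreeMap::new, Collectors.counting()));
    }
}
```

Normalizing the key (here, lower-casing it) merges the two groups back into a single count of 50.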
If you want to merge the two records, I suggest you use a time batch window,
timeBatch(1 min), which will summarize the output of your current data set.
I have a program with a thread that generates Expose messages using XSendEvent. A second thread receives the Expose messages along with other messages (mainly input handling). The problem is that the sending thread sends Expose messages at a constant rate (~60 Hz), but the receiving thread may render more slowly than that. The X11 queue gets bogged down with extra Expose messages, and input-handling messages start to fall far behind them.
In Windows, this is not a problem because Windows automatically coalesces all WM_PAINT messages into a single message. Is there any way to do this in X11, or some other way to solve this problem?
You can very easily coalesce any kind of event yourself with XCheckTypedEvent() and friends.
I was able to solve this problem as follows:
1. Block the rendering thread using XPeekEvent.
2. When an event comes in, read all pending events into a new queue data structure using a combination of XPending and XNextEvent, but copy only the first Expose message.
3. Then run the event-processing loop over the new queue.
This fixed the problem for me, but I think a solution that uses XCheckTypedEvent (per n.m.'s answer here) is probably more elegant.
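The drain-and-coalesce step can be modeled in a few lines. This is a hypothetical Java sketch of the logic only: `Event` stands in for `XEvent`, and the `while` loop stands in for the `XPending()`/`XNextEvent()` loop the real C program would use.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for XEvent; only the type matters for coalescing.
record Event(String type, int id) {}

final class ExposeCoalescer {
    /** Drains every pending event, keeping only the first Expose and preserving the order of the rest. */
    static List<Event> drainCoalesced(Deque<Event> pending) {
        List<Event> batch = new ArrayList<>();
        boolean exposeSeen = false;
        while (!pending.isEmpty()) { // models the XPending()/XNextEvent() loop
            Event e = pending.pollFirst();
            if (e.type().equals("Expose")) {
                if (exposeSeen) {
                    continue; // drop redundant Expose events
                }
                exposeSeen = true;
            }
            batch.add(e);
        }
        return batch;
    }
}
```

Input events keep their relative order, so keyboard and mouse handling no longer queue up behind a backlog of redraws.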
A few things you can do:
If you are doing a complete redraw for each event, only act on events with a count of 0; a nonzero count means at least that many further Expose events follow, each for another rectangle of the window.
If you generate Expose events for only part of the window, each Expose event has less work to do.
Because the rate is constant, you could simply process every nth event, or record the time of the last redraw and ignore events received within a given interval.
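The first and third suggestions can be combined into a single redraw filter. A hypothetical Java sketch, assuming the caller passes the Expose event's `count` field and a monotonic time in milliseconds:

```java
// Hypothetical filter combining "redraw only on count == 0" with a minimum redraw interval.
final class RedrawFilter {
    private final long minIntervalMillis;
    private long lastRedrawMillis = Long.MIN_VALUE; // MIN_VALUE means "never redrawn"

    RedrawFilter(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** exposeCount mirrors XExposeEvent.count: the number of Expose events still to follow. */
    boolean shouldRedraw(int exposeCount, long nowMillis) {
        if (exposeCount != 0) {
            return false; // more Expose events follow; wait for the last one
        }
        if (lastRedrawMillis != Long.MIN_VALUE && nowMillis - lastRedrawMillis < minIntervalMillis) {
            return false; // too soon after the previous redraw
        }
        lastRedrawMillis = nowMillis;
        return true;
    }
}
```

A real implementation would also schedule a deferred redraw when the last Expose is throttled, so the window is not left stale.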
I am working on a private video network where I have to schedule tasks based on the following parameters. There is a client portal, a server, and a gateway.
Through the portal, a user can request video streaming.
A user can also schedule streaming for some future time. Each task has a task ID.
A task is scheduled based on the following date/time parameters:
start time
end time
Repeat (every day,just once, a particular day)
start date
end date
Now, at the gateway, I need to add logic to implement scheduled tasks.
I am exploring Waitable Timer Objects and CreateWaitableTimerEx.
I am a bit confused about whether it is possible to implement the feature using these.
I am using C++ and MFC and can't use any third-party library.
I need suggestions on how to implement this.
There are dozens of ways to design this. It all depends on what you want to do and what the specific requirements are.
In a basic design, I'd create an additional field called "next run time", calculated from the start time, the frequency, and the previous end time (if any). Then I'd put all the tasks in a queue sorted by this field.
The main scheduler picks up the first queue item and creates a suspended thread for that task. It then calculates the time remaining until the first item's next run time and sleeps for that period. When it wakes up, it resumes the thread, picks the next queue item, and repeats.
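A minimal sketch of the "next run time" queue, written in Java like the other examples on this page rather than C++/MFC. `ScheduledTask`, `Repeat`, and `TaskScheduler` are hypothetical names built from the question's parameters; end time is ignored here (the started thread would stop itself at that time), and the actual sleep/resume would use a timer such as the waitable timers mentioned in the question.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

enum Repeat { ONCE, DAILY, WEEKLY }

// Hypothetical task model; dayOfWeek is only used for WEEKLY tasks.
record ScheduledTask(int id, LocalDate startDate, LocalDate endDate,
                     LocalTime startTime, Repeat repeat, DayOfWeek dayOfWeek) {

    /** Earliest run at or after `from`, or empty if the schedule has ended. */
    Optional<LocalDateTime> nextRunAtOrAfter(LocalDateTime from) {
        LocalDate date = from.toLocalDate().isBefore(startDate) ? startDate : from.toLocalDate();
        for (; !date.isAfter(endDate); date = date.plusDays(1)) {
            if (repeat == Repeat.ONCE && date.isAfter(startDate)) {
                break; // a one-off task has no later occurrences
            }
            LocalDateTime candidate = date.atTime(startTime);
            if (candidate.isBefore(from)) {
                continue;
            }
            boolean matches = switch (repeat) {
                case ONCE -> date.equals(startDate);
                case DAILY -> true;
                case WEEKLY -> date.getDayOfWeek() == dayOfWeek;
            };
            if (matches) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}

final class TaskScheduler {
    private record Entry(LocalDateTime nextRun, ScheduledTask task) {}

    // Queue sorted by "next run time", as in the design above.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(Comparator.comparing(Entry::nextRun));

    void add(ScheduledTask task, LocalDateTime now) {
        task.nextRunAtOrAfter(now).ifPresent(t -> queue.add(new Entry(t, task)));
    }

    /** How long the scheduler thread should sleep before the earliest task is due. */
    Optional<Duration> timeUntilNext(LocalDateTime now) {
        Entry head = queue.peek();
        if (head == null) {
            return Optional.empty();
        }
        Duration d = Duration.between(now, head.nextRun());
        return Optional.of(d.isNegative() ? Duration.ZERO : d);
    }

    /** Pops every task due at `now`, re-queues its next occurrence, and returns the started IDs. */
    List<Integer> runDue(LocalDateTime now) {
        List<Integer> started = new ArrayList<>();
        while (!queue.isEmpty() && !queue.peek().nextRun().isAfter(now)) {
            Entry e = queue.poll();
            started.add(e.task().id()); // the gateway would resume the task's thread here
            e.task().nextRunAtOrAfter(now.plusNanos(1))
                    .ifPresent(t -> queue.add(new Entry(t, e.task())));
        }
        return started;
    }
}
```

The main loop would alternate between sleeping for `timeUntilNext(now)` and calling `runDue(now)`; adding a new task only requires waking the sleeper so it can recompute the delay.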
I would just create a timer thread with a callback loop that checks the time every minute and executes your tasks on the specified schedule.
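A sketch of the once-a-minute polling approach, again in Java for consistency with the rest of this page. `DailyJob` and `MinutePoller` are hypothetical; each tick checks the interval since the previous tick rather than the exact minute, so a late tick does not miss a start time. `startPolling` wires the check to `ScheduledExecutorService.scheduleAtFixedRate`, and `start` is a hypothetical callback that launches the stream.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical daily job: runs at startTime on every date in [startDate, endDate].
record DailyJob(int id, LocalDate startDate, LocalDate endDate, LocalTime startTime) {}

final class MinutePoller {
    private final List<DailyJob> jobs;
    private LocalDateTime lastCheck;

    MinutePoller(List<DailyJob> jobs, LocalDateTime startedAt) {
        this.jobs = List.copyOf(jobs);
        this.lastCheck = startedAt;
    }

    /** Returns IDs of jobs whose start time fell in (lastCheck, now]. */
    List<Integer> tick(LocalDateTime now) {
        List<Integer> due = new ArrayList<>();
        for (DailyJob job : jobs) {
            for (LocalDate d = lastCheck.toLocalDate(); !d.isAfter(now.toLocalDate()); d = d.plusDays(1)) {
                if (d.isBefore(job.startDate()) || d.isAfter(job.endDate())) {
                    continue; // outside the job's date range
                }
                LocalDateTime run = d.atTime(job.startTime());
                if (run.isAfter(lastCheck) && !run.isAfter(now)) {
                    due.add(job.id());
                }
            }
        }
        lastCheck = now;
        return due;
    }

    /** Runs tick() once a minute on a single timer thread and hands due job IDs to `start`. */
    ScheduledExecutorService startPolling(IntConsumer start) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> tick(LocalDateTime.now()).forEach(start::accept),
                0, 1, TimeUnit.MINUTES);
        return timer;
    }
}
```

Polling is simpler than the sorted-queue design but wakes up every minute even when nothing is due, and its precision is limited to the polling interval.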