When I send events to a stream receiver the following occurs:
I can see the events in the data explorer
If I try to filter the results with a query, theoretically matching results are not shown
Those matching results appear after some time has passed
I suppose the issue is due to a delay in indexing the data. How can I get events indexed in real time?
I was overriding the _timestamp parameter, but I noticed that the timestamp shown in the data explorer was "in the future"...it was a _timestamp problem!
I have recently been developing a Dataflow consumer which reads from a Pub/Sub subscription and outputs to Parquet files the combination of all the objects grouped within the same window.
While testing this without a huge load, everything seemed to work fine.
However, after performing some heavy testing, I can see that of 1,000,000 events sent to that Pub/Sub queue, only 1,000 make it to Parquet!
According to the wall times across different stages, the one which parses the events prior to applying the window seems to last 58 minutes. The last stage, which writes the Parquet files, lasts 1 hour and 32 minutes.
I will now show the most relevant parts of the code; I hope you can shed some light on whether the problem is due to the logic that comes before the Window object definition or the Window object itself.
pipeline
    .apply("Reading PubSub Events",
        PubsubIO.readMessagesWithAttributes()
            .fromSubscription(options.getSubscription()))
    .apply("Map to AvroSchemaRecord (GenericRecord)",
        ParDo.of(new PubsubMessageToGenericRecord()))
    .setCoder(AvroCoder.of(AVRO_SCHEMA))
    .apply("15m window",
        Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(15)))
            .triggering(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(1)))
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes());
Also note that I'm running Beam 2.9.0.
Could the logic inside the second stage be too heavy, so that messages arrive too late and get discarded by the Window? The logic basically consists of reading the payload and parsing it into a POJO (reading inner Map attributes, filtering, and such).
However, when I send a million events to Pub/Sub, all of those million events make it to the Parquet write stage, but the resulting Parquet files only contain a part of them. Does that make sense?
I would need the trigger to consume all of those events regardless of the delay.
Citing from an answer on the Apache Beam mailing list:
This is an unfortunate usability problem with triggers where you can accidentally close the window and drop all data. I think instead, you probably want this trigger:
Repeatedly.forever(
    AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(1)))
The way I recommend to express this trigger is:
AfterWatermark.pastEndOfWindow().withEarlyFirings(
    AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(1)))
In the second case it is impossible to accidentally "close" the window and drop all data.
How to cancel a long-running QSqlQuery?
The database returns 3M+ rows, which are shown in a QTableView control. I'd like to be able to force-stop both long operations:
when the database is running a long operation
if the database is fast, but there is a huge number of rows to be returned, and processing/copying/showing them takes a lot of time
The 2nd bullet can be solved by not using QSqlQueryModel. In that case, parsing the query results manually can be done in stages, and this will be implemented, but I'd also like to know whether the process of moving data from the DB to the QTableView can be interrupted and cancelled.
I've tried the following without success:
QSqlQuery::finish()
QFuture::cancel()
QSqlDatabase::close() -- this one crashes the application
If full context is needed, it's here. The method in question is on_button_stopQuery_released.
Aborting queries during execution (not fetching, which is what QSqlQuery::finish does) is hit-and-miss in all databases. Qt itself doesn't support this; workarounds will be backend-specific.
For example, with PostgreSQL you can do the following:
In your original connection, retrieve the connection ID (SELECT pg_backend_pid();) and save it
When you want to abort your query, open a second connection and kill the query by issuing SELECT pg_cancel_backend(saved_id);
SQLite has sqlite3_interrupt(sqlite3*). This interrupts queries and does not close the connection.
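With Qt you can reach the native SQLite handle through QSqlDriver::handle() to call that function. A minimal sketch, following the example in the QSqlDriver::handle() documentation (assumes the connection uses the QSQLITE driver, and must be called from a thread other than the one running the query):

#include <QSqlDatabase>
#include <QSqlDriver>
#include <QVariant>
#include <sqlite3.h>

// Interrupt whatever statement is currently executing on this connection.
void interruptQuery(QSqlDatabase &db)
{
    QVariant v = db.driver()->handle();
    if (v.isValid() && qstrcmp(v.typeName(), "sqlite3*") == 0) {
        sqlite3 *handle = *static_cast<sqlite3 **>(v.data());
        if (handle)
            sqlite3_interrupt(handle);
    }
}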
MySQL is similar to PostgreSQL:
First retrieve the connection ID (SELECT CONNECTION_ID();)
Then kill it through another connection (KILL [CONNECTION|QUERY] $connection_id).
As you can see, even the capabilities provided are backend-specific: PostgreSQL and MySQL need a second connection to issue the cancellation, while SQLite interrupts the running query in-process through its C API. The easiest way to implement this would thus be to discard the connection if the query was aborted and the connection is still valid. Then you can have a simple two-method interface for cancellation management (pseudo code, i.e. Python):
class IConnectionCancellation:
def register(connection):
# save/retrieve connection ID
def cancel():
# open second connection, send backend-specific query
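As a concrete illustration, a minimal Qt/C++ sketch of the PostgreSQL variant might look like this (the connection name "main" is an assumption; the long query must run in a worker thread so the GUI thread is free to trigger the cancel):

#include <QSqlDatabase>
#include <QSqlQuery>
#include <QString>
#include <QVariant>

// Step 1: on the original connection, retrieve and save the backend PID.
int fetchBackendPid()
{
    QSqlQuery q(QSqlDatabase::database("main"));
    if (q.exec("SELECT pg_backend_pid();") && q.next())
        return q.value(0).toInt();
    return -1;
}

// Step 2: from the stop handler, cancel the query via a second connection.
void cancelBackend(int pid)
{
    {
        QSqlDatabase second = QSqlDatabase::cloneDatabase(
            QSqlDatabase::database("main"), "cancel_conn");
        if (second.open()) {
            QSqlQuery q(second);
            q.exec(QString("SELECT pg_cancel_backend(%1);").arg(pid));
        }
    } // let the local handle go out of scope before removing the connection
    QSqlDatabase::removeDatabase("cancel_conn");
}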
For large result sets, consider using canFetchMore and fetchMore in your model. That way you don't have to process the entire result set before showing some results to the user, which might feel smoother to use. It doesn't help with inherent query execution latency due to e.g. ORDER BY or grouping clauses, of course.
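A minimal sketch of such a model (the class name and chunk size are made up for illustration; the QSqlQuery is assumed to have been executed already):

#include <QAbstractTableModel>
#include <QSqlQuery>
#include <QVariant>
#include <utility>

// Pulls rows out of an executed QSqlQuery in chunks, so the view can show
// the first rows while the rest are fetched on demand (e.g. on scrolling).
class ChunkedSqlModel : public QAbstractTableModel
{
public:
    ChunkedSqlModel(QSqlQuery query, int columns, QObject *parent = nullptr)
        : QAbstractTableModel(parent), m_query(std::move(query)), m_columns(columns) {}

    int rowCount(const QModelIndex & = QModelIndex()) const override { return m_rows.size(); }
    int columnCount(const QModelIndex & = QModelIndex()) const override { return m_columns; }

    QVariant data(const QModelIndex &idx, int role = Qt::DisplayRole) const override
    {
        if (!idx.isValid() || role != Qt::DisplayRole)
            return QVariant();
        return m_rows.at(idx.row()).at(idx.column());
    }

    bool canFetchMore(const QModelIndex & = QModelIndex()) const override { return !m_atEnd; }

    void fetchMore(const QModelIndex & = QModelIndex()) override
    {
        QList<QList<QVariant>> chunk;
        for (int i = 0; i < 256 && m_query.next(); ++i) { // 256 rows per step
            QList<QVariant> row;
            for (int c = 0; c < m_columns; ++c)
                row.append(m_query.value(c));
            chunk.append(row);
        }
        if (chunk.isEmpty()) { m_atEnd = true; return; }
        beginInsertRows(QModelIndex(), m_rows.size(), m_rows.size() + chunk.size() - 1);
        m_rows += chunk;
        endInsertRows();
    }

private:
    QSqlQuery m_query;
    int m_columns;
    QList<QList<QVariant>> m_rows;
    bool m_atEnd = false;
};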
I have my own MediaSink in Windows Media Foundation with one stream. In the OnClockStart method, I instruct the stream to queue (i) MEStreamStarted and (ii) MEStreamSinkRequestSample on itself. For implementing the queue, I use IMFMediaEventQueue, and using the mtrace tool I can also see that someone dequeues the events.
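For reference, the queueing described above would look roughly like the following sketch (the function name is illustrative; note that for a stream sink the "started" event is MEStreamSinkStarted, whereas MEStreamStarted is the media-source variant):

#include <mfidl.h>
#include <mfapi.h>
#include <mferror.h>

// Called from OnClockStart: signal that the stream started, then ask the
// pipeline for the first sample. pEventQueue is the stream's IMFMediaEventQueue.
HRESULT OnClockStartQueueEvents(IMFMediaEventQueue *pEventQueue)
{
    HRESULT hr = pEventQueue->QueueEventParamVar(
        MEStreamSinkStarted, GUID_NULL, S_OK, nullptr);
    if (SUCCEEDED(hr))
        hr = pEventQueue->QueueEventParamVar(
            MEStreamSinkRequestSample, GUID_NULL, S_OK, nullptr);
    return hr;
}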
The problem is that ProcessSample of my stream is actually never called. This also has the effect that no further samples are requested, because, as in https://github.com/Microsoft/Windows-classic-samples/tree/master/Samples/DX11VideoRenderer, the next request is issued after processing a sample.
Is the described approach the right way to implement the sink? If not, what would be the right way? If so, where could I search for the problem?
Some background info: The sink is an RTSP sink based on live555. Since the latter is also sink-driven, I thought it would be a good idea to queue a MEStreamSinkRequestSample whenever live555 requests more data from me. This is working as intended.
However, this solution has the problem that new samples are only requested as long as a client is connected to live555. If I now add a tee before the sink, e.g. to show a local preview, the system gets out of control, because the tee accumulates samples on the output of my sink which are never fetched. I then started playing around with discardable samples (cf. https://social.msdn.microsoft.com/Forums/sharepoint/en-US/5065a7cd-3c63-43e8-8f70-be777c89b38e/mixing-rate-sink-and-rateless-sink-on-a-tee-node?forum=mediafoundationdevelopment), but the problem is either that the stream does not start, queues keep growing, or the frame rate of the faster sink is artificially limited, depending on which side is discardable.
Therefore, the next idea was to rewrite my sink so that it always requests a new sample once it has processed the current one, and puts all samples into a ring buffer for live555, so that whenever clients are connected they can retrieve their data from there; otherwise, the samples are just discarded. This does not work at all: now my sink does not get anything, even without the tee.
The observation is: if I just request a lot of samples (as in the original approach), at some point I get data. However, if I request only one (I also tried moderately larger numbers, up to 5), ProcessSample is just not called, so no subsequent requests can be generated. I send MEStreamStarted once the clock is started or restarted, exactly as described on https://msdn.microsoft.com/en-us/library/windows/desktop/ms701626, and after that I request the first sample. In my understanding, a MEStreamSinkRequestSample should not get lost, so I should get something even on a single request. Is that a misunderstanding? Should I keep requesting until I get something?
Two Questions:
1) Is it possible to send a message to the device and receive all the current input values?
I am trying to receive data from and send data to a KORG nanoKONTROL2. I found an API for real-time input/output called RtMidi. Receiving data is no issue, but I don't have the state that the sliders and knobs are in when the application starts. I presume that, if this can be done at all, you need to send a message to the device.
There is an RtMidiOut class with sendMessage functionality where you can send data (vector<uchar>) to the device, but I don't know what data to send (I did try some for loops to see if anything happens, but no luck).
2) Is it possible to turn the button lights on/off? If so, how?
To clarify the second question: I am trying to create a toggle where the light of the button on the device turns on/off according to the state of the toggle.
Additional information: the received data consists of 3 unsigned chars, where the first value is probably an ID (I am not quite sure), the second is the button/slider ID, and the third is its value.
Source code link to GitHub.
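To illustrate the 3-byte format from the question, here is a sketch that lights a button LED by sending the same kind of Control Change message back to the device. Two assumptions: the nanoKONTROL2's LED mode has to be set to "External" (via the Korg Kontrol Editor) for incoming messages to drive the LEDs, and the controller number below (0x2A, the stop button) is taken from Korg's published CC map.

#include "RtMidi.h"
#include <vector>

int main()
{
    RtMidiOut midiOut;
    if (midiOut.getPortCount() == 0)
        return 1;
    midiOut.openPort(0); // assumes the nanoKONTROL2 is the first MIDI out port

    std::vector<unsigned char> msg;
    msg.push_back(0xB0); // status: Control Change on channel 1
    msg.push_back(0x2A); // controller number: stop button (assumed CC map)
    msg.push_back(0x7F); // value: 127 = LED on, 0 = LED off
    midiOut.sendMessage(&msg);
    return 0;
}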
Using Siddhi 4.1.0.
Is there any way to apply window.timeBatch to upcoming events? I understand that the time window works based on events that have already arrived.
Say, for example, I am getting multiple results while using window.timeBatch(2 min) with a group by clause.
Within the given 2 min duration I sent 50 input events periodically. The expected behavior is that all those events are put together and emitted as a single result (I used the count function to verify), but it gives two results, such as 40 and 10. Is it that the first 40 events fall into one time window period and the rest into the next window? In that case, how can I merge them and get all those events as a single output for the 2 mins?
Also, I want the time window to start once the first event has arrived.
I noticed that the time window runs in the background: when events start arriving in the middle of the first time window, that window only collects them for the remaining time, and the rest are collected by the next time window. So I finally got 2 batched results.
Please suggest whether there is any other solution to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches. I would like to implement the following.
The switch sends SNMP traps to the CEP, such as switchFanFailed and switchFanOk.
If I receive a switchFanFailed trap, I expect the matching switchFanOk trap to arrive within 1 min. In case the switchFanOk trap is not received within 1 min, the CEP should generate a notification through email; otherwise it should discard the trap.
Even though my trap generator generates the switchFanFailed and switchFanOk traps within a constant 1 min interval, in some cases I am not able to receive both traps in the same window.
Say, for example, switchFanFailed arrives at the end of second 50; from there I should wait 1 min for the switchFanOk trap.
Sorry, I am a bit confused by your use case... :)
Is your use case based on time, length, or both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any number of events) have arrived, then you have to use a lengthBatch window. If you want to process based on time and batch the events, then use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait/batch indefinitely; there has to be something to mark the end of a batch, hasn't there?
I had the same issue: it always created two summarised results for any number of records sent into my grouping query. The fix in my case was that one of the values used in the grouping differed from the others, so I suggest you check the grouping.
If you are thinking of merging the two records, I suggest you use a time batch window, timeBatch(1 min), which will summarise the output of your current data set.