WSO2 CEP - Insert into multiple streams

In SiddhiQL, how is it possible to insert into multiple streams with the same condition?
When I have two correlated events (event1 and event2) and want to
insert event1 into streamA;
insert event2 into streamB;
insert abstractEvent into streamC.
Do I have to write three Siddhi queries?
Thank you.

Yes, you need to write multiple Siddhi queries in the same execution plan to insert events into multiple streams. The logic for selecting the events for each stream can be expressed with a filter. In the following example, events with an even meta_id are inserted into stream1 and events with an odd meta_id into stream2 by using a filter.
#Plan:name('ExecutionPlan')
#Import('test:1.0.0')
define stream test (meta_id int, meta_name string);
from test[meta_id % 2 == 0]
select *
insert into stream1;
from test[meta_id % 2 == 1]
select *
insert into stream2;

Related

Why is my Snowflake stream data not getting flushed?

I am trying to read Snowflake stream data using an AWS Lambda (Snowflake connector library) and writing the data into RDS SQL Server. After the Lambda run, my stream data is not getting deleted.
I don't want to read the data from the stream, insert it into a temporary Snowflake table, and then read it again to insert the data into SQL Server. Is there a better way to do this?
Lambda code:
for table in table_list:
    sql5 = f"""SELECT "header__stream_position", "header__timestamp" FROM STREAM_{table} WHERE "header__operation" IN ('UPDATE', 'INSERT', 'DELETE');"""
    result = cs.execute(sql5).fetchall()
    rds_columns = [(c[0], c[1], table[:-4]) for c in result]
    if rds_columns:
        cursor.fast_executemany = True
        sql6 = f"INSERT INTO {RDS_TABLE}(LSNNUMBER, TRANSACTIONTIME, TABLENAME) VALUES (?, ?, ?);"
        data = rds_columns
        cursor.executemany(sql6, data)
        table_write.append(table)
conn.commit()
ctx.commit()
Snowflake Streams require a successful committed DML operation (INSERT, UPDATE, MERGE) to advance the stream offset, so you can't avoid an intermediate Snowflake table (transient or otherwise) with Streams.
There is a read-only alternative, the CHANGES clause, which gives you the same change information without consuming an offset; however, you must keep track of the time/query offset yourself in your application code.
https://docs.snowflake.com/en/sql-reference/constructs/changes.html
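A minimal sketch of that approach using the same Python connector the Lambda already has (the table name, the columns, and the offset bookkeeping are assumptions, and CHANGE_TRACKING must be enabled on the source table):
# Hypothetical sketch: read change rows with the CHANGES clause instead of a Stream.
# `cs` is the existing snowflake.connector cursor; `last_offset` is a timestamp you
# persist yourself (for example in RDS or a small bookkeeping table).
def read_changes(cs, table, last_offset):
    # Unlike a Stream, this query does NOT consume or advance any offset, so it is
    # safe to re-run; advance `last_offset` only after the rows have been committed
    # to SQL Server.
    sql = f"""
        SELECT "header__stream_position", "header__timestamp"
        FROM {table}
          CHANGES(INFORMATION => DEFAULT)
          AT(TIMESTAMP => TO_TIMESTAMP_LTZ('{last_offset}'))
        WHERE "header__operation" IN ('UPDATE', 'INSERT', 'DELETE')
    """
    return cs.execute(sql).fetchall()
The trade-off is exactly what the documentation describes: the correctness of the offset bookkeeping moves from Snowflake into your application code.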

Kinesis Analytics SQL query to narrow down the sensors that are not sending data

Context: We use Kinesis analytics to process our sensor data and find anomalies in the sensor data.
Goal: We need to identify the sensors that didn’t send the data for the past X minutes.
The following methods have been tried with Kinesis Analytics SQL, but with no luck:
The stagger window technique works for the first 3 test cases, but doesn't work for test case 4.
CREATE OR REPLACE PUMP "STREAM_PUMP_ALERT_DOSCONNECTION" AS INSERT INTO "INTERMEDIATE_SQL_STREAM" SELECT STREAM "deviceID" as "device_key", count("deviceID") as "device_count", ROWTIME as "time" FROM "INTERMEDIATE_SQL_STREAM_FOR_ROOM"
WINDOWED BY STAGGER (
PARTITION BY "deviceID", ROWTIME RANGE INTERVAL '1' MINUTE);
The left join and group by queries below don't work for all the test cases.
Query 1:
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS
INSERT INTO "INTERMEDIATE_SQL_STREAM_FOR_ROOM2"
SELECT STREAM
ROWTIME as "resultrowtime",
Input2."device_key" as "device_key",
FROM INTERMEDIATE_SQL_STREAM_FOR_ROOM
OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS Input1
LEFT JOIN INTERMEDIATE_SQL_STREAM_FOR_ROOM AS Input2
ON
Input1."device_key" = Input2."device_key"
AND Input1.ROWTIME <> Input2.ROWTIME;
Query 2:
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS
INSERT INTO "INTERMEDIATE_SQL_STREAM_FOR_ROOM2"
SELECT STREAM
ROWTIME as "resultrowtime",
Input2."device_key" as "device_key"
FROM INTERMEDIATE_SQL_STREAM_FOR_ROOM
OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS Input1
LEFT JOIN INTERMEDIATE_SQL_STREAM_FOR_ROOM AS Input2
ON
Input1."device_key" = Input2."device_key"
AND TSDIFF(Input1, Input2) > 0;
Query 3:
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS
INSERT INTO "INTERMEDIATE_SQL_STREAM_FOR_ROOM2"
SELECT STREAM
ROWTIME as "resultrowtime",
Input2."device_key" as "device_key"
FROM INTERMEDIATE_SQL_STREAM_FOR_ROOM
OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS Input1
LEFT JOIN INTERMEDIATE_SQL_STREAM_FOR_ROOM AS Input2
ON
Input1."device_key" = Input2."device_key"
AND Input1.ROWTIME = Input2.ROWTIME;
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP2" AS
INSERT INTO "DIS_CONN_DEST_SQL_STREAM_ALERT"
SELECT STREAM "device_key", "count"
FROM (
SELECT STREAM
"device_key",
COUNT(*) as "count"
FROM INTERMEDIATE_SQL_STREAM_FOR_ROOM2
GROUP BY FLOOR(INTERMEDIATE_SQL_STREAM_FOR_ROOM2.ROWTIME TO MINUTE), "device_key"
)
WHERE "count" = 1;
Here are test cases for your reference:
Test case 1:
Device 1 and Device 2 send data continuously to Kinesis Analytics.
After X minutes, Device 2 continues to send the data,
but device 1 is not sending the data.
Output for test case 1:
We want the Kinesis Analytics to pop out Device 1, so that we know which device is not sending data.
Test case 2 (Interval - 10 minutes)
Device 1 sends data at 09:00
Device 2 sends data at 09:02
Device 2 again sends the data at 09:11, but Device 1 doesn’t send any data.
Output for test case 2:
We want the Kinesis Analytics to pop out Device 1, so that we know which device is not sending data.
Test case 3 (Interval - 10 minutes)
Device 1 and Device 2 send data continuously to Kinesis Analytics.
Both devices (1 & 2) don't send any data for the next 15 minutes.
Output for test case 3:
We want the Kinesis Analytics to pop out Device 1 & Device 2, so that we know which devices are not sending data.
Test case 4: (Interval - 10 mins)
Device 1 sends data at 09:00
Device 2 sends data at 09:02
Device 1 again sends data at 09:04
Device 2 again sends data at 09:06
Then no data
Output for test case 4:
We want the analytics to pop out Device 1 at 09:14 and Device 2 at 09:16, so that we can get the disconnected devices (i.e. devices not sending data) after the exact interval.
Note: AWS Support directed us to simple queries that don't answer the question. Looks like they can help with the exact query only if we upgrade our support plan.
I'm not familiar with all of the ways in which AWS has extended or modified Apache Flink, but open source Flink doesn't provide a simple way to detect that all sources have ceased to send data. One solution is to use something like a process function with processing-time timers to detect the absence of data.
The documentation has an example of something along these lines: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/process_function/#example
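If you can run this as a Kinesis Data Analytics for Apache Flink application (rather than the SQL application), a per-device inactivity detector along the lines of that example might look like the following PyFlink sketch. The stream names, the 10-minute threshold, and the (device_id, payload) record layout are assumptions.
# Hypothetical PyFlink sketch: a processing-time timer per device key that fires
# only if no new event arrives within the timeout.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction
from pyflink.datastream.state import ValueStateDescriptor

TIMEOUT_MS = 10 * 60 * 1000  # report a device after 10 minutes of silence (assumption)

class InactivityDetector(KeyedProcessFunction):
    """Re-arms a processing-time timer on every event; if the timer fires,
    the keyed device has been silent for TIMEOUT_MS."""

    def open(self, runtime_context):
        self.timer = runtime_context.get_state(
            ValueStateDescriptor("next_timer", Types.LONG()))

    def process_element(self, value, ctx):
        # Cancel the previously scheduled timer for this device, if any.
        previous = self.timer.value()
        if previous is not None:
            ctx.timer_service().delete_processing_time_timer(previous)
        # Schedule a new timer TIMEOUT_MS after this event.
        fire_at = ctx.timer_service().current_processing_time() + TIMEOUT_MS
        ctx.timer_service().register_processing_time_timer(fire_at)
        self.timer.update(fire_at)

    def on_timer(self, timestamp, ctx):
        # No event arrived for this device within the timeout: emit its key.
        yield ctx.get_current_key()

env = StreamExecutionEnvironment.get_execution_environment()
# A real job would use a Kinesis source; a small collection keeps the sketch self-contained.
events = env.from_collection(
    [("device1", "reading"), ("device2", "reading")],
    type_info=Types.TUPLE([Types.STRING(), Types.STRING()]))
silent_devices = (events
                  .key_by(lambda e: e[0])
                  .process(InactivityDetector(), output_type=Types.STRING()))
silent_devices.print()
env.execute("device-inactivity-sketch")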

How to launch a long-running SQLite query in Go?

I have code that, when needed, interrupts an SQLite query using a context with a deadline. My problem is writing a unit test for it: I need to launch a query that I know will run for a long time, ideally an infinite loop, and check that it is interrupted. I use https://github.com/mattn/go-sqlite3 to access SQLite 3 from Go.
For example, this:
with recursive rec as
(select 1 as n union all select n + 1 from rec)
select n from rec;
returns 1 immediately instead of looping as it does in the SQLite console (is there something I need to do to enable CTEs?). I also found no sleep function or anything similar.

Select events with a maximum in a sliding window

I have this stream :
define stream locationStream (cell string, device string, power long);
I want to select from this stream, with a sliding window of 10 seconds and for every device, the value of the 'cell' attribute for which 'power' is the largest.
What queries should I use to get this result with Siddhi? Something like:
from locationStream#window.time(10 seconds)
select max(power), device, <cell where power = max(power)>
group by device
insert all events into cellStream
You can use the Siddhi maxByTime window offered through the extrema extension; its usage is documented in the extension's resources. You will have to use it within a partition to get the per-device maximum. The suggested query should look like the following.
partition with (device of locationStream)
begin
    from locationStream#extrema:maxByTime(power, 10 sec)
    select power, device, cell
    insert into cellStream
end;

WSO2 CEP Siddhi Queries

New to Siddhi CEP. Other than the regular docs on WSO2 CEP, can someone point to a good tutorial?
Here are our requirements. Please point out some clues on the right way of writing such queries.
Have a single stream of sensor device notification ( IOT application ).
Stream input is via REST-JSON and output is also to be formatted as REST-JSON. (Hope this is possible on WSO2 CEP 3.1.)
Kind of execution plan required:
- If a device notification reports usage of Sensor 1, then monitor whether a device notification reports usage of Sensor 2 as well within 5 minutes. If found, then generate an output stream reporting composite-activity back over REST-JSON.
- If such composite-activity is not detected during a time slot in the morning, afternoon, or evening, then generate a warning-event-stream status on REST-JSON. (So how do we find events that did not occur in time?)
- If such composite-activity is not found within some time slots in the morning, afternoon, and evening, then report a failure1-event-stream status back on REST-JSON.
This should work day after day, so how will the previously processed data get deleted in WSO2 CEP?
Regards,
Amit
The queries can be as follows (these are draft queries and may require slight modifications to get them running).
To detect sensor 1, and then sensor 2 within 5 minutes (assuming sensorStream has id, value), you can simply use a pattern like the following with the 'within' keyword:
from e1=sensorStream[sensorId == '1'] -> e2=sensorStream[sensorId == '2'] within 5 minutes
select 'composite activity detected' as description, e1.value as sensor1Value, e2.value as sensor2Value
insert into compositeActivityStream;
To detect non-occurrences (id=1 arrives, but no id=2 within 5 minutes) we can have the following two queries:
from sensorStream[sensorId == '1']#window.time(5 minutes)
select *
insert into delayedSensor1Stream for expired-events;
from e1=sensorStream[sensorId == '1'] -> nonOccurringEvent = sensorStream[sensorId == '2'] or delayedEvent=delayedSensor1Stream
select 'id=2 not found' as description, e1.value as id1Value, nonOccurringEvent.sensorId as nonOccurringId
having (not(nonOccurringId instanceof string))
insert into nonOccurrenceStream;
This will detect the non-occurrence immediately at the end of the 5 minutes after the arrival of the id=1 event.
For an explanation of the above logic, have a look at the non-occurrence sample of CEP 4.0.0 (the syntax is a bit different, but it is the same idea).
Now, since you need to periodically generate a report, we need another query. For convenience I assume you need a report every 6 hours (360 minutes) and use a time batch window here. Alternatively, with the new CEP 4.0.0 you can use the 'Cron window' to generate this at specific times, which is better for your use case.
from nonOccurrenceStream#window.timeBatch(360 minutes)
select count(id1Value) as nonOccurrenceCount
insert into nonOccurrenceReportsStream for expired-events;
You can use HTTP input/output adapters and do JSON mappings with JSON builders and formatters for this use case.