I have one event table called eventCount which has the following values:
ID | eventCount
1  | 3
2  | 1
3  | 5
4  | 1
I have a stream of data coming in where I count the values of a certain type for a time period (1 second); depending on the type and time period, I count() and write the result of the count() into the corresponding row.
I need to compute the sum of the values in the event table.
I tried to create another event table and join both, but I am getting an error saying that you cannot join two static sources.
What is the correct way of doing this in SiddhiQL in WSO2 CEP?
In your scenario, the sum of the values in the event table is equivalent to the total number of events, isn't it? So why do you need to keep it in an event table at all; can't you just compute it then and there (like below)?
@Import('dataIn:1.0.0')
define stream dataIn (id int);
@Export('totalCountStream:1.0.0')
define stream totalCountStream (eventCount long);
@Export('perIdCountStream:1.0.0')
define stream perIdCountStream (id int, eventCount long);
partition with (id of dataIn)
begin
from dataIn#window.time(5 sec)
select id, count() as eventCount
insert into perIdCountStream;
end;
from dataIn#window.time(5 sec)
select count() as eventCount
insert into totalCountStream;
PS: if you really need the event tables, you can always persist totalCountStream and perIdCountStream in two separate tables, as sketched below.
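A minimal sketch of that persistence with in-memory tables (the table names here are made up for illustration; point them at your actual datasource if needed):
define table totalCountTable (eventCount long);
define table perIdCountTable (id int, eventCount long);
from totalCountStream
select eventCount
insert into totalCountTable;
from perIdCountStream
select id, eventCount
insert into perIdCountTable;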
Today I was just bitten in the rear end by something I didn't expect. Here's a little script to reproduce the issue:
create temporary table aaa_state(id int, amount int);
create temporary table aaa_changes(id int, delta int);
insert into aaa_state(id, amount) values (1, 0);
insert into aaa_changes(id, delta) values (1, 5), (1, 7);
update aaa_changes c join aaa_state s on (c.id=s.id) set s.amount=s.amount+c.delta;
select * from aaa_state;
The final result in the aaa_state table is:
ID | Amount
1  | 5
Whereas I would expect it to be:
ID | Amount
1  | 12
What gives? I checked the docs but cannot find anything that would hint at this behavior. Is this a bug that I should report, or is this by design?
The behavior you are seeing is consistent with two updates happening on the aaa_state table. One update assigns the amount to 7, and then this amount is clobbered by the second update, which sets it to 5. This could be explained by MySQL using a snapshot of the aaa_state table to fetch the amount for each step of the update. If true, the actual steps would look something like this:
1. Join the two tables.
2. Update the amount using the "first" row from the changes table.
   The cached result for the amount is now 7, but this value will not actually
   be written out to the underlying table until AFTER the entire update.
3. Update the amount using the "second" row from the changes table.
   The cached amount is now 5.
4. The update is over; write 5 out as the actual amount.
Your syntax is not really correct for what you want to do. You should be using something like the following:
UPDATE aaa_state s
INNER JOIN
(
    SELECT id, SUM(delta) AS delta_sum
    FROM aaa_changes
    GROUP BY id
) ac
    ON ac.id = s.id
SET
    s.amount = s.amount + ac.delta_sum;
Here we are doing a proper aggregation of the delta values for each id in a separate, bona fide subquery. This means that the delta sums are fully computed and materialized in the subquery before MySQL does the join to update the first table.
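If you want to sanity-check the aggregation before running the UPDATE, you can run the subquery on its own; with the sample data above it should return a single row per id:
SELECT id, SUM(delta) AS delta_sum
FROM aaa_changes
GROUP BY id;
-- expected with the sample data: id = 1, delta_sum = 12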
Please have a look at the following data example:
In this table I have multiple columns and no PRIMARY KEY. As per the image I attached, there are a few duplicates in STK_CODE. I want to remove duplicate rows based on the (min) column.
According to the image, one stk_code has three different rows, each with a different value in the (min) column; I want to keep only the row with the minimum value in the (min) column.
I am very new to SQLite, and I am using -lsqlite3 to link C++ with SQLite.
Is there any possible way to do this?
Your table has rowid as primary key.
Use it to get the rowids that you don't want to delete:
DELETE FROM comparison
WHERE rowid NOT IN (
SELECT rowid
FROM comparison
GROUP BY STK_CODE
HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
)
This code uses rowid as a bare column, relying on a documented SQLite feature: when you use the MIN() or MAX() aggregate function, the query returns the row that contains the min or max value.
See a simplified demo.
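A minimal, self-contained version of such a demo might look like the following (the table layout and sample values are assumed from the question, so adjust them to your schema):
CREATE TABLE comparison (STK_CODE TEXT, min REAL);
INSERT INTO comparison (STK_CODE, min) VALUES
    ('A001', 3.5), ('A001', 1.2), ('A001', 2.0),  -- duplicates: only the 1.2 row should survive
    ('B002', 4.0);                                 -- unique: kept as-is
DELETE FROM comparison
WHERE rowid NOT IN (
    SELECT rowid
    FROM comparison
    GROUP BY STK_CODE
    HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
);
SELECT STK_CODE, min FROM comparison;
-- expected result: A001|1.2 and B002|4.0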
I'm looking to use Kinesis Data Analytics (or some other AWS managed service) to batch records based on a filter criteria. The idea would be that as records come in, we'd start a session window and batch any matching records for 15 min.
The stagger window is exactly what we'd like except we're not looking to aggregate the data, but rather just return the records all together.
Ideally...
100 records spread over 15 min. (20 matching criteria) with first one at 10:02
|
v
At 10:17, the 20 matching records would be sent to the destination
I've tried doing something like:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
"device_id" INTEGER,
"child_id" INTEGER,
"domain" VARCHAR(32),
"category_id" INTEGER,
"posted_at" DOUBLE,
"block" TIMESTAMP
);
-- Create pump to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
-- Select all columns from source stream
SELECT STREAM
"device_id",
"child_id",
"domain",
"category_id",
"posted_at",
FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE) as block
FROM "SOURCE_SQL_STREAM_001"
WHERE "category_id" = 888815186
WINDOWED BY STAGGER (
PARTITION BY "child_id", FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE)
RANGE INTERVAL '15' MINUTE);
I continue to get errors for all the columns not in the aggregation:
From line 6, column 5 to line 6, column 12: Expression 'domain' is not being used in PARTITION BY sub clause of WINDOWED BY clause
Kinesis Firehose was a suggested solution, but it uses a blind window across all child_ids, so it could cut a session up into multiple batches, and that's what I'm trying to avoid.
Any suggestions? Feels like this might not be the right tool.
Try LAST_VALUE("domain") AS domain in the SELECT clause.
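A sketch of how the pump might look with every non-partitioned column wrapped that way (this assumes LAST_VALUE() works as an aggregate inside the stagger window, as suggested above, and that one representative value per column per window is acceptable for your case):
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    LAST_VALUE("device_id") AS "device_id",
    "child_id",
    LAST_VALUE("domain") AS "domain",
    LAST_VALUE("category_id") AS "category_id",
    LAST_VALUE("posted_at") AS "posted_at",
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE) AS "block"
FROM "SOURCE_SQL_STREAM_001"
WHERE "category_id" = 888815186
WINDOWED BY STAGGER (
    PARTITION BY "child_id", FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE)
    RANGE INTERVAL '15' MINUTE);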
I'm collecting events with different ids; there are n types of fixed ids in the incoming events. I want to compute the average of past events, based on a time frame or a number of events, separately for the different ids.
Let's say there are 2 devices sending data/events with ids 'a' and 'b'. I want to get the average of the past 5 minutes of data for both devices and then compare the two averages to make a decision.
With the code below, I'm collecting the past n minutes of data and storing the averages in 2 windows.
@source(type='http', receiver.url='http://localhost:5007/SweetProductionEP', @map(type = 'json'))
define stream InProduction(name string, amount int);
define window hold_a(avg_amount double) length(1);
define window hold_b(avg_amount double) length(1);
from InProduction[name=='a']#window.timeBatch(5 min)
select avg(amount) as avg_amount
group by name
insert into hold_a;
from InProduction[name=='b']#window.timeBatch(5 min)
select avg(amount) as avg_amount
group by name
insert into hold_b;
Windows hold_a and hold_b will hold the average of the past 5 minutes of data. Now I want to compare the data from both windows and make a decision.
I've tried a join on both windows, but the join query doesn't get executed.
You have to use a pattern to achieve this. The query below will output the name that had the highest average into HighestAvgStream.
@source(type='http', receiver.url='http://localhost:5007/SweetProductionEP', @map(type = 'json'))
define stream InProduction(name string, amount int);
from InProduction[name=='a']#window.timeBatch(5 min)
select avg(amount) as avg_amount, name
insert into avgStream;
from InProduction[name=='b']#window.timeBatch(5 min)
select avg(amount) as avg_amount, name
insert into avgStream;
from every(e1=avgStream -> e2=avgStream)
select ifthenelse(e1.avg_amount > e2.avg_amount, e1.name, e2.name) as highestAvgName
insert into HighestAvgStream;
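Note that the pattern pairs consecutive events arriving on avgStream, so it assumes the 'a' average and the 'b' average are emitted as a pair for each batch; if that ordering is not guaranteed in your setup, you may want to constrain the second event, e.g. e2=avgStream[e1.name != name].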
I am trying to enrich my input stream with an additional attribute which gets populated via an "http-response" source.
I have tried using a "join" with a window attribute, and also the "every" keyword, to merge the two streams and insert the resulting merged stream into another stream in order to enrich it.
The window attributes (window.time(1 sec) or window.length(1)) and the "every" keyword work well when the incoming events arrive at a regular interval of 1 second or more.
When (say) 10 or 100 events are sent at the same time (within a second), the result of the merge is not as expected.
The one with "window" attribute (join)
**
from EventInputStreamOne#window.time(1 sec) as i
join EventInputStreamTwo as s
on i.variable2 == s.variable2
select i.variable1 as variable1, i.variable2 as variable2, s.variable2 as variable2
insert into EventOutputStream;
**
The one with the "every" keyword
**
from every e1=EventInputStream,e2=EventResponseStream
select e1.variable1 as variable1, e1.variable2 as variable2, e2.variable3 as variable3
insert into EventOutputStream;
**
Is there any better way to merge the two streams in order to update a third stream?
To get the original request attributes, you can use custom mapping as follows,
@source(type='http-call-response', sink.id='source-1',
    @map(type='json', @attributes(name='name', id='id', volume='trp:volume', price='trp:price')))
define stream responseStream(name string, id int, headers string, volume long, price float);
Here, the original request attributes can be accessed as trp:attributeName; in this sample, name comes from the response, while price and volume come from the original request.
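For context, a minimal sketch of the sink side that pairs with this source via sink.id (the URL and payload stream here are made up for illustration; in older Siddhi versions the pair is named http-request/http-response rather than http-call/http-call-response):
@sink(type='http-call', sink.id='source-1', publisher.url='http://localhost:8005/enrich',
    method='POST', @map(type='json'))
define stream requestStream (name string, id int, volume long, price float);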
The syntax in your 'every' keyword approach isn't quite right. Have you tried something like this:
from every (e1 = event1) -> e2=event2[e1.variable == e2.variable]
select e1.variable1, e2.variable1, e2.variable2
insert into outputEvent;
This document might help.