Issue with fully distributed deployment of a Siddhi query with a join operation - WSO2

I have set up a Siddhi cluster with 2 worker nodes and 1 manager node.
Following is the query I tried pushing to the manager. Everything seems to work fine when there is no join in the query, but with the join shown below the query gets deployed on the worker node, yet no output events seem to be produced.
@app:name("rule_1")
@source(
type="kafka",
topic.list="test-input-topic",
group.id="test-group",
threading.option="single.thread",
bootstrap.servers="localhost:9092",
@map(type="json"))
define stream TempStream (deviceID string,roomNo string,temp int );
@sink(
type="kafka",
topic="test-output-topic",
bootstrap.servers="localhost:9092",
@map(type="json"))
define stream OutStream (message string, message1 string, message2 double);
@info(name = "query1")
@dist(execGroup="group1")
from TempStream[deviceID=="rule_1" and temp>10]#window.time(5 sec)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream1;
@info(name = "query2")
@dist(execGroup="group2")
from TempStream[deviceID=="rule_1" and temp<10]#window.time(5 sec)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream2;
@info(name = "query3")
@dist(execGroup="group3")
from AvgTempStream1#window.length(1) as stream1 join AvgTempStream2#window.length(1) as stream2
select stream1.deviceID as message,stream1.roomNo as message1, stream1.avgTemp as message2
having stream1.avgTemp>stream2.avgTemp
insert into outputStream;
@info(name = "query4")
@dist(execGroup="group4")
from AvgTempStream1[deviceID=="rule_1"]
select deviceID as message, roomNo as message1, avgTemp as message2
insert into OutStream;
Event being passed
{"event":{"deviceID":"rule_1","roomNo":"123","temp":12}}

According to the above Siddhi app, you will need to send two input events for it to produce an output: one event with temp>10 and another with temp<10. Ex:
{"event":{"deviceID":"rule_1","roomNo":"123","temp":12}}
{"event":{"deviceID":"rule_1","roomNo":"123","temp":8}}
This will make sure that the join happens and an event is emitted. For troubleshooting purposes you can subscribe to the intermediate Kafka topics with a Kafka consumer. Names of intermediate topics follow the format SiddhiAppName.StreamName, e.g. rule_1.AvgTempStream1.
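For example, assuming the console consumer script bundled with Kafka is on your path, something like this lets you watch the intermediate topic from the app above:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic rule_1.AvgTempStream1 --from-beginning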
Hope this helps!!
Thanks,
Tishan

Related

Redshift deadlock on COPY command

I'm doing a simple COPY command that used to work:
echo " COPY table_name
FROM 's3://bucket/<date>/'
iam_role 'arn:aws:iam::123:role/copy-iam'
format as json 's3://bucket/jupath.json'
gzip ACCEPTINVCHARS ' ' TRUNCATECOLUMNS TRIMBLANKS MAXERROR 3;
" | psql
And now I get:
INFO: Load into table 'table_name' completed, 53465077 record(s) loaded successfully.
ERROR: deadlock detected
DETAIL: Process 26999 waits for AccessExclusiveLock on relation 3176337 of database 108036; blocked by process 26835.
Process 26835 waits for ShareLock on transaction 24230722; blocked by process 26999.
The only change is moving from the dc2 instance type to ra3. Let me add that this is the only command that touches this table and there is only one process running at a time.
The key detail here is in the error message:
Process 26999 waits for AccessExclusiveLock on relation 3176337 of
database 108036; blocked by process 26835. Process 26835 waits for
ShareLock on transaction 24230722; blocked by process 26999.
Relation 3176337, I assume, is the table in question - the target of the COPY. This should be confirmed by running something like:
select distinct(id) table_id
,trim(datname) db_name
,trim(nspname) schema_name
,trim(relname) table_name
from stv_tbl_perm
join pg_class on pg_class.oid = stv_tbl_perm.id
join pg_namespace on pg_namespace.oid = relnamespace
join pg_database on pg_database.oid = stv_tbl_perm.db_id
where stv_tbl_perm.id = 3176337 -- the relation id from the error message
;
I don't expect any surprises here but it is good to check. If it is some different table (object) then this is important to know.
Now for the meat. You have 2 processes listed in the error message - PID 26999 and PID 26835. A process is a unique connection to the database, i.e. a session. So these identify the 2 connections to the database that have gotten locked with each other. A good next step is to see what each of these sessions (processes or PIDs) is doing. Like this:
select xid, pid, starttime, max(datediff('sec',starttime,endtime)) as runtime, type,
listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid in (26999, 26835)
--where xid = 16627013
and sequence < 320
--and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type"
order by starttime, 1 asc, "type" desc;
The thing you might run into is that these logging tables "recycle" every few days, so the data from this exact failure might be lost.
The next part of the error is about the open transaction that is preventing 26835 from moving forward. This transaction (identified by an XID, or transaction ID) is part of process 26999, but 26999 needs 26835 to complete some action before it can move - a deadlock. So seeing what is in this transaction will be helpful as well:
select xid, pid, starttime, max(datediff('sec',starttime,endtime)) as runtime, type,
listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where xid = 24230722
and sequence < 320
--and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type"
order by starttime, 1 asc, "type" desc;
Again, the data may have been lost due to time. I commented out the date-range where clause of the last 2 queries to allow looking back further in these tables. You should also be aware that PID and XID numbers are reused, so check the date stamps on the results to be sure that info from different sessions isn't being combined. You may need a new where clause to focus in on just the event you care about.
Now you should have all the info you need to see why this deadlock is happening. Use the timestamps of the statements to see the order in which statements are being issued by each session (process). Remember that every transaction ends with a COMMIT (or ROLLBACK) and this will change the XID of the following statements in the session. A simple fix might be issuing a COMMIT in the "26999" process flow to close that transaction and let the other process advance. However, you need to understand if such a commit will cause other issues.
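If the deadlock is reproducible, it can also help to look at the locks held at the moment it occurs. A rough sketch, assuming the standard Redshift system views SVV_TRANSACTIONS and STV_LOCKS are visible to your user:
-- which transactions hold or wait for locks, and on which relation
select xid, pid, txn_owner, lock_mode, relation, granted, txn_start
from svv_transactions
order by txn_start;
-- table-level view of the same picture
select table_id, lock_owner_pid, lock_status, lock_owner_start_ts
from stv_locks;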
If you can find all this info and if you need any help reach out.
Clearly a bug.
The table was cloned from one Redshift cluster to another by doing SHOW TABLE table_name, which provided:
CREATE TABLE table_name (
message character varying(50) ENCODE lzo,
version integer ENCODE az64,
id character varying(100) ENCODE lzo ,
access character varying(25) ENCODE lzo,
type character varying(25) ENCODE lzo,
product character varying(50) ENCODE lzo
)
DISTSTYLE AUTO SORTKEY AUTO ;
After removing the "noise" the command completed as usual without errors:
DROP TABLE table_name;
CREATE TABLE table_name (
message character varying(50),
version integer,
id character varying(100),
access character varying(25),
type character varying(25),
product character varying(50)
);

Azure Stream Analytics with two event hub inputs joins

I have two Event Hub inputs (Event-A & Event-B) to Azure Stream Analytics.
Input Event-A: primary (whenever I get an event from 'event-A', I have to join it with the 'event-B' data from the last 20 minutes)
Input Event-B: secondary (it's a kind of reference data, but coming from another Azure Event Hub)
Select a.id,b.id,b.action into outputevent from eventA a
left join eventB b on a.id = b.id -- don't know how to consider last 20 minutes event-B data
I need to match/join with Event-B data only for the last 20 minutes and don't know which window function is applicable for this use case. (I also observed that most of the window functions wait for future events to actually trigger, but my requirement is to work with past 'event-B' data when I receive an event-A.)
I have formatted your own answer so it's more readable for others:
select
b.eventenqueuedutctime as btime,
a.Id,
a.SysTime,
a.UTCTime ,
b.Id as BId,
b.SysTime as BSysTime
into outputStorage -- to blob storage (container)
from
eventA a TIMESTAMP BY eventenqueuedutctime
left outer join
eventB b TIMESTAMP BY eventenqueuedutctime
on
a.id = b.id and
datediff(minute,b,a) between 0 and 180 -- join with last 3 hours of eventB data
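For the 20-minute window described in the question, the same pattern applies with the DATEDIFF bound changed, e.g.:
datediff(minute, b, a) between 0 and 20 -- join with last 20 minutes of eventB data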

How to insert events into an http sink apart from one another?

I need to send events to an http endpoint. If I do something like:
from DataStream
select f_id
insert into OutputToHttpEndpoint ;
I get the following messages in my web service:
[{f_id:1}, {f_id:2}, ...]
instead of N requests with the expected message like {f_id:N}.
I found a solution:
from ExtractedDataStream
select f_id
output last every 1 events
insert into OutputToNodejs ;
Is it correct? Is there another way to solve this?
A Siddhi length batch window of size 1 can be used:
from DataStream#window.lengthBatch(1)
select *
insert into DataStreamTemp;
from DataStreamTemp
select f_id
insert into OutputToHttpEndpoint ;
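For completeness, a sketch of how the output stream could be bound to the endpoint with the siddhi-io-http sink; the URL is a placeholder and the f_id attribute type is assumed:
-- placeholder URL; f_id type assumed to be string
@sink(type='http', publisher.url='http://localhost:8080/endpoint', method='POST',
@map(type='json'))
define stream OutputToHttpEndpoint (f_id string);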

IF statements in SQL Server triggers

I need to create a SQL Server trigger to block updates and deletes to a table Service.
This should happen only for Service rows in which the State column value is "completed".
It should allow updates and deletes for Service rows in which the State column value is "active".
This is what I tried; I am having problems with the else branch (that is, allowing updates to Service rows in which the State column value is "active").
CREATE TRIGGER [Triggername]
ON dbo.Service
FOR INSERT, UPDATE, DELETE
AS
DECLARE @para varchar(10),
@results varchar(50)
SELECT @para = Status
FROM Service
IF (@para = 'completed')
BEGIN
SET @results = 'An invoiced service cannot be updated or deleted!';
SELECT @results;
END
BEGIN
RAISERROR ('An invoiced service cannot be updated or deleted', 16, 1)
ROLLBACK TRANSACTION
RETURN
END
So if I understand you correctly, any UPDATE or DELETE should be allowed if the State column has a value of Active, but stopped in any other case?
Then I'd do this:
CREATE TRIGGER [Triggername]
ON dbo.Service
FOR UPDATE, DELETE
AS
BEGIN
-- if any row exists in the "Deleted" pseudo table of rows that WERE
-- in fact updated or deleted, that has a state that is *not* "Active"
-- then abort the operation
IF EXISTS (SELECT * FROM Deleted WHERE State <> 'Active')
ROLLBACK TRANSACTION
-- otherwise let the operation finish
END
As a note: you cannot easily return messages from a trigger (with SELECT @results) - the trigger just silently fails by rolling back the currently active transaction.
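If the caller should see an explicit error message rather than a silent rollback, one possible variation of the trigger above (same logic, just raising an error first, reusing the RAISERROR text from the question) would be:
CREATE TRIGGER [Triggername]
ON dbo.Service
FOR UPDATE, DELETE
AS
BEGIN
    -- block the operation if any affected row was not in the 'Active' state
    IF EXISTS (SELECT * FROM Deleted WHERE State <> 'Active')
    BEGIN
        RAISERROR ('An invoiced service cannot be updated or deleted', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
    END
END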

How to update a stream with the response from another stream where the sink type is "http-response"

I am trying to enrich my input stream with an additional attribute which gets populated via the "http-response" sink.
I have tried using a "join" with a window, and also the "every" keyword, to merge the two streams and insert the merged result into another stream to enrich it.
The window variants (window.time(1 sec) or window.length(1)) and the "every" keyword work well when the incoming events arrive at a regular interval of 1 sec or more.
But when many events (say 10 or 100) are sent at the same time (within a second), the result of the merge is not as expected.
The one with the "window" attribute (join):
from EventInputStreamOne#window.time(1 sec) as i
join EventInputStreamTwo as s
on i.variable2 == s.variable2
select i.variable1 as variable1, i.variable2 as variable2, s.variable2 as variable2
insert into EventOutputStream;
The one with the "every" keyword:
from every e1=EventInputStream,e2=EventResponseStream
select e1.variable1 as variable1, e1.variable2 as variable2, e2.variable3 as variable3
insert into EventOutputStream;
Is there any better way to merge the two streams in order to update a third stream?
To get the original request attributes, you can use custom mapping as follows,
@source(type='http-call-response', sink.id='source-1',
@map(type='json', @attributes(name='name', id='id', volume='trp:volume', price='trp:price')))
define stream responseStream (name string, id int, headers string, volume long, price float);
Here, the request attributes can be accessed with trp:attributeName. In this sample only name is from the response; price and volume are from the request.
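For context, the http-call-response source pairs with an http-call sink that shares the same sink.id. A minimal sketch, assuming the siddhi-io-http 2.x extension and with the URL and request stream as placeholders:
-- publishes the request; responses come back on the http-call-response source above
@sink(type='http-call', publisher.url='http://localhost:8080/enrich', method='POST',
sink.id='source-1', @map(type='json'))
define stream requestStream (name string, id int, volume long, price float);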
The syntax in your 'every' keyword approach isn't quite right. Have you tried something like this:
from every (e1 = event1) -> e2=event2[e1.variable == e2.variable]
select e1.variable1, e2.variable1, e2.variable2
insert into outputEvent;
This document might help.