How to insert events into an HTTP sink separately from one another? - wso2

I need to send events to an HTTP endpoint. If I do something like:
from DataStream
select f_id
insert into OutputToHttpEndpoint ;
I get the following message in my web service:
[{f_id:1}, {f_id:2}, ...]
instead of N requests, each with the expected message {f_id:N}.
I found a solution:
from ExtractedDataStream
select f_id
output last every 1 events
insert into OutputToNodejs ;
Is this correct? Is there another way to solve this?

A Siddhi length batch window of size 1 can be used:
from DataStream#window.lengthBatch(1)
select *
insert into DataStreamTemp;
from DataStreamTemp
select f_id
insert into OutputToHttpEndpoint ;
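To see why a batch size of 1 yields one emission per event, here is a minimal Python sketch of length-batch grouping. It is an illustration only, not Siddhi code: the helper name `length_batch` is hypothetical, and the sketch assumes that each emitted batch becomes one request to the endpoint.

```python
from typing import Dict, Iterable, Iterator, List


def length_batch(events: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group events into consecutive batches of `size` (hypothetical helper)."""
    batch: List[Dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    # An incomplete trailing batch is not emitted: a length batch
    # window waits until it is full before releasing its events.


events = [{"f_id": 1}, {"f_id": 2}, {"f_id": 3}]

# Batch of 3: all events are released together (one emission).
print(list(length_batch(events, 3)))

# Batch of 1: every event is released on its own (three emissions).
print(list(length_batch(events, 1)))
```

With `size=1` the grouping degenerates to one event per emission, which is the behaviour the question asks for.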

Related

Kinesis Analytics Session or Stagger Window Batching Without Aggregation

I'm looking to use Kinesis Data Analytics (or some other AWS managed service) to batch records based on filter criteria. The idea would be that as records come in, we'd start a session window and batch any matching records for 15 min.
The stagger window is exactly what we'd like except we're not looking to aggregate the data, but rather just return the records all together.
Ideally...
100 records spread over 15 min. (20 matching criteria) with first one at 10:02
|
v
At 10:17, the 20 matching records would be sent to the destination
I've tried doing something like:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
"device_id" INTEGER,
"child_id" INTEGER,
"domain" VARCHAR(32),
"category_id" INTEGER,
"posted_at" DOUBLE,
"block" TIMESTAMP
);
-- Create pump to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
-- Select all columns from source stream
SELECT STREAM
"device_id",
"child_id",
"domain",
"category_id",
"posted_at",
FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE) as block
FROM "SOURCE_SQL_STREAM_001"
WHERE "category_id" = 888815186
WINDOWED BY STAGGER (
PARTITION BY "child_id", FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE)
RANGE INTERVAL '15' MINUTE);
I continue to get errors for all the columns not in the aggregation:
From line 6, column 5 to line 6, column 12: Expression 'domain' is not being used in PARTITION BY sub clause of WINDOWED BY clause
Kinesis Firehose was a suggested solution, but its window is blind to child_id, so it could cut a session into multiple pieces, and that's what I'm trying to avoid.
Any suggestions? Feels like this might not be the right tool.
Try LAST_VALUE("domain") as domain in the select clause.
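The behaviour the question asks for (open a window per child_id at the first matching record, then return every matching record from the next 15 minutes without aggregating) can be sketched in plain Python. This is not Kinesis SQL; it only illustrates the intended semantics. The function name `batch_sessions` and the `ts` field (epoch seconds) are assumptions made for this example, and open windows are flushed at the end of the input because the sketch has no timer.

```python
from typing import Dict, List, Tuple

WINDOW_SECONDS = 15 * 60  # 15-minute window, as in the question


def batch_sessions(records: List[Dict], category_id: int) -> List[Tuple[int, int, List[Dict]]]:
    """Return (child_id, window_close_time, records) for each window.

    `records` must be sorted by "ts". A window opens per child_id at the
    first matching record and collects every matching record for that
    child that arrives before the window closes.
    """
    open_windows: Dict[int, Tuple[int, List[Dict]]] = {}
    closed: List[Tuple[int, int, List[Dict]]] = []
    for rec in records:
        if rec["category_id"] != category_id:
            continue  # non-matching records are dropped
        key = rec["child_id"]
        window = open_windows.get(key)
        if window is not None and rec["ts"] >= window[0] + WINDOW_SECONDS:
            # The previous window for this child has expired: emit it whole.
            closed.append((key, window[0] + WINDOW_SECONDS, window[1]))
            window = None
        if window is None:
            window = (rec["ts"], [])
            open_windows[key] = window
        window[1].append(rec)
    # No timer in this sketch, so flush whatever is still open at the end.
    for key, (start, recs) in open_windows.items():
        closed.append((key, start + WINDOW_SECONDS, recs))
    return closed


records = [
    {"child_id": 1, "category_id": 888815186, "ts": 0},
    {"child_id": 1, "category_id": 1, "ts": 10},
    {"child_id": 1, "category_id": 888815186, "ts": 600},
    {"child_id": 2, "category_id": 888815186, "ts": 700},
    {"child_id": 1, "category_id": 888815186, "ts": 900},
]
result = batch_sessions(records, 888815186)
for child_id, close_time, recs in result:
    print(child_id, close_time, len(recs))
```

Because the window is keyed by child_id, one child's session is never split by another child's traffic, which is the concern raised about Firehose.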

Pass Mapping value to Session Email Task

I have 2 sources: Oracle and SQL Server. I need to extract CustomerID from both and match them. I need 2 outputs:
Number of CustomerID from Oracle
Number of CustomerID matching between Oracle and SQL Server.
Then, generate report and send it through mail to user.
Source - Oracle
Source - MS SQL
Joiner (Detail outer join with oracle)
Router
Group 1: CustomerID(Oracle) is not null and CustomerID(SQL Server) is null
Group 2: CustomerID from both not null
AGG transformation after both group to get count
Union to merge it
Load into target file
Now I will have to use a shell script to prepare the mail and send it to the user.
Is there a simpler way to do this, such as assigning the count to a workflow variable and then using it in an Email task?
Go to the workflow:
Open the session task and navigate to the Components tab.
Edit the on success email and set its type to non-reusable.
Click the edit button in the Value column.
Click the edit button next to the email text.
Enter "%l". This inserts the count of loaded records into the email body.

Issue related to fully distributed deployment of siddhi query with join operation

I have set up a 2 node worker and 1 node manager siddhi cluster.
Following is the query I tried pushing to the manager. Everything seems to work fine when there is no join in the query, but with a join, as in the query below, the app gets deployed on the worker nodes yet no events seem to come out.
@app:name("rule_1")
@source(
type="kafka",
topic.list="test-input-topic",
group.id="test-group",
threading.option="single.thread",
bootstrap.servers="localhost:9092",
@Map(type="json"))
define stream TempStream (deviceID string,roomNo string,temp int );
@sink(
type="kafka",
topic="test-output-topic",
bootstrap.servers="localhost:9092",
@Map(type="json"))
define stream OutStream (message string, message1 string, message2 double);
@info(name = "query1")
@dist(execGroup="group1")
from TempStream[deviceID=="rule_1" and temp>10]#window.time(5 sec)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream1;
@info(name = "query2")
@dist(execGroup="group2")
from TempStream[deviceID=="rule_1" and temp<10]#window.time(5 sec)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream2;
@info(name = "query3")
@dist(execGroup="group3")
from AvgTempStream1#window.length(1) as stream1 join AvgTempStream2#window.length(1) as stream2
select stream1.deviceID as message,stream1.roomNo as message1, stream1.avgTemp as message2
having stream1.avgTemp>stream2.avgTemp
insert into outputStream;
@info(name = "query4")
@dist(execGroup="group4")
from AvgTempStream1[deviceID=="rule_1"]
select deviceID as message, roomNo as message1, avgTemp as message2
insert into OutStream;
Event being passed
{"event":{"deviceID":"rule_1","roomNo":"123","temp":12}}
With the Siddhi app given above, you need to send two input events for it to produce an output: one event with temp>10 and another with temp<10. For example:
{"event":{"deviceID":"rule_1","roomNo":"123","temp":12}}
{"event":{"deviceID":"rule_1","roomNo":"123","temp":8}}
This makes sure that the join happens and an event is emitted. For troubleshooting purposes, you can subscribe to the intermediate Kafka topics using a Kafka consumer. The names of the intermediate topics follow the format SiddhiAppName.StreamName, for example rule_1.AvgTempStream1.
Hope this helps!!
Thanks,
Tishan
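The reason a single event produces no output can be illustrated with a small Python simulation. It is a deliberately simplified model, not the Siddhi runtime: each temperature is treated as its own average (the 5-second time windows and expired events are ignored), and the function name `simulate` is hypothetical.

```python
from typing import Dict, List, Optional


def simulate(temps: List[int]) -> List[Dict]:
    """Feed temperatures through a simplified model of query1 to query3."""
    last_high: Optional[float] = None  # stands in for AvgTempStream1#window.length(1)
    last_low: Optional[float] = None   # stands in for AvgTempStream2#window.length(1)
    outputs: List[Dict] = []
    for temp in temps:
        if temp > 10:
            last_high = float(temp)
        elif temp < 10:
            last_low = float(temp)
        else:
            continue  # temp == 10 matches neither filter
        # The join can only produce a row once both sides hold an event,
        # and the having clause requires the high side to be greater.
        if last_high is not None and last_low is not None and last_high > last_low:
            outputs.append({"high": last_high, "low": last_low})
    return outputs


print(simulate([12]))     # only one side of the join is populated
print(simulate([12, 8]))  # both sides populated, so the join fires
```

With only the temp=12 event, the second window stays empty and the join has nothing to match against.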

Regex QueryString Parsing for a specific parameter in BigQuery

So last week I was able to begin to stream my Appengine logs into BigQuery and am now attempting to pull some data out of the log entries into a table.
The data in protoPayload.resource is the page requested with the querystring parameters included.
The contents of protoPayload.resource looks like the following examples:
/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56
I am getting close, but when there is another entry before device_ID, I am not getting it. As you can see, I am not great with regex, but it is the only way I can think of to parse the data in the query. To get just the device ID from the first example, I was able to use the following query, which works great. My next challenge is to get the data when the second parameter exists. The device IDs can vary in length from about 10 to 26 characters.
SELECT
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs
What I would like is just the values from the querystring device_ID such as:
123456
78ec9b4a56
Assuming you have just 1 query string per record then you can do this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable
The part within the parentheses will be captured and returned in the result.
If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable
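The second pattern can be checked outside BigQuery with a short Python sketch. Python's `re` module is a different regex engine from the one BigQuery uses, but this simple pattern behaves the same way in both. The helper name `extract_device_id` and the third sample URL are made up for this example.

```python
import re
from typing import Optional

# Capture everything after "device_ID=" up to the next "&" or end of string.
DEVICE_ID = re.compile(r"device_ID=([^&]*)")


def extract_device_id(resource: str) -> Optional[str]:
    """Return the device_ID value from a request path, or None if absent."""
    match = DEVICE_ID.search(resource)
    return match.group(1) if match else None


print(extract_device_id("/service.html?device_ID=123456"))
print(extract_device_id("/service.html?v=2&device_ID=78ec9b4a56"))
# Hypothetical case: device_ID followed by another parameter.
print(extract_device_id("/service.html?device_ID=abc123&v=2"))
```

Stopping at the next "&" rather than at the end of the string is what makes the pattern independent of the parameter's position.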
One approach is to split protoPayload.resource into multiple service entries, and then apply regexp - this way it will support arbitrary number of device_id, i.e.
select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select
'/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
as protoPayload.resource))

How to update a stream with the response from another stream where the sink type is "http-response"

I am trying to enrich my input stream with an additional attribute that gets populated via an "http-response" sink.
I have tried using a join with a window, and a pattern with the "every" keyword, to merge the two streams and insert the merged result into another stream to enrich it.
The windows (window.time(1 sec) or window.length(1)) and the "every" keyword work well when the incoming events arrive at a regular interval of 1 second or more.
When many events (say 10 or 100) are sent at the same time (within a second), the result of the merge is not what I expect.
The one with "window" attribute (join)
from EventInputStreamOne#window.time(1 sec) as i
join EventInputStreamTwo as s
on i.variable2 == s.variable2
select i.variable1 as variable1, i.variable2 as variable2, s.variable2 as variable2
insert into EventOutputStream;
The one with the "every" keyword
from every e1=EventInputStream,e2=EventResponseStream
select e1.variable1 as variable1, e1.variable2 as variable2, e2.variable3 as variable3
insert into EventOutputStream;
Is there any better way to merge the two streams in order to update a third stream?
To get the original request attributes, you can use custom mapping as follows,
@source(type='http-call-response', sink.id='source-1',
@map(type='json',@attributes(name='name', id='id', volume='trp:volume', price='trp:price')))
define stream responseStream(name String, id int, headers String, volume long, price float);
Here, the request attributes can be accessed with trp:attributeName. In this sample, only name is from the response; price and volume are from the request.
The syntax in your 'every' keyword approach isn't quite right. Have you tried something like this:
from every (e1 = event1) -> e2=event2[e1.variable == e2.variable]
select e1.variable1, e2.variable1, e2.variable2
insert into outputEvent;
This document might help.
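The idea behind the corrected pattern, matching each response to its own request through a shared attribute instead of relying on arrival timing, can be sketched in Python. This is an illustration rather than Siddhi code. It assumes variable2 uniquely identifies a request, and the function name `correlate` and the sample values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def correlate(events: Iterable[Tuple[str, Dict]]) -> List[Dict]:
    """Merge requests with responses that share the same variable2.

    `events` is a sequence of ("request" | "response", payload) pairs in
    arrival order. Responses may arrive in any order relative to each other.
    """
    pending: Dict[str, Dict] = {}  # requests still waiting for a response
    enriched: List[Dict] = []
    for kind, payload in events:
        key = payload["variable2"]
        if kind == "request":
            pending[key] = payload
        elif key in pending:
            request = pending.pop(key)
            # Enrich the original request with the response attribute.
            enriched.append({**request, "variable3": payload["variable3"]})
    return enriched


burst = [
    ("request", {"variable1": "a", "variable2": "k1"}),
    ("request", {"variable1": "b", "variable2": "k2"}),
    ("response", {"variable2": "k2", "variable3": "r2"}),
    ("response", {"variable2": "k1", "variable3": "r1"}),
]
print(correlate(burst))
```

Because matching is done by key, a burst of requests within the same second still pairs each response with the right request, which is where the time-window and length-window joins break down.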