Siddhi Send Performance Issues - Embedded - wso2

We are evaluating Siddhi as an embedded CEP processor for our application. While scale testing, we found that as you increase the number of rules, the time it takes to insert an event for each new unique ID increases significantly. For example:
Create 10 rules (using windows and a partition by id).
Load 1000 unique entries at a time and track the timing. Note that insert time grows from milliseconds to many seconds as you approach 100K unique entries; the more rules you have, the worse this gets.
Now load the "next" time period for each record - insertion time remains constant regardless of ID.
Here is a code file which reproduces this:
// Imports for Siddhi 3.x (org.wso2.siddhi) and JUnit 4
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;
import org.junit.Test;
import org.wso2.siddhi.core.ExecutionPlanRuntime;
import org.wso2.siddhi.core.SiddhiManager;
import org.wso2.siddhi.core.event.Event;
import org.wso2.siddhi.core.stream.input.InputHandler;
import org.wso2.siddhi.core.stream.output.StreamCallback;

public class ScaleSiddhiTest {
private SiddhiManager siddhiManager = new SiddhiManager();
@Test
public void testWindow() throws InterruptedException {
String plan = "@Plan:name('MetricPlan') \n" +
"define stream metricStream (id string, timestamp long, metric1 double,metric2 double); \n" +
"partition with (id of metricStream) begin \n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule0' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule1' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule2' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule3' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule4' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule5' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule6' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule7' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule8' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule9' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule10' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule11' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule12' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule13' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule14' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule15' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule16' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule17' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric1) as value, 'Metric1-rule18' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"\n" +
"from metricStream#window.externalTime(timestamp, 300000) \n" +
"select id, avg(metric2) as value, 'Metric2-rule19' as ruleName\n" +
"having value>-1.000000 \n" +
"insert into outputStream;\n" +
"end ;";
// Generating runtime
ExecutionPlanRuntime executionPlanRuntime = siddhiManager.createExecutionPlanRuntime(plan);
AtomicInteger counter = new AtomicInteger();
// Adding callback to retrieve output events from query
executionPlanRuntime.addCallback("outputStream", new StreamCallback() {
@Override
public void receive(Event[] events) {
counter.addAndGet(events.length);
}
});
// Starting event processing
executionPlanRuntime.start();
// Retrieving InputHandler to push events into Siddhi
InputHandler inputHandler = executionPlanRuntime.getInputHandler("metricStream");
int numOfUniqueItems = 10000;
IntStream.range(0, 2).forEach(curMinute->{
long iterationStartTime = System.currentTimeMillis();
AtomicLong lastStart = new AtomicLong(System.currentTimeMillis());
IntStream.range(0, numOfUniqueItems).forEach(id->{
try {
inputHandler.send(TimeUnit.MINUTES.toMillis(curMinute), new Object[]{String.valueOf(id), TimeUnit.MINUTES.toMillis(curMinute), 10.0, 20.0}); // id is declared as string in the stream definition
if( id > 0 && id % 1000 == 0 ){
long ls = lastStart.get();
long curTime = System.currentTimeMillis();
lastStart.set(curTime);
System.out.println("It took " + (curTime - ls) + " ms to load the last 1000 entities. Num Alarms So Far: " + counter.get());
}
} catch (Exception e ){
throw new RuntimeException(e);
}
});
System.out.println("It took " + (System.currentTimeMillis() - iterationStartTime) + "ms to load the last " + numOfUniqueItems);
});
// Shutting down the runtime
executionPlanRuntime.shutdown();
siddhiManager.shutdown();
}
}
Here are my questions:
Are we doing anything incorrect here that may be leading to the initial load performance issues?
Any recommendations to work around this problem?
UPDATE:
Per a suggested answer below, I updated the test to use group by instead of partitions. The same growth is shown for the initial load of each object, except it is even worse:
Specifically, I changed the rules to:
@Plan:name('MetricPlan')
define stream metricStream (id string, timestamp long, metric1 double,metric2 double);
from metricStream#window.externalTime(timestamp, 300000)
select id, avg(metric1) as value, 'Metric1-rule0' as ruleName
group by id
having value>-1.000000
insert into outputStream;
...
Here are the result outputs for the Group By vs Partition By. Both show the growth for the initial load.
Group By Load Results
Load 10K Items - Group By
It took 3098 ms to load the last 1000 entities. Num Alarms So Far: 20020
It took 2507 ms to load the last 1000 entities. Num Alarms So Far: 40020
It took 5993 ms to load the last 1000 entities. Num Alarms So Far: 60020
It took 4878 ms to load the last 1000 entities. Num Alarms So Far: 80020
It took 6079 ms to load the last 1000 entities. Num Alarms So Far: 100020
It took 8466 ms to load the last 1000 entities. Num Alarms So Far: 120020
It took 11840 ms to load the last 1000 entities. Num Alarms So Far: 140020
It took 12634 ms to load the last 1000 entities. Num Alarms So Far: 160020
It took 14779 ms to load the last 1000 entities. Num Alarms So Far: 180020
It took 87053ms to load the last 10000
Load Same 10K Items - Group By
It took 31 ms to load the last 1000 entities. Num Alarms So Far: 220020
It took 22 ms to load the last 1000 entities. Num Alarms So Far: 240020
It took 19 ms to load the last 1000 entities. Num Alarms So Far: 260020
It took 19 ms to load the last 1000 entities. Num Alarms So Far: 280020
It took 17 ms to load the last 1000 entities. Num Alarms So Far: 300020
It took 20 ms to load the last 1000 entities. Num Alarms So Far: 320020
It took 17 ms to load the last 1000 entities. Num Alarms So Far: 340020
It took 18 ms to load the last 1000 entities. Num Alarms So Far: 360020
It took 18 ms to load the last 1000 entities. Num Alarms So Far: 380020
It took 202ms to load the last 10000
Partition By Load Results
Load 10K Items - Partition By
It took 1148 ms to load the last 1000 entities. Num Alarms So Far: 20020
It took 1870 ms to load the last 1000 entities. Num Alarms So Far: 40020
It took 1393 ms to load the last 1000 entities. Num Alarms So Far: 60020
It took 1745 ms to load the last 1000 entities. Num Alarms So Far: 80020
It took 2040 ms to load the last 1000 entities. Num Alarms So Far: 100020
It took 2108 ms to load the last 1000 entities. Num Alarms So Far: 120020
It took 3068 ms to load the last 1000 entities. Num Alarms So Far: 140020
It took 2798 ms to load the last 1000 entities. Num Alarms So Far: 160020
It took 3532 ms to load the last 1000 entities. Num Alarms So Far: 180020
It took 23363ms to load the last 10000
Load Same 10K Items - Partition By
It took 39 ms to load the last 1000 entities. Num Alarms So Far: 220020
It took 21 ms to load the last 1000 entities. Num Alarms So Far: 240020
It took 30 ms to load the last 1000 entities. Num Alarms So Far: 260020
It took 22 ms to load the last 1000 entities. Num Alarms So Far: 280020
It took 35 ms to load the last 1000 entities. Num Alarms So Far: 300020
It took 26 ms to load the last 1000 entities. Num Alarms So Far: 320020
It took 25 ms to load the last 1000 entities. Num Alarms So Far: 340020
It took 34 ms to load the last 1000 entities. Num Alarms So Far: 360020
It took 48 ms to load the last 1000 entities. Num Alarms So Far: 380020
It took 343ms to load the last 10000
This type of growth almost seems to imply that when an ID that has not been seen before is loaded, it is compared against every other ID instead of being looked up via a hash, hence the linear growth we see as the number of unique IDs increases.

Yes, the behavior is as expected.
When you use a partition with ID, a new instance of the partition is created for each new ID you push, so if your partition is big it takes more time to create each instance. The second time around, the partition instance already exists for that unique ID, so it processes faster.
In your case, I don't think using a partition is an ideal solution. A partition is only useful if you have inner streams or when you use non-time-based windows.
E.g.
partition with (id of metricStream)
begin
from metricStream ...
insert into #TempStream ;
from #TempStream ....
select ...
insert into outputStream;
end;
If you just want to group time-based aggregations, use the group by keyword.
from metricStream#window.externalTime(timestamp, 300000)
select id, avg(metric1) as value, 'Metric1-rule18' as ruleName
group by id
having value>-1.000000
insert into outputStream;

Related

Select Today's date using BigQuery

I'm using the Google Cloud SDK (command line) via C# and I want to select the information for the current date (today).
The select is working, but I'm not able to get the rows for the latest date from the DATE column.
Below is the query I'm using:
var table = client.GetTable("projectId", "datasetId", "table");
var sql = $"" +
$"SELECT " +
$"sku, " +
$"FROM {table} " +
$"WHERE DATE=CurrentDate('America/Sao_Paulo') " +
$"LIMIT 10";
Schema: SKU - String
DATE - Timestamp
Try to use CURRENT_DATE instead of CurrentDate
var table = client.GetTable("projectId", "datasetId", "table");
var sql = $"" +
$"SELECT " +
$"sku, " +
$"FROM {table} " +
$"WHERE DATE=CURRENT_DATE('America/Sao_Paulo') " +
$"LIMIT 10";

Siddhi - Fetching from Event tables, which are not updated within certain time

In a Siddhi query, I am importing two streams, S1 and S2. When an event arrives on S1, I insert it into event table T1; when an event arrives on S2, I update T1 based on the id and also send the updated values from the table to output stream O1.
As part of the requirement, I need to get the rows of table T1 that were inserted more than 5 minutes ago (i.e., records that have resided in the table for more than 5 minutes) and send them to another output stream O2.
@info(name = 'S1')
from S1
select id, srcId, 'null' as msgId, 'INP' as status
insert into StatusTable;
@info(name = 'S2')
from S2#window.time(1min) as g join StatusTable[t.status == 'INP'] as t
on ( g.srcId == t.id)
select t.id as id, g.msgId as msgId, 'CMP' as status
update StatusTable on TradeStatusTable.id == id;
@info(name = 'Publish')
from S2 as g join StatusTable[t.status == 'CMP'] as t on ( g.srcId == t.id and t.status == 'CMP')
select t.id as id, t.msgId as msgId, t.status as status
insert into O1;
How do I add a query to this existing plan to fetch the records from the StatusTable that have resided there for more than 5 minutes? Since the table cannot be used alone, I need to join it with a stream; how do I handle this scenario?
String WebAttackSuccess = "" +
"#info(name = 'found_host_charged1') "+
"from ATDEventStream[ rid == 10190001 ]#window.timeBatch(10 sec) as a1 "+
"join ATDEventStream[ rid == 10180004 ]#window.time(10 sec) as a2 on a2.src_ip == a1.src_ip and a2.dst_ip == a1.dst_ip " +
" select UUID() as uuid,1007 as cid,a1.sensor_id as sensor_id,a1.interface_id as interface_id,a1.other_id as other_id,count(a1.uuid) as event_num,min(a1.timestamp) as first_seen,max(a2.timestamp) as last_seen,'' as IOC,a1.dst_ip as victim,a1.src_ip as attacker,a1.uuid as NDE4,sample:sample(a2.uuid) as Sample_NDE4 " +
" insert into found_host_charged1;"+
""+
"#info(name = 'found_host_charged2') "+
"from every a1 = found_host_charged1 " +
"-> a2 = ATDEventStream[dns_answers != ''] "+
"within 5 min "+
"select UUID() as uuid,1008 as cid,a2.sensor_id as sensor_id,a2.interface_id as interface_id,a2.other_id as other_id,count(a2.uuid) as event_num,a1.first_seen as first_seen,max(a2.timestamp) as last_seen,a2.dns_answers as IOC,a2.dst_ip as victim,a2.src_ip as attacker,a1.uuid as NDE5,sample:sample(a2.uuid) as Sample_NDE5 " +
"insert into found_host_charged2; ";
This is part of my work; I use two streams. Maybe you can get the data from StatusTable in your second stream. If that does not resolve it, you can change StatusTable to S1.
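Another option, not shown in the answer above, is to store an insertion timestamp in the table and drive the 5-minute check with a Siddhi trigger, which gives you a stream to join against the table. This is only a sketch: the insertedAt attribute, the table definition, and the time:timestampInMilliseconds() function (from the siddhi-execution-time extension) are assumptions you would need to adapt to your actual schema.
define trigger MinuteTrigger at every 1 min;
define table StatusTable (id string, srcId string, msgId string, status string, insertedAt long);

@info(name = 'S1')
from S1
select id, srcId, 'null' as msgId, 'INP' as status, time:timestampInMilliseconds() as insertedAt
insert into StatusTable;

@info(name = 'ExpiredRows')
from MinuteTrigger join StatusTable as t
on (MinuteTrigger.triggered_time - t.insertedAt) > 300000
select t.id as id, t.msgId as msgId, t.status as status
insert into O2;
The trigger fires every minute and exposes a triggered_time attribute, so the join condition keeps only rows that have been sitting in the table for more than 300000 ms.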

How do I keep special characters (from a referenced page item) from breaking a mailto button?

I have a dynamic action on a button that runs a small bit of JavaScript. Basically, it acts as a mailto link and adds some of the page items to the body of the email. It works for the most part, but I have noticed that if the value of a page item contains an &, the email body cuts off at that point in the text. This is what I currently have:
var policy_num = $v('P9_POLICY');
var tclose = $v('P9_TDATE');
var taskt = $v('P9_TYPE');
var taskd = $v('P9_DESC');
var audito = $v('P9_TASK_AUDIT_OUTCOME');
var auditc = $v('P9_NOTE');
location.href= "mailto:" +
"?subject=" + "Please take immediate action" +
"&body="+
"%0APolicy: " + policy_num +
"%0ATask Closed: " + tclose +
"%0ATask Type: " + taskt +
"%0ATask Description: " + taskd +
"%0AAudit Outcome: " + audito +
"%0AAudit Comment: " + auditc ;
If there is a better way to accomplish this kind of mailto function, I would definitely be open to that. This is just the first way I found that actually worked. Thanks!
If your items contain an & character, it will act as a control character (& separates the parameters of the mailto URL). You need to escape it so it won't be interpreted as a control character anymore.
It should look something like this:
var policy_num = escape($v('P9_POLICY'));
var tclose = escape($v('P9_TDATE'));
var taskt = escape($v('P9_TYPE'));
var taskd = escape($v('P9_DESC'));
var audito = escape($v('P9_TASK_AUDIT_OUTCOME'));
var auditc = escape($v('P9_NOTE'));
location.href= "mailto:" +
"?subject=" + "Please take immediate action" +
"&body="+
"%0APolicy: " + policy_num +
"%0ATask Closed: " + tclose +
"%0ATask Type: " + taskt +
"%0ATask Description: " + taskd +
"%0AAudit Outcome: " + audito +
"%0AAudit Comment: " + auditc ;

How can I use C++ to update an SQLite row relative to its original value?

I am trying to update a row in a table in an SQLite database using C++, but I want to update it relative to its current value.
This is what I have tried so far:
int val=argv[2];
string bal = "UPDATE accounts SET balance = balance + " + argv[1] + "WHERE account_id = " + bal + argv[2];
if (sqlite3_open("bank.db", &db) == SQLITE_OK)
{
sqlite3_prepare( db, balance.c_str(), -1, &stmt, NULL );//preparing the statement
sqlite3_step( stmt );//executing the statement
}
So that the first parameter is the account_id, and the second parameter is the current balance.
However, this does not work. What can I do to have the database successfully update?
Thank you!
EDIT: Sorry for the confusion. The primary situation is having a table with many entries, each with a unique account id. For example, one has an id of 1 with a balance of 5.
If I run this program with the parameters "1 5", the balance should now be 10. If I run it again with "1 7", it should be 17.
You cannot use the + operator to concatenate two C-style strings (char pointers); at least one operand has to be a std::string. A quick and dirty fix:
string bal = string("UPDATE accounts SET balance = balance + ") + argv[1] + string( " WHERE account_id = " ) + argv[2];
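Building on that, splicing argv values into the SQL text also leaves the statement open to injection and conversion errors, so binding the parameters is generally safer. The following is only a sketch, assuming the argument order from the question's edit (argv[1] = account id, argv[2] = amount to add), not a drop-in replacement for the original program:
#include <cstdlib>
#include <iostream>
#include <sqlite3.h>

int main(int argc, char* argv[])
{
    if (argc < 3) {
        std::cerr << "usage: bank <account_id> <amount>\n";
        return 1;
    }

    sqlite3* db = nullptr;
    if (sqlite3_open("bank.db", &db) != SQLITE_OK) {
        std::cerr << "open failed: " << sqlite3_errmsg(db) << "\n";
        return 1;
    }

    // Placeholders keep the argv values out of the SQL text entirely.
    const char* sql = "UPDATE accounts SET balance = balance + ?1 WHERE account_id = ?2";
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK) {
        sqlite3_bind_double(stmt, 1, std::atof(argv[2]));          // amount to add
        sqlite3_bind_text(stmt, 2, argv[1], -1, SQLITE_TRANSIENT); // account id
        if (sqlite3_step(stmt) != SQLITE_DONE) {
            std::cerr << "update failed: " << sqlite3_errmsg(db) << "\n";
        }
    }
    sqlite3_finalize(stmt); // release the prepared statement
    sqlite3_close(db);
    return 0;
}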

How do I check offline whether given lat/lon data is in China or not?

Can I check whether given lat/lon GPS data is in China or not?
If the user is in China, then:
STEP 1. Check the lat/lon in an offline function.
If true, request the Google Geocoding API for China, like this:
var chinaGoogleGeocoding="http://maps.google.cn/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key;
Otherwise:
var googleGeocoding="https://maps.googleapis.com/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key;
Could you advise me?
I checked that http://maps.google.cn works fine from other countries, but I want to use it only when the user is in China.
http://www.latlong.net
I think China's latitude runs from 37 to 47 and its longitude from 110 to 123,
so...
if((pos.coords.latitude>=37 && pos.coords.latitude<=47) && (pos.coords.longitude>=110 && pos.coords.longitude<=123))
{
//china
var chinaGoogleGeocoding="http://maps.google.cn/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key;
}else {
var googleGeocoding="https://maps.googleapis.com/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key;
}
Is it right?
Usually a country region can be defined by something like GeoJSON.
Once you have a defined region to work with, you can compute whether the coordinate you have received is inside the defined region for China.
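A minimal sketch of that idea, using the standard ray-casting point-in-polygon test. The chinaPolygon variable (an array of [lon, lat] pairs taken from a GeoJSON Polygon's outer ring) is assumed to be loaded elsewhere; real country borders are MultiPolygons, so a library such as Turf.js is usually a better fit than hand-rolled code:
// Ray-casting test: count how many polygon edges a horizontal ray from the point crosses
function pointInPolygon(lon, lat, polygon) {
    var inside = false;
    for (var i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
        var xi = polygon[i][0], yi = polygon[i][1];
        var xj = polygon[j][0], yj = polygon[j][1];
        var crosses = ((yi > lat) !== (yj > lat)) &&
            (lon < (xj - xi) * (lat - yi) / (yj - yi) + xi);
        if (crosses) inside = !inside;
    }
    return inside;
}

var inChina = pointInPolygon(pos.coords.longitude, pos.coords.latitude, chinaPolygon);
var geocodingUrl = inChina
    ? "http://maps.google.cn/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key
    : "https://maps.googleapis.com/maps/api/geocode/json?language=en&latlng=" + pos.coords.latitude + "," + pos.coords.longitude + "&key=" + Google_API_Key;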