CEP query based on date / time of the day - WSO2

In WSO2 CEP, I made an execution plan that includes the following query:
(it fires when the temperature exceeds 20 degrees more than three times within 10 seconds)
from MQTTstream[meta_temperature > 20]#window.time(10 sec)
select count(meta_temperature) as meta_temperature
having meta_temperature > 3
insert into out_temperatureAlarm
How can I ensure that the query is only applied during a specific time of day, e.g. from 08:00 until 10:00?
Is there something that I could put into the query like:
having meta_temperature > 3 and HOUR_OF_THE_DAY BETWEEN 8 and 10

You can use a cron window (#window.cron) instead of a time window (#window.time). You can specify a cron expression string for the desired time periods in Siddhi [1]. Please refer to the Quartz scheduler documentation for more information on cron expression strings [2].
[1] https://docs.wso2.com/display/CEP400/Inbuilt+Windows#InbuiltWindows-croncron
[2] http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger
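For example, the original query could be sketched with a cron window roughly as follows; this is only a sketch, and the exact window semantics should be checked against the Siddhi docs (a cron window releases the accumulated events at the times the Quartz expression fires, rather than sliding continuously like #window.time):

```
from MQTTstream[meta_temperature > 20]#window.cron('0/10 * 8-9 * * ?')
select count(meta_temperature) as meta_temperature
having meta_temperature > 3
insert into out_temperatureAlarm;
```

Here '0/10 * 8-9 * * ?' is a Quartz expression (fields: seconds, minutes, hours, day-of-month, month, day-of-week) intended to fire every 10 seconds between 08:00 and 09:59.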

Related

Cloud scheduler trigger on the first monday of each month

I'm trying to schedule a job to trigger on the first Monday of each month:
This is the cron expression I got: 0 5 1-7 * 1
(which, as far as I can read unix cron expressions, triggers at 5:00 am on a Monday if it happens to fall within the first 7 days of the month)
However, the job is triggered on what seem to be random days at 5:00 am; it was triggered today, on the 16th of August!
Am I reading the expression awfully wrong? BTW, I'm setting the timezone to AEST, if that makes a difference.
You can use the legacy cron syntax to describe the schedule.
For your case, specify something like the below:
"first monday of month 05:00"
Do explore the "Custom interval" tab in the Cloud Scheduler documentation to get a better understanding of this.
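As an aside on why the unix-cron attempt misfired: when both the day-of-month and day-of-week fields are restricted, standard cron combines them with OR, so 0 5 1-7 * 1 fires at 5:00 am on days 1-7 of the month *and* on every Monday, which explains runs like the one on the 16th. If you need to stay on unix-cron syntax, a common workaround is to leave day-of-week unrestricted and test it in the command instead; a sketch assuming GNU date and a placeholder job path:

```shell
# crontab entry (sketch): fire at 05:00 on days 1-7, run the job only on Mondays
# (% must be escaped as \% inside a crontab line):
# 0 5 1-7 * * [ "$(date +\%u)" = "1" ] && /path/to/job

# %u prints the ISO weekday number (Monday = 1), which the guard above tests:
date -d "2024-01-01" +%u   # 2024-01-01 was a Monday, so this prints 1
```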

Configure a cron schedule for different intervals (hour and days)

Hello, I am configuring jobs in GCP following the Google Cloud guide: https://cloud.google.com/scheduler/docs/configuring/cron-job-schedules#defining_the_job_schedule
I have to configure a job which will be executed once on weekdays at 6 am and once on weekends at 5 am. I am not sure whether it is possible to cover several time intervals with something like an AND statement:
0 6 * * 1-5 # Monday to Friday at 6 am.
0 5 * * 0,6 # Saturday and Sunday at 5 am.
How can I combine these intervals? Besides these I also need to add other ones, but I am not sure how to do this.
Thanks.
You can't combine them in one cron entry; the times don't match in any way a single expression can describe. Of course, with more jobs this can sometimes be possible (depending on the intervals), but here you need two separate jobs.
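So the weekday and weekend runs need to be two separate Cloud Scheduler jobs, one per cron expression. As a sanity check of the intended semantics (weekdays at 06:00, weekends at 05:00), here is a minimal Python sketch; the matches helper is purely illustrative, not any Cloud Scheduler API:

```python
from datetime import datetime

def matches(minute, hour, weekdays, dt):
    """True if dt matches a cron spec with a fixed minute and hour and a
    set of allowed cron weekdays (cron numbering: Sunday = 0)."""
    cron_dow = (dt.weekday() + 1) % 7  # Python: Monday = 0 -> cron: Monday = 1
    return dt.minute == minute and dt.hour == hour and cron_dow in weekdays

# Mirrors a "weekdays at 06:00" schedule:
weekday_job = lambda dt: matches(0, 6, {1, 2, 3, 4, 5}, dt)
# Mirrors a "weekends at 05:00" schedule:
weekend_job = lambda dt: matches(0, 5, {0, 6}, dt)

print(weekday_job(datetime(2024, 1, 1, 6, 0)))  # Monday 06:00 -> True
print(weekend_job(datetime(2024, 1, 6, 5, 0)))  # Saturday 05:00 -> True
print(weekday_job(datetime(2024, 1, 6, 5, 0)))  # Saturday 05:00 -> False
```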

AWS IoT Analytics Delta Window

I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.
I am trying to set it up so that a query runs every hour to get only the last hour of data. According to the docs, the schedule feature can be used to run the query using a cron expression (in my case every hour), and the delta window should restrict my query to only include records that fall in the specified time window (in my case the last hour).
The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore and if I don't include any delta window I get the records as expected. Unfortunately when I include a delta expression I get nothing (ever). I left the data accumulating for about 10 days now so there are a couple of million records in the database. Given that I was unsure what the optimal format would be I have included the following temporal fields in the entries:
datetime : 2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)
timestamp_sec : 1557883766
(A unix epoch expressed in seconds)
timestamp_milli : 1557883766509
(A unix epoch expressed in milliseconds)
There is also a value automatically added by AWS called __dt, which uses the same format as my datetime except that it is only accurate to within 1 day, i.e. all values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00).
I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto, as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics, so it makes sense that they would use it for the delta, but the docs simply say '... any valid SQL expression'.
Expressions I have tried so far with no luck:
from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(timestamp_sec) as date)
cast(from_unixtime(timestamp_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)
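One detail worth checking in that list: Presto's from_unixtime expects seconds, so passing timestamp_milli to it directly produces a timestamp tens of thousands of years in the future, which a delta window can never match; the milliseconds would need to be divided by 1000 first (i.e. from_unixtime(timestamp_milli / 1000) — a hedged suggestion, since the docs don't confirm which engine evaluates the expression). A quick stdlib sanity check using the sample values above, assuming they are UTC:

```python
from datetime import datetime, timezone

timestamp_sec = 1557883766
timestamp_milli = 1557883766509

# Interpreted as seconds, the epoch matches the question's 'datetime' field:
print(datetime.fromtimestamp(timestamp_sec, tz=timezone.utc).isoformat())
# -> 2019-05-15T01:29:26+00:00

# Milliseconds have to be scaled down to seconds first:
print(datetime.fromtimestamp(timestamp_milli / 1000, tz=timezone.utc).year)  # -> 2019

# Treating milliseconds as seconds lands roughly 49,000 years out:
print(1970 + timestamp_milli // (365 * 24 * 3600))  # -> 51370
```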
What are the offset and time expression parameters that you are using?
Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.
Namely, applying a delta window filter with -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to an equivalent query:
SELECT my_field FROM
(SELECT * FROM "my_datastore" WHERE
(__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)
Try using a similar query (with no delta time filter) with the correct values for offset and time expression and see what you get. The (__dt between ...) clause is just an optimization for limiting the scanned partitions; you can remove it for the purposes of troubleshooting.
Please try the following:
Set query to SELECT * FROM dev_iot_analytics_datastore
Data selection filter:
Data selection window: Delta time
Offset: -1 Hours
Timestamp expression: from_unixtime(timestamp_sec)
Wait for the dataset content to run for a bit, say 15 minutes or more.
Check the contents.
After several weeks of testing and trying all the suggestions in this post, along with many more, it appears that the extremely technical answer was to 'switch it off and back on again'. I deleted the whole analytics stack, rebuilt everything with different names, and it now seems to be working!
It's important to note that, even though I have flagged this as the correct answer because it reflects the actual resolution, both the answers provided by @Populus and @Roger would have been correct had my deployment been functioning as expected.
I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.

AWS Data Pipeline scheduling with expressions and date functions

I want to schedule an AWS Data Pipeline job hourly, and I would like to use it to create hourly partitions on S3. Something like:
s3://my-bucket/2016/07/19/09/
s3://my-bucket/2016/07/19/10/
s3://my-bucket/2016/07/19/11/
I am using expressions for my EMRActivity for this:
s3://my-bucket/#{year(minusHours(#scheduledStartTime,1))}/#{month(minusHours(#scheduledStartTime,1))}/#{day(minusHours(#scheduledStartTime,1))}/#{hour(minusHours(#scheduledStartTime,1))}
However, the hour and month functions give me values such as 7 for July instead of 07, and 3 for the 3rd hour instead of 03. I would like to get the months and hours zero-padded (when required).
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-reference-functions-datetime.html
You can use the format function to get hours and months in the format you want:
#{format(myDateTime,'YYYY-MM-dd HH:mm:ss')}
Refer to this link for more details: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-reference-functions-datetime.html
In your case, to display the hour zero-padded, this should work (note that 'HH' gives the 24-hour hour of day, whereas 'hh' is the 1-12 clock hour):
#{format(minusHours(#scheduledStartTime,1),'HH')}
You can replace 'HH' with 'MM' to get months zero-padded.
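Putting it together, the hourly S3 partition path from the question could then be written with a single zero-padded format call; this is only a sketch (the format patterns are Joda-Time style per the Data Pipeline reference, and @scheduledStartTime is the runtime field the expression is assumed to read):

```
s3://my-bucket/#{format(minusHours(@scheduledStartTime,1),'YYYY/MM/dd/HH')}
```

Literal characters such as the slashes pass through the pattern unchanged, yielding paths like s3://my-bucket/2016/07/19/09.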

Using Siddhi patterns for events that haven't happened

In the CEP engine, can I look for patterns of events that haven't occurred?
Editing the fraud pattern detection query: can I fire an event if a purchase of more than $10 is made on a card and it is not followed within one day by a purchase of more than $10,000 on the same card?
from every (a1 = purchase[price > 10]) NOT -> a2 = purchase[price > 10000 and a1.cardNo == a2.cardNo] within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into potentialFraud;
That is: fire if event1 hasn't been followed by event2 within the last hour, rather than firing if event1 has been followed by event2 within the last hour?
Non-occurrences are not supported as of CEP 3.1.0 (but they will be available in the next version, 4.0.0).
However, your use case can be implemented in an alternative way. Since you want to find the occurrence of at least one event with price > 10, and no events with price > 10000, per card number in the last hour, you can do something like the following:
Add a filter that keeps events with price > 10.
Send them to a time window (of 1 hour).
In the time window, use the max() aggregate function (with a group by on card number) and emit the max value with the output events.
In a second filter, check for max < 10000.
This will match cards with one or more events priced above 10, but none above 10000, in the last hour.
You'll find the below documentation useful for implementing this:
https://docs.wso2.com/display/CEP310/Windows
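A sketch of those steps in the Siddhi query language; the intermediate stream name is illustrative, and the exact syntax should be checked against the CEP 3.1.0 docs:

```
from purchase[price > 10]#window.time(1 hour)
select cardNo, max(price) as maxPrice
group by cardNo
insert into cardMaxima;

from cardMaxima[maxPrice < 10000]
select cardNo, maxPrice
insert into potentialFraud;
```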