I want to put an arrival timestamp on the stream as soon as it arrives at the WSO2 IoT server and an ingestion timestamp when it is consumed by the CEP engine. These times will be used to compute queuing latency and CEP latency as follows.
Queuing latency = ingestion time - arrival time
CEP latency = detection time - ingestion time
Below is my execution plan
@Plan:name('Server_CEP')
@Plan:statistics('true')
@Plan:trace('true')
@Plan:async(bufferSize='1024')
@Import('stream2_scep:1.0.0')
define stream eeg_stream (meta_sensorID_s2 int, meta_tupleID_s2 int, value_s2 int, generationTime_s2 long);
@Import('stream1_scep:1.0.0')
define stream ecg_stream (meta_sensorID_s1 int, meta_tupleID_s1 int, value_s1 int, generationTime_s1 long);
@Export('cep_stream_scep:1.0.0')
define stream CEPStream (cep_event int, cepLatency long);
from every ecg = ecg_stream[value_s1 >= 50 ] -> eeg = eeg_stream[value_s2 >= 50] within 10 sec
select ecg.value_s1 as cep_event , convert(time:currentTimestamp(), 'long') - ecg.generationTime_s1 as cepLatency
insert into CEPStream;
I am able to find the detection time as the current time when a CEP event is detected. I am also using @Plan:async with a buffer size of 1024. The first issue is how to timestamp the arrival time of the stream as soon as it arrives. The second issue is how to add an engine ingestion timestamp.
Can someone tell me how can I achieve this?
PS: I was able to achieve this on an Android device by using a non-blocking queue: the arrival time was the time an event entered the FIFO queue, and the ingestion time was the time it was dequeued.
It's advisable not to use @Plan:async(bufferSize='1024'), as it applies to all streams associated with the Siddhi app. Instead, apply @async(buffer.size='1024') only to the streams you want to make async.
E.g.
@async(buffer.size='1024')
define stream <stream name> (...);
Now, to achieve what you have asked: make the initial stream non-async (sync), use that stream in a query that injects the current timestamp into each event, then send the result to another stream that is configured in async mode, and finally use the second stream for the rest of the processing. This way you will also be able to add the arrival time to the events in a synchronized way.
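For instance, a minimal sketch of this pattern in a single execution plan, using the stream-level @async annotation suggested above, might look like the following (the stream names rawStream and timestampedStream are only illustrative; the attributes are taken from the question's stream1_scep definition):
@Plan:name('sync_arrival_timestamping_sketch')
@Import('stream1_scep:1.0.0')
define stream rawStream (meta_sensorID_s1 int, meta_tupleID_s1 int, value_s1 int, generationTime_s1 long);
@async(buffer.size='1024')
define stream timestampedStream (meta_sensorID_s1 int, meta_tupleID_s1 int, value_s1 int, generationTime_s1 long, arrivalTime_s1 long);
from rawStream
select meta_sensorID_s1, meta_tupleID_s1, value_s1, generationTime_s1, convert(time:timestampInMilliseconds(), 'long') as arrivalTime_s1
insert into timestampedStream;
The rest of the processing (the pattern query and the CEPStream export) would then read from timestampedStream instead of the imported stream.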
I did this by creating an execution plan which receives a stream x, puts a timestamp on it, and sends it to another stream y. Example code for such an execution plan is:
@Plan:name('scep_s1_arrival_timestamping')
@Plan:statistics('false')
@Plan:trace('false')
@Import('stream1_scep:1.0.0')
define stream inputStream (meta_sensorID_s1 int, meta_tupleID_s1 int, value_s1 int, generationTime_s1 long);
@Export('stream1_scep:2.0.0')
define stream outputStream (meta_sensorID_s1 int, meta_tupleID_s1 int, value_s1 int, generationTime_s1 long, arrivalTime_s1 long);
from inputStream
select meta_sensorID_s1 as meta_sensorID_s1, meta_tupleID_s1 as meta_tupleID_s1, value_s1 as value_s1, generationTime_s1 as generationTime_s1, convert(time:timestampInMilliseconds(), 'long') as arrivalTime_s1
insert into outputStream;
The timestamping is done using convert(time:timestampInMilliseconds(), 'long') as arrivalTime_s1. Note that convert is used to cast the value to long, which is then inserted into the arrivalTime_s1 attribute.
I am trying to read records from a Kinesis stream after a particular timestamp in a Lambda function. I get the shards, the shard iterators, and then the data.
When I get the first iterator, I get the data and keep calling the same function recursively using NextShardIterator (present in the data returned). According to the documentation, the NextShardIterator will return null when there is no more data to read and it has reached $latest.
But it never returns null; the function keeps getting invoked, and eventually I get a ProvisionedThroughputExceededException.
I also tried using MillisBehindLatest to stop reading when the value is zero, but it also fails in some cases.
Is there a correct way to get the data from kinesis based on timestamp?
NextShardIterator will only return null when it reaches the end of a closed shard (for example, when the shard count is changed using UpdateShardCount, SplitShard, or MergeShards).
https://docs.amazonaws.cn/en_us/kinesis/latest/APIReference/API_GetRecords.html#API_GetRecords_ResponseSyntax
"NextShardIterator
The next position in the shard from which to start sequentially reading data records. If set to null, the shard has been closed and the requested iterator does not return any more data."
If you want to start reading the stream from a specified timestamp, the best way to do this would be to use an event source mapping with Lambda and specify the StartingPosition as AT_TIMESTAMP.
https://docs.aws.amazon.com/lambda/latest/dg/API_CreateEventSourceMapping.html#SSS-CreateEventSourceMapping-request-StartingPosition
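As a rough illustration (not part of the original answer), creating such a mapping with boto3 might look like this; the stream ARN, function name, and timestamp are placeholders:
import boto3
from datetime import datetime, timezone

lambda_client = boto3.client("lambda")

# Start reading the Kinesis stream at a specific point in time.
response = lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # placeholder ARN
    FunctionName="my-consumer-function",  # placeholder function name
    StartingPosition="AT_TIMESTAMP",
    StartingPositionTimestamp=datetime(2019, 1, 1, tzinfo=timezone.utc),  # placeholder timestamp
    BatchSize=100,
)
print(response["UUID"])  # identifier of the new event source mapping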
When I try to simulate an event stream with JMeter and use it as a source in Siddhi, it works for a little while but ends with RAM being exhausted, and the execution of the program stops.
I tried executing the code with a database, without a database, and using a partition to get the events one by one.
This is the stream code:
@source(type = 'http',
receiver.url='http://172.23.3.22:8007/insertSweetProduction',
basic.auth.enabled='false',
@map(type='json', @attributes(tipoDato='$.tipoDato', fecha='$.fecha', valor='$.valor', servicio='$.servicio')))
define stream insertSweetProduction (tipoDato string, fecha string, valor double, servicio string);
This is the sink stream:
@sink(type='file',
@map(type='json'),
@attributes(tipoDato='$.tipoDato', fecha='$.fecha', valor='$.valor', servicio='$.servicio'),
file.uri='/dev/null')
define stream fileSweetProduction (tipoDato string, fecha string, valor double, servicio string);
And this is the query executed to copy from one stream to another:
@info(name='query2')
from insertSweetProduction
select tipoDato,fecha,valor,servicio
insert into fileSweetProduction;
The expected result is that the WSO2 worker would show that all the events were processed and inserted into the sink stream.
In JMeter I am simulating 1 user sending 6000 events over 1 hour, and it looks like the memory ends up exhausted and the simulation stops.
I tried with a partition and the memory usage improved a lot, but it still ended up failing.
All I can think is that it is a coding problem, but I can't seem to find anything that could cause this.
//Sorry for the poor english, not my first language//
The recommended heap size is 2 GB [1]. How much have you allocated to the worker profile?
[1] https://docs.wso2.com/display/SP430/Installation+Prerequisites
I am using CEP to check if an event has arrived within a specified amount of time (let's say 1 minute). If not, I want to publish an alert.
More specifically, a (server) machine generates a heartbeat data stream and sends it to CEP. The heartbeat stream contains the server id and a timestamp. An alert should be generated if no heartbeat data arrive within the 1 min period.
Is it possible to do something like that with CEP? I have seen other questions regarding the detection of non-occurrences, but I am still not sure how to approach the scenario described above.
You can try this :
define stream heartbeats (serverId string, timestamp long);
from heartbeats#window.time(1 minute) insert expired events into delayedStream;
from every e = heartbeats -> e2 = heartbeats[serverId == e.serverId]
or expired = delayedStream[serverId == e.serverId]
within 1 minute
select e.serverId, e2.serverId as id2, expired.serverId as id3
insert into tmpStream;
// every event on tmpStream with an 'expired' match has timed out
from tmpStream[id3 is not null]
select serverId
insert into expiredHeartbeats;
I have an Akka Stream and I want the stream to send messages downstream approximately every second.
I tried two ways to solve this problem. The first way was to make the producer at the start of the stream send a message only once every second, when a Continue message comes into the actor.
// When receive a Continue message in a ActorPublisher
// do work then...
if (totalDemand > 0) {
import scala.concurrent.duration._
context.system.scheduler.scheduleOnce(1 second, self, Continue)
}
This works for a short while, but then a flood of Continue messages appears in the ActorPublisher actor. I assume (but am not sure) they come from downstream via back-pressure requesting messages, since the downstream can consume quickly while the upstream is not producing at a fast rate. So this method failed.
The other way I tried was via backpressure control: I used a MaxInFlightRequestStrategy on the ActorSubscriber at the end of the stream to limit the number of messages to 1 per second. This works, but messages come in at approximately three or so at a time, not one at a time. It seems the backpressure control doesn't immediately change the rate of incoming messages, or messages were already queued in the stream and waiting to be processed.
So the problem is, how can I have an Akka Stream which can process one message only per second?
I discovered that MaxInFlightRequestStrategy is a valid way to do it, but I should set the batch size to 1; its batch size defaults to 5, which was causing the problem I saw. It is also an over-complicated way to solve the problem, now that I am looking at the submitted answer here.
You can either put your elements through the throttling flow, which will back-pressure a fast source, or you can use a combination of tick and zip.
The first solution would be like this:
// imports and implicit materializer needed to run this snippet
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._
import scala.util.Random

implicit val system: ActorSystem = ActorSystem("throttle-example")
implicit val materializer: ActorMaterializer = ActorMaterializer()

val veryFastSource =
Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))
val throttlingFlow = Flow[Long].throttle(
// how many elements do you allow
elements = 1,
// in what unit of time
per = 1.second,
maximumBurst = 0,
// you can also set this to Enforcing, but then your
// stream will collapse if exceeding the number of elements / s
mode = ThrottleMode.Shaping
)
veryFastSource.via(throttlingFlow).runWith(Sink.foreach(println))
The second solution would be like this:
val veryFastSource =
Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))
val tickingSource = Source.tick(1.second, 1.second, 0)
veryFastSource.zip(tickingSource).map(_._1).runWith(Sink.foreach(println))
I have an app that uses SQS to queue jobs. Ideally I want every job to be completed, but some are going to fail. Sometimes re-running them will work, and sometimes they will just keep failing until the retention period is reached. I want to keep failing jobs in the queue as long as possible, to give them the maximum possible chance of success, so I don't want to set a maxReceiveCount. But I do want to detect when a job reaches the MessageRetentionPeriod limit, as I need to send an alert when a job fails completely. Currently I have the maximum retention at 14 days, but some jobs will still not be completed by then.
Is there a way to detect when a job is about to expire, and from there send it to a deadletter queue for additional processing?
Before you follow my advice below: assuming I've done the math for the periods correctly, you will be better off enabling a redrive policy on the queue if you check for messages less often than every 20 minutes and 9 seconds.
SQS's "redrive policy" allows you to migrate messages to a dead letter queue after a threshold number of receives. The maximum receive count that AWS allows for this is 1000, and over 14 days that works out to about 20 minutes per receive. (For simplicity, that assumes your job never misses an attempt to read queue messages. You can tweak the numbers to build in a tolerance for failure.)
If you check more often than that, you'll want to implement the solution below.
You can check for this "cutoff date" (when the job is about to expire) as you process the messages, and send messages to the deadletter queue if they've passed the time when you've given up on them.
Pseudocode to add to your current routine:
Call GetQueueAttributes to get the count, in seconds, of your queue's Message Retention Period.
Call ReceiveMessage to pull messages off of the queue. Make sure to explicitly request that the SentTimestamp is visible.
For each message:
Find your message's expiration time by adding the message retention period to the sent timestamp.
Create your cutoff date by subtracting your desired amount of time from the message's expiration time.
Compare the cutoff date with the current time. If the cutoff date has passed:
Call SendMessage to send your message to the Dead Letter queue.
Call DeleteMessage to remove your message from the queue you are processing.
If the cutoff date has not passed:
Process the job as normal.
Here's an example implementation in Powershell:
$queueUrl = "https://sqs.amazonaws.com/0000/my-queue"
$deadLetterQueueUrl = "https://sqs.amazonaws.com/0000/deadletter"
# Get the message retention period in seconds
$messageRetentionPeriod = (Get-SQSQueueAttribute -AttributeNames "MessageRetentionPeriod" -QueueUrl $queueUrl).Attributes.MessageRetentionPeriod
# Receive messages from our queue.
$queueMessages = @(Receive-SQSMessage -QueueUrl $queueUrl -WaitTimeSeconds 5 -AttributeNames SentTimestamp)
foreach($message in $queueMessages)
{
# The sent timestamp is in epoch time.
$sentTimestampUnix = $message.Attributes.SentTimestamp
# For powershell, we need to do some quick conversion to get a DateTime.
$sentTimestamp = ([datetime]'1970-01-01 00:00:00').AddMilliseconds($sentTimestampUnix)
# Get the expiration time by adding the retention period to the sent time.
$expirationTime = $sentTimestamp.AddDays($messageRetentionPeriod / 86400 )
# I want my cutoff date to be one hour before the expiration time.
$cutoffDate = $expirationTime.AddHours(-1)
# Check if the cutoff date has passed.
if((Get-Date) -ge $cutoffDate)
{
# Cutoff Date has passed, move to deadletter queue
Send-SQSMessage -QueueUrl $deadLetterQueueUrl -MessageBody $message.Body
Remove-SQSMessage -QueueUrl $queueUrl -ReceiptHandle $message.ReceiptHandle -Force
}
else
{
# Cutoff Date has not passed. Retry job?
}
}
This will add some overhead to every message you process. It also assumes that your message handler will receive the message between the cutoff time and the expiration time, so make sure that your application is polling often enough to receive the message.