In a CEP engine, I am trying to write a pattern like:
from s1 = SensorStream[level == 'A'] **NOT** -> s2 = SensorStream[level == 'B'] within 10 sec
select s1.id as id1, s2.id as id2 insert into AlertStream
I found this link, but what I have is not a range...
Any idea?
Thanks!
Marta,
Can't you use the approach below? Is that what you want to achieve?
from s1 = SensorStream[level != 'A'] -> s2 = SensorStream[level == 'B'] within 10 sec
select s1.id as id1, s2.id as id2 insert into AlertStream
This feature has been added to Siddhi 4.0 (currently under development) with PR #483. According to this implementation, your requirement can be achieved using the following query:
from s1 = SensorStream[level == 'A'] -> not SensorStream[level == 'B'] for 10 sec
select s1.id as id
insert into AlertStream;
Note that the not pattern cannot have a stream reference s2, because you cannot select the id of an event that has not arrived.
If you want to try Siddhi early access, follow the instructions given in this tutorial: Siddhi 4.0.0 Early Access
For more samples, have a look at the test cases.
I have many IonStructs like the following.
{
revenueId: "0dcb7eb6-8cec-4af1-babe-7292618b9c69",
ownerId: "u102john2021",
revenueAddedTime: 2020-06-20T19:31:31.000Z,
}
I want to write a query to select the latest set of records within a given year range.
For example, suppose I have a set of timestamps like this:
A - 2019-06-20T19:31:31.000Z
B - 2020-06-20T19:31:31.000Z
C - 2020-06-20T19:31:31.000Z
D - 2021-07-20T19:31:31.000Z
E - 2020-09-20T19:31:31.000Z
F - 2020-09-20T19:31:31.000Z
If the selected year range is 2020 to 2021, I want to return the records that have the latest timestamp within that range; in this case, E and F.
I tried many ways, like:
"SELECT * FROM REVENUES AS r WHERE r.ownerId = ? AND r.revenueAddedTime >= ? AND r.revenueAddedTime < ?"
Can anyone help me here?
Although I have no experience with QLDB syntax, it seems to have similar properties to other DB syntaxes, in the sense that you can format your timestamps using these docs:
https://docs.aws.amazon.com/qldb/latest/developerguide/ql-functions.timestamp-format.html
https://docs.aws.amazon.com/qldb/latest/developerguide/ql-functions.to_timestamp.html
Once you format the timestamp, you should be able to use the > and < query syntax.
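If it helps, here is a minimal sketch of that approach using the Python pyqldb driver; the driver calls are the documented ones, but the ledger name is a placeholder and the TO_TIMESTAMP usage is my assumption based on the docs above, so verify it against your QLDB version:
# Hedged sketch: filter REVENUES to a year range with the pyqldb driver.
# "my-ledger" is a placeholder; TO_TIMESTAMP follows the AWS docs linked above.
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="my-ledger")

def revenues_between(txn, owner_id, start_iso, end_iso):
    stmt = ("SELECT * FROM REVENUES AS r "
            "WHERE r.ownerId = ? "
            "AND r.revenueAddedTime >= TO_TIMESTAMP(?) "
            "AND r.revenueAddedTime < TO_TIMESTAMP(?)")
    return list(txn.execute_statement(stmt, owner_id, start_iso, end_iso))

rows = driver.execute_lambda(lambda txn: revenues_between(
    txn, "u102john2021", "2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"))
Picking out only the rows with the latest timestamp in that range can then be done client-side with an ordinary max() over revenueAddedTime.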
My question: how do I create a dictionary from a list by assigning dictionary keys based on a regex pattern match ('^--L-[0-9]{8}') and assigning the values from all the lines between each key?
Example excerpt from the raw file:
SQL> --L-93752133
SQL> --SELECT table_name, tablespace_name from dba_tables where upper(table_name) like &tablename_from_developer;
SQL>
SQL> --L-52852243
SQL>
SQL> SELECT log_mode FROM v$database;
LOG_MODE
------------
NOARCHIVELOG
SQL>
SQL> archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 3
Current log sequence 5
SQL>
SQL> --L-42127143
SQL>
SQL> SELECT t.name "TSName", e.encryptionalg "Algorithm", d.file_name "File Name"
2 FROM v$tablespace t
3 , v$encrypted_tablespaces e
4 , dba_data_files d
5 WHERE t.ts# = e.ts#
6 AND t.name = d.tablespace_name;
no rows selected
Some additional detail: The raw file can be large (at least 80K+ lines, but often much larger) and I need to preserve the original spacing so the output is still easy to read. Here's how I'm reading the file in and removing "SQL>" from the beginning of each line:
with open(rawFile, 'r') as inFile:
    content = inFile.read()
rawList = content.splitlines()
for line in rawList:
    cleanLine = re.sub('^SQL> ', '', line)
Finding the dictionary keys I'm looking for is easy:
pattern = re.compile(r'^--L-[0-9]{8}')
if pattern.search(cleanLine) is not None:
    itemID = pattern.search(cleanLine)
    print(itemID.group(0))
But how do I assign all lines between each key as the value belonging to the most recent key preceding them? I've been playing around with new lists, tuples, and dictionaries but everything I do is returning garbage. The goal is to have the data and keys linked to each other so that I can return them as needed later in my script.
I spent a while searching for a similar question, but in most other cases the source file was already in a dictionary-like format so creating the new dictionary was a less complicated problem. Maybe a dictionary or tuple isn't the right answer, but any help would be appreciated! Thanks!
In general, you should question why you would read the entire file, split the lines into a list, and then iterate over the list. This is a Python anti-pattern.
For line oriented text files, just do:
with open(fn) as f:
    for line in f:
        ...  # process a line
It sounds, however, like you have multi-line, block-oriented patterns. If so, with smaller files, read the entire file into a single string and use a regex on that. Then use group 1 and group 2 as the key and value in your dict:
pat = re.compile(pattern, flags)
with open(file_name) as f:
    di = {m.group(1): m.group(2) for m in pat.finditer(f.read())}
With a larger file, use an mmap:
import re, mmap

pat = re.compile(pattern, flags)  # note: this must be a bytes pattern to search an mmap
with open(file_name, 'r+b') as f:  # mmap needs a binary-mode file object
    mm = mmap.mmap(f.fileno(), 0)
    for i, m in enumerate(pat.finditer(mm)):
        ...  # process each block accordingly
As for the regex, I am a little unclear on what you are trying to capture or not. I think this regex is what you want:
^SQL> (--L-[0-9]{8})(.*?)(?=SQL> --L-[0-9]{8}|\Z)
In either case, running that regex with the example string yields:
>>> pat=re.compile(r'^SQL> (--L-[0-9]{8})\s*(.*?)\s*(?=SQL> --L-[0-9]{8}|\Z)', re.S | re.M)
>>> with open(file_name) as f:
... di={m.group(1):m.group(2) for m in pat.finditer(f.read())}
...
>>> di
{'--L-52852243': 'SQL> \nSQL> SELECT log_mode FROM v$database;\n\n LOG_MODE\n ------------\n NOARCHIVELOG\n\nSQL> \nSQL> archive log list\n Database log mode No Archive Mode\n Automatic archival Disabled\n Archive destination USE_DB_RECOVERY_FILE_DEST\n Oldest online log sequence 3\n Current log sequence 5\nSQL>',
'--L-93752133': 'SQL> --SELECT table_name, tablespace_name from dba_tables where upper(table_name) like &tablename_from_developer;\nSQL>',
'--L-42127143': 'SQL> \nSQL> SELECT t.name "TSName", e.encryptionalg "Algorithm", d.file_name "File Name"\n 2 FROM v$tablespace t\n 3 , v$encrypted_tablespaces e\n 4 , dba_data_files d\n 5 WHERE t.ts# = e.ts#\n 6 AND t.name = d.tablespace_name;\n\n no rows selected'}
Something like this?
with open(rawFile, 'r') as inFile:
    content = inFile.read()
rawList = content.splitlines()

keyed_dict = {}
in_between_lines = ""
last_key = None
pattern = re.compile(r'^--L-[0-9]{8}')  # compile once, outside the loop
for line in rawList:
    cleanLine = re.sub('^SQL> ', '', line)
    itemID = pattern.search(cleanLine)
    if itemID is not None:
        if last_key:
            keyed_dict[last_key] = in_between_lines
        last_key = itemID.group(0)
        in_between_lines = ""
    else:
        in_between_lines += cleanLine + "\n"  # keep the original line breaks
if last_key:
    keyed_dict[last_key] = in_between_lines  # don't drop the final block
I have Spark Data Frame 1 of several columns: (user_uuid, url, date_visit)
I want to transform this DF1 to Data Frame 2 with form : (user_uuid, domain, date_visit)
What I wanted to use is a regular expression to detect the domain and apply it to DF1:
val regexpr = """(?i)^((https?):\/\/)?((www|www1)\.)?([\w-\.]+)""".r
Could you please help me compose the code to transform the Data Frames in Scala? I am completely new to Spark and Scala, and the syntax is hard. Thanks!
Spark >= 1.5:
You can use regexp_extract function:
import org.apache.spark.sql.functions.regexp_extract
val pattern: String = ???
val groupIdx: Int = ???

df.withColumn("domain", regexp_extract($"url", pattern, groupIdx))
Spark < 1.5.0
Define a UDF:
val pattern: scala.util.matching.Regex = ???

def getFirst(pattern: scala.util.matching.Regex) = udf(
  (url: String) => pattern.findFirstIn(url) match {
    case Some(domain) => domain
    case None => "unknown"
  }
)
Use the defined UDF:
df.select(
  $"user_uuid",
  getFirst(pattern)($"url").alias("domain"),
  $"date_visit"
)
or register temp table:
df.registerTempTable("df")
sqlContext.sql(s"""
SELECT user_uuid, regexp_extract(url, '$pattern', $group_idx) AS domain, date_visit FROM df""")
Replace pattern with a valid Java regexp and group_idx with the index of the capture group.
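If PySpark is an option, here is a rough equivalent using the asker's pattern (I reordered the character class to [\w.-] so the hyphen is an unambiguous literal); group 5 is my reading of which capture group holds the domain, so verify against your data:
# Hedged PySpark sketch (Spark 2.x+ API, newer than the versions above);
# group 5 of the pattern is assumed to capture the domain.
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("u1", "https://www.example.com/page", "2015-01-01")],
    ["user_uuid", "url", "date_visit"])

pattern = r"(?i)^((https?)://)?((www|www1)\.)?([\w.-]+)"
df.select("user_uuid",
          regexp_extract("url", pattern, 5).alias("domain"),
          "date_visit").show()  # domain should come out as example.com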
Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects "1", "2" or "3".
The table schema is as follows:
TABLE date_lookup
(
    pk_date       DATE,
    plus1_months  DATE,
    plus2_months  DATE,
    plus3_months  DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
........................................................................................
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_dates from 01-28-2012 to 12-31-2020, with plus1, plus2 & plus3_months NULL. I loaded this into the date_lookup table, then executed the update statement below:
UPDATE date_lookup
   SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
       plus2_months = DATE(pk_date + 2 UNITS MONTH),
       plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
.........................................................................................
The rules for this dimension table are:
If pk_date's month has 31 days and the plus1, plus2 or plus3_months month only has 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date's month has 30 days and the plus1, plus2 or plus3 month has 28 or 29 days, let them equal the last valid date of those months, and so on.
All other dates fall on the same day of the following month.
My question is: what is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, sed, or a C program?
EDIT: I mentioned sed because I already have more than two years' worth of data and could perhaps model the rest after it, or perhaps a tool like awk is better?
The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5 too.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.
Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of Perl seem to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;

my @dates;

# parse arguments
while (my $datep = shift){
    my ($m,$d,$y) = split('-', $datep);
    push(@dates, DateTime->new(year => $y, month => $m, day => $d))
        || die "Cannot parse date $!\n";
}

open(STDOUT, ">", "output.unl") || die "Unable to create output file.";

my ($date, $end) = @dates;
while( $date < $end ){
    my @row = ($date->mdy('-'));    # start with pk_date
    for my $mth ( qw[ 1 2 3 ] ){
        my $fut_d = $date->clone->add(months => $mth);
        until (
            ($fut_d->month == $date->month + $mth
                && $fut_d->year == $date->year) ||
            ($fut_d->month == $date->month + $mth - 12
                && $fut_d->year > $date->year)
        ){
            $fut_d->subtract(days => 1);    # step back until criteria met
        }
        push(@row, $fut_d->mdy('-'));
    }
    print STDOUT join("|", @row, "\n");
    $date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it, and execute it like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.
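If you would rather not use Perl, here is a hedged Python sketch of the same generation logic; dateutil's relativedelta clamps month arithmetic to the last valid day of the target month, which matches the rules in the question (the third-party python-dateutil package is assumed):
# Hedged alternative: relativedelta(months=n) clamps to the month's last valid
# day (e.g. 01-31-2012 + 1 month -> 02-29-2012), matching the rules above.
from datetime import date, timedelta
from dateutil.relativedelta import relativedelta

def rows(start, end):
    d = start
    while d < end:
        cells = [d] + [d + relativedelta(months=m) for m in (1, 2, 3)]
        yield "|".join(c.strftime("%m-%d-%Y") for c in cells) + "|"
        d += timedelta(days=1)

with open("output.unl", "w") as out:
    for row in rows(date(2014, 4, 1), date(2020, 12, 31)):
        print(row, file=out)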
Hi all,
I'm trying to get event log entries using WMI and WQL.
I can get the right log with the right source name and so on, but I can't make a SELECT query that only returns results for the past 5 or 10 minutes.
Here is my query:
Here are a few snippets from a script of mine:
Dim dtmStart, dtmEnd, sDate, ...
I actually had an array of dates, and I was looking for logon/logoff/unlock events for the entire day, so I built my complete start and end dates from that.
I won't put in the day, month and year, but you could just define it, e.g. sDate = "20100608".
dtmStart = sDate + "000000.000000-420"  ' 0hr on the date in question
dtmEnd = sDate + "235900.000000-420"    ' 23:59 on the date in question
(Note that the UTC offset is in minutes here; -420 is daylight saving time in North America.)
Set colEvents = oWMIService.ExecQuery _
("SELECT * FROM Win32_NTLogEvent WHERE Logfile = 'Security' AND " _
& "TimeWritten >= '" & dtmStart & "' AND TimeWritten < '" _
& dtmEnd & "' AND " _
& "(EventCode = '528' OR EventCode = '540' OR EventCode = '538')")
' Query for events during the time range we're looking for.
Mike,
Show me your query. Usually the time format is something like this:
20100608100000.000000-300
See this for more details about DateTime formatting for WQL.
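For what it's worth, here is a hedged Python sketch of the same idea, in case it is easier to experiment with than VBScript; the third-party wmi module is assumed, and the fixed -420 offset mirrors the snippet above (it ignores DST changes):
# Hedged sketch using the third-party `wmi` module (Windows only).
import wmi
from datetime import datetime, timedelta

def wmi_time(dt, utc_offset_minutes=-420):
    # CIM datetime layout: yyyymmddHHMMSS.mmmmmm+/-UUU (offset in minutes),
    # e.g. 20100608100000.000000-300 as shown above.
    return dt.strftime("%Y%m%d%H%M%S") + ".000000{:+04d}".format(utc_offset_minutes)

start = wmi_time(datetime.now() - timedelta(minutes=10))
conn = wmi.WMI()
events = conn.query(
    "SELECT * FROM Win32_NTLogEvent WHERE Logfile = 'Security' "
    "AND TimeWritten >= '%s'" % start)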