Siddhi externalTime window issue

Siddhi externalTime window issue - wso2

I'm trying to calculate some stats in a sliding window interval
from CsvInputFileStreamWithConvertedTimestamp#window.externalTime(time, 250 milliseconds)
select time as timeslice, time:dateFormat(time, 'yyyy-MM-dd') as date, time:dateFormat(time, 'HH:mm:ss') as time, instrument, sum(fin) / sum(quantity) as vwap, max(price) * min(price)-1 as prange, max(price) as prangemax, min(price) as prangemin, sum(quantity) as totalquant, avg(quantity) as avgquant, 0 as medquant, sum(fin) as totalfin, avg(fin) as avgfin, count() as trades, distinctCount(buyer) as nofbuy, distinctCount(seller) as nofsell, cast(distinctCount(seller), 'double') / distinctCount(buyer) as bsratio, count(buyer) as buyaggr, count(seller) as sellaggr, sum(quantity) as totalblockquant
insert expired events into OutputStream;
But in the output most values are null
This is my input data
Any idea what i'm doing wrong here

Solved.
I was using insert expired events into instead of insert all events into

Related

Hour:Minute format on an APEX chart is not possible

I use Oracle APEX (v22.1) and on a page I created a (line) chart, but I have the following problem for the visualization of the graphic:
On the y-axis it is not possible to show the values in the format 'hh:mi' and I need a help for this.
Details for the axis:
x-axis: A date column represented as a string: to_char(time2, 'YYYY-MM')
y-axis: Two date columns and the average of the difference will be calculated: AVG(time2 - time1); the date time2 is the same as the date in the x-axis.
So I have the following SQL query for the visualization of the series:
SELECT DISTINCT to_char(time2, 'YYYY-MM') AS YEAR_MONTH --x-axis,
AVG(time2 - time1) AS AVERAGE_VALUE --y-axis
FROM users
GROUP BY to_char(time2, 'YYYY-MM')
ORDER BY to_char(time2, 'YYYY-MM')
I have another problem to solve it in another way: I am not familiar with JavaScript, if the solution is only possible in this way. Because I started new with APEX, but I have seen in different tutorials that you can use JS. So, when JS is the only solution, I would be happy to get a short description what I must do on the page.
(I don't know if this point is important for this case: The values time1 and time2 are updated daily.)
On the attributes of the chart I enabled the 'Time Axis Type' under Settings
On the y-axis I change the format to "Time - Short" and I tried with different pattern like ##:## but in this case you see for every value and also on the y-axis the value '01:00' although the line chart was represented in the right way. But when I change the format to Decimal the values are shown correct as the line chart.
I also tried it with the EXTRACT function for the value like 'EXTRACT(HOUR FROM AVG(time2 - time1))|| ':' || EXTRACT(MINUTE FROM AVG(time2 - time1))' but in this case I get an error message
So where is my mistake or is it more difficult to solve this?

ROUND(TRUNC(avg(time2 - time1)/60) + mod(avg(time2 - time1),60)/100, 2) AS Y
will get close to what you want, you can set Y Axis minimum 0 maximum 24
then 12.23 means 12 hour and 23 minutes.

Redshift Result size exceeds LISTAGG limit on svl_statementtext

Trying to reconstruct my query history from svl_statementtext using listagg.
Getting error :
Result size exceeds LISTAGG limit (limit: 65535)
However, I cannot see how or where I have exceeded limit.
My failing query :
SELECT pid,xid, min(starttime) AS starttime,
pg_catalog.listagg(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
, ''::text
) WITHIN GROUP(ORDER BY "sequence")
AS query_statement
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00';
After the fail, I checked to see if I could find where the excessive size was coming from :
SELECT pid,xid, min(starttime) AS starttime,
SUM(OCTET_LENGTH(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
)) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;
However the largest size that this query reports is 2962
So how/why is listagg complaining about 65535 ??
Have seen some other posts mentioning using listaggdistinct, and catering for when the value being aggregated is null, but none seem to change my problem.
Any guidance appreciated :)

The longest string that Redshift can hold is 64K bytes. Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters so if you have more than 319 segments you can overflow this string size.
The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid which will give you all statements for a transaction. Add starttime to your group by list and this will break different statements into different results.
Also remember that xid and pid values repeat every few days so have some date range limit can help prevent a lot of confusion.
You need to add
where sequence < 320
to your query and also group by starttime.
Here's a query I have used to put together statements in Redshift:
select xid, pid, starttime, max(datediff('sec',starttime,endtime)) as runtime, type, listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid = (SELECT pg_backend_pid()) --current session
and sequence < 320
and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type" order by starttime, 1 asc, "type" desc ;

Power BI: how to remove top 20% of values from an average

I'm working with call center data and looking to calculate the average ring time of calls while removing the highest 20% of ring times. I assume I'll need to use PERCENTILEX.EXC embedded somewhere in AVERAGE, but I'm not quite sure where, or if I'm totally off base. 2 other caveats on this are that there are calls answered immediately (queue time = 0) which have to be counted in the average time and only data where the disposition column = Handled are used.
Example:
The Aborted and Abandoned call would be filtered out. Of the remaining calls, the top 20% of queue times (the 14,9, 6, and one of the 5s) would be eliminated and the average would be 3 seconds.
Appreciate any help on this!

I would do it like this:
VAR totalRows = COUNTROWS(FILTER(table, table[disposition] = "Handled"))
VAR bottomN = ROUNDDOWN(totalRows * .8, 0)
RETURN AVERAGEX(TOPN(bottomN, FILTER(table, table[disposition] = "Handled"), table[queue time], ASC),table[queue time])

How to use window.frequent?

Anybody can give me a example about how to use window.frequent?
For example，
I write a test,
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"#info(name = 'query1') " +
"from cseEventStream[700 > price]#window.frequent(3, symbol) " +
"select symbol, price, time " +
"insert expired events into outputStream;";
But from the outputStream, i can't find out the rule.
Thanks.

In this particular query 'window.frequent(3, symbol)' will make the query to find the most frequent 3 symbols(or 3 symbols that has the highest number of occurrences). But, when you insert events to outputStream you have inserted only expired events. So that, as the end result this query will output events that are expired from the frequent window.
In a frequent window, expired events are events that are not belonging to a frequent group anymore. In this case events which are the symbol is not among 3 symbols that has the highest number of occurrences.
for an example if you send the following sequence of events,
{"symbolA", 71.36f, 100}
{"symbolB", 72.36f, 100}
{"symbolB", 74.36f, 100}
{"symbolC", 73.36f, 100}
{"symbolC", 76.36f, 100}
{"symbolD", 76.36f, 100}
{"symbolD", 76.36f, 100}
The query will output {"symbolA", 71.36f, 100}.
When you send the events with 'symbolD'. SymbolA will not be among the top3 symbols with highest number of occurrences anymore so that event with symbolA is expired and {"symbolA", 71.36f, 100} is emitted.

For every event which has a price > 700, this window will retain most frequent 3 items based on symbol and since the output type is 'expired events' you will only receive output once an event loose it's position as a frequent event.
Ex: for frequent window of size 2
Input
WSO2 1000 1
WSO2 1000 2
ABC 700 3
XYZ 800 4
Output
ABC 700 3
ABC event was in the frequent window and was expired upon receiving of XYZ event. If you use default output which is 'current events' it will output all incoming events which are selected as frequent events and put into the window.
Implementation is based on Misra-Gries counting algorithm.
Documentation : https://docs.wso2.com/display/CEP400/Inbuilt+Windows#InbuiltWindows-frequent
Test cases : https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/test/java/org/wso2/siddhi/core/query/window/FrequentWindowTestCase.java

SQL Comparison to a value in the next row

I have been a long time reader of this forum, it has helped me a lot, however I have a question which I cant find a solution specific to my requirements, so this is the first time I have had to ask anything.
I have a select statement which returns meter readings sorted by date (the newest readings at the top), in 99.9% of cases the meter readings always go up as the date moves on, however due to system errors occasionally some go down, I need to identify instances where the reading in the row below (previous reading) is GREATER than the latest reading (Current cell)
I have come across the LEAD function, however its only in Oracle or SS-MS-2012, I'm using SS-MS-2008.
Here is a simplified version of my select statment:
SELECT Devices.SerialNumber,
MeterReadings.ScanDateTime,
MeterReadings.TotalMono,
MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN DBO.Devices AS Devices
ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.serialnumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
AND Meterreadings.Scandatetime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC

This is the code I used in the end
WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
WHERE m.ScanDateTime > '2012-01-01'
)
SELECT top 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
and p.ScanDateTime < r.ScanDateTime
and p.TotalMono > r.TotalMono
order by r.serialnumber, p.TotalMono desc, r.TotalMono asc

Try something like this.
;WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
AND p.ScanDateTime < r.ScanDateTime
WHERE p.reading > r.reading

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Siddhi externalTime window issue - wso2

Solved. I was using insert expired events into instead of insert all events into

Related

Hour:Minute format on an APEX chart is not possible

Redshift Result size exceeds LISTAGG limit on svl_statementtext

Power BI: how to remove top 20% of values from an average

How to use window.frequent?

SQL Comparison to a value in the next row

Categories

Resources