I am working on an Oracle to Snowflake migration.
While migrating Oracle window functions to Snowflake I am getting the error below. Could you let me know an alternative way to write this Oracle function in Snowflake?
SELECT
COL1,
COL2, ...,
SUM(SUM(TAB1.COL1)) OVER (PARTITION BY
TAB1.COL2,
TAB1.COL3,
TAB1.COL4,
TAB1.COL5,
TAB1.COL6,
TAB1.COL7,
TAB1.COL8,
TAB1.COL9,
TAB1.COL10
ORDER BY MAX(CALENDAR_TAB.DATE_COLUMN) RANGE BETWEEN INTERVAL '21' DAY PRECEDING AND CURRENT ROW)/4 AS COLMN
FROM TAB1,CALENDAR_TAB
JOIN
GROUP BY COL1,
COL2, ...
Below is the error message:
SQL Error [1003] [42000]: SQL compilation error:
syntax error line 75 at position 60 unexpected 'INTERVAL'.
syntax error line 75 at position 78 unexpected 'PRECEDING'.
Per the documentation for Snowflake, here is the syntax:
https://docs.snowflake.com/en/sql-reference/functions-analytic.html#window-syntax-and-usage
slidingFrame ::=
{
ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND <N> { PRECEDING | FOLLOWING }
| ROWS BETWEEN UNBOUNDED PRECEDING AND <N> { PRECEDING | FOLLOWING }
| ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND UNBOUNDED FOLLOWING
}
It might not like the INTERVAL and the quoted number.
The window frame documentation is a good place to start.
If I read the Oracle syntax correctly, the window frame you are using for the MAX is value based (RANGE over INTERVAL '21' DAY), which Snowflake does not support; it only supports row-count (ROWS) based logic. If you have one row per day, and always exactly one row, then you can use the row-count logic, but otherwise this is not supported.
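If that one-row-per-day assumption holds, the frame can be rewritten with a row count; here is a minimal sketch based on the query above (only the first few PARTITION BY columns shown):
SUM(SUM(TAB1.COL1)) OVER (
    PARTITION BY TAB1.COL2, TAB1.COL3, TAB1.COL4  -- plus the remaining partition columns
    ORDER BY MAX(CALENDAR_TAB.DATE_COLUMN)
    ROWS BETWEEN 21 PRECEDING AND CURRENT ROW
) / 4 AS COLMN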
Otherwise, you need to join back to your own data tables and apply the prior-time filter in the join.
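A rough sketch of that join-back approach: aggregate per day first, then join each day to its prior 21 days. The TAB1/CALENDAR_TAB join key below is a placeholder, since it is not visible in the original query; adjust columns to your schema.
WITH daily AS (
    SELECT
        TAB1.COL2, TAB1.COL3, TAB1.COL4,                      -- plus the remaining partition columns
        CALENDAR_TAB.DATE_COLUMN AS day_dt,
        SUM(TAB1.COL1) AS daily_sum
    FROM TAB1
    JOIN CALENDAR_TAB ON TAB1.DATE_ID = CALENDAR_TAB.DATE_ID  -- placeholder join condition
    GROUP BY TAB1.COL2, TAB1.COL3, TAB1.COL4, CALENDAR_TAB.DATE_COLUMN
)
SELECT
    d.COL2, d.COL3, d.COL4, d.day_dt,
    SUM(p.daily_sum) / 4 AS COLMN
FROM daily d
JOIN daily p
  ON  p.COL2 = d.COL2
  AND p.COL3 = d.COL3
  AND p.COL4 = d.COL4
  AND p.day_dt BETWEEN DATEADD(day, -21, d.day_dt) AND d.day_dt
GROUP BY d.COL2, d.COL3, d.COL4, d.day_dt;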
Trying to reconstruct my query history from svl_statementtext using listagg.
Getting error :
Result size exceeds LISTAGG limit (limit: 65535)
However, I cannot see how or where I have exceeded limit.
My failing query :
SELECT pid,xid, min(starttime) AS starttime,
pg_catalog.listagg(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
, ''::text
) WITHIN GROUP(ORDER BY "sequence")
AS query_statement
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00';
After the fail, I checked to see if I could find where the excessive size was coming from :
SELECT pid,xid, min(starttime) AS starttime,
SUM(OCTET_LENGTH(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
)) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;
However, the largest size that this query reports is 2962.
So how/why is listagg complaining about 65535?
I have seen some other posts mentioning using LISTAGG(DISTINCT ...) and catering for when the value being aggregated is null, but neither seems to change my problem.
Any guidance appreciated :)
The longest string that Redshift can hold is 64K bytes. Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters so if you have more than 319 segments you can overflow this string size.
The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid which will give you all statements for a transaction. Add starttime to your group by list and this will break different statements into different results.
Also remember that xid and pid values repeat every few days, so having some date range limit can help prevent a lot of confusion.
You need to add
where sequence < 320
to your query and also group by starttime.
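Applied to the query in the question, that would look something like this (the only changes are the sequence filter, starttime added to the grouping, and the date condition moved into WHERE since starttime is now a grouping column):
SELECT pid, xid, starttime,
       pg_catalog.listagg(
           CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
           , ''::text
       ) WITHIN GROUP (ORDER BY "sequence") AS query_statement
FROM svl_statementtext
WHERE "sequence" < 320
  AND starttime >= '2022-06-27 10:00:00'
GROUP BY pid, xid, starttime;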
Here's a query I have used to put together statements in Redshift:
select xid, pid, starttime,
       max(datediff('sec',starttime,endtime)) as runtime,
       type,
       listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid = (SELECT pg_backend_pid()) --current session
and sequence < 320
and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type"
order by starttime, 1 asc, "type" desc;
I have a single table of data named RDSLPDSL. I am trying to calculate two columns based on two measures I am creating from the table.
Count of RDSL Marker for 1 =
CALCULATE(
COUNT('RDSLPDSL'[RDSL Marker]),
'RDSLPDSL'[RDSL Marker] IN { 1 }
)
I am using the above code as a measure to count only the values of 1 in the RDSL Marker column.
RDSL % = DIVIDE([Count of RDSL Marker for 1], COUNTROWS(RDSLPDSL))
Then I created a column using the above code to divide the count of rows with 1 by the total number of rows in the table.
I am doing the same for another column with PDSL. It is as follows:
Count of PDSL Marker for 1 =
CALCULATE(
COUNT('RDSLPDSL'[PDSL Marker]),
'RDSLPDSL'[PDSL Marker] IN { 1 }
)
PDSL % = DIVIDE([Count of PDSL Marker for 1], COUNTROWS(RDSLPDSL))
But when I do this calculation, I get a "circular dependency detected" error and no final output, even though the same code worked for the previous column.
I tried COUNTAX directly instead of using CALCULATE but that brings up the same error too.
I also tried using measures instead of custom columns, which seems to remove the error, but the output is not what I expect and is incorrect.
Any help for the same would be highly appreciated.
I'm looking for a way to convert a decimal number into a valid HH:mm:ss format.
I'm importing data from an SQL database.
One of the columns in my database is labelled Actual Start Time.
The values in my database are stored in the following decimal format:
73758 // which translates to 07:37:58
114436 // which translates to 11:44:36
I cannot simply convert this Actual Start Time column into a Time format in my Power BI import as it returns errors for some values, saying it doesn't recognise 73758 as a valid 'time'. It needs to have a leading zero for cases such as 73758.
To combat this, I created a new Text column with the following code to append a leading zero:
Column = FORMAT([Actual Start Time], "000000")
This returns the following results:
073758
114436
-- which is perfect. Exactly what I needed.
I now want to convert these values into a Time.
Simply changing the data type field to Time doesn't do anything, returning:
Cannot convert value '073758' of type Text to type Date.
So I created another column with the following code:
Column 2 = FORMAT(TIME(LEFT([Column], 2), MID([Column], 3, 2), RIGHT([Column], 2)), "HH:mm:ss")
To pass the values 07, 37 and 58 into a TIME format.
This returns the following:
| Actual Start Time | Column | Column 2 |
|-------------------|--------|----------|
| 73758             | 073758 | 07:37:58 |
| 114436            | 114436 | 11:44:36 |
Which is what I wanted but is there any other way of doing this? I want to ideally do it in one step without creating additional columns.
You could use a variable as suggested by Aldert, or you can replace [Column] with the FORMAT function:
Time Format = FORMAT(
TIME(
LEFT(FORMAT([Actual Start Time],"000000"),2),
MID(FORMAT([Actual Start Time],"000000"),3,2),
RIGHT([Actual Start Time],2)),
"hh:mm:ss")
Edit:
If you want to do this in Power Query, you can create a custom column with the following calculation:
Time.FromText(
if Text.Length([Actual Start Time])=5 then Text.PadStart( [Actual Start Time],6,"0")
else [Actual Start Time])
Once this column is created you can drop the old column, so that you only have one time column in the data. Hope this helps.
I show you the concept of variables on purpose, so you can use it in future with more complex queries.
TimeC =
var timeStr = FORMAT([Actual Start Time], "000000")
return FORMAT(TIME(LEFT(timeStr, 2), MID(timeStr, 3, 2), RIGHT(timeStr, 2)), "HH:mm:ss")
When I use this SELECT I get the following output.
SELECT integral("value",1h) / 1000 FROM /(Klima|NAS)_Power/ WHERE time > now()-1w AND time <= now() GROUP BY time(1d) fill(null)
name: Klima_Power
time integral
---- --------
2019-07-11T00:00:00Z 0.0028576888333333326
2019-07-12T00:00:00Z 0.05559535705833335
2019-07-13T00:00:00Z 0.055475250270833325
2019-07-14T00:00:00Z 0.0551049064541667
2019-07-15T00:00:00Z 0.055454312898611136
2019-07-16T00:00:00Z 0.05580957162916666
2019-07-17T00:00:00Z 0.05551291632777774
name: NAS_Power
time integral
---- --------
2019-07-11T00:00:00Z 0
2019-07-12T00:00:00Z 0
2019-07-13T00:00:00Z 0
2019-07-14T00:00:00Z 0
2019-07-15T00:00:00Z 0
2019-07-16T00:00:00Z 0.1073428686286408
2019-07-17T00:00:00Z 0.7449990083701262
2019-07-18T00:00:00Z 0.756581078140122
name: Klima_Power
time integral
---- --------
2019-07-18T00:00:00Z 0.05271264777916669
I want to create a graph in Grafana that shows stacked bars for multiple measurements.
It works, but some measurements are listed multiple times at the same time interval.
I guess I somehow need to "group" the output so the values of the same measurement are listed in the same table.
You are getting multiple blocks (tables) because you are executing the SELECT statement with a GROUP BY clause. If you prefer to get distinct records as output, you can use the DISTINCT function.
Help much appreciated - I have a field in Redshift giving data of the form:
{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}
What I would like to do is parse this cleanly as the output of a Redshift query into some columns like:
Frequency Cap Period Count | Frequency Cap Period Type | Frequency Cap Count
1 | DAYS | 501
I believe I need to use the REGEXP_SUBSTR function to achieve this, but I cannot work out the syntax to get the required output :(
Thanks in advance for any assistance,
Carter
Here you go
select json_extract_path_text(
           json_extract_array_element_text(
               json_extract_path_text(
                   replace('{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}','\\',''),
                   'frequencyCapList'),
               0),
           'frequencyCapPeriodCount');
just replace the last string with each one you want to extract!
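For example, running the same extraction against a table column for all three fields might look like this (my_table and my_field are placeholder names):
select
    json_extract_path_text(json_extract_array_element_text(json_extract_path_text(replace(my_field,'\\',''),'frequencyCapList'),0),'frequencyCapPeriodCount') as frequency_cap_period_count,
    json_extract_path_text(json_extract_array_element_text(json_extract_path_text(replace(my_field,'\\',''),'frequencyCapList'),0),'frequencyCapPeriodType') as frequency_cap_period_type,
    json_extract_path_text(json_extract_array_element_text(json_extract_path_text(replace(my_field,'\\',''),'frequencyCapList'),0),'frequencyCapCount') as frequency_cap_count
from my_table;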