Cannot find running sum of a blended chart in data studio - google-cloud-platform

I'm trying to create a chart for the following BigQuery query in Data Studio. Instead of auto-generating the chart from GCP, I want to build it with the tools in Data Studio.
SELECT t.timestamp, sum(t.introduced_violation)
OVER(
PARTITION BY t.introduced_user_id
ORDER BY t.timestamp desc
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
as cumulative_introduced_violation,
sum(t.fixed_violation)
OVER(
PARTITION BY t.introduced_user_id
ORDER BY t.timestamp desc
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
as cumulative_fixed_violation,
FROM (SELECT SUM(CASE
WHEN is_fixed = 1 THEN 1
ELSE 0
END) AS fixed_violation,
SUM(1) AS introduced_violation,
timestamp, introduced_user_id
FROM `project_id.violation.table_name`
where introduced_user_id = 'username#company.com'
and timestamp >=1622556834000
and timestamp <=1631231999999
group by timestamp, introduced_user_id
order by timestamp desc) as t;
Expected output from the query:
First, I tried to create a chart for the inner query (below). I managed this step by creating 2 charts and blending them together.
SELECT SUM(CASE
WHEN is_fixed = 1 THEN 1
ELSE 0
END) AS fixed_violation,
SUM(1) AS introduced_violation,
timestamp, introduced_user_id
FROM `project_id.violation.table_name`
where introduced_user_id = 'username#company.com'
and timestamp >=1622556834000
and timestamp <=1631231999999
group by timestamp, introduced_user_id
order by timestamp desc;
Expected output from inner query:
In the output of the full query, the introduced_violation and fixed_violation values are running sums.
Is there a way to compute the running sum of the introduced_violation and fixed_violation columns in the blended charts, or some other way to handle the whole scenario?
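To make the target concrete, here is a minimal Python sketch of what the two window functions in the outer query compute. The sample rows are hypothetical and only stand in for the inner query's output; this illustrates the running-sum logic and is not a Data Studio feature.

```python
from itertools import accumulate

# Hypothetical rows standing in for the inner query's output:
# (timestamp, introduced_violation, fixed_violation)
rows = [
    (1622556900000, 3, 1),
    (1625000000000, 2, 2),
    (1631000000000, 4, 0),
]

def running_sums(rows):
    """Return (timestamp, cumulative_introduced, cumulative_fixed) tuples,
    ordered by timestamp descending like the ORDER BY in the window clause."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    introduced = accumulate(r[1] for r in ordered)
    fixed = accumulate(r[2] for r in ordered)
    return [(r[0], i, f) for r, i, f in zip(ordered, introduced, fixed)]

result = running_sums(rows)
for row in result:
    print(row)
```

Because the window is ordered by timestamp descending, the totals accumulate from the newest row backwards. Sorting ascending instead would give the more usual "total so far" reading.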

Related

Is there any way to get customer duplicate data from customers table then update all others except 1 row from duplicate data?

I am trying to find the duplicate rows in my customers table and then set the isactive column to 0 for all of the duplicates except one row per duplicate group.
Here is my script, using Oracle 19c:
merge into customers c
using (
WITH cte AS (
SELECT DISTINCT ROWID, fn_createfullname(firstname, middlename, lastname) as fullName, mobile, branchid, isactive,
ROW_NUMBER() OVER (PARTITION BY fn_createfullname(firstname, middlename, lastname), mobile, branchid ORDER BY ROWID) AS rn
FROM customers
)
select * from cte
WHERE rn > 1
) tbl
on (tbl.mobile = c.mobile and fn_createfullname(c.firstname, c.middlename, c.lastname) = tbl.fullname)
when matched then update
SET c.isactive = 0
WHERE rn > 1;
I expect to find all duplicate rows and then set isactive to 0 on every duplicate except one row per group.
Any help is appreciated.
Running the query produces this error:
Error report - ORA-30926: unable to get a stable set of rows in the source tables
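ORA-30926 is commonly raised when the MERGE source matches the same target row more than once, which can happen here because the ON clause joins on mobile and full name rather than on a unique row identifier. The Python sketch below only illustrates the intended logic (keep the first row of each duplicate group, deactivate the rest, touching each row at most once). The sample data and the `full_name` helper are hypothetical stand-ins, not the real fn_createfullname.

```python
def full_name(first, middle, last):
    # Hypothetical stand-in for fn_createfullname; the real function may differ.
    return " ".join(part for part in (first, middle, last) if part)

def deactivate_duplicates(customers):
    """Set isactive = 0 on every duplicate except the first row (lowest
    row id) of each group. Returns the ids of the deactivated rows."""
    seen = set()
    deactivated = []
    for row in sorted(customers, key=lambda r: r["rowid"]):
        key = (full_name(row["firstname"], row["middlename"], row["lastname"]),
               row["mobile"], row["branchid"])
        if key in seen:
            row["isactive"] = 0          # duplicate: deactivate it
            deactivated.append(row["rowid"])
        else:
            seen.add(key)                # first occurrence: keep it active
    return deactivated

# Hypothetical sample data
customers = [
    {"rowid": 1, "firstname": "Ann", "middlename": "", "lastname": "Lee",
     "mobile": "555", "branchid": 1, "isactive": 1},
    {"rowid": 2, "firstname": "Ann", "middlename": "", "lastname": "Lee",
     "mobile": "555", "branchid": 1, "isactive": 1},
    {"rowid": 3, "firstname": "Bob", "middlename": "J", "lastname": "Ray",
     "mobile": "777", "branchid": 2, "isactive": 1},
    {"rowid": 4, "firstname": "Ann", "middlename": "", "lastname": "Lee",
     "mobile": "555", "branchid": 1, "isactive": 1},
]
deactivated = deactivate_duplicates(customers)
```

The point of the sketch is that each row is identified by its own row id, so no row can be matched twice. In the MERGE, joining the source to the target on ROWID would play the same role.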

Retrieving the row with the greatest timestamp in questDB

I'm currently running QuestDB 6.1.2 on Linux. How do I get the row with the greatest timestamp from a table? I have tried the following on a test table with around 5 million rows:
1. select * from table where cast(timestamp as symbol) in (select cast(max(timestamp) as symbol) from table );
2. select * from table inner join (select max(timestamp) mm from table ) on timestamp >= mm
3. select * from table where timestamp = max(timestamp)
4. select * from table where timestamp = (select max(timestamp) from table )
Query 1 is correct but runs in ~5s. Query 2 is correct and runs in ~500ms, but it looks unnecessarily verbose. Query 3 compiles but returns an empty table. Query 4 is rejected as incorrect syntax, although that is how SQL usually does it.
select * from table limit -1 works. QuestDB returns rows sorted by timestamp by default, and limit -1 takes the last row, which is therefore the row with the greatest timestamp. To be explicit about ordering by timestamp, select * from table order by timestamp limit -1 could be used instead. This query runs in around 300-400ms on the same table.
As a side note, the third query, using timestamp = max(timestamp), doesn't work because QuestDB does not yet support subqueries in WHERE (as of QuestDB 6.1.2).

How to find missing dates in BigQuery table using sql

How can I get a list of missing dates from a BigQuery table? For example, a table (test_table) is populated every day by some job, but on a few days the job fails and data isn't written to the table.
Use Case:
We have a table (test_table) which is populated every day by some job (a scheduled query or a cloud function). Sometimes the job fails and data isn't available for those particular dates in the table.
How can I find those dates without scrolling through thousands of rows?
The query below returns a list of dates and ad_ids where data wasn't uploaded (null).
Note: I used MIN(DATE) and MAX(DATE) because I knew the missing dates fell between my boundary dates. To be safe, you can also specify starting_date and ending_date explicitly, in case data hasn't been populated at all in the last few days.
WITH Date_Range AS
-- anchor for date range
(
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM `project_name.dataset_name.test_table`
),
day_series AS
-- anchor to get all the dates within the range
(
SELECT *
FROM Date_Range
,UNNEST(GENERATE_TIMESTAMP_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
-- other options depending on your date type ( mine was timestamp)
-- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
day_series.days,
original_table.ad_id
FROM day_series
-- do a left join on the source table
LEFT JOIN `project_name.dataset_name.test_table` AS original_table ON (original_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE original_table.ad_id IS NULL
GROUP BY 1,2
ORDER BY 1
The final output will look like this:
As an alternative solution, you can try the following query to get the desired output:
with t as (select 1 as id, cast ('2020-12-25' as timestamp) Days union all
select 1 as id, cast ('2020-12-26' as timestamp) Days union all
select 1 as id, cast ('2020-12-27' as timestamp) Days union all
select 1 as id, cast ('2020-12-31' as timestamp) Days union all
select 1 as id, cast ('2021-01-01' as timestamp) Days union all
select 1 as id, cast ('2021-01-04' as timestamp) Days)
SELECT *
FROM (
select TIMESTAMP_ADD(Days, INTERVAL 1 DAY) AS Days, TIMESTAMP_SUB(next_days, INTERVAL 1 DAY) AS next_days from (
select t.Days,
(case when lag(Days) over (partition by id order by Days) = Days
then NULL
when lag(Days) over (partition by id order by Days) is null
then Null
else Lead(Days) over (partition by id order by Days)
end) as next_days
from t) where next_days is not null
and Days <> TIMESTAMP_SUB(next_days, INTERVAL 1 DAY)),
UNNEST(GENERATE_TIMESTAMP_ARRAY(Days, next_days, INTERVAL 1 DAY)) AS days
The output will be:
I used the code above but had to restructure it for BigQuery:
-- anchor for date range - this will select dates from the source table (i.e. the table your query runs off of)
WITH day_series AS(
SELECT *
FROM (
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM --enter source table here--
---OPTIONAL: filter for a specific date range
WHERE DATE BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
),UNNEST(GENERATE_DATE_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) as days
-- other options depending on your date type ( mine was timestamp)
-- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
day_series.days,
output_table.date
FROM day_series
-- do a left join on the output table (i.e. the table you are searching the missing dates for)
LEFT JOIN `project_name.dataset_name.test_table` AS output_table
ON (output_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE output_table.date IS NULL
GROUP BY 1,2
ORDER BY 1
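The queries above all follow the same idea: generate every day in the range, then keep the days that have no matching row. A minimal Python sketch of that idea follows, using the sample dates from the alternate solution. The function name and parameters are illustrative and not part of any BigQuery API.

```python
from datetime import date, timedelta

def missing_dates(present, start=None, end=None):
    """Return every date between start and end (inclusive) that is not in
    `present`. The boundaries default to the min and max of `present`,
    like MIN(DATE) and MAX(DATE) in the queries above."""
    present = set(present)
    start = start or min(present)
    end = end or max(present)
    all_days = (start + timedelta(days=i)
                for i in range((end - start).days + 1))
    return [d for d in all_days if d not in present]

# Sample dates taken from the alternate solution above
loaded = [
    date(2020, 12, 25), date(2020, 12, 26), date(2020, 12, 27),
    date(2020, 12, 31), date(2021, 1, 1), date(2021, 1, 4),
]
gaps = missing_dates(loaded)
print(gaps)
```

Passing an explicit `end` covers the case mentioned earlier, where the most recent days are missing entirely and so cannot be detected from MAX(DATE) alone.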

Why bigquery can't handle a query processing 4TB data?

I'm trying to run this query
SELECT
id AS id,
ARRAY_AGG(DISTINCT users_ids) AS users_ids,
MAX(date) AS date
FROM
users,
UNNEST(users_ids) AS users_ids
WHERE
users_ids != " 1111"
AND users_ids != " 2222"
GROUP BY
id;
The users table is a split table with an id column, a users_ids column (comma-separated) and a date column.
I run the query on more than 4TB of data and it fails with a resources error:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations.
Any idea why?
Sample data:
id   userids      date
1    2,3,4        1-10-20
2    4,5,6        1-10-20
1    7,8,4        2-10-20
The final result I'm trying to reach:
id   userids      date
1    2,3,4,7,8    2-10-20
2    4,5,6        1-10-20
Execution details:
It's constantly repartitioning. I would guess that you're trying to cram too much into the aggregation part. Just remove the aggregation part; I don't even think you have to cross join here.
Use a subquery instead of this cross join + aggregation combo.
Edit: I just realized that you want to aggregate the arrays, but with distinct values.
WITH t AS (
SELECT
id AS id,
-- concatenate the per-row arrays after dropping the excluded values
ARRAY_CONCAT_AGG(ARRAY(SELECT DISTINCT uids FROM UNNEST(users_ids) AS uids WHERE
uids != " 1111" AND uids != " 2222")) AS users_ids,
MAX(date) AS date
FROM
users
GROUP BY id
)
SELECT
id,
-- ARRAY_CONCAT_AGG has no DISTINCT, so deduplicate in a second step
ARRAY(SELECT DISTINCT uids FROM UNNEST(users_ids) AS uids) AS users_ids,
date
FROM t
This is just a draft, and I assume id is unique, but it should be something along those lines. Grouping by arrays is not possible.
ARRAY_CONCAT_AGG() has no DISTINCT, so the deduplication happens in a second step.
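The two-step idea (concatenate the arrays per id, then deduplicate) can be sketched in Python with the sample rows from the question. The dates are assumed to be day-month-year, and the function name is illustrative only.

```python
from datetime import date

# Sample rows from the question; dates assumed to be day-month-year
rows = [
    (1, ["2", "3", "4"], date(2020, 10, 1)),
    (2, ["4", "5", "6"], date(2020, 10, 1)),
    (1, ["7", "8", "4"], date(2020, 10, 2)),
]

def merge_users(rows, excluded=frozenset()):
    """Per id: concatenate the arrays, drop excluded and repeated values
    (keeping first-seen order), and keep the latest date."""
    merged = {}
    for id_, users_ids, day in rows:
        entry = merged.setdefault(id_, {"users_ids": {}, "date": day})
        for uid in users_ids:
            if uid not in excluded:
                entry["users_ids"][uid] = None   # dict keys act as an ordered set
        entry["date"] = max(entry["date"], day)
    return {id_: (list(e["users_ids"]), e["date"]) for id_, e in merged.items()}

result = merge_users(rows, excluded={" 1111", " 2222"})
print(result)
```

This only illustrates the expected result. It says nothing about shuffle cost in BigQuery, which depends on how the engine distributes the aggregation.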

Filter sql query results using static values from "Select list" page item in Oracle APEX 5.0

I'm looking for a way to filter and display records using static values from a "Select list" page item. I have created a bar chart in APEX 5.0 using the following query:
select to_char(to_date(time_stamp,'YYYY-MM-DD-HH24:MI:SS'),'YYYY-MM-DD-HH24:MI:SS') as label, col2 as value from table1 where :P5_NEW_1 = col1 ;
The time_stamp column of table1 is of datatype varchar2 in my database and contains the date values in the format YYYY-MM-DD-HH24:MI:SS
e.g., values like below are stored in the time_stamp column
2015-08-26-10:17:15
2015-08-26-13:17:15
2015-09-17-12:45:54
2015-09-17-14:12:32
2015-10-06-10:01:42
2015-10-06-11:01:28
2015-10-06-05:01:28
and so on...
I have added a "Select list" item named "interval" on my form that contains a pre-populated list of values like 1hr, 6hrs.
Now I want to modify the above query so that the following happens:
For the value 1hr selected from the drop-down list, the query should check the time_stamp column and display the records for the last hour
(i.e., records which fall in the range to_char(sysdate - 1/24 ,'YYYY-MM-DD-HH24:MI:SS') to to_char(sysdate ,'YYYY-MM-DD-HH24:MI:SS') ).
For the value 6hrs selected from the drop-down list, the query should display the records for the last 6 hours
(i.e., records which fall in the range to_char(sysdate - 6/24 ,'YYYY-MM-DD-HH24:MI:SS') to to_char(sysdate ,'YYYY-MM-DD-HH24:MI:SS') ).
How do I add this condition to my current SQL query, using the static values from the select list page item?
Hi you can try this:
SELECT to_char(to_date(time_stamp,'YYYY-MM-DD-HH24:MI:SS'),'YYYY-MM-DD-HH24:MI:SS') AS label, col2 AS VALUE
FROM table1
WHERE :P5_NEW_1 = col1 AND
to_date(time_stamp,'YYYY-MM-DD-HH24:MI:SS') BETWEEN
(SYSDATE - (to_number(:SELECT_LIST_ID,'99')/24)) AND SYSDATE;
Just make sure that each static value has a return value equal to the number of hours.
For example, if the display value is 1HR, then the return value should be 1.
:SELECT_LIST_ID is the ID of the select list item that holds the static values for the interval.
Hope this helps.
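The filter in that answer amounts to parsing each time_stamp string as a date and keeping the rows between "now minus N hours" and "now". A minimal Python sketch of the same comparison, using a few of the sample values from the question, is below. The fixed `now` value and the function name are assumptions for illustration, with `hours` playing the role of the select list's return value.

```python
from datetime import datetime, timedelta

# Python equivalent of the Oracle mask 'YYYY-MM-DD-HH24:MI:SS'
FORMAT = "%Y-%m-%d-%H:%M:%S"

def within_last_hours(time_stamps, hours, now):
    """Keep the time_stamp strings that fall between now - hours and now,
    comparing parsed datetimes as the BETWEEN clause does."""
    cutoff = now - timedelta(hours=hours)
    return [ts for ts in time_stamps
            if cutoff <= datetime.strptime(ts, FORMAT) <= now]

# Sample values from the question; `now` is fixed so the result is repeatable
stamps = ["2015-10-06-10:01:42", "2015-10-06-11:01:28", "2015-10-06-05:01:28"]
now = datetime(2015, 10, 6, 11, 30, 0)

last_hour = within_last_hours(stamps, 1, now)
last_six = within_last_hours(stamps, 6, now)
```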