We have incorrect duplicate ids loaded in the table and we need to correct them.
The rule for updating the id is: whenever there is a time difference of more than 30 minutes, the id should be new/unique.
I have written a query to filter those rows out; however, the update is not happening.
The query below finds the ids to be updated.
For testing I have used a particular id.
select id,
       BEFORE_TIME,
       TIMESTAMP,
       datediff(minute, BEFORE_TIME, TIMESTAMP) time_diff,
       row_number() over (PARTITION BY id ORDER BY TIMESTAMP) rowno,
       concat(id, to_varchar(rowno)) newid
from (SELECT id,
             TIMESTAMP,
             LAG(TIMESTAMP_EST) OVER (PARTITION BY visit_id ORDER BY TIMESTAMP) as BEFORE_TIME
      FROM table_name t
      where id = 'XX1X2375'
      order by TIMESTAMP_EST)
where BEFORE_TIME is not NULL and time_diff > 30
order by time_diff desc
;
And I could see 12 records with the same id and a time difference of more than 30 minutes.
However, when I try to update, the query is successful but nothing gets updated.
update table_name t
set t.id = c.newid
from (select id,
             BEFORE_TIME,
             TIMESTAMP,
             datediff(minute, BEFORE_TIME, TIMESTAMP) time_diff,
             row_number() over (PARTITION BY id ORDER BY TIMESTAMP) rowno,
             concat(id, to_varchar(rowno)) newid
      from (SELECT id,
                   TIMESTAMP,
                   LAG(TIMESTAMP) OVER (PARTITION BY visit_id ORDER BY TIMESTAMP) as BEFORE_TIME
            FROM table_name t
            where id = 'XX1X2375'
            order by TIMESTAMP_EST)
      where BEFORE_TIME is not NULL and time_diff > 30
      order by time_diff desc) c
where t.id = c.id
  and t.timestamp = c.BEFORE_TIME
;
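A likely cause: the SELECT computes BEFORE_TIME from TIMESTAMP_EST while the UPDATE computes it from TIMESTAMP, and the UPDATE's join condition t.timestamp = c.BEFORE_TIME targets the earlier row of each pair rather than the row whose id should change; if the two timestamp columns hold different values, the equality never matches and zero rows update. A minimal sketch of a corrected UPDATE, assuming a single TIMESTAMP column should drive both the gap detection and the join, and reusing the same newid scheme:

update table_name t
set t.id = c.newid
from (select id,
             TIMESTAMP,
             lag(TIMESTAMP) over (partition by id order by TIMESTAMP) as BEFORE_TIME,
             row_number() over (partition by id order by TIMESTAMP) as rowno,
             concat(id, to_varchar(rowno)) as newid   -- same newid scheme as above
      from table_name
      where id = 'XX1X2375') c
where t.id = c.id
  and t.timestamp = c.TIMESTAMP                       -- join on the row being renumbered
  and c.BEFORE_TIME is not null
  and datediff(minute, c.BEFORE_TIME, c.TIMESTAMP) > 30;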
I am trying to get the last/latest record for each day. For 14th Dec there are two records, and I want the 5 PM record since it is the last record for that day; but as you can see from the row_number values (rn = 2 & 3), the query does not pick this record and only gives me the point-in-time record for today (15th Dec, since its rn = 1).
ITCFID     ControlId     ResourceId                   DateTime             rn
P05.01.03  CloudFront.1  AWS:::Acount:1111111111111   12/14/2021 5:00 PM   2
P05.01.03  CloudFront.1  AWS:::Acount:1111111111111   12/14/2021 06:01 AM  3
This is the query I am using:
WITH
Pointintimesecurityfindings AS (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY ResourceId,ControlId,itcfid ORDER BY DateTime DESC) rn
FROM itcf_final_summary_dashboard
)
SELECT *
FROM Pointintimesecurityfindings
WHERE rn = 1
For a particular ITCFID there can be multiple Control IDs, and a Control ID can have multiple Resource IDs. For a particular itcfid -> unique Control ID -> unique ResourceID, I want to get the latest record for that day.
If you want the latest per day, you need to partition by day, e.g.
SQL Server solution as originally tagged:
ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , DATEPART(year, [DateTime])
    , DATEPART(dayofyear, [DateTime])
    ORDER BY [DateTime] DESC) rn
Presto solution for updated question:
ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , DATE_FORMAT(DateTime, '%Y')
    , DATE_FORMAT(DateTime, '%j')
    ORDER BY DateTime DESC
) rn
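Equivalently, you can partition by the calendar date itself instead of year plus day-of-year, which is arguably easier to read (a sketch, assuming DateTime is a datetime/timestamp column):

SQL Server:

ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , CAST([DateTime] AS date)   -- truncates to the calendar day
    ORDER BY [DateTime] DESC) rn

Presto:

ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , date(DateTime)             -- date() casts a timestamp to its calendar day
    ORDER BY DateTime DESC) rn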
I'm trying to dynamically calculate the % Change of the prices in a table, where each price will be compared with the first price (Price in the first record, ordered by date). There is a date filter, so the first record being displayed will change accordingly. Date column values are unique.
Assume the user applies a filter for date BETWEEN '15-APR-2021' AND '30-APR-2021'. Then each row's expected '% Change' is computed against the price of the first record in that range (3218.95, on 15-APR-2021).
For this example, I had to hardcode the starting price in the calculation:
% Change = (('table1'[price]/3218.95) -1)*100
I tried the below, but it doesn't return a static value of 3218.95. Instead, it re-calculates the value at record level rather than at the filtered-table level:
first_date = (MIN(table1[Date]))
first_price = LOOKUPVALUE('table1'[price],table1[Date],'table1'[first_date])
% Change = (('table1'[price]/first_price) -1)*100
I'm new to PowerBI DAX. Logically the SQL would look like so:
SELECT
date,
price,
((
price /
( -- Gets the first price
SELECT price FROM table1
WHERE date IN (SELECT MIN(date) FROM table1 WHERE date BETWEEN '15-APR-2021' AND '30-APR-2021')
)
)-1) * 100 as '% change'
FROM table1
WHERE date BETWEEN '15-APR-2021' AND '30-APR-2021'
If you want to get the first price within the current filter context, you can use the following DAX:
first_price =
VAR first_date = CALCULATE(MIN('table1'[Date]), ALLSELECTED('table1'))
RETURN CALCULATE(MIN('table1'[price]), FILTER(ALLSELECTED('table1'), 'table1'[Date] = first_date))
As for the % Change:
% Change =
VAR curPrice = SELECTEDVALUE('table1'[price])
VAR first_date = CALCULATE(MIN('table1'[Date]), ALLSELECTED('table1'))
VAR first_price = CALCULATE(MIN('table1'[price]), FILTER(ALLSELECTED('table1'), 'table1'[Date] = first_date))
RETURN ((curPrice / first_price) - 1) * 100
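This assumes % Change is a measure evaluated row by row in a visual: ALLSELECTED keeps the user's date-slicer selection while ignoring the visual's own row filter, so first_price stays fixed at the price of the first date in the filtered range. A calculated column would not work here, because calculated columns are computed at data refresh and never respond to slicers.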
How do you get a list of missing dates from a BigQuery table? For example, a table (test_table) is populated every day by some job, but on a few days the job fails and data isn't written into the table.
Use Case:
We have a table (test_table) which is populated every day by some job (a scheduled query or cloud function). Sometimes those jobs fail and data isn't available for particular dates in the table.
How can I find those dates rather than scrolling through thousands of rows?
The query below returns a list of dates and ad_ids where data wasn't uploaded (null).
Note: I have used MIN(Date) and MAX(Date) because I knew the missing dates were in between my boundary dates. To be safe, you can also hardcode starting_date and ending_date, in case data hasn't been populated in the last few days at all.
WITH Date_Range AS
-- anchor for date range
(
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM `project_name.dataset_name.test_table`
),
day_series AS
-- anchor to get all the dates within the range
(
SELECT *
FROM Date_Range
,UNNEST(GENERATE_TIMESTAMP_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
-- other options depending on your date type ( mine was timestamp)
-- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
day_series.days,
original_table.ad_id
FROM day_series
-- do a left join on the source table
LEFT JOIN `project_name.dataset_name.test_table` AS original_table ON (original_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE original_table.ad_id IS NULL
GROUP BY 1,2
ORDER BY 1
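One caveat: the equality join above only matches rows whose timestamps fall exactly at midnight. If the date column carries a time-of-day component, a sketch of the final SELECT that truncates the table side to the day (reusing the day_series CTE above, and assuming date is a TIMESTAMP column):

SELECT
  day_series.days,
  original_table.ad_id
FROM day_series
LEFT JOIN `project_name.dataset_name.test_table` AS original_table
  -- align both sides to midnight before comparing
  ON TIMESTAMP_TRUNC(original_table.date, DAY) = day_series.days
WHERE original_table.ad_id IS NULL
GROUP BY 1, 2
ORDER BY 1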
The final output is the list of missing days, each with a NULL ad_id.
As an alternate solution, you can try the following query to get the desired output:
with t as (select 1 as id, cast('2020-12-25' as timestamp) Days union all
           select 1 as id, cast('2020-12-26' as timestamp) Days union all
           select 1 as id, cast('2020-12-27' as timestamp) Days union all
           select 1 as id, cast('2020-12-31' as timestamp) Days union all
           select 1 as id, cast('2021-01-01' as timestamp) Days union all
           select 1 as id, cast('2021-01-04' as timestamp) Days)
SELECT missing_day
FROM (
  -- shift each gap's boundaries inward: the first missing day is Days + 1,
  -- the last missing day is next_days - 1
  select TIMESTAMP_ADD(Days, INTERVAL 1 DAY) AS Days,
         TIMESTAMP_SUB(next_days, INTERVAL 1 DAY) AS next_days
  from (
    select t.Days,
           (case when lag(Days) over (partition by id order by Days) = Days
                 then null   -- duplicate date, ignore
                 when lag(Days) over (partition by id order by Days) is null
                 then null   -- first row has no predecessor
                 else lead(Days) over (partition by id order by Days)
            end) as next_days
    from t)
  where next_days is not null
    and Days <> TIMESTAMP_SUB(next_days, INTERVAL 1 DAY)   -- keep only real gaps
),
UNNEST(GENERATE_TIMESTAMP_ARRAY(Days, next_days, INTERVAL 1 DAY)) AS missing_day
Output will be:
missing_day
2020-12-28 00:00:00 UTC
2020-12-29 00:00:00 UTC
2020-12-30 00:00:00 UTC
2021-01-02 00:00:00 UTC
2021-01-03 00:00:00 UTC
I used the code above but had to restructure it for BigQuery:
-- anchor for date range - this will select dates from the source table (i.e. the table your query runs off of)
WITH day_series AS(
SELECT *
FROM (
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM --enter source table here--
---OPTIONAL: filter for a specific date range
WHERE DATE BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
),UNNEST(GENERATE_DATE_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) as days
-- other options depending on your date type:
-- GENERATE_TIMESTAMP_ARRAY or GENERATE_DATETIME_ARRAY
)
SELECT
day_series.days,
output_table.date
FROM day_series
-- do a left join on the output table (i.e. the table you are searching the missing dates for)
LEFT JOIN `project_name.dataset_name.test_table` AS output_table
ON (output_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE output_table.date IS NULL
GROUP BY 1,2
ORDER BY 1
The following BigQuery DELETE query fails with a timeout, because it reaches the 6-hour execution-time limit:
DELETE animals A WHERE EXISTS
(SELECT id from pets P WHERE A.id = P.id)
Table animals has ~50,000,000,000 records.
Table pets has ~300,000 records.
Tables are not partitioned.
Edit:
Seems like this query does not give any improvement:
DELETE animals WHERE id IN
(SELECT id from pets)
SELECT id FROM (
  SELECT id, tbl, DENSE_RANK() OVER(PARTITION BY id ORDER BY tbl) AS rk FROM (
    SELECT id, 1 AS tbl FROM animals
    UNION ALL
    SELECT id, 0 AS tbl FROM pets)
) WHERE rk = 1 AND tbl = 1;
This query will give you all the ids from animals which do not exist in pets.
If id is unique in animals, you can use ROW_NUMBER() instead of DENSE_RANK().
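To apply the cleanup, one option (a sketch, assuming a full rewrite of the table is acceptable) is to materialize the surviving rows with an anti-join instead of deleting in place; rewriting even a very large table is often faster than a massive DELETE:

-- Keep only the animals whose id does not appear in pets
-- (an anti-join variant of the DENSE_RANK trick above).
CREATE OR REPLACE TABLE animals AS
SELECT a.*
FROM animals AS a
LEFT JOIN pets AS p
  ON a.id = p.id
WHERE p.id IS NULL;

Note that CREATE OR REPLACE TABLE recreates the table, so options such as partitioning or clustering would need to be re-specified (the tables here are unpartitioned, so that is moot).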
I have an SQL statement like this:
SELECT TOP 100 id,
lastname,
firstname,
address1,
city,
state,
zip
FROM leads
WHERE id > 100
ORDER BY id ASC
Now I want the zip values to be different (no duplicates) across the 100 results, using a single query statement like that.
SELECT TOP 100 l.id,
l.lastname,
l.firstname,
l.address1,
l.city,
l.state,
l.zip
FROM leads l
WHERE l.id = (SELECT MIN(id) FROM leads l2 WHERE l2.zip = l.zip)
ORDER BY l.id ASC
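An alternative sketch using ROW_NUMBER(), keeping the original id > 100 filter and picking the lowest id per zip (assuming SQL Server, as in the question):

SELECT TOP 100 id,
       lastname,
       firstname,
       address1,
       city,
       state,
       zip
FROM (
    -- number the leads within each zip; rn = 1 is the lowest id for that zip
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY zip ORDER BY id) AS rn
    FROM leads
    WHERE id > 100
) AS deduped
WHERE rn = 1
ORDER BY id ASC;

On large tables this tends to be cheaper than the correlated subquery, since leads is scanned once.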