Get Last Record for each day in a Group - amazon-athena

I am trying to get the last/latest record for each day. For 14th Dec there are two records, and I want the 5 PM record since it is the last record for that day. But if you look at the row numbers (rn = 2 and 3), the query does not pick this record and only gives me the point-in-time record from today (15th Dec, as its rn = 1).
ITCFID ControlId ResourceId DateTime rn
P05.01.03 CloudFront.1 AWS:::Acount:1111111111111 12/14/2021 5:00 PM 2
P05.01.03 CloudFront.1 AWS:::Acount:1111111111111 12/14/2021 06:01 AM 3
This is the query I am using:
WITH
Pointintimesecurityfindings AS (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY ResourceId,ControlId,itcfid ORDER BY DateTime DESC) rn
FROM itcf_final_summary_dashboard
)
SELECT *
FROM Pointintimesecurityfindings
WHERE rn = 1
For a particular ITCFID there can be multiple Control IDs, and each Control ID can have multiple Resource IDs. For a particular itcfid -> unique Control ID -> unique ResourceID, I want to get the latest record for each day.

If you want the latest record per day, you need to partition by day as well, e.g.
SQL Server solution as originally tagged:
ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , DATEPART(year, [DateTime])
    , DATEPART(dayofyear, [DateTime])
    ORDER BY [DateTime] DESC) rn
Presto solution for updated question:
ROW_NUMBER() OVER (PARTITION BY
    ResourceId
    , ControlId
    , itcfid
    , DATE_FORMAT(DateTime, '%Y')
    , DATE_FORMAT(DateTime, '%j')
    ORDER BY DateTime DESC
) rn
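As a runnable sketch of the per-day partitioning idea (using SQLite in place of Athena/Presto; the table and column names here are illustrative, not the asker's real schema):

```python
import sqlite3

# Hypothetical in-memory stand-in for the summary table, to illustrate
# why the day must be part of the PARTITION BY clause.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE findings (
        itcfid TEXT, controlid TEXT, resourceid TEXT, dt TEXT
    )
""")
con.executemany(
    "INSERT INTO findings VALUES (?, ?, ?, ?)",
    [
        ("P05.01.03", "CloudFront.1", "AWS:::Acount:1111111111111", "2021-12-14 06:01"),
        ("P05.01.03", "CloudFront.1", "AWS:::Acount:1111111111111", "2021-12-14 17:00"),
        ("P05.01.03", "CloudFront.1", "AWS:::Acount:1111111111111", "2021-12-15 09:00"),
    ],
)

# Adding date(dt) to the PARTITION BY restarts the numbering every day,
# so rn = 1 picks the latest record per (resource, control, itcfid, day).
rows = con.execute("""
    SELECT itcfid, controlid, resourceid, dt FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY resourceid, controlid, itcfid, date(dt)
            ORDER BY dt DESC
        ) AS rn
        FROM findings
    )
    WHERE rn = 1
    ORDER BY dt
""").fetchall()
print(rows)  # the 5 PM record for Dec 14, plus the single Dec 15 record
```

With only three partition columns, the Dec 14 rows would get rn = 2 and 3 and be filtered out; with the day included, the 5 PM row becomes rn = 1 for its day and survives.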


Is there any way to find duplicate customer data in the customers table and then update all duplicates except 1 row?

I am trying to find the duplicate rows in my customers table and then update the isactive column to 0 for all duplicates except 1 row of each duplicate group.
Here is my script, using Oracle 19c:
merge into customers c
using (
WITH cte AS (
SELECT DISTINCT ROWID, fn_createfullname(firstname, middlename, lastname) as fullName, mobile, branchid, isactive,
ROW_NUMBER() OVER (PARTITION BY fn_createfullname(firstname, middlename, lastname), mobile, branchid ORDER BY ROWID) AS rn
FROM customers
)
select * from cte
WHERE rn > 1
) tbl
on (tbl.mobile = c.mobile and fn_createfullname(c.firstname, c.middlename, c.lastname) = tbl.fullname)
when matched then update
SET c.isactive = 0
WHERE rn > 1;
I am expecting to find all the duplicate data and then update every duplicate row except a single row per group.
Any help is appreciated.
After running, my query displays this error:
Error report - ORA-30926: unable to get a stable set of rows in the
source tables
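For reference, the keep-one-row pattern the script is aiming for can be sketched in SQLite, whose implicit rowid plays a role similar to Oracle's ROWID. Joining back on rowid pins each duplicate row uniquely, which is also the usual way to avoid ORA-30926 in a MERGE (the schema and data below are illustrative, not the asker's):

```python
import sqlite3

# Illustrative customers table with duplicates on (fullname, mobile, branchid).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customers (fullname TEXT, mobile TEXT, branchid INT, isactive INT)"
)
con.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, 1)",
    [
        ("John Smith", "555-0100", 1),
        ("John Smith", "555-0100", 1),   # duplicate
        ("John Smith", "555-0100", 1),   # duplicate
        ("Alan Walker", "555-0101", 1),
    ],
)

# Number the rows within each duplicate group, then deactivate every row
# except the first; matching on rowid identifies each row unambiguously.
con.execute("""
    UPDATE customers SET isactive = 0
    WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid, ROW_NUMBER() OVER (
                PARTITION BY fullname, mobile, branchid ORDER BY rowid
            ) AS rn
            FROM customers
        )
        WHERE rn > 1
    )
""")
active = con.execute("SELECT COUNT(*) FROM customers WHERE isactive = 1").fetchone()[0]
print(active)  # 2: one John Smith survives, plus Alan Walker
```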

Get min date from another table with filter

I have two tables. There is a relationship between the ID and user id columns. I want to add a calculated column in the User table with the first transaction date of the user, for invoice type 'A'.
When I use:
Calculate(min(Transaction[transaction date])) it works fine, but I need to filter on invoice type. When I use Calculate(min(Transaction[transaction date]),Filter(Transaction,Invoice type="A")) I only get 2021-02-01 and the relationship does not work.
What is the most efficient way to achieve this?
User table:
ID | Name
1  | John Smith
2  | Alan Walker
Transaction table:
user id | transaction date | Invoice type
1       | 2021-02-01       | A
1       | 2021-02-25       | A
1       | 2021-02-25       | B
2       | 2021-03-05       | A
2       | 2021-01-23       | B
Here is the correct DAX code for your calculated column. Just drop the FILTER statement, since that changes your filter context within the CALCULATE to look at all rows where Invoice type = "A", regardless of user id:
Minimum Date =
CALCULATE (
MIN ( 'Transaction'[transaction date] ),
'Transaction'[Invoice type] = "A"
)
Since you need context transition to carry the row context from the User table into the Transaction table, you can alternatively use this sort of filtering statement, where the table you are filtering is also restricted to the current filter context of your row in the User table:
Min Date =
CALCULATE (
MIN ( 'Transaction'[transaction date] ) ,
FILTER (
CALCULATETABLE ( 'Transaction' ) ,
'Transaction'[Invoice type] = "A"
)
)
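Both corrected measures should yield the same per-user result on the sample data. As a sanity check outside DAX, here is the equivalent filtered minimum expressed in SQL (SQLite, with illustrative table names):

```python
import sqlite3

# The per-user "first transaction of type A" that the calculated column
# should produce, computed over the sample tables from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INT, name TEXT)")
con.execute("CREATE TABLE txns (user_id INT, tx_date TEXT, invoice_type TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "John Smith"), (2, "Alan Walker")])
con.executemany(
    "INSERT INTO txns VALUES (?, ?, ?)",
    [
        (1, "2021-02-01", "A"), (1, "2021-02-25", "A"), (1, "2021-02-25", "B"),
        (2, "2021-03-05", "A"), (2, "2021-01-23", "B"),
    ],
)

# Filter to type 'A' first, then take the minimum per user; the filter is
# applied within each user's row, which is what the fixed CALCULATE does.
rows = con.execute("""
    SELECT u.id, u.name,
           (SELECT MIN(t.tx_date) FROM txns t
            WHERE t.user_id = u.id AND t.invoice_type = 'A') AS min_date
    FROM users u ORDER BY u.id
""").fetchall()
print(rows)  # [(1, 'John Smith', '2021-02-01'), (2, 'Alan Walker', '2021-03-05')]
```

Note that user 2 gets 2021-03-05 (their earliest type-A transaction), not the global minimum 2021-02-01 that the broken FILTER version returned.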

How to find missing dates in BigQuery table using sql

How do I get a list of missing dates from a BigQuery table? For example, a table (test_table) is populated every day by some job, but on a few days the job fails and data isn't written into the table.
Use Case:
We have a table (test_table) which is populated every day by a job (a scheduled query or Cloud Function). Sometimes the job fails and data isn't available for particular dates in the table.
How do I find those dates rather than scrolling through thousands of rows?
The query below will return a list of dates and ad_ids where data wasn't uploaded (null).
Note: I have used MIN(Date)/MAX(Date) because I knew dates were missing between my boundary dates. To be safe, you can also specify starting_date and ending_date explicitly, in case data hasn't been populated at all in the last few days.
WITH Date_Range AS
-- anchor for date range
(
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM `project_name.dataset_name.test_table`
),
day_series AS
-- anchor to get all the dates within the range
(
SELECT *
FROM Date_Range
,UNNEST(GENERATE_TIMESTAMP_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
-- other options depending on your date type ( mine was timestamp)
-- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
day_series.days,
original_table.ad_id
FROM day_series
-- do a left join on the source table
LEFT JOIN `project_name.dataset_name.test_table` AS original_table ON (original_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE original_table.ad_id IS NULL
GROUP BY 1,2
ORDER BY 1
The final output will be the list of missing dates, each with a null ad_id.
As an alternate solution, you can try the following query to get the desired output:
with t as (select 1 as id, cast ('2020-12-25' as timestamp) Days union all
select 1 as id, cast ('2020-12-26' as timestamp) Days union all
select 1 as id, cast ('2020-12-27' as timestamp) Days union all
select 1 as id, cast ('2020-12-31' as timestamp) Days union all
select 1 as id, cast ('2021-01-01' as timestamp) Days union all
select 1 as id, cast ('2021-01-04' as timestamp) Days)
SELECT *
FROM (
select TIMESTAMP_ADD(Days, INTERVAL 1 DAY) AS Days, TIMESTAMP_SUB(next_days, INTERVAL 1 DAY) AS next_days from (
select t.Days,
(case when lag(Days) over (partition by id order by Days) = Days
then NULL
when lag(Days) over (partition by id order by Days) is null
then Null
else Lead(Days) over (partition by id order by Days)
end) as next_days
from t) where next_days is not null
and Days <> TIMESTAMP_SUB(next_days, INTERVAL 1 DAY)),
UNNEST(GENERATE_TIMESTAMP_ARRAY(Days, next_days, INTERVAL 1 DAY)) AS days
The output will be the missing dates generated from each gap.
I used the code above but had to restructure it for BigQuery:
-- anchor for date range - this will select dates from the source table (i.e. the table your query runs off of)
WITH day_series AS(
SELECT *
FROM (
SELECT MIN(DATE) as starting_date,
MAX(DATE) AS ending_date
FROM --enter source table here--
---OPTIONAL: filter for a specific date range
WHERE DATE BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
),UNNEST(GENERATE_DATE_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) as days
-- other options depending on your date type ( mine was timestamp)
-- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
day_series.days,
output_table.date
FROM day_series
-- do a left join on the output table (i.e. the table you are searching the missing dates for)
LEFT JOIN `project_name.dataset_name.test_table` AS output_table
ON (output_table.date)= day_series.days
-- I only want the records where data is not available or in other words empty/missing
WHERE output_table.date IS NULL
GROUP BY 1,2
ORDER BY 1
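The generate-a-day-series-then-anti-join approach above can be reproduced compactly in SQLite via a recursive CTE (a portable sketch; table and column names are illustrative):

```python
import sqlite3

# Stand-in for test_table with two gaps in its daily dates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_table (d TEXT)")
con.executemany("INSERT INTO test_table VALUES (?)",
                [("2020-12-25",), ("2020-12-26",), ("2020-12-28",), ("2020-12-31",)])

# Generate every day between the table's min and max dates, then keep only
# the days with no matching row (the anti-join via LEFT JOIN ... IS NULL).
missing = [r[0] for r in con.execute("""
    WITH RECURSIVE day_series(day) AS (
        SELECT (SELECT MIN(d) FROM test_table)
        UNION ALL
        SELECT date(day, '+1 day') FROM day_series
        WHERE day < (SELECT MAX(d) FROM test_table)
    )
    SELECT day FROM day_series
    LEFT JOIN test_table ON test_table.d = day_series.day
    WHERE test_table.d IS NULL
    ORDER BY day
""")]
print(missing)  # ['2020-12-27', '2020-12-29', '2020-12-30']
```

The recursive CTE plays the role of GENERATE_DATE_ARRAY + UNNEST; everything after it is the same left join and null filter as in the BigQuery queries.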

How to calculate working days on ORACLE APEX?

I have 2 tables: employee and time. For the time table, I want to take the id automatically from the employee table, and to subtract each employee's start date of work (from the employee table) from the system date. If the result is 1 year or more, it should be added to the time table automatically.
employee.hire_date date /
time.year number
I want it to do this calculation (time.year = sysdate - employee.hire_date) automatically.
For example, an employee started on 21.01.2019. Today (21.01.2020) it should automatically write '1' in the time table's year column.
Thanks.
Create times as a view on the employees table.
Oracle Setup:
CREATE TABLE employees ( id, start_date ) AS
SELECT 1, DATE '2019-01-21' FROM DUAL UNION ALL
SELECT 2, DATE '2019-06-21' FROM DUAL UNION ALL
SELECT 3, DATE '2015-01-01' FROM DUAL;
Create View:
CREATE VIEW times ( id, years ) AS
SELECT id,
FLOOR( MONTHS_BETWEEN( SYSDATE, start_date ) / 12 )
FROM employees;
Output:
SELECT *
FROM times;
gives:
ID | YEARS
-: | ----:
1 | 1
2 | 0
3 | 5
db<>fiddle here
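The view's FLOOR(MONTHS_BETWEEN(...) / 12) can be mirrored outside Oracle. Here is a minimal Python sketch of "whole years elapsed"; note that Oracle's MONTHS_BETWEEN has its own fractional-month rules, so edge cases around month ends may differ slightly:

```python
from datetime import date

def full_years(start: date, today: date) -> int:
    """Whole years elapsed since start, analogous to FLOOR(MONTHS_BETWEEN(...)/12)."""
    # Count whole calendar months, backing off one if the current month
    # has not yet reached the start's day-of-month.
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1
    return months // 12

# The three employees from the setup, evaluated as of 21.01.2020:
print(full_years(date(2019, 1, 21), date(2020, 1, 21)))  # 1
print(full_years(date(2019, 6, 21), date(2020, 1, 21)))  # 0
print(full_years(date(2015, 1, 1), date(2020, 1, 21)))   # 5
```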

Update with subquery successful but not updating

We have wrong duplicate ids loaded in the table and we need to correct them.
The rule for updating the id is: whenever there is a time difference of more than 30 minutes, the id should be new/unique.
I have written the query to filter that out; however, the update is not happening.
The query below finds the ids to be updated.
For testing I have used a particular id.
select id,
BEFORE_TIME,
TIMESTAMP,
datediff(minute,BEFORE_TIME,TIMESTAMP) time_diff,
row_number() over (PARTITION BY id ORDER BY TIMESTAMP) rowno,
concat(id,to_varchar(rowno)) newid from
(SELECT id,
TIMESTAMP,
LAG(TIMESTAMP_EST) OVER (PARTITION BY visit_id ORDER BY TIMESTAMP) as BEFORE_TIME
FROM table_name t
where id = 'XX1X2375'
order by TIMESTAMP_EST)
where BEFORE_TIME is not NULL and time_diff > 30
order by time_diff desc
;
And I can see the 12 records with the same id and a time difference of more than 30 minutes.
However, when I try to update, the query is successful but nothing is getting updated.
update table_name t
set t.id = c.newid
from
(select id ,
BEFORE_TIME,
TIMESTAMP,
datediff(minute,BEFORE_TIME,TIMESTAMP) time_diff,
row_number() over (PARTITION BY id ORDER BY TIMESTAMP) rowno,
concat(id,to_varchar(rowno)) newid from
(SELECT id,
TIMESTAMP,
LAG(TIMESTAMP) OVER (PARTITION BY visit_id ORDER BY TIMESTAMP) as BEFORE_TIME
FROM table_name t
where id = 'XX1X2375'
order by TIMESTAMP_EST)
where BEFORE_TIME is not NULL and time_diff > 30
order by time_diff desc) c
where t.id = c.id
and t.timestamp = c.BEFORE_TIME
;
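As an illustrative sketch of the 30-minute rule (SQLite, with made-up data; not the asker's schema): mark each row whose gap from the previous row exceeds 30 minutes with LAG, then take a running sum of those markers. This is a slight variant of the ROW_NUMBER approach, since it keeps all rows of one burst under the same new id:

```python
import sqlite3

# Illustrative visits: same id, with one gap longer than 30 minutes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (id TEXT, ts TEXT)")
con.executemany("INSERT INTO visits VALUES (?, ?)", [
    ("XX1X2375", "2021-12-14 09:00:00"),
    ("XX1X2375", "2021-12-14 09:10:00"),   # 10 min gap -> same visit
    ("XX1X2375", "2021-12-14 10:00:00"),   # 50 min gap -> new visit
    ("XX1X2375", "2021-12-14 10:05:00"),   # 5 min gap  -> same visit
])

# is_gap flags rows that start a new burst; the running SUM of is_gap is a
# stable per-burst sequence number that can be appended to the id.
rows = con.execute("""
    SELECT id || '-' || SUM(is_gap) OVER (
               PARTITION BY id ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS newid,
           ts
    FROM (
        SELECT id, ts,
               CASE WHEN (julianday(ts) - julianday(LAG(ts) OVER (
                            PARTITION BY id ORDER BY ts))) * 24 * 60 > 30
                    THEN 1 ELSE 0 END AS is_gap
        FROM visits
    )
    ORDER BY ts
""").fetchall()
print(rows)  # the two 9 AM rows share one new id, the two 10 AM rows another
```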