How to pluck the oldest child record from a join in BigQuery SQL where ordered by date and limit 1

I'm analyzing Shopify schema data with BigQuery in GCP. I'm trying to join the Shopify Customers data to its child Orders data so that I can find the very first order for each customer created in a certain time window, i.e. something like ORDER BY created_at ASC LIMIT 1 per customer. I can't figure out how to do this:
Select
customers.created_at as customer,
customers.orders_count,
orders.created_at,
orders.order_number
FROM `shopxxx.shopify.customers` as customers
Join `shopxxx.shopify.orders` as orders
on customer = orders.customer
WHERE
customers.orders_count != 0
AND customers.created_at > "2018-09-12 00:00:00 UTC" and customers.created_at < "2019-09-12 23:59:59 UTC"
and orders.source_name != 'web'
order by customers.created_at desc

Below is for BigQuery Standard SQL
#standardSQL
SELECT
  customers.customer_id,
  customers.created_at AS customer,
  customers.orders_count,
  ARRAY_AGG(STRUCT(orders.created_at AS created_at, orders.order_number AS order_number) ORDER BY orders.created_at LIMIT 1)[OFFSET(0)].*
FROM `shopxxx.shopify.customers` AS customers
JOIN `shopxxx.shopify.orders` AS orders
  ON customers.customer_id = orders.customer_id
WHERE customers.orders_count != 0
  AND customers.created_at > "2018-09-12 00:00:00 UTC" AND customers.created_at < "2019-09-12 23:59:59 UTC"
  AND orders.source_name != 'web'
GROUP BY 1, 2, 3
ORDER BY customers.created_at DESC
Or, as a different approach:
#standardSQL
SELECT
  customers.customer_id,
  customers.created_at AS customer,
  customers.orders_count,
  orders.created_at AS created_at,
  orders.order_number AS order_number
FROM `shopxxx.shopify.customers` AS customers
JOIN (
  SELECT customer_id, ARRAY_AGG(STRUCT(created_at, order_number) ORDER BY created_at LIMIT 1)[OFFSET(0)].*
  FROM `shopxxx.shopify.orders`
  WHERE source_name != 'web'
  GROUP BY customer_id
) AS orders
  ON customers.customer_id = orders.customer_id
WHERE customers.orders_count != 0
  AND customers.created_at > "2018-09-12 00:00:00 UTC" AND customers.created_at < "2019-09-12 23:59:59 UTC"
ORDER BY customers.created_at DESC
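To make the logic of the second query easier to follow, here is a minimal Python sketch of the same idea: reduce the orders to the earliest non-web order per customer, then join that to the filtered customers. The sample rows and the function name first_orders are hypothetical, invented only for illustration; this is not BigQuery code.

```python
from datetime import datetime

# Hypothetical sample rows standing in for the customers and orders tables.
customers = [
    {"customer_id": 1, "created_at": datetime(2018, 10, 1), "orders_count": 2},
    {"customer_id": 2, "created_at": datetime(2019, 1, 5), "orders_count": 1},
    {"customer_id": 3, "created_at": datetime(2017, 1, 1), "orders_count": 1},  # outside the window
]
orders = [
    {"customer_id": 1, "created_at": datetime(2018, 10, 3), "order_number": 1002, "source_name": "pos"},
    {"customer_id": 1, "created_at": datetime(2018, 10, 2), "order_number": 1001, "source_name": "pos"},
    {"customer_id": 1, "created_at": datetime(2018, 10, 1), "order_number": 1000, "source_name": "web"},
    {"customer_id": 2, "created_at": datetime(2019, 1, 6), "order_number": 1003, "source_name": "pos"},
    {"customer_id": 3, "created_at": datetime(2017, 1, 2), "order_number": 900, "source_name": "pos"},
]


def first_orders(customers, orders, start, end):
    """Return (customer_id, customer_created_at, order_created_at, order_number) rows."""
    # Step 1: earliest non-web order per customer
    # (the role played by ARRAY_AGG(... ORDER BY created_at LIMIT 1) in the subquery).
    earliest = {}
    for order in orders:
        if order["source_name"] == "web":
            continue
        current = earliest.get(order["customer_id"])
        if current is None or order["created_at"] < current["created_at"]:
            earliest[order["customer_id"]] = order

    # Step 2: inner join to customers that pass the WHERE filters.
    rows = []
    for customer in customers:
        order = earliest.get(customer["customer_id"])
        if order is None or customer["orders_count"] == 0:
            continue
        if start < customer["created_at"] < end:
            rows.append((customer["customer_id"], customer["created_at"],
                         order["created_at"], order["order_number"]))

    # Step 3: newest customers first, like ORDER BY customers.created_at DESC.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


result = first_orders(customers, orders,
                      datetime(2018, 9, 12), datetime(2019, 9, 12, 23, 59, 59))
for row in result:
    print(row)
```

Each customer appears at most once in the result because the orders are collapsed to one row per customer before the join.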

Related

DAX Cycle Time at column level which can then be AVERAGE'd up (Parameter is not the correct type)

I'm trying to calculate cycle times at row level, based on two slicers, which I can then use in a measure to get averages, e.g. per client.
I have two pre-populated tables
CycleStartDateOptions
Id StartDateName
1 [Incident Date]
2 [Examination Date]
CycleEndDateOptions
Id StartDateName
3 [Resolution Date]
4 [Closed Date]
Then a measure on CycleStartDateOptions
SelectedCycleStartDate = SELECTEDVALUE(CycleStartDateOptions[Id])
.. and CycleEndDateOptions
SelectedCycleEndDate = SELECTEDVALUE(CycleEndDateOptions[Id])
Example MainTable Data:
ClientID [IncidentId] [Incident Date] [Examination Date] [Resolution Date] [Closed Date]
C0001 I00001 2020-01-01 2020-02-01 2020-03-01 2020-04-01
C0001 I00002 2020-01-01 2020-03-01 2020-04-01 2020-05-01
C0002 I00003 2021-01-02 2021-02-02 2021-03-02 2020-04-02
C0002 I00004 2021-01-02 2021-03-02 2021-04-02 2020-05-02
I have two measures on MainTable:
CycleStartDateSwitch = SWITCH(CycleStartDateOptions[SelectedCycleStartDate],
1,MIN('MainTable'[Incident Date]),
2,MIN('MainTable'[Examination Date]))
CycleEndDateSwitch = SWITCH(CycleEndDateOptions[SelectedCycleEndDate],
3,MIN('MainTable'[Resolution Date]),
4,MIN('MainTable'[Closed Date]))
Finally to generate my dynamic cycle time, I have this measure:
CycleTimeDynamic = DATEDIFF('MainTable'[CycleStartDateSwitch], 'MainTable'[CycleEndDateSwitch], DAY)
Phew! This works when shown in a table and for each incident I see a value for Cycle Time between the user selected start and end dates e.g. Incident Date to Resolution Date.
The problem now is that the above is done as measures, but I need to be able to filter on date ranges (e.g. Incident Date between 2020-01-01 and 2022-01-01) and get average cycle times per client. When I try to create a new measure, it underlines my CycleTimeDynamic measure and says "Parameter is not the correct type".
I tried doing the CycleStartDateSwitch and CycleEndDateSwitch as Columns using e.g.
CycleStartDateSwitch = SWITCH(CycleStartDateOptions[SelectedCycleStartDate],
1,'MainTable'[Incident Date],
2,'MainTable'[Examination Date])
.. but it won't surface any data.
Any ideas what I'm doing wrong? This would be easy in SQL and I'm sure it's doable in DAX but I'm struggling with what should be Columns and Measures.
What I want to be able to see:
ClientID [Incident Count] [Average Cycle Time]
C0001 5 123
C0002 8 345
Any help much appreciated!
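As a reference for the intended result only (not a DAX solution), the calculation described above can be sketched in Python: pick the start and end columns from the two selected option Ids, compute the day difference per row, optionally filter on Incident Date, then average per client. The dictionary keys, the function name average_cycle_time and the filter parameters are assumptions made for this sketch; the rows are copied from the example MainTable data as given.

```python
from collections import defaultdict
from datetime import date

# Option Ids from the two slicer tables, mapped to hypothetical column keys.
START_OPTIONS = {1: "incident", 2: "examination"}
END_OPTIONS = {3: "resolution", 4: "closed"}

# Rows copied from the example MainTable data (dates as given in the question).
rows = [
    {"client": "C0001", "incident": date(2020, 1, 1), "examination": date(2020, 2, 1),
     "resolution": date(2020, 3, 1), "closed": date(2020, 4, 1)},
    {"client": "C0001", "incident": date(2020, 1, 1), "examination": date(2020, 3, 1),
     "resolution": date(2020, 4, 1), "closed": date(2020, 5, 1)},
    {"client": "C0002", "incident": date(2021, 1, 2), "examination": date(2021, 2, 2),
     "resolution": date(2021, 3, 2), "closed": date(2020, 4, 2)},
    {"client": "C0002", "incident": date(2021, 1, 2), "examination": date(2021, 3, 2),
     "resolution": date(2021, 4, 2), "closed": date(2020, 5, 2)},
]


def average_cycle_time(rows, start_id, end_id, incident_after=None, incident_before=None):
    """Return {client: (incident_count, average_cycle_days)} for the selected date pair."""
    start_col = START_OPTIONS[start_id]
    end_col = END_OPTIONS[end_id]
    per_client = defaultdict(list)
    for row in rows:
        # Optional date-range filter on Incident Date (exclusive bounds).
        if incident_after is not None and not row["incident"] > incident_after:
            continue
        if incident_before is not None and not row["incident"] < incident_before:
            continue
        # Row-level cycle time in days between the selected start and end dates.
        per_client[row["client"]].append((row[end_col] - row[start_col]).days)
    return {client: (len(days), sum(days) / len(days)) for client, days in per_client.items()}


summary = average_cycle_time(rows, start_id=1, end_id=3)
print(summary)  # {'C0001': (2, 75.5), 'C0002': (2, 74.5)}
```

The point of the sketch is that the cycle time is computed once per row and only then averaged, which is the behaviour the question is trying to reproduce.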

Update column based on lead function in oracle

I have valid_from and valid_to columns in a table.
I need to update the valid_to column based on the next row of valid_from column.
Please help me on this.
Current
Runid Valid_from valid_to
1 1-Jan-21 10-Jan-21
1 11-Jan-21 11-Jan-21
1 15-Jan-21 17-Jan-21
1 18-Jan-21 1-Jan-00
Desired
Runid Valid_from valid_to
1 1-Jan-21 11-Jan-21
1 11-Jan-21 15-Jan-21
1 15-Jan-21 18-Jan-21
1 18-Jan-21 1-Jan-00
You can use the LEAD function and correlate on the ROWID pseudo-column:
UPDATE table_name t
SET valid_to = (SELECT next_from
                FROM (
                  SELECT LEAD(valid_from, 1, valid_to)
                           OVER (PARTITION BY runid ORDER BY valid_from) AS next_from
                  FROM table_name
                ) x
                WHERE x.ROWID = t.ROWID
               )
Which, for the sample data:
CREATE TABLE table_name (Runid, Valid_from, valid_to) AS
SELECT 1, DATE '2021-01-01', DATE '2021-01-10' FROM DUAL UNION ALL
SELECT 1, DATE '2021-01-11', DATE '2021-01-11' FROM DUAL UNION ALL
SELECT 1, DATE '2021-01-15', DATE '2021-01-17' FROM DUAL UNION ALL
SELECT 1, DATE '2021-01-18', DATE '4000-01-01' FROM DUAL;
Updates the table to:
RUNID VALID_FROM VALID_TO
1 2021-01-01 00:00:00 2021-01-11 00:00:00
1 2021-01-11 00:00:00 2021-01-15 00:00:00
1 2021-01-15 00:00:00 2021-01-18 00:00:00
1 2021-01-18 00:00:00 4000-01-01 00:00:00
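For readers less familiar with analytic functions, the effect of LEAD(valid_from, 1, valid_to) can be sketched in plain Python: within each runid, ordered by valid_from, every row takes the next row's valid_from, and the last row keeps its own valid_to as the default. The function name apply_lead_update is hypothetical, and the rows are modelled on the sample data with 4000-01-01 as the open-ended date.

```python
from datetime import date

# Rows modelled on the sample data (the last valid_to is the open-ended date).
rows = [
    {"runid": 1, "valid_from": date(2021, 1, 1), "valid_to": date(2021, 1, 10)},
    {"runid": 1, "valid_from": date(2021, 1, 11), "valid_to": date(2021, 1, 11)},
    {"runid": 1, "valid_from": date(2021, 1, 15), "valid_to": date(2021, 1, 17)},
    {"runid": 1, "valid_from": date(2021, 1, 18), "valid_to": date(4000, 1, 1)},
]


def apply_lead_update(rows):
    """Set valid_to to the next row's valid_from within each runid (in place)."""
    # PARTITION BY runid
    partitions = {}
    for row in rows:
        partitions.setdefault(row["runid"], []).append(row)

    for group in partitions.values():
        # ORDER BY valid_from
        group.sort(key=lambda row: row["valid_from"])
        # LEAD(valid_from, 1, valid_to): next row's valid_from, or the row's
        # own valid_to when there is no next row. Computed before any update.
        next_from = [
            group[i + 1]["valid_from"] if i + 1 < len(group) else group[i]["valid_to"]
            for i in range(len(group))
        ]
        for row, value in zip(group, next_from):
            row["valid_to"] = value
    return rows


apply_lead_update(rows)
for row in rows:
    print(row["runid"], row["valid_from"], row["valid_to"])
```

The third argument of LEAD is what keeps the final row unchanged instead of setting it to NULL.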

Show (count) and filter active employees based on date slicer

What I am trying to achieve is the following.
I have a (flat) table of employees containing first name, last name, hire date and termination date.
I would like to filter the table and also count the active employees based on a date slicer, for example:
January 2022 - 200 total employees - 190 active employees - 10 terminated employees
The issue I am facing is that if an employee was terminated on 01/10/2022 and I choose the date 01/09/2022, that employee should appear in the list because he was active on that date.
I'm coming from this topic https://community.powerbi.com/t5/Desktop/List-of-active-employees-on-a-date/td-p/1609370 but I do not have an Active/Terminated status, just dates.
Any thoughts?
If you want to count employee status as of the end of the selected period, use measures:
CountOfActive =
VAR _selectedDate = MAX('Calendar'[Date])
RETURN
    CALCULATE(
        COUNTROWS('employee'),
        FILTER(
            ALL(employee),
            employee[Hire Date] <= VALUE(_selectedDate)
                && (employee[Termination Date] >= VALUE(_selectedDate) || ISBLANK(employee[Termination Date]))
        )
    )
CountOfTerminated =
VAR _selectedDate = MAX('Calendar'[Date])
RETURN
    CALCULATE(
        COUNTROWS('employee'),
        FILTER(
            ALL(employee),
            employee[Hire Date] <= VALUE(_selectedDate)
                && (employee[Termination Date] < VALUE(_selectedDate))
        )
    )
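The date conditions in these two measures can be checked outside Power BI with a small Python sketch. The employee names and dates are hypothetical, and the dates in the question are assumed to be MM/DD/YYYY; the comparisons mirror the measures, so an employee terminated on the selected date itself still counts as active.

```python
from datetime import date

# Hypothetical employees: (name, hire_date, termination_date or None).
employees = [
    ("Ann", date(2021, 5, 1), None),
    ("Bob", date(2021, 6, 1), date(2022, 1, 10)),
    ("Cy", date(2021, 7, 1), date(2022, 1, 5)),
    ("Dee", date(2022, 2, 1), None),
]


def count_active(employees, selected_date):
    """Hired on or before the date, and not terminated before it."""
    return sum(
        1
        for _, hired, terminated in employees
        if hired <= selected_date and (terminated is None or terminated >= selected_date)
    )


def count_terminated(employees, selected_date):
    """Hired on or before the date, and terminated strictly before it."""
    return sum(
        1
        for _, hired, terminated in employees
        if hired <= selected_date and terminated is not None and terminated < selected_date
    )


selected = date(2022, 1, 9)
print(count_active(employees, selected))      # 2 (Ann, and Bob who leaves on Jan 10)
print(count_terminated(employees, selected))  # 1 (Cy)
```

Bob is the case from the question: terminated on January 10 but still counted as active when January 9 is selected.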

How to get the date diff between consecutive rows (the first row's serviceperiodto date and the second row's serviceperiodfrom date)

my data looks like this
userid completedat serviceperiodfrom serviceperiodto
00002cd9-94eb-4c06-a2c4-75253fd541b9 2020-11-25T14:20:04.293Z 2020-11-25T14:20:04.200Z 2021-02-25T14:20:04.200Z
00002cd9-94eb-4c06-a2c4-75253fd541b9 2021-03-21T10:27:34.842Z 2021-03-21T10:27:34.800Z 2022-03-21T10:27:34.800Z
00002cd9-94eb-4c06-a2c4-75253fd541b9 2020-07-24T11:22:12.410Z 2020-07-24T11:22:12.300Z 2020-10-24T11:22:12.300Z
I need the date diff between the serviceperiodto date of the first row and the serviceperiodfrom date of the second row, repeated for as many rows as each userid has.
I tried joining the tables using subqueries and tried to create a pivot table, but neither seems to work for me. Please help.
You can use lag/lead to access previous/next item:
WITH dataset AS (
    SELECT *
    FROM (
        VALUES
            (1, from_iso8601_timestamp('2020-11-25T14:20:04.200Z'), from_iso8601_timestamp('2021-02-25T14:20:04.200Z')),
            (1, from_iso8601_timestamp('2021-03-21T10:27:34.800Z'), from_iso8601_timestamp('2022-03-21T10:27:34.800Z')),
            (1, from_iso8601_timestamp('2020-07-24T11:22:12.300Z'), from_iso8601_timestamp('2020-10-24T11:22:12.300Z'))
    ) AS t (userid, serviceperiodfrom, serviceperiodto)
)
SELECT date_diff(
    'hour',
    serviceperiodto,
    lead(serviceperiodfrom, 1) OVER (PARTITION BY userid ORDER BY serviceperiodfrom))
FROM dataset
Output:
_col0
770
572
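The same computation can be reproduced step by step in Python, which may help when checking the numbers: sort each user's periods by serviceperiodfrom, then subtract each serviceperiodto from the following row's serviceperiodfrom and express the gap in whole hours. The function names parse and gaps_in_hours are made up for this sketch; the timestamps are the ones used in the query.

```python
from datetime import datetime, timedelta

# (userid, serviceperiodfrom, serviceperiodto), same values as in the query.
ROWS = [
    (1, "2020-11-25T14:20:04.200Z", "2021-02-25T14:20:04.200Z"),
    (1, "2021-03-21T10:27:34.800Z", "2022-03-21T10:27:34.800Z"),
    (1, "2020-07-24T11:22:12.300Z", "2020-10-24T11:22:12.300Z"),
]


def parse(ts):
    """Parse an ISO 8601 UTC timestamp with milliseconds and a trailing Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_in_hours(rows):
    """Return {userid: [gap to next period in whole hours, ..., None]}."""
    by_user = {}
    for userid, start, end in rows:
        by_user.setdefault(userid, []).append((parse(start), parse(end)))

    result = {}
    for userid, periods in by_user.items():
        periods.sort()  # ORDER BY serviceperiodfrom
        gaps = []
        for i, (_, period_end) in enumerate(periods):
            if i + 1 < len(periods):
                next_start = periods[i + 1][0]  # lead(serviceperiodfrom, 1)
                # Whole hours; floor division matches truncation for non-negative gaps.
                gaps.append((next_start - period_end) // timedelta(hours=1))
            else:
                gaps.append(None)  # no following row, like NULL from lead()
        result[userid] = gaps
    return result


print(gaps_in_hours(ROWS))  # {1: [770, 572, None]}
```

The last period of each user has no following row, so its gap is None, which corresponds to the NULL that lead() returns in SQL.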
 

How to write a measure in power bi to count items based on their status as of selected date

I cannot think of a solution for this:
I have a table with the following columns:
Employee id, hired date, fired date, current_status.
current_status is a conditional column showing status of employee for a current date i.e. if fired date is blank the status is "working", else the status is "fired".
I want to create a measure, which would show count of employees by their status as of end of selected month.
E.g. an employee's current status may be "fired", but as of the end of June 2019 his status was "working", so in the context of that earlier date he should be counted as working.
You can use an iterator function, such as FILTER or SUMX, to decide the status of each employee based on the dates.
Here is an example to count the number of working employees at the ending date of the currently displayed period.
Working Employees Count =
COUNTROWS (
    FILTER (
        Employee,
        Employee[Hired Date] <= MAX ( 'Calendar'[Date] ) &&
        (
            ISBLANK ( Employee[Fired Date] ) ||
            Employee[Fired Date] > MAX ( 'Calendar'[Date] )
        )
    )
)
The answer provided above by Kosuke Sakai is incorrect. This measure would not produce any rows for dates where employee status did not change.
For example, if we have this kind of data with just a single employee:
Jan 10 : Active
Jan 12 : Suspended
Jan 14 : Terminated
it would produce 0 for the active employee count on Jan 11, while the correct result would be 1, since the only employee we have has been "Active" since Jan 10.
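The behaviour this comment asks for is an "as of" lookup over a status change log: the status on a given day is the most recent change on or before that day. A minimal Python sketch of that lookup follows; the year 2022 and the function name status_as_of are assumptions, since the example gives only month and day.

```python
from bisect import bisect_right
from datetime import date

# Status change log for a single employee, sorted by date (year is assumed).
history = [
    (date(2022, 1, 10), "Active"),
    (date(2022, 1, 12), "Suspended"),
    (date(2022, 1, 14), "Terminated"),
]


def status_as_of(history, as_of):
    """Return the latest status whose change date is <= as_of, or None."""
    change_dates = [changed_on for changed_on, _ in history]
    # Index of the first change strictly after as_of.
    position = bisect_right(change_dates, as_of)
    if position == 0:
        return None  # no status recorded yet on that date
    return history[position - 1][1]


print(status_as_of(history, date(2022, 1, 11)))  # Active
print(status_as_of(history, date(2022, 1, 13)))  # Suspended
```

Counting employees per status for a date then amounts to applying this lookup to each employee's log and tallying the results.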