How to add current session time to each event in BigQuery? - google-cloud-platform

I've got some event data, reproduced in the sample query below.
I want to add a column that contains the start time of the session that each event occurred in.
The session_start_time column is based on the session_start event.
I've tried using partitions in analytic functions, but to do so I would need values that are already the same in every row of a session, and if I had those I would have solved my problem.
I've also tried FIRST_VALUE with a window function, but I haven't managed to pull only the rows where event_name is "session_start", because I can't see a way to filter inside a window function.
How can I achieve this using Standard SQL on BigQuery?
Below is a sample query that includes the sample data:
WITH user_events AS (
  SELECT
    1 AS user_id,
    'session_start' AS event_name,
    0 AS event_time
  UNION ALL SELECT 1, 'video_play', 2
  UNION ALL SELECT 1, 'ecommerce_purchase', 3
  UNION ALL SELECT 1, 'session_start', 100
  UNION ALL SELECT 1, 'video_play', 105
)
SELECT
  user_id,
  event_name,
  event_time
FROM
  user_events
ORDER BY
  event_time

The idea is to first assign a running session number to each event (a COUNTIF of 'session_start' events up to and including the current row), and then take the earliest event_time within each (user_id, session) partition:
#standardSQL
WITH user_events AS (
  SELECT 1 AS user_id, 'session_start' AS event_name, 0 AS event_time UNION ALL
  SELECT 1, 'video_play', 2 UNION ALL
  SELECT 1, 'ecommerce_purchase', 3 UNION ALL
  SELECT 1, 'session_start', 100 UNION ALL
  SELECT 1, 'video_play', 105
)
SELECT
  user_id,
  event_name,
  event_time,
  -- the earliest event in each session is the session_start event
  MIN(event_time) OVER(PARTITION BY user_id, session) AS session_start_time
FROM (
  SELECT
    user_id,
    event_name,
    event_time,
    -- running count of session_start events = session number
    COUNTIF(event_name = 'session_start') OVER(PARTITION BY user_id ORDER BY event_time) AS session
  FROM user_events
)
ORDER BY event_time
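An alternative sketch that avoids the nested query: carry the most recent session_start time forward with LAST_VALUE and IGNORE NULLS (this assumes the same user_events data as above):
#standardSQL
SELECT
  user_id,
  event_name,
  event_time,
  -- most recent non-NULL session_start time up to and including the current row
  LAST_VALUE(IF(event_name = 'session_start', event_time, NULL) IGNORE NULLS)
    OVER (PARTITION BY user_id ORDER BY event_time) AS session_start_time
FROM user_events
ORDER BY event_time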

Related

Why does my query not work in Oracle APEX?

This is my query, but when I run it in Oracle APEX it gives me the following error:
delete from
  (select ename, e1.store_id, e1.sal as highest_sal
   from employees e1
   inner join
     (select store_id, max(sal) as sal
      from employees
      group by store_id
     ) e2
   on e1.store_id = e2.store_id
   and e1.sal = e2.sal
   order by store_id) s
where rowid not in
  (select min(rowid) from s
   group by highest_sal);
The output is:
ORA-00942: table or view does not exist
ORA-06512: at "SYS.WWV_DBMS_SQL_APEX_210200", line 673
ORA-06512: at "SYS.DBMS_SYS_SQL", line 1658
ORA-06512: at "SYS.WWV_DBMS_SQL_APEX_210200", line 659
ORA-06512: at "APEX_210200.WWV_FLOW_DYNAMIC_EXEC", line 1829
4. (select store_id,max(sal) as sal
5. from employees
6. group by store_id
7. ) e2
8. on e1.store_id=e2.store_id
When I run the code in parentheses (the inline view aliased s) on its own, it runs without any problems, but when it is placed inside this delete statement it gives an error.
Update: My goal is to first group the data by store_id and get the maximum sal in each group, then join that back to the main table itself where sal and store_id match, and display the name; the resulting table is called s. Then I want to remove the duplicate rows from that table (the ones that share the same sal): I group by highest_sal, select the lowest rowid in each group, and remove the rowids that are not in that subquery. The result is the non-duplicates. (This is a trick to remove duplicate rows.)
The immediate cause of ORA-00942 is that s is only an alias for the inline view; it is not a table or view in its own right, so the subquery select min(rowid) from s cannot reference it.
You appear to want to delete all rows with the highest sal for each store_id grouping except for the row in each group with the lowest ROWID.
You can do that with analytic functions. Either:
DELETE FROM employees
WHERE ROWID IN (
  SELECT ROWID
  FROM (
    SELECT RANK() OVER (PARTITION BY store_id ORDER BY sal DESC) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sal DESC, ROWID ASC) AS rn
    FROM employees
  )
  WHERE rnk = 1
  AND rn > 1
);
or:
DELETE FROM employees
WHERE ROWID IN (
  SELECT ROWID
  FROM (
    SELECT sal,
           MAX(sal) OVER (PARTITION BY store_id) AS max_sal,
           MIN(ROWID) KEEP (DENSE_RANK LAST ORDER BY sal)
             OVER (PARTITION BY store_id) AS min_rid_for_max_sal
    FROM employees
  )
  WHERE sal = max_sal
  AND ROWID != min_rid_for_max_sal
);
Or, from Oracle 12, with row limiting clauses in a correlated sub-query:
DELETE FROM employees e
WHERE ROWID IN (
  SELECT ROWID
  FROM (
    SELECT sal
    FROM employees x
    WHERE e.store_id = x.store_id
    ORDER BY sal DESC
    FETCH FIRST ROW WITH TIES
  )
  ORDER BY ROWID
  OFFSET 1 ROW FETCH NEXT 100 PERCENT ROWS ONLY
);
Which, for the sample data:
CREATE TABLE employees (ename, store_id, sal) AS
SELECT 'A', 1, 1 FROM DUAL UNION ALL
SELECT 'B', 1, 2 FROM DUAL UNION ALL
SELECT 'C', 1, 3 FROM DUAL UNION ALL
SELECT 'D', 2, 1 FROM DUAL UNION ALL
SELECT 'E', 2, 2 FROM DUAL UNION ALL
SELECT 'F', 2, 2 FROM DUAL;
All three delete the 'F' row (the duplicated highest-sal row in store 2).
db<>fiddle here
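As a sanity check before running the destructive DELETE, the same logic can be run as a plain SELECT to preview the rows that would be removed (a sketch mirroring the first query's conditions):
SELECT ename, store_id, sal
FROM (
  SELECT e.*,
         RANK() OVER (PARTITION BY store_id ORDER BY sal DESC) AS rnk,
         ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sal DESC, ROWID ASC) AS rn
  FROM employees e
)
WHERE rnk = 1
AND rn > 1;
-- with the sample data above, this returns only the F row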

How to do an Update from a Select in Azure?

I need to update a second table with the results of this query:
SELECT Tag, battery, Wearlevel, SensorTime
FROM (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY TAG ORDER BY SensorTime DESC) AS rn
  FROM [dbo].[TELE] m
) m2
WHERE m2.rn = 1;
But I had a hard time getting the SET clause right without messing it up. I want to end up with a table that holds the latest row for each TAG, without duplicates.
The code below may be what you want:
UPDATE Table_A
SET
  Table_A.Primarykey = 'ss' + Table_B.Primarykey,
  Table_A.AddTime = 'jason_' + Table_B.AddTime
FROM
  Test AS Table_A
  INNER JOIN UsersInfo AS Table_B
    ON Table_A.id = Table_B.id
WHERE
  Table_A.Primarykey = '559713e6-0d85-4fe7-87a4-e9ceb22abdcf'
For more details, you can also refer to the posts below:
1. How do I UPDATE from a SELECT in SQL Server?
2. How to UPDATE from SELECT in SQL Server
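Applying the same pattern to the query in the question, a sketch might look like this (the target table [dbo].[TELE_LATEST] and its columns are assumed for illustration, not taken from the original post):
UPDATE latest
SET
  latest.battery = src.battery,
  latest.Wearlevel = src.Wearlevel,
  latest.SensorTime = src.SensorTime
FROM [dbo].[TELE_LATEST] AS latest  -- assumed target table holding one row per Tag
INNER JOIN (
  -- latest reading per Tag, same ROW_NUMBER trick as the question's SELECT
  SELECT Tag, battery, Wearlevel, SensorTime,
         ROW_NUMBER() OVER (PARTITION BY Tag ORDER BY SensorTime DESC) AS rn
  FROM [dbo].[TELE]
) AS src ON latest.Tag = src.Tag
WHERE src.rn = 1;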

How to find missing dates in a BigQuery table using SQL

How to get a list of missing dates from a BigQuery table? For example, a table (test_table) is populated every day by some job, but on a few days the job fails and data isn't written to the table.
Use Case:
We have a table (test_table) which is populated every day by some job (a scheduled query or Cloud Function). Sometimes the job fails and data isn't available for those particular dates in the table.
How can we find those dates rather than scrolling through thousands of rows?
The query below will return a list of dates and ad_ids where data wasn't uploaded (NULL).
Note: I have used MIN(Date) and MAX(Date) as the boundaries because I knew dates were missing in between them. To be safe, you can also specify starting_date and ending_date explicitly, in case data hasn't been populated at all in the first or last few days.
WITH Date_Range AS (
  -- anchor for the date range
  SELECT MIN(DATE) AS starting_date,
         MAX(DATE) AS ending_date
  FROM `project_name.dataset_name.test_table`
),
day_series AS (
  -- anchor to get all the dates within the range
  SELECT *
  FROM Date_Range,
       UNNEST(GENERATE_TIMESTAMP_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
  -- other options depending on your date type (mine was TIMESTAMP):
  -- GENERATE_DATETIME_ARRAY or GENERATE_DATE_ARRAY
)
SELECT
  day_series.days,
  original_table.ad_id
FROM day_series
-- do a left join on the source table
LEFT JOIN `project_name.dataset_name.test_table` AS original_table
  ON original_table.date = day_series.days
-- keep only the records where data is not available, in other words empty/missing
WHERE original_table.ad_id IS NULL
GROUP BY 1, 2
ORDER BY 1
The final output will look like this:
As an alternative, you can try the following query to get the desired output:
with t as (
  select 1 as id, cast('2020-12-25' as timestamp) Days union all
  select 1 as id, cast('2020-12-26' as timestamp) Days union all
  select 1 as id, cast('2020-12-27' as timestamp) Days union all
  select 1 as id, cast('2020-12-31' as timestamp) Days union all
  select 1 as id, cast('2021-01-01' as timestamp) Days union all
  select 1 as id, cast('2021-01-04' as timestamp) Days
)
SELECT *
FROM (
  select TIMESTAMP_ADD(Days, INTERVAL 1 DAY) AS Days,
         TIMESTAMP_SUB(next_days, INTERVAL 1 DAY) AS next_days
  from (
    select t.Days,
           (case when lag(Days) over (partition by id order by Days) = Days
                   then null
                 when lag(Days) over (partition by id order by Days) is null
                   then null
                 else lead(Days) over (partition by id order by Days)
            end) as next_days
    from t
  )
  where next_days is not null
    and Days <> TIMESTAMP_SUB(next_days, INTERVAL 1 DAY)
),
UNNEST(GENERATE_TIMESTAMP_ARRAY(Days, next_days, INTERVAL 1 DAY)) AS days
The output will be:
I used the code above but had to restructure it for BigQuery:
-- anchor for the date range: selects the boundary dates from the source table
-- (i.e. the table your query runs off of)
WITH day_series AS (
  SELECT *
  FROM (
    SELECT MIN(DATE) AS starting_date,
           MAX(DATE) AS ending_date
    FROM --enter source table here--
    -- OPTIONAL: filter for a specific date range
    WHERE DATE BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
  ),
  UNNEST(GENERATE_DATE_ARRAY(starting_date, ending_date, INTERVAL 1 DAY)) AS days
  -- other options depending on your date type (mine was TIMESTAMP):
  -- GENERATE_DATETIME_ARRAY or GENERATE_TIMESTAMP_ARRAY
)
SELECT
  day_series.days,
  output_table.date
FROM day_series
-- do a left join on the output table (i.e. the table you are searching the missing dates for)
LEFT JOIN `project_name.dataset_name.test_table` AS output_table
  ON output_table.date = day_series.days
-- keep only the records where data is not available, in other words empty/missing
WHERE output_table.date IS NULL
GROUP BY 1, 2
ORDER BY 1
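For the simple case where the expected date range is known up front, a more compact sketch is an anti-join against a generated calendar (assuming the same test_table with a TIMESTAMP column named date; the boundary dates here are illustrative):
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2020-12-01', '2021-01-31')) AS day
LEFT JOIN (
  SELECT DISTINCT DATE(date) AS d  -- collapse timestamps to calendar days
  FROM `project_name.dataset_name.test_table`
) existing ON existing.d = day
WHERE existing.d IS NULL
ORDER BY day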

Building a table from an existing table

I have a SQL query which generates a table, which in turn populates a custom visual; this works as intended. However, I now want to use this table to populate a new table, to avoid calling another SQL script.
The original SQL creates a table such as this:
Now I want to use this information to populate another table. If I were using the original SQL query as the starting point, I would write:
SELECT
  2 AS 'Level',
  'Warning' AS 'Text',
  [Heading] AS 'Department'
FROM
  #t
WHERE
  [Inventory Days] > 4
UNION ALL
SELECT
  1,
  'CodeRedAlert',
  [Heading]
FROM
  #t
WHERE
  [Capacity Warning] > 3
Which would output the following:
Level  Text          Department
2      Warning       Section 1
2      Warning       Section 2
2      Warning       Section 3
1      CodeRedAlert  Section 2
This could then be used to populate the table visual within Power BI, which would show icons for the warnings and the code red alert.
Whilst this is achievable within SQL, considering I have the data in a table within Power BI, is there a way of building this new table within the confines of Power BI using DAX?
Cheers for any help provided.
I think you can get to where you want to be with something like this:
Table = UNION(
    SELECTCOLUMNS(
        FILTER(Table1, [Inventory Days] > 4),
        "Level", 2,
        "Text", "Warning",
        "Department", 'Table1'[Heading]
    ),
    SELECTCOLUMNS(
        FILTER(Table1, [Capacity Warning] > 3),
        "Level", 1,
        "Text", "CodeRedAlert",
        "Department", 'Table1'[Heading]
    )
)
This uses the SELECTCOLUMNS function to pull data out of the original table and set the constant fields. I've replaced the WHERE clause from the SQL example with a FILTER (note the second branch filters on [Capacity Warning], matching the SQL). The two different sets are then stitched together with UNION.
Hope it helps

Choose random column value between rows with same ID

I'm a beginner with SQL and I have to write a saved query in Informatica Cloud for a connection to a SQL Server database.
I have a table where the rows with the same formId have the same columns except "possibleSalesman", which is a text column:
formId, email, possibleSalesman
1, email1, user1
1, email1, user2
1, email1, user4
2, email2, user2
3, email3, user3
3, email3, user1
What I need is to get one row for each formId, with "possibleSalesman" picked randomly.
For example:
1, email1, user4
2, email2, user2
3, email3, user3
I found kind of a similar question but the solution wouldn't help me because there are a few restrictions in Informatica:
Only SELECT statements
Can't use asterisk to select all columns
Can't use conversion functions
Can't use COUNT function
If someone could help me I would be very grateful!
SELECT
  FormId,
  Email,
  possibleSalesMan
FROM (
  SELECT
    FormId,
    Email,
    possibleSalesMan,
    ROW_NUMBER() OVER (PARTITION BY FormId ORDER BY NEWID()) AS RowNumber
  FROM TableName
) t
WHERE t.RowNumber = 1
In SQL Server 2008+ you can use the ROW_NUMBER() window function ordered by NEWID() to put the rows of each FormId in a random order, and then keep only the rows where ROW_NUMBER() = 1.
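To try the approach outside Informatica first, here is a self-contained sketch with the question's sample rows inlined as a CTE (only the SELECT at the bottom would go into the saved query, with TableName swapped for the real table):
WITH TableName (FormId, Email, possibleSalesMan) AS (
  SELECT 1, 'email1', 'user1' UNION ALL
  SELECT 1, 'email1', 'user2' UNION ALL
  SELECT 1, 'email1', 'user4' UNION ALL
  SELECT 2, 'email2', 'user2' UNION ALL
  SELECT 3, 'email3', 'user3' UNION ALL
  SELECT 3, 'email3', 'user1'
)
SELECT FormId, Email, possibleSalesMan
FROM (
  SELECT FormId, Email, possibleSalesMan,
         -- NEWID() gives each row a random sort key within its FormId
         ROW_NUMBER() OVER (PARTITION BY FormId ORDER BY NEWID()) AS RowNumber
  FROM TableName
) t
WHERE t.RowNumber = 1;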