BigQuery: how to count events grouped by week? - google-cloud-platform

I just started working with BigQuery. The data comes from Firebase, and I noticed that I get one table per day, for example gara-e78a5.analytics_247657392.events_20221231, gara-e78a5.analytics_247657392.events_20221230, etc.
Each row comes with an event_date in this format: 20221231
I want to count the number of people landing on our page each week, but I don't know how to group them by week.
I started with something like this, but I don't know how to group it by week:
SELECT count(event_name) FROM app-xxxx.analytics_247657392.events_* where event_name = 'page_download_view' group by
Thanks in advance for your help

Based on @Ronak's answer, I found the solution.
SELECT week_of_year, SUM(nb_download) AS nb_download_per_week FROM (
-- one row per day: the week that day falls in and that day's event count (no DISTINCT, so two days with the same count are both kept)
SELECT EXTRACT(WEEK FROM PARSE_DATE('%Y%m%d', event_date)) AS week_of_year, COUNT(event_name) AS nb_download FROM `tabllle-e78a5.analytics_XXXXX.events_*` WHERE event_name = 'landing_event_download_apk' GROUP BY event_date
) GROUP BY week_of_year
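To check the grouping logic outside BigQuery, here is a minimal Python sketch, not the query itself. It assumes event_date strings in the YYYYMMDD format from the question and groups by ISO year plus ISO week (closer to ISOWEEK than to WEEK), so that weeks from different years are not merged. The function name count_events_per_iso_week is made up for illustration.

```python
from collections import Counter
from datetime import datetime


def count_events_per_iso_week(event_dates):
    """Count events per (ISO year, ISO week) from 'YYYYMMDD' strings."""
    counts = Counter()
    for raw in event_dates:
        iso = datetime.strptime(raw, "%Y%m%d").date().isocalendar()
        # iso[0] is the ISO year, iso[1] the ISO week number
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = ["20221231", "20221230", "20230101", "20230102"]
    print(count_events_per_iso_week(sample))
```

Note that 2023-01-01 is a Sunday and therefore still belongs to ISO week 52 of 2022, which is why keeping the year next to the week number matters.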

You can use the WEEK (or ISOWEEK) date part with EXTRACT.
WEEK: Returns the week number of the date in the range [0, 53]
More: https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions
Formats - https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#format_timestamp
This should work
select EXTRACT(ISOWEEK FROM(CAST(PARSE_DATE('%Y%m%d', <column>) as TIMESTAMP))) as week_of_year from <table>

Related

I'm trying to get Last month to date in Power BI

I have a Power BI dashboard tracking several metrics since the beginning of last month. Some of the comparisons I make are MTD vs last MTD counts of metrics like total users, number of posts and connections made.
MTD (June) and LMTD (May) were working well last month (June), but since we moved to a new month (July) the numbers are off.
Here are my measures:
MTD_Users = CALCULATE(COUNTROWS('reporting profiles'), FILTER('reporting profile', MONTH('reporting profile'[date_created])=MONTH(TODAY())))
LMTD_USERS = CALCULATE(COUNTROWS('reporting profiles'), FILTER('reporting profile', MONTH('reporting profile'[date_created])=MONTH(TODAY())-1))
Since July 2nd, these measures have not been displaying the correct figures for MTD (July 1st) and LMTD (June 1st).
Any advice/assistance will be highly appreciated
You need to get rid of the context filter, which means using the ALL or REMOVEFILTERS function:
https://dax.guide/removefilters/
https://dax.guide/all/
LMTD_USERS = CALCULATE(COUNTROWS('reporting profiles'), FILTER(ALL('reporting profile'[date_created]), MONTH('reporting profile'[date_created])=MONTH(TODAY())-1))
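One more thing to keep in mind: the measures above compare month numbers only, so they ignore the year, and MONTH(TODAY())-1 evaluates to 0 in January. As a rough illustration of the date windows involved (plain Python, not DAX), the sketch below assumes that "to date" means up to the same day of the month, which the question does not state. The function name mtd_and_lmtd_ranges is hypothetical.

```python
import calendar
from datetime import date


def mtd_and_lmtd_ranges(today):
    """Return ((mtd_start, mtd_end), (lmtd_start, lmtd_end)) as inclusive dates."""
    mtd_start = today.replace(day=1)
    # Step back one month, rolling over the year boundary in January.
    if today.month == 1:
        prev_year, prev_month = today.year - 1, 12
    else:
        prev_year, prev_month = today.year, today.month - 1
    # Clamp the day so that e.g. March 31 maps to the last day of February.
    last_day_prev = calendar.monthrange(prev_year, prev_month)[1]
    lmtd_start = date(prev_year, prev_month, 1)
    lmtd_end = date(prev_year, prev_month, min(today.day, last_day_prev))
    return (mtd_start, today), (lmtd_start, lmtd_end)


if __name__ == "__main__":
    print(mtd_and_lmtd_ranges(date(2023, 7, 2)))
```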

SSAS Tabular/Analysis Services many-to-many for multiple currencies input - multiple currencies output

I am trying to create on-the-fly currency conversion from many input currencies (InvoicesHeaders rows have different currencies, so each row has an amount and the currency code for that amount) to many output currencies (each affiliate wants to see figures in its own currency).
Therefore I end up with a many-to-many join between the Invoice table and the currency table. To join them, I create in SQL a concatenated field with the day and the currency code.
Then (reusing a tutorial from the internet) I create a calculation doing a lookup from the Invoice to the rate.
Amount adj:=SUMX(Invoices,Invoices[TotalInvoiceAmount]/LOOKUPVALUE(ExchangeRatesPerDay[Rate],ExchangeRatesPerDay[ToCurrencyConcatenatedday112],Invoices[CurrencyCodeConcatenateInvoiceDate112]))
However, when I try to use this measure in Excel (filtering on one currency at a time, of course), I get an error message saying that many rows were passed but only one was expected.
From the error message, it looks like the lookup is getting multiple values, which is strange because in Excel I am filtering on one currency, so for each combination of day + currency code there should be only one row. I checked the SQL using this query:
with cte as (
SELECT [RateTypeName]
,[FromCurrency]
,[ToCurrency]
,[StartDate]
,[Rate]
,[EndDate]
,[ConversionFactor]
,[RateTypeDescription]
,[dday]
,[dday112]
,[ToCurrencyConcatenatedday112]
,[FromCurrencyConcatenatedday112]
, count(*) over (partition by [ToCurrencyConcatenatedday112],FromCurrency ) as co
FROM [stg].[ExchangeRatesPerDay]
)
select * from cte where co>1
And it doesn't return any records.
I would appreciate any ideas you may have.
Regards
Vincent
I don't understand why my answer was deleted. Anyway, I am posting it again: I found this website https://www.kasperonbi.com/currency-conversion-in-dax-for-power-bi-and-ssas/ that provides an answer. I have been using this logic in production for over a month and it works great.
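One detail that may be worth double-checking: the check query in the question counts duplicates per (ToCurrencyConcatenatedday112, FromCurrency), while the LOOKUPVALUE call searches on ToCurrencyConcatenatedday112 alone. The Python sketch below uses made-up rows and rates, and a hypothetical helper duplicated_keys, to show how a table can pass the first check and still be ambiguous for the second. Whether this is what happened in the original model is an assumption, not something the post confirms.

```python
from collections import Counter

# Hypothetical rate rows: two source currencies converting to EUR on the same day.
RATES = [
    {"from_currency": "USD", "to_key": "EUR20240105", "rate": 0.91},
    {"from_currency": "GBP", "to_key": "EUR20240105", "rate": 1.16},
]


def duplicated_keys(rows, key_fields):
    """Return the key tuples that occur more than once for the given fields."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Same grouping as the SQL check: no duplicates are reported.
    print(duplicated_keys(RATES, ["to_key", "from_currency"]))
    # Grouping on the lookup key alone: the key is ambiguous.
    print(duplicated_keys(RATES, ["to_key"]))
```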

Stream Analytics Aggregation Window

I need help/advice on how to ignore old events when performing aggregation over an extended window. I have sales data streaming into Event Hub.
Event Hub is used as the input stream. I need to produce two metrics:
- 30 sec aggregation (tumbling)
- Whole-day aggregated sales value, i.e. from gate open
Gate open time is variable (dynamic), so I read a reference dataset from blob storage and join the GateOpen datetime to the sales stream.
The 30 sec aggregation over the tumbling window works fine.
Since the gate open time is variable, I am currently using a 12-hour hopping window with a 30 sec hop and trying to limit the events being aggregated with EventProcessDatetime > GateOpen logic.
SELECT
Dateadd(ss,-30,System.Timestamp ) AS TimeSliceUTCStart
, System.Timestamp AS TimeSliceUTCEnd
, p.Section AS Section
, SUM(CASE WHEN p.Classification = 'Retail'
AND p.ActivityDateTime > p.GateOpen THEN p.[sales_amt_gross] ELSE 0 END) AS SaleTotalRetail
FROM FilteredBase p
GROUP BY
p.Section
, HoppingWindow(Duration(Hour, 12), hop(second, 30),Offset(millisecond, -1))
Problem: I am getting sales aggregated from the previous day/time slice.
Overall, the outcome I am trying to achieve is simple. The store could be open for 5, 8, 10 or at most 12 hours. We want to know sales live, for each section, as the day progresses. Any advice or tips will be much appreciated.
Intuitively the query looks good, but what happens under the covers is that Azure Stream Analytics uses the reference data file that was valid at the time of each time window. So when it sees an event from the previous day, it uses the reference data present at that time (which may make the comparison p.ActivityDateTime > p.GateOpen true for the previous opening time).
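To make that concrete, here is a small Python sketch (not Stream Analytics code) with two made-up events, each carrying the gate-open time that was valid when the event occurred. Comparing each event with its own gate-open time lets yesterday's sale through, while comparing with the current gate-open time excludes it. All names and values are hypothetical.

```python
from datetime import datetime

# Hypothetical events: (section, activity_time, gate_open_valid_at_event_time, amount)
EVENTS = [
    ("A", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 9, 0), 100.0),  # previous day
    ("A", datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 9, 0), 50.0),   # current day
]


def total_with_per_event_gate(events):
    """Each event is compared with the gate-open time joined at its own time."""
    return sum(amount for _, ts, gate, amount in events if ts > gate)


def total_since_current_gate(events, current_gate_open):
    """Every event is compared with today's gate-open time only."""
    return sum(amount for _, ts, _, amount in events if ts > current_gate_open)


if __name__ == "__main__":
    print(total_with_per_event_gate(EVENTS))
    print(total_since_current_gate(EVENTS, datetime(2024, 1, 2, 9, 0)))
```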
I modified the query as follows (assuming you have one open event per day per section). Let me know if it works for you. If it doesn't, can you send some sample data so I can modify the query accordingly? We will investigate how to make these queries easier to write.
WITH thirdtysecReporting AS
(
SELECT
p.Section Section,
DATETIMEFROMPARTS(DATEPART(year, System.Timestamp), DATEPART(month, System.Timestamp), DATEPART(day, System.Timestamp), 0, 0, 0, 0) as date,
System.Timestamp Windowend,
SUM(p.sales_amt_gross) thirtysecSales
FROM input TIMESTAMP BY p.ActivityDateTime
GROUP BY TumblingWindow(second, 30), p.Section
)
,hopping AS
(
SELECT
Section,
System.Timestamp HopEnd,
date,
SUM(thirtysecSales) SumSales
FROM thirdtysecReporting
GROUP BY HoppingWindow(second, 86400, 30), Section, date -- Hopping on 24 hours, reported every 30 second
)
,filtered as -- This step ignores data from the previous day
(
SELECT
Section,
HopEnd,
date,
SUMQt = CASE
WHEN DAY(HopEnd) = DAY(date) OR DATEPART(hour, HopEnd) = DATEPART(hour, date) THEN SumSales
ELSE 0
END
FROM hopping
)
SELECT Section, -- Final query
HopEnd,
MAX(SUMQt) AS SumQt
FROM filtered
GROUP BY TumblingWindow(hour, 1), Section, hopend
Thanks,
JS - Azure Stream Analytics

Analyzing Twitter data with Hive, regex extract

I am trying to find the most popular hashtags of July. So far I am able to select tweets from July, or display the most popular hashtags, but I haven't succeeded in putting the two together. I am thinking about creating an intermediate table with the July tweets and then displaying the popular hashtags, but I don't know how. Can you help me? What about a two-level select (select a from select b from table)?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far, I did this, which is pretty much what I want, but is there any other way to achieve it?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
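Note that the pattern "(Tue Jul)*" matches the literal prefix "Tue Jul", so it only keeps tweets whose created_at starts with a Tuesday in July. The Python sketch below shows the same filter-then-count logic with a pattern that accepts any weekday. It assumes created_at looks like "Tue Jul 02 10:00:00 +0000 2013" and that tweets are dicts shaped like the table's entities.hashtags field. The function name top_hashtags is made up.

```python
import re
from collections import Counter

# Three-letter weekday, a space, then the three-letter month captured in group 1.
CREATED_AT_MONTH = re.compile(r"^[A-Za-z]{3} ([A-Za-z]{3}) ")


def top_hashtags(tweets, month="Jul", limit=15):
    """Count lower-cased hashtags of tweets created in the given month."""
    counts = Counter()
    for tweet in tweets:
        match = CREATED_AT_MONTH.match(tweet["created_at"])
        if match is None or match.group(1) != month:
            continue
        for tag in tweet["entities"]["hashtags"]:
            counts[tag["text"].lower()] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        {"created_at": "Tue Jul 02 10:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "Hive"}, {"text": "hive"}]}},
        {"created_at": "Wed Jul 03 11:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "HIVE"}, {"text": "Pig"}]}},
        {"created_at": "Thu Aug 01 12:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "hive"}]}},
    ]
    print(top_hashtags(sample))
```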
EDIT:
OK, so if you want, you can also do it with a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you populate it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the query becomes as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
Pros and cons of the second method:
You need to refresh the table if you want accurate results, so it is not suited to one-off queries; but if you need to run multiple queries against the current state of the data, this method is better.
Don't forget that copying the data is a costly operation! So know when to use it :)

Sorting events to show this week, month etc

I have a list of events, but I'm struggling to work out how to show specific date ranges in the index view.
I would like to list events for today, this week, this month, etc.
I'm new to Rails, so I've tried to use this site and come up with the following, which works for today's events.
@events_today = Event.find(:all, :conditions => ["date between ? and ?", Date.today, Date.tomorrow])
But I'm not sure how to set the page to automatically update and show only this week's and this month's events.
Your basic query should do something like this:
Event.where(date: date_range)
Now before calling this query you can set the date range variable. If you only want this week:
date_range = Date.today.beginning_of_week..Date.today
Event.where(date: date_range)
Now there are all sorts of things you can do. You can select a start and end date or a custom period using either a form or a dropdown select. In this case date_range is set based on your params. You could also always use one predefined period.
If you want to work with several date range periods it could be nice to have a last_week, last_month, etc. scope in your model (or concern). Or you could simply define date_range constants in your initializers.
As I understand it, you want to show all the events of a week or month when the 'Week' or 'Month' tab is clicked in the view.
When the user clicks month or week, simply send 'month' or 'week' in the params (assuming params[:events_in] holds 'week' or 'month'),
then check that param:
def get_events_in_week_or_month
  if params[:events_in] == 'week'
    start_date_of_time_period = Date.today.beginning_of_week
    end_date_of_time_period = Date.today.end_of_week
  else
    start_date_of_time_period = Date.today.beginning_of_month
    end_date_of_time_period = Date.today.end_of_month
  end
  # events in descending order
  @events = Event.where("date between ? and ?", start_date_of_time_period, end_date_of_time_period).order("created_at DESC")
  # events in ascending order (alternative: keep only one of the two assignments)
  @events = Event.where("date between ? and ?", start_date_of_time_period, end_date_of_time_period).order("created_at ASC")
end
To see more methods of the Date class, run the following command in the Rails console:
Date.public_methods