Replace traffic source in raw Google Analytics session data in BigQuery? - google-cloud-platform

Recently we observed that when a user tries to complete a transaction on our website using an iOS device, Apple ends the current session and begins a new one. The difficulty is that if the user came through a paid source or email, the current session ends and a new session starts with apple.com as its traffic source.
For instance:
google->appleid.apple.com
(direct)->appleid.apple.com
email->appleid.apple.com
ios->appleid.apple.com->appleid.apple.com->appleid.apple.com
Since this raw data is loaded into BigQuery, we want to replace appleid.apple.com with the actual traffic source, i.e. google, direct, email, or ios.
Any help with the logic or a function to work around this problem would be appreciated.
This is the code I tried:
WITH DATA AS (
SELECT
PARSE_DATE("%Y%m%d",date) AS Date,
clientId as ClientId,
fullVisitorId AS fullvisitorid,
visitNumber AS visitnumber,
trafficSource.medium as medium,
CONCAT(fullvisitorid,"-",CAST(visitStartTime AS STRING)) AS Session_ID,
trafficsource.source AS Traffic_Source,
MAX((CASE WHEN (hits.eventInfo.eventLabel="complete") THEN 1 ELSE 0 END)) AS ConversionComplete
FROM `project.dataset.ga_sessions_20*`
,UNNEST(hits) AS hits
WHERE totals.visits=1
GROUP BY
1,2,3,4,5,6,7
),
Source_Replace AS (
SELECT
Date AS Date,
IF(Traffic_Source LIKE "%apple.com" ,(CASE WHEN Traffic_Source NOT LIKE "%apple.com%" THEN LAG(Traffic_Source,1) OVER (PARTITION BY ClientId ORDER BY visitnumber ASC)end), Traffic_Source) AS traffic_source_1,
medium AS Medium,
fullvisitorid AS User_ID,
Session_ID AS SessionID,
ConversionComplete AS ConversionComplete
FROM
DATA
)
SELECT
Date AS Date,
traffic_source_1 AS TrafficSource,
Medium AS TrafficMedium,
COUNT(DISTINCT User_ID) AS Users,
COUNT(DISTINCT SessionID) AS Sessions,
SUM(ConversionComplete) AS ConversionComplete
FROM
Source_Replace
GROUP BY
1,2,3
Thanks

Would treating visitStartTime as the key for identifying the session start help? Maybe something like:
source_replaced as (
select *,
min(Traffic_Source) over (
partition by date, clientid, fullvisitorid, visitnumber order by visitStartTime
) as originating_source
from data
)
Then you can aggregate over originating_source. It's difficult to be more specific without seeing a sample of the data.
Hope it helps.
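To make the intended replacement logic concrete, here is a minimal Python sketch of one way to read the problem: for each client, walk the sessions in visit order and, whenever the source ends in apple.com, substitute the most recent non-Apple source seen for that client. The field names (`client_id`, `visit_number`, `source`) and the function name are assumptions made for illustration, not part of the GA export schema or of either post above.

```python
from typing import Iterable

# Assumption: any source ending in "apple.com" is the sign-in redirect.
APPLE_SUFFIX = "apple.com"


def restore_sources(sessions: Iterable[dict]) -> list:
    """Replace apple.com sources with the client's last non-Apple source."""
    last_real = {}  # client_id -> most recent non-Apple source
    restored = []
    # Process each client's sessions in visit order.
    ordered = sorted(sessions, key=lambda s: (s["client_id"], s["visit_number"]))
    for session in ordered:
        source = session["source"]
        if source.endswith(APPLE_SUFFIX):
            # Fall back to the Apple source if nothing earlier is known.
            source = last_real.get(session["client_id"], source)
        else:
            last_real[session["client_id"]] = source
        restored.append({**session, "source": source})
    return restored


if __name__ == "__main__":
    rows = [
        {"client_id": "A", "visit_number": 1, "source": "google"},
        {"client_id": "A", "visit_number": 2, "source": "appleid.apple.com"},
    ]
    for row in restore_sources(rows):
        print(row)
```

In BigQuery the same carry-forward idea could probably be expressed with `LAST_VALUE(IF(source LIKE '%apple.com', NULL, source) IGNORE NULLS) OVER (PARTITION BY clientId ORDER BY visitNumber)`, but that is untested against the asker's data.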

Related

Setting Row level security with multiple columns in Power BI

I am trying to create a filter for a target table that has a country code column. I want to give access to users whose logged-in UPN appears in either AdditionalOwner or OwnerEmail for that country code.
AdditionalOwner holds emails separated by commas. The number of emails in that column is not fixed, and it may also contain values from the OwnerEmail column. Please help me set up RLS for this scenario.
Here is my column structure:
CountryCode | AdditionalOwner | OwnerEmail
AU | test1@test.com,test2@test.com | test2@test.com
Here is the DAX I have used for Owner Email
[Country Code] = LOOKUPVALUE(
UserRoles[CountryCode],
UserRoles[OwnerEmail],UserPrincipalName())
Please help me add RLS for the AdditionalOwner column too.
You may want to try using something like OR, IF, and CONTAINSSTRING in your managed role. I've included an example for you to test below. For transparency, I did not test this but this is where I would start to solve this.
OR (
    UserRoles[CountryCode] =
        IF (
            CONTAINSSTRING ( UserRoles[AdditionalOwner], UserPrincipalName() ),
            UserRoles[CountryCode],
            BLANK ()
        ),
    UserRoles[CountryCode] =
        LOOKUPVALUE (
            UserRoles[CountryCode],
            UserRoles[OwnerEmail], UserPrincipalName()
        )
)
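The access rule itself can be sketched outside DAX. The Python below is only an illustration of the logic, with hypothetical function names: a row is visible when the user's UPN equals OwnerEmail or is one of the comma-separated entries in AdditionalOwner. It also shows why a plain substring test (which is what CONTAINSSTRING performs) can over-match when one address is contained in another.

```python
def can_see_row(upn: str, owner_email: str, additional_owner: str) -> bool:
    """Exact, case-insensitive match against OwnerEmail or any AdditionalOwner entry."""
    user = upn.strip().lower()
    owners = {owner_email.strip().lower()}
    # Split the comma-separated list and ignore empty entries.
    owners.update(
        entry.strip().lower()
        for entry in additional_owner.split(",")
        if entry.strip()
    )
    return user in owners


def substring_match(upn: str, additional_owner: str) -> bool:
    """Substring test, comparable in spirit to CONTAINSSTRING."""
    return upn.strip().lower() in additional_owner.lower()


if __name__ == "__main__":
    additional = "test1@test.com,test2@test.com"
    print(can_see_row("test1@test.com", "test2@test.com", additional))
    # "t1@test.com" is a substring of "test1@test.com", so the substring
    # test grants access while the exact test does not.
    print(substring_match("t1@test.com", additional))
    print(can_see_row("t1@test.com", "test2@test.com", additional))
```

If exact matching matters in the model, one DAX option that might be worth trying is PATHCONTAINS on the list with commas substituted by `|`; that suggestion is untested here.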

Bigquery, how to count events group by week?

I just started working with BigQuery. The data comes from Firebase, and I noticed that I get one table per day, for example gara-e78a5.analytics_247657392.events_20221231, gara-e78a5.analytics_247657392.events_20221230, etc.
Each row comes with an event_date in this format: 20221231
I want to count the number of people landing on our page each week, but I don't know how to group them by week.
I started with something like this, but I don't know how to group it by week:
SELECT count(event_name) FROM app-xxxx.analytics_247657392.events_* where event_name = 'page_download_view' group by
Thanks in advance for your help
Based on @Ronak's answer, I found the solution.
SELECT week_of_year, SUM(nb_download) AS nb_download_per_week
FROM (
  SELECT DISTINCT
    EXTRACT(WEEK FROM PARSE_DATE('%Y%m%d', event_date)) AS week_of_year,
    COUNT(event_name) AS nb_download
  FROM `tabllle-e78a5.analytics_XXXXX.events_*`
  WHERE event_name = 'landing_event_download_apk'
  GROUP BY event_date
)
GROUP BY week_of_year
You can use the WEEK (or ISOWEEK) function.
WEEK: Returns the week number of the date in the range [0, 53]
More: https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions
Formats - https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#format_timestamp
This should work
select EXTRACT(ISOWEEK FROM(CAST(PARSE_DATE('%Y%m%d', <column>) as TIMESTAMP))) as week_of_year from <table>
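As a small illustration of the grouping step, the Python sketch below parses `event_date` strings in the `YYYYMMDD` format mentioned in the question and counts events per ISO week. The function name is hypothetical. It keys on the ISO year as well as the week number, because a week number alone would merge the same week of different years once the wildcard table spans more than one year.

```python
from collections import Counter
from datetime import datetime


def count_by_iso_week(event_dates):
    """Count events per (ISO year, ISO week) from YYYYMMDD strings."""
    counts = Counter()
    for raw in event_dates:
        day = datetime.strptime(raw, "%Y%m%d").date()
        iso_year, iso_week, _weekday = day.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(counts)


if __name__ == "__main__":
    dates = ["20221230", "20221231", "20230101", "20230102"]
    for (year, week), total in sorted(count_by_iso_week(dates).items()):
        print(year, week, total)
```

In BigQuery the matching pair would be `EXTRACT(ISOYEAR FROM ...)` alongside `EXTRACT(ISOWEEK FROM ...)`, both included in the GROUP BY.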

SSAS Tabular/Analysis Services many-to-many for multiple currencies input - multiple currencies output

I am trying to build on-the-fly currency conversion from many input currencies (InvoicesHeaders rows have different currencies, so each row has an amount and the currency code for that amount) to many output currencies (each affiliate wants to see figures in its own currency).
Therefore I end up with a many-to-many join between the invoice table and the currency table. To join them, I create in SQL a concatenated field from the day and the currency code.
Then (reusing a tutorial from the internet) I create a calculation that does a lookup from the invoice to the rate.
Amount adj:=SUMX(Invoices,Invoices[TotalInvoiceAmount]/LOOKUPVALUE(ExchangeRatesPerDay[Rate],ExchangeRatesPerDay[ToCurrencyConcatenatedday112],Invoices[CurrencyCodeConcatenateInvoiceDate112]))
However, when I try to use this measure in Excel (filtering on one currency at a time, of course) I get an error message saying that many rows were passed but only one was expected.
From the error message, it looks like the lookup is returning multiple values, which is strange because in Excel I am filtering on one currency, so for each combination of day and currency code there should be only one row. I checked the data in SQL using this query:
with cte as (
SELECT [RateTypeName]
,[FromCurrency]
,[ToCurrency]
,[StartDate]
,[Rate]
,[EndDate]
,[ConversionFactor]
,[RateTypeDescription]
,[dday]
,[dday112]
,[ToCurrencyConcatenatedday112]
,[FromCurrencyConcatenatedday112]
, count(*) over (partition by [ToCurrencyConcatenatedday112],FromCurrency ) as co
FROM [stg].[ExchangeRatesPerDay]
)
select * from cte where co>1
And it doesn't return any record.
I would appreciate any ideas you may have.
Regards
Vincent
I don't understand why my answer was deleted. Anyway, I am posting it again: I found this website, https://www.kasperonbi.com/currency-conversion-in-dax-for-power-bi-and-ssas/, which provides an answer. I have been using this logic in production for over a month and it works great.
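The linked article is not reproduced here. As a rough illustration of the general idea only, the Python sketch below converts invoices to a target currency with a rate lookup keyed on source currency, target currency, and day together. One possible reading of the error in the question is that a key built from the day and a single currency code matches several rate rows, one per counterpart currency; that is an assumption, not something the thread confirms. All names, the sample figures, and the rate convention (units of target currency per one unit of source currency) are hypothetical.

```python
from datetime import date


def convert_total(invoices, rates, target_currency):
    """Sum invoice amounts expressed in target_currency.

    rates maps (from_currency, to_currency, day) to the number of
    to_currency units per one from_currency unit (assumed convention).
    """
    total = 0.0
    for invoice in invoices:
        if invoice["currency"] == target_currency:
            rate = 1.0  # no conversion needed
        else:
            # The full three-part key identifies exactly one rate row.
            rate = rates[(invoice["currency"], target_currency, invoice["day"])]
        total += invoice["amount"] * rate
    return total


if __name__ == "__main__":
    day = date(2020, 1, 15)
    invoices = [
        {"amount": 100.0, "currency": "EUR", "day": day},
        {"amount": 50.0, "currency": "USD", "day": day},
    ]
    rates = {("EUR", "USD", day): 1.25, ("USD", "EUR", day): 0.8}
    print(convert_total(invoices, rates, "USD"))
    print(convert_total(invoices, rates, "EUR"))
```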

WSO2 Stream processor: Siddhi App to calculate sum

I am working with Stream Processor 4.3.0. I came across a scenario where I am writing some data feeds into an RDBMS table using a Siddhi app. Using the Siddhi app, I insert the data into the RDBMS table as below
Now I am using another Siddhi app to retrieve the data, but I would like to fetch it as shown below
The common columns are collapsed into one row, and the columns holding counts are summed to give the final sum of all counts.
Can someone please guide me on how to proceed here?
Thanks in advance
Here is the app to get the total sum:
@App:name("IncomingStream3")
@App:description("Description of the plan")
-- Please refer to https://docs.wso2.com/display/SP400/Quick+Start+Guide on getting started with SP editor.
--@store(type = 'rdbms', datasource = 'APIM_ANALYTICS_DB')
--@purge(enable='false', interval='60 min', @retentionPeriod(sec='1 day', min='72 hours', hours='90 days', days='1 year', months='2 years', years='3 years'))
define stream TempStatsStream (AGG_TIMESTAMP long, AGG_EVENT_TIMESTAMP long, apiName string, apiVersion string, apiResourcePath string,apiCreator string,username string, applicationConsumerKey string, AGG_LAST_EVENT_TIMESTAMP long, applicationName string, dateTime string, AGG_COUNT int);
define aggregation StatsToCal
from TempStatsStream
select apiName, apiVersion, apiResourcePath, apiCreator, username, applicationName,
applicationConsumerKey, SUM (AGG_COUNT) as totalRequestCount, dateTime
group by apiName, apiVersion, apiResourcePath, username, applicationConsumerKey
aggregate by dateTime every days;
The only change I have made here is that, instead of fetching the values from the DB table, I treat them as a stream (as far as I know, aggregation can only be done on a stream).
It seems you have to group by API, Name1, Name2 and ID. You can use group by, similar to SQL's GROUP BY:
from TriggerStream join APITable
select APIName, Name1, Name2, ID, sum(Count) as totalCount
group by API, Name1, Name2, ID
insert into OutputStream;
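What the group by in the query above does can be shown with a small Python sketch: rows that share the same key columns collapse into one entry and their counts are added up. The column names and sample rows are hypothetical, since the tables from the question are not shown.

```python
from collections import defaultdict


def sum_counts(rows, key_fields, count_field):
    """Collapse rows sharing the same key_fields and sum count_field."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row[count_field]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"API": "orders", "Name1": "a", "Name2": "b", "ID": 1, "Count": 3},
        {"API": "orders", "Name1": "a", "Name2": "b", "ID": 1, "Count": 4},
        {"API": "users", "Name1": "a", "Name2": "c", "ID": 2, "Count": 5},
    ]
    for key, total in sum_counts(rows, ["API", "Name1", "Name2", "ID"], "Count").items():
        print(key, total)
```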

Analyzing Twitter data with Hive, regexp_extract

I am trying to find the most popular hashtags of July. So far I can select tweets from July, or display the most popular tweets, but I haven't succeeded in putting the two together. I am thinking about creating an intermediate table with the July tweets and then displaying the popular hashtags, but I don't know how. Can you help me? What about a two-level select (select a from (select b from table))?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far I have done this, which is pretty much what I want, but is there any other way to achieve it?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
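The filter-then-count logic of the nested query can be sketched in Python as follows. The sketch assumes `created_at` uses the classic Twitter layout, for example `Tue Jul 02 10:00:00 +0000 2013`; under that assumption the pattern `(Tue Jul)*` only captures tweets posted on a Tuesday in July, so the sketch uses a pattern that accepts any weekday. Function and variable names are hypothetical.

```python
import re
from collections import Counter

# Any three-letter weekday followed by "Jul" at the start of created_at.
JULY = re.compile(r"^\w{3} Jul ")


def top_hashtags(tweets, limit=15):
    """Return the most frequent lower-cased hashtags among July tweets."""
    counts = Counter()
    for tweet in tweets:
        if not JULY.match(tweet["created_at"]):
            continue  # skip tweets from other months
        for tag in tweet["entities"]["hashtags"]:
            counts[tag["text"].lower()] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        {"created_at": "Tue Jul 02 10:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "Hive"}, {"text": "BigData"}]}},
        {"created_at": "Wed Jul 03 11:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "hive"}]}},
        {"created_at": "Thu Aug 01 12:00:00 +0000 2013",
         "entities": {"hashtags": [{"text": "hive"}]}},
    ]
    print(top_hashtags(sample))
```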
EDIT:
OK, if you prefer, you can also do it with a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you populate it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the query becomes as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
The pros and cons of the second method:
You need to refresh the table whenever you want accurate results, so it is not suited to a one-off query. If you need to run multiple queries against the same snapshot of the data, this method is better.
Don't forget that copying the data is a costly operation, so know when to use it :)