Google Analytics events regular expression grouping - regex

We have set up JW Player (an off-the-shelf embedded video player) to send our video play events into Google Analytics as Events (they appear under Behaviour > Events > Top Events > JW Video Plays). We stream from AWS CloudFront with signed URLs, so the URL for each video play is computed and unique, something like this:
rtmp://streaming.oursite.com/cfx/st/mp4:xtra/tutorial/video/somevideo.mp4?Key-Pair-Id=APKAJPGHQNAH3663VQQQ&Signature=m6HTuv-IRaR5N3zu58w1Rh5mIuhhETPuEVBMBQv9Tb1ZXvsy3lg9dgpp-FtBPwZYWkI5fR0kAuBir6OnAXst3F6FyXve7s5gQSdoJMtCDcGIFtyyw8kZCBaFPa71jr1sDy9L~xf3VDDH0tIksfXZ-z9t~tZg7tnfw~iVLfKDTtE_&Expires=1413316048
So in order to judge popularity we'd like to group the play events by their basic video path, e.g.
rtmp://streaming.oursite.com/cfx/st/mp4:xtra/tutorial/video/somevideo.mp4
I tried using the configurable JWPlayer id_string but that doesn't seem to work, so I am falling back to using regex in Google Analytics, but we can't seem to get the URL grouping to work. We tried this advanced regex filter:
^(.*?)\?.*$
based on https://support.google.com/analytics/answer/1034836?hl=en which says
() remember contents of parenthesis as item
but that has no effect.
Is it even possible, and if so what Regex should we be using please?

I think your problem is the first ? question mark, in (.*?). You've used it in the usual regex way to change greedy to non-greedy, as done in PHP/Java/Perl etc. But there appears to be nothing in the Google Analytics help screens to suggest it can rise to those heights. They only say it means 0 or 1 in their regex system.
So you need an alternative form for your regex. You could try just putting ^(.*)\?.*$ instead. I couldn't see in the documentation whether GA is greedy or lazy by default, but your signed URLs contain only one ? character, so a greedy (.*) should still stop at that ?. It's worth trying first.
Failing that, the alternative way to achieve what you want is to use a character class in square brackets. The class lists all the letters, digits and punctuation that could occur in your file paths:
^([a-zA-Z0-9/:.]*)\?.*$
(Check whether any other characters, such as - or _, appear in your file paths, and add them as well.)
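You can compare the three patterns from this answer by running them outside GA. The sketch below uses Python's re module. That module does support lazy quantifiers, so the sketch only shows what each pattern captures, not how GA's regex engine treats them. The query string is a shortened version of the question's signed URL.

```python
import re
from typing import Optional

BASE = "rtmp://streaming.oursite.com/cfx/st/mp4:xtra/tutorial/video/somevideo.mp4"
# Query string shortened from the question's example; the real signature is much longer.
URL = BASE + "?Key-Pair-Id=APKAJPGHQNAH3663VQQQ&Signature=m6HTuv-IRaR5N3zu58w1Rh5m&Expires=1413316048"

PATTERNS = {
    "lazy": r"^(.*?)\?.*$",                  # original attempt (non-greedy)
    "greedy": r"^(.*)\?.*$",                 # greedy alternative
    "char_class": r"^([a-zA-Z0-9/:.]*)\?.*$",  # character-class alternative
}

def base_path(pattern: str, url: str) -> Optional[str]:
    """Return capture group 1 (everything before the '?'), or None if no match."""
    m = re.match(pattern, url)
    return m.group(1) if m else None

for name, pattern in PATTERNS.items():
    print(name, base_path(pattern, URL))
```

All three capture the same base path here because the signed URL contains exactly one ?. A greedy (.*) backtracks to the last ?, which in this URL is also the first.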

Related

Regex for multiple URLs without clear pattern

I'm quite new to using regex, so I hope someone can help me out. I want to set up an event in Google Tag Manager, using a regex, that fires whenever someone views a page. I'm trying to do this with the Page URL as a parameter, so that the event fires when that URL is visited. It's for around 1,400 URLs that are in the same subfolder but have different page names. For example: https://www.example.com/products/product-name-1, https://www.example.com/products/product-name-2
What would be the best way to group these into one RegEx formula?
I've tried to separate all URLs using the '|' sign, without any result. I've also tried this format, without any luck: (^/page-url-1/$|^/page-url-1/$|^/page-url-1/$|^/page-url-1/$)
A couple of things are happening with your attempt. First, you aren't escaping the '/'. In some regex flavors this is a reserved character (it delimits JavaScript regex literals, for example). Preceding it with a \ is the safe way to tell the engine that you want that literal character. It would look like this:
\/products\/page-url-1
I am assuming you are using a {{Page Path}} so the above would match for any paths that contain /products/page-url-1.
If you want the event to fire on all pages within the /products directory, there is an easier way of doing this.
\/products\/.*
This will match any pages within your /products directory. A landing page at /products (without a trailing slash) will not fire the event. /products/ with a trailing slash will fire it, because .* can match zero characters. The '.' matches any character after the /, and '*' means it can repeat zero or more times.
EDIT:
Since you aren't looking for all the products pages, you can use a matching group and list them all. I suspect the product names will be different enough not to share any common path elements, so you will have to list out the ones you want.
\/products\/(product-url-1|product-url-2|product-url-3).*
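With around 1,400 product pages, typing the alternation by hand is error-prone, so one option is to generate it from a list. The slugs in this Python sketch are hypothetical. It also assumes the trigger does an unanchored search against {{Page Path}}, which is how re.search behaves. Check how your GTM condition actually matches before relying on it.

```python
import re

# Hypothetical product slugs; in practice this would be your list of ~1,400 page names.
slugs = ["product-url-1", "product-url-2", "product-url-3"]

# Build the alternation from the list, escaping each slug so that any regex
# metacharacters in a name are treated literally.
pattern = re.compile(r"\/products\/(" + "|".join(re.escape(s) for s in slugs) + r").*")

def fires(page_path: str) -> bool:
    # re.search looks for the pattern anywhere in the path (unanchored).
    return pattern.search(page_path) is not None

print(fires("/products/product-url-2"))   # True
print(fires("/products/other-product"))   # False
```

Because of the trailing .*, /products/product-url-10 would also fire for product-url-1. If some slugs are prefixes of others, end each alternative with a / or $.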

Trying to regex YouTube ads with pihole

EDIT:
As far as I know, Pihole does not block YouTube ads.
Original Post:
Trying to regex urls like:
r4---sn-vgqsrnez.googlevideo.com
r1---sn-vgqsknlz.googlevideo.com
r5---sn-vgqskn7e.googlevideo.com
r3---sn-vgqsknez.googlevideo.com
r6---sn-vgqs7ney.googlevideo.com
r4---sn-vgqskne6.googlevideo.com
r4---sn-vgqsrnez.googlevideo.com
r5---sn-vgqskn76.googlevideo.com
r6---sn-vgqs7ns7.googlevideo.com
r1---sn-vgqsener.googlevideo.com
r1---sn-vgqskn7z.googlevideo.com
r1---sn-vgqsknek.googlevideo.com
r6---sn-vgqsener.googlevideo.com
r3---sn-vgqs7nly.googlevideo.com
r1---sn-vgqsknes.googlevideo.com
r4---sn-vgqsrnes.googlevideo.com
r6---sn-vgqskn76.googlevideo.com
I've tried:
(^|\.)r[0-100]---sn-vgqs?n??\.googlevideo\.com$
(^|\.)r[0-100]?*\.googlevideo\.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*)(?:.*).googlevideo.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*).googlevideo.com$
but nothing works
I am probably using regex wrong because I don't have much experience with it, but looking online, some people have said it could be an issue with Pihole.
I'm guessing that you'd like restricted boundaries; if not, this expression might be somewhat close to what you have in mind:
^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
You can add more boundaries, if necessary, such as:
^r(?:100|[1-9]\d|\d)---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
or:
^r(?:100|[1-9]\d|\d)---sn-vgqs(?:rne(?:s|z)|kne(?:s|z)|knlz|kn7e|7ney|kne6|kn76|7ns7|ener|kn7z|knek|7nly)\.googlevideo\.com$
(That last one is just a guess based on your samples.)
If you wish to explore, simplify or modify the expressions, you can paste them into regex101.com. It explains each part in its top-right panel and lets you test the expression against sample inputs.
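Here is a small Python sketch that checks the first two patterns against hostnames from the question. Python's re supports \d and (?:...), so this confirms the patterns themselves. As another answer points out, Pi-hole's regex engine may not accept those constructs.

```python
import re

# A sample of the hostnames from the question.
hosts = [
    "r4---sn-vgqsrnez.googlevideo.com",
    "r1---sn-vgqsknlz.googlevideo.com",
    "r5---sn-vgqskn7e.googlevideo.com",
    "r6---sn-vgqs7ney.googlevideo.com",
    "r3---sn-vgqs7nly.googlevideo.com",
    "r1---sn-vgqsener.googlevideo.com",
]

# The first two patterns from this answer.
open_pattern = re.compile(r"^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$")
bounded_pattern = re.compile(r"^r(?:100|[1-9]\d|\d)---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$")

for host in hosts:
    print(host, bool(open_pattern.match(host)), bool(bounded_pattern.match(host)))

# The bounded version rejects numbers above 100 after the "r"; the open one does not.
print(bool(open_pattern.match("r101---sn-vgqsrnez.googlevideo.com")))     # True
print(bool(bounded_pattern.match("r101---sn-vgqsrnez.googlevideo.com")))  # False
```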
The following regex matches every URL that starts with "r", followed by one or more characters of any kind, then "sn", then one or more characters, and ends with ".googlevideo.com". The expression is anchored with ^ and $.
I tried it on my Pi-hole with great success but had to remove it later. All r....sn...googlevideo.com queries were blocked in the query list, but it also broke the YouTube app on my smart TV: it would not play any video at all until I removed the regex from Pi-hole. Use it at your own risk.
^r.+sn.+(\.googlevideo\.com)$
This post is a bit old, but since I've experimented with these regexes myself, I want to point out that yours can't work, because of one "little" point.
Pi-Hole uses the POSIX ERE (POSIX Extended Regular Expressions) standard.
So there are no lazy quantifiers or shorthand character classes.
It also does not support non-capturing groups like the ones in your third and fourth lines.
You can check such regexes in tools like RegexBuddy. Other free tools may also be able to check them and help convert them.
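As a sketch, the first answer's pattern can be rewritten using only ERE features: [0-9] instead of \d, and no (?:...) groups. Python's re is not a POSIX engine and has no [[:digit:]] class. This check therefore only shows that the rewritten pattern still matches sample hostnames, not that Pi-hole accepts it. It also shows a separate problem with the original attempts: [0-100] is a character class matching a single '0' or '1', not a number range.

```python
import re

# The first answer's pattern rewritten with ERE-only features:
# [0-9] instead of \d, and no (?:...) non-capturing groups or lazy quantifiers.
ERE_PATTERN = re.compile(r"^r[0-9]+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$")

samples = ["r4---sn-vgqsrnez.googlevideo.com", "r6---sn-vgqs7ns7.googlevideo.com"]
print(all(ERE_PATTERN.match(h) for h in samples))  # True

# [0-100] from the original attempts is a one-character class: '0' or '1' only.
attempt_class = re.compile(r"^r[0-100]---")
print(bool(attempt_class.match("r1---sn-vgqsknlz")))  # True
print(bool(attempt_class.match("r4---sn-vgqsrnez")))  # False
```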
My current regex is:
^r[[:digit:]]+---sn-4g5e[a-z0-9]{4}\.googlevideo\.com$
It correctly blocks all ads BUT also videos.
If you use it, you have to do the following:
Open a YouTube video and check whether it loads.
If it doesn't, go to the query log in your Pi-hole dashboard.
For your device you will see two DNS queries, e.g.
r5---sn-4g5e6nze.googlevideo.com
and
r5---sn-4g5ednse.googlevideo.com
The later one (listed higher in the query log) is the video, so whitelist that domain.
You will need to do this from time to time.

Google Analytics URL-based goal not catching everything

I have a simple destination-based goal triggered by a regex match (the confirmation page shown when someone books a trip). The problem is that, compared to our CRM, there is a discrepancy of around 20-25% on any given day.
Here are the 3 different types of URL that signal goal completion:
/owner/reservation-confirmation?bv=true&reservationNo=MG3P3
/owner/reservation-confirmation?bv=true&type=Future&reservationNo=MG4GX
/owner/reservation-confirmation?type=Future&reservationNo=MG225
And the goal destination regex:
(/owner/reservation-confirmation\?bv=true&type=Future&reservationNo=.* )|(/owner/reservation-confirmation\?type=Future&reservationNo=.* )|(/owner/reservation-confirmation\?bv=true&reservationNo=.* )
For some reason, GA did not record a goal completion for URLs like:
/owner/reservation-confirmation?type=Future&reservationNo=MG4J0
(The URL above does appear in GA under "Site Content > All Pages".)
Any idea why is this happening?
Thank you!
The space at the end of each group appears to be breaking this. The following (also with correct escape characters for the forward slashes) works fine:
(\/owner\/reservation-confirmation\?bv=true&type=Future&reservationNo=.*)|(\/owner\/reservation-confirmation\?type=Future&reservationNo=.*)|(\/owner\/reservation-confirmation\?bv=true&reservationNo=.*)
A working example is available on regex101.com.
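The effect of the trailing spaces can be reproduced with Python's re module. Here none of the URLs match the original pattern, because none of them contains a space. The question reports that only some conversions were missing, so GA's behaviour may differ. Treat this as an illustration of why the spaces are a problem, not a reproduction of GA's counts.

```python
import re

urls = [
    "/owner/reservation-confirmation?bv=true&reservationNo=MG3P3",
    "/owner/reservation-confirmation?bv=true&type=Future&reservationNo=MG4GX",
    "/owner/reservation-confirmation?type=Future&reservationNo=MG225",
    "/owner/reservation-confirmation?type=Future&reservationNo=MG4J0",
]

# Original goal regex: note the literal space before each closing parenthesis.
broken = re.compile(
    r"(/owner/reservation-confirmation\?bv=true&type=Future&reservationNo=.* )"
    r"|(/owner/reservation-confirmation\?type=Future&reservationNo=.* )"
    r"|(/owner/reservation-confirmation\?bv=true&reservationNo=.* )"
)
# Fixed version from the answer, with the spaces removed.
fixed = re.compile(
    r"(\/owner\/reservation-confirmation\?bv=true&type=Future&reservationNo=.*)"
    r"|(\/owner\/reservation-confirmation\?type=Future&reservationNo=.*)"
    r"|(\/owner\/reservation-confirmation\?bv=true&reservationNo=.*)"
)

for u in urls:
    print(u, "broken:", bool(broken.search(u)), "fixed:", bool(fixed.search(u)))
```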

Chrome dev tools: any way to exclude requests whose URL matches a regex?

Unfortunately, in the latest versions of Chrome the negative network filter doesn't work anymore. I used this filter to exclude every HTTP call containing a particular string. I asked for a solution in the Chrome DevTools forum, but so far nobody has answered.
So I would like to know if there is a way to resolve this problem (and exclude for example each call containing the string 'loadMess') with regex syntax.
Update (2018):
This is an update to my old answer to clarify that both bugs have been fixed for some time now.
Negate or exclude filtering is working as expected now. That means you can filter request paths with my.com/path (show requests matching this), or -my.com/path (show requests not matching this).
The regex solution also works now that my PR fix has made it into production. That means you can also filter with /my.com.path/ and /^((?!my.com/path).)*$/. These achieve the same results as the two plain-text filters above, respectively.
I have left the old answer here for reference, and it also explains the negative lookup solution.
The pre-defined negative filters do work, but Chrome stable doesn't currently let you apply NOT filters to names, only CONTAINS. This is a bug that has been fixed in Chrome Canary.
Once the change has been pushed to Chrome stable, you should be able to do loadMess to filter only for that name, and -loadMess to filter out that name and leave the rest, as it was previously.
Workaround: Regex for matching a string not containing a string
^((?!YOUR_STRING).)*$
Example:
^((?!loadMess).)*$
Explanation:
^ - Start of string
(?!loadMess) - Negative lookahead (assert that loadMess does not start at the current position, without consuming or capturing anything)
. - Match any character (except line breaks)
()* - 0 or more of the preceding group
$ - End of string
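The lookahead pattern can be tried outside DevTools. This Python sketch applies it to a few hypothetical request URLs. It keeps only the URLs that match, i.e. those that do not contain loadMess.

```python
import re

# Matches a whole string only if "loadMess" does not appear anywhere in it.
exclude = re.compile(r"^((?!loadMess).)*$")

requests = [
    "https://example.com/api/loadMessages",
    "https://example.com/api/users",
    "https://example.com/static/app.js",
]
kept = [r for r in requests if exclude.search(r)]
print(kept)  # ['https://example.com/api/users', 'https://example.com/static/app.js']
```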
Update (2016):
I discovered that there is actually a bug with how DevTools deals with Regex in the Network panel. This means the workaround above doesn't work, despite it being valid.
The Network panel filters on both Name and Path (as I found in the source code), running two tests whose results are OR'ed. In the case above, suppose loadMess appears in the Name but not in the Path (i.e. the domain or directory). The regex then fails on the Name but still matches the Path, so the request is shown. In other words, false || true === true, which means it will only filter out loadMess if it is found in both the Name and the Path.
I created an issue in Chromium and pushed a fix for review, which has since been merged.
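Here is a simplified Python model of the OR'ed test described above. The name and path values are hypothetical. This illustrates the logic only; it is not DevTools' actual source.

```python
import re

# Same exclusion regex as in the workaround above.
EXCLUDE = re.compile(r"^((?!loadMess).)*$")

def visible_or_model(name: str, path: str) -> bool:
    """Simplified model of the 2016 bug: the request stays visible if EITHER
    the Name or the Path matches the filter regex."""
    return bool(EXCLUDE.search(name)) or bool(EXCLUDE.search(path))

# loadMess only in the Name: the Path still matches, so the request is NOT hidden.
print(visible_or_model("loadMessages", "example.com/api/"))        # True
# loadMess in both fields: both tests fail, so only now is it hidden.
print(visible_or_model("loadMessages", "example.com/loadMess/"))   # False
```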
This is answered here - for latest Chrome 58.0.3029.110 (Official Build) (64-bit)
https://stackoverflow.com/a/27770139/4772631
E.g. to exclude all GIFs, just type -gif
Negative lookahead is recommended everywhere, but it does not work.
Instead, "-myregex" does work for me. Like this: -/(Violation|HMR)/.
Chrome DevTools doesn't support regex filters very well.
When I want to hide some requests, the approach shown above does not work. But you can use -hide1 -hide2 to hide the requests you want.
Just leave a space between the conditions. This does not use regex matching; I guess it uses plain string matching rather than regex.
Filtering multiple different urls
You can use the negation symbol (-) to filter out network calls.
E.g.: -lab.com would filter out lab.com URLs.
To filter out multiple URLs, you can use the | symbol in a regex.
E.g.: -/lab.com|mini.com/ filters out both lab.com and mini.com. You can extend it the same way to filter out many different websites or URLs.
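The effect of a negated alternation filter like -/lab.com|mini.com/ can be modelled in a few lines of Python. The model assumes DevTools hides every request whose URL the regex finds. The dots are unescaped, as in the filter above, so each . matches any character; lab\.com would be stricter.

```python
import re

# Model of the DevTools filter -/lab.com|mini.com/ (hide anything the regex finds).
exclude = re.compile(r"lab.com|mini.com")

urls = ["https://lab.com/a", "https://mini.com/b", "https://other.org/c"]
remaining = [u for u in urls if not exclude.search(u)]
print(remaining)  # ['https://other.org/c']
```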
You can use the "Invert" option to exclude the API calls matching the string in the Filter text box.
On the latest Chrome version (62), you have to use:
-mime-type:image/gif

Writing Regular Expression for URL in Google Analytics

I have a huge list of URLs in the following format:
http://www.example.com/dest/uk/bath/
http://www.example.com/dest/aus/sydney/
http://www.example.com/dest/aus/
http://www.example.com/dest/uk/
http://www.example.com/dest/nor/
What regex could I use to match the last three URLs but not the first two? I want every URL without a city to be matched and the ones with cities to be excluded.
Note: I am using Google Analytics, so I need to use regexes to monitor my URLs with its advanced feature. As of right now, Google is rejecting each regular expression.
Generally, the best suggestion I can make for parsing URLs with a regex is: don't.
Your time is much better spent finding an existing library for your language that is dedicated to processing URLs.
A good one will have worked out all the edge cases and be RFC compliant, well tested and secure. It will also have a clean interface, so you can extract just the parts you really want.
In your case, the suggested approach is to use your URL library to extract the elements and then work on them explicitly.
That way, at most you'll have to deal with the path on its own, and not have to worry so much whether it's
http://site.com/
https://site.com/
http://site.com:80/
http://www.site.com/
Unless you really want to.
For the path, you might even wish to use a splitter (or a dedicated path parser) to tokenise the path into elements first, just to be sure.
tj111's current solution doesn't work: it matches all your URLs.
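To follow this advice in Python, urllib.parse.urlsplit separates the scheme, host, port and path, and the path can then be split into segments. This is a sketch. It treats a country page as a path of exactly two segments, dest and a country, which is an assumption drawn from the question's examples. It won't help inside GA's filter fields, which only accept regexes.

```python
from urllib.parse import urlsplit

urls = [
    "http://www.example.com/dest/uk/bath/",
    "http://www.example.com/dest/aus/sydney/",
    "http://www.example.com/dest/aus/",
    "http://www.example.com/dest/uk/",
    "http://www.example.com/dest/nor/",
]

def is_country_page(url: str) -> bool:
    # Let the library separate scheme, host, port and path; only the path is inspected.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Country pages look like /dest/<country>/ : exactly two non-empty segments.
    return len(segments) == 2 and segments[0] == "dest"

print([u for u in urls if is_country_page(u)])
```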
Here's one that works (I checked it against your values). It also matches whether or not there is a trailing slash:
http:\/\/.*dest\/\w+/?$
/http:\/\/www\.site\.com\/dest\/\w+\/?$/i
This matches if they're all the same site with "dest" in the path. You could also do this:
/\w+:\/\/[^/]+\/dest\/\w+\/?$/i
which will match any protocol (http, ftp) and any site with /dest/country at the end, and an optional trailing /.
Note that this will only work with a subset of what the URLs could legitimately be.
Try this regular expression:
^http://www\.example\.com/dest/[^/]+/$
This would only match the last three URLs.
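The regexes from the answers above can be checked against the question's five URLs with Python's re. The /i flag of the JavaScript-style literal becomes re.IGNORECASE, and the www.site.com variant is left out because it targets a different host. Each pattern should match only the last three URLs. If the values in your GA report are paths such as /dest/uk/ rather than full URLs, you would need to drop the scheme and host parts of these patterns.

```python
import re

urls = [
    "http://www.example.com/dest/uk/bath/",
    "http://www.example.com/dest/aus/sydney/",
    "http://www.example.com/dest/aus/",
    "http://www.example.com/dest/uk/",
    "http://www.example.com/dest/nor/",
]

candidates = {
    # Any host before dest/, optional trailing slash.
    "any_host": re.compile(r"http:\/\/.*dest\/\w+/?$"),
    # Any scheme and host; the JavaScript /i flag becomes re.IGNORECASE.
    "any_scheme": re.compile(r"\w+:\/\/[^/]+\/dest\/\w+\/?$", re.IGNORECASE),
    # Exact host, one path segment after /dest/, trailing slash required.
    "exact_host": re.compile(r"^http://www\.example\.com/dest/[^/]+/$"),
}

for name, pattern in candidates.items():
    matched = [u for u in urls if pattern.search(u)]
    print(name, matched)
```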