Grabbing specific query string parameters from URL with regex - regex

We have an implementation of Liferay portal and I'm just getting started with using Google Analytics with it. I'm noticing a lot of duplicate entries in GA, mainly because of the query strings in the URI, for example:
/web/home-community/search-and-help?p_p_id=mytcdirectory_WAR_mytcdirectory&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-3&p_p_col_count=4&_mytcdirectory_WAR_mytcdirectory_action=getResults
I'm playing around with the Search and Replace filters in GA (using regex) and my goal is to try to pull out the ?p_p_id and &*_action parameters from the URI, and disregard the rest. I'm getting close with the following regex:
^([^\?]+)([\?\&]p_p_id=[^\&]+)?.*(\&[^\&]+_action=[^\&]+)?.*$
But that last grouping isn't working correctly. If I remove the ? from the end of the last grouping it matches, but the problem with that approach is that not all URIs contain that query string so it needs to be optional. But if I keep it in, it won't grab that last parameter. My regex fiddle is located here:
http://regex101.com/r/qQ2dE4/13
Thank you all in advance for any help.
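One possible explanation and fix, sketched in Python's re module rather than in GA itself: the greedy .* in front of the last group consumes the rest of the URI, so the optional group is satisfied by matching nothing. Wrapping a lazy .*? and the group together in an optional non-capturing group lets the engine look for the _action parameter first. The names original, revised and uri are made up for this sketch, and whether GA's filter engine accepts lazy quantifiers is an assumption that would need checking.

```python
import re

# The pattern from the question: the greedy ".*" swallows the rest of the
# string, so the optional third group ends up matching nothing.
original = re.compile(
    r"^([^\?]+)([\?\&]p_p_id=[^\&]+)?.*(\&[^\&]+_action=[^\&]+)?.*$"
)

# Possible fix: a lazy ".*?" inside an optional non-capturing wrapper, so the
# engine tries to find the "_action" parameter before giving up on it.
revised = re.compile(
    r"^([^?]+)([?&]p_p_id=[^&]+)?(?:.*?(&[^&]+_action=[^&]+))?.*$"
)

uri = (
    "/web/home-community/search-and-help"
    "?p_p_id=mytcdirectory_WAR_mytcdirectory&p_p_lifecycle=1"
    "&p_p_state=normal&p_p_mode=view&p_p_col_id=column-3&p_p_col_count=4"
    "&_mytcdirectory_WAR_mytcdirectory_action=getResults"
)

original_groups = original.match(uri).groups()
revised_groups = revised.match(uri).groups()

# A URI without any "_action" parameter still matches; group 3 is just empty.
no_action_groups = revised.match("/web/home?p_p_id=foo&p_p_state=normal").groups()

print(original_groups[2])   # third group with the original pattern
print(revised_groups[2])    # third group with the revised pattern
print(no_action_groups)
```

In a Search and Replace filter the replacement would then reference groups 1 to 3 and drop everything else.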

Related

Using Regex in Google Analytics Filter Out URLs that contain one string and does not contain another

This may look like a simple question, but it has turned out not to be.
I have a set of URLs in these two formats:
https://lenskart.sg/collections/abc/products/xyz
https://lenskart.sg/collections/abc/xyz
I only need the URLs that contain the word "collections" and do not contain the word "products".
How do I write a regular expression for this?
PS: I need to filter these URLs in Google Analytics using a regex. The best expression I have come up with so far is (collections/)(\w+)(/)(?!products), but Google Analytics reports it as an invalid regex, even though it works fine in other regex testing tools. Maybe Google Analytics does not accept negative lookaheads. Here are a few links that support this:
Google Analytics Regex - Alternative to no negative lookahead
https://www.reddit.com/r/analytics/comments/5v6q4i/regex_expression_for_does_not_contain/
https://recalll.co/?q=negative%20lookahead%20-%20Google%20Analytics%20look%20ahead&type=code
Any help would be appreciated; this is a big issue for me.
I do not think you need a complex regular expression at all, an include and subsequent exclude filter should suffice.
Add an include filter, select request URL as the filter field and "/collections/" as the filter pattern. This dismisses all URLs that do not have "/collections/" in their path (to put it another way, it only includes URLs that match the pattern).
Then (order is important) add an exclude filter, select request URL as the filter field, and enter "/products" as the pattern.
Filters are applied in the order they are displayed in the view settings. Each subsequent filter will work on the data a previous filter has returned. So it is often easier to split the work between multiple filters.
This is assuming that you are filtering in your view settings, but frankly if this is a filter in a report, it basically works the same way (you have to click the "advanced" link next to the filter box to access multiple filter conditions, and "Request Url" is called "Page" here, but otherwise it's basically the same).
Filters in reports do not support negative lookaheads, the (permanent) view filters allegedly do.
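To make the include-then-exclude idea concrete, here is a rough Python simulation of the two filters applied in order. It does not call any Google Analytics API; the function name apply_view_filters and the sample paths (including "/about") are made up for illustration.

```python
import re

def apply_view_filters(urls):
    """Simulate an include filter followed by an exclude filter."""
    # Step 1 (include): keep only request URLs that contain "/collections/".
    included = [u for u in urls if re.search(r"/collections/", u)]
    # Step 2 (exclude): from what is left, drop anything containing "/products".
    return [u for u in included if not re.search(r"/products", u)]

sample_urls = [
    "/collections/abc/products/xyz",
    "/collections/abc/xyz",
    "/about",
]

kept = apply_view_filters(sample_urls)
print(kept)
```

Because the second filter only sees what the first one let through, neither pattern needs a lookahead.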
^\/lenskart\.sg\/collections\/abc((?!\/products).)*$
I'm not an expert, but the regex above matches the first and third of the following URLs and rejects the middle one:
/lenskart.sg/collections/abc/
/lenskart.sg/collections/abc/products/xyz
/lenskart.sg/collections/abc/services
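A quick way to check that claim is to run the pattern against the three URLs in Python, whose re module supports negative lookaheads. This only shows how the pattern behaves in Python; it says nothing about whether a given Google Analytics filter accepts it. The variable names are made up for this sketch.

```python
import re

# The "tempered" pattern from the answer: every character after ".../abc"
# must be at a position where "/products" does not start.
pattern = re.compile(r"^\/lenskart\.sg\/collections\/abc((?!\/products).)*$")

urls = [
    "/lenskart.sg/collections/abc/",
    "/lenskart.sg/collections/abc/products/xyz",
    "/lenskart.sg/collections/abc/services",
]

results = {u: bool(pattern.match(u)) for u in urls}
for url, matched in results.items():
    print(url, matched)
```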

Google Analytics Regex Code

I'm having trouble figuring out the last part of my regex code for Google Analytics. I want to be able to grab any URL from my site that fits the following pattern:
www.site.com/hotel/[any text]/rooms?[any text]
So the URLs will always begin with /hotel/ and will always end with /rooms? followed by any text string, with any text between "hotel/" and "/rooms?".
I have this much: ^/hotel/([^/])+/rooms([^\?])
But I'm not sure how to finish this so that it will only capture URLs that have text after the "?"
This works. You may want to tighten up the allowed text in the path and query parameters. The dots in the hostname are escaped so that they match literal dots only.
^www\.site\.com/hotel/[^/]+/rooms\?.+$
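As a sanity check, here is a short Python sketch that runs the pattern (with the hostname dots escaped) against a few sample URLs. The hotel name and query strings are invented for the example.

```python
import re

# Hostname dots are escaped; "[^/]+" allows exactly one path segment between
# "/hotel/" and "/rooms"; "\?.+" requires at least one character after "?".
rooms_pattern = re.compile(r"^www\.site\.com/hotel/[^/]+/rooms\?.+$")

def is_rooms_url(url):
    return rooms_pattern.match(url) is not None

with_query = is_rooms_url("www.site.com/hotel/grand-plaza/rooms?checkin=today")
empty_query = is_rooms_url("www.site.com/hotel/grand-plaza/rooms?")
no_query = is_rooms_url("www.site.com/hotel/grand-plaza/rooms")
extra_segment = is_rooms_url("www.site.com/hotel/a/b/rooms?x=1")

print(with_query, empty_query, no_query, extra_segment)
```

If the field being matched is only the request path, the hostname part would have to be dropped so that the pattern starts at ^/hotel/.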

Google Analytics Regex to exclude certain parameter

I'm relatively new to regex and, in order to set up a goal in Google Analytics, I'd like to use a regular expression to match a URL containing both "thank-you" and "purchaseIsFree=False" but exclude two specific rate plans that are represented in the URL as "productRatePlanId=5197e" and "productRatePlanId=c1760".
Here is a full URL example:
https://www.examplepage.com/thank-you?productRatePlanId=5197e&purchaseIsFree=False&grossTotal=99.95&netTotal=99.95&couponCode=&invoiceNumber=INV00000589
I tried using this post as a model and created this regex:
\bthank-you\b.+\purchaseIsFree=False\b(?:$|=[^c1760]|[^5197e])
However, I'm not getting the desired results. Thanks in advance for any suggestions.
I think the regex below should solve your problem. It uses positive and negative lookaheads: we sit at the beginning, right after http[s]://, check all three conditions from there, and then consume the rest of the string.
(https?:\/\/)(?=.*?productRatePlanId=(?!5197e&)(?!c1760&))(?=.*?thank-you)(?=.*?purchaseIsFree=False).*
Note: I have used & after the productRatePlanId values to make sure the pattern does not also exclude other values such as 5197f or 5198d. The parameter name is spelled purchaseIsFree, as in the example URL, because the match may be case-sensitive.
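The behaviour of the lookahead pattern can be checked in Python. This sketch uses the camel-case spelling purchaseIsFree from the example URL, because Python's re is case-sensitive by default; the helper is_goal and the shortened test URLs are made up for illustration, and Google Analytics may treat the pattern differently.

```python
import re

goal_pattern = re.compile(
    r"(https?:\/\/)"
    r"(?=.*?productRatePlanId=(?!5197e&)(?!c1760&))"  # plan id is not excluded
    r"(?=.*?thank-you)"                               # thank-you page
    r"(?=.*?purchaseIsFree=False)"                    # paid purchase
    r".*"
)

def is_goal(url):
    return goal_pattern.match(url) is not None

base = (
    "https://www.examplepage.com/thank-you"
    "?productRatePlanId={plan}&purchaseIsFree=False&grossTotal=99.95"
)

excluded_a = is_goal(base.format(plan="5197e"))
excluded_b = is_goal(base.format(plan="c1760"))
allowed = is_goal(base.format(plan="5197f"))
free_purchase = is_goal(
    "https://www.examplepage.com/thank-you"
    "?productRatePlanId=5197f&purchaseIsFree=True&grossTotal=0"
)

print(excluded_a, excluded_b, allowed, free_purchase)
```

One caveat: because the excluded values are anchored with a trailing &, a URL in which productRatePlanId is the last parameter (so no & follows the value) would not be excluded by this pattern.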

Firing a tag on certain pages in Google Tag Manager

I am trying to create a regexp to fire a tag within Google Tag Manager on certain pages.
The issue I am having is that I do not want to fire this tag on URLs matching a querystring in them, since it is only a session identifier and I do not need the tag to fire on the pages that have a query string. These are basically duplicates and they do not need tracking in a 3rd party tracking program. I know how I could exclude them in GA, but I can't figure out how to do it for the third party tracking.
I'll detail the scenario below and what I have tried.
Example pages that come up in my URL report if I look in GA:
/page
/page/subpage?my-handsome-query-string&some-other-data
/page/subpage
/page/subpage/subsubpage
/page/?query-string-again
So what I want to do is fire the tracking on pages that do NOT have a query string, and it is proving quite an issue.
If I put in ^/page.*[^\?] it just doesn't work. I guess I am using the negated character class all wrong? I can't get it working and would appreciate some help devising a better regexp.
Some other I tried were:
^/page/.* but this one only matched everything after /page/ but not /page.
I am not very good with regular expressions, so what I basically want to do is match /page, /page/subpage, /page/subpage/subpage etc, but not any URL that has a query string in it.
In GTM I can't create two rules that says "Include {{url path}} matching this" and "Exclude {{url path}} matching \?", so it all needs to be done within one regexp... And that totally got me at a loss.
Edit: Mike gave a good answer to solve my GTM part, but I am still interested in learning if it is possible to do above but with a single regex?
You can actually create two rules as you described.
In GTM, tags can have both Firing rules and Blocking rules, and Blocking takes precedence. For example:
Firing rule:
{{url}} matches ^page/.*
Blocking rule:
{{url}} contains ?
Another option is to use a custom javascript macro.
It is in the form of a function(){ } which can detect a query string value in window.location.search and return boolean. Then have a firing rule {{your custom fn}} equals 1.
You can also create a macro which uses the URL macro type and Query component type.
The value is set to the query string without the leading ?. If the url was example.com?foo=bar this macro would contain foo=bar. Then simply add a firing rule {{query}} matches Regex ^$ or {{query}} does not contain something-that-will-never-be-in-the-url-to-avoid-regex
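On the follow-up about doing it with a single regex: one possibility, sketched here in Python rather than in GTM, is to forbid ? in everything after /page and anchor the end. The attempt ^/page.*[^\?] fails because .* happily consumes a ?, and the negated class only demands that some later character is not a ?. This assumes the value being tested includes the query string, as in the report above; the variable names are made up.

```python
import re

# "/page", optionally followed by "/" and any run of non-"?" characters,
# and then the end of the string. A "?" anywhere makes the match fail.
no_query = re.compile(r"^/page(/[^?]*)?$")

paths = [
    "/page",
    "/page/subpage?my-handsome-query-string&some-other-data",
    "/page/subpage",
    "/page/subpage/subsubpage",
    "/page/?query-string-again",
]

matched = [p for p in paths if no_query.match(p)]
print(matched)
```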

Regex: Match any string but one ending with thanks/

I am attempting to set up a goal funnel in Google Analytics. It is for an online quote request system that we want to track. Basically all the pages that contain the quote request form have unique dynamically generated urls that are similar. The form of the URL is:
/quoterequest/categoryone/categorytwo/productname/
I have regex that works for tracking that:
^/quoterequest/([A-Za-z0-9/-]+)?
Today we added a thank you page after the user submits the form. The URL is always the same for that:
/quoterequest/thanks/
I would like to modify the above regex so that it continues to match any of the Quote Request URLs, but NOT the thank you URL. I have been trying different variations, including the negative lookahead, but unfortunately I am not very experienced with regex and I think I've been doing it completely incorrectly. Can anyone give me some insight into the correct way of doing this?
You can use:
^\/quoterequest\/(?!thanks\/?$)(?:([A-Za-z0-9\-]+)\/?)*$
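A small Python check of the pattern above, using the URLs from the question plus one made-up path (/quoterequest/thanks-extra/) to show that only the exact thank-you URL is rejected. The names quote_pattern and is_quote_page are illustrative.

```python
import re

# "(?!thanks\/?$)" rejects the path only when "thanks" (with an optional
# trailing slash) is everything that follows "/quoterequest/".
quote_pattern = re.compile(
    r"^\/quoterequest\/(?!thanks\/?$)(?:([A-Za-z0-9\-]+)\/?)*$"
)

def is_quote_page(path):
    return quote_pattern.match(path) is not None

product_page = is_quote_page("/quoterequest/categoryone/categorytwo/productname/")
thanks_slash = is_quote_page("/quoterequest/thanks/")
thanks_bare = is_quote_page("/quoterequest/thanks")
thanks_like = is_quote_page("/quoterequest/thanks-extra/")

print(product_page, thanks_slash, thanks_bare, thanks_like)
```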