Firing a tag on certain pages in Google Tag Manager - regex

I am trying to create a regexp to fire a tag within Google Tag Manager on certain pages.
The issue I am having is that I do not want to fire this tag on URLs that contain a query string, since it is only a session identifier. Those pages are basically duplicates and do not need tracking in a 3rd party tracking program. I know how I could exclude them in GA, but I can't figure out how to do it for the third party tracking.
I'll detail the scenario below and what I have tried.
Example pages that come up in my URL report if I look in GA:
/page
/page/subpage?my-handsome-query-string&some-other-data
/page/subpage
/page/subpage/subsubpage
/page/?query-string-again
So what I want to do is fire the tracking on pages that do NOT have the query string, and it is proving quite the issue.
If I put in ^/page.*[^\?] it just doesn't work. I guess I am using the negated character class all wrong? I can't get it working and would appreciate some assistance on how to devise a better regexp.
Another one I tried was:
^/page/.* but this one only matched everything after /page/ but not /page itself.
I am not very good with regular expressions, so what I basically want to do is match /page, /page/subpage, /page/subpage/subpage etc., but not any URL that has a query string in it.
In GTM I can't create two rules that say "Include {{url path}} matching this" and "Exclude {{url path}} matching \?", so it all needs to be done within one regexp... and that has me totally at a loss.
Edit: Mike gave a good answer to solve my GTM part, but I am still interested in learning if it is possible to do above but with a single regex?

You can actually create two rules as you described.
In GTM, tags can have both Firing rules and Blocking rules. Blocking rules take precedence, so the tag does not fire whenever a blocking rule is true, e.g.
Firing rule:
{{url path}} matches RegEx ^/page(/.*)?$
Blocking rule:
{{url}} contains ?
Another option is to use a custom javascript macro.
It is in the form of a function(){ } which can detect a query string value in window.location.search and return a boolean. Then have a firing rule {{your custom fn}} equals true.
You can also create a macro which uses the URL macro type and Query component type.
The value is set to the query string without the leading ?. If the url was example.com?foo=bar this macro would contain foo=bar. Then simply add a firing rule {{query}} matches Regex ^$ or {{query}} does not contain something-that-will-never-be-in-the-url-to-avoid-regex
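On the edit in the question about doing this with a single regex: one possibility, sketched here in Python purely to check the pattern's behaviour, is to replace .* with a character class that cannot consume a ? and to anchor the end with $. This assumes the value being tested still contains the query string (as in the GA report in the question). The names SINGLE_RULE, ATTEMPT and should_fire are made up for this illustration. The original attempt is kept for comparison: it has no end anchor, so [^\?] only needs one non-? character somewhere after /page, which lets the query-string URLs through while plain /page is missed.

```python
import re

# Hypothetical single-pattern rule: "/page", optionally followed by "/" and
# any characters other than "?", then the end of the string.
SINGLE_RULE = re.compile(r"^/page(/[^?]*)?$")

# The attempt from the question, kept for comparison.
ATTEMPT = re.compile(r"^/page.*[^\?]")


def should_fire(url: str) -> bool:
    """Return True when the URL is /page or below it and has no query string."""
    return SINGLE_RULE.search(url) is not None


urls = [
    "/page",
    "/page/subpage?my-handsome-query-string&some-other-data",
    "/page/subpage",
    "/page/subpage/subsubpage",
    "/page/?query-string-again",
]

for url in urls:
    # Show the proposed rule next to the original attempt for each URL.
    print(url, should_fire(url), ATTEMPT.search(url) is not None)
```

Whether GTM's regex engine treats this pattern identically is worth verifying in preview mode, although it only uses anchors, a group and a negated class.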

Related

Regex for multiple URLs without clear pattern

I'm quite new to using regex so I hope there's someone who can help me out. I want to set up an event on Google Tag Manager through RegEx that fires whenever someone views a page. I'm trying to do this using the Page URL as a parameter so that the event hits when that URL is visited. It's for around 1400 urls that are in the same sub-folder but have a different page name. For example: https://www.example.com/products/product-name-1, https://www.example.com/products/product-name-2
What would be the best way to group these into one RegEx formula?
I've tried to separate all urls by using the '|' sign without any result. I've also tried this format, without any luck: (^/page-url-1/$|^/page-url-1/$|^/page-url-1/$|^/page-url-1/$)
A couple of things are happening with your attempt. First, you aren't escaping the '/'. It is not a regex metacharacter as such, but it delimits the pattern in some regex syntaxes, so preceding it with a \ to tell the engine that you want that literal character is the safe habit. It would look like this:
\/products\/page-url-1
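I am assuming you are using {{Page Path}}, so the above would match any path that contains /products/page-url-1. Second, the ^ and $ in your attempt anchor each alternative to the whole value, so against a full Page URL (which starts with https://) none of them can match.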
If you want the event to fire on all pages within the /products directory, there is an easier way of doing this.
\/products\/.*
What this will do is match any page within your /products directory. If you have a landing page on /products, it will be omitted from the firing. The '.' matches any character after the / and the '*' lets it repeat zero or more times.
EDIT:
Since you aren't looking for all the product pages, you can use a matching group and list them all. I suspect that all the product names will be different enough and not share any common path elements, so you will have to list out the ones you want.
\/products\/(product-url-1|product-url-2|product-url-3).*
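With roughly 1400 product names, the alternation is easier to generate than to type. Here is a minimal Python sketch, assuming the names are available as a list and that the pattern is tested against the page path. The function build_product_pattern and the sample names are hypothetical. Unlike the pattern above, it anchors both ends, so product-name-1 does not also match product-name-10.

```python
import re


def build_product_pattern(names: list[str]) -> str:
    """Build one anchored alternation from a list of product slugs."""
    # re.escape protects any regex metacharacters inside a slug.
    alternatives = "|".join(re.escape(name) for name in names)
    # Allow an optional trailing slash, then require the end of the path.
    return rf"^/products/({alternatives})/?$"


# Hypothetical sample; the real list would hold all the product slugs.
pattern = build_product_pattern(["product-name-1", "product-name-2"])
print(pattern)

for path in ["/products/product-name-1", "/products/product-name-10", "/products/"]:
    print(path, bool(re.search(pattern, path)))
```

Whether a pattern of that length fits into a single trigger field is something to check; if it does not, the list can be split across several triggers.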

How to use regex to insert dynamic urls in the Analytics conversion funnel?

I have a client who has an ecommerce website running on Wix. She asked me to set up the conversion funnel so she could identify when visitors leave at a step, I mean, those who don't get to the order placed page. On Wix, we have 3 steps/urls, as below:
Cart: https://www.easyhomedesign.com.br/cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Checkout: https://www.easyhomedesign.com.br/checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
and the Thank you page: https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
After each of those urls, we have dynamic string values, so I need to put into the funnel step, on the url field, those same urls but using a regex that matches to the config "starts with", since we can't know what the values on the end of the urls are and, on the funnel setup section, we don't have that combobox "Starts with". At least, that's the only solution I could think about.
Then, my idea is use something like https://www.easyhomedesign.com.br/thank-you-page/$. I don't know regex, that's only an example of what I thought about, since that part of the url is the fixed one.
Could someone help me? tks.
Although the final goal setup highly depends on your Analytics implementation, it is basically possible to cover such a funnel with RegEx in goal funnels.
Google Analytics tracks page visits by path, by default. So https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 will become /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 in your reports. It can be set up to contain the domain, and the goal flow can still be created, but it's important to check how the RegEx should be written in that case.
It is also important whether the query parts mentioned above affect the step, that is, whether they are part of the flow or not.
Assuming that the path part is relevant, you can set up something like this.
Cart URL in reports: /cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Cart step RegEx in goal flow: ^\/cart
^ stands for beginning of the string, but you might need to adjust, if host is present in your reports. You can also extend it to ^\/cart\?, if you expect any parameters to be present, to qualify for cart visit.
Checkout URL in reports: /checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
Checkout step RegEx in goal flow: ^\/checkout
The same applies for checkout step for beginning of the string, or for any parameters required.
Thank you URL in reports: /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
Thank you RegEx, which is essentially the goal's RegEx: ^\/thank-you-page\/[\w-]+
Here, the [\w-]+ part expects alphanumerical characters, underscore, or hyphen to be present. More precisely, one or more must exist there.
The $ sign, mentioned in the OP, could not be used here, as it indicates the end of the string, and therefore the id at the end of the URL would make it non-matching.
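The three step patterns can be checked against the example URLs with a short Python sketch. The patterns are the ones given above; the STEPS dictionary and the funnel_step helper are hypothetical names used only for this illustration, and Python's re module stands in for the engine Google Analytics uses.

```python
import re
from typing import Optional

# Funnel step patterns from the answer above.
STEPS = {
    "cart": r"^\/cart",
    "checkout": r"^\/checkout",
    "thank-you": r"^\/thank-you-page\/[\w-]+",
}


def funnel_step(path: str) -> Optional[str]:
    """Return the name of the first funnel step whose pattern matches the path."""
    for name, pattern in STEPS.items():
        if re.search(pattern, path):
            return name
    return None


print(funnel_step("/cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D"))
print(funnel_step("/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17"))
# No id after the slash, so [\w-]+ has nothing to match.
print(funnel_step("/thank-you-page/"))
```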

Using Regex in Google Analytics Filter Out URLs that contain one string and does not contain another

You will find this simple and silly Question but it is not. Please Guys help me here.
I have set of URLs in these two formats:-
https://lenskart.sg/collections/abc/products/xyz
https://lenskart.sg/collections/abc/xyz
I only need those URLs that contain the word "collections" (double quotes to highlight the word) and do not contain the word "products".
How do I write a regex (regular expression) for this?
PS: I need to filter out the URLs from Google Analytics using regex. The best expression I have come up with so far is (collections/)(\w+)(/)(?!products), but Google Analytics reports it as an invalid regex. It works fine in other regex testing tools. Maybe Google Analytics does not accept negative lookaheads. Here are a few URLs to support that:
Google Analytics Regex - Alternative to no negative lookahead
https://www.reddit.com/r/analytics/comments/5v6q4i/regex_expression_for_does_not_contain/
https://recalll.co/?q=negative%20lookahead%20-%20Google%20Analytics%20look%20ahead&type=code
Please Guys help me here. It's a big issue for me
I do not think you need a complex regular expression at all, an include and subsequent exclude filter should suffice.
Do an include filter, select request url as filter field and "/collections/" as filter pattern. This will dismiss all Urls that do not have "/collections/" in their path (or to put it another way, this will only include Urls that match the pattern).
Then (order is important) do an exclude filter, select request url as filter field, and enter "/products" as pattern.
Filters are applied in the order they are displayed in the view settings. Each subsequent filter will work on the data a previous filter has returned. So it is often easier to split the work between multiple filters.
This is assuming that you are filtering in your view settings, but frankly if this is a filter in a report, it basically works the same way (you have to click the "advanced" link next to the filter box to access multiple filter conditions, and "Request Url" is called "Page" here, but otherwise it's basically the same).
Filters in reports do not support negative lookaheads, the (permanent) view filters allegedly do.
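The include-then-exclude sequence can be pictured with a small Python sketch. It uses plain substring checks in place of the filter patterns, and the apply_filters function and the third sample URL are made up for the illustration.

```python
def apply_filters(urls: list[str]) -> list[str]:
    """Apply an include filter, then an exclude filter, in that order."""
    # Step 1: the include filter keeps only URLs containing "/collections/".
    included = [u for u in urls if "/collections/" in u]
    # Step 2: the exclude filter works on what step 1 returned and drops
    # anything containing "/products".
    return [u for u in included if "/products" not in u]


urls = [
    "https://lenskart.sg/collections/abc/products/xyz",
    "https://lenskart.sg/collections/abc/xyz",
    "https://lenskart.sg/about",  # hypothetical URL outside /collections/
]
print(apply_filters(urls))
```

Only the second URL survives both steps, which is the result the question asks for without any lookahead.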
^\/lenskart\.sg\/collections\/abc((?!\/products).)*$
I'm not an expert, but the above RegEx matches the first and third of the following URLs, but not the middle one.
/lenskart.sg/collections/abc/
/lenskart.sg/collections/abc/products/xyz
/lenskart.sg/collections/abc/services
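The claim can be checked in an engine that supports lookaheads. The sketch below uses Python's re module as a stand-in. It says nothing about whether a particular Google Analytics filter accepts the pattern, since that depends on lookahead support as discussed above.

```python
import re

# Pattern from the answer: after ".../collections/abc", each remaining
# character is consumed only if "/products" does not start at that position.
PATTERN = re.compile(r"^\/lenskart\.sg\/collections\/abc((?!\/products).)*$")

paths = [
    "/lenskart.sg/collections/abc/",
    "/lenskart.sg/collections/abc/products/xyz",
    "/lenskart.sg/collections/abc/services",
]

# Keep only the paths the pattern accepts.
matches = [p for p in paths if PATTERN.search(p)]
print(matches)
```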

Creating filters for Google Analytics to remove spam

I have successfully managed to filter out hits from certain spammy sites from Google Analytics. It's an ongoing battle, as new sites are popping up all the time and polluting my acquisition/referral results.
At present, the following match is used by the GA filter to stop all the sites below showing up in the data:
.*(best\-seo\-solution|semalt|buttons\-for\-website|social\-buttons|best\-seo\-offer|Get\-Free\-Traffic\-Now|buttons\-for\-your\-website|free\-share\-buttons)\.com.*
I've added most of these myself and it works; however, I now need to create a pattern that allows me to input URLs that aren't a standard something.com pattern. E.g.:
site4.free-share-buttons.com
site5.free-share-buttons.com
So in these cases the end is always the same but the start can be variable.
buy-cheap-online.info
In this case it ends with .info
www.event-tracking.com
This one uses www. whereas others do not
http://webmaster-traffic.com
This one has the http:// as well.
And on top of all of that, the filter pattern can be at most 255 characters long (but I can have more than one filter pattern), so I need to segment it.
How can I create a regex filter pattern that would target all above URLs?
Google Analytics lets you write a regex without escaping every special character when the expression is simple. So you can write the expression without the backslashes \ and the leading and trailing .* You can even remove the .com and the parentheses, since these names are already very specific.
best-seo-solution|semalt|buttons-for-website|social-buttons|best-seo-offer|Get-Free-Traffic-Now|buttons-for-your-website|free-share-buttons|event-tracking|buy-cheap-online.info
If a spam domain has a common name, just add the full name, e.g. |commonname.net, for that specific case.
You can keep going until you reach 255 characters; after that, just add a second filter. This will work, but it has 3 downsides:
first, there are 1 or 2 new spammers every week
second, by the time you add one you already have some hits
third, and this is a new behavior, some spam is now hitting with direct visits along with the referral, and this won't be stopped by this filter.
To prevent this, I recommend you use a valid hostname filter instead. This filter only allows hits with one of your hostnames, so all ghost spam is excluded, since it uses either a fake hostname or one that is not set.
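The idea of a valid hostname filter can be sketched in Python as an include-only check. The hostname example.com, the VALID_HOSTNAME pattern and the keep_hit helper are all assumptions for the illustration, and "(not set)" stands in for a hit that arrives without a hostname.

```python
import re

# Hypothetical pattern listing the hostnames that legitimately serve the site.
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$")


def keep_hit(hostname: str) -> bool:
    """Include-only filter: keep a hit only if its hostname is one of ours."""
    return VALID_HOSTNAME.search(hostname) is not None


hits = ["www.example.com", "example.com", "(not set)", "free-share-buttons.com"]
print([h for h in hits if keep_hit(h)])
```

Because the rule lists what is allowed rather than what is blocked, a new spam domain needs no change to the filter.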
Here you can find more information about referrer spam and the valid hostname filter
https://stackoverflow.com/a/28354319/3197362
http://www.ohow.co/things-you-must-know-about-spam-in-google-analytics/
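For the referral filter itself, the 255-character limit mentioned in the question means a growing list has to be split across several filter patterns. A minimal Python sketch of that bookkeeping follows. The chunk_patterns and is_spam helpers are hypothetical, and the name list is a shortened sample in which webmaster-traffic is taken from the question rather than from the pattern above.

```python
import re

MAX_LEN = 255  # filter pattern length limit mentioned in the question


def chunk_patterns(names: list[str], max_len: int = MAX_LEN) -> list[str]:
    """Join names with "|" into as few patterns as fit within max_len."""
    patterns: list[str] = []
    current = ""
    for name in names:
        candidate = f"{current}|{name}" if current else name
        if len(candidate) > max_len and current:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = name
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


spam_names = [
    "best-seo-solution", "semalt", "free-share-buttons",
    "event-tracking", r"buy-cheap-online\.info", "webmaster-traffic",
]
patterns = chunk_patterns(spam_names)


def is_spam(referrer: str) -> bool:
    """True if any filter pattern is found anywhere in the referrer."""
    return any(re.search(p, referrer) for p in patterns)


for ref in ["site4.free-share-buttons.com", "http://webmaster-traffic.com", "www.example.org"]:
    print(ref, is_spam(ref))
```

Because the patterns are searched anywhere in the referrer, subdomains, a leading www. and an http:// prefix need no special handling.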

Regex for URL routing - match alphanumeric and dashes except words in this list

I'm using CodeIgniter to write an app where a user will be allowed to register an account and is assigned a URL (URL slug) of their choosing (ex. domain.com/user-name). CodeIgniter has a URL routing feature that allows the use of regular expressions.
Users are only allowed to register URLs that contain alphanumeric characters, dashes (-), and underscores (_). This is the regex I'm using to verify the validity of the URL slug: ^[A-Za-z0-9][A-Za-z0-9_-]{2,254}$
I am using the URL routing feature to route a few URLs to features on my site (ex. /home -> /pages/index, /activity -> /user/activity), so those particular URLs obviously cannot be registered by a user.
I'm largely inexperienced with regular expressions but have attempted to write an expression that would match any URL slugs with alphanumerics/dash/underscore except if they are any of the following:
default_controller
404_override
home
activity
Here is the code I'm using to try to match the words with that specific criteria:
$route['(?!default_controller|404_override|home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
but it isn't routing properly. Can someone help? (side question: is it necessary to have ^ or $ in the regex when trying to match with URL's?)
Alright, let's pick this apart.
Ignore CodeIgniter's reserved routes.
The default_controller and 404_override portions of your route are unnecessary. Routes are compared to the requested URI to see if there's a match. It is highly unlikely that those two items will ever be in your URI, since they are special reserved routes for CodeIgniter. So let's forget about them.
$route['(?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
Capture everything!
With regular expressions, a group is created using parentheses (). This group can then be retrieved with a back reference - in our case, the $1, $2, etc. located in the second part of the route. You only had a group around the first set of items you were trying to exclude, so it would not properly capture the entire wild card. You found this out yourself already, and added a group around the entire item (good!).
$route['((?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Look-ahead?!
On that subject, the first group around home|activity is not actually a traditional group, due to the use of ?! at the beginning. This is called a negative look-ahead, and it's a complicated regular expression feature. And it's being used incorrectly:
Negative lookahead is indispensable if you want to match something not followed by something else.
There's a LOT more I could go into with this, but basically we don't really want or need it in the first place, so I'll let you explore if you'd like.
In order to make your life easier, I'd suggest separating the home, activity, and other existing controllers in the routes. CodeIgniter will look through the list of routes from top to bottom, and once something matches, it stops checking. So if you specify your existing controllers before the wild card, they will match, and your wild card regular expression can be greatly simplified.
$route['home'] = 'pages';
$route['activity'] = 'user/activity';
$route['([A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Remember to list your routes in order from most specific to least. Wild card matches are less specific than exact matches (like home and activity), so they should come after (below).
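The first-match-wins behaviour can be modelled in a few lines of Python. This is a simplified stand-in for the routing described above, not CodeIgniter's actual code: it assumes each route is compared against the whole URI, and the ROUTES list and resolve function are hypothetical names.

```python
import re

# Ordered routes, most specific first (mirrors the three routes above).
ROUTES = [
    ("home", "pages"),
    ("activity", "user/activity"),
    ("([A-Za-z0-9][A-Za-z0-9_-]{2,254})", r"view/slug/\1"),
]


def resolve(uri: str):
    """Return the target of the first route that matches the whole URI."""
    for pattern, target in ROUTES:
        match = re.fullmatch(pattern, uri)
        if match:
            # Substitute captured groups into the target, like $1 in a route.
            return match.expand(target)
    return None


for uri in ["home", "activity", "homepage", "user-name"]:
    print(uri, "->", resolve(uri))
```

Because home and activity are checked before the wild card, they never reach the slug route, while homepage and user-name do.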
Now, that's all the complicated stuff. A little more FYI.
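Remember that dashes - have a special meaning when in between [] brackets. Placed last, as here, a dash is already taken literally, but escaping it makes the intent explicit and stays safe if more characters are added to the class later.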
$route['([A-Za-z0-9][A-Za-z0-9_\-]{2,254})'] = 'view/slug/$1';
Note that your character repetition min/max {2,254} only applies to the second set of characters, so your user names must be 3 characters at minimum, and 255 at maximum. Just an FYI if you didn't realize that already.
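The length bounds are easy to confirm with a quick Python check. The is_valid_slug helper is a hypothetical name, and fullmatch is used on the assumption that the whole slug must match.

```python
import re

# One leading alphanumeric character plus 2 to 254 further characters.
SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,254}")


def is_valid_slug(slug: str) -> bool:
    """True when the whole string matches the slug pattern."""
    return SLUG.fullmatch(slug) is not None


# Try lengths just outside and just inside the bounds.
accepted = [n for n in (2, 3, 255, 256) if is_valid_slug("a" * n)]
print(accepted)  # → [3, 255]
```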
I saw your own answer to this problem, and it's just ugly. Sorry. The ^ and $ symbols are used improperly throughout the lookahead (which still shouldn't be there in the first place). It may "work" for a few use cases that you're testing it with, but it will just give you problems and headaches in the future.
Hopefully now you know more about regular expressions and how they're matched in the routing process.
And to answer your question, no, you should not use ^ and $ at the beginning and end of your regex -- CodeIgniter will add that for you.
Use the 404, Luke...
At this point your routes are improved and should be functional. I will throw it out there, though, that you might want to consider using the controller/method defined as the 404_override to handle your wild cards. The main benefit of this is that you don't need ANY routes to direct a wild card, or to prevent your wild card from goofing up existing controllers. You only need:
$route['404_override'] = 'view/slug';
Then, your View::slug() method would check the URI, and see if it's a valid pattern, then check if it exists as a user (same as your slug method does now, no doubt). If it does, then you're good to go. If it doesn't, then you throw a 404 error.
It may not seem that graceful, but it works great. Give it a shot if it sounds better for you.
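The control flow of such a catch-all handler might look like the following Python sketch. The real method would be PHP inside a CodeIgniter controller; KNOWN_USERS and handle_unrouted_uri are hypothetical stand-ins for a database lookup and for View::slug().

```python
import re

SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,254}")
KNOWN_USERS = {"user-name"}  # hypothetical stand-in for a database lookup


def handle_unrouted_uri(uri: str) -> tuple[int, str]:
    """Decide what to serve for a URI that matched no existing controller."""
    slug = uri.strip("/")
    # First check that the URI has the shape of a valid slug at all.
    if SLUG.fullmatch(slug) is None:
        return 404, "invalid slug"
    # Then check that the slug belongs to a registered user.
    if slug not in KNOWN_USERS:
        return 404, "no such user"
    return 200, f"profile of {slug}"


for uri in ["/user-name", "/nobody-here", "/a"]:
    print(uri, handle_unrouted_uri(uri))
```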
I'm not familiar with CodeIgniter specifically, but most frameworks' routing operates based on precedence. In other words, the default controller, 404, etc. routes should be defined first. Then you can simplify your regex to only match the slugs.
Ok answering my own question
I seem to have come up with a different expression that works:
$route['(^(?!default_controller$|404_override$|home$|activity$)[A-Za-z0-9][A-Za-z0-9_-]{2,254}$)'] = 'view/slug/$1';
I added parentheses around the whole expression (I think that's what CodeIgniter matches with $1 on the right) and added a start of line identifier: ^ and a bunch of end of line identifiers: $
Hope this helps someone who may run into this problem later.
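For what it's worth, the effect of the $ inside the lookahead can be seen by comparing the two variants in Python. The sketch shortens the reserved list to home and activity, uses fullmatch on the assumption that the route is matched against the whole URI, and the names LOOSE and STRICT are made up for the illustration.

```python
import re

# Lookahead without "$": rejects anything that merely starts with a reserved word.
LOOSE = re.compile(r"(?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}")
# Lookahead with "$" after each word: rejects only the exact reserved words.
STRICT = re.compile(r"(?!home$|activity$)[A-Za-z0-9][A-Za-z0-9_-]{2,254}")

for slug in ["home", "homepage", "activity-log", "user-name"]:
    print(slug, bool(LOOSE.fullmatch(slug)), bool(STRICT.fullmatch(slug)))
```

With the unanchored lookahead, a user could not register homepage or activity-log; the anchored version blocks only home and activity themselves.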