Matching URL containing one word AND another word using Regex

I am trying to write a regular expression to be used in a Google Analytics goal that will match URLs containing
?package=whatever
and also
/success
The user will first visit a page like
www.website.com/become-client/?package=greatpackage
and if they purchase they will be led to this page
www.website.com/become-client/?package=greatpackage/success
So based on this I could use the following regex
\?package\=greatpackage/success
This should match the correct destination and I would be able to use this in the goal settings in Analytics to create a goal for purchases of the greatpackage package.
But sometimes the website will use other parameters in addition to ?package, like ?type, ?media and so on.
?type=business
Resulting in URLs like this
www.website.com/become-client/?package=greatpackage?type=business
and if they purchase they will be led to this page
www.website.com/become-client/?package=greatpackage?type=business/success
Now the /success part is moved away from the ?package part. My question is: how do I write a regex that will still match this URL no matter what other parameters there may be between the two parts?
---update----
@jonarz proposed the following, and it works like a charm.
\?package\=greatpackage(.*?)/success
But what if there are two products with nearly the same name? For example, greatpackage and greatpackageULTRA. The code above will select both. If changing the product names is impossible, how can I select only one of them?

The regex that would solve the problem introduced in the edit would be:
\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)
Here is a test: https://regex101.com/r/jS4cH5/1 and it seems to suit your needs.
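The same check can be reproduced outside regex101 with a few lines of Python. This is a sketch that assumes Python's `re.search` behaves like GA's "match anywhere" regex matching for this simple pattern; it has not been verified against GA itself. It runs the answer's regex on the URLs from the question plus greatpackageULTRA variants, and `is_greatpackage_purchase` is an illustrative name.

```python
import re

# The answer's pattern, used unchanged inside a raw string.
GREATPACKAGE_SUCCESS = re.compile(
    r"\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)"
)

def is_greatpackage_purchase(url: str) -> bool:
    """True if the URL is a success page for exactly 'greatpackage'."""
    # search() scans the whole string, like an unanchored GA regex (assumed).
    return GREATPACKAGE_SUCCESS.search(url) is not None

examples = [
    "www.website.com/become-client/?package=greatpackage/success",
    "www.website.com/become-client/?package=greatpackage?type=business/success",
    "www.website.com/become-client/?package=greatpackageULTRA/success",
    "www.website.com/become-client/?package=greatpackageULTRA?type=business/success",
]
for url in examples:
    print(is_greatpackage_purchase(url), url)
```

The ULTRA URLs fail because the character right after greatpackage must be `?`, `/` (the optional group) or the `/` of `/success`; a `U` satisfies none of them. The trailing `(\/|\b)` similarly rejects something like `/successful`.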

If you want to match a URL like this one:
www.website.com/become-client/?package=greatpackage?type=business?other=nada/success
With a group to extract your package type:
.*\?package=([^\/?]+).*\/success
Without a group (just matching the URL if it contains package=greatpackage and success):
.*\?package=greatpackage.*\/success
Without a group and matching any package type:
.*\?package=[^\/?]+.*\/success
You just need to add .* to match any character (except newlines). The [^\/?]+ part makes sure your package type isn't empty (i.e. its first character is neither / nor ?), and because it stops at the next / or ?, the group captures only the package name.
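If the URLs are also processed in code, the capturing version can tell greatpackage apart from greatpackageULTRA: compare the captured name exactly instead of matching a prefix. A minimal Python sketch, where `package_of` is a hypothetical helper name:

```python
import re

# Pattern from the answer: group 1 is everything after "package=" up to
# the next "/" or "?", and must be at least one character long.
PACKAGE_SUCCESS = re.compile(r".*\?package=([^\/?]+).*\/success")

def package_of(url):
    """Return the package name of a success URL, or None if it doesn't match."""
    m = PACKAGE_SUCCESS.search(url)
    return m.group(1) if m else None

url = "www.website.com/become-client/?package=greatpackage?type=business?other=nada/success"
print(package_of(url))  # greatpackage
print(package_of(url) == "greatpackage")  # True; "greatpackageULTRA" would not be equal
```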

Related

How to use regex to insert dynamic urls in the Analytics conversion funnel?

I have a client who has an e-commerce website running on Wix. She asked me to set up the conversion funnel so she can identify when visitors leave at a step, that is, those who never reach the order-placed page. On Wix we have 3 steps/URLs, as below:
Cart: https://www.easyhomedesign.com.br/cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Checkout: https://www.easyhomedesign.com.br/checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
and the Thank you page: https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
After each of those URLs there are dynamic string values, so in the URL field of each funnel step I need to enter the same URLs, but as a regex that behaves like a "starts with" match. We can't know what values come at the end of the URLs, and the funnel setup section has no "Starts with" option in its combobox. At least, that's the only solution I could think of.
My idea is to use something like https://www.easyhomedesign.com.br/thank-you-page/$. I don't know regex; that's only an example of what I had in mind, since that part of the URL is the fixed one.
Could someone help me? Thanks.
Although the final goal setup depends heavily on your Analytics implementation, it is generally possible to cover such a funnel with RegEx in goal funnels.
Google Analytics tracks page visits by path by default. So https://www.easyhomedesign.com.br/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 will become /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17 in your reports. It can be configured to include the domain, and the goal flow can be created that way as well, but it's important to check which form appears in your reports before writing the RegEx.
It also matters whether the query strings mentioned above affect the step, i.e. whether they are part of the flow or not.
Assuming that the path part is relevant, you can set up something like this.
Cart URL in reports: /cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D
Cart step RegEx in goal flow: ^\/cart
^ anchors the match at the beginning of the string; you may need to adjust it if the host is present in your reports. You can also extend it to ^\/cart\? if parameters must be present for a visit to qualify as a cart visit.
Checkout URL in reports: /checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%2283476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2Fwww.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D
Checkout step RegEx in goal flow: ^\/checkout
The same notes about the start-of-string anchor and required parameters apply to the checkout step.
Thank you URL in reports: /thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17
Thank you RegEx, which is essentially the goal's RegEx: ^\/thank-you-page\/[\w-]+
Here, the [\w-]+ part matches alphanumeric characters, underscores, or hyphens; at least one of them must be present.
The $ sign mentioned in the question cannot be used here, as it marks the end of the string, so the id at the end of the URL would make it non-matching.
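The three step patterns can be sanity-checked against the report paths quoted above with a short Python sketch. It assumes the reports contain only the path (the default described above); if the host is included, the `^\/` anchors would need adjusting. `FUNNEL` and `step_for` are illustrative names.

```python
import re

# Funnel-step patterns from the answer (path-only reports assumed).
FUNNEL = {
    "cart": re.compile(r"^\/cart"),
    "checkout": re.compile(r"^\/checkout"),
    "thank_you": re.compile(r"^\/thank-you-page\/[\w-]+"),
}

# Report paths copied from the answer above.
CART = "/cart?appSectionParams=%7B%22origin%22%3A%22cart-popup%22%7D"
CHECKOUT = (
    "/checkout?appSectionParams=%7B%22a11y%22%3Afalse%2C%22cartId%22%3A%22"
    "83476f86-4ac9-44ac-8779-4479dde12cc2%22%2C%22storeUrl%22%3A%22https%3A%2F%2F"
    "www.easyhomedesign.com.br%2F%22%2C%22isFastFlow%22%3Afalse%2C%22isPickupFlow%22%3Afalse%7D"
)
THANK_YOU = "/thank-you-page/d28bc342-0afe-40cb-8d98-a5784b6b2f17"

def step_for(path):
    """Return the name of the first funnel step whose pattern matches the path."""
    for name, pattern in FUNNEL.items():
        if pattern.search(path):
            return name
    return None

for path in (CART, CHECKOUT, THANK_YOU):
    print(step_for(path), path[:40])

# "$" right after the fixed prefix rejects the real path, because the
# order id follows the slash.
print(re.search(r"^\/thank-you-page\/$", THANK_YOU))  # None
```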

Chrome dev tools: any way to exclude requests whose URL matches a regex?

Unfortunately, in the latest versions of Chrome the negative network filter doesn't work anymore. I used this filter to exclude every HTTP call containing a particular string. I asked for a solution in the Chrome DevTools forum, but so far nobody has answered.
So I would like to know whether there is a way to solve this problem with regex syntax (for example, excluding every call containing the string 'loadMess').
Update (2018):
This is an update to my old answer to clarify that both bugs have been fixed for some time now.
Negate or exclude filtering is working as expected now. That means you can filter request paths with my.com/path (show requests matching this), or -my.com/path (show requests not matching this).
The regex solution also works after my PR fix made it in production. That means you can also filter with /my.com.path/ and /^((?!my.com/path).)*$/, which will achieve the same result.
I have left the old answer here for reference; it also explains the negative lookahead solution.
The pre-defined negative filters do work, but it doesn't currently allow you to do NOT filters on the names in Chrome stable, only CONTAINS. This is a bug that has been fixed in Chrome Canary.
Once the change has been pushed to Chrome stable, you should be able to do loadMess to filter only for that name, and -loadMess to filter out that name and leave the rest, as it was previously.
Workaround: Regex for matching a string not containing a string
^((?!YOUR_STRING).)*$
Example:
^((?!loadMess).)*$
Explanation:
^ - Start of string
(?!loadMess) - Negative lookahead (assert that loadMess does not start at the current position, without consuming or capturing anything)
. - Match any character (except line breaks)
()* - 0 or more of the preceding group
$ - End of string
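The same pattern can be tried outside DevTools. This Python sketch filters a list of hypothetical request names the way a "does not contain loadMess" filter should:

```python
import re

# At every position, check that "loadMess" does not start here,
# then consume one character; ^...$ forces this across the whole string.
NOT_LOADMESS = re.compile(r"^((?!loadMess).)*$")

# Hypothetical request names, for illustration only.
requests = ["app.js", "loadMessages?id=3", "styles.css", "api/loadMess"]
visible = [r for r in requests if NOT_LOADMESS.search(r)]
print(visible)  # ['app.js', 'styles.css']
```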
Update (2016):
I discovered that there is actually a bug with how DevTools deals with Regex in the Network panel. This means the workaround above doesn't work, despite it being valid.
The Network panel filters on Name and Path (as discovered from the source code), and it runs two tests whose results are OR'ed. In the case above, if loadMess is in the Name but not in the Path (e.g. not in the domain or directory), the regex fails on the Name but succeeds on the Path, so the request still matches. To clarify, false || true === true, which means it will only filter out loadMess if it's found in both the Name and the Path.
I created an issue in Chromium and pushed a fix for review, which has since been merged.
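The OR'ed behaviour described in the 2016 update can be modelled in a few lines. This is a hypothetical simplification based on that description, not the actual DevTools source; `shown_buggy` and the sample Name/Path values are made up for illustration.

```python
import re

NOT_LOADMESS = re.compile(r"^((?!loadMess).)*$")

def shown_buggy(pattern, name, path):
    """Hypothetical model: test Name and Path separately and OR the results."""
    return bool(pattern.search(name)) or bool(pattern.search(path))

# Hypothetical request: "loadMess" is in the Name but not in the Path.
name, path = "loadMess.php", "example.com/api"
print(bool(NOT_LOADMESS.search(name)))         # False: the name contains loadMess
print(bool(NOT_LOADMESS.search(path)))         # True: the path does not
print(shown_buggy(NOT_LOADMESS, name, path))   # True: the request stays visible
```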
This is answered here, for Chrome 58.0.3029.110 (Official Build) (64-bit):
https://stackoverflow.com/a/27770139/4772631
For example, to exclude all GIFs, just type -gif
Negative lookahead is recommended everywhere, but it does not work for me.
Instead, prefixing the regex with - ("-myregex") does work. Like this: -/(Violation|HMR)/.
Chrome browser DevTools do not support regex filters very well.
When I want to hide some requests, the approach shown above does not work. But you can use -hide1 -hide2 to hide the requests you want.
Just leave a space between the conditions. This is not regex matching; I guess it uses plain string matching under the hood.
Filtering multiple different urls
You can use the negation symbol (-) to filter out network calls.
E.g.: -lab.com would filter out lab.com URLs.
To filter out multiple URLs, you can use the | symbol in the regex.
E.g.: -/lab.com|mini.com/ filters out both lab.com and mini.com, and you can extend it to filter out many different websites or URLs.
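As a rough Python illustration of what the negated alternation does (the URLs are hypothetical, and `not` stands in for the leading `-` in the DevTools filter box):

```python
import re

# The answer's alternation; "-" in DevTools negates it, emulated with "not".
LAB_OR_MINI = re.compile(r"lab.com|mini.com")

# Hypothetical request URLs.
urls = ["https://lab.com/a.js", "https://cdn.mini.com/b.css", "https://example.org/c.png"]
kept = [u for u in urls if not LAB_OR_MINI.search(u)]
print(kept)  # ['https://example.org/c.png']

# The unescaped "." matches any character, so "labxcom" is filtered too;
# escape it (lab\.com) if that matters.
print(bool(LAB_OR_MINI.search("https://labxcom.net")))  # True
```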
You can use the "Invert" option to exclude requests matching the string in the Filter text box.
On the latest Chrome version (62) you have to use:
-mime-type:image/gif

Google Analytics Regex excluding a certain url in a sub folder

Currently my GA account tracks the following URLs from our website:
domain/contact-us/
domain/contact-us/global-contact-list.aspx
domain/contact-us/contactlist.aspx
The first two are from our new website, which we want to track; the last one is from our old website (its traffic is still being tracked, but we do not want to use it).
I tried using a regex filter on this as the following:
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/)
Reading up, I believe this looks for matches of exactly:
/contact-us/global-contact-list or /contact us/ but would disallow /contact-us/contactlist/
For some reason, the last URL is still coming through. Can someone explain why this is happening?
You need to add a negative lookahead or an end-of-string anchor:
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/$)
or
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/(?!contactlist))
This way, /contact-us/contactlist.aspx (and /contact-us/contactlist/) will be excluded from matching.
BTW, /contact us/ will not pass, since (^/contact-us/) only allows a hyphen. To allow a space as well, use a character class, e.g. (^/contact-us/global-contact-list\.aspx)|(^/contact[-\s]us/$).
Also, (^/contact-us/global-contact-list\.aspx) won't match /contact-us/global-contact-list because it needs to match .aspx.
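To see why the original filter lets the old URL through, the patterns can be run against the paths from the question in Python. This sketch assumes GA reports the path without the domain, and it writes the lookahead as `(?!contactlist)` (no trailing slash) so that it also covers `contactlist.aspx`. `matched_paths` is an illustrative helper.

```python
import re

# Paths from the question, with "domain" stripped (assumed report format).
PATHS = ["/contact-us/", "/contact-us/global-contact-list.aspx", "/contact-us/contactlist.aspx"]

PATTERNS = {
    # The question's filter: ^/contact-us/ is only a prefix, so every path matches.
    "question": r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/)",
    # End anchor: only the bare /contact-us/ passes the second alternative.
    "end anchor": r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/$)",
    # Lookahead: reject anything continuing with "contactlist".
    "lookahead": r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/(?!contactlist))",
}

def matched_paths(pattern):
    return [p for p in PATHS if re.search(pattern, p)]

for label, pattern in PATTERNS.items():
    print(f"{label}: {matched_paths(pattern)}")
```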

Firing a tag on certain pages in Google Tag Manager

I am trying to create a regexp to fire a tag within Google Tag Manager on certain pages.
The issue I am having is that I do not want to fire this tag on URLs matching a querystring in them, since it is only a session identifier and I do not need the tag to fire on the pages that have a query string. These are basically duplicates and they do not need tracking in a 3rd party tracking program. I know how I could exclude them in GA, but I can't figure out how to do it for the third party tracking.
I'll detail the scenario below and what I have tried.
Example pages that come up in my URL report if I look in GA:
/page
/page/subpage?my-handsome-query-string&some-other-data
/page/subpage
/page/subpage/subsubpage
/page/?query-string-again
So what I want is to fire the tracking on pages that do NOT have a query string, and it is proving quite difficult.
If I put in ^/page.*[^\?] it just doesn't work. I guess I am using the negated character class all wrong? I can't get it working and would appreciate some help devising a better regexp.
Another one I tried was:
^/page/.*, but this one only matched paths below /page/, not /page itself.
I am not very good with regular expressions, so what I basically want to do is match /page, /page/subpage, /page/subpage/subpage etc, but not any URL that has a query string in it.
In GTM I can't create two rules that say "Include {{url path}} matching this" and "Exclude {{url path}} matching \?", so it all needs to be done within one regexp, and that has left me at a loss.
Edit: Mike gave a good answer that solves the GTM part, but I am still interested in learning whether it is possible to do the above with a single regex.
You can actually create two rules as you described.
In GTM, tags can have both Firing rules and Blocking rules. Blocking takes precedence. eg.
Firing rule:
{{url}} matches ^page/.*
Blocking rule:
{{url}} contains ?
Another option is to use a custom javascript macro.
It is in the form of a function(){ } which can detect a query string value in window.location.search and return boolean. Then have a firing rule {{your custom fn}} equals 1.
You can also create a macro which uses the URL macro type and Query component type.
The value is set to the query string without the leading ?. If the url was example.com?foo=bar this macro would contain foo=bar. Then simply add a firing rule {{query}} matches Regex ^$ or {{query}} does not contain something-that-will-never-be-in-the-url-to-avoid-regex
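For the single-regex follow-up in the question: `^/page.*[^\?]` fails because `[^\?]` constrains only one character, so `.*` can still swallow the `?`. One possible single pattern, not taken from the answers above, applies the negated class to every character up to an end-of-string anchor. The Python sketch assumes the matched value includes the query string (as in the GA report listed in the question); Python's `re` and JavaScript regexes should treat this particular pattern the same way.

```python
import re

# Hypothetical single pattern: "/page", optionally followed by "/" and any
# run of characters containing no "?", then the end of the string.
NO_QUERY = re.compile(r"^/page(/[^?]*)?$")

paths = [
    "/page",
    "/page/subpage?my-handsome-query-string&some-other-data",
    "/page/subpage",
    "/page/subpage/subsubpage",
    "/page/?query-string-again",
]
for p in paths:
    print(bool(NO_QUERY.search(p)), p)
```

Requiring a `/` after `/page` also keeps unrelated paths such as `/pageant` from matching.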

need regular expression to match dynamic url to setup goal in Google Analytics

I need to match a complete dynamic URL to set up as a goal in Google Analytics. I don't know how to do that, and I have searched on Google with no luck.
So here is the case.
When the Enter button is pressed, the goal URL differs depending on the selected product.
Example:
http://www.somesite.com/footwear/mens/hiking-boots/atmosphere-boot-p7023.aspx?cl=BLACK
http://www.somesite.com/womens/clothing/waterproof-jackets/canyon-womens-long-jacket-p7372.aspx?cl=KHAKI
http://www.somesite.com/travel/accessories/mosquito-nets/mosquito-net-double-p5549.aspx?cl=WHITE
http://www.somesite.com/ski/accessories/ski-socks-tubes/ski-socks-p2348.aspx?cl=BLACK
If you look closely at the URL, you can see that it has three parts:
http://www.somesite.com/{ 1st part }/{ 2nd part }/{3rd part }/{ page URL }/{ querystring param}
So if I manually change the product code in the page URL part, e.g. p2348 to p1234, the website redirects to the proper page:
http://www.somesite.com/kids/clothing/padded-down-jackets/khuno-kids-padded-jacket-p1234.aspx?cl=BLUE
I don't know how to do that. Please help with a regular expression that matches those 4 digits while the p stays in place, OR one that matches the three parts as any text/number followed by the 4-digit product code.
You should try this regex. It's the simplest one, and it works.
p\d{4}
This will match strings like p7634, p7351, p0872.
If you are not completely sure there will be exactly 4 digits, use the following regex.
p\d+
This one will match strings like p43, p9165, p012, p456897689 and others.
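Running the patterns over the example URLs from the question shows what they extract. A Python sketch with `product_codes` as an illustrative helper. Note that `p\d*` (zero or more digits) would also match a bare "p", such as the one in "http", which is why `p\d+` is used for the open-ended case.

```python
import re

# Goal URLs copied from the question.
URLS = [
    "http://www.somesite.com/footwear/mens/hiking-boots/atmosphere-boot-p7023.aspx?cl=BLACK",
    "http://www.somesite.com/womens/clothing/waterproof-jackets/canyon-womens-long-jacket-p7372.aspx?cl=KHAKI",
    "http://www.somesite.com/travel/accessories/mosquito-nets/mosquito-net-double-p5549.aspx?cl=WHITE",
    "http://www.somesite.com/ski/accessories/ski-socks-tubes/ski-socks-p2348.aspx?cl=BLACK",
]

def product_codes(url, pattern=r"p\d{4}"):
    """Return every substring of the URL that matches the pattern."""
    return re.findall(pattern, url)

for url in URLS:
    # p\d{4}: exactly four digits after "p"; p\d+: one or more digits.
    print(product_codes(url), product_codes(url, r"p\d+"))
```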
Try
p[0-9][0-9][0-9][0-9]\.aspx
if there are always 4 digits after the p.
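Anchoring the digits to `.aspx` also rejects codes with more than four digits, which `p\d{4}` alone would accept as a prefix. A short Python comparison, using a hypothetical five-digit URL:

```python
import re

LOOSE = re.compile(r"p\d{4}")
ANCHORED = re.compile(r"p[0-9][0-9][0-9][0-9]\.aspx")

# Hypothetical URL with a five-digit code, for illustration only.
url = "http://www.somesite.com/kids/clothing/some-jacket-p12345.aspx?cl=BLUE"
print(bool(LOOSE.search(url)))     # True: "p1234" is found inside "p12345"
print(bool(ANCHORED.search(url)))  # False: ".aspx" must follow the fourth digit
```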
Your attempt
[^p]\d[0-9][0-9]
does not work because [^p] matches anything except for p, and \d[0-9][0-9] matches only three digits instead of four.