How to dismiss the end of the url parameters with regex? - regex

I have a script that is supposed to trigger when a certain page path is open.
The issue: the page path contains multiple parameters, including the parameter "returnUrl", which holds the previously visited page.
Here is the URL I want to check:
/cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en&returnUrl=https://example.com/cxsrec__cxsSearchDetail?id=a0w0X000004IceYQAS&lang=en&returnUrl=https://example.com/cxsrec__cxsSearch&lang=en
I initially used this regex to trigger on this page:
(cxsSearchApply.*)
But I have other regexes like:
(cxsSearchSearchDetail.*)
And they also trigger because their page path is included in the URL (inside returnUrl).
What regex should I use to match the first part of the URL but nothing after "returnUrl"?

So you want to match cxsSearchApply on the text before &returnUrl. You could use a lookahead:
(cxsSearchApply.*)(?=returnUrl=)
However, the greedy .* runs up to the last returnUrl=, and what you really want is to match everything before the first &returnUrl. So you need a non-greedy quantifier:
(cxsSearchApply.*?)(?=returnUrl=)
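Likewise, write your other search the same way so that it only looks at the part before returnUrl:
(cxsSearchSearchDetail.*?)(?=returnUrl=)
Be aware that the lookahead only requires returnUrl= to appear somewhere after the match. Your URL nests a second returnUrl inside the first one, so a page name inside the first returnUrl value is still followed by returnUrl= and can still match. If that happens, anchor the pattern to the start of the path, for example ^/cxsSearchApply.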
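The difference between the greedy and non-greedy forms can be checked with a short script. This is a minimal sketch in Python, used only as a stand-in for whatever engine runs your trigger. The anchored patterns at the end and the cxsSearchDetail spelling (taken from the URL itself) are assumptions, not part of the original rules.

```python
import re

# URL from the question.
url = (
    "/cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en"
    "&returnUrl=https://example.com/cxsrec__cxsSearchDetail?id=a0w0X000004IceYQAS&lang=en"
    "&returnUrl=https://example.com/cxsrec__cxsSearch&lang=en"
)

# Greedy: .* backtracks from the end, so the capture runs up to the LAST "returnUrl=".
greedy = re.search(r"(cxsSearchApply.*)(?=returnUrl=)", url)

# Non-greedy: .*? grows one character at a time and stops at the FIRST "returnUrl=".
lazy = re.search(r"(cxsSearchApply.*?)(?=returnUrl=)", url)

# Unanchored search for the detail page (spelling as it appears in the URL).
# It still matches inside the first returnUrl value, because a second "returnUrl=" follows.
detail_unanchored = re.search(r"(cxsSearchDetail.*?)(?=returnUrl=)", url)

# Assumed alternative: anchor to the start of the path so only the page itself is tested.
apply_anchored = re.match(r"/cxsSearchApply(?=\?|$)", url)
detail_anchored = re.match(r"/cxsSearchDetail(?=\?|$)", url)

print(lazy.group(1))                        # cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en&
print("cxsSearchDetail" in greedy.group(1)) # True
print(detail_unanchored is not None)        # True
print(apply_anchored is not None)           # True
print(detail_anchored is not None)          # False
```

If your tool does not support lookaheads, the anchored form has the advantage of not needing one.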

Nothing after "returnUrl"
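If this is literally what you want, you can simply use (.*?)(&returnUrl=.*) and take the first capture group as your result. The first group has to be non-greedy; with (.*) it would run up to the last &returnUrl= in the URL.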
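As a rough illustration of the capture-group approach, the sketch below (Python, as a stand-in for your own engine) compares a non-greedy first group with a greedy one on the URL from the question. The variable names are only for this example.

```python
import re

# URL from the question.
url = (
    "/cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en"
    "&returnUrl=https://example.com/cxsrec__cxsSearchDetail?id=a0w0X000004IceYQAS&lang=en"
    "&returnUrl=https://example.com/cxsrec__cxsSearch&lang=en"
)

# Non-greedy first group: everything before the FIRST "&returnUrl=".
before_first = re.match(r"(.*?)(&returnUrl=.*)", url).group(1)

# Greedy first group: everything before the LAST "&returnUrl=".
before_last = re.match(r"(.*)(&returnUrl=.*)", url).group(1)

print(before_first)                 # /cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en
print("returnUrl" in before_last)   # True
```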

Related

Regex to target a page but not its children

I'm trying to write a regular expression to target a URL but not any of its children. My regex is definitely pretty weak and could use some help.
Page I want to target (may include a trailing slash and/or UTM parameters): https://test.com/deals/
Example of a page I do not want to target: https://test.com/deals/Best-Sellers/c/901
My attempt:
.*Deals\/((?!Best).)*
You can use \/deals\/?(?:[?#]\S*)?$
This is a bit more permissive than what your question suggests but it might come in handy.
The main thing is that it tries to match /deals at the end of the line. This ensures that you won't match, say https://test.com/best-deals or similar but only the URL that ends with /deals. Also, the final / is optional - you might get https://test.com/deals.
In addition to that, the regex allows for the URL to end with # anchors or ? followed by parameters. The page might allow this right now or in the future - for example, if a link is used that leads to the same page (e.g. to a specific section), you'd get a # added to the URL. Or there might be something like a filter configuration embedded in the URL https://test.com/deals/?sort=price&productsPerPage=15&page=2&minPrice=100.
Finally, you should make your regex case insensitive to account for the fact the URL might also be https://test.com/Deals/. How you set this flag will depend on where you are using the regex, so I am just adding this as a reminder.
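To see how the suggested pattern behaves, here is a small sketch in Python with the case-insensitive flag set. The helper name is_deals_page and the utm_source example URL are made up for this illustration; your own tool may set the flag differently.

```python
import re

# Pattern from the answer, compiled case-insensitively.
DEALS = re.compile(r"\/deals\/?(?:[?#]\S*)?$", re.IGNORECASE)

def is_deals_page(url: str) -> bool:
    """Return True when the URL ends in /deals, optionally followed by /, ?params or #anchor."""
    return DEALS.search(url) is not None

for candidate in [
    "https://test.com/deals/",
    "https://test.com/deals",
    "https://test.com/Deals/",
    "https://test.com/deals/?utm_source=newsletter",
    "https://test.com/deals/?sort=price&productsPerPage=15&page=2&minPrice=100",
    "https://test.com/deals/Best-Sellers/c/901",
    "https://test.com/best-deals",
]:
    print(is_deals_page(candidate), candidate)
```

The first five URLs match; the child page and /best-deals do not.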

REGEX: find URL with specific words/pages

I currently have this regex:
http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
This retrieves all the URLs from a file, but I need it to return only the URLs that contain a specific page, say page-to-find. I can't seem to do it without adding a second group, and I want the whole match in a single group, as directly as possible.
Any tips?
Thanks
If it's a page, what does it end in? .asp? .php? .aspx? .htm? .html? (Something else?)
Try this for a start:
http[s]?://.*page-to-find

Matching URL containing one word AND another word using Regex

I am trying to write a regular expression to be used in a Google Analytics goal that will match URLs containing
?package=whatever
and also
/success
The user will first visit a page like
www.website.com/become-client/?package=greatpackage
and if they purchase they will be led to this page
www.website.com/become-client/?package=greatpackage/success
So based on this I could use the following regex
\?package\=greatpackage/success
This should match the correct destination and I would be able to use this in the goal settings in Analytics to create a goal for purchases of the greatpackage package.
But sometimes the website will use other parameters in addition to ?package. Like ?type, ?media and so on.
?type=business
Resulting in URLs like this
www.website.com/become-client/?package=greatpackage?type=business
and if they purchase they will be led to this page
www.website.com/become-client/?package=greatpackage?type=business/success
Now the /success part is moved away from the ?package part. My question is: how do I write a regex that will still match this URL no matter what other parameters there may be in between the parts?
---update----
@jonarz proposed the following and it works like a charm.
\?package\=greatpackage(.*?)/success
But what if there are two products with nearly the same name, for example greatpackage and greatpackageULTRA? The code above will select both. If changing the product names is impossible, how can I then select only one of them?
The regex that would solve the problem introduced in the edit would be:
\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)
Here is a test: https://regex101.com/r/jS4cH5/1 and it seems to suit your needs.
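As a quick check of that pattern outside regex101, here is a minimal sketch in Python. The helper name is_goal is invented for this example, and Google Analytics may use a different regex engine, so treat it as an approximation.

```python
import re

# Pattern from the answer above.
GOAL = re.compile(r"\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)")

def is_goal(url: str) -> bool:
    """Return True when the URL is a success page for exactly the greatpackage product."""
    return GOAL.search(url) is not None

base = "www.website.com/become-client/"
print(is_goal(base + "?package=greatpackage/success"))                    # True
print(is_goal(base + "?package=greatpackage?type=business/success"))      # True
print(is_goal(base + "?package=greatpackageULTRA/success"))               # False
print(is_goal(base + "?package=greatpackageULTRA?type=business/success")) # False
print(is_goal(base + "?package=greatpackage"))                            # False
```

The product name must be followed directly by ? or /, which is what keeps greatpackageULTRA out.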
If you want to match a URL like this one:
www.website.com/become-client/?package=greatpackage?type=business?other=nada/success
With a group to extract your package type:
.*\?package=([^\/?]+).*\/success
Without a group (just matching the URL if it contains package=greatpackage and success):
.*\?package=greatpackage.*\/success
Without a group and matching any package type:
.*\?package=[^\/?]+.*\/success
You just need to add .* to match any character (except newlines). The [^\/?]+ part is there to be sure your package type isn't empty (i.e. the first character is neither / nor ?).
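The group variant can be tried out as follows. This is a sketch in Python with an invented helper, package_of. It also shows that the fixed-name pattern matches greatpackageULTRA as well, since .* accepts any characters after the name.

```python
import re

# Patterns from the answer above.
EXTRACT = re.compile(r".*\?package=([^\/?]+).*\/success")
ONLY_GREAT = re.compile(r".*\?package=greatpackage.*\/success")

def package_of(url: str):
    """Return the package name of a success URL, or None when the URL does not match."""
    match = EXTRACT.search(url)
    return match.group(1) if match else None

base = "www.website.com/become-client/"
print(package_of(base + "?package=greatpackage?type=business?other=nada/success"))  # greatpackage
print(package_of(base + "?package=greatpackageULTRA/success"))                      # greatpackageULTRA
print(package_of(base + "?package=greatpackage"))                                   # None
print(ONLY_GREAT.search(base + "?package=greatpackageULTRA/success") is not None)   # True
```

Extracting the name and comparing it in code is one way to tell the two products apart.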

Regex URL in django

Maybe this question is a repeat, but I can't find an appropriate answer for my specific issue. I have two URLs:
url(r'^dashboard/completar-perfil/(?P<pk>[-_\w]+)/$', CompleteProfileView.as_view()),
url(r'^dashboard/.*$', DashboardView.as_view()),
As you can see, both begin with dashboard. The problem is that the first one never renders CompleteProfileView; it always renders DashboardView. If I remove dashboard/ from the first URL, it works fine. How can I make both URLs render their respective views?
The problem is that ^dashboard/.*$ is a catch-all regular expression that matches everything that starts with dashboard/, including dashboard/completar-perfil/. Django tries URL patterns in order and uses the first one that matches, so a catch-all must not come before a more specific pattern.
So you may need to make the second regex more specific. Do you really need .* ?
If it is the index of your dashboard, you could use ^dashboard/$. Otherwise, you could put another word between dashboard and your greedy regex, like the following:
r"^dashboard/another-word/.*$"
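The effect of pattern order can be illustrated without Django. The sketch below is a simplified stand-in for a first-match-wins router written with plain re. It is not Django's resolver, and the resolve helper and the view-name strings are hypothetical.

```python
import re

def resolve(path, patterns):
    """Return the name attached to the first pattern that matches the path."""
    for regex, view_name in patterns:
        if re.search(regex, path):
            return view_name
    return None

SPECIFIC = (r"^dashboard/completar-perfil/(?P<pk>[-_\w]+)/$", "CompleteProfileView")
CATCH_ALL = (r"^dashboard/.*$", "DashboardView")
INDEX_ONLY = (r"^dashboard/$", "DashboardView")

path = "dashboard/completar-perfil/42/"

# Catch-all listed first: it swallows the profile URL.
print(resolve(path, [CATCH_ALL, SPECIFIC]))           # DashboardView

# Specific pattern first, or an index-only dashboard pattern: the profile view wins.
print(resolve(path, [SPECIFIC, CATCH_ALL]))           # CompleteProfileView
print(resolve(path, [INDEX_ONLY, SPECIFIC]))          # CompleteProfileView
print(resolve("dashboard/", [INDEX_ONLY, SPECIFIC]))  # DashboardView
```

With ^dashboard/$ the order no longer matters, because the two patterns cannot match the same path.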

Google Analytics Regex excluding a certain url in a sub folder

Currently on my GA account I have the following URLs from our website tracked:
domain/contact-us/
domain/contact-us/global-contact-list.aspx
domain/contact-us/contactlist.aspx
The first two are from our new website, which we want to track. The last one is from our old website (traffic is still being tracked, but we do not want to use it).
I tried using a regex filter on this as the following:
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/)
Reading up, I believe this looks for matches of exactly:
/contact-us/global-contact-list or /contact us/ but would disallow /contact-us/contactlist/
For some reason, the last URL is still coming through. Can someone explain why this is happening?
You need to add an end-of-string anchor or a negative lookahead:
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/$)
or
(^/contact-us/global-contact-list\.aspx)|(^/contact-us/(?!contactlist))
This way, you will exclude /contact-us/contactlist/ from matching. The lookahead is written without a trailing slash so that it also excludes /contact-us/contactlist.aspx from your list.
BTW, /contact us/ will not pass since (^/contact-us/) only allows a hyphen. You should add a space, e.g. (^/contact-us/global-contact-list\.aspx)|(^/contact[-\s]us/$).
Also, (^/contact-us/global-contact-list\.aspx) won't match /contact-us/global-contact-list because it needs to match .aspx.
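The three tracked paths can be run against the original filter and both fixes. This is a sketch in Python, used as a stand-in only; the regex engine behind your analytics filter may not support lookaheads, in which case the anchored variant is the safer choice. The lookahead here is written without a trailing slash (an adjustment for this sketch) so that contactlist.aspx is rejected, and matching_paths is a hypothetical helper.

```python
import re

ORIGINAL = r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/)"
ANCHORED = r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/$)"
# Assumed variant: negative lookahead on "contactlist" with no trailing slash.
LOOKAHEAD = r"(^/contact-us/global-contact-list\.aspx)|(^/contact-us/(?!contactlist))"

# Request paths from the question, without the domain.
PATHS = [
    "/contact-us/",
    "/contact-us/global-contact-list.aspx",
    "/contact-us/contactlist.aspx",
]

def matching_paths(pattern):
    """Return the paths that the filter pattern lets through."""
    return [path for path in PATHS if re.search(pattern, path)]

print(matching_paths(ORIGINAL))   # all three paths, including contactlist.aspx
print(matching_paths(ANCHORED))   # only the first two paths
print(matching_paths(LOOKAHEAD))  # only the first two paths
```

The original filter lets the old page through because ^/contact-us/ is only a prefix test, so anything under /contact-us/ matches it.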