Regex help to match number pattern in a URL - regex

I am updating a website and need to forward url's with a specific pattern to a new page. I need to identify those URL's using a regex so only those get redirected.
An example of the URL I need to match on is;
/events/2894/an-event-page-title-123
I would not want to match on this URL
/events/example-page-123
The way to positively identify the URL's is to check for '/events/' followed by only numbers in the next section of the URL. there may be more or less than 4 numbers, but only ever numbers. As in the example, there is also
/^events\/([0-9]+)\/([a-z0-9_\.-]+)$/
Above is what I have tried which is probably very wrong as I am not very experienced with Regex (please stifle any laughter), but this is probably straightforward for someone.. many thanks in advance

^\/events\/([0-9]+)\/([a-z0-9_\.-]+)
Try this.See demo.Your regex was not matching because \ before `events was not matched.
http://regex101.com/r/rQ6mK9/45

I would use:
\/events\/\d+\/
which is /events/ followed by 1 or more digits, followed by /

Related

What regex in Google Analytics to use for this case?

I'm trying to figure out what landing page regex to use to only show URLs that have only two sub-folders, e.g. see image below: just show green URLs but not the read ones as they have 3+ subfolders. Any advice on how to do this in GA with regex?
Cheers
If you want to match a path having only two components, e.g.
/component1/component2/
Then you may use the following regex:
/[^/]+/[^/]+/
Demo
If your regex tool requires anchors, then add them:
^/[^/]+/[^/]+/$
Is this what you are looking for?
^\/[!#$&-;=?-[]_a-z~]+\/[!#$&-;=?-[]_a-z~]+\/$
The two sections contain all the valid html characters. We're also forcing the regex to start with slash, end with slash and have only one slash in between.

regular expressions: catch any URLs of the domain example.com

I'm trying to get regexp code for the below case. I tried multiple tries but in vain.
I need to catch any URLs of the domain site.com. Tried using regexp '^site.com/*$
but it does not recognizes it.
i'm just looking for regexp code whichmatches site.com/*
With your expression ^site.com/*$ you match all strings that start with site.com and have zero or more trailing / characters (/*):
If you want to match any strings starting with site.com/ you might want to try ^site\.com/.*$:
There are already a lot of other regex questions regarding domain names on SO, but your question is not clear to me in what context you are trying to do this, or what is the actual goal you want to achieve. If you describe your needs more precisely you could probably find some answers on this forum.
I generally use a helper website like regex101.com.
Also, a few things to note, . has a special meaning in regex meaning any character, and if you wanted to capture site.com/foo you might want to use something where you are not limited to the number of characters by the end. I'd do this with groupings.
^(site\.com\/)(.+)$
You can see this in action here: https://regex101.com/r/AU2iYC/2
Your regex ^site.com/*$ is only matched follow sentences
ex) site.com/ site.com//////// site.com
because * asterisk in regex means Match 0 or more of the preceding token.
so, it should be work
^site.com\/.*$

Regex on domain and negation against language subfolders

Let's say my domains are:
www.test.com
www.test.com/en-gb
www.test.com/cn-cn
These are language sites, the first is the main US English site. In Google Analytics I want to set up a filter to only show me traffic of the first (US) domain. I could do this, I think:
^\/(en-gb|cn-cn).*$
If I EXCLUDE my Request URI with that filter pattern, then I should get a view for the en-US domain. However, I'm interested in understanding regex better so here is some test data and code which I am trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test String
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is actually matched with the Regex. How do I avoid this?
Also, let's say I wanted to have a rule to NEGATE this whole option. So rather than telling Google Analytics to "exclude", I am curious how I could write the opposite of this same rule. So basically, catch all URLs that are not in /en-gb and /cn-cn sub-folders.
Thanks in advance!
You may stop the regex from matching en-gb-test by making sure you may / or ? after it or the end of the string
^\/(en-gb|cn-cn)([\/?]|$)
See the regex demo. If you really need to get the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?]|$).
Details:
^ - start of string
\/ - a / (note that you do not need to escape / in GA regex)
(en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
([\/?]|$) - a capturing group with two alternatives: a ? or / OR the end of the string.
In RE2 regex, you cannot use lookaheads that are crucial when you need to match something other than something else. It would look like ^(?!\/(en-gb|cn-cn)([\/?]|$)).*, but it is not possible with RE2.

RegEx match all website links except those containing admin

I'm setting up URL Rewrite on an IIS and i need to match the following URLs using regex.
http://sub.mysite.com
sub.mysite.com
sub.mysite.com/
sub.mysite.com/Site1
sub.mysite.com/Site1/admin
but not:
sub.mysite.com/admin
sub.mysite.com/admin/somethingelse
sub.mysite.com/admin/admin
The site it self (sub.mysite.com) should not be "hardcoded" in the expression. Instead, it should be matched by something like .*.
I'm really blank on this one. I did find solutions to match the different URLs but once i try to combine them either none of them match or all of them do.
I hope someone can help me.
For your specific case, assuming you are matching the part after the domain (REQUEST_URI):
(?!/admin).*
(?!...) is a negative lookahead. I am not sure if it is supported in the IIS URL Rewrite engine. If not, a better approach would be to check for a complementary approach:
Or as #kirilloid said, just match /admin/? and discard (pay attention to slashes).
BTW. if you want to quickly test RegExps with a "visual" feedback, I highly recommend http://gskinner.com/RegExr/
([A-Za-z0-9]+.)+.com(?!/admin)/?([A-Za-z0-9]+/?)*
this should do the trick

Regex for simple urls

I am looking for regex for simple URLs as
http://www.google.com
http://www.yahoo.in
http://www.example.eu
http://www.example.net
etc.
No subdirectories allowed. For example in this cases it must not validate http://www.google.com/, http://www.yahoo.in/mail.
Does anyone know any regex to do this?
I'm still a noob, but try this:
^http:\/\/[a-zA-Z0-9_\-]+\.[a-zA-Z0-9_\-]+\.[a-zA-Z0-9_\-]+$
This one should do:
^(https?:\/\/)?[0-9a-zA-Z]+\.[-_0-9a-zA-Z]+\.[0-9a-zA-Z]+$
This should work for URLs starting with http:// or https:// or without the protocol name.
The regex should also be used as case-insensitive. In that case, it can be shortened a bit:
^(https?:\/\/)?[0-9a-z]+\.[-_0-9a-z]+\.[0-9a-z]+$
If you don't care whether it is a valid url, you can use:
\S*www\.\S+
All the examples contain www. followed by a nonspace character, but that is unlikely to occur in a normal word.