Google Analytics exclude string apart from one variation using Regex - regex

I'm trying to configure a Google Analytics filter that would exclude all pages matching ^/app$ and ^/app/.* apart from the following: /app/business-signup
It looks like Google doesn't support negative lookahead. I have found the following 2 relevant discussions, but I haven't managed to make it work: "Google Analytics Regex include and exclude string without negative lookahead" and "RegExp alternative to negative lookahead match for Google Analytics".
So far the following include filter is showing the most accurate results, but it's still excluding URLs that shouldn't be excluded.
^(/$|/app/business\-signup|/[^a][^p][^p][^/])
Expected excluded URLs:
/app
/app/
/app/abc
/app/abc?test=1
...
Expected included URLs:
/app/business-signup
/
/?test=1
/about
/about/abc
...

I think you can use:
^/(?:app/business-signup|(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$)|...[^/\n].*)?$
See an online demo
^/ - Start-line anchor followed by a literal forward slash;
(?: - Open non-capture group;
app/business-signup - Match the one option you want to keep (the exception to the exclusion);
| - Or;
(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$) - Two consecutive non-capture groups. The 1st matches one to three characters that are not exactly 'app' (each alternative rejects the 'a', the 1st 'p' or the 2nd 'p' in its position). The 2nd then matches either a literal forward slash followed by 0+ more characters, or the end-line anchor;
| - Or;
...[^/\n].* - Match three characters followed by any character other than forward slash followed by 0+ characters;
)? - Close non-capture group and make it optional to allow a single forward slash;
$ - End-line anchor.
Note: you may just remove the newline characters from the pattern if need be.
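A quick way to sanity-check the pattern against the URL lists from the question is a small script. This is only a sketch: it runs the pattern through Python's re module rather than Google Analytics itself, which is an assumption that should hold because the pattern uses only basic syntax (no lookarounds). The helper name is_included is made up for illustration.

```python
import re

# Pattern from the answer above, unchanged (split over lines for readability).
INCLUDE = re.compile(
    r"^/(?:app/business-signup"
    r"|(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$)"
    r"|...[^/\n].*)?$"
)

def is_included(path: str) -> bool:
    """Hypothetical helper: True if the include filter would keep this path."""
    return INCLUDE.search(path) is not None

# URL lists copied from the question.
should_exclude = ["/app", "/app/", "/app/abc", "/app/abc?test=1"]
should_include = ["/app/business-signup", "/", "/?test=1", "/about", "/about/abc"]

for path in should_exclude + should_include:
    print(f"{path!r:28} included={is_included(path)}")
```

Every path in should_exclude should print included=False and every path in should_include should print included=True.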

You can use
^(/$|/app/business-signup|/(?:[^a]..|.[^p].|..[^p]).*)
See the regex demo. Details:
^ - start of string
(/$|/app/business-signup|/(?:[^a]..|.[^p].|..[^p]).*) - Group 1:
/$ - / at the end of string
| - or
/app/business-signup - a /app/business-signup fixed string
| - or
/(?:[^a]..|.[^p].|..[^p]).* - a /, then either any char other than a and then any two chars, or any char + a char other than p + any char, or any two chars and then any char other than p, and the rest of the line.
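One thing worth checking before relying on this version: each of the three alternatives consumes exactly three characters after the slash, so a path with only one or two characters after the slash is not matched and would be dropped by an include filter. The sketch below illustrates this with Python's re module (an assumption, since it is not Google Analytics' own engine); /ab is a made-up path used only to show the edge case.

```python
import re

# Pattern from the answer above, unchanged.
INCLUDE = re.compile(r"^(/$|/app/business-signup|/(?:[^a]..|.[^p].|..[^p]).*)")

# "/ab" is a hypothetical short path, not taken from the question.
samples = ["/", "/about", "/app/business-signup", "/app", "/app/abc", "/ab"]

results = {path: bool(INCLUDE.search(path)) for path in samples}
for path, kept in results.items():
    print(path, "->", "included" if kept else "excluded")
```

If the site has such short paths, the pattern would likely need an extra alternative to cover them.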

Related

How do I make this regular expression not match anything after forward slash /

I have this regular expression:
/^www\.example\.(com|co(\.(in|uk))?|net|us|me)\/?(.*)?[^\/]$/g
It matches:
www.example.com/example1/something
But doesn't match
www.example.com/example1/something/
But the problem is that it also matches the following, which I do not want:
www.example.com/example1/something/otherstuff
I just want it to stop when a slash is encountered after "something". If there is no slash after "something", it should continue matching any character except line breaks.
I am new to regex, so I get confused easily by those characters.
You may use this regex:
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)(?:\/[^\/]+){2}$
RegEx Demo
This will match the following URL:
www.example.co.uk/example1/something
You can use
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$
See the regex demo
The (.*)? part in your pattern matches any zero or more chars, so it won't stop even after encountering two slashes. The \/([^\/]+)\/([^\/]+) part in the new pattern will match two parts after slash, and capture each part into a separate group (in case you need to access those values).
Details:
^ - start of string
www\.example\. - www.example. string
(?:com|co(?:\.(?:in|uk))?|net|us|me) - com, co.in, co.uk, co, net, us, me strings
\/ - a / char
([^\/]+) - Group 1: one or more chars other than /
\/ - a / char
([^\/]+) - Group 2: one or more chars other than /
$ - end of string.
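To see the two capture groups in action, here is a minimal sketch in Python (the question uses a JavaScript-style regex literal, so the choice of Python is an assumption made for illustration). It runs the pattern against the three URLs from the question.

```python
import re

# Pattern from the answer above; "\/" is kept as written, although Python
# does not require the slash to be escaped.
URL = re.compile(
    r"^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$"
)

tests = [
    "www.example.com/example1/something",             # wanted
    "www.example.com/example1/something/",            # trailing slash: rejected
    "www.example.com/example1/something/otherstuff",  # third segment: rejected
]

for url in tests:
    m = URL.match(url)
    print(url, "->", m.groups() if m else None)
```

Only the first URL matches, and its groups are the two path segments.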

Regex to properly match urls with a particular domain and also if there is a subdomain added

I have the following regex:
(^|^[^:]+:\/\/|[^\.]+\.)hello\.net
Which seems to work for most cases such as these:
http://hello.net
https://hello.net
http://www.hello.net
https://www.hello.net
http://domain.hello.net
https://solutions.hello.net
hello.net
www.hello.net
However, it still matches this, which it should not:
hello.net.domain.com
You can see it here:
https://regex101.com/r/fBH112/1
I am basically trying to check if a URL is part of hello.net, so hello.net and any subdomains such as sub.hello.net should all match.
It should also match hello.net/bye, so anything after hello.net is irrelevant.
You may fix your pattern by adding (?:\/.*)?$ at the end:
(^|^[^:]+:\/\/|[^.]+\.)hello\.net(?:\/.*)?$
See the regex demo. The (?:\/.*)?$ matches an optional sequence of / and any 0 or more chars and then the end of string.
You might consider a "cleaner" pattern like
^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$
See the regex demo. Details:
^ - start of string
(?:\w+:\/\/)? - an optional occurrence of 1+ word chars, and then a :// char sequence
(?:[^\/.]+\.)? - an optional occurrence of any 1 or more chars other than / and . and then .
hello\.net - hello.net
(?:\/.*)?$ - an optional occurrence of / and then any 0+ chars and then end of string
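The "cleaner" pattern can be checked against the URLs from the question with a short script. This is a sketch in Python (the question does not name a language, so that is an assumption); is_hello_net is a made-up helper name, and a.b.hello.net is a made-up host used to probe nested subdomains.

```python
import re

# "Cleaner" pattern from the answer above, unchanged.
HELLO = re.compile(r"^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$")

def is_hello_net(url: str) -> bool:
    """Hypothetical helper: True if the URL belongs to hello.net."""
    return HELLO.match(url) is not None

accepted = [
    "http://hello.net", "https://hello.net",
    "http://www.hello.net", "https://www.hello.net",
    "http://domain.hello.net", "https://solutions.hello.net",
    "hello.net", "www.hello.net", "hello.net/bye",
]
rejected = ["hello.net.domain.com"]

for url in accepted + rejected + ["http://a.b.hello.net"]:
    print(url, "->", is_hello_net(url))
```

Note that the optional subdomain group matches a single label only, so the made-up host a.b.hello.net is rejected. If deeper subdomains should pass, changing the ? after that group to * is one option to consider.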

Regex to Match Words and Numbers with Repeating Sequences (FOO-123 / FOO-456 /...etc)

https://regexr.com/539me
I have a changelog that I need to look like this:
- [FOO-123] This is a change from one project
- [FOO-567 / FOO-890] This has two changes from one project
- [BAR-123 / BAZ-456 / BANG-1234 ] This has three changes from three different projects
I was satisfied with my current regex, but then I tested it further, and it still matches when I make a typo, such as swapping a character (the A from BAR into FOO to make FOA), or when a / is missing:
- [FOB-1234] hello
- [BAG-1234] how
- [FOO-1234 FOO-5678] are
- [FOA-1234 / BARG-1234 / BZF-1234] you?
How would I get it so that the top is always good but the bottom never works?
Regex I've currently created:
/-\s\[[(FOO|BAR|BAZ|BANG)-\d{\s}{/}{\s}+]*]\s.+/g
You could match one of the alternatives and use an optionally repeating group prepended with a space, forward slash and space.
^-\s\[(?:FOO|BAR|BAZ|BANG)-\d+(?: / (?:FOO|BAR|BAZ|BANG)-\d+)*\] .+$
That will match
^ Start of string
-\s\[ Match -, a whitespace char and [
(?:FOO|BAR|BAZ|BANG) Match any of the alternatives
-\d+ Match - and 1+ digits
(?: Non capture group
 / (?:FOO|BAR|BAZ|BANG)-\d+ Match a space, /, a space, one of the alternatives, then - and 1+ digits
)* Close group and repeat 0+ times
\] .+ Match ], space and 1+ occurrences of any char except a newline.
$ End of string
Regex demo
Note: remove the [ and ] around the group of alternatives in your pattern, or else it becomes a character class.
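Because the list of project keys appears twice in the pattern, it can help to build the regex from a single list. The following is a sketch in Python (the question uses a JavaScript-style literal, so the language is an assumption); the names PROJECT_KEYS, ticket and LINE are made up for illustration. The third "good" line from the question has a space before the closing ], which this pattern rejects, so the sketch uses it without that space.

```python
import re

PROJECT_KEYS = ["FOO", "BAR", "BAZ", "BANG"]  # keys taken from the question

keys = "|".join(PROJECT_KEYS)
ticket = rf"(?:{keys})-\d+"  # one ticket reference, e.g. FOO-123

# Same structure as the pattern in the answer above, assembled from parts.
LINE = re.compile(rf"^-\s\[{ticket}(?: / {ticket})*\] .+$")

good = [
    "- [FOO-123] This is a change from one project",
    "- [FOO-567 / FOO-890] This has two changes from one project",
    "- [BAR-123 / BAZ-456 / BANG-1234] This has three changes from three different projects",
]
bad = [
    "- [FOB-1234] hello",
    "- [BAG-1234] how",
    "- [FOO-1234 FOO-5678] are",
    "- [FOA-1234 / BARG-1234 / BZF-1234] you?",
]

for line in good + bad:
    print("OK " if LINE.match(line) else "BAD", line)
```

Adding a new project then only means extending PROJECT_KEYS.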

regex match URL path only with specific chars?

I am looking for a regex in PHP that matches a simple URL path made of specific characters and nothing more.
My regex doesn't quite work (the 'gm' flags are only for testing; in production it should run without 'g' to be more exact):
/^\/[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?$/gm
URL path Examples with comment:
#match: YES
/
/trip-001
/trip-001/
/trip-001/summer-2019
/trip-001/summer-2019/
/trip-001/summer-2019/ibiza-001/
/trip-001/summer-2019/ibiza-001/PICT-001
#match: NO
//
trip-001
trip-001/
trip-001/summer-2019
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001
trip-001//
//trip-001/summer-2019
//trip-001//summer-2019
trip-001//summer-2019
//trip-001/summer-2019/
//trip-001//summer-2019//
trip-001//summer-2019/
trip-001/summer-2019//
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
//trip-001/summer-2019/ibiza-001/
//trip-001//summer-2019/ibiza-001/
//trip-001/summer-2019//ibiza-001/
//trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001/summer-2019/ibiza-001/PICT-001
# and similar
/trip-001/summer-2019/ibiza-001/PICT-001/
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
trip-001/summer-2019/ibiza-001/PICT-001/
trip-001/summer-2019/ibiza-001/whatever-987/PICT001
trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
I have no idea how it would work with {n}.
Only this charset is allowed: A-Z a-z 0-9 - / and nothing more. Please no \d for digits.
It's for a !preg_match() in PHP.
EDIT: A leading slash is required. Double (or more) slashes are not allowed. A trailing slash is optional.
It appears the URL should only be valid if it contains fewer than 5 slashes.
You may adjust your pattern as
^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$
See regex demo
Details
^ - start of string
(?!(?:[^\/]*\/){5}) - a negative lookahead that fails the match if the string contains 5 or more / chars
(?: - start of the non-capturing group:
(?:\/[A-Za-z0-9-]+){1,4}\/? - 1 to 4 occurrences of a / and 1+ ASCII alphanumeric or - chars and then an optional / char
| - or
\/ - a single / char in the string
) - end of the non-capturing group
$ - end of string.
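The behaviour described above can be checked against samples from the question. The question targets PHP's preg_match(); the sketch below uses Python's re module instead, on the assumption that both engines treat this particular pattern the same way (it uses only a lookahead, groups and character classes).

```python
import re

# Pattern from the answer above, unchanged.
PATH = re.compile(r"^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$")

# Samples copied from the question's "match: YES" list.
valid = [
    "/",
    "/trip-001",
    "/trip-001/",
    "/trip-001/summer-2019",
    "/trip-001/summer-2019/",
    "/trip-001/summer-2019/ibiza-001/",
    "/trip-001/summer-2019/ibiza-001/PICT-001",
]
# A selection from the question's "match: NO" list.
invalid = [
    "//",
    "trip-001",
    "//trip-001",
    "trip-001//",
    "/trip-001/summer-2019/ibiza-001/PICT-001/",
    "/trip-001/summer-2019/ibiza-001/whatever-987/PICT001",
]

for path in valid + invalid:
    print("YES" if PATH.match(path) else "NO ", path)
```

The last two invalid samples are the ones rejected by the lookahead, since each contains 5 slashes.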

Regex pattern not working in AEM Templates-allowedPaths property

(?=(/content/xxx/(.*)/(.*)/(.*)/(.*)/*))(?=(^(?:(?!sample1|sample2).)*).*)
This is my regex pattern to limit my visibility of templates under some path and avoid being created under specific folders.
Could anyone figure out any issue or suggest some other ways?
You may use
^/content/([^/]*)/([^/]*)/([^/]*)/(?![^/]*/(?:sample1|sample2))([^/]*)
See the regex demo
Details:
^ - start of string
/content/ - a literal substring
([^/]*)/ - 0+ chars other than /, and then a /
([^/]*)/([^/]*)/ - two more occurrences of the previous subpattern
(?![^/]*/(?:sample1|sample2)) - a negative lookahead that fails the match if there are any 0+ chars other than /, then / and either sample1 or sample2 immediately to the right of the current location
([^/]*) - 0+ chars other than /
Note that if you are not using submatches, the pattern can be shortened to
^/content/(?:[^/]*/){3}(?![^/]*/(?:sample1|sample2))[^/]*
See another demo
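As a rough check of the shortened pattern, the sketch below runs it against a few paths. AEM evaluates the property with Java's regex engine; Python's re is used here on the assumption that the two agree for this pattern. All paths are made up for illustration and are not taken from the question.

```python
import re

# Shortened pattern from the answer above, unchanged.
ALLOWED = re.compile(r"^/content/(?:[^/]*/){3}(?![^/]*/(?:sample1|sample2))[^/]*")

# Hypothetical paths mapped to whether the pattern is expected to match.
paths = {
    "/content/xxx/en/products/page/child": True,
    "/content/xxx/en/products/page/sample1": False,       # blocked folder
    "/content/xxx/en/products/page/sample2/child": False, # below a blocked folder
    "/content/xxx/en": False,                             # too shallow
}

for path, expected in paths.items():
    matched = ALLOWED.match(path) is not None
    print(path, "->", matched, "(expected", str(expected) + ")")
```

Since the lookahead only checks how the next folder name starts, a folder such as sample10 would be blocked as well.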