How to match a regex with a fixed URL + variable slug - regex

I am trying to write the following regexes for use in Google Analytics, and so far I have been unable to.
Case 1: match all URLs containing /cms/en/product/{variable slug}/, with exactly one slug after /product/. I mean something like the following:
/cms/en/product/firstslug/
Case 2: match all URLs containing /cms/en/product/{variable slug1}/{variable slug2}/, with exactly two slugs after /product/. I mean something like the following:
/cms/en/product/firstslug/secondslug/
Really appreciate anyone's help in advance.
I have already tried basics like the following, and they don't work:
/cms/en/product/.*/$
^/cms/en/product/.*/$
/cms/en/product/([^/]+)/?$
^/cms/en/product/([^/]+)/?$

^/cms/en/product/[^/]+/$ matches "/cms/en/product/firstslug/"
^/cms/en/product/[^/]+/[^/]+/$ matches "/cms/en/product/firstslug/secondslug/"
^/cms/en/product/[^/]+/([^/]+/)?$ matches both "/cms/en/product/firstslug/" and "/cms/en/product/firstslug/secondslug/"
where
[^/]+ matches a single slug, i.e. one or more character(s) (+) which are not "/" ([^/])
([^/]+/)? matches an optional slug, i.e. an optional (?) group (()) of one or more character(s) (+) which are not "/" ([^/]) followed by a single "/"
Anyway: I would suggest using Content Grouping on collection.
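
As a quick sanity check, the three patterns above can be exercised outside Analytics. This is a minimal sketch in Python, assuming its `re` engine behaves like the Analytics one for these simple constructs; the `classify` helper and the third sample path are made up for illustration.

```python
import re

# Patterns copied from the answer above; the constant names are illustrative.
ONE_SLUG = re.compile(r"^/cms/en/product/[^/]+/$")
TWO_SLUGS = re.compile(r"^/cms/en/product/[^/]+/[^/]+/$")
ONE_OR_TWO = re.compile(r"^/cms/en/product/[^/]+/([^/]+/)?$")


def classify(path: str) -> str:
    """Hypothetical helper: report which case a path falls into."""
    if ONE_SLUG.search(path):
        return "one slug"
    if TWO_SLUGS.search(path):
        return "two slugs"
    return "no match"


if __name__ == "__main__":
    for path in (
        "/cms/en/product/firstslug/",
        "/cms/en/product/firstslug/secondslug/",
        "/cms/en/product/a/b/c/",  # three slugs: should match neither case
    ):
        print(path, "->", classify(path))
```

Because `[^/]+` cannot cross a slash and the pattern is anchored at both ends, a path with three slugs falls through to "no match".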

Related

nginx regex - match variable number of fields

I have a route with urls that can have an optional extra field. It can be either of the form :
"/my-route/azezaezaeazeaze.123x456.jpg"
"/my-route/azezaezaeazeaze.123x456.6786786786.jpg"
with :
"azezaezaeazeaze" being a mongoId
123x456 two integers separated by "x"
6786786786 a unix timestamp
jpg an image extension (could be jpeg, png, gif...)
all those are separated by a "."
I would like to remove the optional part (the timestamp) from the request with the http rewrite module, so that the second URL effectively becomes like the first.
I made a small test on regex101 to get the groups, but :
- it doesn't seem to be the right syntax for nginx
- I do not see how it will allow me to remove the timestamp
How can I remove the timestamp from that url?
Starting from the right-hand end, you need to match a dot followed by anything
except a dot, so we have (\.[^.]*)$, then moving to the left, we want
to match a dot followed by only digits \.[0-9]*, which we don't want to
capture, and then to the left of that we want everything.
I ended up with something like this:
rewrite ^(.*)\.[0-9]*(\.[^.]*)$ $1$2 ;
Building on my first attempt and @meuh's answer, I ended up with the following:
rewrite ^(/.*\..*)(\..*)(\..*)$ $1$3 last;
Now it works, but I would welcome any comment regarding the style/efficiency of this rewrite.
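
To compare the two rewrites, here is a minimal sketch that applies both patterns with Python's `re.sub`. It only approximates nginx (which uses PCRE and `$1` rather than `\1`), and the function names are made up for illustration.

```python
import re

# First answer: drop a ".digits" field sitting right before the extension.
DIGITS_BEFORE_EXT = re.compile(r"^(.*)\.[0-9]*(\.[^.]*)$")
# Second answer: drop the middle of three dot-separated trailing fields.
THREE_FIELDS = re.compile(r"^(/.*\..*)(\..*)(\..*)$")


def strip_timestamp(path: str) -> str:
    """Equivalent of: rewrite ^(.*)\\.[0-9]*(\\.[^.]*)$ $1$2"""
    return DIGITS_BEFORE_EXT.sub(r"\1\2", path)


def strip_third_field(path: str) -> str:
    """Equivalent of: rewrite ^(/.*\\..*)(\\..*)(\\..*)$ $1$3"""
    return THREE_FIELDS.sub(r"\1\3", path)


if __name__ == "__main__":
    with_ts = "/my-route/azezaezaeazeaze.123x456.6786786786.jpg"
    without_ts = "/my-route/azezaezaeazeaze.123x456.jpg"
    for fn in (strip_timestamp, strip_third_field):
        print(fn.__name__, fn(with_ts), fn(without_ts))
```

Both leave the URL without a timestamp untouched. The first pattern is the stricter of the two because it insists on digits; writing `[0-9]+` instead of `[0-9]*` would additionally stop it from matching an empty field such as `name..jpg`.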

Google Analytics - Content grouping - Regex fix

This is our URL structure:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2
http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2
http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre
I am trying to create a list of groups based on the client names
/access-guide/[this bit]/...
So I can have a performance list of all our clients.
This is my regex:
/access-guide/(.*universit(y|ies)|.*colleg(e|es))/
I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.
At the moment, my current regex will only return groups that are X-University:
Durham-University
Plymouth-University
Cardiff-University
etc.
What does the regex need to be to have the list I'm looking for?
Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?
Thanks for your help in advance!
Depending upon your needs you may want to do:
/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/
This will match names even if "university" or "college" is not at the end of the string, for example "college-of-the-ozarks". Note the non-capturing inner parentheses; they should probably be used whichever solution you go with, because you want the capture group to hold the whole client name, not just the word "university" or "college".
Live Example
Additionally, I don't know what may be in your URLs, but if they may contain compound words that you want to exclude, using \b may be advisable. For instance, if you don't want to match "miskatonic-postcollege", you may want to do something like this:
/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
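
The effect of the word boundaries can be checked with a short script. This is a sketch in Python rather than in Analytics itself; `client_name` is a hypothetical helper, and the "miskatonic-postcollege" URL is invented to mirror the example above.

```python
import re
from typing import Optional

# The \b version from the answer above; group 1 holds the whole client name.
CLIENT = re.compile(
    r"/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/"
)


def client_name(url: str) -> Optional[str]:
    """Hypothetical helper: return the client-name segment, or None."""
    match = CLIENT.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    base = "http://www.disabledgo.com/access-guide/"
    for tail in (
        "the-university-of-manchester/176-waterloo-place-2",
        "kings-college-london/franklin-wilkins-building-2",
        "miskatonic-postcollege/some-building",  # invented; no word boundary
    ):
        print(tail, "->", client_name(base + tail))
```

The hyphens count as non-word characters, so `\b` matches on either side of "university" in "the-university-of-manchester" but not inside "postcollege".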
If the client name section of the URL is after the access-guide/ and before the next /:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
                                      |----------------------------|
you need to use a negated character class to only match university before the regex reaches that rightmost / boundary.
As per the Reference:
You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)
Thus, you can use
/access-guide/([^/]*(universit(y|ies)|colleges?))
               ^^^^^
See demo.
The regex matches
/access-guide/ - leftmost boundary, matches /access-guide/ literally
[^/]* - any character other than / (so we still remain in that customer section)
(universit(y|ies)|colleges?) - university, or universities, or college, or colleges literally. Add more if needed.

RegEx pattern to handle URL with dates

I moved to a new website and it mangled my URLs. Now blog posts are accessible from multiple URLs and I would like to redirect one pattern to the other.
I am trying to redirect the first case to the second case:
~/blogs/johndoe/john-doe/2014/03/14/test-article1 =>
~/blogs/john-doe/2014/03/14/test-article1
~/blogs/jimjones/jim-jones/2014/03/14/test-articleb =>
~/blogs/jim-jones/2014/03/14/test-articleb
How do I create a pattern smart enough to slice out the first "johndoe" and "jimjones"? I am using this for IIS rewrite but I think any RegEx should work. Thanks for any help.
This works:
^~/blogs/\w+/(\w+)-(\w+)/(\d{4})/(\d\d)/(\d\d)/([\w-]+)$
Debuggex Demo
It just discards the non-dash name. It doesn't know whether it's equal to the dash name or not. It also assumes that the date numbers are valid, so 9899/45/33 would be matched.
Capture groups:
First name
Last name
Year
Month
Day
Article name
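
To see how the six capture groups can be reassembled into the target URL, here is a minimal sketch using Python's `re.sub`. It is not IIS rewrite syntax (IIS uses `{R:n}` back-references), and `canonical_blog_url` is a made-up name.

```python
import re

# Pattern from the answer above; the unhyphenated name is matched but not captured.
BLOG_URL = re.compile(
    r"^~/blogs/\w+/(\w+)-(\w+)/(\d{4})/(\d\d)/(\d\d)/([\w-]+)$"
)


def canonical_blog_url(url: str) -> str:
    """Rebuild the URL from groups 1-6, dropping the first name segment."""
    return BLOG_URL.sub(r"~/blogs/\1-\2/\3/\4/\5/\6", url)


if __name__ == "__main__":
    for url in (
        "~/blogs/johndoe/john-doe/2014/03/14/test-article1",
        "~/blogs/jimjones/jim-jones/2014/03/14/test-articleb",
        "~/blogs/john-doe/2014/03/14/test-article1",  # already in target form
    ):
        print(url, "=>", canonical_blog_url(url))
```

A URL that is already in the target form does not match, because `\w+` cannot consume the hyphen in "john-doe", so it is returned unchanged.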
I don't know about IIS rewrites, but this should work:
/^~\/blogs\/[a-z]+\// -> ~/blogs/
The regular expression will match the start of the string, followed by ~/blogs/, followed by a run of lowercase letters and a slash.
I don't use IIS, but this should be at least close.
Pattern:
^blogs/\w+/([\w-]+/.*)
Action
blogs/{R:1}
Handy usage doc

Regex to match anything after /

I basically have no clue about regex, but I need a regex statement that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
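
The same example can be reproduced in code. This is a small sketch using Python's `re` module, chosen only for illustration; other regex engines expose groups in a similar way.

```python
import re

# Same input string and pattern as in the example above.
match = re.search(r"a*(b*)", "aaaaaaaabbbbcccc")

if __name__ == "__main__":
    print("group 0:", match.group(0))  # the complete match
    print("group 1:", match.group(1))  # only the part inside the parentheses
```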
To achieve what you want with the sweets pattern above, you just need to put a group around the end.
Possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last / you can take another approach:
Possible other solution: /([^/]*)$
The pattern above will find the last /, followed by a string of characters that are NOT another / and that runs to the end of the URL, and keep that string in group 1. The issue here is that you could match things that do not have sweets in the URL.
Note if you do not mind the / at the beginning then just remove the ( and ) and you do not have to worry about groups.
I like to use http://regexpal.com/ to test my regex. It will mark the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not hard-code the "localhost" part. I am also assuming that (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match an optional http:// part followed by a host (be it localhost, an IP or a host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
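
The match-then-replace idea can be sketched as follows. This is a minimal Python illustration, not WordPress or web-server configuration; `redirect_target` is a hypothetical helper and the target URL is the one from the question.

```python
import re
from typing import Optional

# Pattern from the answer above: optional scheme, any host, then the sweets path.
SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
TARGET = "http://localhost/sweettemptations/available-sweets"


def redirect_target(url: str) -> Optional[str]:
    """Return the fixed target if the URL is under /sweets/, else None."""
    return TARGET if SWEETS.fullmatch(url) else None


if __name__ == "__main__":
    for url in (
        "http://localhost/sweettemptations/sweets/somethingmore.html",
        "localhost/sweettemptations/sweets/sweet-name",
        "http://localhost/sweettemptations/available-sweets",
        "http://localhost/sweettemptations/sweets",  # no trailing slash
    ):
        print(url, "->", redirect_target(url))
```

Note that the pattern as written requires the slash after sweets, so the bare /sweets URL is not matched; ending the pattern with `sweets(/.*)?` instead would cover that case as well.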
Use this regular expression: (?<=://).+

Exclude regular expression match if it contains a string

I'm still learning regular expressions and I seem to be stuck.
I wanted to write a reg exp that matches URL paths like these that contain "bulk":
/bulk-category_one/product
/another-category/bulk-product
to only get the product pages, but not the category pages like:
/bulk-category_one/
/another-category/
So I came up with:
[/].*(bulk).*[/].+|[/].*[/].*(bulk).*
But there's pagination, so when I put the reg exp in Google Analytics, I'm finding stuff like:
/bulk-category/_/showAll/1/
All of them have
/_/
and I don't want any URL paths that contain
/_/
and I can't figure out how to exclude them.
I would go about it this way:
/[^/\s]*bulk[^/]*/[^/\s]+(?!/)|/[^/\s]+/[^/]*bulk[^/\s]*(?!/)
first part:
/ - match the slash
[^/\s]* - match everything that's not a slash and not a whitespace
bulk - match bulk literally
[^/]* - match everything that's not a slash
/ - match the slash
[^/\s]+ - match everything that's not a slash and not a whitespace
(?!/) - ensure there is not a slash afterwards (i.e. url has two parts)
The second part is more of the same, but this time 'bulk' is expected in the second part of the url not the first one.
If you need the word 'product' specifically in the second part of the url one more alternative would be required:
/[^/\s]*bulk[^/]*/[^/\s]*product[^/\s]*(?!/)|/[^/\s]+/[^/]*bulk[^/\s]*product[^/\s]*(?!/)|/[^/\s]+/[^/]*product[^/\s]*bulk[^/\s]*(?!/)
If I apply that simple regex to a file FILE
egrep ".*bulk.*product" FILE
which contains your examples above, it only matches the 2 lines with bulk and product. We can, additionally, exclude '/_/':
egrep ".*bulk.*product" FILE | egrep -v "/_/"
Two invocations are often much easier to define and to understand than one big one-size-fits-all expression.
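
The same two-step idea (first keep, then exclude) can be sketched in Python. The `product_pages` helper is made up for illustration, and the path `/bulk-category/_/product/1/` is an invented example added to show the exclusion step doing real work.

```python
import re
from typing import Iterable, List

# Step 1 keeps paths with "bulk" followed later by "product";
# step 2 drops anything containing the pagination marker "/_/".
HAS_BULK_PRODUCT = re.compile(r"bulk.*product")


def product_pages(paths: Iterable[str]) -> List[str]:
    """Hypothetical helper mirroring: egrep ... | egrep -v "/_/"."""
    return [p for p in paths if HAS_BULK_PRODUCT.search(p) and "/_/" not in p]


if __name__ == "__main__":
    sample = [
        "/bulk-category_one/product",
        "/another-category/bulk-product",
        "/bulk-category_one/",
        "/another-category/",
        "/bulk-category/_/showAll/1/",
        "/bulk-category/_/product/1/",  # invented path, removed by step 2
    ]
    print(product_pages(sample))
```

Like the egrep version, this relies on the literal word "product" appearing in the sample paths; real product URLs would need a different first step.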