Matching just the first and second block of a URL - regex

I'm trying to write a regex that matches just the second part of a URL and leaves the rest behind.
For example:
https://example.com/first-part/second-part/third-part/?prop=2
result = https://example.com/alt/second-part/
How can I do this?
I'm able to match the first two parts, but when I use "/" as the delimiter the match runs to the last "/" instead of stopping at the one before it.
I can go the simple way like this:
RewriteRule ^(.*)first-part\/(.+)\/(.*)\/(.*)$ https://example.com/alt/$2 [R=301,L]
The problem is that if the URL is like this:
https://example.com/first-part/second-part/
Expected result: https://example.com/alt/second-part/
It won't even match.
So I'm looking for a more generic alternative that matches multiple scenarios and ultimately gives the same result in the same format:
https://example.com/alt/second-part/
I only know exactly what the first-part is; I don't know how anything beyond the second-part will be formatted.

Taking into account @Eraklon's recommendation to avoid greedy matches, I've found a solution:
RewriteRule ^first-part\/([^\/]+(\/)?)(.*) https://example.com/alt/$1 [R=301,L]
Can be checked here:
https://htaccess.madewithlove.be?share=8973fe68-f137-59a5-b27b-0cbbe3d842bc
It matches the first-part exactly with ^first-part/ and then enters the group:
([^\/]+(\/)?)
This matches one or more characters that are not a slash, followed by an optional slash. That first slash either begins the next section of the URL or ends the URL.
I'm not sure this is the best approach, but the idea is that $1 captures a single pattern covering the second-part block of the URL both with and without its trailing slash.
I've not been able to remove the last bit of the URL (the query parameters, e.g. ?parameter=a).
So with this rule, a URL like:
https://example.com/first-part/second-part/third-part/?parameter=a
becomes
https://example.com/alt/second-part/?parameter=a
Fortunately, leaving the parameters in place does little harm, but I would have preferred a complete solution.
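To sanity-check the pattern outside Apache, here is a minimal Python sketch. It assumes Python's re module behaves like Apache's PCRE for this simple pattern, and it passes in the path without the leading slash and without the query string, which is what a RewriteRule pattern in .htaccess is matched against. The helper name redirect_target is made up for illustration. Since the pattern never sees the query string, no change to the regex can drop ?parameter=a; mod_rewrite carries the original query string over by default, unless the substitution ends in ? or (on Apache 2.4+) the QSD flag is added.

```python
import re

# Same pattern as the RewriteRule above (slashes need no escaping here).
PATTERN = re.compile(r"^first-part/([^/]+(/)?)(.*)")

def redirect_target(path):
    """Return the redirect URL for a path, or None if the rule would not fire.

    `path` is assumed to be the URL-path without leading slash or query string.
    """
    m = PATTERN.match(path)
    if m is None:
        return None
    # $1 in the RewriteRule corresponds to group(1) here.
    return "https://example.com/alt/" + m.group(1)

for p in ("first-part/second-part/third-part/",
          "first-part/second-part/",
          "first-part/second-part",
          "first-part/"):
    print(p, "->", redirect_target(p))
```

Note that a path without a trailing slash after second-part yields a target without one as well, so the "same format" goal only holds when the slash is present in the request.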

Related

Htaccess redirect matching from a list?

I believe there should be a simpler method to rewriting URLs than I currently have and wonder if anyone can help.
The site I am working on has multiple brands for example:
https://example.com/anything/brand/nike/something
https://example.com/anything/brand/puma/something
My current redirect would be
RedirectMatch 301 "(.*)/brand/nike(.*)$" "$1/manufacturer--nike"
To get me the following output, removing /brand/ and replacing nike with manufacturer--nike and finally removing anything that follows i.e. /something.
https://example.com/anything/manufacturer--nike
Now I could add a second rule for Puma and one for each of the other brands, but I imagine there is a way to match against a list of brands with a single rule. My Google skills have failed me in finding a solution.
Is there a way?
If the number of brands is limited then you could use alternation (e.g. nike|puma|brand) in the regex, to match nike or puma or brand.
Aside: To match nike/<something>, you should at least check for that slash, otherwise it will also match nikeeeee, nikey, etc. (Although that is not necessarily an issue.)
For example:
RedirectMatch 301 "(.*)/brand/(nike|puma|brand)/" "$1/manufacturer--$2"
The (.*)$ at the end of the regex - to simply remove the trailing part of the URL-path - is not required.
The $2 backreference contains the "brand-name" matched by the second capturing (alternation) subpattern.
"method to rewriting URLs"
Just to clarify, this is an external redirect, it's not "rewriting URLs".
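As a rough check of the alternation approach, the following Python sketch builds the same kind of pattern from a list. The BRANDS list and the redirect_path helper are hypothetical, and re.search is used on the assumption that RedirectMatch applies its regex unanchored to the URL-path.

```python
import re

# Hypothetical brand list; extend as needed.
BRANDS = ["nike", "puma"]

# Equivalent of "(.*)/brand/(nike|puma)/" built from the list.
PATTERN = re.compile(r"(.*)/brand/(" + "|".join(map(re.escape, BRANDS)) + r")/")

def redirect_path(url_path):
    """Return the redirect target path, or None if no brand matches."""
    m = PATTERN.search(url_path)
    if m is None:
        return None
    # "$1/manufacturer--$2" in the RedirectMatch target.
    return f"{m.group(1)}/manufacturer--{m.group(2)}"

print(redirect_path("/anything/brand/nike/something"))   # matches nike
print(redirect_path("/anything/brand/nikey/something"))  # no match: slash required
```

Because the trailing slash is required, a request for /anything/brand/nike with nothing after it would not match either; whether that matters depends on which URLs actually exist.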

Rewriting a URL to remove certain characters

I need to rewrite some blog URLs to remove certain characters. These are along the lines of "a556" (the a is always present; the number is always 3 digits and is random). This is preceded by either a single or double hyphen, which I also need to remove.
These need to redirect from:
[domain]/blog/[article_name]-a556
or
[domain]/blog/[article_name]--a556
To
[domain]/blog/[article_name_with_characters_removed]
I think the regex to detect the text to be removed is:
([-]{1,2}a[0-9])\w+
But I don't know how to put this into a Rewrite rule.
Can anyone help?
Please try this:
RewriteEngine On
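# The first group is lazy (.*?) so that both hyphens of "--a556" are consumed
# by -{1,2}; a greedy (.*) would keep one hyphen in $1.
RewriteRule (.*?)-{1,2}a\d{3}(.*) $1$2 [R]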
Are you looking for a function to process your old URLs into new ones? Something like this should do the trick, if you have an array of URLs:
var processedURLs = oldURLs.map(function(url) {
  return url.replace(/[-]{1,2}a[0-9]+/, '');
});
This rewrite rule could do the trick:
blog/(.+[a-zA-Z0-9])-+a[0-9]+ blog/$1
You can simplify [a-zA-Z0-9] by removing any character ranges that can't appear at the end of the article name slug (i.e. use [a-z0-9] or [a-z]).
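One detail worth checking with any of these patterns is the double-hyphen case. The Python sketch below (helper name rewrite is made up, and Python's re is assumed to behave like PCRE here) compares a greedy and a lazy leading group in front of -{1,2}a\d{3}: the greedy group swallows the first of the two hyphens, the lazy one does not.

```python
import re

GREEDY = re.compile(r"(.*)-{1,2}a\d{3}(.*)")
LAZY = re.compile(r"(.*?)-{1,2}a\d{3}(.*)")

def rewrite(pattern, path):
    """Apply the pattern like a RewriteRule with substitution $1$2."""
    m = pattern.search(path)
    if m is None:
        return path  # rule does not fire, path is left alone
    return m.group(1) + m.group(2)

for path in ("blog/my-article-a556", "blog/my-article--a556"):
    print(path, "| greedy:", rewrite(GREEDY, path), "| lazy:", rewrite(LAZY, path))
```

With the greedy group the second path comes out with a dangling hyphen at the end. Requiring an alphanumeric character right before the hyphens, as in blog/(.+[a-zA-Z0-9])-+a[0-9]+, avoids the same problem in a different way.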

mod_rewrite regex ignoring empty matches

I have a section of my site that I want to browse by 4 filter criteria passed in the URL:
http://site/browse/a/b/c/d
Each of the 4 parameters should be optional.
I have this mod_rewrite rule in place:
RewriteRule ^browse(/([^/]*)(/([^/]*)(/([^/]*)(/([^/]*))?)?)?)? /photo.php?a=$2&b=$4&c=$6&d=$8 [L]
It works fine if I have all 4 parameters, or omit later parameters, but if I try and skip the first parameters I get unexpected behavior:
http://site/browse/1/2/3/4 = /photo.php?a=1&b=2&c=3&d=4 [correct]
http://site/browse/1/2 = /photo.php?a=1&b=2 [correct]
http://site/browse//2/3/4 = /photo.php?a=2&b=3&c=4 [unexpected]
http://site/browse////4 = /photo.php?a=4 [unexpected]
Rather than passing an empty string as the first match, it ignores that match entirely and treats multiple sequential slashes as if they were one and puts the parameters in the wrong variable. If I put any non-empty placeholder in the empty variable it works, but I would rather not handle it like that:
http://site/browse/-/-/-/4 = /photo.php?a=-&b=-&c=-&d=4 [works,not pretty]
How can I fix my regex so that http://site/browse////4 gives /photo.php?a=&b=&c=&d=4 ?
edit: In another experiment I found that multiple slashes are always merged. For example, with http://site/photo/browse////4 the rule RewriteRule ^photo/(browse.*) captures "browse/4", not "browse////4" as would be expected.
I guess the question should be: how do I stop mod_rewrite from merging sequential slashes into one?
This seems to do the trick, at least for matching (I'm not 100% sure you're even allowed to have blank url segments in rewrite rules, but this regex does the right thing, anyway):
browse\/([^\/]*)\/?([^\/]*)\/?([^\/]*)\/?([^\/]*)
The key mistake you made was in the way you used ?. You made whole blocks optional, so it dropped them and you got your matches out with a different indexing. My regex only makes the / optional, causing a zero-length capturing group for all the other variables if there's nothing there.
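To see the difference in group indexing, here is a small Python sketch of the pattern with only the slashes optional. The helper browse_params is invented for illustration, and the sketch only exercises the regex itself: it assumes the string reaching the pattern still contains the repeated slashes, which, per the edit in the question, Apache may not guarantee.

```python
import re

# Only the separators are optional; each capture group always participates,
# so an empty segment produces an empty string instead of shifting the groups.
BROWSE = re.compile(r"browse/([^/]*)/?([^/]*)/?([^/]*)/?([^/]*)")

def browse_params(path):
    """Map the four captured segments to the parameters a, b, c, d."""
    m = BROWSE.match(path)
    if m is None:
        return None
    return dict(zip("abcd", m.groups()))

for p in ("browse/1/2/3/4", "browse/1/2", "browse//2/3/4", "browse////4"):
    print(p, "->", browse_params(p))
```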

Regex for string that contains a '='

I've tried to create a regular expression that validates a string and checks if it has a = character in it.
I also need it to be in brackets like this
(.*)
in order to retrieve the value later.
What I tried was
(.*=.*)
but it doesn't work.
How can I match a string that contains a = ?
Edit:
This is my regex from my htaccess file:
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*)/(.*=.*) index.php?area=$1&page=$2&content=$3&$4 [L]
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*) index.php?area=$1&page=$2&content=$3 [L]
Examples would be
/home/foo/bar and /home/foo/bar/page=2
That's pretty much what I want to achieve: adding GET parameters in an eye-candy way. Also, I need to detect whether the segment contains a = character, because there are various depths in the web site, such as /foo/page=1 and /foo/bar/page=1
Actually this works for me. This call:
preg_match('/.*=.*/','foo=bar');
returns 1.
However, if you just want to check whether the string contains =, then strpos is enough.
If, instead, it is in the context of a bigger regular expression, the problem may be elsewhere. Please show us the whole matching pattern and some sample inputs with the corresponding expected behaviour.
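For what it's worth, the two rules from the question can be tried against the sample paths with a short Python sketch. It assumes Python's re matches these patterns the same way PCRE does, that the leading slash is already stripped as in .htaccess, and that the rules are tried in order with the first match winning (the [L] flag); the rewrite helper is hypothetical.

```python
import re

AREA = r"^(home|page1|page2|page3|admin)"
RULES = [
    # (pattern, whether a fourth "key=value" group is appended)
    (re.compile(AREA + r"/(.*)/(.*)/(.*=.*)"), True),
    (re.compile(AREA + r"/(.*)/(.*)"), False),
]

def rewrite(path):
    """Return the rewritten target of the first matching rule, or None."""
    for pattern, has_extra in RULES:
        m = pattern.match(path)
        if m is None:
            continue
        target = f"index.php?area={m.group(1)}&page={m.group(2)}&content={m.group(3)}"
        if has_extra:
            target += "&" + m.group(4)
        return target
    return None

print(rewrite("home/foo/bar"))
print(rewrite("home/foo/bar/page=2"))
```

In this sketch (.*=.*) does capture page=2 for the second path, which suggests the = itself is not the problem. With that sample the resulting query string contains page twice, so the extra parameter's name is best chosen so it does not clash with page.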

.htaccess regex to find the last occurrence for a match

I am trying to get the last occurrence of a particular match in a URL. This is the line I currently have:
RewriteRule ^([a-zA-Z0-9\-.]+/?)$ index.php?p=$1 [QSA,L]
So for example, the URL /first-part/second-part/ needs to produce a GET variable of p='second-part', whereas /first-part/ needs to produce p='first-part'.
You can get what you want quite easily by removing the caret - you don't want the pattern to match from the start to the end, but just near the end:
([a-zA-Z[0-9\-.]+/?)$
Example: http://rubular.com/r/TQcppGpoq4
You may also want to remove the [ character, it looks like a mistake.
You can simplify the pattern to: ([\w\-.]+/?)$.
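A quick way to see what the end-anchored pattern captures is the Python sketch below. It assumes Python's re with the ASCII flag is close enough to PCRE for this pattern; last_segment and the second pattern, which moves /? outside the group, are illustrative additions rather than part of the answer above.

```python
import re

# Pattern from the answer: the trailing slash is inside the capture group.
LAST_SEGMENT = re.compile(r"([\w\-.]+/?)$", re.ASCII)
# Variation (an assumption about what is wanted): slash outside the group.
LAST_SEGMENT_NO_SLASH = re.compile(r"([\w\-.]+)/?$", re.ASCII)

def last_segment(path, pattern=LAST_SEGMENT):
    """Return the captured last path segment, or None if nothing matches."""
    m = pattern.search(path)
    return m.group(1) if m else None

for p in ("first-part/second-part/", "first-part/"):
    print(p, "->", last_segment(p), "|", last_segment(p, LAST_SEGMENT_NO_SLASH))
```

With the slash inside the group, p would come through as second-part/ rather than second-part, so the variation may be closer to the expected values in the question. Also note that \w additionally allows underscores, which the original [a-zA-Z0-9\-.] class did not.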