.htaccess regex to find the last occurrence for a match - regex

I am trying to capture the last occurrence of a particular match in a URL. This is the rule I currently have: RewriteRule ^([a-zA-Z0-9\-.]+/?)$ index.php?p=$1 [QSA,L].
For example, the URL /first-part/second-part/ needs to produce the GET variable p='second-part', whereas /first-part/ needs to produce p='first-part'.

You can get what you want quite easily by removing the caret - you don't want the pattern to match from the start to the end, but just near the end:
([a-zA-Z[0-9\-.]+/?)$
Example: http://rubular.com/r/TQcppGpoq4
You may also want to remove the [ character, it looks like a mistake.
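You can simplify the pattern to ([\w\-.]+/?)$. Note that \w also matches the underscore, so this version is slightly more permissive than the original character class.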
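A minimal sketch of the effect of dropping the caret, written in Python rather than mod_rewrite so it can be run directly. The helper name last_segment is hypothetical, and Python's re engine is assumed to behave like Apache's PCRE for this simple pattern:

```python
import re

# Not anchored at the start, anchored at the end: only the final segment is captured.
LAST_SEGMENT = re.compile(r"([\w\-.]+/?)$")

def last_segment(path):
    """Return the last path segment (hypothetical helper), or None if nothing matches."""
    match = LAST_SEGMENT.search(path)
    return match.group(1) if match else None

print(last_segment("first-part/second-part/"))
print(last_segment("first-part/"))
```

Because /? sits inside the capturing group, the captured value keeps the trailing slash when one is present; move /? outside the parentheses if p should never contain it.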

Related

Matching just the first and second block of an URL

I'm trying to write a regex that matches just the second part of a URL and leaves the rest behind.
For example
https://example.com/first-part/second-part/third-part/?prop=2
result = https://example.com/alt/second-part/
How can I do this?
I'm able to match the first two parts, but when I match on "/" it picks up the last slash instead of the one before it.
I can go the simple way like this:
RewriteRule ^(.*)first-part\/(.+)\/(.*)\/(.*)$ https://example.com/alt/$2 [R=301,L]
The problem is that if the URL is like this:
https://example.com/first-part/second-part/
Expected result: https://example.com/alt/second-part/
it won't match at all.
So I'm looking for a more generic alternative that matches multiple scenarios and ultimately gives the same result in the same format:
https://example.com/alt/second-part/
I only know exactly what the first-part looks like, not how anything beyond the second-part will be formatted.
Taking into account the recommendations of @Eraklon to avoid greedy matches, I've found a solution:
RewriteRule ^first-part\/([^\/]+(\/)?)(.*) https://example.com/alt/$1 [R=301,L]
Can be checked here:
https://htaccess.madewithlove.be?share=8973fe68-f137-59a5-b27b-0cbbe3d842bc
It matches first-part exactly with ^first-part\/ and then enters the group:
([^\/]+(\/)?)
This matches one or more characters that are not a slash, followed by an optional slash. The first slash it finds either begins the next section of the URL or ends the URL.
I'm not sure this is the best approach, but the idea is that a single pattern for $1 covers the second-part block of the URL both with and without a trailing slash.
I haven't been able to remove the query string (the parameters, such as ?parameter=a) from the URL.
So with this rule, a URL like:
https://example.com/first-part/second-part/third-part/?parameter=a
becomes:
https://example.com/alt/second-part/?parameter=a
Fortunately, the leftover parameters do no real harm, but I would have preferred a complete solution.
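To check the pattern outside Apache, here is a small Python sketch. The function name redirect_target is hypothetical, and the input is assumed to be the path without its leading slash and without the query string, which is what a RewriteRule pattern in .htaccess is matched against:

```python
import re

# Same idea as the rule above: fixed first segment, then one non-slash
# segment with an optional trailing slash, then anything else.
SECOND_PART = re.compile(r"^first-part/([^/]+/?)(.*)")

def redirect_target(path):
    """Build the redirect URL from the captured second segment, or None if no match."""
    match = SECOND_PART.match(path)
    if match is None:
        return None
    return "https://example.com/alt/" + match.group(1)

print(redirect_target("first-part/second-part/third-part/"))
print(redirect_target("first-part/second-part/"))
```

Because [^/]+ cannot cross a slash, $1 stops at the end of the second segment however many segments follow it.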

Matching redirect on url end, ignoring the substring

I'm currently trying to redirect from an old website to the new one.
The domain and the subpath have changed, but the end of the path is always the same, so I am trying to create a regex that ignores the subpath and matches only the ending, whatever the combination might be.
Example:
http://shop.kmsport.dk/team-sport/bolde/fodbolde
https://kmsport.dk/collections/fodbolde
http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325
All three URLs contain the word "fodbolde", but I only want to match the first two, since they both end in "/fodbolde", while ignoring the subpath.
So far I've been able to match up the ends with this:
\/([a-zA-Z]*)*+$
How do I account for the different subpaths?
P.S. It's a massive sporting goods store, so it would be nice not to have to create a unique redirect for every possible combination -.-
If you are only interested in the last part, just go with
url.rsplit('/', 1)[-1]
Your current regex does not take /fodbolde into account. If that has to be at the end, you could use $ to assert the end of the string, as in /fodbolde$
One possibility is to match the start of the string with ^https?://, optionally match shop. with (?:shop\.)?, and then match kmsport\.dk/
Then use a repeating pattern that matches a run of non-slash characters followed by a forward slash, zero or more times, (?:[^/]+/)*, and finally match fodbolde at the end of the string with fodbolde$
^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$
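As a quick check, the pattern can be run against the three URLs from the question. This is only a sketch in Python; the helper name ends_in_fodbolde is made up for the example:

```python
import re

# Optional "shop." subdomain, any number of intermediate segments,
# and "fodbolde" as the final segment.
FODBOLDE_URL = re.compile(r"^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$")

def ends_in_fodbolde(url):
    """True when the URL's last path segment is exactly 'fodbolde' (hypothetical helper)."""
    return FODBOLDE_URL.match(url) is not None

urls = [
    "http://shop.kmsport.dk/team-sport/bolde/fodbolde",
    "https://kmsport.dk/collections/fodbolde",
    "http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325",
]
for url in urls:
    # rsplit gives the last segment without any regex at all
    print(url.rsplit("/", 1)[-1], ends_in_fodbolde(url))
```

The third URL is rejected because fodbolde appears in the middle of its path rather than as the last segment.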

Regex Matching the submatch not having some words

I want to write a RewriteRule in which one part of the URL must not end with any of a specific set of words.
URLs like:
/en/drivers/drivername/play
But I want the (drivername) section not to end with specific words, such as "excluded" or "banned".
In other words, I want the following URL to work:
/en/drivers/drivername/play
But the following should not work:
/en/drivers/drivername-excluded/play
/en/drivers/drivername-banned/play
But these should work:
/en/drivers/driver-excluded-name/play
/en/drivers/driver-banned-test/play
Is it even possible?
Without the exclusion part I was using:
^(en|de)/([^\/]+)/(play|test)?
Try something like this, using a negative lookahead:
(en|de)\/([^\/]+)\/driver.+-(?!(excluded|banned)\/).*?\/(play|test)?
I took your regular expression and inserted the bit dealing with "drivername"
driver.+-(?!(excluded|banned)\/).*?
In this case, (?!(excluded|banned)\/) ensures that the "driver" section between forward slashes does not end with "excluded" or "banned" directly before the following forward slash.
https://regex101.com/r/pC8sP3/3
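This appears to work with the hyphenated examples you provided. Note that driver.+- requires a hyphen somewhere after "driver", so the plain /en/drivers/drivername/play example, which contains no hyphen, does not match.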
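The quoted pattern can be checked against the question's examples with a short Python sketch. The second pattern is not from the answer: it is an alternative built on two fixed-width negative lookbehinds, offered as an assumption about what the asker wants (a fixed drivers segment and an anchored match). PCRE, which Apache uses, also supports fixed-width lookbehind, but this has only been reasoned through for Python's re here.

```python
import re

# Pattern quoted in the answer above, unchanged.
QUOTED = re.compile(r"(en|de)\/([^\/]+)\/driver.+-(?!(excluded|banned)\/).*?\/(play|test)?")

# Hypothetical alternative: take the whole segment, then assert that it
# does not end in "-excluded" or "-banned" just before the next slash.
ALTERNATIVE = re.compile(r"^(en|de)/drivers/([^/]+)(?<!-excluded)(?<!-banned)/(play|test)$")

def allowed(path, pattern=ALTERNATIVE):
    """True when the path matches; the leading slash is dropped as in .htaccess."""
    return pattern.search(path.lstrip("/")) is not None

for path in [
    "/en/drivers/drivername/play",
    "/en/drivers/drivername-excluded/play",
    "/en/drivers/drivername-banned/play",
    "/en/drivers/driver-excluded-name/play",
    "/en/drivers/driver-banned-test/play",
]:
    print(path, allowed(path, QUOTED), allowed(path))
```

The two patterns agree on the four hyphenated examples and differ only on the first path, whose driver name contains no hyphen.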

Rewriting a URL to remove certain characters

I need to rewrite some blog URLs to remove certain characters. These are along the lines of "a556" (the a is always present, and it is followed by three random digits). This is preceded by either a single or a double hyphen, which I also need to remove.
These need to redirect from:
[domain]/blog/[article_name]-a556
or
[domain]/blog/[article_name]--a556
To
[domain]/blog/[article_name_with_characters_removed]
I think the regex to detect the text to be removed is:
([-]{1,2}a[0-9])\w+
But I don't know how to put this into a Rewrite rule.
Can anyone help?
Please try this:
RewriteEngine On
RewriteRule (.*)-{1,2}a\d{3}(.*) $1$2 [R]
Are you looking for a function to process your old URLs into new ones? Something like this should do the trick, if you have an array of URLs:
var processedURLs = oldURLs.map(function(url) {
  return url.replace(/[-]{1,2}a[0-9]+/, '');
});
This rewrite rule should do the trick:
blog/(.+[a-zA-Z0-9])-+a[0-9]+ blog/$1
You can simplify [a-zA-Z0-9] by removing any character ranges that can't appear at the end of an article's name slug (e.g. [a-z0-9] or [a-z]).
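The substitution itself can be tried out in Python before it goes into a rewrite rule. This is a sketch only: strip_suffix is a hypothetical helper, and anchoring with $ assumes the suffix always comes at the very end of the path, as in the question's examples.

```python
import re

# One or two hyphens, the letter "a", exactly three digits, end of string.
SUFFIX = re.compile(r"-{1,2}a\d{3}$")

def strip_suffix(path):
    """Remove a trailing -aNNN or --aNNN marker; other paths are returned unchanged."""
    return SUFFIX.sub("", path)

print(strip_suffix("blog/my-article-a556"))
print(strip_suffix("blog/my-article--a556"))
print(strip_suffix("blog/plain-article"))
```

Requiring exactly three digits and the end of the string keeps hyphens inside the article name from being touched.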

Regex for string that contains a '='

I've tried to create a regular expression that validates a string and checks if it has a = character in it.
I also need it to be in brackets like this
(.*)
in order to retrieve the value later.
What I tried was
(.*=.*)
but it doesn't work.
How can I match a string that contains a = ?
Edit:
This is my regex from my htaccess file:
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*)/(.*=.*) index.php?area=$1&page=$2&content=$3&$4 [L]
RewriteRule ^(home|page1|page2|page3|admin)/(.*)/(.*) index.php?area=$1&page=$2&content=$3 [L]
Examples would be
/home/foo/bar and /home/foo/bar/page=2
That's pretty much what I want to achieve: adding GET parameters in an eye-candy way. I also need to detect whether the last part contains a = character, because the site has various depths, such as /foo/page=1 and /foo/bar/page=1
Actually this works for me. This call:
preg_match('/.*=.*/','foo=bar');
returns 1.
However, if you just want to check whether the string contains =, then strpos is enough.
If, instead, it is in the context of a bigger regular expression, the problem may be elsewhere. Please show us the whole matching pattern and some sample inputs with the corresponding expected behaviour.
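To see how the two rules from the question behave on the sample paths, here is a Python sketch of the same patterns. The route function is hypothetical and simply tries the more specific pattern first, mirroring the order of the rules in the .htaccess file; paths are given without the leading slash, as in a per-directory RewriteRule.

```python
import re

AREAS = "home|page1|page2|page3|admin"

# The two patterns from the question, in the same order.
WITH_PARAM = re.compile(rf"^({AREAS})/(.*)/(.*)/(.*=.*)")
WITHOUT_PARAM = re.compile(rf"^({AREAS})/(.*)/(.*)")

def route(path):
    """Return the captured groups of the first pattern that matches, or None."""
    for pattern in (WITH_PARAM, WITHOUT_PARAM):
        match = pattern.match(path)
        if match:
            return match.groups()
    return None

print(route("home/foo/bar/page=2"))
print(route("home/foo/bar"))
# The shorter pattern on its own swallows a slash into the second group:
print(WITHOUT_PARAM.match("home/foo/bar/page=2").groups())
```

The last line suggests why the order of the rules matters: if the shorter rule ran first, the greedy (.*) would capture foo/bar as the page. Replacing .* with [^/]+ would keep each group inside a single segment.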