Expand existing regex to NOT match URL part - regex

A messy legacy part of a website I am trying to simplify uses many URL rewrites for routing. I now have a problem with a new feature: the rewrites it needs don't work because of older rewrites that are essential for the legacy website functionality.
For example, the "new feature" URL https://www.example.com/new-feature/something is matched by the legacy rewrite:
(.?.+?)(/[0-9]+)?/?$
... and a few others.
I tried expanding the legacy rewrite with a modified version of a negative lookahead like suggested here:
^(?!.*(new\-feature))(.?.+?)(/[0-9]+)?/?$
EDIT 2: Is this syntactically ok?
... but that broke my feature as well as the legacy part.
How can I expand the legacy rewrites without affecting their functionality?
Thanks!
EDIT:
The system is WordPress, the rewrites are done in the old "Rewrite" plugin which is based on the WordPress WP_Rewrite class.
I'd be happy to not change this, because there are 80+ rules. Sadly, it doesn't look like this plugin respects order and [L] flags – or I don't know how to do it.

Ok, this hurts a bit, but I want to end this with style ...
Hyphens and slashes in URLs like "/new-feature/something" don't need to be escaped in the mod_rewrite environment. So, if you test regexes with online tools (which want them escaped), remember to remove the escape characters for e.g. "-" and "/" afterwards.
So, my "solution" was to remove the escape characters from my regexes.
Thanks everyone for having a look, sorry for wasting your time :-/
Cheers
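For anyone who wants to check the lookahead idea outside WordPress, here is a minimal sketch in Python. It assumes that Python's re module behaves like the regex engine behind the rewrite rules for patterns this simple, and that the rule is matched against the request path without a leading slash. The two sample paths are made up for illustration.

```python
import re

# Legacy catch-all rule from the question.
LEGACY = re.compile(r"(.?.+?)(/[0-9]+)?/?$")

# Same rule, guarded by a negative lookahead with an unescaped hyphen.
GUARDED = re.compile(r"^(?!.*new-feature)(.?.+?)(/[0-9]+)?/?$")


def legacy_matches(path):
    """True if the unguarded legacy rule would claim this path."""
    return LEGACY.match(path) is not None


def guarded_matches(path):
    """True if the guarded rule would claim this path."""
    return GUARDED.match(path) is not None


# Hypothetical request paths, not taken from the real site.
new_path = "new-feature/something"
old_path = "legacy/page/2"

print(legacy_matches(new_path), guarded_matches(new_path), guarded_matches(old_path))
```

The guarded rule still captures the same groups for legacy paths, so the rewrite target should not need to change.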

Related

Regex to change old url to new with wordpress redirection

I want to redirect for example
www.mydomain.com/my-profile.html?userId=18681
to
www.mydomain.com/members
What should I put in my Source URL?
I have more than 2000 404 errors in Webmaster Tools because I changed from one CMS to another, so I want to fix this with a redirection regex instead of entering the errors one by one, because I have
/my-profile.html?userId=18681
/my-profile.html?userId=12451
/my-profile.html?userId=9251
How can I make it general so it automatically redirects all of them to www.mydomain.com/members
I use this plugin http://wordpress.org/plugins/redirection/
I'm not sure how you're going about implementing the redirect. But from a purely regex standpoint, if I wanted to convert the top URL format to the one you put below it, here is the find-and-replace format I would use:
s/(my-.+\d+)$/members/
So find 'my-', then one or more of any character, then ENDING with one or more digits. Replace that (starting with my- and ending with the digits) with 'members'.
Sorry if this does not solve your issue, and keep in mind this is 'Perl compatible' format for regex; find-and-replace is likely formatted differently in the language or tool you are implementing this with.
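To try the pattern before pasting anything into a plugin, here is a small Python sketch. It only exercises the regex itself; how the Redirection plugin wants the source and target entered is not shown here. The stricter pattern and the function names are my own assumptions, added because the loose pattern also matches other URLs that happen to contain "my-" and end in a digit.

```python
import re

# Pattern from the answer: "my-", then anything, ending in digits.
LOOSE = re.compile(r"my-.+\d+$")

# Stricter variant (an assumption, not from the answer): only the
# profile page with a numeric userId.
STRICT = re.compile(r"/my-profile\.html\?userId=\d+$")


def redirect_loose(url):
    return LOOSE.sub("members", url)


def redirect_strict(url):
    return STRICT.sub("/members", url)


old_urls = [
    "www.mydomain.com/my-profile.html?userId=18681",
    "www.mydomain.com/my-profile.html?userId=12451",
    "www.mydomain.com/my-profile.html?userId=9251",
]

for url in old_urls:
    print(url, "->", redirect_strict(url))
```

With the strict variant, an unrelated address such as the hypothetical www.mydomain.com/my-page-2 is left alone, while the loose one would send it to /members as well.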

More efficient RewriteRule for messy URL

I have developed a new web site to replace an existing one for a client. Their previous site had some pretty nasty looking URLs to their products. For example, an old URL:
http://mydomain.com/p/-3-0-Some-Ugly-Product-Info-With-1-3pt-/a-arbitrary-folder/-18pt/-1-8pt-/ABC1234
I want to catch all requests to the new site that use these old URLs. The information I need out of the old URL is the ABC1234 which is the product ID. To clarify, the old URL begins with /p/ followed by four levels of folders, then the product ID.
So for example, the above URL needs to get rewritten to this:
http://mydomain.com/shop/?sku=ABC1234
I'm using Apache 2.2 on Linux. Can anyone point me on the correct pattern to match? I know this is wrong, but here is where I am currently at:
RewriteRule ^p/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)?$ shop/?sku=$5 [R=301,NC,L]
I'm pretty sure that the pattern used to match each of the 4 folders is redundant, but I'm just not that sharp with regex. I've tried some online regex evaluators with no success.
Thank you.
--EDIT #1--
Actually, my RewriteRule above does work, but is there a way to shorten it up?
--EDIT #2--
Thanks to ddr, I've been able to get this expression down to this:
RewriteRule ^p/([\w-]+/){4}([\w-]+)$ shop/?_sku=$2 [R=301,NC,L]
--EDIT #3--
Mostly for the benefit of ddr, but I welcome anyone to assist who can. I'm dealing with over 10,000 URLs that need to be rewritten to work with a new site. The information I've provided so far still stands, but now that I am testing that all of the old URLs are being rewritten properly I am running into a few anomalies that don't work with the RewriteRule example provided by ddr.
The old URLs are consistent in that the product ID I need is at the very end of the URL as documented above. The first folder is always /p/. The problem I am running into at this point is that some of the URLs have a URL-encoded double quote (%22). Additionally, some of the URLs contain a /-/ as one of the four folders mentioned. So here are some examples of the variations in the old URLs:
/p/-letters-numbers-hyphens-88/another-folder/-and-another-/another-18/ABC1234
/p/-letters-numbers-hyphens-88/2%22/-/-/ABCD1234
/p/letters-numbers-hyphens-1234/34-88/-22/-/ABCD1234/
Though the old URLs are nasty, I think it is safe to say that the following are always true:
Each begins with /p/
Each ends with the product ID that I need to isolate.
There are always four levels of folders between /p/ and the product ID.
Some folders in between have hyphens, some don't.
Some folders in between are a hyphen ONLY.
Some folders in between contain a % character where they are URL encoded.
Some requests include a / at the very end and some do not.
The following rule was provided by ddr and worked great until I ran into the URLs that contain a % percent sign or a folder with only a hyphen:
RewriteRule ^p/(?:[\w-]+/){4}([\w-]+)$ shop/?_sku=$1 [R=301,NC,L]
Given the rule above, how do I edit it to allow for a folder that is hyphen only (/-/) or for a percent sign?
You can use character classes to reduce some of the length. The parentheses (capture groups) are also unnecessary, except the last one, as @jpmc26 says.
I'm not especially familiar with Apache rules, but try this:
RewriteRule ^p/(?:[\w-]+/){4}([\w-]+)$ shop/?sku=$1 [R=301,NC,L]
It should work if extended regular expressions are supported.
\w is equivalent to [A-Za-z0-9_], and it does no harm here to match underscores as well, so that's one replacement.
The {4} matches exactly four repetitions of the previous group. Not every regex flavor supports this, but mod_rewrite uses PCRE, which does.
The ?: is optional but indicates that these parens should not be treated as a capture. Makes it slightly more efficient.
I'm not sure what the part in [] at the end is for but I left it. I can't see why you'd need a ? before the $, so I took it out.
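Edit: another compact way, if Apache likes it, would probably be the following. (A repeated group only keeps its last repetition, so something like (/[\w-]+){5} with $5 would not work; the product ID needs its own group.)
RewriteRule ^p(?:/[\w-]+){4}/([\w-]+)$ shop/?sku=$1 [R=301,NC,L]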
EDIT: response to edit 3 of the question.
I'm surprised it doesn't work with only -. The [\w-]+ should match even where there is just a single -. Are you sure there isn't something else going on in these URLs?
You might also try replacing - in the regex with \-.
As for the %, just change [\w-] to [\w%-]. Make sure you leave the - at the end! Otherwise the regex engine will try to interpret it as part of a character range.
EDIT 2: Or try this:
RewriteRule ^p/(?:.*?/){4}(.*?)/?$ shop/?sku=$1 [R=301,NC,L]
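Here is a quick way to check a candidate pattern against the sample URLs from the question, sketched in Python. Several things are assumptions: that Python's re is close enough to Apache's PCRE for this pattern, that the rule lives in .htaccess so the path arrives without the leading slash, and that [^/]+ (any run of non-slash characters) is acceptable for each folder. If I remember correctly, mod_rewrite matches against the %-decoded path, so 2%22 may actually arrive as 2" and a class listing % would not help; [^/]+ covers both forms.

```python
import re

# Assumed alternative pattern: four folders of non-slash characters,
# then the product ID, with an optional trailing slash.
PRODUCT = re.compile(r"^p/(?:[^/]+/){4}([^/]+)/?$", re.IGNORECASE)


def new_url(old_path):
    """Return the rewritten target, or None if the rule would not fire."""
    m = PRODUCT.match(old_path)
    return f"shop/?sku={m.group(1)}" if m else None


# Sample paths from the question, leading slash removed.
samples = [
    "p/-letters-numbers-hyphens-88/another-folder/-and-another-/another-18/ABC1234",
    "p/-letters-numbers-hyphens-88/2%22/-/-/ABCD1234",
    "p/letters-numbers-hyphens-1234/34-88/-22/-/ABCD1234/",
]

# A folder that is a single hyphen is matched by [\w-]+ on its own.
lone_hyphen_ok = re.fullmatch(r"[\w-]+", "-") is not None

for path in samples:
    print(path, "->", new_url(path))
```

Since [\w-]+ does accept a lone hyphen, the failures on /-/ URLs were more likely caused by the quote character in a neighbouring folder than by the hyphen itself.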

Help convert Apache rewrite rules to PHP regular expressions

Short story: I am using this technique to auto-version my css and js files by adding a string to the filename with filemtime():
http://w-shadow.com/blog/2012/07/30/automatic-versioning-of-css-js/
I got it up and running perfectly on my local machine (MAMP), but I use WP Engine for my hosting and they are set up on nginx and don't support .htaccess rewrite rules.
They do have a place to enter PHP regular expressions (preg_replace), though, and their instructions look like this:
HTML Post-Processing
A mapping of PHP regular expressions to replacement values which are executed on all blog HTML after WordPress finishes emitting the entire page. The pattern and replacement behavior is in the manner of preg_replace().
The following example removes all HTML comments in the first pattern, and causes a favicon (with any filename extension) to be loaded from another domain in the second pattern:
#<!--.*?-->#s =>
#\bsrc="/(favicon\..*)"# => src="http://mycdn.somewhere.com/$1"
So I'm wondering how hard it is to convert this rewrite rule to a PHP regular expression:
RewriteRule ^(.*)\.[\d]{10}\.(css|js)$ $1.$2 [L]
And whether this would even be doing the same thing as the Apache rewrite. The whole point of the technique is to bust the browser cache for CSS or JS files any time they are changed, but without resorting to query strings, which have various drawbacks.
Actually, it's pretty much the same. Take your regex, delimit it, drop it in a string and escape the right things, then take your rewrite rule and use single quotes to make it a string, and you're done. In your example:
$newUrl = preg_replace('/^(.*)\\.[\\d]{10}\\.(css|js)$/', '$1.$2', $url);
This will properly rewrite any URL you give it. However, it sounds like these preg_replaces are being run across a whole HTML document, which means your regex there won't do what you think it will: the ^ and $ anchors only match when the input is a single URL. That, however, is a completely separate question. One I won't even guess at, because I don't know what your requirements are. If you need help crafting the regex, please open another question with your specific requirements.
Also: next time, check the documentation.
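To illustrate the difference between rewriting one URL and processing a whole page, here is a minimal sketch in Python rather than PHP. It assumes Python's re.sub behaves like preg_replace for this pattern; the file name, timestamp and HTML snippet are invented for the example.

```python
import re

# Same pattern as the rewrite rule: name, 10-digit timestamp, extension.
VERSIONED = re.compile(r"^(.*)\.\d{10}\.(css|js)$")


def strip_version(text):
    """Drop the timestamp segment if the whole input is a versioned file name."""
    return VERSIONED.sub(r"\1.\2", text)


# Hypothetical inputs.
url = "/css/style.1343123456.css"
html = '<link rel="stylesheet" href="/css/style.1343123456.css">'

print(strip_version(url))
print(strip_version(html) == html)
```

On the single URL the timestamp is removed; on the HTML line nothing changes, because the anchors require the file name to span the entire input.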

Apache mod_proxy_html Substitute: how to re-use part of regex match? (regex variables?)

[Full disclosure: Cross-post between here and ServerFault, because I believe the audiences (server admins & devs) are distinct enough to warrant asking the question to both separately.]
Hi all,
Have a unique URL-rewriting situation in Apache.
I need to be able to take a URL that starts with
"\u002f[X]"
or
'\u002f[X]"
Where X is the rest of some URL, and substitute the text
"\u002fmeis2\u002f[X]
I'm not sure how the Regex works in Apache -- I think it's the same as Perl 5? -- but even then I'm a little unsure how this would be done. My hunch is that it has to do with Regex grouping and then using $1 to pull the variable out, but I'm entirely unfamiliar with this process in Apache.
Hoping someone can help -- thanks!
You are right. Group the text that you want to re-use with parens, and use $1 in the substitution. Use the following .htaccess file:
RewriteEngine On
RewriteRule ^\u002f(.*) /\u002fmeis2\u002f$1
(I am not certain that mod_rewrite handles unicode escapes, but it seems so from your question.)
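If the text to change sits in the proxied response body rather than in the request URL (the mention of Substitute in the title suggests this, but it is an assumption), the same group-and-reuse idea can be tried out in Python first. This is only a sketch of the regex, not of any Apache directive; the function name and the sample body are hypothetical.

```python
import re

# A single or double quote followed by the six literal characters \u002f.
PREFIX = re.compile(r"""(["'])\\u002f""")


def add_meis2(text):
    """Insert \\u002fmeis2 after the opening quote of each escaped path."""
    # A function replacement sidesteps backslash handling in replacement
    # templates; m.group(1) is the quote character that was captured.
    return PREFIX.sub(lambda m: m.group(1) + "\\u002fmeis2\\u002f", text)


# Hypothetical response body containing an escaped path.
body = '{"href": "\\u002fapi\\u002fitems"}'

print(add_meis2(body))
```

Only a \u002f that directly follows a quote is touched, so later separators inside the same path stay as they are.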

Can one use named backreference's in Apache mod_rewrite

All,
I've come across an interesting little quirk in one of my RewriteRules, which I wanted to resolve by the use of named back references. However from what I can see, this is not possible in Apache's mod_rewrite.
I have two incoming urls, each containing a key variable, which need to be rewritten to the same underlying framework action.
Incoming urls:
/users/list/page-2
/users/list/2
Desired rewrite endpoint
/?module=users&action=list&pagenum=2
I would have liked to do something like this
RewriteRule ^/(?P<module>([\w]+))/(?P<action>([\w]+))/(page-)?(?P<pagenum>([\d]+))$ /?module=${module}&action=${action}&pagenum=${pagenum} [L,QSA]
However Apache just doesn't want to play like that at all, and gives me null values in the places of the named backreferences. To get me round the problem I've used numerical references to the captured groups ($1, $2, $4)(but I'm almost halfway to the N=9 apache limit). So this isn't a show stopper for me.
I would just like to know, if named backreferences are available in Apache's mod_rewrite, and if they are, why does my RewriteRule's pattern not match?
Thanks,
Ian
This might be useful:
https://httpd.apache.org/docs/trunk/rewrite/rewritemap.html
If @superspace's latest answer doesn't work, what I would suggest is routing all requests that are not for existing files/directories to an index page. Then set up a routing class which takes in the page name and does the matching manually, so you can have your array of named-capture regexes and list the templates or pages you want to feed.
If you have to go this way, let me know and I can offer some code from my classes.
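As a rough idea of what such manual matching could look like, here is a sketch in Python (not the classes mentioned above, and not tied to any framework). It takes the named groups from the question's pattern, makes the "page-" prefix a non-capturing optional group, and builds the query string. The function name route is hypothetical.

```python
import re
from urllib.parse import urlencode

# Named groups as in the question; "page-" is optional and not captured.
ROUTE = re.compile(
    r"^/(?P<module>\w+)/(?P<action>\w+)/(?:page-)?(?P<pagenum>\d+)$"
)


def route(path):
    """Map an incoming path to the framework URL, or None if it does not match."""
    m = ROUTE.match(path)
    if m is None:
        return None
    params = [
        ("module", m.group("module")),
        ("action", m.group("action")),
        ("pagenum", m.group("pagenum")),
    ]
    return "/?" + urlencode(params)


for incoming in ("/users/list/page-2", "/users/list/2"):
    print(incoming, "->", route(incoming))
```

Both incoming forms end up at the same endpoint, and the groups are read by name rather than by position.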
No named backreferences, it seems, after looking into the mod_rewrite source.
I'd recommend using the RewriteMap option anyway instead of a long list of RewriteRules, as it will be much faster than iterating through a lengthy list.