Unknown number of regex replacements, how? - regex

I need to change a large number of URIs in the following way:
substitute %20 separators with dashes -,
substitute the old root with a new domain.
Examples:
/old_root/first/second.html -> http://new_domain.com/first/second
/old_root/first/second%20third.html -> http://new_domain.com/first/second-third
/old_root/first/second%20third%20fourth.html -> http://new_domain.com/first/second-third-fourth
The best I came up with using regex is to write as many pattern-replacement rules as the maximum number of %20 separators that can occur in my URIs:
old_root/(.*?)/(.*?)\.html -> http://new_domain.com/$1/$2
old_root/(.*?)/(.*?)%20(.*?)\.html -> http://new_domain.com/$1/$2-$3
old_root/(.*?)/(.*?)%20(.*?)%20(.*?)\.html -> http://new_domain.com/$1/$2-$3-$4
My question is: is it possible to obtain the same result using a single regular expression rule?
I thought I could use a two-step approach: first change all %20 separators to - and then use the rule old_root/(.*?)/(.*?)\.html -> http://new_domain.com/$1/$2/. However, I need to apply this rule in a .htaccess file as a RedirectMatch directive and, as far as I know, it is not possible to use two successive rules for the same redirect directive.

It turns out that Apache recursively applies all regex rules until they stop matching. Therefore, I am allowed to write two rules rather than one to solve my problem.
The following rules do what I was looking for, and more; I have tested them on my local Apache server and they work fine. Note that for them to work, you need to first turn on the rewrite engine by prepending
RewriteEngine on
Options +FollowSymlinks -MultiViews
in the local .htaccess file or in the global httpd.conf file.
Replace all spaces with hyphens
Replace both literal spaces and %20 with hyphens:
RewriteRule ^(.+)(\s|%20)(.+)$ /$1-$3 [R=301,NE,L]
Replace all apostrophes with hyphens
Replace all literal apostrophes and %60 with hyphens:
RewriteRule ^(.+)('|`|%60)(.+)$ /$1-$3 [R=301,NE,L]
Delete the trailing .html extension
RewriteRule (.+)\.html$ $1 [R=301,L]
Convert the last field in the URL to lower case
Convert the last field in the URL to lower case and prepend the new domain:
RewriteRule /whatever/(.*?)/(.*?)/(.*) http://new.domain.com/$1/$2/${lc:$3} [R=301,L]
Important: The lowercase conversion will only work if you include the following lines at the end of the Apache configuration file httpd.conf, which is usually located in the etc directory on the server:
RewriteEngine on
RewriteMap lc int:tolower
A last note: I recommend prepending each rule with a RewriteCond directive to restrict the field of application of the rule. For example, to apply the space-to-hyphens rule only to those URI that match a certain regex, you should write the following in your .htaccess file:
RewriteCond %{REQUEST_URI} regex_for_URIs
RewriteRule ^(.+)(\s|%20)(.+)$ /$1-$3 [R=301,NE,L]
where regex_for_URIs is the regular expression that the URI must match in order for the next RewriteRule to be applied; it can also be a simple string.

Well, you were almost done.
Problems
Don't return "%20" - We'll Use them as "delimiter" of parts of the path
Add condition on third & fourth group (because you might pass short URL i.e. your examples)
Solution
\/old_root\/(.*?)\/(\w*)(?:%20)?(\w*)?(?:%20)?(\w*)?\.html
See Demo
Explanation
(?:%20)? means "%20" is non catching group that can occurs 0 or 1 time.
Same logic applyies on possible 3rd & 4th part.

Related

How to match string if it doesn't contain only numbers after slash?

I am redirecting certain urls with path to get variables like the following:
localhost2/post/myTitle => localhost2/post.php?title=myTitle
localhost2/post/123 => localhost2/post.php?id=123
So In my htaccess file, I use
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^post/(\d+) post.php?id=$1
RewriteRule ^post/(.*) post.php?title=$1
</IfModule>
This works no problem. But I want to learn how to write negative of ^post/(\d+), that is ^post/(NEGATE-ONLY-NUMBERS). In other words I want a regex that matches the whole input sting if there is not only numbers after post/. So post/abc, post/a23, post/ab3, post/12c and post/a2c should all pass but not post/123. I refered to this post, which suggest using:
(?!^\d+$)^.+$
I can't use ^post/(?!^\d+$)^.+$, because there can be only one ^ and one $. I don't know what regex anchor specifies first position in a substring. My best guess is
post\/(?!\d++).*
I think (?!\d++), with the ++ would eat all characters followig and check if all are digits. But this fails at post/1ab.
Another guess is:
post\/(?![\d,\/]+$).*
The works the best but it allows: post/3455/X.
Secondly, eventually I need to convert localhost2/post/myTitle/123 => localhost2/post.php?title=myTitle&repeat=123 as well. I ave come up with the following:
^post/(?!\d+($|/))(.+?($|/))(\d+$)?
Note: +? to use lazy quantifier, otherwise multiple slashes will be matched by .
and
^post/(?!\d+($|/))([^/\n\r]+($|/))(\d+$)?
Here I use [^/\n\r] instead of .+?
Patterns inside zero-width assertions like (?!\d++) are non-consuming, they do not "eat" chars, they only check the context while keeping the regex index at the same location as before matching the zero-width assertion pattern.
You can use any of the following:
^post/(?!\d+(?:/|$)).*
^post/(?!\d+(?=/|$)).*
^post/(?!\d+(?![^/])).*
See the regex demo. Details:
^post/ - start of input, post/ literal string
(?!\d+(?=/|$)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are one or more digits followed with / or end of string
.* - the rest of the input.
Do not over complicate things when you can keep things simple by keeping 3 separate rewrite rules and since your query parameters are named differently you will need 3 separate rewrite rules anyway.
Consider:
Options -MultiViews
RewriteEngine On
RewriteRule ^post/(\d+) post.php?id=$1 [L,QSA,NC]
RewriteRule ^post/([^/]+)/(\d+) post.php?title=$1&repeat=$2 [L,QSA,NC]
RewriteRule ^post/([^/]*) post.php?title=$1 [L,QSA,NC]
Take note of Options -MultiViews. If this is not enabled in Apache config you must have it here otherwise it will keep all $_GET parameters empty in your php file.
Option MultiViews (see http://httpd.apache.org/docs/2.4/content-negotiation.html) is used by Apache's content negotiation module that runs before mod_rewrite and makes Apache server match extensions of files. So if /file is the URL then Apache will serve /file.html.

product/1 instead of product.php?id=1

This is not really a problem but a question.
I have this file that shows the product information when you go to URL/product.php?id=1 How can I make it show the same when going to URL/product/1. id is a variable that changes.
Sorry, I have no clue how htaccess works, and how to rewrite..
You need to internally rewrite the request from /product/1 to /product.php?id=1. On Apache, you need to do this with mod_rewrite. In .htaccess this would take the form of:
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request from "/product/1"
RewriteRule ^product/1$ /product.php?id=1 [L]
Note that this literally rewrites from /product/1 to /product.php?id=1 (as stated in the question) nothing else. And is internal to the server - the URL in the browser's address bar does not change.
The arguments to Apache directives are space separated:
^product/1$ - The first argument (pattern) to the RewriteRule directive is a regular expression (regex) that matches against the URL-path (only) of the request. Note that in .htaccess this URL-path does not start with a slash, so the URL-path that is matched is product/1 not /product/1, even though you are requesting example.com/product/1.
/product.php?id=1 - The second argument (substitution) is the string that is substituted for the requested URL. ie. the target URL. This is an "ordinary" string, not a regex.
[L] - The third argument (flags) are additional options that can change how the RewriteRule directive behaves. The argument must be surrounded in square brackets and contains a comma separated list of flags. The L (or last) flag signifies this is the last directive in this round of processing. If this is the last directive in the file then the L flag is not required. If you omit the L flag then processing continues and the request could be further rewritten (if you have more directives). Another common flag is the R (or redirect) flag. This changes the internal rewrite into an external redirect (which sends a Location HTTP response header back to the client and results in the browser being externally redirected to the new URL - the URL in the browser's address bar changes).
Additional Note: In this instance, since you are requesting "product" and a file with that basename exists (in fact, that is the file you are rewriting to) you also need to make sure that MultiViews is disabled (it is by default). If MultiViews is enabled (some shared hosts enable this for some reason) then mod_negotiation will trigger an internal subrequest for product.php before your mod_rewrite directive gets to rewrite the request and this will be missing the id URL parameter. (Numerous rewriting issues on SO are caused by conflicts with MultiViews.) To disable MultiViews, you can include this at the top of your .htaccess file:
Options -MultiViews
More generic
To make this more generic and rewrite /product/<number> to /product.php?id=<number>, where <number> is 1 or more digits, you can modify the regex (first argument) and create a backreference that you use in the substitution string (second argument). For example:
# Internally rewrite a request from "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
This would rewrite a URL of the form /product/123 to /product.php?id=123. Where 123 is 1 or more digits, denoted by the regex subpattern \d+. (\d is a shorthand character class and is the same as the marginally more verbose [0-9]. + is a quantifier that indicates 1 or more of the preceding pattern - in this case digits.) By surrounding this in parentheses, we create a capturing group, which we can refer to in the substitution. That's what the $1 backreference is. $1 is essentially a variable that contains whatever value the regex captured.
In summary:
# Disable MultiViews
Options -MultiViews
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request to "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
Reference:
https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html#rewriterule

RewriteRule to remove superfluous single "?" in URL

I am using IBM HTTP server configuration file to rewrite a URL redirected from CDN.
For some reason the URL comes with a superfluous single question mark even when there are no any query string. For example:
/index.html?
I'm in the process of making the 301 redirect for this. I want to remove the single "?" from the url but keep it if there is any query string.
Here's what I tried but it doesn't work:
RewriteRule ^/index.html? http://localhost/index.html [L,R=301]
update:
I tried this rule with correct regular expression but it never be triggered either.
RewriteRule ^/index.html\?$ http://localhost/index.html [L,R=301]
I tried to write another rule to rewrite "index.html" to "test.html" and I input "index.html?" in browser, it redirected me to "test.html?" but not "index.html".
You need to use a trick since RewriteRule implicitly matches against just the path component of the URL. The trick is looking at the unparsed original request line:
RewriteEngine ON
# literal ? followed by un-encoded space.
RewriteCond %{THE_REQUEST} "\? "
# Ironically the ? here means drop any query string.
RewriteRule ^/index.html /index.html? [R=301]
Question-mark is a Regular Expression special character, which means "the preceding character is optional". Your rule is actually matching index.htm or index.html.
Instead, try putting the question-mark in a "character class". This seems to be working for me:
RewriteRule ^/index.html[?]$ http://localhost/index.html [L,R=301]
($ to signify end-of-string, like ^ signifies start-of-string)
See http://publib.boulder.ibm.com/httpserv/manual60/mod/mod_rewrite.html (for your version of Apache, which is not the latest)
Note from our earlier attempts, escaping the question-mark doesn't seem to work.
Also, I'd push the CDN on why that question-mark is being sent. This doesn't seem a normal pattern.

Rewrite URL condition - Change particular string & remove trailing numbers

What's the best way to rewrite URLs as 301 redirects with the following conditions?
Sample old URLs to rewrite:
/c/garments-apparel/red-yellow-polka-dress-10_450
/c/shoes-and-accessories/black-suede-boots-02_901
Conditions:
Change c to category
Remove trailing number (including connecting dash) from URL (example: -10_450 and -02_901)
New URLs should be:
/category/garments-apparel/red-yellow-polka-dress
/category/shoes-and-accessories/black-suede-boots
Note that changes will be applied to an .htaccess file on a Wordpress environment.
You can have this rule just below RewriteEngine On line:
RewriteEngine On
RewriteRule ^c/([\w-]+/.+)-[\d_]+/?$ /category/$1 [L,NC,R=301]
you can use the regex
[-_]\d+
to replace the trailing numbers with "" (empty string) demo
then use the regex
\/c\/
and replace with /category/ demo

How to rewrite this URL to a redirect page?

I am using Microsoft-IIS/7.5 on a hosted server (Hostek.com)
I have an existing site with 2,820 indexed links in Google. You can see the results by searching Google with this: site:flyingpiston.com Most of the pages use a section, makerid, or bikeid to get the right information. Most of the links look like this:
flyingpiston.com/?BikeID=1068
flyingpiston.com/?MakerID=1441
flyingpiston.com/?Section=Maker&MakerID=1441
flyingpiston.com/?Section=Bike&BikeID=1234
On the new site, I am doing URL rewriting using .htaccess. The new URLs will look like this:
flyingpiston.com/bike/1068/
flyingpiston.com/maker/1123/
Basically, I just want to use my htaccess file to direct any request with a "?" question mark in it directly a coldfusion page called redirect.cfm. On this page, I will use ColdFusion to write a custom 301 redirect. Here's what ColdFusion's redirect looks like:
<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="http://www.newurl/bike/1233/">
<cfabort>
So, what does my htaccess file need to look like if I want to push everything with a question mark to a particular page? Here's what I have tried, but it's not working.
RewriteEngine on
RewriteRule ^? /redirect.cfm [NS,L]
Update. Using the advice from below, I am using this rule:
RewriteRule \? /redirect/redirect.cfm [NS,L]
To try to push this request
http://flyingpiston2012-com.securec37.ezhostingserver.com/?bikeid=1235
To this page:
http://flyingpiston2012-com.securec37.ezhostingserver.com/redirect/redirect.cfm
There's a couple of reasons what you're trying isn't working.
The first one is that RewriteRule uses a regex, and ? is a regex metacharacter, which therefore needs be escaped with a backslash (\?) to tell it to match the literal question mark character.
However, the second part of the problem is that the regex for RewriteRule is only tested against the filename part of the URL - it specifically excludes the query string.
In order to match against the query string you need to use the RewriteCond directive, placed on the line before the rule (but applied in between the RewriteRule matching and replacing), acting as an additional filter. The useful bit is that you can specify which part of the URL to match against (as well as having the option for using non-regex tests).
Bearing all this in mind, the simplest way to match/rewrite a request with a query string is:
RewriteCond %{QUERY_STRING} .
RewriteRule .* /redirect/redirect.cfm
The %{QUERY_STRING} is what the regex is tested against (everything in CF's CGI scope can be used here, and some other stuff too - see the Server Variables box in the docs).
The single . just says "make sure the matched item has any single character"
At the moment, this rule will preserve the existing query string - if you want to discard it, you can place a ? onto the end of the replacement URL. (If you need to use a query string on the URL and not discard the old version, use the [QSA] flag.)
In the opposite direction, you're losing the filename part of the URL - to preserve this, you probably want to append it onto the replacement as PATH_INFO, using the automatic whole-match capture $0.
These two things together provides:
RewriteCond %{QUERY_STRING} .
RewriteRule .* /redirect/redirect.cfm/$0?
One final thing is that you'll want to guard against infinite loops - the above rule strips the query string so it will always fail the RewriteCond, but better to be safe (especially if you might need to add a query string), which you can do with an extra RewriteCond:
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !/redirect/redirect\.cfm
RewriteRule .* /redirect/redirect.cfm/$0?
Multiple RewriteCond are combined as ANDs, and the ! negates the match.
You can of course add whatever flags are required to the RewriteRule to have it behave as desired.