RewriteCond REQUEST_URI not match whole path - regex

I am quite puzzled.
My goal is to detect whether a redirect is needed (that is, whether the path changed). This is a minimal example:
RewriteRule ^first$ second
RewriteCond %{REQUEST_URI} !^/$1$
RewriteRule ^(.*)$ /$1 [R=301,L]
I am requesting example.com/first, intending to get a 301 to second.
The problem is that the RewriteCond always evaluates to true and creates a loop.
The first pass is fine. But on the second request, which is now example.com/second, it evaluates to true again, even though %{REQUEST_URI} is /second and $1 is second. I know they are, because I checked by redirecting to a URL with both variables appended.
Any idea what I am missing?

Please remember two important facts here:
mod_rewrite rules run in a loop, which stops only when a pass completes without any rule being applied.
The value of %{REQUEST_URI} changes after a rewrite or redirect.
Looking at your rules, the second (redirect) rule is faulty: backreferences such as %1 or $1 are not expanded in the pattern part of a RewriteCond, only in the test string. The pattern ^/$1$ is therefore taken as a literal regex that can never match, so the negated condition always returns true.
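One commonly used workaround, sketched here on the assumption that the rules live in the .htaccess file of the document root: put both values into the test string, where they are expanded, and let a regex-internal backreference (\1) do the comparison. The :: separator is an arbitrary choice and is assumed not to occur in your paths.

```apache
RewriteEngine On

RewriteRule ^first$ second

# $1 and %{REQUEST_URI} are expanded in the test string (left side).
# \1 in the pattern is a regex backreference to the first group of this same pattern.
# The condition holds only when the requested path differs from "/" followed by $1.
RewriteCond $1::%{REQUEST_URI} !^(.*)::/\1$
RewriteRule ^(.*)$ /$1 [R=301,L]
```

With a request for /second, the test string becomes second::/second, the pattern matches, the negated condition fails, and no further redirect is issued.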

Related

Why does this rewrite rule result in an infinite loop?

I want to serve files matching a certain pattern from a subdirectory, but my rule results in an infinite redirect loop. In this example I want to serve Google site verification files from a new path:
RewriteRule ^(google.*html)$ /google_site_verification/$1 [L]
According to my error log this results in an internal redirect loop which keeps adding /google_site_verification to the path. I have also tried:
RewriteCond %{REQUEST_URI} ^/google.*html$
RewriteRule ^(.*)$ /google_site_verification/$1 [L]
This gives the same result. Since my regex explicitly anchors the beginning and end of the pattern, why does /google_site_verification/googleabcd1234.html match? The only thing I've tried that works is adding
RewriteCond %{REQUEST_FILENAME} !-f
into the chain, but I don't want to rely on the file not existing for things to work.
You can use:
RewriteRule ^(google[^/]*\.html)$ /google_site_verification/$1 [L]
Your problem is that both of these URLs match your original pattern, because .* also matches slashes:
/google.html
/google_site_verification/google.html
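If you would rather keep the broad pattern, another option is to stop processing for anything already under the target directory before the rewrite rule runs. A sketch, assuming the .htaccess file sits in the document root:

```apache
RewriteEngine On

# Requests already inside the target directory pass through untouched.
RewriteRule ^google_site_verification/ - [L]

# Everything else that looks like a verification file is served from the subdirectory.
RewriteRule ^(google.*html)$ /google_site_verification/$1 [L]
```

When the rules are run again for the rewritten path, the first rule matches, makes no substitution, and ends that pass.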

How to handle RewriteRules when the word has similar first characters?

I have been trying to figure this out for hours now, yet one of my redirects always fails. I have the path something.com/blog/article-name, and something.com/blogujeme acts as an article list. But I cannot get those two redirects to work, since they share the same first characters and regex fails me. So far what I came up with is:
RewriteRule ^(?!blog\?)(blogujeme)$ blog/category-view.php
RewriteRule ^(?!blogujeme\?)(blog) blog/page-view.php
The first rewrite rule actually works, but the second does not and redirects to something.com/blog instead of something.com/blogujeme
What am I doing wrong?
You can use regular expressions and a few Apache mod_rewrite directives to achieve what you want. You'll have to use a condition to check if the request is a file and if it is, then process the file rather than continue with the rules:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^blog/.*$ blog/page-view.php [L]
RewriteRule ^blogujeme$ blog/category-view.php [L]
Then you'll have:
/blog/article-name -> blog/page-view.php
/blogujeme -> blog/category-view.php
/blog/page-view.php -> blog/page-view.php
/blog/category-view.php -> blog/category-view.php
"But I cannot get those two redirects to work, since they share the same first characters and regex fails me"
You should not have a problem like this with regular expressions, since your two examples can be told apart. Your issue is actually about the syntax and semantics of mod_rewrite rules.
Here is what you need to write in your .htaccess (which has to be in the root folder in this case):
RewriteEngine on
RewriteBase /
# Skip existing files to prevent an infinite loop (Apache comments must be on their own line)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^blog/.+$ blog/page-view.php [L]
RewriteRule ^blogujeme$ blog/category-view.php [L]
Shooting from the hip - how about a word boundary anchor?
RewriteRule \bblogujeme$ blog/category-view.php
RewriteRule \bblog\b.* blog/page-view.php
The simplest answer to "what are you doing wrong" is that you are not using [L] to tell mod_rewrite to stop once a rule has matched.
Order similar regular expressions so that the most specific one goes first. If that rule matches and ends with [L], no further rules run in that pass.
If it doesn't match, the next one is tried. The rules can thus be read as "if, else if, etc."

Regular expression endless loop

I have a server that two domain names point to. In my .htaccess file I want a simple rule along these lines:
If you come from test.domain.com, go to folder X; if you come from the-other-sub.domain.com, go to folder Y. I am also moving one of the subdomains, so I would like it to redirect to the right URL in case people follow a deep link. For instance, people who go to http://test.domain.com/path/to/a/page should be redirected to /path/to/a/page in the new folder.
I'm struggling to do so, though. What I don't get is why this code:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ /test/$1 [L,R=301]
sends the browser, when I go to subdomain.domain.com/abc, to subdomain.domain.com/test/test/test/test/test/test/test/test/test/test/test/abc
and then complains that I have an endless loop. And please, no smart links to Apache's documentation for mod_rewrite.c... I've read it, and this is where it has taken me. I know that * means 'match 0 or more times', but I don't get why it copies the destination string over and over. /test/ isn't a variable in the regular expression, is it? So why does it repeat it?
If you redirect unconditionally to /test/..., the rule matches the redirected URL as well and keeps prepending /test/ to it.
To fix use this rule:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$ [NC]
RewriteRule ^((?!test/).*)$ /test/$1 [L,R=301,NC]
(?!test/) is a negative lookahead, which means /test/ is added only if the path does not already start with test/.
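If lookaheads feel hard to read, the same guard can be expressed as a separate condition. This is a sketch of an equivalent rule set, assuming the .htaccess file is in the document root of the subdomain:

```apache
Options +FollowSymLinks
RewriteEngine on
RewriteBase /

RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$ [NC]
# Skip requests whose path already starts with /test/.
RewriteCond %{REQUEST_URI} !^/test/
RewriteRule ^(.*)$ /test/$1 [L,R=301]
```

Both conditions must hold for the rule to apply, so a request that has already been redirected to /test/... is left alone.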
@anubhava's answer is a good solution.
In your current version (and in @anubhava's), the [L,R=301] flags cause an external redirect, so the newly generated URL is requested again and runs through the rules again. That is why you must take care not to apply the redirect a second time.
Nevertheless there is a simpler method, useful if you don't really need to generate a HTTP 301 response status code.
Then from your original version you can simply:
drop the initial "/" in the replacement expression
drop the "R=301" flag
So your version becomes:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ test/$1 [L]
This way, the rewrite is applied only once: on the second pass, Apache recognizes that the URL did not change and stops looping.

RewriteRule: ^ vs ^(.*)$ vs ^.*$ Is there a Difference?

What is the difference in using ^ vs ^(.*)$ vs ^.*$ as wildcards in a RewriteRule?
My goal is to redirect http://carnarianism.com/ (anything) to the landing (default) page of http://carnarian.com/. I have found the following solutions, which all seem to work, so I wonder which is better for performance?
RewriteRule ^ http://carnarian.com/ [R=301,L]
RewriteRule ^.*$ http://carnarian.com/ [R=301,L]
RewriteRule ^(.*)$ http://carnarian.com/ [R=301,L]
All of these seem to work okay. This is my very first post on StackOverflow, most of the time I can find an answer just searching for it.
To be clear: ABOVE the questioned RewriteRule in my .htaccess is a RewriteCond and WWW Handler as follows:
RewriteEngine On
RewriteBase /
# FROM www. --TO-- NO www. See no-www.org
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
RewriteCond %{HTTP_HOST} carnarianism\.com$ [NC]
########## The Above Questioned RewriteRule ??? ##########
RewriteRule ^ http://carnarian.com/ [R=301,L]
Note: I started this search with the following, but I did not want the following because the path was also passed, and I want it to go to the landing page only. Therefore, I know you need the parentheses to be able to use the $1 variable. I do not want the $1 variable.
RewriteRule ^(.*)$ http://carnarian.com/$1 [R=301,L]
^ makes none of the original URL accessible as backreferences. $0 is an empty string.
^.*$ makes the entire original URL accessible as the $0 backreference (so you can do e.g. http://example.com/oldurl.php?url=$0)
^(.*)$ makes the entire original URL accessible as both the $0 and $1 backreferences; it's usually used when you want to actually use the old URL in the replacement, since it's more explicit about that use.
All of them match the same thing, but produce different backreference groups.
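To make the difference concrete, here is a sketch of how each pattern might be used. example.com and oldurl.php are the placeholders from the example above, and only one of these rules would be active in a real file, so the other two are commented out.

```apache
# No backreference needed: always redirect to the landing page.
RewriteRule ^ http://carnarian.com/ [R=301,L]

# $0 holds the whole matched path, no capture group required.
# RewriteRule ^.*$ http://example.com/oldurl.php?url=$0 [R=301,L]

# $1 holds the captured path, here appended to the new host.
# RewriteRule ^(.*)$ http://carnarian.com/$1 [R=301,L]
```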
The one that is better performance wise is the one you have benchmarked yourself.
But since you are using a .htaccess file rather than having this configuration in the server directly (maybe via a VirtualHost?) which is parsed only once, it really doesn't matter. Parsing .htaccess files at every single request is much more time consuming than performing the regular expression by a factor of thousands.
If you care about performance you should never ever use .htaccess files and even disable their parsing with: AllowOverride None. Not disabling them, and having a request like: http://example.com/sites/css/theme/main.css Apache will still try to load all the following files:
.htaccess
sites/.htaccess
sites/css/.htaccess
sites/css/theme/.htaccess
It will generate system calls even if those files do not exist.
Trying therefore to improve your RewriteRule in an .htaccess file is like sneezing in the ocean in the hope of making it less salty. :)
Now, assuming you have moved your setup to the server configuration, to answer your original question: ^.*$ might be more efficient than ^(.*)$, as fewer references need to be created. Chances are high, however, that you can't measure the difference.
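For illustration, moving the rule into the server configuration could look roughly like this. The document root path is hypothetical and the port is an assumption; adjust both to your setup.

```apache
<VirtualHost *:80>
    ServerName carnarianism.com

    # Hypothetical document root; adjust to your layout.
    DocumentRoot "/var/www/carnarianism"
    <Directory "/var/www/carnarianism">
        # Stop Apache from looking for .htaccess files on every request.
        AllowOverride None
    </Directory>

    # Parsed once at startup instead of on every request.
    RewriteEngine On
    RewriteRule ^ http://carnarian.com/ [R=301,L]
</VirtualHost>
```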

RewriteCond in .htaccess with negated regex condition doesn't work?

I'm trying to prevent, in this case WordPress, from rewriting certain URLs. In this case I'm trying to prevent it from ever handling a request in the uploads directory, and instead leave those to the server's 404 page. So I'm assuming it's as simple as adding the rule:
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
This rule should evaluate to false and make the chain of rules fail for those requests, thus stopping the rewrite. But no... Perhaps I need to cover the full string in my expression?
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/.*$
Nope, that's not it either. So after scratching my head I do a check of sanity. Perhaps something is wrong with the actual pattern. So I make a simple test case.
RewriteCond %{REQUEST_URI} ^/xyz/$
In this case, the rewrite happens if and only if the requested URL is /xyz/ and shows the server's 404 page for any other page. This is exactly what I expected. So I'll just stick in a ! to negate that pattern.
RewriteCond %{REQUEST_URI} !^/xyz/$
Now I'm expecting to see the exact opposite of the above condition. The rewrite should not happen for /xyz/ but for every other possible URL. Instead, the rewrite happens for every URL, both /xyz/ and others.
So, either the use of negated regexes in RewriteConds is broken in Apache, or there's something fundamental I don't understand about it. Which one is it?
The server is Apache2.
The file in its entirety:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
WordPress's default file plus my rule.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/ [OR]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
So, after a lot of irritation, I figured out the problem, sort of. As it turned out, the rule in my original question actually did exactly what it was supposed to. So did a number of other ways of doing the same thing, such as
RewriteRule ^wp-content/uploads/.*$ - [L]
(Mark rule as last if pattern matches) or
RewriteRule ^wp-content/uploads/.*$ - [S=1]
(Skip the next rule if pattern matches) as well as the negated rule in the question, as mentioned. All of those rules worked just fine, and returned control to Apache without rewriting.
The problem happened after those rules were processed: I had deleted the default 404.shtml, 403.shtml etc. templates that my host provided. If you don't have any .htaccess rewrites, that works just fine; the server will dish up its own default 404 page and everything works. (At least that's what I thought, but in actual fact it was the double error "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.")
When you do have a .htaccess, on the other hand, it is executed a second time for the 404 page. If the page is there, it will be used, but now, instead the request for 404.shtml was caught by the catch-all rule and rewritten to index.php. For this reason, all other suggestions I've gotten here, or elsewhere, have all failed because in the end the 404 page has been rewritten to index.php.
So, the solution was simply to restore the error templates. In retrospect it was pretty stupid to delete them, but I have this "start from scratch" mentality. Don't want anything seemingly unnecessary lying around. At least now I understand what was going on, which is what I wanted.
Finally a comment to Cecil: I never wanted to forbid access to anything, just stop the rewrite from taking place. Not that it matters much now, but I just wanted to clarify this.
If /wp-content/uploads/ is really the prefix of the requested URI path, your rule was supposed to work as expected.
But as it obviously doesn’t work, try matching not the full URI path but the path with the per-directory prefix stripped, which is what the RewriteRule pattern sees. For an .htaccess file in the document root directory, that is the URI path without the leading /:
RewriteCond $0 !^wp-content/uploads/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .+ /index.php [L]
If that doesn’t work either, it would certainly help to get some insight into mod_rewrite’s rewriting process by using its logging feature. Set RewriteLogLevel to at least 4, make your request, and look at the entries in the log file specified with RewriteLog. There you can see how mod_rewrite handles your request, and at level 4 or higher you will also see the values of variables like %{REQUEST_URI}.
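Note that RewriteLog and RewriteLogLevel exist only in Apache 2.2 and earlier; Apache 2.4 replaced them with per-module LogLevel settings. RewriteLog has to go in the server or virtual host configuration, not in .htaccess. A sketch for both versions, with a placeholder log path:

```apache
# Apache 2.2 and earlier:
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 4

# Apache 2.4 and later (entries go to the regular error log):
# LogLevel alert rewrite:trace4
```

Remember to turn the logging back down afterwards, since high levels slow the server noticeably.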
I have found many examples like this when taking a "WordPress First" approach. For example, adding:
ErrorDocument 404 /error-docs/404.html
to the .htaccess file takes care of the message ("Additionally, a 404 Not Found error...").
Came across this trying to do the same thing in a Drupal site, but might be the same for WP since it all goes through index.php. Negating index.php was the key. This sends everything to the new domain except old-domain.org/my_path_to_ignore:
RewriteCond %{REQUEST_URI} !^/my_path_to_ignore$
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{HTTP_HOST} ^old-domain\.org$ [NC]
RewriteRule ^(.*)$ http%{ENV:protossl}://new-domain.org/$1 [L,R=301]
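The %{ENV:protossl} variable in that rule is not built into Apache. As far as I know it comes from Drupal's stock .htaccess, which sets it roughly as follows before any rule that uses it; treat this as an assumption if your file differs.

```apache
# Set "protossl" to "s" when the request came in over HTTPS, and to an empty
# value otherwise, so that http%{ENV:protossl} expands to https or http.
RewriteRule ^ - [E=protossl]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=protossl:s]
```

Without these lines the variable is empty and the redirect always targets http://.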