Regular expression endless loop

I have a server that two domain names point to. In my .htaccess file I want a simple rule that says something along the lines of:
If you come from test.domain.com, go to folder X; if you come from the-other-sub.domain.com, go to folder Y. This means I'm moving one of the subdomains, so I would like it to redirect to the right URL (in case people are following a deep link). For instance, if people go to http://test.domain.com/path/to/a/page, they should be redirected to /path/to/a/page in the new folder.
I'm struggling to do so, though. What I don't get is why this code:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ /test/$1 [L,R=301]
behaves like this: if I go to subdomain.domain.com/abc, the browser sends me to subdomain.domain.com/test/test/test/test/test/test/test/test/test/test/test/abc
and then complains that there is an endless loop. And please, no smart links to Apache's documentation for mod_rewrite... I've read it, and this is where it has taken me. I know that * means 'match 0 or more times', but I don't get why the destination string is copied over and over and over...? /test/ isn't a variable in the regular expression, is it? So why does it repeat?

If you redirect unconditionally to /test/..., every redirected URL is requested again, matches the same rule, and gets yet another /test/ added in front.
To fix use this rule:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$ [NC]
RewriteRule ^((?!test/).*)$ /test/$1 [L,R=301,NC]
(?!test/) is a negative lookahead, which means /test/ is added only if the path doesn't already start with test/.
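To see why the original rule keeps stacking /test/ while the lookahead version stops after one hop, here is a small Python sketch that replays the 301s the way a browser follows them. It assumes the rule sits in the document-root .htaccess, so the pattern is tested against the path without its leading slash (abc, then test/abc, ...), and it uses Python's re module, whose (?!...) behaves like the PCRE lookahead Apache uses for these patterns. The function names are made up for the illustration.

```python
import re

# Pattern from the question and pattern from the fixed rule.
LOOPING = re.compile(r"^(.*)$")
FIXED = re.compile(r"^((?!test/).*)$")


def redirect_target(path, pattern):
    """Return the path the browser is sent to, or None if the rule doesn't fire."""
    m = pattern.match(path)
    if m is None:
        return None
    return "test/" + m.group(1)  # RewriteRule ... /test/$1 [L,R=301]


def follow_redirects(path, pattern, max_hops=5):
    """Follow the chain of 301s like a browser would, giving up after max_hops."""
    chain = [path]
    for _ in range(max_hops):
        target = redirect_target(chain[-1], pattern)
        if target is None:
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    print(follow_redirects("abc", LOOPING)[-1])  # test/test/test/test/test/abc
    print(follow_redirects("abc", FIXED))        # ['abc', 'test/abc']
```

Nothing in /test/ is treated as a variable: each redirect produces a new request, the whole new path is captured again by (.*), and /test/ is simply prepended once more.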

anubhava's answer is a good solution.
In your current version (and in anubhava's), the [L,R=301] flags cause an external redirect, so the newly generated URL is requested again and goes through the same rules. That's why you must take care not to apply the redirect a second time.
There is, however, a simpler method, useful if you don't really need to send an HTTP 301 status code.
Starting from your original version, you can simply:
drop the initial "/" in the replacement expression
drop the "R=301" flag
So your version becomes:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^subdomain.domain.com$
RewriteRule ^(.*)$ test/$1 [L]
This way, the rewrite happens only once: on the second pass, Apache recognizes that the URL didn't change and stops looping.

Related

Regex: "Mod Rewrite" everything from a "wildcard" to an address, without changing the address?

I really didn't know how to word the title; I changed it several times before posting. Feel free to change it to something more appropriate.
I also can't believe I couldn't find an existing answer to this pretty basic thing I want to do. I searched both here and on Google but couldn't find anything that answered it.
So I have this default WordPress .htaccess code:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
What I would like to add is the ability for all paths beginning with /cv/ to show the page for /cv/, i.e. a wildcard after it, like /cv/*.
I tried several versions of this:
RewriteRule /cv/.+ /cv/ - [L]
None of them worked. Most of what I tried sent me to the "Couldn't find the page" page, and some just redirected back to /cv/. But I want whatever comes after /cv/ to stay there: if the address is, for example, /cv/hello, it should still be /cv/hello in the address bar, but the page shown should be /cv/.
I don't think it should be this difficult. What have I missed?
OK, I set up a test and got the following directives to work for domain.com/cv/hello,
redirecting to domain.com/cv while keeping the URL:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} \/cv\/(.+)$ [NC]
# make sure to exit here, if there already was a redirect (to prevent endless redirecting)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^.*$ /cv/%1 [NC,P,R=301,L]
The "magic" is to enable FollowSymLinks, to use the P flag, which tells Apache to proxy the request so that the URL stays the same, and to check whether there has already been a redirect to the current URL, to avoid endless redirecting.
I solved it temporarily (not a nice solution, but...) by adding a rewrite rule in my functions file, so that everything under cv/* points to a specific page, in this case the page with ID 8472.
/**
* Add Rewrite Rule
*/
function custom_rewrite_basic()
{
add_rewrite_rule('^cv/(.+)/?', 'index.php?page_id=8472', 'top');
remove_action('generate_after_header', 'generate_featured_page_header');
}
add_action('init', 'custom_rewrite_basic');
So this is just a WordPress solution. I don't know; maybe the other answers on this page would have worked if this weren't WordPress.
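For a quick check of which paths that rule catches, here is a Python sketch using the same regex. It assumes WordPress matches the pattern against the request path without the leading slash, anchored at the start; the page ID 8472 comes from the answer, and the resolve function is hypothetical.

```python
import re

# Same regex as add_rewrite_rule('^cv/(.+)/?', ...); Python's re and PHP's
# preg_match treat this particular pattern the same way.
CV_RULE = re.compile(r"^cv/(.+)/?")


def resolve(request_path):
    """Return the query the rule routes to, or None so other rules can handle it."""
    if CV_RULE.match(request_path):
        return "index.php?page_id=8472"
    return None


for path in ["cv/hello", "cv/hello/world/", "cv/", "cv"]:
    print(path, "->", resolve(path))
```

cv/ itself doesn't match because (.+) needs at least one character, so the real /cv/ page is still routed normally.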

How can I redirect all files in a folder to another directory except a few files using mod_rewrite?

I have a situation where an entire folder's contents are no longer needed and will be redirected to the home page, except for 6 or so files. The folder holds over 300 files, so individual redirects like these:
redirect 301 /folder/file.html http://www.domain.tld/
redirect 301 /folder/file2.html http://www.domain.tld/
redirect 301 /folder/file3.html http://www.domain.tld/
would take quite a long time. I have some time before this needs to be done, and would like to know if anyone knows a good way to achieve this using a little regex with mod_rewrite.
For the benefit of everyone who may use the eventual correct answer, let's say the files that we don't want to redirect are:
/folder/stay1.html
/folder/stay2.html
/folder/stay3.html
Thanks in advance to this wonderful community of very knowledgeable people helping those of us who still have a few things to learn!
Edit
Is it possible to achieve this and still keep the base URL of the folder accessible?
/folder/
/folder/index.html
I tried the following without success:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/folder(/|/index.html|/stay1.html|/stay2.html|/stay3.html|/etc.html)
RewriteRule ^/?folder/ http://www.domain.tld/ [L,R=301]
Edit Correct Answer
A big thanks goes out to Jon Lin for the answer.
The correct method to redirect all files of a /folder/ except a few, while still allowing access to /folder/ is:
RewriteEngine On
# Allow /folder/ to remain accessible
RewriteCond %{REQUEST_URI} !^/folder/$
# Allow specified files to remain accessible
RewriteCond %{REQUEST_URI} !^/folder/(index.html|stay1.html|stay2.html|stay3.html|etc.html)
# Redirect all non-specified files to home page
RewriteRule ^/?folder/(.+)$ http://www.domain.tld/ [L,R=301]
Using mod_rewrite, you can create exception conditions for the redirect. Try putting these rules in the htaccess file in your document root:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/folder/$
RewriteCond %{REQUEST_URI} !^/folder/(index.html|stay1.html|stay2.html|stay3.html|etc.html)
RewriteRule ^/?folder/(.+)$ http://www.domain.tld/ [L,R=301]
So anything that's in the list: (stay1.html|stay2.html|stay3.html|etc.html) will fail the condition and the redirect won't happen. Otherwise, anything starting with /folder/ will get redirected to http://www.domain.tld/.
Note that if you have mod_alias redirects intermixed, they may interfere with each other.
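How the two negated conditions and the rule combine can be sketched in Python. This assumes the rules are in the document-root .htaccess, so %{REQUEST_URI} keeps its leading slash while the RewriteRule pattern sees the path without it; www.domain.tld and the file names are the placeholders from the question, and redirect_for is a made-up helper.

```python
import re

# RewriteRule ^/?folder/(.+)$ http://www.domain.tld/ [L,R=301]
RULE = re.compile(r"^/?folder/(.+)$")
# RewriteCond %{REQUEST_URI} !^/folder/$
KEEP_INDEX = re.compile(r"^/folder/$")
# RewriteCond %{REQUEST_URI} !^/folder/(index.html|stay1.html|stay2.html|stay3.html|etc.html)
KEEP_FILES = re.compile(r"^/folder/(index.html|stay1.html|stay2.html|stay3.html|etc.html)")


def redirect_for(request_uri):
    """Return the 301 target for a request, or None if it is served normally."""
    # mod_rewrite tests the rule pattern first; in a document-root .htaccess
    # it sees the path without the leading slash.
    if not RULE.search(request_uri.lstrip("/")):
        return None
    # Then every RewriteCond must succeed (they are ANDed); a "!" condition
    # succeeds only when its pattern does NOT match %{REQUEST_URI}.
    if KEEP_INDEX.search(request_uri) or KEEP_FILES.search(request_uri):
        return None
    return "http://www.domain.tld/"


for uri in ["/folder/", "/folder/stay2.html", "/folder/old-page.html", "/other/page.html"]:
    print(uri, "->", redirect_for(uri))
```

Because the dots in the second condition are unescaped and the group has no closing $, a name like /folder/stay1.html.bak is also left alone. And since (.+) needs at least one character after folder/, the rule would not match /folder/ even without the first condition.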
You could use RedirectMatch with a negative lookahead, like:
RedirectMatch permanent ^/?folder/(?!(stay1\.html|stay2\.html|stay3\.html)) http://domain.tld
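A quick Python check of the lookahead, assuming RedirectMatch tests the URL path with its leading slash; the example paths are made up and is_redirected is a hypothetical helper.

```python
import re

# Pattern from the RedirectMatch line above.
PATTERN = re.compile(r"^/?folder/(?!(stay1\.html|stay2\.html|stay3\.html))")


def is_redirected(path):
    """True if the RedirectMatch pattern matches, i.e. the path would be redirected."""
    return PATTERN.search(path) is not None


for path in ["/folder/stay1.html", "/folder/old.html", "/folder/", "/folder/stay1.html.bak"]:
    print(path, "redirected" if is_redirected(path) else "kept")
```

Note that /folder/ itself is redirected: at the end of the string none of the excluded names can follow, so the negative lookahead succeeds. That matters for the Edit in the question, which wants /folder/ to stay reachable.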
An alternative mod-rewrite solution would be like this:
RewriteRule ^/?folder/(stay1\.html|stay2\.html|stay3\.html)$ - [L]
RewriteRule ^/?folder/.* http://domain.tld
The first rule catches all the exceptions, the L flag ensures no further processing takes place in this pass, and the - instructs the engine not to rewrite, ensuring no further passes are made. Anything not caught by the first rule is redirected by the second rule.
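The rule order can be replayed in a few lines of Python: rules are tried in file order, the first one that matches decides, and - means the URL is left as it is. This models only these two rules in a document-root .htaccess (paths without the leading slash); RULES and apply_rules are illustrative names.

```python
import re

# (pattern, substitution) pairs in file order; "-" means "leave the URL alone".
RULES = [
    (re.compile(r"^/?folder/(stay1\.html|stay2\.html|stay3\.html)$"), "-"),
    (re.compile(r"^/?folder/.*"), "http://domain.tld"),
]


def apply_rules(path):
    """Return the substitution of the first matching rule, or None if no rule matches."""
    for pattern, substitution in RULES:
        if pattern.search(path):
            # The exceptions rule carries [L]; the redirect rule is last anyway.
            return substitution
    return None


for path in ["folder/stay1.html", "folder/old.html", "other/page.html"]:
    print(path, "->", apply_rules(path))
```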

htaccess URL rewrite: What am I messing up in the code?

I have the following rewrite rule:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} Catalog\/(string|page)\/
RewriteRule ^Catalog\/(string|page)\/([^\/]+)\/[^\.]+\.html$ Catalog/Catalog.php?$1=$2
RewriteRule is on one line but may be showing on multiple lines here.
My question is mainly: what am I doing wrong? I am not getting any errors, so mod_rewrite is working. The address I am typing into the browser is www.domain.com/Catalog/string/RT/Round_Tomato.html, and what I was hoping to get is www.domain.com/Catalog/Catalog.php?string=RT
I am guessing my regex is messed up but have not been able to get it right.
I think the first mistake is not putting a backslash before the dot (.) in www.
^www\.([^/])/Catalog/([a-zA-Z0-9]+)/([a-zA-Z0-9_]+).html$ www.$1/Catalog/Catalog.php?string=$2
Here is a clear example:
RewriteRule ^http://www.remotesite.com/(.*)$ /mirror/of/remotesite/$1
http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
Check this out:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteRule ^Catalog/(.+)/(.+)\.html$ /Catalog/Catalog.php?$1=$2
You should restart your Apache server (if it's a local server, stop and start it), and you should then be able to access http://localhost/Catalog/string/RT.html
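A quick way to see where the rules in this thread go wrong is to run the patterns in Python against the values mod_rewrite actually compares them to. The sketch assumes the rules are in the document-root .htaccess, so RewriteRule sees Catalog/string/RT/Round_Tomato.html without the leading slash, and that %{HTTP_HOST} holds only the host name (www.domain.com here), since the Host header never carries the path.

```python
import re

HOST = "www.domain.com"                       # %{HTTP_HOST}: host name only
PATH = "Catalog/string/RT/Round_Tomato.html"  # per-directory path seen by RewriteRule

# The question's RewriteCond is tested against the host, so it cannot match.
cond = re.search(r"Catalog\/(string|page)\/", HOST)

# The question's RewriteRule pattern on its own captures what was wanted.
m = re.match(r"^Catalog\/(string|page)\/([^\/]+)\/[^\.]+\.html$", PATH)
question_target = "Catalog/Catalog.php?%s=%s" % (m.group(1), m.group(2))

# The last answer's pattern is greedy, so with this URL $1 swallows "string/RT".
g = re.match(r"^Catalog/(.+)/(.+)\.html$", PATH)
answer_target = "/Catalog/Catalog.php?%s=%s" % (g.group(1), g.group(2))

print(cond)             # None
print(question_target)  # Catalog/Catalog.php?string=RT
print(answer_target)    # /Catalog/Catalog.php?string/RT=Round_Tomato
```

This suggests the question's RewriteRule pattern is fine by itself and the RewriteCond on %{HTTP_HOST} is what keeps it from ever firing; the last answer's pattern only yields string=RT for a shorter URL such as Catalog/string/RT.html.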

RewriteCond in .htaccess with negated regex condition doesn't work?

I'm trying to prevent WordPress, in this case, from rewriting certain URLs. Specifically, I want to stop it from ever handling a request in the uploads directory and leave those to the server's 404 page instead. So I assumed it would be as simple as adding the rule:
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
This rule should evaluate to false for those requests and make the chain of conditions fail, thus stopping the rewrite. But no... Perhaps my expression needs to cover the full string?
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/.*$
Nope, that's not it either. So, after scratching my head, I do a sanity check: perhaps something is wrong with the actual pattern. So I make a simple test case.
RewriteCond %{REQUEST_URI} ^/xyz/$
In this case, the rewrite happens if and only if the requested URL is /xyz/ and shows the server's 404 page for any other page. This is exactly what I expected. So I'll just stick in a ! to negate that pattern.
RewriteCond %{REQUEST_URI} !^/xyz/$
Now I'm expecting to see the exact opposite of the above condition. The rewrite should not happen for /xyz/ but for every other possible URL. Instead, the rewrite happens for every URL, both /xyz/ and others.
So, either the use of negated regexes in RewriteConds is broken in Apache, or there's something fundamental I don't understand about it. Which one is it?
The server is Apache2.
The file in its entirety:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
WordPress's default file plus my rule.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/ [OR]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
So, after a lot of irritation, I figured out the problem, sort of. As it turned out, the rule in my original question actually did exactly what it was supposed to. So did a number of other ways of doing the same thing, such as
RewriteRule ^wp-content/uploads/.*$ - [L]
(Mark rule as last if pattern matches) or
RewriteRule ^wp-content/uploads/.*$ - [S=1]
(Skip the next rule if pattern matches) as well as the negated rule in the question, as mentioned. All of those rules worked just fine, and returned control to Apache without rewriting.
The problem happened after those rules were processed: I had deleted the default 404.shtml, 403.shtml etc. templates that my host provided. If you don't have any .htaccess rewrites, that works just fine; the server serves its own default 404 page and everything works. (At least that's what I thought; in fact it was the double error "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.")
When you do have an .htaccess, on the other hand, it is executed a second time for the 404 page. If the page exists, it is served; but with the templates gone, the request for 404.shtml was caught by the catch-all rule and rewritten to index.php. That is why every suggestion I got here or elsewhere failed: in the end, the 404 page itself was rewritten to index.php.
So the solution was simply to restore the error templates. In retrospect it was pretty stupid to delete them, but I have this "start from scratch" mentality and don't want anything seemingly unnecessary lying around. At least now I understand what was going on, which is what I wanted.
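The sequence described above can be replayed with a simplified Python model of the question's .htaccess. It assumes the host's ErrorDocument points at /404.shtml, represents the files on disk as a set, and ignores the -d check; the helper names are hypothetical.

```python
import re

UPLOADS = re.compile(r"^/wp-content/uploads/")


def wordpress_rules(request_uri, existing_files):
    """Simplified model of the question's .htaccess; returns the rewritten URI (or the original)."""
    if request_uri == "/index.php":             # RewriteRule ^index\.php$ - [L]
        return request_uri
    if request_uri in existing_files:           # RewriteCond %{REQUEST_FILENAME} !-f
        return request_uri
    if UPLOADS.search(request_uri):             # RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
        return request_uri
    return "/index.php"                         # RewriteRule . /index.php [L]


def serve(request_uri, existing_files, error_document="/404.shtml"):
    """Return what finally handles the request, following one ErrorDocument hop."""
    target = wordpress_rules(request_uri, existing_files)
    if target in existing_files:
        return target
    # Not found: Apache makes an internal request for the ErrorDocument,
    # and that request runs through the same .htaccess rules again.
    return wordpress_rules(error_document, existing_files)


with_templates = {"/index.php", "/404.shtml"}
without_templates = {"/index.php"}
print(serve("/wp-content/uploads/missing.jpg", with_templates))     # /404.shtml
print(serve("/wp-content/uploads/missing.jpg", without_templates))  # /index.php
```

With the templates in place the missing upload gets the real 404 page; without them the internal request for /404.shtml is itself caught by the catch-all rule and ends up at index.php.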
Finally a comment to Cecil: I never wanted to forbid access to anything, just stop the rewrite from taking place. Not that it matters much now, but I just wanted to clarify this.
If /wp-content/uploads/ is really the prefix of the requested URI path, your rule was supposed to work as expected.
But since it obviously doesn’t work, try matching not the full URI path but only the remaining path without the per-directory prefix; for an .htaccess file in the document root, that is the URI path without the leading /:
RewriteCond $0 !^wp-content/uploads/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .+ /index.php [L]
If that doesn’t work either, it would certainly help to get some insight into mod_rewrite’s rewriting process by using its logging feature. Set RewriteLogLevel to at least 4, make your request, and look at the entries in the log file specified with RewriteLog. There you can see how mod_rewrite handles your request, and with RewriteLogLevel 4 or higher you will also see the values of variables like %{REQUEST_URI}. (RewriteLog and RewriteLogLevel exist only up to Apache 2.2; on 2.4, use the per-module LogLevel instead, e.g. LogLevel alert rewrite:trace3, and read the error log.)
I have found many examples like this when taking a "WordPress First" approach. For example, adding:
ErrorDocument 404 /error-docs/404.html
to the .htaccess file takes care of the message ("Additionally, a 404 Not Found error...").
I came across this while trying to do the same thing on a Drupal site, but it might be the same for WP since everything goes through index.php. Negating index.php was the key. This sends everything to the new domain except old-domain.org/my_path_to_ignore:
RewriteCond %{REQUEST_URI} !^/my_path_to_ignore$
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{HTTP_HOST} ^old-domain\.org$ [NC]
RewriteRule ^(.*)$ http%{ENV:protossl}://new-domain.org/$1 [L,R=301]
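The conditions above can be traced in Python. This assumes, as in Drupal's stock .htaccess, that %{ENV:protossl} is s on HTTPS requests and empty otherwise, and that the rules sit in the document root so $1 has no leading slash. The index.php exclusion presumably matters because, once Drupal's front-controller rule has internally rewritten a request to index.php, the next pass sees %{REQUEST_URI} as /index.php. The helper name is hypothetical.

```python
import re


def new_domain_redirect(request_uri, host, https=False):
    """Return the 301 target, or None when one of the RewriteConds fails."""
    if re.search(r"^/my_path_to_ignore$", request_uri):        # !^/my_path_to_ignore$
        return None
    if re.search(r"index.php", request_uri):                    # !index.php (unanchored)
        return None
    if not re.search(r"^old-domain\.org$", host, re.IGNORECASE):  # ^old-domain\.org$ [NC]
        return None
    protossl = "s" if https else ""                             # %{ENV:protossl}, Drupal-style (assumed)
    path = request_uri.lstrip("/")                              # $1 from ^(.*)$ has no leading slash
    return "http%s://new-domain.org/%s" % (protossl, path)


print(new_domain_redirect("/about", "old-domain.org"))              # http://new-domain.org/about
print(new_domain_redirect("/my_path_to_ignore", "old-domain.org"))  # None
```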

Temporary redirect 302 with .htaccess and mod-rewrite matching expression

I'm trying to set up a bunch of redirects for my website, which has basically moved to a different folder on the server. I need to make http://www.site.com/index.php?page=anypage go to http://www.site.com/newfolder/index.php?page=anypage. The thing is, http://www.site.com/index.php and http://www.site.com/index.php?page=home should remain untouched. How can I accomplish this?
I was trying the following in the .htaccess file, but I'm afraid of making a mistake. I really don't know how to test this, either.
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^/index.php?page=(.*)$ http://www.site.com/newfolder/index.php?page=$1 [R=302,NC]
RewriteRule ^/index.php?page=home http://www.site.com/index.php?page=home [R=302,NC,L]
Now, since this is temporary, I should know how to reverse it! Next week, the links will have to redirect back to the root again. Also, what should I do to re-establish the normal redirection?
If I've followed your scenario correctly, you want something like this:
RewriteEngine On
RewriteCond %{QUERY_STRING} !=""
RewriteCond %{QUERY_STRING} !page=home
RewriteRule ^index.php /newfolder/index.php [R,L]
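To check which URLs those three lines redirect, here is a Python sketch. It assumes the .htaccess is in the document root (so the rule sees index.php without a slash) and relies on mod_rewrite passing the original query string along when the substitution contains no ?, which is why page=... survives the redirect. newfolder_redirect is a made-up name.

```python
import re


def newfolder_redirect(path, query_string):
    """Return the redirect target (path plus query string), or None if the request is left alone."""
    if not re.search(r"^index.php", path):     # RewriteRule ^index.php ...
        return None
    if query_string == "":                      # RewriteCond %{QUERY_STRING} !=""
        return None
    if re.search(r"page=home", query_string):  # RewriteCond %{QUERY_STRING} !page=home
        return None
    # The substitution has no "?", so mod_rewrite appends the original query string.
    return "/newfolder/index.php?" + query_string


print(newfolder_redirect("index.php", "page=about"))  # /newfolder/index.php?page=about
print(newfolder_redirect("index.php", "page=home"))   # None
```

Because !page=home is an unanchored substring test, a query like page=homepage is also left alone; !^page=home$ would make the exception exact. The redirected request for newfolder/index.php doesn't match ^index.php, so there is no loop.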
As far as testing goes, I prefer to try rules out on a local test server. If you have full control over the server (as is the case locally), there are some mod_rewrite directives that help you log what's going on, and that can be helpful in debugging. The module documentation has more information about this.
Edit: When you want to switch back, modify the RewriteRule above like so:
RewriteRule ^newfolder/index\.php /index.php [R,L]