Removing a string of characters in a URL through htaccess - regex

I'm struggling to come up with the correct code to do what I need. I've searched through SO and other sites and found answers close to what I want, but I just can't quite piece it all together right, and .htaccess is a huge weakness of mine.
I'm trying to make it so an entire folder level gets removed from all URLs on a site, otherwise preserving the structure. After that, I need to add ".html" to the end. The addition isn't anything hard, but I'm missing what I need to strip out the folder.
Starting URL: www.domain.com/ANYFOLDER/any-page-name
(Bonus: www.domain.com/ANYFOLDER/ANYDEPTH/any-page-name)
Ending URL: www.domain.com/any-page-name.html
We have a client who is moving from a static site to CMS-driven, has some great ranks/traffic for his URLs, and is petrified he will lose this (we will not take Permanent Redirects as a solution).

You can use this rule for this redirect:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(?:[^/]*/)*((?!.+?\.html$)[^/]*)$ /$1.html [L,R=302,NC]
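To see what the pattern captures before deploying it, the regex can be tried outside Apache. This is only a sketch: it assumes Python's re module treats these constructs the same way Apache's PCRE does, and redirect_target is a hypothetical helper name, not part of the rule. The two RewriteCond file checks are not modelled.

```python
import re

# Pattern copied from the RewriteRule above; the NC flag maps to re.IGNORECASE.
# In .htaccess context the path is matched without its leading slash.
PATTERN = re.compile(r"^(?:[^/]*/)*((?!.+?\.html$)[^/]*)$", re.IGNORECASE)


def redirect_target(path):
    """Return the redirect target for a path, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # $1 is the last path segment; every folder level before it is dropped.
    return "/" + match.group(1) + ".html"


if __name__ == "__main__":
    for sample in (
        "ANYFOLDER/any-page-name",
        "ANYFOLDER/ANYDEPTH/any-page-name",
        "any-page-name.html",
    ):
        print(sample, "->", redirect_target(sample))
```

The negative lookahead is what keeps a URL that already ends in .html from being redirected again, so the rule should not loop on its own output.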

.htaccess with 2 parameters messing up css, js, images paths

I was working with URLs on my webpage, but I can't solve an issue with URLs that have 2 parameters.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9-z\-]+)/?$ index.php?strona=$1 [L]
RewriteRule ^([a-zA-Z0-9-z\-]+)/([a-zA-Z0-9-z\-]+)/?$ index.php?strona=$1&id=$2 [L]
URLs seem fine, except that when my current URL has 2 parameters (for example, I'm on http://example.com/subpage/5), the whole webpage is broken (stylesheets, navigation etc.) because .htaccess changed all links to:
(for example navigation):
http://example.com/subpage_with_2_parameters/home
instead of
http://example.com/home
Pages with one parameter (example: http://example.com/contact) work fine.
The only solution I have in mind (which is horrible) is absolute links.
You're not the only one dealing with this problem of css, js, images paths getting messed up after implementing so-called pretty URLs. I am seeing these problems being reported on SO almost every day.
You can solve this problem in 3 ways:
Best solution is to use absolute paths for images, css and js files i.e. start your path with / or http://
Another option is to use base href tag in HTML head section like this:
<base href="http://www.example.com/">
3rd option is via mod_rewrite
Put these lines above your other RewriteRule lines in your .htaccess file:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-l
RewriteCond %{DOCUMENT_ROOT}/$1 -f
RewriteRule ^[^/]+/([^.]+\.(?:js|css|jpe?g|png|gif))$ /$1 [L,R=301,NC]
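A rough way to reason about the third option is to model the pattern and its file checks in a few lines. This sketch assumes Python's re module matches the pattern the same way Apache does; asset_redirect and the existing_files set are hypothetical stand-ins for the rule and the document root, and the !-l symlink check is left out.

```python
import re

# Pattern copied from the rule above; the NC flag maps to re.IGNORECASE.
ASSET_RULE = re.compile(r"^[^/]+/([^.]+\.(?:js|css|jpe?g|png|gif))$", re.IGNORECASE)


def asset_redirect(path, existing_files):
    """Return the root-relative redirect target, or None if the rule does not fire."""
    # RewriteCond %{REQUEST_FILENAME} !-f : requests that already hit a file are left alone.
    if path in existing_files:
        return None
    match = ASSET_RULE.match(path)
    if match is None:
        return None
    # RewriteCond %{DOCUMENT_ROOT}/$1 -f : redirect only if the stripped path is a real file.
    if match.group(1) not in existing_files:
        return None
    return "/" + match.group(1)
```

With existing_files = {"css/style.css"}, a broken request for subpage/css/style.css is sent to /css/style.css, while css/style.css itself is untouched. Because of the [^.]+ part, a file name with a second dot, such as style.min.css, is not matched by this pattern.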
It's not your rewrite rules which are breaking your stylesheets, but your actual HTML. The same thing would happen if you had an actual directory called foo and placed index.php in there.
If you write <a href="home">Home</a>, that link is relative to the current directory (in URL-space), so on a page with a URL like http://example.com/foo/bar/baz it links to http://example.com/foo/bar/home.
What you want instead is for the link to be relative to the root of your domain; for that, you need a leading slash: <a href="/home">Home</a>
The only reason this seemed to work before is that all your URLs were in the root directory, so "current directory" and "root of domain" were the same thing.
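The resolution behaviour described in this answer can be reproduced with Python's standard urljoin, which follows the same relative-reference rules a browser applies. The resolve wrapper is just an illustrative name.

```python
from urllib.parse import urljoin


def resolve(page_url, href):
    """Resolve an href relative to the URL of the page that contains it."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Relative link on a two-segment URL: resolved against /subpage/.
    print(resolve("http://example.com/subpage/5", "home"))
    # Same relative link on a one-segment URL: resolved against the root.
    print(resolve("http://example.com/contact", "home"))
    # Root-relative link: the page's own path no longer matters.
    print(resolve("http://example.com/subpage/5", "/home"))
```

The first call yields http://example.com/subpage/home, which is the broken navigation link from the question; the other two yield http://example.com/home.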

Redirecting specific parameters to subfolders

I've been searching around a bit, but unfortunately I'm still at a loss when it comes to this problem, and being far from a veteran with .htaccess, I've been unable to work out a solution to my problem.
The platform is Wordpress, but since I'm convinced that this issue can be resolved with .htaccess I don't think that that should make much of a difference.
I need to rewrite searches when they are made to a more friendly URL structure, unfortunately, just changing ?s=Test to /search/Test isn't going to cut it. I need to pull 3 of the parameters out of the search and use them as subfolders, and then append the remaining parameters to the end of the search. Here's an example:
Old url:
http://www.XXXXX.com/?s=Ford&z=59105&ci=Billings&st=MT&r=450&m=15000&pmin=1000&pmax=30000&status=Used&submit=Refine
New url:
http://www.XXXXX.com/search/Used/MT/Billings/?s=Ford&z=59105&r=450&m=15000&pmin=1000&pmax=30000&submit=Refine
As you can see, the parameters "status", "st" and "ci" respectively have been inserted into the url with all of the remaining parameters following behind.
So essentially, I need to redirect the old url to the new url, but have the new url display the page that corresponds to the old url.
I've got the following written so far: (EDIT: *Changed {QUERY_STRING} to [L,QSA] as suggested by Explosion Pills*)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?search/([^/]+)/([^/]+)/([^/]+)/?s=([^/]+)?$ /index.php?s=$4&ci=$3&st=$2&status=$1 [L,QSA]
And it kind of works; you can type in the new url and it will display the page, though it seems that the "Used" directory isn't posting data correctly. It's also only half of the puzzle, as it doesn't redirect the old URL to the new one. It simply allows the new URL to exist.
Thank you very much for your help! This one has had me stumped for several days now.
You may try this in one .htaccess file at root directory:
Options +FollowSymlinks -MultiViews
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} s=([^&]+)&z=([^&]+)&r=([^&]+)&m=([^&]+)&pmin=([^&]+)&pmax=([^&]+)&submit=([^&]+) [NC]
RewriteCond %{REQUEST_URI} !index\.php [NC]
RewriteRule ^search/([^/]+)/([^/]+)/([^/]+) /index.php?s=%1&z=%2&ci=$3&st=$2&r=%3&m=%4&pmin=%5&pmax=%6&status=$1&submit=%7 [L,NC]
Maps silently:
http://www.XXXXX.com/search/Used/MT/Billings/?s=Ford&z=59105&r=450&m=15000&pmin=1000&pmax=30000&submit=Refine
to
http://www.XXXXX.com/index.php?s=Ford&z=59105&ci=Billings&st=MT&r=450&m=15000&pmin=1000&pmax=30000&status=Used&submit=Refine
For permanent redirection, replace [L,NC] with [R=301,L,NC]
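The silent mapping can be traced by hand with a small model of the rule. This is a sketch under the assumption that Python's re module matches these two patterns the way Apache does; internal_target is a hypothetical name, and the !index\.php condition is omitted because the sample path never contains index.php.

```python
import re

# Both patterns are copied from the answer above; the NC flag maps to re.IGNORECASE.
QUERY_COND = re.compile(
    r"s=([^&]+)&z=([^&]+)&r=([^&]+)&m=([^&]+)"
    r"&pmin=([^&]+)&pmax=([^&]+)&submit=([^&]+)",
    re.IGNORECASE,
)
PATH_RULE = re.compile(r"^search/([^/]+)/([^/]+)/([^/]+)", re.IGNORECASE)


def internal_target(path, query):
    """Return the internal URL the rule would produce, or None if it does not apply."""
    path_match = PATH_RULE.match(path)       # $1..$3 in the RewriteRule
    query_match = QUERY_COND.search(query)   # %1..%7 from the RewriteCond
    if path_match is None or query_match is None:
        return None
    status, st, ci = path_match.groups()
    s, z, r, m, pmin, pmax, submit = query_match.groups()
    return (
        f"/index.php?s={s}&z={z}&ci={ci}&st={st}&r={r}&m={m}"
        f"&pmin={pmin}&pmax={pmax}&status={status}&submit={submit}"
    )
```

One thing the model makes visible: the condition expects the parameters in exactly this order. If the form submits z before s, the condition does not match and the request is not rewritten.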

How can I have one mod rewrite for a cms and another for static pages?

I currently have a site that has Drupal installed and it has clean urls so the .htaccess file contains the following:
RewriteRule ^ index.php [L]
In addition to this I want to be able to publish static html pages and have them use clean urls as well. I was thinking of differentiating them from the drupal pages by adding a specific keyword e.g. content and maybe having something like below (not sure if this will work) - where I get a url like www.domainname.com/nice-holiday and translate it to
domainname.com/ftp/pages/nice-holiday.html
RewriteRule ^content/(.+)$ domainname.com/ftp/pages/$1.html [L]
The problem is the first rule will try to execute against all requests. I have tried putting the more specific rule before the more general rule but it still doesn't work.
How can you have two mod rewrite rules based on a condition? e.g. presence of a particular word? and more generally has anyone had experience handling a CMS and static pages on the one website - or is that asking for trouble?
This is where RewriteCond comes in handy.
# make sure no rewriting is done for requests without www
RewriteCond %{HTTP_HOST} !^domainname\.com
RewriteCond %{REQUEST_URI} !^/?content/
RewriteRule ^ index.php [L]
# later on...
# don't want this rule to apply for non-www requests either
RewriteCond %{HTTP_HOST} !^domainname\.com
RewriteRule ^/?content/(.+)$ http://domainname.com/ftp/pages/$1.html [L]
I think this is what you're going for? You can eliminate the %{HTTP_HOST} conditions completely if you don't actually care about the www thing. The two rules can still coexist as long as you keep the %{REQUEST_URI} condition on the drupal rewrite, so drupal rewrites explicitly do not apply for URIs beginning with the /content/ prefix.
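How the two rules divide the traffic can be sketched as a small routing function. It assumes Python's re module behaves like Apache's regex engine for these patterns; route is a hypothetical name, and it only reports the substitution string each rule would produce.

```python
import re

HOST_IS_BARE = re.compile(r"^domainname\.com")   # negated in both RewriteConds
CONTENT_URI = re.compile(r"^/?content/")         # negated on the Drupal rule
CONTENT_RULE = re.compile(r"^/?content/(.+)$")   # pattern of the static-page rule


def route(host, uri):
    """Return the substitution chosen for a request, or None if neither rule applies.

    `uri` is the request path with its leading slash, as in %{REQUEST_URI}.
    """
    if HOST_IS_BARE.search(host):
        return None  # both rules are guarded by the negated host condition
    if not CONTENT_URI.search(uri):
        return "index.php"  # Drupal front controller
    # In .htaccess context the rule pattern sees the path without its leading slash.
    match = CONTENT_RULE.match(uri.lstrip("/"))
    if match:
        return "http://domainname.com/ftp/pages/" + match.group(1) + ".html"
    return None
```

As far as I know, a substitution that starts with http:// and names a different host than the one requested is handled by mod_rewrite as an external redirect, so the visitor would see the /ftp/pages/ URL. A path-only target such as /ftp/pages/$1.html would keep the rewrite internal.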

RewriteCond in .htaccess with negated regex condition doesn't work?

I'm trying to prevent, in this case WordPress, from rewriting certain URLs. In this case I'm trying to prevent it from ever handling a request in the uploads directory, and instead leave those to the server's 404 page. So I'm assuming it's as simple as adding the rule:
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
This rule should evaluate to false and make the chain of rules fail for those requests, thus stopping the rewrite. But no... Perhaps I need to match the full string in my expression?
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/.*$
Nope, that's not it either. So after scratching my head I do a sanity check. Perhaps something is wrong with the actual pattern. So I make a simple test case.
RewriteCond %{REQUEST_URI} ^/xyz/$
In this case, the rewrite happens if and only if the requested URL is /xyz/ and shows the server's 404 page for any other page. This is exactly what I expected. So I'll just stick in a ! to negate that pattern.
RewriteCond %{REQUEST_URI} !^/xyz/$
Now I'm expecting to see the exact opposite of the above condition. The rewrite should not happen for /xyz/ but for every other possible URL. Instead, the rewrite happens for every URL, both /xyz/ and others.
So, either the use of negated regexes in RewriteConds is broken in Apache, or there's something fundamental I don't understand about it. Which one is it?
The server is Apache2.
The file in its entirety:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
WordPress's default file plus my rule.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/ [OR]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
So, after a lot of irritation, I figured out the problem, sort of. As it turned out, the rule in my original question actually did exactly what it was supposed to. So did a number of other ways of doing the same thing, such as
RewriteRule ^wp-content/uploads/.*$ - [L]
(Mark rule as last if pattern matches) or
RewriteRule ^wp-content/uploads/.*$ - [S=1]
(Skip the next rule if pattern matches) as well as the negated rule in the question, as mentioned. All of those rules worked just fine, and returned control to Apache without rewriting.
The real problem arose after those rules were processed: I had deleted the default 404.shtml, 403.shtml etc. templates that my host provided. If you don't have any .htaccess rewrites, that works just fine; the server will dish up its own default 404 page and everything works. (At least that's what I thought, but in actual fact it was the double error "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.")
When you do have a .htaccess, on the other hand, it is executed a second time for the 404 page. If the page is there, it will be used, but since it was gone, the request for 404.shtml was instead caught by the catch-all rule and rewritten to index.php. For this reason, all the other suggestions I got here and elsewhere failed: in the end the 404 page was always rewritten to index.php.
So, the solution was simply to restore the error templates. In retrospect it was pretty stupid to delete them, but I have this "start from scratch" mentality. Don't want anything seemingly unnecessary lying around. At least now I understand what was going on, which is what I wanted.
Finally a comment to Cecil: I never wanted to forbid access to anything, just stop the rewrite from taking place. Not that it matters much now, but I just wanted to clarify this.
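The second pass through .htaccess described in this answer can be modelled in a few lines. This is a simplified sketch, not Apache's actual request cycle: the existing set is a hypothetical stand-in for the files and directories on disk, and /404.shtml is assumed to be the ErrorDocument path configured by the host.

```python
import re

UPLOADS = re.compile(r"^/wp-content/uploads/")


def rewrite(uri, existing):
    """Apply the WordPress rules plus the negated uploads condition to one URI."""
    if uri == "/index.php":
        return uri  # RewriteRule ^index\.php$ - [L]
    if uri in existing:
        return uri  # the !-f / !-d conditions fail, so nothing is rewritten
    if UPLOADS.search(uri):
        return uri  # the negated condition fails, so nothing is rewritten
    return "/index.php"  # catch-all rule


def serve(uri, existing, error_document="/404.shtml"):
    """Return the URI that ends up producing the response, or None if nothing can."""
    target = rewrite(uri, existing)
    if target in existing:
        return target
    # 404: the server now asks for the ErrorDocument, and the rules run again.
    error_target = rewrite(error_document, existing)
    return error_target if error_target in existing else None
```

With existing = {"/index.php", "/404.shtml"}, a missing upload ends at /404.shtml. Remove /404.shtml from the set and the same request ends at /index.php, even though the uploads condition did its job on the first pass.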
If /wp-content/uploads/ is really the prefix of the requested URI path, your rule was supposed to work as expected.
But as it obviously doesn’t work, try matching not the full URI path but only the remaining path without the per-directory path prefix. For an .htaccess file in the document root directory, that is the URI path without the leading /:
RewriteCond $0 !^wp-content/uploads/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .+ /index.php [L]
If that doesn’t work either, it would certainly help to get some insight into mod_rewrite’s rewriting process by using its logging feature. So set RewriteLogLevel to a level of at least 4, make your request and take a look at the entries in the log file specified with RewriteLog. There you can see how mod_rewrite handles your request, and with RewriteLogLevel greater than or equal to 4 you will also see the values of variables like %{REQUEST_URI}. (On Apache 2.4, RewriteLog and RewriteLogLevel were replaced by the LogLevel directive, for example LogLevel alert rewrite:trace4.)
I have found many examples like this when taking a "WordPress First" approach. For example, adding:
ErrorDocument 404 /error-docs/404.html
to the .htaccess file takes care of the message ("Additionally, a 404 Not Found error...").
Came across this trying to do the same thing in a Drupal site, but might be the same for WP since it all goes through index.php. Negating index.php was the key. This sends everything to the new domain except old-domain.org/my_path_to_ignore:
RewriteCond %{REQUEST_URI} !^/my_path_to_ignore$
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{HTTP_HOST} ^old-domain\.org$ [NC]
RewriteRule ^(.*)$ http%{ENV:protossl}://new-domain.org/$1 [L,R=301]

.htaccess rewrite rule preventing infinite loop

I have a directory named dollars that contains a file index.php. I would like for the url http://localhost/dollars/foo to translate to dollars/index.php?dollars=foo. This is the .htaccess file that I currently have in the dollars directory:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !^index\.php
RewriteRule ^(.+)$ index.php?dollars=$1 [L]
The idea being that any request other than a request to index.php should use the RewriteRule.
However, this does not work.
I've been looking for a while trying to figure out how to create the redirect I want, but I don't even know if I'm on the right track. Regex were never my thing. Thanks for any help!
An often-used solution for rewrites is to just check that the path being requested doesn't point to an actual file/directory, and rewrite it only if it doesn't. Since the rewritten URL will then point to an actual file, no loop occurs.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
Amber's answer should get things working for you, but I wanted to address what was going wrong in your specific case. You had the right idea, but %{REQUEST_FILENAME} actually ends up being a fully qualified path here, so your regular expression should check for index.php at the end, not the beginning.
Consequently, you should find that this will work more like you expect:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !index\.php$
RewriteRule ^(.+)$ index.php?dollars=$1
Swapping out the RewriteConds for those that Amber mentioned would be less problematic if you added other things to that directory, though, so I'd recommend using that in place of this anyway.
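The difference between the two conditions is easy to check against a sample value. The path below is hypothetical, since the real %{REQUEST_FILENAME} depends on the server's document root, and rule_applies is an illustrative helper that assumes Python's re module behaves like Apache's regex engine for these patterns.

```python
import re

# Hypothetical absolute path; the real value depends on the server's document root.
REQUEST_FILENAME = "/var/www/html/dollars/index.php"


def rule_applies(cond_pattern, request_filename):
    """A negated RewriteCond (!pattern) holds when the pattern does NOT match."""
    return re.search(cond_pattern, request_filename) is None


if __name__ == "__main__":
    # Original condition: anchored at the start, so it never matches a full path
    # and the rule fires again on index.php itself.
    print(rule_applies(r"^index\.php", REQUEST_FILENAME))
    # Corrected condition: anchored at the end, so the rule is skipped for index.php.
    print(rule_applies(r"index\.php$", REQUEST_FILENAME))
```

The first check reports that the rule still applies to the already rewritten request, which is the loop; the second reports that it does not.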