Correct .htaccess file for user-friendly URL - regex

I need the following URLs to be converted to a user-friendly format:
example.com/product/$numbers/$anychars => example.com/product.php?product_id=$numbers&name=$anychars
example.com/image/$numbers/$anychars/$number => example.com/image.php?image_id=$numbers&name=$anychars&no=$number
example.com/item/$numbers/$anychars => example.com/item.php?item_id=$numbers&name=$anychars
example.com/category/$anychars => example.com/category.php?name=$anychars
example.com/category/$anychars/$numbers => example.com/category.php?name=$anychars&page=$numbers
Trailing forward slash should be allowed and ignored by the web-server.
Using some guides from the Internet I did the following:
RewriteEngine On
RewriteRule ^product/([0-9]+)/([^/]+)/?$ product.php?id=$1&name=$2 [QSA,NC,L]
RewriteRule ^image/([0-9]+)/([^/]+)/([0-9]+)/?$ image.php?item_id=$1&item_name=$2&no=$3 [QSA,NC,L]
RewriteRule ^item/([0-9]+)/([^/]+)/?$ item.php?id=$1&name=$2 [QSA,NC,L]
RewriteRule ^category/([^/]+)/?$ category.php?cat=$1&page=0 [QSA,NC,L]
RewriteRule ^category/([^/]+)/([0-9]+)/?$ category.php?cat=$1&page=$2 [QSA,NC,L]
The NC flag makes the match case-insensitive.
The L flag stops rule processing after the first match (fewer bugs and faster URL handling).
Preliminary testing showed no errors.
But as regexes and mod_rewrite are not my strong suit, I'd like to ask you to check whether I made any errors. And if there are none, this could be a good pattern for people like me looking for an easy mod_rewrite solution.

There is one recommendation I would make. Put the rules in order from most specific match to most general match (in this case, switch the two category rules). By following this convention you ensure that a URL that may satisfy more than one rule is caught by the more specific rule.
In your specific case, you won't hit this problem yet, but as you grow your rules it will eventually bite you.
I haven't added QSA flags. I recommend that you create rules that would allow the user to see an entirely friendly URL, rather than a partly-friendly URL - to do this, ensure that you map additional parameters just like you have the ids and categories in your existing rules.
RewriteEngine On
RewriteRule ^product/([0-9]+)/([^/]+)/?$ product.php?id=$1&name=$2 [NC,L]
RewriteRule ^image/([0-9]+)/([^/]+)/([0-9]+)/?$ image.php?item_id=$1&item_name=$2&no=$3 [NC,L]
RewriteRule ^item/([0-9]+)/([^/]+)/?$ item.php?id=$1&name=$2 [NC,L]
RewriteRule ^category/([^/]+)/([0-9]+)/?$ category.php?cat=$1&page=$2 [NC,L]
RewriteRule ^category/([^/]+)/?$ category.php?cat=$1&page=0 [NC,L]
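As a rough illustration of the first-match-wins behaviour these rules rely on, here is a minimal Python sketch. Python's re module stands in for the regex engine mod_rewrite uses, and the RULES table and rewrite() helper are hypothetical names invented for this example, not part of Apache. Paths are written without a leading slash, as they appear to a RewriteRule in a .htaccess file.

```python
import re

# (pattern, substitution) pairs copied from the rules above, most specific first.
RULES = [
    (r"^product/([0-9]+)/([^/]+)/?$", r"product.php?id=\1&name=\2"),
    (r"^image/([0-9]+)/([^/]+)/([0-9]+)/?$", r"image.php?item_id=\1&item_name=\2&no=\3"),
    (r"^item/([0-9]+)/([^/]+)/?$", r"item.php?id=\1&name=\2"),
    (r"^category/([^/]+)/([0-9]+)/?$", r"category.php?cat=\1&page=\2"),
    (r"^category/([^/]+)/?$", r"category.php?cat=\1&page=0"),
]


def rewrite(path):
    """Return the target of the first matching rule, or None if no rule matches."""
    for pattern, substitution in RULES:
        # re.IGNORECASE plays the role of the [NC] flag.
        match = re.match(pattern, path, re.IGNORECASE)
        if match:
            # Returning here mimics [L]: later rules are not consulted.
            return match.expand(substitution)
    return None


print(rewrite("product/42/blue-widget/"))
print(rewrite("category/shoes/2"))
print(rewrite("category/shoes"))
```

Because ([^/]+) cannot cross a slash, the two category patterns never match the same path, which is why the order does not change the result yet.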

Related

Rewrite rule with pagination .htaccess

I have a URL like this:
http://example.com/category/title which comes from the link http://example.com/cview.php?url=title
I want to add pagination, with URLs like http://example.com/category/title/page/1 or
http://example.com/category/title/1
this comes from http://example.com/cview.php?url=title&pageno=1.
I have tried this in .htaccess without success
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^category/([^/]*)$/([^/]+)/?$ /cview.php?url=$2&pageno=$1 [L]
Can anyone help please?
RewriteRule ^category/([^/]*)$/([^/]+)/?$ /cview.php?url=$2&pageno=$1 [L]
You have an erroneous $ (end-of-string anchor) in the middle of the RewriteRule pattern. You also appear to have the backreferences $1 and $2 the wrong way round. You are also allowing an optional trailing slash, yet your example URLs do not use this. (An optional trailing slash potentially creates a duplicate content issue.)
If you allow both /category/title/page/1 and /category/title/1 then you are potentially creating a duplicate content issue. Presumably you are only linking to one of these URL formats?
Since the page number is a "number" then it makes sense to just match numbers, rather than anything - this also helps to avoid conflicts with other directives.
It doesn't look like you need the conditions (RewriteCond directives) that check the request does not map to a file or directory, since I wouldn't expect a request of the form /category/title/page/1 to map to a file or directory anyway?
Try the following instead (without the RewriteCond directives):
RewriteRule ^category/([^/]+)(?:/page)?/(\d+)$ /cview.php?url=$1&pageno=$2 [L]
This matches both /category/title/page/<num> and /category/title/<num>. The optional subpattern (?:/page) is non-capturing, so that it doesn't mess up the numbering of the backreferences.
Bear in mind also that the order of the rules in .htaccess is important in order to avoid conflicts.
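To see the difference between the two patterns, here is a small Python sketch. Python's re module is used as a stand-in for the regex matching in mod_rewrite, and to_target() is a hypothetical helper written only for this illustration.

```python
import re

# Pattern from the question: the "$" in the middle can never be followed by "/".
BROKEN = re.compile(r"^category/([^/]*)$/([^/]+)/?$")
# Pattern from the answer: optional, non-capturing "/page" segment.
FIXED = re.compile(r"^category/([^/]+)(?:/page)?/(\d+)$")


def to_target(path):
    """Map a pretty category URL to the cview.php target, or None if it does not match."""
    match = FIXED.match(path)
    if match is None:
        return None
    # Only two groups exist, because (?:/page) does not capture.
    title, page = match.groups()
    return f"/cview.php?url={title}&pageno={page}"


print(to_target("category/title/page/1"))
print(to_target("category/title/1"))
print(BROKEN.match("category/title/1"))
```

Note that a plain category/title request (no page number) is deliberately not matched here, so it would still need its own rule.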

Apache Rewrite to remove index.php?

I am trying to rewrite my URLs to remove index.php? but I'm struggling a little to get it to work. The closest I can get is the answer here: remove question mark from 301 redirect using htaccess when the user enters the old URL
I need to convert the URLs to pretty URLs on the way out, and rewrite them back to the proper URL on the way in. The structure of the URLs is as follows:
https://sub.domain.com/index.php?/folder1/folder2-etc
Using the code from the referenced answer results in a double forward slash:
https://sub.domain.com//folder1/folder2-etc
The rewrite rules I'm using from the referenced answer are:
RewriteEngine On
RewriteCond %{THE_REQUEST} /index\.php [NC]
RewriteRule ^(.*?)index\.php$ /$1 [L,R=301,NC,NE]
RewriteCond %{THE_REQUEST} \s/+\?([^\s&]+) [NC]
RewriteRule ^ /%1? [R=301,L]
# internal forward from pretty URL to actual one
RewriteRule ^((?!web/)[^/.]+)/?$ /index.php?$1 [L,QSA,NC]
I suspect I know how to solve the first bit, but I'm struggling to understand the second rule for the internal forward.
Additionally, I'm wondering if this is the best way to do this. I'm currently running an Apache backend behind an Nginx reverse proxy. Would I be better doing the rewrite on the Nginx side and the internal forward on Apache?
EDIT:
Complication: I've noticed an additional structure to complicate things. Some URLs appear to have https://sub.domain.com/picture.php?/folder1/folder2-etc
For these, I'd be quite happy to keep 'picture' and just remove the .php? bit.
I'm guessing that for the first bit, I'd need to do something like the following:
RewriteCond %{THE_REQUEST} \s/+index\.php\?/([^\s&]+) [NC]
RewriteRule ^ /%1? [R=301,L]
RewriteCond %{THE_REQUEST} \s/+picture\.php\?/([^\s&]+) [NC]
RewriteRule ^(.*)$ /picture/%1 [R=301,L]
But have no idea where to start with the opposite.... ie converting pretty urls back to standard. It would help if the following section could be explained to me?
^((?!web/)[^/.]+)/?$ /index.php?$1 [L,QSA,NC]
RewriteRule ^/*picture/(.*)$ /picture.php?/$1 [L]
RewriteRule ^/*(?!/*index\.php$)(.*)$ /index.php?/$1 [L]
should do the trick. I wasn't able to test it yet though.
I only used the [L] last flag to stop applying rules on match. The QSA query string append flag doesn't seem to make sense as you don't seem to use ?key=value&... syntax anyway. Also dunno if you actually need the NC case-insensitive flag...
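Since these rules are untested, a quick way to check the patterns themselves is to try them in Python, whose re module supports the same negative lookahead syntax. This is only a sketch of the matching step: the rewrite() helper is hypothetical, and it applies the two rules once, in order, to a path without the leading slash.

```python
import re

PICTURE = re.compile(r"^/*picture/(.*)$")
# The negative lookahead (?!/*index\.php$) rejects a request for index.php itself.
FALLBACK = re.compile(r"^/*(?!/*index\.php$)(.*)$")


def rewrite(path):
    """Apply the two rules in order; return the rewritten URL or None if neither matches."""
    match = PICTURE.match(path)
    if match:
        return "/picture.php?/" + match.group(1)
    match = FALLBACK.match(path)
    if match:
        return "/index.php?/" + match.group(1)
    return None  # index.php is left alone, which avoids a rewrite loop


print(rewrite("picture/folder1/folder2-etc"))
print(rewrite("folder1/folder2-etc"))
print(rewrite("index.php"))
print(rewrite("picture.php"))
```

The last call shows that picture.php also matches the fallback pattern. In a .htaccess file the rules are normally run again on the rewritten URL, so the fallback rule may need an exclusion for picture.php as well; treat that as something to verify on the real server.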
Side note:
I hope your php files don't serve paths with .. in them, as that would allow people to read arbitrary files from disk, e.g. /picture/../../../etc/passwd
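The usual defence is to resolve the requested path and confirm it still lies under the intended base directory. A minimal sketch of that check in Python follows; the is_inside() helper and the /var/www/gallery directory are assumptions made for the example, and the application itself would need the equivalent check in PHP.

```python
from pathlib import Path


def is_inside(base_dir, requested):
    """Return True if requested, resolved relative to base_dir, stays inside base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments, so an escape attempt lands outside base.
    target = (base / requested.lstrip("/")).resolve()
    return target == base or base in target.parents


print(is_inside("/var/www/gallery", "folder1/folder2-etc"))
print(is_inside("/var/www/gallery", "../../../etc/passwd"))
```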
Apologies, but as it turns out, the main reason I can't get anything to work is due to the use of relative URLs and dynamically generated links within the PHP. Not something I can change unfortunately. The not perfect URLs are something I'm going to have to live with. For reference, the app I'm using is Piwigo

RewriteRule: ^ vs ^(.*)$ vs ^.*$ Is there a Difference?

What is the difference in using ^ vs ^(.*)$ vs ^.*$ as wildcards in a RewriteRule?
My goal is to redirect http://carnarianism.com/ (anything) to the landing (default) page of http://carnarian.com/. I have found the following solutions, which all seem to work, so I wonder which is better for performance?
RewriteRule ^ http://carnarian.com/ [R=301,L]
RewriteRule ^.*$ http://carnarian.com/ [R=301,L]
RewriteRule ^(.*)$ http://carnarian.com/ [R=301,L]
All of these seem to work okay. This is my very first post on StackOverflow, most of the time I can find an answer just searching for it.
To be clear: ABOVE the questioned RewriteRule in my .htaccess is a RewriteCond and WWW Handler as follows:
RewriteEngine On
RewriteBase /
# FROM www. --TO-- NO www. See no-www.org
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
RewriteCond %{HTTP_HOST} carnarianism\.com$ [NC]
########## The Above Questioned RewriteRule ??? ##########
RewriteRule ^ http://carnarian.com/ [R=301,L]
Note: I started this search with the following, but I did not want the following because the path was also passed, and I want it to go to the landing page only. Therefore, I know you need the parentheses to be able to use the $1 variable. I do not want the $1 variable.
RewriteRule ^(.*)$ http://carnarian.com/$1 [R=301,L]
^ makes none of the original URL accessible as backreferences. $0 is an empty string.
^.*$ makes the entire original URL accessible as the $0 backreference (so you can do e.g. http://example.com/oldurl.php?url=$0)
^(.*) makes the entire original URL accessible as both the $0 and $1 backreferences; it's usually used when you want to actually use the old URL in the replacement since it's more explicit about the use.
All of them match the same thing, but produce different backreference groups.
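The same distinction can be seen in any regex engine. In this Python sketch, group(0) corresponds to $0 and group(1) to $1; the sample path is made up for the illustration.

```python
import re

path = "old/page.html"

bare_anchor = re.match(r"^", path)
whole_no_group = re.match(r"^.*$", path)
whole_with_group = re.match(r"^(.*)$", path)

# "^" matches at the start without consuming anything: the whole match is empty.
print(repr(bare_anchor.group(0)), bare_anchor.groups())
# "^.*$" consumes the whole path but defines no numbered groups.
print(repr(whole_no_group.group(0)), whole_no_group.groups())
# "^(.*)$" consumes the whole path and also exposes it as group 1.
print(repr(whole_with_group.group(0)), whole_with_group.groups())
```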
The one that is better performance wise is the one you have benchmarked yourself.
But since you are using a .htaccess file rather than having this configuration in the server directly (maybe via a VirtualHost?) which is parsed only once, it really doesn't matter. Parsing .htaccess files at every single request is much more time consuming than performing the regular expression by a factor of thousands.
If you care about performance you should never use .htaccess files at all, and should even disable their parsing with AllowOverride None. If you don't disable them, then for a request like http://example.com/sites/css/theme/main.css Apache will still try to load all of the following files:
.htaccess
sites/.htaccess
sites/css/.htaccess
sites/css/theme/.htaccess
It will generate system calls even if those files do not exist.
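To make the cost concrete, this Python sketch lists the lookups for a given request path. It is a simplified model relative to the document root, and htaccess_candidates() is a hypothetical helper; the real server's directory walk depends on its configuration.

```python
from pathlib import PurePosixPath


def htaccess_candidates(url_path):
    """List the .htaccess files looked up for url_path, document root first."""
    directory = PurePosixPath(url_path.lstrip("/")).parent
    candidates = [".htaccess"]  # the document root itself
    current = PurePosixPath()
    for part in directory.parts:
        # One extra lookup for every directory level in the request path.
        current = current / part
        candidates.append(str(current / ".htaccess"))
    return candidates


for candidate in htaccess_candidates("/sites/css/theme/main.css"):
    print(candidate)
```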
Trying therefore to improve your RewriteRule in an .htaccess file is like sneezing in the ocean in the hope of making it less salty. :)
Now, if you improved your setup to use server configuration, and to answer your original question: ^.*$ might be more efficient than ^(.*)$ as fewer references need to be created. Chances are high, however, that you won't be able to measure the difference.

How can I have one mod rewrite for a cms and another for static pages?

I currently have a site that has Drupal installed and it has clean urls so the .htaccess file contains the following:
RewriteRule ^ index.php [L]
In addition to this I want to be able to publish static html pages and have them use clean urls as well. I was thinking of differentiating them from the drupal pages by adding a specific keyword e.g. content and maybe having something like below (not sure if this will work) - where I get a url like www.domainname.com/nice-holiday and translate it to
domainname.com/ftp/pages/nice-holiday.html
RewriteRule ^content/(.+)$ domainname.com/ftp/pages/$1.html [L]
The problem is that the first rule will try to execute against all requests. I have tried putting the more specific rule before the more general rule but it still doesn't work.
How can you have two mod rewrite rules based on a condition? e.g. presence of a particular word? and more generally has anyone had experience handling a CMS and static pages on the one website - or is that asking for trouble?
This is where RewriteCond comes in handy.
# make sure no rewriting is done for requests without www
RewriteCond %{HTTP_HOST} !^domainname\.com
RewriteCond %{REQUEST_URI} !^/?content/
RewriteRule ^ index.php [L]
# later on...
# don't want this rule to apply for non-www requests either
RewriteCond %{HTTP_HOST} !^domainname\.com
RewriteRule ^/?content/(.+)$ http://domainname.com/ftp/pages/$1.html [L]
I think this is what you're going for? You can eliminate the %{HTTP_HOST} conditions completely if you don't actually care about the www thing. The two rules can still coexist as long as you keep the %{REQUEST_URI} condition on the drupal rewrite, so drupal rewrites explicitly do not apply for URIs beginning with the /content/ prefix.
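The way the negated %{REQUEST_URI} condition splits traffic between the two rules can be sketched in Python. This is an illustration under simplifying assumptions: the %{HTTP_HOST} conditions are left out, the target is written as a local path rather than the full http://domainname.com URL, and route() is a hypothetical helper.

```python
import re


def route(request_uri):
    """Return the rewrite target for request_uri under the two-rule scheme."""
    # Rule 1 (Drupal): applies only when the URI does NOT start with /content/.
    if not re.match(r"^/?content/", request_uri):
        return "index.php"
    # Rule 2 (static pages): /content/<name> maps to the matching .html file.
    static = re.match(r"^/?content/(.+)$", request_uri)
    if static:
        return f"/ftp/pages/{static.group(1)}.html"
    return None  # for example "/content/" with nothing after it


print(route("/node/12"))
print(route("/content/nice-holiday"))
```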

RewriteCond in .htaccess with negated regex condition doesn't work?

I'm trying to prevent, in this case WordPress, from rewriting certain URLs. In this case I'm trying to prevent it from ever handling a request in the uploads directory, and instead leave those to the server's 404 page. So I'm assuming it's as simple as adding the rule:
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
This condition should evaluate to false and make the chain of conditions fail for those requests, thus stopping the rewrite. But no... Perhaps I need to match the full string in my expression?
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/.*$
Nope, that's not it either. So after scratching my head I do a check of sanity. Perhaps something is wrong with the actual pattern. So I make a simple test case.
RewriteCond %{REQUEST_URI} ^/xyz/$
In this case, the rewrite happens if and only if the requested URL is /xyz/ and shows the server's 404 page for any other page. This is exactly what I expected. So I'll just stick in a ! to negate that pattern.
RewriteCond %{REQUEST_URI} !^/xyz/$
Now I'm expecting to see the exact opposite of the above condition. The rewrite should not happen for /xyz/ but for every other possible URL. Instead, the rewrite happens for every URL, both /xyz/ and others.
So, either the use of negated regexes in RewriteConds is broken in Apache, or there's something fundamental I don't understand about it. Which one is it?
The server is Apache2.
The file in its entirety:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
WordPress's default file plus my rule.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/ [OR]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
So, after a lot of irritation, I figured out the problem, sort of. As it turned out, the rule in my original question actually did exactly what it was supposed to. So did a number of other ways of doing the same thing, such as
RewriteRule ^wp-content/uploads/.*$ - [L]
(Mark rule as last if pattern matches) or
RewriteRule ^wp-content/uploads/.*$ - [S=1]
(Skip the next rule if pattern matches) as well as the negated rule in the question, as mentioned. All of those rules worked just fine, and returned control to Apache without rewriting.
The problem happened after those rules were processed: I had deleted the default 404.shtml, 403.shtml etc. templates that my host provided. If you don't have any .htaccess rewrites, that works just fine; the server will dish up its own default 404 page and everything works. (At least that's what I thought, but in actual fact it was the double error "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.")
When you do have a .htaccess, on the other hand, it is executed a second time for the 404 page. If the page is there, it will be used, but now, instead the request for 404.shtml was caught by the catch-all rule and rewritten to index.php. For this reason, all other suggestions I've gotten here, or elsewhere, have all failed because in the end the 404 page has been rewritten to index.php.
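The sequence described above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not how Apache is implemented: the !-d check is omitted, the error document is assumed to be /404.shtml, and resolve(), serve() and EXISTING_FILES are hypothetical names.

```python
EXISTING_FILES = {"/index.php"}  # hypothetical docroot with the error templates deleted


def resolve(uri, existing=EXISTING_FILES):
    """Return the URI that results from one pass through the WordPress rules."""
    if uri == "/index.php":
        return uri  # RewriteRule ^index\.php$ - [L]
    if uri in existing:
        return uri  # the !-f condition fails, so no rewrite
    if uri.startswith("/wp-content/uploads/"):
        return uri  # the negated REQUEST_URI condition fails, so no rewrite
    return "/index.php"  # RewriteRule . /index.php [L]


def serve(uri, existing=EXISTING_FILES):
    """Follow a request, including the internal request for the 404 error document."""
    target = resolve(uri, existing)
    if target in existing:
        return target
    # Missing file: the error document is requested and passes through the rules again.
    return resolve("/404.shtml", existing)


print(serve("/wp-content/uploads/missing.jpg"))
print(serve("/wp-content/uploads/missing.jpg", {"/index.php", "/404.shtml"}))
```

With the template missing, the request for the error page falls through to the catch-all and ends up at index.php; with the template restored, it is served as is.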
So, the solution was simply to restore the error templates. In retrospect it was pretty stupid to delete them, but I have this "start from scratch" mentality. Don't want anything seemingly unnecessary lying around. At least now I understand what was going on, which is what I wanted.
Finally a comment to Cecil: I never wanted to forbid access to anything, just stop the rewrite from taking place. Not that it matters much now, but I just wanted to clarify this.
If /wp-content/uploads/ is really the prefix of the requested URI path, your rule was supposed to work as expected.
But as it obviously doesn’t work, try matching not the full URI path but only the remaining path without the per-directory prefix. For a .htaccess file in the document root directory, that is the URI path without the leading /:
RewriteCond $0 !^wp-content/uploads/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .+ /index.php [L]
If that doesn’t work either, it would certainly help to get some insight into mod_rewrite’s rewriting process by using its logging feature. So set RewriteLogLevel to a level of at least 4, make your request and take a look at the entries in the log file specified with RewriteLog. There you can see how mod_rewrite handles your request, and with RewriteLogLevel greater than or equal to 4 you will also see the values of variables like %{REQUEST_URI}. (On Apache 2.4 and later, RewriteLog and RewriteLogLevel no longer exist; use the LogLevel directive instead, for example LogLevel alert rewrite:trace4, and read the entries in the error log.)
I have found many examples like this when taking a "WordPress First" approach. For example, adding:
ErrorDocument 404 /error-docs/404.html
to the .htaccess file takes care of the message ("Additionally, a 404 Not Found error...").
Came across this trying to do the same thing in a Drupal site, but might be the same for WP since it all goes through index.php. Negating index.php was the key. This sends everything to the new domain except old-domain.org/my_path_to_ignore:
RewriteCond %{REQUEST_URI} !^/my_path_to_ignore$
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{HTTP_HOST} ^old-domain\.org$ [NC]
RewriteRule ^(.*)$ http%{ENV:protossl}://new-domain.org/$1 [L,R=301]