Why does RewriteRule . work the same as ^(.*)$? - regex

Consider the following:
RewriteRule ^(.*)$ index.php/$1
Based on the fledgling noob knowledge I have of mod_rewrite, this should match the entire URL part between example.com/ and ?query=string, and then prepend it with index.php/, and generally, that's exactly what happens.
http://example.com/some/stuff -> http://example.com/index.php/some/stuff
Now, consider:
RewriteRule . index.php
According to the developers of the current update to the Concrete5 CMS, this does the exact same thing. And indeed, it appears to do just that on my server as well.
My question is, why does the second RewriteRule yield the same result instead of something like
http://example.com/some/stuff -> http://example.com/index.phpome/stuff
Shouldn't the . match one character and then be replaced with the index.php string? This is on Apache 2.2.

RewriteRule ^(.*)$ index.php/$1 will match and use the captured text to create a new path with whatever was originally requested added to the end where the $1 is.
RewriteRule . index.php matches because its regex is unanchored. The previous regex uses ^ and $ to anchor the match, so the pattern must match the entire string. This one has no anchors, so it can match anywhere in the string, and any path containing at least one character matches. The pattern in a RewriteRule is only a test: if it matches anywhere, the rule is applied to the whole path.
When the rule matches, the substitution takes place. The substitution replaces the whole path, not just the matched portion, so if you don't use backreferences like $1, whatever was in the original path is lost. In this case the new path just becomes index.php.
There is therefore a slight difference between the two: the second goes directly to index.php without appending the originally requested path. Most likely Concrete5 CMS uses a front controller that dispatches according to information it pulls from the request directly. Since this is an internal rewrite rather than a redirect, the original request is preserved, and the application uses that instead. This shifts some responsibility from Apache to the application code and reduces dependence on the hosting environment.
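As a rough illustration of the "test, then replace everything" behaviour described above, here is a minimal Python sketch. The rewrite helper is hypothetical and only models the pattern test and $N expansion; Python's re module stands in for the regex engine Apache uses, and real mod_rewrite does much more (flags, conditions, per-directory prefixes).

```python
import re

def rewrite(pattern, substitution, path):
    """Toy model of a RewriteRule: test the pattern, then replace the WHOLE path."""
    m = re.search(pattern, path)   # unanchored search unless the pattern uses ^ / $
    if m is None:
        return path                # rule does not apply, path is left alone
    # Expand $1..$9 in the substitution using the captured groups.
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))), substitution)

print(rewrite(r"^(.*)$", "index.php/$1", "some/stuff"))  # index.php/some/stuff
print(rewrite(r".", "index.php", "some/stuff"))          # index.php
print(rewrite(r".", "index.php", ""))                    # empty path: no match, unchanged
```

Note that the second call never produces index.phpome/stuff: the matched character only decides whether the rule fires, and the substitution then becomes the entire new path.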

The match on the left is not replaced, it is merely searched for. You must use backreferences to replace/retain specific parts.

Related

htaccess regex variable parameter

I'm not used to regex and figure I've lost too many hours trying to resolve this, so thought I'd ask for help. I am trying to prettify the html extension.
My site will use URLs that have variable parameters. For example:
mysite.com/article/this-is-an-entry
mysite.com/article/this-is-an-entirely-different-entry
All will use .html as the extension.
In the htaccess file, I have tried
RewriteRule ^(article\/[a-z].*)$ $1.html [NC,L]
as well as slight variations of this, but cannot get this right. Thanks in advance for any assistance.
Firstly, let's look at the regex you have:
^(article/[a-z].*)$
This matches exactly the string "article/", followed by a single letter (case insensitive due to the NC flag), followed by zero or more of anything. It's quite broad, but should match the examples you gave.
One way to test that it's matching is to add the R=temp flag to the rule, which tells Apache to redirect the browser to the new URL (I recommend using "=temp" to stop the browser caching the redirect and making later testing harder). You can then observe (e.g. in your browser's F12 debug console) the original request and the redirected one.
RewriteRule ^(article/[a-z].*)$ $1.html [NC,L,R=temp]
However, as CBroe points out, your rule will match again on the target URL, so you need to prevent that. A simple way would be to use the END flag (available in Apache 2.3.9 and later) instead of L:
Using the [END] flag terminates not only the current round of rewrite processing (like [L]) but also prevents any subsequent rewrite processing from occurring in per-directory (htaccess) context.
So:
RewriteRule ^(article/[a-z].*)$ $1.html [NC,END]
Alternatively, you can make your pattern stricter, such as changing the . ("anything") to [^.] ("anything other than a dot"):
^(article/[a-z][^.]*)$
To be even more specific, you can add a RewriteCond with an extra pattern to not apply the rule to, such as "anything ending .html".
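To see why the broad pattern loops and the stricter one does not, here is a small Python sketch. It only checks the two regexes against the path before and after the rewrite. The sample path is taken from the question, and Python's re module is assumed to behave like Apache's regex engine for these simple patterns.

```python
import re

# NC flag modelled with re.IGNORECASE
broad = re.compile(r"^(article/[a-z].*)$", re.IGNORECASE)
strict = re.compile(r"^(article/[a-z][^.]*)$", re.IGNORECASE)

first_pass = "article/this-is-an-entry"   # what the browser requested
second_pass = first_pass + ".html"        # what the rule rewrites it to

broad_matches = [bool(broad.search(p)) for p in (first_pass, second_pass)]
strict_matches = [bool(strict.search(p)) for p in (first_pass, second_pass)]

print(broad_matches)   # [True, True]  -> the rule would fire again on its own output
print(strict_matches)  # [True, False] -> the rewritten path is left alone
```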

URL Rewrite rules don't handle homepage / default page

I have the following URL Rewrite rules setup on my site
RewriteCond %{REQUEST_URI} !^/(img|js|css|fonts)/
RewriteCond %{REQUEST_URI} !.html$
RewriteRule ^([^/]*)\/?$ $1.html [NC,L]
It works amazingly well, however I can't seem to get it to load the main page of the site unless it's explicitly listed.
www.domainname.com/ and www.domainname.com don't work unless I explicitly write www.domainname.com/index.
Is there something I am missing in my pattern to allow for the default document to be served if no specific page is listed?
This is being handled by Helicon Ape, in case that is of interest or makes any difference to the way the rules are handled.
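I think the mistake you are making here is using the * quantifier. It matches zero or more occurrences of the preceding character or class.
Your regex: ^([^/]*)\/?$ Because * also accepts zero characters, it matches the empty path that a request for the site root produces, captures nothing, and leaves just .html after the replacement.
You can require at least one character by using + instead. Your regex will be ^([^/]+)\/?$. The root request then no longer matches the rule, so it is not rewritten and the server can fall back to the default document.
In this demo, notice that even blanks are matched because the * quantifier allows zero occurrences.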
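The difference between the two quantifiers can be checked with a short Python sketch. The target helper is hypothetical, and it assumes the rule sees an empty string for a request to the site root, as is the case in Apache's per-directory context (Helicon Ape is assumed to behave the same way).

```python
import re

star = re.compile(r"^([^/]*)/?$")   # original pattern: zero or more characters
plus = re.compile(r"^([^/]+)/?$")   # suggested pattern: one or more characters

def target(pattern, path):
    """Return the rewritten path, or None when the rule does not apply."""
    m = pattern.search(path)
    return m.group(1) + ".html" if m else None

print(target(star, ""))        # .html  (root request is rewritten to a bogus file)
print(target(plus, ""))        # None   (root request is left for the default document)
print(target(plus, "about/"))  # about.html
```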

mod_rewrite: can a back reference be referenced twice?

I have a rewrite rule (in an Apache htaccess file) which is attempting to use a back reference twice from just one capture ($1):
RewriteRule ^([A-Za-z0-9_-]+)/?$ $1.php?nav=$1
It appears that the query string is being left empty, like
example.com/new
is being re-written as
example.com/new.php?nav=
what I want is
example.com/new.php?nav=new
My question: can I reference $1 twice in the expression?
UPDATE:
The Apache documentation on mod_rewrite indicates that you can reference a capture as many times as you like in the substitution part of a rewrite rule. However, after trying for a couple of days I was not able to make it work. I did get my rule to pass in the online regex testers that are out there, but not on my site. In the end I re-designed my menu system so that I could use simpler rewrite rules.
The regex that you're using is wrong:
^(A-Za-z0-9-_)$
Ranges are allowed only inside square brackets, and you need the + quantifier to match more than one character.
Replace your RewriteRule with this:
RewriteRule ^([a-z0-9_-]+)/?$ $1.php?nav=$1 [L,NC,QSA]
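A quick Python sketch of both points: without square brackets the "ranges" are just literal text, and a single capture can be substituted as many times as you like. The string replace below is only a stand-in for mod_rewrite's $1 expansion, and the sample path new comes from the question.

```python
import re

literal = re.compile(r"^(A-Za-z0-9-_)$")                        # parentheses only: literal text
char_class = re.compile(r"^([a-z0-9_-]+)/?$", re.IGNORECASE)    # NC flag modelled with IGNORECASE

print(literal.search("new"))                # None: only the literal string "A-Za-z0-9-_" matches
print(bool(literal.search("A-Za-z0-9-_")))  # True

m = char_class.search("new")
rewritten = "$1.php?nav=$1".replace("$1", m.group(1))  # same capture used twice
print(rewritten)                            # new.php?nav=new
```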

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID, they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused as to how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
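To illustrate the point that the RewriteRule pattern never sees the query string, here is a rough Python sketch of the redirecting cond+rule pair. The redirect_target helper and the sample URLs are hypothetical, and stripping the leading slash assumes .htaccess (per-directory) context.

```python
import re
from urllib.parse import urlsplit

def redirect_target(url):
    """Toy model of: RewriteCond %{QUERY_STRING} jid=([0-9]+) + RewriteRule ^job_wanted\\.php$ /job/%1?"""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")                    # per-directory context: no leading slash
    cond = re.search(r"jid=([0-9]+)", parts.query)   # the condition tests the query string
    rule = re.search(r"^job_wanted\.php$", path)     # the rule pattern tests the path only
    if cond and rule:
        return "/job/" + cond.group(1)               # %1 is the RewriteCond capture; trailing ? drops the old query
    return None

print(redirect_target("http://localhost/job_wanted.php?jid=42"))  # /job/42
print(redirect_target("http://localhost/job_wanted.php"))         # None

# A pattern that tries to include the query string can never match the path alone:
print(re.search(r"^job_wanted\.php\?jid=([0-9]+)$", "job_wanted.php"))  # None
```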
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since they will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.

How do I preserve the existing query string in a mod_rewrite rule

I'm trying to rewrite an url from:
http://domain.com/aa/whatever/whatever.php
to
http://domain.com/whatever/whatever.php?language=aa
However, depending on existing $_GET variables, it either has to be ?language or &language.
To do this, I use 2 regexes with the [L] flag:
RewriteRule ^([a-z]{2})/(.*\.php\?.*) /$2&language=$1 [L]
RewriteRule ^([a-z]{2})/(.*) /$2?language=$1 [L]
The second one works as expected... The first one however is never hit (it falls through to the second regex, which does hit), even though Regex Coach does show me that it should.
edit:
I just read that I need to use two backslashes to escape the question mark. If I do this, it does hit on the first regex but never finds the other GET variables.
According to the documentation for mod_rewrite, the pattern in RewriteRule matches against the part of the URL after the hostname and port and before the query string, so the query string is not included. That is why you don't get the other variables.
To add a new query string parameter language=xx whilst preserving any existing query string you need to use the QSA flag (query string append). With this flag, just one rule based on your second case should be sufficient:
RewriteRule ^([a-z]{2})/(.*) /$2?language=$1 [QSA]
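As a sketch of what QSA does to the query string, the hypothetical Python helper below applies the rule to the path and then appends the original query string after the new parameter, which is the order the Apache documentation describes for QSA. The sample path is taken from the question, and x=1 is an assumed example parameter.

```python
import re

def rewrite_with_qsa(path, query):
    """Toy model of: RewriteRule ^([a-z]{2})/(.*) /$2?language=$1 [QSA]"""
    m = re.search(r"^([a-z]{2})/(.*)", path)   # the pattern only ever sees the path
    if m is None:
        return path, query                     # rule does not apply
    new_path = "/" + m.group(2)
    new_query = "language=" + m.group(1)
    if query:
        new_query += "&" + query               # QSA: keep the existing query string
    return new_path, new_query

print(rewrite_with_qsa("aa/whatever/whatever.php", "x=1"))
# ('/whatever/whatever.php', 'language=aa&x=1')
print(rewrite_with_qsa("aa/whatever/whatever.php", ""))
# ('/whatever/whatever.php', 'language=aa')
```

Because the merge happens outside the regex, one rule covers both the ?language and &language cases from the question.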
You could set up the URL rewrite to pass the language to the PHP script via the PATH_INFO element of the $_SERVER superglobal. Just pass the language to the script like so:
foobar.php/en?args
In this case, $_SERVER['PATH_INFO'] would equal /en