Mod-rewrites on apache: change all URLs - regex

Right now I'm doing something like this:
RewriteRule ^/?logout(/)?$ logout.php
RewriteRule ^/?config(/)?$ config.php
I would much rather have one rule that does the same thing for every URL, so I don't have to keep adding rules every time I add a new file.
Also, I'd like to map things like '/config/new' to 'config_new.php' if that is possible. I am guessing some regexp would let me accomplish this?

Try:
RewriteRule ^/?(\w+)/?$ $1.php
The $1 is the content of the first captured group (the part in brackets). The brackets around the second slash are not needed.
edit: For the other match, try this:
RewriteRule ^/?(\w+)/(\w+)/?$ $1_$2.php
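To see which request paths those two patterns would pick up, here is a minimal Python sketch that mirrors them. The helper name `rewrite` is hypothetical and only models the regex matching, not the rest of mod_rewrite; `re.ASCII` is used on the assumption that `\w` should stay limited to ASCII word characters.

```python
import re

# Hypothetical model of the two rules above; only the regex part is simulated.
ONE_SEGMENT = re.compile(r"^/?(\w+)/?$", re.ASCII)
TWO_SEGMENTS = re.compile(r"^/?(\w+)/(\w+)/?$", re.ASCII)

def rewrite(path):
    """Return the script name a path would map to, or None if no rule matches."""
    m = TWO_SEGMENTS.match(path)
    if m:
        return f"{m.group(1)}_{m.group(2)}.php"
    m = ONE_SEGMENT.match(path)
    if m:
        return f"{m.group(1)}.php"
    return None

print(rewrite("/logout"))      # logout.php
print(rewrite("config/new/"))  # config_new.php
print(rewrite("config.php"))   # None: the dot is not a \w character
```

Because `\w` does not match a dot, a path that already ends in `.php` falls through both patterns, so the rewritten name is not rewritten a second time.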

I would do something like this:
RewriteRule ^/?(logout|config|foo)/?$ $1.php
RewriteRule ^/?(logout|config|foo)/(new|edit|delete)$ $1_$2.php
I prefer to explicitly list the URLs I want to match, so that I don't have to worry about static content or about adding new things later that don't need to be rewritten to php files.
The above is OK if all sub URLs are valid for all root URLs (book/new, movie/new, user/new), but not so good if you want different sub URLs depending on the root action (logout/new doesn't make much sense). You can handle that either with a more complex regex, or by routing everything to a single php file which determines what files to include and display based on the URL.
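The "single file decides" idea can be sketched as a small route table. This is an illustration in Python rather than PHP, and the table contents and the name `resolve` are assumptions made up for the example.

```python
# Hypothetical route table: which sub-actions each root URL accepts.
ROUTES = {
    "logout": set(),
    "config": {"new", "edit"},
    "book": {"new", "edit", "delete"},
}

def resolve(path):
    """Return the script to include for a path, or None if it is not routable."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] not in ROUTES:
        return None
    root, rest = parts[0], parts[1:]
    if not rest:
        return f"{root}.php"
    if len(rest) == 1 and rest[0] in ROUTES[root]:
        return f"{root}_{rest[0]}.php"
    return None

print(resolve("/config/new"))  # config_new.php
print(resolve("/logout/new"))  # None: logout has no sub-actions
```

Keeping the valid combinations in data instead of in one large regex makes it easier to give each root URL its own set of sub URLs.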

Mod rewrite can't do (potentially) boundless replaces like you want to do in the second part of your question. But check out the External Rewriting Engine at the bottom of the Apache URL Rewriting Guide:
External Rewriting Engine
Description:
A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems no solution by the use of mod_rewrite...
Solution:
Use an external RewriteMap, i.e. a program which acts like a RewriteMap. It is run once on startup of Apache, receives the requested URLs on STDIN, and has to put the resulting (usually rewritten) URL on STDOUT (same order!).
RewriteEngine on
RewriteMap quux-map prg:/path/to/map.quux.pl
RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1}
#!/path/to/perl
# disable buffered I/O which would lead
# to deadloops for the Apache server
$| = 1;
# read URLs one per line from stdin and
# generate substitution URL on stdout
while (<>) {
s|^foo/|bar/|;
print $_;
}
This is a demonstration-only example and just rewrites all URLs /~quux/foo/... to /~quux/bar/.... Actually you can program whatever you like. But notice that while such maps can be used also by an average user, only the system administrator can define it.
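For readers who would rather not write Perl, the same map logic can be sketched in Python. The function names here are hypothetical; the one property taken from the guide is that each answer must be written and flushed immediately, one output line per input line.

```python
import io

def rewrite_key(key):
    """Map a lookup key to its replacement: a leading 'foo/' becomes 'bar/'."""
    if key.startswith("foo/"):
        return "bar/" + key[len("foo/"):]
    return key

def serve(infile, outfile):
    # One answer per input line, flushed at once so the server never
    # waits on a buffered reply.
    for line in infile:
        outfile.write(rewrite_key(line.rstrip("\n")) + "\n")
        outfile.flush()

# A real map program would import sys and end with: serve(sys.stdin, sys.stdout)
# Here the loop is exercised with in-memory streams instead.
demo_out = io.StringIO()
serve(io.StringIO("foo/page.html\nother/page.html\n"), demo_out)
print(demo_out.getvalue(), end="")
```

The demo prints `bar/page.html` followed by `other/page.html`, matching the behavior of the Perl script for those two keys.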


Redirect 301 - remove .html extension from URLs

I would like to remove the .html extension from my URLs located in a specific directory, and 301 redirect the old URLs to the new ones.
Here is what the structure looks like:
mysite.com/category/nameofcategory/pagenumber.html
The thing is that nameofcategory and pagenumber could be any letter or number.
Could you please help me with this?
I wouldn't recommend having your content scattered in many html-files in different folders. This becomes very impractical if you for example want to change the layout of your pages.
Storing the content in a database is a much better solution. If that's not possible perhaps the html files could contain only the formatted text content and a back end script could embed that content to a layout when the page is requested.
In both cases all of the requests would be routed through the back end script. This requires that the mod_rewrite module is enabled in the Apache configuration.
The .htaccess might look something like this:
RewriteEngine on
RewriteRule ^category/([^/.]+)/([^/.]+)/?$ index.php?category=$1&page=$2 [L]
This part of the regex: ([^/.]+) matches and captures a string that doesn't contain the characters / or . and is one or more characters long. The captured strings can be referenced with $1, $2 and so on.
Now the pretty urls like mysite.com/category/foo/bar work. In addition we need to define a rule that redirects the old urls ending in ".html". The rule required might look something like this:
RewriteRule ^category/([^/.]+)/([^/.]+)\.html$ /category/$1/$2 [R=301,L]
One thing to remember while testing and adjusting the redirects is that the redirect may get cached in the browser which may lead to confusing results when testing.
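A quick way to check what these patterns accept, without touching the server or the browser cache, is to try them in a regex engine. The Python sketch below is only a model of the matching step. It also shows why the dot in front of html is normally escaped: an unescaped dot matches any character.

```python
import re

PRETTY = re.compile(r"^category/([^/.]+)/([^/.]+)/?$")
OLD_ESCAPED = re.compile(r"^category/([^/.]+)/([^/.]+)\.html$")
OLD_UNESCAPED = re.compile(r"^category/([^/.]+)/([^/.]+).html$")

# The pretty URL matches and yields both captures.
print(PRETTY.match("category/foo/bar").groups())            # ('foo', 'bar')
# The old URL does not match the pretty pattern, because [^/.] excludes dots.
print(PRETTY.match("category/foo/bar.html"))                # None
# The old URL matches the redirect pattern.
print(OLD_ESCAPED.match("category/foo/bar.html").groups())  # ('foo', 'bar')
# With an unescaped dot, a name that merely ends in "xhtml" also matches.
print(bool(OLD_UNESCAPED.match("category/foo/7xhtml")))     # True
print(bool(OLD_ESCAPED.match("category/foo/7xhtml")))       # False
```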
To remove the .html extension on the URL and 301 redirect to the extensionless URL you can try the following in the .htaccess in your "specific directory":
RewriteEngine On
RewriteBase /specific-directory
RewriteRule ^(.*)\.html$ $1 [R=301,L]

How to redirect from specific subdirectory to a subdomain via .htaccess?

I've been trying to redirect this URL (and all its substructures):
http://example.com/archive/
to (and its corresponding substructures):
http://archive.example.com/
For example: http://example.com/archive/signature/logo.png ==> http://archive.example.com/signature/logo.png
I tried to generate an .htaccess rule using a generator and evaluating it by looking at the regex, which I can understand (I think).
The result was the following rule:
RewriteEngine On
RewriteRule http://example.com/archive/(.*) http://archive.example.com/$1 [R=301,L]
The way I see it, the server will process any URL that starts with http://example.com/archive/ , will capture the string that comes next, and will replace the whole initial portion with the subdomain structure and append the captured string.
Unfortunately, this doesn't seem to work either on my server or on online testing tools such as: http://htaccess.madewithlove.be/
Is there anything I'm missing there?
Thank you!
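A RewriteRule pattern is matched only against the URL path, never against the scheme or host name, and in an .htaccess file the leading slash is stripped as well. You should be able to try it this way.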
RewriteEngine On
RewriteRule ^archive/(.*)$ http://archive.example.com/$1 [R=301,L]
Note that I did not make it dynamic, as you didn't specify whether you will have more URLs that need to work this way.
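The difference between the generated rule and the working one comes down to what string the pattern is compared with. The Python sketch below models that under the assumption of an .htaccess file in the document root, where the pattern sees the path without its leading slash; the function name `redirect_target` is hypothetical.

```python
import re
from urllib.parse import urlsplit

WORKING = re.compile(r"^archive/(.*)$")
GENERATED = re.compile(r"http://example.com/archive/(.*)")

def redirect_target(url):
    """Return the redirect URL for a request, or None if the rule does not match."""
    # Only the path is handed to the pattern; scheme and host are not part of it.
    path = urlsplit(url).path.lstrip("/")
    m = WORKING.match(path)
    return f"http://archive.example.com/{m.group(1)}" if m else None

request = "http://example.com/archive/signature/logo.png"
print(redirect_target(request))  # http://archive.example.com/signature/logo.png
# The generated pattern never matches, because the path contains no host name.
print(GENERATED.search(urlsplit(request).path.lstrip("/")))  # None
```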

How to define url placeholder $1 for htaccess regex

I wrote the following in htaccess in the process of learning:
RewriteRule ^/test(a-zA-z)\.htm$ /test$1.htm
And test2.htm still gets mapped to test1.htm
I'm assuming the $1 is not being treated as the variable placeholder properly because $ is not escaped. What is the right way of writing this (so that, for test purposes, test2.htm gets mapped to itself, test2.htm)?
Ultimately, I'm trying to map something like:
domain.com/$1/$2
to
domain.com/?a=$1&b=$2
or
domain.com/$1
to
domain.com/?a=$1
I do not want the URL of the browser to change when the first url is mapped to the second. I know this is possible in C# Global.asax file (using routes.MapRoute), but not sure how to get this happening in php.
Proceed by elimination, from the most specific rule to the least specific.
First handle the two-parameter URL, with the QSA flag (important) to keep any existing GET variables and the L flag to stop processing further rules.
Then handle the one-parameter URL, again with the QSA and L flags.
That should work:
RewriteRule ^/?([a-zA-Z0-9]+)/([a-zA-Z0-9]+)$ /?a=$1&b=$2 [QSA,L]
RewriteRule ^/?([a-zA-Z0-9]+)$ /?a=$1 [QSA,L]
Oh by the way:
Here's the wiki of serverfault.com
The howto's htaccess official guide
The official mod_rewrite guide
And if that's not enough:
Two hints:
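If you're not in a hosted environment (= if it's your own server and you can modify the virtual hosts, not only the .htaccess files), try to use the RewriteLog directive, which helps you track down such problems. Note that RewriteLog and RewriteLogLevel exist only up to Apache 2.2; Apache 2.4 replaced them with the LogLevel directive, for example LogLevel alert rewrite:trace3.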
# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
My favorite tool to check for regexp:
http://www.quanetic.com/Regex (don't forget to choose ereg(POSIX) instead of preg(PCRE)!)
When you want to write something like a range, you should use []. e.g.:
RewriteRule ^/?test([a-zA-Z0-9]+)\.htm$ /index.php?data=$1 [L]
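One detail worth checking in character classes like the ones in this thread: the range A-z is not the same as A-Z. In ASCII there are six punctuation characters between Z and a, and A-z silently includes them. A short Python check, used here only as a convenient regex engine:

```python
import re

LOOSE = re.compile(r"^[a-zA-z0-9]+$")   # A-z spans more than the letters
STRICT = re.compile(r"^[a-zA-Z0-9]+$")  # letters and digits only

# The ASCII characters that sit between 'Z' and 'a'.
extra = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(extra)  # ['[', '\\', ']', '^', '_', '`']

print(bool(LOOSE.match("foo_bar")))   # True: the underscore slips through
print(bool(STRICT.match("foo_bar")))  # False
```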
For me this was the simplest article I found. It really helped me figure out what I needed and it worked in the end, so I'll share it here and try to use the terms which make sense to me.
http://www.workingwith.me.uk/articles/scripting/mod_rewrite
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]
The right hand side (index.php?page=$1) is the destination which is hidden from the browser.
The left hand side is the mapping rule.
The variable parsed from the left - $1 need not be right at the end of the string and can be anywhere in the middle, for example, /CHECK/?VAR=$1MORETEXT or if there are more variables to parse from the left, it could be "/CHECK/?VAR=$1MORETEXT$2".
The "/?" makes a trailing slash optional. If you don't want URLs with a "/" at the end to match, leave it out and just end with the $, like ^page/([^/\.]+)$
The [L] is useful because it stops the htaccess from wasting time reading onwards once a matching Rule is found.

How to remove the query part of the rewritten URL after it has been remotely redirected?

Either I am too tired to see what I am doing wrong or there is something important I am missing here.
Basically I have a simple set of rewrite rules which are used in conjunction with a central dispatcher file (index.php) to handle requests coming for HTML, CSS and JavaScript files separately and they look like this.
RewriteEngine on
RewriteRule (.+)\.html$ index.php?action=view&url=$1.html [L]
RewriteRule (.+)\.css$ index.php?action=resource&type=css&url=$1.css [L]
RewriteRule (.+)\.js$ index.php?action=resource&type=js&url=$1.js [L]
Long story short, these rules work fine. However, I've been notified by the SEO agency responsible for the site that there is an error in one of the URLs, which needs to be permanently redirected (301) to the correct link. Since it's just one URL that requires redirecting, I have chosen to use Redirect instead of URL rewriting and added the following rule.
Redirect 301 /page1.html /page2.html
This works well too, except that after the remote redirection is done for page1.html I get the query part (?action=view&url=page2.html) displayed in the browser's address bar. I understand that the HTML rewriting rule simply added the query string part after it was done with the URL, but what would I need to do to get rid of the query part after a remote 301 redirection is performed?
Just to add I tried the URL rewrite method too but it seems that whatever I do the L flag is simply ignored and the HTML rewrite rule is still executed.
RewriteRule ^page1\.html$ page2.html [L,R=301]
That's a rewrite redirect and should cut off the query string. Put it before your other 3 rules, otherwise it will be ignored.
I don't know how much the solution may change with the web-server and the web-server version, but what worked for me was "When you want to erase an existing query string, end the substitution string with just a question mark".
See "Modifying the Query String" at http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule (Apache v2.4)
So,
RewriteRule ^page1\.html$ page2.html? [L,R=301]
The R flag is needed for the new URI to be shown instead of the original one with the query string. But even without the R flag, the query string will not be passed.
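The rule from the documentation can be summarized as a tiny model: without a question mark in the substitution the original query string passes through, and with one the substitution's own query (possibly empty) replaces it. The Python function below is a hypothetical illustration of that rule for the case without the QSA flag, not a reimplementation of mod_rewrite.

```python
def resulting_query(substitution, original_query):
    """Model of the documented query-string rule, without the QSA flag."""
    if "?" not in substitution:
        # No "?" in the substitution: the original query passes through.
        return original_query
    # "?" present: whatever follows it (possibly nothing) replaces the query.
    return substitution.split("?", 1)[1]

print(resulting_query("page2.html", "action=view&url=page2.html"))
print(repr(resulting_query("page2.html?", "action=view&url=page2.html")))
print(resulting_query("index.php?page=1", "x=2"))
```

The first call keeps the old query, the second returns an empty string, and the third returns `page=1`.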

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused about how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
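The split between RewriteRule and RewriteCond in the rules above follows from how a request is divided: the rule pattern sees only the path, and the query string has to be tested separately. The Python sketch below models that division; the function name `old_to_pretty` is hypothetical, and the leading slash is stripped on the assumption of a per-directory (.htaccess) context.

```python
import re
from urllib.parse import urlsplit

def old_to_pretty(url):
    """Return the pretty path for an old-style URL, or None if it does not apply."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    cond = re.search(r"jid=([0-9]+)", parts.query)  # models RewriteCond %{QUERY_STRING}
    rule = re.match(r"^job_wanted\.php$", path)      # models the RewriteRule pattern
    if cond and rule:
        return f"/job/{cond.group(1)}"               # %1 is the capture from the condition
    return None

url = "http://host.name/job_wanted.php?jid=42"
print(old_to_pretty(url))  # /job/42
# A pattern that expects the query string inside the path never matches.
print(re.match(r"^job_wanted\.php\?jid=([0-9]+)$", urlsplit(url).path.lstrip("/")))  # None
```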
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since they will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.