htaccess regex variable parameter - regex

I'm not used to regex and figure I've lost too many hours trying to resolve this, so thought I'd ask for help. I am trying to prettify the html extension.
My site will use URLs that have variable parameters. For example:
mysite.com/article/this-is-an-entry
mysite.com/article/this-is-an-entirely-different-entry
All will use .html as the extension.
In the htaccess file, I have tried
RewriteRule ^(article\/[a-z].*)$ $1.html [NC,L]
as well as slight variations of this, but cannot get this right. Thanks in advance for any assistance.

Firstly, let's look at the regex you have:
^(article/[a-z].*)$
This matches exactly the string "article/", followed by exactly one letter (case insensitive due to the NC flag), followed by zero or more of any character. It's quite broad, but should match the examples you gave.
One way to test that it's matching is to add the R=temp flag to the rule, which tells Apache to redirect the browser to the new URL (I recommend using "=temp" to stop the browser caching the redirect and making later testing harder). You can then observe (e.g. in your browser's F12 debug console) the original request and the redirected one.
RewriteRule ^(article/[a-z].*)$ $1.html [NC,L,R=temp]
However, as CBroe points out, your rule will match again on the target URL, so you need to prevent that. A simple way would be to use the END flag instead of L:
Using the [END] flag terminates not only the current round of rewrite processing (like [L]) but also prevents any subsequent rewrite processing from occurring in per-directory (htaccess) context.
So:
RewriteRule ^(article/[a-z].*)$ $1.html [NC,END]
Alternatively, you can make your pattern stricter, such as changing the . ("anything") to [^.] ("anything other than a dot"):
^(article/[a-z][^.]*)$
To be even more specific, you can add a RewriteCond with an extra pattern to not apply the rule to, such as "anything ending .html".
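A minimal sketch of that RewriteCond variant, assuming the rule lives in the .htaccess file at the document root; the condition shown is one possible choice, not the only one:

```apache
RewriteEngine On

# Skip the rule when the requested path already ends in .html,
# so the rewritten URL is not rewritten a second time.
RewriteCond %{REQUEST_URI} !\.html$
RewriteRule ^(article/[a-z].*)$ $1.html [NC,L]
```

A RewriteCond applies only to the RewriteRule that immediately follows it, so it has to be repeated for any further rules that need the same guard.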

Rewrite in .htaccess (URL with parameter)

At the moment I'm not really good at understanding how URL rewriting in htaccess works (with regex).
I want to have following Url:
example.org/lyrik/kategorien/allgemein-1/
It should go to the following URL, with the number at the end as a parameter:
example.org/index.php?seite=kategorienDetail&kategorienId=1
I have tried different ways but none of them work. The last one I tried was:
RewriteRule ^lyrik/kategorien/allgemein-([0-9]+)\$ index.php?seite=kategorienDetail&kategorienId=$1 [L}
This one caused problems on the whole website: every page was inaccessible.
Can someone show me a Rewrite Rule that works in my case and explain the RegEx in it?
I think you have some typos in your rule. Try it this way.
RewriteRule ^lyrik/kategorien/allgemein-([0-9]+)/?$ index.php?seite=kategorienDetail&kategorienId=$1 [L]
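Note that I used the ? after the / so that the trailing slash is optional. You had a backslash before the $, which turns it into a literal dollar sign instead of the end-of-string anchor. Also, at the end of your rule you had a curly brace instead of the closing bracket.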
EDIT
Based on your comment, you can use this if part of the URI is not constant.
RewriteRule ^lyrik/kategorien/([^/]+)-([0-9]+)/?$ index.php?seite=kategorienDetail&kategorienId=$2 [L]
Also, [0-9]+ means one or more digits: [0-9] matches any single digit from 0 to 9, and the plus sign means one or more occurrences, so the group can't be empty. By contrast, * means zero or more occurrences, so it also matches an empty string. Because your link always ends with a number, restricting the group to [0-9] ensures the rule won't match if any other character appears there in the URL.
You can see more about regex here.
http://www.regular-expressions.info/tutorial.html
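For reference, here is the second rule again with each part of the pattern annotated. The example URL is the one from the question, and the captured values in the comments are what the pattern should yield for it:

```apache
# ^lyrik/kategorien/   literal prefix, anchored at the start of the path
# ([^/]+)              $1: one or more characters that are not a slash (the category name)
# -                    the last literal dash before the number
# ([0-9]+)             $2: one or more digits (the category id)
# /?$                  optional trailing slash, then end of the path
#
# lyrik/kategorien/allgemein-1/  gives  $1 = "allgemein", $2 = "1"
RewriteRule ^lyrik/kategorien/([^/]+)-([0-9]+)/?$ index.php?seite=kategorienDetail&kategorienId=$2 [L]
```

Only $2 is used in the substitution; $1 is captured so that any category name is accepted.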
Please try this:
RewriteRule ^lyrik/kategorien/allgemein-(.*?)/$ index.php?seite=kategorienDetail&kategorienId=$1 [QSA,L]
I am writing from the top of my head, but it should work. Let me know if it doesn't, but it should :)
Edit: Please add the QSA flag also to pass any other query parameters you might have.
Alex

How to define url placeholder $1 for htaccess regex

I wrote the following in htaccess in the process of learning:
RewriteRule ^/test(a-zA-z)\.htm$ /test$1.htm
And test2.htm still gets mapped to test1.htm
I'm assuming the $1 is not being treated as the variable placeholder properly because $ is not escaped. What is the right way of writing this (so that, for test purposes, test2.htm gets mapped to itself, test2.htm)?
Ultimately, I'm trying to map something like:
domain.com/$1/$2
to
domain.com/?a=$1&b=$2
or
domain.com/$1
to
domain.com/?a=$1
I do not want the URL of the browser to change when the first url is mapped to the second. I know this is possible in C# Global.asax file (using routes.MapRoute), but not sure how to get this happening in php.
Proceed by elimination, from the most complex case to the least complex.
Handle the two-parameter case first, with the QSA flag (important) to keep any existing GET variables and the L flag to stop processing further rules;
then handle the one-parameter case, again with QSA (important) to keep the GET variables and L to stop.
That should work:
# The /? makes the leading slash optional: in .htaccess the pattern sees the path without it
RewriteRule ^/?([a-zA-Z0-9]+)/([a-zA-Z0-9]+)$ /?a=$1&b=$2 [QSA,L]
RewriteRule ^/?([a-zA-Z0-9]+)$ /?a=$1 [QSA,L]
Oh by the way:
Here's the wiki of serverfault.com
The howto's htaccess official guide
The official mod_rewrite guide
And if that's not enough:
Two hints:
If you're not in a hosted environment (= if it's your own server and you can modify the virtual hosts, not only the .htaccess files), try to use the RewriteLog directive: it helps you to track down such problems:
# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
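Note that RewriteLog and RewriteLogLevel only exist up to Apache 2.2. On Apache 2.4 rewrite tracing is configured through LogLevel and written to the regular error log. A rough equivalent, again for the server or virtual host configuration rather than .htaccess (the trace level is just a suggested starting point):

```apache
# Apache 2.4+: rewrite tracing goes to the ErrorLog.
# Levels run from trace1 to trace8; higher levels get verbose quickly,
# so remove this in production environments.
LogLevel alert rewrite:trace3
RewriteEngine On
```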
My favorite tool to check for regexp:
http://www.quanetic.com/Regex (don't forget to choose ereg(POSIX) instead of preg(PCRE)!)
When you want to write a range of characters, you should use [], e.g.:
RewriteRule ^/?test([a-zA-Z0-9]+)\.htm$ /index.php?data=$1 [L]
For me this was the simplest article I found which really helped me to figure out what I needed and worked in the end, so I'll share it here and try to use the terms which makes sense to me.
http://www.workingwith.me.uk/articles/scripting/mod_rewrite
RewriteRule ^page/([^/\.]+)/?$ index.php?page=$1 [L]
The right hand side (index.php?page=$1) is the destination which is hidden from the browser.
The left hand side is the mapping rule.
The variable parsed from the left - $1 need not be right at the end of the string and can be anywhere in the middle, for example, /CHECK/?VAR=$1MORETEXT or if there are more variables to parse from the left, it could be "/CHECK/?VAR=$1MORETEXT$2".
The "/?" makes a trailing slash optional in the requested URL. If URLs with a "/" at the end should not match, leave it out and just end with the $, like ^page/([^/\.]+)$
The [L] is useful because it stops the htaccess from wasting time reading onwards once a matching Rule is found.

Problem with htaccess GET form variables in a rewritten url

Essentially my problem is thus; I have a MVC system that redirects all requests to index.php on my site. I have a rewrite rule in my htaccess file to handle those requests like so:
RewriteRule ^([a-zAZ\_\-]+)\/([a-zA-Z\_\-]+)\/([^\/?]*) /?module=$1&class=$2&event=$3
Which translates urls into these type of urls
http://example.com/users/login/
http://example.com/users/info/me
My problem is that I also want GET variables to be applied and used in the URL like so
http://example.com/users/login/?var1=val1&var2=val2
http://example.com/users/info/me?var1=val2...
I've written two different regexes that work perfectly well in my workbench (Expresso), and I've tested them out in PHP; however, they refuse to work in htaccess. They're not particularly complex. I have tried:
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)[\?]*(.*) /?module=$1&class=$2&event=$3&$4
and
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)(?(?=\?)\?(.+)) /?module=$1&class=$2&event=$3&$4
Neither of these works and I'm racking my brains as to why. Essentially it just doesn't recognise the fourth group and returns nothing. I thought it might have been due to it being next to an ampersand, but I did &var=$4 as a test and it still fell over.
Any help with this would be greatly appreciated as this is driving me insane.
Thanks in advance,
Rupert S.
After all, this is what you need:
RewriteRule ^([a-z_-]+)/([a-z_-]+)/([^/?]*) /?module=$1&class=$2&event=$3 [QSA,NC,L]
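The RewriteRule pattern is matched against the URL path only; the query string is never part of it, which is why your fourth group always came back empty.
[QSA] will append the additional GET parameters to the rewritten query string.
[NC] makes the match case insensitive, so there is no need for A-Z in the character classes.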
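As a rough illustration of what QSA does with the rule above, using the URLs from the question. The resulting query strings in the comments are what I would expect from the documented behaviour, not captured output:

```apache
RewriteEngine On

# The pattern only sees the path (e.g. "users/info/me"), never the "?var1=..." part.
# QSA appends the original query string to the one built in the substitution:
#   /users/login/?var1=val1&var2=val2  ->  /?module=users&class=login&event=&var1=val1&var2=val2
#   /users/info/me?var1=val2           ->  /?module=users&class=info&event=me&var1=val2
RewriteRule ^([a-z_-]+)/([a-z_-]+)/([^/?]*) /?module=$1&class=$2&event=$3 [QSA,NC,L]
```

Without QSA, a substitution that contains its own query string replaces the original one, so var1 and var2 would be lost.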

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID, they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused about how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
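If you do want both directions in a single .htaccess file, one common way to avoid the loop is to test %{THE_REQUEST}, which holds the original request line and is not changed by internal rewrites. A sketch, assuming the file sits in the document root next to job_wanted.php and the pretty URLs live under /job/ (adjust both for a site in a subdirectory):

```apache
RewriteEngine On

# 1) Old URL requested by the client -> external redirect to the pretty URL.
#    THE_REQUEST looks like "GET /job_wanted.php?jid=9 HTTP/1.1" and keeps
#    that value even after rule 2 has rewritten the path internally.
RewriteCond %{THE_REQUEST} \s/job_wanted\.php\?jid=([0-9]+)\s
RewriteRule ^job_wanted\.php$ /job/%1? [R=302,L]

# 2) Pretty URL -> internal rewrite to the real script (address bar unchanged).
RewriteRule ^job/([0-9]+)$ job_wanted.php?jid=$1 [L]
```

The 302 keeps browsers from caching the redirect while testing; it can be switched to 301 once the rules behave as intended.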
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since that will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.

mod_rewrite: replace underscores with dashes

I'm revealing my embarrassing ignorance of REGEX-fu here, but: I currently have a website where a load of the articles' URLs are written as "article_name", whilst the newer ones are written as "article-name".
I want to move all of them to using dashes, so is there a regular expression I could use to rewrite the older URLs to their newer equivalents?
Thanking you in advance!
First you must achieve consistency in the existing URLs. Basically, you have to normalize all existing names to always use dashes. Ok, you've done that.
We're starting with the following assumption:
The URL is roughly of the form:
http://example.com/articles/what-ever/really-doesnt_matter/faulty_article_name
where only URLs under /articles should be rewritten, and only the /faulty_article_name part needs to be sanitized.
Greatly updated, with something that actually works
For Apache:
RewriteEngine On
RewriteRule ^(/?articles/.*/[^/]*?)_([^/]*?_[^/]*)$ $1-$2 [N]
RewriteRule ^(/?articles/.*/[^/]*?)_([^/_]*)$ $1-$2 [R=301]
That's generally inspired by GApple's answer.
The first /? ensures that this code will run on both vhost confs and .htaccess files. The latter does not expect a leading slash.
I then add the articles/ part to ensure that the rules only apply for URLs within /articles.
Then, while we have at least two underscores in the URL, we keep looping through the rules. When we end up with only one remaining underscore, the second rule kicks in, replaces it with a dash, and does a permanent redirect.
Phew.
Try this:
RewriteRule ^([^_]*)_([^_]*_.*) $1-$2 [N]
RewriteRule ^([^_]*)_([^_]*)$ /$1-$2 [L,R=301]
The first rule replaces one underscore at a time until one or fewer is left. The last rule then replaces the last underscore and does an external redirect.
A potential different approach to think about:
I'm assuming that your "old format" and your "new format" will be in different directories for this idea, if they aren't you might want to consider making the new format have a different directory name.
For instance:
http://site.com/articles/2008/12/31/new_years_celebration
http://site.com/article/2008/12/31/new-years-celebration
In which case you could use mod_rewrite to detect anything in the "old directory" and redirect it to a "redirector.php".
Although on second thought, your mod_rewrite could look for something like this:
RewriteRule ^/?articles/(.*_.*)$ /redirector.php?article=$1 [L]
Matching anything with a _ and sending it through the redirector.
Inside of redirector.php you can get the $_SERVER['REQUEST_URI'] and use tools like preg_replace and even database queries to find the correct url to redirect them to - as well as study the number of hits to old urls.
How will mod_rewrite know what the actual URL is supposed to be? You can rewrite all articles to use the underscore or the dash, but there is no way for mod_rewrite to tell whether the new location exists.
For example,
/I_Like_Bees is stored as /path/i_like_bees
/I-like-flowers is stored as /path/i-like-flowers
You want i-like-bees to rewrite to i_like_bees.
If you rewrite underscores to dashes, i_like_bees wouldn't be found
if you rewrite dashes to underscores i-like-flowers wouldn't be found
If you stored all your articles consistently, you could easily make a rewrite rule work. Instead, you probably have to write a script that checks whether the directory exists and does a 301 redirect to the correct place.
Here's a method: http://yoast.com/apache-rewrite-dash-underscore/
Basically it separates the URL into tokens on either side of the underscore, and rewrites the tokens again with the underscore replaced. The problem is that it only replaces a single underscore at a time; it will redirect to a closer but not quite correct URL, which will again redirect to an even closer, but possibly still not correct, URL...
It suggests fixing the multiple redirects by having several rewrite conditions & rules with successively more underscores and tokens, but this would require as many conditions and rules as you have underscores in your longest title.
However, make sure to add any qualifiers you can, because as written the rule may replace paths you don't want changed (e.g., image files).
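A sketch of that "several rules" idea, not taken from the linked article. It assumes the file sits in the document root, that only URLs under articles/ should be touched, and that titles rarely contain more than three underscores:

```apache
RewriteEngine On

# Rules with the most underscores come first, so common cases need a single redirect.
# A title with more than three underscores still converges, just with extra redirects.
RewriteRule ^articles/([^_]*)_([^_]*)_([^_]*)_(.*)$ /articles/$1-$2-$3-$4 [R=301,L]
RewriteRule ^articles/([^_]*)_([^_]*)_(.*)$ /articles/$1-$2-$3 [R=301,L]
RewriteRule ^articles/([^_]*)_(.*)$ /articles/$1-$2 [R=301,L]
```

The articles/ prefix acts as the qualifier mentioned above: paths outside it, such as image files, are left alone.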