Looking for some URL rewriting help - regex

My question is a simple one, but I can't seem to find the answer. I'm currently working on some URL rewriting for a website, but I have encountered a problem. Currently the most basic rule I have goes something like this:
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/?$ index.php?mod=$1&com=$2
This works in most cases, and I have some special cases for where this doesn't apply, however one of the pages needs to pass a lot of information through the URL, and I want to automatically rewrite this. Some examples:
website.com/asdf/jkl/id/5/page/2 should become website.com/index.php?mod=asdf&com=jkl&id=5&page=2
website.com/qwer/yuio/search/keyword/sort/alpha should become website.com/index.php?mod=qwer&com=yuio&search=keyword&sort=alpha
Is this possible? I could really use some help here... Thanks! :)

Depending on what language/framework you're using, it may be simpler to put the rewriting/dispatch in the script rather than attempt to do everything in mod_rewrite.
For example, if you were using PHP, given the URL:
http://www.example.com/asdf.php/jkl/id/5/page/2
a script at asdf.php could read the PATH_INFO variable, split it on slashes, and write the values into the expected place in $_REQUEST.
Then all you need is one simple rewrite rule to elide the ‘.php’.
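The splitting step described above can be sketched as follows. This is only an illustration of the logic, written in Python rather than PHP; the function name path_info_to_params and the choice to store the first segment under "com" are assumptions, not part of the original answer.

```python
def path_info_to_params(path_info):
    """Split a PATH_INFO string such as '/jkl/id/5/page/2' into parameters."""
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [s for s in path_info.split("/") if s]
    if not segments:
        return {}
    # Assumption: the first segment is the "com" value from the question.
    params = {"com": segments[0]}
    rest = segments[1:]
    # Pair the remaining segments as key/value; a trailing key without a
    # value is silently ignored.
    for key, value in zip(rest[0::2], rest[1::2]):
        params[key] = value
    return params


print(path_info_to_params("/jkl/id/5/page/2"))
```

In PHP the same pairs would be written into $_REQUEST (or $_GET) before the rest of the script runs.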

You could use a recursive rule:
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/?$ index.php?mod=$1&com=$2 [L,QSA]
RewriteRule ^([^/]+/[^/]+)/([^/]+)/([^/]+)/?(.*) $1/$4?$2=$3 [QSA]
But it sure would be easier to parse the request path with PHP.
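To see what the recursive rule does on each pass, here is a rough Python simulation. It is a simplification, not Apache itself: it assumes per-directory (.htaccess) behaviour where rewriting restarts after every change, it assumes the final rule assigns com=$2, and the helper names append_query and simulate_rewrite are made up for this sketch.

```python
import re

# The two patterns from the rules above.
PAIR_RULE = re.compile(r"^([^/]+/[^/]+)/([^/]+)/([^/]+)/?(.*)")
FINAL_RULE = re.compile(r"^([a-zA-Z]+)/([a-zA-Z]+)/?$")


def append_query(new, old):
    # Mimics [QSA]: the new query string comes first, the old one is appended.
    return new + "&" + old if old else new


def simulate_rewrite(path, query=""):
    # Each pass peels one key/value pair off the path and moves it
    # into the query string, until only mod/com is left.
    while True:
        m = PAIR_RULE.match(path)
        if not m:
            break
        path = m.group(1) + "/" + m.group(4)
        query = append_query(m.group(2) + "=" + m.group(3), query)
    # The two-segment rule then maps what remains onto index.php.
    m = FINAL_RULE.match(path)
    if m:
        path = "index.php"
        query = append_query("mod=%s&com=%s" % (m.group(1), m.group(2)), query)
    return path, query


print(simulate_rewrite("asdf/jkl/id/5/page/2"))
```

Note that the pairs end up in reverse order in the query string (page before id), which makes no difference to PHP.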

Related

How to remove everything after a word in apache url with rewrite

I need some help with regex and an Apache rewrite. I have yet to comprehend regex patterns. I see a lot of previous questions about doing something similar in PHP (which I know a lot about), but I can't seem to convert it to a rewrite rule.
I used to use Smugmug for my photography website, then I made my own. Smugmug added some random letters at the end of each URL to make it more "private", like this: "Blog/Jenny-Easter-Pictures/i-nbRcmgv"
I kept the same folder structure, except for the random characters. Because of that, Google and possibly my old clients are getting a not found error, since the server is looking for an "i-nbRcmgv" folder.
I want Apache to remove everything after (and including) the "/i-" in the URL. The match needs to include the slash, in case I ever have a folder name that contains "i-".
It sounds like it might be easy, but as I said, regex baffles me, and I don't use it enough to learn it all AND remember it for the next time.
I already have something like this in my conf, so I have a general idea of what to do, but I copied it from someone else's answered question on this site.
RewriteRule ^/downloads/(.*)/$ /download.php?DIR=$1 [QSA]
You need something like
RewriteEngine on
RewriteRule ^(.*)/i-.* $1 [QSD]
Remove [QSD] if you want to preserve the query part after the "?" in the URL (e.g. Blog/Jenny-Easter-Pictures/i-nbRcmg?a=b should become Blog/Jenny-Easter-Pictures?a=b).
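If you want to check what the pattern captures before putting it in your conf, the same regex can be tried in Python. This is just a sketch of the matching step; the function name strip_smugmug_suffix and the "Hawaii-Trip" and "i-love-cats" folder names are invented for illustration.

```python
import re

# Same pattern as in the RewriteRule above.
SUFFIX_RULE = re.compile(r"^(.*)/i-.*")


def strip_smugmug_suffix(path):
    # Return what $1 would hold, or the path unchanged if the rule does not match.
    m = SUFFIX_RULE.match(path)
    return m.group(1) if m else path


print(strip_smugmug_suffix("Blog/Jenny-Easter-Pictures/i-nbRcmgv"))
print(strip_smugmug_suffix("Blog/Hawaii-Trip"))
print(strip_smugmug_suffix("Blog/i-love-cats"))
```

A folder that merely contains "i-" is left alone because the slash is part of the pattern, but a folder whose name starts with "i-" would still be cut off.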

Speed implications of using Redirect vs RewriteRule

I'm curious to know if there is any difference in speed between RewriteRules and Redirect within a .htaccess rule on Apache.
To my mind, RewriteRules can often be complex regex expressions which I assume have overhead (even if it's incredibly small) compared to Redirect that would be simple string matching(?)
So, if I had:
RewriteRule ^mytestpage\.html$ http://www.google.com [R=301,QSA]
vs
Redirect 301 /mytestpage.html http://www.google.com/
I'm probably never going to notice a difference, but what if I had 1000 unique redirects? Or 10,000? Would it be advantageous to use one over the other?
The speed implications of using either are negligible and you won't notice a difference. That being said, you should use the right tool for the job.
For a simple redirect, you should use Redirect instead of mod_rewrite. Your example is something a Redirect should take care of. When you need to start doing more complex things, you can think about using mod_rewrite.
Even with 1000 or 10,000 redirects you're not really going to notice a big difference. However, it will use more RAM, probably a few MB if that.
So to answer your question, it really won't have a real impact, but use the right tool for the job.
This should help.
When not to use mod_rewrite
As PanamaJack mentioned the link, Apache Docs say that:
mod_alias provides the Redirect and RedirectMatch directives, which provide a
means to redirect one URL to another. This kind of simple redirection of one
URL, or a class of URLs, to somewhere else, should be accomplished using these
directives rather than RewriteRule
So, I understand it like this: for better speed, use Redirect/RedirectMatch (mod_alias) rather than RewriteRule (mod_rewrite).

Target and rewrite one url ignoring any others, with the issue being that the others partially match it

I've exhausted first my patience, then my energy, and finally my sanity trying to make a rewrite redirect for a homepage of this kind:
http://www.mydomain.com/index.php
to
http://www.mydomain.com/
I've indeed done that successfully with something like:
RewriteRule ^/index\.php$ / [R=301,L]
However, the issue is that the rest of the URLs look like this:
http://www.mydomain.com/index.php?page=this-is-a-page-title
This makes the previous rewrite break them, turning them into:
http://www.mydomain.com/?page=this-is-a-page-title
I have not managed to properly write a RewriteCond to exclude all URLs containing the string
index.php?page=
Or, preferably, something to directly target the homepage only. This led me to look for something to put after the URL, like
http://www.mydomain.com/index.php(here)
that would tell mod_rewrite not to match any URL longer than that, but I could not work out how to do it, either with a RewriteRule alone or with a negated (!) RewriteCond on REQUEST_URI. In fact, I found a post here where people stated that regex is meant to match something and is not at all efficient at matching nothing. I've tried nearly a hundred variations, reworking them to the best of my knowledge; most did not work, some broke URLs outright, and some even caused 500 errors.
You decide whatever you feel is the more efficient solution.
I've learnt more than the almost nothing I knew about regex and mod_rewrite, but I think I'm past the point where I should ask for some help. I've tried for two days and read a lot, here and there and in the Apache docs, and I have the feeling I've come close to a working rule, probably with some mistake in punctuation. I'm also interested in understanding whatever is shown here, to fix what I'm sure is a misconception on my part, either about regex or about how to lay out a RewriteRule or RewriteCond like this in the correct order.
You could use RewriteCond to condition the following rewrite rule. Something like this:
RewriteCond %{QUERY_STRING} =""
RewriteRule ^/index\.php$ / [R=301,L]
The condition here is that the query string is empty, which seems to be what you want.
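The decision those two lines make can be sketched in Python like this. It only illustrates the logic (path must be exactly /index.php and the query string must be empty); the function name homepage_redirect is made up for the example.

```python
from urllib.parse import urlsplit


def homepage_redirect(url):
    """Return the redirect target, or None if the URL should be left alone."""
    parts = urlsplit(url)
    # RewriteCond: the query string must be empty.
    # RewriteRule: the path must be exactly /index.php.
    if parts.path == "/index.php" and parts.query == "":
        return parts.scheme + "://" + parts.netloc + "/"
    return None


print(homepage_redirect("http://www.mydomain.com/index.php"))
print(homepage_redirect("http://www.mydomain.com/index.php?page=this-is-a-page-title"))
```

The second URL is not redirected, because the rule pattern never sees the query string and the condition rejects any request that has one.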

Problem with htaccess GET form variables in a rewritten url

Essentially, my problem is this: I have an MVC system that redirects all requests to index.php on my site. I have a rewrite rule in my htaccess file to handle those requests, like so:
RewriteRule ^([a-zAZ\_\-]+)\/([a-zA-Z\_\-]+)\/([^\/?]*) /?module=$1&class=$2&event=$3
It translates these types of URLs:
http://example.com/users/login/
http://example.com/users/info/me
My problem is that I also want GET variables to be applied and used in the URL, like so:
http://example.com/users/login/?var1=val1&var2=val2
http://example.com/users/info/me?var1=val2...
I've written two different regexes that work perfectly well in my workbench (Expresso), and I've tested them in PHP; however, they refuse to work in htaccess. They're not particularly complex. I have tried:
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)[\?]*(.*) /?module=$1&class=$2&event=$3&$4
and
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)(?(?=\?)\?(.+)) /?module=$1&class=$2&event=$3&$4
Neither of these works and I'm racking my brains as to why. Essentially, it just doesn't recognise the fourth group and returns nothing. I thought it might have been due to it being next to an ampersand, but I tried &var=$4 as a test and it still fell over.
Any help with this would be greatly appreciated as this is driving me insane.
Thanks in advance,
Rupert S.
After all, this is what you need:
RewriteRule ^([a-z_-]+)/([a-z_-]+)/([^/?]*) /?module=$1&class=$2&event=$3 [QSA,NC,L]
[QSA] appends the additional GET parameters to the rewritten query string.
[NC] makes the match case-insensitive, so there is no need for A-Z in the character classes.
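As a rough illustration of what the two flags do, here is the same pattern in Python, with re.IGNORECASE standing in for [NC] and a manual merge standing in for [QSA]. This is a sketch of the behaviour, not Apache's implementation, and the function name rewrite_mvc is invented.

```python
import re

# Pattern from the rule above; IGNORECASE plays the role of [NC].
MVC_RULE = re.compile(r"^([a-z_-]+)/([a-z_-]+)/([^/?]*)", re.IGNORECASE)


def rewrite_mvc(path, query=""):
    # The pattern is matched against the path only; the original query
    # string arrives separately and is appended, as [QSA] would do.
    m = MVC_RULE.match(path)
    if not m:
        return None
    new_query = "module=%s&class=%s&event=%s" % m.groups()
    if query:
        new_query += "&" + query
    return "/?" + new_query


print(rewrite_mvc("users/login/", "var1=val1&var2=val2"))
print(rewrite_mvc("Users/info/me"))
```

This also shows why the fourth capture group in the question never received anything: the "?" and everything after it are not part of the string the pattern is tested against.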

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused about how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
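The split between path and query string can be illustrated in Python. In this sketch the regex passed to re.match plays the part of the RewriteRule pattern and the one passed to re.search plays the part of the RewriteCond; the function names old_to_pretty and naive_match are made up, and the leading slash is stripped to mimic per-directory context.

```python
import re
from urllib.parse import urlsplit


def old_to_pretty(url):
    """Mimic the cond+rule pair: return the pretty path or None."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    cond = re.search(r"jid=([0-9]+)", parts.query)   # RewriteCond on QUERY_STRING
    rule = re.match(r"^job_wanted\.php$", path)      # RewriteRule sees the path only
    if cond and rule:
        # %1 is the group captured by the condition; the trailing "?" in the
        # substitution drops the original query string.
        return "/job/" + cond.group(1)
    return None


def naive_match(url):
    # Trying to match the query string inside the rule pattern fails,
    # because the pattern is never tested against anything after the "?".
    path = urlsplit(url).path.lstrip("/")
    return re.match(r"^job_wanted\.php\?jid=([0-9]+)$", path)


print(old_to_pretty("http://localhost/job_wanted.php?jid=9"))
print(naive_match("http://localhost/job_wanted.php?jid=9"))
```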
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since they will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.