mod_rewrite: where does the search begin?

Recently I've been experimenting with Apache's mod_rewrite engine. The tutorials I've read gave me a pretty good picture of how to use its most basic and useful features, but there is still one question I couldn't find the answer to. I would expect it to be the very first thing explained, yet no tutorial has answered it so far.
I'm wondering exactly which part of the URL is considered when matching the regex.
Let's say I have a directory my_project on my server and a .htaccess file inside that directory. The browser should see the directory like this:
http://my_website.com/my_project
If I add a rule in .htaccess, which part of the above URL will be considered when trying to match the rule's regex? I understand regular expressions themselves quite well, but I can't figure out which chunk of the URL mod_rewrite picks to run the regex against.
If my question isn't clear enough, let me put it another way: which exact position in the above URL is matched by the following regex in .htaccess?
^
Yet another question, if I go to
http://my_website.com/my_project/subfolder
will the considered part of the URL be different, or does it always depend on where .htaccess is placed?

I figured it out.
To explain the problem and how I got to the answer I'll try to explain it step by step.
Let's assume the following:
.htaccess is placed in a folder my_project in the root path of www.my_website.com. .htaccess contains the following rule:
RewriteRule ^.*$ index.php?matched=$0
To avoid an endless loop, let's "fire" the rule only if we provide a test parameter in the query string, so the complete .htaccess should look like this:
RewriteEngine On
RewriteCond %{QUERY_STRING} test=1
RewriteRule ^.*$ index.php?matched=$0
Now, if everything goes as I thought we should end up in the index.php script placed in my_project folder. To see the whole match let's add the following line to the script:
var_dump($_GET["matched"]);
In the browser we go to http://my_website.com/my_project?test=1 and we expect the output to be:
string(32) "http://my_website.com/my_project"
But it is not! It is instead
string(0) ""
We're almost there. Now let's go to http://my_website.com/my_project/subfolder/?test=1. The output is
string(10) "subfolder/"
That proves one thing: when mod_rewrite compares the URL with the regular expression, it sees neither the protocol nor the host part of the URL. As my further research revealed, it also omits every folder above the .htaccess location, as well as the query string and the hash part of the URL (the query string is only reachable through RewriteCond %{QUERY_STRING}, as in the rule above, and the hash part is never sent to the server by the browser). For mod_rewrite, the URL begins where the .htaccess location begins.
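The behaviour observed above can be modelled in a few lines of Python. This is only a rough sketch of the stripping step, not how Apache implements it; the function name rewrite_rule_input and the htaccess_dir parameter are made up for illustration.

```python
from urllib.parse import urlsplit

def rewrite_rule_input(url, htaccess_dir):
    """Approximate the string a RewriteRule pattern in .htaccess is matched against.

    htaccess_dir is the URL path of the directory holding .htaccess,
    e.g. "/my_project" (hypothetical parameter, for illustration only).
    """
    # Scheme, host, query string and fragment never reach the pattern.
    path = urlsplit(url).path
    prefix = htaccess_dir.rstrip("/")
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError("URL is outside the directory that holds .htaccess")
    # Drop the directory prefix and the slash that follows it.
    return path[len(prefix):].lstrip("/")

print(repr(rewrite_rule_input(
    "http://my_website.com/my_project?test=1", "/my_project")))            # ''
print(repr(rewrite_rule_input(
    "http://my_website.com/my_project/subfolder/?test=1", "/my_project")))  # 'subfolder/'
```

The two printed values correspond to the two var_dump results shown earlier.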
I hope this self-answered question will be helpful for someone in the future.
Enjoy!

Let me give you a practical example
Suppose your website is www.example.com and it's located in a folder named 'ex'.
You place the .htaccess file in your ex folder so that it applies to your website www.example.com.
Now let's say you want to make this URL clean: www.example.com/ex/index.php?page=welcome
Open the .htaccess file that you placed in your ex folder and add the following code to it:
RewriteEngine On
RewriteRule ^([A-Za-z0-9-+_%*?]+)/?$ index.php?page=$1 [L]
With this rule, the clean URL www.example.com/ex/welcome is served internally by www.example.com/ex/index.php?page=welcome
Now let's say you move your website to a subfolder, i.e. from www.example.com/ex to www.example.com/ex/subfolder
Simply move the .htaccess file together with the rest of your site to that subfolder. There is no need to change the code; it will work the same.
([A-Za-z0-9-+_%*?]+) <-- the part within the parentheses is the captured part of the regular expression
It means you are looking for any character from A to Z, from a to z, any digit from 0 to 9, and the symbols -, +, _, %, * and ?. The + sign after the closing square bracket means one or more.
In short, ([in here]+) captures one or more of the characters listed inside the square brackets. If you remove the + after the bracket, the group matches exactly one character, so with the ^ and $ anchors the rule would only match one-character paths.
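To see the effect of the + quantifier, here is a small Python sketch that tests the pattern against a few paths as they would appear relative to the ex folder. Python's re module is used only as a stand-in for Apache's regex engine, and the helper name captured is hypothetical.

```python
import re

# Same character class as in the rule; the hyphen is moved to the end of the
# class so that it is unambiguously a literal "-".
WITH_PLUS = re.compile(r"^([A-Za-z0-9+_%*?-]+)/?$")
WITHOUT_PLUS = re.compile(r"^([A-Za-z0-9+_%*?-])/?$")

def captured(pattern, path):
    """Return what $1 would hold, or None if the pattern does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

print(captured(WITH_PLUS, "welcome"))      # welcome
print(captured(WITH_PLUS, "welcome/"))     # welcome
print(captured(WITH_PLUS, "a/b"))          # None
print(captured(WITHOUT_PLUS, "welcome"))   # None
print(captured(WITHOUT_PLUS, "w"))         # w
```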

Related

htaccess: redirect first folder to get param

I am very new to regular expressions and the .htaccess file of the Apache web server.
I want to rewrite a url so that the first subfolder gets converted to a GET-parameter and all the following subfolders should be left as they are...
A few examples:
http://www.example.com/thisisthevariable
should be rewritten to
index.php?p=thisisthevariable
and
http://www.example.com/thisisthevariable/test.jpg
should be rewritten to
test.jpg?p=thisisthevariable
and
http://www.example.com/thisisthevariable/subfolder/and/another/sub/folder/123.gif
should be rewritten to
subfolder/and/another/sub/folder/123.gif?p=thisisthevariable
It should work with an unlimited number of subfolders. So the whole path should be used except the first directory, which should become the GET parameter of the destination file.
I hope someone understands my task and can help me :-)
Thanks!!
RewriteRule ^([^/]+)$ index.php?p=$1
RewriteRule ^([^/]+)/(.+) $2?p=$1
[^/]+ matches anything that's not a forward slash, so the first part before the first /. The () capture it as a match, which is then available as $1 (and $2). For more details, see http://www.regular-expressions.info/refquick.html.
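As a quick check of the two patterns, the following Python sketch applies them once to the example paths from the question (written relative to the .htaccess directory, so without the leading slash). It is a single pass only and ignores the fact that Apache may run the rule set again on the rewritten path; the function name rewrite_first_segment is made up.

```python
import re

def rewrite_first_segment(path):
    """Apply the two rules once to a path relative to the .htaccess directory."""
    # Rule 1: a single segment with no slash goes to index.php.
    m = re.match(r"^([^/]+)$", path)
    if m:
        return "index.php?p=" + m.group(1)
    # Rule 2: the first segment becomes the parameter, the rest stays the path.
    m = re.match(r"^([^/]+)/(.+)", path)
    if m:
        return m.group(2) + "?p=" + m.group(1)
    # Neither rule matched: the path is left alone.
    return path

for example in (
    "thisisthevariable",
    "thisisthevariable/test.jpg",
    "thisisthevariable/subfolder/and/another/sub/folder/123.gif",
):
    print(rewrite_first_segment(example))
```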

^ character not working on mod rewrite in htaccess

I'm having this very annoying problem with my rewrite rules in the .htaccess file.
The context
So what I want is to have these two types of URLs rewrite to different targets:
URL 1 -- http://example.com/rem/call/answer/{Hex String}/{Hex String}/
URL 2 -- http://example.com/answer/{Hex String}/{Hex String}/
This is an extract of my .htaccess file:
RewriteEngine On
RewriteRule rem/call/answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET1
RewriteRule answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET2
The problem
Now the problem is that URL 2 rewrites well (using rule #2) and goes to TARGET 2, but URL 1 rewrites with both rules instead of just rule #1.
I tried several solutions, including the obvious use of the character ^ for "start of string". At that point, my rewrite rules were:
RewriteEngine On
RewriteRule rem/call/answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET1
RewriteRule ^answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET2
However, another problem happened. This time it's URL 1 that rewrites well, with only rule #1 and goes to TARGET 1. But now URL 2 doesn't rewrite at all any more. I'm guessing it's because the second rewrite rule never matches any url and thus never applies.
The only solution I found so far is to remove the ^ and use the [L] flag at the end of rule #1 like so:
RewriteEngine On
RewriteRule rem/call/answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET1 [L]
RewriteRule answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET2
This way, it uses rule #1, matches, but never gets to rule #2. Both urls get rewritten properly with these rules, but it is not a good solution since I might not want to stop the rewriting of URL 1 after the first rule applies (what if I have a third rule I would want to apply to it as well...)
My questions to you
Now that I've stated the problem, my questions here are:
Is the [L] flag the only way to go ? (which I highly doubt, and certainly hope not)
Would ^ be a candidate solution ? (I think so)
If so, how to make it work and why is it not working at all in my case ?
What I suspect
I suspect that it has something to do with the fact that the URL is actually http://example.com/answer/{Hex String}/{Hex String}/ and not just answer/{Hex String}/{Hex String}/, which means that answer/.. isn't really at the beginning of the string and thus prefixing it with ^ doesn't work.
But then it brings me to another question:
How to tell apache to strip the url of the scheme+domain part (i.e. http://example.com/) and match rules with the remainder of the url only (e.g. answer/{Hex String}/{Hex String}/) ?
EDIT
I should also add that I've tried the basic alice-bob example. I have a file named bob.html in my root and the following rule in my .htaccess file:
RewriteRule alice.html$ /bob.html
This works just fine and displays the bob.html page when alice.html is queried. However, if I change the rule to:
RewriteRule ^alice.html$ /bob.html
I then get a 404 error when querying the alice.html page...
As for @anubhava's comment, my full .htaccess file is composed as follows:
RewriteEngine On
[A bunch of RewriteRule that have nothing to do with the topic at hand
(don't contain any "answer" string in them and all work perfectly)]
RewriteRule rem/call/answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET1 [L]
RewriteRule answer/([a-f0-9]+)/([a-f0-9]+)/?$ /TARGET2
ErrorDocument 404 /404.html
Header set Access-Control-Allow-Origin "*"
SetEnv file_uploads On
OK, so thanks to @anubhava's comments, I solved the problem easily by moving the .htaccess file down one level to the www directory.
I was still quite curious about why this solved my problem, so I went on investigating how Apache's rewriting works. I'm not sure I've got all the details right, but here's what I found out.
Location location location
Of course, it goes without saying that the location of files is important, and especially configuration files like .htaccess. But it goes even beyond simple file path, and here is the reason why:
First, you need to keep in mind that the .htaccess file will affect the directory it's located in as well as all its subdirectories. So it would seem logical that a global .htaccess file should be placed at the root directory of your website, since it will affect all subdirectories (i.e. the whole website).
The second thing to keep in mind is that the public_html directory (which in my case was called www, simply a symbolic link to public_html) is the root folder of your website's content. You might have access to its parent directories, but whatever you put outside of your public_html directory is not part of your website's content per se, any resource you put there won't be part of your website's hierarchy (i.e not accessible via http://example.com/path/to/resource).
The regex anchor ^ matches the start of a string; in the context of URL rewriting, that is the start of the URL being considered. And that's not all: it seems that Apache resolves matches relative to the location of your .htaccess file. This means that ^ does not refer to the start of the full URL but to the start of the path relative to the directory containing the .htaccess file, which acts as a "local root directory" for all the rewrite rules in that specific .htaccess file.
Example
Let's say you have a subdirectory (e.g http://example.com/sub/directory/) and inside it you have two files:
http://example.com/sub/directory/.htaccess
http://example.com/sub/directory/bob.html
inside this .htaccess file, you have a rewrite rule as follows:
RewriteRule ^tom.html$ /sub/directory/bob.html
This rule will not match http://example.com/tom.html, as you might expect from the ^; instead it will match http://example.com/sub/directory/tom.html, since this is where the .htaccess file is located.
Conclusion
Generally speaking, let's say you have a rewrite rule such as:
RewriteRule ^PATH$ /TARGET_PATH
This means that the rule does not match the full URL path against ^PATH$; in effect it matches it against ^[Location of the .htaccess file]/PATH$
In other words, the location of the .htaccess file acts as a sort of base URL for all rewrite rules in it (much like the base tag in HTML).
This is why my rewrite rule with ^ didn't work: my .htaccess file was located above the public_html directory, and that parent directory was acting as the base URL for my rules. The rule could therefore never match any URL, since it was comparing against a path that is never requested (because it lies above the website's content root).
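The difference between the anchored and unanchored rules can be reproduced with a short Python sketch. The hex strings are placeholder values, and the "www/" prefix is my assumption of what the path looks like when the .htaccess file sits one level above the www directory.

```python
import re

UNANCHORED = re.compile(r"answer/([a-f0-9]+)/([a-f0-9]+)/?$")
ANCHORED = re.compile(r"^answer/([a-f0-9]+)/([a-f0-9]+)/?$")

def matches(pattern, path):
    # A RewriteRule pattern may match anywhere in the string unless anchored.
    return pattern.search(path) is not None

url1 = "rem/call/answer/1a2b/3c4d/"
url2 = "answer/1a2b/3c4d/"
# Assumed form of URL 2 when .htaccess is in the parent of the www directory.
url2_from_parent = "www/" + url2

print(matches(UNANCHORED, url1))             # True  (why URL 1 hit rule #2)
print(matches(ANCHORED, url1))               # False
print(matches(ANCHORED, url2))               # True
print(matches(ANCHORED, url2_from_parent))   # False (why ^ seemed broken)
print(matches(UNANCHORED, url2_from_parent)) # True
```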
I hope this was clear enough to help anyone who might encounter the same problem I had.
Cheers

.htaccess change / in match to &

I want to have the following URLs on my page:
www.domain.com/<module>/<function>/<query>=<string>/<query>=<string>/<query>=<string>
I know how to match the part with the module and function to valid urls like this:
www.domain.com/index.php?module=<module>&function=<function>
But I have no idea how I can append all those query=string-parameters to the query string.
I currently use RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9_]+)$ index.php?module=$1&function=$2 [NC] as my rule and would like to add those (optional and repeatable) query-string parts.
I hope someone knows more about htaccess and regexp than me xD
These rules need to be placed in .htaccess file in website root folder.
RewriteRule ^(.+)/([a-z0-9_]+)=([^/]+)/?$ $1/?$2=$3 [NC,N,DPI,QSA]
RewriteRule ^([a-z0-9_]+)/([a-z0-9_]+)/?$ /index.php?module=$1&function=$2 [NC,QSA,L]
They will rewrite URL (internally) from this form
http://www.example.com/main/job/p1=value/p2=something+else/PP=yes
into this form
http://www.example.com/index.php?module=main&function=job&p1=value&p2=something+else&PP=yes
These rules need to be placed near the top of .htaccess: the first rule uses the [N] flag, which tells Apache to restart rewriting from the first rule (in order to rewrite all <query>=<string> fragments). If you have a lot of rules before this one, Apache will have to "probe" each of them after each iteration, which may put unnecessary load on the web server.
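The iteration caused by [N] can be followed step by step with a Python sketch. It only imitates the two rules above under the assumption that [QSA] appends the previous query string after the newly added pair; the function name rewrite is hypothetical.

```python
import re

RULE_PAIR = re.compile(r"^(.+)/([a-z0-9_]+)=([^/]+)/?$", re.IGNORECASE)
RULE_MAIN = re.compile(r"^([a-z0-9_]+)/([a-z0-9_]+)/?$", re.IGNORECASE)

def rewrite(path, query=""):
    """Simulate both rules on a path relative to the website root."""
    # First rule with [N]: peel off the last key=value segment and start over.
    while True:
        m = RULE_PAIR.match(path)
        if not m:
            break
        pair = m.group(2) + "=" + m.group(3)
        # [QSA]: the previous query string is appended after the new pair.
        query = pair + ("&" + query if query else "")
        path = m.group(1) + "/"
    # Second rule: map module/function onto index.php and stop ([L]).
    m = RULE_MAIN.match(path)
    if not m:
        return path + ("?" + query if query else "")
    target = "/index.php?module=" + m.group(1) + "&function=" + m.group(2)
    return target + ("&" + query if query else "")

print(rewrite("main/job/p1=value/p2=something+else/PP=yes"))
# /index.php?module=main&function=job&p1=value&p2=something+else&PP=yes
```

Each pass through the loop corresponds to one restart of the rule set, which is why many preceding rules make this approach more expensive.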

Apache Regex doesn't return first result

I have the following URL:
http://somedomain.com/aa/search/search.php
I want it to return 2 selections, that of "aa" and that of "search/search.php".
With the help of Regex Coach, I have made the following regular expression which targets these two just fine:
/([a-z]{2})/(.*)
However, when I use them in my htaccess file, the rewrite just doesn't happen:
Options +FollowSymlinks
RewriteEngine on
RewriteRule /([a-z]{2})/(.*) /$2?var=$1
Can someone point this regex newbie in the right direction?
edit:
By "doesn't happen", I mean that following the URL: somedomain.com/aa/search.php gives me
"/aa/search.php not found"
rather than
"/search.php?var=aa not found".
That depends on where you define the rule. Your syntax is correct for server (global) config files. If you use .htaccess files, the server strips the path up to the place where the file is located from the URL before matching. Try
([a-z]{2})/(.*)
(i.e. without the first slash)
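A minimal Python check of both patterns, assuming the .htaccess file is in the document root so that the rule sees the path without its leading slash:

```python
import re

# What the rule is assumed to see in .htaccess for /aa/search/search.php.
htaccess_path = "aa/search/search.php"
# What the same rule would see in the server (global) config.
server_path = "/aa/search/search.php"

with_slash = re.search(r"/([a-z]{2})/(.*)", htaccess_path)
without_slash = re.search(r"([a-z]{2})/(.*)", htaccess_path)
in_server_config = re.search(r"/([a-z]{2})/(.*)", server_path)

print(with_slash)                  # None
print(without_slash.groups())      # ('aa', 'search/search.php')
print(in_server_config.groups())   # ('aa', 'search/search.php')
```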

Mod-rewrites on apache: change all URLs

Right now I'm doing something like this:
RewriteRule ^/?logout(/)?$ logout.php
RewriteRule ^/?config(/)?$ config.php
I would much rather have one rule that does the same thing for each URL, so I don't have to keep adding rules every time I add a new file.
Also, I like to match things like '/config/new' to 'config_new.php' if that is possible. I am guessing some regexp would let me accomplish this?
Try:
RewriteRule ^/?(\w+)/?$ $1.php
The $1 is the content of the first captured group in parentheses. The parentheses around the second slash are not needed.
edit: For the other match, try this:
RewriteRule ^/?(\w+)/(\w+)/?$ $1_$2.php
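Here is a rough Python sketch of what these two patterns do to a few request paths. The function name to_php is made up, and re.ASCII is used on the assumption that it keeps \w closer to what the rule would match.

```python
import re

def to_php(path):
    """Map a request path to a script name using the two patterns above."""
    # re.ASCII restricts \w to [A-Za-z0-9_].
    one = re.match(r"^/?(\w+)/?$", path, re.ASCII)
    if one:
        return one.group(1) + ".php"
    two = re.match(r"^/?(\w+)/(\w+)/?$", path, re.ASCII)
    if two:
        return two.group(1) + "_" + two.group(2) + ".php"
    # Anything else (for example a path containing a dot) is not rewritten.
    return None

print(to_php("config"))       # config.php
print(to_php("/logout/"))     # logout.php
print(to_php("config/new"))   # config_new.php
print(to_php("style.css"))    # None
```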
I would do something like this:
RewriteRule ^/?(logout|config|foo)/?$ $1.php
RewriteRule ^/?(logout|config|foo)/(new|edit|delete)$ $1_$2.php
I prefer to explicitly list the URLs I want to match, so that I don't have to worry about static content or about adding new things later that don't need to be rewritten to PHP files.
The above is OK if all sub URLs are valid for all root URLs (book/new, movie/new, user/new), but not so good if you want different sub URLs depending on the root action (logout/new doesn't make much sense). You can handle that either with a more complex regex, or by routing everything to a single PHP file which determines what files to include and display based on the URL.
Mod rewrite can't do (potentially) boundless replaces like you want to do in the second part of your question. But check out the External Rewriting Engine at the bottom of the Apache URL Rewriting Guide:
External Rewriting Engine
Description:
A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems no solution by the use of mod_rewrite...
Solution:
Use an external RewriteMap, i.e. a program which acts like a RewriteMap. It is run once on startup of Apache, receives the requested URLs on STDIN, and has to put the resulting (usually rewritten) URL on STDOUT (same order!).
RewriteEngine on
RewriteMap quux-map prg:/path/to/map.quux.pl
RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1}
#!/path/to/perl
# disable buffered I/O which would lead
# to deadloops for the Apache server
$| = 1;
# read URLs one per line from stdin and
# generate substitution URL on stdout
while (<>) {
s|^foo/|bar/|;
print $_;
}
This is a demonstration-only example and just rewrites all URLs /~quux/foo/... to /~quux/bar/.... Actually you can program whatever you like. But notice that while such maps can be used also by an average user, only the system administrator can define it.
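For readers who don't use Perl, the same map logic might look like this in Python. This is an untested-against-Apache sketch that mirrors the Perl script above; the function names map_key and serve are my own.

```python
def map_key(key):
    """Rewrite a leading foo/ to bar/, like s|^foo/|bar/| in the Perl script."""
    if key.startswith("foo/"):
        return "bar/" + key[len("foo/"):]
    return key

def serve(infile, outfile):
    """Answer one lookup per input line, in the same order."""
    for line in infile:
        outfile.write(map_key(line.rstrip("\n")) + "\n")
        # Flush after every answer so the caller is never left waiting
        # on a buffered reply.
        outfile.flush()

# In a real map program the script would end with something like:
#   import sys
#   serve(sys.stdin, sys.stdout)
```

Keeping the mapping in map_key separate from the I/O loop makes it easy to check the rewriting logic without starting Apache.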