Cannot match literal dot with htaccess regex

The problem: I cannot figure out how to match a literal dot in my expression so I could rewrite query strings containing dots. First I tried something like this:
RewriteRule ^([\.\w]+)$ index.php?url=$1 [L]
I have a php script:
echo "url is: ".$_GET['url'];
which should, in theory, output anything that I write in my query. But for any query containing only letters and dots, my script always outputs:
url is: index.php
I've tried these expressions as well:
^(.+)$
^([.\w]+)$
And the result is the same.
So the question is: are my expressions wrong, or does this have something to do with the server's config?

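It looks like a second request is processed before you see the result: after the first rewrite, the rule is applied again to the rewritten path index.php, and because that path also matches, url ends up as index.php. If I use a rule that cannot match index.php (e.g. .. for matching xy), the result is as expected: xy. With more permissive rules like .* or .+ it fails. x.* works fine, however, because index.php does not start with x.
You can add a condition so that requests for index.php are not rewritten again: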
RewriteCond %{REQUEST_FILENAME} !index\.php$
RewriteRule ^(.+)$ index.php?url=$1 [L]
This was tested and debugged with:
<?php
printf("url is: %s <br>\n", htmlspecialchars(filter_input(INPUT_GET, 'url')));
echo "<pre>", htmlentities(print_r($_SERVER, true)), "</pre>";
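The second pass described above can be imitated outside Apache. The following Python sketch is only a rough model, not mod_rewrite itself: the function name simulate and its parameters are made up for illustration, and the bare path stands in for %{REQUEST_FILENAME}. It re-applies the rule to the rewritten path the way per-directory rules are re-run, and shows why url ends up as index.php unless index.php is excluded.

```python
import re

def simulate(path, pattern, skip_index=False, max_passes=10):
    """Rough model of an .htaccess rule 'PATTERN -> index.php?url=$1'.

    Hypothetical helper: the rule set is re-run on the rewritten path,
    and every successful match overwrites the url parameter.
    """
    url = None
    for _ in range(max_passes):
        # Models: RewriteCond %{REQUEST_FILENAME} !index\.php$
        if skip_index and re.search(r"index\.php$", path):
            break
        match = re.search(pattern, path)
        if match is None:
            break
        url = match.group(1)          # index.php?url=$1
        if path == "index.php":
            break                     # rewritten path equals input: stop
        path = "index.php"
    return url

print(simulate("foo.bar", r"^(.+)$"))                   # index.php
print(simulate("foo.bar", r"^(.+)$", skip_index=True))  # foo.bar
print(simulate("xy", r"^(..)$"))                        # xy
print(simulate("xab", r"^(x.*)$"))                      # xab
```

The last two calls correspond to the patterns that worked in testing: neither can match index.php, so the second pass leaves url alone.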

Related

How to remove the specific character in between using Apache rewrite rule

I have a URL like http://example.com/abc+def+cde+ndk
Unfortunately, the number of plus-separated parts in the URI (abc, def, cde, ...) is not fixed.
I tried writing a rule like the one below, but it matches and replaces only three groups (two character groups and the one + between them), so only the first plus sign is removed.
RewriteCond %{REQUEST_URI} ^/(.*?)(\+{1,})(.*)$ [NC]
RewriteRule . http://example.com/%1%3 [R=301,L]
Example given below:
Source: example.com/abc+def+cde+x+y(n number of strings separated by +)
Destination Must be: example.com/abccdexy...till n
If you can add a directive to the main config, the best solution is to use a RewriteMap that passes the URL to an external script, which you write. The details are in the RewriteMap section of the Apache mod_rewrite documentation.
Basically you do something like:
RewriteMap convertUrl "prg:/www/bin/convertUrl.pl"
RewriteRule \+ ${convertUrl:%{REQUEST_URI}} [R=301,L]
(Only the RewriteMap needs to go in your main config; the RewriteRule can go in your .htaccess.)
Here /www/bin/convertUrl.pl is a script you write to perform the substitution, as described in the RewriteMap documentation. It should read the URL from STDIN (without any buffering), strip out the plus signs, and write the result to STDOUT.
Something like this should work:
#!/usr/bin/perl
$| = 1; # Turn off I/O buffering
while (<STDIN>) {
s/\+//g; # Remove every plus sign
print $_;
}
Here is a pure .htaccess solution.
# Remove a plus sign on each iteration of the rule
RewriteRule ^help/col/([^+/]+)\+([^/]+)$ help/col/$1$2 [E=REMOVED_PLUS_SIGNS:1]
# For URLs that were processed, redirect once all the plus signs are removed
RewriteCond %{ENV:REMOVED_PLUS_SIGNS} =1
RewriteRule ^help/col/([^+/]+)$ /help/col/$1 [R=301,L]
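To see why the rule from the question removes only one plus sign per pass, and why both answers either loop or substitute globally, here is a small Python sketch. The function names strip_one and strip_all are hypothetical, and the regex is the one from the question's RewriteCond, applied to the URI path.

```python
import re

# Pattern from the question's RewriteCond: lazy prefix, one run of '+', rest.
ONE_PLUS = re.compile(r"^/(.*?)(\+{1,})(.*)$")

def strip_one(path):
    """One pass of the original rule: joins %1 and %3, dropping one '+' run."""
    match = ONE_PLUS.match(path)
    if match is None:
        return path
    return "/" + match.group(1) + match.group(3)

def strip_all(path):
    """Repeat the single-pass rule until nothing changes, counting passes."""
    passes = 0
    while True:
        stripped = strip_one(path)
        if stripped == path:
            return path, passes
        path = stripped
        passes += 1

print(strip_one("/abc+def+cde+ndk"))   # /abcdef+cde+ndk
print(strip_all("/abc+def+cde+ndk"))   # ('/abcdefcdendk', 3)
```

A single redirect issued after the first pass would still contain plus signs, which is why the pure .htaccess version delays the redirect until none are left, and why the script version uses a global substitution instead.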

.htaccess: param url to clean url

I am trying to turn an ugly url with parameters into a nice url. At the moment I have:
http://myasite.com/index.php?reg=uk&area=london&id=16
Which I would like to have like so:
http://myasite.com/uk/london/16
I have tried using this .htaccess:
RewriteEngine On
RewriteRule ^/?$/?$ index.php?reg=$1&area=london&id=$2 [L,QSA]
I got this from an online generator; however, when I load the page with /uk/16 in the URL it just crashes.
What am I doing wrong?
In reply to Chris's answer below: all of these segments are optional.
Structure of url is like so:
myasite.com
myasite.com/uk (if set, This will always be text and always 2 chars long)
myasite.com/uk/london (if set, This will always be text, this will be any char length )
myasite.com/uk/london/16 (if set, This will always be integer and any char length)
Your regex is incorrect. Your ^/?$/?$ can only match an empty path or a single /, because the first $ already anchors the end of the input. You also aren't using any capture groups, so $1 and $2 have nothing to refer to. Here's a regex that would work for your provided example:
^/(uk)/(\d+)$
If uk can be any 2 lowercase letters you could use:
^/([a-z]{2})/(\d+)$
You can use regex101 to see how your regexes will function.
https://regex101.com/r/VyJE9d/1 (your rule)
https://regex101.com/r/VyJE9d/2
The right side of the page gives explanations.
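As a rewrite rule (in .htaccess the pattern is matched against the path without its leading slash, so the slash is made optional here):
RewriteEngine On
RewriteRule ^/?([a-z]{2})/(\d+)$ index.php?reg=$1&id=$2 [L,QSA]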
All you need in your .htaccess file is this:
RewriteEngine On
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ /index.php?reg=$1&area=$2&id=$3 [L]
This will leave you with your desired URL of: http://myasite.com/uk/london/16. Just make sure you clear your cache before testing this.
RewriteEngine On
RewriteRule ^([a-zA-Z0-9_-]+)/([0-9]+)/?$ index.php?reg=$1&id=$2 [L,QSA]
This rewrites both $1/$2 and $1/$2/ to index.php?reg=$1&id=$2.
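None of the rules above covers the fully optional structure listed in the question (nothing, reg, reg/area, reg/area/id). As a sketch only, the nesting can be expressed with optional non-capturing groups. The Python below checks such a pattern against the four URL shapes; the pattern and the function name parse are assumptions for illustration, not taken from any of the answers.

```python
import re

# Each later segment is only allowed if the earlier ones are present:
#   reg  = exactly two lowercase letters
#   area = any run of non-slash characters
#   id   = digits only
SEGMENTS = re.compile(r"^(?:([a-z]{2})(?:/([^/]+)(?:/(\d+))?)?)?/?$")

def parse(path):
    """Return (reg, area, id) for a path without leading slash, or None."""
    match = SEGMENTS.match(path)
    return match.groups() if match else None

print(parse(""))               # (None, None, None)
print(parse("uk"))             # ('uk', None, None)
print(parse("uk/london"))      # ('uk', 'london', None)
print(parse("uk/london/16"))   # ('uk', 'london', '16')
print(parse("uk/london/abc"))  # None
```

Note that with this structure uk/16 is read as reg=uk and area=16, because the second segment is always the area. If the same pattern were used in a RewriteRule, the unmatched groups would expand to empty strings in the substitution.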

ExpressionEngine RewriteRule RegEx Throws 500 Error

When using categories in ExpressionEngine, a Category URL Indicator trigger word can be set to load a category by its {category_url_title}.
I would like to remove the category "trigger word" from the URL.
Here is what I have so far, with the trigger word set to "category":
RewriteRule /products/(.+)$ /products/category/$1 [QSA,L]
I'm not an expert at writing regular expressions, but I know a little. I'm 99% sure my RegEx is fine; however, when I use it as a RewriteRule in my .htaccess file, I get a 500 error.
I'm sure it's something stupid, but for some reason I'm not seeing my mistake. What am I doing wrong?
Update: Adding a ^ to the beginning of the RewriteRule fixed the 500 error.
RewriteRule ^/products/(.+)$ /products/category/$1 [QSA,L]
This is not safe. Take:
/products/a
The regex group matches a.
It will be rewritten to:
/products/category/a
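which the regex matches again (this time, the group matches category/a). The rule keeps matching its own output, so the rewrite loops until Apache gives up with a 500 error.
You want /products/ at the beginning of the input only if it is not followed by category/, which means you want a negative lookahead. Also, the QSA flag is of no use here: the substitution contains no query string, so the original query string is passed through unchanged anyway (QSA stands for Query String Append). In .htaccess the pattern is matched without the leading slash, so it is made optional:
RewriteRule ^/?products/(?!category/)(.+) /products/category/$1 [L]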
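Another way, which I personally prefer, is to put a RewriteCond before the rule (%{REQUEST_URI} always includes the leading slash; the RewriteRule pattern in .htaccess does not, hence the optional slash):
RewriteCond %{REQUEST_URI} ^/products/(?!category/)
RewriteRule ^/?products/(.*) /products/category/$1 [L]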
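The difference the negative lookahead makes can be checked with a few lines of Python. This is a sketch that simply re-applies a pattern to its own output a fixed number of times; the function name apply_rule is hypothetical, and the patterns are matched against the full URI path with its leading slash.

```python
import re

UNSAFE = r"^/products/(.+)$"
SAFE = r"^/products/(?!category/)(.+)"

def apply_rule(path, pattern, passes=3):
    """Re-apply 'pattern -> /products/category/$1' up to `passes` times."""
    for _ in range(passes):
        match = re.search(pattern, path)
        if match is None:
            break  # rule no longer matches: rewriting has settled
        path = "/products/category/" + match.group(1)
    return path

print(apply_rule("/products/a", UNSAFE))
# /products/category/category/category/a
print(apply_rule("/products/a", SAFE))
# /products/category/a
```

The unsafe pattern grows the path on every pass and never settles, while the lookahead version stops matching after the first rewrite.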
This Apache RewriteRule should do the job for you*:
RewriteCond %{REQUEST_URI} ^/products/(?!category/)
RewriteRule ^/?products/(.*) /products/category/$1 [L]
With this in place, you'll need to hard code your category links manually:
{categories backspace="2"}
{category_name},
{/categories}
Which would output the new Category URLs you desire:
http://example.com/products/toys
Otherwise, if you use the recommended path variable when building your category links:
{categories backspace="2"}
{category_name},
{/categories}
This would create links with the Category URL Indicator in the URI:
http://example.com/products/C1
http://example.com/products/category/toys
Which — while perfectly valid — would create canonicalization issues on your site since the different URLs would appear as duplicate content to search engines.
*Credit to fge for the brilliant mod_rewrite rule.

mod_rewrite regexp

I'm working on some rewrite rules, and for some reason a regexp that I don't expect to match (and that does not match in any of my regexp testers) is matching in mod_rewrite.
The URL in question is:
http://url.com/api/projects.json?division=aa
And the rewrite rule is:
RewriteEngine On
RewriteBase /
RewriteRule ^api\/([^.?#/%\s]+)\.([^#?\s]+)$ api.php?type=$1&format=$2 [NC,L]
Because the second capture is immediately followed by $, I'd expect that URL to fail because of the query string, but the rule accepts it just fine and passes the two parameters to GET.
Any thoughts?
From the note at the bottom of the RewriteRule documentation:
Note: Query String. The Pattern will not be matched against the query string. Instead, you must use a RewriteCond with the %{QUERY_STRING} variable.
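The behaviour in the quoted note is easy to reproduce outside Apache. In this Python sketch the URL is split first and the pattern from the question is tested against the path only, which is roughly what RewriteRule sees in .htaccess. The function name match_like_rewrite_rule is made up for illustration, and the leading slash is stripped to mimic per-directory matching.

```python
import re
from urllib.parse import urlsplit

# Pattern from the question; re.IGNORECASE stands in for the [NC] flag.
PATTERN = re.compile(r"^api\/([^.?#/%\s]+)\.([^#?\s]+)$", re.IGNORECASE)

def match_like_rewrite_rule(url):
    """Match PATTERN against the path only; return (groups, query string)."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")       # .htaccess patterns see no leading slash
    match = PATTERN.search(path)
    return (match.groups() if match else None), parts.query

print(match_like_rewrite_rule("http://url.com/api/projects.json?division=aa"))
# (('projects', 'json'), 'division=aa')

# A regex tester fed the path plus query string behaves differently:
print(PATTERN.search("api/projects.json?division=aa"))
# None
```

This is why the pattern fails in a generic regex tester but succeeds in mod_rewrite: the tester is given the query string, the rule is not.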

Regex pattern help (I almost have it, just need a bit of expertise to finish it)

I need to match two cases
js/example_directory/example_name.js
and
js/example_directory/example_name.js?12345
(where 12345 is a digit string of unknown length and the directory can be limitless in depth or not exist at all)
I need to capture in both cases everything between js/ and .js
and if ? exists capture the digit string after ?
This is what I have so far
^js/(.*).js\??(\d+)?
This works except it also captures
js/example_directory/example_name.js12345
I want the regex to ignore that. Any suggestions?
Thank you all!
Answer:
Using Gumbo's information my final rewrite rule is as follows.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ js.php?f=$1.js&v=%0 [L]
</IfModule>
Put the whole query-string pattern, including the ?, in a single optional group, escape the dot before js, and anchor the end:
^js/(.*)\.js(\?\d+)?$
mod_rewrite's RewriteRule directive only tests the URI path, not the query string. So a rule like the following already matches both URIs:
RewriteRule ^js/(.*)\.js$ …
If you also want to test the query string, you need an additional RewriteCond:
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ …
Matches from the last matched RewriteCond can be referenced with %n: %0 is the whole match, and in this case you could even use %{QUERY_STRING} directly.
As far as regular expressions go, you can use (?:) (non-capturing grouping) to make \?(\d+) optional as a single chunk, like so (with the dot before js escaped and the end anchored):
^js/(.*)\.js(?:\?(\d+))?$
You don't really need the ?: (non-capturing) part, but without it the back-references shift: $1 will be the filename, $2 will be ?1234 and $3 will be 1234.
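As a quick check of the pure-regex variant, the Python sketch below compares the pattern from the question with a stricter one that escapes the dot, groups the ? with the digits, and anchors the end. The names LOOSE, STRICT and groups are made up for illustration. Keep in mind that inside mod_rewrite the query string never reaches the RewriteRule pattern, so this only applies where the whole string is matched at once.

```python
import re

LOOSE = re.compile(r"^js/(.*).js\??(\d+)?")           # pattern from the question
STRICT = re.compile(r"^js/(.*)\.js(?:\?(\d+))?$")     # '?' and digits as one unit

def groups(pattern, text):
    """Return the captured groups, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.groups() if match else None

print(groups(LOOSE, "js/dir/name.js12345"))    # ('dir/name', '12345')
print(groups(STRICT, "js/dir/name.js12345"))   # None
print(groups(STRICT, "js/dir/name.js?12345"))  # ('dir/name', '12345')
print(groups(STRICT, "js/name.js"))            # ('name', None)
```

The loose pattern accepts name.js12345 because both the ? and the digits are optional independently and nothing anchors the end of the string.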