htaccess - how to apply RegEx patterns on the output of another - encapsulation - regex

I'm running several regular expressions against a string stored in a variable to clean it up for further use in my htaccess rules, but it seems cumbersome to need several lines for such a simple thing:
RewriteCond %{THE_REQUEST} (?<=\s)(.*?)(?=\s)
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} (^.*)?\?
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} /(.*)
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} (.*)/$
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
How can I reduce this to 2 lines?
Basically I'm looking for a way to chain these as filter steps, each one operating on the output of the previous expression, but my humble efforts have failed after hours of trying and web-searching.
The code above does what I need it to do, it's just really ugly (not elegant).
In PHP, or basically any decent(ish) language it could be as simple as:
$HREFPATH = trim(explode("?", explode(" ", $THE_REQUEST)[1])[0], "/");
-but this is NOT a PHP-related question; merely a simple way to explain what I mean, and what I'm trying to achieve.
I know there may be many RegEx patterns that could (theoretically) work here, but it should be compatible with Apache's RegEx engine.
Any input will be rewarded in kind; thanks in advance.

What you are doing in multiple rules can be done in a single rule like this:
RewriteCond %{THE_REQUEST} \s/+([^?]*?)/*[\s?]
RewriteRule ^ - [E=HREFPATH:%1]
RegEx Details:
\s: Match a whitespace
/+: Match 1+ /s
([^?]*?): Lazily match 0 or more of any characters that are not ?. Capture this value in %1
/*: Match 0 or more trailing /s
[\s?]: Must be followed by a ? or a whitespace
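The breakdown above can be checked outside Apache with a small sketch. It uses Python's `re` module, which is not Apache's PCRE engine but treats this particular pattern the same way; the helper name `href_path` and the sample request lines are assumptions made for illustration.

```python
import re

# The pattern from the RewriteCond above, unchanged.
PATTERN = re.compile(r"\s/+([^?]*?)/*[\s?]")


def href_path(request_line):
    """Return what %1 would capture for a raw request line, or None."""
    match = PATTERN.search(request_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = (
        "GET /foo/bar/?x=1 HTTP/1.1",  # trailing slash and query string
        "GET /foo/bar HTTP/1.1",       # nothing to strip
        "GET / HTTP/1.1",              # root: the capture is empty
    )
    for line in samples:
        print(repr(line), "->", repr(href_path(line)))
```

The lazy quantifier is what lets `/*` absorb the trailing slashes instead of the capture group.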

Related

Single RewriteRule to create a redirection with two conditions

I found this rule which works fine with just one condition which is: foobar is in the string.
I need to change this to include a new condition to have two conditions (instead of one):
foobar is in the string. This is already working.
meetball is NOT in the string.
RewriteRule ^(.*)foobar(.*)$ http://www.example.com/index.php [L,R=301]
Please try the following, written as per your shown samples. Also, you don't need to create the (.*) groups, since you are not using them in the redirection. You could add Apache's NC flag to make the match on the URI case-insensitive.
RewriteEngine ON
RewriteRule ^(?!.*meetball).*foobar.*$ http://www.example.com/index.php [NC,L,R=301]
Or, without a negative lookahead, use an ordinary condition check. Make sure you use either the ruleset above or the following one, not both.
RewriteEngine ON
RewriteCond %{REQUEST_URI} !meetball [NC]
RewriteRule foobar http://www.example.com/index.php [NC,L,R=301]
You can use negative lookahead pattern:
RewriteRule ^(?!.*meetball).*foobar http://www.example.com/index.php [L,R=301]
(?!.*meetball) will fail the pattern match if meetball is found anywhere in the URI. Also, there is no need for grouping, so (...) is removed in my answer.
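To see how the lookahead behaves, here is a minimal sketch in Python. Its `re` module supports the same lookahead syntax; the function name `would_redirect` and the sample paths are hypothetical.

```python
import re

# Same pattern as the RewriteRule above: foobar must be present,
# and meetball must not appear anywhere in the path.
RULE = re.compile(r"^(?!.*meetball).*foobar")


def would_redirect(path):
    """True if the rule pattern matches, i.e. the redirect would fire."""
    return RULE.search(path) is not None


if __name__ == "__main__":
    for path in ("shop/foobar/item", "meetball/foobar", "foobar/meetball", "other"):
        print(path, "->", would_redirect(path))
```

Because the lookahead sits right after `^`, it scans the whole path, so the position of the excluded word relative to foobar does not matter.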

Apache mod_rewrite (RewriteCond) to filter out the "in-between" word?

(I'm not very familiar with RewriteCond, so I just googled and adapted existing rules.)
Now I have a piece of .htaccess code which rewrites to a different URL based on the input:
RewriteCond %{HTTP_HOST} ^(.*).example.com$
RewriteRule ^(.*)$ http://www.example.com/%1/$1 [P,L,NS]
This takes the subdomain from the input and uses it as a folder in the output. For example:
Input: http://support.example.com
Output: http://www.example.com/support
Input: http://member.example.com
Output: http://www.example.com/member
So that is working.
Now I need something a bit more complicated than the existing rule.
How can I filter out the "in-between" word with RewriteCond?
Let's say I have two levels of subdomains, and I ONLY want to extract the one in the middle. Which means:
Input: http://dev.support.example.com
Output: http://dev.example.com/support
(Inputs can also be dev.member, dev.pricing, and many more.)
How do I extract the word support from the dev.support string?
The current ^(.*).example.com$ only handles the leftmost label.
Below one is NOT working:
RewriteCond %{HTTP_HOST} dev.^(.*).example.com$
RewriteRule ^(.*)$ http://dev.example.com/%1/$1 [P,L,NS]
Please suggest.
In your regex, ^ needs to be placed at the start of the pattern, and [^.]+ is a better pattern than .* here.
You can use:
RewriteCond %{HTTP_HOST} ^dev\.([^.]+)\.example\.com$
RewriteRule ^(.*)$ http://dev.example.com/%1/$1 [P,L,NS]
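A short sketch of why `[^.]+` is the tighter choice, written in Python for convenience. The `LOOSE` pattern and the helper `middle_label` are assumptions added for comparison and are not part of the answer.

```python
import re

# Pattern from the RewriteCond above: exactly one label between dev. and .example.com
STRICT = re.compile(r"^dev\.([^.]+)\.example\.com$")
# Hypothetical looser variant using .* for comparison only.
LOOSE = re.compile(r"^dev\.(.*)\.example\.com$")


def middle_label(host, pattern=STRICT):
    """Return what %1 would hold for the given Host header, or None."""
    match = pattern.match(host)
    return match.group(1) if match else None


if __name__ == "__main__":
    for host in ("dev.support.example.com", "dev.a.b.example.com", "support.example.com"):
        print(host, "strict:", middle_label(host), "loose:", middle_label(host, LOOSE))
```

With `.*` a host with extra labels would be accepted and the capture would contain dots, which is rarely what a folder name should look like.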

ExpressionEngine RewriteRule RegEx Throws 500 Error

When using categories in ExpressionEngine, a Category URL Indicator trigger word can be set to load a category by its {category_url_title}.
I would like to remove the category "trigger word" from the URL.
Here is what I have so far, with the trigger word set to "category":
RewriteRule /products/(.+)$ /products/category/$1 [QSA,L]
I'm not an expert at writing regular expressions, but I do a little. I'm 99% sure my RegEx is fine, however when trying to use it as a RewriteRule in my .htaccess file, I'm getting a 500 error.
I'm sure it's something stupid, but for some reason I'm not seeing my mistake. What am I doing wrong?
Update: Adding a ^ to the beginning of the RewriteRule fixed the 500 error.
RewriteRule ^/products/(.+)$ /products/category/$1 [QSA,L]
This is not safe. Take:
/products/a
The regex group matches a.
It will be rewritten to:
/products/category/a
which the regex matches again (this time, the group matches category/a). Guess what will happen.
You want /products/ at the beginning of the input only if it is not followed by category/, which means you want a negative lookahead. Also, the QSA flag is of no use here: the substitution has no query string of its own, so the original query string is passed through unchanged (QSA stands for Query String Append):
RewriteRule ^/products/(?!category/)(.+) /products/category/$1 [L]
Another way to use it (and which I personally prefer) is to use a RewriteCond prior to the rule:
RewriteCond %{REQUEST_URI} ^/products/(?!category/)
RewriteRule ^/products/(.*) /products/category/$1 [L]
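The loop described above can be illustrated with a simplified model in Python. It only mimics the idea that the rule is applied again to its own output; the function `rewrite_until_stable` and the `max_passes` cap are hypothetical and do not reflect how Apache limits internal redirects.

```python
import re

# The rule from the question and the guarded rule from the answer.
NAIVE = re.compile(r"^/products/(.+)$")
GUARDED = re.compile(r"^/products/(?!category/)(.+)")


def rewrite_until_stable(path, rule, max_passes=5):
    """Apply the rule repeatedly; return (final path, number of rewrites)."""
    passes = 0
    while passes < max_passes:
        match = rule.match(path)
        if match is None:
            break  # rule no longer applies, the path is stable
        path = "/products/category/" + match.group(1)
        passes += 1
    return path, passes


if __name__ == "__main__":
    print(rewrite_until_stable("/products/a", NAIVE))
    print(rewrite_until_stable("/products/a", GUARDED))
```

The naive rule keeps matching its own output until the cap is hit, while the lookahead version rewrites once and then stops.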
This Apache RewriteRule should do the job for you*:
RewriteCond %{REQUEST_URI} ^/products/(?!category/)
RewriteRule ^/products/(.*) /products/category/$1 [L]
With this in place, you'll need to hard code your category links manually:
{categories backspace="2"}
<a href="/products/{category_url_title}">{category_name}</a>,
{/categories}
Which would output the new Category URLs you desire:
http://example.com/products/toys
Otherwise, if using the recommended path variable when building your category links:
{categories backspace="2"}
<a href="{path='products'}">{category_name}</a>,
{/categories}
Would create links with the Category URL Indicator in the URI:
http://example.com/products/C1
http://example.com/products/category/toys
Which, while perfectly valid, would create canonicalization issues on your site, since the different URLs would appear as duplicate content to search engines.
*Credit to fge for the brilliant mod_rewrite rule.

Regex pattern help (I almost have it, just need a bit of expertise to finish it)

I need to match two cases
js/example_directory/example_name.js
and
js/example_directory/example_name.js?12345
(where 12345 is a digit string of unknown length and the directory can be limitless in depth or not exist at all)
I need to capture in both cases everything between js/ and .js
and if ? exists capture the digit string after ?
This is what I have so far
^js/(.*).js\??(\d+)?
This works except it also captures
js/example_directory/example_name.js12345
I want the regex to ignore that. Any suggestions?
Thank you all!
Answer:
Using Gumbo's information my final rewrite rule is as follows.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ js.php?f=$1.js&v=%0 [L]
</IfModule>
Include the whole query string pattern, including the ?, in one optional group, escape the dot, and anchor the end:
^js/(.*)\.js(\?\d+)?$
mod_rewrite's RewriteRule directive only tests the URI path, not the query string. So a rule like the following already matches both URIs:
RewriteRule ^js/(.*)\.js$ …
If you now want to test the query too, you need to use an additional RewriteCond:
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ …
The match of the last successful RewriteCond can be referenced with %n: %0 for the whole match, or in this case simply %{QUERY_STRING} directly.
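The split between path and query can be modelled roughly as follows. This Python sketch only imitates the idea that the rule pattern sees the path while the condition sees the query string; the function `rewrite` is hypothetical, and the target `js.php?f=...&v=...` is taken from the rule quoted in the question.

```python
import re
from urllib.parse import urlsplit

PATH_RULE = re.compile(r"^js/(.*)\.js$")  # what the RewriteRule tests
QUERY_COND = re.compile(r"^\d*$")         # what the RewriteCond tests


def rewrite(url):
    """Return the rewritten target, or None if rule or condition fails."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # per-directory rules see no leading slash
    cond = QUERY_COND.match(parts.query)
    rule = PATH_RULE.match(path)
    if not (cond and rule):
        return None
    # rule.group(1) plays the role of $1, cond.group(0) the role of %0
    return "js.php?f={}.js&v={}".format(rule.group(1), cond.group(0))


if __name__ == "__main__":
    for url in ("/js/a/b.js", "/js/a/b.js?12345", "/js/a/b.js12345", "/js/a.js?abc"):
        print(url, "->", rewrite(url))
```

Since the query never reaches the rule pattern, the `js12345` case is rejected simply because the path does not end in `.js`.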
As far as regular expressions go, you can use (?:) (a non-capturing group) to make \?(\d+) a single optional chunk, like so:
^js/(.*)\.js(?:\?(\d+))?$
You don't really need the ?: (non-capturing) part, but without it the backreferences change: 1 will point at the filename, 2 at ?1234, and 3 at 1234.
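The effect on group numbering is easy to see in a sketch. The two Python patterns below escape the dot before js and end with `$`, which is this sketch's own choice so that the `.js12345` case is rejected; the constant names and sample strings are assumptions.

```python
import re

# Optional query part wrapped in a non-capturing group.
NON_CAPTURING = re.compile(r"^js/(.*)\.js(?:\?(\d+))?$")
# Same pattern with a capturing group instead, which shifts the numbering.
CAPTURING = re.compile(r"^js/(.*)\.js(\?(\d+))?$")


def groups(pattern, text):
    """Return the tuple of captured groups, or None if there is no match."""
    match = pattern.match(text)
    return match.groups() if match else None


if __name__ == "__main__":
    for text in ("js/dir/name.js", "js/dir/name.js?1234", "js/dir/name.js1234"):
        print(text, groups(NON_CAPTURING, text), groups(CAPTURING, text))
```

With the capturing variant the digits move from group 2 to group 3, so any substitution that refers to them has to change as well.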

Can mod_rewrite preserve a double slash?

I'm just learning mod_rewrite and regex, and what I'm trying to do is pass any number of variables, with any names and values, into a script and have them forwarded to a different script.
here is what I have so far:
RewriteEngine on
RewriteRule ^script\$(.*[\])? anotherscript?ip=%{REMOTE_ADDR}&$1 [L]
That all seems to work except that one of the parameters I'm passing is a URL and the // after http:// always gets stripped down to one slash.
for example, I do
script$url=http://www.stackoverflow.com
then it redirects to:
anotherscript?ip=127.0.0.1&url=http:/www.stackoverflow.com
and the second script chokes on the single-slash.
I realize that preserving a double-slash is the exact opposite of what people usually do with mod_rewrite. Is there a way I can preserve the double-slash?
EDIT: Solution found with Gumbo's help.
RewriteCond %{THE_REQUEST} ^GET\ (.*)/script\$([^\s]+)
RewriteRule ^script\$(.*) anotherscript?ip=%{REMOTE_ADDR}&%2 [L]
I had to add that (.*) in front of /script on the RewriteCond, once I did that it got rid of the 404 errors and then it was just a matter of passing the matches through.
Try this rule:
RewriteCond %{THE_REQUEST} ^GET\ /script\$([^\s]+)
RewriteRule ^script\$.+ anotherscript?ip=%{REMOTE_ADDR}&%1 [L]
See Diggbar modrewrite- How do they pass URLs through modrewrite? for the explanation.
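A rough Python sketch of why matching against THE_REQUEST helps: the path that the RewriteRule pattern sees has already had repeated slashes merged, while the raw request line has not. The helper `collapse_slashes` is a simplified stand-in for that merging, based on the behaviour described in the question and not on Apache's actual code; both function names are hypothetical.

```python
import re

# Condition pattern from the rule above, matched against the raw request line.
COND = re.compile(r"^GET\ /script\$([^\s]+)")


def collapse_slashes(path):
    """Assumed model of the slash merging applied to the path before rule matching."""
    return re.sub(r"/{2,}", "/", path)


def params_from_request_line(request_line):
    """Return what %1 would hold, taken from the untouched request line."""
    match = COND.match(request_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    raw = "GET /script$url=http://www.stackoverflow.com HTTP/1.1"
    print(collapse_slashes("script$url=http://www.stackoverflow.com"))
    print(params_from_request_line(raw))
```

The first line printed shows the damaged value the rule pattern would capture, the second the intact value available through the condition.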
I think there may be something wrong with the first part of your RewriteRule regex:
^script\$(.*[\])?
The backslash ( \ ) is used to escape a special character into a literal one, so you are actually trying to match a closing bracket ( ] ). Is that intended?
Try this:
RewriteRule ^script\$(.*)? anotherscript?ip=%{REMOTE_ADDR}&$1 [L]