rewrite rule issue in htaccess - regex

I have the following rule which is working
RewriteRule ^(.+?)/(step)/([0-9]+)/(id)/([0-9]+)/(start)/([0-9]+)/(end)/([0-9]+)/?$ index.php?url=$1&$2=$3&$4=$5&$6=$7&$8=$9 [NC,L,QSA]
Now I want to add another parameter (ansid) at the end of the URL, so I wrote the following rule, but for some reason it is not picking up ansid:
RewriteRule ^(.+?)/(step)/([0-9]+)/(id)/([0-9]+)/(start)/([0-9]+)/(end)/([0-9]+)/(ansid)/([0-9]+)/?$ index.php?url=$1&$2=$3&$4=$5&$6=$7&$8=$9&$10=$11 [NC,L,QSA]

$10 and $11 won't work because, per the Apache mod_rewrite manual:
RewriteRule backreferences:
These are backreferences of the form $N (0 <= N <= 9). $1 to $9 provide access to the grouped parts (in parentheses) of the pattern, from the RewriteRule which is subject to the current set of RewriteCond conditions. $0 provides access to the whole string matched by that pattern.
You need to refactor your rule so that it uses backreferences only up to $9. Writing the fixed words end and ansid as literals instead of capturing them frees two group numbers.
Your rule can be rewritten as:
RewriteRule ^(.+?)/(step)/([0-9]+)/(id)/([0-9]+)/(start)/([0-9]+)/end/([0-9]+)/ansid/([0-9]+)/?$ index.php?url=$1&$2=$3&$4=$5&$6=$7&end=$8&ansid=$9 [NC,L,QSA]
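To see why $10 fails while the nine-group version works, here is a rough Python emulation. It assumes mod_rewrite reads only a single digit after $ (as the manual's 0 <= N <= 9 range implies), so $10 would expand to $1 followed by a literal 0. The helper names and the sample path are made up for illustration, and Python's re is only standing in for Apache's regex engine.

```python
import re

def expand_single_digit_backrefs(substitution, match):
    """Expand $0..$9 only, so "$10" becomes group 1 followed by a literal "0".

    Assumption: this mirrors the single-digit limit quoted from the manual.
    """
    def repl(m):
        n = int(m.group(1))
        # Groups that do not exist or did not participate expand to "".
        value = match.group(n) if n <= match.re.groups else ""
        return value or ""
    return re.sub(r"\$(\d)", repl, substitution)

# Nine capture groups: "end" and "ansid" are literals, not groups.
PATTERN = re.compile(
    r"^(.+?)/(step)/([0-9]+)/(id)/([0-9]+)/(start)/([0-9]+)"
    r"/end/([0-9]+)/ansid/([0-9]+)/?$",
    re.IGNORECASE,  # stands in for the [NC] flag
)
SUBSTITUTION = "index.php?url=$1&$2=$3&$4=$5&$6=$7&end=$8&ansid=$9"

def rewrite(path):
    """Return the rewritten target for a matching path, else None."""
    m = PATTERN.match(path)
    return expand_single_digit_backrefs(SUBSTITUTION, m) if m else None

sample = "quiz/step/1/id/2/start/3/end/4/ansid/5"  # hypothetical request path
print(rewrite(sample))
# What "$10=$11" would turn into under the single-digit assumption:
print(expand_single_digit_backrefs("$10=$11", PATTERN.match(sample)))
```

Under that assumption the second print shows the url value with 0 and 1 appended rather than ansid and its number, which would explain why ansid never reaches index.php.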

Related

htaccess - how to apply RegEx patterns on the output of another - encapsulation

I'm performing several regular expressions on a string inside a variable in order to clean it up for further use in the htaccess rules, but it seems rather cumbersome to do such a simple thing in several lines:
RewriteCond %{THE_REQUEST} (?<=\s)(.*?)(?=\s)
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} (^.*)?\?
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} /(.*)
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
RewriteCond %{ENV:HREFPATH} (.*)/$
RewriteRule ^(.*)$ - [E=HREFPATH:%1]
How can I reduce this to 2 lines?
Basically I'm looking for a way to encapsulate each as aggregation steps (filter) based on the output of the previous expression, but my humble efforts have failed after trying and web-searching for hours.
The code above does what I need it to do, it's just really ugly (not elegant).
In PHP, or basically any decent(ish) language it could be as simple as:
$HREFPATH = trim(explode("?", explode(" ", $THE_REQUEST)[1])[0], "/");
-but this is NOT a PHP-related question; merely a simple way to explain what I mean, and what I'm trying to achieve.
I know there may be many RegEx patterns that could (theoretically) work here, but it should be compatible with Apache's RegEx engine.
Any input will be rewarded in kind; thanks in advance.
What you are doing in multiple rules can be done in a single rule like this:
RewriteCond %{THE_REQUEST} \s/+([^?]*?)/*[\s?]
RewriteRule ^ - [E=HREFPATH:%1]
RegEx Details:
\s: Match a whitespace
/+: Match 1+ /s
([^?]*?): Lazily match 0 or more of any characters that are not ?. Capture this value in %1
/*: Match 0 or more trailing /s
[\s?]: Must be followed by a ? or a whitespace
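If you want to sanity-check the pattern outside Apache, a quick sketch in Python follows. Python's re is not Apache's engine, but this pattern uses nothing engine-specific as far as I can tell; the function name and the sample request lines are made up.

```python
import re

# Same pattern as in the RewriteCond above.
REQUEST_PATH = re.compile(r"\s/+([^?]*?)/*[\s?]")

def href_path(the_request):
    """Return what %1 would hold for a THE_REQUEST-style line, or None."""
    m = REQUEST_PATH.search(the_request)
    return m.group(1) if m else None

for line in (
    "GET /foo/bar/?x=1 HTTP/1.1",  # trailing slash and query string
    "GET /a/b HTTP/1.1",           # no trailing slash, no query
    "GET / HTTP/1.1",              # site root: empty capture
):
    print(repr(href_path(line)))
```

The leading slashes, trailing slashes and query string are all dropped in one pass, which is what the four separate conditions were doing.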

How do I find the 20th regex match group?

I am writing a RewriteRule inside the .htaccess file in one of my htdocs folders.
The rewriterule looks something like this:
RewriteRule ^index/(blah)/(blah2)/(blah3)..../(blah20)
^^^The above code looks like bad practice--don't worry about that.
Anyways, I heard before that ${20} was the correct way to access the 20th match group in regex, but even though in regex101 my 20th match group is matching blah20, whenever I print out the 20th capture group, I just get ${20}.
Why is this? Am I correctly accessing two digit match groups?
Edit--real rewriterule:
RewriteRule ^a/([\d]*)/(b/([\d]{2}:[\d]{2}:[\d]{2})/?)?(c/(\w*)/?)?(d/([\w]{6})/?)?(e/([\w]{6})/?)?(f/([\w]{6})/?)?(g/([\w]{6})/?)?(h/([\w]{6})/?)?(i/([\w]{6})/?)?(j/([\w]{6})/?)?(k/([\w]{6})/?)?(l/([\w]{6})/?)?(m/([\w]{6})/?)? /index.php?a=$1&b=$3&c=$5&d=$7&e=$9&f=${11}&g=${13}&h=${15}&i=${17}&j=${19}&k=${21}&l=${23}&m=${25} [L]
You cannot use a back-reference number greater than 9, as per the official mod_rewrite documentation.
From Manual:
RewriteRule back-references: These are back-references of the form $N (0 <= N <= 9). $1 to $9 provide access to the grouped parts (in parentheses) of the pattern, from the RewriteRule which is subject to the current set of RewriteCond conditions. $0 provides access to the whole string matched by that pattern.
If you are dealing with that many back-references, it is better to pass the full URI after index/ to index.php and use explode inside the PHP code:
RewriteRule ^index/(.+)$ index.php?q=$1 [L,QSA,NC]
Alternatively, use a non-capturing group to skip the segments you do not need, for example:
RewriteRule ^index(?:/\w+){5}/(blah6)
This matches the 6th folder in the URL and captures it as $1.
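The "pass the whole path and split it in the application" suggestion above could look roughly like the sketch below. It is written in Python rather than PHP, and it assumes the path alternates key/value segments like the a/.../b/.../c/... layout in the real rule; the function name and sample value are hypothetical.

```python
def parse_segments(q):
    """Split "a/12/b/xyz" into {"a": "12", "b": "xyz"}.

    Assumption: segments alternate key, value; a dangling key with no
    value is dropped.
    """
    parts = [p for p in q.split("/") if p]  # ignore empty segments
    return dict(zip(parts[0::2], parts[1::2]))

# q is what index.php would receive from the single-capture rule.
print(parse_segments("a/12/b/10:20:30/c/foo"))
```

This keeps the RewriteRule down to one capture group, so the $9 limit never comes into play.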

String replace dash to plus sign htaccess

I'm rewriting my old URL structure to my new one. The thing is, my old one uses dashes to separate words and my new one uses +'s.
This is my rewrite rule
RewriteRule ^search/files/(.*)/(.*).html?$ http://www.domain.com/search.html?q=$2 [R=301,L]
How could I do a string replace on $2 to replace -'s with +'s?
Thanks
Insert this rule before your existing rule:
RewriteEngine On
# replace the - between $2 and $3 with +
RewriteRule ^(search/files/[^/]+)/([^-]*)-+(.+?\.html?)$ /$1/$2+$3 [NC,L,R]
# your present rule
RewriteRule ^search/files/([^/]+)/([^.]+)\.html?$ http://www.domain.com/search.html?q=$2 [R=301,L,QSA,NC]
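Note that the first rule replaces only one run of dashes per request, so a name with several dashes goes through several redirects before the final rule applies. A rough Python simulation of that loop is below; the function name and sample path are made up, and Python's re is standing in for Apache's engine.

```python
import re

# Pattern of the dash-replacing rule above; IGNORECASE stands in for [NC].
DASH_RULE = re.compile(
    r"^(search/files/[^/]+)/([^-]*)-+(.+?\.html?)$", re.IGNORECASE
)

def follow_redirects(path, limit=20):
    """Apply the dash rule repeatedly, returning every intermediate path."""
    hops = [path]
    for _ in range(limit):  # guard against an endless loop
        m = DASH_RULE.match(path)
        if not m:
            break
        # Equivalent of the substitution /$1/$2+$3 (leading slash omitted).
        path = f"{m.group(1)}/{m.group(2)}+{m.group(3)}"
        hops.append(path)
    return hops

for hop in follow_redirects("search/files/cat/my-old-file.html"):
    print(hop)
```

Each printed line after the first corresponds to one external redirect, so long file names with many dashes cost the client one round trip per dash run.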

.htaccess rewrite

I don't know if this is the right area, but here goes:
I have a RewriteRule
RewriteRule ^(eScience/)?(\w+)/RENDER/(\d+)/(\d+)/P(\d+)\.html$ /RENDER/escience/kids/1016/2063/test.html [L,NC]
that works fine because I've hardcoded the IDs in. Now when I do something like
RewriteRule ^(eScience/)?(\w+)/RENDER/(\d+)/(\d+)/P(\d+)\.html$ /RENDER/escience/kids/$2/2063/test.html [L,NC]
The rewrite doesn't work, I get page not found. The really odd part is that $4 works, so if I do something like
RewriteRule ^(eScience/)?(\w+)/RENDER/(\d+)/(\d+)/P(\d+)\.html$ /RENDER/escience/kids/1016/$4/test.html [L,NC]
it works, but anything 3 and under doesn't work. Any ideas? The URL that I am using is
http://www.escience.ca/kids/RENDER/1016/2063/P2063.html
As you can see, $3 and $4 are the exact same IDs, so that's why my third example works.
Look at your regex groups:
RewriteRule ^(eScience/)?(\w+)/RENDER/(\d+)/(\d+)/P(\d+)\.html$ /RENDER/escience/kids/$2/2063/test.html [L,NC]
$1 = (eScience/)?, $2 = (\w+), $3 = (\d+), $4 = (\d+), $5 = (\d+)
It should be obvious why it doesn't work: $2 is not the number you expected. For the URL above, $2 is "kids" and the two IDs are $3 and $4. Maybe you should use named groups for complex regular expressions if you lose track of the numbering. By the way, you can keep a group from being captured by using the ?: operator (for example "(?:ungrouped)(dollar1)(dollar2)").
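A quick way to check the numbering is to run the same pattern against the path in any regex tool. Here is a sketch in Python; it assumes the rule sees the path without the leading slash, as it would in a per-directory .htaccess, and the variable names are made up.

```python
import re

# Same pattern as the RewriteRule; IGNORECASE stands in for [NC].
RULE = re.compile(
    r"^(eScience/)?(\w+)/RENDER/(\d+)/(\d+)/P(\d+)\.html$", re.IGNORECASE
)

# Path part of http://www.escience.ca/kids/RENDER/1016/2063/P2063.html
m = RULE.match("kids/RENDER/1016/2063/P2063.html")

# Map each backreference name to what it captured.
groups = {f"${i}": m.group(i) for i in range(1, 6)}
for name, value in groups.items():
    print(name, "=", value)
```

$1 is the optional eScience/ prefix, which does not take part in the match for this URL, so everything else sits one position later than the question assumed.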

Regex pattern help (I almost have it, just need a bit of expertise to finish it)

I need to match two cases
js/example_directory/example_name.js
and
js/example_directory/example_name.js?12345
(where 12345 is a digit string of unknown length and the directory can be limitless in depth or not exist at all)
I need to capture in both cases everything between js/ and .js
and if ? exists capture the digit string after ?
This is what I have so far
^js/(.*).js\??(\d+)?
This works except it also captures
js/example_directory/example_name.js12345
I want the regex to ignore that. Any suggestions?
Thank you all!
Answer:
Using Gumbo's information my final rewrite rule is as follows.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ js.php?f=$1.js&v=%0 [L]
</IfModule>
Include the whole query string pattern, including the ?, in one optional group, and anchor the end so that trailing digits without a ? are rejected:
^js/(.*)\.js(\?\d+)?$
mod_rewrite's RewriteRule directive tests only the URI path, not the query string. So a rule like the following already matches both URIs:
RewriteRule ^js/(.*)\.js$ …
If you now want to test the query too, you need to use an additional RewriteCond:
RewriteCond %{QUERY_STRING} ^\d*$
RewriteRule ^js/(.*)\.js$ …
The match of the last successful RewriteCond can be referred to with %n, so use %0 for the whole match or, in this case, even just %{QUERY_STRING} directly.
As far as regular expressions go, you can use (?:) (a non-capturing group) to make \?(\d+) optional as one chunk, like so:
^js/(.*)\.js(?:\?(\d+))?$
You don't strictly need the ?: (non-capturing) part, but if you leave it out the back references shift: 1 will point at the filename, 2 will point at ?1234 and 3 will be 1234.
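To check the three cases from the question, here is a small Python sketch of a non-capturing pattern of this shape, with the dot escaped and the end anchored by $. Python's re is standing in for whatever engine you use, and the helper name is made up.

```python
import re

# Escaped dot, optional "?digits" chunk, anchored at both ends.
JS_URL = re.compile(r"^js/(.*)\.js(?:\?(\d+))?$")

def parse_js_url(url):
    """Return (path, version) for a matching URL, or None if it is rejected."""
    m = JS_URL.match(url)
    return m.groups() if m else None

for url in (
    "js/example_directory/example_name.js",
    "js/example_directory/example_name.js?12345",
    "js/example_directory/example_name.js12345",  # should be ignored
):
    print(url, "->", parse_js_url(url))
```

Without the trailing $ the third URL would still match on its js/....js prefix, which is the behavior the question wanted to get rid of.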