mod_rewrite replace '_' with '-' - regex

I'm almost there with a mod_rewrite rule, but I've caved in :)
I need to rewrite
country/[countryname].php
to
country/[countryname]/
However, [countryname] may contain an underscore, as in 'south_africa.php', and if it does I want to replace it with a hyphen: 'south-africa/'
I also want to match if the country has numbers following it: 'france03.php' to 'france/'
Here's my rule. It's almost there, but it still adds a hyphen even if there is no second part after the underscore.
RewriteRule ^country/(.*)_(.*?)[0-9]*\.php$ country/$1-$2 [R=301,L]
so currently 'country/south_.php' becomes 'country/south-/'
Can someone please help me find the missing piece of the puzzle? Thanks.

Try this:
RewriteRule ^country/([^_]*)_([^_]*?)\d*\.php$ country/$1-$2 [R=301,L]
This rule matches URLs with exactly one underscore; you'll need a different rule for more underscores or none.
If you want to make sure $2 contains only letters and isn't empty, change ([^_]*?) to ([a-zA-Z]+).
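To see which paths the rule above covers, here is a minimal sketch that uses Python's `re` module as a stand-in for Apache's PCRE (the syntax used here is common to both). The `rewrite` helper and the sample paths are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as the RewriteRule above; Python's re is only a stand-in for PCRE.
RULE = re.compile(r"^country/([^_]*)_([^_]*?)\d*\.php$")

def rewrite(path):
    """Return the rewritten path, or None when the rule does not match."""
    m = RULE.match(path)
    return f"country/{m.group(1)}-{m.group(2)}" if m else None

samples = (
    "country/south_africa.php",    # one underscore
    "country/south_africa03.php",  # one underscore plus trailing digits
    "country/south_.php",          # nothing after the underscore
    "country/france03.php",        # no underscore at all
    "country/a_b_c.php",           # two underscores
)
for path in samples:
    print(path, "->", rewrite(path))
```

The last two samples return None, which is why additional rules are needed for names with no underscore or with several. The 'south_' case still yields a trailing hyphen unless the second group is tightened to ([a-zA-Z]+).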

Alternatively you could do it over several passes:
# Pass 1: replace one underscore with a hyphen.
# [N] restarts the rules, so this repeats until no underscore is left.
RewriteRule ^country/([^_]*)_(.*\.php)$ country/$1-$2 [N]
# Pass 2: drop trailing digits (and any trailing hyphen) plus the extension,
# then issue the final redirect.
RewriteRule ^country/(.+?)[-0-9]*\.php$ country/$1/ [R=301,L]
Untested ... and not necessarily "pretty" :)
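As a reference for what the passes are meant to produce, here is a small Python sketch of the intended mapping. It is a specification to test rules against, not a replacement for the rewrite rules; the function name `clean_country_path` is a hypothetical choice.

```python
import re

def clean_country_path(path):
    """Map country/<name>.php to country/<name>/; return None for other paths."""
    m = re.fullmatch(r"country/([^/]+)\.php", path)
    if not m:
        return None
    name = m.group(1).replace("_", "-")   # pass 1: underscores become hyphens
    name = re.sub(r"[-0-9]+$", "", name)  # pass 2: drop trailing digits and stray hyphens
    return f"country/{name}/" if name else None

for path in ("country/south_africa.php", "country/france03.php",
             "country/south_.php", "country/papua_new_guinea12.php"):
    print(path, "->", clean_country_path(path))
```

Running real requests through the server and comparing the redirects with this function's output is one way to check an untested rule set.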

Related

File .htaccess not working with URL that has a dash and 2nd RewriteRule not applied

I have a problem with my .htaccess file. As I understand it, RewriteRule rewrites the URL, but the following two cases don't work.
#1 The first RewriteRule works but the second doesn't work
RewriteRule ^([a-zA-Z0-9_-]+)$ index.php?idcat=$1 [L] #working
RewriteRule ^([a-zA-Z0-9_-]+)$ index.php?idl=$1 [L] #not working
#2 The RewriteRule doesn't work with dash but works with slash and underscore.
RewriteRule ^([a-zA-Z0-9_-]+)-([a-zA-Z0-9_-]+)$ index.php?idl=$1&iddis=$2 [L] #not working
RewriteRule ^([a-zA-Z0-9_-]+)_([a-zA-Z0-9_-]+)$ index.php?idl=$1&iddis=$2 [L] #working
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ index.php?idl=$1&iddis=$2 [L] #working
So how to fix these problems? Does anyone have any suggestions for me?
#1 The first Rewriterule works but the second doesn't work
RewriteRule ^([a-zA-Z0-9_-]+)$ index.php?idcat=$1 [L] #working
RewriteRule ^([a-zA-Z0-9_-]+)$ index.php?idl=$1 [L] #not working
Because you are using the same pattern in both rules, the first rule always "wins" and the second rule is never triggered. This is essentially processed as follows (pseudo-code):
if (the URL matches the pattern "^([a-zA-Z0-9_-]+)$") {
rewrite the request to "index.php?idcat=<url>"
}
elseif (the URL matches the pattern "^([a-zA-Z0-9_-]+)$") {
rewrite the request to "index.php?idl=<url>"
}
As you can see, the second code block is never processed since the expressions are the same.
To put it another way, how would you determine whether a request of the form /foo should be rewritten to index.php?idcat=foo or to index.php?idl=foo? You can't rewrite the request to both.
In this particular case you could perhaps rewrite everything to index.php?id=<url> and let your script decide whether it should be idcat or idl. Otherwise, there needs to be something different about the two URLs (and consequently the patterns you are using to match the URLs) that allows you to determine how the URL should be rewritten.
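The "first rule wins" behaviour can be sketched in Python as a simplified model of top-to-bottom processing with [L]. The `apply_rules` helper and the `cat/` and `list/` prefixes are assumptions made up for illustration; they only show one way of giving the two URLs "something different".

```python
import re

# The two rules from the question: identical patterns, different targets.
SAME = [
    (re.compile(r"^([a-zA-Z0-9_-]+)$"), r"index.php?idcat=\1"),
    (re.compile(r"^([a-zA-Z0-9_-]+)$"), r"index.php?idl=\1"),
]

# Hypothetical fix: a distinct URL prefix per rule (the prefixes are assumptions).
DISTINCT = [
    (re.compile(r"^cat/([a-zA-Z0-9_-]+)$"), r"index.php?idcat=\1"),
    (re.compile(r"^list/([a-zA-Z0-9_-]+)$"), r"index.php?idl=\1"),
]

def apply_rules(path, rules):
    """Return the target of the first matching rule, or the path unchanged."""
    for pattern, target in rules:
        m = pattern.match(path)
        if m:
            return m.expand(target)  # like [L]: stop at the first match
    return path

print(apply_rules("foo", SAME))           # always the idcat target
print(apply_rules("cat/foo", DISTINCT))
print(apply_rules("list/foo", DISTINCT))
```

With identical patterns the second entry is unreachable; once the patterns differ, each request has exactly one rule that can claim it.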
#2 The Rewriterule doesn't work with dash but works with slash and underscore.
RewriteRule ^([a-zA-Z0-9_-]+)-([a-zA-Z0-9_-]+)$ index.php?idl=$1&iddis=$2 [L] #not working
RewriteRule ^([a-zA-Z0-9_-]+)_([a-zA-Z0-9_-]+)$ index.php?idl=$1&iddis=$2 [L] #working
Both these rules have the same problem, depending on the URLs being requested. This is because the patterns/regex you are using are "ambiguous". Each of the two subpatterns (either side of the delimiter) that are used to match the idl and iddis values contains the same character as the expected delimiter, - or _. However, in the 3rd rule (not shown), you are using a / as the delimiter, which does not occur in the surrounding subpatterns, so there is no ambiguity.
For example, how should (or how would you expect) a URL of the form /foo-bar-baz to be matched by the first rule? Since the first subpattern uses the greedy quantifier +, it captures foo-bar, leaving baz for the second subpattern, and the request is rewritten to index.php?idl=foo-bar&iddis=baz.
To avoid this "ambiguity" you need to make sure the delimiter between the subpatterns (i.e. between the values for idl and iddis) is different to the characters used in the subpatterns (or at least one of the two subpatterns).
This can often be resolved by making the regex as specific as possible, i.e. match only the valid characters in idl and iddis.
To begin resolving this issue, you need to first identify the precise URLs you are trying to match, before implementing the rules to match them.
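The greedy split described above is easy to reproduce with Python's `re` module, used here as a stand-in for PCRE. The `SPECIFIC` pattern assumes that neither idl nor iddis may contain a hyphen, which is an assumption about the site's URLs rather than something stated in the question.

```python
import re

# Pattern from the question: "-" is both the delimiter and a valid id character.
AMBIGUOUS = re.compile(r"^([a-zA-Z0-9_-]+)-([a-zA-Z0-9_-]+)$")

# Assumed fix: ids may not contain "-", so the delimiter is unambiguous.
SPECIFIC = re.compile(r"^([a-zA-Z0-9_]+)-([a-zA-Z0-9_]+)$")

def split(pattern, path):
    """Return the (idl, iddis) pair captured by the pattern, or None."""
    m = pattern.match(path)
    return m.groups() if m else None

print(split(AMBIGUOUS, "foo-bar-baz"))  # ('foo-bar', 'baz')
print(split(SPECIFIC, "foo-bar"))       # ('foo', 'bar')
print(split(SPECIFIC, "foo-bar-baz"))   # None
```

With the specific pattern a three-part URL is rejected instead of being split at an arbitrary hyphen, which makes unexpected URLs visible rather than silently misrouted.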

mod_rewrite: match string within URL, which regex to choose?

I would like to use mod_rewrite to capture a string within brackets in my URL and do a redirect.
My URL:
something?var_a=A&var_b=(B)&var_c=C
My .htaccess file with the regex:
RewriteEngine on
RewriteRule ^/?.+var_b=\((.*)\)$ somedir/$1 [R]
I just would like to capture what's in between the round brackets, so my redirect should look something like this: somedir/B
I test my regex at http://htaccess.madewithlove.be/ but I get no match.
I don't know what I am missing here, even if I try much simpler regexes, e.g. .+var_b(.*)$ I get no match. Only if my regex was looking for a pattern at the beginning, I get a match, so for example the regex something(.*)$ works.
What am I missing here?
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)var_b=\((.*?)\)(&|$) [NC]
RewriteRule ^.*$ somedir/%2? [R]
The reason is that RewriteRule never sees the query string (the ?x=y part); it matches only the URL path, so the query string has to be tested with a RewriteCond against %{QUERY_STRING}. %2 refers to the second captured group of the last matched RewriteCond, whereas $2 would refer to the second group of the RewriteRule pattern itself. The ? at the end of the substitution prevents the original query string from being appended to the result.
The (^|&) and (&|$) in the pattern guarantee that var_b=(B) is the complete parameter and not a part of it. Without these, the pattern would also match ?xyzvar_b=(B) or ?var_b=(B)xyz. With these, it will only match ?var_b=(B) or ?a=b&var_b=(B)&x=z etc.
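The effect of the (^|&) and (&|$) anchors can be checked with a short Python sketch. Python's `re` is only a stand-in for PCRE here, and the `redirect_target` helper is a hypothetical name; the function receives the query string alone, mirroring what %{QUERY_STRING} holds.

```python
import re

# Same pattern as the RewriteCond; IGNORECASE plays the role of [NC].
COND = re.compile(r"(^|&)var_b=\((.*?)\)(&|$)", re.IGNORECASE)

def redirect_target(query_string):
    """Return somedir/<captured value>, or None when the condition fails."""
    m = COND.search(query_string)
    return f"somedir/{m.group(2)}" if m else None  # group 2 corresponds to %2

for qs in ("var_a=A&var_b=(B)&var_c=C", "var_b=(B)",
           "xyzvar_b=(B)", "var_b=(B)xyz"):
    print(qs, "->", redirect_target(qs))
```

The first two query strings produce somedir/B, while the last two are rejected because var_b=(B) is not a complete parameter in them.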

Unescapable Periods in Apache 2.0 RewriteRule Regex

So, I have the following RegEx..
RewriteRule ^([-a-z0-9]*[A-Z\.]+.*)$ file.php?string=$1 [QSA]
The URL I want file.php to trigger for must either have capital letters or a period in it, then send the URL to the PHP script.
However, the problem I have is that this script is triggering on any URL, because of the not-truly-escaped Period.
I've tried escaping the period with a backslash, or two backslashes, or three... but none stop the generic interpretation.
What am I doing wrong?
Edit: As an example,
RewriteRule ^([-a-z0-9]*[A-Z\\.]+[-a-z0-9\/]*)$ file.php?string=$1 [QSA]
Doesn't work, but
RewriteRule ^([-a-z0-9]*\\.+[-a-z0-9\/]*)$ file.php?string=$1 [QSA]
does escape it.
Edit 2: Examples of URLs I want to redirect:
/some-page-goes-here.html
/heres-Robs/Old/Page/
And ones I don't:
/testing/one/two/
/an/actual-file.gif
EDIT 3: Old regex was:
RewriteRule ^([-a-z0-9]*[A-Z\.]+[-a-z0-9\/]*)$ file.php?string=$1 [QSA]
But while writing the post, I updated the question's regex to what you see above.
Try:
RewriteCond %{REQUEST_URI} [A-Z] [OR]
RewriteCond %{REQUEST_URI} \.html$
RewriteRule (.*) file.php?string=$1 [QSA]
When using mod_rewrite and you have several URLs to match, it is always better to use RewriteCond to filter and then apply your RewriteRule.
I don't think your problem can be what you think it is: periods in a character class are supposed to mean literal periods, not "any character". If this really is the problem, somehow, then you could change [A-Z\.]+ to ([A-Z]|\.)+; but I doubt it. Some things to try:
what happens if you comment out this line? does that successfully disable this redirect? if not, then obviously the problem isn't with this line. :-)
what happens if you make this a real HTTP redirect, by changing QSA to QSA,R? Does the destination URL look like what you expect? Maybe there are some unexpected periods or uppercase letters? (Warning: this will very likely trigger an infinite redirect loop if you try it in a browser; it'll probably be easier to try submitting the request via port-80 Telnet and seeing the actual HTTP response.)
Also, your rule doesn't quite match how you describe it. For example, your rule wouldn't match a URL like a.b.c, because you only allow uppercase letters and/or dots to occur in a single "clump"; if they're separated by lowercase letters, no match will occur. Is that just because you didn't want to overcomplicate the description?
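Both points (a period inside a character class is literal, and the rule only tolerates a single "clump") can be checked with Python's `re` as a stand-in for PCRE. `OLD` is the pattern from "Edit 3"; `LOOSE` and the `report` helper are hypothetical, and the URLs are written without the leading slash, as a RewriteRule in .htaccess would see them.

```python
import re

OLD = re.compile(r"^([-a-z0-9]*[A-Z\.]+[-a-z0-9\/]*)$")  # pattern from "Edit 3"
LOOSE = re.compile(r"[A-Z.]")  # "contains a capital letter or a period anywhere"

def report(url):
    """Return (matches OLD, matches LOOSE) for a URL path."""
    return bool(OLD.match(url)), bool(LOOSE.search(url))

# A period inside a character class is literal: a lowercase letter does not match.
print(re.fullmatch(r"[A-Z\.]", "x"))  # None

for url in ("some-page-goes-here.html", "heres-Robs/Old/Page/",
            "testing/one/two/", "an/actual-file.gif", "a.b.c"):
    print(url, report(url))
```

In this sketch the old pattern accepts only the .html example. The loose "anywhere" test also accepts an/actual-file.gif, which may be why the RewriteCond answer checks for \.html$ rather than for any period.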

Quick mod-rewrite/regex issue

I am trying to setup my .htaccess file to do some nifty redirects for me.
Right now I have URLs like:
mysite.com/?video=1
I would like to have URLs like:
mysite.com/1/
Right now I have pieced together the following regex:
RewriteRule ^(.*)/?$ /index.php?v=$1 [L]
This works great if the URL is in the format
mysite.com/2
, but NOT if the format is
mysite.com/2/
, NOTE the trailing slash.
So what I really need help with is my regex! :)
Try making the quantifier non-greedy:
^(.*?)/?$
Otherwise the trailing slash is matched by the .*, because it's greedy and the explicit slash is optional.
Stema's idea should work. Or you could just make the regex more specific by e.g. only accepting numbers.
RewriteRule ^([0-9]+)/?$ /index.php?v=$1 [L]
or alphanumeric:
RewriteRule ^([0-9a-zA-Z]+)/?$ /index.php?v=$1 [L]
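The difference between the greedy, non-greedy and numeric-only patterns can be seen in a few lines of Python, with `re` standing in for PCRE. The `capture` helper is a hypothetical name, and the inputs are the path as a rule in the site root's .htaccess would see it.

```python
import re

GREEDY = re.compile(r"^(.*)/?$")
LAZY = re.compile(r"^(.*?)/?$")
NUMERIC = re.compile(r"^([0-9]+)/?$")

def capture(pattern, path):
    """Return what would end up in $1, or None when the rule does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

print(capture(GREEDY, "2/"))    # '2/'  (the slash is swallowed by .*)
print(capture(LAZY, "2/"))      # '2'
print(capture(NUMERIC, "2/"))   # '2'
print(capture(NUMERIC, "abc"))  # None
```

The numeric pattern has the extra benefit of rejecting paths that are not video ids at all, so they never reach index.php with a bogus v value.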

mod_rewrite problems: negation

I'm trying to understand mod_rewrite better and have one particular problem I think I need to get my head round first.
I am rewriting http://www.somesite.tld/a/b/c to index.php?path=a/b/c using the following
RewriteRule ^(?!index.php)(.*)$ index.php?path=$1 [NC,L]
An equivalent rewrite would, in this case, be
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?path=$1 [NC,L]
This does not work without the RewriteCond -- path=index.php would be the result without specifically ignoring files or saying 'not index.php'. Why is this?
Also, what is the ?! and ?: syntax that I sometimes see used? I do not understand the use of the ? when it is not prefixed by anything.
And why, in the first RewriteRule above, do the second pair of brackets return a match for $1?
Cheers
(?= ...) and (?! ...) are special syntax in Perl regular expressions and in PCRE, which is the regex library that Apache uses. They are, respectively, positive and negative lookahead assertions: they match an empty string if the text after them matches or does not match the content in the brackets.
They are non-capturing, so they don't define any $n (it would be pointless, since they match an empty string). That is why the second pair of brackets in your rule is $1. (?: ...) is also non-capturing; it is used to group subexpressions.
Your first rule should work in .htaccess (but not in a virtual host configuration file), though it would be more correct to write it as
RewriteRule ^(?!index\.php$)(.*)$ index.php?path=$1 [L]
Perhaps another rule is interacting with it. You can check exactly what is being matched and rewritten with RewriteLog and RewriteLogLevel (Apache 2.2) or, on Apache 2.4, with LogLevel set to a rewrite:trace level.
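The behaviour of the two lookahead variants can be compared in Python, whose `re` module supports the same (?! ...) syntax as PCRE. The `capture` helper and the sample paths are assumptions for illustration.

```python
import re

LOOSE = re.compile(r"^(?!index.php)(.*)$", re.IGNORECASE)  # rule from the question, with [NC]
STRICT = re.compile(r"^(?!index\.php$)(.*)$")              # escaped dot, anchored with $

def capture(pattern, path):
    """Return group 1 (the rule's $1), or None when the rule does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

for path in ("a/b/c", "index.php", "index.phpinfo", "indexXphp/page"):
    print(path, capture(LOOSE, path), capture(STRICT, path))
```

Group 1 is the second pair of brackets because the lookahead does not capture. The loose form also skips any path that merely starts with something resembling index.php, while the strict form excludes only index.php itself.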
"!" means negation. Like a = 1 (a is equal one) a != 1 (a is not equal one);
"f" means file. So if you use together with "!", like "!-f" would be something "file does not exist". the links below may help you better:
http://www.askapache.com/htaccess/htaccess.html
http://net.tutsplus.com/tutorials/other/using-htaccess-files-for-pretty-urls/
http://corz.org/serv/tricks/htaccess2.php