Apache mod_rewrite doesn't match URL containing percent symbol - regex

If the rule:
RewriteRule myurl /newurl
Matches:
http://localhost:8080/test/xxxmyurlyyy
Why doesn't it match this?
http://localhost:8080/test/xxxmyurly%EDyy
EDIT:
I've just discovered that the UTF-8 encoded form works fine: if I use %C3%AD rather than %ED, the rule matches. I still need to enable Unicode.

It seems to match just fine. Given this:
RewriteEngine On
RewriteLog /tmp/rewrite.log
RewriteLogLevel 5
RewriteRule myurl /serverfault/foo.html
If I fetch this:
curl http://localhost/test/xxxmyurly%EDyy
I see this in /tmp/rewrite.log:
(3) applying pattern 'myurl' to uri '/test/xxxmyurly?yy'
(2) rewrite '/test/xxxmyurly?yy' -> '/serverfault/foo.html'
And I get exactly what I expect (a document with the content "this is a test").
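I suspect your problem is not what you think it is. Enable RewriteLog (on
Apache 2.4, where RewriteLog was removed, raise LogLevel for the rewrite
module instead, e.g. rewrite:trace5), see what shows up, and then post your
results here. Seeing more of (ideally, all of) your configuration would help, too.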
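To see why the log shows a ? where the %ED byte was, it can help to decode both forms of the URL outside Apache. The sketch below uses Python's urllib rather than Apache, so it only illustrates the byte-level difference between the two encodings; the helper name decode_path is made up for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_path(path):
    """Percent-decode a URL path and try to read the bytes as UTF-8.

    Hypothetical helper for illustration; returns (raw_bytes, text_or_None).
    """
    raw = unquote_to_bytes(path)
    try:
        return raw, raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone 0xED byte is Latin-1 for "í" but is not valid UTF-8.
        return raw, None


latin1_raw, latin1_text = decode_path("/test/xxxmyurly%EDyy")
utf8_raw, utf8_text = decode_path("/test/xxxmyurly%C3%ADyy")

print(latin1_raw, latin1_text)
print(utf8_raw, utf8_text)
# The literal substring "myurl" is present in both decoded paths.
print(b"myurl" in latin1_raw, b"myurl" in utf8_raw)
```

Both decoded paths contain "myurl", which is consistent with the rule matching in either case; only the %ED form fails to decode as UTF-8.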

Related

htaccess regex variable parameter

I'm not used to regex and I've already lost too many hours trying to resolve this, so I thought I'd ask for help. I am trying to hide the .html extension so the URLs look prettier.
My site will use URLs that have variable parameters. For example:
mysite.com/article/this-is-an-entry
mysite.com/article/this-is-an-entirely-different-entry
All will use .html as the extension.
In the htaccess file, I have tried
RewriteRule ^(article\/[a-z].*)$ $1.html [NC,L]
as well as slight variations of this, but cannot get this right. Thanks in advance for any assistance.
Firstly, let's look at the regex you have:
^(article/[a-z].*)$
This matches exactly the string "article/", followed by exactly one letter (case insensitive due to the NC flag), followed by zero or more of any character. It's quite broad, but should match the examples you gave.
One way to test that it's matching is to add the R=temp flag to the rule, which tells Apache to redirect the browser to the new URL (I recommend using "=temp" to stop the browser caching the redirect and making later testing harder). You can then observe (e.g. in your browser's F12 debug console) the original request and the redirected one.
RewriteRule ^(article/[a-z].*)$ $1.html [NC,L,R=temp]
However, as CBroe points out, your rule will match again on the target URL, so you need to prevent that. A simple way would be to use the END flag instead of L:
Using the [END] flag terminates not only the current round of rewrite processing (like [L]) but also prevents any subsequent rewrite processing from occurring in per-directory (htaccess) context.
So:
RewriteRule ^(article/[a-z].*)$ $1.html [NC,END]
Alternatively, you can make your pattern stricter, such as changing the . ("anything") to [^.] ("anything other than a dot"):
^(article/[a-z][^.]*)$
To be even more specific, you can add a RewriteCond that excludes requests the rule should not apply to, such as anything ending in ".html".
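The difference between the broad and the stricter pattern can be sketched outside Apache. The following uses Python's re module as a stand-in for Apache's regex engine and simulates repeated rewrite rounds; the function name rewrite and the round limit of 5 are assumptions for illustration, not Apache behavior.

```python
import re

# Original pattern (NC flag approximated with IGNORECASE).
BROAD = re.compile(r"^(article/[a-z].*)$", re.IGNORECASE)
# Stricter pattern: no dot allowed after the first letter.
STRICT = re.compile(r"^(article/[a-z][^.]*)$", re.IGNORECASE)


def rewrite(pattern, path, max_rounds=5):
    """Append .html while the pattern still matches, up to max_rounds times."""
    rounds = 0
    while rounds < max_rounds:
        m = pattern.match(path)
        if m is None:
            break
        path = m.group(1) + ".html"
        rounds += 1
    return path, rounds


print(rewrite(BROAD, "article/this-is-an-entry"))
print(rewrite(STRICT, "article/this-is-an-entry"))
```

The broad pattern keeps matching its own output until the round limit is hit, while the strict one stops after a single rewrite because the result now contains a dot.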

How to redirect from specific subdirectory to a subdomain via .htaccess?

I've been trying to redirect this URL (and all its substructures):
http://example.com/archive/
to (and its corresponding substructures):
http://archive.example.com/
For example: http://example.com/archive/signature/logo.png ==> http://archive.example.com/signature/logo.png
I generated an .htaccess rule using a generator and evaluated it by looking at the regex, which I can understand (I think).
The result was the following rule:
RewriteEngine On
RewriteRule http://example.com/archive/(.*) http://archive.example.com/$1 [R=301,L]
The way I see it, the server will process any URL that starts with http://example.com/archive/, capture the string that comes next, replace the whole initial portion with the subdomain structure, and append the captured string.
Unfortunately, this doesn't seem to work either on my server or on online testing tools such as: http://htaccess.madewithlove.be/
Is there anything I'm missing there?
Thank you!
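RewriteRule matches only the URL path (in .htaccess, without the leading slash), never the scheme or host name, so a pattern starting with http://example.com/ can never match. You should be able to do it this way.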
RewriteEngine On
RewriteRule ^archive/(.*)$ http://archive.example.com/$1 [R=301,L]
Note that I did not make it dynamic, as you didn't specify whether you will have more URLs that need to work this way.
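A quick way to see why the generated rule never fires is to test both patterns against the string a per-directory RewriteRule actually sees. This sketch uses Python's re module instead of Apache and assumes the .htaccess file sits in the document root, so the path arrives without a leading slash; redirect_target is a hypothetical helper.

```python
import re

# What the pattern is compared against for
# http://example.com/archive/signature/logo.png (assumed: .htaccess in the document root).
path = "archive/signature/logo.png"

generated = re.compile(r"http://example.com/archive/(.*)")  # rule from the generator
fixed = re.compile(r"^archive/(.*)$")                       # rule from the answer


def redirect_target(pattern, path):
    """Return the redirect URL if the pattern matches the path, else None."""
    m = pattern.search(path)
    if m is None:
        return None
    return "http://archive.example.com/" + m.group(1)


print(redirect_target(generated, path))
print(redirect_target(fixed, path))
```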

.htaccess rewrite rule remove everything after RK=0/RS=

I have a website that is getting a lot of requests for pages that don't exist.
All the requests are based on an existing page, but have RK=0/RS= plus a random string of characters at the end.
For example, the request is:
www.domain.com/folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q-
but the page www.domain.com/folder/article does exist.
I would like to use htaccess to say:
if RK=0/RS= exists, remove it and everything after
but haven't been able to get it working.
All the htaccess rules I've found talk about removing query strings, but I'm guessing that because this doesn't have a ? it's not a query string.
Could someone help me understand how to do this?
Someone found where this mess is coming from.
http://xenforo.com/community/threads/server-logs-with-rk-0-rs-2-i-now-know-what-these-are.73853/
It looks like it's actually NOT malicious; something is broken in Yahoo's rewrites, which create URLs that point to pages that don't exist.
The demo described on xenforo does replicate it, and shows the pattern of the URLs that Yahoo is producing:
http://r.search.yahoo.com/_ylt=A0SO810GVXBTMyYAHoxLBQx./RV=2/RE=1399899526/RO=10/RU=http%3a%2f%2fkidshealth.org%2fkid%2fhtbw%2f/RK=0/RS=y2aW.Onf1Hs6RISRJ9Hye6gXvow-
Sure does look like the RV=, RE=, RU=, RK=, RS= values are of the same family. It's just that somewhere the arg concatenation is screwing up on their side.
You can use this rule in root .htaccess file:
RewriteEngine On
RewriteRule ^(folder/article/)RK=0/RS= /$1 [L,NC,R=301]
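The effect of the rule can be checked against the sample request with Python's re module standing in for Apache's regex engine. RULE mirrors the pattern above; GENERAL is a hypothetical broader variant (not part of the answer) that keeps everything before the first RK=0/RS= segment, shown only as one way the idea might be extended to other pages.

```python
import re

RULE = re.compile(r"^(folder/article/)RK=0/RS=", re.IGNORECASE)  # pattern from the answer
GENERAL = re.compile(r"^(.*?/)RK=0/RS=", re.IGNORECASE)          # assumed generalization


def clean(pattern, path):
    """Return the redirect target for a matching path, or None if the rule does not apply."""
    m = pattern.search(path)
    if m is None:
        return None
    return "/" + m.group(1)


print(clean(RULE, "folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q-"))
print(clean(RULE, "folder/article"))
print(clean(GENERAL, "other/page/RK=0/RS=abc"))
```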

How to rewrite a jump link with htaccess

I currently have rewrites in an htaccess file of mine and need to account for a jumplink.
The issue I believe I am having is that the '#' keeps getting recognized as a comment.
I've seen questions on here suggesting the use of the [NE] or [R] flags, but either I am not using them correctly or they do not do what I need.
My current working rewrite is:
RewriteRule ^news/([^/]*)/([^/]*)/*$ display_news.php?yid=$1&mid=$2 [L]
My idea was to append another segment to the end of the url with something like this:
RewriteRule ^news/([^/]*)/([^/]*)/1/*$ display_news.php?yid=$1&mid=$2#jumplink [L]
When I tried the [NE] and [R] flags, I replaced ? with 3F and $ with 24, the hex codes given by http://www.asciitable.com/. Do I have to enclose these codes in special brackets or something? How would Apache know I don't literally mean 3F or 24?
Currently, when I place these hex codes in my file, I get an internal server error.
If there is a more elegant method to account for jumplinks in an htaccess file I am all ears.
EDIT:
As suggested here are example URLs of what I am expecting.
http://website.com/news/2013/11 would map to display_news.php?yid=2013&mid=11
and
http://website.com/news/2013/11/1 would map to display_news.php?yid=2013&mid=11#jumplink
But I would want the address to remain in the format http://website.com/news/2013/11/1 and just map to the page.
This should work:
RewriteRule ^news/([^/]*)/([^/]*)/1/?$ /display_news.php?yid=$1&mid=$2#jumplink [L,NC,QSA,NE,R=302]
I suggest you provide examples of the URIs that you want to match and what your target URI is.
The #jumplink part of the URI that you've rewritten to is completely meaningless to the server. The URL fragment (the #jumplink part) is used by browsers and by JavaScript running in browsers. It's not even passed to PHP.
You can try adding an R flag to externally redirect the browser but I'm guessing that's not what you want.
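To illustrate that the fragment never reaches the server, the sketch below splits a URL the way a client does before sending a request. It uses Python's urllib.parse rather than a real browser, and request_target is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return (path-plus-query sent on the request line, fragment kept by the client)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, parts.fragment


target, fragment = request_target(
    "http://website.com/display_news.php?yid=2013&mid=11#jumplink"
)
print(target)    # what the server receives
print(fragment)  # what stays in the browser
```

Since only the first value is sent, an internal rewrite that adds #jumplink has nothing to act on; the fragment only matters when the browser itself is given a new URL, for example through a redirect.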

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused about how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
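The split between the path (seen by RewriteRule) and the query string (seen by RewriteCond %{QUERY_STRING}) can be sketched as follows. This is a Python approximation, not Apache itself; rewrite_old_link is a hypothetical helper, and it assumes per-directory context where the path has no leading slash.

```python
import re
from urllib.parse import urlsplit


def rewrite_old_link(url):
    """Mimic the cond+rule pair: return the new path, or None if it does not apply."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")   # what the RewriteRule pattern is matched against
    query = parts.query             # what %{QUERY_STRING} would hold
    if not re.search(r"^job_wanted\.php$", path):
        return None
    cond = re.search(r"jid=([0-9]+)", query)
    if cond is None:
        return None
    return "/job/" + cond.group(1)  # %1 refers to the RewriteCond capture


print(rewrite_old_link("http://localhost/job_wanted.php?jid=42"))

# A rule pattern that includes the query string never sees it, so it cannot match.
path_only = "job_wanted.php"
print(re.search(r"^job_wanted\.php\?jid=([0-9]+)$", path_only))
```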
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since they will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
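The effect of escaping can be checked with a few sample strings. This sketch uses Python's re module, whose handling of . and ? agrees with Apache's regex engine for patterns this simple; the sample strings are made up for illustration.

```python
import re

unescaped = re.compile(r"^job_wanted.php?$")   # . is any character, ? makes the last p optional
escaped = re.compile(r"^job_wanted\.php\?$")   # . and ? are literal characters

samples = ["job_wanted.php", "job_wantedXph", "job_wanted.php?"]

# Map each sample to (matches unescaped, matches escaped).
results = {s: (bool(unescaped.search(s)), bool(escaped.search(s))) for s in samples}

for sample, outcome in results.items():
    print(sample, outcome)
```

The unescaped form accepts unintended strings such as job_wantedXph, while the escaped form matches only when a literal ? follows job_wanted.php.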
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.