.htaccess rewrite rule remove everything after RK=0/RS= - regex

I have a website that is getting a lot of requests for pages that don't exist.
All the requests are based on an existing page, but have RK=0/RS= plus a random string of characters at the end.
For example, the request is:
www.domain.com/folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q-
but the page www.domain.com/folder/article does exist.
I would like to use htaccess to say:
if RK=0/RS= exists, remove it and everything after
but haven't been able to get it working.
All the htaccess rules I have found talk about removing query strings, but I'm guessing that because this doesn't have a ? it's not a query string.
Could someone help me understand how to do this?

Someone found where this mess is coming from.
http://xenforo.com/community/threads/server-logs-with-rk-0-rs-2-i-now-know-what-these-are.73853/
It looks like it's actually NOT malicious. Something is broken in Yahoo's redirect URLs, which produces links that point to pages that don't exist.
The demo described on xenforo does replicate it, and this is the pattern of the URLs that Yahoo is producing:
http://r.search.yahoo.com/_ylt=A0SO810GVXBTMyYAHoxLBQx./RV=2/RE=1399899526/RO=10/RU=http%3a%2f%2fkidshealth.org%2fkid%2fhtbw%2f/RK=0/RS=y2aW.Onf1Hs6RISRJ9Hye6gXvow-
Sure does look like the RV=, RE=, RU=, RK=, RS= values are of the same family. It's just that somewhere the arg concatenation is screwing up on their side.

You can use this rule in root .htaccess file:
RewriteEngine On
RewriteRule ^(folder/article/)RK=0/RS= /$1 [L,NC,R=301]
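If the bad suffix shows up on more than one page, a more general variant might look like the sketch below. It assumes the rule lives in the root .htaccess and that RK=0/RS= never occurs in a legitimate URL on the site. The lazy (.*?) capture and the optional slash are my own additions, not part of the rule above.

```apache
RewriteEngine On
# Capture everything before an optional slash followed by RK=0/RS=,
# then redirect to just that captured part (assumption: applies site-wide).
RewriteRule ^(.*?)/?RK=0/RS= /$1 [L,NC,R=301]
```

With this, a request for /folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q- would be redirected to /folder/article.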

Related

How to redirect from specific subdirectory to a subdomain via .htaccess?

I've been trying to redirect this URL (and all its substructures):
http://example.com/archive/
to (and its corresponding substructures):
http://archive.example.com/
For example: http://example.com/archive/signature/logo.png ==> http://archive.example.com/signature/logo.png
I tried to generate an .htaccess rule using a generator and evaluating it by looking at the regex, which I can understand (I think).
The result was the following rule:
RewriteEngine On
RewriteRule http://example.com/archive/(.*) http://archive.example.com/$1 [R=301,L]
The way I see it, the server will process any URL that starts with http://example.com/archive/ , capture the string that comes next, replace the whole initial portion with the subdomain structure, and append the captured string.
Unfortunately, this doesn't seem to work either on my server or on online testing tools such as: http://htaccess.madewithlove.be/
Is there anything I'm missing there?
Thank you!
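The RewriteRule pattern is matched against the URL path only, not the scheme or host name, so it should not contain http://example.com/. You should be able to try it this way.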
RewriteEngine On
RewriteRule ^archive/(.*)$ http://archive.example.com/$1 [R=301,L]
Note that I did not make it dynamic, as you didn't specify whether you will have more URLs that need to work this way as well.
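If the subdomain happens to be served from the same document root as the main site, the rule above could also fire for requests to archive.example.com itself. A hedged sketch that restricts it to the main host is below; the host pattern (with an optional www.) is an assumption about your setup.

```apache
RewriteEngine On
# Only redirect when the request came in on the main domain
# (assumption: the main site answers on example.com and www.example.com).
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
RewriteRule ^archive/(.*)$ http://archive.example.com/$1 [R=301,L]
```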

How to rewrite a jump link with htaccess

I currently have rewrites in an htaccess file of mine and need to account for a jumplink.
The issue I believe I am having is that the '#' keeps getting recognized as a comment.
I've seen questions on here suggesting the use of the [NE] or [R] flags, but either I am not using them correctly or they do not do what I need.
My current working rewrite is:
RewriteRule ^news/([^/]*)/([^/]*)/*$ display_news.php?yid=$1&mid=$2 [L]
My idea was to append another segment to the end of the url with something like this:
RewriteRule ^news/([^/]*)/([^/]*)/1/*$ display_news.php?yid=$1&mid=$2#jumplink [L]
With my use of the [NE] and [R] flags I replaced ? with 3F and $ with 24 for hexcodes given by http://www.asciitable.com/. Do I have to enclose these codes with special brackets or something? How would Apache know I don't literally mean 3F or 24.
Currently, when I place these hex codes in my file, I get an internal server error.
If there is a more elegant method to account for jumplinks in an htaccess file I am all ears.
EDIT:
As suggested here are example URLs of what I am expecting.
http://website.com/news/2013/11 would map to display_news.php?yid=2013&mid=11
and
http://website.com/news/2013/11/1 would map to display_news.php?yid=2013&mid=11#jumplink
But I would want the address to remain in the format http://website.com/news/2013/11/1 and just map to the page.
This should work:
RewriteRule ^news/([^/]*)/([^/]*)/1/?$ /display_news.php?yid=$1&mid=$2#jumplink [L,NC,QSA,NE,R=302]
I suggest you provide examples of the URIs that you want to match and what your target URI is.
The #jumplink part of the URI that you've rewritten to is completely meaningless to the server. The URL fragment (the #jumplink part) is used by browsers and javascript running on browsers. It's not even passed to php.
You can try adding an R flag to externally redirect the browser but I'm guessing that's not what you want.
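To illustrate the external-redirect option, here is a minimal sketch. It assumes you are fine with the address bar ending up as /news/2013/11#jumplink rather than /news/2013/11/1, which is not quite what the question asks for. The NE flag keeps the # from being escaped to %23 in the redirect.

```apache
RewriteEngine On
# Externally redirect /news/<yid>/<mid>/1 to the normal URL plus a fragment.
# NE stops mod_rewrite from encoding '#' as %23.
RewriteRule ^news/([^/]+)/([^/]+)/1/?$ /news/$1/$2#jumplink [R=302,NE,L]
# Internally map the normal URL to the script, as before.
RewriteRule ^news/([^/]+)/([^/]+)/?$ display_news.php?yid=$1&mid=$2 [L]
```

The browser handles the fragment after the redirect. The server never sees it on the second request.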

htaccess 301 redirects, not sure about correct regex

Can anyone help me? I am trying to sort out some redirects using htaccess, but I am facing a tricky issue.
The part I cannot figure out is this: I have a system which creates 2 urls for some pages of the following pattern:
products/category (what we are using mostly and what we want, regardless of category depth)
products/subcategory/category (better for SEO maybe but not what we want)
How can I write a 301 redirect to strip out the /subcategory part of the URL and redirect to the shorter version? We have in the region of 80 subcategories, so I don't want to attempt this one by one. I am trying to get to grips with regex but I am totally green with it. I am assuming matching the subcategory isn't too hard, as it always follows /products and always ends with /, but how do I redirect it to the shortened version?
I can't think of more information to add but if I haven't given enough detail let me know, I would love some help with this.
Thanks
Try these lines in your htaccess file:
RewriteEngine on
RewriteRule ^products/[^/]+/(.+)$ /products/$1 [L,QSA,R=301]
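For URLs nested more than one level deep, the rule above strips one segment per redirect, so the browser may go through several hops. A possible single-hop variant is sketched below, assuming the last path segment is always the category you want to keep.

```apache
RewriteEngine on
# Drop every intermediate segment between products/ and the final segment
# (assumption: the final segment is the category).
RewriteRule ^products/(?:[^/]+/)+([^/]+)/?$ /products/$1 [L,R=301]
```

A plain products/category URL has no intermediate segment, so it does not match and is left alone.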

How to remove the query part of the rewritten URL after it has been remotely redirected?

Either I am too tired to see what I am doing wrong or there is something important I am missing here.
Basically I have a simple set of rewrite rules which are used in conjunction with a central dispatcher file (index.php) to handle requests coming for HTML, CSS and JavaScript files separately and they look like this.
RewriteEngine on
RewriteRule (.+)\.html$ index.php?action=view&url=$1.html [L]
RewriteRule (.+)\.css$ index.php?action=resource&type=css&url=$1.css [L]
RewriteRule (.+)\.js$ index.php?action=resource&type=js&url=$1.js [L]
To cut a long story short, these rules work fine. However, I've been notified by the SEO agency responsible for the site that there is an error in one of the URLs, which needs to be permanently redirected (301) to the correct link. Since it's just one URL that requires redirecting, I have chosen to use Redirect instead of URL rewriting and added the following rule.
Redirect 301 /page1.html /page2.html
This works well too, except that after the remote redirection is done for page1.html I get the query part (?action=view&url=page2.html) displayed in the browser's address bar. I understand that the HTML rewriting rule simply added the query string part after it was done with the URL, but what would I need to do to get rid of the query part after a remote 301 redirection is performed?
Just to add, I tried the URL rewrite method too, but it seems that whatever I do the L flag is simply ignored and the HTML rewrite rule is still executed.
RewriteRule ^page1\.html$ page2.html [L,R=301]
That's a rewrite redirect and should cut off the query string. Put it before your other 3 rules, otherwise it will be ignored.
I don't know how much the solution may change with the web-server and the web-server version, but what worked for me was "When you want to erase an existing query string, end the substitution string with just a question mark".
See "Modifying the Query String" at http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule (Apache v2.4)
So,
RewriteRule ^page1\.html$ page2.html? [L,R=301]
The R flag is needed so that the new URI is shown instead of the original one with the query string. But even without the R flag, the query string will not be passed on.
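On Apache 2.4 the QSD (query string discard) flag does the same job as the trailing question mark. A sketch of how the whole block might then look, assuming both pages live at the site root as in the Redirect line from the question:

```apache
RewriteEngine on
# Redirect first, so the catch-all .html rule below never sees page1.html.
# QSD (Apache 2.4+) drops any query string from the redirect target.
RewriteRule ^page1\.html$ /page2.html [R=301,L,QSD]
RewriteRule (.+)\.html$ index.php?action=view&url=$1.html [L]
RewriteRule (.+)\.css$ index.php?action=resource&type=css&url=$1.css [L]
RewriteRule (.+)\.js$ index.php?action=resource&type=js&url=$1.js [L]
```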

Problem with htaccess GET form variables in a rewritten url

Essentially my problem is this: I have an MVC system that redirects all requests to index.php on my site. I have a rewrite rule in my htaccess file to handle those requests like so:
RewriteRule ^([a-zAZ\_\-]+)\/([a-zA-Z\_\-]+)\/([^\/?]*) /?module=$1&class=$2&event=$3
This translates URLs of the following types:
http://example.com/users/login/
http://example.com/users/info/me
My problem is that I also want GET variables to be applied and used in the URL like so
http://example.com/users/login/?var1=val1&var2=val2
http://example.com/users/info/me?var1=val2...
I've written two different regexes that work perfectly well in my workbench (expresso), and I've tested them out in PHP, but they refuse to work in htaccess. They're not particularly complex. I have tried:
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)[\?]*(.*) /?module=$1&class=$2&event=$3&$4
and
^([a-zAZ_\-]+)\/([a-zA-Z_\-]+)\/([^\/\?]*)(?(?=\?)\?(.+)) /?module=$1&class=$2&event=$3&$4
Neither of these works and I'm racking my brains as to why. Essentially it just doesn't recognise the fourth group and returns nothing. I thought it might have been due to it being next to an ampersand, but I tried &var=$4 as a test and it still fell over.
Any help with this would be greatly appreciated as this is driving me insane.
Thanks in advance,
Rupert S.
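The RewriteRule pattern is matched against the URL path only. The query string is never part of it, which is why your fourth group never captures anything. This is what you need: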
RewriteRule ^([a-z_-]+)/([a-z_-]+)/([^/?]*) /?module=$1&class=$2&event=$3 [QSA,NC,L]
[QSA] appends the original GET parameters to the rewritten query string.
[NC] makes the match case-insensitive, so there is no need for A-Z in the character classes.