I updated a website to a new system, and now we have a ton of redirects to handle.
Many of them fall into the same general pattern - old links ending in .html or .php, with certain keywords (product names) in the URL.
Instead of writing an explicit redirect for each case (already up to 1500+, and still growing), I was thinking there's a way to handle them with an AND/OR statement.
For example:
If the OLD URL contains "SKU12345" AND ".html", it should be redirected to /products/SKUGROUP1/SKU12345.
This way, whether the old URL looks like
"/products/oldsubcategory/something/cool-widget-SKU12345.html"
OR
"/something/really-old-version-of-SKU-12345.html"
it should redirect to the same new page.
In other words, I want to catch any links that contain a specific product model/SKU/keyword, AND the extension .html or .php, and redirect them to the new URL (which doesn't have an ending).
I can't just say "if the old URL contains this SKU/keyword", because the new URLs also contain the SKU/keywords, and it would cause a redirect loop. It has to specifically contain .html/.php.
Is this possible to do? If so, can anyone show me the proper syntax?
Thanks!
You can use this redirect rule in root .htaccess to match both cases:
RewriteEngine On
RewriteRule (^|[/-])SKU-?12345\.(?:html?|php)$ /products/SKUGROUP1/SKU12345 [L,NC,R=301]
This will match both cases:
"/products/oldsubcategory/something/cool-widget-SKU12345.html"
OR
"/something/really-old-version-of-SKU-12345.html"
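If you want to check a pattern like this before deploying it, a quick offline test can help. The sketch below is not Apache itself: it uses Python's re module as a stand-in for Apache's PCRE engine, with a pattern adapted to the SKU12345 example from the question. The helper name redirect_for is made up for illustration. It also checks that the new extensionless URL is not matched, which is what prevents the redirect loop.

```python
import re

# Assumption: Python's re stands in for Apache's PCRE; the pattern below only
# uses syntax that both engines share. IGNORECASE mirrors the [NC] flag.
PATTERN = re.compile(r"(^|[/-])SKU-?12345\.(?:html?|php)$", re.IGNORECASE)
TARGET = "/products/SKUGROUP1/SKU12345"


def redirect_for(path):
    """Return the redirect target for an old URL path, or None if no match.

    In a root .htaccess, RewriteRule sees the path without its leading slash,
    so the slash is stripped before matching.
    """
    return TARGET if PATTERN.search(path.lstrip("/")) else None


old_urls = [
    "/products/oldsubcategory/something/cool-widget-SKU12345.html",
    "/something/really-old-version-of-SKU-12345.html",
]
for url in old_urls:
    print(url, "->", redirect_for(url))

# The new URL has no .html/.php ending, so it is left alone (no loop).
print(TARGET, "->", redirect_for(TARGET))
```

One rule per SKU is still needed with this approach, but each rule covers every old URL variant for that SKU.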
I would like to remove the .html extension from my URLs located in a specific directory, and 301 redirect the old URLs to the new ones.
Here is what the structure looks like:
mysite.com/category/nameofcategory/pagenumber.html
The thing is that nameofcategory and pagenumber could be any letter or number.
Could you please help me with this?
I wouldn't recommend having your content scattered in many html-files in different folders. This becomes very impractical if you for example want to change the layout of your pages.
Storing the content in a database is a much better solution. If that's not possible perhaps the html files could contain only the formatted text content and a back end script could embed that content to a layout when the page is requested.
In both cases all of the requests would be routed through the back end script. This requires that the mod_rewrite module is enabled in the Apache configuration.
The .htaccess might look something like this:
RewriteEngine on
RewriteRule ^category/([^/.]+)/([^/.]+)/?$ index.php?category=$1&page=$2 [L]
This part of the regex: ([^/.]+) matches and captures a string that doesn't contain the characters / or . and is at least one character long. The captured strings can be referenced with $1, $2 and so on.
Now the pretty urls like mysite.com/category/foo/bar work. In addition we need to define a rule that redirects the old urls ending in ".html". The rule required might look something like this:
RewriteRule ^category/([^/.]+)/([^/.]+)\.html$ /category/$1/$2 [R=301,L]
One thing to remember while testing and adjusting the redirects is that the redirect may get cached in the browser which may lead to confusing results when testing.
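Because cached redirects make browser testing unreliable, it can be easier to try the two patterns offline first. This is only a rough sketch using Python's re module in place of Apache's regex engine. The function name route and the "redirect"/"rewrite"/"pass" labels are invented for the example, and the dot before html is escaped here so that it matches only a literal dot.

```python
import re

# Assumption: these mirror the two RewriteRule patterns from the answer above.
PRETTY = re.compile(r"^category/([^/.]+)/([^/.]+)/?$")
OLD_HTML = re.compile(r"^category/([^/.]+)/([^/.]+)\.html$")


def route(path):
    """Classify a request path the way the two rules would handle it."""
    path = path.lstrip("/")  # .htaccess rules see the path without the leading slash
    m = OLD_HTML.match(path)
    if m:
        # Old URL: external 301 redirect to the extensionless form.
        return ("redirect", "/category/{}/{}".format(*m.groups()))
    m = PRETTY.match(path)
    if m:
        # Pretty URL: internal rewrite to the back end script.
        return ("rewrite", "index.php?category={}&page={}".format(*m.groups()))
    return ("pass", path)


for p in ["/category/foo/bar", "/category/foo/12.html", "/about.html"]:
    print(p, "->", route(p))
```

Since ([^/.]+) cannot contain a dot, the pretty URL pattern never matches a path ending in .html, so the two rules do not interfere with each other.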
To remove the .html extension on the URL and 301 redirect to the extensionless URL you can try the following in the .htaccess in your "specific directory":
RewriteEngine On
RewriteBase /specific-directory
RewriteRule ^(.*)\.html$ $1 [R=301,L]
I have a website that is getting a lot of requests for pages that don't exist.
All the requests are based on an existing page, but have RK=0/RS= plus a random string of characters at the end.
For example, the request is:
www.domain.com/folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q-
but the page www.domain.com/folder/article does exist.
I would like to use htaccess to say:
if RK=0/RS= exists, remove it and everything after
but haven't been able to get it working.
All the htaccess rules I can find talk about removing query strings, but I'm guessing that because this doesn't have a ? it's not a query string.
Could someone help me understand how to do this?
Someone found where this mess is coming from.
http://xenforo.com/community/threads/server-logs-with-rk-0-rs-2-i-now-know-what-these-are.73853/
It looks like it is actually NOT malicious. Something is broken in Yahoo's rewrites, which creates URLs that point to pages that don't exist.
The demo described on xenforo does replicate it, and the pattern of the URLS that Yahoo is producing:
http://r.search.yahoo.com/_ylt=A0SO810GVXBTMyYAHoxLBQx./RV=2/RE=1399899526/RO=10/RU=http%3a%2f%2fkidshealth.org%2fkid%2fhtbw%2f/RK=0/RS=y2aW.Onf1Hs6RISRJ9Hye6gXvow-
Sure does look like the RV=, RE=, RU=, RK=, RS= values are of the same family. It's just that somewhere the arg concatenation is screwing up on their side.
You can use this rule in root .htaccess file:
RewriteEngine On
RewriteRule ^(folder/article/)RK=0/RS= /$1 [L,NC,R=301]
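To see what the rule above captures, here is a small offline check in Python (re standing in for Apache's regex engine, IGNORECASE for [NC]). The second pattern, GENERAL, is not from the answer: it is a hypothetical generalization that would strip the RK=0/RS= suffix from any path rather than only folder/article/, and both function names are made up for this sketch.

```python
import re

# The pattern from the rule above, limited to one known page.
SPECIFIC = re.compile(r"^(folder/article/)RK=0/RS=", re.IGNORECASE)

# Hypothetical generalization (an assumption, not part of the answer):
# capture everything before the junk suffix, whatever the page is.
GENERAL = re.compile(r"^(.*?)/?RK=0/RS=", re.IGNORECASE)


def strip_specific(path):
    """Redirect target for the specific rule, or None if it does not match."""
    m = SPECIFIC.match(path.lstrip("/"))
    return "/" + m.group(1) if m else None


def strip_general(path):
    """Redirect target for the generalized pattern, or None if it does not match."""
    m = GENERAL.match(path.lstrip("/"))
    return "/" + m.group(1) if m else None


bad = "/folder/article/RK=0/RS=M9j32OWsFAC_u8I6a0xOMjYKU_Q-"
print(strip_specific(bad))
print(strip_general(bad))
print(strip_general("/folder/article"))
```

Note that the specific rule keeps the trailing slash in its capture, while the generalized pattern drops it.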
I have a long list of URLs from an old site structure that I need to redirect using RedirectMatch to new URLs on the same domain. The trick is that the old (source) URLs contain a bunch of messy variables. That's not a problem, right? I just put in place some sweet little Regexp statements to handle those variables. That's what I did, and it matches the variables correctly. Sweet!
The problem comes with the second part of the RedirectMatch statement - the destination. RedirectMatch is correctly resolving the old URLs to the new URLs, except that the old variables are appended to the new URL. I want to keep the redirects, but have the destination URL not contain the variables. Here is my code:
RedirectMatch 301 ^/Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html
Actual Redirect URL:
http://www.website.com/garage.html?launch_pg=itemZoomView&launch_sel=1009152&launch_pg_sp=true&title=Pig+Waste+Can
Can anybody point me to what I am doing wrong here? I just want to get rid of those crummy old variables and start fresh.
If by "variables" you mean query string (the launch_pg=itemZoomView&launch_sel=1009152&launch_pg_sp=true&title=Pig+Waste+Can part of your destination URL example), then use this redirect rule:
RedirectMatch 301 ^/Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html?
The only difference is the ? at the end of the new URL. This stops the old query string from being copied over, because we are telling Apache that the new URL has its own (empty) query string.
If you can use mod_rewrite, here is the rule:
# Activate Rewrite Engine
Options +FollowSymLinks
RewriteEngine On
# the rewrite rule
RewriteRule ^Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html? [R=301,L]
This definitely will redirect with empty query string.
I want to have the following URLs on my page:
www.domain.com/<module>/<function>/<query>=<string>/<query>=<string>/<query>=<string>
I know how to match the part with the module and function to valid urls like this:
www.domain.com/index.php?module=<module>&function=<function>
But I have no idea how I can append all those query=string-parameters to the query string.
I currently use RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9_]+)$ index.php?module=$1&function=$2 [NC] as my rule and would like to add those (optional and repeatable) query-string parts.
I hope someone knows more about htaccess and regexp than me xD
These rules need to be placed in .htaccess file in website root folder.
RewriteRule ^(.+)/([a-z0-9_]+)=([^/]+)/?$ $1/?$2=$3 [NC,N,DPI,QSA]
RewriteRule ^([a-z0-9_]+)/([a-z0-9_]+)/?$ /index.php?module=$1&function=$2 [NC,QSA,L]
They will rewrite URL (internally) from this form
http://www.example.com/main/job/p1=value/p2=something+else/PP=yes
into this form
http://www.example.com/index.php?module=main&function=job&p1=value&p2=something+else&PP=yes
These rules need to be placed somewhere on the top of .htaccess -- first rule uses [N] flag which tells Apache to start rewriting from start again (in order to rewrite all <query>=<string> fragments). If you have a lot of rules before this one, Apache will have to "probe" each rule after each iteration, which may put unnecessary load on web server.
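The looping behaviour of the [N] flag can be hard to picture, so here is a rough Python simulation of the two rules. It is only a model, not Apache: the while loop plays the role of [N] restarting the rule set, and prepending the new pair to the existing query string imitates [QSA]. The function name rewrite is invented for the example.

```python
import re

# The two RewriteRule patterns from the answer; IGNORECASE mirrors [NC].
PARAM = re.compile(r"^(.+)/([a-z0-9_]+)=([^/]+)/?$", re.IGNORECASE)
DISPATCH = re.compile(r"^([a-z0-9_]+)/([a-z0-9_]+)/?$", re.IGNORECASE)


def rewrite(path, query=""):
    """Model the internal rewrite; return the final URL or None if no rule applies."""
    path = path.lstrip("/")
    # First rule with [N]: peel off the last name=value segment, then start over.
    while True:
        m = PARAM.match(path)
        if not m:
            break
        path = m.group(1) + "/"
        pair = "{}={}".format(m.group(2), m.group(3))
        # [QSA]: the new pair goes in front of the query string built so far.
        query = pair + ("&" + query if query else "")
    # Second rule: map the remaining module/function pair to index.php.
    m = DISPATCH.match(path)
    if not m:
        return None
    base = "module={}&function={}".format(m.group(1), m.group(2))
    return "/index.php?" + base + ("&" + query if query else "")


print(rewrite("/main/job/p1=value/p2=something+else/PP=yes"))
```

Each pass through the loop corresponds to one full restart of rule processing in Apache, which is why the answer suggests keeping these rules near the top of the file.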
Either I am too tired to see what I am doing wrong or there is something important I am missing here.
Basically I have a simple set of rewrite rules which are used in conjunction with a central dispatcher file (index.php) to handle requests coming for HTML, CSS and JavaScript files separately and they look like this.
RewriteEngine on
RewriteRule (.+)\.html$ index.php?action=view&url=$1.html [L]
RewriteRule (.+)\.css$ index.php?action=resource&type=css&url=$1.css [L]
RewriteRule (.+)\.js$ index.php?action=resource&type=js&url=$1.js [L]
Long story short, these rules work fine. However, the SEO agency responsible for the site has notified me that one of the URLs is wrong and needs to be permanently redirected (301) to the correct link. Since it's just one URL that requires redirecting, I chose to use Redirect instead of URL rewriting and added the following rule.
Redirect 301 /page1.html /page2.html
This works too, except that after the external redirect for page1.html is done, the query part (?action=view&url=page2.html) is displayed in the browser's address bar. I understand that the HTML rewrite rule simply added the query string after it was done with the URL, but what do I need to do to get rid of the query part after a 301 redirect is performed?
Just to add I tried the URL rewrite method too but it seems that whatever I do the L flag is simply ignored and the HTML rewrite rule is still executed.
RewriteRule ^page1\.html$ page2.html [L,R=301]
That's a rewrite redirect and should cut off the query string. Put it before your other 3 rules, otherwise it will be ignored.
I don't know how much the solution may change with the web-server and the web-server version, but what worked for me was "When you want to erase an existing query string, end the substitution string with just a question mark".
See "Modifying the Query String" at http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule (Apache v2.4)
So,
RewriteRule ^page1\.html$ page2.html? [L,R=301]
The R flag is needed for the new URI to be shown instead of the original one with the query string. But even without the R flag, the query string will not be passed.
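As a rough illustration of the documented query string handling, here is a simplified Python model. It is an assumption-laden sketch, not Apache code: it covers only the case without the QSA flag, and the function name resulting_url is made up. If the substitution contains no ?, the original query string is passed through; if it contains a ?, whatever follows it (possibly nothing) replaces the original.

```python
def resulting_url(substitution, original_query):
    """Simplified model of how a RewriteRule substitution treats the query string.

    Assumption: no QSA flag is used. This models the documented behaviour,
    not the actual Apache implementation.
    """
    path, has_question_mark, new_query = substitution.partition("?")
    if has_question_mark:
        # A ? in the substitution replaces the old query string,
        # and a bare trailing ? therefore erases it.
        query = new_query
    else:
        # No ? at all: the original query string is passed through unchanged.
        query = original_query
    return path + ("?" + query if query else "")


old_query = "action=view&url=page2.html"
print(resulting_url("page2.html", old_query))
print(resulting_url("page2.html?", old_query))
```

The first call keeps the old query string and the second drops it, which matches the difference the trailing question mark makes in the rule above.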