htaccess redirect url with trailing %20 - regex

We use pretty urls on our site. I had an external technician add back links some years ago. He did a great job, but in one case, he consistently added a link with a trailing space character.
https://www.example.com/item/item/%20
This has been indexed as %20 and I can see on my back link reports that there are 87 sites that point to the URL with %20 at the end.
If I can redirect this, then my page /item/item/ would gain 87 back links.
We use rewrite rules, and I have tried every solution I could find here on Stack Overflow, but none has worked. Some non-working attempts are:
RewriteEngine on
RewriteRule ^(.*[^\ ])\ +$ /$1
RedirectRule (.*)\s$ $1 [R=301]
RewriteRule ^(.*/|)[\s%20]+(.+)$ $1$2
I have also tried Redirect 301 directives, but these don't work either.
redirect 301 /item/item/%20 /item/item/
redirect 301 /item/item/+ /item/item/
Some things that may help: this is not a site-wide pattern. It is just one particular URL that got propagated out into the world incorrectly. And the space is not anywhere else in the string; it is always at the end.
Thanks.
It would also work fine for me to convert the trailing %20 to a known character like a hyphen, because I could then redirect /item/item/- to /item/item/

You can use this rule as your topmost rule, just below the RewriteEngine On line:
RewriteEngine On
RewriteRule ^(.*)(?:\s|\x20)+$ /$1 [L,NE,R=301]
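To see what this pattern captures, here is a rough Python sketch. It assumes that mod_rewrite matches against the %-decoded URL-path and that, in .htaccess, the leading slash is stripped before matching. The function name redirect_target is made up for illustration, and Python's re module is close to, but not identical to, the PCRE engine Apache uses.

```python
import re
from urllib.parse import unquote

# Same pattern as the RewriteRule above.
TRAILING_SPACE = re.compile(r"^(.*)(?:\s|\x20)+$")

def redirect_target(request_path):
    """Return the redirect target, or None when the rule would not match."""
    # Assumption: the rule sees the decoded path, so %20 is already a space.
    decoded = unquote(request_path)
    # Assumption: in .htaccess the leading slash is not part of the match.
    per_dir_path = decoded[1:] if decoded.startswith("/") else decoded
    match = TRAILING_SPACE.match(per_dir_path)
    return "/" + match.group(1) if match else None

print(redirect_target("/item/item/%20"))  # /item/item/
print(redirect_target("/item/item/"))     # None
```

The clean URL produces no match, so the redirect would not loop on its own target.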

Related

.htaccess: Redirect (301) thousands of URLs containing directories to simple URLs

I need to redirect, using .htaccess, a large number of URLs already produced (and already indexed) by WordPress for articles whose paths contain folders/subfolders, to simple URLs without any folder or subfolder name.
Example:
FROM https://www.website.com/Animals/Cats/mycat.html TO https://www.website.com/mycat.html
FROM https://www.website.com/Animals/Dogs/mydog.html TO https://www.website.com/mydog.html
FROM https://www.website.com/Countries/France/bordeaux.html TO https://www.website.com/bordeaux.html
etc...
I have already changed the permalink options in the WordPress config, so the URLs now produced are in the correct format (e.g. https://www.website.com/bordeaux.html) without any folder name.
My problem is redirecting all the old URLs to this new format, to prevent 404s and preserve ranking.
I tried adding this line to my .htaccess:
RewriteRule ^/(.*)\.html$ /$1 [R=301,L,NC]
I also tried the RedirectMatch 301 (.*)\.html$ method, with the same result. I'm going crazy with this.
What am I doing wrong, and could you help me?
Thanks
RewriteRule ^/(.*)\.html$ /$1 [R=301,L,NC]
The URL-path matched by the RewriteRule pattern never starts with a slash. But you can use this to only match the last path-segment when there are more "folders". And the target URL also needs to end in .html (as per your examples).
So, this can be written in a single directive:
RewriteRule /([^/]+\.html)$ /$1 [R=301,L]
This handles any level of nested "folders". But it does not match /foo.html (the target URL) in the document root (due to the slash prefix on the regex), so there is no redirect loop.
(No need for any preceding conditions.)
Here the $1 backreference includes the .html suffix.
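The no-loop claim can be checked with a small Python sketch. It assumes the rule sits in the document-root .htaccess, so the matched path has no leading slash. The helper name flatten is hypothetical.

```python
import re

# Same pattern as the single-directive RewriteRule above.
LAST_SEGMENT = re.compile(r"/([^/]+\.html)$")

def flatten(url_path):
    """Return the redirect target, or None when the rule would not match."""
    # Assumption: in .htaccess the leading slash is removed before matching.
    per_dir_path = url_path[1:] if url_path.startswith("/") else url_path
    match = LAST_SEGMENT.search(per_dir_path)
    return "/" + match.group(1) if match else None

print(flatten("/Animals/Cats/mycat.html"))        # /mycat.html
print(flatten("/Countries/France/bordeaux.html")) # /bordeaux.html
print(flatten("/mycat.html"))                     # None
```

The last call returns None because, once the leading slash is removed, the already-flattened URL contains no slash for the pattern to anchor on.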
Just match the last part of the url and pass it to the redirect:
RewriteRule /([^/]+)\.html$ /$1.html [R=301,L,NC]
It will match any number of directories like:
https://www.website.com/dir1/page1.html
https://www.website.com/dir1/dir2/page2.html
https://www.website.com/dir1/dir2/dir3/page3.html
https://www.website.com/dir1/dir2/dir3/dir3/page4.html

How can I redirect users from a url to another using regex and htaccess?

I want to redirect users from www.example.com/ANYTHING to www.example.com.
Is .htaccess the better way to do it? How can I do it?
If you really just have to remove everything after www.mydomain.com, then you just have to delete everything from the first / to the end of the URL:
Search for this regex
%^([^/]*).*$
and substitute with \1, as I did here. Note that I have used the % sign as delimiter instead of /, so I don't need to escape the / in the regex. (I could have used any other available symbol other than /.)
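As a rough illustration of that search-and-replace, here is the same substitution in Python. It assumes the input has no scheme prefix such as https://, since the pattern stops at the first slash. The function name strip_path is made up for this sketch.

```python
import re

def strip_path(url_without_scheme):
    """Keep everything before the first slash and drop the rest."""
    # Group 1 is the run of non-slash characters at the start of the string.
    return re.sub(r"^([^/]*).*$", r"\1", url_without_scheme)

print(strip_path("www.example.com/ANYTHING"))  # www.example.com
print(strip_path("www.example.com"))           # www.example.com
```

With a scheme present, the first slash belongs to "://", so the result would be cut off there; that case needs a different pattern.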
You'll need to use a mod_rewrite RewriteRule (as opposed to a mod-alias RedirectMatch) in order to avoid conflicts with mod_dir and the DirectoryIndex*1.
For example, in .htaccess (Apache 2.2 and 2.4):
RewriteEngine On
RewriteRule . / [R,L]
The single dot matches something (ie. not the document root) and we redirect to the document root.
However, if the document root is an HTML webpage that links to resources like JavaScript, CSS and images then you need to make exceptions for these resources, otherwise these too will be redirected to the root!
For example:
RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(js|css|jpg|png|gif)$ [NC]
RewriteRule . / [R,L]
*1 A mod_alias RedirectMatch directive such as RedirectMatch /. / ends up matching the rewritten request (by mod_dir) to the DirectoryIndex (eg. index.php) resulting in a redirect loop.
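The combined effect of the condition and the rule can be sketched in Python. This models only the two regex checks, not mod_dir or any other Apache processing, and it assumes the path seen by the RewriteRule has its leading slash removed. The name redirects_to_root is hypothetical.

```python
import re

# Same extension list as the RewriteCond above; [NC] becomes IGNORECASE.
STATIC_ASSET = re.compile(r"\.(js|css|jpg|png|gif)$", re.IGNORECASE)
ANY_CHARACTER = re.compile(r".")

def redirects_to_root(request_path):
    """Return True when the request would be redirected to the root."""
    # RewriteCond with "!": the rule is skipped for static assets.
    if STATIC_ASSET.search(request_path):
        return False
    # Assumption: the rule is matched against the path without its leading slash,
    # so a request for the root itself leaves an empty string that "." cannot match.
    per_dir_path = request_path[1:] if request_path.startswith("/") else request_path
    return ANY_CHARACTER.search(per_dir_path) is not None

print(redirects_to_root("/about"))         # True
print(redirects_to_root("/"))              # False
print(redirects_to_root("/img/Logo.PNG"))  # False
```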

RewriteRule to remove superfluous single "?" in URL

I am using IBM HTTP server configuration file to rewrite a URL redirected from CDN.
For some reason the URL arrives with a superfluous single question mark even when there is no query string. For example:
/index.html?
I'm in the process of setting up a 301 redirect for this. I want to remove the single "?" from the URL, but keep it when there is a query string.
Here's what I tried but it doesn't work:
RewriteRule ^/index.html? http://localhost/index.html [L,R=301]
update:
I tried this rule with the correct regular expression, but it is never triggered either.
RewriteRule ^/index.html\?$ http://localhost/index.html [L,R=301]
I also wrote another rule to rewrite "index.html" to "test.html". When I entered "index.html?" in the browser, it redirected me to "test.html?", not to "index.html".
You need to use a trick since RewriteRule implicitly matches against just the path component of the URL. The trick is looking at the unparsed original request line:
RewriteEngine ON
# literal ? followed by un-encoded space.
RewriteCond %{THE_REQUEST} "\? "
# Ironically the ? here means drop any query string.
RewriteRule ^/index.html /index.html? [R=301]
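The condition works because the request line has the form "METHOD target HTTP/version", so a question mark followed directly by a space means the query string is empty. A minimal Python sketch of that check, with a made-up function name and sample request lines:

```python
import re

# Same pattern as the RewriteCond above: a literal "?" followed by a space.
BARE_QUESTION_MARK = re.compile(r"\? ")

def has_empty_query(request_line):
    """Return True when the request target ends in a lone '?'."""
    return BARE_QUESTION_MARK.search(request_line) is not None

print(has_empty_query("GET /index.html? HTTP/1.1"))     # True
print(has_empty_query("GET /index.html?a=1 HTTP/1.1"))  # False
print(has_empty_query("GET /index.html HTTP/1.1"))      # False
```

Requests that carry a real query string do not match, so they are left alone.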
Question-mark is a Regular Expression special character, which means "the preceding character is optional". Your rule is actually matching index.htm or index.html.
Instead, try putting the question-mark in a "character class". This seems to be working for me:
RewriteRule ^/index.html[?]$ http://localhost/index.html [L,R=301]
($ to signify end-of-string, like ^ signifies start-of-string)
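The "optional preceding character" behaviour of the original, unescaped pattern can be seen in a short Python sketch. The function name and sample paths are made up for illustration. Note that the unescaped dot also matches any character.

```python
import re

# The pattern from the question's first attempt: "?" makes the final "l" optional.
ORIGINAL_RULE = re.compile(r"^/index.html?")

def original_rule_matches(path):
    """Return True when the question's first pattern matches the path."""
    return ORIGINAL_RULE.search(path) is not None

print(original_rule_matches("/index.htm"))   # True
print(original_rule_matches("/index.html"))  # True
print(original_rule_matches("/indexZhtml"))  # True, the dot is unescaped
print(original_rule_matches("/about.html"))  # False
```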
See http://publib.boulder.ibm.com/httpserv/manual60/mod/mod_rewrite.html (for your version of Apache, which is not the latest)
Note from our earlier attempts, escaping the question-mark doesn't seem to work.
Also, I'd push the CDN on why that question-mark is being sent. This doesn't seem a normal pattern.

Need help with a 301 redirect regex

I've moved an old site to WordPress, and now I need to redirect the old links to the new ones.
The old url
mysite.com/2010-11-02-11-05-12/category-old/subcategory/123-article
Here are my redirects:
to category page (working)
RedirectPermanent /2010-11-02-11-05-12/category-old/ /archive/category-new/
to article page (not working)
RedirectPermanent /2010-11-02-11-05-12/(.+?)/^[0-9]+-(.+?)/?$ /$1 [L,R=301]
redirect url (404)
mysite.com/archive/tag/subcategory/123-article
The URL should look like this (the leading number, 123-, is removed):
mysite.com/article
could anyone help?
Change it to:
RewriteEngine on
RewriteRule 2010-11-02-11-05-12/(.+?)/[0-9]+-(.+?)/?$ /$2 [L,R=301]
Which seems to be what you were trying to use. I removed the opening forward slash as they are not included in a match from .htaccess. I removed the caret (^) from before the numbers as that would stop it matching. Let me know any problems.
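To check which backreference holds the article slug, here is a small Python sketch of the same pattern run against the old URL from the question. It assumes the .htaccess-style path without a leading slash. The name capture_groups is hypothetical.

```python
import re

# Same pattern as the rule above.
OLD_ARTICLE = re.compile(r"2010-11-02-11-05-12/(.+?)/[0-9]+-(.+?)/?$")

def capture_groups(per_dir_path):
    """Return the (first, second) capture groups, or None when nothing matches."""
    match = OLD_ARTICLE.search(per_dir_path)
    return match.groups() if match else None

old_path = "2010-11-02-11-05-12/category-old/subcategory/123-article"
print(capture_groups(old_path))
# ('category-old/subcategory', 'article')
```

The first group holds the category path and the second holds the slug without its number, so the desired target /article corresponds to the second group.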

.htaccess - replacing last hyphen (out of many) with a forward slash [regex?]

I have thousands of URLs which are in the following format:
http://www.example.com/sub/worda-wordb-wordc-123456789
However I have external links to my site with the URLs in the following format:
http://www.example.com/sub/worda-wordb-wordc/123456789
I'd like to redirect all URLs from
http://www.example.com/sub/worda-wordb-wordc/123456789
to
http://www.example.com/sub/worda-wordb-wordc-123456789
Please try the following:
RewriteEngine On
# Redirect URI with last slash, replacing with hyphen
RewriteRule ^sub/([\w-]+)/(\d+)/?$ /sub/$1-$2 [R=302,L]
Here, we are checking for letters, digits, underscores, and hyphens with ([\w-]+), digits with (\d+) and an optional slash on the end with /?, just to be sure, and then redirecting it accordingly.
Be sure to make this one of your first rules, and then change 302 to 301 to make the redirect cached by browsers and search engines.
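The rule's behaviour can be sketched in Python as follows. It assumes the .htaccess-style path without a leading slash, and the function name redirect_target is made up for illustration.

```python
import re

# Same pattern as the RewriteRule above.
SLASH_BEFORE_ID = re.compile(r"^sub/([\w-]+)/(\d+)/?$")

def redirect_target(per_dir_path):
    """Return the hyphenated target, or None when the rule would not match."""
    match = SLASH_BEFORE_ID.match(per_dir_path)
    if not match:
        return None
    # $1 is the word part, $2 is the numeric ID; join them with a hyphen.
    return f"/sub/{match.group(1)}-{match.group(2)}"

print(redirect_target("sub/worda-wordb-wordc/123456789"))  # /sub/worda-wordb-wordc-123456789
print(redirect_target("sub/worda-wordb-wordc-123456789"))  # None
```

The hyphenated URL has no second slash, so it does not match and the redirect cannot loop on its own target.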
You can use this .htaccess file:
RewriteBase /
RewriteRule ^sub/(.*)/([0-9]+)$ /sub/$1-$2
Now if you go to http://www.example.com/sub/worda-wordb-wordc/123456789 the URL will be rewritten to http://www.example.com/sub/worda-wordb-wordc-123456789.
If this is not what you were looking for please add more details to your question.
You can use this rule in your site root .htaccess:
RedirectMatch 301 ^/(sub)/(.*)/(\d+)/?$ /$1/$2-$3