How to set up rewrite rule for a list of keywords in the URL? - regex

What I wish to do
I have a number of URLs I need to redirect with a 301 (permanent) redirect. I've determined that doing this at the .htaccess level is most efficient (as opposed to doing it with a function in the WordPress site this relates to).
The URLs to redirect are:
https://www.mydomain.com.au/search-result/?location=victoria
https://www.mydomain.com.au/search-result/?location=new-south-wales
https://www.mydomain.com.au/search-result/?location=queensland
https://www.mydomain.com.au/search-result/?location=south-australia
https://www.mydomain.com.au/search-result/?location=tasmania
https://www.mydomain.com.au/search-result/?location=northern-territory
Where to redirect to
I want to redirect them to the home page: https://mydomain.com.au/ (I might later choose to redirect them all elsewhere, but I can do that part).
NOTE: The query string should be dropped from the redirect.
I am not sure whether it's best to test for all six of those location= variables, or to simply test for the one location= variable that is not to redirect.
The one location= variable that is not to redirect is ?location=western-australia. E.g.,
https://www.mydomain.com.au/search-result/?location=western-australia
Additional considerations
Note that there are other .../search-result/ URLs that have different variables in the query strings, such as ?weather=... or ?water=.... For example, https://www.mydomain.com.au/search-result/?location=victoria&weather=part-shade&water=&pasture=
As seen in that example, it's also possible multiple variables will be in the query string, such as ?location=tasmania&weather=&water=moderate&pasture=.
So I need to test for the presence of the above listed location= irrespective of whether or not it has other variables after it. The location= variable is always the first in the overall query string.
I am thinking it may be as simple as testing for the presence of /search-result/ AND that followed by victoria | tasmania | northern-territory | etc. in the URL. I can't be 100% sure those words (victoria, etc.) won't show up in any other URLs, hence my reason for only redirecting if those words follow either location= or /search-result/. I suspect location= would be a suitable condition.
I've played around with modifying many rewrite rule examples I've found online, and couldn't get anything to work. I'd either get a 501 error (site crash), or nothing would happen at all.
Thank you.

Not sure if you've tried these, but they worked well for me:
To redirect any location value except western-australia:
# The request path is /search-result/ or maybe /search-result
RewriteCond %{REQUEST_URI} ^/search-result/?$
# ..and the query string 'location' is not empty
RewriteCond %{QUERY_STRING} (^|&)location=.+($|&)
# ..and the value is not 'western-australia'.
RewriteCond %{QUERY_STRING} !(^|&)location=western-australia($|&)
# Redirect to the home page. The trailing ? drops the query string.
RewriteRule . /? [R=301,L]
To redirect only certain location values:
RewriteCond %{REQUEST_URI} ^/search-result/?$
# Redirect only certain location values - (<value>|<value>|...).
RewriteCond %{QUERY_STRING} (^|&)location=(victoria|new-south-wales)($|&)
# The trailing ? drops the query string from the redirect.
RewriteRule . /? [R=301,L]
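For the six locations listed in the question, the allow-list variant might look like the sketch below. It assumes Apache 2.4 or later, because it uses the QSD flag to discard the query string (on older versions, a trailing ? on the target does the same job), and it assumes the redirect target is the https://mydomain.com.au/ home page named in the question.

```apache
RewriteEngine On

# Only act on /search-result/ (with or without the trailing slash)
RewriteCond %{REQUEST_URI} ^/search-result/?$
# ..and only when location= is one of the six values to redirect.
# western-australia is not in the list, so it is left alone.
RewriteCond %{QUERY_STRING} (^|&)location=(victoria|new-south-wales|queensland|south-australia|tasmania|northern-territory)($|&) [NC]
# QSD (Apache 2.4+) discards the original query string on redirect
RewriteRule ^ https://mydomain.com.au/ [R=301,L,QSD]
```

Because the condition only requires location=<value> to be followed by & or the end of the string, extra parameters such as &weather=part-shade&water= do not prevent the match.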
And note that, in WordPress, you need to put the above before the WordPress rules:
# This is a sample .htaccess file used on a WordPress site.
# PLACE YOUR CUSTOM RULES HERE.
# BEGIN WordPress
<IfModule mod_rewrite.c>
# ...
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I.e. Place your rules above the # BEGIN WordPress line, to avoid getting 404 errors.
And btw, I'm no htaccess expert, but hopefully this answer helps you. :)

Related

HTACCESS : Redirect (301) thousands of Url's containing directories to simple url's

I need to use .htaccess to redirect a large number of URLs already produced (and already indexed) by WordPress. The articles' old URLs contain folders/subfolders, and they need to go to simple URLs without any folder or subfolder name.
Example:
FROM https://www.website.com/Animals/Cats/mycat.html TO https://www.website.com/mycat.html
FROM https://www.website.com/Animals/Dogs/mydog.html TO https://www.website.com/mydog.html
FROM https://www.website.com/Countries/France/bordeaux.html TO https://www.website.com/bordeaux.html
etc...
I have already changed the permalink options in the WordPress config, so new URLs are now produced in the correct format (e.g. https://www.website.com/bordeaux.html), without any folder name.
My problem is redirecting all the OLD URLs to this new format, to prevent 404s and preserve rankings.
I tried adding this line to my .htaccess:
RewriteRule ^/(.*)\.html$ /$1 [R=301,L,NC]
I also tried the RedirectMatch 301 (.*)\.html$ method, with the same result. I'm going crazy with this.
What am I doing wrong, and could you help me?
Thanks
RewriteRule ^/(.*)\.html$ /$1 [R=301,L,NC]
In .htaccess, the URL-path matched by the RewriteRule pattern never starts with a slash, so ^/ never matches. But you can use a slash inside the pattern to match only the last path segment when there are more "folders" in front of it. The target URL also needs to end in .html (as per your examples).
So, this can be written in a single directive:
RewriteRule /([^/]+\.html)$ /$1 [R=301,L]
This handles any level of nested "folders", but it does not match /foo.html (the target URL) in the document root, because of the slash prefix on the regex, so there is no redirect loop.
(No need for any preceding conditions.)
Here the $1 backreference includes the .html suffix.
Just match the last part of the url and pass it to the redirect:
RewriteRule /([^/]+)\.html$ /$1.html [R=301,L,NC]
It will match any number of directories like:
https://www.website.com/dir1/page1.html
https://www.website.com/dir1/dir2/page2.html
https://www.website.com/dir1/dir2/dir3/page3.html
https://www.website.com/dir1/dir2/dir3/dir3/page4.html
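Since the question also tried RedirectMatch, here is a rough mod_alias equivalent, offered as a sketch rather than a tested drop-in. Unlike a RewriteRule pattern in .htaccess, RedirectMatch matches the full URL-path including the leading slash, so the pattern can be anchored with ^/.

```apache
# Requires at least one directory level before the file name,
# so /mycat.html itself is not matched and no redirect loop occurs.
RedirectMatch 301 ^/.+/([^/]+\.html)$ /$1
```

With this pattern, /Animals/Cats/mycat.html redirects to /mycat.html, while a request for /mycat.html is left untouched.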

How do I mask a subdirectory URL without a trailing slash using htaccess and not have the internal query string appended?

Arg! So, I have a URL like this:
mysite.com/foo-bar
or
mysite.com/foo-bar/
When the user requests this address, I don't want it to change, but internally (invisibly) I want this to actually be the address:
main.php?page=foo-bar
For years, I have used this htaccess line:
RewriteRule ^([a-z0-9\-]+)\/{0,1}$ main.php?page=$1 [QSA,NC,L]
And it worked fine. But now, when I (1) try it with a sub-directory:
RewriteRule ^sub\/([a-z0-9\-]+)\/{0,1}$ main.php?page=$1 [QSA,NC,L]
And (2) don't add a trailing slash, it suddenly appends the "internal" query string.
So, this:
mysite.com/sub/foo-bar/
Works fine. But this:
mysite.com/sub/foo-bar
Still works internally, but the URL redirects to this:
mysite.com/sub/foo-bar/?page=foo-bar
I tried removing [QSA], but that doesn't make a difference. Besides, I need it, because I want any extra vars the user passes to be added to mine. For example:
mysite.com/sub/foo-bar?uservar=42
Should not change to the viewer, but internally should be:
main.php?page=foo-bar&uservar=42
I have been at this for hours reading Apache docs and StackOverflow posts with similar problems, but none of the solutions work. If I don't put a trailing slash in the subdirectory requests, the slash gets added and internal vars get shown to the viewer. Please help!
That is happening because mod_dir adds a trailing slash to a URI that maps to a real directory. To fix it, you can use:
DirectorySlash Off
RewriteEngine On
# add a trailing slash to directories
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^(.*?[^/])$ %{REQUEST_URI}/ [L,R=302]
RewriteRule ^sub/([a-z0-9-]+)/?$ main.php?page=$1 [QSA,NC,L]

Yet another mod_rewrite query -- Cannot rewrite to pretty url and keep query strings

I'm literally at the end of my tether with this. I've researched many other questions and answers on stackoverflow but still can't find the solution i need. I'm starting to think what i want to do is not possible.
So... the problem is this. I want to turn the following:
e.g.
www.mydomain.com/visa-information/country.php?country={COUNTRY-NAME}&passport={PASSPORT-NAME}
To the following pretty url:
www.mydomain.com/visa-information/{COUNTRY-NAME}-visa-for-{PASSPORT-NAME}-citizens/
I have a partially successful rule in my .htaccess file:
RewriteRule ^/visa-information/([A-Za-z-]+)-visa-for-([A-Za-z-]+)-citizens/?$ visa-information/country.php?country=$1&passport=$2 [NC]
which works fine and does what i want if i enter the url into the browser address bar, but the real problem i'm having is getting it to re-direct to the pretty url via a form i have on pretty much every page of the site.
I've tried various re-direct rules like the one below:
RewriteCond %{QUERY_STRING} country=([A-Za-z-]+)&passport=([A-Za-z-]+) [NC]
RewriteRule visa-information/country.php visa-information/%1-visa-for-%2-citizens/? [R,NC,L]
But no luck. I've also tried adding the QSA flag to the above re-direct rule, but it just ends up with an endless loop.
I have tried using a location php re-direct header at the top of the country.php page to re-direct after form submission like so:
if(isset($_GET['country']) && isset($_GET['passport'])) {
header("Location: " . $dir . "/visa-information/" . $currentCountry . "-visa-for-" . $currentPassport . "-citizens/");
exit();
}
I was expecting the above to work like entering the pretty url directly into the browser works, but it doesn't, just gives me a 404 error.
Any help is greatly appreciated.
Thanks
Jordash
EDIT
My local directory structure is as follows:
/webserver/mydomain.com/visa-information/etc...
On the live server it will be:
mydomain.com/visa-information/etc..
As i am using an Apache Alias on my local machine i have set RewriteBase as:
RewriteBase /webserver/mydomain.com/
I currently have the following set of RewriteRules adapted from what anubhava gave me:
RewriteCond %{REQUEST_URI} visa-information/country.php [NC]
RewriteCond %{QUERY_STRING} country=([A-Za-z-]+)&passport=([A-Za-z-]+) [NC]
RewriteRule visa-information/ visa-information/%1-visa-for-%2-citizens/? [R=301,L,QSA]
# internal redirect from pretty URL to old URL
RewriteRule ^visa-information/([A-Za-z-]+)-visa-for-([A-Za-z-]+)-citizens/?$ visa-information/country.php?country=$1&passport=$2 [NC,L]
This currently gives me an endless re-direct loop, both when entering the pretty url in the browser bar, and when using my form, however if i disable the top 3 rules then i find i can enter the pretty url into the address bar and the rewrite works, but not from the form submission of course.
I really don't know what i'm doing wrong. Why is there an endless loop?
Not sure how the first rule is working for you, since RewriteRule doesn't match a leading slash in .htaccess. Also, to redirect the old URL to the pretty URL you need to use the THE_REQUEST variable, which represents the original request received by Apache from your browser. Replace your code with this:
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
# external redirect from old URL to pretty URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+visa-information/country\.php\?country=([^\s&]+)&passport=([^\s&]+)\s [NC]
RewriteRule ^ /visa-information/%1-visa-for-%2-citizens/? [R=301,L]
# internal redirect from pretty URL to old URL
RewriteRule ^visa-information/([a-z-]+)-visa-for-([a-z-]+)-citizens/?$ /visa-information/country.php?country=$1&passport=$2 [NC,L,QSA]
Looks like the issue was actually with my PHP form and not the mod_rewrite rules. The form "action" was pointing to the same form page (as in $_SERVER['PHP_SELF']), which works fine without the rewrite rules but causes an endless loop when they are activated.
I simply made a search_action.php page, pointed the form at it, and from there redirect to the pretty URL using a PHP header:
header("Location: " . $dir . "/visa-information/" . $currentCountry . "-visa-for-" . $currentPassport . "-citizens/");
exit();
The mod_rewrite rule I had originally works fine, and now the user can get to the desired page with the pretty URL from the form or by typing it directly into the browser address bar.
I'm not sure it's actually possible to action a form to the same page whilst rewriting the query string, without using an intermediary action page from the form, or Javascript. I'm sure many of the more experienced programmers will have known this already, but not me unfortunately.

.htaccess Subdirectory Rewrite Exception

I'm currently consolidating posts on a site we recently acquired that had multiple WordPress installs to manage content, one in the public_html folder and another in a subdirectory, like so:
1. http://domain.com/
2. http://domain.com/another-install/
We're moving all of the content from /another-install/ into the main setup, and using a 301 redirect to remove /another-install/ from all old links like so:
RedirectMatch 301 ^/another-install/(.*) http://domain.com/$1
Resulting in all articles redirecting like so:
http://domain.com/another-install/article-name/
TO
http://domain.com/article-name/
The problem is, we want to keep /another-install/ viewable as a page. With the current redirect, http://domain.com/another-install/ goes to http://domain.com/. Is there any way to add an exception, or rewrite the current rule so that it keeps /another-install/ viewable?
Change your regex from (.*) (which matches 0 or more of any character) to (.+) (which matches 1 or more of any character). That means there would have to be something following /another-install/ in order for there to be a redirect.
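Applied to the directive from the question, that change would look like this (a sketch; everything apart from the quantifier is copied from the original line):

```apache
# (.+) needs at least one character after /another-install/,
# so /another-install/ itself no longer matches and stays viewable.
RedirectMatch 301 ^/another-install/(.+) http://domain.com/$1
```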
Alternatively, you can use a RewriteRule with a RewriteCond to specify exclusions. Add this to your .htaccess file in place of the RedirectMatch line:
RewriteCond %{REQUEST_URI} !^/another-install/(index\.wml)?$ [NC]
RewriteRule ^another-install/(.+)$ http://domain.com/$1 [R=301,NC,L]

.htaccess mod-rewrite regex apache confusion results in 10k 404's per day

I have reviewed the many questions posted here related to .htaccess, apache, mod-rewrite and regex, but I'm just not getting it. I tried a few different things but either I am over complicating things or making beginner mistakes. Regardless, I've been at it a few days now and have completely scrambled things somewhere as the 10000 404's per day are showing.
My site
I have a WordPress site which contains over 23,000 posts broken down into just over 1200 categories. The site features streaming video files, industry news, show reviews, movies, phpbb forums, etc. and is structured like this:
site / base categories (0 and a-z) / sub categories (series name) / posts (episode name.html) for all streaming media episodes
site / movies / post title.html for all streaming movies
site / news / posttitle.html
site / reviews / posttitle.html
site / page.html for assorted pages
site / forums
Permalink structure is /%category%/%postname%.html
I am using the Yoast WordPress SEO plugin and have the option to append a trailing slash enabled for directories and categories.
Here is the current .htaccess:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
My examples
From our old site structure we have many inbound links using "/episode title/". This is wrong. We need these incoming links to redirect to /watch-anime/letter, number or symbol only 1 character long/series title/episode title.html
/one-piece-episode-528/
should be
/watch-anime/o/one-piece/one-piece-episode-528.html
A mistake I made caused this problem... "/watch-anime/letter/series title/episode title/" to "/watch-anime/letter/series title/episode title.html". So, we need to remove trailing slash from single posts and add .html
/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14/
should be
/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14.html
The same mistake caused this problem when combined with the old site structure issue... "/episode title.html" needs to be "/watch-anime/letter/series title/episode title.html"
/one-piece-episode-528.html
needs to be
/watch-anime/o/one-piece/one-piece-episode-528.html
As you can see, I've made a mess of things between migrating the sites post structure and my attempts to fix it. I am now asking for any help you can provide in getting a proper .htaccess file that will take care of these 301 redirects.
Thanks for any assistance you can provide!
The RewriteMap directive itself cannot be declared in an .htaccess file (it is only allowed in the server config or a virtual host), so here's my solution for a virtual host, which should work.
Create a RewriteMap file (see the Apache RewriteMap documentation for more information). This is a very simple text file with, on each line: first the wrong URL without the '/', then at least one space, and then the right URL, like this:
one-piece-episode-528 /watch-anime/o/one-piece/one-piece-episode-528.html
dexter-season-6-episode-1 /watch-interesting-stuff/d/dexter/dexter-season-6-episode-1.html
breaking-bad-full-season-3 /watch-interesting-stuff/b/breaking-bad/breaking-bad-full-season-3.html
and so on.
Convert this simple text file into a hash map. For example:
httxt2dbm -i mapanime.txt -o mapanime.map
Now declare it in your vhost:
RewriteMap mapanime \
dbm:/pathtofile/mapanime.map
So all in all your vhost should look like:
<VirtualHost *>
RewriteEngine On
RewriteMap mapanime \
dbm:/pathtofile/mapanime.map
# don't touch the URL, but try to search if it exists in mapanime
RewriteRule /([^/]*)/$ - [QSA,NC,E=VARANIME:${mapanime:$1|notfound}]
# if VARANIME is not empty *and*
# VARANIME is different from "notfound":
RewriteCond %{ENV:VARANIME} !^(notfound|)$
# then redirect it to the right URL:
# QSA = query string append
# R = redirect, 301 = definitive redirect
# L = last = don't go further
RewriteRule . %{ENV:VARANIME} [QSA,R=301,L]
</VirtualHost>
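The map handles the old "/episode title/" links. For the second case in the question (a trailing slash where .html should be), a pattern-based rule may be enough. The following is only a sketch, assuming episode URLs always have exactly the shape /watch-anime/<one character>/<series>/<episode>/ and that it is placed in the same virtual host (hence the leading slash in the pattern):

```apache
# /watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14/
#   -> /watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14.html
# Category URLs such as /watch-anime/w/welcome-to-the-nhk/ have one
# segment fewer, so they do not match and keep their trailing slash.
RewriteRule ^/watch-anime/([^/])/([^/]+)/([^/]+)/$ /watch-anime/$1/$2/$3.html [R=301,L]
```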
Hope this helps.
I don't see a simpler solution, but I'm pretty sure this one will work.
If it doesn't work: read my usual "two hints", and add the rewrite log in your question.
Two hints:
Please try to use the RewriteLog directive (Apache 2.2 and earlier; it was removed in Apache 2.4 in favour of LogLevel). It helps you to track down such problems:
# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
My favorite tool to check for regexp:
http://www.quanetic.com/Regex (choose preg (PCRE): mod_rewrite in Apache 2.x uses PCRE, only Apache 1.3 used POSIX regex)
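On Apache 2.4 and later, the RewriteLog and RewriteLogLevel directives no longer exist. A roughly equivalent trace can be enabled through LogLevel, as in the sketch below. The trace level chosen here is an assumption (levels run from trace1 to trace8), and the output goes to the regular error log rather than a separate file.

```apache
# Apache 2.4+: put this in the server or virtual host config, not in .htaccess.
# (!) The error log grows quickly at trace levels; remove in production.
LogLevel alert rewrite:trace3
RewriteEngine On
```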