Regexp / htaccess rewrite fun - regex

I have a long list of URLs from an old site structure that I need to redirect using RedirectMatch to new URLs on the same domain. The trick is that the old (source) URLs contain a bunch of messy variables. That's not a problem, right? I just put in place some sweet little Regexp statements to handle those variables. That's what I did, and it matches the variables correctly. Sweet!
The problem comes with the second part of the RedirectMatch statement - the destination. RedirectMatch is correctly resolving the old URLs to the new URLs, except that the old variables are appended to the new URL. I want to keep the redirects, but have the destination URL not contain the variables. Here is my code:
RedirectMatch 301 ^/Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html
Actual Redirect URL:
http://www.website.com/garage.html?launch_pg=itemZoomView&launch_sel=1009152&launch_pg_sp=true&title=Pig+Waste+Can
Can anybody point me to what I am doing wrong here? I just want to get rid of those crummy old variables and start fresh.

If by "variables" you mean query string (the launch_pg=itemZoomView&launch_sel=1009152&launch_pg_sp=true&title=Pig+Waste+Can part of your destination URL example), then use this redirect rule:
RedirectMatch 301 ^/Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html?
The only difference is the ? at the end of the new URL. It stops the old query string from being copied over, because we are telling Apache that the new URL already has its own (empty) query string.
If you can use mod_rewrite, here is the rule:
# Activate Rewrite Engine
Options +FollowSymLinks
RewriteEngine On
# the rewrite rule
RewriteRule ^Shop/Category1/Category2/(.*)$ http://www.website.com/garage.html? [R=301,L]
This will definitely redirect with an empty query string.
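The effect of the trailing ? can be pictured with a small Python model. This is only a sketch of the behaviour described above, not Apache's implementation: the function name build_location is made up for illustration, and whether the bare trailing ? itself appears in the final Location header is not modelled.

```python
def build_location(target: str, original_query: str) -> str:
    """Simplified model of how a redirect target and the old query string combine.

    Assumption: if the target already contains a '?', the original query
    string is not appended; otherwise it is carried over unchanged.
    """
    if "?" in target:
        # The target declares its own (possibly empty) query string.
        return target
    if original_query:
        return f"{target}?{original_query}"
    return target


old_query = "launch_pg=itemZoomView&launch_sel=1009152&launch_pg_sp=true&title=Pig+Waste+Can"

# Without the trailing '?': the old variables tag along.
print(build_location("http://www.website.com/garage.html", old_query))

# With the trailing '?': the old variables are dropped.
print(build_location("http://www.website.com/garage.html?", old_query))
```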

Related

Redirect "ugly" URL to new URL using htaccess

I have already rewritten my old "ugly" URL:
http://example.com/ppd-brands/generic/?gen_id=Mjky
to
http://example.com/ppd-brands/generic/gen_id/Mjky
using the code below
RewriteRule ^ppd-brands/generic/gen_id/([^/]*)$ /ppd-brands/generic/?gen_id=$1 [L]
and it's working.
Now my problem is how can I redirect the old "ugly" URL to the new URL when the user visits the old "ugly" URL?
Just a precursor... whilst your old "ugly" URL was of the form /ppd-brands/generic/?gen_id=Mjky, you should ideally be rewriting to the actual file that handles the request, eg. index.php, instead of allowing mod_dir to issue an additional internal subrequest to the directory index - which is what I assume is happening here.
For example:
RewriteRule ^ppd-brands/generic/gen_id/([^/]*)$ /ppd-brands/generic/index.php?gen_id=$1 [L]
Now, your main question... to externally redirect from the old "ugly" URL to the new URL. In this case, you need to be careful of a redirect loop, since if we simply redirect then the above rewrite will rewrite it back again in an endless loop. You can't use a mod_alias Redirect (as the other answer suggests) for this reason. (And a mod_alias Redirect can't match the query string either - another reason.)
Aside: Since we changed the above rewrite to include index.php in the rewritten URL, which would appear to differ from the old "ugly" URL, we could perhaps get away with a simple redirect if you are on Apache 2.4 (but Apache 2.2 would result in a conflict because mod_dir would issue an internal subrequest for index.php before we can process the URL with mod_rewrite).
We need to only redirect initial requests, not requests that we have already rewritten. We can do this by checking against the REDIRECT_STATUS environment variable, which is empty on the initial request and set to "200" (as in 200 OK HTTP status) after the first successful rewrite. (Another way is to check against THE_REQUEST, instead of the dynamic/rewritable URL-path.)
For example, try the following before your existing rewrite:
# Redirect "old" to "new"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^gen_id=([^/&]*)
RewriteRule ^(ppd-brands/generic)/(?:index\.php)?$ /$1/gen_id/%1 [QSD,R=302,L]
Note that in order to match the query string we need a condition (RewriteCond directive) that checks against the QUERY_STRING server variable. The URL-path matched by the RewriteRule pattern notably excludes the query string.
The index.php in the request URL is optional, so it matches /ppd-brands/generic/?gen_id=Mjky or /ppd-brands/generic/index.php?gen_id=Mjky (if that is the actual URL).
The $1 backreference is simply to save typing/duplication. This will always contain ppd-brands/generic when the directive matches. We could have done the same with "gen_id", but that could make the substitution string look a bit too cryptic.
The %1 backreference (note the % prefix) is a backreference to the captured group in the last matched CondPattern (as opposed to $1 which refers to the RewriteRule pattern), ie. the value of the gen_id URL parameter.
The QSD flag (Apache 2.4+) strips the query string from the redirected URL. Otherwise gen_id=XYZ would be passed through to the target URL. If you are still on Apache 2.2 then you would need to append a ? to the end of the substitution string instead (essentially an empty query string). eg. /$1/gen_id/%1?
The "magic" is really the first condition that checks the REDIRECT_STATUS env var. As mentioned above, this ensures that we only process initial requests and not the rewritten request, thus avoiding a redirect loop.
Note that this is currently a 302 (temporary) redirect. Only change to a 301 (permanent) once you have tested this works OK. 301s are cached persistently by the browser so can make testing problematic.
And just to clarify... a redirect like this should only be implemented once you have already changed all the URLs in your application. This redirect is to simply redirect search engines, backlinks and anyone who should manually type the URL (unlikely).
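The pattern and the two kinds of backreference described above can be tried out away from the server. The following Python sketch only mimics the matching step: it assumes the path is given without a leading slash (as in .htaccess), it does not model the REDIRECT_STATUS check, and the function name redirect_target is hypothetical.

```python
import re

# Patterns copied from the rule above; Python's re syntax handles these
# particular expressions the same way as Apache's PCRE.
QUERY_PATTERN = re.compile(r"^gen_id=([^/&]*)")
PATH_PATTERN = re.compile(r"^(ppd-brands/generic)/(?:index\.php)?$")


def redirect_target(path, query):
    """Return the redirect target, or None if the rule would not apply.

    `path` is the URL-path as seen in .htaccess (no leading slash);
    `query` is the query string without the leading '?'.
    """
    cond = QUERY_PATTERN.search(query)  # RewriteCond capture -> %1
    rule = PATH_PATTERN.search(path)    # RewriteRule capture -> $1
    if cond is None or rule is None:
        return None
    # Equivalent of /$1/gen_id/%1 with the old query string discarded (QSD)
    return f"/{rule.group(1)}/gen_id/{cond.group(1)}"


print(redirect_target("ppd-brands/generic/", "gen_id=Mjky"))           # /ppd-brands/generic/gen_id/Mjky
print(redirect_target("ppd-brands/generic/index.php", "gen_id=Mjky"))  # /ppd-brands/generic/gen_id/Mjky
print(redirect_target("ppd-brands/generic/gen_id/Mjky", ""))           # None
```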
Redirect 301 /oldurl.htm /newurl.htm
Change the old and new URLs to suit your needs. Hope it helps.

htaccess Regex 301 Redirect

I'm having a hard time getting a permanent redirect to work. I would like this to happen, using regular expressions.
OLD URL: https://example.com/olddir/other_name_here/123456/garbage.jpg
NEW URL: https://example.com/newdir/other-name-here-123456/
Note the change from underscores to dashes and the fact that I'm throwing the extra bits away after the numeric string. I've tried this but it isn't working (page doesn't exist and still getting a 404):
RewriteRule ^/olddir/other_name_here/([0-9]{6})/.+ /newdir/other-name-here-$1/ [R=301,L]
I have a few hundred names in the "other_name_here" directory location, so if I could dynamically change underscores to hyphens that would be good but not necessary. olddir and newdir are actual names and can be hardcoded. What am I doing wrong? Thanks!
Try the following actions:
If not present in your .htaccess add the following 2 lines before your RewriteRule:
Options +FollowSymLinks
RewriteEngine On
Change your regex in the following way: remove the leading slash (in .htaccess the pattern is matched against the URL-path without its leading slash) and add a $ at the end:
RewriteRule ^olddir/other_name_here/([0-9]{6})/.+$ /newdir/other-name-here-$1/ [R=301,L]
Try accessing your new URLs directly to confirm that they are reachable.
Finally, I would recommend this website for details about the .htaccess detailed configuration: https://mediatemple.net/community/products/dv/204643270/using-htaccess-rewrite-rules
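The effect of a leading slash in a .htaccess pattern can be checked with a quick Python sketch. It assumes the .htaccess file sits in the document root, so mod_rewrite sees the path without its leading slash. The variable names are made up for illustration.

```python
import re

# Path as mod_rewrite sees it inside .htaccess (assumption: file in the
# document root, so the leading slash has already been removed).
request_path = "olddir/other_name_here/123456/garbage.jpg"

with_slash = re.compile(r"^/olddir/other_name_here/([0-9]{6})/.+$")
without_slash = re.compile(r"^olddir/other_name_here/([0-9]{6})/.+$")

# The pattern that starts with a slash never matches this path.
print(with_slash.search(request_path))  # None

# Without the leading slash the six digits are captured as group 1 ($1).
match = without_slash.search(request_path)
print(f"/newdir/other-name-here-{match.group(1)}/")  # /newdir/other-name-here-123456/
```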

Redirect 301 - remove .html extension from URLs

I would like to remove the .html extension from my URLs located in a specific directory and 301 redirect them.
Here is what the structure looks like:
mysite.com/category/nameofcategory/pagenumber.html
The thing is that nameofcategory and pagenumber could be any letter or number.
Could you please help me with this?
I wouldn't recommend having your content scattered in many html-files in different folders. This becomes very impractical if you for example want to change the layout of your pages.
Storing the content in a database is a much better solution. If that's not possible perhaps the html files could contain only the formatted text content and a back end script could embed that content to a layout when the page is requested.
In both cases all of the requests would be routed through the back-end script. This requires that the mod_rewrite module is enabled in the Apache configuration, and the .htaccess might look something like this:
RewriteEngine on
RewriteRule ^category/([^/.]+)/([^/.]+)/?$ index.php?category=$1&page=$2 [L]
This part of the regex: ([^/.]+) matches and captures a string that doesn't contain the characters / or . and is at least one character long. The captured strings can be referenced with $1, $2 and so on.
Now the pretty urls like mysite.com/category/foo/bar work. In addition we need to define a rule that redirects the old urls ending in ".html". The rule required might look something like this:
RewriteRule ^category/([^/.]+)/([^/.]+)\.html$ /category/$1/$2 [R=301,L]
One thing to remember while testing and adjusting the redirects is that the redirect may get cached in the browser which may lead to confusing results when testing.
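The two patterns can be sanity-checked with a short Python sketch before touching the server. It also shows why it matters whether the dot before html is escaped. The sample paths are invented for illustration and are written without a leading slash, as mod_rewrite sees them in .htaccess.

```python
import re

pretty = re.compile(r"^category/([^/.]+)/([^/.]+)/?$")
old_escaped = re.compile(r"^category/([^/.]+)/([^/.]+)\.html$")
old_unescaped = re.compile(r"^category/([^/.]+)/([^/.]+).html$")

# The pretty URL matches and yields the two captures ($1 and $2).
print(pretty.search("category/foo/bar").groups())  # ('foo', 'bar')

# The old .html URL is not caught by the pretty rule, because [^/.] excludes dots.
print(pretty.search("category/foo/bar.html"))  # None

# The old URL matches the redirect rule, with the extension left out of $2.
print(old_escaped.search("category/foo/12.html").groups())  # ('foo', '12')

# An unescaped dot matches any character, so this odd path would match too.
print(old_unescaped.search("category/foo/12xhtml") is not None)  # True
print(old_escaped.search("category/foo/12xhtml") is not None)    # False
```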
To remove the .html extension on the URL and 301 redirect to the extensionless URL you can try the following in the .htaccess in your "specific directory":
RewriteEngine On
RewriteBase /specific-directory
RewriteRule ^(.*)\.html$ $1 [R=301,L]

Htaccess Not Working - Redirect 301

Due to some bad URLs, we generated some links that don't work and I want to redirect them with a 301 redirect to clear up some webmaster tools issues with Google.
So, we have this URL like this:
http://www.site.com/subdomain/z//-products
*Note that subdomain is variable, the rest of the url is static.
As a side note, this URL makes no sense, that's why I want to redirect it. It should be something like this:
http://www.site.com/bedroom/z/12345/bedroom-furniture-products
Anyway, we had these bad URLs being dynamically generated. We've fixed them, but google picked them up and keeps trying to crawl them. I want to create an htaccess rule to 301 redirect them and the issue should wash out eventually.
Here's what I tried with htaccess to no avail:
^(.*)/n//-products/?$ $1 [R=301,B]
I've also tried all kinds of permutations of this and it's not working. I suspected it was an entity escaping issue, but my research led me to add the [B], but that didn't seem to work either. It's like the redirect rule is working, but it's just redirecting to the original page.
What am I missing here?
I believe anubhava is correct, in that there is inconsistency between the sample URL you describe /subdomain/z//-products and the RewriteRule you attempted to apply. Not sure if this is a typo or not. It may even be the case your copy/paste operation actually added the "/n" literally.
Anyhoo, let us presume that you want to make the rule work with /subdomain/z//-products:
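RewriteRule ^([^/]+)/z//?\-products/?$ http://www.site.com/$1 [R=301]
See the example 1 slides of this PDF to get the quick first portion. It is much faster than using (.*).
In .htaccess the pattern is matched against the path without its leading slash, so the pattern starts at the first path segment. We literally match the z character and the surrounding slashes (the second slash is optional because Apache may collapse repeated slashes before matching). We escape the - character, match the rest of the URL, and optionally match the trailing slash. Because the substitution is an absolute URL combined with the R flag, Apache sends an external redirect: the client receives the 301 status code and makes a new request to the target URL, which is built from the captured backreference $1.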
Let me know if that works.
Enable mod_rewrite and .htaccess through httpd.conf and then put this code in your .htaccess under DOCUMENT_ROOT directory:
Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
# Capture the variable first path segment so that $1 has a value
RewriteRule ^([^/]+)/z/-products/?$ /$1 [L,R=302,NC]
Once you verify it is working fine, replace R=302 with R=301. Avoid using R=301 (Permanent Redirect) while testing your mod_rewrite rules.

How to remove the query part of the rewritten URL after it has been remotely redirected?

Either I am too tired to see what I am doing wrong or there is something important I am missing here.
Basically I have a simple set of rewrite rules which are used in conjunction with a central dispatcher file (index.php) to handle requests coming for HTML, CSS and JavaScript files separately and they look like this.
RewriteEngine on
RewriteRule (.+)\.html$ index.php?action=view&url=$1.html [L]
RewriteRule (.+)\.css$ index.php?action=resource&type=css&url=$1.css [L]
RewriteRule (.+)\.js$ index.php?action=resource&type=js&url=$1.js [L]
Long story short, these rules work fine. However, I've been notified by the SEO agency responsible for the site that there is an error in one of the URLs, which needs to be permanently redirected (301) to the correct link. Since it's just one URL that requires redirecting, I chose to use Redirect instead of URL rewriting and added the following rule.
Redirect 301 /page1.html /page2.html
This works well too, except that after the remote redirection is done for page1.html I get the query part (?action=view&url=page2.html) displayed in the browser's address bar. I understand that the HTML rewriting rule simply added the query string part after it was done with the URL, but what do I need to do to get rid of the query part after a remote 301 redirection is performed?
Just to add I tried the URL rewrite method too but it seems that whatever I do the L flag is simply ignored and the HTML rewrite rule is still executed.
RewriteRule ^page1\.html$ page2.html [L,R=301]
That's a mod_rewrite redirect, and it should keep the query string out of the redirected URL as long as it runs first. Put it before your other 3 rules, otherwise it will be ignored.
I don't know how much the solution may change with the web-server and the web-server version, but what worked for me was "When you want to erase an existing query string, end the substitution string with just a question mark".
See "Modifying the Query String" at http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule (Apache v2.4)
So,
RewriteRule ^page1\.html$ page2.html? [L,R=3xx]
The R flag is needed for the new URI to be shown instead of the original one with the query string. But even without the R flag, the query string will not be passed.