Regular Expression to capture URLs with ascii encoded characters

Having migrated a Wordpress site to a new build, I need to capture a lot of old URLs and redirect them to the same content on the new site. The problem is that the old site has a lot of URLs with ascii-encoded chars and Wordpress has stripped them out on the current site. For example:
/blog/uncategorized/germany%E2%80%99s-ageing-population-working-longer-working-better.html
would redirect to:
/blog/germanys-ageing-population-working-longer-working-better/
Can anyone provide a regular expression that would remove the ascii-encoded characters?

Each encoded byte is a percent sign followed by two hexadecimal digits, so to match the encoded characters you would use the following regex pattern:
%[0-9A-Fa-f]{2}
How you perform the replacement will depend on the language/tool you are using.
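As one illustration of the replacement step, here is a minimal Python sketch. The helper name strip_percent_escapes is made up for this example, and accepting hex digits in either case is an assumption about what the old URLs contain.

```python
import re

# One percent-encoded byte: "%" followed by two hexadecimal digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def strip_percent_escapes(path: str) -> str:
    """Remove every percent-encoded byte from a URL path (hypothetical helper)."""
    return PERCENT_ESCAPE.sub("", path)


old = "/blog/uncategorized/germany%E2%80%99s-ageing-population-working-longer-working-better.html"
print(strip_percent_escapes(old))
```

This only strips the encoded bytes; dropping the uncategorized/ segment and the .html suffix would be a separate step.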

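You have to match against the request line here, because with redirect and rewrite rules, the URI is decoded before the patterns get applied. That means you'd be matching against stuff like â instead of the encoded strings. Note that each redirect strips only one encoded sequence, so a URL containing several of them gets cleaned up over successive redirects. So you'll want something like: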
RewriteEngine On
RewriteCond %{THE_REQUEST} \ /blog/([^\?\ ]*)\%[A-Z0-9]{2}([^\?\ ]*)
RewriteRule ^ /blog/%1%2 [L,R=301,NE]
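To see how the rule behaves on a URL with several encoded bytes, here is a rough Python simulation. It assumes the request line looks like "GET <path> HTTP/1.1" and that the client follows each 301; follow_redirects is a made-up name, and this is only a sketch of the pattern logic, not of Apache itself.

```python
import re

# Python transcription of the RewriteCond pattern above.
COND = re.compile(r" /blog/([^? ]*)%[A-Z0-9]{2}([^? ]*)")


def follow_redirects(path: str, limit: int = 20):
    """Apply the rule repeatedly, the way a client following 301s would."""
    hops = 0
    while hops < limit:
        m = COND.search(f"GET {path} HTTP/1.1")
        if m is None:
            break
        # Equivalent of the substitution /blog/%1%2
        path = f"/blog/{m.group(1)}{m.group(2)}"
        hops += 1
    return path, hops


final_path, hops = follow_redirects("/blog/germany%E2%80%99s-ageing")
print(final_path, hops)
```

Because the first group is greedy, each pass drops the last encoded byte, so the three-byte sequence %E2%80%99 takes three hops to disappear.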

How to match string if it doesn't contain only numbers after slash?

I am redirecting certain urls with path to get variables like the following:
localhost2/post/myTitle => localhost2/post.php?title=myTitle
localhost2/post/123 => localhost2/post.php?id=123
So In my htaccess file, I use
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^post/(\d+) post.php?id=$1
RewriteRule ^post/(.*) post.php?title=$1
</IfModule>
This works no problem. But I want to learn how to write the negation of ^post/(\d+), that is ^post/(NEGATE-ONLY-NUMBERS). In other words, I want a regex that matches the whole input string if there are not only numbers after post/. So post/abc, post/a23, post/ab3, post/12c and post/a2c should all pass but not post/123. I referred to this post, which suggests using:
(?!^\d+$)^.+$
I can't use ^post/(?!^\d+$)^.+$, because there can be only one ^ and one $. I don't know what regex anchor specifies first position in a substring. My best guess is
post\/(?!\d++).*
I think (?!\d++), with the ++, would eat all the following characters and check whether all of them are digits. But this fails at post/1ab.
Another guess is:
post\/(?![\d,\/]+$).*
This works the best, but it allows post/3455/X.
Secondly, eventually I need to convert localhost2/post/myTitle/123 => localhost2/post.php?title=myTitle&repeat=123 as well. I have come up with the following:
^post/(?!\d+($|/))(.+?($|/))(\d+$)?
Note: +? to use lazy quantifier, otherwise multiple slashes will be matched by .
and
^post/(?!\d+($|/))([^/\n\r]+($|/))(\d+$)?
Here I use [^/\n\r] instead of .+?
Patterns inside zero-width assertions like (?!\d++) are non-consuming, they do not "eat" chars, they only check the context while keeping the regex index at the same location as before matching the zero-width assertion pattern.
You can use any of the following:
^post/(?!\d+(?:/|$)).*
^post/(?!\d+(?=/|$)).*
^post/(?!\d+(?![^/])).*
See the regex demo. Details:
^post/ - start of input, post/ literal string
(?!\d+(?=/|$)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are one or more digits followed by / or the end of the string
.* - the rest of the input.
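If you want to check the three variants without a live server, a quick Python sketch like the following runs them against the sample paths from the question. Python's re module is used here as a stand-in for the PCRE engine in mod_rewrite, which is an assumption that holds for these particular constructs; the helper name accepted is made up.

```python
import re

PATTERNS = [
    r"^post/(?!\d+(?:/|$)).*",
    r"^post/(?!\d+(?=/|$)).*",
    r"^post/(?!\d+(?![^/])).*",
]


def accepted(pattern: str, path: str) -> bool:
    """True if the pattern matches the path from its start."""
    return re.match(pattern, path) is not None


should_match = ["post/abc", "post/a23", "post/ab3", "post/12c", "post/a2c"]
should_fail = ["post/123", "post/3455/X"]

for pattern in PATTERNS:
    for path in should_match + should_fail:
        print(pattern, path, accepted(pattern, path))
```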
Do not overcomplicate things when you can keep them simple. Since your query parameters are named differently, you will need 3 separate rewrite rules anyway.
Consider:
Options -MultiViews
RewriteEngine On
RewriteRule ^post/(\d+)/?$ post.php?id=$1 [L,QSA,NC]
RewriteRule ^post/([^/]+)/(\d+)/?$ post.php?title=$1&repeat=$2 [L,QSA,NC]
RewriteRule ^post/([^/]*)/?$ post.php?title=$1 [L,QSA,NC]
Take note of Options -MultiViews. If MultiViews is not already disabled in the Apache config, you must disable it here; otherwise all $_GET parameters will be empty in your PHP file.
Option MultiViews (see http://httpd.apache.org/docs/2.4/content-negotiation.html) is used by Apache's content negotiation module that runs before mod_rewrite and makes Apache server match extensions of files. So if /file is the URL then Apache will serve /file.html.
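To make the first-match-wins ordering concrete, here is a small Python sketch of the three-rule idea. It uses end-anchored variants of the patterns (an assumption on my part, so that something like post/12c falls through to the title rule), and route is a made-up helper, not anything Apache provides.

```python
import re

# (pattern, substitution) pairs tried in order; the first match wins,
# which is roughly what the [L] flag gives you within one pass.
RULES = [
    (re.compile(r"^post/(\d+)/?$", re.I), r"post.php?id=\1"),
    (re.compile(r"^post/([^/]+)/(\d+)/?$", re.I), r"post.php?title=\1&repeat=\2"),
    (re.compile(r"^post/([^/]*)/?$", re.I), r"post.php?title=\1"),
]


def route(path: str):
    """Return the rewritten target for a path, or None if no rule applies."""
    for pattern, target in RULES:
        m = pattern.match(path)
        if m:
            return m.expand(target)
    return None


for path in ["post/123", "post/myTitle", "post/myTitle/123", "post/12c"]:
    print(path, "=>", route(path))
```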

301 redirect with regular expressions: ASP to HTML

I'd like to redirect a certain pattern of URLs (from an old site) to a simpler, newer html version we are using.
The old URLs look like this:
http://example.com/Collections.asp?Collection=My Collection
...and ideally they would be redirected to:
http://example.com/my-collection.html
So we have that last word which is variable, and can be one word, more than one word with spaces, more than one word with "-" in between them (hyphenated chain of words), or a simple number, like "3".
I'm pretty new with regular expressions and would need some help to manage it.
I tried with:
RedirectMatch 301 /Collections.asp?Collection=(.*) /$1.html
...with no luck, and some other variations.
How can I grab that variable word and make it suitable for my new URL structure?
Thanks for your help,
You cannot match against query strings in a RedirectMatch directive. Use mod_rewrite:
RewriteEngine On
RewriteCond %{THE_REQUEST} /Collections\.asp\?Collection=([^\s]+) [NC]
RewriteRule ^ /%1.html [NC,L,R]
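The rule above copies the captured value without lowercasing it or turning spaces into hyphens. To spell out the mapping the question is after, here is a minimal Python sketch of the transformation itself. It is not a mod_rewrite solution; the function name collection_to_html and the choice to treat underscores like spaces are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs


def collection_to_html(query_string: str):
    """Map 'Collection=My Collection' to '/my-collection.html' (hypothetical helper)."""
    values = parse_qs(query_string).get("Collection")
    if not values:
        return None
    # Collapse runs of whitespace or underscores into one hyphen, then lowercase.
    slug = re.sub(r"[\s_]+", "-", values[0].strip()).lower()
    return f"/{slug}.html"


for qs in ["Collection=My%20Collection", "Collection=Hand-Made+Rugs", "Collection=3"]:
    print(qs, "=>", collection_to_html(qs))
```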

mod_rewrite rule using date regex

I'm trying to write a rule that when user types in this url:
domain.com/09/13/2013/thisIsMyPageTitle
That url stays in browser window, but content from this url is displayed:
domain.com/contentlibrary/thisIsMyPageTitle
This is my rule that I currently get an error with:
RewriteEngine On
RewriteRule ^((0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d[/])$(.*) /contentlibrary/$1 [L]
I'm trying to match the date with regular expression, and use the (.*) from the initial url in the second one that holds the content and actually exists.
If you're not going to do anything with the date, then why bother being precise with date semantics? You can simplify your regex:
RewriteRule ^[0-9]+/[0-9]+/[0-9]+/([^/]+)/?$ /contentlibrary/$1 [L]
The error that you're getting is probably because you have unescaped spaces in your regex. Specifically these:
[- /.]
The spaces get interpreted by mod_rewrite as the delimiter between parameters. Additionally, you have this:
$(.*)
at the end of your pattern. The $ matches the end of the string, so you want those swapped:
(.*)$
So:
^((0[1-9]|1[012])[-\ /.](0[1-9]|[12][0-9]|3[01])[-\ /.](19|20)\d\d[/])(.*)$
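should be the pattern that you want. Note that with this pattern the page title is captured by the fifth group, so the substitution needs $5 rather than $1.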
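If the group numbering is hard to follow, a quick Python check shows which group holds what for both the strict and the simplified pattern. Python's re module stands in for mod_rewrite's regex engine here (an assumption), and the escaped space is written as a plain space because there is no argument splitting to worry about.

```python
import re

# Strict date pattern from above.
STRICT = re.compile(
    r"^((0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d[/])(.*)$"
)
# Simplified pattern that ignores date semantics.
SIMPLE = re.compile(r"^[0-9]+/[0-9]+/[0-9]+/([^/]+)/?$")

path = "09/13/2013/thisIsMyPageTitle"
strict = STRICT.match(path)
simple = SIMPLE.match(path)

print("strict group 1:", strict.group(1))  # the whole date prefix
print("strict group 5:", strict.group(5))  # the page title
print("simple group 1:", simple.group(1))  # the page title
```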

How to rewrite this URL to a redirect page?

I am using Microsoft-IIS/7.5 on a hosted server (Hostek.com)
I have an existing site with 2,820 indexed links in Google. You can see the results by searching Google with this: site:flyingpiston.com Most of the pages use a section, makerid, or bikeid to get the right information. Most of the links look like this:
flyingpiston.com/?BikeID=1068
flyingpiston.com/?MakerID=1441
flyingpiston.com/?Section=Maker&MakerID=1441
flyingpiston.com/?Section=Bike&BikeID=1234
On the new site, I am doing URL rewriting using .htaccess. The new URLs will look like this:
flyingpiston.com/bike/1068/
flyingpiston.com/maker/1123/
Basically, I just want to use my htaccess file to direct any request with a "?" question mark in it directly to a ColdFusion page called redirect.cfm. On this page, I will use ColdFusion to write a custom 301 redirect. Here's what ColdFusion's redirect looks like:
<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="http://www.newurl/bike/1233/">
<cfabort>
So, what does my htaccess file need to look like if I want to push everything with a question mark to a particular page? Here's what I have tried, but it's not working.
RewriteEngine on
RewriteRule ^? /redirect.cfm [NS,L]
Update. Using the advice from below, I am using this rule:
RewriteRule \? /redirect/redirect.cfm [NS,L]
To try to push this request
http://flyingpiston2012-com.securec37.ezhostingserver.com/?bikeid=1235
To this page:
http://flyingpiston2012-com.securec37.ezhostingserver.com/redirect/redirect.cfm
There are a couple of reasons why what you're trying isn't working.
The first one is that RewriteRule uses a regex, and ? is a regex metacharacter, which therefore needs to be escaped with a backslash (\?) to tell it to match the literal question mark character.
However, the second part of the problem is that the regex for RewriteRule is only tested against the filename part of the URL - it specifically excludes the query string.
In order to match against the query string you need to use the RewriteCond directive, placed on the line before the rule (but applied in between the RewriteRule matching and replacing), acting as an additional filter. The useful bit is that you can specify which part of the URL to match against (as well as having the option for using non-regex tests).
Bearing all this in mind, the simplest way to match/rewrite a request with a query string is:
RewriteCond %{QUERY_STRING} .
RewriteRule .* /redirect/redirect.cfm
The %{QUERY_STRING} is what the regex is tested against (everything in CF's CGI scope can be used here, and some other stuff too - see the Server Variables box in the docs).
The single . just says "make sure the matched item has at least one character".
At the moment, this rule will preserve the existing query string - if you want to discard it, you can place a ? onto the end of the replacement URL. (If you need to use a query string on the URL and not discard the old version, use the [QSA] flag.)
In the opposite direction, you're losing the filename part of the URL - to preserve this, you probably want to append it onto the replacement as PATH_INFO, using the automatic whole-match capture $0.
Together, these two things give:
RewriteCond %{QUERY_STRING} .
RewriteRule .* /redirect/redirect.cfm/$0?
One final thing is that you'll want to guard against infinite loops - the above rule strips the query string so it will always fail the RewriteCond, but better to be safe (especially if you might need to add a query string), which you can do with an extra RewriteCond:
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !/redirect/redirect\.cfm
RewriteRule .* /redirect/redirect.cfm/$0?
Multiple RewriteCond are combined as ANDs, and the ! negates the match.
You can of course add whatever flags are required to the RewriteRule to have it behave as desired.
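To check your understanding of the final three-line rule, here is a rough Python model of its logic. It assumes a per-directory (.htaccess) context where $0 has no leading slash; the function name rewrite is made up, and this models only the decision, not Apache's actual processing.

```python
from urllib.parse import urlsplit

TARGET = "/redirect/redirect.cfm"


def rewrite(url: str) -> str:
    """Return the path (and query) the request would end up at."""
    parts = urlsplit(url)
    has_query = parts.query != ""         # RewriteCond %{QUERY_STRING} .
    already_there = TARGET in parts.path  # RewriteCond %{REQUEST_URI} !/redirect/redirect\.cfm
    if has_query and not already_there:
        matched = parts.path.lstrip("/")  # $0, the whole matched path
        return f"{TARGET}/{matched}"      # the trailing "?" drops the query string
    # No rewrite: the request passes through unchanged.
    return parts.path + (f"?{parts.query}" if parts.query else "")


for url in [
    "http://flyingpiston.com/?BikeID=1068",
    "http://flyingpiston.com/bike/1068/",
    "http://flyingpiston.com/redirect/redirect.cfm?x=1",
]:
    print(url, "=>", rewrite(url))
```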

How to check for dot(.) in mod_rewrite

I want to redirect URL
domain/Family_He..
to
domain/Family_Health_insurance
using RewriteRule. I have tried with
RewriteRule /Family_He(.*)$ /Family_Health_insurance
and it is working. But I have some more page with urls like
domain/Family_Health_info
domain/Family_Health_quote
domain/Family_Health_child etc
When I tried as
RewriteRule /Family_He\.\.$ /Family_Health_insurance
then this doesn't work for me. Please help me out.
Are you aware that there are actually two spaces at the end of your subject string that would prevent \.$ (a literal dot at the end of the string) from matching?
To have it redirect (302) you need to add the R flag. To match the dot you need to escape it using \. like
RewriteRule ^/Family_He\.\.$ /Family_Health_insurance [R=302,L]
Note that the leading slash only works in httpd.conf. In .htaccess the pattern is matched without it, so there you would use ^Family_He\.\.$ instead.
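To see the difference the escaping and the end anchor make, here is a short Python sketch using the .htaccess form of the pattern (no leading slash). Python's re module is only a stand-in for mod_rewrite's engine, and the sample path Family_Hexy is invented for the example.

```python
import re

ESCAPED = re.compile(r"^Family_He\.\.$")   # two literal dots
UNESCAPED = re.compile(r"^Family_He..$")   # any two characters


def matches(pattern, path: str) -> bool:
    """True if the compiled pattern matches the path."""
    return pattern.match(path) is not None


for path in ["Family_He..", "Family_Health_info", "Family_Hexy", "Family_He..  "]:
    print(repr(path), matches(ESCAPED, path), matches(UNESCAPED, path))
```

The last sample has two trailing spaces, which is enough to stop the anchored pattern from matching.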