I'm trying to assemble a regular expression to rewrite a URL containing uppercase characters to the same URL but in all lowercase.
Example:
example.com/foO-BAR-bAz rewrite to example.com/foo-bar-baz
example.com/FOO-BAR-BAZ rewrite to example.com/foo-bar-baz
example.com/foo-bar-baz does not match
I tried ^\/(?=.*[A-Z]) to match a string with at least one uppercase character but it doesn't match the full string. I also know that I need to use a "capturing group" but I'm not sure how.
I would be implementing this redirect rule in the .htaccess file of an Apache server.
If you are on Apache 2.4 then in .htaccess you can use mod_rewrite with an Apache expression and make use of the tolower() function. (This doesn't require the use of a RewriteMap in the server config, as mentioned in comments.) For example:
RewriteEngine On
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] %1 [L]
The RewriteRule pattern simply checks there is at least one uppercase letter in the requested URL-path. The RewriteCond directive then calls the tolower() function on the URL-path (REQUEST_URI server variable) which is then effectively captured using the regex. The %1 backreference in the substitution string then holds the result of the tolower() function call, ie. the lowercased URL-path, which is internally rewritten to.
To "correct" the URL and issue an external redirect, then just add the R flag to the RewriteRule directive. For example:
:
RewriteRule [A-Z] %1 [R=301,L]
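To see the logic of the rule in isolation, here is a minimal Python sketch. It is only a simulation: Python's `re` module stands in for the regex engine used by mod_rewrite, and the function name `lowercase_rewrite` is hypothetical. It assumes that only ASCII letters need lowercasing.

```python
import re

def lowercase_rewrite(url_path):
    """Return the lowercased path, or None when the rule would not match."""
    # Mirrors the RewriteRule pattern [A-Z]: it is unanchored, so a single
    # uppercase letter anywhere in the path is enough to trigger the rule.
    if re.search(r"[A-Z]", url_path) is None:
        return None  # no uppercase letter, so the request is left alone
    # Stand-in for tolower(%{REQUEST_URI}); only ASCII A-Z is lowered here.
    return re.sub(r"[A-Z]", lambda m: m.group(0).lower(), url_path)

print(lowercase_rewrite("/foO-BAR-bAz"))  # /foo-bar-baz
print(lowercase_rewrite("/foo-bar-baz"))  # None
```

The early `None` return corresponds to the case in the question where an already-lowercase URL must not be redirected.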
UPDATE: To eliminate a double redirect when redirecting HTTP to HTTPS (and/or non-www vs www) then include the full canonical URL as part of this rule and implement the canonical (scheme + hostname) redirects second.
For example:
# 1 - Upper to lowercase conversion (and HTTPS and WWW)
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] https://www.example.com%1 [R=301,L]
# 2 - HTTP to HTTPS
:
Note that the %1 backreference already includes the slash prefix at the start of the URL-path, so this is omitted in the substitution string.
HOWEVER, it is not necessarily incorrect to have a double redirect in this situation. ie. Redirect HTTP to HTTPS (same hostname and URL-path) first then canonicalise other elements of the requested URL (hostname, upper/lowercase URL-path etc.) second. These should be edge cases to begin with, so the real-world impact is minimal.
Note that if you are implementing HSTS then it is a requirement that you first redirect from HTTP to HTTPS on the same hostname, before canonicalising the hostname (ie. www vs non-www). In this case you should use the HTTP_HOST server variable (ie. %{HTTP_HOST}) as the hostname in the above redirect. A double redirect cannot be avoided in this scenario.
My website will be like example.com/0xETHEREUMADDRESS.
So I want to redirect all requests starting with 0x to index.html, and index.html has already the code to the rest of work.
I need .htaccess code to redirect all URLs starting with 0x to index.html.
Here is the .htaccess file I tried.
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/index.html$
RewriteRule (0x*)$ /index.html [L,R=302]
RewriteCond %{REQUEST_URI} !^/index.html$
RewriteRule (0x*)$ /index.html [L,R=302]
The regex (0x*)$ matches URLs that end-with 0, 0x, 0xx, 0xxx etc. It does not match URLs that start-with 0x, so this will not match the desired URL. The * character is a regex quantifier that repeats the preceding token 0 or more times. There is also no need for the capturing group (surrounding parentheses).
If the rule only matches URLs that start with 0x then the condition that checks the URL is not /index.html is redundant.
The following will do what you are asking:
RewriteRule ^0x /index.html [R=302,L]
The ^ is the start-of-string anchor, so the requested URL must start with 0x. Note that the URL-path matched by the RewriteRule pattern does not start with a slash.
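The difference between the two patterns can be checked outside Apache. The following is a rough sketch using Python's `re` module as a stand-in for the mod_rewrite regex engine. The function names and the sample paths are made up for illustration, and the paths are given without a leading slash, as they would be seen in .htaccess.

```python
import re

def original_pattern_matches(url_path):
    # (0x*)$ : a "0" followed by zero or more "x" characters at the END of the path.
    return re.search(r"(0x*)$", url_path) is not None

def corrected_pattern_matches(url_path):
    # ^0x : the path must START with the two characters "0x".
    return re.search(r"^0x", url_path) is not None

for path in ("0xETHEREUMADDRESS", "page0xx", "item0"):
    print(path, original_pattern_matches(path), corrected_pattern_matches(path))
```

Only the first path is matched by the corrected pattern, while the original pattern matches the other two instead.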
However, I'd question whether you really want to "redirect" the user? (As in an external HTTP redirect - which is what this is.) Redirecting will lose the original URL and expose /index.html to your users.
Internal "rewrite" instead
If, however, you wish to internally rewrite the request instead so that index.html can analyse the requested URL (as you say, "index.html has already the code to the rest of work") and keep /0xETHEREUMADDRESS in the browser's address bar then remove the R=302 flag and the slash prefix on the substitution string. For example:
RewriteRule ^0x index.html [L]
Reference:
https://httpd.apache.org/docs/current/rewrite/intro.html#regex
I want to redirect certain URLs starting with an expression. For example, I want to redirect:
www.example.com/%2FE (www.example.com/%2FExxxxxxxx) to my blog page in my .htaccess file.
I can redirect www.example.com/2FExxxxx but I am not able to target the %.
The xxxx... I have used in the URL is to represent any expression after %2FE.
This is my code:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule %2FE /blog [R=301,L]
</IfModule>
Can anyone here help me?
By default Apache rejects (with a server generated 404) any URL that contains an encoded slash (%2F) in the URL-path part of the URL. This occurs before the request is processed by .htaccess. (This is considered a security feature.)
To specifically permit encoded slashes, there is the AllowEncodedSlashes directive (default value is Off). But this can only be set in a server or virtualhost context. It cannot be set in .htaccess. To permit encoded slashes, AllowEncodedSlashes can be set to either On or NoDecode (preferable).
For example:
# In a server / virtualhost context (not .htaccess)
AllowEncodedSlashes NoDecode
Then, once the above has been implemented in the server config and the webserver restarted, you can proceed to match the slash using mod_rewrite in .htaccess...
RewriteRule %2FE /blog [R=301,L]
Ordinarily, the RewriteRule pattern matches against the %-decoded URL-path. However, if the NoDecode option has been set then the encoded slash (%2F) is not decoded. So the above "should" work (except that the pattern is not anchored, so potentially matches too much).
But note that multiple (decoded) slashes are reduced in the URL-path that is matched by the RewriteRule pattern. So matching multiple-contiguous slashes here is not possible.
I would instead match against the THE_REQUEST server variable, which is as per the original request and always remains %-encoded (if that is how the request has been made). And multiple slashes are preserved. Note that THE_REQUEST contains the first line of the HTTP request headers, not just the URL-path.
For example:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/%2FE [NC]
RewriteRule . /blog [R=301,L]
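As a quick sanity check of the condition's regex, here is a small Python sketch. It assumes THE_REQUEST looks like a typical request line such as `GET /%2FExxxxxxxx HTTP/1.1`. The function name is hypothetical, and Python's `re` is only a stand-in for Apache's regex engine.

```python
import re

# \s/%2FE with the NC flag: whitespace (the space after the method),
# then a literal "/%2FE" at the very start of the URL-path, case-insensitively.
THE_REQUEST_PATTERN = re.compile(r"\s/%2FE", re.IGNORECASE)

def should_redirect_to_blog(request_line):
    """Return True when the simulated RewriteCond would succeed."""
    return THE_REQUEST_PATTERN.search(request_line) is not None

print(should_redirect_to_blog("GET /%2FExxxxxxxx HTTP/1.1"))  # True
print(should_redirect_to_blog("GET /2FExxxxx HTTP/1.1"))      # False
```

The leading `\s` is what effectively anchors the match to the start of the URL-path, since the path always follows the space after the request method.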
You should not use the <IfModule> wrapper here.
I'd like to have the following URL(s) redirect to the same URL, just without the ? and what follows it.
For example:
https://www.example.com/this-is-static?numbersletterssymbols
goes to
https://www.example.com/this-is-static
"numbersletterssymbols" can be anything
I'd like this to be a 301, using .htaccess (Apache).
I came across the following, however, the variable seems to be in parentheses
RewriteCond %{QUERY_STRING} ^product=(.*)$
RewriteRule ^test.php$ %1/? [R=301,L]
Any insight is appreciated
To remove the query string (any query string) from any URL you could do the following using mod_rewrite, near the top of your .htaccess file:
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
The condition (RewriteCond directive) simply asserts that there is a query string consisting of at least 1 character (determined by the regex . - a single dot).
The QSD (Query String Discard) flag removes the original query string from the redirected response. The QSD flag requires Apache 2.4 (which you are most probably using). The method used on earlier versions of Apache, as in your example, is to append a ? to the substitution string (essentially an empty query string).
Note that you should test first with a 302 (temporary) redirect to avoid potential caching issues.
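The intended effect of the condition plus the QSD flag can be sketched in Python with the standard `urllib.parse` module. This is only an illustration of the outcome, not of how Apache implements it, and the function name `strip_query` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Return the redirect target without a query string, or None if no redirect is needed."""
    parts = urlsplit(url)
    # Mirrors RewriteCond %{QUERY_STRING} . (at least one character present).
    if not parts.query:
        return None
    # Mirrors the QSD flag: rebuild the URL with an empty query string.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(strip_query("https://www.example.com/this-is-static?numbersletterssymbols"))
# https://www.example.com/this-is-static
print(strip_query("https://www.example.com/this-is-static"))
# None
```

Returning `None` for a URL with no query string corresponds to the condition failing, which is what prevents a redirect loop.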
however, the variable seems to be in parentheses
The parentheses in the regex simply create a "capturing group" which can be referenced later with a backreference. eg. In your example, the value of the product URL parameter is referenced in the RewriteRule substitution string using the %1 backreference in order to redirect to the value of the URL parameter. This is very different to what you are trying to do and is arguably a security issue. eg. It would redirect a request for /test.php?product=https://malicious.com to https://malicious.com/, allowing a potential hacker to relay traffic via your site.
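To make the capturing group and the resulting open redirect concrete, here is a minimal Python sketch of what the quoted rule computes. The function name is hypothetical and Python's `re` is used only as an approximation of the regex handling in mod_rewrite.

```python
import re

def legacy_rule_target(query_string):
    """Simulate RewriteCond ^product=(.*)$ followed by the substitution %1/?"""
    m = re.search(r"^product=(.*)$", query_string)
    if m is None:
        return None  # condition fails, no redirect
    # %1 is the first captured group; the trailing "?" in the rule only
    # discards the original query string, so the target is "%1/".
    return m.group(1) + "/"

print(legacy_rule_target("product=https://malicious.com"))
# https://malicious.com/
```

Because the captured value is used verbatim as the redirect target, whoever crafts the link controls where the visitor ends up.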
UPDATE: is it possible to make this work only for when the URL begins with "this-is-static" (for example)
Yes, the RewriteRule pattern (1st argument) matches the URL-path, less the slash prefix. For example:
RewriteCond %{QUERY_STRING} .
RewriteRule ^this-is-static %{REQUEST_URI} [QSD,R=301,L]
Matches all URLs that start with /this-is-static.
This is not really a problem but a question.
I have this file that shows the product information when you go to URL/product.php?id=1. How can I make it show the same when going to URL/product/1? id is a variable that changes.
Sorry, I have no clue how htaccess works, and how to rewrite..
You need to internally rewrite the request from /product/1 to /product.php?id=1. On Apache, you need to do this with mod_rewrite. In .htaccess this would take the form of:
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request from "/product/1"
RewriteRule ^product/1$ /product.php?id=1 [L]
Note that this literally rewrites /product/1 to /product.php?id=1 (as stated in the question) and nothing else. The rewrite is internal to the server: the URL in the browser's address bar does not change.
The arguments to Apache directives are space separated:
^product/1$ - The first argument (pattern) to the RewriteRule directive is a regular expression (regex) that matches against the URL-path (only) of the request. Note that in .htaccess this URL-path does not start with a slash, so the URL-path that is matched is product/1 not /product/1, even though you are requesting example.com/product/1.
/product.php?id=1 - The second argument (substitution) is the string that is substituted for the requested URL. ie. the target URL. This is an "ordinary" string, not a regex.
[L] - The third argument (flags) are additional options that can change how the RewriteRule directive behaves. The argument must be surrounded in square brackets and contains a comma separated list of flags. The L (or last) flag signifies this is the last directive in this round of processing. If this is the last directive in the file then the L flag is not required. If you omit the L flag then processing continues and the request could be further rewritten (if you have more directives). Another common flag is the R (or redirect) flag. This changes the internal rewrite into an external redirect (which sends a Location HTTP response header back to the client and results in the browser being externally redirected to the new URL - the URL in the browser's address bar changes).
Additional Note: In this instance, since you are requesting "product" and a file with that basename exists (in fact, that is the file you are rewriting to) you also need to make sure that MultiViews is disabled (it is by default). If MultiViews is enabled (some shared hosts enable this for some reason) then mod_negotiation will trigger an internal subrequest for product.php before your mod_rewrite directive gets to rewrite the request and this will be missing the id URL parameter. (Numerous rewriting issues on SO are caused by conflicts with MultiViews.) To disable MultiViews, you can include this at the top of your .htaccess file:
Options -MultiViews
More generic
To make this more generic and rewrite /product/<number> to /product.php?id=<number>, where <number> is 1 or more digits, you can modify the regex (first argument) and create a backreference that you use in the substitution string (second argument). For example:
# Internally rewrite a request from "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
This would rewrite a URL of the form /product/123 to /product.php?id=123. Where 123 is 1 or more digits, denoted by the regex subpattern \d+. (\d is a shorthand character class and is the same as the marginally more verbose [0-9]. + is a quantifier that indicates 1 or more of the preceding pattern - in this case digits.) By surrounding this in parentheses, we create a capturing group, which we can refer to in the substitution. That's what the $1 backreference is. $1 is essentially a variable that contains whatever value the regex captured.
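The capture-and-substitute step can be tried out in Python as a rough sketch. Python's `re` stands in for the mod_rewrite regex engine, the `re.ASCII` flag is used so that `\d` only matches 0-9, and the function name `rewrite_product` is hypothetical.

```python
import re

# Same pattern as the RewriteRule; in .htaccess the path has no leading slash.
PRODUCT_PATTERN = re.compile(r"^product/(\d+)$", re.ASCII)

def rewrite_product(url_path):
    """Return the internal rewrite target, or None when the rule does not match."""
    m = PRODUCT_PATTERN.search(url_path)
    if m is None:
        return None
    # m.group(1) plays the role of the $1 backreference.
    return "/product.php?id=" + m.group(1)

print(rewrite_product("product/123"))  # /product.php?id=123
print(rewrite_product("product/abc"))  # None
```

Anything that is not purely digits after `product/` falls through unmatched, so such requests would be handled by whatever rules or files come next.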
In summary:
# Disable MultiViews
Options -MultiViews
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request from "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
Reference:
https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html#rewriterule
I need to have a RegEx that will match a URI like this based on the subdomain "blog"--
http://blog.foo.com/2010/06/25/city-tax-sale/
and redirect like this (getting rid of the subdomain and numbers/date)--
http://foo.com/city-tax-sale/
where the last bit "city-tax-sale" would be a wildcard. So basically any incoming URI that starts with 'blog.foo.com' would be redirected to 'foo.com' + 'whatever is at the end of the above URI after the three sub paths with numbers.
I hope that makes sense. Just trying to create one redirect instead of writing every single one.
This will explicitly match your date format, rather than any series of digits and slashes:
RewriteCond %{HTTP_HOST} ^blog\.foo\.com$ [NC]
RewriteRule ^/\d{4}/\d{2}/\d{2}/(.*)$ http://foo.com/$1 [L,R=301]
The regex part can be broken down into:
^ # start of non-domain url
/\d{4} # slash followed by 4 digits
/\d{2} # slash followed by 2 digits
/\d{2} # slash followed by 2 digits
/ # closing slash
(.*) # rest of the url, captured to group 1
$ # end of url
With the $1 in the replacement being group 1.
In the options part:
L is for "Last" - tells it to not bother looking at other rules.
R=301 is for Redirect with 301 header, which means permanent redirect (just R would send a temporary 302 header)
The RewriteCond directive performs a case-insensitive (NC flag) check on the Host header (supplied by the user/client). Because the pattern is anchored at both ends, the rewrite is only performed when the host is exactly blog.foo.com; otherwise it isn't.
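The host check and the date pattern can be exercised together in a short Python sketch. It assumes the URL-path is matched with its leading slash, as written in the rule above. The function name `blog_redirect` is hypothetical and Python's `re` only approximates Apache's regex engine.

```python
import re

HOST_PATTERN = re.compile(r"^blog\.foo\.com$", re.IGNORECASE)      # RewriteCond with NC
PATH_PATTERN = re.compile(r"^/\d{4}/\d{2}/\d{2}/(.*)$", re.ASCII)  # RewriteRule pattern

def blog_redirect(host, url_path):
    """Return the redirect target, or None when the rule would not apply."""
    if HOST_PATTERN.search(host) is None:
        return None
    m = PATH_PATTERN.search(url_path)
    if m is None:
        return None
    # Group 1 is everything after the yyyy/mm/dd/ prefix ($1 in the rule).
    return "http://foo.com/" + m.group(1)

print(blog_redirect("blog.foo.com", "/2010/06/25/city-tax-sale/"))
# http://foo.com/city-tax-sale/
```

Paths that lack the three date segments return `None`, so only dated blog URLs are redirected.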
RewriteCond %{HTTP_HOST} ^blog\.foo\.com [NC]
RewriteRule ^(\d+/)+(.*)/?$ http://foo.com/$2 [L,R=301]
You can try this:
/http:\/\/blog\..*\.[a-zA-Z]{2,5}\/[0-9]{4}\/[0-9]{2}\/[0-9]{2}\/(.*)\//