.htaccess: append subdomain as query to existing queries - regex

I'm trying to append the subdomain as a query parameter to any existing query string using .htaccess.
http://test.domain.com should be http://test.domain.com?x=test
http://test.domain.com?id=1 should be http://test.domain.com?id=1&x=test
This is what I have done, but it doesn't work and I can't figure out why:
RewriteCond %{HTTP_HOST} ^([a-z0-9_-]+)\.domain\.com$ [NC]
# exclude www.domain.com
RewriteCond %1 !^(www)$ [NC]
RewriteRule ^[^\?]*(?:\?(.*))?$ index.php?$1&x=%1 [L]
My understanding was
[^\?]* all characters except ?, match 0 or more times
(?: start of a non capturing group
\? match ? literally
(.*) all characters after ? as a group
)? end of the non capturing group, match 0 or 1 times
But it does not work. Where is my mistake?
UPDATE 1:
I could make it work by using the following rule
RewriteRule (.*) index.php?$1&x=%1 [QSA,L]
http://test1.domain.com?y=test1 brings me [x=>test1,y=>test2]
but
http://test1.domain.com?y=test1&x=test3 brings me [x=>test3,y=>test2]
So it overrides my x value. Is there a way to block that?
UPDATE 2
This is the code I'm using now:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(?!www)([\w-]+)\. [NC]
RewriteCond %1::%{QUERY_STRING} !^(.+?)::x=\1(?:&|$) [NC]
RewriteRule ^ index.php?%{QUERY_STRING}&x=%1 [L]

Try this rule:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(?!www)([\w-]+)\. [NC]
RewriteCond %1::%{QUERY_STRING} !^(.+?)::x=\1(?:&|$) [NC]
RewriteRule ^ index.php?%{QUERY_STRING}&x=%1 [L]
Make sure this is the only rule you have in .htaccess while testing.
Explanation of:
RewriteCond %{HTTP_HOST} ^(?!www)([\w-]+)\. [NC]
RewriteCond %1::%{QUERY_STRING} !^(.+?)::x=\1(?:&|$) [NC]
We capture the leading part of the hostname with the group ([\w-]+), which is then available as %1. Note that %1 cannot be used on the RHS (the pattern side) of a condition.
We then join %1 and %{QUERY_STRING} into the single test string %1::%{QUERY_STRING}. Any other arbitrary delimiter, such as ##, would work as well.
On the RHS we have ^(.+?)::x=\1(?:&|$): the group (.+?) matches the %1 value before the delimiter ::, followed by a literal x= and then \1, which is a back-reference to that group, so the value after x= must equal the subdomain. The ! before ^ negates the condition. In simple words, the rule executes only if the query string does not already start with x=subdomain.
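To see the back-reference trick outside Apache, here is a minimal Python sketch that replays the same CondPattern on a test string built the same way. The helper name rule_applies is made up for illustration, and Python's re module only approximates the PCRE engine that mod_rewrite uses.

```python
import re

# Same pattern as the CondPattern; \1 refers back to the text captured before "::".
ALREADY_TAGGED = re.compile(r"^(.+?)::x=\1(?:&|$)", re.IGNORECASE)

def rule_applies(subdomain: str, query_string: str) -> bool:
    """Mimic the negated condition: True means the RewriteRule would fire."""
    test_string = f"{subdomain}::{query_string}"  # plays the role of %1::%{QUERY_STRING}
    return ALREADY_TAGGED.search(test_string) is None

print(rule_applies("test", "id=1"))         # True: no x= at the start
print(rule_applies("test", "x=test"))       # False: x=test is already there
print(rule_applies("test", "x=test&id=1"))  # False
print(rule_applies("test", "x=other"))      # True: x has a different value
print(rule_applies("test", "id=1&x=test"))  # True: x=test is not at the start
```

The last case shows that the pattern only recognises x=subdomain when it is the first parameter of the query string.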

It looks like you are trying to match the query string with your RewriteRule's pattern. That is not possible: the pattern is matched only against the path component of the requested URL.
But no worries, there's an easy way to combine the original query string with what the pattern matched: the QSA flag.
So this should do the trick (combined with your existing RewriteConds):
RewriteRule (.*) index.php?$1 [QSA,L]
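As a rough illustration of why the original pattern never captured anything, the Python sketch below splits a URL the way the server does and runs the asker's regex against the path only. The function name what_rule_sees is hypothetical, and stripping the leading slash imitates per-directory (.htaccess) matching.

```python
import re
from urllib.parse import urlsplit

# The RewriteRule pattern from the question.
ASKER_PATTERN = re.compile(r"^[^\?]*(?:\?(.*))?$")

def what_rule_sees(url: str):
    """Return (path seen by the pattern, query string, value of $1)."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # .htaccess patterns see the path without the leading slash
    match = ASKER_PATTERN.match(path)
    return path, parts.query, match.group(1)

print(what_rule_sees("http://test.domain.com/?id=1"))
print(what_rule_sees("http://test.domain.com/page?id=1"))
```

In both calls the third element is None: the ? and everything after it never reach the pattern, so the optional group cannot take part in the match.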

Related

RewriteRule to handle one domain two folders to two domains no folder

I am attempting to create rewrite rules to handle some specific website redirections:
I would like domain1.ca/folder1/xyz to go to domain2.ca/xyz and domain1.ca/folder2/xyz to go to domain3.ca/xyz
Right now my attempts are as following:
RewriteCond %{HTTP_HOST} ^domain1.ca$ [OR]
RewriteCond %{HTTP_HOST} ^www.domain1.ca$
RewriteRule ^(\/folder1\/)(.*)$ "https://domain2.ca/$1" [R=301,L]
RewriteCond %{HTTP_HOST} ^domain1.ca$ [OR]
RewriteCond %{HTTP_HOST} ^www.domain1.ca$
RewriteRule ^(\/folder2\/)(.*)$ "https://domain3.ca/$1" [R=301,L]
Any help would be greatly appreciated :) Thx.
A couple of problems with your existing rules:
In .htaccess the URL-path matched by the RewriteRule pattern does not start with a slash. So, the URL-path starts folder1/xyz, not /folder1/xyz.
You are unnecessarily capturing "folder1" in the first parenthesised subpattern and using this in the substitution string (ie. $1). You should be using $2, or don't capture the first path segment.
The directives could also be tidied up a bit (eg. no need to backslash-escape slashes in the regex and the conditions can be combined).
Try the following instead:
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.ca [NC]
RewriteRule ^folder1/(.*) https://domain2.ca/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.ca [NC]
RewriteRule ^folder2/(.*) https://domain3.ca/$1 [R=301,L]
Additional notes:
The end-of-string anchor ($) following (.*)$ in the RewriteRule pattern is not required since regex is greedy by default.
You only need to surround the argument in double quotes if it contains spaces.
I removed the end-of-string anchor ($) from the end of the CondPattern to also match fully qualified domain names that end in a dot.
I added the NC flag to the condition. It's technically possible that some bots can send a mixed/uppercase Host header.
Test first with 302 (temporary) redirects to avoid potential caching issues.
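The two corrected rules can be sanity-checked with a small Python sketch before touching the server. It only imitates the host condition and the two patterns; redirect_target is a made-up helper, and the path is passed without a leading slash, as in .htaccess.

```python
import re

# Host condition shared by both rules (NC flag -> IGNORECASE).
HOST = re.compile(r"^(www\.)?domain1\.ca", re.IGNORECASE)

# (pattern, target prefix) pairs mirroring the two RewriteRule lines.
RULES = [
    (re.compile(r"^folder1/(.*)"), "https://domain2.ca/"),
    (re.compile(r"^folder2/(.*)"), "https://domain3.ca/"),
]

def redirect_target(host: str, path: str):
    """Return the redirect URL, or None when no rule applies."""
    if not HOST.search(host):
        return None
    for pattern, prefix in RULES:
        match = pattern.search(path)
        if match:
            return prefix + match.group(1)
    return None

print(redirect_target("domain1.ca", "folder1/xyz"))      # https://domain2.ca/xyz
print(redirect_target("WWW.Domain1.CA", "folder2/a/b"))  # https://domain3.ca/a/b
print(redirect_target("domain1.ca", "/folder1/xyz"))     # None
```

The last call shows the original problem: a pattern anchored at ^folder1/ cannot match a path that starts with a slash.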

Rewrite condition with wildcard in query

I have a rewrite condition that rewrites /myPage.php?myQueryVar=foo-aRandomString to /myNewPage/foo-aRandomString. I only want this to apply in instances where there is a hyphen in the query value therefore I have some conditions in place as seen below:
RewriteCond %{QUERY_STRING} (^|&)myQueryVar=foo-(.*)($|&)
RewriteCond %{THE_REQUEST} ^GET\ /myPage\.php\?myQueryVar=([^\s&]+) [NC]
RewriteRule ^myPage\.php$ /myNewPage/%1? [R=301,L]
I'd like to add another rule exception allowing /myPage.php?myQueryVar=bar-aRandomString. Currently I've had to simply clone the above code and use it again, changing foo to bar, as seen below. Is there a cleaner way of doing this without having multiple lines of near-identical code? Thank you.
RewriteCond %{QUERY_STRING} (^|&)myQueryVar=bar-(.*)($|&)
RewriteCond %{THE_REQUEST} ^GET\ /myPage\.php\?myQueryVar=([^\s&]+) [NC]
RewriteRule ^myPage\.php$ /myNewPage/%1? [R=301,L]
Try to use this:
RewriteCond %{QUERY_STRING} (^|&)myQueryVar=(foo|bar)-(.*)($|&)
RewriteCond %{THE_REQUEST} ^GET\ /myPage\.php\?myQueryVar=([^\s&]+) [NC]
RewriteRule ^myPage\.php$ /myNewPage/%1? [R=301,L]
(The first RewriteCond was changed)
Basically, foo was changed to this (foo|bar) which means:
( # begin of group
foo # a literal 'foo'
| # or
bar # a literal 'bar'
) # end of group
Bear in mind we had to enclose the two options within a group, since if not, the regex would have meant instead:
(^|&)myQueryVar=foo|bar-(.*)($|&) ==> (^|&)myQueryVar=foo OR bar-(.*)($|&)
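The precedence issue is easy to reproduce in Python. This is just a sketch with made-up sample strings; it compares the ungrouped and grouped forms of the first condition.

```python
import re

# Without the group, | splits the whole expression into two alternatives.
ungrouped = re.compile(r"(^|&)myQueryVar=foo|bar-(.*)($|&)")
# With the group, only foo/bar alternate and the hyphen is always required.
grouped = re.compile(r"(^|&)myQueryVar=(foo|bar)-(.*)($|&)")

samples = [
    "myQueryVar=foo-abc",  # wanted
    "myQueryVar=bar-abc",  # wanted
    "myQueryVar=foo",      # no hyphen, should be rejected
    "other=bar-x",         # wrong parameter, should be rejected
]

for qs in samples:
    print(qs, bool(ungrouped.search(qs)), bool(grouped.search(qs)))
```

The ungrouped pattern accepts all four strings, while the grouped one accepts only the first two.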
Also, if you don't want to capture what is inside the parentheses (that is why they are called 'capturing groups'), you can use a 'non-capturing group' instead: (?:...). Using non-capturing groups is good practice when you don't actually need the inner data.
Also, you don't need the group around .* in the first RewriteCond, since the %1 you use comes from the capturing group of the second RewriteCond.
So the rules could be changed like this:
RewriteCond %{QUERY_STRING} (?:^|&)myQueryVar=(?:foo|bar)-.*(?:$|&)
RewriteCond %{THE_REQUEST} ^GET\ /myPage\.php\?myQueryVar=([^\s&]+) [NC]
RewriteRule ^myPage\.php$ /myNewPage/%1? [R=301,L]
Even so, you could use just a single RewriteCond by joining the two regexes:
RewriteCond %{THE_REQUEST} ^GET\ /myPage\.php\?myQueryVar=((?:foo|bar)-[^\s&]+) [NC]
RewriteRule ^myPage\.php$ /myNewPage/%1? [R=301,L]
(Note that (?:foo|bar) is a non-capturing group: you don't need to capture just that part, you need to capture the value as a whole.)
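Here is a quick Python check of the single-condition variant, assuming request lines of the usual "GET path HTTP/1.1" shape. The function redirect_path is hypothetical and only mirrors what %1 would hold in the RewriteRule.

```python
import re

# The combined CondPattern tested against THE_REQUEST (NC flag -> IGNORECASE).
REQUEST = re.compile(
    r"^GET\ /myPage\.php\?myQueryVar=((?:foo|bar)-[^\s&]+)", re.IGNORECASE
)

def redirect_path(request_line: str):
    """Return the new path built from %1, or None if the condition fails."""
    match = REQUEST.search(request_line)
    return f"/myNewPage/{match.group(1)}" if match else None

print(redirect_path("GET /myPage.php?myQueryVar=foo-aRandomString HTTP/1.1"))
print(redirect_path("GET /myPage.php?myQueryVar=bar-aRandomString HTTP/1.1"))
print(redirect_path("GET /myPage.php?myQueryVar=baz-aRandomString HTTP/1.1"))
```

The first two calls return /myNewPage/foo-aRandomString and /myNewPage/bar-aRandomString; the third returns None because baz is not one of the allowed prefixes.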

htaccess RewriteRule regex

I have hundreds of these old links I need to redirect.
Here is one example:
/index.php?option=com_content&view=article&id=433:seventh-character-code-categories-and-icd-10-cm&Itemid=101&showall=1
to
/seventh-character-code-categories-and-icd-10-cm
Essentially I need to remove the /index.php?option=com_content&view=article&id=433: part.
I tried this but I am getting confused with the [0-9] and : parts, so the following does not work:
RewriteRule ^/index.php?option=com_content&view=article&id=[0-9]:(.*)$ /$1 [L,R=301]
Say you want to capture from after : to right before & in the query string you mentioned, then try this expression:
^[^\:]*\:([^\&]*)\&.*$
As #starkeen mentioned in comments, you have to check against the query string. This can be done using RewriteCond %{QUERY_STRING}.
So if index.php is in the root folder:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^\/index\.php$
RewriteCond %{QUERY_STRING} ^[^\:]*\:([^\&]*)\&.*$
RewriteRule ^(.*)$ http://example.com/%1 [R=301,L]
Here's another example. This one is for a sub folder:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^\/pages\/index\.php$
RewriteCond %{QUERY_STRING} ^[^\:]*\:([^\&]*)\&.*$
RewriteRule ^(.*)$ /pages/%1? [R=301,L]
Also, notice the ? at the end of the URL /pages/%1?: it prevents the original query string from being re-attached.
Another thing: groups captured in a RewriteCond are referenced as %N (e.g. %1), not $N.
BTW, depending on your server's configuration, you may need to add the NE flag, like [NE,L,R=301]. Also test whether it is necessary to double-escape the literal characters.
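To check what %1 ends up holding, the query-string pattern can be run against the example link in Python. This is only a sketch; the name extract_slug is made up.

```python
import re

# Same expression as in the RewriteCond %{QUERY_STRING} line.
SLUG = re.compile(r"^[^\:]*\:([^\&]*)\&.*$")

def extract_slug(query_string: str):
    """Return the text between the first ':' and the next '&', or None."""
    match = SLUG.search(query_string)
    return match.group(1) if match else None

qs = ("option=com_content&view=article"
      "&id=433:seventh-character-code-categories-and-icd-10-cm"
      "&Itemid=101&showall=1")

print(extract_slug(qs))                    # seventh-character-code-categories-and-icd-10-cm
print(extract_slug("id=433:some-article")) # None
```

The second call returns None because the pattern requires an & after the slug, so a link where id is the last parameter would not be redirected by this condition.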
What about a direct approach? Skip everything up to the colon, match the string up to &, and replace everything with the first match.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} [^:]+:([\w-]+[^&]).*
RewriteRule .*$ \/%1? [R=301,L]
</IfModule>

Regex to match a domain with only one subdomain

I'm trying to create a regex for my .htaccess that matches domains that have only one subdomain.
Example
test1.subdomain.ourdomain.de no match
subdomain.ourdomain.de match => redirect to default.subdomain.ourdomain.de
What I've got till now is this ugly thing:
^([a-z0-9_\-]+)\.([a-z0-9_\-]+)\.ourdomain\.de$
It matches test1.subdomain.ourdomain.de but not without test1. How to negate this correctly? My attempts with negative lookaheads did not work :-(
Try here: https://regex101.com/
This example htaccess will redirect from subdomain.ourdomain.de to default.subdomain.ourdomain.de, and there will be no match on http://first1.subdomain.ourdomain.de:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^subdomain\.ourdomain\.de$ [NC]
RewriteRule ^(.*)$ http://default.subdomain.ourdomain.de/ [R=301,NC,L]
To match any host with exactly one subdomain, capture the host in a corresponding RewriteCond and reuse it via %1:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^.]+\.[^.]+\.[^.]+)$ [NC]
RewriteRule ^(.*)$ http://default.%1/ [R=301,NC,L]
Input:
http://tes1.clothes.germany.de
Output: <NOTHING> as RewriteCond %{HTTP_HOST} ^([^.]+\.[^.]+\.[^.]+)$ [NC] condition was not met
Input 2:
http://clothes.germany.de
Output URL:
http://default.clothes.germany.de/
You can try this one:
^[^\.]+\.[^\.]+\.[^\.]+$
Here ^[^\.]+ matches everything from the start of the line up to the first ., then \. matches a literal ., then [^\.]+ matches up to the next ., then comes another literal ., and finally [^\.]+$ matches a run of characters containing no . up to the end of the line.
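The three-label pattern can be tried on the hosts from the question with a few lines of Python. This is a sketch only; redirect_host is a hypothetical helper standing in for the RewriteCond/RewriteRule pair.

```python
import re

# Exactly three dot-separated labels, captured as a whole (plays the role of %1).
ONE_SUBDOMAIN = re.compile(r"^([^.]+\.[^.]+\.[^.]+)$", re.IGNORECASE)

def redirect_host(host: str):
    """Return the host to redirect to, or None when the condition is not met."""
    match = ONE_SUBDOMAIN.match(host)
    return f"default.{match.group(1)}" if match else None

print(redirect_host("subdomain.ourdomain.de"))        # default.subdomain.ourdomain.de
print(redirect_host("test1.subdomain.ourdomain.de"))  # None
print(redirect_host("ourdomain.de"))                  # None
```

Because the redirect target has four labels, it no longer matches the pattern, so the redirect does not repeat. Note that the pattern counts labels only; it does not check that the host ends in ourdomain.de.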

htaccess URL Rewrite with and without query string

I'm trying to rewrite some urls but my newbness is getting in the way.
URLs input into the address bar
example.com/prefix-suffix.html
example.com/prefix-suffix.html?v=huzzah
Desired output (would like them displayed like)
example.com/prefix/suffix/
example.com/prefix/suffix/huzzah
I've looked at a few other examples like this one, but I just don't understand what they are doing exactly or how rather, which makes it challenging to modify.
If you have the time to explain what each line is doing I would greatly appreciate it.
Thanks.
You can use this code in your DOCUMENT_ROOT/.htaccess file:
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} \s/+([^-]+)-([^.]+)\.html\s [NC]
RewriteRule ^ /%1/%2/ [R=302,L,NE]
RewriteCond %{THE_REQUEST} \s/+([^-]+)-([^.]+)\.html\?v=([^\s&]+) [NC]
RewriteRule ^ /%1/%2/%3? [R=302,L,NE]
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
RewriteRule ^([^/]+)/([^/]+)/?$ /$1-$2.html [L]
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/?$ /$1-$2.html?v=$3 [L,QSA]
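To make the two halves of this rule set easier to follow, here is a Python sketch that replays them: first the external redirect from the old .html URL to the pretty URL, then the internal rewrite back to the real file. The function names are made up, and the file/directory check (-f / -d) is left out, which is an assumption that the pretty path does not exist on disk.

```python
import re

# External redirect step: patterns are tested against the raw request line.
OLD_PLAIN = re.compile(r"\s/+([^-]+)-([^.]+)\.html\s", re.IGNORECASE)
OLD_WITH_V = re.compile(r"\s/+([^-]+)-([^.]+)\.html\?v=([^\s&]+)", re.IGNORECASE)

# Internal rewrite step: patterns are tested against the path without leading slash.
PRETTY_TWO = re.compile(r"^([^/]+)/([^/]+)/?$")
PRETTY_THREE = re.compile(r"^([^/]+)/([^/]+)/([^/]+)/?$")

def external_redirect(request_line: str):
    """Old URL -> pretty URL shown in the address bar (or None)."""
    match = OLD_PLAIN.search(request_line)
    if match:
        return "/{}/{}/".format(*match.groups())
    match = OLD_WITH_V.search(request_line)
    if match:
        return "/{}/{}/{}".format(*match.groups())
    return None

def internal_rewrite(path: str):
    """Pretty path -> file actually served (or None)."""
    match = PRETTY_TWO.match(path)
    if match:
        return "/{}-{}.html".format(*match.groups())
    match = PRETTY_THREE.match(path)
    if match:
        return "/{}-{}.html?v={}".format(*match.groups())
    return None

print(external_redirect("GET /prefix-suffix.html HTTP/1.1"))           # /prefix/suffix/
print(external_redirect("GET /prefix-suffix.html?v=huzzah HTTP/1.1"))  # /prefix/suffix/huzzah
print(internal_rewrite("prefix/suffix/"))                              # /prefix-suffix.html
print(internal_rewrite("prefix/suffix/huzzah"))                        # /prefix-suffix.html?v=huzzah
```

The redirect conditions look at THE_REQUEST rather than the current path, which is what keeps the internally rewritten .html URL from being redirected again.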
this is a good reference. to summarize the relevant parts:
in the SO question you are referring to, the OP wanted index.php?page=mobile to become index/page/mobile
and the accepted answer to that question was:
RewriteEngine on
RewriteBase /cashearn/
RewriteCond %{THE_REQUEST} /index\.php\?page=([^\s&]+) [NC]
RewriteRule ^ index/page/%1? [R=302,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^index/page/([^/]+)/?$ index.php?page=$1 [L,QSA,NC]
in order to understand how that is working, let's take a close look at this section:
RewriteCond %{THE_REQUEST} /index\.php\?page=([^\s&]+) [NC]
RewriteRule ^ index/page/%1? [R=302,L]
RewriteCond is a condition that must be matched in order for the subsequent RewriteRule to be used.
the pattern-matching is done using regex. the basics are easy to master so i recommend that you look it up. in any case, anything of the form %N (e.g. %1) is a backreference to a "grouped" part of the pattern of the last matched RewriteCond in the current set of conditions. a "grouped" part is denoted by parentheses. in other words, the %1 in the RewriteRule refers to the ([^\s&]+) in the RewriteCond.
looking at the regex:
square brackets in regex denote a character class so [^\s&] is thus a character class.
the caret ^ when it is inside a character class denotes negation. \s is an escape code for a whitespace character. so, all in all, [^\s&] means "any character except whitespace and &". the character class is followed by +, which means "one or more". so the pattern matches a run of one or more characters that are not in the listed set. for a url, this essentially means any combination of letters, digits and characters such as % or -.
the other characters in the RewriteRule and RewriteCond, other than the "server-variable" %{THE_REQUEST}, are either regex special characters or literals. (by literals i mean that ab in a regex will match the ab inside the string cab.)
^, the caret, when it isn't inside a character class, is a special character that denotes the beginning of a line. note that ? and . are also literals here, despite the fact that they are included in the list of regex "special characters". that is because they are escaped with a \.
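if it helps, the capture can be tried out in python, whose re module is close enough to apache's regex engine for this pattern. this is just a sketch, and the request lines and the name captured_page are made up for illustration.

```python
import re

# Same CondPattern as above; the group ([^\s&]+) is what %1 refers to.
PAGE = re.compile(r"/index\.php\?page=([^\s&]+)", re.IGNORECASE)

def captured_page(request_line: str):
    """Return what %1 would contain, or None if the condition does not match."""
    match = PAGE.search(request_line)
    return match.group(1) if match else None

print(captured_page("GET /index.php?page=mobile HTTP/1.1"))          # mobile
print(captured_page("GET /index.php?page=mobile&lang=en HTTP/1.1"))  # mobile
print(captured_page("GET /index.php?page= HTTP/1.1"))                # None
```

the capture stops at the first space or &, and the + means an empty page value does not match at all.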
the only thing left to explain is the flags [NC] and [R=302,L]
NC means case-insensitive. L means last, i.e. if there is a match, then subsequent rules will not be processed. 'R' means "redirect" and 302 is the redirect's HTTP status code.
the difference between an internal and external redirect is relevant here. with an internal redirect, the server will silently grab the resources from the filepath specified at the newly formed URL, while the user still sees the original URL in the browser. the R flag indicates an external redirect and this causes the user to initiate a new HTTP transaction with the newly formed URL. Omit R for an internal redirect.