IIS URL Rewrite: Add Trailing Slash, Preserve Anchors and Query Strings - regex

I've searched several SO posts and haven't found what I'm looking for. An answer might exist, but it may be old enough that it doesn't show up for me. I found a post (Nginx rewrite: add trailing slash, preserve anchors and query strings) that is very close to what I need, but its regex solution does not work with URL Rewrite for IIS, unless I'm doing it wrong.
Problem
I'm trying to add a forward slash / to the end of my URL paths while preserving any existing query strings (?) and anchors (#).
Desired Solution
Basically, here are the desired results for each case:
Entry: https://my.site.com/about
Result: https://my.site.com/about/
Entry: https://my.site.com/about?query=string
Result: https://my.site.com/about/?query=string
Entry: https://my.site.com/about#TestAnchor
Result: https://my.site.com/about/#TestAnchor
Entry: https://my.site.com/about?query=string#TestAnchor
Result: https://my.site.com/about/?query=string#TestAnchor
Current Tests
Our current regex ignores query strings and anchors, but I would like to take them into consideration now.
<rule name="AddTrailingSlash" stopProcessing="true">
<match url="^([^.?]+[^.?/])$" />
<action type="Redirect" url="{R:1}/" redirectType="Permanent" />
</rule>
I've also tested another regex but it only works if the url contains both a query string AND an anchor.
<rule name="AddTrailingSlash" stopProcessing="true">
<match url="^(.*)(\?.*?)(\#.*?)$" />
<action type="Redirect" url="{R:1}/{R:2}{R:3}" redirectType="Permanent" />
</rule>
NOTE: I just tested this last one (^(.*)(\?.*?)(\#.*?)$) and it actually doesn't work. If the url already contains a / before the ? the test passes which it should not, so I have more work to do here.
Question
Is there a single regex that I can use to solve this or do I need to use multiple rules?

TL;DR
IIS Rewrite (ALL) URIs with Trailing Slash & preserve Fragment and Query Strings
<rule name="AddTrailingSlash" stopProcessing="true">
<match url="^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)" />
<action type="Redirect" url="{R:1}/{R:2}" redirectType="Permanent" />
</rule>
IIS uses the ECMAScript regex flavor, so you can try it here: https://regexr.com/6ele7
Update
IIS Rewrite (Considered) URIs with Trailing Slash & preserve Fragment and Query Strings
<rule name="AddTrailingSlash" stopProcessing="true">
<match url="^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$" />
<action type="Redirect" url="{R:1}/{R:2}" redirectType="Permanent" />
</rule>
Try it here : https://regexr.com/6fk3g
http://127.0.0.1 --> http://127.0.0.1/
https://localhost --> https://localhost/
https://localhost? --> https://localhost/?
https://localhost/ --> https://localhost/
https://my.site.com --> https://my.site.com/
https://my.site.com:443? --> https://my.site.com:443/?
https://my.site.com/ --> https://my.site.com/
https://my.site.com/about.php --> https://my.site.com/about.php
https://my.site.com/about.php? --> https://my.site.com/about.php?
https://my.site.com/about --> https://my.site.com/about/
https://my.site.com/about? --> https://my.site.com/about/?
https://my.site.com/about/ --> https://my.site.com/about/
https://my.site.com/about/? --> https://my.site.com/about/?
https://my.site.com/about?query --> https://my.site.com/about/?query
https://my.site.com/about/?query --> https://my.site.com/about/?query
https://my.site.com/about.php?query --> https://my.site.com/about.php?query
https://my.site.com/about#hash --> https://my.site.com/about/#hash
https://my.site.com/about/#hash --> https://my.site.com/about/#hash
https://my.site.com/about.php#hash --> https://my.site.com/about.php#hash
https://my.site.com/about?query#hash --> https://my.site.com/about/?query#hash
https://my.site.com/about/?query#hash --> https://my.site.com/about/?query#hash
https://my.site.com/folder.name/about?query --> https://my.site.com/folder.name/about/?query
https://my.site.com/about?query#hash:http://test.com?q --> https://my.site.com/about/?query#hash:http://test.com?q
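The table above can be spot-checked outside IIS. The following is only a minimal sketch: it assumes Python's re module behaves like the ECMAScript flavor for this particular pattern (which uses nothing beyond character classes, alternation and anchors), it applies the regex to full URL strings as the test list does, and add_trailing_slash is a hypothetical helper name. URLs that do not match are returned unchanged, as if the rule had not fired.

```python
import re

# The "Considered" pattern from the update, copied verbatim.
CONSIDERED = re.compile(r"^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$")


def add_trailing_slash(url: str) -> str:
    """Mimic the redirect action {R:1}/{R:2} (hypothetical helper)."""
    m = CONSIDERED.match(url)
    if m is None:
        return url  # the rule does not fire, so the URL stays as it is
    # An unmatched optional group is None in Python, so fall back to "".
    return m.group(1) + "/" + (m.group(2) or "")


if __name__ == "__main__":
    for url in (
        "https://localhost?",
        "https://my.site.com/about",
        "https://my.site.com/about.php?query",
        "https://my.site.com/about/?query#hash",
        "https://my.site.com/folder.name/about?query",
    ):
        print(url, "-->", add_trailing_slash(url))
```

Already-slashed paths and file names with an extension fall through the `if m is None` branch, which corresponds to the rows in the table where both sides are identical.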
Explanation (All)
Level 1 - Let's start with just your examples:
^([^?#]+?)\/?([?#].*)?$
Group #1: ^ anchors at the start; [^?#] matches any character except ? and #; +? repeats it one or more times, lazily (it stops at the first position where the rest of the pattern can match)
Ignored: \/? matches an optional / that is not captured
Group #2: [?#] matches ? or #, .* matches everything after it up to the end $, and (...)? makes the whole group optional
It works well, but it handles this case incorrectly:
https://my.site.com/about.php?query --> https://my.site.com/about.php/?query !!!
So let's add an exception...
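A short sketch of the Level 1 behaviour, assuming Python's re module as a stand-in for the ECMAScript flavor and a hypothetical helper named level_1 that mimics the {R:1}/{R:2} action:

```python
import re

# Level 1 pattern, copied verbatim from the explanation.
LEVEL_1 = re.compile(r"^([^?#]+?)\/?([?#].*)?$")


def level_1(url: str) -> str:
    """Apply {R:1}/{R:2} using the Level 1 pattern (hypothetical helper)."""
    m = LEVEL_1.match(url)
    if m is None:
        return url
    return m.group(1) + "/" + (m.group(2) or "")


if __name__ == "__main__":
    # Works for the examples from the question ...
    print(level_1("https://my.site.com/about?query=string#TestAnchor"))
    # ... but also appends a slash after a file name, which is unwanted.
    print(level_1("https://my.site.com/about.php?query"))
```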
Level 2 - What if we capture a possible file name (Name.name.name.ext) as part of Group #2?
^([^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?)$
(?:...) Non-Capturing group
(?:[^/?#]+\.[^/?#]+)? matches an optional file name, and (?:[?#].*)? matches an optional query string or anchor
Now everything is OK, except this:
https://my.site.com? --> https://my.site.com? !!!
So we need another exception in Group #1
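The same kind of check for Level 2, again only a sketch that assumes Python's re module is close enough to the ECMAScript flavor for this pattern; level_2 is a hypothetical helper name:

```python
import re

# Level 2 pattern, copied verbatim from the explanation.
LEVEL_2 = re.compile(r"^([^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?)$")


def level_2(url: str) -> str:
    """Apply {R:1}/{R:2} using the Level 2 pattern (hypothetical helper)."""
    m = LEVEL_2.match(url)
    if m is None:
        return url
    return m.group(1) + "/" + (m.group(2) or "")


if __name__ == "__main__":
    # The file name now lands in group 2, so no extra slash is appended.
    print(level_2("https://my.site.com/about.php?query"))
    # The host is mistaken for a file name, so the bare domain stays unchanged.
    print(level_2("https://my.site.com?"))
```

In the second case the lazy first group stops at `https:/` and the host `my.site.com` is consumed by the file-name part of group 2, which is why the output equals the input.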
Level 3 - Take just domain URI as an alternative
^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)
(...|...) Alternative
[^/]+:\/\/[^/#?]+ first tries (greedily, not lazily) to match a scheme://host prefix, up to the first /, # or ?
Now it works great!
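To see the three levels come together, here is a sketch of the final "All" pattern. It assumes Python's re module as a stand-in for the ECMAScript flavor and uses a hypothetical helper named level_3; the inputs are full URL strings, as in the test list above.

```python
import re

# Level 3 ("All") pattern, copied verbatim from the TL;DR rule.
LEVEL_3 = re.compile(
    r"^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)"
)


def level_3(url: str) -> str:
    """Apply {R:1}/{R:2} using the Level 3 pattern (hypothetical helper)."""
    m = LEVEL_3.match(url)
    if m is None:
        return url
    return m.group(1) + "/" + (m.group(2) or "")


if __name__ == "__main__":
    for url in (
        "https://my.site.com?",
        "https://my.site.com/about#hash",
        "https://my.site.com/about.php?query",
        "https://my.site.com/about/?query#hash",
    ):
        print(url, "-->", level_3(url))
```

Unlike the "Considered" variant, this pattern also matches URLs that need no change and rebuilds them identically, so in IIS the rule would still fire for them.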
+ Explanation (Considered)
Level 4 - What if we just exclude . and / from the last path segment in the first group, so that only the URIs we care about match and all others are ignored?
^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$
\/[^/.?#]+ requires that the characters after the last / contain none of / . ? #
Now it is even smaller and faster!
Analyzing other method
As károly-szabó's answer shows well, instead of excluding the characters that are not accepted, we can match the accepted pattern explicitly.
So if we want to use that method in a simpler form (2 groups, plus some minor optimization), the regex will be:
^(https?:\/\/[\w.:-]+\/?(?:[\w.-]+\/)*[\w-]+(?!\/))([?#].*)?$
But a URI path accepts more characters than that.
So a wider version of that regex can be:
^(https?:\/\/[\w.:-]+\/?(?:[\w!#-)+-.;=#~]+\/)*[\w!#-);=#~+,-]+(?!\/))([?#].*)?$
Try it here: https://regexr.com/6elea
Note: Multibyte Unicode characters are also allowed in domain names, but I ignored that in this method.
P.S.
Actually, I don't think we should do this rewrite in IIS, for these reasons:
The anchor character # can be part of a folder name (as %23)
A file name can have no extension
IIS and browsers usually will (or should) handle anchors and queries themselves
Ref:
IIS: UrlRewrite middleware query strings are preserved
Google/Anchor tags are stripped from URLs
RFC URI References
URI Wiki
How does IIS URL Rewrite handle # anchor tags
I mean:
https://my.site.com/ --> (=Call root)
https://my.site.com/about --> (=Call root > Folder/File name about)
https://my.site.com/about/ --> (=Call root > Folder name about)
https://my.site.com/about?query --> (=Call root > Folder/File name about + Query)
https://my.site.com/about/?query --> (=Call root > Folder name about + Query)
https://my.site.com/about.php?query --> (=Call root > File name about.php + Query)
[When browser strip it:]
https://my.site.com/about#hash --> (=Call root > Folder/File name about + Anchor)
https://my.site.com/about/#hash --> (=Call root > Folder name about + Anchor)
https://my.site.com/about.php#hash --> (=Call root > File name about.php + Anchor)
[If not?]
https://my.site.com/folder#name/?query#hash
https://my.site.com/folder.name/about.php?query=one/two

You can try with this regex https://regex101.com/r/6TSqaP/2. This is matching every provided example and solves the problem if the url already has an ending '/'.
^((?:https?:\/\/[\w\.\-]*)(?:[\w\-]+\/)*(?:[\w\-]+)(?!\/))(\?.*?)?(\#.*?)?$
I used your second example as base for my regex, with the following logic.
The parts of the url: scheme://authority/path?query#fragment
first capture group matches the scheme://authority/path part of the url
second capture group optional and matching the ?query
third capture group also optional and for the #fragment
regex explanation
^( # should start with this
(?:https?:\/\/[\w\.\-]*) # match the http or https protocol and the domain
(?:[\w\-]+\/)* # match the path except the last element of it (optional)
(?:[\w\-]+)(?!\/) # match the last path element, but only if it's not closed with '/'
) # {R:1}
(\?.*?)? # {R:2} query (optional)
(\#.*?)? # {R:3} fragment (optional)
$ # string should end
IIS rule
<rule name="AddTrailingSlash" stopProcessing="true">
<match url="^((?:https?:\/\/[\w\.\-]*)(?:[\w\-]+\/)*(?:[\w\-]+)(?!\/))(\?.*?)?(\#.*?)?$" />
<action type="Redirect" url="{R:1}/{R:2}{R:3}" redirectType="Permanent" />
</rule>
Edit: Updated regex to handle dashes (-) and multiple path elements
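The three-group logic can be exercised against the four examples from the question. This is only a sketch: it assumes Python's re module handles the pattern like the ECMAScript flavor does (the negative lookahead (?!\/) is supported by both), and apply_rule is a hypothetical helper name.

```python
import re

# Pattern from the answer, copied verbatim.
THREE_GROUPS = re.compile(
    r"^((?:https?:\/\/[\w\.\-]*)(?:[\w\-]+\/)*(?:[\w\-]+)(?!\/))(\?.*?)?(\#.*?)?$"
)


def apply_rule(url: str) -> str:
    """Mimic the redirect action {R:1}/{R:2}{R:3} (hypothetical helper)."""
    m = THREE_GROUPS.match(url)
    if m is None:
        return url  # e.g. the path already ends with a slash
    query = m.group(2) or ""      # optional ?query
    fragment = m.group(3) or ""   # optional #fragment
    return m.group(1) + "/" + query + fragment


if __name__ == "__main__":
    for url in (
        "https://my.site.com/about",
        "https://my.site.com/about?query=string",
        "https://my.site.com/about#TestAnchor",
        "https://my.site.com/about?query=string#TestAnchor",
        "https://my.site.com/about/?query=string",
    ):
        print(url, "-->", apply_rule(url))
```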

Related

301 Redirect Regex Pattern

I'm trying to make a IIS redirect rule to redirect from this url pattern, but it beats me:
https://www.mycompanyPLC.com/en/lorem/ipsum/whatever
to
https://www.mycompanyLTD.com/lorem/ipsum/whatever
Basically I need to replace PLC with LTD and if there is the "/en/" group in url, this has to be removed.
You can achieve both of your requirements with a single regex, provided /en/ is preceded by .com. Something like:
(.*?)PLC\.com(?:\/\ben\b)?(.*)
Explanation of the above regex:
(.*?) - Represents 1st capturing group capturing everything before PLC lazily.
PLC\.com - Matches PLC.com literally.
(?:\/\ben\b)? - Represents a non-capturing group matching /en literally zero or one time. \b represents a word boundary.
(.*) - Represents the second capturing group matching everything after /en greedily.
$1LTD.com$2 - For the replacement(or redirection in this case) part you can get away with this string where $1 represents the first captured group and $2 represents the second captured group. In your case; you can use {R:1}LTD.com{R:2}.
You can find the demo of the above regex here.
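The substitution can be tried outside IIS with a small sketch. It assumes Python's re module as a stand-in for the ECMAScript flavor, turns on re.IGNORECASE to mirror the case-insensitive matching that IIS applies by default, and uses a hypothetical helper named redirect_target. It works on full URL strings; in an actual IIS rule the match url attribute is tested against the path only, so the host part would have to be checked separately.

```python
import re

# Pattern from the answer; IGNORECASE is an assumption mirroring the IIS default.
PLC = re.compile(r"(.*?)PLC\.com(?:\/\ben\b)?(.*)", re.IGNORECASE)


def redirect_target(url: str) -> str:
    """Replace PLC.com with LTD.com and drop a leading /en segment."""
    # \1 and \2 play the role of {R:1} and {R:2} in the IIS action.
    return PLC.sub(r"\1LTD.com\2", url, count=1)


if __name__ == "__main__":
    print(redirect_target("https://www.mycompanyPLC.com/en/lorem/ipsum/whatever"))
    print(redirect_target("https://www.mycompanyPLC.com/lorem/ipsum/whatever"))
    # \b keeps /english from being mistaken for /en.
    print(redirect_target("https://www.mycompanyPLC.com/english/page"))
```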
Please refer to the URL rule below.
<system.webServer>
<rewrite>
<rules>
<rule name="ReverseProxyInboundRule1" stopProcessing="true">
<match url="en(.*)" />
<action type="Redirect" url="https://www.mycompanyLTD.com{R:1}" />
</rule>
</rules>
</rewrite>
</system.webServer>
There is no need to match the /en URL segment exactly. We redirect the request as soon as we find an /en segment in the URL. The same goes for the http/https part of the URL.
Feel free to let me know if there is anything I can help with.
After several hours of studying regex, I created this rule, and it seems to be working (I've tested several scenarios):
^(http|https)://?(www.)mycompanyPLC.com/en?(.*)
and the Redirect URL from IIS is:
https://www.mycompanyLTD.com/{R:3}
Later edit:
The rule in IIS is like this:
<rule name="Replace PLC with LTD and remove /en/" enabled="true" stopProcessing="true">
<match url="(.*?)PLC\.com(?:\/\ben\b)?(.*)" />
<conditions logicalGrouping="MatchAny" trackAllCaptures="false" />
<action type="Redirect" url="{R:1}ltd.com{R:2}" />
</rule>
Test urls were this format:
http://webdev.myCompanyplc.com/en/our-experience/retail
{R:1} = http://webdev.myCompany
{R:2} = /our-experience/retail
The regex was OK, but the redirect still didn't work.

Use Regex to lookahead and redirect in case of match IIS

I have the following IIS rule which is supposed to redirect if the URI does not contain the word Api:
<rule name="React Routes" stopProcessing="true">
<match url=".*" />
<conditions logicalGrouping="MatchAll" trackAllCaptures="false">
<add input="{REQUEST_URI}" pattern="^((?!Api).)*$" negate="false" />
</conditions>
<action type="Rewrite" url="/" />
</rule>
This was working fine until I added a token as a query parameter for a route. Now when it tries to match that URI it will go out of memory.
How would I have to write the pattern so it looks only in the first 30 characters? The /Api/ route will never appear later. This way I will make sure that the regular expression matching does not run out of memory when a token is present.
To make sure Api does not occur within the first 30 chars you may use
pattern="^(?!.{0,27}Api).*"
Details
^ - start of string
(?!.{0,27}Api) - a negative lookahead that matches a location that is not immediately followed with any 0 to 27 chars (other than linebreak chars) and Api after them
.* - any 0+ chars (other than linebreak chars).
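The effect of the bounded lookahead can be checked with a short sketch. It assumes Python's re module as a stand-in for the ECMAScript flavor and matches case-sensitively; is_react_route is a hypothetical helper name.

```python
import re

# Pattern from the answer: Api must not start within the first 28 positions,
# i.e. it must not occur anywhere inside the first 30 characters.
NO_API_IN_PREFIX = re.compile(r"^(?!.{0,27}Api).*")


def is_react_route(request_uri: str) -> bool:
    """True when the condition pattern matches, so the rewrite would apply."""
    return NO_API_IN_PREFIX.match(request_uri) is not None


if __name__ == "__main__":
    print(is_react_route("/Api/users"))                      # Api near the start
    print(is_react_route("/dashboard?token=" + "x" * 5000))  # long token, no Api
    print(is_react_route("a" * 28 + "Api"))                  # Api just past the window
```

Because the lookahead inspects at most 30 characters, the work it does no longer grows with the length of the token.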

Regex URL rewriting for custom login page

I'm implementing a custom login page for a multitenant portal where each client gets a different login page styled according to their stored settings.
To achieve this I am using IIS 7.5 with the URL Rewriting module.
My idea is to capture requests for "http://portal.com/client1/" and rewrite them into "http://portal.com/login.aspx?client=client1".
What I'm struggling with is the regex expression to match the URL and extract the "client1" bit out.
EXAMPLES:
"http://portal.com/pepsi" = "http://portal.com/login.aspx?client=pepsi"
"http://portal.com/fedex" = "http://portal.com/login.aspx?client=fedex"
"http://portal.com/northwind" = "http://portal.com/login.aspx?client=northwind"
"http://portal.com/microsoft/" = "http://portal.com/login.aspx?client=microsoft"
So the match should be found if the requested URL contains a single word after the first "/" and work whether there is a trailing "/" or not.
"http://portal.com/clients/home.aspx" would be ignored by the rule.
"http://portal.com/clients/catalog" would be ignored by the rule.
"http://portal.com/products.aspx" would be ignored by the rule.
Assuming:
That the parameter name is always client
and that you don't care about what comes after /client1/, you can use this simple pattern to capture that portion of the URL and then repeat it as a parameter:
<rewrite>
<rules>
<rule name="client1 rewrite">
<match url="^([^/.]*)[/]*$" />
<action type="Rewrite" url="login.aspx?client={R:1}"/>
</rule>
</rules>
</rewrite>
Fiddle
This works because every URL in the ignore list has either a slash in the middle or a dot, and [^/.]* excludes both, while a slash at the end is optional. So {R:1} will contain everything up to the first slash, or up to the end of the URL if there is no slash.
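The examples from the question can be run through the pattern with a small sketch. It assumes Python's re module as a stand-in for the ECMAScript flavor, takes the path without its leading slash (which is how IIS presents it to the match element), and uses a hypothetical helper named rewrite_target.

```python
import re
from typing import Optional

# Pattern from the rule above, copied verbatim.
CLIENT = re.compile(r"^([^/.]*)[/]*$")


def rewrite_target(path: str) -> Optional[str]:
    """Return the rewritten URL, or None when the rule does not match."""
    m = CLIENT.match(path)
    if m is None:
        return None
    return "login.aspx?client=" + m.group(1)


if __name__ == "__main__":
    for path in ("pepsi", "microsoft/", "clients/home.aspx",
                 "clients/catalog", "products.aspx", ""):
        print(repr(path), "-->", rewrite_target(path))
```

One thing the sketch makes visible: because [^/.]* also accepts zero characters, the empty path (the site root) matches too and yields an empty client value. Changing * to + would be one way to exclude it, if that is not wanted.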

IIS: Removing Trailing Dots from URL

I want to remove trailing dots such as ... from all urls on IIS. I tried to use the following rule:
<rule name="RemoveTrailingDots" stopProcessing="true">
<match url="^([^.]*)\.+$" />
<action type="Redirect" redirectType="Permanent" url="{R:1}" />
</rule>
This works as expected on my local PC but not on my website.
For example I expected /fruits/apple... to be redirected to /fruits/apple but no redirect happens.
Thanks
Your problem is you're requiring that everything before the trailing dots be completely devoid of dots. You might try something like
<match url="^(.*[^.])\.+$" />
This will match any number of characters, followed by a single non-dot character (as part of the capture), followed by dots (which aren't captured).
That said, I don't understand why you even want this. URLs with trailing dots are not even remotely a common thing to have.
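The difference between the two patterns shows up only when the path contains a dot before the trailing ones. A minimal sketch, assuming Python's re module as a stand-in for the ECMAScript flavor; strip_dots is a hypothetical helper and v1.2/apple... is a made-up path used for illustration:

```python
import re
from typing import Optional

ORIGINAL = re.compile(r"^([^.]*)\.+$")   # pattern from the question
FIXED = re.compile(r"^(.*[^.])\.+$")     # pattern suggested in the answer


def strip_dots(pattern: "re.Pattern[str]", path: str) -> Optional[str]:
    """Return {R:1} (the path without trailing dots), or None if no match."""
    m = pattern.match(path)
    return m.group(1) if m else None


if __name__ == "__main__":
    for path in ("fruits/apple...", "v1.2/apple...", "fruits/apple"):
        print(path, "| original:", strip_dots(ORIGINAL, path),
              "| fixed:", strip_dots(FIXED, path))
```

Both patterns accept fruits/apple..., so if that exact URL is not redirected on the live site, the cause may lie outside the regex.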

How to redirect subfolder to query in Mod Rewrite for IIS 7.0?

I'm using Mod Rewrite for IIS 7.0 from iis.net and want to redirect requests:
http://example.com/users/foo to http://example.com/User.aspx?name=foo
http://example.com/users/1 to http://example.com/User.aspx?id=1
I have created 2 rules:
<rule name="ID">
<match url="/users/([0-9])" />
<action type="Rewrite" url="/User.aspx?id={R:1}" />
</rule>
<rule name="Name">
<match url="/users/([a-z])" ignoreCase="true" />
<action type="Rewrite" url="/User.aspx?name={R:1}" />
</rule>
It passes the test in the IIS MMC test dialog, but it doesn't work in debug (with URLs like http://localhost:9080/example.com/users/1 or …/users/foo), and it doesn't work on real IIS!
What have I done wrong?
The obvious problem is that your current regexes only match one character in the user name or one number. You'll need to add a plus quantifier inside the parentheses in order to match multiple letters or numbers. See this page for more info about regex quantifiers. Note that you won't be matching plain URLs like "/users/" (no ID or name). Make sure this is what you intended.
The other problem you're running into is that IIS evaluates rewrite rules starting from the first character after the initial slash. So your rule to match /users/([0-9]) won't match anything because when the regex evaluation happens, the URL looks like users/foo not /users/foo. The solution is to use ^ (which is the regex character that means "start of string") at the start of the pattern instead of a slash. Like this:
<rule name="ID">
<match url="^users/([0-9]+)" />
<action type="Rewrite" url="/User.aspx?id={R:1}" />
</rule>
<rule name="Name">
<match url="^users/([a-z]+)" ignoreCase="true" />
<action type="Rewrite" url="/Users.aspx?name={R:1}" />
</rule>
Note that you're choosing Users.aspx for one of these URLs and User.aspx (no plural) for the other. Make sure this is what you intended.
BTW, the way I figured these things out was by using IIS Failed Request Tracing to troubleshoot rewrite rules. This made diagnosing this really easy. I was able to make a test request and look through the trace to find where each rewrite rule is being evaluated (it's in a section of the trace called "PATTERN_MATCH"). For the particular PATTERN_MATCH for one of your rules, I saw this:
-PATTERN_MATCH
Pattern /users/([0-9]+?)
InputURL users/1
Negate false
Matched false
Note the lack of the beginning slash.
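The trace result can be reproduced with a short sketch. It assumes Python's re module as a stand-in for the ECMAScript flavor, feeds the patterns the same slash-less input the trace shows (users/1), and simplifies rule processing to "first matching rule wins"; first_match is a hypothetical helper, and the target /User.aspx is taken from the question.

```python
import re
from typing import Optional

# Pattern with a leading slash, as in the question (with the + quantifier added).
LEADING_SLASH = re.compile(r"/users/([0-9]+)")

# Corrected patterns, anchored with ^ instead of a leading slash.
RULES = [
    ("id", re.compile(r"^users/([0-9]+)")),
    ("name", re.compile(r"^users/([a-z]+)", re.IGNORECASE)),
]


def first_match(path: str) -> Optional[str]:
    """Return the rewrite target of the first matching rule, or None."""
    for key, pattern in RULES:
        m = pattern.search(path)
        if m:
            return f"/User.aspx?{key}={m.group(1)}"
    return None


if __name__ == "__main__":
    # Same input as in the trace: no leading slash, so this finds nothing.
    print(LEADING_SLASH.search("users/1"))
    print(first_match("users/1"))
    print(first_match("users/foo"))
```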
You should use <match url="/users/([0-9]+)" /> and <match url="/users/([a-z]+)" ignoreCase="true" />, respectively, to match the complete id/user and not just their first letter/digit. But I don't know why your regex would have failed on a single digit, so there must be another issue, too.
As for your second question, I'm not sure I understand completely. How can you tell the difference between a folder name and a user name? Will a folder always have a trailing slash?