.htaccess Negated Lookahead Regex malformed - regex

The following is a segment from my .htaccess file.
I want the following behaviour from Apache (currently the site is at localhost, but that shouldn't matter, right?):
If the requested resource is anything else other than
{site_url}/core
or
{site_url}/login
like
{site_url}/pseudo/path/name
the resource served must be
{site_url}/site/pseudo/path/name
Otherwise the URL served must be {site_url}/core or {site_url}/login, i.e. whatever was requested.
The .htaccess file is:
<IfModule mod_alias.c>
AliasMatch ^/(?!core|login)(/?.*)$ /site/$2
Header add X-Enabled mod_alias
</IfModule>
But this doesn't seem to be working and returns an error. I am not very familiar with regular expressions and am trying to learn them. What I have inferred from this expression is:
If the part after '/', i.e. the URI after site_url, does not match core or login, (?!core|login), and is followed by anything, including a sub-folder, (/?.*)$ (an optional slash and anything following it), then set the alias to /site/(anything that was matched in the second parentheses).
The module is working, which I've checked using only the Header add part, the problem is the regex.
Please help.

Leave off the ^ at the beginning. The regex you have would match /pseudo/path/name but not {site_url}/pseudo/path/name, because you're telling it that the text must begin with a /.
Also, be careful, because your regex is excluding things like {site_url}/corel. That's probably not going to be a problem unless you have other directories beginning with core, but if you really want to make it match anything other than {site_url}/core or {site_url}/login, use this regex:
/(?!core$|login$)(/?.*)$
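To see how the two lookaheads differ, here is a minimal sketch in Python, whose re module is assumed to behave like Apache's PCRE for these patterns. The helper name rewrite and the sample paths are made up for illustration. Note that a lookahead is not a capturing group, so the text matched by (/?.*) ends up in group 1.

```python
import re

# The asker's lookahead and the stricter one from the answer (both unanchored).
LOOSE = re.compile(r"/(?!core|login)(/?.*)$")
STRICT = re.compile(r"/(?!core$|login$)(/?.*)$")

def rewrite(pattern, path):
    """Hypothetical helper: return the /site/... target, or None if untouched."""
    match = pattern.search(path)
    # (?!...) captures nothing, so the only capture group is number 1.
    return "/site/" + match.group(1) if match else None

for path in ["/core", "/login", "/corel", "/pseudo/path/name"]:
    print(path, "->", rewrite(LOOSE, path), "|", rewrite(STRICT, path))
```

With the first pattern /corel is left alone just like /core; with the second only the exact names core and login are excluded.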

Related

How to only show id value on url path with htaccess?

What I have right now is
https://www.example.com/link.php?link=48k4E8jrdh
What I want to accomplish is to get this URL instead =
https://www.example.com/48k4E8jrdh
I looked on the internet but with no success :(
Could someone help me and explain how this works?
This is what I have right now (Am I in the right direction?)
RewriteEngine On
RewriteRule ^([^/]*)$ /link.php?link=$1
RewriteRule ^([^/]*)$ /link.php?link=$1
This is close, except that it will also match /link.php (the URL being rewritten to) so will result in an endless rewrite-loop (500 Internal Server Error response back to the browser).
You could avoid this loop by simply making the regex more restrictive. Instead of matching anything except a slash (ie. [^/]), you could match anything except a slash and a dot, so it won't match the dot in link.php, and any other static resources for that matter.
For example:
RewriteRule ^([^/.]*)$ link.php?link=$1 [L]
You should include the L flag if this is intended to be the last rule. Strictly speaking you don't need it if it is already the last rule, but otherwise if you add more directives you'll need to remember to add it!
If the id in the URL should only consist of letters and digits, as in your example, then consider just matching what is needed (eg. [A-Za-z0-9]). Generally, the regex should be as restrictive as required. Also, how many characters are you expecting? Currently you allow from nothing to "unlimited" length.
Just in case it's not clear, you do still need to change the actual URLs you are linking to in your application to be of the canonical form. ie. https://www.example.com/48k4E8jrdh.
UPDATE:
It works but now the site always sees that page regardless if it is link.php or not? So what happens now is this: example.com/idu33d3dh#dj3d3j And if I just do this: example.com then it keeps coming back to link.php
This is because the regex ^([^/.]*)$ matches 0 or more characters (denoted by the * quantifier), so it also matches the empty URL-path of a request for the site root. You probably want to match at least one character (or some other minimum)? For example, to match between 1 and 32 characters change the quantifier from * to {1,32}. ie. ^([^/.]{1,32})$.
Incidentally, the fragment identifier (fragid) (ie. everything after the #) is not passed to the server so this does not affect the regex used (server-side). The fragid is only used by client-side code (JavaScript and HTML) so is not strictly part of the link value.
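The effect of the quantifier can be checked outside Apache. The following is only a sketch, using Python's re module as a stand-in for the regex engine; the sample paths are assumptions, written without a leading slash because that is how a RewriteRule in .htaccess sees them.

```python
import re

STAR = re.compile(r"^([^/.]*)$")          # zero or more characters
BOUNDED = re.compile(r"^([^/.]{1,32})$")  # between 1 and 32 characters

# In .htaccess the leading slash is stripped, so a request for the site root
# reaches the rule as an empty string.
samples = ["", "48k4E8jrdh", "link.php", "x" * 33]

for path in samples:
    print(repr(path), bool(STAR.match(path)), bool(BOUNDED.match(path)))
```

The empty path matches only the * version, which is why example.com on its own kept ending up at link.php; link.php itself matches neither, because of the dot.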

.htaccess RedirectMatch conditional regex fails

Good evening dear fellow coders,
I am trying to handle urls without file extensions that are more readable to average internet users using a .htaccess redirect, like http://example.com/file to http://example.com/file.php (with or without query)
Unfortunately I am not able to use mod_rewrite, but although redirect does work, it seems not to be able to handle my request properly.
To handle any given URL I tried using
RedirectMatch ^/(?(?=.*\.php(?i).*)|(\w+)(.*)) /$1.php$2
And
RedirectMatch ^/.*\.php(?i).*|(\w+)(.*) /$1.php$2
I also tried using $2 and $3, on the assumption that, contrary to what I would expect, the first part of the pattern might be captured as a group.
It should extract characters and numbers for $1 and everything else for $2 (starting with ? for queries etc.) unless it contains the file extension .php.
Validating the regex with https://regex101.com/r/zF2bV9/2 everything should work fine, but when I add either of these lines to the .htaccess, every requested filename is replaced by ".php" (as in http://example.com/.php), which obviously produces an error for a non-existent file.
What am I missing about the code or the redirect functionality?
You can try something like:
RedirectMatch ^/([^.]+)$ /$1.php
This matches the URL provided there is not already a dot (ie. .php) in the URL, and so prevents a redirect loop.
As mentioned in comments, you don't need to do anything specific with the query string, providing you want it passed through to the destination unaltered. The query string is not present in the URL-path that the RedirectMatch directive matches against anyway. So, any manipulation of the query string would require mod_rewrite.
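As a rough check of why the redirect cannot loop, here is a small Python sketch (re standing in for Apache's regex engine). The function name redirect is hypothetical; it mimics what RedirectMatch would send back for a given URL-path, with the query string left out because the directive never sees it.

```python
import re

RULE = re.compile(r"^/([^.]+)$")

def redirect(url_path):
    """Hypothetical helper: return the redirect target, or None for no redirect."""
    match = RULE.match(url_path)
    return "/" + match.group(1) + ".php" if match else None

print(redirect("/file"))      # first request gets the extension appended
print(redirect("/file.php"))  # the follow-up request contains a dot: no match
print(redirect("/"))          # nothing after the slash: no match
```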

Excluding directory from regex redirect

I wish to redirect all URLs with underscores to their dashed equivalent.
E.g. /nederland/amsterdam/car_rental becomes /nederland/amsterdam/car-rental. For this I'm using the technique described here: How to replace underscore to dash with Nginx. So my location block is matched to:
location ~ (_)
But I only want to do this on URLs not in the /admin namespace. To accomplish this I tried combining the regex with a negative lookup: Regular expression to match a line that doesn't contain a word?. The location now matches with:
(?=^(?!\/admin))(?=([^_]*))
Rubular reports the string /nederland/amsterdam/car_rental to match the regex, while /admin/stats_dashboard is not matched, just as I want it. However when I apply this rule to the nginx config, the site ends up in redirect loops. Is there anything I've overlooked?
UPDATE: I don't actually want to rewrite anything in the /admin namespace. The underscore-to-dash rewrite should only take place on all URLs not in the /admin namespace.
The Nginx location matching order is such that locations defined using regular expressions are checked in the order of their appearance in the configuration file and the search of regular expressions terminates on the first match.
With this knowledge, in your shoes, I would simply define one location using a regular expression for "admin" above the one for the underscores that you got from the Stack Overflow answer you linked to.
location ~ (\badmin\b) {
# Config to process urls containing "admin"
}
location ~ (_) {
# Config to process urls containing "_"
}
Any request with admin in it will be processed by the first location block no matter whether it has an underscore or not because the matching location block appears before that for the underscores.
** PS **
As another answer, posted by cnst a couple of days after mine, shows, the documentation on the location matching order that I linked to also indicates that you may use the ^~ modifier to match the /admin folder and skip the location block for the underscores.
I personally tend not to use this modifier and prefer to band regex based locations together with annotated comments but it is certainly an option.
However, you will need to be careful, depending on your setup, as requests starting with "/admin", but longer, may be matching with the modifier and lead to unexpected results.
As said, I prefer my regex based approach safe in the knowledge that no one will start to arbitrarily change the order of things in the config file without a clear understanding.
^(?!\/admin\b).*
You just need this simple regex with lookahead. See demo.
https://regex101.com/r/uF4oY4/16
Your regex will fail /nederland/amsterdam/car_rental too as it has _. So only the string /nederland/amsterdam/car will be considered.
or
you can use
rewrite ^(?!\/admin\b)([^_]*)_(.*)$ $1-$2;
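The pattern in that rewrite replaces one underscore per application, so a path with several underscores needs several passes. A minimal Python sketch of this, assuming Python's re behaves like nginx's PCRE for this pattern; the function name dash and the sample paths are made up:

```python
import re

RULE = re.compile(r"^(?!\/admin\b)([^_]*)_(.*)$")

def dash(path):
    """Hypothetical helper: apply the rule until the path stops changing."""
    previous = None
    while previous != path:
        previous, path = path, RULE.sub(r"\1-\2", path, count=1)
    return path

for path in ["/nederland/amsterdam/car_rental", "/a_b_c",
             "/admin/stats_dashboard", "/administrator/x_y"]:
    print(path, "->", dash(path))
```

Because of the \b, only /admin itself is excluded; a path such as /administrator/x_y is still rewritten.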
You've not explicitly mentioned one way or the other, but it does appear that you likely only have a single /admin namespace, which forms the prefix of $uri and would match a ^/admin.*$ regex; let me provide two non-conflicting configuration options based on such an assumption.
As others suggested, you might want to use a separate location for /admin.
However, unlike the other answer, I would advise you to define it by a prefix string, and use the ^~ modifier to not check the regular expressions after a successful match.
location ^~ /admin {
}
Alternatively, or even additionally for an extra peace of mind and a fool-proof approach, instead of using what appears to be a non-POSIX regular expression from the linked answer (if my reading of re_format(7) on OpenBSD is to be believed), consider something that's much simpler, guaranteed to be understood by most people who'd claim they know what REs are, and work everywhere, not to mention likely be more efficient, considering that you already know that it's the ^/admin.* path that you want to exclude:
location ~ ^/[^a][^d][^m][^i][^n].*_.* {
}
To accomplish your goal, you could use either one of these two solutions, or even both to be more rigid and fool-proof.
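Before relying on the character-class pattern alone, it may be worth checking it against a few paths. The sketch below uses Python's re as a stand-in for nginx's regex engine; the LOOKAHEAD pattern is my own comparison based on the lookahead from the other answer, and the sample paths are invented.

```python
import re

CHAR_CLASSES = re.compile(r"^/[^a][^d][^m][^i][^n].*_.*")
LOOKAHEAD = re.compile(r"^(?!/admin\b).*_")  # assumption: used only for comparison

samples = ["/nederland/amsterdam/car_rental", "/admin/stats_dashboard",
           "/about_us", "/x_y"]

for path in samples:
    print(path, bool(CHAR_CLASSES.search(path)), bool(LOOKAHEAD.search(path)))
```

Each [^x] class tests a single position on its own, so the pattern also skips paths such as /about_us (starts with a) and /x_y (fewer than five characters before the underscore). That is one more reason to combine it with the ^~ /admin location instead of using it alone.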

More efficient RewriteRule for messy URL

I have developed a new web site to replace an existing one for a client. Their previous site had some pretty nasty looking URLs to their products. For example, an old URL:
http://mydomain.com/p/-3-0-Some-Ugly-Product-Info-With-1-3pt-/a-arbitrary-folder/-18pt/-1-8pt-/ABC1234
I want to catch all requests to the new site that use these old URLs. The information I need out of the old URL is the ABC1234 which is the product ID. To clarify, the old URL begins with /p/ followed by four levels of folders, then the product ID.
So for example, the above URL needs to get rewritten to this:
http://mydomain.com/shop/?sku=ABC1234
I'm using Apache 2.2 on Linux. Can anyone point me on the correct pattern to match? I know this is wrong, but here is where I am currently at:
RewriteRule ^p/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)?$ shop/?sku=$5 [R=301,NC,L]
I'm pretty sure that the pattern used to match each of the 4 folders is redundant, but I'm just not that sharp with regex. I've tried some online regex evaluators with no success.
Thank you.
--EDIT #1--
Actually, my RewriteRule above does work, but is there a way to shorten it up?
--EDIT #2--
Thanks to ddr, I've been able to get this expression down to this:
RewriteRule ^p/([\w-]+/){4}([\w-]+)$ shop/?_sku=$2 [R=301,NC,L]
--EDIT #3--
Mostly for the benefit of ddr, but I welcome anyone to assist who can. I'm dealing with over 10,000 URLs that need to be rewritten to work with a new site. The information I've provided so far still stands, but now that I am testing that all of the old URLs are being rewritten properly I am running into a few anomalies that don't work with the RewriteRule example provided by ddr.
The old URLs are consistent in that the product ID I need is at the very end of the URL as documented above. The first folder is always /p/. The problem I am running into at this point is that some of the URLs have a URL encoded double quote ("). Additionally, some of the URLs contain a /-/ as one of the four folders mentioned. So here are some examples of the variations in the old URLs:
/p/-letters-numbers-hyphens-88/another-folder/-and-another-/another-18/ABC1234
/p/-letters-numbers-hyphens-88/2%22/-/-/ABCD1234
/p/letters-numbers-hyphens-1234/34-88/-22/-/ABCD1234/
Though the old URLs are nasty, I think it is safe to say that the following are always true:
Each begins with /p/
Each ends with the product ID that I need to isolate.
There are always four levels of folders between /p/ and the product ID.
Some folders in between have hyphens, some don't.
Some folders in between are a hyphen ONLY.
Some folders in between contain a % character where they are URL encoded.
Some requests include a / at the very end and some do not.
The following rule was provided by ddr and worked great until I ran into the URLs that contain a % percent sign or a folder with only a hyphen:
RewriteRule ^p/(?:[\w-]+/){4}([\w-]+)$ shop/?_sku=$1 [R=301,NC,L]
Given the rule above, how do I edit it to allow for a folder that is hyphen only (/-/) or for a percent sign?
You can use character classes to reduce some of the length. The parentheses (capture groups) are also unnecessary, except the last one, as @jpmc26 says.
I'm not especially familiar with Apache rules, but try this:
RewriteRule ^p/(?:[\w-]+/){4}([\w-]+)$ shop/?sku=$1 [R=301,NC,L]
It should work if extended regular expressions are supported.
\w is equivalent to [A-Za-z0-9_], and it doesn't matter here if underscores are matched as well, so that's one replacement.
The {4} matches exactly four repetitions of the previous group. This is not always supported so Apache may not like it.
The ?: is optional but indicates that these parens should not be treated as a capture. Makes it slightly more efficient.
I'm not sure what the part in [] at the end is for but I left it. I can't see why you'd need a ? before the $, so I took it out.
Edit: the most compact way, if Apache likes it, would probably be
RewriteRule ^p(?:/[\w-]+){4}/([\w-]+)$ shop/?sku=$1 [R=301,NC,L]
EDIT: response to edit 3 of the question.
I'm surprised it doesn't work with only -. The [\w-]+ should match even where there is just a single -. Are you sure there isn't something else going on in these URLs?
You might also try replacing - in the regex with \-.
As for the %, just change [\w-] to [\w%-]. Make sure you leave the - at the end! Otherwise the regex engine will try to interpret it as part of a character range.
EDIT 2: Or try this:
RewriteRule ^p/(?:.*?/){4}(.*?)/?$ shop/?sku=$1 [R=301,NC,L]
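To compare the stricter rule with this last one, here is a sketch that runs both patterns over the three example URLs from the question, using Python's re as a stand-in for Apache's PCRE. The leading slash is dropped because a RewriteRule in .htaccess matches without it; the function name sku is hypothetical.

```python
import re

STRICT = re.compile(r"^p/(?:[\w-]+/){4}([\w-]+)$")
LAZY = re.compile(r"^p/(?:.*?/){4}(.*?)/?$")

def sku(pattern, path):
    """Hypothetical helper: return the captured product ID, or None."""
    match = pattern.match(path)
    return match.group(1) if match else None

samples = [
    "p/-letters-numbers-hyphens-88/another-folder/-and-another-/another-18/ABC1234",
    "p/-letters-numbers-hyphens-88/2%22/-/-/ABCD1234",
    "p/letters-numbers-hyphens-1234/34-88/-22/-/ABCD1234/",
]

for path in samples:
    print(sku(STRICT, path), sku(LAZY, path))
```

The stricter pattern misses the second URL because of the % and the third because of the trailing slash, while hyphen-only folders are not a problem for it. The lazy pattern accepts all three, but it no longer restricts which characters appear, so a path with more than four folders would leave a slash inside the captured value.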

Regular Expression for redirect

How do I redirect all the following URLs to "/" using single regular expression?
members/kaleem/
members/kaleem/activity/just-me/
members/kaleem/activity/
members/kaleem/activity/favorites/
members/kaleem/activity/groups/
members/kaleem/friends/
I am using it with a WordPress redirect plugin.
I'm not sure how Wordpress' redirect plugin works, but this regular expression will match all of above, as well as any other pages after members/kaleem.
members/kaleem[\w\-\/]*
Grab word characters, dashes, and slashes that appear after members/kaleem. If there are certain pages after members/kaleem that shouldn't be matched, it gets more complicated. I was assuming that the examples you showed were part of a pattern.
If you want to only match kaleem/activity and kaleem/friends, plus any pages that are children of them, you can use this:
members/kaleem/((activity|friends)[\w\/\-]*)?
It seems members/ is the common identifier. Correct? If so, you just have to match that: ^members/. Otherwise it becomes a bit more complicated: ^members/kaleem/(?:friends|activity/(?:(?:just-me|favorites|groups)/)?). See: http://regex101.com/r/jJ4rM8
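A quick way to try such patterns before putting them into the plugin is to run them over the URLs from the question. This is only a sketch with Python's re; how the plugin anchors its patterns is unknown, so full-string matching is an assumption here, and members/kaleem/settings/ is an invented extra URL.

```python
import re

BROAD = re.compile(r"members/kaleem[\w\-\/]*")
NARROW = re.compile(r"members/kaleem/((activity|friends)[\w\/\-]*)?")

urls = [
    "members/kaleem/",
    "members/kaleem/activity/just-me/",
    "members/kaleem/activity/",
    "members/kaleem/activity/favorites/",
    "members/kaleem/activity/groups/",
    "members/kaleem/friends/",
    "members/kaleem/settings/",  # hypothetical page that is not in the question
]

for url in urls:
    print(url, bool(BROAD.fullmatch(url)), bool(NARROW.fullmatch(url)))
```

Both patterns cover the six URLs from the question; only the broad one also covers the invented settings page.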