Capture variable with zero or one slashes in Django URL - regex

I need to match a Django URL pattern that can contain zero or one slashes. So far I've got a pattern that matches any number of characters, including slashes.
url(r'^(?P<myvar>[\w/-]+)/', index.home)
I tried adding a question mark after the slash in the variable pattern, but that doesn't seem to work.
url(r'^(?P<myvar>[\w/?-]+)/', index.home)
To be absolutely clear, I want to capture strings such as foo and foo/bar but not foo/bar/bing.
EDIT:
So my root URL is example.com. I need three different routes: one for the main page, one for variable foo that can either be just an alphanumerical string or an alphanumerical string containing one slash, and one for two variables foo and bar where bar must be a simple alphanumerical string.
example.com # should lead to main page
example.com/hey # should lead to second route where foo="hey"
example.com/hey/ho # should lead to second route where foo="hey/ho"
example.com/hey/ho/hi # should lead to third route where foo="hey/ho" and bar="hi"
So I need to capture foo and bar as variables, where foo can either contain a slash or not contain a slash. But as far as this question is concerned, I'm only interested in capturing foo.

You could try the regex below to capture the specific parts of a URL.
^[^/]*\/(?P<foo>(?:[^/\n]*(?:/[^/\n]*)?))(?:/(?P<bar>.*))?
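As a rough check, the pattern can be run through Python's re module against the example URLs from the question. This is only a sketch: capture is a hypothetical helper name, and it assumes the pattern is applied to the host-plus-path strings shown in the question rather than to whatever path Django hands to its URL resolver.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r'^[^/]*\/(?P<foo>(?:[^/\n]*(?:/[^/\n]*)?))(?:/(?P<bar>.*))?'
)

def capture(url):
    """Return (foo, bar) for a URL string, or None if nothing matches."""
    m = PATTERN.match(url)
    return (m.group('foo'), m.group('bar')) if m else None

# Example strings taken from the question.
for url in ('example.com', 'example.com/hey',
            'example.com/hey/ho', 'example.com/hey/ho/hi'):
    print(url, '->', capture(url))
```

The bare domain does not match at all, which leaves it free for a separate main-page route.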

Related

Matching redirect on url end, ignoring the substring

I'm currently trying to redirect from an old website to the new one.
The domain has changed and the subpath has changed, but the end is always the same, so I am trying to create a regex that will ignore the subpath, and only match with the ending, no matter what the combination might be.
Example:
http://shop.kmsport.dk/team-sport/bolde/fodbolde
https://kmsport.dk/collections/fodbolde
http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325
These three URLs all contain the word "fodbolde", but I only want to match the first two, since they both end in "/fodbolde", ignoring the subpath in the process.
So far I've been able to match up the ends with this:
\/([a-zA-Z]*)*+$
How do I create something to account for the different subpaths?
P.S. It's a massive sporting goods store, so it would be nice not to have to create a unique redirect for every possible combination -.-
If you are only interested in the last part just go with
url.rsplit('/', 1)[-1]
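A minimal sketch of that one-liner applied to the URLs from the question; last_segment is a hypothetical helper name, not something from the original post.

```python
def last_segment(url):
    """Return the text after the final '/' in url."""
    return url.rsplit('/', 1)[-1]

urls = [
    'http://shop.kmsport.dk/team-sport/bolde/fodbolde',
    'https://kmsport.dk/collections/fodbolde',
    'http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325',
]

# Only URLs whose last segment is 'fodbolde' would be redirected.
for url in urls:
    print(last_segment(url), last_segment(url) == 'fodbolde')
```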
Your current regex does not take /fodbolde into account. If that has to be at the end, you can use $ to assert the end of the string, as in /fodbolde$
One possibility is to match the start of the string with ^https?://, optionally match shop. with (?:shop\.)?, and follow that with kmsport\.dk/
Then use a repeating pattern that matches one or more non-slash characters followed by a forward slash, zero or more times, (?:[^/]+/)*, and at the end of the string match fodbolde with fodbolde$
^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$
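To see how that pattern behaves on the three example URLs, here is a small Python sketch. The should_redirect name is made up for illustration; the regex itself is the one given above.

```python
import re

# Regex from the answer above: optional "shop." subdomain, any number of
# intermediate path segments, and "fodbolde" as the final segment.
PATTERN = re.compile(r'^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$')

def should_redirect(url):
    """True when the URL ends in /fodbolde on either kmsport.dk host."""
    return PATTERN.match(url) is not None

print(should_redirect('http://shop.kmsport.dk/team-sport/bolde/fodbolde'))
print(should_redirect('https://kmsport.dk/collections/fodbolde'))
print(should_redirect(
    'http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325'))
```

The third URL contains "fodbolde" as an intermediate segment only, so the $ anchor rejects it.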

Regex for URL to sites

I have two URLs with the patterns:
1.http://localhost:9001/f/
2.http://localhost:9001/flight/
I have a site filter which redirects to the respective sites if the regex matches. I tried the following regex patterns for the 2 URLs above:
http?://localhost[^/]/f[^flight]/.*
http?://localhost[^/]/flight/.*
Both URLs are getting redirected to the first site, as both URLs are matched by the first regex.
I have also tried http?://localhost[^/]/[f]/.* for the first URL. I am unable to see what I am missing. I feel that this regex should not accept anything other than "f", but it is allowing "flight" as well.
Please help me by pointing out the mistake I have made.
Keep things simple:
.*/f(/[^/]*)?$
vs
.*/flight(/[^/]*)?$
Adding ? before $ makes the trailing slash, along with the path term after it, optional.
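A quick way to convince yourself that the two patterns no longer overlap is to try them in Python. pick_site is a hypothetical stand-in for the site filter; how the real filter applies its regexes is not known from the question.

```python
import re

F_SITE = re.compile(r'.*/f(/[^/]*)?$')
FLIGHT_SITE = re.compile(r'.*/flight(/[^/]*)?$')

def pick_site(url):
    """Return which site pattern matches the URL, or None."""
    if F_SITE.match(url):
        return 'f'
    if FLIGHT_SITE.match(url):
        return 'flight'
    return None

print(pick_site('http://localhost:9001/f/'))
print(pick_site('http://localhost:9001/flight/'))
print(pick_site('http://localhost:9001/f'))  # trailing slash is optional
```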
The first one will be caught with the following regex:
/^http:[\/]{2}localhost:9001\/f[^light]$/
The other one will be disallowed and can be found with the following regex:
/^http:[\/]{2}localhost:9001\/flight\/$/
Your regex has several issues: 1) p? means an optional p (htt:// will match), 2) [^/] will only match the : in your URLs, since it captures just 1 character (and you have a port number), 3) [^light] is a negated character class that means any single character that is not l, i, g, h, or t.
So, if you want to only capture localhost URLs, you'd better use this regex for the 1st site:
http://localhost[^/]*/f/.*
And this one for the second
http://localhost[^/]*/flight/.*
Please also bear in mind that, depending on where you use the regexps, your actual input may or may not include the protocol.
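The three issues, and the corrected patterns, can be checked with a few lines of Python. This is just an illustration using the re module; the variable names are invented here, and the real site filter may use a different regex engine.

```python
import re

URL_F = 'http://localhost:9001/f/'
URL_FLIGHT = 'http://localhost:9001/flight/'

# 1) "p?" makes the p optional, so "htt" is accepted.
optional_p = re.fullmatch(r'http?', 'htt') is not None

# 2) [^/] consumes exactly one character, so it cannot cover ":9001".
one_char_only = re.match(r'http://localhost[^/]/f/', URL_F) is not None

# 3) [^light] is a single character that is not l, i, g, h or t.
negated_class = (re.fullmatch(r'[^light]', 'f') is not None
                 and re.fullmatch(r'[^light]', 'l') is None)

# Corrected patterns from the answer above.
FIRST = re.compile(r'http://localhost[^/]*/f/.*')
SECOND = re.compile(r'http://localhost[^/]*/flight/.*')

print(optional_p, one_char_only, negated_class)
print(bool(FIRST.match(URL_F)), bool(FIRST.match(URL_FLIGHT)))
print(bool(SECOND.match(URL_F)), bool(SECOND.match(URL_FLIGHT)))
```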
These should work for you:
http[s]{0,1}:\/\/localhost:[0-9]{4}\/f\/
http[s]{0,1}:\/\/localhost:[0-9]{4}\/flight\/

Apply Laravel filter using regex as route name

I'm trying to apply a route filter to all the routes except homepage. I do it like this:
Route::whenRegex('/^\/[\S]+/', 'myFilter');
So, basically, I'm saying: match all the routes starting with /, followed by any non-whitespace character(s). However, the filter doesn't work.
The filter itself:
Route::filter('myFilter', function() {
    if (Session::has('userRegState')) {
        return Redirect::action('DefaultController#home');
    }
});
I checked it - the userRegState session variable is set, but no redirect is done. Is the regex used in the filter wrong?
OK, it seems there might be some kind of forward slash stripping done in the routing logic of Laravel, which makes my pattern not match what it should. To fix it, I omit the forward slash.
Instead of this:
Route::whenRegex('/^\/[\S]+/', 'myFilter');
I do that:
Route::whenRegex('/^\S{2}/', 'myFilter');
The final regex matches anything that starts with at least 2 non-whitespace characters. In my case it's forward slash followed by any non-whitespace character, like /a, /page2, etc.
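To make the difference between the two patterns concrete, here they are in Python's re module, which stands in for PHP's preg_match here and drops the surrounding / delimiters. Whether Laravel really strips the leading slash is only my guess above; the sketch just shows which sample strings each pattern accepts.

```python
import re

OLD = re.compile(r'^/[\S]+')   # requires a leading slash
NEW = re.compile(r'^\S{2}')    # any two leading non-whitespace characters

# Sample paths, with and without the leading slash (hypothetical inputs).
samples = ['/', '/a', '/page2', 'page2']

for path in samples:
    print(repr(path), bool(OLD.search(path)), bool(NEW.search(path)))
```

Only the last sample separates the two: a path that arrives without its leading slash is rejected by the old pattern and accepted by the new one.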

Regex to match anything after /

I'm basically clueless about regex, but I need a regex that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site), which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
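The same example in Python, as a quick sketch using the standard re module:

```python
import re

text = 'aaaaaaaabbbbcccc'
m = re.match(r'a*(b*)', text)

whole = m.group(0)   # the complete match (group 0)
first = m.group(1)   # only the part inside the parentheses

print('group 0:', whole)
print('group 1:', first)
```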
To achieve what you want with the sweets pattern above, you just need to put a group around the end.
Possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last / you can take another approach:
possible other solution: /([^/]*)
The pattern above will find a / with a string of characters that are NOT another / and keep it in group 1. Issue here is that you could match things that do not have sweets in the URL.
Note: if you do not mind the / at the beginning, just remove the ( and ) and you do not have to worry about groups.
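Here is how the two approaches compare in Python, as a sketch. One assumption beyond the text: when the second pattern is used with a search, it lands on the first / in the string, so a $ anchor is added here to pin it to the last one.

```python
import re

sweet = 'http://localhost/sweettemptations/sweets/sweet-name'
other = 'http://localhost/sweettemptations/available-sweets'

# Group 1 holds whatever follows "/sweets/".
after_sweets = re.search(r'/sweets/(.*)', sweet).group(1)

# Anchored variant: group 1 holds the text after the last slash.
last_part = re.search(r'/([^/]*)$', sweet).group(1)

# Without the anchor the search stops at the first "/" of "//".
unanchored = re.search(r'/([^/]*)', sweet).group(1)

# The last-segment pattern also matches URLs that have no "sweets" part.
false_positive = re.search(r'/([^/]*)$', other).group(1)

print(after_sweets, last_part, repr(unanchored), false_positive)
```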
I like to use http://regexpal.com/ to test my regex. It will mark the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not match the "localhost" part literally. Also, I am assuming that (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
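A minimal Python sketch of that match-then-replace idea. The redirect function and the TARGET constant are names made up for this example, and the "contact" URL is a hypothetical non-sweets page; in a real WordPress setup the redirect would live in the server or plugin configuration instead.

```python
import re

SWEETS = re.compile(r'(http://)?[^/]+/sweettemptations/sweets/.*')
TARGET = 'http://localhost/sweettemptations/available-sweets'

def redirect(url):
    """Return TARGET for any sweets URL, otherwise the URL unchanged."""
    return TARGET if SWEETS.match(url) else url

print(redirect('http://localhost/sweettemptations/sweets/somethingmore.html'))
print(redirect('localhost/sweettemptations/sweets/sweet-name'))
print(redirect('http://localhost/sweettemptations/contact'))
```

Nothing from the matched URL is carried over, so somethingmore.html is dropped rather than appended to the target.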
Use this regular expression: (?<=://).+

This regex matches and shouldn't. Why is it?

This regex:
^((https?|ftp)\:(\/\/)|(file\:\/{2,3}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)?)+?)(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum))([a-zA-Z0-9\?\=\&\%\/]*)?$
Formatted for readability:
^( # Begin regex / begin address clause
(https?|ftp)\:(\/\/)|(file\:\/{2,3}))? # protocol
( # container for two address formats, more to come later
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) # match IP addresses
)|( # delimiter for address formats
((([a-zA-Z0-9]+)(\.)?)+?) # match domains and any number of subdomains
(\.) #dot for .com
([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum) #TLD clause
) # end address clause
([a-zA-Z0-9\?\=\&\%\/]*)? # querystring support, will pretty this up later
$
is matching:
www.google
and shouldn't be. This is one of my "fail" test cases. I have declared the TLD portion of the URL to be mandatory when matching on alpha instead of on IP, and "google" doesn't fit into the "[a-z]{2}" clause.
Keep in mind I will fix the following issues separately - this question is about why it matches www.google and shouldn't.
Querystring needs to support proper formats only, currently accepts any combination of querystring characters
Several protocols not supported, though the scope of my requirements may not include them
uncommon TLDs with 3 characters not included
Probably matches http://www.google..com - will check for consecutive dots
Doesn't support decimal IP address formats
What's wrong with my regex?
edit: See also a previous problem with an earlier version of this regex on a different test case:
How can I make this regex match correctly?
edit2: Fixed - The corrected regex (as asked) is:
^((https?|ftp)\:(\/\/)|(file\:\/{2,3}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)?)+?)(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum))([\/][\/a-zA-Z0-9\.]*)*?([\/]?[\?][a-zA-Z0-9\=\&\%\/]*)?$
"google" might not fit in [a-z]{2}, but it does fit in [a-z]{2}([a-zA-Z0-9\?\=\&\%\/]*)? - you forgot to require a / after the TLD if the URL extends beyond the domain. So it's interpreting it with "www.go" as the domain and then "ogle" following it, with no slash in between. You can fix it by adding a [?/] to the front of that last group to require one of those two symbols between the TLD and any further portion of the URL.
Your TLD clause matches "go" in google and the querystring support part matches "ogle" afterwards. Try changing the querystring part to this:
([?/][a-zA-Z0-9\?\=\&\%\/]*)?
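The effect is easier to see on a cut-down version of the regex. The sketch below is an assumption-laden reduction, not the full pattern from the question: it keeps only the domain branch, shortens the TLD list, and drops the protocol and IP parts. BROKEN uses the original trailing group, FIXED uses the one suggested above.

```python
import re

# Reduced domain clause: labels, a dot, then a TLD (shortened list).
DOMAIN = r'(?:[a-zA-Z0-9]+\.?)+?\.(?:[a-z]{2}|com|org|net)'

BROKEN = re.compile('^' + DOMAIN + r'([a-zA-Z0-9?=&%/]*)?$')
FIXED = re.compile('^' + DOMAIN + r'([?/][a-zA-Z0-9?=&%/]*)?$')

m = BROKEN.match('www.google')
print(m.group(1))  # the leftover that the trailing group swallowed

print(FIXED.match('www.google'))
print(bool(FIXED.match('www.google.com')))
print(bool(FIXED.match('www.google.com/search?q=1')))
```

With the separator required, "go" can no longer act as the TLD, because the remaining "ogle" has nowhere to go.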
"google" doesn't fit into the "[a-z]{2}" clause.
But "go" does and then "ogle" matches "([a-zA-Z0-9\?\=\&\%/]*)?"