Regular Expression for url paths - regex

I want a regular expression to match on paths containing '/food/' but not '/food/api/':
http://example.com/food/api/pasta?sauce=true
Right now I'm using this:
/^((?!\/food\/api\/).)*$/
The problem with this is it matches ANY path that doesn't contain '/food/api/'
Behavior I want to achieve:
REGEX MATCHES
example.com/food/
example.com/food/meals
REGEX IGNORES
example.com/food/api/pasta?sauce=true
example.com/food/api/pasta
example.com/food/api/
example.com/meal
example.com/

Using a pattern like this ((?!\/food\/api\/).)* (a tempered greedy token solution) will match the whole line if it does not contain the substring /food/api/
As the quantifier is a * it will also match an empty line.
Instead, you can match up to the first occurrence of a /, then use an alternation to match food or meal followed by a forward slash. After this slash, assert that it is not directly followed by api/
^[^/]+/(?:food|meal)/(?!api/).*$
If the string cannot contain spaces, you can exclude them using the negated character class [^/\s]+ and match \S* instead of .*
^[^/\s]+/(?:food|meal)/(?!api/)\S*$
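As a quick check, the whitespace-free pattern can be run against the sample paths from the question. This is a minimal Python sketch; the names FOOD_PATTERN and matches_food_path are hypothetical, and the (?:food|meal) alternation is kept exactly as written in the answer.

```python
import re

# Pattern copied from the answer; the constant name is illustrative only.
FOOD_PATTERN = re.compile(r"^[^/\s]+/(?:food|meal)/(?!api/)\S*$")

def matches_food_path(path: str) -> bool:
    """Return True when the first path segment is food (or meal) and is not followed by api/."""
    return FOOD_PATTERN.search(path) is not None

samples = [
    "example.com/food/",
    "example.com/food/meals",
    "example.com/food/api/pasta?sauce=true",
    "example.com/food/api/",
    "example.com/meal",
    "example.com/",
]
for sample in samples:
    print(sample, matches_food_path(sample))
```

Note that the pattern expects the host without a scheme, as in the question's examples, because [^/\s]+ stops at the first slash.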

Related

Regular expression to exactly match the last path segment of an URL without parameters, except if the path ends with a trailing slash

The goal of my regular expression adventure is to create a matcher for a mechanism that could add a trailing slash to URLs, even in the presence of parameters denoted by # or ? at the end of the URL.
For any of the following URLs, I'm looking for a match for segment as follows:
https://example.com/what-not/segment matches segment
https://example.com/what-not/segment?a=b matches segment
https://example.com/what-not/segment#a matches segment
In case there is a match for segment, I'm going to replace it with segment/.
For any of the following URLs, there should be no match:
https://example.com/what-not/segment/ no match
https://example.com/what-not/segment/?a=b no match
https://example.com/what-not/segment/#a no match
because here, there is already a trailing slash.
I've tried:
This primitive regex and its variants: .*\/([^?#\/]+). However, with this approach, I could not prevent a match when there is already a trailing slash.
I experimented with negative lookaheads as follows: ([^\/\#\?]+)(?!(.*[\#\?].*))$. In this case, I could not get rid of any ? or # parts properly.
Thank you for your kind help!
Lookahead and lookbehind conditionals are so powerful!
(?<=\/)[\w]+(?(?=[\?\#])|$)
P.S.: I used [\w]+, which is equivalent to [a-zA-Z0-9_]+.
Of course URLs can contain many other characters like - or ~ but for the examples provided it works nicely.
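Lookaround conditionals such as (?(?=...)|$) are accepted by engines like PCRE and .NET, but not by Python's built-in re module. As a sketch of the same idea in Python, the conditional can be folded into a plain lookahead, because "followed by ? or #, otherwise end of string" is equivalent to (?=[?#]|$). This is a swapped-in rewrite, not the pattern from the answer, and the names SEGMENT and last_segment are hypothetical.

```python
import re
from typing import Optional

# Rewrite of the conditional as a plain lookahead (an assumption for Python's re).
SEGMENT = re.compile(r"(?<=/)\w+(?=[?#]|$)")

def last_segment(url: str) -> Optional[str]:
    """Return the last path segment, or None when the URL already ends with a slash."""
    match = SEGMENT.search(url)
    return match.group(0) if match else None

print(last_segment("https://example.com/what-not/segment?a=b"))
print(last_segment("https://example.com/what-not/segment/?a=b"))
```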
If you want to match urls, you might use
\b(https?://\S+/)[^\s?#/]+(?![^\s?#])
Explanation
\b A word boundary to prevent a partial word match
( Capture group 1
https?://\S+/ Match the protocol, 1+ non whitespace chars and then the last occurrence of /
) Close group 1
[^\s?#/]+ Match 1+ chars other than a whitespace char ? # /
(?![^\s?#]) Negative lookahead, assert that directly to the right is not a non whitespace char other than ? or #
In the replacement use group 1 followed by segment/
For a match only instead of a capture group:
(?<=\bhttps?://\S+/)[^\s?#/]+(?![^\s?#])
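To illustrate the replacement step, here is a minimal Python sketch built on the capture-group pattern. The lookbehind variant is not used because Python's re module only accepts fixed-width lookbehinds. The function name add_trailing_slash is hypothetical, and the sketch re-emits the whole match with \g<0> and appends a slash, which is one possible way to write the replacement.

```python
import re

# Capture-group pattern from the answer; the constant name is illustrative.
URL_SEGMENT = re.compile(r"\b(https?://\S+/)[^\s?#/]+(?![^\s?#])")

def add_trailing_slash(url: str) -> str:
    """Append a slash after the last path segment unless one is already present."""
    # \g<0> is the entire match: the prefix in group 1 plus the last segment.
    return URL_SEGMENT.sub(r"\g<0>/", url)

print(add_trailing_slash("https://example.com/what-not/segment?a=b"))
print(add_trailing_slash("https://example.com/what-not/segment/#a"))
```

URLs that already end in a slash before the ? or # part produce no match, so they come back unchanged.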

How to match a word based on slash in regular expression

I am trying to match a word with regex. For example, I want to match only the first 2 folders in the string below:
/folder1/folder2/filder3/folder4/folder5
I wrote the regex below to match the first two folders, but it matches everything up to /folder5, whereas I want it to match only up to /folder2
/(\w.+){2}
I guess .+ matches everything. Any idea how to handle this?
You can use
^/[^/]+/[^/]+
^(?:/[^/]+){2}
Or, if you need to escape slashes:
^\/[^\/]+\/[^\/]+
^(?:\/[^\/]+){2}
[^/] is a negated character class that matches any char other than a / char.
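A short Python sketch of the grouped variant, using the sample string from the question. The helper name first_two_folders is hypothetical.

```python
import re
from typing import Optional

# Two repetitions of "a slash followed by one or more non-slash chars".
FIRST_TWO = re.compile(r"^(?:/[^/]+){2}")

def first_two_folders(path: str) -> Optional[str]:
    """Return the first two path segments, or None when there are fewer than two."""
    match = FIRST_TWO.match(path)
    return match.group(0) if match else None

print(first_two_folders("/folder1/folder2/filder3/folder4/folder5"))  # /folder1/folder2
```

Because [^/]+ cannot cross a slash, each repetition stops at the next folder boundary, which is what .+ in the original attempt failed to do.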

Regex to match string that does not contain slash

I am trying to set up a route using vue-router in a web app using regex to match the pattern. The pattern I am looking to match is any string that contains alphanumeric characters (and underscore) without slashes. Here are some examples (the first slash is just to show the string after the domain e.g. example.com/):
/codestack
/demo45
/i_am_long
Strings that should not match would be:
/data/files.xml
/share/home.html
/demo45/photos
The only regex I came up with so far is:
path: '/:Username([a-zA-Z0-9]+)'
That is not quite right because it matches all the characters except for the slash. Whereas I want to only match on the first set of alphanumeric characters (including underscore) before the first forward slash is encountered.
If a route contains a forward slash e.g. /data/files.xml then that should be a different regex route match. Therefore I also need a regex pattern to match the examples above containing slashes. Theoretically, they could contain any number of slashes e.g. /demo45/photos/holiday/2015/bahamas.
For the first part, you can match 1 or more word characters which will also match an underscore.
The anchors ^ and $ assert the start and end of the string.
^\w+$
For the second one, you can start the match with word characters followed by /
In case of more forward slashes you can optionally repeat the first pattern in a group.
The last part after the pattern can be 1 or more word characters with an optional part matching a dot and word characters.
^\w+/(?:\w+/)*\w+(?:\.\w+)?$
If you want to match any char except / you can use [^/]
^(?:[^/\s]+/)+[^/\s]+$
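The two \w-based patterns can be checked against the example paths with a small Python sketch. This only exercises the regular expressions themselves and says nothing about how vue-router compiles its path option; the function name classify and its return labels are hypothetical, and the leading slash is stripped because the question states it is only there for illustration.

```python
import re

SINGLE_SEGMENT = re.compile(r"^\w+$")
NESTED_PATH = re.compile(r"^\w+/(?:\w+/)*\w+(?:\.\w+)?$")

def classify(path: str) -> str:
    """Label a path as a single segment, a nested path, or neither."""
    # Drop one leading slash, which the question uses only as a visual marker.
    if path.startswith("/"):
        path = path[1:]
    if SINGLE_SEGMENT.match(path):
        return "single"
    if NESTED_PATH.match(path):
        return "nested"
    return "no match"

for sample in ["/codestack", "/i_am_long", "/data/files.xml", "/demo45/photos/holiday/2015/bahamas"]:
    print(sample, classify(sample))
```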

Removing url params using regex

I have a problem removing params that start with ":" from a URL path. I have an example path like:
/foo/:some_id/bar/:id
I would like to achieve the following result:
/foo/bar
I tried to create some regex. I figured out this:
\/:.*?\/
This one removes :some_id but still leaves the :id part. Can you tell me how I can modify my regex to remove all params?
Your regex requires a / to be present at the end. You cannot just remove the / from the regex since .*? won't match anything then. Use a negated character class:
\/:[^\/]+
Pattern details:
\/: - matches a literal /:
[^\/]+ - matches 1+ characters other than / as [^...] defines a negated character class matching all characters but those defined in the class.
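A minimal Python sketch comparing the original attempt with the negated character class, using the path from the question. The function name strip_params is hypothetical.

```python
import re

def strip_params(path: str) -> str:
    """Remove every /:param segment from the path."""
    # [^/]+ stops at the next slash or at the end of the string,
    # so a trailing parameter is removed as well.
    return re.sub(r"/:[^/]+", "", path)

path = "/foo/:some_id/bar/:id"
print(re.sub(r"/:.*?/", "", path))  # original attempt: /foobar/:id
print(strip_params(path))           # /foo/bar
```

The original attempt also consumes the slash after :some_id, which is why foo and bar end up glued together.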

How to get last word inside slashes using Regular Expressions

How to get last word inside slashes using Regular Expressions in a URL?
Example : http://aaa/bbb/ccc/ddd.aspx returns ccc.
Something like this
/([^/]+)/[^/]*$
Should match the last section of your URL and store ccc in a group.
Using lookbehind and lookahead, this should work
(?<=/)[^/]+(?=/[^/]*$)
(?<=/) the match must be preceded by /
[^/]+ this will match everything except a / - as many as possible.
(?=/[^/]*$) the match must be followed by a /, any number of non-slashes, and the end of string.
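Both variants can be tried in Python, where the fixed-width lookbehind (?<=/) is supported. This is a small sketch on the URL from the question; the function names are hypothetical.

```python
import re
from typing import Optional

def last_folder_group(url: str) -> Optional[str]:
    """Capture-group variant: the folder is returned from group 1."""
    match = re.search(r"/([^/]+)/[^/]*$", url)
    return match.group(1) if match else None

def last_folder_lookaround(url: str) -> Optional[str]:
    """Lookaround variant: the whole match is the folder itself."""
    match = re.search(r"(?<=/)[^/]+(?=/[^/]*$)", url)
    return match.group(0) if match else None

url = "http://aaa/bbb/ccc/ddd.aspx"
print(last_folder_group(url))       # ccc
print(last_folder_lookaround(url))  # ccc
```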