regular expression doesn't match, why? - regex

I want to match all paths that:
don't start with "/foo-bar/"
and don't end with any extension (.jpg, .gif, etc.)
examples:
/foo-bar/aaaa/fff will not match
/foo-bar/aaaa/fff.jpg will not match
/aaa/bbb will match
/aaaa/bbbb.jpg will not match
/bbb.a will not match
this is my regex:
^\/(?!foo-bar\/).*(?!\.).*$
but it is not working. Why?
thanks!

It is easier to match what you don't want. Example with PHP:
if (!preg_match('~^/foo-bar/|\.[^/]+$~', $url))
echo 'Valid!';
Your pattern doesn't work because of this part: .*(?!\.).*$. The first .* is greedy and consumes every character up to the end of the string. At that position the negative lookahead (?!\.) succeeds trivially, because no character (and so no dot) follows; the second .* then matches the empty string and $ matches. The lookahead therefore never rejects anything, and any path that passes the /foo-bar/ check is accepted, extension or not.
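The behaviour can be checked outside PHP as well. The following is a minimal sketch in Python (used here only for illustration; the helper name original_matches is made up for this example), showing that the question's pattern still accepts paths that end with an extension:

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'^\/(?!foo-bar\/).*(?!\.).*$')

def original_matches(path: str) -> bool:
    """Return True when the question's pattern matches the path."""
    return ORIGINAL.match(path) is not None

for path in ["/aaa/bbb", "/aaaa/bbbb.jpg", "/bbb.a", "/foo-bar/aaaa/fff"]:
    # Only the /foo-bar/ path is rejected; the extension check has no effect.
    print(path, original_matches(path))
```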
If you absolutely need an affirmative pattern, you can use this:
if (preg_match('~^/(?!foo-bar/)(?:[^/]*/)*+[^./]*$~', $url))
echo 'Valid!';
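For readers not using PHP, both ideas could be sketched in Python roughly as follows. The function names are hypothetical, and the affirmative pattern uses a plain * instead of the possessive *+ (possessive quantifiers need Python 3.11 or later); that changes only how the engine backtracks, not which paths match.

```python
import re

# Matches what we do NOT want: a /foo-bar/ prefix, or a dot in the last segment.
UNWANTED = re.compile(r'^/foo-bar/|\.[^/]+$')

# Affirmative variant; plain * replaces the possessive *+ of the PHP pattern.
WANTED = re.compile(r'^/(?!foo-bar/)(?:[^/]*/)*[^./]*$')

def is_valid_negated(path: str) -> bool:
    """Valid when the 'unwanted' pattern is not found anywhere in the path."""
    return UNWANTED.search(path) is None

def is_valid_affirmative(path: str) -> bool:
    """Valid when the affirmative pattern matches the whole path."""
    return WANTED.match(path) is not None

for path in ["/foo-bar/aaaa/fff", "/foo-bar/aaaa/fff.jpg",
             "/aaa/bbb", "/aaaa/bbbb.jpg", "/bbb.a"]:
    print(path, is_valid_negated(path), is_valid_affirmative(path))
```

With the five example paths from the question, only /aaa/bbb is reported as valid by either function.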

You can try this one, which is a bit simpler and close to what you have tried:
^(?!\/foo-bar)([^\.]+)$

Related

How can I get the second part of a hyphenated word using regex?

For example, I have the word: sh0rt-t3rm.
How can I get the t3rm part using perl regex?
I could get sh0rt by using [(a-zA-Z0-9)+]\[-\], but \[-\][(a-zA-Z0-9)+] doesn't work to get t3rm.
The syntax used for the regex is not correct to get either sh0rt or t3rm.
You flipped the square brackets and the parentheses, and the hyphen does not have to be between square brackets.
To get sh0rt in sh0rt-t3rm you might use, for example, one of:
\b([a-zA-Z0-9]+)-
\b is a word boundary to prevent a partial word match; the value is in capture group 1.
\b[a-zA-Z0-9]+(?=-)
Match the allowed chars in the character class, and assert a - to the right using a positive lookahead (?=-).
To get t3rm in sh0rt-t3rm you might use for example one of:
-([a-zA-Z0-9]+)\b
The other way around: match a leading - and get the value from capture group 1.
-\K[a-zA-Z0-9]+\b
Match - and use \K to keep out what is matched so far. Then match 1 or more times the allowed chars in the character class.
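As a rough illustration, here is how these patterns behave in Python's re module. This is only a sketch: the question is about Perl, and Python's re does not support \K, so a lookbehind for the hyphen is substituted for that variant.

```python
import re

WORD = "sh0rt-t3rm"

# First part: a capture group before the hyphen, or a lookahead for the hyphen.
first_by_group = re.search(r'\b([a-zA-Z0-9]+)-', WORD).group(1)
first_by_lookahead = re.search(r'\b[a-zA-Z0-9]+(?=-)', WORD).group(0)

# Second part: a capture group after the hyphen.
second_by_group = re.search(r'-([a-zA-Z0-9]+)\b', WORD).group(1)
# No \K in Python's re, so a lookbehind (?<=-) stands in for it here.
second_by_lookbehind = re.search(r'(?<=-)[a-zA-Z0-9]+\b', WORD).group(0)

print(first_by_group, first_by_lookahead)       # sh0rt sh0rt
print(second_by_group, second_by_lookbehind)    # t3rm t3rm
```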
If your whole target string is literally just sh0rt-t3rm then you want all that comes after the -.
So the barest and minimal version, cut precisely for this description, is
my ($capture) = $string =~ /-(.+)/;
We need parentheses on the left-hand side to put the match in list context, because that is when it returns the captures (otherwise it returns true/false, normally 1 or '').
But what if the preceding text may have - itself? Then make sure to match all up to that last -
my ($capture) = $string =~ /.*-(.+)/;
Here the "greedy" nature of the * quantifier makes the previous . match all it possibly can so that the whole pattern still matches; thus it goes up until the very last -.
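The difference between the two patterns only shows up when the string has more than one hyphen. A small sketch in Python (the regex syntax used is the same as in the Perl lines; the function names and the sample string are made up for the example):

```python
import re

def after_first_hyphen(s: str):
    """Everything after the first hyphen, or None when there is no hyphen."""
    m = re.search(r'-(.+)', s)
    return m.group(1) if m else None

def after_last_hyphen(s: str):
    """Everything after the last hyphen: the greedy .* swallows earlier ones."""
    m = re.search(r'.*-(.+)', s)
    return m.group(1) if m else None

print(after_first_hyphen("well-known-fact"))  # known-fact
print(after_last_hyphen("well-known-fact"))   # fact
```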
There are of course many other variations on what the data may look like, other than just being one hyphenated word. In particular, if it's part of a text, you may want to include word boundaries
my ($capture) = $string =~ /\b.*?-(.+?)\b/;
Here we also need to adjust our "wild-card"-like pattern .+ by limiting it using ? so that it is not greedy. This matches the first such hyphenated word in the $string. But if indeed only "word" characters fly then we can just use \w (instead of . and word-boundary anchors)
my ($capture) = $string =~ /\w*?-(\w+)/;
Note that \w matches [a-zA-Z0-9_] only, which excludes some characters that may appear in normal text (English, not to mention all other writing systems).
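What \w covers is worth checking in whatever engine is actually used. As one point of comparison (not a statement about Perl): in Python 3, \w in a str pattern is Unicode-aware by default and narrows to [a-zA-Z0-9_] only with the re.ASCII flag. A sketch, with a hypothetical helper name:

```python
import re

PATTERN = r'\w*?-(\w+)'

def second_part(text: str, ascii_only: bool = False):
    """Return the word characters after the first hyphenated join, or None."""
    flags = re.ASCII if ascii_only else 0
    m = re.search(PATTERN, text, flags)
    return m.group(1) if m else None
```

For "a sh0rt-t3rm plan" this returns t3rm either way. For a word such as "über-größe" the default returns größe, while ascii_only=True stops at the first non-ASCII letter and returns gr.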
But this is clearly getting pickier and cookier and would need careful close inspection and testing, and more complete knowledge of what the data may look like.
Perl offers its own tutorial, perlretut, and the main full reference is perlre.
-([a-zA-Z0-9]+) will match a - followed by a word, with just the word being captured.

Remove last occurrence of pattern and everything after that

I want to remove last occurrence of a pattern "\[uU]" and everything after it from a string.
Example:
input: ab00cd\u00FF\U00FF0000
output: ab00cd\u00FF
I am doing this currently with something like lastIndexOf and substring and I wonder if there is a Regex way to do it. I figure it might involve lookarounds?
Match a \u or \U that isn't followed by another \u or \U.
You haven't said what language you're using, so the generic solution is:
Search: \\[uU](?!.*\\[uU]).*
Replace: <blank>
The negative look-ahead (?!.*\\[uU]) asserts that neither \u nor \U appears anywhere after the matched one. The backslash is doubled in the pattern because a single \[ in a regex would mean a literal [ character.
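A minimal sketch of this search-and-replace in Python, with the backslash written as \\ in the pattern so that it matches a literal backslash. The function name is made up for the example, and the input is assumed to be a single line, since . does not match newlines by default.

```python
import re

# A literal backslash + u/U that has no later backslash + u/U, plus the rest.
LAST_ESCAPE_ONWARD = re.compile(r'\\[uU](?!.*\\[uU]).*')

def strip_from_last_escape(s: str) -> str:
    """Remove the last \\u or \\U marker and everything after it."""
    return LAST_ESCAPE_ONWARD.sub('', s)

# Raw string so the backslashes are kept as literal characters.
print(strip_from_last_escape(r'ab00cd\u00FF\U00FF0000'))  # ab00cd\u00FF
```

A string with no \u or \U marker is returned unchanged.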

Regex for selecting words ending in 'ing' unless

I want to select words ending in ing with a regular expression, but I want to exclude words that end in thing. For example:
everything
running
catching
nothing
Of these words, running and catching should be selected, everything and nothing should be excluded.
I've tried the following:
.+ing$
But that selects everything. I'm thinking look aheads/look arounds could be the solution, but I haven't been able to get one that works.
Solutions that work in Python or R would be helpful.
In Python you can use a negative lookbehind assertion like this:
^.*(?<!th)ing$
(?<!th) is negative lookbehind expression that will fail the match if th comes before ing at the end of string.
Note that if you are matching words that are not on separate lines then instead of anchors use word boundaries as:
\w+(?<!th)ing\b
Something like \b\w+(?<!th)ing\b maybe.
You might also use a negative lookahead (?!...) to assert that what is on the right is not zero or more word characters followed by thing and a word boundary:
\b(?!\w*thing\b)\w*ing\b
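Since the question asks for Python, here is a short sketch running the lookbehind and lookahead variants over the four sample words. The variable names are chosen for the example only.

```python
import re

TEXT = "everything running catching nothing"

# Lookbehind: "ing" must not be immediately preceded by "th".
by_lookbehind = re.findall(r'\b\w+(?<!th)ing\b', TEXT)

# Lookahead: from the start of the word, the word must not end in "thing".
by_lookahead = re.findall(r'\b(?!\w*thing\b)\w*ing\b', TEXT)

print(by_lookbehind)  # ['running', 'catching']
print(by_lookahead)   # ['running', 'catching']
```

One difference to keep in mind: the lookahead version uses \w*, so it would also accept the bare word ing, whereas \w+ in the lookbehind version requires at least one character before ing.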

Full match only if the capturing group encountered once

The pattern:
(test):(thestring)
What I want is full match only if there is just one test: before
test:thestring
But in this case there wouldn't be full match:
test:test:thestring
I've tried quantifiers, but it didn't work.
Need help
Try this pattern: ^(?!.*((?(?<=^)|(?<=:))test(?=(:|$))).*(?1)).+$.
The main part is ((?(?<=^)|(?<=:))test(?=(:|$))), which matches test if it is preceded by a colon : or is at the beginning of a line, and is followed by a colon : or the end of the line.
(?(?<=^)|(?<=:)) is a workaround for (?<=(:|^)), which cannot be used because lookbehinds must have a fixed length.
Then we have a subroutine call to the first capturing group, (?1), which re-runs that group's pattern (rather than its matched text, as a backreference would) to see if there is any other test.
This whole pattern is placed in a negative lookahead (?!...), so that everything matches unless the pattern explained above matches (test found more than once).
For this very specific case:
(?<!.)(test:thestring)
All it does is search for the string test:thestring and ensures that there are no characters before it. (Use Michał Turczyn's regex for an all purpose search!)
^((?!test:).)*(test:thestring)
If you want a full match and test: should occur only once before thestring, you might assert the start of the string with ^, then repeat (?:(?!test:).) to match any character as long as what is on the right side is not test:
Then match test:thestring followed by (?:(?!test:thestring).)*, which matches any character as long as what is on the right side is not test:thestring, and assert the end of the string with $
^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$
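This last pattern uses only features that Python's re module also supports, so it can be tried there directly. A small sketch with the two inputs from the question (the function name is hypothetical):

```python
import re

ONLY_ONE_PREFIX = re.compile(
    r'^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$'
)

def has_single_test_prefix(s: str) -> bool:
    """True when the whole string matches with exactly one test: before thestring."""
    return ONLY_ONE_PREFIX.match(s) is not None

print(has_single_test_prefix("test:thestring"))       # True
print(has_single_test_prefix("test:test:thestring"))  # False
```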

regex positive lookahead with if/else condition

I am trying to write a regular expression that would check if a pattern exists and, if it does, matches everything following it, and if (and only if) it does not, matches everything after another pattern.
example lines:
http://example.com/contact
www.example.com/contact
http://www.example.com/contact
expected output in all 3 cases: example
Here is the regular expression I expected would do the job:
(?(?<=www\.).+|(?<=http:\/\/).+)(?=\.com)
which I assumed would:
check if "www." is to be found
if yes, would match everything following it
if not, match everything following "http://"
restrict match to everything before the occurrence of ".com"
For the first two lines, the expression worked well, but in the third line www.example is matched instead of just example. Does this mean that for some reason the else command is executed although the if condition is met?
How can I change the above expression so that it only does the http:// lookbehind if the www. part was not found?
Converting my comment to answer.
You may use this regex:
^(?:https?://(?:www\.)?|www\.)\K\S+?(?=\.com(?:/|$))
RegEx Description:
^: Start
(?:https?://(?:www\.)?|www\.): Match http:// or https://, optionally followed by www., or else a bare www.
\K: Reset the start of the reported match, discarding what was matched so far
\S+?: Match 1+ non-space characters (lazy)
(?=\.com(?:/|$)): Lookahead asserting that .com follows, itself followed by / or the end of the line
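\K is not available in every engine; Python's re module, for instance, does not support it. In that case a capture group around the wanted part can be substituted. A sketch under that assumption, using the three example lines from the question (the function name is made up for the example):

```python
import re

# Same structure as the pattern above, with a capture group instead of \K.
HOST_NAME = re.compile(r'^(?:https?://(?:www\.)?|www\.)(\S+?)(?=\.com(?:/|$))')

def name_before_dot_com(url: str):
    """Return the part between the optional scheme/www. prefix and .com, or None."""
    m = HOST_NAME.match(url)
    return m.group(1) if m else None

for url in ["http://example.com/contact",
            "www.example.com/contact",
            "http://www.example.com/contact"]:
    print(name_before_dot_com(url))  # example (for each of the three lines)
```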