Match string between two characters - regex

I'm looking for a regex that extracts from a string the word between the last "/" and the "&" that follows it.
String:
://name.prod.something-blabla.com/erp/apps/appname/appname.text#/com/text/prod/appname/uil/partner/PartnerBearbeiten&unternehmenId1=Z0004dw
Desired output: PartnerBearbeiten
I tried: ([^\/]+\&) but it includes the & (PartnerBearbeiten&)
Image: Regex code in a xml

You can use a positive lookahead:
([^\/]+?)(?=&)
See demo.
Note that I made the quantifier on the character class lazy (using +?) so that it also works with URLs that contain multiple parameters.
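To see why the lazy quantifier matters, here is a minimal Python sketch. The URL is a hypothetical one with two parameters (not the one from the question), and Python's re module is assumed to behave like the asker's engine for these constructs.

```python
import re

# Hypothetical URL with two "&"-separated parameters after the target word.
url = "https://example.com/app/partner/PartnerBearbeiten&unternehmenId1=Z0004dw&mode=edit"

# Lazy: stops at the first "&" after the last "/".
lazy = re.search(r"([^/]+?)(?=&)", url)

# Greedy: runs to the end, then backtracks only as far as the last "&".
greedy = re.search(r"([^/]+)(?=&)", url)

print(lazy.group(1))    # PartnerBearbeiten
print(greedy.group(1))  # PartnerBearbeiten&unternehmenId1=Z0004dw
```

With a single "&" in the string, both variants return the same word.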

You could use regex lookahead to exclude any matches from the result. So, to exclude & in that particular pattern, the regex would be ([^/]+(?=\&)) (check on this RegExr website)

Try this:
([^\/])*(?=&)
Tested with Javascript.

Using a lookahead as shown in other answers is certainly possible here but I’d argue that the canonical solution to the problem of matching a substring between two delimiting characters is to exclude the last character from the match character class:
/([^/&]+)&
This has the advantage of making the overall expression simpler, and working efficiently without using non-greedy matching: using a lookahead assertion without non-greedy matching would force the regular expression to backtrack, which can be inefficient in this case (using non-greedy +? instead of + would also solve this, though).
Lookahead assertions are best reserved for cases that cannot be expressed differently. In this particular case, they are simply redundant.
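A minimal Python sketch of the character-class approach, run against the string from the question (the variable names are just for illustration):

```python
import re

url = ("://name.prod.something-blabla.com/erp/apps/appname/appname.text"
       "#/com/text/prod/appname/uil/partner/PartnerBearbeiten&unternehmenId1=Z0004dw")

# "/" then one or more characters that are neither "/" nor "&", then "&".
# The delimiters are consumed by the match but kept out of group 1.
match = re.search(r"/([^/&]+)&", url)
word = match.group(1) if match else None

print(word)  # PartnerBearbeiten
```

The wanted word is read from the capture group rather than from the whole match, which is the trade-off compared with the lookahead versions.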

Related

Regex formation and Issue in Negation

I need to create two regex
One, for catching these type of strings:
/xyz-courses/test/test
/abc-courses/test-abc/test-xyz
/abc-courses/test-abc/test-xyz?itsok=yes
But I don't want to match strings where the word fixed comes right before -courses:
/fixed-courses/test/test
/fixed-courses/test-abc/test-xyz
/fixed-courses/test-abc/test-xyz?itsok=yes
I have created the following regex, which works fine for the first set, but I'm not sure how to exclude the prepended word fixed:
/([^/]+)-courses/([^/]+)/([^/]+)$
Second, I need to create REGEX to negate all regex created in previous step.
I tried:
[^/([^/]+)-courses/([^/]+)/([^/]+)]$
But this is showing invalid on all REGEX checkers.
You may use this regex to disallow fixed- before courses:
^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$
RegEx Demo
(?!fixed-) is a negative lookahead that will fail the match if fixed- appears right after / and before courses/.
For second part use this to negate first regex:
^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+
RegEx Demo 2
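Both patterns can be checked quickly against the sample paths from the question. This is a sketch in Python, assuming its re module treats these lookaheads the same way as the asker's engine:

```python
import re

# First pattern: any "<word>-courses" prefix except "fixed-courses".
allowed = re.compile(r"^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$")

# Second pattern: the first one wrapped in a negative lookahead.
negated = re.compile(r"^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+")

wanted = [
    "/xyz-courses/test/test",
    "/abc-courses/test-abc/test-xyz",
    "/abc-courses/test-abc/test-xyz?itsok=yes",
]
unwanted = [
    "/fixed-courses/test/test",
    "/fixed-courses/test-abc/test-xyz",
    "/fixed-courses/test-abc/test-xyz?itsok=yes",
]

for path in wanted + unwanted:
    print(path, bool(allowed.search(path)), bool(negated.search(path)))
```

Each path should be accepted by exactly one of the two patterns.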

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want REGEX.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex 101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jibe with the documentation.
Edit 2
take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound. whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the look-behind means the PB has to come before the first character.
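The difference is easy to reproduce. The question is about .NET, but the sketch below uses Python's re module, on the assumption that it behaves the same way for these two patterns:

```python
import re

samples = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]

# Lookbehind at "^": nothing precedes position 0, so "not preceded by PB"
# is always true and the assertion filters nothing.
with_lookbehind = re.compile(r"^(?<!PB)\S+(?=MTCH)")

# Lookahead at "^": inspects the characters that follow position 0.
with_lookahead = re.compile(r"^(?!PB)\S+(?=MTCH)")

behind_hits = [s for s in samples if with_lookbehind.search(s)]
ahead_hits = [s for s in samples if with_lookahead.search(s)]

print(behind_hits)  # the first three strings, including the PB one
print(ahead_hits)   # only the GD... and KD-...-MTCH strings
```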
The problem was caused by the greediness of \S+. When dealing with lookarounds and greedy quantifiers you can easily match more characters than you expect. One way to deal with this is to put a negative lookaround in a group together with the greedy quantifier so that the unwanted text is excluded from the match, as described in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this it is going to be much clearer to do it logically within the code: first check that the string matches MTCH, and then that it doesn't match ^PB.
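A sketch of that two-step check in Python (the function name is made up for illustration; in .NET the same idea would be two Regex.IsMatch calls):

```python
import re

def is_wanted(code: str) -> bool:
    """True if the string contains MTCH and does not start with PB."""
    has_mtch = re.search(r"MTCH", code) is not None
    starts_with_pb = re.search(r"^PB", code) is not None
    return has_mtch and not starts_with_pb

samples = [
    "PB-GD2185-11652-MTCH",
    "GD2185-11652-MTCH",
    "KD-GD2185-11652-MTCH",
    "KD-GD2185-11652",
]
results = [is_wanted(s) for s in samples]
print(results)  # [False, True, True, False]
```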

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks
You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
Working Demo
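For a quick check, here is the pattern applied to the six titles from the question, as a Python sketch (assuming the re module matches the asker's flavor for this pattern):

```python
import re

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

# Reject anything containing "just-test", "just-tests" or "nice",
# then require "themeA" somewhere in the title.
pattern = re.compile(r"^(?!.*?(?:just-tests?|nice)).*?themeA.*$")

kept = [t for t in titles if pattern.search(t)]
print(kept)
# ['fruits-and-flowers-themeA', 'themeAfruitsandflowers', 'botanical-names-themeA']
```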
Option 1
You can use a single regex with lookaheads (see online demo):
^(?!.*nice)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string).
The (?!.*nice) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by nice. (Writing nice? here would make the e optional, so the lookahead would also reject any title containing nic, such as botanical-names-themeA.)
The (?!.*just-tests?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by just-test and an optional s
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
This one doesn't use lookaheads but just skips the unwanted titles. See demo.
Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
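A sketch of the two-regex approach in Python (the helper name is hypothetical):

```python
import re

must_have = re.compile(r"themeA")
must_not_have = re.compile(r"nice|just-tests?")

def is_theme_a_title(title: str) -> bool:
    # Keep the title only if it passes both independent checks.
    return bool(must_have.search(title)) and not must_not_have.search(title)

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]
selected = [t for t in titles if is_theme_a_title(t)]
print(selected)
# ['fruits-and-flowers-themeA', 'themeAfruitsandflowers', 'botanical-names-themeA']
```

Adding another forbidden word later only means extending the second pattern.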

Regular Expression: match only non-repeated occurrence of a character

I need to find and replace all occurrences of apostrophe character in a string, but only if this apostrophe is not followed by another apostrophe.
That is
abc'def
is a match but
abc''def
is NOT a match.
I've already composed a working pattern - (^|[^'])'($|[^']) but I believe it may be shorter and simpler.
Thanks,
Valery
It depends on your environment. If it supports lookahead and lookbehind, you can do this: (?<!')'(?!')
Ref: http://www.regular-expressions.info/lookaround.html
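One practical difference between the two patterns is that the lookaround version matches only the apostrophe itself, while the original also consumes the neighbouring characters. A small Python sketch, using "#" as an arbitrary replacement chosen for illustration:

```python
import re

lone = re.compile(r"(?<!')'(?!')")           # lookaround version
original = re.compile(r"(^|[^'])'($|[^'])")  # pattern from the question

print(lone.sub("#", "abc'def"))   # abc#def
print(lone.sub("#", "abc''def"))  # abc''def (doubled apostrophe untouched)

# The original pattern's match includes the characters on both sides.
print(original.search("abc'def").group(0))  # c'd

# Because the neighbours are consumed, two apostrophes separated by a
# single character cannot both be matched by the original pattern.
print(lone.sub("#", "a'b'c"))                  # a#b#c
print(len(list(original.finditer("a'b'c"))))   # 1
```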
I think your pattern is short and precise. You could be using negative lookahead/lookbehind, but they would make it a lot more complex. Maintainability is important.
You'll have to be careful for an uneven number of apostrophes:
abc'''def
where you probably do want to replace the 3rd one and leave the 1st and 2nd in there.
You can do that like this (assuming you already matched string literals and only want to replace the uneven numbered trailing apostrophe):
Search for the pattern:
(('')*)'
and replace it with
$1
which is group 1: the even numbered apostrophes (or no apostrophes at all).
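Here is that search-and-replace as a Python sketch; \1 is Python's spelling of $1, and the function name is made up for illustration:

```python
import re

# Group 1 captures as many doubled apostrophes as possible;
# the final apostrophe is the unpaired one that gets dropped.
odd_trailing = re.compile(r"(('')*)'")

def drop_unpaired(text: str) -> str:
    return odd_trailing.sub(r"\1", text)

print(drop_unpaired("abc'''def"))  # abc''def
print(drop_unpaired("abc'def"))    # abcdef
```

This only holds under the stated assumption that the run has an odd length. On an even run such as abc''def the pattern backtracks and removes both apostrophes, giving abcdef.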
I'm not sure what actual problem you're solving, but in case you're parsing/reading a CSV file, or a string that has the likes of CSV input, I highly recommend using a decent CSV parser. Almost all languages have them in some form or another.
See here for negative lookahead: q(?!u)
(?=pattern) is a positive look-ahead assertion
(?!pattern) is a negative look-ahead assertion
(?<=pattern) is a positive look-behind assertion
(?<!pattern) is a negative look-behind assertion
http://www.regular-expressions.info/lookaround.html
working DEMO

How to negate the whole regex?

I have a regex, for example (ma|(t){1}). It matches ma and t and doesn't match bla.
I want to negate the regex, thus it must match bla and not ma and t, by adding something to this regex. I know I can write bla, the actual regex is however more complex.
Use negative lookaround: (?!pattern)
Positive lookarounds can be used to assert that a pattern matches. Negative lookarounds are the opposite: they are used to assert that a pattern DOES NOT match. Some flavors support these assertions; some put limitations on lookbehind, etc.
Links to regular-expressions.info
Lookahead and Lookbehind Zero-Width Assertions
Flavor comparison
See also
How do I convert CamelCase into human-readable names in Java?
Regex for all strings not containing a string?
A regex to match a substring that isn’t followed by a certain other substring.
More examples
These are attempts to come up with regex solutions to toy problems as exercises; they should be educational if you're trying to learn the various ways you can use lookarounds (nesting them, using them to capture, etc):
codingBat plusOut using regex
codingBat repeatEnd using regex
codingbat wordEnds using regex
Assuming you only want to disallow strings that match the regex completely (i.e., mmbla is okay, but mm isn't), this is what you want:
^(?!(?:m{2}|t)$).*$
(?!(?:m{2}|t)$) is a negative lookahead; it says "starting from the current position, the next few characters are not mm or t, followed by the end of the string." The start anchor (^) at the beginning ensures that the lookahead is applied at the beginning of the string. If that succeeds, the .* goes ahead and consumes the string.
FYI, if you're using Java's matches() method, you don't really need the ^ and the final $, but they don't do any harm. The $ inside the lookahead is required, though.
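A sketch of the same idea in Python, with re.fullmatch playing roughly the role that matches() plays in Java. The second function is an alternative that negates in code instead of inside the pattern; both function names are made up for illustration.

```python
import re

# Matches any string that is not exactly "mm" or "t".
negated = re.compile(r"^(?!(?:m{2}|t)$).*$")

def rejected_by_lookahead(s: str) -> bool:
    return negated.search(s) is None

def rejected_in_code(s: str) -> bool:
    # Same decision without a lookahead: match the original regex
    # against the whole string and invert the result in code.
    return re.fullmatch(r"m{2}|t", s) is not None

for s in ["mm", "t", "mmbla", "bla"]:
    print(s, rejected_by_lookahead(s), rejected_in_code(s))
# mm True True
# t True True
# mmbla False False
# bla False False
```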
\b(?=\w)(?!(ma|(t){1}))\b(\w*)
This is for the given regex.
The \b finds a word boundary.
The positive lookahead (?=\w) is there to avoid spaces.
The negative lookahead over the original regex prevents matches of it.
Finally, the (\w*) catches all the words that are left.
The group that will hold the words is group 3.
The simple (?!pattern) will not work, as any substring will match.
The simple ^(?!(?:m{2}|t)$).*$ will not work, as its granularity is full lines.
This regexp matches your condition:
^.*(?<!ma|t)$
Look at how it works:
https://regex101.com/r/Ryg2FX/1
Apply this if you use Laravel.
Laravel has a not_regex rule: the field under validation must not match the given regular expression. It uses the PHP preg_match function internally.
'email' => 'not_regex:/^.+$/i'