I have the following regular expression:
[^0-9+-]|(?<=.)[+-]
This regex matches either a character that is not a digit, + or -, or a + or - that is preceded by some character. However, positive lookbehind isn't supported in JavaScript regex. How can I make it work?
The (?<=.) lookbehind just makes sure the subsequent pattern is not located at the start of the string. In JS, it is easy to do with (?!^) lookahead:
[^0-9+-]|(?!^)[+-]
          ^^^^^
See the regex demo (cf. the original regex demo).
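For strings without line breaks, the two patterns should behave the same. Here is a minimal sketch in Python, used only because its re module accepts both forms, so they can be compared side by side. The sample input and the helper name strip_non_numeric are made up for illustration.

```python
import re

# The original pattern with a lookbehind, and the lookahead-based rewrite.
LOOKBEHIND = re.compile(r"[^0-9+-]|(?<=.)[+-]")
LOOKAHEAD = re.compile(r"[^0-9+-]|(?!^)[+-]")


def strip_non_numeric(text, pattern=LOOKAHEAD):
    """Remove everything except digits and a sign at the very start."""
    return pattern.sub("", text)


sample = "+12-3a4+"  # hypothetical input
print(strip_non_numeric(sample))              # +1234
print(strip_non_numeric(sample, LOOKBEHIND))  # +1234
```

The leading + survives because (?!^) fails at position 0, so the second alternative cannot match there.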
I have the following string
abc123+InterestingValue+def456
I want to get only InterestingValue. I am using this regex:
\+.*\+
but the output still includes the + characters.
Is there a way to search for a string between the + characters, then search again for anything that is not a + character?
Use lookarounds.
(?<=\+)[^+]*(?=\+)
DEMO
You can use a positive lookahead and a positive lookbehind. Basically, a positive lookbehind tells the engine "this text has to come immediately before the current position", and a positive lookahead tells the engine "this text has to come immediately after the current position". Neither of them consumes the text it checks, so that text is not part of the match.
A positive lookbehind is a group beginning with ?<= and a positive lookahead is a group beginning with ?=. Adding these to your existing expression would look like this:
(?<=\+).*(?=\+)
regex101
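To see how the two lookaround variants differ, here is a small Python sketch (Python is assumed only for illustration; the second input string is hypothetical). On the asker's string both return the same value, but with more than two + signs the greedy .* runs up to the last +.

```python
import re

text = "abc123+InterestingValue+def456"

# Both lookaround variants return the value without the + signs.
print(re.search(r"(?<=\+)[^+]*(?=\+)", text).group())  # InterestingValue
print(re.search(r"(?<=\+).*(?=\+)", text).group())     # InterestingValue

# Hypothetical input with three + signs: .* is greedy, [^+]* is not affected.
multi = "a+b+c+d"
print(re.search(r"(?<=\+)[^+]*(?=\+)", multi).group())  # b
print(re.search(r"(?<=\+).*(?=\+)", multi).group())     # b+c
```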
If it should be the first match, you can use a capture group with an anchor:
^[^+]*\+([^+]+)\+
^ Start of string
[^+]* Optionally match any char except + using a negated character class
\+ Match literally
([^+]+) Capture group 1, match 1+ chars other than +
\+ Match literally
Regex demo
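A minimal Python sketch of the capture-group approach (the helper name first_value is made up for illustration). The value is read from group 1 rather than from the whole match:

```python
import re

FIRST_BETWEEN_PLUS = re.compile(r"^[^+]*\+([^+]+)\+")


def first_value(text):
    """Return the text between the first two + signs, or None if there is none."""
    m = FIRST_BETWEEN_PLUS.search(text)
    return m.group(1) if m else None


print(first_value("abc123+InterestingValue+def456"))  # InterestingValue
print(first_value("no plus signs here"))              # None
```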
I am having a hard time coming up with a regex to match a specific case:
This can be matched:
any-dashed-strings
this-can-be-matched-even-though-its-big
This cannot be matched:
strings starting with elem- or asdf- or a single -
elem-this-cannot-be-matched
asdf-this-cannot-be-matched
-
So far what I came up with is:
/\b(?!elem-|asdf-)([\w\-]+)\b/
But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is present inside the matching character class, and how to avoid matching anything else once one of the excluded prefixes is found.
I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).
If possible, please elaborate on the solution. Thanks a lot!
If a lookbehind is supported, you can assert a whitespace boundary to the left, and make the alternation for both words without the hyphen optional.
(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b
Explanation
(?<!\S) Assert a whitespace boundary to the left
(?! Negative lookahead, assert that what is directly to the right is not
(?:elem|asdf)?- Optionally match elem or asdf followed by -
) Close the lookahead
[\w-]+ Match 1+ word chars or -
\b A word boundary
See a regex demo.
Or a version with a capture group and without a lookbehind:
(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
See another regex demo.
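Both patterns can be tried out with a short Python sketch. Python's re module is used here only for illustration (the question is about Oniguruma), and the test sentence is simply the asker's examples joined with spaces.

```python
import re

WITH_LOOKBEHIND = re.compile(r"(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b")
WITH_GROUP = re.compile(r"(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b")

text = (
    "any-dashed-strings elem-this-cannot-be-matched "
    "asdf-this-cannot-be-matched - this-can-be-matched-even-though-its-big"
)

# findall returns whole matches for the first pattern and group 1 for the second.
print(WITH_LOOKBEHIND.findall(text))
print(WITH_GROUP.findall(text))
# Both print: ['any-dashed-strings', 'this-can-be-matched-even-though-its-big']
```

The whitespace boundary is what stops the engine from restarting in the middle of elem-this-cannot-be-matched: every position inside that word is preceded by a non-whitespace character.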
I am trying to extract the days using regex groups in C# from the following string,
"RRULE:FREQ=MONTHLY;UNTIL=20211126T143000Z;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=-1"
I am new to regular expressions and have looked at various websites to try to write an expression. The expression I have so far is the following:
(?:BYDAY=)([A-Z,]*);
Which matches
MO,TU,WE,TH,FR;
as a whole, which I can then split on ',' to achieve what I want. I wanted to know if there is a way of doing this purely in regex.
If a quantifier in the lookbehind is supported, you might use:
(?<=BYDAY=[A-Z,]*)[A-Z]+
Explanation
(?<= Positive lookbehind, assert what is on the left is
BYDAY=[A-Z,]* Match BYDAY= followed by zero or more chars A-Z or ,
) Close lookbehind
[A-Z]+ Match 1+ chars A-Z
.Net regex demo | C# demo by WiktorStribiżew
Alternatively you can make use of the \G anchor to get iterative matches and capture the value in group 1
(?:\G(?!^)|BYDAY=)([A-Z]+),?
Regex demo
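For comparison only: Python's built-in re module supports neither a quantifier inside a lookbehind nor the \G anchor, so neither pattern above runs there as written. A rough analogue, which is a swapped-in two-step technique and not the approach of this answer, is to capture the list once and then pull the codes out of it. The helper name by_days is made up for illustration.

```python
import re

rrule = "RRULE:FREQ=MONTHLY;UNTIL=20211126T143000Z;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=-1"


def by_days(rule):
    """Return the BYDAY codes as a list, or an empty list if BYDAY is absent."""
    m = re.search(r"BYDAY=([A-Z,]*)", rule)
    if m is None:
        return []
    # Second step: pick every run of capital letters out of the captured list.
    return re.findall(r"[A-Z]+", m.group(1))


print(by_days(rrule))  # ['MO', 'TU', 'WE', 'TH', 'FR']
```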
I have this regex:
\[tag\](.*?)\[\/tag\]
It matches any characters between two tags. The problem is that it also matches empty content or just whitespace inside the tags, for example:
[tag][/tag]
[tag] [/tag]
How can I avoid it? I want it to match only when the content has at least 1 character and is not just whitespace. Thanks!
Use
\[tag\](?!\s*\[\/tag\])(.*?)\[\/tag\]
       ^^^^^^^^^^^^^^^^
See the regex demo.
The (?!\s*\[\/tag\]) is a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more whitespace characters followed by [/tag].
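A quick way to check the lookahead is a Python sketch like the following (Python is assumed for illustration only, and the sample string is made up). The empty and whitespace-only tags are skipped, while content with surrounding spaces is kept as is.

```python
import re

NON_BLANK_TAG = re.compile(r"\[tag\](?!\s*\[\/tag\])(.*?)\[\/tag\]")

sample = "[tag][/tag] [tag] [/tag] [tag]hello[/tag]"  # hypothetical input
print(NON_BLANK_TAG.findall(sample))            # ['hello']
print(NON_BLANK_TAG.findall("[tag] a [/tag]"))  # [' a ']
```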
You might change your expression to something similar to this:
\[tag\]([\s\S]+)\[\/tag\]
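Note that [\s\S]+ only requires at least one character of any kind, so whitespace-only content still matches. You might also add a bounded quantifier to require a minimum number of characters, similar to this expression: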
\[tag\]([\s\S]{3,})\[\/tag\]
Or you could do the same with your original expression as this expression:
Try this regex:
\[(tag)\](?!\s*\[\/\1\])(.*?)\[\/\1\]
This regex matches tag only if it has at least one non-whitespace char.
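Capturing the tag name and reusing it through \1 pays off once more than one tag name is allowed. The sketch below, in Python for illustration, extends the group to (tag|note); the second name note is purely hypothetical and not part of the question.

```python
import re

# "note" is a made-up second tag name, added only to show the backreference at work.
ANY_NON_BLANK = re.compile(r"\[(tag|note)\](?!\s*\[\/\1\])(.*?)\[\/\1\]")

sample = "[tag]a[/tag] [note] [/note] [note]b[/note]"
print(ANY_NON_BLANK.findall(sample))  # [('tag', 'a'), ('note', 'b')]
```

Each result is a (tag name, content) pair, because findall returns all groups when the pattern has more than one.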
If this is PCRE (or PHP), Notepad++, or Perl, use this:
(?s)(?:\[tag\]\s*\[/tag\](*SKIP)(?!)|\[tag\]\s*(.+?)\s*\[/tag\])
https://regex101.com/r/aCsOoQ/1
If not, you're stuck with using Stribnetz regex, which works because of
an odd condition of your requirements.
Readable
(?s)
(?:
     \[tag\]
     \s*
     \[/tag\]
     (*SKIP)
     (?!)
  |
     \[tag\]
     \s*
     ( .+? )          # (1)
     \s*
     \[/tag\]
)
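Engines without backtracking control verbs, such as Python's built-in re module, cannot run (*SKIP)(?!). A common substitute, and a different technique from the one above, is to let the first alternative consume the blank tag as an ordinary match and then discard every match whose group 1 is unset. A minimal sketch, with the helper name tag_contents made up for illustration:

```python
import re

# The first alternative swallows blank tags; only the second one fills group 1.
SKIP_BLANK = re.compile(
    r"\[tag\]\s*\[/tag\]|\[tag\]\s*(.+?)\s*\[/tag\]",
    re.DOTALL,
)


def tag_contents(text):
    """Return the trimmed contents of every non-blank [tag]...[/tag] pair."""
    return [m.group(1) for m in SKIP_BLANK.finditer(text) if m.group(1) is not None]


print(tag_contents("[tag]  [/tag][tag] x y [/tag]"))  # ['x y']
```

Without the first alternative, the lazy .+? would start at the blank tag and run across its [/tag] into the next pair, which is the situation (*SKIP) is there to prevent.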
I have the following regex:
[a-zA-Z0-9. ]*(?!cs)
and the string
Hotfix H5.12.1.00.cs02_ADV_LCR
I want to match only until
Hotfix H5.12.1.00
But the regex matches until "cs02"
Shouldn't the negative lookahead have done the job?
You may consider using a tempered greedy token:
(?:(?!\.cs)[a-zA-Z0-9. ])*
See the regex demo.
This will work regardless of whether .cs is present in the string or not, because the tempered greedy token matches zero or more characters from the [a-zA-Z0-9. ] character class and stops before any position where .cs starts.
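A short Python sketch of the tempered greedy token, run on the asker's string and on a made-up variant without .cs (Python is assumed for illustration only):

```python
import re

TEMPERED = re.compile(r"(?:(?!\.cs)[a-zA-Z0-9. ])*")

# The lookahead is checked before every single character is consumed.
print(TEMPERED.match("Hotfix H5.12.1.00.cs02_ADV_LCR").group())  # Hotfix H5.12.1.00
print(TEMPERED.match("Hotfix H5.12.1.00_ADV_LCR").group())       # Hotfix H5.12.1.00
```

In the second case the match simply ends at the underscore, which is outside the character class.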
You need to use positive lookahead instead of negative lookahead.
[a-zA-Z0-9. ]*(?=\.cs)
or
[a-zA-Z0-9. ]+(?=\.cs)
Note that your regex [a-zA-Z0-9. ]*(?!cs) is greedy: it consumes as many characters as it can and only then checks that the position it reached isn't followed by cs.
At first, the pattern [a-zA-Z0-9. ]+ greedily matches Hotfix H5.12.1.00.cs02, because the class accepts letters, digits, dots and spaces. Once it sees the underscore character, it stops, because both conditions are satisfied there:
_ won't get matched by [a-zA-Z0-9. ]+
_ is not cs
It works the same way for the two further matches.
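The behaviour described above can be reproduced with a small Python sketch (Python is assumed for illustration). The negative lookahead version yields three matches, while the positive lookahead forces the engine to backtrack to the spot right before .cs.

```python
import re

text = "Hotfix H5.12.1.00.cs02_ADV_LCR"

# Negative lookahead: the greedy class runs up to each underscore (or the end).
print(re.findall(r"[a-zA-Z0-9. ]+(?!cs)", text))
# ['Hotfix H5.12.1.00.cs02', 'ADV', 'LCR']

# Positive lookahead: the match must end right before ".cs".
print(re.search(r"[a-zA-Z0-9. ]+(?=\.cs)", text).group())
# Hotfix H5.12.1.00
```

Unlike the tempered greedy token, the positive lookahead version finds nothing at all when .cs is absent from the string.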