I need to match a character to split a big string, let's say -, but not if it's between two digits
In a-b it should match -
In a-4 it should match -
In 3-a it should match -
In 3-4 it should not match
I've tried negative lookahead and lookbehind, but I've only been able to come up with this (?<=\D)-(?=\D)|(?<=\d)-(?=\D)|(?<=\D)-(?=\d)
Is there a simpler way to specify this pattern?
Edit: using regex conditionals I think I can use (?(?<=\D)-|-(?=\D))
The following will work for this scenario. Be sure that your Regex flavor of choice has conditionals, otherwise this will not work:
-(?(?=\d)(?<=\D-))
- // match a dash
(? // If
(?=\d) // the next character is a digit
(?<= // then start a lookbehind (assert preceding characters are)
\D- // a non-digit then the dash we matched
) // end lookbehind
) // end conditional
Use an empty string as the substitution, since the dash is the only character matched.
Another option is to use an alternation to match a - when on the left is not a digit or match a - when on the right is not a digit:
(?<!\d)-|-(?!\d)
(?<!\d)- Negative lookbehind, assert what is on the left is not a digit and match -
| or
-(?!\d) Match - and assert what is on the right is not a digit using a negative lookahead
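As a quick check of the alternation, here is a minimal sketch in Python. The choice of Python's re module is an assumption (the question does not name a language), and split_on_dash and the last sample string are made up for illustration.

```python
import re

# Match "-" when the left side is not a digit, or when the right side is not a digit.
DASH = re.compile(r"(?<!\d)-|-(?!\d)")


def split_on_dash(text):
    """Split text on every dash that is not sitting between two digits."""
    return DASH.split(text)


if __name__ == "__main__":
    for sample in ["a-b", "a-4", "3-a", "3-4", "a-b-4-3-4-c"]:
        print(sample, "->", split_on_dash(sample))
```

Only the dashes in 3-4 and 4-3-4 are left alone, because they have a digit on both sides.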
I have the following string
abc123+InterestingValue+def456
I want to get the InterestingValue only, I am using this regex
\+.*\+
but the output still includes the + characters
Is there a way to search for a string between the + characters, then search again for anything that is not a + character?
Use lookarounds.
(?<=\+)[^+]*(?=\+)
You can use a positive lookahead and a positive lookbehind. A positive lookbehind tells the engine that the given pattern must appear immediately before the current position, and a positive lookahead tells it that the given pattern must appear immediately after it. Neither of them consumes the text it checks, so the + characters do not become part of the match.
A positive lookbehind is a group beginning with ?<= and a positive lookahead is a group beginning with ?=. Adding these to your existing expression would look like this:
(?<=\+).*(?=\+)
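To see how the two lookaround patterns from the answers behave, here is a small sketch in Python (an assumed language; the question does not name one). The helper first_between and the second sample string a+b+c+d are hypothetical and only there to show where the greedy .* and the negated class [^+]* differ.

```python
import re

NEGATED = r"(?<=\+)[^+]*(?=\+)"  # stops at the next "+"
GREEDY = r"(?<=\+).*(?=\+)"      # runs up to the last "+"


def first_between(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["abc123+InterestingValue+def456", "a+b+c+d"]:
        print(sample, first_between(NEGATED, sample), first_between(GREEDY, sample))
```

With exactly two + characters both patterns return the same text. With more than two, the greedy version spans from the first + to the last one.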
If it should be the first match, you can use a capture group with an anchor:
^[^+]*\+([^+]+)\+
^ Start of string
[^+]* Optionally match any char except + using a negated character class
\+ Match a literal +
([^+]+) Capture group 1, match 1+ chars other than +
\+ Match a literal +
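A minimal sketch of the capture-group approach, assuming Python's re module (the question does not name a language). The function name first_field is hypothetical.

```python
import re

# Anchor at the start, skip to the first "+", capture up to the next "+".
FIRST_FIELD = re.compile(r"^[^+]*\+([^+]+)\+")


def first_field(text):
    """Return the text between the first two "+" characters, or None."""
    m = FIRST_FIELD.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(first_field("abc123+InterestingValue+def456"))
```

The value is read from group 1 rather than from the whole match, so no lookarounds are needed.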
I appreciate regex but I'm still a beginner.
I tried many workarounds but can't figure out how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like the proper regex to return 120 as the number after "x".
But sometimes there won't be an "x". So I'm looking for a single approach that works for both A and B.
I tried to start the search from the end, and to start right after the "x".
I clearly have some syntax issues and didn't quite get the logic of (?=)
(?=[^x])(?=[0-9]+)
Looking forward to learning with your help.
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x, with optional horizontal whitespace chars around the x
\K Forget what is matched so far
\d+ Match 1+ digits
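For flavors without \K and \h (Python's re module is one, to my knowledge), the same idea can be sketched with a capture group. This is a swapped-in variant, not the PCRE pattern above: [ \t]* stands in for \h* and group 1 stands in for \K. The function name number_after_x is hypothetical.

```python
import re

# Optional "<digits> x " prefix, then capture the digits that follow.
# [ \t]* approximates PCRE's \h*; the capture group replaces \K.
AFTER_X = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")


def number_after_x(text):
    """Return the number after the optional "N x" prefix, or None."""
    m = AFTER_X.match(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in ["4 x 120glgt", "120glgt"]:
        print(sample, "->", number_after_x(sample))
```

When there is no x, the optional group is skipped after backtracking and the leading digits are captured instead.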
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a digit, x or a newline, up to the end of the string
) Close the lookahead
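The lookahead variant uses only syntax that Python's re module also accepts, so it can be tried as written. A minimal sketch, assuming Python; last_number is a hypothetical helper name.

```python
import re

# Digits that are followed only by non-digit, non-"x" chars until the end.
LAST_NUMBER = re.compile(r"\d+(?=[^\r\n\dx]*$)")


def last_number(text):
    """Return the digits with no further digit or "x" after them, or None."""
    m = LAST_NUMBER.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["4 x 120glgt", "120glgt"]:
        print(sample, "->", last_number(sample))
```

In 4 x 120glgt the leading 4 is rejected because an x lies between it and the end of the string.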
I need to match words that don't start with tss- and are not equal to tss. I tried multiple combinations but got no positive results.
^(((?!(tss)).*)|(?!tss-).+)
To apply two negative lookahead checks against the input string, you need to simply chain them after the ^ anchor:
^(?!tss$)(?!tss-).*
The logical relationship is AND in this case:
^ - start of string
(?!tss$) - the string must not be equal to tss
AND
(?!tss-) - the string must not start with tss-
.* - match the rest.
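The chained lookaheads can be checked with a short sketch. Python is an assumed language here, and is_allowed and the sample words are made up for illustration.

```python
import re

# Both negative lookaheads are tested at the start of the string (logical AND).
NOT_TSS = re.compile(r"^(?!tss$)(?!tss-).*")


def is_allowed(word):
    """True unless word equals "tss" or starts with "tss-"."""
    return NOT_TSS.match(word) is not None


if __name__ == "__main__":
    for word in ["tss", "tss-abc", "tssx", "abc"]:
        print(word, "->", is_allowed(word))
```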
If the words can also occur in a sentence, you can use lookarounds instead of anchors: assert a whitespace boundary to the left, then use a negative lookahead to assert that what follows is not tss followed by either a whitespace boundary or -.
(?<!\S)(?!tss(?:(?!\S)|-))\S+
(?<!\S) Assert a whitespace boundary to the left
(?! Negative lookahead, assert to the right is not
tss Match literally
(?:(?!\S)|-) Match either a whitespace boundary or -
) Close Lookahead
\S+ Match 1+ non whitespace chars
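A minimal sketch of the sentence variant, assuming Python's re module; allowed_words and the sample sentence are hypothetical.

```python
import re

# A word starts at a whitespace boundary and must not be "tss" or begin with "tss-".
WORD = re.compile(r"(?<!\S)(?!tss(?:(?!\S)|-))\S+")


def allowed_words(sentence):
    """Return all whitespace-delimited words except "tss" and "tss-..." words."""
    return WORD.findall(sentence)


if __name__ == "__main__":
    print(allowed_words("tss foo tss-bar tssx baz tss"))
```

The pattern has no capturing groups, so findall returns the full matches.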
I have an expression that is matching something, but am trying to get this not to match if it's followed by the suffix: one or more spaces, three dashes, one or more spaces, one or more digits, a slash, and finally one or more digits. Here is the expression:
(?<=(^|\s+))[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!(\s+\-\-\-\s+[0-9]+/[0-9]+))
And here is the text:
January 10.5/13.5 --- 22/26 ---
It's matching January 10.5/13, but I don't want it to match anything.
As lookarounds are supported, you can change the positive lookbehind at the start to a negative lookbehind asserting a whitespace boundary to the left (?<!\S)
In the negative lookahead, you can start with .* so that it scans the rest of the line, instead of starting with 1 or more whitespace chars \s+. Otherwise the engine can backtrack to 10.5/13, where the suffix no longer follows directly, and the match succeeds.
The negative lookahead (?!.*\s-{3}\s+[0-9]+/[0-9]) asserts that the suffix does not occur to the right.
You can omit the quantifier + after the last character class, as a single digit after the slash is enough to recognize the suffix; it does not matter whether more digits follow.
Note that in the current pattern, the decimal part is an optional capturing group 2. If you only need the whole value in group 1, you can make the decimal part an optional non-capturing group instead.
(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!.*\s-{3}\s+[0-9]+/[0-9])
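The rewritten pattern can be tried in Python, which accepts the fixed-width lookbehind (?<!\S). This is only a sketch under assumptions: re.IGNORECASE is added because January only matches [A-Z]+ case-insensitively, and find_pair and the second sample line are hypothetical.

```python
import re

PAIR = re.compile(
    r"(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)"
    r"(?!.*\s-{3}\s+[0-9]+/[0-9])",
    re.IGNORECASE,  # assumption: needed for "January" to match [A-Z]+
)


def find_pair(line):
    """Return (first value, second value), or None if the suffix follows."""
    m = PAIR.search(line)
    return (m.group(1), m.group(3)) if m else None


if __name__ == "__main__":
    print(find_pair("January 10.5/13.5 --- 22/26 ---"))
    print(find_pair("January 10.5/13.5 ---"))
```

The first line contains the suffix, so nothing matches, including the shorter 10.5/13 that the original pattern backtracked to.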
I have this regular expression
^[0-9]+[a-zA-Z0-9]*
But I need one that always starts with a number, which can then be followed by another number or a letter, but it cannot be number letter number. The letter will always be the last. Like this example
102A OK
1A OK
2 OK
110 OK
10A1 WRONG
BV WRONG
The letter cannot be between two numbers.
You could use a negative lookahead.
^(?!\d+[a-z]\d)\d.*
with the case-insensitive flag set.
A match of this regular expression signifies that the string starts with a digit and does not start with one or more digits followed by a letter and another digit, in that order. The .* at the end of the regex makes the match cover the entire line; drop it if you only need to test whether the string is valid.
The regex engine performs the following operations.
^ match beginning of line
(?! begin negative lookahead
\d+[a-z]\d match digits-letter-digit
) end negative lookahead
\d match a digit
.* match the rest of the line
Note that \d must come after the negative lookahead. If the regex were ^\d(?!.\d+[a-z]\d) and the string were 1A1, the negative lookahead would be applied to A1, would fail to find digit-letter-digit there, and the overall match would succeed (incorrectly).
Because the negative lookahead is pinned to the beginning of the line and consumes no characters, when it does not find digit-letter-digit the engine continues by matching \d from the beginning of the line.
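The lookahead pattern can be run against the examples from the question with a short sketch. Python is an assumed language; re.IGNORECASE plays the role of the case-insensitive flag, and passes_lookahead is a hypothetical name.

```python
import re

# Reject strings that begin with digits, a letter, then a digit.
NO_DIGIT_LETTER_DIGIT = re.compile(r"^(?!\d+[a-z]\d)\d.*", re.IGNORECASE)


def passes_lookahead(text):
    """True if text starts with a digit and not with digits-letter-digit."""
    return NO_DIGIT_LETTER_DIGIT.match(text) is not None


if __name__ == "__main__":
    for sample in ["102A", "1A", "2", "110", "10A1", "BV"]:
        print(sample, "->", "OK" if passes_lookahead(sample) else "WRONG")
```

Note that the lookahead only rules out a digit directly after the first letter, so a string such as 1AB1 still passes this check.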
You could match a char 1-9 followed by optional digits 0-9 and optional chars a-zA-Z.
If you use [a-zA-Z0-9] the character class will match any of the listed characters in any order.
If you separate the chars and the digits, the letter cannot come before the digits and, as the * quantifier matches 0 or more times, you can also match a single digit.
^[1-9][0-9]*[a-zA-Z]*$
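A minimal sketch that runs this pattern over the examples from the question, assuming Python's re module; is_valid_code is a hypothetical name.

```python
import re

# One digit 1-9, then optional digits, then optional trailing letters.
DIGITS_THEN_LETTERS = re.compile(r"^[1-9][0-9]*[a-zA-Z]*$")


def is_valid_code(text):
    """True if text is digits (not starting with 0) followed only by letters."""
    return DIGITS_THEN_LETTERS.match(text) is not None


if __name__ == "__main__":
    for sample in ["102A", "1A", "2", "110", "10A1", "BV"]:
        print(sample, "->", "OK" if is_valid_code(sample) else "WRONG")
```

Because the first class is [1-9], this pattern also rejects a leading zero such as 012, and because the letters use *, it accepts more than one trailing letter.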