Regex match Or and Not - regex

I want to write a regular expression that captures the solutions to a quadratic equation, that is:
to match
x=1 or x=2 and x=2 or x=1
but not match
x=1 or x=1 and x=2 or x=2
I tried
x=[12] or x=[21]
but clearly [12] and [21] are the same character class, since order doesn't matter inside the brackets.
I tried capturing the first value and using that:
x=([12]) or x=[\1]
which gives me the negation of what I want.
My thinking is that I need to match [12] and not \1. Can this be done? And if so, how?

You may use a capturing group on the first character class and restrict the second one with a negative lookahead containing a backreference to that Group 1 value:
x=([12]) or x=(?!\1)[21]
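A word boundary will probably be helpful, too, in case the numbers can have more than one digit. It makes sure the lookahead only rejects the same number (1 after 1) and not a longer number that merely starts with it (100 after 1):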
x=([12]) or x=(?!\1\b)[21]\b
To match any number, replace both [12] and [21] with [0-9]+ or \d+.
Details
x= - a literal substring
([12]) - Group 1: either 1 or 2 (\d+ or [0-9]+ to match any 1+ digits)
or - or enclosed with spaces
x= - a literal substring
(?!\1\b)[21]\b - a 2 or 1 (replace with [0-9]+ or \d+ to match any 1+ digits) that is not equal to the value captured in Group 1 (due to the (?!\1\b) negative lookahead).
Note that in case your numbers can be glued to words, you will have to replace the \b word boundaries with a (?!\d) negative lookahead (no digit right after the current location).
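As a quick check of the pattern above, here is a minimal Python sketch. The helper name has_distinct_roots and the generalized \d+ variant are assumptions added for illustration; the question itself only uses the digits 1 and 2.

```python
import re

# Single-digit version from the answer: the second value must differ from Group 1.
ROOTS = re.compile(r"x=([12]) or x=(?!\1\b)[21]\b")

# Generalized to any non-negative integers (an assumption, not part of the question).
ANY_ROOTS = re.compile(r"x=(\d+) or x=(?!\1\b)\d+\b")


def has_distinct_roots(text: str, pattern: re.Pattern = ROOTS) -> bool:
    """Return True if the whole text is 'x=A or x=B' with A different from B."""
    return pattern.fullmatch(text) is not None


for sample in ["x=1 or x=2", "x=2 or x=1", "x=1 or x=1", "x=2 or x=2"]:
    print(sample, "->", has_distinct_roots(sample))

# The word boundary keeps 1 and 100 distinct even though 100 starts with 1.
print(has_distinct_roots("x=1 or x=100", ANY_ROOTS))
```

The first two samples should be accepted and the last two rejected. Without the \b inside the lookahead, the 1/100 case would be rejected as well.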

Try this regex:
x=(\d+) or x=(?!\1)\d+

Related

Match with optional positive lookahead

I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great
You can use
(?<=^|_)\d{4}\s[^_]+
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
Your second pattern (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) uses a greedy .+, which first runs to the end of the string. There the second alternative, $, succeeds right away, so the regex engine never backtracks to try _\d{1}$ and the whole rest of the line is matched.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
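Both suggestions can be tried on the two strings from the question. This is a minimal Python sketch and the constant names are made up for illustration. Python's re module rejects (?<=^|_) because the alternatives have different widths, so the sketch uses the equivalent (?<![^_]) form mentioned in the details above.

```python
import re

# Python's re requires fixed-width lookbehinds, so (?<![^_]) stands in for (?<=^|_).
NO_UNDERSCORE = re.compile(r"(?<![^_])\d{4}\s[^_]+")
LAZY_LOOKAHEAD = re.compile(r"(?<![^_])\d{4}\s.+?(?=(?:_\d)?$)")

samples = ["Some_thing_here_1234 Match Me 1", "1234 Match Me 1_1"]

for text in samples:
    for pattern in (NO_UNDERSCORE, LAZY_LOOKAHEAD):
        match = pattern.search(text)
        print(repr(text), "->", match.group() if match else None)
```

All four combinations should print 1234 Match Me 1.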

Regex to match X repeated exactly n times in a row

I am attempting to use regex to match a pattern where we have some X (any character) which occurs exactly n many times in succession. I know a little about regex, but don't know of anything like this.
My previous attempts left me using (.) as a capture group for my X, but I wasn't able to find a way to make sure this happened exactly n times (no more, and no less)
(Edit) For more context, I am trying to separate strings (containing only the letters 'r', 'p', and 's') into either "human" or "machine" generated and I want to assume that any string which has "XrrrrX" (where X is either s or p) or "YssssY" (where Y is either r or p) or "ZppppZ" (where Z is either s or r).
Some sample examples are
psrsrprrsssrrrpsprprsppspsssrsrssrpprppsrpssrp
psrpsprpsrpprpsprpsprpsrpprppsrpsprsprsprppsrp
psrrrrsprsrpsrrsprrrrrprpssssrsprrpspspppprpsr
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
where I want to match only strings that have at most 5 of any character in a row and also at least one occurrence of xxxxx (where x is any character repeated 5 times in a row)
You need to use a back-reference to your capture group. Here is an example regex:
(.)\1{2}
Regex explained:
(.) is a capture group that captures any single character (by default, anything except a line break)
\1 is a back-reference to the group you just captured (that single character)
{2} is a quantifier, which matches the previous token (the \1) exactly twice.
Note that, to capture a single character n times, you have to specify {n - 1} as the quantifier because the first match was already captured by (.).
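Note that (.)\1{2} on its own also matches inside longer runs, so it does not enforce "exactly n". One way around that, sketched here in Python as an assumption rather than as part of the answer above, is to collect every maximal run with (.)\1* and filter by length afterwards. The function name runs_of_exactly is hypothetical.

```python
import re


def runs_of_exactly(text: str, n: int) -> list[str]:
    """Return every maximal run of one repeated character whose length is n."""
    # (.)\1* matches a character plus all immediate repeats, i.e. a maximal run.
    return [m.group() for m in re.finditer(r"(.)\1*", text) if len(m.group()) == n]


# (.)\1{2} alone finds three in a row, but it also fires inside a run of four.
print(re.search(r"(.)\1{2}", "aabbbcccc").group())
print(re.search(r"(.)\1{2}", "cccc").group())

# Filtering maximal runs by length keeps only the runs of exactly three.
print(runs_of_exactly("aabbbcccc", 3))
print(runs_of_exactly("cccc", 3))
```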
I believe the following should do what you're after:
^(?!([psr]*?([psr])\2{4})\2)(?1)(?2)*$
^ - Start-line anchor;
(?! - Open a negative lookahead;
([psr]*?([psr])\2{4})\2) - A 1st capture group that lazily matches 0+ characters of [psr] up to a nested 2nd capture group holding one of those characters, followed by a backreference to the 2nd group repeated 4 times (a run of 5). After the 1st group closes, one more backreference to the 2nd group extends the run to 6, and the final ) closes the negative lookahead, so any string containing a run of 6+ is rejected;
(?1) - Match the same subpattern as the 1st group;
(?2)* - Match the same subpattern as the 2nd group 0+ (greedy) times, up to;
$ - End-line anchor.
I suppose this would be short for something like:
^(?![psr]*?([psr])\1{5})[psr]*?([psr])\2{4}[psr]*$
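The expanded form is handy in engines without subroutine calls: Python's built-in re module, for example, does not support (?1). A minimal sketch of the expanded pattern follows. The function name and the test strings are made up for illustration, and the inputs are built with string repetition so the run lengths are easy to read.

```python
import re

# The expanded pattern: reject any run of 6+, then require at least one run of 5.
FIVE_BUT_NOT_SIX = re.compile(
    r"^(?![psr]*?([psr])\1{5})[psr]*?([psr])\2{4}[psr]*$"
)


def has_run_of_exactly_five(text: str) -> bool:
    """True if text has a run of 5 equal letters and no run of 6 or more."""
    return FIVE_BUT_NOT_SIX.match(text) is not None


# Hypothetical inputs, not the samples from the question.
print(has_run_of_exactly_five("ps" + "r" * 5 + "sp"))  # run of 5
print(has_run_of_exactly_five("ps" + "r" * 6 + "sp"))  # run of 6
print(has_run_of_exactly_five("ps" + "r" * 4 + "sp"))  # run of 4
```

Only the first input should be accepted.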
I have assumed you wish to match any of the following strings:
'psssssp'
'sppppps'
'rsssssr'
'srrrrrs'
'prrrrrp'
'rpppppr'
(but I have a sneaking suspicion that you actually want to do something else, in which case please tell me in a comment).
That could be done by matching a simple alternation:
(?:psssssp|sppppps|rsssssr|srrrrrs|prrrrrp|rpppppr)
Alternatively, you could use a regular expression that employs back-references:
([psr])(?!\1)([psr])\2{4}\1
The second regular expression has the following components.
([psr]) # match 'p', 's' or 'r' and save to capture group 1
(?!\1) # the next character cannot be the content of capture group 1
([psr]) # match 'p', 's' or 'r' and save to capture group 2
\2{4} # match the content of capture group 2 four times
\1 # match the content of capture group 1
(?!\1) is a negative lookahead.
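A small Python sketch of the back-reference version, under the same assumption about which strings should match. The name FRAMED_RUN and the negative examples are made up for illustration.

```python
import re

# X, then five copies of a different letter Y, then X again.
FRAMED_RUN = re.compile(r"([psr])(?!\1)([psr])\2{4}\1")

for text in ["psssssp", "rpppppr", "pssssp", "psssssr", "ppppppp"]:
    print(text, "->", bool(FRAMED_RUN.search(text)))
```

The first two strings should match. The others should fail because the run is too short, the framing letters differ, or the frame and the run use the same letter.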

Regex to get each item in an xpath-like string, with each subscript as a group

I'd like to take an xpath-like string such as:
a.b.c[2].d[123].e1[4].f88[5]
And have each path-part as a match, with each subscript ("array index") as a group, like this:
match 1: a
match 2: b
match 3: c, group 1: 2
match 4: d, group 1: 123
match 5: e1, group 1: 4
match 6: f88, group 1: 5
I tried with the following (which doesn't work):
[^.]+(?:\[)*([0-9]+)*(?:\])*
As I understand this Regex, it means:
First, match all characters except for a dot
Then, check (but don't capture) for a left square bracket - it may be present 0 to unlimited times.
Then, check for any number, with length 1 to unlimited - and capture as a group.
Then, repeat step 2 for a right square bracket.
But it doesn't work.
How can I make it work?
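[^.]+(?:\[)*([0-9]+)*(?:\])*
"But it doesn't work" because [^.]+ is greedy and consumes every character up to the next dot, including the brackets and the digits between them, so nothing is left for the group to capture. Furthermore, a subscript should be optional as a whole (both brackets plus the digits), rather than part by part.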
Applying those criteria, this expression does work:
([^.\[]+)(?:\[(\d+)\])?
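To see what this expression returns on the string from the question, here is a minimal Python sketch. The variable names are made up, and it numbers the groups as the regex does: the name is group 1 and the subscript is group 2.

```python
import re

PATH_PART = re.compile(r"([^.\[]+)(?:\[(\d+)\])?")

path = "a.b.c[2].d[123].e1[4].f88[5]"

# Group 1 is the name; group 2 is the subscript, or None when there is none.
parts = [(m.group(1), m.group(2)) for m in PATH_PART.finditer(path)]
print(parts)
# [('a', None), ('b', None), ('c', '2'), ('d', '123'), ('e1', '4'), ('f88', '5')]
```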
The pattern that you tried matches too much: the negated character class [^.]+ matches any char except a dot 1 or more times, so it can also match the square brackets.
Note that the notation (?:\[)* is the same as \[* and matches an opening square bracket 0 or more times.
If the \G anchor is supported, and you want to match the example string only from the start of the string, you might use 2 capture groups for the data that you want, and match the dots and square brackets in between.
\G([^\][.\s]+)(?:\[(\d+)\])?\.?
The pattern matches:
\G Assert the position at the end of the previous match, or at the start of the string
([^\][.\s]+) Capture group 1, match 1+ char other than ] [ . or a whitespace char (as there do not seem to be any spaces in the example string)
(?:\[(\d+)\])? Optionally match capture group 2 between matching square brackets
\.? Match an optional dot to continue the consecutive matching for the \G anchor
If there cannot be a dot at the end of the string, and there must be at least 1 dot present, you can assert the whole format first from the start of the string:
(?:^(?=[^.]+(?:\.[^.]+)+$)|\G(?!^))\.?([^\][.]+)(?:\[(\d+)\])?

Regex: Opposite of group match

this expression
(^\+\d{2})_\1
would match
+32_+32
How can I make it match
+32_+44
If you want the opposite, you might use a negative lookahead (?!\1) asserting not the value of group 1 and then match a + and 2 digits
^(\+\d{2})_(?!\1)\+\d{2}
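A quick way to confirm the behaviour is a minimal Python sketch like the following; the constant name is made up for illustration.

```python
import re

# The lookahead rejects a second value identical to Group 1.
DIFFERENT_CODES = re.compile(r"^(\+\d{2})_(?!\1)\+\d{2}")

print(bool(DIFFERENT_CODES.search("+32_+44")))  # True
print(bool(DIFFERENT_CODES.search("+32_+32")))  # False
```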
If you only want to match a + and 2 digits on each side of the underscore, whether or not the two values are equal, you don't need the capturing group and can simply match the second value after the underscore:
^\+\d{2}_\+\d{2}

find matches for positive and negative number in arithmetic expression

Given the following arithmetic expression:
-1-5+(-3+2)
I need to find matches for the positive and negative numbers. For that expression the expected result is: -1 5 -3 2
I tried to use regex -?\d+(.\d+)? but it returns: -1 -5 -3 2 where -5 is not correct.
Is it possible to build a regex pattern that gets the positive and negative numbers for this case and other similar cases?
You can use
(?<!\d)[-]?\d*\.?\d+
See the regex demo
Pattern details:
(?<!\d) - a negative lookbehind that fails the match if a digit appears before the currently tested position
[-]? - an optional (1 or 0) minus sign
\d* - 0+ digits
\.? - 1 or 0 dots (a literal dot as it is escaped)
\d+ - 1+ digits
Note that \d*\.?\d+ allows .456 values, if you do not need that, just use \d+(?:\.\d+)?.
If the lookbehind is not supported, use a non-capturing alternation that requires either the start of the string or a non-digit char before the number, and capture the number itself in a group:
(?:^|\D)([-]?\d*\.?\d+)
The value you need is in Group 1.
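Both variants can be compared on the expression from the question. This is a minimal Python sketch; the constant names are made up, and [-] is written as a plain -, which is equivalent.

```python
import re

expression = "-1-5+(-3+2)"

# Lookbehind version: a - directly after a digit is an operator, not a sign.
WITH_LOOKBEHIND = re.compile(r"(?<!\d)-?\d*\.?\d+")
# Fallback version: the number itself is captured in Group 1.
WITH_GROUP = re.compile(r"(?:^|\D)(-?\d*\.?\d+)")

print(WITH_LOOKBEHIND.findall(expression))  # ['-1', '5', '-3', '2']
print(WITH_GROUP.findall(expression))       # ['-1', '5', '-3', '2']
```

With a single capturing group, findall returns the Group 1 values directly, so both calls give the same list.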
/(?:^|\+|-|\()(-?\d+)/
You don't say what language/script you're using, but here's a PHP example:
$string = '-1-5+(-3+2)';
// The alternation must include \( so that the - in "(-3" is kept as a sign
preg_match_all('/(?:^|\+|-|\()(-?\d+)/', $string, $matches);
print_r($matches[1]);
Outputs:
Array
(
[0] => -1
[1] => 5
[2] => -3
[3] => 2
)
EXPLANATION: There are two groups in the pattern:
/
(?:^|\+|-|\()
(-?\d+)
/
The first group is a non-capturing group (?:...) containing an alternation, that is, it matches any one of the listed alternatives. There are 4 alternatives in that list:
^ line start
\+ literal plus
- minus
\( literal open paren
The first group is there to make the optional - in the second group stand out. Every number (positive or negative) is preceded by one of the alternatives listed in the first group; the second group then captures everything that follows, including the - if present.