Regex to validate subtract equations like "abc-b=ac" - regex

I've stumbled upon a regex question.
How to validate a subtract equation like this?
A string subtract another string equals to whatever remains(all the terms are just plain strings, not sets. So ab and ba are different strings).
Pass
abc-b=ac
abcde-cd=abe
ab-a=b
abcde-a=bcde
abcde-cde=ab
Fail
abc-a=c
abcde-bd=ace
abc-cd=ab
abcde-a=cde
abc-abc=
abc-=abc
Here's what I tried and you may play around with it
https://regex101.com/r/lTWUCY/1/

Disclaimer: I see that some of the comments were deleted. So let me start by saying that, though short (in terms of code-golf), the following answer is not the most efficient in terms of steps involved. Though, looking at the nature of the question and its "puzzle" aspect, it will probably do fine. For a more efficient answer, I'd like to redirect you to this answer.
Here is my attempt:
^(.*)(.+)(.*)-\2=(?=.)\1\3$
See the online demo
^ - Start line anchor.
(.*) - A 1st capture group with 0+ non-newline characters right upto;
(.+) - A 2nd capture group with 1+ non-newline characters right upto;
(.*) - A 3rd capture group with 0+ non-newline characters right upto;
-\2= - An hyphen followed by a backreference to our 2nd capture group and a literal "=".
(?=.) - A positive lookahead to assert position is followed by at least a single character other than newline.
\1\3 - A backreference to what was captured in both the 1st and 3rd capture group.
$ - End line anchor.
EDIT:
I guess a bit more restrictive could be:
^([a-z]*)([a-z]+)((?1))-\2=(?=.)\1\3$

You may use this more efficient regex with a lookahead at the start with a capture group that matches text on the right hand side of - i.e. substring between - and = and captures it in group #1. Then in the main body of regex we just check presence of capture group #1 and capture text before and after \1 in 2 separate groups.
^(?=[^-]+-([^=]+)=.)([^-]*?)\1([^-]*)-[^=]+=\2\3$
RegEx Demo
RegEx Demo:
^: Start
(?=[^-]+-([^=]+)=.): Lookahead to make sure we have expression structure of pqr-pq=r and also more importantly capture substring between - and = in capture group #1. . after = is there for a reason to disallow any empty string after =.
([^-]*?): Match 0 or more non-- characters in capture group #2
\1: Back-reference to group #1 to make sure we match same value as in capture group #1
([^-]*): Match 0 or more non-- characters in capture group #3
-: Match a -
[^=]+: Match 0 or more non-= characters
=: Match a =
\2\3: Back-reference to group #2 and #3 which is difference of substraction
$: End

Related

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!
You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Demo
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
You could try:
^([^.]*\.){2}\K
Replace with 0. See an online demo
^ - Start line anchor.
([^.]*\.){2} - Negated character 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
Or/And if \K meta escape isn't supported, than see if the following does work:
^((?:[^.]*\.){2})
Replace with ${1}0. See the online demo
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
`Negated character 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
Using your pattern, you can use 2 capture groups and prepend the second group with a dot in the replacement like for example \g<1>0\g<2> or ${1}0${2} or $10$2 depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times any char except a dot, then match the dot
([^.].*) Capture group 2, match any char except a dot
Regex demo
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
Regex demo
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));
Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.

Validate string # followed by digits but # increases after every occurance

I have a string looks like this
#123##1234###2356####69
It starts with # and followed by any digits, every time the # appears, the number of # increases, first time 1, second time 2, etc.
It's similar to this regex, but since I don't know how long this pattern goes, so it's not very useful.
^#\d+##\d+###\d+$
I'm using PCRE regex engine, it allows recursion (?R) and conditions (?(1)...) etc.
Is there a regex to validate this pattern?
Valid
#123
#12##235
#1234##12###368
#1234##12###368####22235#####723356
Invalid
##123
#123###456
#123##456##789
I tried ^(?(1)(?|(#\1)|(#))\d+)+$ but it doesn't seem to work at all
You can do this using PCRE conditional sub-pattern matching:
^(?:((?(1)\1)#)\d+)++$
RegEx Demo
RegEx Details:
^: Start
(?:: Start non-capture group
(: Start capture group #1
(?(1)\1): if/then/else directive that means match back-reference \1 only if 1st capture group is available otherwise match null
#: Match an additional #
): End capture group #1
\d+: Match 1+ digits
)++: End non-capture group. Match 1+ of this non-capture group.
$: End
One option could be optionally matching a backreference to group 1 inside group 1 using a possessive quantifier \1?+# adding # on every iteration.
^(?:(\1?+#)\d+)++$
^ Start of string
(?: Non capture group
(\1?+#)\d+ Capture group 1, match an optional possessive backreference to what is already captured in group 1 and add matching a # followed by 1+ digits
)++ Close the non capture group and repeat 1+ times possessively
$ End of string
Regex demo
I think you can use forward-referencing here:
^(?:((?:\1(?!^)|^)#)\d+)+$
See the regex demo.
Details:
^ - start of string
(?:((?:\1(?!^)|^)#)\d+)+ - one or more occurrences of
((?:\1(?!^)|^)#) - Group 1 (the \1 value): start of string or an occurrence of the Group 1 value if it is not at the string start position
\d+ - one or more digits
$ - end of string.
NOTE: This technique does not work in regex flavors that do not support forward referencing, like ECMAScript based flavors (e.g. JavaScript, VBA, C++ std::regex)
Despite there are already working answers, and inspired by Wiktor's answer, I came up this idea:
(?:(^#|#\1)\d+)+$
Which is also quite short and effective(also works for non pcre environment).
See the test cases

Regex - match until group that may or may not occur

I have following text:
:3:Start!##$%^&*():31:Start!##$%^&*():31:End!##$%^&*():3:End
and with following regex:
(:3:Start)(.*)(:31:Start.*:31:End)?(.*)(:3:End)
Why group3 is not found even though it exists. Even if I set group2 as not greedy:
(:3:Start)(.*?)(:31:Start.*:31:End)?(.*)(:3:End)
How Can I capture group with optional subgroup if it occurs in the middle of the text
You may achieve what you need if you enclose the (.*?) and (:31:Start.*:31:End) groups into an optional non-capturing group (quantified with a greedy ? quantifier) and making the optional group obligatory:
(:3:Start)(?:(.*?)(:31:Start.*:31:End))?(.*)(:3:End)
|____________________________|
See the regex demo. It will work like this:
(:3:Start) - will capture into Group 1 the :3:Start` string
(?:(.*?)(:31:Start.*:31:End))? - will attempt to match once a sequence of patterns:
(.*?) - Group 2: any 0 or more chars other than line break chars as few as possible
(:31:Start.*:31:End) - Group 3: :31:Start.*:31:End string
(.*) - Group 4: any 0 or more chars other than line break chars as many as possible
(:3:End) - captures into Group 5 :3:End string
Why doesn't your pattern work?
See your pattern demo, the !##$%^&*():31:Start!##$%^&*():31:End!##$%^&*() substring is captured into Group 4, matched with (.*) pattern. It happens because (.*?)(:31:Start.*:31:End)? first skips the .*? pattern (it is lazy, non-greedy, the engine does not even attempt to match it when it sees such a pattern the first time, it goes on matching with obligatory patterns and only comes back when the subsequent patterns do not match), and (:31:Start.*:31:End)? matches an empty string right after :3:Start substring. The rest finds a match, thus, no optional text is matched into your expected group.

Regular expression to exclude group with 0 and more occurence issue

I need to extract 1234567 from below URLs
http://www.test.in/some--wonders-1234567---2
http://www.test.in/some--wonders-1234567
I tried with .*\-([0-9]+)(?:-{2,}2)?.
but for the first URL it returned 2, but this is in non-capturing group.
Please give me a solution. I am digging it for so long. not getting any idea.
Try this one:
.*?\-([0-9]+)(?:-{2,}2|$)
It sets lazy mode for first .* pattern, you can also remove it at all with same effect:
\-([0-9]+)(?:-{2,}2|$)
If your regex engine supports negative look behinds (some do not), you can do it this way:
(?<!\d+-+)\d+
It gives you any non-empty digit string, which is not preceded by (minuses followed by digits).
Big advantage is that you don't have to use groups here - regex itself returns what you want.
You could match a - followed by one or more digits which you could capture in a group ([0-9]+). This group will contain the value you want to extract.
Then an optional part (?:-{2,}[0-9]+)? that would match ---2 followed by asserting the end of the line $.
-(\d+)(?:-{2,}\d+)?$
Explanation
- Match literally
(\d+) Capture one or more digits in a group
(?: Non capturing group
-{2,} Match 2 or more times -
\d+ Match one or more digits
)? close non capturing group and make it optional
$ Assert position at the end of the line

Repeated capturing group PCRE

Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also how can i match repeated capturing group in normal way? E.g. i have regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't helped me.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
What you may do is get mutliple matches with preg_match_all capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
See the regex demo.
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.