Returning values from multiple options in Regex

I am searching for specific terms in a block of text only when they are surrounded by other 'qualifying' terms. I have made a regex where each of those qualifying terms is an option:
(?<QUALIFY_A>(A\s(?<TERM>(Hi|Bye)))|(?<QUALIFY_B>B\s(?<TERM1>(Hi|Bye)))|(?<QUALIFY_C>C\s(?<TERM2>(Hi|Bye))))
http://regex101.com/r/xR0uA9
So far this behaves as I expect: it finds the first match of the term when preceded by the qualifying expression. Ideally, however, I'd like every match returned; in other words, the regex should not quit after the first match. I realize that with just one option I could use /1/2 to get multiple matches for that option, but in this case I am trying to get multiple matches across different options.

Try putting g (the global flag) in the modifier box on the right-hand side of the regex.
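Outside regex101, the same "keep going after the first match" behaviour usually comes from an iterating API rather than a flag. A minimal sketch in Python, assuming the `re` module: the named groups are rewritten as `(?P<name>...)` because that is how `re` spells them, and `find_terms` and the sample text are made up for illustration.

```python
import re

# The question's three alternatives, one per qualifying term.
PATTERN = re.compile(
    r"(?P<QUALIFY_A>A\s(?P<TERM>Hi|Bye))"
    r"|(?P<QUALIFY_B>B\s(?P<TERM1>Hi|Bye))"
    r"|(?P<QUALIFY_C>C\s(?P<TERM2>Hi|Bye))"
)

def find_terms(text):
    """Return one dict per match, keeping only the groups that took part."""
    results = []
    for m in PATTERN.finditer(text):
        # Groups belonging to the alternatives that did not match are None.
        hit = {name: value for name, value in m.groupdict().items() if value is not None}
        results.append(hit)
    return results

for hit in find_terms("A Hi then B Bye then C Hi"):
    print(hit)
```

Each dict shows which qualifying option matched and which term it carried, so matches from different options can be told apart.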

Related

Regex match sequence more than once

How come I can't find an answer to something this simple after an hour of searching the internet?
I have this sentence:
HeLLo woRLd HOw are YoU
I want to capture all groups that consist of two consecutive capital letters
[A-Z]{2}
The regex above works but captures only LL (the first two capital letters), while I want LL in one group and RL and HO in further groups.

Most regular expression engines expose some way to make your expression global, meaning that it is applied repeatedly until the input is exhausted. This global flag is usually written as /g at the end of the expression. Without the /g flag your expression returns only the first match (LL); with it, every match is returned.
Different languages expose this functionality differently. In C#, for instance, it is done through the Regex.Matches method. In Java, you use while (matcher.find()), which keeps yielding the substrings that match the pattern.
EDIT: I am not a Python person, but judging from the example available here, you could do something like so:
import re

it = re.finditer(r"[A-Z]{2}", "HeLLo woRLd HOw are YoU")
for match in it:
    # group() is the matched text; span() is its (start, end) index pair
    print("'{g}' was found between the indices {s}".format(g=match.group(), s=match.span()))

You cannot have multiple groups in this case, but you can have multiple matches. Add the global flag to your regex and use a method that applies it.
For JavaScript, it would be /[A-Z]{2}/g.
The method most probably returns an array of matches, which you can access by index.
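The "array of matches" idea looks roughly like this in Python, where `re.findall` plays the role of a global match (a small sketch, not the JavaScript API itself):

```python
import re

# findall returns every non-overlapping match as a list of strings.
pairs = re.findall(r"[A-Z]{2}", "HeLLo woRLd HOw are YoU")

print(pairs)     # ['LL', 'RL', 'HO']
print(pairs[1])  # RL
```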

Regex global search without using the global flag

I'm using software that only allows a single line regular expression for filtering and it doesn't allow the global modifier to capture all patterns in the string. Currently, my expression is only returning the first instance.
Is there another way to capture all instances of the pattern in the string?
Expression: (captures hi-res jpg urls)
\{\"hiRes\"\:\"([A-Za-z0-9%\/_:.-]+)\"\,\"thumb
String:
'colorImages': { 'initial': [{"hiRes":"http://sub.website.com/images/I/81OJ6qwKxyL._SL1500_.jpg","thumb":"http://sub.website.com/images/I/41NQRigTUdL._SS40_.jpg","large":"http://sub.website.com/images/I/41NQRigTUdL.jpg","main":{"http://sub.website.com/images/I/81OJ6qwKxyL._SY355_.jpg":[272,355],"http://sub.website.com/images/I/81OJ6qwKxyL._SY450_.jpg":[345,450],"http://sub.website.com/images/I/81OJ6qwKxyL._SY550_.jpg":[422,550],"http://sub.website.com/images/I/81OJ6qwKxyL._SY606_.jpg":[465,606],"http://sub.website.com/images/I/81OJ6qwKxyL._SY679_.jpg":[521,679]},"variant":"MAIN"},{"hiRes":"http://sub.website.com/images/I/71oHZNvsLbL._SL1500_.jpg","thumb":"http://sub.website.com/images/I/31lHNGD-ZDL._SS40_.jpg","large":"http://sub.website.com/images/I/31lHNGD-ZDL.jpg","main":{"http://sub.website.com/images/I/71oHZNvsLbL._SY355_.jpg":[197,355],"http://sub.website.com/images/I/71oHZNvsLbL._SY450_.jpg":[249,450],"http://sub.website.com/images/I/71oHZNvsLbL._SY550_.jpg":[305,550],"http://sub.website.com/images/I/71oHZNvsLbL._SY606_.jpg":[336,606],"http://sub.website.com/images/I/71oHZNvsLbL._SY679_.jpg":[376,679]},"variant":"PT01"},{"hiRes":"http://sub.website.com/images/I/91VCJAcIPEL._SL1500_.jpg","thumb":"http://sub.website.com/images/I/51G1gCkOFzL._SS40_.jpg","large":"http://sub.website.com/images/I/51G1gCkOFzL.jpg","main":{"http://sub.website.com/images/I/91VCJAcIPEL._SX355_.jpg":[355,341],"http://sub.website.com/images/I/91VCJAcIPEL._SX450_.jpg":[450,433],"http://sub.website.com/images/I/91VCJAcIPEL._SX425_.jpg":[425,409],"http://sub.website.com/images/I/91VCJAcIPEL._SX466_.jpg":[466,448],"http://sub.website.com/images/I/91VCJAcIPEL._SX522_.jpg":[522,502]},"variant":"PT02"},{"hiRes":"http://sub.website.com/images/I/912B68GN4aL._SL1500_.jpg","thumb":"http://sub.website.com/images/I/51elravQx6L._SS40_.jpg","large":"http://sub.websi

An interesting question. In my understanding, the global flag cannot be "emulated" with other Regex syntax features.
One could try to emulate the global flag by a Regex repetition. You could expand your Regex so that it would match all appearances of "hiRes":... in a repetition loop. But then, you would see that although several URLs would be matched because of the loop, only the last appearance would be captured.
Switching on the global flag does more than just "continue looking". It switches on collecting more than one capture in an array. Having just a Regex loop does not do the same.
I'd like to show two examples of what this means. To test the examples, use e.g. https://regex101.com/.
Here is a simple example, first with the global flag:
Given text: a i b i c i
Regex: /(i)/g
Result: array of three strings, [0]="i" Pos.2, [1]="i" Pos.6, [2]="i" Pos.10
Now without the global flag. To match more, we must add a repetition to the Regex that embraces several "i", and a condition that ignores text between two "i". Like this:
Given text: a i b i c i
Regex: /(?:(i)[^i]*)+/
Result: array of one string, [0]="i" Pos.10
This seems puzzling at first, but it is correct. The Regex matches from position 2 through 10, and from that match it captures the last "i" at position 10. So the repetition in the Regex causes not several captures but one longer match. This is very different from what the global flag does.
To be precise, this behavior is called "greedy": the Regex tries to match as much as possible. With the "U" flag or with lazy quantifiers, you can make the Regex "ungreedy". In that case, in the example above, the captured "i" will be the one at position 2.
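The three cases above can be reproduced outside regex101. A small sketch in Python, assuming its `re` module, where `finditer` stands in for the global flag and `+?` stands in for the "ungreedy" variant:

```python
import re

TEXT = "a i b i c i"

# "Global" behaviour: every match is collected separately.
global_hits = [(m.group(1), m.start(1)) for m in re.finditer(r"(i)", TEXT)]

# One match with a repeated group: only the last iteration's capture survives.
greedy = re.search(r"(?:(i)[^i]*)+", TEXT)

# Lazy outer quantifier: the loop stops after its first iteration.
lazy = re.search(r"(?:(i)[^i]*)+?", TEXT)

print(global_hits)                     # [('i', 2), ('i', 6), ('i', 10)]
print(greedy.span(), greedy.start(1))  # (2, 11) 10
print(lazy.span(), lazy.start(1))      # (2, 6) 2
```

Note that Python reports spans with an exclusive end index, so a match through position 10 shows up as (2, 11).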
As a more complex example, just enhance your initial Regex. It must ignore the text from the URL until the next "hiRes", and a repetition must be wrapped around it. Here it is:
/\{(?:"hiRes":"([A-Za-z0-9%\/_:.-]+)"(?:[^"]|"(?!hiRes))*)+/
The second part means: match as many characters as possible that are not a quote, or that are a quote not followed by hiRes. This way the expression digs forward to the beginning of the next "hiRes". Then the repetition comes in and it starts over with "hiRes".
Try it out. It will capture only the last URL in your text.
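If you would rather check this in code than in regex101, here is a rough Python sketch. The SAMPLE string is a shortened, made-up stand-in for the question's data (example.com URLs are placeholders), and the escapes that Python does not need have been dropped from both patterns.

```python
import re

# Hypothetical, shortened stand-in for the question's input.
SAMPLE = (
    '[{"hiRes":"http://example.com/a.jpg","thumb":"http://example.com/a_t.jpg"},'
    '{"hiRes":"http://example.com/b.jpg","thumb":"http://example.com/b_t.jpg"}]'
)

# The question's pattern applied "globally": one capture per match.
SINGLE = r'\{"hiRes":"([A-Za-z0-9%/_:.-]+)","thumb'
all_urls = re.findall(SINGLE, SAMPLE)

# The looped pattern from this answer: one match, one surviving capture.
LOOPED = r'\{(?:"hiRes":"([A-Za-z0-9%/_:.-]+)"(?:[^"]|"(?!hiRes))*)+'
looped = re.search(LOOPED, SAMPLE)

print(all_urls)         # both URLs
print(looped.group(1))  # only the last URL: http://example.com/b.jpg
```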
Finally, this tutorial is very comprehensive: http://www.regular-expressions.info/

Is there any upper limit for number of groups used or the length of the regex in Notepad++?

I am new to using regex. I am trying to use the regex find and replace option in Notepad++.
I have used the following regex:
((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))
For the following text:
2/2
+2/+2
-2/-2
2+/2+
2-/2-
But I am able to get full matches only for the first three. For the last two, it gives only partial matches that exclude the final "+" or "-". I am wondering if there is any upper limit on the number of groups that can be used (which I doubt) or on the maximum length of the regex. I am not sure why my regex is failing. If there is anything wrong with my regex, please correct it.

This is not an issue with Notepad++'s regex engine. The problem is that when you have alternations like (?:)|(\+)|(-), the regex engine will attempt to match the different options in the order they are specified. Since you specified an empty group first, it will attempt to match an empty string first, only matching the + or - if it needs to backtrack. This essentially makes the alternation lazy—it will never match any character unless it has to.
vks's answer works perfectly well, but just in case you actually needed those capturing groups separated out, you can do the same thing just by rewriting your alternations like this:
((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))
or even more simply, like this:
((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)
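The effect of alternation order is easy to see by running both versions over the question's five lines. A quick sketch in Python (its `re` engine tries alternatives left to right as well; Notepad++ itself is not involved here, and `matched_text` is just a helper made up for the demo):

```python
import re

LINES = ["2/2", "+2/+2", "-2/-2", "2+/2+", "2-/2-"]

# Empty alternative first, as in the question.
EMPTY_FIRST = re.compile(
    r"((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))"
)
# Empty alternative last, as suggested above.
EMPTY_LAST = re.compile(
    r"((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)"
)

def matched_text(pattern, line):
    """Return the text of the first match in line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None

for line in LINES:
    print(line, "->", matched_text(EMPTY_FIRST, line), "|", matched_text(EMPTY_LAST, line))
```

With the empty alternative first, the trailing sign of 2+/2+ and 2-/2- is dropped because nothing after it forces the engine to backtrack; with it last, all five lines match in full.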

([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)
You can use this simple regex to match all cases. See here.
https://www.regex101.com/r/fG5pZ8/19

Confusion regarding the *? regular expression operator

So I want to search a string, using the below regular expression:
border-.*\.5pt
to find all border-top, border-bottom, etc CSS properties in a file with a border thickness of .5pt. It generally works great, but it's too greedy.
For example all of the below comes back as a single match:
border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt
I want those two CSS properties to be two separate matches.
So I tried to modify my regular expression to:
border-.*?\.5pt
Using ? to make it non-greedy. However, after that modification, nothing matches.
Can anyone explain why I see this behavior? What am I missing?
(If it's worth knowing, I'm using Microsoft Expression Web's 'find with regular expressions' when doing this search.)

There is no one "regular expression" language. While there are broad commonalities, details differ from implementation to implementation. Some regex flavors use - as the non-greedy "0 or more", others use *?. Apparently Microsoft Expression Web uses #.
In short, regexes can differ, so you'll often need to read the manual for the one you're using to find its range of capabilities and detailed syntax (e.g. support for alternation, backtracking, grouping characters, set shorthand, etc.).
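For comparison, this is how the two expressions from the question behave in a flavor that does treat *? as the lazy quantifier. The sketch uses Python's `re` module purely as an example of such a flavor; it says nothing about what Expression Web does.

```python
import re

CSS = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# Greedy: .* runs to the end and backtracks to the LAST ".5pt".
greedy = re.findall(r"border-.*\.5pt", CSS)

# Lazy: .*? stops at the FIRST ".5pt" it can reach.
lazy = re.findall(r"border-.*?\.5pt", CSS)

print(len(greedy))  # 1
print(lazy)         # ['border-top:solid #1F497D .5pt', 'border-bottom:solid #1F497D .5pt']
```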

.*? is the worst so-called "antipattern" in regular expressions. It is commonly used as a "match something until the string I want" pattern, but it isn't one.
Especially when several .*? are combined within one pattern, it may lead to very wrong and unexpected results.
In your case, as stated in the comments, it works. (Maybe you did something wrong?)
However, it is always a good idea to be more specific when writing a regex pattern.
Always keep in mind that .*? can be anything, including things you really don't want to match!
In your example, I would use something like this: border-(?:[^:]+):\s*(?:[^\s]+)\s+(?:\#[a-fA-F0-9]{6})\s+(?:\d*(?:\.\d+)?)pt;?
It is more specific but still meets the given requirements, ignores any insignificant whitespace, and even matches border widths regardless of whether they are written as .2, 3 or 4.1. If you remove the ?: from the individual groups, you can also capture each attribute if required: position, border type, color and thickness.
The pattern border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;? with your string border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt will match:
First Match:
1. top
2. solid
3. #1F497D
4. .5
Second Match:
1. bottom
2. solid
3. #1F497D
4. .5
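The same result can be checked in code. A short sketch in Python (chosen only for illustration; the pattern is the capturing version quoted above, unchanged):

```python
import re

CSS = "border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt"

# Capturing version of the pattern: position, border type, color, thickness.
BORDER = re.compile(
    r"border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;?"
)

# With several groups, findall returns one tuple of captures per match.
borders = BORDER.findall(CSS)
for position, style, color, width in borders:
    print(position, style, color, width)
```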

Match two whole words in Regex

I am struggling to find a solution for matching two successive whole words using Regular Expression. I have a text box where the user can type in their search criteria, enclosed by quotations for exact matches. The quotes and space (if any) are then replaced by RegEx expressions. Here is an example:
User enters: "Apple Orange"
Converted to:
\bApple\W+(?:\w+\W+){1,6}?Orange\b
Then, my RegEx match would be based on this converted criteria. The instructions are from www.regular-expressions.info/near.html
Maybe I am going about this entirely the wrong way? I am using Visual Studio. Any help is appreciated.

If you want an exact match when a user uses quotes, then you should just remove the quotes and do a straight string comparison (equality, not contains).
Update:
Based on comments below, you would just do the same thing as with a single word match:
Single word:
\bApple\b
Double word:
\bApple Orange\b
The idea is that the user enters the search term and you match exactly that, so you wouldn't be doing pattern matching on the term itself, just on its boundaries (the \b wrapped around it). There's no reason to touch the search term itself (all that stuff between Apple and Orange that you were trying to do), because even the space between the two words is part of the search. The exception is if you want to be a bit flexible: for example, if the user enters "Apple[lots of space here]Orange" and you want to count that as a single space, you could do
\bApple\s+Orange\b
...but then you're kind of deviating from the whole "exact match" theme.
Side note: You said in your comment that for "CrabApple OrangeCrush" you did not want "Apple Orange" to match, which is why you use the \b word boundaries. Personally, I would allow that to match, or at least offer an option to search in that manner.
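Putting the pieces together, building the pattern from the user's quoted phrase could look something like the sketch below. It is written in Python rather than whatever language your Visual Studio project uses, and `exact_phrase_pattern` is a made-up helper name; the escaping step is my own addition so that characters like . or + in the search term are treated literally.

```python
import re

def exact_phrase_pattern(phrase):
    """Build \\bword1\\s+word2...\\b from a user phrase, escaping each word."""
    words = phrase.split()
    body = r"\s+".join(re.escape(word) for word in words)
    return re.compile(r"\b" + body + r"\b")

pattern = exact_phrase_pattern("Apple Orange")

print(bool(pattern.search("An Apple Orange smoothie")))  # True
print(bool(pattern.search("Apple    Orange")))           # True
print(bool(pattern.search("CrabApple OrangeCrush")))     # False
```

One caveat: \b only behaves as a whole-word boundary when the phrase starts and ends with a word character, so a term like "C++" would need different handling.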
