I am learning regex and am trying to find a pattern that satisfies the conditions below:
Check the contents between "<NoteText>" and "</NoteText>".
If there are one or more "<" symbols not followed by "!", return all of those "<" symbols.
Example:
<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>
This should return the 3 "<" before RED and the 1 "<" before 72901.
Initially I tried the negative lookahead pattern below:
<(?!!)
But it also returns the "<" before "NoteText".
I am not sure how to limit the search to the area between "<NoteText>" and "</NoteText>".
The attempt below did not work either:
(?:<NoteText>.*)(<(?!!)).*(?:<\/NoteText>)
PCRE, not pretty, but working:
(?:\G(?!\A)|<NoteText>)(?:(?!<\/?NoteText>).)*?\K<(?!!)(?=(?:(?!<\/?NoteText>).)*?<\/NoteText>)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
<NoteText> '<NoteText>'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the least amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
! '!'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
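The pattern above relies on \K, which java.util.regex does not support. As a rough sketch only, it could be adapted to Java by wrapping the "<" in a capturing group and reading group 1 instead of resetting the match. The class and method names here (NoteTextAngleBrackets, findOffsets) are made up for illustration, and the sketch assumes the tags are not nested and contain no line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NoteTextAngleBrackets {
    // Same structure as the PCRE pattern; group 1 stands in for \K.
    private static final Pattern LT_IN_NOTE = Pattern.compile(
        "(?:\\G(?!\\A)|<NoteText>)"                   // resume after the last match, or enter a new <NoteText>
        + "(?:(?!</?NoteText>).)*?"                   // skip ahead without crossing a NoteText tag
        + "(<(?!!))"                                  // group 1: a "<" not followed by "!"
        + "(?=(?:(?!</?NoteText>).)*?</NoteText>)");  // a closing tag must still follow

    // Returns the offset of every "<" captured by group 1.
    static List<Integer> findOffsets(String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = LT_IN_NOTE.matcher(input);
        while (m.find()) {
            offsets.add(m.start(1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        String s = "<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>";
        System.out.println(findOffsets(s)); // prints [31, 32, 33, 41]
    }
}
```

Because the lookahead is zero-width, each match ends right after the captured "<", so \G lets the next search continue from there inside the same tag.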
This is a working method in Java 8. Remember that this works only if you don't have nested <NoteText> tags.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myString = "<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>";
// Outer pattern: everything between <NoteText> and </NoteText> (non-nested)
Matcher outerMatcher = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)").matcher(myString);
// Inner pattern: a "<" that is not followed by "!"
Pattern innerPattern = Pattern.compile("<(?!!)");
while (outerMatcher.find()) {
    String content = outerMatcher.group(); // this is the content of the current NoteText tag
    Matcher innerMatcher = innerPattern.matcher(content);
    int count = 0;
    while (innerMatcher.find()) count++;
    System.out.println(count); // this will print 4
}
The code above is also meant to work with strings that contain multiple <NoteText> tags.
If you know there is only one <NoteText> tag, just replace the outer while with an if.
I am trying to create a regex to match any tags not including [first].
# Trying to match:
# [second]
# [first.second]
# [first.third]
[first]
# something = else
[second]
test = yes
[first.second]
[first.third]
I was trying ^\[((?!first).*)\]$
https://regex101.com/r/1fz1CW/1
This seems to match [second] in the example above, but I can't figure out why it doesn't match [first.second] or [first.third]. I was thinking I may need word boundaries, but I can't seem to get them to work.
Use
^\[((?!first\])[^\]\[]*)\]$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
first 'first'
--------------------------------------------------------------------------------
\] ']'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[^\]\[]* any character except: '\]', '\[' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\] ']'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
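The original attempt fails because (?!first) rejects every name that merely starts with "first", while (?!first\]) rejects only the exact tag [first]. A minimal Java sketch to check this against the sample is shown below. It assumes multiline mode so that ^ and $ anchor at each line, and the class and method names (SectionFilter, sections) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying the pattern on the sample input.
class SectionFilter {
    // MULTILINE is an assumption: it lets ^ and $ match at every line.
    private static final Pattern NOT_FIRST = Pattern.compile(
        "^\\[((?!first\\])[^\\]\\[]*)\\]$", Pattern.MULTILINE);

    // Returns the tag names (group 1) of every matching line.
    static List<String> sections(String config) {
        List<String> names = new ArrayList<>();
        Matcher m = NOT_FIRST.matcher(config);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String config = "# [second]\n[first]\n# something = else\n[second]\n"
            + "test = yes\n[first.second]\n[first.third]\n";
        System.out.println(sections(config)); // prints [second, first.second, first.third]
    }
}
```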
I have a regular expression looking for .svg files under the icons-wc/icons directory for a Webpack svgo-loader.
/icons-wc\\icons\\.*\.svg$/
I'd now like to find all .svg files outside the icons-wc/icons directory, but I'm not sure how to approach it. I've tried something like this, but it doesn't seem to work; it seems too eager in what it selects:
/(?<!icons-wc\\icons)\\.*\.svg$/
Use
/^(?!.*(?:^|[\\\/])icons-wc[\\\/]icons[\\\/]).*\.svg$/
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
icons-wc 'icons-wc'
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
icons 'icons'
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
svg 'svg'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
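To try the pattern outside Webpack, here is a small sketch that ports it to Java regex syntax, purely to exercise the logic; the Webpack config itself would keep the JavaScript literal. The class name, method name, and sample paths are assumptions made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical checker for paths outside icons-wc/icons.
class SvgPathFilter {
    // Java port of the JavaScript pattern; [\\/] accepts either path separator.
    private static final Pattern OUTSIDE_ICONS = Pattern.compile(
        "^(?!.*(?:^|[\\\\/])icons-wc[\\\\/]icons[\\\\/]).*\\.svg$");

    static boolean isOutsideIconsDir(String path) {
        return OUTSIDE_ICONS.matcher(path).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "src\\icons-wc\\icons\\arrow.svg", // inside, Windows separators
            "icons-wc/icons/arrow.svg",        // inside, at the start of the path
            "src/assets/logo.svg",             // outside
            "src/assets/logo.png"              // not an .svg file
        };
        for (String p : samples) {
            System.out.println(p + " -> " + isOutsideIconsDir(p));
        }
    }
}
```

The negative lookahead is anchored at the start of the string, so it inspects the whole path before the final .*\.svg$ is allowed to match.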
I have a long xml text and I want to match each product that is available. The text is made of products that are structured like this:
<product>
...
<available>instock</available>
...
</product>
I can match all products with this regex
((?s)<product>.*?<\/product>)
Example: https://regex101.com/r/kz8cn1/1
However, I want to match only those products that have an 'instock' value in their <available> tag.
My solution is this:
((?s)<product>(?=.*?\binstock\b).*?<\/product>)
Unfortunately, this works only partially. I believe the lookahead is not confined to the current product, so it can find 'instock' in a later product, which results in products with 'outofstock' values being matched as well.
Here is my example:
https://regex101.com/r/AHlC0K/1
How should I change my regex so that the lookaround works only in the context of the match?
Use an XML parser. If none is available, you can use
(?s)<product>(?=(?:(?!<\/?product>).)*?\binstock\b).*?<\/product>
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
<product> '<product>'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
product> 'product>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
instock 'instock'
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
product> 'product>'
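Both routes mentioned above can be compared side by side. The following is only a sketch: it runs the regex through java.util.regex and, as the parser route, uses the JDK's built-in DOM parser. The <products> root element, the <name> elements, and all class and method names are assumptions introduced for the example, not part of the original data.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison of the regex and parser approaches.
class InStockProducts {
    private static final Pattern IN_STOCK = Pattern.compile(
        "(?s)<product>(?=(?:(?!</?product>).)*?\\binstock\\b).*?</product>");

    // Counts <product> blocks matched by the tempered-lookahead regex.
    static int countWithRegex(String xml) {
        Matcher m = IN_STOCK.matcher(xml);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Counts <product> elements whose first <available> child text is "instock".
    static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList products = doc.getElementsByTagName("product");
            int n = 0;
            for (int i = 0; i < products.getLength(); i++) {
                NodeList available = ((Element) products.item(i)).getElementsByTagName("available");
                if (available.getLength() > 0
                        && "instock".equals(available.item(0).getTextContent().trim())) {
                    n++;
                }
            }
            return n;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
            + "<product><name>A</name><available>instock</available></product>"
            + "<product><name>B</name><available>outofstock</available></product>"
            + "<product><name>C</name><available>instock</available></product>"
            + "</products>";
        System.out.println(countWithRegex(xml) + " " + countWithDom(xml)); // prints 2 2
    }
}
```

The DOM version checks the <available> element specifically, whereas the regex accepts the word instock anywhere inside the product, which is one reason a parser is the safer choice.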
I'd like to use mustache-style expressions such as {{abc}}. It's pretty easy to write /({{[a-z]+}})/ for those.
However, I cannot get it right for escaped ones like \{{abc}}, which I'd like to exclude from the match list. I tried /((?!\\)({{[a-z]+}}))/ but it doesn't work.
https://regex101.com/r/67nXA2/2
Use
(?<!\\)(?:\\\\)*\K{{[a-z]+}}
See proof
Explanation
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
{{ '{{'
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
}} '}}'
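For engines without \K, such as java.util.regex, the same idea can be sketched with a capturing group: an even number of backslashes is consumed before the tag, and group 1 holds the tag itself. The braces are escaped because Java treats an unescaped { as the start of a quantifier. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; group 1 replaces the \K reset of the PCRE pattern.
class MustacheTags {
    private static final Pattern TAG = Pattern.compile(
        "(?<!\\\\)"                  // no backslash immediately before this point
        + "(?:\\\\\\\\)*"            // zero or more pairs of backslashes (escaped backslashes)
        + "(\\{\\{[a-z]+\\}\\})");   // group 1: the unescaped {{name}} tag

    static List<String> unescapedTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // The text is: {{abc}} \{{def}} \\{{ghi}}
        String text = "{{abc}} \\{{def}} \\\\{{ghi}}";
        System.out.println(unescapedTags(text)); // prints [{{abc}}, {{ghi}}]
    }
}
```

A single backslash escapes the tag, while a doubled backslash is read as an escaped backslash followed by a normal tag, which is why {{ghi}} is still returned.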
I am trying to get multiple matches between a lookbehind and a lookahead.
Let's say I have the following string:
$ Hi, this is an example #
Regex: (?<=$).+a.+(?=#)
I am expecting it to return both 'a' characters within those boundaries. Is there a way to do it with only one regex?
Engine: Python
If your engine allows quantifiers in the lookbehind (Python's built-in re module does not, but the third-party regex module does), use
(?<=\$[^$#]*?)a(?=[^#]*#)
See proof.
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more
times (matching the least amount possible))
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead
PCRE pattern:
(?:\G(?<!^)|\$)[^$#]*?\Ka(?=[^#]*#)
See another proof
Explanation
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead
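The \G technique can also be tried in an engine without \K. Below is a minimal Java sketch in which group 1 captures each 'a' and (?!\A) plays the role of the "not at the start" check. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; group 1 replaces the \K reset of the PCRE pattern.
class DollarHashMatches {
    private static final Pattern A_BETWEEN = Pattern.compile(
        "(?:\\G(?!\\A)|\\$)"   // resume after the previous match, or start at a "$"
        + "[^$#]*?"            // skip ahead without crossing "$" or "#"
        + "(a)"                // group 1: the character we want
        + "(?=[^#]*#)");       // a closing "#" must still follow

    // Returns the offset of every 'a' found between "$" and "#".
    static List<Integer> offsetsOfA(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = A_BETWEEN.matcher(text);
        while (m.find()) {
            offsets.add(m.start(1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(offsetsOfA("$ Hi, this is an example #")); // prints [14, 19]
    }
}
```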