I am trying to create a regex that matches every tag except [first].
# Trying to match:
# [second]
# [first.second]
# [first.third]
[first]
# something = else
[second]
test = yes
[first.second]
[first.third]
I was trying ^\[((?!first).*)\]$
https://regex101.com/r/1fz1CW/1
This seems to match [second] in the example above, but I can't figure out why it doesn't match [first.second] or [first.third]. I was thinking I may need word boundaries, but I can't seem to get them to work.
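The lookahead (?!first) rejects every tag whose name merely starts with first, which is why [first.second] and [first.third] fail. Put the closing bracket inside the lookahead so that only the exact tag [first] is excluded, and use [^\]\[]* instead of .* so the tag name cannot contain brackets: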
^\[((?!first\])[^\]\[]*)\]$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
first 'first'
--------------------------------------------------------------------------------
\] ']'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[^\]\[]* any character except: '\]', '\[' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\] ']'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
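To see the difference between the two patterns, here is a minimal Java sketch. The class name TagFilter and the helper tags are made up for illustration. It assumes the config text is searched as a whole with Pattern.MULTILINE, so ^ and $ apply to each line, and it runs both the question's pattern and the answer's pattern over the sample from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFilter {
    // Question's attempt: (?!first) rejects every tag whose name starts with "first".
    static final Pattern ORIGINAL = Pattern.compile("^\\[((?!first).*)\\]$", Pattern.MULTILINE);
    // Answer: (?!first\]) rejects only the exact tag [first].
    static final Pattern FIXED = Pattern.compile("^\\[((?!first\\])[^\\]\\[]*)\\]$", Pattern.MULTILINE);

    // Collects group 1 (the tag name) of every line the pattern matches.
    static List<String> tags(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# Trying to match:", "# [second]", "# [first.second]", "# [first.third]",
                "[first]", "# something = else", "[second]", "test = yes",
                "[first.second]", "[first.third]");
        System.out.println(tags(ORIGINAL, sample)); // [second]
        System.out.println(tags(FIXED, sample));    // [second, first.second, first.third]
    }
}
```

Only lines that start with [ are considered, so the commented-out # [second] line is ignored by both patterns.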
Related
I have a regular expression looking for .svg files under the icons-wc/icons directory for a Webpack svgo-loader.
/icons-wc\\icons\\.*\.svg$/
I'd now like to find all .svg files outside the icons-wc/icons directory, but I'm not sure how to approach it. I've tried something like this, but it doesn't seem to work; it selects too eagerly:
/(?<!icons-wc\\icons)\\.*\.svg$/
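A lookbehind only checks the text immediately before the position where it sits. So (?<!icons-wc\\icons)\\ succeeds at any backslash that is not directly preceded by icons-wc\icons, and a path inside that directory still matches at an earlier backslash. Instead, anchor a negative lookahead at the start that scans the whole path: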
/^(?!.*(?:^|[\\\/])icons-wc[\\\/]icons[\\\/]).*\.svg$/
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
icons-wc 'icons-wc'
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
icons 'icons'
--------------------------------------------------------------------------------
[\\\/] any character of: '\\', '\/'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
svg 'svg'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
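One quick way to check the answer's pattern outside Webpack is a small Java sketch. The constructs it uses (anchors, lookahead, alternation, character classes) behave the same way in java.util.regex as in JavaScript. The class name and sample paths are hypothetical. Note that each backslash in the regex has to be doubled again inside a Java string literal:

```java
import java.util.regex.Pattern;

public class SvgOutsideIcons {
    // The answer's pattern; the regex class [\\/] accepts either a backslash or a forward slash.
    static final Pattern OUTSIDE_ICONS =
            Pattern.compile("^(?!.*(?:^|[\\\\/])icons-wc[\\\\/]icons[\\\\/]).*\\.svg$");

    static boolean isSvgOutsideIcons(String path) {
        return OUTSIDE_ICONS.matcher(path).find();
    }

    public static void main(String[] args) {
        String[] paths = {
                "C:\\project\\icons-wc\\icons\\close.svg", // inside icons-wc\icons: rejected
                "src/icons-wc/icons/nested/menu.svg",       // deeper inside: rejected
                "src/components/logo.svg",                  // elsewhere: accepted
                "src/my-icons-wc/icons/star.svg",           // a different folder: accepted
                "src/components/logo.png"                   // not an .svg file: rejected
        };
        for (String p : paths) {
            System.out.println(p + " -> " + isSvgOutsideIcons(p));
        }
    }
}
```

The (?:^|[\\\/]) part keeps a folder such as my-icons-wc/icons from being treated as icons-wc/icons.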
I am learning regex and am working on finding a regex that satisfies the conditions below:
Check the contents between "<NoteText>" and "</NoteText>".
If there is one or more "<" symbol not followed by "!", return all the identified "<" symbols.
Example:
<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>
This should return the 3 "<" before RED and the 1 "<" before 72901.
Initially I tried the negative-lookahead pattern below:
<(?!!)
But it returns the "<" before the "NoteText" phrase as well.
I am not sure how to limit the area of filtering in between "<NoteText>" and "</NoteText>".
The following attempt did not work either:
(?:<NoteText>.*)(<(?!!)).*(?:<\/NoteText>)
PCRE, not pretty, but working:
(?:\G(?!\A)|<NoteText>)(?:(?!<\/?NoteText>).)*?\K<(?!!)(?=(?:(?!<\/?NoteText>).)*?<\/NoteText>)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
<NoteText> '<NoteText>'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the least amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
! '!'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
This is a working method in Java 8. Remember that this works only if you don't have nested <NoteText> tags.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoteTextCount {
    public static void main(String[] args) {
        String myString = "<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>";
        Pattern innerPattern = Pattern.compile("<(?!!)"); // a '<' not followed by '!'
        Matcher outerMatcher = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)").matcher(myString);
        while (outerMatcher.find()) {
            String content = outerMatcher.group(); // the content of the current NoteText tag
            Matcher innerMatcher = innerPattern.matcher(content);
            int count = 0;
            while (innerMatcher.find()) count++;
            System.out.println(count); // prints 4 for the example string
        }
    }
}
The code above also works for strings that contain several <NoteText> tags.
If you know you have only one <NoteText> tag, just replace the while with an if.
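Java's java.util.regex supports \G but not \K, so the single-pattern PCRE answer cannot be used there as-is. The following is a minimal sketch, assuming it is acceptable to report positions rather than the matched text. It wraps the wanted < in a capture group instead of resetting the match with \K. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoteTextLt {
    // PCRE answer with \K replaced by a capture group around the wanted '<'.
    // \G(?!\A) resumes where the previous match ended; <NoteText> starts a new run.
    static final Pattern LT_IN_NOTETEXT = Pattern.compile(
            "(?:\\G(?!\\A)|<NoteText>)(?:(?!</?NoteText>).)*?(<)(?!!)"
            + "(?=(?:(?!</?NoteText>).)*?</NoteText>)");

    // Index of every '<' inside <NoteText>...</NoteText> that is not followed by '!'.
    static List<Integer> positions(String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = LT_IN_NOTETEXT.matcher(s);
        while (m.find()) {
            out.add(m.start(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>";
        System.out.println(positions(s)); // [31, 32, 33, 41]
    }
}
```

Each find() resumes at \G, so the matches chain through one NoteText element, and the lookahead for </NoteText> stops the chain at the closing tag. Like the other answer, this assumes <NoteText> elements are not nested.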
I am trying to match multiple occurrences between a lookbehind and a lookahead.
Let's say I have the following string:
$ Hi, this is an example #
Regex: (?<=$).+a.+(?=#)
I am expecting it to return both 'a' within those boundaries, is there a way to do it with only one regex?
Engine: Python
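If your engine allows quantifiers in the lookbehind, use the following. Python's built-in re module does not, because it requires fixed-width lookbehind; the third-party regex module does.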
(?<=\$[^$#]*?)a(?=[^#]*#)
See proof.
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more
times (matching the least amount possible))
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead
PCRE pattern:
(?:\G(?<!^)|\$)[^$#]*?\Ka(?=[^#]*#)
See another proof
Explanation
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead
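Java also supports \G but not \K, so the PCRE answer can be adapted by capturing the a in a group instead of resetting the match with \K. Below is a minimal Java sketch that uses [^$#]*? as in the explanation, so the scan cannot run past a #. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenMarkers {
    // \G(?<!^) continues after the previous match; \$ starts a new run at a '$'.
    // The wanted 'a' is captured in group 1 because Java has no \K.
    static final Pattern A_BETWEEN = Pattern.compile("(?:\\G(?<!^)|\\$)[^$#]*?(a)(?=[^#]*#)");

    // Index of every 'a' that lies after a '$' and before the next '#'.
    static List<Integer> positions(String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = A_BETWEEN.matcher(s);
        while (m.find()) {
            out.add(m.start(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions("$ Hi, this is an example #")); // [14, 19]
    }
}
```

The two positions are the a in "an" and the a in "example", which are the two matches the question expects.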
What regex accepts a string like this?
Starts with EDO
Has many characters (words, numbers, hyphens) in between
Does not contain 24 or | (pipe)
Example:
Should match
edo-<<characters>>-<<characters>>-<<numbers>>
BUT NOT
edo-<<characters>>-<<characters>>-<<numbers>> | <<characters>>- <<characters>>- <<numbers>>
The string does not have a constant length
A negative lookahead lets you check that the string doesn't contain 24 or |.
The regex can be written as
/^edo(?!.*(24|\|))[-a-zA-Z0-9]+$/i
Regex Demo
How it matches
^ Anchors the regex at the start of the string
edo The anchor ensures that the string starts with edo
(?!.*(24|\|)) Negative lookahead assertion. It checks that the rest of the string doesn't contain 24 or |. If it doesn't, matching proceeds with the remaining pattern; if it does, the match fails.
[-a-zA-Z0-9]+ Matches letters, digits, or -
$ anchors the regex at the end of the string.
^EDO(?!.*(?:(?<!\d)24(?!\d)|\|))[a-zA-Z0-9 -]+$
Try this. This should work. Use the flags gmi.
See demo.
https://regex101.com/r/fA6wE2/37
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
EDO 'EDO'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
24 '24'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9 -]+ any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', ' ', '-' (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
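To compare the two answers, here is a minimal Java sketch that compiles both with Pattern.CASE_INSENSITIVE, the Java counterpart of the i flag. The g and m flags are not needed when each string is tested on its own. The class name, helper and inputs are illustrative only:

```java
import java.util.regex.Pattern;

public class EdoCheck {
    // First answer: rejects any "24" or "|"; allows only letters, digits and '-'.
    static final Pattern ANY_24 =
            Pattern.compile("^edo(?!.*(24|\\|))[-a-zA-Z0-9]+$", Pattern.CASE_INSENSITIVE);
    // Second answer: rejects "24" only when it is not part of a longer number, and "|"; spaces allowed.
    static final Pattern STANDALONE_24 =
            Pattern.compile("^EDO(?!.*(?:(?<!\\d)24(?!\\d)|\\|))[a-zA-Z0-9 -]+$", Pattern.CASE_INSENSITIVE);

    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
                "edo-abc-def-123",
                "edo-abc-def-123 | ghi- jkl- 456",
                "edo-124",
                "edo-24-x",
                "EDO-abc"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(ANY_24, s) + " / " + accepts(STANDALONE_24, s));
        }
    }
}
```

The two answers differ on inputs such as edo-124. The first rejects any 24, while the second only rejects 24 when it is not part of a longer number.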
I have a regular expression like this:
s/<(?:[^>'"]|(['"]).?\1)*>//gs
and I don't know what exactly it means.
The regex looks intended to remove HTML tags from input.
It matches text that begins with < and ends with >, containing characters other than > and quotes, or quoted strings (which may contain >). But it appears to have an error:
The .? says that a quoted string may contain only 0 or 1 characters; it was probably intended to be .*? (0 or more characters). Also, backtracking can do things like making the . match a quote in some odd cases. To prevent this, the (?: ... ) grouping needs to become an atomic group, (?> ... ) (> instead of :).
This tool can explain the details: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E
NODE EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^>'"] any character except: '>', ''', '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
['"] any character of: ''', '"'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
.? any character except \n (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
> '>'
So, as ysth also mentions, it tries to remove HTML tags.
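The suggested fix can be compared with the original using a short Java sketch. Pattern.DOTALL plays the role of the s modifier, and replaceAll("") plays the role of the empty replacement in s/.../.../g. The class name and sample HTML are made up for illustration:

```java
import java.util.regex.Pattern;

public class StripTags {
    // The regex from the question.
    static final Pattern ORIGINAL = Pattern.compile("<(?:[^>'\"]|(['\"]).?\\1)*>", Pattern.DOTALL);
    // Suggested fix: quoted strings of any length (.*?) inside an atomic group (?>...).
    static final Pattern FIXED = Pattern.compile("<(?>[^>'\"]|(['\"]).*?\\1)*>", Pattern.DOTALL);

    static String strip(Pattern p, String html) {
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<a title=\"x>y\">link</a>";
        System.out.println(strip(ORIGINAL, html)); // <a title="x>y">link
        System.out.println(strip(FIXED, html));    // link
    }
}
```

With the original .?, the first tag is left in place because its quoted attribute value is longer than one character. Only </a> is removed.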