For example, for this string,
div.img-wrapper img[title="Hello world"]
I want to match the first space but not the second space (which is enclosed in []). What is the regex?
The following expression will do the job by using a look ahead assertion.
_(?=[^[\]]*(\[|$))
The underscore represents a space. This expression does not support nested brackets, because regular expressions are not powerful enough to express nested matched structures.
_ Match the space and
(?= assert that it is not inside brackets
[^[\]]* by matching all characters except brackets
( followed by either
\[ an opening bracket (a space inside brackets
will have a closing bracket at this position)
| or
$ no more characters (end of line).
)
)
UPDATE
Here is another (and more beautiful) solution using a negative look ahead assertion.
_(?![^[\]]*])
It asserts that the next bracket after a space is not a closing bracket.
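As a minimal sketch of the negative look ahead version, here it is applied in Python to split the example selector. The function name split_selector is only illustrative, the underscore is replaced by a real space, and the brackets inside the character class are escaped for readability.

```python
import re

# A space that is NOT followed by "...]" without another bracket in between,
# i.e. a space that sits outside [...]. Nested brackets are not supported.
OUTSIDE_BRACKETS = re.compile(r" (?![^\[\]]*\])")

def split_selector(selector):
    """Split on spaces that are not enclosed in square brackets."""
    return OUTSIDE_BRACKETS.split(selector)

print(split_selector('div.img-wrapper img[title="Hello world"]'))
```

Only the first space acts as a separator, so the attribute value keeps its inner space.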
Related
Is there a way to use a single regular expression to match only within another match? For example, if I want to remove spaces from a string, but only within parentheses:
source : "foobar baz blah (some sample text in here) and some more"
desired: "foobar baz blah (somesampletextinhere) and some more"
In other words, is it possible to restrict matching to a specific part of the string?
In PCRE a combination of \G and \K can be used:
(?:\G(?!^)|\()[^)\s]*\K\s+
\G continues where the previous match ended
\K resets beginning of the reported match
[^)\s] matches any character not in the set
The idea is to chain matches to an opening parenthesis. The chain links are either [^)\s]* or \s+. To report only the whitespace, \K resets the start of the match just before it. This solution does not require a closing ).
In other regex flavors that support \G but not \K, capturing groups can help out. E.g., search for
(\G(?!^)|\()([^)\s]*)\s+
and replace with the captures of the two groups (depending on the language: $1$2 or \1\2).
Further, there is (*SKIP)(*F), a PCRE feature for skipping over certain parts. It is often used together with The Trick. The idea is simple: skip this(*SKIP)(*F)|match that. This can also be worked around with capture groups, e.g. replace ([^)(]*\(|\)[^)(]*)|\s with $1
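Python's built-in re module supports neither \G, \K, nor (*SKIP)(*F), so the capture-group workaround is the variant that carries over directly. A minimal sketch, assuming balanced, non-nested parentheses (the function name is hypothetical):

```python
import re

# Alternative 1 (captured, kept): text up to and including "(",
# or ")" plus the text that follows it.
# Alternative 2 (not captured, dropped): a single whitespace character,
# which can only be reached between "(" and ")".
PATTERN = re.compile(r"([^)(]*\(|\)[^)(]*)|\s")

def strip_spaces_in_parens(text):
    # Put back group 1 when it took part in the match, otherwise emit nothing.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

source = "foobar baz blah (some sample text in here) and some more"
print(strip_spaces_in_parens(source))
```

One caveat of this sketch: a string that contains no parentheses at all never hits the first alternative, so all of its whitespace would be removed.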
One idea is to replace any space between parentheses using a lookahead pattern (the pattern starts with a literal space):
 (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()
The positive lookahead checks that the space is followed by optional space-separated words (([^\s\(]+ )*) and then a final word ending in the closed parenthesis (\S*\)).
Detailed Regex Explanation:
(leading space): a literal space
(?=([^\s\(]+ )*\S*\)): positive lookahead, what must follow the space
([^\s\(]+ )*: any run of characters other than whitespace and the open parenthesis, followed by a space (this group is optional and may repeat)
\S*\): any non-space characters, followed by the closed parenthesis
(?!\S*\s*\(): negative lookahead, what must not follow the space
\S*: any non-space characters (optional), followed by
\s*: any whitespace characters (optional), followed by
\(: the open parenthesis
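The lookahead pattern works unchanged in Python's re module. A small sketch, assuming single spaces between words and no nested parentheses (the function name is illustrative):

```python
import re

# A literal space that is followed by words leading up to ")" (positive
# lookahead) and that is not directly in front of a word followed by "("
# (negative lookahead).
SPACE_IN_PARENS = re.compile(r" (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()")

def remove_spaces_in_parens(text):
    return SPACE_IN_PARENS.sub("", text)

source = "foobar baz blah (some sample text in here) and some more"
print(remove_spaces_in_parens(source))
```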
I want to remove text between two strings using regular expression in notepad++. Here is my full string
[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
I want final string like this
[insertedOn]) VALUES (N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
Here I removed 1, from the string. The leading number (1, 2, 3, ...) is incremental.
I tried a lot of expressions, but none of them worked. Here is one of them: (VALUES ()(?s)(.*)(, N')
How can I remove this?
You may use
(VALUES \().*?,\s*(N')
and replace with $1$2. Note that if the part of the string to be removed can contain line breaks, you should enable the ". matches newline" option. If N and VALUES must be matched only when in ALLCAPS, make sure the Match case option is checked.
Pattern details
(VALUES \() - Group 1 (later referred with $1 from the replacement pattern): a literal substring VALUES (
.*? - any 0+ chars, as few as possible, up to the leftmost occurrence of the subsequent subpatterns
,\s* - a comma and 0+ whitespaces (use \h instead of \s to only match horizontal whitespace chars)
(N') - Group 2 (later referred with $2 from the replacement pattern): a literal substring N'.
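The same search and replace can be tried outside Notepad++. A minimal sketch in Python, where the replacement is written \1\2 instead of $1$2 (the function name is hypothetical):

```python
import re

# Group 1 keeps "VALUES (", the lazy .*? skips the number up to the first
# comma, and group 2 keeps the "N'" prefix of the following value.
PATTERN = re.compile(r"(VALUES \().*?,\s*(N')")

def drop_leading_id(line):
    return PATTERN.sub(r"\1\2", line)

line = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"
print(drop_leading_id(line))
```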
You should first escape the literal ( after VALUES: \(
Even after doing so, the greedy .* in your regex, combined with the s (DOTALL) flag, makes the engine match up to the end of the input string and then backtrack to the last occurrence of , N', which produces unexpected matches.
To improve your regex you should 1) make .* ungreedy, 2) remove (?s), and 3) escape (:
(VALUES \().*?, (N')
To be more precise in matching you'd better search for:
VALUES \(\K\d+, *(?=N')
and replace with nothing.
Breakdown:
VALUES \( Match VALUES ( literally
\K Reset match
\d+, * Match digits preceding a comma and optional spaces
(?=N') Followed by N'
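\K is available in Notepad++ but not in every engine. In Python's re module, for instance, a fixed-width lookbehind can stand in for it. This is a sketch of that substitution, not the pattern from the answer itself, and the function name is illustrative:

```python
import re

# (?<=VALUES \() plays the role of "VALUES \(\K": the prefix must be there
# but is not part of the match, so only the number, the comma and optional
# spaces are removed.
PATTERN = re.compile(r"(?<=VALUES \()\d+, *(?=N')")

def remove_id(line):
    return PATTERN.sub("", line)

print(remove_id("[insertedOn]) VALUES (23, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"))
```

Because \d+ only accepts digits, a line whose first value is not a number is left untouched.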
I have following input:
!foo\[bar[bB]uz\[xx/
I want to match everything from the start up to the first unescaped [, including any escaped bracket \[, and omitting leading characters that are in the [!#\s] group.
Expected output:
foo\[bar
I've tried with:
(?![!#\s])[^/\s]+\[
But it returns:
foo\[bar[bB]uz\[
Java: Use Lookbehind
(?<=!)(?:\\\[|[a-z])+
Explanation
The lookbehind (?<=!) asserts that what precedes the current position is the character !
The non-capture group (?:\\\[|[a-z]) matches \[ OR | a letter between a and z
The + causes the group to be matched one or more times
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
You can use this regex:
!((?:[^[\\]*\\\[)*[^[]*)
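A quick way to check this pattern is to run it on the sample input. The sketch below uses Python rather than Java, escapes the [ inside the character classes for readability, and assumes the input starts with a literal ! as in the question (the function name is hypothetical):

```python
import re

# After the leading "!", group 1 collects zero or more chunks that end in an
# escaped bracket "\[", followed by plain text up to the first unescaped "[".
PATTERN = re.compile(r"!((?:[^\[\\]*\\\[)*[^\[]*)")

def leading_token(text):
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(leading_token(r"!foo\[bar[bB]uz\[xx/"))
```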
Add a ? after [^/\s]+ to catch the shortest group possible
Add \w+ to the end to catch the first group of alphanumeric characters after \[
Result :
(?![!#\s])[^\/\s]+?\[\w+
You can try this pattern:
(?<=^[!#\s]{0,1000})(?:[^!#\s\\\[]|\\.)(?>[^\[\\]+|\\.)*(?=\[)
pattern details:
The beginning is a lookbehind and means: preceded by zero or more forbidden characters at the start of the string
(?:[^!#\s\\\[]|\\.) ensures that the first character is an allowed character or an escaped character.
(?>[^\[\\]+|\\.)* describes the content: everything that is not a [ or a \, or an escaped character. (Note that this subpattern can also be written as (?:[^\[\\]|\\.)*)
(?=\[) checks that the next character is a literal opening square bracket. (Since all escaped characters are matched by the preceding group, you can be sure that this one is not escaped.)
link to fiddle (push the Java button)
Use a negated character class for the start (i.e. the match must not start with a special char), then a reluctant quantifier (which stops at the first hit), with a negative look behind to skip over escaped brackets:
[^!#\s].*?(?<!\\)\[
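Note that this pattern consumes the unescaped bracket, so the match ends in [. The sketch below, in Python rather than Java, shows that behaviour next to a variation that moves the bracket test into a lookahead so the result equals the expected output. The variation and both function names are assumptions, not part of the answer above.

```python
import re

# Pattern as given: the match includes the closing unescaped "[".
WITH_BRACKET = re.compile(r"[^!#\s].*?(?<!\\)\[")
# Variation: only look ahead for the unescaped "[", do not consume it.
WITHOUT_BRACKET = re.compile(r"[^!#\s].*?(?=(?<!\\)\[)")

def match_with_bracket(text):
    m = WITH_BRACKET.search(text)
    return m.group(0) if m else None

def match_without_bracket(text):
    m = WITHOUT_BRACKET.search(text)
    return m.group(0) if m else None

sample = r"!foo\[bar[bB]uz\[xx/"
print(match_with_bracket(sample))
print(match_without_bracket(sample))
```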
I have a string that contains a regular expression within square brackets, and it can contain more than one item inside square brackets. Below is an example of a string I'm using:
[REGEX:^([0-9])*$][REGEXERROR:That value is not valid]
In the example above, I'd like to match for the item [REGEX:^([0-9])*$], but I can't figure out how.
I thought I'd try using the regular expression \[REGEX:.*?\], but it matches [REGEX:^([0-9] (i.e., it stops at the first ] it finds).
I also tried \[REGEX:.*\], but it matches everything right to the end of the string.
Any ideas?
Assuming you are using PCRE, this should handle one level of nested brackets inside the regular expression:
\[REGEX:[^\[]*(\[[^\]]*\][^\[]*)*\]
This technique is called unrolling. The basic idea of this regex is:
1. match the starting bracket
2. match all characters that are not opening brackets
3. match one bracketed group
4. match all trailing characters that are not opening brackets
5. then repeat 3 and 4 until the last closing bracket comes
Explanation with free-space:
\[ # start brackets
REGEX: # plain match
[^\[]* # match any symbols other than [
( # then match nested brackets
\[ # the start [ of nested
[^\]]* # anything inside the bracket
\] # closing bracket
[^\[]* # trailing symbols after brackets
)* # repeatable
\] # end brackets
Reference: Mastering Regular Expressions
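The unrolled pattern is not PCRE-specific. As a sketch, here it is in Python with re.VERBOSE, which allows the free-space layout to be kept in the source. The group is made non-capturing, which does not change what is matched, and the function name is illustrative.

```python
import re

UNROLLED = re.compile(r"""
    \[REGEX:          # opening bracket and the literal tag
    [^\[]*            # anything that is not an opening bracket
    (?:               # then, zero or more times:
        \[ [^\]]* \]  #   one bracketed group
        [^\[]*        #   trailing text up to the next opening bracket
    )*
    \]                # closing bracket of the whole item
""", re.VERBOSE)

def find_regex_item(text):
    m = UNROLLED.search(text)
    return m.group(0) if m else None

sample = "[REGEX:^([0-9])*$][REGEXERROR:That value is not valid]"
print(find_regex_item(sample))
```

The engine first tries to swallow the REGEXERROR item as another bracketed group, fails to find a final ], and backtracks to the closing bracket of the first item.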
Trying to create a pattern that matches an opening bracket and gets everything between it and the next space it encounters.
I thought \[.*\s would achieve that, but it gets everything from the first opening bracket on. How can I tell it to break at the next space?
\[[^\s]*\s
The .* is greedy and will eat everything, including spaces, up to the last whitespace character. If you replace it with \S* or [^\s]*, it will match only a chunk of zero or more non-whitespace characters.
Escaping the opening bracket might be needed. If you negate \s as [^\s], the expression eats everything except whitespace and then one whitespace character, which means it stops at the first space.
You could use a reluctant quantifier:
\[.*?\s
Or instead match on all non-space characters:
\[\S*\s
Use this:
\[[^ ]*
This matches the opening bracket (\[) and then everything except space ([^ ]) zero or more times (*).
I suggest using \[\S*(?=\s).
\[: Match a [ character.
\S*: Match 0 or more non-space characters.
(?=\s): Require a whitespace character next, but don't include it in the match. This feature is called a zero-width positive look-ahead assertion and makes sure your pattern only matches if it is followed by a space, so it won't match at the end of a line.
You might get away with \[\S*\s if you don't care about groups and want to include the final space, but you would have to clarify exactly which patterns need matching and which should not.
You want to replace . with [^\s]; this matches "not whitespace" instead of the "anything" that . implies.
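To compare the suggestions side by side, here is a small sketch in Python. The sample text and the helper name first_match are made up for illustration.

```python
import re

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

text = "x [foo bar] y [baz qux] z"

print(repr(first_match(r"\[.*\s", text)))       # greedy: runs to the last whitespace
print(repr(first_match(r"\[.*?\s", text)))      # reluctant: stops at the first whitespace
print(repr(first_match(r"\[\S*\s", text)))      # non-space run plus the space
print(repr(first_match(r"\[\S*(?=\s)", text)))  # lookahead: space not included
print(re.findall(r"\[\S*(?=\s)", text))         # every bracket-led chunk
```

The reluctant and the \S* variants return the same text here. They differ mainly in how they get there, since \S* can never run past a space in the first place.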