Delete tag [bracket] on QString - regex

I have a text with tags like this:
This is [tag=\"value\"]my text[/tag].
I want to transform it into this:
This is my text.
I have seen that I should use QString.remove(QRegExp(myRegexExpression)), but I have not managed to make it work.
I have tried:
remove(QRegExp("\[[^>]*\]"))
Result: This is .

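You should add the ? operator to your * quantifier to make it lazy, like this:
\[[^>]*?\]
This makes the expression match as few characters as possible after the opening [, so it stops at the first ] available after it instead of the last one. Note that QRegExp does not support the *? syntax on individual quantifiers; with QRegExp, call setMinimal(true) on the expression instead, or switch to QRegularExpression.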

You can use a negated character class [^\\]] that matches any character but a ]:
str.remove(QRegExp("\\[[^\\]]*\\]"));
The problem you have is caused by the [^>]* construct: it matches any character but a > zero or more times, so it runs past the BB tag boundary and goes up to the last ], because * is a greedy quantifier (i.e. it matches as many characters as it can).
My regex breakdown:
\[ - a literal [
[^\]]* - zero or more (greedy, matches as many as it can) characters other than ] (written [^\\]]* inside a C++ string literal)
\] - a literal ].
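To compare the three patterns discussed above side by side, here is a minimal sketch. It uses Python's re module rather than Qt (an assumption made only to keep the example self-contained), and the patterns are written at regex level, without the extra C++ string escaping. The variable names are illustrative.

```python
import re

text = 'This is [tag="value"]my text[/tag].'

# Greedy [^>]* runs up to the last ] in the string, swallowing the text between the tags.
greedy = re.sub(r"\[[^>]*\]", "", text)

# Excluding ] from the class makes each match stop at the first closing bracket.
negated = re.sub(r"\[[^\]]*\]", "", text)

# The lazy quantifier gives the same result for this input.
lazy = re.sub(r"\[[^>]*?\]", "", text)

print(greedy)   # This is .
print(negated)  # This is my text.
print(lazy)     # This is my text.
```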

Related

Regex to extract the characters [duplicate]

I have a text like this;
[Some Text][1][Some Text][2][Some Text][3][Some Text][4]
I want to match [Some Text][2] with this regex;
/\[.*?\]\[2\]/
But it returns [Some Text][1][Some Text][2]
How can I match only [Some Text][2]?
Note: Some Text can contain any character, including [ and ], and the numbers in square brackets can be any number, not only 1 and 2. The Some Text that I want to match can be at the beginning of the line, and there can be multiple Some Texts.
The \[.*?\]\[2\] pattern works like this:
\[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
.*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, as there are subsequent patterns, see below
\]\[2\] - ][2] substring.
So, the .*? gets expanded upon each failure until it finds the leftmost ][2]. Note the lazy quantifiers do not guarantee the "shortest" matches.
Solution
Instead of a .*? (or .*) use negated character classes that match any char but the boundary char.
\[[^\]\[]*\]\[2\]
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
Other examples:
Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
Strings between double quotation marks: "[^"]*" matches "..." with no " inside
Strings between curly braces: \{[^{}]*} matches {...} with no { and } inside
In other situations, when the starting pattern is a multichar string or complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
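The behaviour described above can be checked with a short sketch. It is written in Python's re module (an assumption; the question itself uses a JavaScript regex literal), and the variable names are illustrative only.

```python
import re

text = "[Some Text][1][Some Text][2][Some Text][3][Some Text][4]"

# Lazy dot: starts at the leftmost [ and expands until ][2] is found.
lazy = re.search(r"\[.*?\]\[2\]", text).group()

# Negated character class: cannot cross a [ or ], so only the adjacent pair matches.
negated = re.search(r"\[[^\]\[]*\]\[2\]", text).group()

# Tempered greedy token: the lookahead forbids stepping over another "abc".
tempered = re.search(r"abc(?:(?!abc).)*?def", "abc 0 abc 1 def").group()

print(lazy)      # [Some Text][1][Some Text][2]
print(negated)   # [Some Text][2]
print(tempered)  # abc 1 def
```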
You could try the below regex,
(?!^)(\[[A-Z].*?\]\[\d+\])

Exclude an escaped character from a range

I need to extract an expression between brackets that can include anything except a non-escaped closing bracket.
For example, applying the regexp to [aaa\]bbbbbb] should give the result aaa\]bbbbbb.
I tried this: \[([^(?<!\\)\]]*)\] but it fails.
Any hints?
You may use
\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)]
Or - if there may be any non-escaped [ in-between non-escaped [ and ] (e.g. [a[\[aa\]bbbbbba\[aabbbbbb]), take out the \[:
\[([^\]\\]*(?:\\.[^\]\\]*)*)]
It is an unrolled variant of the \[((?:[^][\\]|\\.)*)] regex.
Details:
\[ - a [
([^\]\[\\]*(?:\\.[^\]\[\\]*)*) - Group 1 capturing:
[^\]\[\\]* - zero or more chars other than [, ] and \ (in some regex flavors, you may write it without escapes - [^][\\]*)
(?:\\.[^\]\[\\]*)* - zero or more sequences of:
\\. - any escaped sequence (\ and any char other than line break chars)
[^\]\[\\]* - zero or more chars other than [, ] and \
] - a closing ].
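As a quick check of the unrolled pattern on the example from the question, here is a minimal sketch in Python's re module (an assumption; the question does not name a regex flavor). The names unrolled, sample and inner are illustrative.

```python
import re

# The first pattern from the answer, unchanged.
unrolled = re.compile(r"\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)]")

sample = r"[aaa\]bbbbbb]"

# Group 1 holds the text between the brackets, escaped ] included.
inner = unrolled.search(sample).group(1)

print(inner)  # aaa\]bbbbbb
```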
This is the simplest regex that (I think) works:
\[(.*?)(?<!\\)\]
which captures the bracketed text as group 1.
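The two approaches agree on the example from the question, but they can differ when the backslash itself is escaped right before the closing bracket. The sketch below illustrates this in Python's re module (an assumption about the flavor); the tricky input is a hypothetical case, not one taken from the question.

```python
import re

lookbehind = re.compile(r"\[(.*?)(?<!\\)\]")
unrolled = re.compile(r"\[([^\]\\]*(?:\\.[^\]\\]*)*)]")

simple = r"[aaa\]bbbbbb]"
# Hypothetical input: a, an escaped backslash, a real closing ], then b]
tricky = r"[a\\]b]"

# Both patterns capture aaa\]bbbbbb from the simple input.
same = lookbehind.search(simple).group(1) == unrolled.search(simple).group(1)

# The lookbehind only checks the single char before ], so it skips the real ].
by_lookbehind = lookbehind.search(tricky).group(1)

# The unrolled pattern consumes escapes pairwise and stops at the real ].
by_unrolled = unrolled.search(tricky).group(1)

print(same)
print(by_lookbehind)
print(by_unrolled)
```

Which behaviour is wanted depends on whether \\ counts as an escaped backslash in the input format.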

How can I check it with regular Expression?

I have a long input string that contains certain field names embedded in it. For instance:
SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'
The actual field name may change, but it is always in the form of word-word. I need to perform a regex replace on the string so that the output will look like this:
SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
In other words, when the field name is enclosed in square-brackets, it should be left untouched, but when it is not, spaces should be inserted on either side of the dash. There are no nested square brackets and the reserved word could be one or more in the string.
You can do this:
Regex.Replace(input, "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])", "$1 - $2")
Here's an explanation of the pattern:
(?<!\[[^-\]]*) - This is a negative look-behind. It asserts that matches cannot be immediately preceded by text that matches the sub-pattern \[[^-\]]*. In other words, the matches we are looking for cannot be preceded by a [ character followed by any number of characters that are not a - or a ].
(\w+)-(\w+) - Matches one or more word-characters, then a dash, and then one or more word characters following the dash. By enclosing the sub-patterns on either side of the dash in capturing groups, we can then refer to their values as $1 and $2 in the replacement pattern.
(?![^-\]]*\]) - This is a negative look-ahead. Similar to the negative look-behind, it asserts that matches cannot be immediately followed by text which matches the sub pattern [^-\]]*\]. In other words, a match cannot be followed by any number of characters that are not a - or a ] and then a closing ].
At first glance, you might assume that you could simply assert that it must not be immediately preceded by a [ character and that it must not be immediately followed by a ] character. In other words, (?<!\[)(\w+)-(\w+)(?!\]). However, that pattern would still match the text ome-nam in the input [some-name] because the text ome-nam is not immediately preceded or followed by the brackets.
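The pitfall of the simpler pattern can be reproduced with a short sketch. It uses Python's re module instead of .NET (an assumption for portability); Python accepts this pattern because both lookarounds are fixed-width, whereas the variable-width lookbehind in the full .NET pattern above is not supported by Python's re.

```python
import re

# The naive pattern: only the chars directly next to the match are checked.
naive = re.compile(r"(?<!\[)(\w+)-(\w+)(?!\])")

# The engine simply shifts the match one char inwards on both sides.
partial = naive.search("[some-name]").group()

# As a result, the bracketed name is still modified by the replacement.
damaged = naive.sub(r"\1 - \2", "[some-name]")

print(partial)  # ome-nam
print(damaged)  # [some - name]
```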
Dim regex As Regex = New Regex("\[[^-]*-[^-]*\]")
Dim match As Match = regex.Match("A long string containing square brackets [some-name]")
If match.Success Then
    Console.WriteLine(match.Value)
End If
Or you could use Regex.IsMatch:
Return Regex.IsMatch("A long string containing square brackets [some-name]",
"\[[^-]*-[^-]*\]")
You may match and capture the [...] substrings to keep them as they are, and match the hyphens everywhere else to replace them:
Dim nStr As String = "SELECT 'some-name' FROM [some-name]"
Dim nResult = Regex.Replace(nStr, "(\[.+?])|\s*-\s*", New MatchEvaluator(Function(m As Match)
        If m.Groups(1).Success Then
            Return m.Groups(1).Value
        Else
            Return " - "
        End If
    End Function))
So, what is happening is:
(\[.+?]) - matches and stores the value of the [...] substring inside the Group(1) buffer: a [, then 1 or more characters, as few as possible, and then a ] (add the RegexOptions.Singleline flag if . should match a newline, too). A negated character class, (\[[^]]+]), can be used here as well.
\s*-\s* - matches a hyphen together with any whitespace around it (\s* stands for zero or more whitespaces, as many as possible, since * is a greedy quantifier), which makes sure we insert just 1 space before and after the -. To match only hyphens that are not preceded or followed with whitespace, (?<!\s)-(?!\s) can be used instead.
If Group 1 matches, then we just re-insert it (Return m.Groups(1).Value), else we insert the space-enclosed hyphen Return " - ".
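The same match-and-keep idea can be sketched outside .NET. The version below uses Python's re module with a replacement callback (an assumption; the answer itself uses a VB.NET MatchEvaluator), and space_hyphens is a hypothetical helper name.

```python
import re

# Alternative 1 captures a bracketed name; alternative 2 matches a hyphen
# together with any whitespace around it.
PATTERN = re.compile(r"(\[[^\]]+\])|\s*-\s*")

def space_hyphens(sql: str) -> str:
    def repl(m: re.Match) -> str:
        # Bracketed names are re-inserted unchanged.
        if m.group(1) is not None:
            return m.group(1)
        # Any other hyphen gets exactly one space on each side.
        return " - "
    return PATTERN.sub(repl, sql)

query = "SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'"
print(space_hyphens(query))
# SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'
```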
Just to check if it exists, you could try
\[[^\]]+-[^\]]+\]
It matches a literal [ and then any characters, except ], up to (including) a hyphen. Then again any characters, except ], up to a literal ].
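A minimal sketch of this existence check, written with Python's re module (an assumption; the question targets VB.NET), could look like this. The name has_bracketed_hyphen is illustrative.

```python
import re

has_bracketed_hyphen = re.compile(r"\[[^\]]+-[^\]]+\]")

# A hyphen inside one pair of brackets is found.
found = bool(has_bracketed_hyphen.search(
    "A long string containing square brackets [some-name]"))

# A hyphen outside the brackets does not count.
not_found = bool(has_bracketed_hyphen.search("SELECT some-name FROM [table]"))

print(found)      # True
print(not_found)  # False
```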
Actually I don't know the vb.net syntax but you can use regex as
/[\s\'](\w+)\-(\w+)/g
This finds a (\w+)-(\w+) that is preceded by a space or a ', and you can then build the replacement from capture groups 1 and 2.

Regex lookahead/lookbehind match for SQL script

I'm trying to analyse some SQLCMD scripts for code quality tests. I have a regex not working as expected:
^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?)
I'm trying to match:
Strings that start with USE (ignore whitespace)
Followed by optional square bracket
Followed by 1 or more non-whitespace characters.
EXCEPT where that text is "master" (case insensitive)
OR EXCEPT where that text is a $ symbol
Expected results:
USE [master] - don't match
USE [$(CompiledDatabaseName)] - don't match
USE [anything_else.01234] - match
Also, the same patterns above without the [ and ] characters.
I'm using Sublime Text 2 as my RegEx search tool and referencing this cheatsheet
Your pattern, ^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?), has several issues. [master|\$] is a character class, not an alternation: you mean a list of alternatives, either a $ or the character sequence master, which needs (...) instead of [...]. Once that is fixed, the lookbehind becomes variable-width (its length is not known beforehand) and thus is invalid in a Boost regex. Your (.)+ capturing is wrong since this group will only contain the last character captured (you could use (.+)), and it also matches spaces, while you need 1 or more non-whitespace characters. Finally, ? is the one or zero times quantifier, but you say you might have 2 opening and closing brackets, so you need a limiting quantifier {0,2}.
You can use
^\h*USE(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master)))\h*\[{0,2}[^]\s]*]{0,2}
Explanation:
^ - start of a line in Sublime Text
\h* - optional horizontal whitespace (if you need to match newlines, use \s*)
USE - a literal case-sensitive character sequence USE
(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master))) - a negative lookahead that makes sure the USE is NOT followed with:
\h* - zero or more horizontal whitespace
\[{0,2} - zero, one or two [ brackets
[^]\s]* - zero or more characters other than ] and whitespace
(?:\$|(?i:master)) - either a $ or a case-insensitive master (we turn off case sensitivity with (?i:...) construct)
\h* - go on matching zero or more horizontal whitespace
\[{0,2} - zero, one or two [ brackets
[^]\s]* - zero or more characters other than ] and whitespace (when ] is the first character in a character class, it does not have to be escaped in Boost/PCRE regexps)
]{0,2} - zero, one or two ] brackets (outside of character class, the closing square bracket does not need escaping)
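To try the expected results from the question, the pattern can be ported to Python's re module. This is only a sketch under two assumptions: Python has no \h, so [ \t] stands in for horizontal whitespace, and the ] inside the character classes is escaped for clarity. USE_RE and is_flagged are hypothetical names.

```python
import re

USE_RE = re.compile(
    r"^[ \t]*USE"
    # Reject the line if a $ or "master" (any case) appears in the name.
    r"(?![ \t]*\[{0,2}[^\]\s]*(?:\$|(?i:master)))"
    # Otherwise consume the optional brackets and the database name.
    r"[ \t]*\[{0,2}[^\]\s]*\]{0,2}"
)

def is_flagged(line: str) -> bool:
    return USE_RE.match(line) is not None

print(is_flagged("USE [master]"))                     # False
print(is_flagged("USE [$(CompiledDatabaseName)]"))    # False
print(is_flagged("USE [anything_else.01234]"))        # True
print(is_flagged("USE anything_else.01234"))          # True
```

Because the lookahead scans the whole name, a name that merely contains master or a $ somewhere inside it is rejected as well.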

Trying to work out why this regex is not working? Regex should be less restrictive

The Text :
[prc:tl:plfl]
is matched by:
\[prc:tl:[^]]*plfl\]
However I need to also match:
[prc:tl:plfl,tr]
Basically "plfl" can appear anywhere in the string after "tl:" and before next "]"
So all of the following should match
[prc:tl:plfl,tr]
[prc:tl:tr, plfl]
[prc:tl:tr, plfl,sr]
[prc:tl:plfl,tr, sr, mr]
What is missing from my regex?
Many thanks in advance.
You may match any text other than ] after plfl with a negated character class [^\]] (you are actually already using it in the regex):
\[prc:tl:[^\]]*?plfl[^\]]*\]
Details
\[prc:tl: - a [prc:tl: substring
[^\]]*? - a negated character class matching any 0+ chars other than ] as few as possible
plfl - a plfl substring
[^\]]* - any 0+ chars other than ] as many as possible
\] - a ] char.
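The inputs listed in the question can be run against both patterns with a short sketch. It uses Python's re module (an assumption; the question does not name a flavor), and the last sample is a hypothetical negative case added for contrast.

```python
import re

original = re.compile(r"\[prc:tl:[^\]]*plfl\]")
fixed = re.compile(r"\[prc:tl:[^\]]*?plfl[^\]]*\]")

samples = [
    "[prc:tl:plfl]",
    "[prc:tl:plfl,tr]",
    "[prc:tl:tr, plfl]",
    "[prc:tl:tr, plfl,sr]",
    "[prc:tl:plfl,tr, sr, mr]",
    "[prc:tl:tr, sr]",  # hypothetical: no plfl, should not match
]

for s in samples:
    # The original pattern needs plfl directly before ]; the fixed one does not.
    print(s, bool(original.fullmatch(s)), bool(fixed.fullmatch(s)))
```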