Cannot get this Regex - \[pbrc:tl:[^]]*ad*\] - to work? - regex

I have the following possible text strings:
[pbrc:tl:ad,xch]
[pbrc:tl:xch,ad,xyy]
[pbrc:tl:xch, xx, ad]
I need to ensure that "pbrc:tl:" is present and that "ad" appears somewhere after the colon but before the "]". All of the above examples should return true. I am currently using the following regex, which is failing:
\[pbrc:tl:[^]]*ad*\]
I would appreciate a correction to my Regex.
Thank you in advance.
EDIT
These need to return false:
[pbrc:st:ad]
[pbrc:tl:abc]
I will check whether this holds for the provided solutions. I believe the first one does not.

Your ad* matches a and zero or more d symbols. You need to allow any number of characters other than ] before the final ], thus, replace the last * with [^]]*:
\[pbrc:tl:[^]]*ad[^]]*]
                 ^^^^^
See the regex demo
Pattern details:
\[ - a literal [
pbrc:tl: - literal text pbrc:tl:
[^]]* - zero or more characters other than ] (in some regex flavors, the ] inside must be escaped, e.g. in JavaScript, or ICU)
ad - a literal text ad
[^]]* - zero or more characters other than ]
] - a literal ] (it may be escaped; in most languages it does not need escaping when outside a character class).
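As a quick check, the corrected pattern can be run against the samples from the question. This is a minimal Python sketch and not part of the original answer: the names ORIGINAL, FIXED and is_match are illustrative, and ] is escaped inside the character classes so the pattern stays portable across flavors.

```python
import re

# Original attempt from the question: "ad*" means "a" followed by zero or more "d".
ORIGINAL = re.compile(r"\[pbrc:tl:[^\]]*ad*\]")
# Corrected pattern: allow any non-"]" characters after "ad" as well.
FIXED = re.compile(r"\[pbrc:tl:[^\]]*ad[^\]]*\]")


def is_match(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


should_match = ["[pbrc:tl:ad,xch]", "[pbrc:tl:xch,ad,xyy]", "[pbrc:tl:xch, xx, ad]"]
should_fail = ["[pbrc:st:ad]", "[pbrc:tl:abc]"]

for sample in should_match + should_fail:
    print(sample, is_match(FIXED, sample))
```

Note that ad is matched as a plain substring here, so a value such as bad between the colon and the ] would also pass.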

You are missing [^]]* after ad:
\[pbrc:tl:[^]]*ad[^]]*\]

Related

Identify and replace non-ASCII characters between brackets

I have tags (only ASCII chars inside brackets) of the following structure: [Root.GetSomething]. However, some contributors ended up submitting contributions with Cyrillic chars that look similar to Latin ones, e.g. [Rооt.GеtSоmеthіng].
I need to locate, and then replace those inconsistencies with the matching ASCII characters inside the brackets.
I tried \[([АаІіВСсЕеРТтОоКкХхМ]+)\]; (\[)([^\x00-\x7F]+)(\]), and some variations of the range but those searches don't see any matches. I seem to be missing something important in the regex execution logic.
You can use a regex matching any "interesting" Cyrillic char in between [ + letters or . + ] and a conditional replacement pattern:
Find What: (?:\G(?!\A)|\[)[a-zA-Z.]*\K(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М))(?=[[:alpha:].]*])
Replace With: (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M)
Make sure Match Case option is ON.
Details:
(?:\G(?!\A)|\[) - end of the previous successful match or a [ char
[a-zA-Z.]* - zero or more . or ASCII letters
\K - match reset operator that discards the currently matched text from the overall match memory buffer
(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М)) - a non-capturing group containing 19 alternatives each of which is put into a separate capturing group
(?=[[:alpha:].]*]) - a positive lookahead that requires zero or more letters or . and then a ] char immediately to the right of the current location.
The (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M) replacement pattern replaces А (\u0410) with A if Group 1 matched, а (\u0430) with a if Group 2 matched, etc.
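Outside an editor that supports conditional replacement patterns, the same idea can be approximated with a translation table applied only inside brackets. The following Python sketch is a different technique from the one in the answer: fix_tags, HOMOGLYPHS and TAG are hypothetical names, and it assumes a tag is any run of non-bracket characters between [ and ].

```python
import re

# The 19 look-alike Cyrillic letters listed in the answer, written as escapes,
# paired position by position with their ASCII counterparts.
CYRILLIC = (
    "\u0410\u0430\u0406\u0456\u0412\u0421\u0441\u0415\u0435\u0420"
    "\u0422\u0442\u041e\u043e\u041a\u043a\u0425\u0445\u041c"
)
ASCII = "AaIiBCcEePTtOoKkXxM"
HOMOGLYPHS = str.maketrans(CYRILLIC, ASCII)

# Assumption: a tag is "[", then any characters other than brackets, then "]".
TAG = re.compile(r"\[[^\]\[]*\]")


def fix_tags(text):
    """Replace look-alike Cyrillic letters, but only inside [...] tags."""
    return TAG.sub(lambda m: m.group(0).translate(HOMOGLYPHS), text)


# "[Root.GetSomething]" typed with Cyrillic o, e and i.
broken = "[R\u043e\u043et.G\u0435tS\u043em\u0435th\u0456ng]"
print(fix_tags(broken))  # [Root.GetSomething]
```

Text outside the brackets is left untouched, which mirrors the intent of the \G-based pattern above.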

How to extract with REGEXP the specific combination in BigQuery

I have a long text, here is the part of it: "placement":1,"protocols":[2,3,5,6]},"secure":1
And I need to extract the list of protocols, so the result will be only [2,3,5,6].
I was using REGEXP_EXTRACT(text, r'"protocols":([^"]+)'), but the result is always different: sometimes it is only [2,3,5,6] and sometimes it takes more: [2,3,5,6]},
How to build my REGEXP so the result will be always only the list in brackets?
You can use
REGEXP_EXTRACT(text, r'"protocols"\s*:\s*(\[[^][]+])')
See the regex demo
To get the contents of protocols without the brackets, move the grouping boundaries a bit inward:
REGEXP_EXTRACT(text, r'"protocols"\s*:\s*\[([^][]+)]')
See this regex demo.
Details
"protocols" - a literal text
\s*:\s* - a colon enclosed with zero or more whitespace
\[ - a [ char
[^][]+ - one or more chars other than [ and ]
] - a ] char.
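The difference between the attempt from the question and the suggested patterns can be reproduced locally. This is a rough sketch that uses Python's re module as a stand-in for BigQuery's REGEXP_EXTRACT (an assumption made for illustration only); the brackets inside the character class are escaped for clarity.

```python
import re

text = '"placement":1,"protocols":[2,3,5,6]},"secure":1'

# Attempt from the question: [^"]+ runs on until the next double quote.
loose = re.search(r'"protocols":([^"]+)', text).group(1)

# Patterns from the answer: stop at the first closing bracket.
with_brackets = re.search(r'"protocols"\s*:\s*(\[[^\]\[]+\])', text).group(1)
without_brackets = re.search(r'"protocols"\s*:\s*\[([^\]\[]+)\]', text).group(1)

print(loose)             # [2,3,5,6]},
print(with_brackets)     # [2,3,5,6]
print(without_brackets)  # 2,3,5,6
```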

Exclude an escaped character from a range

I need to extract an expression between brackets that can include anything except a non-escaped closing bracket.
For example, applying the regexp to [aaa\]bbbbbb] should give the result: aaa\]bbbbbb.
I tried this: \[([^(?<!\\)\]]*)\] but it fails.
Any hints?
You may use
\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)]
Or - if there may be any non-escaped [ in-between non-escaped [ and ] (e.g. [a[\[aa\]bbbbbba\[aabbbbbb]), take out the \[:
\[([^\]\\]*(?:\\.[^\]\\]*)*)]
See the regex demo 1 and regex demo 2. It is an unrolled variant of a \[((?:[^][\\]|\\.)*)] regex.
Details:
\[ - a [
([^\]\[\\]*(?:\\.[^\]\[\\]*)*) - Group 1 capturing:
[^\]\[\\]* - zero or more chars other than [, ] and \ (in some regex flavors, you may write it without escapes - [^][\\]*)
(?:\\.[^\]\[\\]*)* - zero or more sequences of:
\\. - any escaped sequence (\ and any char other than line break chars)
[^\]\[\\]* - zero or more chars other than [, ] and \
] - a closing ].
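To see the unrolled pattern at work on the sample from the question, here is a minimal Python sketch. The name UNROLLED is illustrative, and the final ] is escaped only for readability.

```python
import re

# Unrolled pattern from the answer; group 1 is the bracket content.
UNROLLED = re.compile(r"\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)\]")

sample = r"[aaa\]bbbbbb]"
match = UNROLLED.search(sample)
content = match.group(1)
print(content)  # aaa\]bbbbbb
```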
This is the simplest regex that (I think) works:
\[(.*?)(?<!\\)\]
which captures the bracketed text as group 1.
See live demo.
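One caveat with the lookbehind variant: it treats any ] preceded by a backslash as escaped, even when that backslash is itself escaped. The Python sketch below shows both the sample from the question and such an edge case; the edge-case input is an assumption for illustration and does not come from the question.

```python
import re

LOOKBEHIND = re.compile(r"\[(.*?)(?<!\\)\]")

# Sample from the question: works as expected.
simple = LOOKBEHIND.search(r"[aaa\]bbbbbb]").group(1)
print(simple)  # aaa\]bbbbbb

# Hypothetical edge case: the content ends with an escaped backslash,
# so the first "]" is a real closing bracket, yet the lookbehind skips it.
tricky = "[a\\\\] x]"  # the literal text is: [a\\] x]
edge = LOOKBEHIND.search(tricky).group(1)
print(edge)  # a\\] x
```

If escaped backslashes can occur in the input, a pattern that consumes escape sequences pair by pair is the safer choice.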

Delete tag [bracket] on QString

I get a text with tags like this:
This is [tag=\"value\"]my text[/tag].
I want to transform it like this:
This is my text.
I have seen that I have to use QString.remove(QRegExp(myRegexExpression)), but I have not managed to make it work.
I have tried :
remove(QRegExp("\[[^>]*\]")
result : This is .
You should add the ? operator to your * quantifier to make it lazy. Like this:
\[[^>]*?\]
This will make your expression match the fewest characters possible after the opening [, so it stops at the first ] available after it instead of the last one.
You can use a negated character class [^\\]] that matches any character but a ]:
str.remove(QRegExp("\\[[^\\]]*\\]"));
The problem you have is caused by the [^>]* construct: it matches any character but a > zero or more times, so it gets past the BB tag boundary and goes up to the last ], because * is a greedy quantifier (i.e. it matches as many characters as it can). See your regex in action.
My regex (demo) breakdown:
\[ - a literal [
[^\\]]* - zero or more (greedy, matches as many as it can) characters other than ]
\] - a literal ].
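The greedy, lazy and negated-class behaviours can be compared side by side. This is a small Python sketch using re.sub in place of QString::remove (a substitution made for illustration; the patterns are the ones discussed above, without the extra C++ string escaping).

```python
import re

text = 'This is [tag="value"]my text[/tag].'

# Greedy [^>]* from the question swallows everything up to the last "]".
greedy = re.sub(r"\[[^>]*\]", "", text)
# Lazy quantifier stops at the first "]" after each "[".
lazy = re.sub(r"\[[^>]*?\]", "", text)
# Negated class cannot run past a "]" at all.
negated = re.sub(r"\[[^\]]*\]", "", text)

print(greedy)   # This is .
print(lazy)     # This is my text.
print(negated)  # This is my text.
```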

Trying to work out why this regex is not working? Regex should be less restrictive

The Text :
[prc:tl:plfl]
is matched by:
\[prc:tl:[^]]*plfl\]
However I need to also match:
[prc:tl:plfl,tr]
Basically "plfl" can appear anywhere in the string after "tl:" and before next "]"
So all of the following should match
[prc:tl:plfl,tr]
[prc:tl:tr, plfl]
[prc:tl:tr, plfl,sr]
[prc:tl:plfl,tr, sr, mr]
What is missing from my regex?
Many thanks in advance.
You may match any text other than ] after plfl with a negated character class [^\]] (you are actually already using it in the regex):
\[prc:tl:[^\]]*?plfl[^\]]*\]
See the regex demo
Details
\[prc:tl: - a [prc:tl: substring
[^\]]*? - a negated character class matching any 0+ chars other than ] as few as possible
plfl - a plfl substring
[^\]]* - any 0+ chars other than ] as many as possible
\] - a ] char.
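A short Python sketch can confirm that the relaxed pattern accepts every sample from the question, while the original only accepts the first one. The names RELAXED, ORIGINAL and samples are illustrative.

```python
import re

# Original pattern from the question: "plfl" must be directly before "]".
ORIGINAL = re.compile(r"\[prc:tl:[^\]]*plfl\]")
# Relaxed pattern from the answer: anything but "]" may follow "plfl".
RELAXED = re.compile(r"\[prc:tl:[^\]]*?plfl[^\]]*\]")

samples = [
    "[prc:tl:plfl]",
    "[prc:tl:plfl,tr]",
    "[prc:tl:tr, plfl]",
    "[prc:tl:tr, plfl,sr]",
    "[prc:tl:plfl,tr, sr, mr]",
]

original_hits = [s for s in samples if ORIGINAL.fullmatch(s)]
relaxed_hits = [s for s in samples if RELAXED.fullmatch(s)]

print(len(original_hits), len(relaxed_hits))
```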