How to extract with REGEXP the specific combination in BigQuery

I have a long text; here is part of it: "placement":1,"protocols":[2,3,5,6]},"secure":1
I need to extract the list of protocols, so that the result is only [2,3,5,6].
I was using REGEXP_EXTRACT(text, r'"protocols":([^"]+)'), but the result is inconsistent: sometimes it is only [2,3,5,6], and sometimes it captures more: [2,3,5,6]},
How should I build my REGEXP so that the result is always just the list in brackets?

You can use
REGEXP_EXTRACT(text, r'"protocols"\s*:\s*(\[[^][]+])')
See the regex demo
To get the contents of protocols without the brackets, move the grouping boundaries a bit inward:
REGEXP_EXTRACT(text, r'"protocols"\s*:\s*\[([^][]+)]')
See this regex demo.
Details
"protocols" - a literal text
\s*:\s* - a colon surrounded by zero or more whitespace chars
\[ - a [ char
[^][]+ - one or more chars other than [ and ]
] - a ] char.
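The pattern can be tried outside BigQuery before running it against a table. The following is a minimal sketch using Python's re module rather than BigQuery's own regex engine, so it only approximates what REGEXP_EXTRACT does; the helper name extract_protocols and the sample string are illustrative assumptions, and the brackets inside the character class are escaped purely for readability.

```python
import re

# Same idea as the REGEXP_EXTRACT pattern above: capture the bracketed list
# that follows the "protocols" key.
PROTOCOLS_RE = re.compile(r'"protocols"\s*:\s*(\[[^\]\[]+\])')


def extract_protocols(text):
    """Return the bracketed protocol list, or None when it is absent."""
    match = PROTOCOLS_RE.search(text)
    return match.group(1) if match else None


sample = '"placement":1,"protocols":[2,3,5,6]},"secure":1'
print(extract_protocols(sample))
```

Because the character class excludes both [ and ], the capture stops at the first closing bracket and cannot run on into the }, that follows it.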

Related

Match all strings preceding string in brackets

I'm trying to retrieve the table names that precede the brackets:
=IFERROR(INDEX(RepositoriesQ[ContentRepository];MATCH(1&2;RepositoriesQ[Url]&RepositoriesQ[Credentials];0));"something")
I found the way of getting all strings between brackets:
\[(.*?)\]
but what I want to get is all strings preceding the column names in brackets.
So, as a result, I should get 3 matches here:
RepositoriesQ[ContentRepository]
RepositoriesQ[Url]
RepositoriesQ[Credentials]

You can use
\w+\[[^\][]*]
See the regex demo. Details:
\w+ - one or more word chars
\[ - a [ char
[^\][]* - zero or more chars other than [ and ]
] - a ] char.
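As a quick check, the pattern can be run against the formula from the question. This is a sketch in Python, which is an assumption since the question does not name the tool being used; the variable names are illustrative.

```python
import re

# \w+ is the table name; \[[^\]\[]*\] is the bracketed column name after it.
TABLE_COLUMN_RE = re.compile(r'\w+\[[^\]\[]*\]')

formula = ('=IFERROR(INDEX(RepositoriesQ[ContentRepository];'
           'MATCH(1&2;RepositoriesQ[Url]&RepositoriesQ[Credentials];0));'
           '"something")')

matches = TABLE_COLUMN_RE.findall(formula)
for item in matches:
    print(item)
```

Function names such as IFERROR and MATCH are skipped because they are followed by ( rather than [.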

Identify and replace non-ASCII characters between brackets

I have tags (only ASCII chars inside brackets) of the following structure: [Root.GetSomething], instead, some contributors ended up submitting contributions with Cyrillic chars that look similar to Latin ones, e.g. [Rооt.GеtSоmеthіng].
I need to locate, and then replace those inconsistencies with the matching ASCII characters inside the brackets.
I tried \[([АаІіВСсЕеРТтОоКкХхМ]+)\]; (\[)([^\x00-\x7F]+)(\]), and some variations of the range but those searches don't see any matches. I seem to be missing something important in the regex execution logic.

You can use a regex matching any "interesting" Cyrillic char in between [ + letters or . + ] and a conditional replacement pattern:
Find What: (?:\G(?!\A)|\[)[a-zA-Z.]*\K(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М))(?=[[:alpha:].]*])
Replace With: (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M)
Make sure the Match Case option is ON.
Details:
(?:\G(?!\A)|\[) - end of the previous successful match or a [ char
[a-zA-Z.]* - zero or more . or ASCII letters
\K - match reset operator that discards the currently matched text from the overall match memory buffer
(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М)) - a non-capturing group containing 19 alternatives each of which is put into a separate capturing group
(?=[[:alpha:].]*]) - a positive lookahead that requires zero or more letters or . and then a ] char immediately to the right of the current location.
The (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M) replacement pattern replaces А (\u0410) with A if Group 1 matched, а (\u0430) with a if Group 2 matched, etc.
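Outside an editor that supports conditional replacement patterns, the same clean-up can be approximated in a script. The sketch below is a different technique from the \G-based regex above: it finds each bracketed tag and translates it with a lookup table. It is written in Python as an assumption, the names HOMOGLYPHS and normalize_tags are hypothetical, and, unlike the regex above, it touches every [...] run, not only tags made of letters and dots.

```python
import re

# Cyrillic look-alike -> ASCII letter, covering the same 19 characters
# as the alternation in the answer above.
HOMOGLYPHS = str.maketrans({
    '\u0410': 'A', '\u0430': 'a', '\u0406': 'I', '\u0456': 'i',
    '\u0412': 'B', '\u0421': 'C', '\u0441': 'c', '\u0415': 'E',
    '\u0435': 'e', '\u0420': 'P', '\u0422': 'T', '\u0442': 't',
    '\u041e': 'O', '\u043e': 'o', '\u041a': 'K', '\u043a': 'k',
    '\u0425': 'X', '\u0445': 'x', '\u041c': 'M',
})

# A bracketed run with no nested brackets.
TAG_RE = re.compile(r'\[[^\]\[]*\]')


def normalize_tags(text):
    """Replace look-alike Cyrillic letters, but only inside [...] tags."""
    return TAG_RE.sub(lambda m: m.group(0).translate(HOMOGLYPHS), text)


# "[Root.GetSomething]" typed with Cyrillic o, e and i.
bad = '[R\u043e\u043et.G\u0435tS\u043em\u0435th\u0456ng]'
print(normalize_tags(bad))
```

Text outside the brackets is left alone, so legitimate Cyrillic prose around the tags is not altered.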

Exclude an escaped character from a range

I need to extract an expression between brackets that can include anything except a non-escaped closing bracket.
For example, applying the regexp to [aaa\]bbbbbb] should give the result aaa\]bbbbbb.
I tried this: \[([^(?<!\\)\]]*)\] but it fails.
Any hints?

You may use
\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)]
Or - if there may be any non-escaped [ in-between non-escaped [ and ] (e.g. [a[\[aa\]bbbbbba\[aabbbbbb]), take out the \[:
\[([^\]\\]*(?:\\.[^\]\\]*)*)]
See the regex demo 1 and regex demo 2. It is an unrolled variant of a \[((?:[^][\\]|\\.)*)] regex.
Details:
\[ - a [
([^\]\[\\]*(?:\\.[^\]\[\\]*)*) - Group 1 capturing:
  [^\]\[\\]* - zero or more chars other than [, ] and \ (in some regex flavors, you may write it without escapes - [^][\\]*)
  (?:\\.[^\]\[\\]*)* - zero or more sequences of:
    \\. - any escaped sequence (\ and any char other than line break chars)
    [^\]\[\\]* - zero or more chars other than [, ] and \
] - a closing ].
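The unrolled pattern can be exercised with the example from the question. This is a minimal sketch in Python (an assumption, as the question does not name a language); bracket_content is a hypothetical helper name.

```python
import re

# Unrolled form: plain chars, then any number of (escape pair + plain chars).
ESCAPED_BRACKET_RE = re.compile(r'\[([^\]\[\\]*(?:\\.[^\]\[\\]*)*)\]')


def bracket_content(text):
    """Return the text between [ and the first non-escaped ], or None."""
    match = ESCAPED_BRACKET_RE.search(text)
    return match.group(1) if match else None


print(bracket_content(r'[aaa\]bbbbbb]'))
```

Since every backslash is consumed together with the character after it, an escaped backslash right before the closing bracket (as in [a\\]) is handled as well: the ] after the pair still closes the match.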

This is the simplest regex that (I think) works:
\[(.*?)(?<!\\)\]
which captures the bracketed text as group 1.
See live demo.

Cannot get this Regex - \[pbrc:tl:[^]]*ad*\] - to work?

I have the following possible text strings:
[pbrc:tl:ad,xch]
[pbrc:tl:xch,ad,xyy]
[pbrc:tl:xch, xx, ad]
I need to ensure that "pbrc:tl:" is present and that "ad" appears somewhere after the colon but before the "]". All of the above examples should return true. I am currently using the following regex, which is failing.
\[pbrc:tl:[^]]*ad*\]
I would appreciate a correction to my Regex.
Thank you in advance.
EDIT
These need to return false:
[pbrc:st:ad]
[pbrc:tl:abc]
I will check whether this holds for the provided solutions. I believe it does not for the first one.

Your ad* matches a and zero or more d symbols. You need to allow any number of characters other than ] before the final ], thus, replace the last * with [^]]*:
\[pbrc:tl:[^]]*ad[^]]*]
                 ^^^^^
See the regex demo
Pattern details:
\[ - a literal [
pbrc:tl: - literal text pbrc:tl:
[^]]* - zero or more characters other than ] (in some regex flavors, the ] inside must be escaped, e.g. in JavaScript, or ICU)
ad - a literal text ad
[^]]* - zero or more characters other than ]
] - a literal ] (may be escaped; in most languages, it does not need escaping when outside a character class).
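The corrected pattern can be checked against all five strings from the question. The sketch below uses Python as an assumption, with the ] inside the character class escaped; has_ad is a hypothetical helper name.

```python
import re

# "pbrc:tl:" followed by "ad" anywhere before the closing bracket.
TAG_RE = re.compile(r'\[pbrc:tl:[^\]]*ad[^\]]*\]')


def has_ad(tag):
    """True when the tag has the pbrc:tl: prefix and contains ad."""
    return TAG_RE.search(tag) is not None


samples = [
    '[pbrc:tl:ad,xch]',
    '[pbrc:tl:xch,ad,xyy]',
    '[pbrc:tl:xch, xx, ad]',
    '[pbrc:st:ad]',
    '[pbrc:tl:abc]',
]
for tag in samples:
    print(tag, has_ad(tag))
```

Note that ad is matched as a plain substring, so a value such as bad would also pass; word boundaries around ad would be needed to rule that out.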

You are missing [^]]* after ad:
\[pbrc:tl:[^]]*ad[^]]*\]

Regex lookahead/lookbehind match for SQL script

I'm trying to analyse some SQLCMD scripts for code quality tests. I have a regex not working as expected:
^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?)
I'm trying to match:
Strings that start with USE (ignore whitespace)
Followed by optional square bracket
Followed by 1 or more non-whitespace characters.
EXCEPT where that text is "master" (case insensitive)
OR EXCEPT where that text is a $ symbol
Expected results:
USE [master] - don't match
USE [$(CompiledDatabaseName)] - don't match
USE [anything_else.01234] - match
Also, the same patterns above without the [ and ] characters.
I'm using Sublime Text 2 as my RegEx search tool and referencing this cheatsheet

Your pattern - ^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?) - has several problems. [master|\$] is a character class that matches a single character, not an alternative list of $ or the character sequence master; if you fix that by replacing [...] with (...), the lookbehind becomes variable-width (its length is not known beforehand) and is thus invalid in a Boost regex. The (.)+ capture is wrong since this group will only contain the last character captured (you could use (.+)), and . also matches spaces, while you need 1 or more non-whitespace characters. Finally, ? is the one-or-zero-times quantifier, but you say you might have 2 opening and closing brackets, so you need a limiting quantifier, {0,2}.
You can use
^\h*USE(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master)))\h*\[{0,2}[^]\s]*]{0,2}
See regex demo
Explanation:
^ - start of a line in Sublime Text
\h* - optional horizontal whitespace (if you need to match newlines, use \s*)
USE - a literal case-sensitive character sequence USE
(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master))) - a negative lookahead that makes sure the USE is NOT followed with:
  \h* - zero or more horizontal whitespace
  \[{0,2} - zero, one or two [ brackets
  [^]\s]* - zero or more characters other than ] and whitespace
  (?:\$|(?i:master)) - either a $ or a case-insensitive master (we turn off case sensitivity with (?i:...) construct)
\h* - go on matching zero or more horizontal whitespace
\[{0,2} - zero, one or two [ brackets
[^]\s]* - zero or more characters other than ] and whitespace (when ] is the first character in a character class, it does not have to be escaped in Boost/PCRE regexps)
]{0,2} - zero, one or two ] brackets (outside of character class, the closing square bracket does not need escaping)
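For a scripted code-quality check, the same logic can be sketched outside Sublime Text. The version below is a Python adaptation, not the Boost pattern itself: Python's re module has no \h, so [ \t] is used as a stand-in, and the ] inside the character class is escaped. The helper name matches_use is hypothetical.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
USE_RE = re.compile(
    r'^[ \t]*USE'
    # Reject the line when a $ or "master" (any case) appears in the name.
    r'(?![ \t]*\[{0,2}[^\]\s]*(?:\$|(?i:master)))'
    r'[ \t]*\[{0,2}[^\]\s]*\]{0,2}'
)


def matches_use(line):
    """True for USE statements that target anything but master or $(...)."""
    return USE_RE.match(line) is not None


for line in ['USE [master]',
             'USE [$(CompiledDatabaseName)]',
             'USE [anything_else.01234]']:
    print(line, matches_use(line))
```

The lookahead rejects master anywhere inside the name, so a database called, say, webmaster_db would be skipped too; that follows from the pattern as written rather than from anything stated in the question.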
