Matching words & partial colon-delimited words within parentheses (excluding parentheses) - regex

I am trying to extract stock symbols from a body of text. These matches usually come in the following forms:
(<symbol>) => (VOO)
(<market>:<symbol>) => (NASDAQ:C)
In the sample cases shown above, I'd like to match VOO and C, skipping everything else. This regex gets me halfway there:
(?<=\()(.*?)(?=\))
With this, I match what's included within the parentheses, but the logic that ignores "noise" like NASDAQ: eludes me. I'd love to learn how to conditionally specify this pattern/logic.
Any ideas? Thanks!

You can use
[A-Z]+(?=\))
See the regex demo.
Details:
[A-Z]+ - one or more uppercase ASCII letters
(?=\)) - a positive lookahead that matches a location that is immediately followed with a ) char.
Alternatively, you can use the following to capture the values into Group 1:
\((?:[^():]*:)?([A-Z]+)\)
See this regex demo. Details:
\( - a ( char
(?:[^():]*:)? - an optional sequence of any zero or more chars other than (, ) and : and then a : char
([A-Z]+) - Group 1: one or more uppercase ASCII letters
\) - a ) char.

Related

Match string between delimiters, but ignore matches with specific substring

I have to parse all the text in a paranthesis but not the one that contains "GST"
e.g:
(AUSTRALIAN RED CROSS – ATHERTON)
(Total GST for this Invoice $1,104.96)
today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.
expected parsed value:
AUSTRALIAN RED CROSS – ATHERTON
I am trying:
^\(((?!GST).)*$
But its only matching the value and not grouping correctly.
https://regex101.com/r/HndrUv/1
What would be the correct regex for the same?
This regex should work to get the expected string:
^\((?!.*GST)(.*)\)$
It first checks if it does not contain the regular expression *GST. If true, it then captures the entire text.
(?!*GST)(.*)
All that is then surrounded by \( and \) to leave it out of the capturing group.
\((?!.*GST)(.*)\)
Finally you add the BOL and EOL symbols and you get the result.
^\((?!.*GST)(.*)\)$
The expected value is saved in the first capture group (.*).
You can use
^\((?![^()]*\bGST\b)([^()]*)\)$
See the regex demo. Details:
^ - start of string
\( - a ( char
(?![^()]*\bGST\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ) and ( and then GST as a whole word (remove \bs if you do not need whole word matching)
([^()]*) - Group 1: any zero or more chars other than ) and (
\) - a ) char
$ - end of string
Bonus:
If substrings in longer texts need to be matched, too, you need to remove ^ and $ anchors in the above regex.

Identify and replace non-ASCII characters between brackets

I have tags (only ASCII chars inside brackets) of the following structure: [Root.GetSomething], instead, some contributors ended up submitting contributions with Cyrillic chars that look similar to Latin ones, e.g. [Rооt.GеtSоmеthіng].
I need to locate, and then replace those inconsistencies with the matching ASCII characters inside the brackets.
I tried \[([АаІіВСсЕеРТтОоКкХхМ]+)\]; (\[)([^\x00-\x7F]+)(\]), and some variations of the range but those searches don't see any matches. I seem to be missing something important in the regex execution logic.
You can use a regex matching any "interesting" Cyrillic char in between [ + letters or . + ] and a conditional replacement pattern:
Find What: (?:\G(?!\A)|\[)[a-zA-Z.]*\K(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М))(?=[[:alpha:].]*])
Replace With: (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M)
Make sure Match Case option is ON. See a regex demo with a string:
Details:
(?:\G(?!\A)|\[) - end of the previous successful match or a [ char
[a-zA-Z.]* - zero or more . or ASCII letters
\K - match reset operator that discards the currently matched text from the overall match memory buffer
(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М)) - a non-capturing group containing 19 alternatives each of which is put into a separate capturing group
(?=[[:alpha:].]*]) - a positive lookahead that requires zero or more letters or . and then a ] char immediately to the right of the current location.
The (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M) replacement pattern replaces А with A (\u0410) if Group 1 matched, а (\u0430) with a if Group 2 matched, etc.

Regex to get value from <key, value> by asserting conditions on the value

I have a regex which takes the value from the given key as below
Regex .*key="([^"]*)".* InputValue key="abcd-qwer-qaa-xyz-vwxc"
output abcd-qwer-qaa-xyz-vwxc
But, on top of this i need to validate the value with starting only with abcd- and somewhere the following pattern matches -xyz
Thus, the input and outputs has to be as follows:
I tried below which is not working as expected
.*key="([^"]*)"?(/Babcd|-xyz).*
The key value pair is part of the large string as below:
object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}
I think by matching the key its taking the value and that's y i used this .*key="([^"]*)".*
Note:
Its a dashboard. you can refer this link and search for Regex: /"([^"]+)"/ This regex is applied on the query result which is a string i referred. Its working with that regex .*key="([^"]*)".* above. I'm trying to alter with that regexGroup itself. Hope this helps?
Can anyone guide or suggest me on this please? That would be helpful. Thanks!
Looks like you could do with:
\bkey="(abcd(?=.*-xyz\b)(?:-[a-z]+){4})"
See the demo online
\bkey=" - A word-boundary and literally match 'key="'
( - Open 1st capture group.
abcd - Literally match 'abcd'.
(?=.*-xyz\b) - Positive lookahead for zero or more characters (but newline) followed by literally '-xyz' and a word-boundary.
(?: - Open non-capturing group.
-[a-z]+ - Match an hyphen followed by at least a single lowercase letter.
){4} - Close non-capture group and match it 4 times.
) - Close 1st capture group.
" - Match a literal double quote.
I'm not a 100% sure you'd only want to allow for lowercase letter so you can adjust that part if need be. The whole pattern validates the inputvalue whereas you could use capture group one to grab you key.
Update after edited question with new information:
Prometheus uses the RE2 engine in all regular expressions. Therefor the above suggestion won't work due to the lookarounds. A less restrictive but possible answer for OP could be:
\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"
See the online demo
Will this work?
Pattern
\bkey="(abcd-[^"]*\bxyz\b[^"]*)"
Demo
You could use the following regular expression to verify the string has the desired format and to match the portion of the string that is of interest.
(?<=\bkey=")(?=.*-xyz(?=-|$))abcd(?:-[a-z]+)+(?=")
Start your engine!
Note there are no capture groups.
The regex engine performs the following operations.
(?<=\bkey=") : positive lookbehind asserts the current
position in the string is preceded by 'key='
(?= : begin positive lookahead
.*-xyz : match 0+ characters, then '-xyz'
(?=-|$) : positive lookahead asserts the current position is
: followed by '-' or is at the end of the string
) : end non-capture group
abcd : match 'abcd'
(?: : begin non-capture group
-[a-z]+ : match '-' followed by 1+ characters in the class
)+ : end non-capture group and execute it 1+ times
(?=") : positive lookahead asserts the current position is
: followed by '"'

Match only closing brackets

In the following string I have to find only 1) and 2):
"Dsfgdsf ghdsgtaq sadf 5hs a sdgewrg1) AF AFDS (1,1-3). sdfwurf sgwefasöpopwe qdasda (2,3-29). jkgwgvsd sdfawefas2)"
With \d\) I find all closing brackets.
With \((.*?)\) I find (1,1-3) and (2,3-29).
How do I manage to combine both patterns?
It seems you need to match 1 or more digits with a ) after them only if preceded with a letter.
You may use
(?<=\p{L})\d+\)
See the regex demo.
Details
(?<=\p{L}) - a positvie lookbehind requiring a letter to be present immediately to the left of the current position
\d+ - 1+ digits
\) - a literal ).

Regular expression: how to match words preceded and followed by a specific chars sequence and containing some other chars

I'm using a sort of bookmarks on a text. These bookmarks are so structured:
(# + field + #)
"Field" must contain only alphabetical chars (A-Z and a-z, not numbers or other chars).
I need to match words not satisfying this rule.
So, consider the following examples:
(#CompanyName#)
(#Company_Name#)
(#Company5Name#)
Only the first one is correct but I have to match the other 2 cases.
The pattern to match the first one is:
(\(\#)[A-Za-z]+(\#\))
To match the wrong cases I need something like this:
(\(\#)[^A-Za-z]+(\#\))
But this doesn't work correctly.
Can anyone suggest me how to get it working?
Thanks in advance and sorry for my english...
You can try
\(#[^#]*?[^A-Za-z#]+[^#]*?#\)
This should work:
\(#.*?[^A-Za-z].*?#\)
\( - escaped (
# - hash
.*? - zero or more wild-cards (non-greedily)
[^A-Za-z] - a single invalid character
.*? - zero or more wild-cards (non-greedily)
# - hash
\) - escaped )