Find LETTERS-NUMBER pairs in postgres using regex

I need to replace TEXT1-NUMBER with TEXT2-NUMBER.
Example "These are TEXT1-123 and TEXT1-456 examples" should be replaced with "These are TEXT2-123 and TEXT2-456 examples".
I can replace most of the cases using
Regexp_Replace(column_name, '(\mTEXT1)(-[0-9]+\M)', 'TEXT2\2', 'g')
But it also replaces some cases that I want to exclude, such as
TEXT1-NUMBER-NUMBER
TEXT3-NUMBER-TEXT1-NUMBER
How can I make it match only exact TEXT-NUMBER pairs?
Thanks.

You can use
SELECT REGEXP_REPLACE(column_name,
'(\s|^)TEXT1(-[0-9]+)(?!\S)',
'\1TEXT2\2', 'g') AS Result;
See the regex demo.
Beginning with PostgreSQL 10, lookbehinds are supported, and you can also use REGEXP_REPLACE(column_name, '(?<!\S)TEXT1(-[0-9]+)(?!\S)', 'TEXT2\1', 'g') then.
Regex details:
(\s|^) - Group 1 (\1 refers to this value): a whitespace or start of string
TEXT1 - a static string
(-[0-9]+) - Group 2 (\2 refers to this value): - and one or more digits
(?!\S) - a negative lookahead that fails the match if there is a non-whitespace char immediately to the right of the current location.
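As a rough cross-check outside the database, the whitespace-boundary idea can be tried in Python. This is only a sketch: Python's re module is not PostgreSQL's regex engine, and the function name replace_exact_pairs is made up for illustration.

```python
import re

# Same lookaround pattern as the PostgreSQL variant above, tried with Python's
# re module (assumption: both engines treat this particular pattern alike).
PAIR = re.compile(r"(?<!\S)TEXT1(-[0-9]+)(?!\S)")


def replace_exact_pairs(text: str) -> str:
    """Replace TEXT1-NUMBER with TEXT2-NUMBER only when whitespace-delimited."""
    return PAIR.sub(r"TEXT2\1", text)


print(replace_exact_pairs("These are TEXT1-123 and TEXT1-456 examples"))
print(replace_exact_pairs("TEXT1-123-456 and TEXT3-7-TEXT1-8 stay as they are"))
```

Note that a pair followed directly by punctuation, such as TEXT1-123., is also left alone by this pattern, because the lookahead only accepts whitespace or the end of the string.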

Related

Pattern to match everything except a string of 5 digits

I only have access to a function that can match a pattern and replace it with some text:
Syntax
regexReplace('text', 'pattern', 'new text')
And I need to return only the 5 digit string from text in the following format:
CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1
So from this text, I want to replace everything with "" except the "54871".
I have come up with the following:
regexReplace("{{ticket.description}}", "\w*[^\d\W]\w*", "")
This almost works, but it doesn't match the symbols. How can I change it to match any word that includes a letter or a symbol?
The pattern I have is very close; I just need it to cover special characters as well as letters, whereas currently it only covers letters.

You can match the whole string but capture the 5-digit number into a capturing group and replace with the backreference to the captured group:
regexReplace("{{ticket.description}}", "^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$", "$1")
See the regex demo.
Details:
^ - start of string
(?:[\w\W]*\s)? - an optional substring of any zero or more chars as many as possible and then a whitespace char
(\d{5}) - Group 1 ($1 contains the text captured by this group pattern): five digits
(?:\s[\w\W]*)? - an optional substring of a whitespace char and then any zero or more chars as many as possible.
$ - end of string.
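To see the capture-and-replace idea run end to end, here is a minimal Python sketch using the sample text from the question. The regexReplace tool itself is not available here, so re.sub stands in for it, and keep_five_digits is a hypothetical helper name.

```python
import re

# Sample ticket text from the question; in the real tool it would come from
# {{ticket.description}}.
DESCRIPTION = """CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1"""

# Match the whole string, capturing a whitespace-delimited 5-digit number.
PATTERN = r"^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$"


def keep_five_digits(text: str) -> str:
    # Python writes the backreference as \1 where the answer uses $1.
    return re.sub(PATTERN, r"\1", text)


print(keep_five_digits(DESCRIPTION))  # prints 54871
```

If the text contains no whitespace-delimited 5-digit number, the pattern does not match and the input comes back unchanged.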

The easiest regex is probably:
^(.*\D)?(\d{5})(\D.*)?$
You can then replace the string with "$2" ("\2" in other languages) to only place the contents of the second capture group (\d{5}) back.
The only issue is that . doesn't match newline characters by default. Normally you can pass a flag to change . to match ALL characters. For most regex variants this is the s (single line) flag (PCRE, Java, C#, Python). Other variants use the m (multi line) flag (Ruby). Check the documentation of the regex variant you are using for verification.
However, the question suggests that you're not able to pass flags separately, in which case you could pass them as part of the regex itself:
(?s)^(.*\D)?(\d{5})(\D.*)?$
regex101 demo
(?s) - Set the s (single line) flag for the remainder of the pattern, which enables . to match newline characters ((?m) for Ruby).
^ - Match the start of the string (\A for Ruby).
(.*\D)? - [optional] Match anything followed by a non-digit and store it in capture group 1.
(\d{5}) - Match 5 digits and store it in capture group 2.
(\D.*)? - [optional] Match a non-digit followed by anything and store it in capture group 3.
$ - Match the end of the string (\z for Ruby).
This regex will result in the last 5-digit number being stored in capture group 2. If you want to use the first 5-digit number instead, you'll have to use a lazy quantifier in (.*\D)?, meaning that it becomes (.*?\D)?.
(?s) is supported by most regex variants, but not all. Refer to the regex variant documentation to see if it's available for you.
An example where the inline flags are not available is JavaScript. In such a scenario you need to replace . with something that matches ALL characters. In JavaScript [^] can be used. For other variants this might not work and you need to use [\s\S].
With all this out of the way, and assuming a language that can use "$2" as the replacement, where you do not need to escape backslashes, and a regex variant that supports an inline (?s) flag, the answer would be:
regexReplace("{{ticket.description}}", "(?s)^(.*\D)?(\d{5})(\D.*)?$", "$2")
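The difference between the greedy and the lazy prefix is easy to check in Python, which supports the inline (?s) flag. This is a sketch only: the sample text and the helper name extract are invented for illustration, and Python uses \2 rather than $2 in the replacement.

```python
import re

# Hypothetical input containing two different 5-digit numbers on separate lines.
TEXT = "first 11111\nsecond 22222"

# Greedy prefix: the last 5-digit number ends up in group 2.
LAST = r"(?s)^(.*\D)?(\d{5})(\D.*)?$"
# Lazy prefix: the first 5-digit number ends up in group 2.
FIRST = r"(?s)^(.*?\D)?(\d{5})(\D.*)?$"


def extract(pattern: str, text: str) -> str:
    """Replace the whole string with the contents of capture group 2."""
    return re.sub(pattern, r"\2", text)


print(extract(LAST, TEXT))   # prints 22222
print(extract(FIRST, TEXT))  # prints 11111
```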

Match string between delimiters, but ignore matches with specific substring

I have to parse all the text in parentheses, but not the text that contains "GST".
e.g:
(AUSTRALIAN RED CROSS – ATHERTON)
(Total GST for this Invoice $1,104.96)
today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.
expected parsed value:
AUSTRALIAN RED CROSS – ATHERTON
I am trying:
^\(((?!GST).)*$
But it's only matching the value and not grouping correctly.
https://regex101.com/r/HndrUv/1
What would be the correct regex for the same?

This regex should work to get the expected string:
^\((?!.*GST)(.*)\)$
It first checks that the text does not contain a match for the regular expression .*GST. If that holds, it then captures the entire text.
(?!.*GST)(.*)
All that is then surrounded by \( and \) to leave it out of the capturing group.
\((?!.*GST)(.*)\)
Finally you add the BOL and EOL symbols and you get the result.
^\((?!.*GST)(.*)\)$
The expected value is saved in the first capture group (.*).

You can use
^\((?![^()]*\bGST\b)([^()]*)\)$
See the regex demo. Details:
^ - start of string
\( - a ( char
(?![^()]*\bGST\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ) and ( and then GST as a whole word (remove \bs if you do not need whole word matching)
([^()]*) - Group 1: any zero or more chars other than ) and (
\) - a ) char
$ - end of string
Bonus:
If substrings in longer texts need to be matched, too, you need to remove ^ and $ anchors in the above regex.
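Both the anchored pattern and the unanchored variant from the bonus note can be tried in Python. This sketch assumes the input arrives as one multi-line string, so re.MULTILINE is added to make ^ and $ work per line; the sample lines are shortened from the question and use a plain ASCII hyphen.

```python
import re

# Anchored form: the whole line must be a single parenthesized group.
WHOLE_LINE = re.compile(r"^\((?![^()]*\bGST\b)([^()]*)\)$", re.MULTILINE)
# Unanchored form from the bonus note: also matches groups inside longer text.
ANYWHERE = re.compile(r"\((?![^()]*\bGST\b)([^()]*)\)")

SAMPLE = "\n".join([
    "(AUSTRALIAN RED CROSS - ATHERTON)",
    "(Total GST for this Invoice $1,104.96)",
    "today for a quote (07) 55394226",
])

print(WHOLE_LINE.findall(SAMPLE))  # ['AUSTRALIAN RED CROSS - ATHERTON']
print(ANYWHERE.findall(SAMPLE))    # ['AUSTRALIAN RED CROSS - ATHERTON', '07']
```

The second result shows the trade-off of dropping the anchors: the (07) inside the third line is picked up as well.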

postgres regex positive lookahead is not working as expected

I want to capture tokens in a text in the following pattern:
The first 2 characters are letters and are required. The token optionally ends with [A-Z] or [A-Z][0-9]. Anything can come in between.
example:
AA123123A1
AA123123A
AA123123123
I want to match and capture:
the starting ([A-Z][A-Z]) in group 1, the ending [A-Z] or [A-Z][0-9] in group 3, and everything in between in group 2.
Example:
AA123123A1 => [AA,123123,A1]
AA123123A => [AA,123123,A]
AA123123123 => [AA,123123123,'']
the following regex is working in python but not in postgres.
regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'
In PostgreSQL:
select regexp_matches('AA2311121A1',
'^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');
result:
{AA,2311121A1,""}
I am trying to understand why positive lookahead does not behave the same way as in Python, and how to make positive lookahead work in Postgres in this case.

You can use
^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$
See the regex demo and a DB fiddle online.
Details:
^ - start of string
([A-Za-z]{2}) - Group 1: two ASCII letters
(.*?) - Group 2: any zero or more chars as few as possible
([A-Za-z][0-9]?)? - Group 3: an optional sequence of an ASCII letter and then an optional digit
$ - end of string.
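The lazy-middle pattern can be checked against the three examples from the question in Python. This is a sketch in Python only, not a PostgreSQL test, and split_token is a hypothetical helper name. In Python a group that did not take part in the match is reported as None.

```python
import re

# Two letters, a lazy middle part, and an optional letter(+digit) suffix.
TOKEN = re.compile(r"^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$")


def split_token(token: str):
    """Return (prefix, middle, suffix); suffix is None when it is absent."""
    match = TOKEN.match(token)
    return match.groups() if match else None


for token in ("AA123123A1", "AA123123A", "AA123123123"):
    print(token, "=>", split_token(token))
```

Because group 2 is lazy, it stops as soon as the optional suffix and the end of the string can match, which is what the lookahead in the original attempt was trying to achieve.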

Regex to get value from <key, value> by asserting conditions on the value

I have a regex which takes the value from the given key as below
Regex .*key="([^"]*)".* InputValue key="abcd-qwer-qaa-xyz-vwxc"
output abcd-qwer-qaa-xyz-vwxc
But on top of this, I need to validate that the value starts with abcd- and that the pattern -xyz matches somewhere in it.
Thus, the inputs and outputs have to be as follows:
I tried the pattern below, which is not working as expected:
.*key="([^"]*)"?(/Babcd|-xyz).*
The key value pair is part of the large string as below:
object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}
I think that by matching the key it takes the value, and that's why I used .*key="([^"]*)".*
Note:
It's a dashboard. You can refer to this link and search for Regex: /"([^"]+)"/. This regex is applied to the query result, which is the string I referred to. It works with the regex .*key="([^"]*)".* above. I'm trying to alter that regex group itself. Hope this helps.
Can anyone guide or suggest me on this please? That would be helpful. Thanks!

Looks like you could do with:
\bkey="(abcd(?=.*-xyz\b)(?:-[a-z]+){4})"
See the demo online
\bkey=" - A word-boundary and literally match 'key="'
( - Open 1st capture group.
abcd - Literally match 'abcd'.
(?=.*-xyz\b) - Positive lookahead for zero or more characters (other than newline) followed by a literal '-xyz' and a word-boundary.
(?: - Open non-capturing group.
-[a-z]+ - Match a hyphen followed by at least a single lowercase letter.
){4} - Close non-capture group and match it 4 times.
) - Close 1st capture group.
" - Match a literal double quote.
I'm not 100% sure you'd only want to allow lowercase letters, so you can adjust that part if need be. The whole pattern validates the input value, whereas you could use capture group one to grab your key.
Update after edited question with new information:
Prometheus uses the RE2 engine in all regular expressions. Therefore the above suggestion won't work due to the lookarounds. A less restrictive but possible answer for OP could be:
\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"
See the online demo
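Since the lookaround-free pattern uses only basic syntax, it can also be tried with Python's re module. This sketch has not been run against RE2 or Prometheus itself, and extract_key is a hypothetical helper name.

```python
import re

# Lookaround-free pattern from the update above.
KEY_VALUE = re.compile(r'\bkey="(abcd(?:-\w+)*-xyz(?:-\w+)*)"')


def extract_key(text: str):
    """Return the key value if it starts with abcd and has an -xyz segment."""
    match = KEY_VALUE.search(text)
    return match.group(1) if match else None


SAMPLE = 'object{one="ab-vwxc",two="value1",key="abcd-eest-wd-xyz-bnn",four="obsolete Values"}'
print(extract_key(SAMPLE))                 # abcd-eest-wd-xyz-bnn
print(extract_key('key="abcd-qwer-qaa"'))  # None
```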

Will this work?
Pattern
\bkey="(abcd-[^"]*\bxyz\b[^"]*)"
Demo

You could use the following regular expression to verify the string has the desired format and to match the portion of the string that is of interest.
(?<=\bkey=")(?=.*-xyz(?=-|$))abcd(?:-[a-z]+)+(?=")
Start your engine!
Note there are no capture groups.
The regex engine performs the following operations.
(?<=\bkey=") : positive lookbehind asserts the current
position in the string is preceded by 'key="'
(?= : begin positive lookahead
.*-xyz : match 0+ characters, then '-xyz'
(?=-|$) : positive lookahead asserts the current position is
: followed by '-' or is at the end of the string
) : end positive lookahead
abcd : match 'abcd'
(?: : begin non-capture group
-[a-z]+ : match '-' followed by 1+ characters in the class
)+ : end non-capture group and execute it 1+ times
(?=") : positive lookahead asserts the current position is
: followed by '"'

Extract TextString between second and third hyphen

I'm trying to extract some information from a string in one of my columns with a RegEx.
I need to define a second column equal to what is between the 2nd and 3rd occurrence of a hyphen in my first column.
After much googling around I managed to get this far:
IFNULL(SAFE.REGEXP_EXTRACT(Final.CampaignName, r"(?:\w+\s+-\s+){2}(\w+)\s+-"), "Other") AS CampaignCategory
Example of how a string in Final.CampaignName could look:
S - Oranges - Bar - Apples
S - Apples - Foo Bar - Oranges - Bananas
S - Apples - Bar
My Regex will only return the value if there is 1 word between the 2nd and 3rd hyphens, but I need to have the entire text returned (minus leading and trailing whitespace).
Can anyone guide me in the right direction to doing this?
Thanks!

If the regex engine supports \K (loosely, forget everything matched so far), one could use the following regular expression to match the text between the second and third hyphen.
^(?:[^-]+-){2}\K[^-]+(?=-)
Note that this regex does not contain a capture group.
Demo
This does not match Bar in the third example because there are only two hyphens. To match Bar simply remove the lookahead (?=-).
The regex engine performs the following operations.
^ match beginning of line
(?:[^-]+-) match 1+ chars other than '-' followed by '-'
in a non-capture group
{2} execute non-capture group twice
\K discard everything matched so far (reset the starting
point of the reported match)
[^-]+ match 1+ chars other than '-'
(?=-) match '-' in a positive lookahead
If [^-] is not to match newlines change it to [^-\r\n].
If \K is not supported, a capture group is needed (and the lookahead is not):
^(?:[^-]+-){2}([^-]+)-
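Python's built-in re module does not support \K, so the capture-group variant is the one to try there. The sketch below strips the surrounding whitespace and falls back to "Other" as in the question's IFNULL; campaign_category is a hypothetical helper name.

```python
import re

# Capture-group variant: text between the 2nd and 3rd hyphen lands in group 1.
THIRD_FIELD = re.compile(r"^(?:[^-]+-){2}([^-]+)-")


def campaign_category(name: str, default: str = "Other") -> str:
    """Return the trimmed text between the 2nd and 3rd hyphen, else default."""
    match = THIRD_FIELD.search(name)
    return match.group(1).strip() if match else default


print(campaign_category("S - Oranges - Bar - Apples"))                # Bar
print(campaign_category("S - Apples - Foo Bar - Oranges - Bananas"))  # Foo Bar
print(campaign_category("S - Apples - Bar"))                          # Other
```

The third example returns the fallback because it has no third hyphen, matching the note above about the lookahead.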

You were almost there, so below is as close to your original idea as I could get (BigQuery Standard SQL):
SELECT IFNULL(REGEXP_EXTRACT(final.CampaignName, r"(?: - .*?){2}(.*?)(?: -|$)"), "Other") AS CampaignCategory

Use the following pattern with a capture group to isolate what you really want to extract:
SAFE.REGEXP_EXTRACT(Final.CampaignName, r"[^-]+-[^-]+-\s*([^-]+?)\s*-") AS CampaignCategory
Demo

You could match what is between the second and third hyphen using a capturing group and make matching the rest optional using a repeating pattern with *
\w+(?:\s+-\s+\w+)\s+-\s+(\w+(?: \w+)*)(?:\s+-\s+\w+)*
Regex demo

I always prefer a non-regex approach where possible.
So for your problem, I can recommend this code:
split(Final.CampaignName, ' - ')[safe_offset(2)]
An example with your sample data:
select campaignName, split(campaignName, ' - ')[safe_offset(2)] as third_item
from unnest(['S - Oranges - Bar - Apples', 'S - Apples - Foo Bar - Oranges - Bananas', 'S - Apples - Bar']) as campaignName
For the three sample rows, third_item comes out as Bar, Foo Bar, and Bar, respectively.
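The same split-and-index idea can be sketched in Python for comparison. The function third_item is a hypothetical stand-in for split(...)[safe_offset(2)]; it returns None when there are fewer than three parts.

```python
def third_item(campaign_name: str, sep: str = " - "):
    """Split on the separator and return the third part, or None if missing."""
    parts = campaign_name.split(sep)
    return parts[2] if len(parts) > 2 else None


for name in (
    "S - Oranges - Bar - Apples",
    "S - Apples - Foo Bar - Oranges - Bananas",
    "S - Apples - Bar",
):
    print(name, "=>", third_item(name))
```

Unlike the regex answers that require a third hyphen, this also returns Bar for the last sample, because the third part exists even without a trailing separator.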
