Postgres regexp_matches between two patterns


I am trying to split a string like the following in Postgres 9.4:
"some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop."
using pattern: (\d+\_.*\_\d+\D)+?
result is:
"123_good_345"
"123_some_invalid and 222_work ok_333"
But I need
"123_good_345"
"222_work ok_333"
note, ignoring "123_some_invalid"
Please help!

You may use
\d+_(?:(?!\d_).)*_\d+
See the regex demo. Or, if there can be no digits between \d+_ and _\d+, use
\d+_\D+_\d+
See this regex demo.
Details
\d+ - 1 or more digits
_ - an underscore
(?:(?!\d_).)* - any char, 0 or more repetitions, as many as possible, that does not start a digit + _ char sequence
\D+ - any 1+ chars other than digits
_ - an underscore
\d+ - 1+ digits.
See the PostgreSQL demo:
SELECT unnest(regexp_matches('some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop.', '\d+_(?:(?!\d_).)*_\d+', 'g'));
or
SELECT unnest(regexp_matches('some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop.', '\d+_\D+_\d+', 'g'));
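
As a rough cross-check outside the database, both patterns can be run through Python's re module. This is only a sketch: Python's engine is not the PostgreSQL one, so agreement here is a sanity check and not a guarantee, and the variable names are just for illustration.

```python
import re

text = ("some text 123_good_345 and other text 123_some_invalid "
        "and 222_work ok_333 stop.")

# Tempered greedy token: consume any char that does not start "digit + _",
# so a match can never run across the start of another "123_" style prefix.
tempered = re.findall(r"\d+_(?:(?!\d_).)*_\d+", text)

# Simpler variant: only non-digits are allowed between the two underscores.
no_digits = re.findall(r"\d+_\D+_\d+", text)

print(tempered)
print(no_digits)
```

Both calls skip 123_some_invalid because no "_ + digits" sequence follows it before the next "digits + _" prefix begins.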

Related

postgres regex positive lookahead is not working as expected

I want to capture tokens in a text in the following pattern:
The first 2 characters are letters and are required; the token optionally ends with [A-Z] or [A-Z][0-9]; anything can come in between.
example:
AA123123A1
AA123123A
AA123123123
I want to match and capture the start ([A-Z][A-Z]) in group 1, the end ([A-Z] or [A-Z][0-9]) in group 3, and everything in between in group 2.
Example:
AA123123A1 => [AA,123123,A1]
AA123123A => [AA,123123,A]
AA123123123 => [AA,123123123,'']
The following regex works in Python but not in Postgres.
regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'
In PostgreSQL:
select regexp_matches('AA2311121A1',
'^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');
result:
{AA,2311121A1,""}
I am trying to understand why the positive lookahead does not behave the same as in Python, and how to make a positive lookahead work in Postgres in this case.

You can use
^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$
See the regex demo and a DB fiddle online:
Details:
^ - start of string
([A-Za-z]{2}) - two ASCII letters
(.*?) - Group 2: any zero or more chars as few as possible
([A-Za-z][0-9]?)? - Group 3: an optional sequence of an ASCII letter and then an optional digit
$ - end of string.
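
A minimal sketch of the same pattern in Python, to show what the three groups hold for the sample tokens. The helper name split_token is hypothetical, and Python reports an unmatched optional group as None, so default="" is used here to get the empty string the question asks for. PostgreSQL resolves greedy and lazy quantifiers by its own rules, so treat this as an illustration of the intent, not proof of the Postgres result.

```python
import re

# Group 1: two ASCII letters; group 2: as few chars as possible;
# group 3: optional trailing letter followed by an optional digit.
TOKEN = re.compile(r"^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$")

def split_token(token):
    """Return [prefix, middle, suffix], or None if the token does not match."""
    m = TOKEN.match(token)
    # default="" turns an unmatched optional group (None) into ''.
    return list(m.groups(default="")) if m else None

for t in ("AA123123A1", "AA123123A", "AA123123123"):
    print(t, "=>", split_token(t))
```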

Why's this postgres regexp_match giving me null instead of the regex groups?

This:
select regexp_matches('test text user:testuser,anotheruser hashtag:peach,phone,milk site:youtube.com,twitter.com flair:😂bobby😂', '^.*?(?=\s+[^:\s]+:)|([^:\s]+):([^:\s]+)','gi');
gives me only one group match and a row with NULL:
regexp_matches
-----------------
{NULL,NULL}
{flair,😂bobby😂}
It works fine when I test it here:
https://regex101.com/r/AxsatL/3
What am I doing wrong?

You may use
'^(?:(?!\s+[^:\s]+:).)*|[^:\s]+:[^:\s]+'
The point here is to keep all quantifiers greedy and remove all capturing parentheses.
The ^(?:(?!\s+[^:\s]+:).)* part will match - from the start of the string - any char, 0 or more occurrences, that does not start a sequence of the following patterns: 1+ whitespaces, 1+ chars other than : and whitespace and then a :.
Online test:
select regexp_matches(
'test text user:testuser,anotheruser hashtag:peach,phone,milk site:youtube.com,twitter.com flair:😂bobby😂',
'^(?:(?!\s+[^:\s]+:).)*|[^:\s]+:[^:\s]+',
'gi'
);
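
For comparison, here is a sketch of the same pattern run with Python's re.findall, which returns whole matches when the pattern has no capturing groups. Python's engine is only a stand-in for PostgreSQL here, so this shows what the pattern is meant to extract, not the literal regexp_matches output.

```python
import re

text = ("test text user:testuser,anotheruser hashtag:peach,phone,milk "
        "site:youtube.com,twitter.com flair:😂bobby😂")

# First alternative: free text from the start of the string, stopping before
# the first "whitespace + key:" sequence. Second alternative: key:value pairs.
pattern = r"^(?:(?!\s+[^:\s]+:).)*|[^:\s]+:[^:\s]+"

parts = re.findall(pattern, text, flags=re.IGNORECASE)
for part in parts:
    print(part)
```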

Regex: user pattern this or another

I have strings like:
Name 31X10.50R15 109S RX706 SUV
Brand 131/70R11 NU8 Word RX808
Word 6.00R16 983/222 10PR MONO S+V
I need to match only 31X10.50 and 6.00R16 from these strings; as you can see, the second string has no "digit X digit" or "digit R digit" pattern that I want.
My preg_match pattern was this:
/(\d*\.?\d+?)x\K\d*\.?\d+?|\d*\.?\d+?r\d*/i
With this part, (\d*\.?\d+?)x\K\d*\.?\d+?, I find 31 and 10.5 in the first string.
With the next part, \d*\.?\d+?r\d*, I hope to find 6.00R16 and take only 6.00.
So my regex logic is to match 31X10.50 or 6.00R16 in the strings, but the second part is not working for me.
What am I doing wrong?

You may use
(?<![\d\/])(\d*\.?\d+)[xr](\d*\.?\d+)
See the regex demo.
Details
(?<![\d\/]) - there should be no digit or / immediately to the left of the current location
(\d*\.?\d+) - Group 1: 0+ digits, an optional . and 1+ digits
[xr] - x or r
(\d*\.?\d+) - Group 2: 0+ digits, an optional . and 1+ digits
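
The question uses PHP's preg_match, but the pattern itself can be sketched in Python to see how the lookbehind filters the three sample strings. The helper name find_size is made up for this example, and re.IGNORECASE plays the role of the /i flag.

```python
import re

# No digit or "/" may sit immediately before the first number,
# which rules out the "70R11" inside "131/70R11".
SIZE = re.compile(r"(?<![\d/])(\d*\.?\d+)[xr](\d*\.?\d+)", re.IGNORECASE)

def find_size(line):
    """Return (whole match, group 1, group 2) or None when nothing matches."""
    m = SIZE.search(line)
    return (m.group(0), m.group(1), m.group(2)) if m else None

lines = [
    "Name 31X10.50R15 109S RX706 SUV",
    "Brand 131/70R11 NU8 Word RX808",
    "Word 6.00R16 983/222 10PR MONO S+V",
]
for line in lines:
    print(find_size(line))
```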

Regex
This pattern worked on any string that contained a dot:
(\d+\.\d+)\w+

Regex match depending on lookbehind match

I need to match these values:
(First approach to a regex that roughly does what I want)
\d+([.,]\d{3})*[.,]\d{2}
like
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check the last occurrence of "." or "," to not be the same as the previous "." or ",".
Background: Amounts shall be matched in English and German format with (optional) 1000-Separators.
But even with help of regex101 I completely fail at coming up with a correctly working look-behind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.

You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
                  ^    ^        ^^^^^
See the regex demo
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
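
To check the pattern against the values listed in the question, here is a small Python sketch. The function name is_amount is an assumption for illustration. It relies on Python's rule that a backreference to a group that did not participate never matches, so (?!\1) succeeds when no grouping symbol was captured.

```python
import re

AMOUNT = re.compile(r"^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$")

def is_amount(value):
    """True if value is an amount whose decimal separator differs from the grouping symbol."""
    return AMOUNT.match(value) is not None

should_match = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
should_fail = ["1.234.56", "1,234,56", "1234.567,23"]

for value in should_match + should_fail:
    print(value, is_amount(value))
```

The last rejected value, 1234.567,23, is the one the UPDATE in the question reports as an unwanted match for the earlier attempt.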

You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
live demo

Regular Expression to match the following Date Format

I have a regex to match the date formats Sep.23'15 or Sep 23'15 or Sep23'15
[a-zA-Z]{3}[. ]\d{2}'\d{2}
I am able to match Sep.23'15 & Sep 23'15 but not Sep23'15
How to write the regex to match with space and without space ?

I suggest matching an optional dot and then applying the * quantifier to the space, instead of the ? suggested by Tushar:
[a-zA-Z]{3}\.?[ ]*\d{2}'\d{2}
           ^^^^^^^
This regex will also handle a format like Sep. 23'15 (with a dot and one or more spaces between the month and the day'year).
Regex explanation:
[a-zA-Z]{3} - 3 ASCII letters
\.? - 1 or 0 dots
[ ]* - zero or more regular spaces (\h, or \p{Zs}, or [[:blank:]] are recommended depending on the regex flavor if you only need to match horizontal whitespace)
\d{2}'\d{2} - 2 digits + ' + 2 digits.
See demo
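
A quick way to confirm which variants the pattern accepts is a short Python sketch. re.fullmatch is used here so the whole string must be a date, which is an assumption on top of the original unanchored pattern.

```python
import re

DATE = re.compile(r"[a-zA-Z]{3}\.?[ ]*\d{2}'\d{2}")

samples = ["Sep.23'15", "Sep 23'15", "Sep23'15", "Sep. 23'15"]

# Map each sample to True/False depending on whether the full string matches.
results = {s: DATE.fullmatch(s) is not None for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```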

You can use the following regex:
[A-Za-z]{3}[\.\s]?\d{1,2}\'\d{2}
