Regex: select all numbers after a string

I'm new to regex and I want to know if there is any way to select all numbers after a matched string.
For example:
Input:
important string
abc 100
def 50
ghi jk 10
m 60
not important string
aa 90
bb 20
As output, I want to select all of these numbers: 100, 50, 10, 60
I have tried important string[\w\n ]* (\d+) but I only got 60.
Thanks a lot!

A generic PCRE approach to matching multiple occurrences between two delimiting strings is to use a \G-based pattern, which anchors each match at the end of the previous successful match:
(?:\G(?!\A)|(?<!\bnot )important string)(?:(?!not important string)\D)*?\K\d+
See the regex demo
In general, the template is:
(?s)(?:\G(?!\A)|STARTING_DELIMITER_STRING)(?:(?!END_DELIMITER_STRING).)*?\K\d+
Or, to keep the matches within the block opened by the initial STARTING_DELIMITER_STRING (that is, to also stop at the next STARTING_DELIMITER_STRING), add it to the negative lookahead:
(?s)(?:\G(?!\A)|STARTING_DELIMITER_STRING)(?:(?!STARTING_DELIMITER_STRING|END_DELIMITER_STRING).)*?\K\d+
Details:
(?:\G(?!\A)|(?<!\bnot )important string) - either the end of the previous successful match (\G(?!\A)) or the literal text important string when it is not preceded by not and a space
(?:(?!not important string)\D)*? - any non-digit char (\D), 0 or more occurrences, as few as possible, each of which does not start a not important string sequence
\K - match reset operator (drops the text matched so far from the reported match)
\d+ - 1+ digits
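For a quick check outside a PCRE tool, here is a minimal sketch in Python (the language is an assumption; the thread itself uses no host language). Python's built-in re module supports neither \G nor \K, so this swaps the single-pattern trick for a two-step approach: isolate the text between important string and the next not important string, then collect the digit runs in it. The helper name numbers_after is hypothetical.

```python
import re

TEXT = """important string
abc 100
def 50
ghi jk 10
m 60
not important string
aa 90
bb 20"""


def numbers_after(text, start, end):
    """Return every digit run between `start` and the next `end` (or end of text)."""
    # Step 1: find `start` (not preceded by "not "), then lazily grab
    # everything up to the first `end` delimiter or the end of the text.
    block = re.search(
        r"(?<!not )" + re.escape(start) + r"(.*?)(?=" + re.escape(end) + r"|\Z)",
        text,
        re.S,  # let . cross line breaks, like (?s) in the PCRE template
    )
    if block is None:
        return []
    # Step 2: collect all digit runs inside that block.
    return re.findall(r"\d+", block.group(1))


print(numbers_after(TEXT, "important string", "not important string"))
# ['100', '50', '10', '60']
```

Unlike the PCRE pattern, this needs host-language code, but it makes the expected result easy to verify.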

Related

regex to extract housenumber plus addition

I'm looking for a regex that matches house numbers combined with additions for all of the addresses below:
Breestraat 4
Breestraat 45
Breestraat 456
Dubbele Straat 4a
Dubbele Straat 4-a
5 meistraat 1a
5meistraat 12
5meistraat 12a
Teststraat 22-III
The following regex works, except in the first case. There, the single-digit house number is missed because of the first \d in the regex (which prevents a starting digit from being captured).
\d?.(\d+.+)$
I'm scratching my head over how to get the house number '4' for the first line. So basically: how do I change "skip the starting digit" into "skip the starting digit, but still let it end up in the capturing group"?
You can use
\d+\D*$
\d+\S*$
See the regex demo #1 and regex demo #2.
The pattern matches
\d+ - one or more digits
\D* - zero or more non-digit chars
\S* - zero or more non-whitespace chars
$ - end of string.
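A small Python sketch (the language choice is an assumption) that runs both patterns from this answer over the addresses from the question. It uses search rather than match because the house number sits at the end of the string; house_number and PATTERNS are illustrative names.

```python
import re

ADDRESSES = [
    "Breestraat 4",
    "Breestraat 45",
    "Breestraat 456",
    "Dubbele Straat 4a",
    "Dubbele Straat 4-a",
    "5 meistraat 1a",
    "5meistraat 12",
    "5meistraat 12a",
    "Teststraat 22-III",
]

# The two patterns from the answer: digits, then either non-digits (\D*)
# or non-whitespace (\S*), anchored to the end of the string ($).
PATTERNS = [re.compile(r"\d+\D*$"), re.compile(r"\d+\S*$")]


def house_number(address, pattern):
    # search, not match: the number is at the end, not the start
    m = pattern.search(address)
    return m.group() if m else None


for addr in ADDRESSES:
    print(f"{addr:20} -> {[house_number(addr, p) for p in PATTERNS]}")
```

On these nine addresses both patterns return the same values (4, 45, 456, 4a, 4-a, 1a, 12, 12a, 22-III); they can differ when the addition itself contains a space or a digit.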
It's not perfectly clear what exactly you are requesting.
Anyway, this pattern matches the house number at the end of the string:
\d+[-\da-zI]*$
https://regexr.com/6l0g7
Anyway, I'm aware this is not a valid answer.
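The character class in this pattern is stricter than \S*: after the digits it only admits -, digits, lowercase letters and an uppercase I (presumably for Roman numerals such as -III). A minimal Python check follows (Python is an assumption); Teststraat 22-IV is a hypothetical address not taken from the question, included to show that an uppercase letter other than I makes the match fail.

```python
import re

# This answer's pattern: digits, then only "-", digits, lowercase letters or
# an uppercase "I", up to the end of the string.
CLASS_PATTERN = re.compile(r"\d+[-\da-zI]*$")


def house_number(address):
    m = CLASS_PATTERN.search(address)
    return m.group() if m else None


print(house_number("Dubbele Straat 4-a"))  # 4-a
print(house_number("5 meistraat 1a"))      # 1a
print(house_number("Teststraat 22-III"))   # 22-III
print(house_number("Teststraat 22-IV"))    # None ("V" is not in the class)
```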

Regex (PCRE): Match all digits in a line following a line which includes a certain string

Using PCRE, I want to capture all the digits, and only the digits, in a line that follows a line in which a certain string appears. Say the string is "STRING99". Example:
car string99 house 45b
22 dog 1 cat
women 6 man
In this case, the desired result is:
221
I asked a similar question some time ago; back then, however, I was trying to capture the numbers in the SAME line where the string appears (Regex (PCRE): Match all digits conditional upon presence of a string). While the questions are similar, I don't think the answer, if there is one at all, will be. The approach using the ^ line anchor does not work in this case.
I am looking for a single regular expression without any other programming code. It would be easy to accomplish with two consecutive regex operations, but this is not what I'm looking for.
Maybe you could try:
(?:\bstring99\b.*?\n|\G(?!^))[^\d\n]*\K\d
See the online demo
(?: - Open non-capture group:
\bstring99\b - Literally match "string99" between word boundaries.
.*?\n - Lazily match up to and including the nearest newline character.
| - Or:
\G(?!^) - Asserts position at the end of the previous match, using a negative lookahead to prevent it from being the start of the string on the first match.
) - Close non-capture group.
[^\d\n]* - Match 0+ characters that are neither digits nor newlines.
\K - Resets the starting point of the reported match.
\d - Match a digit.
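The question asks for a single PCRE pattern, so the following is not a replacement for the answer; it is only a Python sketch (an assumption) for verifying the expected output 221 in an engine without \G and \K. It works in two steps: capture the line after the first line containing string99, then keep its digits. digits_on_next_line is a hypothetical helper name.

```python
import re

TEXT = "car string99 house 45b\n22 dog 1 cat\nwomen 6 man"


def digits_on_next_line(text, marker="string99"):
    # Step 1: find the marker as a whole word, skip the rest of its line
    # (. does not match "\n" without re.S) and capture the following line.
    m = re.search(r"\b" + re.escape(marker) + r"\b.*?\n([^\n]*)", text)
    if m is None:
        return ""
    # Step 2: keep only the digits of that line.
    return "".join(re.findall(r"\d", m.group(1)))


print(digits_on_next_line(TEXT))  # 221
```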

What regex to use to take only first occurrence of number in string?

I have a table with comments like this:
'Payment amount: 11000,50 from 144232'
'Payment amount: 13 450,20 from 144232'
Sometimes whitespace occurs inside a number, because people type it manually. I need to get the first number of each comment, like 11000,50 and 13 450,20 in the examples.
I'm trying to use
regexp_replace('Payment amount: 11000,20 from 144232','([a-zA-Z:\s])','','g') and get the result '11000,20144232', but I need only '11000,20'.
How can I improve the regex, or which function do I need to use to get these numbers?
To get the first number in the string that has a comma and 2 digits after it, where spaces can occur between the digits due to typing:
^[a-zA-Z:\s]*(\d[\s\d]*,\s*\d\s*\d)
Explanation
^ Start of string
[a-zA-Z:\s]* Match 0+ times any of the listed chars in the character class
( Capture group 1 (this will contain the value)
\d Match a digit
[\s\d]* Match 0+ times a whitespace char or digit
,\s*\d\s*\d Match a comma and 2 digits with optional whitespace chars (add \M, PostgreSQL's end-of-word constraint, if no word character may follow)
) Close group 1
Regex demo | Postgresql demo
A broader option is to match 0+ times any char except a digit, \D*, instead of [a-zA-Z:\s]*
Regex demo
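The pattern only uses \s, \d and simple character classes, so it behaves the same in Python's re. A minimal sketch (Python is an assumption here, handy for testing before moving the pattern into regexp_match or regexp_replace) applying it to the two sample comments; first_amount is a hypothetical helper, and the comment without decimals is a hypothetical input.

```python
import re

# The answer's pattern: skip leading letters, colons and whitespace, then
# capture a digit, more digits/whitespace, a comma and two digits (which may
# themselves be separated by whitespace).
AMOUNT = re.compile(r"^[a-zA-Z:\s]*(\d[\s\d]*,\s*\d\s*\d)")


def first_amount(comment):
    m = AMOUNT.search(comment)
    return m.group(1) if m else None


print(first_amount("Payment amount: 11000,50 from 144232"))   # 11000,50
print(first_amount("Payment amount: 13 450,20 from 144232"))  # 13 450,20
# Hypothetical input without decimals: the comma part is mandatory.
print(first_amount("Payment amount: 11000 from 144232"))      # None
```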
This works for your examples, with or without decimals:
^[a-zA-Z:\s]+([\d\s]*(,\d{1,2})?)
https://regex101.com/r/1JEe3F/1
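Running this pattern in Python (again an assumption, just for testing) shows one detail: when there are no decimals, [\d\s]* also takes the space before from, so the sketch strips the captured group. The decimal-free comment is a hypothetical input, and first_amount is a hypothetical helper name.

```python
import re

# This answer's pattern: group 1 holds digits/whitespace plus an optional
# ",d" or ",dd" decimal part, so amounts without decimals also match.
AMOUNT = re.compile(r"^[a-zA-Z:\s]+([\d\s]*(,\d{1,2})?)")


def first_amount(comment):
    m = AMOUNT.search(comment)
    # strip(): without decimals, [\d\s]* also swallows the space before "from"
    return m.group(1).strip() if m else None


print(first_amount("Payment amount: 11000,50 from 144232"))   # 11000,50
print(first_amount("Payment amount: 13 450,20 from 144232"))  # 13 450,20
print(first_amount("Payment amount: 11000 from 144232"))      # 11000
```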
I would do it like this:
SELECT (regexp_match(
'Payment amount: 11000,00 from 144232',
'[[:digit:]][[:digit:] ]*(?:,[[:digit:]]+)')
)[1];
regexp_match
--------------
11000,00
(1 row)
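Somewhat simpler but less strict: ([+\-]?\d[\d ,\.]+).*
SELECT (regexp_matches('Payment amount: 11,000.00 from 144232', '([+\-]?\d[\d ,\.]+).*'))[1];
Result: 11,000.00 (the captured value also keeps the trailing space before "from", since the character class includes a space)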

RegEx match anything except linebreaks up to positive lookahead

I'm trying to match certain text lines up to a specific string in RegEx (PCRE). Here's an example:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann
In this text, I'd like to match exactly this part:
Amoxicillin 1000 Heumann 20 Filmtbl. N2
What the lines I'd like to match always have in common is the PZN part, followed by a 7-8 digit number, at the end of the line. However, the PZN part might sometimes be on the next line instead of directly after the text:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2
- PZN: 04472730
-
Dr. Max Mustermann
So it's either directly behind it or in the next line. I've tried to do so using this RegEx:
.*(?=[ \-\r\n]+PZN)
This does work, however, in the first example above, it matches this:
Amoxicillin 1000 Heumann 20 Filmtbl. N2 -
Notice the " -" at the end. This should not be included in the match. I suppose RegEx prioritizes the .* part since it works from left to right, and therefore leaves only the very last character for the lookahead. I can't wrap my head around how to do it otherwise, though.
Any ideas?
One option is to use a capturing group and match 0+ whitespace chars before the - PZN: part.
^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$
^ Start of line
(?![^\S\r\n]*$) Assert not an empty line
(.+)\s* Capture 1+ of any char in group 1, followed by 0+ whitespace chars
 - PZN: Match a space, -, a space, then PZN: and a space
\d{7,8} Match 7-8 digits
$ End of line
Regex demo
Another option is the same pattern written with a lookahead instead of a capturing group:
^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)
Regex demo
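A minimal Python sketch (language choice assumed) of both forms of this answer on the first layout from the question. It passes re.M, assuming the demo used the multiline flag, since ^ and $ are described as line anchors. As transcribed here, both patterns require a literal space before -, so they do not match the second layout, where the line starts with - PZN:; check the linked demo for the exact pattern.

```python
import re

LAYOUT_1 = """000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann"""

# Both forms from the answer; re.M makes ^ and $ match at every line.
CAPTURE = re.compile(r"^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$", re.M)
LOOKAHEAD = re.compile(r"^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)", re.M)

print(CAPTURE.search(LAYOUT_1).group(1))  # Amoxicillin 1000 Heumann 20 Filmtbl. N2
print(LOOKAHEAD.search(LAYOUT_1).group())  # Amoxicillin 1000 Heumann 20 Filmtbl. N2
```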
This would work:
^(.+?)(?=\s?- PZN:)
^(.+?) - at the start of a line, lazily match one or more characters
(?=\s?- PZN:) - make .+? stop as soon as an optional whitespace char followed by - PZN: comes next
https://regex101.com/r/dhpth0/1/
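The lazy version handles both layouts because . does not cross line breaks, while \s? in the lookahead may consume one. A minimal Python check (an assumption; re.M is needed so that ^ matches at each line start, which the demo presumably enabled) using the two sample texts from the question:

```python
import re

LAYOUT_1 = """000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann"""

LAYOUT_2 = """000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2
- PZN: 04472730
-
Dr. Max Mustermann"""

# Lazy .+? stays on one line; \s? lets "- PZN:" sit after a space or a newline.
PZN_LINE = re.compile(r"^(.+?)(?=\s?- PZN:)", re.M)

for text in (LAYOUT_1, LAYOUT_2):
    # prints "Amoxicillin 1000 Heumann 20 Filmtbl. N2" for both layouts
    print(PZN_LINE.search(text).group(1))
```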

Match all type of numbers

I need a regular expression that extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none, some, or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created this regex: \d+[(.|,|\s)\d+]+, but it does not work properly.
I assume the numbers you need to extract are separated by 2 or more whitespace chars; otherwise it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract numbers in the formats shown above (XXX XXX.XXX, XXX,XXX,XXX.XX, XXX or XXX XXX XXX), you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
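To illustrate the two-or-more-spaces assumption, here is a Python sketch (language assumed) that applies the pattern to the question's string as given and to a hypothetical variant with double spaces between the numbers. With single spaces, 2 544 and 345,345.55 run together, because a space followed by three digits is a valid thousands group.

```python
import re

NUMBER = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")

single = "numbers: 3.14 2 544 345,345.55 506 test 120 100 100"
double = "numbers: 3.14  2 544  345,345.55  506 test  120 100 100"  # hypothetical

# findall returns whole matches because the pattern has no capturing groups
print(NUMBER.findall(single))
# ['3.14', '2 544 345,345.55', '506', '120 100 100']
print(NUMBER.findall(double))
# ['3.14', '2 544', '345,345.55', '506', '120 100 100']
```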
A less restrictive pattern is the same as above, but with the limiting quantifiers replaced with +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.
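A quick Python comparison (language assumed) of the two patterns on the examples just mentioned. The strict one finds nothing there, because after at most three digits with no separator the trailing \b would have to fall between two digits; the loose one accepts both numbers.

```python
import re

STRICT = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")
LOOSE = re.compile(r"\b\d+(?:[, ]\d+)*(?:\.\d+)?\b")

sample = "1234566 and 124354354.343344"

print(STRICT.findall(sample))  # []
print(LOOSE.findall(sample))   # ['1234566', '124354354.343344']
```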