Regex not returning all matches - regex

I have the following regex (my actual regex is actually a lot more complex but I pinned down my problem to this): \s(?<number>123|456)\s
And the following test data:
" 123 456 "
As expected/wanted result I would have the regex match in 2 matches one with "number" being "123" and the second with number being "456". However, I'm only getting 1 match with "number" being "123".
I did notice that adding another space in between "123" en "456" in the test data does give 2 matches...
Why don't I get the result I want? How to get it right?

Your pattern contains consuming \s patterns that matches a whitespace before and after a number, and the input contains consecutive numbers separated with a single whitespace. If there were two spaces between the numbers, it would work.
Use whitespace boundaries based on lookarounds:
(?<!\S)(?<number>123|456)(?!\S)
See the regex demo
The (?<!\S) is a negative lookbehind that will fail the match if there is a non-whitespace char immediately to the left of the current location, and (?!\S) is a negative lookahead that will fail the match if there is a non-whitespace char immediately to the right of the current location.
(?<!\S) is the same as (?<=^|\s) and (?!\S) is the same as (?=$|\s), but more efficient.
Note that in many situations you might even go with 1 lookahead and use
\s(?<number>123|456)(?!\S)
It will ensure the consecutive whitespace separated matches are found.

Related

REGEX to cover just first match from all

I have this type of regex
\b[0-9][0-9][0-9][0-9][0-9]\b
It's not complete, this will match me many examples of 5 digit but I need just first and one match from this structure:
Reference Number WW
30966 CFUN22 098765334
30967 CFUN22 098765335
30968 CFUN22 098765336
30969 CFUN22 098765337
In this case I need just "30966" , not 30967,30968 and so on...
I tried to do
\b[0-9][0-9][0-9][0-9][0-9]\b
You can use a positive lookbehind to make sure that you're grabbing the first 5-digit number after the word "Comments":
(?<=Comments\n)\d{5}\b
https://regex101.com/r/pZLj4K/1
Try using the following regex:
^\N+\n.*?(\d{5})
It will match:
^: start of string
\N+\n: any sequence of non-newline characters, followed by newline
\n: the newline character
.*?: optional smallest sequence of characters
(\d{5}): "Group 1" - sequence of five characters
Your needed digits can be found within Group 1.
Given you're dealing with a textual table, using \N\n will allow you to skip the header from selection, while .*? will allow to match your code not necessarily at the beginning of the second line.
Check the regex demo here.

Capture number if string contains "X", but limit match (cannot use groups)

I need to extract numbers like 2.268 out of strings that contain the word output:
Approxmiate output size of output: 2.268 kilobytes
But ignore it in strings that don't:
some entirely different string: 2.268 kilobytes
This regex:
(?:output.+?)([\d\.]+)
Gives me a match with 1 group, with the group being 2.268 for the target string. But since I'm not using a programming language but rather CloudWatch Log Insights, I need a way to only match the number itself without using groups.
I could use a positive lookbehind ?<= in order to not consume the string at all, but then I don't know how to throw away size of output: without using .+, which positive lookbehind doesn't allow.
With your shown samples, please try following regex.
output:\D+\K\d(?:\.\d+)?
Online demo for above regex
Explanation: Adding detailed explanation for above.
output:\D+ ##Matching output colon followed by non-digits(1 or more occurrences)
\K ##\K to forget previous matched values to make sure we get only further matched values in this expression.
\d(?:\.\d+)? ##Matching digit followed by optional dot digits.
Since you are using PCRE, you can use
output.*?\K\d[\d.]*
See the regex demo. This matches
output - a fixed string
.*? - any zero or more chars other than line break chars, as few as possible
\K - match reset operator that removes all text matched so far from the overall match memory buffer
\d - a digit
[\d.]* - zero or more digits or periods.

Regex match an entire number with lookbehind and look ahead logic(without word boundaries)

I am trying to detect if a string has a number based on few conditions. For example I don't want to match the number if it's surrounded by parentheses and I'm using the lookahead and lookbehind to do this but I'm running into issues when the number contains multiple digits. Also, the number can be between text without any space separators.
My regex:
(?https://regex101.com/r/RnTSMJ/1
Sample examples:
{2}: Should NOT Match. //My regex Works
{34: Should NOT Match. //My regex matches 4 in {34
45}: Should NOT Match. //My regex matches 4 in {45
{123}: Should NOT Match. //My regex matches 2 in {123}
I looked at Regex.Match whole words but this approach doesn't work for me. If I use word boundaries, the above cases work as expected but then cases like the below don't where numbers are surrounded with text. I also want to add some additional logic like don't match specific strings like 1st, 2nd, etc or #1, #2, etc
updated regex:
(?<!\[|\{|\(|#)(\b\d+\b)(?!\]|\}|\|st|nd|rd|th)
See here https://regex101.com/r/DhE3K4/4
123abd //should match 123
abc345 //should match 234
ab2123cd // should match 2123
Is this possible with pure regex or do I need something more comprehensive?
You could match 1 or more digits asserting what is on the left and right is not {, } or a digit or any of the other options to the right
(?<![{\d#])\d+(?![\d}]|st|nd|rd|th)
Explanation
(?<![{\d#]) Negative lookbehind, assert what is on the left is not {, # or a digit
\d+ Match 1+ digits
(?! Negative lookahead, assert what is on the right is not
[\d}]|st|nd|rd|th Match a digit, } or any of the alternatives
) Close lookahead
Regex demo
The following regex is giving the expected result.
(?<![#\d{])\d+(?!\w*(?:}|(?:st|th|rd|nd)\b))
Regex Link

Global regex with multiple matches, where the separator should be shared in several matches

First of all, sorry for the unclear title, it's hard to describe (and to find an existing solution for the same reason).
I use this regex in Javascript, to collect numbers in a string :
/(?:^|[^\d])([\d]+)(?:$|[^\d])/g
Executing it on "5358..2145" returns 2 matches, where the submatches are "5358" and "2145"
But if I use it on "5358.2145", I receive only 1 match : "5358"
So, I understand it so :
The first match is found ("5358.") so the point goes in the first match
What I want as second match is not preceded with start of string or the point because this point already belongs to the first match
How can I change my pattern to find all numbers separated with 1 non-number character ?
Use a negative lookahead at the end:
/(?:^|\D)(\d+)(?!\d)/g
See the regex demo
The pattern matches:
(?:^|\D) - either start of string (^) or any non-digit char (\D)
(\d+) - Group 1: one or more digits
(?!\d) - the negative lookahead failing the match if there is a digit immediately to the right of the current location.

regex: find one-digit number

I need to find the text of all the one-digit number.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly, just wanted to ask it is correct to use spaces?
Maybe there is some other way to distinguish one-digit number.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
Use word boundaries. Note that the range quantifier {1} (a single \d will only match one digit) and the character class [] is redundant because it only consists of one character.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
Demo
The regular expression reads, match a digit (\d) that is neither preceded nor followed by digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.