I'm testing this on regex101.com
Regex: ^\+([0-9A-Za-z-]+)(?:\.([0-9A-Za-z-]+))*$
Test string: +beta-bar.baz-bz.fd.zz
The string matches, but the "match information" box shows that there are only two capture groups:
MATCH 1
1. [1-9] `beta-bar`
2. [20-22] `zz`
I was expecting all these captures:
beta-bar
baz-bz
fd
zz
Why didn't each identifier between periods get recognized as its own captured group?
This happens because when a capture group is repeated by a quantifier, each iteration overwrites the previous one: only the text captured by the last iteration is stored in the buffer and returned at the end.
Instead of matching those parts, you can preg_split the string you have with a simple regex [+.]:
$str = "+beta-bar.baz-bz.fd.zz";
$a = preg_split('/[+.]/', $str, -1, PREG_SPLIT_NO_EMPTY);
See IDEONE demo
Result:
Array
(
[0] => beta-bar
[1] => baz-bz
[2] => fd
[3] => zz
)
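To see the last-iteration behaviour directly, here is a minimal sketch. It runs the pattern from the question with preg_match, then shows one possible alternative that is not part of the answer above: validate the whole string with the same pattern first, and split only if it matches. The variable names are illustrative.

```php
<?php
// Pattern from the question: group 2 sits inside a repeated non-capturing group.
$pattern = '/^\+([0-9A-Za-z-]+)(?:\.([0-9A-Za-z-]+))*$/';
$str = '+beta-bar.baz-bz.fd.zz';

preg_match($pattern, $str, $m);
// Group 2 matched three times, but only its last iteration is kept.
echo $m[1], "\n"; // beta-bar
echo $m[2], "\n"; // zz

// Assumed alternative: check the overall shape, then split to get every identifier.
$parts = preg_match($pattern, $str)
    ? explode('.', substr($str, 1)) // drop the leading "+", then split on "."
    : [];
print_r($parts);
```

Unlike a bare preg_split, this variant returns an empty array for input that does not have the expected shape.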
How to split string by slash which is not between numbers?
I am using preg_split function below:
$splitted = preg_split('#[/\\\\\_\s]+#u', $string);
Input: "925/123 Black/Jack"
Current split result:
[
0 => '925',
1 => '123',
2 => 'Black',
3 => 'Jack'
]
Split result I want:
[
0 => '925/123',
1 => 'Black',
2 => 'Jack'
]
You may use
preg_split('#(?:[\s\\\\_]|(?<!\d)/(?!\d))+#u', '925/123 Black/Jack')
See the PHP demo and the regex demo.
Details
(?: - start of a non-capturing group:
[\s\\_] - a whitespace, \ or _
| - or
(?<!\d)/(?!\d) - a / that is neither preceded nor followed by a digit
)+ - end of a non-capturing group, repeat 1 or more times.
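A short sketch of how the two lookarounds behave, using the pattern above. The second input string is a made-up edge case, not taken from the question.

```php
<?php
$pattern = '#(?:[\s\\\\_]|(?<!\d)/(?!\d))+#u';

// A slash between two digits is kept; a slash between letters splits.
$a = preg_split($pattern, '925/123 Black/Jack');
print_r($a);

// A slash with a digit on only one side is also kept, because both
// lookarounds must succeed for the slash to count as a delimiter.
$b = preg_split($pattern, 'A/1 B_C');
print_r($b);
```

If a slash next to a single digit should still split, the two lookarounds would have to be combined differently; the pattern as written treats any neighbouring digit as a reason to keep the slash.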
One option is to match 1 or more digits divided by forward slashes, with whitespace boundaries on the left and on the right.
Then use (*SKIP)(*F) to discard that match, and otherwise match 1 or more of the characters listed in the character class. Note that you don't have to escape the underscore.
(?<!\S)\d+(?:/\d+)+(?!\S)(*SKIP)(*F)|[/\\_\s]+
Explanation
(?<!\S)\d+(?:/\d+)+(?!\S) Match runs of digits separated by forward slashes, with whitespace boundaries on both sides
(*SKIP)(*F) Discard what was just matched and continue searching after it
| Or
[/\\_\s]+ Match 1+ occurrences of any of the listed
Regex demo | Php demo
For example
$string = "925/123 Black/Jack";
$pattern = "#(?<!\S)\d+(?:/\d+)+(?!\S)(*SKIP)(*F)|[/\\\\_\s]+#u";
$splitted = preg_split($pattern, $string);
print_r($splitted);
Output
Array
(
[0] => 925/123
[1] => Black
[2] => Jack
)
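The (*SKIP)(*F) pattern protects only whole whitespace-delimited tokens made of digits and slashes, while a lookaround-based pattern inspects just the characters next to each slash. The sketch below compares the two on a made-up input where that difference shows; the lookaround pattern is the one from the other answer in this thread.

```php
<?php
$skipFail   = '#(?<!\S)\d+(?:/\d+)+(?!\S)(*SKIP)(*F)|[/\\\\_\s]+#u';
$lookaround = '#(?:[\s\\\\_]|(?<!\d)/(?!\d))+#u';

// Hypothetical input: "x1/2" is not a pure digits-and-slashes token.
$input = 'x1/2 3/4';

// "x1/2" is not protected, so its slash is a delimiter; "3/4" is skipped whole.
$a = preg_split($skipFail, $input);
print_r($a);

// Here the slash in "x1/2" is kept because a digit sits next to it.
$b = preg_split($lookaround, $input);
print_r($b);
```

Which behaviour is preferable depends on whether mixed tokens such as "x1/2" can occur in the data.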
Your regex is unnecessarily complicated. You need to split your string on:
either a space (maybe more generally - a sequence of white chars),
or a slash
not preceded by a digit (negative lookbehind),
not followed by a digit (negative lookahead).
So the regex you need (enclosed in # chars, with doubled backslashes) is:
#(?<!\\d)/(?!\\d)|\\s+#
Example of code:
$string = "925/123 Black/Jack";
$pattern = "#(?<!\\d)/(?!\\d)|\\s+#";
$splitted = preg_split($pattern, $string);
print_r($splitted);
prints just what you want:
Array
(
[0] => 925/123
[1] => Black
[2] => Jack
)
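On the doubled backslashes: in a double-quoted PHP string "\\d" produces the two characters \d, and a single-quoted string containing \d produces the same thing, so both spellings hand the same pattern to PCRE. A minimal check, with illustrative variable names:

```php
<?php
$double = "#(?<!\\d)/(?!\\d)|\\s+#"; // backslashes doubled inside double quotes
$single = '#(?<!\d)/(?!\d)|\s+#';    // same pattern, single-quoted

var_dump($double === $single); // bool(true)

$parts = preg_split($single, '925/123 Black/Jack');
print_r($parts);
```

Note that this shorter pattern no longer splits on backslashes or underscores, which the pattern in the question did.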
I want to separate the starting digits from strings as
01.text
2 - something
3 more
to get
array (
[0] => 01.text
[1] => 01
[2] => text
)
array (
[0] => 2 - something
[1] => 2
[2] => something
)
array (
[0] => 3 more
[1] => 3
[2] => more
)
I tried a regex pattern of
^(\d+)\.+|\s+|-+(.*?)
but it doesn't work as I expected.
My problem is how to match . or - with or without space after the digits.
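Your regex is an alternation of three separate patterns: ^(\d+)\.+ (one or more digits in a capturing group at the start of the string, followed by one or more dots), or \s+ (one or more whitespace characters), or -+(.*?) (one or more hyphens followed by a group that matches any characters lazily, which at the end of a pattern matches nothing).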
You could update your regex to not use the alternations | and make the quantifier in the second group greedy.
In the first group capture one or more digits, then match your character in a character class followed by another capturing group that would match one or more times any character:
^(\d+)[.\s-]+(.+)
Demo
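As a quick check, the pattern above can be run against the three sample strings from the question. This is a minimal sketch; the variable names are illustrative.

```php
<?php
$rows = ['01.text', '2 - something', '3 more'];
$pattern = '/^(\d+)[.\s-]+(.+)/';

$results = [];
foreach ($rows as $row) {
    // $m[0] is the whole match, $m[1] the digits, $m[2] the remaining text.
    if (preg_match($pattern, $row, $m)) {
        $results[] = $m;
    }
}
print_r($results);
```

Because the separator class uses +, a row with no separator at all (for example "3more") is not matched by this pattern.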
It's better to define a pattern for the strings you want to split, although I know that sometimes it's not possible. This regex matches all of your cases and gives you the array you want:
/^(\d+)[\.\-\s]*(.*)?$/
let rows = [
"01.text",
"2 - something",
"3 more"
];
let regex = /^(\d+)[\.\-\s]*(.*)?$/;
for(let row of rows) {
console.log(regex.exec(row))
}
Anyway, if you know of more separators in the file, add them to the [\.\-\s]* character class.
I've been reading some articles on non-capturing groups on this site and on the net
(such as http://www.regular-expressions.info/brackets.html and http://www.asiteaboutnothing.net/regexp/regex-disambiguation.html, What does the "?:^" regular expression mean?, What is a non-capturing group? What does a question mark followed by a colon (?:) mean?)
I am clear on the meaning of (?:foo). What I am unclear about is (?=foo). Is (?=foo) also always a non-capturing group, or does it depend?
(?=foo) never captures "foo". Look-around assertions (positive and negative lookahead and lookbehind) do not capture; they only check for the presence (or absence) of text.
For example, the regex:
(X(?=\d+))
matches "X" only when there's one or more digits after it. However, these digits are not a part of match group 1.
You can define captures inside the look ahead to capture it. For example, the regex:
(X(?=(\d+)))
matches "X" only when there's one or more digits after it. And these digits are captured in match group 2.
A PHP demo:
<?php
$s = 'X123';
preg_match_all('/(X(?=(\d+)))/', $s, $matches);
print_r($matches);
?>
will print:
Array
(
[0] => Array
(
[0] => X
)
[1] => Array
(
[0] => X
)
[2] => Array
(
[0] => 123
)
)
Lookarounds are always non-capturing and zero-width.
Most groups starting with (? are non-capturing (named groups such as (?<name>foo) are an exception), although only (?:foo) works as a regular group.
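A small sketch of which (?...) constructs end up in the match array. The group name "name" and the test strings are made up for illustration; note that a named group does capture, so it is an exception to any blanket rule about groups that begin with a question mark.

```php
<?php
// (?:a) and (?=b) create no groups; the named group (?<name>b) does.
preg_match('/(?:a)(?=b)(?<name>b)/', 'ab', $m);
print_r($m); // keys: 0, "name" and 1

// Zero-width: the digits are checked but are not part of the match.
preg_match('/X(?=\d+)/', 'X123', $z);
echo $z[0], "\n"; // X
```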
Here are the patterns:
Red,Green (and so on...)
Red (+5.00),Green (+6.00) (and so on...)
Red (+5.00,+10.00),Green (+6.00,+20.00) (and so on...)
Red (+5.00),Green (and so on...)
Each attribute ("Red", "Green") can have 0, 1, or 2 modifiers (shown as "+5.00,+10.00", etc.).
I need to capture each of the attributes and their modifiers as a single string (i.e. "Red (+5.00,+10.00)", "Green (+6.00,+20.00)").
Help?
Another example (PCRE):
((?:Red|Green)(?:\s\((?:\+\d+\.\d+,?)+\))?)
Explanation:
(...) // a capture group
(?:...) // a non-capturing group
Red|Green // matches Red or Green
(?:...)? // an optional non-capturing group
\s // matches any whitespace character
\( // matches a literal (
(?:...)+ // a non-capturing group that can occur one or more times
\+ // matches a literal +
\d+ // matches one or more digits
\. // matches a literal .
\d+ // matches one or more digits
,? // matches an optional comma
\) // matches a literal )
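The pattern explained above can be tried against the four input shapes from the question. This is a minimal sketch; the "(and so on...)" parts are left out of the sample strings and the variable names are illustrative.

```php
<?php
$pattern = '/((?:Red|Green)(?:\s\((?:\+\d+\.\d+,?)+\))?)/';
$inputs = [
    'Red,Green',
    'Red (+5.00),Green (+6.00)',
    'Red (+5.00,+10.00),Green (+6.00,+20.00)',
    'Red (+5.00),Green',
];

$all = [];
foreach ($inputs as $input) {
    preg_match_all($pattern, $input, $matches);
    $all[] = $matches[1]; // group 1 holds each attribute with its modifiers
}
print_r($all);
```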
Update:
Or actually if you just want to extract the data, then
((?:Red|Green)(?:\s\([^)]+\))?)
would be sufficient.
Update 2: As pointed out in your comment, this would match anything in the first part but , and (:
([^,(]+(?:\s\([^)]+\))?)
(does not work, too permissive)
To be more restrictive (allowing only letters, digits and underscores), you can just use \w:
(\w+(?:\s\([^)]+\))?)
Update 3:
I see, the first of my alternatives does not work correctly, but \w works:
$pattern = "#\w+(?:\s\([^)]+\))?#";
$str = "foo (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
$matches = array();
preg_match_all($pattern, $str, $matches);
print_r($matches);
prints
Array
(
[0] => Array
(
[0] => foo (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
Update 4:
Ok, I got something working, please check whether it always works:
(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?
With:
$pattern = "#(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?#";
$str = "5 lb. (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
preg_match_all gives me
Array
(
[0] => Array
(
[0] => 5 lb. (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
Maybe there is a simpler regex, I'm not an expert...
PCRE format:
(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}
Match from PHP:
preg_match_all("/(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}/ims", $text, $matches);
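With lazy .+? groups and an optional comma, the split between val1 and val2 can land somewhere other than the comma. One possible variation, offered as a sketch rather than as the author's pattern, uses negated character classes so each value stops at a comma or closing parenthesis. The extra group called "name" and the sample text are assumptions added for illustration.

```php
<?php
$text = 'Red (+5.00,+10.00),Green (+6.00),Red';

// Each value is "anything except a comma or closing parenthesis".
$pattern = '/(?P<name>Red|Green)(?:\s\((?P<val1>[^,)]+)(?:,(?P<val2>[^,)]+))?\))?/';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // Groups that did not take part in the match may be absent, hence the ?? fallback.
    $rows[] = [$m['name'], $m['val1'] ?? '', $m['val2'] ?? ''];
}
print_r($rows);
```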
Here's my bid:
/
(?:^|,) # Match line beginning or a comma
(?: # parent wrapper to catch multiple "color (+#.##)" patterns
( # grouping pattern for picking off matches
(?:(?:Red|Green),?)+ # match the color prefix
\s\( # space then parenthesis
(?: # wrapper for repeated number groups
(?:\x2B\d+\.\d+) # pattern for the +#.##
,?)+ # end wrapper
\) # closing parenthesis
)+ # end matching pattern
)+ # end parent wrapper
/x
Which translates to:
/(?:^|,)(?:((?:(?:Red|Green),?)+\s\((?:(?:\x2B\d+\.\d+),?)+\))+)+/
EDIT
Sorry, it was only catching the last pattern before. This will catch all matches (or should).