Separating starting digits with regex - regex

I want to separate the starting digits from strings as
01.text
2 - something
3 more
to get
array (
[0] => 01.text
[1] => 01
[2] text
)
array (
[0] => 2 - something
[1] => 2
[2] something
)
array (
[0] => 3 more
[1] => 3
[2] more
)
I tried a regex pattern of
^(\d+)\.+|\s+|-+(.*?)
but doesn't work as I expected.
My problem is how to match . or - with or without space after the digits.

Your regex uses an alternation which would match either in a capturing group one or more digits followed by a dot or a whitespace character or | in a group any character zero or more times non greedy.
You could update your regex to not use the alternations | and make the quantifier in the second group greedy.
In the first group capture one or more digits, then match your character in a character class followed by another capturing group that would match one or more times any character:
^(\d+)[.\s-]+(.+)
Demo

It's better try to give a pattern to strings that you want to split. I know that sometimes its not possible. So, this Regex match with all cases and give to you the Array you desire
/^(\d+)[\.\-\s]*(.*)?$/
let rows = [
"01.text",
"2 - something",
"3 more"
];
let regex = /^(\d+)[\.\-\s]*(.*)?$/;
for(let row of rows) {
console.log(regex.exec(row))
}
Anyway, if you know more separators in the file add then to the [\.\-\s]*

Related

How to split string by slash which is not between numbers?

How to split string by slash which is not between numbers?
I am using preg_split function below:
$splitted = preg_split('#[/\\\\\_\s]+#u', $string);
Input: "925/123 Black/Jack"
Splitted result now:
[
0 => '925',
1 => '123',
2 => 'Black',
3 => 'Jack'
]
Splitted result I want:
[
0 => '925/123',
1 => 'Black',
2 => 'Jack'
]
You may use
preg_split('#(?:[\s\\\\_]|(?<!\d)/(?!\d))+#u', '925/123 Black/Jack')
See the PHP demo and the regex demo and the regex graph:
Details
(?: - start of a non-capturing group:
[\s\\_] - a whitespace, \ or _
| - or
(?<!\d)/(?!\d) - a / not enclosed with digits
)+ - end of a non-capturing group, repeat 1 or more times.
One option is match 1 or more digits divided by a forward slash with whitespace boundaries on the left and on the right.
Then use SKIP FAIL, and match 1 or more times what is listed in the character class. Note that you don't have to escape the underscore.
(?<!\S)\d+(?:/\d+)+(?!\S)(*SKIP)(*F)|[/\\_\s]+
Explanation
(?<!\S)\d+(?:/\d+)+(?!\S) Match a repeated number of digits between forward slashes
(*SKIP)(*F) Skip
| Or
[/\\_\s]+ Match 1+ occurrences of any of the listed
Regex demo | Php demo
For example
$string = "925/123 Black/Jack";
$pattern = "#(?<!\S)\d+(?:/\d+)+(?!\S)(*SKIP)(*F)|[/\\\\_\s]+#u";
$splitted = preg_split($pattern, $string);
print_r($splitted);
Output
Array
(
[0] => 925/123
[1] => Black
[2] => Jack
)
Your regex is unnecessarily complicated. You need to split your string on:
either a space (maybe more generally - a sequence of white chars),
or a slash
not preceded by a digit (negative lookbehind),
not followed by a digit (negative lookahead).
So the regex you need (enclosed in # chars, with doubled backslashes) is:
#(?<!\\d)/(?!\\d)|\\s+#
Example of code:
$string = "925/123 Black/Jack";
$pattern = "#(?<!\\d)/(?!\\d)|\\s+#";
$splitted = preg_split($pattern, $string);
print_r($splitted);
prints just what you want:
Array
(
[0] => 925/123
[1] => Black
[2] => Jack
)

RegEx for capturing every single character except forward slash

I have the following two example strings:
"taxonomy": "abc/about_abc/bsc/archive/2009/presentations_dec"
"taxonomy": "about/archive/term"
"taxonomy": "_decommisioned/ntp-server.niehs.nih.gov/htdocs/results_status/resstatf"
I have tried with the following RegEx:
"taxonomy": "(\w+[^\/])\/?"?
The goal is to take each of those strings and explode them onto their own separate lines on the forward slash, so term1/term2/term3 equals
term1
term2
term3
I also don't know how many terms there are per line, which is why they are broke up like they are. It could be minimum one, max 7. My fill RegEx looks like this:
( "taxonomy": "(\w+[^\/])?\/?(\w+[^\/])?\/?(\w+[^\/])?\/?(\w+[^\/])?\/?(\w+[^\/])?\/?(\w+[^\/])?\/?(\w+[^\/])?\/?")
How do I adjust my capture group to get everything except the forward slashes?
As mentioned in the comments, in the third string this part ntp-server.niehs.nih.gov which is not matched by \w
But you might simplify your expression by matching not a forward slash by using a negated character class and a repeating pattern that match a forward slash and then again 1+ times not a forward slash.
Then you could split your match on a forward slash.
Pattern
"taxonomy": "\K[^/\n]+(?:/[^/\n]+)+(?=")
Explanation
"taxonomy": Match literally
"\K Match double quote and then forget what was matched using \K
[^/\n]+ Match 1+ times not a forward slash using a negated character class
(?:/[^/\n]+)+ Repeating pattern to match /, then 1+ times not a /
(?=") Positive lookahead to assert what is on the right is a double quote
Demo on regex101 | Php demo
For example, if you use explode in php:
$pattern = '~"taxonomy": "\K[^/\n]+(?:/[^/\n]+)+(?=")~';
$strings = [
'"taxonomy": "abc/about_abc/bsc/archive/2009/presentations_dec"',
'"taxonomy": "about/archive/term"',
'"taxonomy": "_decommisioned/ntp-server.niehs.nih.gov/htdocs/results_status/resstatf"'
];
foreach ($strings as $string) {
preg_match($pattern, $string, $match);
print_r(explode('/', $match[0]));
}
Result:
Array
(
[0] => abc
[1] => about_abc
[2] => bsc
[3] => archive
[4] => 2009
[5] => presentations_dec
)
Array
(
[0] => about
[1] => archive
[2] => term
)
Array
(
[0] => _decommisioned
[1] => ntp-server.niehs.nih.gov
[2] => htdocs
[3] => results_status
[4] => resstatf
)

Filter out a expression from Regex match

I have a regex query which works fine for most of the input patterns but few.
Regex query I have is
("(?!([1-9]{1}[0-9]*)-(([1-9]{1}[0-9]*))-)^(([1-9]{1}[0-9]*)|(([1-9]{1}[0-9]*)( |-|( ?([1-9]{1}[0-9]*))|(-?([1-9]{1}[0-9]*)){1})*))$")
I want to filter out a certain type of expression from the input strings i.e except for the last character for the input string every dash (-) should be surrounded by the two separate integers i.e (integer)(dash)(integer).
Two dashes sharing 3 integers is not allowed even if they have integers on either side like (integer)(dash)(integer)(dash)(integer).
If the dash is the last character of input preceded by the integer that's an acceptable input like (integer)(dash)(end of the string).
Also, two consecutive dashes are not allowed. Any of the above-mentioned formats can have space(s) between them.
To give the gist, these dashes are used in my input string to provide a range.
Some example of expressions that I want to filter out are
1-5-10, 1 - 5 - 10, 1 - - 5, -5
Update - There are some rules which will drive the input string. My job is to make sure I allow only those input strings which follow the format. Rules for the format are -
1. Space (‘ ‘) delimited numbers. But dash line doesn’t need to have a space. For example, “10 20 - 30” or “10 20-30” are all valid values.
2. A dash line (‘-‘) is used to set range (from, to). It also can used to set to the end of job queue list. For example, “100-150 200-250 300-“ is a valid value.
3. A dash-line without start job number is not allowed. For example, “-10” is not allowed.
Thanks
You might use:
^(?:(?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*)(?: (?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*))*(?: [1-9][0-9]*-)?|[1-9][0-9]*-?)[ ]*$
Regex demo
Explanation
^ Assert start of the string
(?: Non capturing group
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0. The space is in a character class [ ] for clarity.
| Or
[1-9][0-9]* Match number > 0
) Close non capturing group
(?:[ ] Non capturing group followed by a space
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0.
| Or
[1-9][0-9]* Match number > 0
) close non capturing group
)* close non capturing group and repeat zero or more times
(?: [1-9][0-9]*-)? Optional part that matches a space followed by a number > 0
| Or
[1-9][0-9]*-? Match a number > 0 followed by an optional dash
) close non capturing group
[ ]*$ Match zero or more times a space and assert the end of the string
NoteIf you want to match zero or more times a space instead of an optional space, you could update [ ]? to [ ]*. You can write [1-9]{1} as [1-9]
After the update the question got quite a lot of complexity. Since some parts of the regex are reused multiple times I took the liberty of working this out in Ruby and cleaned it up afterwards. I'll show you the build process so the regex can be understood. Ruby uses #{variable} for regex and string interpolation.
integer = /[1-9][0-9]*/
integer_or_range = /#{integer}(?: *- *#{integer})?/
integers_or_ranges = /#{integer_or_range}(?: +#{integer_or_range})*/
ending = /#{integer} *-/
regex = /^(?:#{integers_or_ranges}(?: +#{ending})?|#{ending})$/
#=> /^(?:(?-mix:(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?)(?: +(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?))*)(?: +(?-mix:(?-mix:[1-9][0-9]*) *-))?|(?-mix:(?-mix:[1-9][0-9]*) *-))$/
Cleaning up the above regex leaves:
^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$
You can replace [0-9] with \d if you like, but since you used the [0-9] syntax in your question I used it for the answer as well. Keep in mind that if you do replace [0-9] with \d you'll have to escape the backslash in string context. eg. "[0-9]" equals "\\d"
You mention in your question that
Any of the above-mentioned formats can have space(s) between them.
I assumed that this means space(s) are not allowed before or after the actual content, only between the integers and -.
Valid:
15 - 10
1234 -
Invalid:
15 - 10
123
If this is not the case simply add * to the start and end.
^ *... *$
Where ... is the rest of the regex.
You can test the regex in my demo, but it should be clear from the build process what the regex does.
var
inputs = [
'1-5-10',
'1 - 5 - 10',
'1 - - 5',
'-5',
'15-10',
'15 - 10',
'15 - 10',
'1510',
'1510-',
'1510 -',
'1510 ',
' 1510',
' 15 - 10',
'10 20 - 30',
'10 20-30',
'100-150 200-250 300-',
'100-150 200-250 300- ',
'1-2526-27-28-',
'1-25 26-2728-',
'1-25 26-27 28-',
],
regex = /^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$/,
logInputAndMatch = input => {
console.log(`input: "${input}"`);
console.log(input.match(regex))
};
inputs.forEach(logInputAndMatch);

Regex: Regular expression to pick the nth parameter of the response

Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired response would be to pick n
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am enquiring some details from an LTE modem and above is the type of response I receive.
I have created this regex which captures the (n+1)th member under group 2 including the last member, however, I can't seem to work out how to pick the 1st parameter in the approach I have taken.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?
You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is the n-1 position of the expected value.
See the regex demo. This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
See the regex demo
Details
:\s*(?:[^,]*,){min} - same as in the first pattern
(?| - start of a branch reset group where each alternative branch shares the same IDs:
"([^"]*)" - a ", then Group 1 holding any 0+ chars other than " and then a " is just matched
| - or
([^,\s]+)) - (still Group 1): one or more chars other than whitespace and ,.

Trying to create a RegEx for the following patterns

Here are the patterns:
Red,Green (and so on...)
Red (+5.00),Green (+6.00) (and so on...)
Red (+5.00,+10.00),Green (+6.00,+20.00) (and so on...)
Red (+5.00),Green (and so on...)
Each attribute ("Red,"Green") can have 0, 1, or 2 modifiers (shown as "+5.00,+10.00", etc.).
I need to capture each of the attributes and their modifiers as a single string (i.e. "Red (+5.00,+10.00)", "Green (+6.00,+20.00)".
Help?
Another example (PCRE):
((?:Red|Green)(?:\s\((?:\+\d+\.\d+,?)+\))?)
Explanation:
(...) // a capture group
(?:...) // a non-capturing group
Read|Green // matches Red or Green
(?:...)? // an optional non-capturing group
\s // matches any whitespace character
\( // matches a literal (
(?:...)+ // a non-capturing group that can occur one or more times
\+ // matches a literal +
\d+ // matches one or more digits
\. // matches a literal .
\d+ // matches one or more digits
,? // matches an optional comma
\) //matches a literal )
Update:
Or actually if you just want to extract the data, then
((?:Red|Green)(?:\s\([^)]+\))?)
would be sufficient.
Update 2: As pointed out in your comment, this would match anything in the first part but , and (:
([^,(]+(?:\s\([^)]+\))?)
(does not work, too permissive)
to be more restrictive (allowing only characters and numbers, you can just use \w:
(\w+(?:\s\([^)]+\))?)
Update 3:
I see, the first of my alternatives does not work correctly, but \w works:
$pattern = "#\w+(?:\s\([^)]+\))?#";
$str = "foo (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
$matches = array();
preg_match_all($pattern, $str, $matches);
print_r($matches);
prints
Array
(
[0] => Array
(
[0] => foo (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
Update 4:
Ok, I got something working, please check whether it always works:
(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?
With:
$pattern = "#(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?#";
$str = "5 lb. (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
preg_match_all gives me
Array
(
[0] => Array
(
[0] => 5 lb. (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
Maybe there is a simpler regex, I'm not an expert...
PCRE format:
(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}
Match from PHP:
preg_match_all("/(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}/ims", $text, $matches);
Here's my bid:
/
(?:^|,) # Match line beginning or a comma
(?: # parent wrapper to catch multiple "color (+#.##)" patterns
( # grouping pattern for picking off matches
(?:(?:Red|Green),?)+ # match the color prefix
\s\( # space then parenthesis
(?: # wrapper for repeated number groups
(?:\x2B\d+\.\d+) # pattern for the +#.##
,?)+ # end wrapper
\) # closing parenthesis
)+ # end matching pattern
)+ # end parent wrapper
/
Which translates to:
/(?:^|,)(?:((?:(?:Red|Green),?)+\s\((?:(?:\x2B\d+\.\d+),?)+\))+)+/
EDIT
Sorry, it was only catching the last pattern before. This will catch all matches (or should).