Regular Expression not grouping - regex

Need help with this regex
ABC 130 zlis 02-03/12 N180 Grouping req
A B Csd 130 pain 02/12 I80 alias
(\w+\s{0,3})(\d+)
The regex does not seem to group as I need it to.
Desired output; the brackets mark the groups I'm trying to detect.
(A B Csd) (130) (pain) (02/12) (I80) (alias)

Try this regex:
([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d-\/]+)\s+([\w ]+)
Explanation:
([a-z ]+?) - match 1+ occurrences (as few as possible) of a letter or a space and capture them as Group 1
\s+ - match 1+ occurrences of a whitespace character
(\d+) - match 1+ digits and capture them as Group 2
\s+ - match 1+ occurrences of a whitespace character
([a-z]+) - match 1+ letters and capture them as Group 3
\s+ - match 1+ occurrences of a whitespace character
([\d-\/]+) - match 1+ occurrences of a digit, - or / and capture them as Group 4
\s+ - match 1+ occurrences of a whitespace character
([\w ]+) - match 1+ occurrences of a word character or a space and capture them as Group 5
Note that I have used the g, i, m flags for Global matches, Case-insensitive and Multiline respectively.
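For anyone trying this outside a regex tester, here is a minimal Python sketch of the same pattern. The name split_fields is only illustrative. Python's re rejects \d-\/ inside a character class as a bad range, so the hyphen is moved to the end of the class, and the m flag is dropped because the pattern has no anchors.

```python
import re

# Same pattern as above, with the hyphen moved to the end of the class
# ([\d/-]) because Python's re rejects "\d-\/" as a character range.
FIELDS = re.compile(
    r"([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d/-]+)\s+([\w ]+)",
    re.IGNORECASE,
)

def split_fields(line):
    """Return the five captured groups, or None if the line does not match."""
    match = FIELDS.search(line)
    return match.groups() if match else None

for row in ("ABC 130 zlis 02-03/12 N180 Grouping req",
            "A B Csd 130 pain 02/12 I80 alias"):
    print(split_fields(row))
```

Note that the last group captures I80 alias together, whereas the question asked for them separately. Ending the pattern with (\w+)\s+(.+) instead of ([\w ]+) is one possible way to split them.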

Related

Regex get certain information

I've stumbled with certain types of rows.
I parse this information
195/75 R 16 C X Wonder Van 110/108R 10PR Tourador
The groups, which I need
I've got the following regex
([0-9]+)?\/([0-9]+)\s*\w\s*([0-9]+(?:\.\d+)?)\s*(C\s+)?(.+\s+?(?=[0-9]{2,3}|(\d{2,3}\/\d{2,3})))(?:(\d{2,3}\/\d{2,3})|(\d{2,3}))\s*(\w)(.*)
It works nicely for all kinds of rows, e.g.
225/55 R18 X Speed TU1 98V Toradfor
225/50 R 16 X Wonder TH1 96W XL Tourador
195/75 R 16 C X Wonder Van 110/108R 8PR Tourador
However, it doesn't work for
195/75 R 16 C X Wonder Van 110/108R 10PR Tourador
because of 10PR, where 10 consists of 2 digits
how it works now
Thank you!
In your pattern you use alternations (|) that can match and capture unrelated parts of the strings.
What you could do instead is use anchors and an optional capture group.
For all the given example strings you might use:
^(\d+)\/(\d+)\s+[A-Z]*\s*(\d+)\s*([A-Z])(.*?)(\d+\/\d+([A-Z]+))?\s+(\d+[A-Z]+\s+.*)$
The pattern in parts:
^ Start of string
(\d+)\/(\d+)\s+ Capture 1+ digits in each of two groups separated by /, then match 1+ whitespace chars
[A-Z]*\s* Match optional chars A-Z and optional whitespace chars
(\d+)\s* Capture 1+ digits in a group and match optional whitespace chars
([A-Z]) Capture a single char A-Z in a group
(.*?) Capture as few as possible chars in a group
( Capture group
\d+\/\d+ Match 1+ digits / and 1+ digits
([A-Z]+) Capture 1+ chars A-Z
)? Close the capture group and make it optional
\s+ Match 1+ whitespace chars
(\d+[A-Z]+\s+.*) Capture group, match 1+ digits, 1+ chars A-Z, 1+ whitespace chars and the rest of the line
$ End of string
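To check the pattern against the rows from the question, a small Python sketch could look like this. The names TYRE and parse_row are made up for the example; the pattern itself is unchanged.

```python
import re

# The pattern from the answer, unchanged; groups 6 and 7 are optional.
TYRE = re.compile(
    r"^(\d+)\/(\d+)\s+[A-Z]*\s*(\d+)\s*([A-Z])(.*?)"
    r"(\d+\/\d+([A-Z]+))?\s+(\d+[A-Z]+\s+.*)$"
)

rows = [
    "225/55 R18 X Speed TU1 98V Toradfor",
    "225/50 R 16 X Wonder TH1 96W XL Tourador",
    "195/75 R 16 C X Wonder Van 110/108R 8PR Tourador",
    "195/75 R 16 C X Wonder Van 110/108R 10PR Tourador",
]

def parse_row(row):
    """Return the eight groups (None for an absent optional group)."""
    match = TYRE.match(row)
    return match.groups() if match else None

for row in rows:
    print(parse_row(row))
```

Group 5 keeps its surrounding spaces (it is a lazy .*?), so you will probably want to strip it before using it.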

Regex exclude whitespaces from a group to select only a number

I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
** Update
I have a problem with this method, I only need to consider numbers and ',' between '- EUR' and 'Fee' as rule.
You can use
- EUR\W*(.*?)\W*Fee
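Variations that rely on engine-specific features (\K is supported by PCRE, Perl and Ruby; the unbounded lookbehind needs an engine such as .NET or JavaScript with ES2018 lookbehind support):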
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - literal text.
You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
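A quick Python sketch of both capture-group patterns follows. The sample text is hypothetical, since the question's original input is not shown. Python's re supports neither \K nor unbounded lookbehind, so only the capture-group forms are used.

```python
import re

# Hypothetical sample line; the original question's text is not shown.
text = "Payment - EUR   1,25   Fee"

# Variant 1: capture whatever sits between "- EUR" and "Fee",
# with surrounding non-word characters (spaces here) left outside the group.
loose = re.search(r"- EUR\W*(.*?)\W*Fee", text)

# Variant 2: capture only digits with an optional ",digits" decimal part.
strict = re.search(r"- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b", text)

print(loose.group(1))
print(strict.group(1))

# Convert the comma decimal to a float.
amount = float(strict.group(1).replace(",", "."))
print(amount)
```

The second variant is the safer choice if anything other than a number can appear between - EUR and Fee, because it fails to match instead of capturing unexpected text.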
This works once I removed the , from (.) in the test string.

Regex is matching in Javascript but not PCRE

I'm trying to match all fractions or 'evs' and the strings (string1, string2) in the following string with regex. The strings may contain any number of white spaces ('String 1', 'The String 1', 'The String Number 1').
10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1
The following regex works in JavaScript but not in PHP. No errors are returned, just 0 results.
(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)
Here's the expected result, other than groups 6 and 7 (run using JavaScript):
If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result but with the first string not captured:
As soon as I remove the ? to capture the whole string, there are no results returned. Can somebody work out what's going on here?
In PCRE/PHP, you may use
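$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// preg_* patterns need delimiters (~ here); (?1) re-runs the Group 1 sub-pattern.
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]);
}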
The point is that you should not overuse .*? / .+ in the middle of a pattern: it leads to catastrophic backtracking. When PCRE hits its backtrack limit, preg_match_all fails without raising an error, which would explain the 0 results you see.
You need to use precise patterns to match the whitespace and non-whitespace fields, and only use .*? / .+? where a field can contain any amount of whitespace and non-whitespace chars.
Details
(\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can be later accessed using (?1) subroutine): one to three digits, / and then one to three digits, or evs
\s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
((?1)) - Group 3 that matches the same way Group 1 pattern does
\s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
(.+?) - Group 6: any 1 or more chars other than line break chars, as few as possible
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+) - Group 7: 1+ non-whitespaces
\s+((?1))\s+ - 1+ whitespaces, Group 8 with Group 1 pattern, 1+ whitespaces
(\S+) - Group 9: 1+ non-whitespaces
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with Group 1 pattern.
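If you want to try the same idea in Python, note that its re module has no (?1) subroutine call. A rough equivalent is to keep the shared sub-pattern in a variable and interpolate it. The names ODDS and PATTERN are made up for this sketch.

```python
import re

# Python's re has no (?1) subroutine call, so the shared sub-pattern is kept
# in a variable and interpolated wherever the PCRE version uses ((?1)).
ODDS = r"(?:\d{1,3}/\d{1,3}|evs)"

PATTERN = re.compile(
    rf"({ODDS})\s+(\S+)\s+({ODDS})\s+(\S+)\s+({ODDS})\s+(.+?)"
    rf"\s+v\s+(\S+)\s+({ODDS})\s+(\S+)\s+v\s+(\S+)\s+({ODDS})"
)

text = ("10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 "
        "string1 v string2 1/1")

match = PATTERN.search(text)
print(match.groups())
```

As in the PHP version, only Group 6 uses .+? and can therefore span spaces; the \S+ fields would need the same treatment if names like The String 1 can appear there.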

Task for matching floating point numbers

Task:
MATCH:
3.45
5,4
.45
3e4
,54
4
4.
4,
DON'T MATCH:
4,5e
2e
.3.
2e,4
,4.
d34
2.45t
2,45.
So far I have come up with the following:
(?<=\s|^)[-+]?(?:(?:[.,]?\d+[.,]?\d*[eE]\d+(?!\w|[.,]))|[.,]?\d+[.,]?\d*(?!\w|[.,]))\b
That works for almost everything except the last 2 numbers (4. and 4,), and I got stuck.
You may use
(?<!\S)[-+]?[0-9]*(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.])(?!\S)
Details
(?<!\S) - start of string or a whitespace must appear immediately to the left
[-+]? - an optional + or -
[0-9]* - 0+ digits
(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.]) - either
[.,]?[0-9]+(?:[eE][-+]?[0-9]+)? - an optional . or ,, then 1+ digits, then an optional sequence of e or E followed by an optional + or - and 1+ digits
| - or
(?<=\d)[,.] - a dot or comma only if preceded with a digit (to avoid matching standalone . or ,)
(?!\S) - end of string or a whitespace must appear immediately to the right.
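Here is a small Python check of this pattern against the two lists from the question. All tokens are joined into one whitespace-separated text, which is the situation the lookarounds are written for; the variable names are just for the example.

```python
import re

NUMBER = re.compile(
    r"(?<!\S)[-+]?[0-9]*(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.])(?!\S)"
)

should_match = ["3.45", "5,4", ".45", "3e4", ",54", "4", "4.", "4,"]
should_fail = ["4,5e", "2e", ".3.", "2e,4", ",4.", "d34", "2.45t", "2,45."]

# One whitespace-separated text containing every token from both lists.
text = " ".join(should_match + should_fail)

# The pattern has no capturing groups, so findall returns whole matches.
found = NUMBER.findall(text)
print(found)
```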
You could use an alternation to match 1+ digits followed by a dot or comma and 0+ digits or match the Ee part followed by 1+ digits.
Or match starting with a dot or comma followed by 1+ digits.
If this is the only thing to match on the line, you could use anchors ^ and $ or use lookarounds to assert that there are no non whitespace chars on the left and right.
(?<!\S)(?:\d+(?:[.,]\d*|[eE]\d+)?|[.,]\d+)(?!\S)
Pattern parts
(?<!\S) Assert that what is directly to the left is not a non-whitespace char
(?: Non capturing group
\d+ Match 1+ digits
(?: Non capturing group
[.,]\d* Match either . or , and 0+ digits
| Or
[eE]\d+ Match e or E and 1+ digits
)? Close group and make it optional
| Or
[.,]\d+ Match . or , and 1+ digits
) Close group
(?!\S) Assert that what is directly to the right is not a non-whitespace char
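The alternation-based pattern can be checked token by token in Python, for example with fullmatch. The helper name is_number is an assumption for this sketch.

```python
import re

SIMPLE = re.compile(r"(?<!\S)(?:\d+(?:[.,]\d*|[eE]\d+)?|[.,]\d+)(?!\S)")

def is_number(token):
    """True when the whole token matches the alternation-based pattern."""
    return SIMPLE.fullmatch(token) is not None

accepted = [t for t in ("3.45", "5,4", ".45", "3e4", ",54", "4", "4.", "4,")
            if is_number(t)]
rejected = [t for t in ("4,5e", "2e", ".3.", "2e,4", ",4.", "d34", "2.45t", "2,45.")
            if not is_number(t)]

print(accepted)
print(rejected)
```

This pattern covers the task's lists, but it is stricter than the task requires in two ways: it accepts no leading sign, and it does not allow a decimal part followed by an exponent (3.4e5 is rejected).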

Regex : Everything in group except white space

I have this regex right here :
^(#include rem\(\s*(.*)),\s*(.*)\)
That matches this string :
#include rem( padding-top, $alert-padding );
I want the group with $alert-padding to ignore the white space at the end. I tried doing :
^(#include rem\(\s*(.*)),\s*(/S)\)
replace the .* by /S but it doesn't match.
You can play around with the regex here :
https://regex101.com/r/9rouVU/1/
You may use \S+ (with a backslash, not /S) to match 1 or more non-whitespace characters:
^(#include rem\(\s*(\S+))\s*,\s*(\S+)\s*\)
Details:
^ - start of string
(#include rem\(\s*(\S+)) - Group 1 capturing:
#include rem\( - a literal substring #include rem(
\s* - 0+ whitespaces
(\S+) - Group 2 capturing 1+ non-whitespace symbols
\s*,\s* - 0+ whitespaces, , and again 0+ whitespaces
(\S+) - Group 3 capturing 1+ non-whitespace symbols
\s* - 0+ whitespaces
\) - a literal ).
You can make the match in the second group lazy and then match for further optional whitespace:
^(#include rem\(\s*(.*)),\s*(.*?)\s*\)
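To see the difference between the original pattern and the two fixes, here is a short Python comparison on the string from the question. The variable names are illustrative only.

```python
import re

line = "#include rem( padding-top, $alert-padding );"

# Original pattern: the greedy last group keeps the trailing space.
original = re.compile(r"^(#include rem\(\s*(.*)),\s*(.*)\)")
# Fix 1: \S+ cannot contain whitespace at all.
non_space = re.compile(r"^(#include rem\(\s*(\S+))\s*,\s*(\S+)\s*\)")
# Fix 2: lazy last group, trailing whitespace matched outside the group.
lazy = re.compile(r"^(#include rem\(\s*(.*)),\s*(.*?)\s*\)")

for name, pattern in (("original", original),
                      ("non_space", non_space),
                      ("lazy", lazy)):
    print(name, repr(pattern.match(line).group(3)))
```

The \S+ version only works while the arguments contain no spaces; the lazy version also copes with values that contain spaces in the middle.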