Regex is matching in Javascript but not PCRE


I'm trying to match all fractions or 'evs' and the strings (string1, string2) in the following string with a regex. The strings may contain any number of whitespace characters ('String 1', 'The String 1', 'The String Number 1').
10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1
The following regex works in Javascript but not in PHP. No errors are returned, just 0 results.
(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)
Here's the expected result, other than group 6 and 7 (ran using Javascript):
If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result but with the first string not captured:
As soon as I remove the ? to capture the whole string, there are no results returned. Can somebody work out what's going on here?

In PCRE/PHP, you may use
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]);
}
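The point is that you can't overuse .*? / .+ in the middle of the pattern: with that many unbounded, overlapping quantifiers the engine has to try an enormous number of ways to split the input, which leads to catastrophic backtracking. In PHP this typically exhausts pcre.backtrack_limit, in which case preg_match_all() returns false without raising an error (preg_last_error() then reports PREG_BACKTRACK_LIMIT_ERROR), which would explain the 0 results.
You need to use precise patterns to match the whitespace and non-whitespace fields, and only use .*? / .+? where a field can contain any mix of whitespace and non-whitespace chars.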
Details
(\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can be later accessed using (?1) subroutine): one to three digits, / and then one to three digits, or evs
\s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
((?1)) - Group 3 that matches the same way Group 1 pattern does
\s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
(.+?) - Group 6: any 1 or more chars other than line break chars, as few as possible
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+) - Group 7: 1+ non-whitespaces
\s+((?1))\s+ - 1+ whitespaces, Group 8 with Group 1 pattern, 1+ whitespaces
(\S+) - Group 9: 1+ non-whitespaces
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with Group 1 pattern.
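For comparison, here is a rough Python sketch of the same approach. Python's built-in re module has no (?1) subroutine calls, so the repeated odds sub-pattern is interpolated from a variable instead; the names ODDS, pattern and groups are illustrative and not part of the answer above.

```python
import re

# The repeated "fraction or evs" sub-pattern (stands in for Group 1 / (?1)).
ODDS = r"(?:\d{1,3}/\d{1,3}|evs)"

# Same field layout as the PCRE pattern: precise \s+ / \S+ fields,
# with a single lazy (.+?) only where a field may contain spaces.
pattern = re.compile(
    rf"({ODDS})\s+(\S+)\s+({ODDS})\s+(\S+)\s+({ODDS})\s+(.+?)\s+v\s+(\S+)"
    rf"\s+({ODDS})\s+(\S+)\s+v\s+(\S+)\s+({ODDS})"
)

text = ("10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 "
        "1/1 string1 v string2 1/1")

m = pattern.search(text)
groups = m.groups() if m else ()
print(groups)
```

With the sample string from the question, Group 6 holds "mon 19:45 string1" because it is the only field allowed to span whitespace.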

Related

Regex exclude whitespaces from a group to select only a number

I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
Update: I have a problem with this method. As a rule, I only need to consider digits and ',' between '- EUR' and 'Fee'.

You can use
- EUR\W*(.*?)\W*Fee
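Variations of the regex that might work in different regex engines (the first relies on the \K match reset operator, as in PCRE; the second on a lookbehind of unbounded width, as in .NET):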
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - literal text.
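A minimal Python sketch of the capture-group version (Python's re module supports neither \K nor unbounded lookbehind, so only the first pattern is shown). The sample text is made up, since the asker's real input is not included here.

```python
import re

# Hypothetical sample line; the asker's real input is not shown in the thread.
text = "Payment - EUR   12,50   Fee applied"

# Lazy group between "- EUR" and "Fee"; \W* on both sides eats the padding.
match = re.search(r"- EUR\W*(.*?)\W*Fee", text)
amount = match.group(1) if match else None
print(amount)
```

The surrounding whitespace is consumed by the two \W* parts, so Group 1 holds only the number text.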

You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
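As a hedged illustration, the stricter pattern could be used from Python like this. The sample text and the conversion to float are assumptions for the sake of the example.

```python
import re

# Hypothetical sample line; the asker's real input is not shown in the thread.
text = "Payment - EUR   12,50   Fee applied"

# Group 1 only accepts digits with an optional ",digits" decimal part.
match = re.search(r"- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b", text)

# Assumed post-processing: turn the decimal comma into a dot before parsing.
value = float(match.group(1).replace(",", ".")) if match else None
print(value)
```

Unlike the lazy (.*?) version, this pattern simply fails to match if anything other than a number sits between "- EUR" and "Fee".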

This works; I removed the , from (.) in the test string.

How to get only the first match of a regex Grok filter

goal
I want to retrieve only this string "14" from this message with a logstash Grok
3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00
this is my grok code
grok{
  match => {
    field1 => [
      "(?<number_extract>\d{0}\s\d{1,3}\s{1})"
    ]
  }
}
I would like to match just the first match "14" but my Grok filter returns all matches:
14 20 50

If you need to find the first occurrence of a number that consists of 1, 2 or 3 digits only, you may use
^(?:.*?\s)?(?<number_extract>\d{1,3})(?!\S)
Details
^ - start of string
(?:.*?\s)? - an optional substring of any 0+ chars other than line break chars as few as possible, and then a whitespace (this enables a match at the start of the string if it is there)
(?<number_extract>\d{1,3}) - 1 to 3 digits
(?!\S) - a negative lookahead that makes sure there is a whitespace or end of string immediately to the right (enables a match at the end of the string).
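To check the pattern outside Logstash, a small Python sketch may help. Python spells named groups (?P<name>...) rather than (?<name>...), so the group syntax is adapted; the rest of the pattern is unchanged.

```python
import re

line = "3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00"

# Anchored at the start, so only the first 1-3 digit "word" can match.
pattern = re.compile(r"^(?:.*?\s)?(?P<number_extract>\d{1,3})(?!\S)")

m = pattern.search(line)
number = m.group("number_extract") if m else None
print(number)
```

Because of the ^ anchor, the pattern can match at most once per line, which is what limits the result to the first number.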
Alternative solution
If you know that the number you are looking for is after a date-like field and another field, and you want to force this pre-validation, you may use
^\d+/\d+/\d+\s+\S+\s+(?<number_extract>\d+)
If you do not have to check if the first field is date-like, you may simply use
^\S+\s+\S+\s+(?<number_extract>\d+)
^(?:\S+\s+){2}(?<number_extract>\d+) // Equivalent
Details
^ - start of string
\d+/\d+/\d+ - 1+ digits, /, 1+ digits, /, 1+ digits
\s+ - 1+ whitespaces
\S+ - 1+ chars other than whitespace
\s+ - 1+ whitespaces
(?<number_extract>\d+) - Capturing group "number_extract": 1+ digits.
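The two positional variants can be compared with a short Python sketch. This is again adapted to Python's (?P<name>...) group syntax, and the variable names are illustrative.

```python
import re

line = "3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00"

# Variant 1: require a date-like first field, then skip one more field.
with_date_check = re.compile(r"^\d+/\d+/\d+\s+\S+\s+(?P<number_extract>\d+)")

# Variant 2: skip any two whitespace-separated fields.
skip_two_fields = re.compile(r"^(?:\S+\s+){2}(?P<number_extract>\d+)")

first = with_date_check.search(line).group("number_extract")
second = skip_two_fields.search(line).group("number_extract")
print(first, second)
```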

Regex to Capture rest of the line

I have a regex that captures the following expression
XPT 123A
Now I need to add "something" to my regex to capture the remaining string as a group
XPT 123A I AM VERY HAPPY
So XPT would be group 1, 123A group 2, and I AM VERY HAPPY group 3.
Here is my regex (also here http://regexr.com/4mocf):
^([A-Z]{2,4}).((?=\d)[a-zA-Z\d]{0,4})
EDIT:
I don't want to name my groups (editing because some people thought it was a duplicate of another question).

Assuming Group 3 is optional, you may use
^([A-Z]{2,4}) (\d[a-zA-Z\d]{0,3})(?: (.*))?$
^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$
The \s+ matches any 1+ whitespace chars.
Details
^ - start of string
([A-Z]{2,4}) - Group 1: two, three or four uppercase ASCII letters
\s+ - 1+ whitespaces
(\d[a-zA-Z\d]{0,3}) - Group 2: a digit followed with 0 to 3 ASCII alphanumeric chars
(?:\s+(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
\s+ - 1+ whitespaces
(.*) - Group 3: any 0+ chars other than line break chars as many as possible
$ - end of string
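A quick Python sketch of the \s+ variant, run against both inputs from the question, shows how the optional group behaves (the helper name parse is hypothetical):

```python
import re

pattern = re.compile(r"^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$")

def parse(line):
    """Return the three groups, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None

short = parse("XPT 123A")
full = parse("XPT 123A I AM VERY HAPPY")
print(short)
print(full)
```

When the trailing text is absent, Group 3 did not participate in the match, which Python reports as None.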

Just add the following suffix to your regex to capture the rest of the line:
(?<rest>.+)?$
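A hedged Python check of this suffix appended to the asker's original regex (using Python's (?P<rest>...) spelling for the named group). Because nothing consumes the space after Group 2, the captured rest starts with that space, so in this sketch it is stripped afterwards.

```python
import re

# The asker's pattern plus the suggested suffix, with Python's named-group syntax.
pattern = re.compile(r"^([A-Z]{2,4}).((?=\d)[a-zA-Z\d]{0,4})(?P<rest>.+)?$")

m = pattern.match("XPT 123A I AM VERY HAPPY")
raw_rest = m.group("rest") if m else None

# Assumed clean-up step: drop the separator that ended up in the group.
rest = raw_rest.strip() if raw_rest is not None else None
print(repr(raw_rest), repr(rest))
```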

Regex for a sentence syntax

I am trying to separate a String into different parts that match a specific syntax.
The String I am using as example is Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018.
Currently I have this regex, (\w+) ([0-9]\/[0-9]), (\w+): ([0-9][.][0-9][.]?[0-9]?), which gives me the username, the 5/5, the word Version and the version 1.0.
First, how can I ignore the second (\w+), since it will always be Version and I only need the number after it?
Second question, is it possible to get the big message after the version, then get the date after it?
Output needed:
Username
5/5
1.0
This is a custom message
Sep 25, 2018

You may use
/^(\w+)\s+(\d+\/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*([a-zA-Z]+\s*\d{1,2},\s*\d{4})$/
Details
^ - start of string
(\w+) - Group 1 (username): one or more letters, digits or _
\s+ - 1+ whitespaces
(\d+\/\d+) - Group 2 (5/5)
,\s+ - a comma and 1+ whitespaces
\w+: - 1+ word chars followed with :
\s* - 0+ whitespaces
(\d+(?:\.\d+){1,2}) - Group 3 (version number):
\d+ - 1+ digits
(?:\.\d+){1,2} - 1 or 2 sequences of a . followed with 1+ digits
\s* - 0+ whitespaces
(.*?) - Group 4 (message): any 0+ chars, as few as possible
\s* - 0+ whitespaces
([a-zA-Z]+\s*\d{1,2},\s*\d{4}) - Group 5 (date):
[a-zA-Z]+ - 1+ ASCII letters
\s* - 0+ whitespaces
\d{1,2} - 1 to 2 digits
,\s* - a comma and 0+ whitespaces
\d{4} - 4 digits
$ - end of string.
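The breakdown above can be tried with a short Python sketch. The pattern is the same apart from dropping the /.../ delimiters and the escaped slash, which Python does not need; the sample string is the one from the question without the sentence-ending period, since the pattern is anchored with $.

```python
import re

pattern = re.compile(
    r"^(\w+)\s+(\d+/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*"
    r"([a-zA-Z]+\s*\d{1,2},\s*\d{4})$"
)

text = "Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018"

m = pattern.match(text)
fields = m.groups() if m else ()
for field in fields:
    print(field)
```

The word before the colon is matched by an uncaptured \w+, which is how the answer skips "Version" without giving it a group.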

Try (.*)\s(\d\/\d),\s*Version:\s*(\d+\.\d+)\s*(.+?)\s*(\w{3} \d{1,2}, \d{4})
Capture the groups 1,2,3,4,5 to get the output you needed.

Regular Expression not grouping

Need help with this regex
ABC 130 zlis 02-03/12 N180 Grouping req
A B Csd 130 pain 02/12 I80 alias
(\w+\s{0,3})(\d+)
The regex does not seem to group as I need it to.
Desired output (brackets are the groups I'm trying to detect):
(A B Csd) (130) (pain) (02/12) (I80) (alias)

Try this regex:
([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d-\/]+)\s+([\w ]+)
Explanation:
([a-z ]+?) - match 1+ occurrences(as few as possible) of a letter or a space and capture it as Group1
\s+ - matches 1+ occurrences of a whitespace character
(\d+) - match 1+ occurrences of digits and capture as Group2
\s+ - matches 1+ occurrences of a whitespace character
([a-z]+) - match 1+ occurrences of a letter and Capture as Group 3
\s+ - matches 1+ occurrences of a whitespace character
([\d-\/]+) - match 1+ occurrences of a digit or - or / and capture it as Group4
\s+ - matches 1+ occurrences of a whitespace character
([\w ]+) - match 1+ occurrences of a word-character or a space and capture as Group5
Note that I have used the g, i, m flags for Global matches, Case-insensitive and Multiline respectively.
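A rough Python sketch of this pattern on the two sample lines. Python's re module rejects the character class [\d-\/], so it is written as [\d/-] here, which accepts the same characters; the flags mirror the i and m flags mentioned above, and findall stands in for the g flag.

```python
import re

text = (
    "ABC 130 zlis 02-03/12 N180 Grouping req\n"
    "A B Csd 130 pain 02/12 I80 alias"
)

pattern = re.compile(
    r"([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d/-]+)\s+([\w ]+)",
    re.IGNORECASE | re.MULTILINE,  # MULTILINE kept only to mirror the m flag
)

# One tuple of five groups per input line.
rows = pattern.findall(text)
for row in rows:
    print(row)
```

Note that in this sketch the last two fields of each line ("I80 alias") land together in Group 5, because [\w ]+ also accepts spaces.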
