I am trying to split a string into parts that match a specific syntax.
The string I am using as an example is:
Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018
Currently I have this regex (\w+) ([0-9]\/[0-9]), (\w+): ([0-9][.][0-9][.]?[0-9]?) which gives me the username, the 5/5, the word Version and the version 1.0.
First, how can I ignore the (\w+)? It will always be Version, and I only need the number after it.
Second, is it possible to get the long message after the version, and then the date after it?
Output needed:
Username
5/5
1.0
This is a custom message
Sep 25, 2018
You may use
/^(\w+)\s+(\d+\/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*([a-zA-Z]+\s*\d{1,2},\s*\d{4})$/
See the regex demo
Details
^ - start of string
(\w+) - Group 1 (username): one or more letters, digits or _
\s+ - 1+ whitespaces
(\d+\/\d+) - Group 2 (5/5)
,\s+ - a comma and 1+ whitespaces
\w+: - 1+ word chars followed with :
\s* - 0+ whitespaces
(\d+(?:\.\d+){1,2}) - Group 3 (version number):
\d+ - 1+ digits
(?:\.\d+){1,2} - 1 or 2 sequences of a . followed with 1+ digits
\s* - 0+ whitespaces
(.*?) - Group 4 (message): any 0+ chars, as few as possible
\s* - 0+ whitespaces
([a-zA-Z]+\s*\d{1,2},\s*\d{4}) - Group 5 (date):
[a-zA-Z]+ - 1+ ASCII letters
\s* - 0+ whitespaces
\d{1,2} - 1 to 2 digits
,\s* - a comma and 0+ whitespaces
\d{4} - 4 digits
$ - end of string.
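To try the pattern outside the online demo, here is a minimal sketch assuming Python's re module. The JavaScript-style /.../ delimiters are dropped and \/ is written as a plain / (Python needs no escape there). The function name parse_status is hypothetical.

```python
import re

# The answer's pattern; Python needs no /.../ delimiters, so \/ is written as /.
STATUS_RE = re.compile(
    r"^(\w+)\s+(\d+/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*"
    r"([a-zA-Z]+\s*\d{1,2},\s*\d{4})$"
)


def parse_status(text):
    """Return (username, ratio, version, message, date), or None if no match."""
    m = STATUS_RE.search(text)
    return m.groups() if m else None


print(parse_status("Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018"))
```

The five groups line up with the five lines of the expected output, and the {1,2} quantifier also lets a version such as 1.0.2 through.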
Try (.*)\s(\d\/\d),\s*Version:\s*(\d+\.\d+)\s*(.+?)\s*(\w{3} \d{1,2}, \d{4})
Capture the groups 1,2,3,4,5 to get the output you needed.
Regex
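A quick Python check of this second pattern, assuming the re module (again writing \/ as /). Unlike the first answer it has no ^/$ anchors, (\d\/\d) accepts single digits only, and (\d+\.\d+) accepts only a two-part version such as 1.0.

```python
import re

# Second answer's pattern: unanchored, single-digit ratio, two-part version.
ALT_RE = re.compile(
    r"(.*)\s(\d/\d),\s*Version:\s*(\d+\.\d+)\s*(.+?)\s*(\w{3} \d{1,2}, \d{4})"
)

m = ALT_RE.search("Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018")
groups = m.groups() if m else None
print(groups)
```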
I have some strings like these:
0015/Cnt.A/2021/EX. Mmj tech
021/Cnt.B/2021/EX.Mm logs
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap pom sc
13/Wnt.A/2021/ LO.Mm pom
1911/Cno.A/2021/PQ Mm ris dMn
and I want to extract the following:
0015/Cnt.A/2021/EX. Mmj
021/Cnt.B/2021/EX.Mm
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap
13/Wnt.A/2021/ LO.Mm
1911/Cno.A/2021/PQ Mm
I have tried this pattern [0-9]{1,}\/[a-zA-Z.\s-]{1,}\/[0-9\s]{1,}\/[a-zA-Z\s]+[\.\s]+[a-zA-Z]{1,} but it can't handle the 4th and 6th strings. Can anyone fix that pattern, and maybe make it more efficient?
Edit:
The strings follow this rule: number/letters with dots or spaces/year/letters with dots or spaces
The pattern to get all text up to the last slash and then only two words separated by whitespace or a . is
.*\/\s*[a-zA-Z]+[\s.]+[a-zA-Z]+
or, if the words may also contain digits or underscores,
.*\/\s*\w+[\s.]+\w+
If you need to keep the initial regex part for stricter validation, use
[0-9]+\/[a-zA-Z.\s-]+\/[0-9\s]+\/\s*\w+[\s.]+\w+
See this demo (or this demo). Details:
.*\/ - any zero or more chars other than line break chars, as many as possible, and then a / char
\s* - zero or more whitespaces
[a-zA-Z]+ - one or more ASCII letters
[\s.]+ - one or more whitespaces/dots
[a-zA-Z]+ - one or more ASCII letters.
\w+ would match one or more letters, digits, or underscores.
Now, to accommodate the number/letter with dot or space/year/letter with dot or space rule:
\d+\/\s*[a-zA-Z]+(?:\.[a-zA-Z]+)*\s*\/\s*[0-9]{4}\s*\/\s*\w+[\s.]+\w+
See this regex demo. Details:
\d+ - one or more digits
\/ - a / char
\s* - zero or more whitespaces
[a-zA-Z]+(?:\.[a-zA-Z]+)* - one or more ASCII letters, then zero or more sequences of a . and one or more ASCII letters
\s*\/\s* - 0+ whitespaces, /, 0+ whitespaces
[0-9]{4} - four digits
\s*\/\s* - 0+ whitespaces, /, 0+ whitespaces
\w+[\s.]+\w+ - one or more word chars, 1+ whitespaces/dots, 1+ word chars.
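A small Python sketch, assuming the re module, that runs the stricter pattern over the six sample lines and keeps only the matched prefix of each:

```python
import re

# The stricter pattern from the answer: number/letters(.letters)/4-digit year/two words.
ENTRY_RE = re.compile(
    r"\d+/\s*[a-zA-Z]+(?:\.[a-zA-Z]+)*\s*/\s*[0-9]{4}\s*/\s*\w+[\s.]+\w+"
)

lines = [
    "0015/Cnt.A/2021/EX. Mmj tech",
    "021/Cnt.B/2021/EX.Mm logs",
    "31/ Cgt.A / 2020 / PK Jap",
    "453/ Nnt.A / 2020 / WK Jap pom sc",
    "13/Wnt.A/2021/ LO.Mm pom",
    "1911/Cno.A/2021/PQ Mm ris dMn",
]

# search() finds the first match in each line; group(0) is the part to keep.
results = [m.group(0) for m in map(ENTRY_RE.search, lines) if m]
for r in results:
    print(r)
```

This reproduces the wanted output, including the 4th and 6th strings that the original attempt missed.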
I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
Update:
I have a problem with this method. The rule is that I only need the digits and ',' between '- EUR' and 'Fee'.
You can use
- EUR\W*(.*?)\W*Fee
See the regex demo.
Variations of the regex that might work in different regex engines:
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - a string.
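A minimal Python sketch of the first pattern, assuming the re module. Python's re has no \K and only allows fixed-width lookbehind, so the two variations above will not compile there; the capture-group version works as-is. The sample text is hypothetical, since the question does not show its real input.

```python
import re

# Hypothetical input: the question does not show its real text.
sample = "Charged - EUR 12,50 Fee applied"

# Group 1 is what lies between "- EUR" and "Fee", with the surrounding
# non-word chars consumed by the two \W* parts.
m = re.search(r"- EUR\W*(.*?)\W*Fee", sample)
amount = m.group(1) if m else None
print(amount)
```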
You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
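Since the question asks for a float, here is a hedged Python sketch (re module assumed) that captures the number with this pattern and swaps the decimal comma for a dot before converting. The sample text is hypothetical.

```python
import re

PATTERN = re.compile(r"- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b")

# Hypothetical input; the question's real text is not shown.
sample = "Charged - EUR 12,50 Fee applied"

m = PATTERN.search(sample)
# Treat "," as the decimal separator and convert to float.
value = float(m.group(1).replace(",", ".")) if m else None
print(value)
```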
Regex demo
This is working; I removed the , from (.) in the test string.
Regex example - working
I'm trying to use a regex to match all fractions or 'evs', and the strings (string1, string2), in the following string. The strings may contain any amount of whitespace ('String 1', 'The String 1', 'The String Number 1').
10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1
The following regex works in JavaScript but not in PHP. No errors are returned, just 0 results.
(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)
Here's the expected result, other than groups 6 and 7 (run using JavaScript):
If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result but with the first string not captured:
As soon as I remove the ? to capture the whole string, there are no results returned. Can somebody work out what's going on here?
In PCRE/PHP, you may use
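$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// preg_* functions need pattern delimiters; ~ is used so / needs no escaping
$regex = '~(\d{1,3}/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]);
}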
See the regex demo
The point is that you can't overuse .*? / .+ in the middle of the pattern, because that leads to catastrophic backtracking.
You need to use precise patterns to match whitespace and non-whitespace fields, and only use .*? / .+? where the fields can contain any amount of whitespace and non-whitespace chars.
Details
(\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can be later accessed using (?1) subroutine): one to three digits, / and then one to three digits, or evs
\s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
((?1)) - Group 3 that matches the same way Group 1 pattern does
\s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
(.+?) - Group 6: any 1 or more chars other than line break chars, as few as possible
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+) - Group 7: 1+ non-whitespaces
\s+((?1))\s+ - 1+ whitespaces, Group 8 with Group 1 pattern, 1+ whitespaces
(\S+) - Group 9: 1+ non-whitespaces
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with Group 1 pattern.
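For readers outside PHP, here is a sketch of the same structure in Python. Python's built-in re module does not support PCRE subroutine calls such as (?1), so this version assumes it is acceptable to paste the Group 1 pattern in wherever (?1) appears; the group numbering stays the same.

```python
import re

# Group 1's pattern. PCRE reuses it via (?1); Python's re cannot, so it is inlined.
FRAC = r"(\d{1,3}/\d{1,3}|evs)"

PATTERN = re.compile(
    FRAC + r"\s+(\S+)\s+" + FRAC + r"\s+(\S+)\s+" + FRAC
    + r"\s+(.+?)\s+v\s+(\S+)\s+" + FRAC
    + r"\s+(\S+)\s+v\s+(\S+)\s+" + FRAC
)

text = "10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1"
m = PATTERN.search(text)
print(m.groups() if m else None)
```

Only Group 6 uses a lazy .+?, so the engine has a single flexible field to try, which is what keeps backtracking in check.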
I've got some strings like so
2020-03-05 11:23:25: zone 10 type Interior name 'Study PIR'
2020-03-05 11:57:15: zone 13 type Entry/Exit 1 name 'Front Door'
I've got the regex below, which works for the first string; however, I'm not sure how to get the product group to match the full group "Entry/Exit 1". The number can range from 1 to 100.
(?<Date>[0-9]{4}-[0-2][1-9]-[0-2][1-9]) (?<Time>2[0-3]|[01][0-9]:[0-5][0-9]:[0-5][0-9]): (?<msgType>\w+) (?<id>[0-9]+) (?<type>\w+) (?<product>\w+) \w+ (?<deviceName>'([^']*)')
Any ideas how I can modify this to match?
Your product group pattern should be
(?<product>\w+(?:\/\w+\s+\d+)?)
See the regex demo
Details
\w+ - 1+ word chars
(?:\/\w+\s+\d+)? - an optional sequence of
\/ - a / char
\w+ - 1+ word chars
\s+ - 1+ whitespaces
\d+ - 1+ digits.
If the format is unknown, or does not fit the above description, just use (?<product>.*?), see demo.
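A minimal Python check of the new product group, assuming the re module. To stay focused on that group, it only uses the part of the question's pattern after the timestamp, and Python's re spells named groups (?P<name>...) rather than (?<name>...).

```python
import re

# Tail of the question's pattern with the answer's product group swapped in.
TAIL_RE = re.compile(
    r"(?P<msgType>\w+) (?P<id>[0-9]+) (?P<type>\w+) "
    r"(?P<product>\w+(?:/\w+\s+\d+)?) \w+ (?P<deviceName>'([^']*)')"
)

lines = [
    "2020-03-05 11:23:25: zone 10 type Interior name 'Study PIR'",
    "2020-03-05 11:57:15: zone 13 type Entry/Exit 1 name 'Front Door'",
]

products = [TAIL_RE.search(line).group("product") for line in lines]
print(products)
```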
I'm trying to build a regex where it accepts domain names with the following conditions:
Allows DNS names (only hyphens, periods and alphanumeric characters allowed) up to 255 characters.
Hyphens can only appear between letters.
It should start with a letter and end with a letter, and have a minimum of 3 characters (letters and periods are mandatory, hyphens are optional).
Each label (the part before a period) should be at most 63 characters long.
Cases that should pass:
a.b.c
a-a.b
Cases that should not pass
a-.b
qwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwerhhg.v
aaaa
aaa-a
What I have built looks like this:
^(([a-zA-z0-9][A-Z0-9a-z-]{1,61}[a-zA-Z0-9][.])+[a-zA-Z0-9]+)$
But this does not accept a.b.c
You may use
^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$
See the regex demo here.
Pattern details
^ - start of string
(?=.{1,255}$) - the whole string should have 1 to 255 chars
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . followed by a . or the end of string
[a-zA-Z0-9]+ - 1 or more alphanumeric chars
(?: - start of a non-capturing group:
- - a hyphen
[a-zA-Z0-9]+ - 1+ alphanumeric chars
)* - zero or more repetitions
(?: - start of a non-capturing group...
[.] - a dot
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . followed by a . or the end of string
[a-zA-Z0-9]+ - 1+ alphanumeric chars
(?:-[a-zA-Z0-9]+)* - 0 or more repetitions of a - followed with 1+ alphanumeric chars
)+ -... 1 or more times
(?: - start of a non-capturing group...
[.] - a dot
[a-zA-Z0-9-]* - 0+ alphanumeric or - chars
[a-zA-Z0-9] - an alphanumeric char (no hyphens at the end)
)? -... 1 or 0 times (it is optional)
$ - end of string.
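A minimal Python sketch, assuming the re module, that runs this pattern against the cases listed in the question; is_valid is a hypothetical helper name.

```python
import re

# The answer's pattern, split over three raw strings for readability.
DOMAIN_RE = re.compile(
    r"^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*"
    r"(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+"
    r"(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$"
)


def is_valid(name):
    return DOMAIN_RE.match(name) is not None


should_pass = ["a.b.c", "a-a.b"]
should_fail = [
    "a-.b",
    "qwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwerhhg.v",
    "aaaa",
    "aaa-a",
]

for name in should_pass + should_fail:
    print(name, is_valid(name))
```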
You can use the following regex:
/^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])(?:\.[A-Z\d]+){1,2}$/im
Details:
^ - Start of the string.
(?=[A-Z]) - Positive lookahead: The whole string must start with a letter.
( - A capturing group - the domain name.
(?: - Start of a non-capturing group, needed due to the following quantifier.
[A-Z\d] - The first alternative: Either a letter or a digit.
| - Or.
(?<=[A-Z])-(?=[A-Z]) - The second alternative: A hyphen, preceded by a letter and followed by a letter.
) - End of the non-capturing group.
{1,63} - This group (either alternative) must occur 1 to 63 times.
) - End of the capturing group.
(?<=[A-Z]) - Positive lookbehind: The capturing group just matched (domain name) must end with a letter.
(?: - A non-capturing group, also needed due to the following quantifier.
\.[A-Z\d]+ - A dot and a sequence of letters or digits.
) - End of the non-capturing group.
{1,2} - This group must occur 1 or 2 times.
$ - End of the string.
You should definitely use the i (case-insensitive) option, and if you check a number of strings, each on a separate line, also the m (multiline) option.
I didn't include any test for the whole length, but you didn't include it either.
I think the main task here was to show how to match the case your regex failed on.
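A Python sketch of this second answer, assuming the re module: re.IGNORECASE stands in for the i flag and re.MULTILINE for m, so one string holding several names (one per line) can be scanned at once. The fixed-width lookbehinds (?<=[A-Z]) are supported by Python's re.

```python
import re

DOMAIN_RE = re.compile(
    r"^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])(?:\.[A-Z\d]+){1,2}$",
    re.IGNORECASE | re.MULTILINE,
)

# The question's cases, one per line.
names = "\n".join([
    "a.b.c",
    "a-a.b",
    "a-.b",
    "qwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwerhhg.v",
    "aaaa",
    "aaa-a",
])

# group(0) is the whole line; group(1) would be only the first label.
valid = [m.group(0) for m in DOMAIN_RE.finditer(names)]
print(valid)
```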