Regex for this "001/Cnt.A/2021/EX.Dng" pattern

I have some strings like the ones below:
0015/Cnt.A/2021/EX. Mmj tech
021/Cnt.B/2021/EX.Mm logs
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap pom sc
13/Wnt.A/2021/ LO.Mm pom
1911/Cno.A/2021/PQ Mm ris dMn
and I want to extract output like this:
0015/Cnt.A/2021/EX. Mmj
021/Cnt.B/2021/EX.Mm
31/ Cgt.A / 2020 / PK Jap
453/ Nnt.A / 2020 / WK Jap
13/Wnt.A/2021/ LO.Mm
1911/Cno.A/2021/PQ Mm
I have tried this pattern [0-9]{1,}\/[a-zA-Z.\s-]{1,}\/[0-9\s]{1,}\/[a-zA-Z\s]+[\.\s]+[a-zA-Z]{1,} but it can't handle the 4th and 6th strings. Can anyone fix that pattern, and maybe make it more efficient?
edited:
There is a rule like this pattern -> number/letter with dot or space/year/letter with dot or space

To match all text up to the last slash and then only two words separated with whitespace or a ., use either
.*\/\s*[a-zA-Z]+[\s.]+[a-zA-Z]+
or
.*\/\s*\w+[\s.]+\w+
If you need to keep the initial regex part for stricter validation, use
[0-9]+\/[a-zA-Z.\s-]+\/[0-9\s]+\/\s*\w+[\s.]+\w+
See this demo (or this demo). Details:
.*\/ - any zero or more chars other than line break chars, as many as possible, and then a / char
\s* - zero or more whitespaces
[a-zA-Z]+ - one or more ASCII letters
[\s.]+ - one or more whitespaces/dots
[a-zA-Z]+ - one or more ASCII letters.
\w+ would match one or more letters, digits, or underscores.
Now, accommodating for the number/letter with dot or space/year/letter with dot or space rule:
\d+\/\s*[a-zA-Z]+(?:\.[a-zA-Z]+)*\s*\/\s*[0-9]{4}\s*\/\s*\w+[\s.]+\w+
See this regex demo. Details:
\d+ - one or more digits
\/ - a / char
\s* - zero or more whitespaces
[a-zA-Z]+(?:\.[a-zA-Z]+)* - one or more ASCII letters, then zero or more sequences of a . and one or more ASCII letters
\s*\/\s* - 0+ whitespaces, /, 0+ whitespaces
[0-9]{4} - four digits
\s*\/\s* - 0+ whitespaces, /, 0+ whitespaces
\w+[\s.]+\w+ - one or more word chars, 1+ whitespaces/dots, 1+ word chars.
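As a quick check, here is a minimal Python sketch that runs the loose and the strict pattern against the six sample strings from the question. The helper name `extract` is made up for illustration, and Python's `re` module is only an assumption, since the question does not say which regex engine is used.

```python
import re

samples = [
    "0015/Cnt.A/2021/EX. Mmj tech",
    "021/Cnt.B/2021/EX.Mm logs",
    "31/ Cgt.A / 2020 / PK Jap",
    "453/ Nnt.A / 2020 / WK Jap pom sc",
    "13/Wnt.A/2021/ LO.Mm pom",
    "1911/Cno.A/2021/PQ Mm ris dMn",
]

# Loose pattern: everything up to the last slash, then two words.
loose = re.compile(r".*\/\s*[a-zA-Z]+[\s.]+[a-zA-Z]+")

# Strict pattern: number / letters with dots / 4-digit year / two words.
strict = re.compile(
    r"\d+\/\s*[a-zA-Z]+(?:\.[a-zA-Z]+)*\s*\/\s*[0-9]{4}\s*\/\s*\w+[\s.]+\w+"
)


def extract(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


loose_results = [extract(loose, s) for s in samples]
strict_results = [extract(strict, s) for s in samples]

for source, result in zip(samples, strict_results):
    print(f"{source!r} -> {result!r}")
```

Both patterns give the same result on these samples; the strict one additionally rejects lines that do not follow the number/letters/year layout.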

Related

Regex exclude whitespaces from a group to select only a number

I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
** Update
I have a problem with this method, I only need to consider numbers and ',' between '- EUR' and 'Fee' as rule.
You can use
- EUR\W*(.*?)\W*Fee
See the regex demo.
Variations of the regex that might work in different regex engines:
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - a string.
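A minimal Python sketch of the capturing-group version follows. The input string is hypothetical, since the original test text is not shown in the question. Python's `re` module supports neither `\K` nor variable-length lookbehind, so of the three variants only the capturing-group one is used here.

```python
import re

# Hypothetical input; the original test string is not part of the question.
text = "Payment - EUR 12,50 Fee"

# Group 1 captures whatever sits between "- EUR" and "Fee",
# with surrounding non-word chars (here: spaces) left out.
m = re.search(r"- EUR\W*(.*?)\W*Fee", text)
amount = m.group(1) if m else None
print(amount)
```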
You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
Regex demo
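For comparison, a short Python sketch of the stricter pattern, which only accepts digits with an optional comma part. The sample strings and the helper name `fee_amount` are made up for illustration.

```python
import re

# Stricter variant: group 1 must look like a number such as 12 or 12,50.
pattern = re.compile(r"- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b")


def fee_amount(text):
    """Return the captured number as a string, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


# Hypothetical inputs; the original test string is not part of the question.
print(fee_amount("Payment - EUR 12,50 Fee"))
print(fee_amount("Payment - EUR: 7 Fee"))
print(fee_amount("Payment - EUR n/a Fee"))
```

Unlike the (.*?) version, this one returns no match when the text between "- EUR" and "Fee" is not a number.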
This is working; I removed the , from (.) in the test string.

Regex exclude trailing text from company names

CURRENTLY
I am trying to match valid company names from strings, with 4 conditions:
the name can ONLY contain alphanumeric characters + spaces + hyphens
the name can contain a hyphen (inside the name)
there are company suffixes that should be excluded from the company name i.e. Pty Ltd, Pty. Ltd., Limited, and Ltd.
If there are additional matches on the same line, these are to be excluded
What I am trying to achieve:
My regex so far:
(?:\s|^)([a-zA-Z0-9]+[a-zA-Z0-9\s-]*?[a-zA-Z0-9]+)(?: Pty Ltd| Ltd(\.){0,1}| Limited){0,1}(?:\s|$)
ISSUES
https://regex101.com/r/Gpbdln/4
It seems I am struggling with:
Excluding the suffixes to be ignored
Making the capture include spaces for the company name (while at the same time excluded suffixes)
I have been stuck on this for over an hour and would appreciate some help.
You may use
^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)
See the regex demo
If you only need to get matches that do not span across lines, replace \s with \h or [\p{Zs}\t] if supported, or [^\S\r\n], to only match horizontal whitespaces.
Details
^ - start of string
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?:[\s-]+[a-zA-Z0-9]+)*? - 0 or more (but as few as possible) occurrences of
[\s-]+ - 1+ whitespaces or hyphens
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$) - immediately to the right, there must be
(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)? - an optional occurrence of a sequence of patterns:
\s+ - 1+ whitespaces
(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]) - any of
(?:Pty\.?\s+)?Ltd\.?| - an optional sequence of Pty, an optional dot and then 1+ whitespaces and then Ltd string and an optional . char, or
Limited| - Limited string, or
[a-zA-Z0-9]*[^a-zA-Z0-9\s] - any 0 or more ASCII alphanumeric chars followed with a char other than whitespace and alphanumeric char
.* - the rest of the string
$ - end of string.
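Here is a small Python sketch of how this pattern behaves. The company names are invented for illustration (the asker's test data lives behind the regex101 link), and `company_name` is a hypothetical helper.

```python
import re

pattern = re.compile(
    r"^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?"
    r"(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)"
)


def company_name(line):
    """Return the company name without suffix or trailing text, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None


# Hypothetical sample inputs.
lines = [
    "Acme Widgets Pty Ltd",
    "Foo-Bar Limited",
    "Smith Jones Ltd.",
    "Beta Pty. Ltd.",
    "Plain Name",
    "Acme Corp (formerly X)",
]
names = [company_name(line) for line in lines]
for line, name in zip(lines, names):
    print(f"{line!r} -> {name!r}")
```

The lazy quantifier makes the match stop at the first position where the lookahead succeeds, which is why the suffix is never consumed.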

Regex to match variable length, spaces and special chars?

I've got some strings like so
2020-03-05 11:23:25: zone 10 type Interior name 'Study PIR'
2020-03-05 11:57:15: zone 13 type Entry/Exit 1 name 'Front Door'
I've got the regex below, which works for the first string. However, I'm not sure how to get the product group to match the full value "Entry/Exit 1". The number can range from 1 to 100.
(?<Date>[0-9]{4}-[0-2][1-9]-[0-2][1-9]) (?<Time>2[0-3]|[01][0-9]:[0-5][0-9]:[0-5][0-9]): (?<msgType>\w+) (?<id>[0-9]+) (?<type>\w+) (?<product>\w+) \w+ (?<deviceName>'([^']*)')
Any ideas how I can modify this to match?
Your product group pattern should be
(?<product>\w+(?:\/\w+\s+\d+)?)
See the regex demo
Details
\w+ - 1+ word chars
(?:\/\w+\s+\d+)? - an optional sequence of
\/ - a / char
\w+ - 1+ word chars
\s+ - 1+ whitespaces
\d+ - 1+ digits.
If the format is unknown, or does not fit the above description, just use (?<product>.*?), see demo.
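A minimal Python sketch of the product group in context. Two assumptions: Python's `re` needs the `(?P<name>...)` spelling for named groups instead of `(?<name>...)`, and the surrounding pattern is simplified to the `zone ... type ... name ...` part rather than the asker's full date/time regex.

```python
import re

# Simplified pattern: only the zone/type/name part of each log line.
pattern = re.compile(
    r"zone (?P<id>[0-9]+) type (?P<product>\w+(?:\/\w+\s+\d+)?) "
    r"name (?P<deviceName>'[^']*')"
)

logs = [
    "2020-03-05 11:23:25: zone 10 type Interior name 'Study PIR'",
    "2020-03-05 11:57:15: zone 13 type Entry/Exit 1 name 'Front Door'",
]

# Collect the product group from every line that matches.
products = []
for line in logs:
    m = pattern.search(line)
    if m:
        products.append(m.group("product"))

print(products)
```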

Regex for a sentence syntax

I am trying to separate a String into different parts that match a specific syntax.
The String I am using as example is Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018.
Currently I have this regex (\w+) ([0-9]\/[0-9]), (\w+): ([0-9][.][0-9][.]?[0-9]?) which gives me the username, the 5/5, the word Version and the version 1.0.
First, how can I ignore the second (\w+)? It will always be Version, and I only need the number after it.
Second question, is it possible to get the big message after the version, then get the date after it?
Output needed:
Username
5/5
1.0
This is a custom message
Sep 25, 2018
You may use
/^(\w+)\s+(\d+\/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*([a-zA-Z]+\s*\d{1,2},\s*\d{4})$/
See the regex demo
Details
^ - start of string
(\w+) - Group 1 (username): one or more letters, digits or _
\s+ - 1+ whitespaces
(\d+\/\d+) - Group 2 (5/5)
,\s+ - a comma and 1+ whitespaces
\w+: - 1+ word chars followed with :
\s* - 0+ whitespaces
(\d+(?:\.\d+){1,2}) - Group 3 (version number):
\d+ - 1+ digits
(?:\.\d+){1,2} - 1 or 2 sequences of a . followed with 1+ digits
\s* - 0+ whitespaces
(.*?) - Group 4 (message): any 0+ chars, as few as possible
\s* - 0+ whitespaces
([a-zA-Z]+\s*\d{1,2},\s*\d{4}) - Group 5 (date):
[a-zA-Z]+ - 1+ ASCII letters
\s* - 0+ whitespaces
\d{1,2} - 1 to 2 digits
,\s* - a comma and 0+ whitespaces
\d{4} - 4 digits
$ - end of string.
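A quick Python sketch of the five groups, with the / delimiters dropped because Python takes the pattern as a plain string. It assumes the input ends right after the year: the pattern is anchored with $, so a trailing period would prevent the match.

```python
import re

pattern = re.compile(
    r"^(\w+)\s+(\d+\/\d+),\s+\w+:\s*(\d+(?:\.\d+){1,2})\s*(.*?)\s*"
    r"([a-zA-Z]+\s*\d{1,2},\s*\d{4})$"
)

# Assumption: no trailing period after the year.
text = "Username 5/5, Version: 1.0 This is a custom message Sep 25, 2018"

m = pattern.match(text)
parts = m.groups() if m else ()

# One captured part per line: username, ratio, version, message, date.
for part in parts:
    print(part)
```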
Try (.*)\s(\d\/\d),\s*Version:\s*(\d+\.\d+)\s*(.+?)\s*(\w{3} \d{1,2}, \d{4})
Capture the groups 1,2,3,4,5 to get the output you needed.

Regex Length issue

I'm trying to build a regex where it accepts domain names with the following conditions:
Allows DNS names (only hyphens, periods and alphanumeric characters allowed) up to 255 characters.
Hyphens can only appear in between letters
Should start with a letter and end with a letter. It will have minimum 3 characters (letters and periods mandatory, hyphen is optional.)
The length of the label before a period should be at most 63
Possible Cases:
a.b.c
a-a.b
Cases that should not pass
a-.b
qwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwerhhg.v
aaaa
aaa-a
What I have built looks like this:
^(([a-zA-z0-9][A-Z0-9a-z-]{1,61}[a-zA-Z0-9][.])+[a-zA-Z0-9]+)$
But this does not accept a.b.c
You may use
^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$
See the regex demo here.
Pattern details
^ - start of string
(?=.{1,255}$) - the whole string should have 1 to 255 chars
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . followed by a . char or the end of string
[a-zA-Z0-9]+ - 1 or more alphanumeric chars
(?: - start of a non-capturing group:
- - a hyphen
[a-zA-Z0-9]+ - 1+ alphanumeric chars
)* - zero or more repetitions
(?: - start of a non-capturing group...
[.] - a dot
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . followed by a . char or the end of string
[a-zA-Z0-9]+ - 1+ alphanumeric chars
(?:-[a-zA-Z0-9]+)* - 0 or more repetitions of a - followed with 1+ alphanumeric chars
)+ -... 1 or more times
(?: - start of a non-capturing group...
[.] - a dot
[a-zA-Z0-9-]* - 0+ alphanumeric or - chars
[a-zA-Z0-9] - an alphanumeric char (no hyphens at the end)
)? -... 1 or 0 times (it is optional)
$ - end of string.
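To see the pattern against the cases from the question, here is a minimal Python sketch. The helper name `is_valid` is made up, and the over-long label is built as 64 letters rather than copied from the question.

```python
import re

pattern = re.compile(
    r"^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*"
    r"(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+"
    r"(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$"
)


def is_valid(name):
    """True if the whole string matches the domain-name pattern."""
    return pattern.match(name) is not None


should_pass = ["a.b.c", "a-a.b", "a" * 63 + ".v"]
should_fail = ["a-.b", "a" * 64 + ".v", "aaaa", "aaa-a"]

for name in should_pass + should_fail:
    print(f"{name[:20]!r}: {is_valid(name)}")
```

The 63-letter label passes and the 64-letter one fails because each label lookahead requires 1 to 63 non-dot chars followed by a dot or the end of string.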
You can use the following regex:
/^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])(?:\.[A-Z\d]+){1,2}$/im
Details:
^ - Start of the string.
(?=[A-Z]) - Positive lookahead: The whole string must start with a letter.
( - A capturing group - the domain name.
(?: - Start of a non-capturing group, needed due to the following quantifier.
[A-Z\d] - The first alternative: Either a letter or a digit.
| - Or.
(?<=[A-Z])-(?=[A-Z]) - The second alternative: A hyphen, preceded with a letter
and followed with a letter.
) - End of the non-capturing group.
{1,63} - This group (either alternative) must occur 1 to 63 times.
) - End of the capturing group.
(?<=[A-Z]) - Positive lookbehind: The capturing group just matched (domain name)
must end with a letter.
(?: - A non-capturing group, also needed due to the following quantifier.
\.[A-Z\d]+ - A dot and a sequence of letters or digits.
) - End of the non-capturing group.
{1,2} - This group must occur 1 or 2 times.
$ - End of the string.
You should definitely use the i (case-insensitive) option and, if you check
a number of strings, each on a separate line, also the m (multiline) option.
I didn't include any test for the whole length, but you didn't include it either.
I think, the main task here was to show how to match the case your regex failed.
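A minimal Python sketch of this second pattern, with the /.../im delimiters translated to `re.I | re.M` flags (an assumption about the target engine). The helper name `matches` is hypothetical.

```python
import re

pattern = re.compile(
    r"^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])"
    r"(?:\.[A-Z\d]+){1,2}$",
    re.I | re.M,
)


def matches(name):
    """True if the pattern finds a match in the given string."""
    return pattern.search(name) is not None


for name in ["a.b.c", "a-a.b", "a-.b", "aaaa", "aaa-a", "a.b.c.d"]:
    print(f"{name!r}: {matches(name)}")
```

Note that because of the {1,2} quantifier, this pattern accepts at most two dot-separated parts after the first label, so a.b.c.d is rejected.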