Task for matching floating point numbers - regex

Task:
MATCH:
3.45
5,4
.45
3e4
,54
4
4.
4,
DON'T MATCH:
4,5e
2e
.3.
2e,4
,4.
d34
2.45t
2,45.
So far I have come up with the following:
(?<=\s|^)[-+]?(?:(?:[.,]?\d+[.,]?\d*[eE]\d+(?!\w|[.,]))|[.,]?\d+[.,]?\d*(?!\w|[.,]))\b
That works for almost everything except the last two numbers in the MATCH list (4. and 4,), and I got stuck there.

You may use
(?<!\S)[-+]?[0-9]*(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.])(?!\S)
See the regex demo
Details
(?<!\S) - start of string or a whitespace must appear immediately to the left
[-+]? - an optional + or -
[0-9]* - 0+ digits
(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.]) - either
[.,]?[0-9]+(?:[eE][-+]?[0-9]+)? - an optional . or ,, then 1+ digits, then an optional sequence of e or E, followed with an optional + or - and 1+ digits
| - or
(?<=\d)[,.] - a dot or comma only if preceded with a digit (to avoid matching standalone . or ,)
(?!\S) - end of string or a whitespace must appear immediately to the right.
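As a quick sanity check, the pattern can be run against the two sample lists from the question. This is a minimal sketch using Python's re module; the variable names are only illustrative.

```python
import re

# Pattern from the answer above, unchanged.
FLOAT_RE = re.compile(
    r'(?<!\S)[-+]?[0-9]*(?:[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?|(?<=\d)[,.])(?!\S)'
)

# Samples copied from the question.
should_match = ['3.45', '5,4', '.45', '3e4', ',54', '4', '4.', '4,']
should_not_match = ['4,5e', '2e', '.3.', '2e,4', ',4.', 'd34', '2.45t', '2,45.']

# fullmatch() tests each sample as a whole string.
matched = [s for s in should_match if FLOAT_RE.fullmatch(s)]
rejected = [s for s in should_not_match if not FLOAT_RE.fullmatch(s)]

print(matched == should_match)       # all MATCH samples are accepted
print(rejected == should_not_match)  # all DON'T MATCH samples are rejected
```

Note that this pattern also accepts a leading sign and a signed exponent (for example -3.45 or 3e-4), which goes slightly beyond the samples in the question.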

You could use an alternation to match 1+ digits followed by a dot or comma and 0+ digits or match the Ee part followed by 1+ digits.
Or match starting with a dot or comma followed by 1+ digits.
If this is the only thing to match on the line, you could use anchors ^ and $ or use lookarounds to assert that there are no non whitespace chars on the left and right.
(?<!\S)(?:\d+(?:[.,]\d*|[eE]\d+)?|[.,]\d+)(?!\S)
Pattern parts
(?<!\S) Assert that what is directly to the left is not a non-whitespace char
(?: Non capturing group
\d+ Match 1+ digits
(?: Non capturing group
[.,]\d* Match either . or , and 0+ digits
| Or
[eE]\d+ Match e or E and 1+ digits
)? Close group and make it optional
| Or
[.,]\d+ Match . or , and 1+ digits
) Close group
(?!\S) Assert that what is directly to the right is not a non-whitespace char
Regex demo
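To see the whitespace lookarounds at work, the samples from the question can be placed on one line separated by spaces. A small sketch in Python (the variable names are illustrative):

```python
import re

# Pattern from the answer above, unchanged.
NUMBER_RE = re.compile(r'(?<!\S)(?:\d+(?:[.,]\d*|[eE]\d+)?|[.,]\d+)(?!\S)')

# MATCH samples first, then the DON'T MATCH samples, all space-separated.
text = '3.45 5,4 .45 3e4 ,54 4 4. 4, 4,5e 2e .3. 2e,4 ,4. d34 2.45t 2,45.'

found = NUMBER_RE.findall(text)
print(found)
# ['3.45', '5,4', '.45', '3e4', ',54', '4', '4.', '4,']
```

Because a match must start right after whitespace (or the start of the string) and end right before whitespace (or the end), a token is only ever matched as a whole, never partially.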

Related

Regex get certain information

I've stumbled with certain types of rows.
I parse this information
195/75 R 16 C X Wonder Van 110/108R 10PR Tourador
The groups, which I need
I've got the following regex
([0-9]+)?\/([0-9]+)\s*\w\s*([0-9]+(?:\.\d+)?)\s*(C\s+)?(.+\s+?(?=[0-9]{2,3}|(\d{2,3}\/\d{2,3})))(?:(\d{2,3}\/\d{2,3})|(\d{2,3}))\s*(\w)(.*)
It works nicely for all kinds of rows, e.g.
225/55 R18 X Speed TU1 98V Toradfor
225/50 R 16 X Wonder TH1 96W XL Tourador
195/75 R 16 C X Wonder Van 110/108R 8PR Tourador
However, it doesn't work for
195/75 R 16 C X Wonder Van 110/108R 10PR Tourador
because of 10PR, where 10 consists of 2 digits
how it works now
Thank you!
In your pattern you use alternations | that can match and capture unrelated parts of the strings.
What you could do is use anchors and an optional capture group.
For all the given example strings you might use:
^(\d+)\/(\d+)\s+[A-Z]*\s*(\d+)\s*([A-Z])(.*?)(\d+\/\d+([A-Z]+))?\s+(\d+[A-Z]+\s+.*)$
The pattern in parts:
^ Start of string
(\d+)\/(\d+)\s+ Capture 1+ digits in group 1, match /, capture 1+ digits in group 2, then match 1+ whitespace chars
[A-Z]*\s* Match optional chars A-Z and optional whitespace chars
(\d+)\s* Capture 1+ digits in a group and match optional whitespace chars
([A-Z]) Capture a single char A-Z in a group
(.*?) Capture as few as possible chars in a group
( Capture group
\d+\/\d+ Match 1+ digits / and 1+ digits
([A-Z]+) Capture 1+ chars A-Z
)? Close the capture group and make it optional
\s+ Match 1+ whitespace chars
(\d+[A-Z]+\s+.*) Capture group, match 1+ digits, 1+ chars A-Z, 1+ whitespace chars and the rest of the line
$ End of string
Regex demo
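The capture groups can be inspected with a short script. This is a sketch in Python using the rows from the question; the names TYRE_RE, rows and results are only illustrative.

```python
import re

# Pattern from the answer above, unchanged.
TYRE_RE = re.compile(
    r'^(\d+)\/(\d+)\s+[A-Z]*\s*(\d+)\s*([A-Z])(.*?)'
    r'(\d+\/\d+([A-Z]+))?\s+(\d+[A-Z]+\s+.*)$'
)

rows = [
    '225/55 R18 X Speed TU1 98V Toradfor',
    '225/50 R 16 X Wonder TH1 96W XL Tourador',
    '195/75 R 16 C X Wonder Van 110/108R 8PR Tourador',
    '195/75 R 16 C X Wonder Van 110/108R 10PR Tourador',
]

results = []
for row in rows:
    m = TYRE_RE.match(row)
    # groups() returns None for the optional groups 6 and 7 when absent.
    results.append(m.groups() if m else None)
    print(results[-1])
```

For the row that previously failed, group 6 holds 110/108R and group 8 holds 10PR Tourador. Group 5 keeps its surrounding whitespace, so it may need a strip() afterwards.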

Regular Expression, optional character on multiple locations but result must contain at least once

I am performing a string search where I am looking for the following three strings:
XXX-99-X
XXX-99X
XXX99-X
So far I have:
([A-Z]{3}(-?)[0-9]{2}(-?)[A-Z]{1})
How do I enforce that - has to be present at least once in either of the two possible locations?
You might use an alternation to match either a - before the digits with an optional - after them, or no - before the digits and a mandatory - after them.
Note that you can omit {1} from the pattern.
^[A-Z]{3}(?:-[0-9]{2}-?|[0-9]{2}-)[A-Z]$
^ Start of string
[A-Z]{3} Match 3 chars A-Z
(?: Non capture group
-[0-9]{2}-?|[0-9]{2}- Match either -, 2 digits and an optional -, or 2 digits and -
) Close non capture group
[A-Z] Match a single char A-Z
$ End of string
regex demo
Or use a positive lookahead to assert a - at the right
^(?=[^-\r\n]*-)[A-Z]{3}-?[0-9]{2}-?[A-Z]$
^ Start of string
(?=[^-\r\n]*-) Positive lookahead, assert a - at the right
[A-Z]{3}-? Match 3 chars A-Z and optional -
[0-9]{2}-? Match 2 digits and optional -
[A-Z] Match a single char A-Z
$ End of string
Regex demo
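Both variants can be compared side by side. The sketch below uses Python; XXX99X is an extra sample (not from the question) added to show that a string without any hyphen is rejected.

```python
import re

# The two patterns from the answer above, unchanged.
ALTERNATION_RE = re.compile(r'^[A-Z]{3}(?:-[0-9]{2}-?|[0-9]{2}-)[A-Z]$')
LOOKAHEAD_RE = re.compile(r'^(?=[^-\r\n]*-)[A-Z]{3}-?[0-9]{2}-?[A-Z]$')

# The first three come from the question; the last one is a made-up negative case.
samples = ['XXX-99-X', 'XXX-99X', 'XXX99-X', 'XXX99X']

by_alternation = [bool(ALTERNATION_RE.match(s)) for s in samples]
by_lookahead = [bool(LOOKAHEAD_RE.match(s)) for s in samples]

for s, a, b in zip(samples, by_alternation, by_lookahead):
    print(s, a, b)
```

Both patterns accept the three strings from the question and reject the hyphen-free one.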
With your shown samples, please try following.
^[A-Z]{3}(?:-?\d{2}-|-\d{2})[A-Z]+$
online demo for above regex
Explanation: Adding detailed explanation for above.
^[A-Z]{3} ##Matching if value starts with 3 uppercase letters here.
(?: ##Starting a non capturing group here.
-?\d{2}- ##Matching -(optional) followed by 2 digits followed by -
|
-\d{2} ##Matching dash followed by 2 digits.
) ##Closing the non capturing group.
[A-Z]+$ ##Matching 1 or more occurrences of capital letters at the end of value.

Match parenthesis that doesn't contain digit + % only

I'm struggling with this one. I want to capture the content of parentheses unless it consists only of digits and an optional %. This means I want to capture (essiccato, ricco di flavonoidi) or (ricco di 23% pollo, in parte essiccato, in parte idrolizzato) but not (23 %) or (23).
Here is an example: https://regex101.com/r/yW4aZ3/896
So far I have: \([^()][^()]*\)
You may use
r'\((?!\s*\d+(?:[.,]\d+)?\s*%?\s*\))[^()]+\)'
See the regex demo.
Details
\( - a ( char
(?!\s*\d+(?:[.,]\d+)?\s*%?\s*\)) - a negative lookahead that fails the match if the location is immediately followed with
\s* - 0+ whitespaces
\d+ - 1+ digits
(?:[.,]\d+)? - an optional occurrence of . or , and 1+ digits
\s* - 0+ whitespaces
%? - an optional % char
\s*\) - 0+ whitespaces and a ) char
[^()]+ - 1+ chars other than ( and )
\) - a ) char.
You might use a negative lookahead to assert that what follows the opening parenthesis is not just digits followed by an optional percentage sign:
\((?!\s*\d+\s*%?\s*\))[^)]+\)
Explanation
\( Match (
(?! Negative lookahead, assert what is on the right is not
\s*\d+\s*%?\s*\) Optional whitespace, 1+ digits, an optional % and optional whitespace up to the closing )
) Close lookahead
[^)]+\) Match 1+ times any char except ), then match )
Regex demo
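A short Python sketch of this lookahead pattern, using the parenthesised samples from the question joined into one string (the variable names are illustrative):

```python
import re

# Pattern from the answer above, unchanged.
PAREN_RE = re.compile(r'\((?!\s*\d+\s*%?\s*\))[^)]+\)')

text = ('(essiccato, ricco di flavonoidi) (23 %) (23) '
        '(ricco di 23% pollo, in parte essiccato, in parte idrolizzato)')

kept = PAREN_RE.findall(text)
for item in kept:
    print(item)
```

Only the two descriptive groups are returned; (23 %) and (23) are skipped because the lookahead can reach the closing parenthesis through digits, whitespace and an optional % alone.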
Assuming that (...) are all balanced and there is no escaping of parentheses inside, you may use this regex with a character class and 2 negated character classes:
\([\d%\s]*[^%\d\s()][^()]*\)
Updated RegEx Demo
RegEx Details
\(: Match opening (
[\d%\s]*: Match 0 or more characters that are a digit, % or whitespace
[^%\d\s()]: Match a character that is not (, ), %, a digit or whitespace
[^()]*: Match 0 or more characters that are not ( and not )
\): Match closing )

Regex Length issue

I'm trying to build a regex where it accepts domain names with the following conditions:
Allows DNS names (only hyphens, periods and alphanumeric characters allowed) up to 255 characters.
Hyphens can only appear in between letters
Should start with a letter and end with a letter. It will have minimum 3 characters (letters and periods mandatory, hyphen is optional.)
The length of each label before a period should be at most 63 characters
Possible Cases:
a.b.c
a-a.b
Cases that should not pass
a-.b
qwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwertqwerhhg.v
aaaa
aaa-a
What I have built looks like this:
^(([a-zA-z0-9][A-Z0-9a-z-]{1,61}[a-zA-Z0-9][.])+[a-zA-Z0-9]+)$
But this does not accept a.b.c
You may use
^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$
See the regex demo here.
Pattern details
^ - start of string
(?=.{1,255}$) - the whole string should have 1 to 255 chars
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . that are followed with a . or the end of string
[a-zA-Z0-9]+ - 1 or more alphanumeric chars
(?: - start of a non-capturing group:
- - a hyphen
[a-zA-Z0-9]+ - 1+ alphanumeric chars
)* - zero or more repetitions
(?: - start of a non-capturing group...
[.] - a dot
(?=[^.]{1,63}(?![^.])) - there must be 1 to 63 chars other than . that are followed with a . or the end of string
[a-zA-Z0-9]+ - 1+ alphanumeric chars
(?:-[a-zA-Z0-9]+)* - 0 or more repetitions of a - followed with 1+ alphanumeric chars
)+ -... 1 or more times
(?: - start of a non-capturing group...
[.] - a dot
[a-zA-Z0-9-]* - 0+ alphanumeric or - chars
[a-zA-Z0-9] - an alphanumeric char (no hyphens at the end)
)? -... 1 or 0 times (it is optional)
$ - end of string.
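The pattern can be checked against the cases listed in the question. This is a sketch in Python; the pattern is split across several string literals only for readability, and the long label is built programmatically as 12 repetitions of qwert plus qwerhhg (67 characters).

```python
import re

# Pattern from the answer above, unchanged apart from being split over lines.
DNS_RE = re.compile(
    r'^(?=.{1,255}$)(?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*'
    r'(?:[.](?=[^.]{1,63}(?![^.]))[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)+'
    r'(?:[.][a-zA-Z0-9-]*[a-zA-Z0-9])?$'
)

should_pass = ['a.b.c', 'a-a.b']
should_fail = [
    'a-.b',
    'qwert' * 12 + 'qwerhhg.v',  # first label is longer than 63 characters
    'aaaa',
    'aaa-a',
]

passed = [s for s in should_pass if DNS_RE.match(s)]
failed = [s for s in should_fail if not DNS_RE.match(s)]

print(passed == should_pass)  # both valid names are accepted
print(failed == should_fail)  # all invalid names are rejected
```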
You can use the following regex:
/^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])(?:\.[A-Z\d]+){1,2}$/im
Details:
^ - Start of the string.
(?=[A-Z]) - Positive lookahead: The whole string must start with a letter.
( - A capturing group - the domain name.
(?: - Start of a non-capturing group, needed due to the following quantifier.
[A-Z\d] - The first alternative: Either a letter or a digit.
| - Or.
(?<=[A-Z])-(?=[A-Z]) - The second alternative: A hyphen, preceded with a letter
and followed with a letter.
) - End of the non-capturing group.
{1,63} - This group (either alternative) must occur 1 to 63 times.
) - End of the capturing group.
(?<=[A-Z]) - Positive lookbehind: The capturing group just matched (domain name)
must end with a letter.
(?: - A non-capturing group, also needed due to the following quantifier.
\.[A-Z\d]+ - A dot and a sequence of letters or digits.
) - End of the non-capturing group.
{1,2} - This group must occur 1 or 2 times.
$ - End of the string.
You should definitely use the i (case-insensitive) option and, if you check
a number of strings, each on a separate line, also the m (multiline) option.
I didn't include any test for the whole length, but you didn't include one either.
I think the main task here was to show how to match the case your regex failed on.
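For readers working in Python rather than with a /.../im literal, the same pattern can be tried as follows. This is only a sketch: the i and m options are assumed to map to re.IGNORECASE and re.MULTILINE, and the test lines are the short cases from the question, one per line.

```python
import re

# Pattern from the answer above; the /im flags become re.IGNORECASE | re.MULTILINE.
DOMAIN_RE = re.compile(
    r'^(?=[A-Z])((?:[A-Z\d]|(?<=[A-Z])-(?=[A-Z])){1,63})(?<=[A-Z])(?:\.[A-Z\d]+){1,2}$',
    re.IGNORECASE | re.MULTILINE,
)

# One candidate per line, so ^ and $ anchor at line boundaries.
text = 'a.b.c\na-a.b\na-.b\naaaa\naaa-a'

accepted = [m.group(0) for m in DOMAIN_RE.finditer(text)]
print(accepted)
# ['a.b.c', 'a-a.b']
```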

Regular Expression not grouping

Need help with this regex
ABC 130 zlis 02-03/12 N180 Grouping req
A B Csd 130 pain 02/12 I80 alias
(\w+\s{0,3})(\d+)
The regex does not seem to group as I need it to.
Desired output; the brackets mark the groups I'm trying to detect.
(A B Csd) (130) (pain) (02/12) (I80) (alias)
Try this regex:
([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d-\/]+)\s+([\w ]+)
Click for Demo
Explanation:
([a-z ]+?) - match 1+ occurrences(as few as possible) of a letter or a space and capture it as Group1
\s+ - matches 1+ occurrences of a whitespace character
(\d+) - match 1+ occurrences of digits and capture as Group2
\s+ - matches 1+ occurrences of a whitespace character
([a-z]+) - match 1+ occurrences of a letter and Capture as Group 3
\s+ - matches 1+ occurrences of a whitespace character
([\d-\/]+) - match 1+ occurrences of a digit or - or / and capture it as Group4
\s+ - matches 1+ occurrences of a whitespace character
([\w ]+) - match 1+ occurrences of a word-character or a space and capture as Group5
Note that I have used the g, i, m flags for Global matches, Case-insensitive and Multiline respectively.
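A Python sketch of the same idea follows. One adjustment is an assumption about the target engine rather than part of the answer: Python's re module rejects [\d-\/] as a bad character range, so the hyphen is moved to the end of the class ([\d/-]), which keeps the meaning "a digit, - or /".

```python
import re

# Same structure as the answer's pattern; only the character class for
# group 4 is rewritten so that Python's re module accepts it.
ROW_RE = re.compile(
    r'([a-z ]+?)\s+(\d+)\s+([a-z]+)\s+([\d/-]+)\s+([\w ]+)',
    re.IGNORECASE,
)

rows = [
    'ABC 130 zlis 02-03/12 N180 Grouping req',
    'A B Csd 130 pain 02/12 I80 alias',
]

parsed = [ROW_RE.search(row).groups() for row in rows]
for groups in parsed:
    print(groups)
```

Note that the last group captures the whole remainder of the row (for example I80 alias), so the code and the trailing text would still need to be split if they are wanted separately, as in the desired output.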