Regular expression to capture & fix a version number

I'm trying to create a regular expression to extract version numbers. Since the source that provides those version strings is mostly unreliable, I need to clean the values.
A version is a number or a group of numbers separated by single dots. As soon as the chain is broken, I stop capturing and keep what was captured so far.
Test cases:
Foo 1.2.3.4.5 bar --> Should capture 1.2.3.4.5
Foo 111111.2..3.4.5 bar --> Should capture 111111.2
Foo 10.. bar --> Should capture 10
1.2.3 aaa --> Should capture 1.2.3
aaa 1.2.3 --> Should capture 1.2.3
1.23 --> Should capture 1.23
I found some examples, but none of them handles my edge cases (see the 3rd case outlined above).
So far I have:
/(\d+(?:\.\d+)+)/i
But it does not cover all my cases. I use it with PHP (PCRE).

I would go with the following:
\d+(?:\.\d+)*
This matches a number that can be followed by any number of [ dot and number ] groups.
The difference from your regex is the use of *, which also allows capturing versions composed of a single number.
I've also removed the outer grouping parentheses, which likely serve no purpose.
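As a quick check, here is a minimal sketch that runs this pattern against the test cases from the question. It uses Python's re module rather than PHP, which is fine here because the pattern only relies on syntax both engines share; the helper name extract_version is made up for illustration.

```python
import re

# The pattern from the answer: a run of digits, then zero or more ".digits" groups.
VERSION_RE = re.compile(r"\d+(?:\.\d+)*")

def extract_version(text):
    """Return the first version-like token in text, or None if there is none."""
    match = VERSION_RE.search(text)
    return match.group(0) if match else None

# Test cases taken from the question.
samples = [
    "Foo 1.2.3.4.5 bar",
    "Foo 111111.2..3.4.5 bar",
    "Foo 10.. bar",
    "1.2.3 aaa",
    "aaa 1.2.3",
    "1.23",
]
for sample in samples:
    print(sample, "-->", extract_version(sample))
```

Because search() stops at the first match, the chain is cut at the first double dot and the rest of the string is ignored.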

Solution 1:
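Regex: ^[^\d]*\s*\K(\d+(?:\.\d+)*)
1. ^ matches the start of the string.
2. [^\d]*\s* matches any leading non-digit characters, then optional whitespace. Using * here lets the version sit at the very start of the string, as in 1.2.3 aaa.
3. \K resets the start of the reported match, so the prefix is dropped.
4. (\d+(?:\.\d+)*) matches digits followed by zero or more groups of a dot and digits.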
Solution 2:
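Regex: ^Foo\s*\K(\d+(?:\.\d+)*)
1. ^ matches the start of the string.
2. Foo\s* matches the literal prefix Foo and any whitespace after it, so this variant only applies to strings that start with Foo.
3. \K resets the start of the reported match, so the prefix is dropped.
4. (\d+(?:\.\d+)*) matches digits followed by zero or more groups of a dot and digits.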

Related

Regular Expression to Validate Monaco Number Plates

I would like to have an expression to validate the number plates of Monaco.
They are written as follows:
A123
123A
1234
I started by doing:
^[a-zA-Z0-9]{1}?[0-9]{2}?[a-zA-Z0-9]{1}$
But this also accepts A12A, which is not a valid plate.
You can use
^(?!(?:\d*[a-zA-Z]){2})[a-zA-Z\d]{4}$
Details:
^ - start of string
(?!(?:\d*[a-zA-Z]){2}) - a negative lookahead that fails the match if, immediately to the right of the current location, there are two occurrences of zero or more digits followed by an ASCII letter (that is, if the string contains two letters)
[a-zA-Z\d]{4} - four alphanumeric chars
$ - end of string.
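A minimal sketch of how this pattern behaves, written with Python's re module as a stand-in (the pattern uses no engine-specific syntax). The helper name is_plate and the extra inputs beyond the question's examples are assumptions made for illustration.

```python
import re

# Negative lookahead rejects any string that contains two letters;
# the rest of the pattern requires exactly four alphanumeric characters.
PLATE_RE = re.compile(r"^(?!(?:\d*[a-zA-Z]){2})[a-zA-Z\d]{4}$")

def is_plate(text):
    """True if text has four alphanumeric characters and at most one letter."""
    return PLATE_RE.match(text) is not None

for candidate in ["A123", "123A", "1234", "A12A", "AB12", "1A23", "12345"]:
    print(candidate, is_plate(candidate))
```

Note that this pattern allows the single letter in any position, so 1A23 is accepted as well; whether that is desirable depends on the actual plate rules.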
You can write the pattern using 3 alternatives specifying all the allowed variations for the example data:
^(?:[a-zA-Z][0-9]{3}|[0-9]{3}[a-zA-Z]|[0-9]{4})$
Note that you can omit {1}, since every token matches exactly once by default.
To not match 2 chars A-Z while still allowing the single letter in any position, you can write the alternation as:
^(?:[a-zA-Z]\d{3}|\d{3}[a-zA-Z\d]|\d[a-zA-Z]\d{2}|\d{2}[a-zA-Z]\d)$
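So a plate needs 3 consecutive digits plus 1 letter or digit, in either order.
In that case you can use this pattern:
^(?=.?[0-9]{3})[A-Za-z0-9]{4}$
The lookahead (?=.?[0-9]{3}) asserts that 3 consecutive digits occur either at the very start or right after the first character.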
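The three-way alternation and the positive-lookahead pattern should accept the same strings. The sketch below checks that on a few inputs using Python's re module; the helper name check and the extra inputs (1A23, 12A3) are assumptions added for illustration.

```python
import re

# Explicit alternation: letter + 3 digits, 3 digits + letter, or 4 digits.
ALTERNATION_RE = re.compile(r"^(?:[a-zA-Z][0-9]{3}|[0-9]{3}[a-zA-Z]|[0-9]{4})$")
# Lookahead variant: 3 consecutive digits at offset 0 or 1, four characters in total.
LOOKAHEAD_RE = re.compile(r"^(?=.?[0-9]{3})[A-Za-z0-9]{4}$")

def check(text):
    """Return (alternation result, lookahead result) for one candidate plate."""
    return (ALTERNATION_RE.match(text) is not None,
            LOOKAHEAD_RE.match(text) is not None)

for candidate in ["A123", "123A", "1234", "A12A", "1A23", "12A3"]:
    print(candidate, check(candidate))
```

Both patterns reject a letter in the middle (1A23, 12A3), because neither leaves room for three consecutive digits.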

Refer to same branch of previous alternate group

I need to build a regex with capture groups that would result in the following:
12-34 # match: (1) (2) (3) (4)
1a-2b # match: (1) (a) (2) (b)
12-3b # nomatch
In a nutshell, if the first part has two digits, then the second part must also have two digits. And if it has a letter, then the second part must also have a letter.
In PCRE flavor, (\d)(\d|[abc])-(\d)(\d|[abc]) matches the third line, so it is too permissive.
Using named groups, (\d)(?<named>\d|[abc])-(\d)(?P=named) matches no line at all, for it requires the second characters to be exactly the same. It is too restrictive.
Is there a way I can require that my second alternate group (\d|[abc]) takes the same branch as the first (\d|[abc])?
Or do I need to fall back on the full (?:(\d)(\d)-(\d)(\d)|(\d)([abc])-(\d)([abc])) which duplicates parts of my regex?
In PCRE you may use this regex:
^(?:(?<num>\d{2})-(?&num)|(?<alnum>\d\pL)-(?&alnum))$
RegEx Details:
(?<num>\d{2}): named group num for matching 2 digits
(?<alnum>\d\pL): named group alnum for matching 1 digit followed by a letter
(?&num): Match same sub-pattern as in named group num
(?&alnum): Match same sub-pattern as in named group alnum
Another option is to use conditional sub-patterns in PCRE as:
^(?:(?<num>\d{2})|\d\pL)-(?(num)\d{2}|\d\pL)$
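The conditional approach can also be tried outside PCRE. The sketch below is an adaptation for Python's re module, which supports (?(name)yes|no) conditionals but not (?&name) subroutine calls or \pL; as an assumption it uses the question's [abc] class in place of \pL and Python's (?P<num>...) spelling for the named group.

```python
import re

# If the "num" group took part in the match (two digits before the dash),
# require two digits after it; otherwise require digit + letter on both sides.
PAIR_RE = re.compile(r"^(?:(?P<num>\d{2})|\d[abc])-(?(num)\d{2}|\d[abc])$")

def same_branch(text):
    """True if both halves of text use the same kind of second character."""
    return PAIR_RE.match(text) is not None

for line in ["12-34", "1a-2b", "12-3b", "1a-23"]:
    print(line, same_branch(line))
```

This variant only validates the line; it does not yield the four separate capture groups from the question, which would need additional groups inside each branch.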

Using regex to determine straight (unordered hand)

A straight in poker is five cards in a row, for example 23456 or 89TJQ. With a "sorted" hand, the regex could be written as:
^(A2345|23456|34567|45678|56789|6789T|789TJ|89TJQ|9TJQK|TJQKA)$
It's a bit verbose but straightforward enough. However, would it be possible to generate a (sensible) regex if the hand was unordered? For example, if the hand was 52634 or JQ89T?
One possible way would be to use a (?=.*<item>) lookahead for each card (which makes the check independent of order), for example:
^(?:
(?=.*A)(?=.*2)(?=.*3)(?=.*4)(?=.*5)
|(?=.*2)(?=.*3)(?=.*4)(?=.*5)(?=.*6)
|(?=.*3)(?=.*4)(?=.*5)(?=.*6)(?=.*7)
|(?=.*4)(?=.*5)(?=.*6)(?=.*7)(?=.*8)
|(?=.*5)(?=.*6)(?=.*7)(?=.*8)(?=.*9)
|(?=.*6)(?=.*7)(?=.*8)(?=.*9)(?=.*T)
|(?=.*7)(?=.*8)(?=.*9)(?=.*T)(?=.*J)
|(?=.*8)(?=.*9)(?=.*T)(?=.*J)(?=.*Q)
|(?=.*9)(?=.*T)(?=.*J)(?=.*Q)(?=.*K)
|(?=.*T)(?=.*J)(?=.*Q)(?=.*K)(?=.*A)
)
.{5}$
Are there other / better approaches to finding if a straight exists using regex only?
You can use the following regex:
(?!.*(.).*\1)(?:[A2345]{5}|[23456]{5}|[34567]{5}|[45678]{5}|[56789]{5}|[6789T]{5}|[789TJ]{5}|[89TJQ]{5}|[9TJQK]{5}|[TJQKA]{5})
This works by first using a negative lookahead to ensure that the string doesn't contain any duplicates (?!.*(.).*\1). Then it matches 5 characters from any of the straight possibilities.
 (?!.*(.).*\1)
#^^^         ^  negative lookahead ensuring what follows doesn't match
#   ^^          match any character any number of times
#     ^^^       capture a character into capture group #1
#        ^^     match any character any number of times
#          ^^   match the same text as most recently matched by the 1st capture group
Against JQQ89, it works as follows:
- .* matches J
- (.) captures Q
- .* matches nothing
- \1 tries to match Q (and succeeds)
- Negative lookahead has a match, so fail the match.
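To avoid typing the ten character classes by hand, the same regex can be generated from the rank order. This is a minimal sketch in Python; the names RANKS, build_straight_regex and is_straight are made up for illustration, and fullmatch() is used as an assumption that the whole input should be exactly one five-card hand.

```python
import re

# Rank order with the ace at both ends, so A2345 and TJQKA are both covered.
RANKS = "A23456789TJQKA"

def build_straight_regex():
    """Build the no-duplicates lookahead plus one character class per straight."""
    windows = [RANKS[i:i + 5] for i in range(len(RANKS) - 4)]
    alternation = "|".join("[%s]{5}" % window for window in windows)
    return re.compile(r"(?!.*(.).*\1)(?:" + alternation + ")")

STRAIGHT_RE = build_straight_regex()

def is_straight(hand):
    """True if hand is five distinct ranks that all belong to one straight."""
    return STRAIGHT_RE.fullmatch(hand) is not None

for hand in ["52634", "JQ89T", "JQQ89", "23457"]:
    print(hand, is_straight(hand))
```

The duplicate check is what makes the character classes safe: without it, a hand such as 22222 would satisfy [23456]{5}.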

Regex to check only if the group is present

I have a string which may have values like the ones below.
854METHYLDOPA
041ALDOMET /00000101/
133IODETO DE SODIO [I 131]
From each value I need to extract the text starting at the 4th character, up to the point where one of these patterns appears: /00000101/ or [I 131]
Expected Output:
METHYLDOPA
ALDOMET
IODETO DE SODIO
I have tried the below RegEx for the same
(?:^.{3})(.*)(?:[[/][A-Z0-9\s]+[]/\s+])
This regex works when the string contains [ or /, but not for case 1, where neither pattern is present.
Adding ? at the end makes it work for case 1, but then it no longer works for cases 2 and 3.
Could anyone please help me get the regex to work?
Your logic is difficult to phrase. My interpretation is that you always want to capture from the 4th character onwards. What else gets captured depends on the remainder of the input. Should either /00000101/ or [I 131] occur, then you want to capture up until that point. Otherwise, you want to capture the entire string. Putting this all together yields this regex:
^.{3}(?:(.*)(?=/00000101/|\[I 131\])|(.*))
You may try this:
^.{3}(.*?)($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])).*$
and replace each match with this to get the exact output you want:
\1
Explanation:
^ --> start of the string
.{3} --> followed by 3 characters
(.*?) --> followed by anything; the ? makes the match lazy, so it stops as soon as what follows can match. It is captured as group 1 --> \1
($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])) --> one of the following:
$ --> the end of the string, which means none of the patterns you mentioned is present
| or
(?:\s*\/00000101\/) --> one of your patterns, with \s* added to cover zero or more whitespace characters
| or
(?:\s*\[I\s+131\]) --> your other pattern, with \s+ between I and 131 to allow one or more whitespace characters. ?: indicates that the group is not captured.
.*$ --> .* matches anything that follows and $ marks the end of the string.
So the whole string is matched, and replacing it with group 1 leaves only your target output.
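The match-and-replace steps above can be sketched as follows, using Python's re.sub as a stand-in for whatever replace function your environment offers. The helper name clean is made up for illustration.

```python
import re

# Same pattern as in the answer: skip 3 characters, lazily capture the name,
# then consume an optional trailing marker and the rest of the string.
CLEAN_RE = re.compile(r"^.{3}(.*?)($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])).*$")

def clean(value):
    """Replace the whole string with group 1, i.e. the bare name."""
    return CLEAN_RE.sub(r"\1", value)

for value in ["854METHYLDOPA", "041ALDOMET /00000101/", "133IODETO DE SODIO [I 131]"]:
    print(repr(clean(value)))
```

Because \s* sits inside the marker alternatives, the space before the marker is consumed there and does not end up in group 1.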
You could get the values you are looking for in group 1:
^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)
Explanation
From the beginning of the string ^
Match the first 3 characters .{3}
Match in a capturing group (where your values will be) any character one or more times non greedy (.+?)
A positive lookahead (?=
to assert that what follows is either the end of the string $
or |
an optional space ? followed by [I 131] \[I 131\]
or |
an optional space ? followed by /00000101/ \/00000101\/
If your regex engine supports \K, you could try it like this and the values you are looking for are not in a group but the full match:
^.{3}\K.+?(?=$| ?\[I 131\]| ?\/00000101\/)
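Here is a minimal sketch of the group-1 variant in Python. Python's built-in re module does not support \K, so the sketch reads capture group 1 instead; the helper name extract_name is an assumption made for illustration.

```python
import re

# Lazily capture from the 4th character until the lookahead sees the end of
# the string or one of the two markers (each with an optional leading space).
NAME_RE = re.compile(r"^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)")

def extract_name(value):
    """Return the captured name, or None if the value is too short to match."""
    match = NAME_RE.match(value)
    return match.group(1) if match else None

for value in ["854METHYLDOPA", "041ALDOMET /00000101/", "133IODETO DE SODIO [I 131]"]:
    print(repr(extract_name(value)))
```

The lookahead only asserts the marker without consuming it, so nothing after the name is part of the match.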

How to optionally match a group?

I have two possible patterns:
1.2 hello
1.2.3 hello
I would like to match 1, 2 and 3 if the latter exists.
Optional items seem to be the way to go, but my pattern (\d)\.(\d)?(\.(\d)).hello matches only 1.2.3 hello (almost perfectly: I get four groups, of which the first, second and fourth contain what I want). The first test string is not matched at all.
What would be the right match pattern?
Your pattern contains a (\d)\.(\d)?(\.(\d)) part that matches a digit, then a ., then an optional digit (it may occur 1 or 0 times) and then a mandatory . followed by a digit. Thus, it can match 1..2 hello, but not 1.2 hello.
You may make the third group non-capturing and make it optional:
(\d)\.(\d)(?:\.(\d))?\s*hello
          ^^^      ^^
If your regex engine does not allow non-capturing groups, use a capturing one, but then you will have to grab the value from Group 4:
(\d)\.(\d)(\.(\d))?\s*hello
Note that I replaced . before hello with \s* to match zero or more whitespaces.
Note also that if you need to match these numbers at the start of a line, you might consider pre-pending the pattern with ^ (and depending on your regex engine/tool, the m modifier).
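A minimal sketch of the non-capturing version in Python, showing what the groups contain for both inputs. The helper name parts is made up for illustration; in Python an optional group that did not take part in the match is reported as None.

```python
import re

# The third digit lives in an optional non-capturing wrapper, so it stays group 3.
HELLO_RE = re.compile(r"(\d)\.(\d)(?:\.(\d))?\s*hello")

def parts(text):
    """Return the three captured digits as a tuple, or None if there is no match."""
    match = HELLO_RE.search(text)
    return match.groups() if match else None

print(parts("1.2 hello"))
print(parts("1.2.3 hello"))
```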