Regex to find file version in C#

Below are some examples of file names (without extension) from which I want to extract the version and the type of the file.
1] 2.13.1801.221
Expected output-[Version: 2.13.1801.221 and Type: Null]
2] 2.17.1801.221.SQLServer
Expected output-[Version: 2.17.1801.221 and Type: SQLServer]
3] 2.19.1801.SQLite
Expected output-[Version: 2.19.1801 and Type: SQLite]
I am using the regex below to extract the version and type from the file name:
^(?<version>(\d+\.\d+)+)\.(?<type>\w*)$
But this doesn't work.
An online regex tester shows this result: [https://i.stack.imgur.com/c9FlW.png]
The match groups are formed as: [https://i.stack.imgur.com/V0azi.png]
What am I missing here? Please suggest a regex that works.
Thanks in advance!

Your regex is slightly off, which is why it does not work. The regex you should use is the following:
^(?<version>\d+(?:\.\d+)+)(?:\.(?<type>[a-zA-Z]+))?$
Demo
Here is an explanation of the problems in your regex, ^(?<version>(\d+\.\d+)+)\.(?<type>\w*)$:
The (\d+\.\d+)+ part does not capture the version correctly. It expects one or more digits, a literal dot, and one or more digits, with that whole unit repeated one or more times, and nothing separates one unit from the next. The corrected part is \d+(?:\.\d+)+, which captures strings such as 1.1 or 1.2.33.11.
The second problem is \.(?<type>\w*). It matches a literal dot followed by zero or more word characters, and since \w also matches digits, it takes the last numeric part as the type when no type is present: it matches 221 in the string 2.13.1801.221, which is not what you want. Because the type can be absent from the string, the whole group needs to be optional (the ? operator) and should use [a-zA-Z] to capture the type, so the corrected part is (?:\.(?<type>[a-zA-Z]+))?. If the type can contain digits, change [a-zA-Z]+ to [a-zA-Z][a-zA-Z\d]*, which means the type must start with a letter and may contain digits after that.
I have also turned some groups in your regex into non-capturing groups by placing ?: just after the opening ( since you don't need to capture them separately.
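As a rough illustration, the corrected pattern can be checked against the three file names from the question. This is a Python sketch rather than C#: Python's re module spells named groups (?P<name>...) where .NET uses (?<name>...), and the helper name parse_file_name is made up for this example.

```python
import re

# Corrected pattern from the answer, using Python's (?P<name>...) group syntax.
FIXED = re.compile(r"^(?P<version>\d+(?:\.\d+)+)(?:\.(?P<type>[a-zA-Z]+))?$")

# Pattern from the question, kept only to show how it mis-splits the input.
ORIGINAL = re.compile(r"^(?P<version>(\d+\.\d+)+)\.(?P<type>\w*)$")


def parse_file_name(name, pattern=FIXED):
    """Return (version, type) for a file name, or None if nothing matches.

    parse_file_name is a hypothetical helper, not part of any library.
    """
    match = pattern.match(name)
    if match is None:
        return None
    return match.group("version"), match.group("type")


for name in ("2.13.1801.221", "2.17.1801.221.SQLServer", "2.19.1801.SQLite"):
    print(name, "->", parse_file_name(name))

# The original pattern takes the last numeric part as the type.
print(parse_file_name("2.13.1801.221", ORIGINAL))  # ('2.13.1801', '221')
```

A missing type comes back as None here, which corresponds to the Null expected in the question.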

You are always assuming that there is a . after the version numbers. However, if no type is specified after the version, that extra . does not exist. Instead, you could use the following:
^(?<version>[\d+\.]+\d)\.*(?<type>\w*)$
Demo
^ matches the beginning of the line
The version capture group is defined by (?<version>[\d+\.]+\d)
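[\d+\.]+ is a character class repeated 1 or more times: it matches any run of digits, dots and literal + characters (inside the brackets, + is not a quantifier)
\d requires the version to end with a digit
\.* matches zero or more dots, so the separator before the type is optional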
The type capture group is defined by (?<type>\w*)
\w* matches any number of word characters
$ matches the end of the line
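One way to see what this pattern accepts is to run it. The following is a Python sketch (named groups are written (?P<name>...), which is Python's spelling), and split_name is a hypothetical helper. The last two inputs are not from the question; they are made up to show how permissive the character class is.

```python
import re

PATTERN = re.compile(r"^(?P<version>[\d+\.]+\d)\.*(?P<type>\w*)$")


def split_name(name):
    """Hypothetical helper: return (version, type), or None without a match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match.group("version"), match.group("type")


# The three file names from the question.
print(split_name("2.13.1801.221"))            # type is '' here, not None
print(split_name("2.17.1801.221.SQLServer"))
print(split_name("2.19.1801.SQLite"))

# [\d+\.] is a character class, so these odd inputs are accepted as versions.
print(split_name("1..2"))
print(split_name("1+2"))
```

If stricter validation matters, the version part would need a pattern that alternates digits and single dots instead of a character class.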

Related

My regex appears to be complete, yet it's missing matches

I'm currently working on a regex that mostly works, but there are a few matches it fails to capture, even though they match when tested on their own. I'm hoping someone can point out the error, which is probably obvious but which I'm missing.
Specifically, the string kMad is matched by [Kk\D]+ by itself, but not when it is part of the bigger string.
For reference:
Full Regex showing missing matches
Specific Regex showing matches
Expected matches by line
The non-matching occurrence kMad10:31-18:5771 does not have 4 digits at the end, because the two digits after the last colon are already captured by (\d{2}:\d{2}\-\d{2}:\d{2}). Only two digits remain, so \d{4} cannot match. You can allow a range of digit counts in the part of the regex that follows, using \d{2,4} instead of \d{4}.
The new regex will be:
(?:\d{4}\-\d{2}\-\d{2}+\.)?(?:Line\d{1,3})?(OFF|ADO|([Kk\D]+)?(\d{2}:\d{2}\-\d{2}:\d{2})(\d{2,4}))
Regex101 Demo
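The effect of the quantifier change can be reproduced on a reduced fragment of the regex. This Python sketch keeps only the name, time range and trailing digits; the optional date and Line prefixes and the OFF|ADO alternatives are left out, and it assumes the original ended in \d{4} as described above.

```python
import re

# Time range such as 10:31-18:57 (the hyphen needs no escape outside a class).
TIMES = r"(\d{2}:\d{2}-\d{2}:\d{2})"

strict = re.compile(r"([Kk\D]+)?" + TIMES + r"(\d{4})")     # four digits required
relaxed = re.compile(r"([Kk\D]+)?" + TIMES + r"(\d{2,4})")  # two to four digits

sample = "kMad10:31-18:5771"

# Only "71" is left after the time range, so the strict version finds nothing.
print(strict.search(sample))            # None
print(relaxed.search(sample).groups())  # ('kMad', '10:31-18:57', '71')
```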

Regex needed for optional last digit

I am trying to figure out a regex for a version parser.
I need to parse a major.minor.patch.build version string with 3 or 4 numeric parts, where the last (4th) part is optional.
For example the version could be:
1.2.3.4
or
1.2.3
I have my regex as the following, but it fails for 1.2.3 version string:
regex = "(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)?"
Also, do I need the double backslashes?
The following should do what you want:
(\d+)\.(\d+)\.(\d+)(\.(\d+))?
\d matches any single digit <=> [0-9]
\. to match the . character (a single . in a regex matches any single character)
You can prepend '^' and append '$' to the regex to ensure there's no garbage before or after your version.
In your regex, the last part \.(\d+) has to be optional as a whole, including the dot. Otherwise the regex matches 1.2.3.4 and also 1.2.3. (with a trailing dot), but not 1.2.3.
Try it like this, with an optional last group that contains both the dot and the digits:
^\d+\.\d+\.\d+(?:\.\d+)?$
Or with capturing groups, where the last part is a non-capturing group containing the dot and a capturing group for the final digits:
^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?$
Instead of the anchors ^ and $, you could use a word boundary \b.
The question does not specify a programming language, so it is hard to say whether the double backslashes are needed. It may help to open the regex101 demo link: under Tools -> Code generator you can select a programming language.
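Since no language is given, here is a minimal Python sketch of the anchored pattern with the optional fourth part. The function name parse_version is made up for illustration. On the backslash question: in a Python raw string (r"...") each backslash is written once, while in a language without raw strings the same pattern is usually written with doubled backslashes inside the string literal.

```python
import re

# Three mandatory numeric parts, then an optional ".build" part.
VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_version(text):
    """Hypothetical helper: return (major, minor, patch, build) or None.

    build is None when the version has only three parts.
    """
    match = VERSION.match(text)
    if match is None:
        return None
    return tuple(int(part) if part is not None else None
                 for part in match.groups())


print(parse_version("1.2.3.4"))  # (1, 2, 3, 4)
print(parse_version("1.2.3"))    # (1, 2, 3, None)
print(parse_version("1.2.3."))   # None, the trailing dot is rejected
```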

Workaround for the lack of lookbehind?

To answer another user's question I knocked together the below regular expression to match numbers within a string.
\b[+-]?[0-9]+(\.[0-9]+)?\b
After providing my answer I noticed that I was getting unwanted matches in cases where there was a sequence of digits with more than one period among them due to \b matching the period character. For example "2.3.4" would return matches "2.3" and "4".
A negative lookahead and lookbehind could help me here, giving me a regex like this:
\b(?<!\.)[+-]?[0-9]+(\.[0-9]+)?\b(?!\.)
...except that for some unknown reason VBScript Regex (and by extension VBA) doesn't support lookbehind.
Is there some workaround that allows me to affirm that the word boundary at the start of the match is not a period without including it in the match?
Perhaps you don't need a look behind. If you are able to extract specific capture groups instead of the entire match then you can use:
(?:[^.]|^)\b([+-]?([0-9]+(\.[0-9]+)))\b(?!\.)
Will match:
2.5
54.5
+3.45
-0.5
Won't match:
1.2.3
3.6.
.3.5
Capture group 1 holds the number together with any sign that was matched
Capture group 2 holds the number without the sign
Capture group 3 holds the fraction (like capture group 1 in your original expression)
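The pattern uses no lookbehind, so its behaviour can be tried in any engine. Below is a Python sketch (not VBScript) that runs it over the examples listed above; find_number is a hypothetical helper. One thing the run makes visible: because \b sits in front of [+-]?, a sign is only picked up when a word character comes directly before it, so for an input such as -0.5 group 1 contains 0.5.

```python
import re

NUMBER = re.compile(r"(?:[^.]|^)\b([+-]?([0-9]+(\.[0-9]+)))\b(?!\.)")


def find_number(text):
    """Hypothetical helper: return the three capture groups or None."""
    match = NUMBER.search(text)
    return match.groups() if match else None


for text in ("2.5", "54.5", "1.2.3", "3.6.", ".3.5"):
    print(repr(text), "->", find_number(text))

# The sign is consumed by [^.] here, so it is not part of group 1.
print(find_number("-0.5"))    # ('0.5', '0.5', '.5')

# With a word character in front, \b matches before the sign.
print(find_number("a+3.45"))  # ('+3.45', '3.45', '.45')
```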

Regex matching Cisco interface

I am trying to match Cisco interface names and split them up. The regex I have so far is:
(\D+)(\d+)(?:\/)?(\d+)?(?:\.)?(\d+)?
This matches:
FastEthernet9
FastEthernet9/5
FastEthernet9/5.10
The problem I have is that it also matches:
FastEthernet9.10
Any ideas on how to make it so it does not match? Bonus points if it can match:
tengigabitethernet0/0/0.20
Edit:
Okay. I am trying to split this string up into groups for use in Python. In the Cisco world, the first part of the string (FastEthernet) is the type of interface, the first zero is the slot in the equipment, the zero after the slash is the port number, and the number after the dot is a sub-interface.
Because of how regex works, I can't get a repeated group like (?:\/?\d+)+ to capture all the numbers in /0/0/0 individually; I only get the last match.
My current regex (\D+)(\d+)(?:((?:\/?\d+)+)?(?:(?:\.)?(\d+))?) builds on murgatroid99's but groups all of /0/0/0 together, to be split in Python.
My current result in Python with this regex is [('tengigabitethernet', '0', '/0/0', '10')]. This seems to be as close as I can get.
The regular expression for matching these names (with unnecessary capturing groups removed for clarity) is:
\D+\d+((/\d+)+(\.\d+)?)?
To break it up, \D+ matches the part of the string before the first number (such as FastEthernet) and \d+ matches the first number (such as 10). The rest of the pattern is optional. /\d+ matches a forward slash followed by a number, so (/\d+)+ matches one or more repetitions of that (such as /0/0). Finally, (\.\d+)? optionally matches a period followed by a number at the end.
The important difference that makes this pattern match your specification is that, inside the final optional group, at least one (/\d+) is required before the (\.\d+).
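Since the goal is to split the name in Python, here is a sketch that adds capturing groups back to the pattern above and splits the slash-separated run afterwards. The placement of the groups and the helper name split_interface are assumptions made for this example, not part of the answer's pattern.

```python
import re

# Same structure as \D+\d+((/\d+)+(\.\d+)?)? with groups for the parts of interest.
INTERFACE = re.compile(r"(\D+)(\d+)(?:((?:/\d+)+)(?:\.(\d+))?)?")


def split_interface(name):
    """Hypothetical helper: return (type, [numbers], sub_interface) or None."""
    match = INTERFACE.fullmatch(name)  # fullmatch rejects trailing text like ".10"
    if match is None:
        return None
    kind, first, rest, sub = match.groups()
    # rest looks like "/0/0"; a repeated group only keeps its last iteration,
    # so the run is captured as one string and split here.
    numbers = [first] + (rest.strip("/").split("/") if rest else [])
    return kind, numbers, sub


for name in ("FastEthernet9", "FastEthernet9/5", "FastEthernet9/5.10",
             "FastEthernet9.10", "tengigabitethernet0/0/0.20"):
    print(name, "->", split_interface(name))
```

With fullmatch, FastEthernet9.10 returns None, because the sub-interface part is only reachable after at least one /number segment.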

Regex to match all permutations of {1,2,3,4} without repetition

I am working on the following problem in Ruby.
Here's the pattern that I want:
1234, 1324, 1432, 1423, 2341 and so on
i.e. each digit of the four-digit number should be in the range [1-4], and no digit should repeat.
To illustrate with a simpler case, take a two-digit pattern, for which the solution should be:
12, 21
i.e. each digit should be either 1 or 2, and no digit should repeat.
To make sure that the digits do not repeat, I want to use $1 in the condition for my second digit, but it's not working.
Please help me out and thanks in advance.
You can use this (see on rubular.com):
^(?=[1-4]{4}$)(?!.*(.).*\1).*$
The first assertion ensures that it's ^[1-4]{4}$, the second assertion is a negative lookahead that ensures that you can't match .*(.).*\1, i.e. a repeated character. The first assertion is "cheaper", so you want to do that first.
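A quick sanity check of the two assertions, written as a Python sketch rather than Ruby (the pattern is unchanged): it tries every four-character string over the digits 1 to 4 and counts how many pass.

```python
import re
from itertools import product

UNIQUE = re.compile(r"^(?=[1-4]{4}$)(?!.*(.).*\1).*$")

# All 4**4 = 256 strings of length four over the digits 1-4.
candidates = ["".join(digits) for digits in product("1234", repeat=4)]
matches = [c for c in candidates if UNIQUE.match(c)]

# Only the 4! = 24 permutations have no repeated digit.
print(len(candidates), len(matches))  # 256 24
print(matches[:5])
```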
References
regular-expressions.info/Lookarounds and Backreferences
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
Just for a giggle, here's another option:
^(?:1()|2()|3()|4()){4}\1\2\3\4$
As each unique character is consumed, the capturing group following it captures an empty string. The backreferences also try to match empty strings, so if one of them doesn't succeed, it can only mean the associated group didn't participate in the match. And that will only happen if the string contains at least one duplicate.
This behavior of empty capturing groups and backreferences is not officially supported in any regex flavor, so caveat emptor. But it works in most of them, including Ruby.
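For what it's worth, the same trick can be tried in Python, where a backreference to a group that did not participate fails to match. This is only a sketch of the behaviour on a few inputs, not a guarantee about other regex flavors.

```python
import re

EMPTY_GROUPS = re.compile(r"^(?:1()|2()|3()|4()){4}\1\2\3\4$")

for text in ("1234", "4312", "1123", "123"):
    # A match requires each of the four empty groups to have participated.
    print(text, bool(EMPTY_GROUPS.match(text)))
```

In 1123 the group after 4 never participates, so \4 fails and the string is rejected.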
I think this solution is a bit simpler
^(?:([1-4])(?!.*\1)){4}$
See it here on Rubular
^ # matches the start of the string
(?: # open a non capturing group
([1-4]) # One of the allowed characters, captured in group 1
(?!.*\1) # Matched only if that character does not occur again later
){4} # Repeat the group four times, once per character
$
(?!.*\1) is a lookahead assertion, to ensure the character is not repeated.
^ and $ are anchors to match the start and the end of the string.
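The same idea extends to other sets of distinct single characters by building the pattern from the set. Here is a small Python sketch; the helper unique_chars_pattern is hypothetical and assumes each symbol is exactly one character.

```python
import re


def unique_chars_pattern(symbols):
    """Hypothetical helper: compile ^(?:([...])(?!.*\\1)){n}$ for the given symbols.

    symbols is a string of distinct single characters, e.g. "1234".
    """
    if len(set(symbols)) != len(symbols):
        raise ValueError("symbols must be distinct")
    # Escape each symbol so characters like - or ] stay literal in the class.
    char_class = "".join(re.escape(s) for s in symbols)
    return re.compile(r"^(?:([%s])(?!.*\1)){%d}$" % (char_class, len(symbols)))


digits = unique_chars_pattern("1234")
print(bool(digits.match("1432")))  # True
print(bool(digits.match("1442")))  # False, 4 occurs twice
print(bool(digits.match("123")))   # False, too short

letters = unique_chars_pattern("abc")
print(bool(letters.match("cab")))  # True
```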
While the previous answers solve the problem, they aren't as generic as they could be, and they don't allow for repeated elements in the initial set, for example {a,a,b,b,c,c}. After asking a similar question on Perl Monks, the following solution was given by Eily:
^(?:(?!\1)a()|(?!\2)a()|(?!\3)b()|(?!\4)b()|(?!\5)c()|(?!\6)c()){6}$
Similarly, this works for longer "symbols" in a string, and for variable length symbols too.