Regex matching Cisco interface - regex

I am trying to match Cisco interface names and split them up. The regex I have so far is:
(\D+)(\d+)(?:\/)?(\d+)?(?:\.)?(\d+)?
This matches:
FastEthernet9
FastEthernet9/5
FastEthernet9/5.10
The problem I have is that it also matches:
FastEthernet9.10
Any ideas on how to make it so it does not match? Bonus points if it can match:
tengigabitethernet0/0/0.20
Edit:
Okay. I am trying to split this string up into groups for use in Python. In the Cisco world, the first part of the string (FastEthernet) is the type of interface, the first zero is the slot in the equipment, the zero after the slash is the port number, and the number after the dot is a sub-interface.
Because of how regex works, I can't get a repeated group like (?:\/?\d+)+ to capture each number in /0/0/0 by itself; I only get the last match.
My current regex, (\D+)(\d+)(?:((?:\/?\d+)+)?(?:(?:\.)?(\d+))?), builds on murgatroid99's but groups all of /0/0/0 together, for splitting in Python.
My current result in Python with this regex is [('tengigabitethernet', '0', '/0/0', '10')]. This seems to be as close as I can get.

The regular expression for matching these names (removing unnecessary capturing groups for clarity) is:
\D+\d+((/\d+)+(\.\d+)?)?
To break it up, \D+ matches the part of the string before the first number (such as FastEthernet) and \d+ matches the first number (such as 10). Then the rest of the pattern is optional. /\d+ matches a forward slash followed by a number, so (/\d+)+ matches one or more repetitions of that (such as /0/0). Finally, (\.\d+)? optionally matches the period followed by a number at the end.
The important difference that makes this pattern match your specification is that the final optional group requires at least one (/\d+) before the (\.\d+), so a sub-interface can only follow a slash-separated port number.
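As a rough sketch of how this could be used from Python (the capturing-group layout and the helper name split_interface are illustrative assumptions, not part of the answer), the same pattern with capturing groups added back can validate a name, and the slash-separated numbers can then be split with ordinary string methods:

```python
import re

# Same structure as the pattern above, with capturing groups for the parts
# we want to keep. The group layout is an illustrative assumption.
INTERFACE = re.compile(r"(\D+)(\d+)(?:((?:/\d+)+)(?:\.(\d+))?)?")


def split_interface(name):
    """Return (type, [slot, port, ...], sub_interface), or None if invalid."""
    m = INTERFACE.fullmatch(name)
    if m is None:
        return None
    kind, first, rest, sub = m.groups()
    # rest looks like "/0/0" (or None); split it into the individual numbers.
    numbers = [first] + (rest.strip("/").split("/") if rest else [])
    return kind, numbers, sub


print(split_interface("tengigabitethernet0/0/0.20"))
print(split_interface("FastEthernet9.10"))
```

Using fullmatch means a name such as FastEthernet9.10 is rejected outright instead of being partially matched.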

Related

Extra groups in regex

I'm building a regex to be able to parse addresses and am running into some blocks. An example address I'm testing against is:
5173B 63rd Ave NE, Lake Forest Park WA 98155
I am looking to capture the house number, street name(s), city, state, and zip code as individual groups. I am new to regex and am using regex101.com to build and test against, and ended up with:
(^\d+\w?)\s((\w*\s?)+).\s(\w*\s?)+([A-Z]{2})\s(\d{5})
It matches all the groups I need and matches the whole string, but there are extra groups that are null value according to the match information (3 and 4). I've looked but can't find what is causing this issue. Can anyone help me understand?
Your regular expression was almost right:
(^\d+\w?)\s([\w\s]+).\s([\w\s]+)\s([A-Z]{2})\s(\d{5})
What I changed are the second and third groups: in both you used a group inside a group, ((\w*\s?)+), whereas a character class inside a group, ([\w\s]+), matches the same text and gives you the proper group content. Note that quantifiers such as * and ? are literal characters inside a character class, so they are left out of it.
With your previous syntax, the inner group would be able to match an empty substring, since both quantifiers allow for a zero-length match (* is 0 to unlimited matches and ? is zero or one match). Since this group was repeated one or more times with the +, the last iteration would match an empty string, and a repeated capturing group keeps only its last iteration.
For this you'll need to use a non-capturing group, which is of the form (?:regex), where you currently see your "null results". This gives you the regex:
(^\d+\w?)\s((?:\w*\s?)+).\s(?:\w*\s?)+([A-Z]{2})\s(\d{5})
Here is a basic example of the difference between a capturing group and a non-capturing group: ([^s]+) (?:[^s]+)
The first group is captured into "Group 1" and the second one is not captured at all.
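To make the difference concrete, here is a small Python sketch (the sample string "hello world" is my own choice for illustration) that runs the example pattern and prints what ends up in the groups:

```python
import re

# One capturing group followed by one non-capturing group.
m = re.match(r"([^s]+) (?:[^s]+)", "hello world")

print(m.group(0))  # the whole match, both words
print(m.groups())  # only the capturing group is listed here
```

Both words take part in the match, but only the first one is reported as a group.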
Matching an address can be difficult due to the different formats.
If you can rely on the comma to be there, you can capture the part before it using a negated character class:
^(\d+[A-Z]?)\s+([^,]+?)\s*,\s*(.+?)\s+([A-Z]{2})\s(\d{5})$
Regex demo
Or take the part before the comma that ends on 2 or more uppercase characters, and then match optional non word characters using \W* to get to the first word character after the comma:
^(\d+[A-Z]?)\s+(.*?\b[A-Z]{2,}\b)\W*(.+?)\s+([A-Z]{2})\s(\d{5})$
Regex demo
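As a quick check of the comma-based pattern in Python (a sketch only; the variable names are assumptions made for illustration), the five parts of the example address come out as separate groups:

```python
import re

# The comma-based pattern from above, unchanged.
ADDRESS = re.compile(
    r"^(\d+[A-Z]?)\s+([^,]+?)\s*,\s*(.+?)\s+([A-Z]{2})\s(\d{5})$"
)

m = ADDRESS.match("5173B 63rd Ave NE, Lake Forest Park WA 98155")
number, street, city, state, zip_code = m.groups()
print(number, street, city, state, zip_code, sep=" | ")
```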

Regex check for name Initials

I am trying to create a regex that checks if one or more middle-name initials have the following structure:
INITIAL.[BLANK]INITIAL.[BLANK]INITIAL.
There can be multiple Initials as long as they are followed by a dot (.) - blank spaces are only allowed between two initials (e.g. L. B.)
It should not be possible to have a space after an initial if there's no other initial following.
At the moment, I have the following Regex which doesn't work perfectly as of now:
([A-Z]\. (?=[A-Z]|$))+
Using regex101, this is an example:
As you can see, it still matches the string even though there's a blank space at the end, without having another Initial following.
I am not sure why this is happening. I am just learning regex and would be glad if anyone could provide me with a solution to my problem :)
The error you're seeing occurs because, at the last step, your expression consumes [A-Z]\. together with the trailing space, then looks ahead for $ (and finds it). I would express the pattern this way: (?:[A-Z]\. )*[A-Z]\.$. Treat the last initial specially because it does not have a final space.
The pattern you tried ([A-Z]\. (?=[A-Z]|$))+ uses a repeated capturing group which will give you the value of the last iteration.
In that repetition you match a space after the dot ([A-Z]\. followed by a space), effectively meaning that the space must be present in the match.
Instead, you could repeat 0+ times matching a char [A-Z] followed by a dot and a space to match multiple occurrences.
Then match a char [A-Z] followed by a dot, asserting that what is on the right is not a non-whitespace char.
\b(?:[A-Z]\. )*[A-Z]\.(?!\S)
Regex demo
If there can be multiple spaces but it should not match a newline:
\b(?:[A-Z]\.[^\S\r\n]*)*[A-Z]\.(?!\S)
Regex demo
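For validating a whole field in Python, a minimal sketch could look like the following. It assumes the entire string must consist of initials, and it uses fullmatch instead of a trailing $ (in Python, $ also matches just before a final newline); the helper name is_valid_initials is hypothetical:

```python
import re

# Zero or more "X. " pairs, then a final "X." with no trailing space.
INITIALS = re.compile(r"(?:[A-Z]\. )*[A-Z]\.")


def is_valid_initials(text):
    """True if text is only initials separated by single spaces."""
    return INITIALS.fullmatch(text) is not None


for candidate in ["L.", "L. B.", "L. B. ", "L.B."]:
    print(repr(candidate), is_valid_initials(candidate))
```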

Regex to match everything except this regex

I think this is a simple thing for a lot of you, but I have a very limited knowledge of regex at the moment. I want to match everything except a double digit number in a string.
For example:
TEST22KLO4567
QE45C2C
LOP10G7G400
Now I found out the regex to match the double digit numbers:
\d{2}
Which matches the following:
TEST22KLO4567
QE45C2C
LOP10G7G400
Now it seems to me that it would be fairly easy to turn that regex around to match everything BUT "\d{2}". I searched a lot but I can't seem to get it done. I hope someone here can help.
This only works if your regex engine supports look behinds:
^.+?(?=\d{2})|(?<=\d{2}).+$
Explanation:
The | separates two cases where this would match:
^.+?(?=\d{2})
This matches everything from the start of the string (^) until \d{2} is encountered.
(?<=\d{2}).+$
This matches the end of the string, from the place just after two digits.
If your regex engine doesn't support look behinds (older JavaScript engines, for example), I don't think it is possible using a pure regex solution.
You can match the first part:
^.+?(?=\d{2})
Then get where the match ends, add 2 to that number, and get the substring from that index.
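Both approaches can be sketched in Python, whose re module supports fixed-width look behinds. The function names parts_lookaround and parts_by_index are made up for this illustration, and the sketch assumes the string does not start with the two digits:

```python
import re

# Everything before the first two digits, or everything after them.
OUTSIDE = re.compile(r"^.+?(?=\d{2})|(?<=\d{2}).+$")


def parts_lookaround(text):
    """Pure-regex version using a lookahead and a look behind."""
    return OUTSIDE.findall(text)


def parts_by_index(text):
    """Fallback: match the first part, then skip 2 characters by index."""
    m = re.match(r"^.+?(?=\d{2})", text)
    if m is None:
        return None
    return [m.group(0), text[m.end() + 2:]]


for s in ["TEST22KLO4567", "QE45C2C", "LOP10G7G400"]:
    print(s, parts_lookaround(s), parts_by_index(s))
```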
You are right: negating a match in regex is usually rather tricky.
In your case I think you want something like [^\d{2}]; however, this is tricky, as your other strings also contain two digits, so a regex using it won't select them.
I would go with this regex (using PCRE 8.36 but should work also in others):
\*{2}\w*\*{2}
Explanation:
\*{2} .... matches "*" literally exactly two times
\w* .... matches "word character" zero or unlimited times
Found one regex that is pretty straightforward:
^(.*?[^\d])\d{2}([^\d].*?)$
Explanations:
^ : matches the beginning of a line
(.*?[^\d]) : matches and captures the first part before the two digits. It can contain anything (.*?) but needs to end with something other than a digit ([^\d]), so we ensure that there are only 2 digits in the middle
\d{2} : is the part you found yourself
([^\d].*?) : is the mirror image of (.*?[^\d]): it begins with something other than a digit ([^\d]) and matches anything after it.
$ : up to the end of the line.
To test this regex you can use this link
It will match the first occurrence of a double-digit number, but because the OP said there was only one, it does the job correctly. I expect it to work with every regex engine, as nothing too complex is used.
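A short Python sketch of this capture-group approach follows. It also shows one limitation that follows from the pattern itself: because a non-digit is required on both sides, a string that starts (or ends) with the two digits does not match at all. The string "22ABC" is a made-up example:

```python
import re

# Capture what comes before and after the two digits.
AROUND = re.compile(r"^(.*?[^\d])\d{2}([^\d].*?)$")

for s in ["TEST22KLO4567", "QE45C2C", "LOP10G7G400", "22ABC"]:
    m = AROUND.match(s)
    print(s, m.groups() if m else None)
```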

Detect multiple periods in Regex and kill entire match

I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period in the entire string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for, is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid Prices. The Comma (,) is not obligatory in this Regex, but it will be matched.
Look here: http://www.regextester.com/?fam=98001
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched the valid price, all you have to do is remove the commas, split the value at the dot, and pad the value of [1] with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
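The str_pad() call above is PHP. A rough Python equivalent of the same validate-then-convert steps might look like this, with str.ljust doing the right-padding; the helper name price_to_cents is hypothetical:

```python
import re

# The price pattern from above. fullmatch is used so that nothing,
# not even a trailing newline, may follow the price.
PRICE = re.compile(r"^\d+(,\d{3})*(\.\d{1,2})?$")


def price_to_cents(text):
    """Return the price as an integer number of cents, or None if invalid."""
    if PRICE.fullmatch(text) is None:
        return None
    # Drop the thousands separators, then split at the decimal point.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fraction with zeros: "" -> "00", "5" -> "50".
    return int(whole) * 100 + int(fraction.ljust(2, "0"))


for s in ["12", "12.5", "1,234.56", "1,234", "12..50", "12.5.0"]:
    print(s, price_to_cents(s))
```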
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$

Fetch one out of two Numbers out of String

I have a list of strings, such as: Ø20X400
I need to extract the first of the numbers - between Ø and X
I've come so far to match the numbers in general with \d+ - as simple as it is...
But I need an expression to get the first value separated, not both of them...
You can use lookarounds (?<=..) and (?=..):
(?<=Ø)\d+(?=X)
or in Java style:
(?<=Ø)\\d+(?=X)
A second way is to use a capture group:
Ø(\d+)X
or
Ø(\\d+)X
Then you can extract the content of the group.
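In Python, for instance, both variants could be sketched like this (the function names are assumptions made for the example):

```python
import re


def first_number_lookaround(text):
    """Lookaround version: the match itself is just the digits."""
    m = re.search(r"(?<=Ø)\d+(?=X)", text)
    return m.group(0) if m else None


def first_number_group(text):
    """Capture-group version: the digits are in group 1."""
    m = re.search(r"Ø(\d+)X", text)
    return m.group(1) if m else None


print(first_number_lookaround("Ø20X400"))
print(first_number_group("Ø20X400"))
```

Either way the result is a string, so convert it with int() if a number is needed.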
The regex engines I know parse \n as a newline. \d is used for numbers.
The following regex gives you the first number between a Ø and a X in a capture group:
^.*?Ø(\d+)X.*
This Regex will do it for you, (\d+?)X, and here is a Rubular to prove it. See, you want to group digits together, but make it non-greedy, ending the evaluation on X.
Try this one:
\d+(?=\D)
This should find the first number that is followed by a non-digit character.
With normal regular expressions, I would say:
Ø(\d+)X
This finds the Ø character, followed by one or more numbers, followed by an X. Also, the numbers will be stored in the first capture group. Capture groups differ from one regex implementation to another, but this would typically be denoted by \1. Capture group zero, \0, is usually the matched string itself. In this version, \d denotes digits 0-9, but if your regex engine uses \n for that purpose, use:
Ø(\n+)X