Regex to match everything except this regex - regex

I think this is a simple thing for a lot of you, but I have very limited knowledge of regex at the moment. I want to match everything except a double-digit number in a string.
For example:
TEST22KLO4567
QE45C2C
LOP10G7G400
Now I found out the regex to match the double digit numbers:
\d{2}
Which matches the following:
TEST22KLO4567
QE45C2C
LOP10G7G400
Now it seems to me that it would be fairly easy to turn that regex around to match everything BUT "\d{2}". I searched a lot but I can't seem to get it done. I hope someone here can help.

This only works if your regex engine supports look behinds:
^.+?(?=\d{2})|(?<=\d{2}).+$
Explanation:
The | separates two cases where this would match:
^.+?(?=\d{2})
This matches everything from the start of the string (^) until \d{2} is encountered.
(?<=\d{2}).+$
This matches the end of the string, from the place just after two digits.
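As a quick check, the pattern can be run through Python's re module, which supports fixed-width lookbehinds. This is only a sketch: the helper name outside_double_digit is made up for illustration, and the sample strings are the ones from the question.

```python
import re

# Pattern from the answer: the text before the first two-digit run,
# or the text from just after a two-digit run up to the end of the string.
PATTERN = re.compile(r'^.+?(?=\d{2})|(?<=\d{2}).+$')

def outside_double_digit(text):
    """Hypothetical helper: return every piece matched by PATTERN."""
    return PATTERN.findall(text)

for sample in ["TEST22KLO4567", "QE45C2C", "LOP10G7G400"]:
    print(sample, outside_double_digit(sample))
```

Note that the second piece runs to the end of the string, so any later digit pairs (such as the 45 and 67 in the first sample) stay inside it.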
If your regex engine doesn't support lookbehinds (JavaScript before ES2018, for example), I don't think it is possible using a pure regex solution.
You can match the first part:
^.+?(?=\d{2})
Then get where the match ends, add 2 to that number, and get the substring from that index.
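The substring approach could look roughly like this in Python (used here only for illustration; the same steps apply in any language). The function name is hypothetical, and it assumes the string has at least one character before the two digits.

```python
import re

def split_around_first_double_digit(text):
    """Hypothetical helper: return (before, after) around the first two digits."""
    match = re.match(r'^.+?(?=\d{2})', text)
    if match is None:
        return None  # no leading text followed by two digits
    # match.end() is the index where the two digits start; add 2 to skip them.
    return match.group(0), text[match.end() + 2:]

print(split_around_first_double_digit("QE45C2C"))
```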

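You are right: negating a match in regex is usually rather tricky.
In your case you might be tempted to use [^\d{2}]. However, that is a character class: it matches any single character that is not a digit, "{" or "}", rather than "anything except two digits". It is also tricky because your other strings contain further digits, so a regex using it won't select them.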
I would go with this regex (using PCRE 8.36 but should work also in others):
\*{2}\w*\*{2}
Explanation:
\*{2} .... matches "*" literally exactly two times
\w* .... matches "word character" zero or unlimited times

I found a fairly straightforward regex:
^(.*?[^\d])\d{2}([^\d].*?)$
Explanations :
^ : matches the beginning of a line
(.*?[^\d]) : matches and captures the first part before the two digits. It can contain anything (.*?) but needs to end with a non-digit ([^\d]), which ensures that there are exactly 2 digits in the middle
\d{2} : is the part you found yourself
([^\d].*?) : is the mirror image of (.*?[^\d]) : it begins with a non-digit ([^\d]) and matches anything after it.
$ : up to the end of the line.
To test this regex you can use this link
It will match the first occurrence of a double digit, but because the OP said there was only one, it does the job correctly. I expect it to work with every regex engine as nothing too complex is used.
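A small Python sketch of how the two capture groups could be read back. The helper name is an assumption, and the inputs are the question's sample strings.

```python
import re

# Group 1: text before the two digits; group 2: text after them.
PATTERN = re.compile(r'^(.*?[^\d])\d{2}([^\d].*?)$')

def parts_around_double_digit(text):
    """Hypothetical helper: return (before, after), or None if nothing matches."""
    match = PATTERN.match(text)
    return match.groups() if match else None

for sample in ["TEST22KLO4567", "QE45C2C", "LOP10G7G400"]:
    print(sample, parts_around_double_digit(sample))
```

Because both sides must touch a non-digit, a run of three or more digits (for example AB123CD) is not treated as a double digit and gives no match.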

Related

Detect multiple periods in Regex and kill entire match

I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period in the entire string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for, is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid Prices. The Comma (,) is not obligatory in this Regex, but it will be matched.
Look here: http://www.regextester.com/?fam=98001
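For a local check, the anchored pattern can be tried against the inputs from the question. This is a minimal Python sketch; is_valid_price is a made-up name.

```python
import re

# Anchored at both ends, so a stray second period makes the whole string fail.
PRICE = re.compile(r'^\d+(,\d{3})*(\.\d{1,2})?$')

def is_valid_price(text):
    """Hypothetical helper: True if the whole string is a valid price."""
    return PRICE.match(text) is not None

for sample in ["12", "12.5", "12.50", "12,500", "12,500.00", "12..50", "12.5.0", "12.0."]:
    print(sample, is_valid_price(sample))
```

Note that, unlike the pattern in the question, this one does not allow a leading minus sign.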
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad the fractional part ([1]) with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
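The answer describes this conversion with PHP's str_pad(). The same idea sketched in Python, where str.ljust plays the padding role, might look as follows. The function name price_to_cents is an assumption.

```python
import re

PRICE = re.compile(r'^\d+(,\d{3})*(\.\d{1,2})?$')

def price_to_cents(text):
    """Hypothetical helper: convert a validated price string to integer cents."""
    if PRICE.match(text) is None:
        return None  # not a valid price, nothing to convert
    # Drop the thousands separators, then split at the decimal point.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fractional part to two digits: "" -> "00", "5" -> "50".
    return int(whole + fraction.ljust(2, "0"))

print(price_to_cents("1,234.56"), price_to_cents("1,234"))
```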
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$

Regex matching Cisco interface

I am trying to match Cisco's interface names and split them up. The regex I have so far is:
(\D+)(\d+)(?:\/)?(\d+)?(?:\.)?(\d+)?
This matches:
FastEthernet9
FastEthernet9/5
FastEthernet9/5.10
The problem I have is that it also matches:
FastEthernet9.10
Any ideas on how to make it so it does not match? Bonus points if it can match:
tengigabitethernet0/0/0.20
Edit:
Okay. I am trying to split this string up into groups for use in Python. In the Cisco world the first part of the string (FastEthernet) is the type of interface, the first zero is the slot in the equipment, the zero after the slash is the port number, and the one after the dot is a sub-interface.
Because of how regex works I can't get dynamic groups like (?:\/?\d+)+ to match all numbers in /0/0/0 by themselves; I only get the last match.
My current regex (\D+)(\d+)(?:((?:\/?\d+)+)?(?:(?:\.)?(\d+))?) builds on murgatroid99's but groups all of /0/0/0 together, for splitting in Python.
My current result in Python with this regex is [('tengigabitethernet', '0', '/0/0', '10')]. This seems to be as close as I can get.
The regular expression for matching these names (Removing unnecessary capturing groups for clarity) is:
\D+\d+((/\d+)+(\.\d+)?)?
To break it up, \D+ matches the part of the string before the first number (such as FastEthernet) and \d+ matches the first number (such as 10). Then the rest of the pattern is optional. /\d+ matches a forward slash followed by a number, so (/\d+)+ matches one or more repetitions of that (such as /0/0). Finally, (\.\d+)? optionally matches the period followed by a number at the end.
The important difference that makes this pattern match your specification is that the final optional group requires at least one (/\d+) before the (\.\d+).
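Since the question targets Python, here is one way the pattern could be used to split a name into its parts. It is a sketch, not the answerer's code: the named groups, the use of fullmatch (so that FastEthernet9.10 is rejected as a whole rather than matched up to FastEthernet9), and the function name parse_interface are all assumptions added for illustration.

```python
import re

# Same structure as the pattern above, with named groups added for splitting.
INTERFACE = re.compile(
    r'(?P<type>\D+)(?P<first>\d+)'
    r'(?:(?P<rest>(?:/\d+)+)(?:\.(?P<sub>\d+))?)?'
)

def parse_interface(name):
    """Hypothetical helper: return (type, [slot/port numbers], sub-interface)."""
    match = INTERFACE.fullmatch(name)
    if match is None:
        return None
    numbers = [int(match.group("first"))]
    if match.group("rest"):
        # "rest" looks like "/0/0"; split it into the individual numbers.
        numbers += [int(n) for n in match.group("rest").strip("/").split("/")]
    sub = int(match.group("sub")) if match.group("sub") else None
    return match.group("type"), numbers, sub

print(parse_interface("tengigabitethernet0/0/0.20"))
```

Splitting the /0/0 part in ordinary code sidesteps the limitation that a repeated capture group only keeps its last match.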

Regular Expression for matching a single digit followed by a word exactly in Notepad++

:Statement
Say we have the following three records and want to match only the first one, i.e. exactly one digit followed by a specific word. What regular expression can be used to do this in Notepad++?
2Cups
11Cups
222Cups
The expressions I tried and their problems are:
Proposal 1:\d{1}Cups
it will find the "1Cups" and "2Cups" substrings in the second and third record respectively, which is what we do not want here
Proposal 2:[^0-9]+[0-9]Cups
same as the above
(PS: the records can be "XX 2Cups", "YY22Cups" and "XYZ 333Cups", i.e., no assumption on the position of the matchable parts)
Any suggestions?
:Reference
[1] The reg definition in NotePad++ (Same as SciTe)
As mentioned in Searching for a complex Regular Expression to use with Notepad++, it is: http://www.scintilla.org/SciTERegEx.html
[2] Matching exact number of digits
Here is an example: regular expression to match exactly 5 digits.
However, we do not want to find the match-able substring in longer records here.
If the string actually has the numbered sequence (1. 2Cups 2. 11Cups), you can use the white space that follows it:
\s\d{1}Cups
If there isn't the numbered list before, but the string will be at the beginning of the line, you can anchor it there:
^\d{1}Cups
Tested in Notepad++ v6.5.1 (Unicode).
It sounds like you want to match the digit only at the start of the string or if it has a space before it, so this would work:
(^|\b)\dCups
Debuggex Demo
Explanation:
(^|\b) Match the start of the string or beginning of a word (technically, word break)
\d Match a digit ({1} is redundant)
Cups Match Cups
This will work:
\b\dCups
If "Cups" must be a whole word (i.e. not matching 2Cupsizes):
\b\dCups\b
Note that \b matches even if at start or end of input.
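The behaviour of the two \b variants can be compared with a short Python sketch. Python's re is not Notepad++'s engine, so treat this only as an illustration of how \b works; the helper name and the extra 2Cupsizes record are assumptions.

```python
import re

RECORDS = ["2Cups", "11Cups", "222Cups", "XX 2Cups", "YY22Cups", "XYZ 333Cups", "2Cupsizes"]

def matching_records(pattern, lines):
    """Hypothetical helper: keep the lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

# \b before the digit rejects a digit that directly follows another word character.
loose = matching_records(r'\b\dCups', RECORDS)
# The trailing \b additionally rejects "2Cupsizes".
strict = matching_records(r'\b\dCups\b', RECORDS)
print(loose)
print(strict)
```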
I found one possible solution:
Using ^\d{1}Cups to match "starting with one digit + Cups" cases, as suggested by Ken, Cottrell and Bohemian.
Using [^\d]\dCups to match other cases.
However, I haven't found a solution using just one regex yet.
Have a try with:
(?:^|\D)\dCups
This will match xCups (where x is a single digit) only if there is no digit before it.
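A Python sketch of this pattern against the records from the question. The capture group around \dCups is an addition of mine so that the preceding non-digit character is not part of the returned text, and the helper name is hypothetical.

```python
import re

# (?:^|\D) consumes the start of the line or one non-digit character;
# the added capture group isolates the digit plus "Cups".
PATTERN = re.compile(r'(?:^|\D)(\dCups)')

def find_single_digit_cups(line):
    """Hypothetical helper: return the 'xCups' part, or None if there is none."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

for record in ["2Cups", "11Cups", "222Cups", "XX 2Cups", "YY22Cups", "XYZ 333Cups"]:
    print(record, find_single_digit_cups(record))
```

Unlike the \b variants, this also accepts a digit that directly follows a letter, as in YY2Cups.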

Regex a decimal number with comma

I'm having trouble finding the right regex for decimal numbers which include the comma separator.
I did find a few other questions regarding this issue in general but none of the answers really worked when I tested them
The best I got so far is:
[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?
2 main problems so far:
1) It records numbers with spaces between them "3001 1" instead of splitting them to 2 matches "3001" "1" - I don't really see where I allowed space in the regex.
2) I have a general problem with the beginning/ending of the regex.
The regex should match:
3,001
1
32,012,111.2131
But not:
32,012,11.2131
1132,012,111.2131
32,0112,111.2131
32131
In addition I'd like it to match:
1.(without any number after it)
1,(without any number after it)
as 1
(a comma or point at the end of the number should be overlooked).
Many Thanks!
This is a very long and convoluted regular expression that fits all your requirements. It will work if your regex engine is based on PCRE (hopefully you're using PHP, Delphi or R..).
(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))
DEMO on RegExr
The things that make it so long:
Matching multiple numbers on the same line separated by only 1 character (a space) whilst not allowing partial matches requires a lookahead and a lookbehind.
Matching numbers ending with . and , without including the . or , in the match requires another lookahead.
(?=[,.](\s|$)) Explanation
When writing this explanation I realised the \s needs to be a (\s|$) to match 1, at the very end of a string.
This part of the regex is for matching the 1 in 1, or the 1,000 in 1,000. so let's say our number is 1,000. (with the . on the end).
Up to this point the regex has matched 1,000, then it can't find another , to repeat the thousands group so it moves on to our (?=[,.](\s|$))
(?=....) means it's a lookahead: from the point matched so far, look at what's coming but don't add it to the match.
So it checks if there is a , or a . and if there is, it checks that it's immediately followed by whitespace or the end of input. In this case it is, so it'd leave the match as 1,000
Had the lookahead not matched, it would have moved on to trying to match decimal places.
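The answer targets PCRE. Python's re module rejects the lookbehind (?<=[^\d,.]|^) because its branches have different widths, so the sketch below rewrites it as the equivalent negative lookbehind (?<![\d,.]) and, for symmetry, rewrites (?=[^\d,.]|$) as (?![\d,.]). These rewrites, the non-capturing groups and the function name are my adaptation, not part of the original answer.

```python
import re

NUMBER = re.compile(
    r'(?<![\d,.])'            # not preceded by a digit, comma or dot
    r'\d{1,3}(?:,\d{3})*'     # 1-3 digits, then any number of ",ddd" groups
    r'(?:'
    r'(?=[,.](?:\s|$))'       # a trailing , or . before whitespace/end: stop here
    r'|'
    r'(?:\.\d+)?(?![\d,.])'   # optional decimals, not followed by digit/comma/dot
    r')'
)

def find_numbers(text):
    """Hypothetical helper: return every well-formed number found in text."""
    return [match.group(0) for match in NUMBER.finditer(text)]

print(find_numbers("3,001 1 32,012,111.2131"))
print(find_numbers("32,012,11.2131"))
```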
This works for all the ones that you have listed
^[0-9]{1,3}(,[0-9]{3})*(([\\.,]{1}[0-9]*)|())$
. means "any character". To use a literal ., escape it like this: \..
As far as I know, that's the only thing missing.

Matching parts of string that contain no consecutive dashes

I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
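Tried in Python as a rough check (the group is made non-capturing so that findall returns whole matches; dash_runs is a made-up name):

```python
import re

PATTERN = re.compile(r'-?(?:[^-]-?)*')

def dash_runs(text):
    """Hypothetical helper: return the runs that contain no double dash."""
    # The pattern can also match the empty string, so drop empty results.
    return [run for run in PATTERN.findall(text) if run]

print(dash_runs("qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"))
```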
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
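In Python this split could be sketched as below. Splitting on a zero-width match needs Python 3.7 or later, and the function name is hypothetical.

```python
import re

def split_between_hyphens(text):
    """Hypothetical helper: split at every position that lies between two hyphens."""
    # The pattern is zero-width: it consumes nothing, so both hyphens are kept.
    return re.split(r'(?<=-)(?=-)', text)

print(split_between_hyphens("qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"))
```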
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.