Regular Expressions, getting digit after second occurrence of dot - regex

I want to get the number after the second dot in a string like this:
4.5.3. Some kind of question ? The input string might also look like this: 41.53.32. Some kind of question ? So I'm aiming for 3 in the first example and 32 in the second example.
I'm trying to do it with
(?<=(\.\d\.))[0-9]+
and it works on the first example, but when I change it to (?<=(\.\d+\.))[0-9]+
it doesn't work at all.

If there is always a dot after the final number then you can use the following expression:
\d+(?=\.(?:[^\d]|$))
This will match one or more digits \d+ which are followed by a dot \. and then by something that is either not a digit [^\d] or the end-of-string $, i.e. (?=\.(?:[^\d]|$)).
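A minimal Python sketch of the lookahead approach, run against the two sample strings from the question. The helper name number_before_text is made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer: digits followed by a dot, then a non-digit or end of string.
PATTERN = re.compile(r"\d+(?=\.(?:[^\d]|$))")

def number_before_text(s):
    """Return the first run of digits whose trailing dot is not followed by a digit."""
    m = PATTERN.search(s)
    return m.group(0) if m else None

print(number_before_text("4.5.3. Some kind of question ?"))
print(number_before_text("41.53.32. Some kind of question ?"))
```

Note that this does not count dots. It relies on the wanted number being the first one whose dot is followed by a non-digit, as the answer assumes.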

If you use Perl or PHP, you can try this pattern, where \K drops everything matched so far from the reported match:
(?:\d+\.){2}\K\d+

The simplest complete answer is probably something like this:
(?<=^(?:[^.]*\.){2})\d+
If you're at all worried about performance, this one will be slightly faster:
^(?:[^.]*\.){2}(\d+)
This one will capture the desired value in capturing group 1.
If you are using an engine that doesn't support variable-length lookbehind, you'll need to use the second version.
If you wish, you can replace [^.] with \d, to only match digits.
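Python's built-in re module is, as far as I know, one of the engines that rejects variable-length lookbehind, so here is a small sketch of the capturing-group version there. The function names are hypothetical.

```python
import re

# Second pattern from the answer: skip two "anything up to a dot" chunks,
# then capture the digits that follow.
AFTER_SECOND_DOT = re.compile(r"^(?:[^.]*\.){2}(\d+)")

def number_after_second_dot(s):
    """Return the digits right after the second dot, or None if absent."""
    m = AFTER_SECOND_DOT.match(s)
    return m.group(1) if m else None

def lookbehind_is_supported():
    """Check whether this engine accepts the variable-length lookbehind version."""
    try:
        re.compile(r"(?<=^(?:[^.]*\.){2})\d+")
        return True
    except re.error:
        return False

print(number_after_second_dot("4.5.3. Some kind of question ?"))
print(number_after_second_dot("41.53.32. Some kind of question ?"))
print(lookbehind_is_supported())
```

The wanted value is read from group 1 rather than from the whole match, which is the trade-off of dropping the lookbehind.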

(\d+\.\d+\.)\K\d+
Match digits, dot, digits, dot, digits; the \K discards the first section (the group) from the reported match.

(?:(?:.*\.)?){2}(\d+)
The regex above should work for your use case.

Related

Regex: split number into optional first group of up to three then last group of up to three

I have two 1-6 digit numbers separated by a slash. I want these split up into groups of at most 3 digits, taking from the right.
For example:
0/1 -> [,0,,1]
1234/3 -> [1,234,,3]
12345/1234 -> [12,345,1,234]
123456/789123 -> [123,456,789,123]
I need to use a regular expression because I want to do this in an NGINX location. It would be possible with application logic, but for performance reasons that is not what I'm asking about.
Similar question which solves part of this was here using a negative lookahead: Regular expression to match last number in a string
What regex can achieve this split?
UPDATE:
This regex comes close to what I want (https://regex101.com/r/bQtNdK/3):
(?<prefix1>\d{0,3}?)(?<threes1>\d{0,3})\/(?<prefix2>\d{0,3}?)(?=\d)(?<threes2>\d{0,3})
It fails matching if the second number behind the slash is more than 3 digits long.
UPDATE2:
Now this regex works for most combinations (https://regex101.com/r/bQtNdK/5):
(?<prefix1>\d{0,3}?)(?<threes1>\d{1,3})\/(?<prefix2>\d{0,3})(?<threes2>\d{3})
I don't understand why this starts to fail if I use the same regex for prefix2/threes2 like prefix1/threes1 (i.e. make prefix2 also lazy). Any ideas how to solve this? So close...
I don't know that it's possible without the ability for the regex engine to remember all intermediate matches of a group that matched an arbitrary number of times (.NET can do this, not sure about others). PCRE will apparently only remember the 'last' match for each group; otherwise you could use something like this: (?<prefix1>\d{0,2})(?:(?<threes1>\d{3})*)\/(?<prefix2>\d{0,2})(?<threes2>\d{3})*\s
This regex seems to be correct now (regex101):
(?<prefix1>\d{0,3}?)(?<suffix1>\d{1,3})\/(?<prefix2>\d{0,3}?)(?<suffix2>\d{1,3})\/
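A rough Python check of that last regex against the four examples from the question. Two assumptions on my side: the input ends with a slash (the final regex requires it), and the groups are rewritten in Python's (?P<name>...) syntax. The helper name split_groups is made up.

```python
import re

# Same pattern as the last regex above, using Python's named-group syntax.
SPLIT = re.compile(
    r"(?P<prefix1>\d{0,3}?)(?P<suffix1>\d{1,3})/"
    r"(?P<prefix2>\d{0,3}?)(?P<suffix2>\d{1,3})/"
)

def split_groups(path):
    """Return [prefix1, suffix1, prefix2, suffix2], or None if there is no match."""
    m = SPLIT.search(path)
    if m is None:
        return None
    return [m.group(name) for name in ("prefix1", "suffix1", "prefix2", "suffix2")]

for sample in ("0/1/", "1234/3/", "12345/1234/", "123456/789123/"):
    print(sample, split_groups(sample))
```

This also seems to explain the earlier failure with a lazy prefix2: a lazy group only grows when something after it forces it to. Without the trailing slash, prefix2 can stay empty, suffix2 takes the first three digits, and the match succeeds with digits left over.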

matching in between a long sentence with keywords

target sentence:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; $(SolDir)..\..\ABC\ccc\1234\somethingelse;
How should I construct my regex to extract the items that contain "..\..\ABC\ccc\1234\ccc_am_system"?
Basically, I want to extract all of those folders, and maybe more; they are all under \ABC\ccc\1234\ccc_am_system:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\abc;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\123\123\123\123;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;
My current regex doesn't work and I can't figure out why:
\$.*ccc\\1234\.*;
Your problem is most likely that * is a greedy operator. It's greedily matching more than you intend it to. In many regex dialects, *? is the reluctant operator. I would first try using it like this:
\$.*?ccc\\1234.*?;
You can read up a bit more on greedy vs reluctant operators in this question.
If that doesn't work, you can try to be more specific about the characters you match than the dot (.) allows. For example, you can match every non-semicolon character with an expression like this: [^;]*. You could use that idea this way:
\$[^;]*ccc\\1234[^;]*;
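To see the difference on the target string from the question, here is a small Python sketch. The greedy and negated-class patterns are the ones discussed above; the third pattern, which narrows the match to entries under ccc_am_system, is my own guess at what the question is after.

```python
import re

# Target string from the question (raw strings keep the backslashes literal).
text = (
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;"
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;"
    r"$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; "
    r"$(SolDir)..\..\ABC\ccc\1234\somethingelse;"
)

# Greedy .* runs to the end of the string and swallows every entry at once.
greedy = re.findall(r"\$.*ccc\\1234.*;", text)

# [^;]* cannot cross a semicolon, so each entry is matched on its own.
negated = re.findall(r"\$[^;]*ccc\\1234[^;]*;", text)

# Hypothetical narrowing: only entries under ...\ccc\1234\ccc_am_system.
narrowed = re.findall(r"\$[^;]*ccc\\1234\\ccc_am_system[^;]*;", text)

print(len(greedy), len(negated), len(narrowed))
for item in narrowed:
    print(item)
```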
The below regex would store the captured strings inside group 1.
(\$.*?ccc\\1234\\.*?;)
You need to make the * quantifier do a shortest (lazy) match by adding ? right after it. Also, the \.* in your pattern matches a literal dot zero or more times, which is not what you want.
I found this to be the best:
\$(.[^\$;])*ccc\\1234(.[^\$;])*;
It doesn't allow any overmatching whatsoever. If I use ?, it still matches across more than one $ or ; for some reason, but with the above expression that will never be the case. Still, thanks to all those who took the time to answer my question.

Regex for two digits in any order

I need a regex that will match a string as long as it includes 2 or more digits.
What I have:
/(?=.*\d)(?=.*\d)/
and
/\d{2,}/
The first one will match even if there is one digit, and the second requires that there are 2 consecutive digits. I have tried to combine them in different ways to no avail.
You can do it much more simply:
/\d\D*\d/
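A quick Python comparison of the two attempts from the question with this pattern. The sample strings are made up. The first attempt fails because both lookaheads start at the same position, so a single digit satisfies both.

```python
import re

samples = ["abc", "a1b", "a1b2", "12", "1 and 2"]

# First attempt from the question: two identical lookaheads.
first_attempt = [bool(re.search(r"(?=.*\d)(?=.*\d)", s)) for s in samples]

# Second attempt from the question: needs two consecutive digits.
second_attempt = [bool(re.search(r"\d{2,}", s)) for s in samples]

# Suggested pattern: a digit, any number of non-digits, another digit.
suggested = [bool(re.search(r"\d\D*\d", s)) for s in samples]

for row in zip(samples, first_attempt, second_attempt, suggested):
    print(row)
```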
You can use the following expression:
.*\d.*\d.*
This will match anything that has two digits in it, regardless of where in the string they are.
You can also do it like this, using ranges:
.*[0-9].*[0-9].*
You may also consider using this:
\D*\d\D*\d
The \D will match any character that is not a digit.
It depends on your application's language, but this regex is the most general:
^(?=.*\d.*\d)
Not all application languages consider partial matches as "matching"; this regex will match no matter where in the input the two digits lie.
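Python is a handy place to see the partial versus whole-string distinction, since re.search allows partial matches while re.fullmatch requires the pattern to cover the entire input. A small sketch with a made-up sample string:

```python
import re

s = "a1b2c"

# Partial match: finds "1b2" somewhere inside the string.
partial = re.search(r"\d\D*\d", s)

# Whole-string match with the same pattern fails: "a" and "c" are not covered.
whole_short = re.fullmatch(r"\d\D*\d", s)

# Padding the pattern with .* lets it cover the whole string.
whole_padded = re.fullmatch(r".*\d.*\d.*", s)

# The anchored lookahead matches zero characters at the start of the string.
anchored = re.match(r"^(?=.*\d.*\d)", s)

# With only one digit the lookahead fails.
one_digit = re.match(r"^(?=.*\d.*\d)", "a1b")

print(partial, whole_short, whole_padded, anchored, one_digit)
```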
grep -E ".*[0-9].*[0-9].*" filename
You can use the following depending on the use case:
^(?=(?:\D*\d){2}).* - The restriction is implemented with a positive lookahead (anchored at the start of the string) that requires at least two digits anywhere in the string (the regex flavor must support lookaheads).
^([^0-9]*[0-9]){2}.* - The regex matches a string that starts with two sequences of zero or more non-digit chars, each followed by a digit char, and then contains any text (this pattern is POSIX ERE compliant; to make it POSIX BRE compliant, use ^\([^0-9]*[0-9]\)\{2\}.*).
\d\D*\d - Use this when you simply want to make sure there is a digit, then zero or more non-digit chars, then another digit, and the method you are using allows partial matches.
The first approach is best when you already have a complex pattern and you need to add an extra constraint.
The second one is good for POSIX regex engines.
The third one is best when you implement complex if-else logic for password and other validations with separate error messages per issue.
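A sketch of what the third case might look like in Python: each rule is checked on its own so that every failure gets its own message. The function name and the other two rules (length, uppercase) are invented for illustration; only the two-digit pattern comes from the answer.

```python
import re

def validate_password(pw):
    """Return a list of problems; the rules here are illustrative only."""
    problems = []
    if len(pw) < 8:
        problems.append("too short")
    # Partial match is enough: a digit, optional non-digits, another digit.
    if not re.search(r"\d\D*\d", pw):
        problems.append("needs at least two digits")
    if not re.search(r"[A-Z]", pw):
        problems.append("needs an uppercase letter")
    return problems

print(validate_password("Passw0rd1"))
print(validate_password("password1"))
```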
Try this:
([0-9].*){2}
This should help you.

regex negative look-ahead for exactly 3 capital letters around a char

I'm trying to write a regex that finds all the characters that have
exactly 3 capital letters on both sides.
The following regex finds all the characters that have exactly 3 capital letters on the left side of the char, and 3 (or more) on the right:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})'
When trying to limit the right side to no more than 3 capitals using the regex:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])'
I get no results; something seems to fail when I add the (?![A-Z]) to the first regex.
Can someone explain the problem and suggest a way to solve it?
Thanks.
You need to put the negative lookahead inside the positive one:
(?<![A-Z])[A-Z]{3}.(?=[A-Z]{3}(?![A-Z]))
You can do that with the lookbehind, too:
(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))
It doesn't violate the "fixed-length lookbehind" rule because lookarounds themselves don't consume any characters.
EDIT (about fixed-length lookbehind): Of all the flavors that support lookbehind, Python is the most inflexible. In most flavors (e.g. Perl, PHP, Ruby 1.9+) you could use:
(?<=^[A-Z]{3}|[^A-Z][A-Z]{3}).
...to match a character preceded by exactly three uppercase ASCII letters. The first alternative - ^[A-Z]{3} - starts looking three positions back, while the second - [^A-Z][A-Z]{3} - goes back exactly four positions. In Java, you can reduce that to:
(?<=(^|[^A-Z])[A-Z]{3}).
...because it does a little extra work at compile time to figure out that the maximum lookbehind length will be four positions. And in .NET and JGSoft, anything goes; if it's legal anywhere, it's legal in a lookbehind.
But in Python, a lookbehind subexpression has to match a single, fixed number of characters. If you've butted your head against that limitation a few times, you might not expect something like this to work:
(?<=(?<![A-Z])[A-Z]{3}).
At least I didn't. It's even more concise than the Java version; how can it work in Python? But it does work, in Python and in every other flavor that supports lookbehind.
And no, there are no similar restrictions on lookaheads, in any flavor.
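A small Python sketch of why the nesting matters. In the question's second attempt, (?=[A-Z]{3}) and (?![A-Z]) are both tested at the position right after the captured character, so the next character would have to be a capital and not a capital at once. The test strings and the helper name middle_char are made up; the capture group from the question is kept in the fixed pattern.

```python
import re

# Second attempt from the question: both lookaheads apply at the same position.
broken = re.compile(r"(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])")

# Fix from the answer: the negative lookahead sits inside the positive one,
# so it is tested after the three capitals.
fixed = re.compile(r"(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3}(?![A-Z]))")

def middle_char(pattern, s):
    """Return the captured middle character of the first match, or None."""
    m = pattern.search(s)
    return m.group(1) if m else None

for s in ("ABCdDEF", "ABCdDEFG", "XABCdDEF", "abcABCdDEFghi"):
    print(s, middle_char(broken, s), middle_char(fixed, s))
```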
Taking out the positive lookahead worked for me.
(?<![A-Z])[A-Z]{3}(.)([A-Z]{3})(?![A-Z])
'ABCdDEF' 'ABCfDEF' 'HHHhhhHHHH' 'jjJJjjJJJ' JJJjJJJ
matches
ABCdDEF
ABCfDEF
JJJjJJJ
I'm not sure how the regexp engines should work with multiple lookahead assertions, but the one you're using may have its own opinion on that.
You could as well use a single assertion as follows:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3}[^A-Z])'
The same with lookbehind:
'(?<=[^A-Z][A-Z]{3})(.)(?=[A-Z]{3}[^A-Z])'
This will have a problem matching the pattern in the beginning and in the end of the line.
I can't think of a proper solution, but there can be a dirty trick: for instance, add a space (or something else) in the beginning and the end of the whole line, then perform the matching.
$ echo 'ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ' | sed 's/.*/ & /' | grep -oP '(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])'
d
f
a
b
j
Note that I changed (.) to (\S) in the middle; change it back if you want the space to match.
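The same padding trick can be sketched in Python, which also shows what is lost without it. The pattern and the input line are the ones from the shell example; the variable names are mine.

```python
import re

PATTERN = re.compile(r"(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])")
line = "ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ"

# Without padding, the first and last words have no neighbouring
# non-capital character, so their middle characters are missed.
unpadded = PATTERN.findall(line)

# Padding with a space on both sides gives every word such a neighbour.
padded = PATTERN.findall(" " + line + " ")

print(unpadded)
print(padded)
```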
P.S. Are you solving The Python Challenge? :)
Since the look ahead pattern is the same as the look behind pattern, you could also use the continue anchor \G:
/(?:[A-Z]{3}|\G[A-Z]*)(.)[A-Z]{3}/
A match is returned if three capitals precede a single character or where the last match left off (optionally followed by other capitals).

What's wrong with this number extracting Regex?

I have a string like the following:
<br><b>224h / 15.45 verbuchte Stunden</b>
I want to extract the numbers and have created the following Regex:
([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]\.?[0-9]{0,2})
But for the preceding string this gives me the numbers 224 and 15 instead of 15.45.
What's wrong with this Regex?
Because you allow only one digit before the dot.
Try this, I used {1,2} as quantifier before the dot, change it to your needs. Probably + would be a better choice, it allows one or more.
([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]{1,2}\.?[0-9]{0,2})
A better regex could be this
([0-9]+(?:\.[0-9]{1,2})?)h\s*\/\s*([0-9]+(?:\.[0-9]{1,2})?)
Here I made the entire fractional part optional, and I require at least one digit before the dot and one or two digits after it.
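For what it's worth, a quick Python check of the original regex against the improved one, using the string from the question:

```python
import re

text = "<br><b>224h / 15.45 verbuchte Stunden</b>"

# Regex from the question: only one digit is allowed before the optional dot.
original = re.search(
    r"([0-9]\.?[0-9]{0,2})h\s\/\s([0-9]\.?[0-9]{0,2})", text
)

# Improved regex: one or more digits, then an optional ".d" or ".dd" part.
improved = re.search(
    r"([0-9]+(?:\.[0-9]{1,2})?)h\s*\/\s*([0-9]+(?:\.[0-9]{1,2})?)", text
)

print(original.groups())
print(improved.groups())
```

In the original, the second group consumes "1" as the single leading digit, finds no dot, and then takes "5" as the trailing digits, so it stops before ".45".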
The answer is given by stema.
If your regex engine supports shorthand character classes, it could be a little more compact, like this:
(\d{1,2}\.?\d{0,2})h\s/\s(\d{1,2}\.?\d{0,2})
\d is a shorthand character class for [0-9].