Regex - some matches are missing - regex

I am trying to solve a really simple problem, but I can't find a solution.
My string looks like this: "...0.0..0.0..."
My regex is: 0[.]{1,3}0
I am expecting 3 matches: 0.0, 0..0, 0.0
But instead I am getting only two matches: 0.0 and 0.0. Can you please tell me what

The problem is that once the regex finds its first match, it consumes the characters it matched. The first match is:
...0.0..0.0...
   ^^^
so the search for the next match starts in the remainder of the string, which is
..0.0...
and there, as you can see, it will find only a single match, because the 0 that would start 0..0 has already been consumed.
One way around this is to use a zero-width lookahead assertion, provided that your regex engine supports it. Your regex would then look like
0[.]{1,3}(?=0)
The lookahead checks that a 0 follows but does not consume it, so that 0 remains available as the start of the next match. The drawback is that the trailing 0 is not included in the matches. One solution is to append the 0 yourself afterwards.
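As a minimal sketch of the difference, assuming Python's built-in re module (the question does not say which engine is in use), the two patterns can be compared on the string from the question. The variable names are only illustrative.

```python
import re

text = "...0.0..0.0..."

# Plain pattern: each match consumes its trailing 0, so 0..0 is skipped.
consuming = re.findall(r"0[.]{1,3}0", text)

# Lookahead pattern: the trailing 0 is asserted but not consumed,
# so it can also serve as the start of the next match.
partial = re.findall(r"0[.]{1,3}(?=0)", text)

# Append the asserted 0 to each partial match.
overlapping = [m + "0" for m in partial]

print(consuming)    # two matches
print(overlapping)  # three matches
```

Appending the 0 by hand is safe here because the lookahead has already guaranteed that a 0 follows each partial match.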

Related

How can I solve this regex using two asserts?

I have these 3 consecutive words : Nocivic Voie and Quartier
I have something like this :
#Nocivic;Voie;Quartier#
Question :
I need to make a regex that extracts the 3 words Nocivic, Voie and Quartier using a positive lookahead. The semicolons need to be included in my regex, but not the #.
I realized that this could work: \bNocivic(?=;Voie);\bVoie;Quartier
But why does this not work?
\bNocivic(?=;Voie);\bVoie(?<=Voie;)\bQuartier
I am not too experienced with regex, so if someone could tell me why, or give me the correct answer in case I really wanted to use a lookbehind as well, it would be greatly appreciated. Thanks.
The first one is equivalent to
\bNocivic;Voie;Quartier
(?=;Voie) just tests whether ;Voie follows Nocivic, which is not useful here.
Extract from
https://www.regextutorial.org/positive-and-negative-lookahead-assertions.php
They only assert if in a given test string the match with certain conditions is possible or not Yes or No.
See the difference below
Nocivic;Voie Ok & returns Nocivic;Voie
Nocivic(?=;Voie) Ok & returns Nocivic
Second one :
(?<=Voie;) is a lookbehind assertion. The syntax is valid, but the assertion cannot succeed at that position.
The second one does not work because, after matching Voie, you assert with (?<=Voie;) that Voie; lies directly to the left of the current position, but you have not matched the semicolon yet.
Note that the lookaround assertions are fruitless in the example, as you are asserting what you are also matching.
If you want to match exactly those 3 words, it does not make sense to use lookarounds.
You can use 3 capture groups:
#(Nocivic);(Voie);(Quartier)#
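The three patterns discussed above can be checked side by side. This is a small sketch assuming Python's built-in re module, whose lookbehind accepts the fixed-width Voie; used here; the variable names are illustrative.

```python
import re

text = "#Nocivic;Voie;Quartier#"

# Three capture groups: the # delimiters are matched but not captured.
words = re.search(r"#(Nocivic);(Voie);(Quartier)#", text).groups()

# The asker's first pattern: the lookahead is redundant but harmless.
first = re.search(r"\bNocivic(?=;Voie);\bVoie;Quartier", text)

# The asker's second pattern: right after "Voie" the text to the left
# is ";Voie", not "Voie;", so the lookbehind fails and there is no match.
second = re.search(r"\bNocivic(?=;Voie);\bVoie(?<=Voie;)\bQuartier", text)

print(words)
print(first.group(0))
print(second)
```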

Regex: Getting Variable from Substring

How would I go about using Regex to extract the number from the following file:
abc_defg123_100aaa_abc_defg123
Where I want the 100 from the substring '_100aaa_'?
The closest I have gotten is:
[0-9](?!(aaa_))*\w
but this matches up to the first underscore found!
Many thanks!
Try this:
(?<=_)\d+(?=aaa_)
This regex uses lookarounds to assert, without consuming, the delimiting input on either side of the target.
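A quick check of this pattern, sketched with Python's built-in re module (an assumption, since the question names no engine), against the sample name from the question:

```python
import re

name = "abc_defg123_100aaa_abc_defg123"

# The lookbehind requires "_" just before the digits and the lookahead
# requires "aaa_" just after; neither becomes part of the match.
match = re.search(r"(?<=_)\d+(?=aaa_)", name)
number = match.group(0) if match else None

print(number)
```

The digits in defg123 are skipped because they are preceded by a letter rather than an underscore.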

Smallest possible match / nongreedy regex search

I first thought that this answer would totally solve my issue, but it did not.
I have a string url like this one:
http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76
I would like to extract some-other-text, so I came up with the following regex:
/0-(.*)\.htm/
Unfortunately, this matches 1-0-some-other-text because regexes are greedy. I cannot make it nongreedy using .*?; it just does not change anything.
I also tried the U modifier, but it did not help.
Why does the "nongreedy" tip not work?
In case you need to get the closest match, you can make use of a tempered greedy token.
0-((?:(?!0-).)*)\.htm
The lazy version of your regex does not work because the regex engine scans the string from left to right. It always takes the leftmost position at which a match can start and tries to match from there. So, in your case, it found the first 0- and was happy with it. Laziness only affects where the match ends. In your case there is only one possible end position (the single .htm), so lazy matching cannot help achieve the expected result.
You also can use
0-((?!.*?0-).*)\.htm
It will work if you have individual strings to extract the values from.
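The behaviour described above can be reproduced with a short sketch, assuming Python's built-in re module rather than the PHP-style delimiters used in the question. The helper first_group is hypothetical and exists only for this illustration.

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"

def first_group(pattern, s=url):
    """Return capture group 1 of the first match, or None if no match."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

# Greedy and lazy both start at the leftmost "0-", so they agree here.
greedy = first_group(r"0-(.*)\.htm")
lazy = first_group(r"0-(.*?)\.htm")

# Tempered greedy token: the group may not run across another "0-".
tempered = first_group(r"0-((?:(?!0-).)*)\.htm")

# Negative lookahead: start only at a "0-" with no later "0-" in the string.
no_later = first_group(r"0-((?!.*?0-).*)\.htm")

print(greedy, lazy, tempered, no_later)
```

The last variant looks ahead over the whole remaining string, which is why it suits individual strings rather than a long text containing several URLs.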
You want to exclude the 1-0? If so, you can use a non capturing group:
(?:1-0-)+(.*?)\.htm

regex negative lookbehind - pcre

I'm trying to write a rule to match a top-level domain followed by five digits. My problem is that my existing PCRE matches what I have described, but much later in the URL than where I want it to. I want it to match the first occurrence of a TLD, not anywhere else. The easy way to check for this is to match the TLD only when it has not been preceded at some point by the "/" character. I tried using a negative lookbehind, but that doesn't work because it only looks back a single character.
e.g.: How it is currently working
domain.net/stuff/stuff=www.google.com/12345
matches .com/12345 even though I do not want this match because it is not the first TLD in the URL
e.g.: How I want it to work
domain.net/12345/stuff=www.google.com/12345
matches on .net/12345 and ignores the later match on .com/12345
My current expression
(\.[a-z]{2,4})/\d{5}
EDIT: rewrote it so perhaps the problem is clearer in case anyone in the future has this same issue.
You're pretty close :)
You just need to be sure that before matching what you're looking for (i.e: (\.[a-z]{2,4})/\d{5}), you haven't met any / since the beginning of the line.
I would suggest that you simply prepend ^[^\/]* to your current regex, with the literal dot moved outside the capture group.
Thus, the resulting regex would be:
^[^\/]*\.([a-z]{2,4})/\d{5}
How does it work?
^ asserts that this is the beginning of the tested String
[^\/]* accepts any sequence of characters that doesn't contain /
\.([a-z]{2,4})/\d{5} is the pattern you want to match (a . followed by 2 to 4 lowercase characters, then a / and 5 digits).
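The effect of the anchor can be seen on the two URLs from the question. This is a sketch using Python's built-in re module instead of a PCRE rule engine, which is an assumption; the variable names are illustrative.

```python
import re

original = re.compile(r"(\.[a-z]{2,4})/\d{5}")
anchored = re.compile(r"^[^/]*\.([a-z]{2,4})/\d{5}")

unwanted = "domain.net/stuff/stuff=www.google.com/12345"
wanted = "domain.net/12345/stuff=www.google.com/12345"

# The original pattern can start anywhere, so it finds the late TLD.
late_hit = original.search(unwanted).group(0)

# The anchored pattern may not cross a "/" before the TLD, so the
# first URL no longer matches and the second yields the first TLD.
rejected = anchored.search(unwanted)
tld = anchored.search(wanted).group(1)

print(late_hit, rejected, tld)
```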
Here is a permalink to a working example on regex101.
Cheers!
You can use this regex:
'|^(\w+://)?([\w-]+\.)+\w+/\d{5}|'
Online Demo: http://regex101.com/

Regular Expressions, getting digit after second occurrence of dot

I want to get the number after the second dot in a string like this:
4.5.3. Some kind of question ?
The input string might also look like this:
41.53.32. Some kind of question ?
So I'm aiming for 3 in the first example and 32 in the second.
I'm trying to do it with
(?<=(\.\d\.))[0-9]+
and it works on the first example, but when I change it to (?<=(\.\d+\.))[0-9]+
it doesn't work at all.
If there is always a dot after the final number then you can use the following expression:
\d+(?=\.(?:[^\d]|$))
This will match one or more digits \d+ which are followed by a dot . and then something that is either not a digit [^\d] or the end-of-string $, i.e. (?=\.(?:[^\d]|$)).
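A short sketch of this expression on both inputs from the question, assuming Python's built-in re module; the function name is hypothetical.

```python
import re

# Digits followed by a dot that is itself followed by a non-digit
# or by the end of the string.
FINAL_NUMBER = re.compile(r"\d+(?=\.(?:[^\d]|$))")

def final_number(s):
    """Return the first number whose trailing dot is not followed by a digit."""
    m = FINAL_NUMBER.search(s)
    return m.group(0) if m else None

print(final_number("4.5.3. Some kind of question ?"))
print(final_number("41.53.32. Some kind of question ?"))
```

Note that this relies on the dot after the final number, as stated above, rather than on counting dots.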
If you use Perl or PHP, you can try this pattern:
(?:\d+\.){2}\K\d+
The simplest complete answer is probably something like this:
(?<=^(?:[^.]*\.){2})\d+
If you're at all worried about performance, this one will be slightly faster:
^(?:[^.]*\.){2}(\d+)
This one will capture the desired value in capturing group 1.
If you are using an engine that doesn't support variable-length lookbehind, you'll need to use the second version.
If you wish, you can replace [^.] with \d, to only match digits.
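Python's built-in re module is one engine that requires a fixed-width lookbehind, so it makes a convenient test bed for the capture-group version. The sketch below assumes that module; the function name is hypothetical.

```python
import re

# The asker's second attempt has a variable-width lookbehind (\d+),
# which Python's re module refuses to compile.
try:
    re.compile(r"(?<=(\.\d+\.))[0-9]+")
    lookbehind_compiles = True
except re.error:
    lookbehind_compiles = False

# Capture-group version: skip two dot-terminated fields from the start
# of the string, then capture the digits that follow.
AFTER_SECOND_DOT = re.compile(r"^(?:[^.]*\.){2}(\d+)")

def number_after_second_dot(s):
    """Return the digits after the second dot, or None if there are none."""
    m = AFTER_SECOND_DOT.search(s)
    return m.group(1) if m else None

print(lookbehind_compiles)
print(number_after_second_dot("4.5.3. Some kind of question ?"))
print(number_after_second_dot("41.53.32. Some kind of question ?"))
```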
(\d+\.\d+\.)\K\d+
Match digits, dot, digits, dot, digits. The \K drops the first section (the group) from the reported match, so only the final digits are returned.
(?:(?:.*\.)?){2}(\d+)
The regex above should work for your use case.