Regex: get everything after the first dot

I need help with a simple regex.
I need to remove everything before the first dot (including the dot)
Example:
Text-To-remove.Text.ToKeep.123.com
Should be:
Text.ToKeep.123.com
I tried \.(.*) but it keeps the first dot:
.Text.ToKeep.123.com
Please advise.

One approach uses a positive lookbehind:
(?<=\.).*$
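A minimal Python sketch of both directions, assuming Python's built-in re module (the question does not name an engine). The lookbehind pattern extracts the part to keep. The second pattern, ^[^.]*\. , is an assumption added here for illustration and deletes the part to drop:

```python
import re

text = "Text-To-remove.Text.ToKeep.123.com"

# Extract: the lookbehind requires a dot just before the match,
# and the leftmost such position is right after the first dot.
kept = re.search(r"(?<=\.).*$", text).group()

# Remove: delete everything up to and including the first dot.
# This pattern is an assumption, not part of the original answer.
stripped = re.sub(r"^[^.]*\.", "", text, count=1)

print(kept)      # Text.ToKeep.123.com
print(stripped)  # Text.ToKeep.123.com
```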

Related

How can I solve this regex using two asserts?

I have these 3 consecutive words : Nocivic Voie and Quartier
I have something like this :
#Nocivic;Voie;Quartier#
Question :
I need to make a regex that extracts the 3 words Nocivic, Voie and Quartier using a positive lookahead. The semicolons need to be included in the match, but not the #.
I realized that this could work : \bNocivic(?=;Voie);\bVoie;Quartier
But why is this not working ?
\bNocivic(?=;Voie);\bVoie(?<=Voie;)\bQuartier
I am not too experienced with regex, so if someone could tell me why, or give me the correct answer if I really wanted to use another lookbehind, it would be greatly appreciated. Thanks.
The first one is equivalent to
\bNocivic;Voie;Quartier\b
(?=;Voie) just tests whether ;Voie follows Nocivic, so it is not useful here.
Extract from
https://www.regextutorial.org/positive-and-negative-lookahead-assertions.php
They only assert if in a given test string the match with certain conditions is possible or not Yes or No.
See the difference below
Nocivic;Voie Ok & returns Nocivic;Voie
Nocivic(?=;Voie) Ok & returns Nocivic
Second one :
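(?<=Voie;) is a lookbehind, not a lookahead: it checks the text to the left of the current position, not the text that follows.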
The second one does not work because, after matching Voie, you assert with (?<=Voie;) that Voie; sits directly to the left of the current position, but you have not matched the semicolon yet.
Note that the lookaround assertions are fruitless in the example, as you are asserting what you are also matching.
If you want to match exactly those 3 words, it does not make sense to use lookarounds.
You can use 3 capture groups:
#(Nocivic);(Voie);(Quartier)#
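The behaviour of the three patterns can be checked with a short Python sketch, assuming Python's re module (the question does not say which engine is used). The variable names are illustrative only:

```python
import re

text = "#Nocivic;Voie;Quartier#"

works = r"\bNocivic(?=;Voie);\bVoie;Quartier"
fails = r"\bNocivic(?=;Voie);\bVoie(?<=Voie;)\bQuartier"
groups = r"#(Nocivic);(Voie);(Quartier)#"

m_works = re.search(works, text)
# Right after "Voie" the text to the left ends in "Voie", not "Voie;",
# so the lookbehind fails and there is no match.
m_fails = re.search(fails, text)
m_groups = re.search(groups, text)

print(m_works.group())    # Nocivic;Voie;Quartier
print(m_fails)            # None
print(m_groups.groups())  # ('Nocivic', 'Voie', 'Quartier')
```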

Regex for the string between the last quotes?

I want to take DDEERR as a result in regex. My sample string is:
("NNNS" lllsds 4.5 ddsdsd "DDEERR")
I used (?<=\s*\s*").*?(?=") for all strings between "", but I couldn't take the last one only (or before the right parentheses).
Do you have any ideas? Thanks.
I would just make good use of greedy dot here:
^.*"(.*?)".*$
The idea here is that the first .* will consume everything up until the last term appearing in double quotes. Then, we capture the text inside those double quotes as the first (and only) capture group.
Edit:
If you really need to do this without any capture groups at all, then we can try writing a pattern with lookarounds:
(?<=")[^"]+(?="[^"]*$)
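Both variants can be tried in Python as a rough sketch, assuming the re module (the lookbehind here is fixed-width, so re accepts it):

```python
import re

text = '("NNNS" lllsds 4.5 ddsdsd "DDEERR")'

# Greedy .* backtracks only as far as needed, so group 1 is the
# last quoted term.
with_group = re.search(r'^.*"(.*?)".*$', text).group(1)

# Lookaround version: no capture group, the match itself is the term.
no_group = re.search(r'(?<=")[^"]+(?="[^"]*$)', text).group()

print(with_group)  # DDEERR
print(no_group)    # DDEERR
```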

Regular Expression to match two words near each other on a single line

Hi I am trying to construct a regular expression (PCRE) that is able to find two words near each other but which occur on the same line. The near examples generally provided are insufficient for my requirements as the "\W" obviously includes new lines. I have spent quite a bit of time trying to find an answer to this and have thus far been unsuccessful. To exemplify what I have so far, please see below:
(?i)(?:\b(tree)\b)\W+(?:\w+\W+){0,5}?\b(house)\b.*
I want this to match on:
here is a tree with a house
But not match on
here is a tree
with a house
Any help would be greatly appreciated!
How about
\btree\b[^\n]+\bhouse\b
Just add a negative lookahead so that \W matches any non-word character except a newline.
(?i)(?:\b(tree)\b)(?:(?!\n)\W)+(?:\w+\W+){0,5}?\b(house)\b.*
Dot matches anything except newlines, so just:
(?i)\btree\b.{1,5}\bhouse\b
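Note it is impossible for there to be zero characters between the two words, because then they wouldn't be two words: they would be one word and the \b wouldn't match. Also note that .{1,5} limits the gap to five characters, not five words, so raise the upper bound if the words can be further apart.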
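A small Python sketch of how the dot-based pattern above behaves, assuming Python's re module instead of PCRE (this particular pattern uses only syntax both support). The test strings other than the question's own are made up for illustration:

```python
import re

pattern = re.compile(r"(?i)\btree\b.{1,5}\bhouse\b")

# " and " is exactly five characters, so this matches.
close = pattern.search("a tree and house")

# " with a " is eight characters, more than .{1,5} allows.
far = pattern.search("here is a tree with a house")

# The dot does not match "\n", so words on different lines never match.
split = pattern.search("tree\nhouse")

print(close is not None)  # True
print(far)                # None
print(split)              # None
```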
Just replace \W with [^\w\r\n] in your regex:
(?i)(?:\b(tree)\b)[^\w\r\n]+(?:\w+[^\w\r\n]+){0,5}?\b(house)\b.*
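As a quick check of the [^\w\r\n] variant, here is a sketch in Python, assuming the re module in place of PCRE (the pattern uses no PCRE-only features):

```python
import re

pattern = re.compile(
    r"(?i)(?:\b(tree)\b)[^\w\r\n]+(?:\w+[^\w\r\n]+){0,5}?\b(house)\b.*"
)

same_line = pattern.search("here is a tree with a house")
two_lines = pattern.search("here is a tree\nwith a house")

print(same_line.groups())  # ('tree', 'house')
print(two_lines)           # None
```

The second search fails because the character after "tree" is a newline, which [^\w\r\n] refuses to consume.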
To get the closest matches of both words on the same line, an option is to use a negative lookahead:
(?i)(\btree\b)(?>(?!(?1)).)*?\bhouse\b
By default, the dot does not match a newline (it does only with the s/DOTALL modifier).
(?>(?!(?1)).)*? matches as few characters as possible, none of which may be the start of another \btree\b.
(?1) reuses the pattern of the first capture group.
Maybe this helps, found here: https://www.regular-expressions.info/near.html
\bword1\W+(?:\w+\W+){1,6}?word2\b

regex to remove everything after the last dot in a file

I'm trying to find a regex for removing everything after the last dot in a file. So far I've found ways to remove all text before the first dot, but I can't seem to find a way to select the end of the file. Could you help me on the way?
Thanks in advance
You can try something like:
\.[^.]*$
to match everything including and after the last dot. If you don't want to include the last dot, then you can use a positive lookbehind:
(?<=\.)[^.]*$
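A minimal sketch of the two patterns in Python, assuming the re module and a made-up file name (the question does not give sample input):

```python
import re

name = "archive.tar.gz"  # hypothetical example input

# Remove the last dot and everything after it.
without_dot = re.sub(r"\.[^.]*$", "", name)

# Keep the last dot, remove only what follows it.
with_dot = re.sub(r"(?<=\.)[^.]*$", "", name)

print(without_dot)  # archive.tar
print(with_dot)     # archive.tar.
```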
Try the following regex for search and replace:
s/\.[^.]*$/\./
On BigQuery, r'([^.]+).?$' works if you want to remove the last dot.

Regular Expressions, getting digit after second occurrence of dot

I want to get the number after the second dot in a string like this:
4.5.3. Some kind of question ?
The input string might also look like this:
41.53.32. Some kind of question ?
So I'm aiming for 3 in the first example and 32 in the second.
I'm trying to do it with
(?<=(\.\d\.))[0-9]+
and it works on the first example, but when I change it to (?<=(\.\d+\.))[0-9]+
it doesn't work at all.
If there is always a dot after the final number then you can use the following expression:
\d+(?=\.(?:[^\d]|$))
This will match one or more digits \d+ which are followed by a dot . and then something that is either not a digit [^\d] or the end-of-string $, i.e. (?=\.(?:[^\d]|$)).
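The lookahead pattern can be tried on both sample strings with a short Python sketch, assuming the re module (the question does not state the engine):

```python
import re

samples = [
    "4.5.3. Some kind of question ?",
    "41.53.32. Some kind of question ?",
]

# Digits followed by a dot that is not followed by another digit.
pattern = re.compile(r"\d+(?=\.(?:[^\d]|$))")

results = [pattern.search(s).group() for s in samples]
print(results)  # ['3', '32']
```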
If you use Perl or PHP, you can try this pattern:
(?:\d+\.){2}\K\d+
The simplest complete answer is probably something like this:
(?<=^(?:[^.]*\.){2})\d+
If you're at all worried about performance, this one will be slightly faster:
^(?:[^.]*\.){2}(\d+)
This one will capture the desired value in capturing group 1.
If you are using an engine that doesn't support variable-length lookbehind, you'll need to use the second version.
If you wish, you can replace [^.] with \d, to only match digits.
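As an illustration of the engine caveat, here is a sketch in Python, whose re module is one of the engines that reject variable-length lookbehind. The capture-group version works there, while compiling the lookbehind version raises re.error:

```python
import re

samples = [
    "4.5.3. Some kind of question ?",
    "41.53.32. Some kind of question ?",
]

# Capture-group version: skip two "anything up to a dot" chunks,
# then capture the digits that follow.
pattern = re.compile(r"^(?:[^.]*\.){2}(\d+)")
numbers = [pattern.match(s).group(1) for s in samples]
print(numbers)  # ['3', '32']

# The lookbehind version has variable width, which re does not allow.
try:
    re.compile(r"(?<=^(?:[^.]*\.){2})\d+")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False
```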
(\d+\.\d+\.)\K\d+
Match digits, dot, digits, dot, digits; \K drops the first section from the reported match, so only the final digits are returned.
The following regex should work for your use case:
(?:(?:.*\.)?){2}(\d+)