Regular expression to match valid IP addresses over multiple lines - regex

How can I construct a regular expression that matches valid IP addresses (each octet in the range 1-255), works across multiple lines, and allows whitespace? The values will be typed out and submitted like this:
10.10.10.10
100.100.100.100
192.1.1.1.1
192.158.1.38
and so on with no limit.
I have tweaked this expression, but it only does a fraction of what I need:
"^(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\/?[0-4]?[0-9]?\s?\r?\n?\.?\d)*$\b"

Something as simple as the following should do it:
\b(?<!\.)(?:(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])(?!\.)\b
I'm not sure what all the newline checks are meant to do, though. The pattern above uses \b rather than ^ and $, so it finds each address wherever it appears in the input, and no explicit newline handling is needed.
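As a quick check, here is a minimal Python sketch that runs the pattern from this answer over the sample input from the question. The names find_ips and IP_PATTERN are illustrative and not from the original posts.

```python
import re

# The pattern from the answer, split over two lines for readability.
IP_PATTERN = re.compile(
    r"\b(?<!\.)(?:(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}"
    r"(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])(?!\.)\b"
)


def find_ips(text):
    """Return every dotted quad in text that the pattern accepts."""
    return IP_PATTERN.findall(text)


sample = "10.10.10.10\n100.100.100.100\n192.1.1.1.1\n192.158.1.38\n"
print(find_ips(sample))
# ['10.10.10.10', '100.100.100.100', '192.158.1.38']
```

The five-part line 192.1.1.1.1 is skipped because of the (?<!\.) and (?!\.) guards. Note that \d{1,2} also accepts 0 and leading zeros such as 01, so a strict 1-255 range would need a tighter octet pattern.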

Related

Extract domain from email address with Regex

I'm learning regular expressions and I'm having trouble extracting the domain from the email address. I have an email address: example#gmail.com. I need to use a regular expression to extract #gmail (along with the # symbol). I should end up only getting example. I've already tried this:
#(\w+)
and this
(?<=#)[^.]+(?=.).*
but those expressions didn't work properly. I'd appreciate your help.
I just tried a simple lookbehind, #(?<=#).*, which matches #google.com. You can also group the entire expression and adapt it for single-line or multi-line matching.
#(?<=#).*
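To make the behaviour concrete, here is a small Python sketch using the address from the question. The second and third patterns (#[^.]+ and #.*) are my own additions for illustration, assuming the goal is either the #gmail part alone or the text before the #.

```python
import re

address = "example#gmail.com"

# The answer's pattern: match '#', confirm via lookbehind that the previous
# character is '#', then take the rest of the line.
with_lookbehind = re.search(r"#(?<=#).*", address).group()

# Assumed variant: stop at the first dot to keep only the '#gmail' part.
domain_only = re.search(r"#[^.]+", address).group()

# Assumed variant: drop everything from '#' onward to keep the local part.
local_part = re.sub(r"#.*", "", address)

print(with_lookbehind)  # #gmail.com
print(domain_only)      # #gmail
print(local_part)       # example
```

The lookbehind is redundant here, since the literal # has just been matched, so #.* alone gives the same result.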

regular expressions: catch any URLs of the domain example.com

I'm trying to write a regexp for the case below. I have made multiple attempts, but in vain.
I need to catch any URL of the domain site.com. I tried the regexp ^site.com/*$
but it does not recognize them.
I'm just looking for a regexp that matches site.com/*
With your expression ^site.com/*$ you match all strings that start with site.com and have zero or more trailing / characters (/*).
If you want to match any string starting with site.com/ you might want to try ^site\.com/.*$ instead.
There are already a lot of other regex questions regarding domain names on SO, but your question is not clear to me in what context you are trying to do this, or what is the actual goal you want to achieve. If you describe your needs more precisely you could probably find some answers on this forum.
I generally use a helper website like regex101.com.
Also, a few things to note: . has a special meaning in regex (it matches any character), so it should be escaped as \. to match a literal dot. If you want to capture site.com/foo, use a pattern that does not limit the number of characters after the slash. I'd do this with groups:
^(site\.com\/)(.+)$
You can see this in action here: https://regex101.com/r/AU2iYC/2
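A minimal Python sketch of the grouped pattern, in case the regex101 link is unavailable. The name URL_PATTERN and the test strings are assumptions for illustration.

```python
import re

# Group 1 is the fixed prefix, group 2 is whatever follows the slash.
URL_PATTERN = re.compile(r"^(site\.com\/)(.+)$")

match = URL_PATTERN.match("site.com/foo")
print(match.groups())  # ('site.com/', 'foo')

# .+ needs at least one character after the slash, so the bare prefix fails.
print(URL_PATTERN.match("site.com/"))  # None
```

If the bare site.com/ should also match, .+ can be replaced with .* at the cost of an empty second group.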
Your regex ^site.com/*$ matches only strings like the following:
e.g. site.com/ site.com//////// site.com
because the asterisk (*) in regex means "match 0 or more of the preceding token".
So this should work:
^site\.com\/.*$
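The difference between the two patterns can be checked with a short Python sketch. The sample strings, including siteXcom/, are made up for illustration, and the second pattern has its dot escaped as suggested in the answers above.

```python
import re

question_pattern = re.compile(r"^site.com/*$")   # from the question
fixed_pattern = re.compile(r"^site\.com/.*$")    # dot escaped, anything after '/'

samples = ["site.com", "site.com/", "site.com////", "site.com/foo", "siteXcom/"]

# Map each sample to (matched by question pattern, matched by fixed pattern).
results = {
    s: (bool(question_pattern.match(s)), bool(fixed_pattern.match(s)))
    for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
# site.com (True, False)
# site.com/ (True, True)
# site.com//// (True, True)
# site.com/foo (False, True)
# siteXcom/ (True, False)
```

The last row shows why the dot needs escaping: unescaped, it lets siteXcom/ through.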

regular expression multiple matches

For reference, this is the regex tester I am using:
http://www.rsyslog.com/regex/
How can I modify this regular expression:
[^;]+
to receive multiple sub-matches for the following test string:
;first;second;third;fourth;fifth and sixth;seventh;
I currently only receive one sub-match:
first
Basically I want each sub-match to consist of the content between ; characters, I am hoping for a sub-match list like this:
first
second
third
fourth
fifth and sixth
seventh
Following information given in the comments, I discovered that I can't get more than one sub-match because I need to specify the global modifier, and I can't figure out how to do that in the rsyslog regex tester I am using.
However, this did lead me to solve my problem in a slightly different manner. I came up with this regular expression which still only gives one match, but the number near the end acts as the index for the desired match, so for example:
(?:;([^;]+)){5}
matches this from my test string in the question:
fifth and sixth
While this solution allows me to achieve what I wanted - though in a different manner - the true answer to my question is found in HamZa's comments. More specifically:
How can I modify the regular expression to receive multiple sub-matches?
The answer is, you can't modify the regular expression itself in order to get multiple sub-matches. Setting the global modifier is required in order to do that.
Based on this information I have posted a new question on serverfault targeted specifically to the rsyslog regular expression system.
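For comparison, here is how both approaches look in Python's re module, which is an assumption on my part and not the rsyslog engine discussed above. re.findall plays the role of the global modifier, and the {5} pattern is the index trick from this answer.

```python
import re

test_string = ";first;second;third;fourth;fifth and sixth;seventh;"

# "Global" matching: findall returns every non-overlapping match.
all_fields = re.findall(r"[^;]+", test_string)
print(all_fields)
# ['first', 'second', 'third', 'fourth', 'fifth and sixth', 'seventh']

# Index trick: a repeated capture group keeps only its last iteration,
# so {5} leaves the fifth field in group 1.
fifth_field = re.search(r"(?:;([^;]+)){5}", test_string).group(1)
print(fifth_field)  # fifth and sixth
```

Changing the 5 to another number selects a different field in the same way.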

Regex with negative look behind still matches certain strings in Scala

I have a text, that contains url domains in the following form:
[second_level_domain].[top_level_domain]
This could be for instance test.com, amazon.com or something similar, but not more complex stuff like e.g. www.test.com or de.wikipedia.org (no sub level domains!).
It could be that in front of the dot (between second and top level domain) or after the dot is an optional space like test . com, but this doesn't always have to be the case.
However what I don't want to match is if the second level domain and top level domain belong to an e-mail address like for instance hello#test.org. So in this case it shouldn't extract test.org
I wrote the following regex now:
(?<!#)(([a-zA-Z\d]+(?:-[a-zA-Z\d]+)*(?<!www))\s?\.\s?(com|net|org))
With the negative lookbehind I want to make sure that the second-level domain is not preceded by a #. However, it doesn't do what I expected. For instance, on the text hello#test.org it extracts est.org instead of extracting nothing. So apparently it only looks at the first character when it checks whether there is a # in front. But when I use the following regex, it seems to work on the text hello#test.org:
(?<!#)((test)\s?\.\s?(com|net|org))
Here I hard coded the second level domain, with which it works. However if I exchange that with a regex that matches all kinds of second level domains
([a-zA-Z\d]+(?:-[a-zA-Z\d]+)*(?<!www))
it doesn't work anymore. It looks as if the negative lookbehind is applied as soon as the first character is matched, rather than waiting until everything is matched.
As an alternative I could match a bit more and then use the groups afterwards to build my desired match, but I want to avoid that if possible. I would like to match it correctly immediately. I'm not an expert in regular expressions and apparently I have not understood look arounds properly yet. Is there a way to write a regex, which behaves like I want?
(?:^|(?<=\s))((?:[a-zA-Z\d]+(?:-[a-zA-Z\d]+)*(?<!www))\s?\.\s?(?:com|net|org))
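Add anchors to disallow partial matches. The lookbehind (?<!#) is only checked at the position where a match attempt starts, so when it fails at the t of test the engine simply starts one character later and matches est.org. Requiring the match to begin at the start of the string or right after whitespace rules that out. See the demo: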
https://www.regex101.com/r/rK5lU1/34
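The question is about Scala, but the behaviour is the same in most backtracking engines. Here is a minimal sketch in Python (my choice of language, with made-up sample text) comparing the original pattern with the anchored one.

```python
import re

ORIGINAL = re.compile(
    r"(?<!#)(([a-zA-Z\d]+(?:-[a-zA-Z\d]+)*(?<!www))\s?\.\s?(com|net|org))"
)
ANCHORED = re.compile(
    r"(?:^|(?<=\s))((?:[a-zA-Z\d]+(?:-[a-zA-Z\d]+)*(?<!www))\s?\.\s?(?:com|net|org))"
)

email = "hello#test.org"

# The lookbehind rejects a start at 't' (preceded by '#'), so the engine
# retries at 'e', which is preceded by 't', and succeeds.
partial = ORIGINAL.search(email).group(1)
print(partial)  # est.org

# With the anchor, a match may only start at the beginning or after whitespace.
print(ANCHORED.search(email))  # None

text = "visit test . com or amazon.com, not www.test.com"
domains = ANCHORED.findall(text)
print(domains)  # ['test . com', 'amazon.com']
```

One trade-off of this anchor: a domain directly preceded by punctuation, such as an opening parenthesis, is no longer matched either.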

Validate incomplete Regex

Let's say we have a Regex, in my case it's one I found to match UK car registration plates:
^([A-Z]{3}\s?(\d{3}|\d{2}|d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})
A typical UK car registration is
HG53CAY
This is matched correctly by the regex, but what I'd like to do is find a way to match any prefix substring of this, so the following would all be valid:
H, HG, HG5, HG53, HG53C, HG53CA, HG53CAY
Is there a suggested way to achieve this?
Firstly I'd rewrite your regexp to look like this:
^([A-Z]{3}\s?(\d{1,3})\s?[A-Z])|([A-Z]\s?(\d{1,3})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})
since the \d{3}|\d{2}|d{1} alternations are redundant (and d{1} is missing its backslash, so it matches a literal d); they can be written as \d{1,3}.
Rewriting the regexp like
^([A-Z]{0,3}\s?(\d{0,3})\s?[A-Z]?)|([A-Z]\s?(\d{0,3})\s?[A-Z]{0,3})|(([A-HK-PRSVWY][A-HJ-PR-Y]?)\s?([0]?[2-9]?|[1-9]?[0-9]?)\s?[A-HJ-PR-Z]{0,3})
should have the desired effect of allowing matching of only the beginning of a registration, but unfortunately it's no longer guaranteed that the full registration will be a valid one, as I had to make most characters optional.
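Here is a rough Python sketch of how the relaxed pattern behaves. Using re.fullmatch so that the whole input must be consumed is my assumption about how the pattern would be applied, and the name accepts is illustrative.

```python
import re

# The relaxed pattern from the answer, one alternative per line.
RELAXED = re.compile(
    r"^([A-Z]{0,3}\s?(\d{0,3})\s?[A-Z]?)"
    r"|([A-Z]\s?(\d{0,3})\s?[A-Z]{0,3})"
    r"|(([A-HK-PRSVWY][A-HJ-PR-Y]?)\s?([0]?[2-9]?|[1-9]?[0-9]?)\s?[A-HJ-PR-Z]{0,3})"
)


def accepts(text):
    """True if the relaxed pattern consumes the entire input."""
    return RELAXED.fullmatch(text) is not None


plate = "HG53CAY"
prefixes = [plate[:i] for i in range(1, len(plate) + 1)]
print(all(accepts(p) for p in prefixes))  # True

# The caveat from the answer: this is not the start of any format in the
# original pattern, yet the relaxed pattern accepts it.
print(accepts("ZZ99"))  # True
```

Because every part is optional, the empty string is accepted as well.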
You could possibly try something like this
^(([A-Z]{3})|[A-Z]{1,2}$)\s?((\d{1,3})|$)...
so that each part must either be complete, or be incomplete but immediately followed by the end of the string (the $ in the regexp).
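As a sketch of that idea, here is a Python version restricted to the current-style format (the third alternative of the original pattern, e.g. HG53CAY). The pattern below is my own construction following the "complete, or incomplete and then end of string" rule, not the answerer's full regex, and the names are hypothetical.

```python
import re

# Each part is either complete, or incomplete and followed by end of string.
PREFIX_PATTERN = re.compile(
    r"^[A-HK-PRSVWY]"                      # first area letter (always required)
    r"(?:[A-HJ-PR-Y]|$)"                   # second area letter, or stop here
    r"\s?"
    r"(?:0[2-9]|[1-9][0-9]|[0-9]$|$)"      # full age code, one digit then end, or stop
    r"\s?"
    r"[A-HJ-PR-Z]{0,3}$"                   # up to three trailing letters
)


def is_valid_prefix(text):
    """True if text could still grow into a current-style registration."""
    # Note: in Python, $ also matches just before a trailing newline,
    # so input is assumed to be stripped already.
    return PREFIX_PATTERN.match(text) is not None


plate = "HG53CAY"
print(all(is_valid_prefix(plate[:i]) for i in range(1, len(plate) + 1)))  # True
print(is_valid_prefix("HG5C"))  # False: age code incomplete but not at the end
print(is_valid_prefix("HG01"))  # False: 01 is not a valid age code
```

Unlike the fully optional rewrite, this rejects inputs such as ZZ99 that cannot be extended into a valid registration of this format. The two older formats would need the same treatment as additional alternatives.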