Graylog regex extract first valid Mac Address in message - regex

I am trying to extract the first valid common mac address out of several different message entries in Graylog. I can do it with different Grok Extractors, but am wanting to do it with Regex so I can do conversions on the Mac to all lower case. Below are some sample messages and the Grok Patterns that work.
Question, how would I convert these Grok extractors to regex and or is there a single regex that would work in all 4 examples? Basically the regex would just need to match the first valid MAC address in each string and extract it.
Sample 1:
Equinox: *spamApTask1: Mar 20 15:26:04.033: #CAPWAP-3-ECHO_ERR: capwap_ac_sm.c:7019 Did not receive heartbeat reply; AP: 00:3a:9a:48:9b:40
Sample 2:
Equinox: *spamReceiveTask: Mar 17 12:34:39.264: #CAPWAP-3-DTLS_CONN_ERR: capwap_ac.c:934 00:3a:9a:30:f5:90: DTLS connection not found forAP 192.168.99.74 (43456), Controller: 192.168.99.2 (5246) send packet
Sample3:
Equinox: *spamApTask1: Mar 22 08:35:14.562: #LWAPP-4-SIG_INFO1: spam_lrad.c:44474 Signature information; AP 00:14:1b:61:f8:40, alarm ON, standard sig NULL probe resp 1, track per-Macprecedence 2, hits 1, slot 0, channel 1, most offending MAC 00:00:00:00:00:00 #yes but must make Mac lowercase
Sample 4:
Equinox: *idsTrackEventTask: Mar 22 08:40:13.816: #WPS-4-SIG_ALARM_OFF: sig_event.c:656 AP 00:14:1B:61:F8:40 : Alarm OFF, standard sig NULL probe resp 1, track=per-Mac preced=2 hits=1 slot=0 channel=1 yes but must make Mac lowercase
Sample1 Grok pattern:%{GREEDYDATA}AP: {COMMONMAC:WLC_APBaseMac}
Sample2 Grok pattern:%{GREEDYDATA}capwap_ac.c:934 %{COMMONMAC:WLC_APBaseMac}
Sample3 Grok pattern:%{GREEDYDATA}AP %{COMMONMAC:WLC_APBaseMac}
Sample4 Grok pattern:%{GREEDYDATA}AP %{COMMONMAC:WLC_APBaseMac}

You can make a pattern, which matches 5 groups of 2 hex digits followed by a semicolon, followed by the last 6th group of 2 hex digits:
(?i)(?:[0-9a-f]{2}:){5}[0-9a-f]{2}
Demo here. The (?i) at the start make the search case-insensitive.
UPDATED
If the above regex does not work in Graylog then you can try the very basic form of it, where all the quantifiers and character sets are expanded:
[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]
Demo here.

Related

Regular expression for matching a specifc substring of a string

I have a log file that logs connection drops of computers in a LAN. I want to extract name of each computer from every line of the log file and for that I am doing this: (?<=Name:)\w+|(-PC)
The target text:
`[C417] ComputerName:KCUTSHALL-PC UserID:GO kcutshall Station 9900 (locked) LanId: | (11/23 10:54:09 - 11/23 10:54:44) | Average limit (300) exceeded while pinging www.google.com [74.125.224.147] 8x
[C445] ComputerName:FRONTOFFICE UserID:YB Yenae Ball Station 7C LanId: | (11/23 17:02:00) | Client is connected to agent.`
The problem is that some computer names have -PC in them and in some isn't. The expression I have created matches computer without -PC in their names but it if a computer has -PC in the name, it treats that as a separate match and I don't want that. In short, it gives me 3 matches, but I want only 2. That's why I need help here, I am beginner in regex.
You may use
(?<=Name:)\w+(?:-PC)?
Details
(?<=Name:) - a place immediately preceded with Name:
\w+ - 1+ word chars
(?:-PC)? - an optional non-capturing group that matches 1 or 0 occurrences of -PC substring.
Consider using word boundaries if you need to match PC as a whole word,
(?<=Name:)\w+(?:-PC\b)?
See the regex demo.

How to creating a regex pattern in VBA to extract dates from string and exclude false matches

I am trying to use Regex to parse a series of strings to extract one or more text dates that may be in multiple formats. The strings will look something like the following:
24 Aug 2016: nno-emvirt010a/b; 16 Aug 2016 nnt-emvirt010a/b nnd-emvirt010a/b COSI-1.6.5
24.16 nno-emvirt010a/b nnt-emvirt010a/b nnd-emvirt010a/b EI.01.02.03\
9/23/16: COSI-1.6.5 Logs updated at /vobs/COTS/1.6.5/files/Status_2016-07-27.log, Status_2016-07-28.log, Status_2016-08-05.log, Status_2016-08-08.log
I am not concerned about validating the individual date fields; just extracting the date string. The part I am unable to figure out is how to not match on number sequences that match the pattern but aren’t dates (‘1.6.5’ in ex. (1) and 01.02.03 in ex. (2)) and dates that are part of a file name (2016-07-27 in ex. (3)). In each of these exception cases in my input data, the initial numbers are preceded by either a period(.), underscore (_) or dash (-), but I cannot determine how to use this to edit the pattern syntax to not match these strings.
The pattern I have that partially works is below. It will only ignore the non date matches if it starts with 1 digit as in example 1.
/[^_\.\(\/]\d{1,4}[/\-\.\s*]([1-9]|0[1-9]|[12][0-9]|3[01]|[a-z]{3})[/\-\.\s*]\d{1,4}/ig`
I am not sure about vba check if this works . seems they have given so much options : https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html
^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|↵
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
^(?:
# m/d or mm/dd
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
# d/m or dd/mm
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
According to the test strings you've presented, you can use the following regex
See this regex in use here
(?<=[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
This regex ensures that specific date formats are met and are preceded by nothing (beginning of the string) or by a non-word character (specifically a-z, A-Z, 0-9) or dot .. The date formats that will be matched are:
24 Aug 2016
24.16
9/23/16
The regex could be further manipulated to ensure numbers are in the proper range according to days/month, etc., however, I don't feel that is really necessary.
Edits
Edit 1
Since VBA doesn't support lookbehinds, you can use the following. The date is in capture group 1.
(?:[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
Edit 2
As per bulbus's comment below
(?:[^\w.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d{2,4})|(?:(?:\d{‌1,2}\/){2}\d{2,4})|(‌​?:\d{2,4}(?:-\d{2}){‌​2})|\d{2}\.\d{2})
Took liberty to edit that a bit.
replaced [^a-zA-Z\d.] with [^\w.], comes with added advantage of excluding dates with _2016-07-28.log
Due to 1 removed trailing condition (?=[^a-zA-Z\d.]).
Forced year digits from \d+ to \d{2,4}
Edit 3
Due to added conditions of the regex, I've made the following edits (to improve upon both previous edits). As per the OP:
The edited pattern above works in all but 2 cases:
it does not find dates with the year first (ex. 2016/07/11)
if the date is contained within parenthesis in the string, it returns the left parenthesis as part of the date (ex. match = (8/20/2016)
Can you provide the edit to fix these?
In the below regexes, I've changed years to \d+ in order for it to work on any year greater than or equal to 0.
See the code in use here
(?:[^\w.]|^)((?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:\/\d{1,2}){2})|(?:\d+(?:-\d{2}){2})|\d{2}\.\d+)
This regex adds the possibility of dates in the XXXX/XX/XX format where the date may appear first.
The reason you are getting ( as a match before the regex is the nature of the Full Match. You need to, instead, grab the value of the first capture group and not the whole regex result. See this answer on how to grab submatches from a regex pattern in VBA.
Also, note that any additional date formats you need to catch need to be explicitly set in the regex. Currently, the regex supports the following date formats:
\d{1,2}\s+[A-Z][a-z]{2}\s+\d+
12 Apr 17
12 Apr 2017
(?:\d{1,2}\/){2}\d+
1/4/17
01/04/17
1/4/2017
01/04/2017
\d+(?:\/\d{1,2}){2}
17/04/01
2017/4/1
2017/04/01
17/4/1
\d+(?:-\d{2}){2}
17-04-01
2017-04-01
\d{2}\.\d+ - Although I'm not sure what this date format is even used for and how it could be considered efficient if it's missing month
24.16

Convert a regex expression to erlang's re syntax?

I am having hard time trying to convert the following regular expression into an erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
:
But I am getting weird match results if I convert it to erlang - here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about erlang, but I will try to explain. With your regex
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
^^ ^^
|| ||
|| Total number of matched characters from starting index
Starting index of match
Reason for more than four groups
First match always indicates the entire string that is matched by the complete regex and rest here are the four captured groups you want. So there are total 5 groups.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
<-------> <----> <---> <--------->
First group Second group Third group Fourth group
<----------------------------------------------------------------->
This regex matches entire string and is first match you are getting
(Zero'th group)
How to find desired answer
Here we want anything except the first group (which is entire match by regex). So we can use all_but_first to avoid the first group
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
More info can be found here
If you are in doubt what is content of the string, you can print it and check out:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of issue, there is great answer by rock321987.

R Regex for direct messages on Twitter

I got a set of Twitter status updates and im trying to filter all direct messages, senders and receivers of the latter. My dataframe includes columns for senders and text. Using regular expressions I'm trying filter the receivers out of the text column.
this ist what I got, but its returning some strange results
WD <- getwd()
if (!is.null(WD)) setwd(WD)
load("data.R")
#http://www.unet.univie.ac.at/~a0406222/data.R
dmtext <- grep("^#[a-z0-9_]{1,15}", tweets$text, perl=T, value=TRUE,ignore.case=TRUE)
dm.receiver <- gsub("^#([a-z0-9_]{1,15})[ :,].*$", "\\1", dmtext, perl=T,ignore.case=TRUE)
dm.sender <- as.character(tweets$from_user[grep("^#[a-z0-9_]{1,15}", tweets$text, perl=T,ignore.case=TRUE,value=FALSE)])
dm.df <- data.frame(dm.sender,dm.receiver,dmtext)
dm.df[1:1000,2]
these are some examples of the bad results I get for dm.receiver
#insultaofuturo Apesar da proibição, jovens insistem em acampar no Aterro na Rio+20\nhttp://t.co/dCfFHUWV
#mqtodd Bringing the .green Internet to Rio+20 Summit | DotGreen\nhttp://t.co/pQqYilXp #RioPlus20 #gogreen
#Shyman33 Elinor Ostrom's trailblazing commons research can inspire Rio+20\n http://t.co/m7OTHBtP
#OccupyRio20 #pnud_es #FBuenAbad #rioplussocial #Futurewewant \nALGO DE ESTO SE HA CUMPLIDO? http://t.co/QDlVwT5z
#UNDP_CDG#UNDP#Asia-Pacific#Rio+20E-discussion on National&Local Planning for Sustainable Development. Contribute&mail:aprc.rio20#undp.org
why is it that I get results longer than 15 characters using {1,15}?
It turned out to be a encoding problem. I was not able to solve this issue using regular expressions but the software I used to retrieve the tweets has a column which indicates a user id tweets a adressed to. So I'll use this to do the analysis.
Your grep command matches anything that starts with 1-15 alphanumeric characters. For example:
#blahblahblahblahblahblahblahblahblah
will match because grep looks for the start of the line, looks for the #, finds at least one alpha character and then happily stops, considering this a successful match. grep doesn't care what comes after your pattern in the string, as long as it found something that matches.
In order to get only things under 15 characters, you also have to specify what comes next:
dmtext <- grep("^#[a-z0-9_]{1,15}\\b", ...
This matches 1-15 characters, followed by a word boundary (\b, with an extra backslash for string escaping). Thus, it won't match a word that is 16 or 100 characters long--only something with between 1 and 15 characters.

Validating IPv4 addresses with regexp

I've been trying to get an efficient regex for IPv4 validation, but without much luck. It seemed at one point I had had it with (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?(\.|$)){4}, but it produces some strange results:
$ grep --version
grep (GNU grep) 2.7
$ grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?(\.|$)){4}\b' <<< 192.168.1.1
192.168.1.1
$ grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?(\.|$)){4}\b' <<< 192.168.1.255
192.168.1.255
$ grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?(\.|$)){4}\b' <<< 192.168.255.255
$ grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?(\.|$)){4}\b' <<< 192.168.1.2555
192.168.1.2555
I did a search to see if this had already been asked and answered, but other answers appear to simply show how to determine 4 groups of 1-3 numbers, or do not work for me.
Best for Now (43 chars)
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$
This version shortens things by another 6 characters while not making use of the negative lookahead, which is not supported in some regex flavors.
Newest, Shortest, Least Readable Version (49 chars)
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)(\.(?!$)|$)){4}$
The [0-9] blocks can be substituted by \d in 2 places - makes it a bit less readable, but definitely shorter.
Even Newer, even Shorter, Second least readable version (55 chars)
^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.(?!$)|$)){4}$
This version looks for the 250-5 case, after that it cleverly ORs all the possible cases for 200-249 100-199 10-99 cases. Notice that the |) part is not a mistake, but actually ORs the last case for the 0-9 range. I've also omitted the ?: non-capturing group part as we don't really care about the captured items, they would not be captured either way if we didn't have a full-match in the first place.
Old and shorter version (less readable) (63 chars)
^(?:(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(?!$)|$)){4}$
Older (readable) version (70 chars)
^(?:(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])(\.(?!$)|$)){4}$
It uses the negative lookahead (?!) to remove the case where the ip might end with a .
Alternative answer, using some of the newer techniques (71 chars)
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.){3}(25[0-5]|(2[0-4]|1\d|[1-9]|)\d)$
Useful in regex implementations where lookaheads are not supported
Oldest answer (115 chars)
^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$
I think this is the most accurate and strict regex, it doesn't accept things like 000.021.01.0. it seems like most other answers here do and require additional regex to reject cases similar to that one - i.e. 0 starting numbers and an ip that ends with a .
You've already got a working answer but just in case you are curious what was wrong with your original approach, the answer is that you need parentheses around your alternation otherwise the (\.|$) is only required if the number is less than 200.
'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b'
^ ^
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Accept:
127.0.0.1
192.168.1.1
192.168.1.255
255.255.255.255
0.0.0.0
1.1.1.01 # This is an invalid IP address!
Reject:
30.168.1.255.1
127.1
192.168.1.256
-1.2.3.4
1.1.1.1.
3...3
Try online with unit tests: https://www.debuggex.com/r/-EDZOqxTxhiTncN6/1
IPv4 address (accurate capture)
Matches 0.0.0.0 through 255.255.255.255, but does capture invalid addresses such as 1.1.000.1
Use this regex to match IP numbers with accuracy.
Each of the 4 numbers is stored into a capturing group, so you can access them for further processing.
\b
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\b
taken from JGsoft RegexBuddy library
Edit: this (\.|$) part seems weird
I think many people reading this post will be looking for simpler regular expressions, even if they match some technically invalid IP addresses. (And, as noted elsewhere, regex probably isn't the right tool for properly validating an IP address anyway.)
Remove ^ and, where applicable, replace $ with \b, if you don't want to match the beginning/end of the line.
Basic Regular Expression (BRE) (tested on GNU grep, GNU sed, and vim):
/^[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+$/
Extended Regular Expression (ERE):
/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/
or:
/^([0-9]+(\.|$)){4}/
Perl-compatible Regular Expression (PCRE) (tested on Perl 5.18):
/^\d+\.\d+\.\d+\.\d+$/
or:
/^(\d+(\.|$)){4}/
Ruby (tested on Ruby 2.1):
Although supposed to be PCRE, Ruby for whatever reason allowed this regex not allowed by Perl 5.18:
/^(\d+[\.$]){4}/
My tests for all these are online here.
I was in search of something similar for IPv4 addresses - a regex that also stopped commonly used private ip addresses from being validated (192.168.x.y, 10.x.y.z, 172.16.x.y) so used negative look aheads to accomplish this:
(?!(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.).*)
(?!255\.255\.255\.255)(25[0-5]|2[0-4]\d|[1]\d\d|[1-9]\d|[1-9])
(\.(25[0-5]|2[0-4]\d|[1]\d\d|[1-9]\d|\d)){3}
(These should be on one line of course, formatted for readability purposes on 3 separate lines)
Debuggex Demo
It may not be optimised for speed, but works well when only looking for 'real' internet addresses.
Things that will (and should) fail:
0.1.2.3 (0.0.0.0/8 is reserved for some broadcasts)
10.1.2.3 (10.0.0.0/8 is considered private)
172.16.1.2 (172.16.0.0/12 is considered private)
172.31.1.2 (same as previous, but near the end of that range)
192.168.1.2 (192.168.0.0/16 is considered private)
255.255.255.255 (reserved broadcast is not an IP)
.2.3.4
1.2.3.
1.2.3.256
1.2.256.4
1.256.3.4
256.2.3.4
1.2.3.4.5
1..3.4
IPs that will (and should) work:
1.0.1.0 (China)
8.8.8.8 (Google DNS in USA)
100.1.2.3 (USA)
172.15.1.2 (USA)
172.32.1.2 (USA)
192.167.1.2 (Italy)
Provided in case anybody else is looking for validating 'Internet IP addresses not including the common private addresses'
Here is a better one with passing/failing IPs attached
/^((?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[.]){3}(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/
Accepts
127.0.0.1
192.168.1.1
192.168.1.255
255.255.255.255
10.1.1.1
0.0.0.0
Rejects
1.1.1.01
30.168.1.255.1
127.1
192.168.1.256
-1.2.3.4
1.1.1.1.
3...3
192.168.1.099
Above answers are valid but what if the ip address is not at the end of line and is in between text.. This regex will even work on that.
code: '\b((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.)){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))\b'
input text file:
ip address 0.0.0.0 asfasf
sad sa 255.255.255.255 cvjnzx
zxckjzbxk 999.999.999.999 jshbczxcbx
sjaasbfj 192.168.0.1 asdkjaksb
oyo 123241.24121.1234.3423 yo
yo 0000.0000.0000.0000 y
aw1a.21asd2.21ad.21d2
yo 254.254.254.254 y0
172.24.1.210 asfjas
200.200.200.200
000.000.000.000
007.08.09.210
010.10.30.110
output text:
0.0.0.0
255.255.255.255
192.168.0.1
254.254.254.254
172.24.1.210
200.200.200.200
'''
This code works for me, and is as simple as that.
Here I have taken the value of ip and I am trying to match it with regex.
ip="25.255.45.67"
op=re.match('(\d+).(\d+).(\d+).(\d+)',ip)
if ((int(op.group(1))<=255) and (int(op.group(2))<=255) and int(op.group(3))<=255) and (int(op.group(4))<=255)):
print("valid ip")
else:
print("Not valid")
Above condition checks if the value exceeds 255 for all the 4 octets then it is not a valid. But before applying the condition we have to convert them into integer since the value is in a string.
group(0) prints the matched output, Whereas group(1) prints the first matched value and here it is "25" and so on.
'''
/^(?:(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)\.){3}(?1)$/m
Demo
I managed to construct a regex from all other answers.
(25[0-5]|2[0-4][0-9]|[1][0-9][0-9]|[1-9][0-9]|[0-9]?)(\.(25[0-5]|2[0-4][0-9]|[1][0-9][0-9]|[1-9][0-9]|[0-9]?)){3}
This is a little longer than some but this is what I use to match IPv4 addresses. Simple with no compromises.
^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$
For number from 0 to 255 I use this regex:
(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))
Above regex will match integer number from 0 to 255, but not match 256.
So for IPv4 I use this regex:
^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})$
It is in this structure: ^(N)((\.(N)){3})$ where N is the regex used to match number from 0 to 255.
This regex will match IP like below:
0.0.0.0
192.168.1.2
but not those below:
10.1.0.256
1.2.3.
127.0.1-2.3
For IPv4 CIDR (Classless Inter-Domain Routing) I use this regex:
^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})\/(([0-9])|([12][0-9])|(3[0-2]))$
It is in this structure: ^(N)((\.(N)){3})\/M$ where N is the regex used to match number from 0 to 255, and M is the regex used to match number from 0 to 32.
This regex will match CIDR like below:
0.0.0.0/0
192.168.1.2/32
but not those below:
10.1.0.256/16
1.2.3./24
127.0.0.1/33
And for list of IPv4 CIDR like "10.0.0.0/16", "192.168.1.1/32" I use this regex:
^("(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})\/(([0-9])|([12][0-9])|(3[0-2]))")((,([ ]*)("(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})\/(([0-9])|([12][0-9])|(3[0-2]))"))*)$
It is in this structure: ^(“C”)((,([ ]*)(“C”))*)$ where C is the regex used to match CIDR (like 0.0.0.0/0).
This regex will match list of CIDR like below:
“10.0.0.0/16”,”192.168.1.2/32”, “1.2.3.4/32”
but not those below:
“10.0.0.0/16” 192.168.1.2/32 “1.2.3.4/32”
Maybe it might get shorter but for me it is easy to understand so fine by me.
Hope it helps!
IPv4 address is a very complicated thing.
Note: Indentation and lining are only for illustration purposes and do not exist in the real RegEx.
\b(
((
(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])
|
0[Xx]0*[0-9A-Fa-f]{1,2}
|
0+[1-3]?[0-9]{1,2}
)\.){1,3}
(
(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])
|
0[Xx]0*[0-9A-Fa-f]{1,2}
|
0+[1-3]?[0-9]{1,2}
)
|
(
[1-3][0-9]{1,9}
|
[1-9][0-9]{,8}
|
(4([0-1][0-9]{8}
|2([0-8][0-9]{7}
|9([0-3][0-9]{6}
|4([0-8][0-9]{5}
|9([0-5][0-9]{4}
|6([0-6][0-9]{3}
|7([0-1][0-9]{2}
|2([0-8][0-9]{1}
|9([0-5]
))))))))))
)
|
0[Xx]0*[0-9A-Fa-f]{1,8}
|
0+[1-3]?[0-7]{,10}
)\b
These IPv4 addresses are validated by the above RegEx.
127.0.0.1
2130706433
0x7F000001
017700000001
0x7F.0.0.01 # Mixed hex/dec/oct
000000000017700000001 # Have as many leading zeros as you want
0x0000000000007F000001 # Same as above
127.1
127.0.1
These are rejected.
256.0.0.1
192.168.1.099 # 099 is not a valid number
4294967296 # UINT32_MAX + 1
0x100000000
020000000000
(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.){3}(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2})))
Test to find matches in text,
https://regex101.com/r/9CcMEN/2
Following are the rules defining the valid combinations in each number of an IP address:
Any one- or two-digit number.
Any three-digit number beginning with 1.
Any three-digit number beginning with 2 if the second digit is 0
through 4.
Any three-digit number beginning with 25 if the third digit is 0
through 5.
Let'start with (((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.), a set of four nested subexpressions, and we’ll look at them in reverse order. (\d{1,2}) matches any one- or two-digit number or numbers 0 through 99. (1\d{2}) matches any three-digit number starting with 1 (1 followed by any two digits), or numbers 100 through 199. (2[0-4]\d) matches numbers 200 through 249. (25[0-5]) matches numbers 250 through 255. Each of these subexpressions is enclosed within another subexpression with an | between each (so that one of the four subexpressions has to match, not all). After the range of numbers comes \. to match ., and then the entire series (all the number options plus \.) is enclosed into yet another subexpression and repeated three times using {3}. Finally, the range of numbers is repeated (this time without the trailing \.) to match the final IP address number. By restricting each of the four numbers to values between 0 and 255, this pattern can indeed match valid IP addresses and reject invalid addresses.
Excerpt From: Ben Forta. “Learning Regular Expressions.”
If neither a character is wanted at the beginning of IP address nor at the end, ^ and $ metacharacters ought to be used, respectively.
^(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.){3}(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2})))$
Test to find matches in text,
https://regex101.com/r/uAP31A/1
Valid regex for IPV4 address for Java
^((\\d|[1-9]\\d|[0-1]\\d{2}|2[0-4]\\d|25[0-5])[\\.]){3}(\\d|[1-9]\\d|[0-1]\\d{2}|2[0-4]\\d|25[0-5])$
Find a valid ip address in the text is a very difficult problem
I have a regexp, that match (extract) valid ip addresses from strings in text files.
my regexp
\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])\.)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){2}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\b
\b word boundary
(?: - means start non capturing group
^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])\.) - string must start with first right octet with dot char
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9]) - find first right octet - (firt octet can not start with - 0)
(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){2} - find next right two octets with dot string
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\b - string must end with right fourth octet (now zero char is allowed)
But this ip regexp has a minority false positive matches:
https://regexr.com/69dk7
Find or extract valid ip address from text file with only regexp is impossible. Without checking another conditions you always get false positive matches.
Solution
I write one liner perl for extract ip addresses from text files. It has this conditions:
when the ip address is at the beginning of the line, the next char is one or multiple whitespace char (space, tab, new line...)
when ip address is at end of line, the new line is next char and before ip address is one or multiple whitespace chars
in middle of text - before and after ip address is one or multiple whitespace chars
The consequence is that perl not match strings like https://84.25.74.125 and another URI strings. Or ip addres at the end of line with dot char at the end. But it find any valid ip address in the text.
perl one liner solution:
$ cat ip.txt | perl -lane 'use warnings; use strict; for my $i (#F){if ($i =~/^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[1-9])\.)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){2}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/) { print $i; } }'
36.42.84.233
158.22.45.0
36.84.84.233
12.0.5.4
1.25.45.36
255.3.6.5
4.255.2.1
127.0.0.1
127.0.0.5
126.0.0.1
testing text file:
$ cat ip.txt
36.42.84.233 stop 158.22.45.0 and 56.32.58.2.
25.36.84.84abc and abc2.4.8.2 is error.
1.2.3.4_
But false positive is 2.2.2.2.2.2.2.2 or 1.1.1.1.1
http://23.54.212.1:80
https://89.35.248.1/abc
36.84.84.233 was 25.36.58.4/abc/xyz&158.133.26.4&another_var
and 42.27.0.1:8333 in http://212.158.45.2:26
0.25.14.15 ip can not start with zero
2.3.0
abc 12.0.5.4
1.25.45.36
12.05.2.5
256.1.2.5
255.3.6.5
4.255.2.1
4.256.5.6
127.0.0.1 is localhost.
this ip 127.0.0.5 is not localhost
126.0.0.1
Appendix
For people from another planets for whom the strings 2130706433, 127.1, 24.005.04.52 is a valid ip address I have a message: Try to find a solution yourself!!!
Considering some variants suggested, \d and \b may not be supported. Hence, just in case:
IPv4 address
^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)$
Test: https://debuggex.com/r/izHiog3KkYztRMSJ
With subnet mask :
^$|([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\
.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\
.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\
.([01]?\\d\\d?|2[0-4]\\d|25[0-5])
((/([01]?\\d\\d?|2[0-4]\\d|25[0-5]))?)$
I tried to make it a bit simpler and shorter.
^(([01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])$
If you are looking for java/kotlin:
^(([01]?\\d{1,2}|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d{1,2}|2[0-4]\\d|25[0-5])$
If someone wants to know how it works here is the explanation. It's really so simple. Just give it a try :p :
1. ^.....$: '^' is the starting and '$' is the ending.
2. (): These are called a group. You can think of like "if" condition groups.
3. |: 'Or' condition - as same as most of the programming languages.
4. [01]?\d{1,2}: '[01]' indicates one of the number between 0 and 1. '?' means '[01]' is optional. '\d' is for any digit between 0-9 and '{1,2}' indicates the length can be between 1 and 2. So here the number can be 0-199.
5. 2[0-4]\d: '2' is just plain 2. '[0-4]' means a number between 0 to 4. '\d' is for any digit between 0-9. So here the number can be 200-249.
6. 25[0-5]: '25' is just plain 25. '[0-5]' means a number between 0 to 5. So here the number can be 250-255.
7. \.: It's just plan '.'(dot) for separating the numbers.
8. {3}: It means the exact 3 repetition of the previous group inside '()'.
9. ([01]?\d{1,2}|2[0-4]\d|25[0-5]): Totally same as point 2-6
Mathematically it is like:
(0-199 OR 200-249 OR 250-255).{Repeat exactly 3 times}(0-199 OR 200-249 OR 250-255)
So, as you can see normally this is the pattern for the IP addresses. I hope it helps to understand Regular Expression a bit. :p
I tried to make it a bit simpler and shorter.
^(([01]?\d{1,2}|2[0-4]\d|25[0-5]).){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])$
If you are looking for java/kotlin:
^(([01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])$
If someone wants to know how it works here is the explanation. It's really so simple. Just give it a try :p :
1. ^.....$: '^' is the starting and '$' is the ending.
2. (): These are called a group. You can think of like "if" condition groups.
3. |: 'Or' condition - as same as most of the programming languages.
4. [01]?\d{1,2}: '[01]' indicates one of the number between 0 and 1. '?' means '[01]' is optional. '\d' is for any digit between 0-9 and '{1,2}' indicates the length can be between 1 and 2. So here the number can be 0-199.
5. 2[0-4]\d: '2' is just plain 2. '[0-4]' means a number between 0 to 4. '\d' is for any digit between 0-9. So here the number can be 200-249.
6. 25[0-5]: '25' is just plain 25. '[0-5]' means a number between 0 to 5. So here the number can be 250-255.
7. \.: It's just plan '.'(dot) for separating the numbers.
8. {3}: It means the exact 3 repetition of the previous group inside '()'.
9. ([01]?\d{1,2}|2[0-4]\d|25[0-5]): Totally same as point 2-6
Mathematically it is like:
(0-199 OR 200-249 OR 250-255).{Repeat exactly 3 times}(0-199 OR 200-249 OR 250-255)
So, as you can see normally this is the pattern for the IP addresses. I hope it helps to understand Regular Expression a bit. :p
To validate any IP address in the valid range 0.0.0.0 to 255.255.255.255 can be written in very simple form as below.
((1?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.){3}(1?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])
const char*ipv4_regexp = "\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\."
"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\."
"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\."
"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b";
I adapted the regular expression taken from JGsoft RegexBuddy library to C language (regcomp/regexec) and I found out it works but there's a little problem in some OS like Linux.
That regular expression accepts ipv4 address like 192.168.100.009 where 009 in Linux is considered an octal value so the address is not the one you thought.
I changed that regular expression as follow:
const char* ipv4_regex = "\\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\b";
using that regular expressione now 192.168.100.009 is not a valid ipv4 address while 192.168.100.9 is ok.
I modified a regular expression for multicast address too and it is the following:
const char* mcast_ipv4_regex = "\\b(22[4-9]|23[0-9])\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]?)\\."
"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\\b";
I think you have to adapt the regular expression to the language you're using to develop your application
I put an example in java:
package utility;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class NetworkUtility {
private static String ipv4RegExp = "\\b(?:(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d?)\\.){3}(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d?)\\b";
private static String ipv4MulticastRegExp = "2(?:2[4-9]|3\\d)(?:\\.(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]\\d?|0)){3}";
public NetworkUtility() {
}
public static boolean isIpv4Address(String address) {
Pattern pattern = Pattern.compile(ipv4RegExp);
Matcher matcher = pattern.matcher(address);
return matcher.matches();
}
public static boolean isIpv4MulticastAddress(String address) {
Pattern pattern = Pattern.compile(ipv4MulticastRegExp);
Matcher matcher = pattern.matcher(address);
return matcher.matches();
}
}
-bash-3.2$ echo "191.191.191.39" | egrep
'(^|[^0-9])((2([6-9]|5[0-5]?|[0-4][0-9]?)?|1([0-9][0-9]?)?|[3-9][0-9]?|0)\.{3}
(2([6-9]|5[0-5]?|[0-4][0-9]?)?|1([0-9][0-9]?)?|[3-9][0-9]?|0)($|[^0-9])'
>> 191.191.191.39
(This is a DFA that matches the entire addr space (including broadcasts, etc.) an nothing else.
I think this one is the shortest.
^(([01]?\d\d?|2[0-4]\d|25[0-5]).){3}([01]?\d\d?|2[0-4]\d|25[0-5])$
I found this sample very useful, furthermore it allows different ipv4 notations.
sample code using python:
def is_valid_ipv4(ip4):
"""Validates IPv4 addresses.
"""
import re
pattern = re.compile(r"""
^
(?:
# Dotted variants:
(?:
# Decimal 1-255 (no leading 0's)
[3-9]\d?|2(?:5[0-5]|[0-4]?\d)?|1\d{0,2}
|
0x0*[0-9a-f]{1,2} # Hexadecimal 0x0 - 0xFF (possible leading 0's)
|
0+[1-3]?[0-7]{0,2} # Octal 0 - 0377 (possible leading 0's)
)
(?: # Repeat 0-3 times, separated by a dot
\.
(?:
[3-9]\d?|2(?:5[0-5]|[0-4]?\d)?|1\d{0,2}
|
0x0*[0-9a-f]{1,2}
|
0+[1-3]?[0-7]{0,2}
)
){0,3}
|
0x0*[0-9a-f]{1,8} # Hexadecimal notation, 0x0 - 0xffffffff
|
0+[0-3]?[0-7]{0,10} # Octal notation, 0 - 037777777777
|
# Decimal notation, 1-4294967295:
429496729[0-5]|42949672[0-8]\d|4294967[01]\d\d|429496[0-6]\d{3}|
42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|
4[01]\d{8}|[1-3]\d{0,9}|[4-9]\d{0,8}
)
$
""", re.VERBOSE | re.IGNORECASE)
return pattern.match(ip4) <> None
((\.|^)(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0$)){4}
This regex will not accept
08.8.8.8 or 8.08.8.8 or 8.8.08.8 or 8.8.8.08
Finds a valid IP addresses as long as the IP is wrapped around any character other than digits (behind or ahead the IP). 4 Backreferences created: $+{first}.$+{second}.$+{third}.$+{forth}
Find String:
#any valid IP address
(?<IP>(?<![\d])(?<first>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<second>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))
#only valid private IP address RFC1918
(?<IP>(?<![\d])(:?(:?(?<first>10)[\.](?<second>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5])))|(:?(?<first>172)[\.](?<second>(:?1[6-9])|(:?2[0-9])|(:?3[0-1])))|(:?(?<first>192)[\.](?<second>168)))[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))
Notepad++ Replace String Option 1: Replaces the whole IP (NO Change):
$+{IP}
Notepad++ Replace String Option 2: Replaces the whole IP octect by octect (NO Change)
$+{first}.$+{second}.$+{third}.$+{forth}
Notepad++ Replace String Option 3: Replaces the whole IP octect by octect (replace 3rd octect value with 0)
$+{first}.$+{second}.0.$+{forth}
NOTE: The above will match any valid IP including 255.255.255.255 for example and change it to 255.255.0.255 which is wrong and not very useful of course.
Replacing portion of each octect with an actual value however you can build your own find and replace which is actual useful to ammend IPs in text files:
for example replace the first octect group of the original Find regex above:
(?<first>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))
with
(?<first>10)
and
(?<second>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))
with
(?<second>216)
and you are now matching addresses starting with first octect 192 only
Find on notepad++:
(?<IP>(?<![\d])(?<first>10)[\.](?<second>216)[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))
You could still perform Replace using back-referece groups in the exact same fashion as before.
You can get an idea of how the above matched below:
cat ipv4_validation_test.txt
Full Match:
0.0.0.1
12.108.1.34
192.168.1.1
10.249.24.212
10.216.1.212
192.168.1.255
255.255.255.255
0.0.0.0
Partial Match (IP Extraction from line)
30.168.1.0.1
-1.2.3.4
sfds10.216.24.23kgfd
da11.15.112.255adfdsfds
sfds10.216.24.23kgfd
NO Match
1.1.1.01
3...3
127.1.
192.168.1..
192.168.1.256
da11.15.112.2554adfdsfds
da311.15.112.255adfdsfds
Using grep you can see the results below:
From grep:
grep -oP '(?<IP>(?<![\d])(?<first>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<second>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))' ipv4_validation_test.txt
0.0.0.1
12.108.1.34
192.168.1.1
10.249.24.212
10.216.1.212
192.168.1.255
255.255.255.255
0.0.0.0
30.168.1.0
1.2.3.4
10.216.24.23
11.15.112.255
10.216.24.23
grep -P '(?<IP>(?<![\d])(?<first>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<second>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))' ipv4_validation_test.txt
0.0.0.1
12.108.1.34
192.168.1.1
10.249.24.212
10.216.1.212
192.168.1.255
255.255.255.255
0.0.0.0
30.168.1.0.1
-1.2.3.4
sfds10.216.24.23kgfd
da11.15.112.255adfdsfds
sfds10.216.24.23kgfd
#matching ip addresses starting with 10.216
grep -oP '(?<IP>(?<![\d])(?<first>10)[\.](?<second>216)[\.](?<third>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))[\.](?<forth>(:?\d)|(:?[1-9]\d)|(:?1\d{2})|(:?2[0-4]\d)|(:?25[0-5]))(?![\d]))' ipv4_validation_test.txt
10.216.1.212
10.216.24.23
10.216.24.23
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.)){3}+((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))$
Above will be regex for the ip address like:
221.234.000.112
also for 221.234.0.112, 221.24.03.112, 221.234.0.1
You can imagine all kind of address as above
I would use PCRE and the define keyword:
/^
((?&byte))\.((?&byte))\.((?&byte))\.((?&byte))$
(?(DEFINE)
(?<byte>25[0-5]|2[0-4]\d|[01]?\d\d?))
/gmx
Demo: https://regex101.com/r/IB7j48/2
The reason of this is to avoid repeating the (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) pattern four times. Other solutions such as the one below work well, but it does not capture each group as it would be requested by many.
/^((\d+?)(\.|$)){4}/
The only other way to have 4 capture groups is to repeat the pattern four times:
/^(?<one>\d+)\.(?<two>\d+)\.(?<three>\d+)\.(?<four>\d+)$/
Capturing a ipv4 in perl is therefore very easy
$ echo "Hey this is my IP address 138.131.254.8, bye!" | \
perl -ne 'print "[$1, $2, $3, $4]" if \
/\b((?&byte))\.((?&byte))\.((?&byte))\.((?&byte))
(?(DEFINE)
\b(?<byte>25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
/x'
[138, 131, 254, 8]