Regex to find valid IP address using awk [duplicate]

This question already has answers here:
Validating IPv4 addresses with regexp
(44 answers)
Closed 3 years ago.
I get all the IP addresses connected to the network along with other strings and the name of the network, but I want to extract only the IPs using an awk regex.
I tried:
awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}'
But it prints the IP addresses along with some other numbers and a date, e.g.:
2019-12-13 12
192.168.1.1
123.168.1.12
0.00012
But I want just the IP address.

Could you please try the following? No sample input was given, so I have not tested it.
awk 'match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}' Input_file
Why the OP's code is not working: the OP used an unescaped . in the regex, which matches any character rather than a literal dot, so the OP also gets results that are not IPs. In the code above the dot is escaped as \., which tells awk to look for a literal . character instead of any character.

To be honest, I don't know the awk command, but as far as the regular expression goes, you can match a line that consists only of an IP address with this expression:
/^([0-9]{1,3}\.){3}[0-9]{1,3}$/g
You can check it here:
IP address Regex test

In terms of regex, the PCRE-compatible expression (?:[12]?\d{1,2}\.){3}[12]?\d{1,2} should meet your needs. It's a simplified version of the more comprehensive IP regexes that can be found as answers on this question, and can be tested with this demo.
Unfortunately, awk uses POSIX extended regular expressions rather than PCRE, so \d and non-capturing groups such as (?:...) are not available. I would suggest using perl instead, but if you insist on using awk, the following command should work:
awk 'match($0, /[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]/) {print substr($0, RSTART, RLENGTH)}'
This uses awk-compatible regex to match IPs, and is an expanded form of the above regex. It matches and prints out only the IPs it finds, omitting the rest of the line.
Before you edited your question, your original regex was [0-9]+.[0-9]+.[0-9]+.[0-9]+ and the unescaped . allowed it to match any character, meaning hyphens, spaces, and digits were all valid matches. By specifying \. instead, the regex matches only the literal period character.
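To see the difference between the escaped and unescaped dot, here is a minimal sketch using grep -oE instead of awk. The sample lines and the function names (sample, loose_match, strict_match) are made up for illustration, modeled on the output shown in the question.

```shell
# Hypothetical sample lines, modeled on the output shown in the question.
sample() {
  printf '%s\n' 'date 2019-12-13 12' 'gateway 192.168.1.1 up' 'load 0.00012'
}

# Unescaped dots: "." matches any character, so dates and decimals slip through.
loose_match() { grep -oE '[0-9]+.[0-9]+.[0-9]+.[0-9]+'; }

# Escaped dots: only digit groups joined by literal periods match.
strict_match() { grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; }

echo 'unescaped:'; sample | loose_match
echo 'escaped:';   sample | strict_match
```

With the unescaped pattern all three sample lines produce a match; with the escaped one only the dotted quad does. Neither pattern checks that each number is in the range 0-255.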

Something like this?
$ cat file
172.27.1.256 # invalid ip
2019-12-13 12
192.168.1.1
123.168.1.12
0.00012
299.288.299.333 # invalid ip
$ grep -oE '((1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.){3}((1?[0-9][0-9]?|2[0-4][0-9]|25[0-5]))\s+?$' file
192.168.1.1
123.168.1.12
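Since the question asked for awk, the same range check can probably be done there without a long alternation: match any dotted quad first, then test each octet numerically. This is only a sketch under that assumption; valid_ips is a hypothetical helper name, and the pattern would still pick the first four groups out of a longer run such as 1.2.3.4.5.

```shell
# Hypothetical helper: print every dotted quad whose four octets are all 0-255.
valid_ips() {
  awk '{
    rest = $0
    # Walk through every candidate on the line, not just the first one.
    while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
      ip = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
      n = split(ip, octet, ".")
      ok = 1
      for (i = 1; i <= n; i++)
        if (length(octet[i]) > 3 || octet[i] + 0 > 255) ok = 0
      if (ok) print ip
    }
  }' "$@"
}

printf '%s\n' '172.27.1.256 # invalid ip' '2019-12-13 12' '192.168.1.1' \
  '123.168.1.12' '0.00012' '299.288.299.333 # invalid ip' | valid_ips
```

Unlike the grep pattern above, this version does not require the address to sit at the end of the line.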

Related

how to edit a line having IPv4 address using sed command

I need to modify an NTP configuration file by adding some options to the lines containing IP addresses.
I have been trying for a long time with sed, but I am not able to modify the lines unless I already know the IP addresses.
Say I have a few lines such as:
server 172.0.0.1
server 10.0.0.1
I need to add the iburst option after the IP address.
I have tried commands like
sed -e 's/(\d{1,3}\.\d{1.3}\.\d{1,3}\.\d{1,3})/ \1 iburst/g' ntp_file
or
sed -e 's/^server +\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/server \1\.\2\.\3\ iburst/g' ntp_file
but neither modifies the line. Any suggestions would be really appreciated.
The regex you have used is parsed as POSIX BRE and cannot match the expected strings because of the \d shorthand class, which sed does not support, the misused dot inside a range quantifier ({1.3}), and the incorrect escaping of the grouping and range quantifier delimiters.
You may use
sed -E -i 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/& iburst/g' ntp_file
The POSIX ERE (enabled with the -E option) expression means to match
[0-9]{1,3} - one to three digits
(\.[0-9]{1,3}){3} - three occurrences of a dot and one to three digits
The replacement pattern is & iburst where & stands for the whole match.
The g flag replaces all occurrences.
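A quick way to try the substitution without touching the real file is to run it on standard input and leave out -i. The sketch below does that; add_iburst is a hypothetical wrapper name, and restricting the substitution to lines starting with "server " is an extra assumption that was not part of the command above.

```shell
# Hypothetical wrapper: append " iburst" after the first IPv4-looking
# token on lines that start with "server ". Reads stdin, writes stdout.
add_iburst() {
  sed -E '/^server /s/[0-9]{1,3}(\.[0-9]{1,3}){3}/& iburst/'
}

printf '%s\n' 'server 172.0.0.1' 'server 10.0.0.1' '# pool 1.2.3.4' | add_iburst
```

The comment line is left unchanged because it does not match the /^server / address.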

gawk regex to find any record having characters other than those specified by the character class in the regex pattern

I have a list of email addresses in a text file. I have a pattern with character classes that specify which characters are allowed in the email addresses.
From that input file, I want to find only the email addresses that contain characters other than the allowed ones.
I am trying to write a gawk command for this, but I cannot get it to work properly.
Here is the gawk command I am trying:
gawk -F "," ' $2!~/[[:alnum:]#\.]]/ { print "has invalid chars" }' emails.csv
The problem is that the above command only matches records that have none of the alphanumeric, # and . (dot) characters in them. What I am looking for is records that have the allowed characters along with disallowed ones.
For example, the above command would find
"_-()&(()%"
as it has only characters that are not in the regex pattern, but it will not find
"abc-123#xyz,com"
as it also has characters that are present in the character classes in the regex pattern.
How about several tests together: contains an alnum and an # and a dot and an invalid character
$2 ~ /[[:alnum:]]/ && $2 ~ /#/ && $2 ~ /\./ && $2 ~ /[^[:alnum:]#.]/
Your regex is wrong here:
/[[:alnum:]#\.]]/
It should be:
/[[:alnum:]#.]/
Note the removal of the extra ] from the end.
Test Case:
# regex with extra ]
awk -F "," '{print ($2 !~ /[[:alnum:]#.]]/)}' <<< 'abc,ab#email.com'
1
# correct regex
awk -F "," '{print ($2 !~ /[[:alnum:]#.]/)}' <<< 'abc,ab#email.com'
0
Do you really care whether the string has a valid character? If not (and it seems like you don't), the simple solution is
$2 ~ /[^[:alnum:]#.]/{ print "has invalid chars" }
That won't trigger on an empty string, so you might want to add a test for that case.
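As a small sketch of the negated bracket expression in action, the function below prints the second CSV field whenever it contains a character outside the allowed set. flag_invalid and the sample records are hypothetical; the samples avoid commas inside the address because -F, would split on them.

```shell
# Hypothetical helper: report second fields containing any character
# other than alphanumerics, "#" and ".".
flag_invalid() {
  awk -F, '$2 ~ /[^[:alnum:]#.]/ { print $2, "has invalid chars" }' "$@"
}

printf '%s\n' 'a,ab#email.com' 'b,abc-123#xyz.com' 'c,_-()&(()%' 'd,' | flag_invalid
```

The record with an empty second field is not reported, which is the case mentioned above that may need a separate test.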
Your question would REALLY benefit from some concise, testable sample input and expected output as right now we're all guessing at what you want but maybe this does it?
awk -F, '{r=$2} gsub(/[[:alnum:]#.]/,"",r) && (r!="") { print "has invalid chars" }' emails.csv
e.g. using the 2 input examples you provided:
$ cat file
_-()&(()%
abc-123#xyz,com
$ awk '{r=$0} gsub(/[[:alnum:]#.]/,"",r) && (r!="") { print $0, "has invalid chars" }' file
abc-123#xyz,com has invalid chars
There are more accurate email regexps btw, e.g.:
\<[[:alnum:]._%+-]+#[[:alnum:]_.-]+\.[[:alpha:]]{2,}\>
which is a gawk-specific (for word delimiters \< and \>) modification of the one described at http://www.regular-expressions.info/email.html after updating to use POSIX character classes.
If you are trying to validate email addresses do not use the regexp you started with as it will declare # and 7 to each be valid email addresses.
See also How to validate an email address using a regular expression? for more email regexp details.

Regex to Match range of IP addresses

I want to match IP addresses in the ranges 10.0-29.x.x, 10.31-39.x.x, and 10.41-253.x.x.
Of the lines below, I want to capture the 3rd line and below.
network 10.40.5.0 0.0.0.255
network 10.255.5.0 0.0.0.255
network 10.23.3.0 0.0.0.255
netowrk 10.273.255.0 0.255.255
So the way it will work, is if there is a match, it will set a flag that the configuration is invalid. I may have 10 invalid lines, or just 1. It doesn't matter.
Regexes are not designed to do arithmetic.
However, you can try something like [3-4] if you want a 3 or a 4.
For more complex processing you may have to match the address first with a general IP regex, then check the numbers in a programming language.
The core of your problem is a regex that matches these number ranges: 0-29, 31-39, 41-253
An extended regex that matches this is:
^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+
The regex is divided in these steps:
0-9, 10-19, 20-29, 31-39, 41-49, 50-99, 100-199, 200-249, 250-253
A shell script that would work is:
if grep -Eq '^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+ ' input_file
then
echo action if matched
else
echo action if not matched
fi
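To check the alternation against a few boundary values, the extended regex given above can be run over some sample lines. The sketch below does this; in_range is a hypothetical function name, and the lines for 10.30, 10.253 and 10.254 are made-up additions to the lines from the question.

```shell
# Hypothetical helper: keep only "network 10.N.x.x" lines whose second
# octet N is in 0-29, 31-39 or 41-253.
in_range() {
  grep -E '^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+'
}

printf '%s\n' \
  'network 10.40.5.0 0.0.0.255' \
  'network 10.255.5.0 0.0.0.255' \
  'network 10.23.3.0 0.0.0.255' \
  'network 10.30.1.0 0.0.0.255' \
  'network 10.253.0.0 0.0.255.255' \
  'network 10.254.0.0 0.0.255.255' | in_range
```

Only the 10.23 and 10.253 lines are printed; the escaped dot after the group is what stops a prefix such as 2 or 25 from matching part of 255.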

Multiline sed regex extraction issue: part of buffer matches

I have to extract data from a log, and I'm trying to use sed to extract the data from 3 lines. The log entries (after grepping) look like this:
Tuesday March 11 2014
INBOUND>>>>> 06:22:10:066 Eventid:141004(3)
[SGW-S11/S4]GTPv2C Rx PDU, from 172.9.9.1:10000 to 173.10.10.1:2123 (187)
TEID: 0x00000000, Message type: EGTP_CREATE_SESSION_REQUEST (0x20)
I need to extract the "from IP", the "to IP", and the "Message Type".
This is what I have as of now:
sed -n '1!N; s/^INBOUND>>>>>.*\n.*from \([0-9.]*\).* to \([0-9.]*\).*/\1 \2/p'
When I extend it to the third line, to extract the message type, with:
sed -n '1!N; s/^INBOUND>>>>>.*\n.*from \([0-9.]*\).* to \([0-9.]*\).*\n.*, Message type: \([A-Z_]*\).*/\1 \2/p'
The entire pattern doesn't match.
This doesn't match the string unless there is a line before the INBOUND>>>>> string, which I think should match, since the ^ indicates the start of line. (This isn't really a problem since there is a datestamp, just a curiosity)
Bash Version: GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
Sed Version: GNU sed version 4.1.5
Could you please give me any pointers on this? Thanks in advance.
P.S. The IPs can be IPv4 or IPv6, but I will change the IP regex once this problem's solved.
P.P.S. I need to use a regex i.e. not awk, because there will be other patterns too; this is the first, and I'm having problems :(
Your entire pattern
sed -n '1!N; s/^INBOUND>>>>>.*\n.*from \([0-9.]*\).* to \([0-9.]*\).*\n.*, Message type:\([A-Z_]*\).*/\1 \2/p'
can't match because you're missing a space between Message type: and \([A-Z_]*\)
Are you sure there are no hidden characters before INBOUND (when you omit the first line)?
This one works for me:
sed -r 's/.*from ([0-9.:]*) to ([0-9.:]*).*Message type: ([A-Z_]*).*/\1 \2 \3/'
(note that I used the -r flag so I won't have to escape the brackets)
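One further point worth checking: 1!N appends only one line per cycle, so the pattern space never holds more than two lines, and a pattern containing two \n cannot match. It also skips N on the very first line, which would explain why a leading line is needed before INBOUND. A possible sketch that instead pulls in the two following lines whenever a line starts with INBOUND is shown below. extract_session is a hypothetical name, -E assumes a reasonably recent sed (older GNU versions spell it -r), and the pattern covers IPv4 only.

```shell
# Hypothetical helper: for each INBOUND record, join the next two lines
# into the pattern space and print "from-IP to-IP message-type".
extract_session() {
  sed -n -E '/^INBOUND/{
    N
    N
    s/.*from ([0-9.]+).* to ([0-9.]+).*Message type: ([A-Z_]+).*/\1 \2 \3/p
  }' "$@"
}

printf '%s\n' \
  'Tuesday March 11 2014' \
  'INBOUND>>>>> 06:22:10:066 Eventid:141004(3)' \
  '[SGW-S11/S4]GTPv2C Rx PDU, from 172.9.9.1:10000 to 173.10.10.1:2123 (187)' \
  'TEID: 0x00000000, Message type: EGTP_CREATE_SESSION_REQUEST (0x20)' | extract_session
```

Because the address /^INBOUND/ triggers the two N commands, this does not depend on whether a date line precedes the record.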
You can use awk and no regex:
awk -F" |:" '/^INBOUND/ {getline;print $5 RS $8;getline;print $7}' file
172.9.9.1
173.10.10.1
EGTP_CREATE_SESSION_REQUEST
You say this data is the output of a grep; that step could be incorporated into the awk command.
Give us all the data and the output you would like, and we will help you.
awk -F" |:" '/^INBOUND/ {getline;printf "%s %s",$5,$8;getline;print "",$7}' file
172.9.9.1 173.10.10.1 EGTP_CREATE_SESSION_REQUEST

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexs from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie...but definitely a newbie to regex..)
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!
I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import sys, ldap.dn; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest
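The sed expression above can be tried on the sample DN as follows. The second function is an assumption-laden variant, not part of the answer: it uses ERE to also step over backslash-escaped characters such as \, inside the common name. Both function names are hypothetical.

```shell
# The sed command from above, reading DNs on stdin.
common_name() {
  sed 's/^CN=\([^,]*\).*/\1/'
}

# Hypothetical variant: treat "\" followed by any character as part of
# the name, so an escaped comma does not end the match.
common_name_escaped() {
  sed -E 's/^CN=(([^,\\]|\\.)*).*/\1/'
}

echo 'CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu' | common_name
echo 'CN=Doe\, John,CN=Users,DC=example,DC=local' | common_name_escaped
```

The variant keeps the backslash in its output; it does not handle double-quoted values, for which a real DN parser is the safer choice.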
http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt
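I like awk too, so I print the substring from the fourth character of the first comma-separated field. The separator is set with -F so that it already applies to the first line:
DSCL | awk -F, '{print substr($1,4)}' > filterednames.txt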
This regex will parse a distinguished name, giving name and val capture groups for each match.
When DN strings contain commas, they are meant to be quoted. This regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is, nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))