How can I grab the next word after a regex match in bash?

I am trying to grab the word that comes AFTER the IP regex match (i.e. $2) in a text file:
fileName.txt:
IP hostname blah blah blah...
blah blah..
IP hostname blah blah blah...
.
.
.
I want the hostname for each instance of the IP (which I found with a grep regex and stored in $var). I want to store each hostname found in $host and print it to a text file alongside the IPs (that part is already done).
I have tried multiple methods from online answers but they all printed blanks.
Thank you!

See BashFAQ #1 for guidance on how best to read from a stream.
#!/bin/bash
# ^^^^ important, not /bin/sh
while read -r -a words; do
# put the array words into $1, $2, $3, etc.
set -- "${words[@]}"
# put $1 -- the first word -- into the variable named "ip"
ip=$1
# remove $1, leaving only hostnames in $1, $2, etc
shift
echo "IP address $ip has the following hostnames:"
for hostname; do # by default, a for loop iterates over "$@"
echo "- ${hostname}"
done
done < <(grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' test_amy_hostrun.txt)
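If you only need the single word right after the IP (the question's $2), a minimal sketch is to let read split each line and test the first word with bash's =~ operator instead of piping through grep. The function name print_ip_hosts is hypothetical, and it assumes the IP is always the first whitespace-separated word, as in fileName.txt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "IP hostname" for every line whose first word looks like an IPv4 address.
# Assumes the IP is the first whitespace-separated word and the hostname is the second.
print_ip_hosts() {
  local ip host rest
  local ip_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  while read -r ip host rest; do
    # Skip lines whose first word is not an IPv4-looking token (e.g. the "blah blah.." lines).
    [[ $ip =~ $ip_re ]] || continue
    printf '%s %s\n' "$ip" "$host"
  done < "$1"
}
```

For example, print_ip_hosts fileName.txt > ip_hosts.txt writes one "IP hostname" pair per matching line.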

awk to the rescue!
$ awk '/^([0-9]{1,3}\.){3}[0-9]{1,3}/{print $1, $2}' fileName.txt
prints the IP and hostname for each line that starts with an IP address matching the regex.
If your awk doesn't support regex intervals ({1,3}), you need to add the --re-interval option (a gawk option, needed by gawk versions before 4.0).
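Since the question already has the IP in $var, another option is to pass it into awk with -v and print the field that follows an exact match, so the result can be captured in $host. A sketch; the function name host_after_ip is hypothetical, and whitespace-separated words are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the word that follows an exact match of the given IP on each line.
host_after_ip() {
  local ip=$1 file=$2
  awk -v ip="$ip" '{
    # Stop at NF-1 so $(i+1) always exists; compare whole fields, not substrings.
    for (i = 1; i < NF; i++)
      if ($i == ip) print $(i+1)
  }' "$file"
}
```

For example, host=$(host_after_ip "$var" fileName.txt) stores the hostname in $host; if the IP appears on several lines, $host holds one hostname per line.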

Related

Regex: find elements regardless of order

If I have the string:
geo:FR, host:www.example.com
(In reality the string is more complicated and has more fields.)
And I want to extract the "geo" value and the "host" value, I am facing a problem when the order of the keys change, as in the following:
host:www.example.com, geo:FR
I tried this line:
sed 's/.*geo:\([^ ]*\).*host:\([^ ]*\).*/\1,\2/'
But it only works on the first string.
Is there a way to do it in a single regex, and if not, what's the best approach?
I suggest extracting each text you need with a separate sed command:
s="geo:FR, host:www.example.com"
host="$(sed -n 's/.*host:\([^[:space:],]*\).*/\1/p' <<< "$s")"
geo="$(sed -n 's/.*geo:\([^[:space:],]*\).*/\1/p' <<< "$s")"
Running echo "$host and $geo" afterwards prints
www.example.com and FR
for both inputs.
Details
-n suppresses line output and p prints the matches
.* - matches any 0+ chars up to the last...
host: - host: substring and then
\([^[:space:],]*\) - captures into Group 1 any 0 or more chars other than whitespace and a comma
.* - the rest of the line.
The result is just the contents of Group 1 (see \1 in the replacement pattern).
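One caveat: because of the greedy .* in front of the key, sed keeps the last place the key text occurs, so on an input such as badgeo:uhoh, geo:FR, nastygeo:wahwahwah the geo command would return wahwahwah. A sketch using bash's [[ =~ ]] that requires the key to start the string or follow a space or comma; get_field is a hypothetical name, and keys are assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: get_field KEY STRING prints the value of KEY in "key:value, key:value" text.
# Assumes KEY contains no regex metacharacters.
get_field() {
  local key=$1 s=$2
  # The key must begin the string or follow a space/comma, so "badgeo:" never matches "geo".
  local re="(^|[ ,])${key}:([^ ,]*)"
  [[ $s =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}
```

Because bash takes the leftmost match, the first properly delimited occurrence of the key wins.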
Whenever you have tag/name to value pairs in your input, I find it best (clearest, simplest, most robust, easiest to enhance, etc.) to first create an array that contains that mapping (f[] below); then you can simply access the values by their tags:
$ cat file
geo:FR, host:www.example.com
host:www.example.com, geo:FR
foo:bar, host:www.example.com, stuff:nonsense, badgeo:uhoh, geo:FR, nastygeo:wahwahwah
$ cat tst.awk
BEGIN { FS=":|, *"; OFS="," }
{
for (i=1; i<=NF; i+=2) {
f[$i] = $(i+1)
}
print f["geo"], f["host"]
}
$ awk -f tst.awk file
FR,www.example.com
FR,www.example.com
FR,www.example.com
The above will work using any awk in any shell on every UNIX box.
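The same tag-to-value idea also works in pure bash (4.0 or later) with an associative array, which can be handy when the line is already in a shell variable. A sketch; parse_pairs and the global array f are hypothetical names chosen to mirror the awk script, and pairs are assumed to be separated by a comma plus optional spaces.

```shell
#!/usr/bin/env bash
# Hypothetical bash 4+ counterpart of the awk f[] mapping: parse_pairs LINE fills the global array f.
# Assumes pairs are separated by "," plus optional spaces, and each key is followed by ":".
declare -A f
parse_pairs() {
  local pair
  local -a pairs
  f=()                                   # forget the previous line's tags
  IFS=',' read -r -a pairs <<< "$1"
  for pair in "${pairs[@]}"; do
    while [[ $pair == ' '* ]]; do pair=${pair# }; done   # strip leading spaces
    f[${pair%%:*}]=${pair#*:}            # tag before the first ":", value after it
  done
}
```

After parse_pairs "$line", the values are available as ${f[geo]} and ${f[host]}, regardless of their order in the line.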
Here I've used GNU Awk to convert your delimited key:value pairs into valid shell assignments. With Bash, you can load these assignments into your current shell using <(process substitution):
# source the file descriptor generated by proc sub
. <(
# use comma-space as field separator, literal apostrophe as variable q
awk -F', ' -vq=\' '
# change every foo:bar in line to foo='bar' on its own line
{for(f=1;f<=NF;f++) print gensub(/:(.*)/, "=" q "\\1" q, 1, $f)}
# the here-string below supplies the input; delete it to read standard input instead
' <<< 'host:www.example.com, geo:FR'
)
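Because the awk output is sourced as shell code, a value containing an apostrophe would break the quoting, and a crafted value could run arbitrary commands. A sketch of an alternative that assigns the variables directly with bash's printf -v, so nothing is evaluated; assign_pairs is a hypothetical name, and only keys that are valid shell identifiers are assigned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign_pairs 'host:www.example.com, geo:FR' sets $host and $geo.
# Nothing is sourced or eval'd; values are copied verbatim.
assign_pairs() {
  local _pair _key
  local -a _pairs
  IFS=',' read -r -a _pairs <<< "$1"
  for _pair in "${_pairs[@]}"; do
    while [[ $_pair == ' '* ]]; do _pair=${_pair# }; done   # strip leading spaces
    _key=${_pair%%:*}
    # Skip anything that is not a valid variable name.
    [[ $_key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    printf -v "$_key" '%s' "${_pair#*:}"
  done
}
```

The underscore-prefixed locals are there so that ordinary keys such as host or geo cannot collide with the function's own variables.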

gawk regex to find any record having characters other than those specified by the character class in the regex pattern

I have a list of email addresses in a text file, and a pattern with character classes that specifies which characters are allowed in the email addresses.
From that input file, I want to find only the email addresses that contain characters other than the allowed ones.
I am trying to write a gawk command for this, but I can't get it to work properly.
Here is the gawk that I am trying:
gawk -F "," ' $2!~/[[:alnum:]@\.]]/ { print "has invalid chars" }' emails.csv
The problem I am facing is that the above gawk command only matches records that have NONE of the alphanumeric characters, @ and . (dot) in them. What I am looking for is records that have the allowed characters but also some disallowed ones.
For example, the above command would find
"_-()&(()%"
as it only has characters that are not in the regex pattern, but it will not find
"abc-123@xyz,com"
as it also has characters that are in the specified character classes.
How about several tests together: the field contains an alnum, an @, a dot and an invalid character?
$2 ~ /[[:alnum:]]/ && $2 ~ /@/ && $2 ~ /\./ && $2 ~ /[^[:alnum:]@.]/
Your regex is wrong here:
/[[:alnum:]@\.]]/
It should be:
/[[:alnum:]@.]/
Note the removal of the extra ] from the end (a . inside brackets is already literal, so the backslash is unnecessary).
Test case:
# regex with extra ]
awk -F "," '{print ($2 !~ /[[:alnum:]@.]]/)}' <<< 'abc,ab@email.com'
1
# correct regex
awk -F "," '{print ($2 !~ /[[:alnum:]@.]/)}' <<< 'abc,ab@email.com'
0
Do you really care whether the string has a valid character? If not (and it seems like you don't), the simple solution is
$2 ~ /[^[:alnum:]@.]/{ print "has invalid chars" }
That won't trigger on an empty string, so you might want to add a test for that case.
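The same "contains at least one disallowed character" test can be done in bash without awk, using a negated bracket expression in a [[ == ]] glob. A sketch, assuming the allowed set is letters, digits, @ and . (the email context of the question); has_invalid_chars is a hypothetical name, and following the note about empty strings it also treats an empty field as invalid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 is empty or contains a character outside [[:alnum:]@.].
has_invalid_chars() {
  [[ -z $1 || $1 == *[![:alnum:]@.]* ]]
}
```

To scan the second column of emails.csv, something like while IFS=, read -r _ email _; do has_invalid_chars "$email" && echo "$email has invalid chars"; done < emails.csv would work.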
Your question would REALLY benefit from some concise, testable sample input and expected output, as right now we're all guessing at what you want, but maybe this does it?
awk -F, '{r=$2} gsub(/[[:alnum:]@.]/,"",r) && (r!="") { print "has invalid chars" }' emails.csv
e.g. using the 2 input examples you provided:
$ cat file
_-()&(()%
abc-123@xyz,com
$ awk '{r=$0} gsub(/[[:alnum:]@.]/,"",r) && (r!="") { print $0, "has invalid chars" }' file
abc-123@xyz,com has invalid chars
There are more accurate email regexps btw, e.g.:
\<[[:alnum:]._%+-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,}\>
which is a gawk-specific (for the word delimiters \< and \>) modification of the one described at http://www.regular-expressions.info/email.html, updated to use POSIX character classes.
If you are trying to validate email addresses, do not use the regexp you started with, as it will declare @ and 7 each to be valid email addresses.
See also How to validate an email address using a regular expression? for more email regexp details.
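To apply that regexp to a whole field from bash, rather than searching for it inside a line, one option is to anchor it with ^ and $ in place of gawk's \< and \>. A sketch; is_plausible_email is a hypothetical name, and like any regexp this only checks the general shape of an address, not whether it exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the whole of $1 matches the email regexp above.
is_plausible_email() {
  # Same character classes as the gawk regexp, anchored to the whole string.
  local re='^[[:alnum:]._%+-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,}$'
  [[ $1 =~ $re ]]
}
```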

How to strip the trailing string from an IP address

Given an IP address 192.168.10.21.somebody.com.br,
I need to extract just 192.168.10.21. I tried cut as below, but it gives "cut: invalid byte or field list".
cut -d'.' -f-4
$ echo "192.168.10.21.somebody.com.br" | cut -d'.' -f -4
192.168.10.21
works for me!
All three of the following assume you have the domain name stored in a parameter:
dom_name=192.168.10.21.somebody.com.br
More efficient than using cut (no external process is started), assuming the first label to remove doesn't start with a digit:
echo "${dom_name%%.[[:alpha:]]*}"
If the first label could start with a number, these are still more efficient than cut, but uglier and much longer to type:
# Match one more dot than necessary to shorten the regular expression;
# then trim that dot when echoing
[[ $dom_name =~ (([0-9]+\.){4}) ]]
echo "${BASH_REMATCH[1]%.}"
or
# Split the string into an array, then output the
# first four fields rejoined by dots.
IFS=. read -r -a labels <<< "$dom_name"
(IFS=.; echo "${labels[*]:0:4}")
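The BASH_REMATCH version above needs at least one label after the address (it looks for four dots). A sketch of an anchored variant that captures exactly four numeric labels, also accepts a bare address with no domain, and, unlike the %% expansion, is not confused by a first label such as 3com; strip_ip is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the leading dotted-quad of $1, e.g. 192.168.10.21.somebody.com.br -> 192.168.10.21
strip_ip() {
  # Four labels of 1-3 digits, followed by a dot or the end of the string.
  local re='^([0-9]{1,3}(\.[0-9]{1,3}){3})(\.|$)'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}
```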

File comparison using awk in Linux

I have two files
File A.txt (Groupname:Groupid)
wheel:1
www:2
ftpshare:3
others:4
File B.txt (username:UserID:Groupid)
pi:1:1
useradmin:2:3
usertwo:3:3
trout:4:3
apachecri:5:2
guestthree:6:4
I need to create output that shows username:userID:groupname, like below:
pi:1:wheel
useradmin:2:ftpshare
(and so on)
This needs to be done using awk for a Unix class. After spending countless hours trying to figure it out, here is what I came up with:
awk -F ':' 'NR==FNR{ if ($2==[a-z]) a[$1] = $4;next} NF{ print $1, $2, $4}' fileA.txt fileB.txt
OR
awk -F, 'NR==FNR{ a[$2]=$2$1;next } NF{ print $1, $2 ((a[$2]==$2$3)?",ok":",nok") }' FileA.txt FileB.txt
Can someone help me figure this out and get the right output, and explain what I am doing wrong?
You can use awk:
awk 'BEGIN{FS=OFS=":"} FNR==NR{a[$2]=$1; next} $3 in a{print $1, $2, a[$3]}' a.txt b.txt
pi:1:wheel
useradmin:2:ftpshare
usertwo:3:ftpshare
trout:4:ftpshare
apachecri:5:www
guestthree:6:others
How it works:
BEGIN{FS=OFS=":"} - Make input and output field separator as colon
FNR==NR - Execute this block for the first file (a.txt) only
{a[$2]=$1; next} - Create an associative array a with key as $2 and value as $1 and then skip to next record
$3 in a - Execute this block for 2nd file if $3 is found in array a
print $1, $2, a[$3] - Print field 1, field 2 and a[field 3]
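The same two-pass lookup (remember every group ID from the first file, then look up each user's group ID) can be sketched in bash 4+ with an associative array. It is slower than awk on large files, so it is mainly useful for seeing what the awk program does; map_groups is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map_groups GROUPS_FILE USERS_FILE prints user:uid:groupname.
map_groups() {
  local name gid user uid
  local -A group_name
  # Pass 1 (like FNR==NR): remember groupname by groupid.
  while IFS=: read -r name gid; do
    [[ -n $gid ]] && group_name[$gid]=$name
  done < "$1"
  # Pass 2 (like "$3 in a"): print only users whose groupid was seen.
  while IFS=: read -r user uid gid; do
    if [[ -n $gid && -n ${group_name[$gid]+set} ]]; then
      printf '%s:%s:%s\n' "$user" "$uid" "${group_name[$gid]}"
    fi
  done < "$2"
}
```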
I know you said you want to use awk, but you should also consider the standard tool designed for a task like this, namely join. Here is one way you could apply it:
join -o '2.1 2.2 1.1' -t: -1 2 -2 3 <(sort -t: -k2,2 fileA.txt) \
<(sort -t: -k3,3 fileB.txt)
Because the input to join needs to be sorted on the join field, this method does not preserve the original order of fileB.txt; if that order is important, use anubhava's awk answer.
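If the original order of fileB.txt matters but you still want to use join, one common workaround is to number the lines first, join, sort back on the number, and then cut it off. A sketch; join_keep_order is a hypothetical name, and both inputs are sorted with plain (non-numeric) keys because that is the ordering join expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join_keep_order GROUPS_FILE USERS_FILE prints user:uid:groupname in USERS_FILE order.
join_keep_order() {
  # Number each user line ("1:pi:1:1") so the groupid moves to field 4.
  join -t: -1 2 -2 4 -o '2.1 2.2 2.3 1.1' \
    <(sort -t: -k2,2 "$1") \
    <(awk '{ print NR ":" $0 }' "$2" | sort -t: -k4,4) |
    sort -t: -k1,1n |   # put the lines back in their original order
    cut -d: -f2-        # drop the line number again
}
```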
Output in this case:
pi:1:wheel
apachecri:5:www
trout:4:ftpshare
useradmin:2:ftpshare
usertwo:3:ftpshare
guestthree:6:others

Replace a particular IP address (exact match) with a new IP in a shell script

I have a file which contains many IP addresses (both IPv4 and IPv6). I want to use a sed command to replace all occurrences of an old IP with a new IP, but I am facing a problem with old IPs like the ones below:
2.2.2.2 - IPv4
2:2:2.2.2.2 - IPv6 (note the ':')
In ksh, I am using
oldIP=2.2.2.2
newIP=3.3.3.3
sed -i 's/'$oldIP'/'$newIP'/g' filename
But this replaces both 2.2.2.2 and 2:2:2.2.2.2, because 2.2.2.2 also occurs as a substring of 2:2:2.2.2.2 (and each '.' in the oldIP variable is a regex wildcard that matches any character).
Can anyone tell me how to match an exact IP in a file from a script?
Input file: a.txt - contains oldIP,newIP
1.1.1.1,9.9.9.9
0.0.0.0,9.9.9.9
2.2.2.2,9.9.9.9
5:5:5.5.5.5,[9:9:9.9.9.9]
3.3.3.3,9.9.9.9
3:3:3.3.3.3,9:9:9.9.9.9
1:1:3.3.3.3,9:9:9.9.9.9
#!/bin/ksh
ipAddrFile=$1
while IFS= read -r line
do
OLDIFS=$IFS
IFS=","
array=( $line )
IFS=$OLDIFS
if [ "${array[1]}" = "" ]; then
echo "tokenize ip address list fail. Check if proper separator is used."
fi
oldIP=${array[0]}
newIP=${array[1]}
sed -i "s/$oldIP/$newIP/g" temp.xml
if [ $? -ne 0 ]; then
echo "Replacing field value failed"
exit 1
fi
done < "$ipAddrFile"
First of all, the dots need to be escaped in the regex; otherwise each one matches any character. So set your variable like this:
oldIP="2\.2\.2\.2"
newIP="3.3.3.3"
Then you can use this sed:
sed -r "s/(^|[^:])$oldIP([^0-9]|$)/\1$newIP\2/g" input
OR on Mac:
sed -E "s/(^|[^:])$oldIP([^0-9]|$)/\1$newIP\2/g" input
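Escaping the dots by hand gets tedious when the addresses come from a file like a.txt. A sketch that escapes them with a bash parameter expansion (turning each . into [.]) and also refuses matches preceded or followed by a digit, dot or colon, so 12.2.2.2 and 2:2:2.2.2.2 are left alone. replace_ip is a hypothetical name; it prints to standard output rather than editing in place, and it assumes the new address contains no /, & or \ characters, which are special in a sed replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace_ip OLD NEW FILE prints FILE with exact matches of OLD replaced by NEW.
replace_ip() {
  local old=$1 new=$2 file=$3
  # Turn every "." into "[.]" so it only matches a literal dot.
  local old_re=${old//./[.]}
  # Not preceded/followed by a digit, dot or colon, so 12.2.2.2 and 2:2:2.2.2.2 don't match.
  sed -E "s/(^|[^0-9.:])${old_re}([^0-9.:]|\$)/\1${new}\2/g" "$file"
}
```

As with the sed answer above, two occurrences separated by a single character (e.g. 2.2.2.2,2.2.2.2) need a second pass, because the separator is consumed by the first match.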