awk: match regex with or operator of two shell variables

I am trying to combine a few things in awk.
First, using shell variables inside awk, which an earlier answer shows how to do.
Second, using a shell variable as the pattern to match, for which another answer points to the ~ operator.
Finally, I want some kind of OR operator so that a line matches either of two shell variables.
Putting it all together in foo.sh:
#!/bin/bash
START_TEXT="My start text"
END_TEXT="My end text"
awk -v start=$START_TEXT -v end=$END_TEXT '$0 ~ start || $0 ~ end { print $2 }' myfile
Which fails to run:
$ ./foo.sh
awk: fatal: cannot open file `text' for reading (No such file or directory)
So I suspected that the OR operator (||) does not work well with the regex ~ operator.
I guessed I might need to do the OR inside the regex.
So I tried these two:
awk -v start=$START_TEXT -v end=$END_TEXT '$0~/start|end/ { print $2 }' myfile
awk -v start=$START_TEXT -v end=$END_TEXT '$0~start|end { print $2 }' myfile
Both fail with the same error.
Even this simpler command fails:
awk -v start=$START_TEXT '$0~start { print $2 }' myfile
So I am doing something really wrong...
Any hints how to achieve this?

You can do the regex OR like this:
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~ start "|" end { print $2 }' myfile
awk treats the right-hand operand of the ~ operator as a regex, so we can build it by concatenating the two strings with the | alternation operator between them.
Also there's another way to pass variables into awk, like this:
awk '$0~ start "|" end { print $2 }' start="$START_TEXT" end="$END_TEXT" myfile
This is more concise, but it is also less intuitive, so use it with caution.
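A minimal sketch of both forms, assuming a hypothetical three-line sample piped in place of myfile. It also illustrates one difference between them: an assignment given as an operand is only processed when awk reaches it in the argument list, so it is not yet set inside a BEGIN block, whereas a -v assignment is.

```shell
#!/bin/bash
START_TEXT="My start text"
END_TEXT="My end text"

# Hypothetical sample input standing in for myfile.
sample='My start text here
something unrelated
My end text there'

# Concatenating start "|" end builds the dynamic regex "My start text|My end text".
matched=$(printf '%s\n' "$sample" |
  awk -v start="$START_TEXT" -v end="$END_TEXT" '$0 ~ start "|" end { print $2 }')
echo "$matched"

# -v variables are already set in BEGIN; operand assignments are not.
with_v=$(awk -v start="$START_TEXT" 'BEGIN { print "[" start "]" }')
as_operand=$(awk 'BEGIN { print "[" start "]" }' start="$START_TEXT")
echo "$with_v $as_operand"
```

If the program needs the value in BEGIN, -v is the safer choice.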

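Well, it seems @jxc pointed out my problem in the comments: the shell variables need to be quoted. Unquoted, $START_TEXT is split into separate words, so awk receives only start=My as the -v assignment, takes the next word (start) as the program and the word after that (text) as an input file, which explains the error above.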
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~start || $0~end { print $2 }' myfile
That made it work!
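The word splitting behind the original failure can be sketched as follows. The count_args helper is hypothetical and only reports how many arguments the shell hands over; the default IFS is assumed.

```shell
#!/bin/bash
START_TEXT="My start text"

# Hypothetical helper for illustration: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value, giving -v, start=My, start, text.
unquoted=$(count_args -v start=$START_TEXT)

# Quoted: the value stays in one piece, giving -v and "start=My start text".
quoted=$(count_args -v start="$START_TEXT")

echo "unquoted: $unquoted, quoted: $quoted"
```

With the unquoted form awk gets extra words it never expected, which is why the fix is purely a matter of shell quoting and has nothing to do with || or ~.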

Related

How do I use awk to match multiple variable patterns?

I want to use awk to match multiple variable patterns. Here is what I have so far:
match=`awk -v "$var1\|$var2\|$var3" 'BEGIN{FS=":"; OFS="-"}
$2 ~ {print}' $file`
Any help is appreciated.
You need to pass the 3 variables separately using the awk -v var1=val1 syntax and then use alternation inside the awk regex, like this:
match=$(awk -v v1="$var1" -v v2="$var2" -v v3="$var3" 'BEGIN{FS=":"; OFS="-"}
$2 ~ v1 "|" v2 "|" v3' "$file")
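A small sketch of this approach on made-up data. The sample lines and the values of var1, var2 and var3 are assumptions for illustration. The $1 = $1 assignment is an addition that is not in the answer: it makes awk rebuild the record so that OFS actually appears in the output.

```shell
#!/bin/bash
var1="apple"; var2="cherry"; var3="kiwi"

# Hypothetical colon-separated sample; the second field is what gets tested.
sample='1:apple pie:x
2:banana:y
3:cherry tart:z'

result=$(printf '%s\n' "$sample" |
  awk -v v1="$var1" -v v2="$var2" -v v3="$var3" 'BEGIN { FS=":"; OFS="-" }
      # The pattern is the dynamic regex "apple|cherry|kiwi".
      $2 ~ v1 "|" v2 "|" v3 { $1 = $1; print }')
echo "$result"
```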

How can I use bash variable in awk with regexp?

I have a file like this (this is sample):
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
...
I have iplist.txt like this:
71.13.55.
12.33.23.
8.8.
4.2.
...
I need to print the lines whose 3rd column starts with one of the prefixes in iplist.txt.
Like this:
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
I tried:
for ip in $(cat iplist.txt); do
awk -v var="$ip" -F '|' '{if ($3 ~ /^$var/) print $0;}' text.txt
done
But the bash variable does not work inside the /^ / regex block. How can I do that?
First, the regular expression can be built by concatenating strings; it doesn't have to be a /.../ regex block. You can say:
'{if ($3 ~ "^" var) print $0;}'
Second, note above that you don't use a $ with variables inside awk. $ is only used to refer to fields by number (as in $3, or $somevar where somevar has a field number as its value).
Third, you can do everything in awk in which case you can avoid the shell loop and don't need the var:
awk -F'|' 'NR==FNR {a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
EDIT
As rightly pointed out in the comments, each . in the patterns will match any character, not just a literal dot. Thus we need to escape them before doing the match:
awk -F'|' 'NR==FNR {gsub(/\./,"\\."); a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
I'm assuming that you only want to output a given line once, even if it matches multiple patterns from iplist.txt. If you want to output a line multiple times for multiple matches (as your version would have done), remove the next from {print;next}.
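An alternative that sidesteps escaping altogether is a literal prefix test with index() instead of a regex match. This is a swapped-in technique, not what the answer above uses. The sketch below assumes small made-up stand-ins for iplist.txt and r.txt, including one deliberately odd line to show where the unescaped regex over-matches.

```shell
#!/bin/bash
# Hypothetical stand-ins for iplist.txt and r.txt, created in a temp directory.
tmp=$(mktemp -d)
printf '%s\n' '71.13.55.' '8.8.' > "$tmp/iplist.txt"
printf '%s\n' \
  '71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8' \
  '21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8' \
  '10.0.0.1|212.152.22.12|71x13y55z9|8.8.8.8' > "$tmp/r.txt"

# index(s, t) == 1 is true only when s starts with the literal string t.
literal=$(awk -F'|' 'NR==FNR { a[$0]; next }
  { for (p in a) if (index($3, p) == 1) { print; next } }' "$tmp/iplist.txt" "$tmp/r.txt")

# Unescaped regex version for comparison: "." matches any character.
regex=$(awk -F'|' 'NR==FNR { a["^" $0]; next }
  { for (p in a) if ($3 ~ p) { print; next } }' "$tmp/iplist.txt" "$tmp/r.txt")

echo "literal:"; echo "$literal"
echo "regex:";   echo "$regex"
rm -rf "$tmp"
```

Here the literal test prints only the first line, while the unescaped regex also accepts the line whose third field is 71x13y55z9.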
Use var directly instead of writing /^$var/ (add the ^ to the variable first):
awk -v var="^$ip" -F '|' '$3 ~ var' text.txt
By the way, the default action for a true condition is to print the current record, so, {if (test) {print $0}} can often be contracted to just test.
Here is a way with bash, sed and grep; it's straightforward and I think it may be a bit cleaner than awk in this case:
IFS=$(echo -en "\n\b") && for ip in $(sed 's/\./\\&/g' iplist.txt); do
grep "^[^|]*|[^|]*|${ip}" r.txt
done

Regular expression to search column in text file

I am having trouble getting a regular expression that will search for an input term in the specified column. If the term is found in that column, then it needs to output that whole line.
These are my variables:
sreg = search word #Example: Adam
file = text file #Example: Contacts.txt
sfield = column number #Example: 1
The text file is in this format, with a space as the field separator and many contact entries:
First Last Email Phone Category
Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend
JOhn smart qwer#qwer 999999393 enemy
yooun yeall adada 111223123 other
zefir sentr jjdirutk#jd 8847394578 other
I've tried with no success:
grep "$sreg" "$file" | cut -d " " -f"$sfield"-"$sfield"
awk -F, '{ if ($sreg == $sfield) print $0 }' "$file"
awk -v s="$sreg" -v c="$sfield" '$c == s { print $0 }' "$file"
Thanks for any help!
awk may be the best solution for this:
awk -v field="$field" -v name="$name" '$field==name' "$file"
This checks if the field number $field has the value $name. If so, awk automatically prints the full line that contains it.
For example:
$ field=1
$ name="Adam"
$ file="your_file"
$ awk -v field="$field" -v name="$name" '$field==name' "$file"
Adam aster junfmr# 8473847548 word
As you can see, we give the parameters using -v var="$bash_var", so that you can use them inside awk.
Also, the space is the field separator, so you don't need to specify it since it is the default.
This works for me:
awk -v f="$sfield" -v reg="$sreg" '{if ($f ~ reg) {print $0}}' "$file"
The major problem is that you need an indirection from $sfield (e.g. "1") to $($sfield) (e.g. $1).
I tried using backticks ` and also ${!sfield}, but neither works inside awk. Finally I found the way to pass a variable into awk: convert it to an awk internal variable (using -v).
Within awk, I found you cannot access shell variables at all, so I had to pass $sreg as well.
Update: I think using "~" instead of "==" is better because the original requirement was matching a regular expression.
For example,
sreg=Ad
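The difference between == and ~ can be sketched as follows, assuming a two-line excerpt of the contacts file and the search word Ad. This is an illustration written for this purpose, not the answerer's own example.

```shell
#!/bin/bash
# Hypothetical two-line excerpt of Contacts.txt.
sample='Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend'

sfield=1
sreg=Ad

# == needs the whole field to equal the search word, so "Ad" finds nothing.
exact=$(printf '%s\n' "$sample" | awk -v f="$sfield" -v s="$sreg" '$f == s')

# ~ treats the search word as a regex, so "Ad" matches the field "Adam".
partial=$(printf '%s\n' "$sample" | awk -v f="$sfield" -v s="$sreg" '$f ~ s')

echo "exact:   [$exact]"
echo "partial: [$partial]"
```

Which operator is right depends on whether the search term is meant as a whole-field value or as a pattern.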

Using a user-set variable in an awk command

I'm trying to use awk with a user-defined variable ($EVENT, where $EVENT is a filename and also a column in a textfile) in the if condition, but it doesn't seem to recognize the variable. I've tried various combinations of ', ", { and ( but nothing seems to work.
EVENT=19971010_1516.txt
awk '{if ($2=="$EVENT") print $3,$4,$8}' FILENAME.txt > output.txt
Is it possible to use user-defined variables in awk commands? If so, how does the syntax work?
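You cannot use $FOO directly in your awk code, because awk reads it as the field whose number is stored in the awk variable FOO. That awk variable is unset, so $FOO refers to $0 and never sees the shell value. To use a shell variable, pass it with -v, like: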
awk -v event="$EVENT" '{print event}' file
You can do:
awk '$2==event {print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt
awk -v event="$EVENT" '$2==event {print $3,$4,$8}' FILENAME.txt > output.txt
See this post for more info:
How do I use shell variables in an awk script?
If you want to include the variable in the awk script literally then the shell has to expand it, which means that part of the script must sit in double quotes (single quotes do not expand variables). So something like awk '{if ($2=="'"$EVENT"'") print $3,$4,$8}' FILENAME.txt > output.txt. This keeps single quotes around the rest of the awk script, to avoid needing to escape the $ characters, and switches to double quotes only for the event variable.
That being said you almost certainly want to expose the shell variable to awk as an awk variable which means you want to use the -v flag to awk. So something like awk -vevent="$EVENT" '{if ($2==event) print $3,$4,$8}' FILENAME.txt > output.txt. (Alternatively you could use something like awk '{if ($2==event) print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt.)
You could also simplify your awk body a bit by using '$2 == event {print $3,$4,$8}' and let patterns do what they are supposed to do.
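A short sketch of the -v form on made-up data, together with one alternative that is not mentioned above: reading the value from the environment through awk's ENVIRON array. The sample lines are assumptions standing in for FILENAME.txt.

```shell
#!/bin/bash
EVENT=19971010_1516.txt

# Hypothetical stand-in for FILENAME.txt with eight columns per line.
sample='r1 19971010_1516.txt c3 c4 c5 c6 c7 c8
r2 19980101_0000.txt d3 d4 d5 d6 d7 d8'

# Pass the value with -v and compare it as an awk variable.
via_v=$(printf '%s\n' "$sample" |
  awk -v event="$EVENT" '$2 == event { print $3, $4, $8 }')

# Alternative: export it for this one command and read ENVIRON["EVENT"].
via_env=$(printf '%s\n' "$sample" |
  EVENT="$EVENT" awk '$2 == ENVIRON["EVENT"] { print $3, $4, $8 }')

echo "$via_v"
echo "$via_env"
```

One reason to prefer ENVIRON in some cases is that values passed with -v have backslash escape sequences interpreted, while ENVIRON values are taken as they are.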

Passing the value of NR to a variable in AWK

Can we pass NR to a variable in awk?
I have a script which goes like this:
awk -v { blah blah..
..........
count--
print count
}
if (count==0)
{print "The end of function"
print NR
exit
}
This is the awk part of the code. I want to pass NR to var2, for use in:
sed -n ''"$var1"','"$var2"'p'
Which has to be reused several times !
Thanks for your replies .
If you only want to print a certain subset of lines you're almost there. The -v flag is the way to go.
awk -v var1=15 -v var2=25 'NR>=var1 && NR<=var2 {blah blah ...}'
Of course you have to change 15 and 25 to what you need. Note that inside the awk program the variable names must not be enclosed in quotes, otherwise awk treats them as string literals.
As others have suggested, there are better ways to accomplish the overall goal.
However, in order to answer your specific question:
var2=$(awk 'END {print NR}' inputfile)
and add anything else you may need within the AWK script.
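Putting the pieces together, a minimal sketch that captures NR in a shell variable and reuses it in the sed range from the question. The five-line input file and the value of var1 are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical five-line input file created for the demonstration.
inputfile=$(mktemp)
printf 'line%s\n' 1 2 3 4 5 > "$inputfile"

var1=3
# END runs after the last record, so NR holds the total line count there.
var2=$(awk 'END { print NR }' "$inputfile")

# Double quotes let the shell expand both variables inside the sed address.
selected=$(sed -n "${var1},${var2}p" "$inputfile")
echo "$selected"
rm -f "$inputfile"
```

To capture the line number at some other point, the same command substitution works with a rule that prints NR and then calls exit.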
I don't know what you want to achieve with awk, sed and the NR variable. Do you mean the number of lines of the file?
This command gets it:
wc -l infile | sed -e 's/ .*$//'
So, pass it to awk with the -v switch and use it as you want. The next command prints 10 because infile has ten lines on my computer.
awk -v num_lines="$(wc -l infile | sed -e 's/ .*$//')" 'BEGIN { print num_lines }'