How do I use awk to match multiple variable patterns? - regex

I want to use awk to match multiple variable patterns. Here is what I have so far:
match=`awk -v "$var1\|$var2\|$var3" 'BEGIN{FS=":"; OFS="-"}
$2 ~ {print}' $file`
Any help is appreciated.

You need to pass the three variables separately using the awk -v var1=val1 syntax, and then build the alternation inside the awk regex, like this:
match=$(awk -v v1="$var1" -v v2="$var2" -v v3="$var3" 'BEGIN{FS=":"; OFS="-"}
$2 ~ v1 "|" v2 "|" v3' "$file")
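For example, with a hypothetical colon-delimited file (the data and variable values below are made up for illustration):

```shell
#!/bin/sh
# Hypothetical sample data: records with the group name in field 2
cat > /tmp/groups.txt <<'EOF'
alice:admins:active
bob:devs:active
carol:ops:inactive
dave:guests:active
EOF

var1=admins
var2=devs
var3=ops

# Concatenate the three variables with "|" inside awk to form the
# alternation v1|v2|v3, and print any record whose 2nd field matches
match=$(awk -v v1="$var1" -v v2="$var2" -v v3="$var3" 'BEGIN{FS=":"}
  $2 ~ v1 "|" v2 "|" v3' /tmp/groups.txt)
printf '%s\n' "$match"
```

This prints the alice, bob and carol records; dave's group matches none of the three patterns.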

Related

awk: match regex with or operator of two shell variables

I am trying to combine several things in awk.
First, the use of shell variables inside an awk program.
Second, the use of a shell variable to match a pattern, which points to the ~ operator.
Finally, I want some kind of OR operator to match either of two shell variables.
Putting all together in foo.sh:
#!/bin/bash
START_TEXT="My start text"
END_TEXT="My end text"
awk -v start=$START_TEXT -v end=$END_TEXT '$0 ~ start || $0 ~ end { print $2 }' myfile
Which fails to run:
$ ./foo.sh
awk: fatal: cannot open file `text' for reading (No such file or directory)
So I think the OR operator (||) does not work well with the ~ regex operator.
I was guessing I may need to do the OR-thing inside the regex.
So I tried these two:
awk -v start=$START_TEXT -v end=$END_TEXT '$0~/start|end/ { print $2 }' myfile
awk -v start=$START_TEXT -v end=$END_TEXT '$0~start|end { print $2 }' myfile
With same failed result.
And even this thing fails...
awk -v start=$START_TEXT '$0~start { print $2 }' myfile
So I am doing something really wrong...
Any hints how to achieve this?
You can do the regex OR like this:
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~ start "|" end { print $2 }' myfile
awk knows the parameter passed to the ~ operator is a regex, so we can build it by simply inserting the | (or) operator between the two strings.
Also there's another way to pass variables into awk, like this:
awk '$0~ start "|" end { print $2 }' start="$START_TEXT" end="$END_TEXT" myfile
This is more concise, but also less intuitive, so use it with caution.
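One caveat with that form, shown in a small sketch below: such assignments take effect only when awk reaches them in its argument list, so the variable is still unset inside a BEGIN block:

```shell
#!/bin/sh
printf 'one\ntwo\n' > /tmp/demo.txt
# start is assigned when awk reaches it in the argument list,
# i.e. after BEGIN has already run but before demo.txt is read
awk 'BEGIN { print "in BEGIN: [" start "]" }
           { print "in main:  [" start "]" }' start="hello" /tmp/demo.txt
```

The BEGIN line prints an empty value; the two per-record lines print hello.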
Well, it seems @jxc pointed out my problem in the comments: the shell variables need to be quoted.
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~start || $0~end { print $2 }' myfile
That made it work!
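To see why the quoting matters, consider what the shell actually hands to awk in each case; this sketch just prints the resulting arguments:

```shell
#!/bin/sh
START_TEXT="My start text"
# Unquoted: the value is word-split into three arguments, so awk
# receives start=My as the assignment and then treats "start" and
# "text" as input files -- hence "cannot open file `text'"
printf 'arg: <%s>\n' start=$START_TEXT
# Quoted: the whole value stays in a single argument
printf 'arg: <%s>\n' start="$START_TEXT"
```

The first printf shows three separate arguments; the second shows one.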

Capture strings from several sets of quotes

I've been looking for a straight answer to this but haven't found anything within SO or wider searching that answers this simple question:
I have a string of quoted values, ip addresses in this case, that I want to extract individually to use as values elsewhere. I am intending to do this with sed and regex. The string format is like this:
"10.10.10.101","10.10.10.102","10.10.10.103"
I can capture the values between all quotes using regex such as:
"([^"]*)"
Question is how do I select each group separately so I can use them?
i.e.:
value1 = 10.10.10.101
value2 = 10.10.10.102
value3 = 10.10.10.103
I assume that I need three expressions, but I cannot find how to select a specific occurrence.
Apologies if it's obvious, but I have spent a while searching and testing with no luck...
You can try this in bash (note that it changes IFS for the rest of the session):
$ str="10.10.10.101","10.10.10.102","10.10.10.103"
$ IFS="," arr=($str)
$ echo ${arr[1]}
10.10.10.102
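A variant of the same idea, sketched below, scopes IFS to a single read so the shell's global IFS is left untouched:

```shell
#!/bin/bash
str="10.10.10.101","10.10.10.102","10.10.10.103"
# The IFS=, prefix applies only to this read; -a fills the array,
# -r prevents backslash processing
IFS=, read -ra arr <<< "$str"
echo "${arr[1]}"   # prints 10.10.10.102
```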
If you have GNU awk, you can use FPAT to set the pattern for each field:
awk -v FPAT='[0-9.]+' '{ print $1 }' <<<'"10.10.10.101","10.10.10.102","10.10.10.103"'
Substitute $2 or $3 for $1 to print whichever value you want.
Since your fields don't contain spaces, you could use a similar method to read the values into an array:
read -ra ips < <(awk -v FPAT='[0-9.]+' '{ $1 = $1 }1' <<<'"10.10.10.101","10.10.10.102","10.10.10.103"')
Here, $1 = $1 makes awk reformat each line, so that the fields are printed with spaces in between.
Using grep -P you can use \K to reset the start of the match (note the single quotes, so the double quotes stay part of the string):
s='"10.10.10.101","10.10.10.102","10.10.10.103"'
arr=($(grep -oP '(^|,)"\K[^"]*' <<< "$s"))
# check array content
declare -p arr
declare -a arr='([0]="10.10.10.101" [1]="10.10.10.102" [2]="10.10.10.103")'
If your grep doesn't support -P (PCRE) flag then use:
arr=($(grep -Eo '[.[:digit:]]+' <<< "$s"))
Here is an awk command that should work for BSD awk as well:
awk -F '"(,")?' '{for (i=2; i<NF; i++) print $i}' <<< "$s"
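For example, with the sample string from the question (again single-quoted so the double quotes survive):

```shell
#!/bin/bash
s='"10.10.10.101","10.10.10.102","10.10.10.103"'
# The field separator is a double quote, optionally preceded as the
# pair ",". The leading and trailing quotes leave empty fields $1 and
# $NF, so the values sit in fields 2 through NF-1.
awk -F '"(,")?' '{for (i=2; i<NF; i++) print $i}' <<< "$s"
```

This prints the three addresses, one per line.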

How can I use bash variable in awk with regexp?

I have a file like this (this is sample):
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
...
I have iplist.txt like this:
71.13.55.
12.33.23.
8.8.
4.2.
...
I need to print the lines whose 3rd column starts with one of the prefixes in iplist.txt.
Like this:
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
I tried:
for ip in $(cat iplist.txt); do
awk -v var="$ip" -F '|' '{if ($3 ~ /^$var/) print $0;}' text.txt
done
But bash variable does not work in /^ / regex block. How can I do that?
First, you can use a concatenation of strings for the regular expression; it doesn't have to be a /.../ regex literal. You can say:
'{if ($3 ~ "^" var) print $0;}'
Second, note above that you don't use a $ with variables inside awk. $ is only used to refer to fields by number (as in $3, or $somevar where somevar has a field number as its value).
Third, you can do everything in awk in which case you can avoid the shell loop and don't need the var:
awk -F'|' 'NR==FNR {a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
EDIT
As rightly pointed out in the comments, the dots in the patterns will match any character, not just a literal dot. Thus we need to escape them before doing the match:
awk -F'|' 'NR==FNR {gsub(/\./,"\\."); a["^" $0]; next} { for (i in a) if ($3 ~ i) print }' iplist.txt r.txt
I'm assuming that you only want to output a given line once, even if it matches multiple patterns from iplist.txt. If you want to output a line multiple times for multiple matches (as your version would have done), remove the next from {print;next}.
Use var directly, instead of /^$var/ (adding the ^ to the variable first):
awk -v var="^$ip" -F '|' '$3 ~ var' text.txt
By the way, the default action for a true condition is to print the current record, so, {if (test) {print $0}} can often be contracted to just test.
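Putting that together with data modeled on the question (a sketch; the sed call escapes the dots so they match literally):

```shell
#!/bin/sh
cat > /tmp/r.txt <<'EOF'
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
EOF
printf '71.13.55.\n' > /tmp/iplist.txt

while IFS= read -r ip; do
  # Escape literal dots so "." is not treated as a regex wildcard
  esc=$(printf '%s\n' "$ip" | sed 's/\./\\./g')
  awk -v var="^$esc" -F '|' '$3 ~ var' /tmp/r.txt
done < /tmp/iplist.txt
```

This prints the first two records, whose 3rd field starts with 71.13.55.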
Here is a way with bash, sed and grep; it's straightforward and I think may be a bit cleaner than awk in this case:
IFS=$(echo -en "\n\b") && for ip in $(sed 's/\./\\&/g' iplist.txt); do
grep "^[^|]*|[^|]*|${ip}" r.txt
done

Regular expression to search column in text file

I am having trouble getting a regular expression that will search for an input term in the specified column. If the term is found in that column, then it needs to output that whole line.
These are my variables:
sreg = search word #Example: Adam
file = text file #Example: Contacts.txt
sfield = column number #Example: 1
The text file is in this format, with a space as the field separator, and contains many contact entries:
First Last Email Phone Category
Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend
JOhn smart qwer#qwer 999999393 enemy
yooun yeall adada 111223123 other
zefir sentr jjdirutk#jd 8847394578 other
I've tried with no success:
grep "$sreg" "$file" | cut -d " " -f"$sfield"-"$sfield"
awk -F, '{ if ($sreg == $sfield) print $0 }' "$file"
awk -v s="$sreg" -v c="$sfield" '$c == s { print $0 }' "$file"
Thanks for any help!
awk may be the best solution for this:
awk -v field="$field" -v name="$name" '$field==name' "$file"
This checks if the field number $field has the value $name. If so, awk automatically prints the full line that contains it.
For example:
$ field=1
$ name="Adam"
$ file="your_file"
$ awk -v field="$field" -v name="$name" '$field==name' "$file"
Adam aster junfmr# 8473847548 word
As you can see, we give the parameters using -v var="$bash_var", so that you can use them inside awk.
Also, the space is the field separator, so you don't need to specify it since it is the default.
This works for me:
awk -v f="$sfield" -v reg="$sreg" '{if ($f ~ reg) {print $0}}' "$file"
The major problem is that you need an indirection from $sfield (e.g. "1") to the corresponding awk field (e.g. $1).
I tried using backticks (`) and also ${!sfield}, but they don't work, as awk does not accept them. Finally I found the way of passing variables into awk, converting them to awk-internal variables (using -v).
Within awk, you cannot even access shell variables directly, so I had to pass $sreg as well.
Update: I think using ~ instead of == is better, because the original requirement said matching a regular expression.
For example, with sreg=Ad the ~ match still finds Adam.
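A small end-to-end sketch using the sample contacts file from the question:

```shell
#!/bin/sh
cat > /tmp/Contacts.txt <<'EOF'
Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend
JOhn smart qwer#qwer 999999393 enemy
EOF
sfield=1
sreg=Ad
# With ~ the field only has to contain the pattern, so "Ad" finds "Adam";
# with == the field would have to equal "Ad" exactly, matching nothing
awk -v f="$sfield" -v reg="$sreg" '$f ~ reg' /tmp/Contacts.txt
```

This prints only the Adam line.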

Search regex on a specific field using awk

In awk I can search a field for a value like:
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2=="eaae" {print $0};'
dd,eaae,ff
And I can search by regular expressions like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; /[a]{2}/ {print $0};'
aa,bb,cc
dd,eaae,ff
Can I force awk to apply the regexp search to a specific field? I'm looking for something like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2==/[a]{2}/ {print $0};'
expecting result:
dd,eaae,ff
Anyone know how to do it using awk?
Accepted response - Operator "~" (thanks to hek2mgl):
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2 ~ /[a]{2}/ {print $0};'
You can use:
$2 ~ /REGEX/ {ACTION}
if the regex should apply only to the second field (for example).
In your case this would lead to:
echo -e "aa,bb,cc\ndd,eaae,ff" | awk -F, '$2 ~ /[a]{2}/'
You may wonder why I've used just the regex in the awk program, with no print. This is because your action, print $0 (printing the current line), is the default action in awk.
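As a footnote, anchoring changes what the field test means; a quick sketch (interval expressions like {2} need a POSIX-compliant awk such as gawk):

```shell
#!/bin/sh
# Unanchored: field 2 merely has to contain two consecutive a's ("eaae" does)
echo 'dd,eaae,ff' | awk -F, '$2 ~ /[a]{2}/ { print "contains aa" }'
# Anchored: field 2 would have to be exactly "aa", so this prints nothing
echo 'dd,eaae,ff' | awk -F, '$2 ~ /^[a]{2}$/ { print "is exactly aa" }'
```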