how to use awk to match the complete word - regex

I have this script which takes the value of variable $userId as input & uses it in below awk command.
gzip -cd "input.csv.gz"|/usr/xpg4/bin/awk -v search="$userId" -F, 'BEGIN{ OFS=","} { if( match($4, search)) print $0 }' >>$outputFileNameUser
format of input.csv.gz is like below:
gzip -cd input.csv.gz|head -4
Circle,Date,Time,SubscriberId,OperatorId,VoucherNumber,Status
UPE,01-JUN-15,20:23:39,9936596081,,1161504025632821,Used
UPE,01-JUN-15,20:23:39,7755802655,,1161504038349788,Used
UPE,01-JUN-15,20:23:39,9793948511,,1161504027670339,Used
This awk command is matching 4th field of input file (i.e. SubscriberId) with variable $UserId. In the first row value of $4 is 9936596081.
So even if I provide "6596" (which is a part of 9936596081 ) as $UserId it will match the first row. I want to match the complete number(9936596081) not any part of the number.. I tried like this..
gzip -cd "${outputFileName}"|/usr/xpg4/bin/awk -v search="$userId" -F, 'BEGIN{ OFS=","} { if( $4 == "search")) print $0 }' >>$outputFileNameUser
but didn't work.. could you please help me on this?

Try it like this:
awk -v search="$userId" -F, '$4==search'

Related

print the last letter of each word to make a string using `awk` command

I have this line
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
i am trying to print the last letter of each word to make a string using awk command
awk '{ print substr($1,6) substr($2,6) substr($3,6) substr($4,6) substr($5,6) substr($6,6) }'
In case I don't know how many characters a word contains, what is the correct command to print the last character of $column, and instead of the repeding substr command, how can I use it only once to print specific characters in different columns
If you have just this one single line to handle you can use
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} END{print r}' file
If you have multiple lines in the input:
awk '{r=""; for (i=1;i<=NF;i++) r = r "" substr($i,length($i)); print r}' file
Details:
{for (i=1;i<=NF;i++) r = r "" substr($i,length($i)) - iterate over all fields in the current record, i is the field ID, $i is the field value, and all last chars of each field (retrieved with substr($i,length($i))) are appended to r variable
END{print r} prints the r variable once awk script finishes processing.
In the second solution, r value is cleared upon each line processing start, and its value is printed after processing all fields in the current record.
See the online demo:
#!/bin/bash
s='UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS'
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s"
Output:
GMUCHOS
Using GNU awk and gensub:
$ gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' file
Output:
GMUCHOS
1st solution: With GNU awk you could try following awk program, written and tested eith shown samples.
awk -v RS='.([[:space:]]+|$)' 'RT{gsub(/[[:space:]]+/,"",RT);val=val RT} END{print val}' Input_file
Explanation: Set record separator as any character followed by space OR end of value/line. Then as per OP's requirement remove unnecessary newline/spaces from fetched value; keep on creating val which has matched value of RS, finally when awk program is done with reading whole Input_file print the value of variable then.
2nd solution: Using record separator as null and using match function on values to match regex (.[[:space:]]+)|(.$) to get last letter values only with each match found, keep adding matched values into a variable and at last in END block of awk program print variable's value.
awk -v RS= '
{
while(match($0,/(.[[:space:]]+)|(.$)/)){
val=val substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}
END{
gsub(/[[:space:]]+/,"",val)
print val
}
' Input_file
Simple substitutions on individual lines is the job sed exists to do:
$ sed 's/[^ ]*\([^ ]\) */\1/g' file
GMUCHOS
using many tools
$ tr -s ' ' '\n' <file | rev | cut -c1 | paste -sd'\0'
GMUCHOS
separate the words to lines, reverse so that we can pick the first char easily, and finally paste them back together without a delimiter. Not the shortest solution but I think the most trivial one...
I would harness GNU AWK for this as follows, let file.txt content be
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
then
awk 'BEGIN{FPAT="[[:alpha:]]\\>";OFS=""}{$1=$1;print}' file.txt
output
GMUCHOS
Explanation: Inform AWK to treat any alphabetic character at end of word and use empty string as output field seperator. $1=$1 is used to trigger line rebuilding with usage of specified OFS. If you want to know more about start/end of word read GNU Regexp Operators.
(tested in gawk 4.2.1)
Another solution with GNU awk:
awk '{$0=gensub(/[^[:space:]]*([[:alpha:]])/, "\\1","g"); gsub(/\s/,"")} 1' file
GMUCHOS
gensub() gets here the characters and gsub() removes the spaces between them.
or using patsplit():
awk 'n=patsplit($0, a, /[[:alpha:]]\>/) { for (i in a) printf "%s", a[i]} i==n {print ""}' file
GMUCHOS
An alternate approach with GNU awk is to use FPAT to split by and keep the content:
gawk 'BEGIN{FPAT="\\S\\>"}
{ s=""
for (i=1; i<=NF; i++) s=s $i
print s
}' file
GMUCHOS
Or more tersely and idiomatic:
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' file
GMUCHOS
(Thanks Daweo for this)
You can also use gensub with:
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' file
GMUCHOS
The advantage here of both is that single letter "words" are handled properly:
s2='SINGLE X LETTER Z'
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' <<< "$s2"
EXRZ
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' <<< "$s2"
EXRZ
Where the accepted answer and most here do not:
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s2"
ER # WRONG
gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' <<< "$s2"
EX RZ # WRONG

How can I use bash variable in awk with regexp?

I have a file like this (this is sample):
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
...
I have iplist.txt like this:
71.13.55.
12.33.23.
8.8.
4.2.
...
I need to grep if 3. column starts like in iplist.txt.
Like this:
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
I tried:
for ip in $(cat iplist.txt); do
awk -v var="$ip" -F '|' '{if ($3 ~ /^$var/) print $0;}' text.txt
done
But bash variable does not work in /^ / regex block. How can I do that?
First, you can use a concatenation of strings for the regular expression, it doesn't have to be a regex block. You can say:
'{if ($3 ~ "^" var) print $0;}'
Second, note above that you don't use a $ with variables inside awk. $ is only used to refer to fields by number (as in $3, or $somevar where somevar has a field number as its value).
Third, you can do everything in awk in which case you can avoid the shell loop and don't need the var:
awk -F'|' 'NR==FNR {a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
EDIT
As rightly pointed out in the comments, the .s in the patterns will match any character, not just a literal .. Thus we need to escape them before doing the match:
awk -F'|' 'NR==FNR {gsub(/\./,"\\."); a["^" $0]; next} { for (i in a) if ($3 ~ i) print }' iplist.txt r.txt
I'm assuming that you only want to output a given line once, even if it matches multiple patterns from iplist.txt. If you want to output a line multiple times for multiple matches (as your version would have done), remove the next from {print;next}.
Use var directly, instead of in /^$var/ ( adding ^ to the variable first):
awk -v var="^$ip" -F '|' '$3 ~ var' text.txt
By the way, the default action for a true condition is to print the current record, so, {if (test) {print $0}} can often be contracted to just test.
Here is a way with bash, sed and grep, it's straight forward and I think may be a bit cleaner than awk in this case:
IFS=$(echo -en "\n\b") && for ip in $(sed 's/\./\\&/g' iplist.txt); do
grep "^[^|]*|[^|]*|${ip}" r.txt
done

grep line with exact pattern in first column

I have this script :
while read line; do grep $line my_annot | awk '{print $2}' ; done < foo.txt
But it doesn't return what I want.
The problem is that in foo.txt, when I have for instance Contig1, the script will return the column 2 of the file my_annot even if the pattern found is Contig12 and not Contig1 only!
I tried with $ at the end of the pattern but the problem is that it corresponds to end of line while this expression I search is in column 1 and therefore not end of line.
How can I tell to search this EXACT pattern and not those that contain this pattern?
####### ANSWER :
My script is :
annot='/home/mu/myannot'
awk 'NR == FNR { line[$0]; next } $1 in line { print $2 }' $1 $annot > out
It allows me to give the list of expression I want to find as first argument doing ./myscript.sh mylist
And I redirect the result in a file called out.
Thank you guys !!!!
You should use awk to do the whole thing:
awk 'NR == FNR { line[$0]; next } $1 in line { print $2 }' foo.txt my_annot
This reads each line of foo.txt, setting a key in the array line, then prints the second column of any lines whose first column exactly matches one of the keys in the array.
Of course I have made a guess that the format of your data is the same as in the other answer.
So you have a file like
Contig1 hugo
Contig12 paul
right?
Then this will help:
awk '$1~/^Contig1$/ {print $2}' foo.txt
I think this is what you want
while read line; do grep -w $line my_annot | awk '{print $2}' ; done < foo.txt
But it's not 100% clear (because of a lack of example data) whether it will work in all cases.

Regular expression to search column in text file

I am having trouble getting a regular expression that will search for an input term in the specified column. If the term is found in that column, then it needs to output that whole line.
These are my variables:
sreg = search word #Example: Adam
file = text file #Example: Contacts.txt
sfield = column number #Example: 1
the text file is in this format with a space being the field seperator, with many contact entries:
First Last Email Phone Category
Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend
JOhn smart qwer#qwer 999999393 enemy
yooun yeall adada 111223123 other
zefir sentr jjdirutk#jd 8847394578 other
I've tried with no success:
grep "$sreg" "$file" | cut -d " " -f"$sfield"-"$sfield"
awk -F, '{ if ($sreg == $sfield) print $0 }' "$file"
awk -v s="$sreg" -v c="$sfield" '$c == s { print $0 }' "$file"
Thanks for any help!
awk may be the best solution for this:
awk -v field="$field" -v name="$name" '$field==name' "$file"
This checks if the field number $field has the value $name. If so, awk automatically prints the full line that contains it.
For example:
$ field=1
$ name="Adam"
$ file="your_file"
$ awk -v field="$field" -v name="$name" '$field==name' "$file"
Adam aster junfmr# 8473847548 word
As you can see, we give the parameters using -v var="$bash_var", so that you can use them inside awk.
Also, the space is the field separator, so you don't need to specify it since it is the default.
This works for me:
awk -v f="$sfield" -v reg="$sreg" '{if ($f ~ reg) {print $0}}' "$file"
Major problem is that you need an indirection from $sfield (ex, "1") to $($sfield) (ex, $1).
I tried using backtricks `, and also using ${!sfield}, but they don't work in awk, as awk does not accept this. Finally I found the way of passing variable into awk, converting to awk internal variabls (using -v).
Within awk, I found you can not even access variables outside. So I had to pass $sreg as well.
Update: I think using "~" instead of "==" is better because the original requirement said matchi==ng a regular expression.
For example,
sreg=Ad

Awk regex for a string of exact length that ends in ":"

I just can't get the regex right:
awk '$6 ~ /:${14}/ {print $6}' file
I need to print out the 6th field if it's 15 characters long and ends with a ":".
Here's an example: oAFKq7XS001224:
You need to use --posix as:
awk --posix '{ if ($6 ~ /^.{14}:$/) print $6}' file
Command in action
From awk manual page:
Interval expressions are only
available if either --posix or
--re-interval is specified on the command line.
What about:
awk '$6 ~ /^.{14}:$/ { print $6 } ' file