I'm writing a parser in bash where I have a text with one ":" in each line, and I need to output the part after a colon if the part before the colon matches the word "txt".
So I split each line on ":" and then tried to use an if statement in awk.
Command that I've tried:
echo "txt:hello" | awk -F: '{if [[ $1="txt" ]] then print $2 fi}'
But that resulted in a syntax error in the if statement, so I wonder whether awk's if-else construct differs from bash's?
AWK is not Bash; its syntax more closely resembles C.
awk -F: '{if ($1 == "txt") print $2}'
Or just:
awk -F: '$1 == "txt"{print $2}'
See https://www.gnu.org/software/gawk/manual/gawk.html#Getting-Started and https://www.gnu.org/software/gawk/manual/gawk.html#Very-Simple.
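To make the difference from Bash concrete, here is a minimal sketch of an awk if/else. The condition goes in parentheses, there is no "then" or "fi", and the comparison uses == (a single = would assign to $1 instead of comparing). The sample input lines are made up for illustration.

```shell
# awk if/else: C-like syntax, not the shell's "if ...; then ...; fi".
out=$(printf '%s\n' 'txt:hello' 'log:skip' 'txt:world' |
  awk -F: '{
    if ($1 == "txt")
      print $2
    else
      print "(ignored: " $1 ")"
  }')
printf '%s\n' "$out"
```

Lines whose first field is "txt" print the second field; any other line takes the else branch.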
Related
I have a file called "align_summary.txt" which looks like this:
Left reads:
Input : 26410324
Mapped : 21366875 (80.9% of input)
of these: 451504 ( 2.1%) have multiple alignments (4372 have >20)
...more text....
... and several more lines of text....
I want to pull out the % of multiple alignments among all left aligned reads (in this case it's 2.1) in bash shell.
If I use this:
pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p
It promptly gives me the output: 2.1
However, if I enclose the same expression in backticks like this:
leftmultiple=`pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p`
I receive an error:
awk: syntax error in regular expression ( at
input record number 1, file
source line number 1
As I understand it, enclosing this expression in backticks affects the interpretation of the regular expression that includes "(" symbol, despite the fact that it is escaped by backslashes.
Why does this happen and how to avoid this error?
I would be grateful for any input and suggestions.
Many thanks,
Just use awk:
leftmultiple=$(awk '/these:.*multiple/{sub(" ","",$2);print $2}' FS='[(%]' align_summary.txt )
Always use $(...) instead of backticks but more importantly, just use awk alone:
$ leftmultiple=$( gawk -v RS='^$' 'match($0,/Left reads.\s*\n\s+.+\n\s+Mapped.+.\n.\s+of these[^(]+[(]\s*([^)%]+)/,a) { print a[1] }' align_summary.txt )
$ echo "$leftmultiple"
2.1
The above uses GNU awk 4.* and assumes you do need the complicated regexp that you were using to avoid false matches elsewhere in your input file. If that's not the case then the script can of course get much simpler.
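As for why the backtick version fails: inside backticks the shell strips one level of backslashes (a backslash followed by another backslash, $ or a backtick) before the command is even parsed, while $(...) leaves the text alone. The sketch below prints the string that awk would receive as its -F argument in each case; the variable names are only for illustration.

```shell
# The same double-quoted string, passed through both substitution styles.
direct=$(printf '%s' "\\\( ")   # two backslashes survive: awk sees the regex \( (a literal paren)
legacy=`printf '%s' "\\\( "`    # one backslash survives: awk ends up with a bare, unbalanced (
printf 'with $(...):    [%s]\n' "$direct"
printf 'with backticks: [%s]\n' "$legacy"
```

With backticks the separator loses a backslash, awk then reduces what is left to an unescaped "(", and the regular expression no longer compiles.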
I have a file like this (this is sample):
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
...
I have iplist.txt like this:
71.13.55.
12.33.23.
8.8.
4.2.
...
I need to select the lines whose third column starts with one of the prefixes in iplist.txt.
Like this:
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
I tried:
for ip in $(cat iplist.txt); do
awk -v var="$ip" -F '|' '{if ($3 ~ /^$var/) print $0;}' text.txt
done
But the variable is not expanded inside the /^ / regex block. How can I do that?
First, you can use a concatenation of strings for the regular expression, it doesn't have to be a regex block. You can say:
'{if ($3 ~ "^" var) print $0;}'
Second, note above that you don't use a $ with variables inside awk. $ is only used to refer to fields by number (as in $3, or $somevar where somevar has a field number as its value).
Third, you can do everything in awk in which case you can avoid the shell loop and don't need the var:
awk -F'|' 'NR==FNR {a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
EDIT
As rightly pointed out in the comments, the .s in the patterns will match any character, not just a literal .. Thus we need to escape them before doing the match:
awk -F'|' 'NR==FNR {gsub(/\./,"\\."); a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
I'm assuming that you only want to output a given line once, even if it matches multiple patterns from iplist.txt. If you want to output a line multiple times for multiple matches (as your version would have done), remove the next from {print;next}.
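Since the entries in iplist.txt are plain prefixes rather than patterns, one alternative is to skip regular expressions entirely and compare with index(), which needs no escaping. This is a sketch, not the answer's method; the temporary directory and the array name prefix are assumptions made so the example is self-contained.

```shell
tmp=$(mktemp -d)
printf '%s\n' '71.13.55.' '12.33.23.' '8.8.' '4.2.' > "$tmp/iplist.txt"
printf '%s\n' \
  '71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8' \
  '81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8' \
  '61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8' \
  '21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8' > "$tmp/r.txt"

# First file: remember each non-empty prefix.
# Second file: print a line once if field 3 begins with any stored prefix
# (index() returns 1 when the prefix is found at the very start).
out=$(awk -F'|' '
  NR == FNR { if ($0 != "") prefix[$0]; next }
  { for (p in prefix) if (index($3, p) == 1) { print; next } }
' "$tmp/iplist.txt" "$tmp/r.txt")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the first three lines are printed; the fourth has 71.13.54.12 in the third column, which matches none of the prefixes.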
Use var directly instead of /^$var/, adding the ^ to the variable first:
awk -v var="^$ip" -F '|' '$3 ~ var' text.txt
By the way, the default action for a true condition is to print the current record, so, {if (test) {print $0}} can often be contracted to just test.
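If the dynamic-regex form is kept, the dots in the prefix still need to be made literal. A minimal sketch, assuming the prefix contains no regex metacharacters other than dots: rewrite each "." as "[.]" before handing it to awk. A bracket expression is used rather than "\." because awk processes backslash escapes in -v assignments. The two sample lines are invented to show the difference.

```shell
ip='71.13.55.'
# Turn every "." into "[.]" so it can only match a literal dot.
re="^$(printf '%s\n' "$ip" | sed 's/\./[.]/g')"
out=$(printf '%s\n' 'a|b|71.13.55.12|c' 'a|b|71x13y55z9|c' |
  awk -F'|' -v re="$re" '$3 ~ re')
printf '%s\n' "$out"
```

With the unescaped pattern ^71.13.55. the second line would also have been printed.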
Here is a way with bash, sed and grep; it's straightforward and I think it may be a bit cleaner than awk in this case:
IFS=$(echo -en "\n\b") && for ip in $(sed 's/\./\\&/g' iplist.txt); do
grep "^[^|]*|[^|]*|${ip}" r.txt
done
In awk I can search a field for a value like:
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2=="eaae" {print $0};'
dd,eaae,ff
And I can search by regular expressions like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; /[a]{2}/ {print $0};'
aa,bb,cc
dd,eaae,ff
Can I force the awk to apply the regexp search to a specific field ? I'm looking for something like
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2==/[a]{2}/ {print $0};'
expecting result:
dd,eaae,ff
Anyone know how to do it using awk?
Accepted response - Operator "~" (thanks to hek2mgl):
$ echo -e "aa,bb,cc\ndd,eaae,ff" | awk 'BEGIN{FS=",";}; $2 ~ /[a]{2}/ {print $0};'
You can use:
$2 ~ /REGEX/ {ACTION}
If the regex should apply to the second field (for example) only.
In your case this would lead to:
awk -F, '$2 ~ /[a]{2}/' <<< $'aa,bb,cc\ndd,eaae,ff'
You may wonder why I've just used the regex in the awk program and no print. This is because your action is print $0 - printing the current line - which is the default action in awk.
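A short sketch of the field-level match operators on the question's sample data: ~ selects records whose field matches, and !~ selects those whose field does not. The plain pattern /aa/ is used here instead of an interval expression so that it behaves the same in every awk; the variable names are only for illustration.

```shell
input=$(printf '%s\n' 'aa,bb,cc' 'dd,eaae,ff')
# Records whose second field contains "aa".
hits=$(printf '%s\n' "$input" | awk -F, '$2 ~ /aa/')
# Records whose second field does not contain "aa".
misses=$(printf '%s\n' "$input" | awk -F, '$2 !~ /aa/')
printf 'match:    %s\n' "$hits"
printf 'no match: %s\n' "$misses"
```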
I have code like this
echo abc | awk '$0 ~ "a\(b\)c" {print $0}'
What if I only wanted what's in the parentheses instead of the whole line? This is obviously very simplified, and there is really a lot of awk code so I don't want to switch to sed or grep or something. Thanks
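As far as I know you cannot do it in the pattern part; you must do it in the action part with the match() function (the three-argument form used here, which fills an array with the captured groups, is a GNU awk extension):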
echo abc | awk '{ if ( match($0, /a(b)c/, a) > 0 ) { print a[1] } }'
It yields:
b
With GNU awk:
$ echo abc | awk '{print gensub(/a(b)c/,"\\1",1)}'
b
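Both answers above rely on GNU awk. For other awks, a rough equivalent is the two-argument match(), which sets RSTART and RLENGTH, followed by substr() to cut out the wanted part. This is a sketch on a made-up input line, and it only works when the delimiters around the wanted text have a known width (one character on each side here).

```shell
# match() records where the bracketed number starts and how long it is;
# substr() then drops the surrounding "[" and "]".
out=$(echo 'id=[42] name=bob' |
  awk 'match($0, /\[[0-9]+\]/) { print substr($0, RSTART + 1, RLENGTH - 2) }')
printf '%s\n' "$out"
```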
I just can't get the regex right:
awk '$6 ~ /:${14}/ {print $6}' file
I need to print out the 6th field if it's 15 characters long and ends with a ":".
Here's an example: oAFKq7XS001224:
With GNU awk older than 4.0 you need --posix (or --re-interval); newer versions enable interval expressions by default:
awk --posix '{ if ($6 ~ /^.{14}:$/) print $6}' file
From awk manual page:
Interval expressions are only
available if either --posix or
--re-interval is specified on the command line.
What about:
awk '$6 ~ /^.{14}:$/ { print $6 } ' file
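If interval expressions are not available in the awk at hand, the same test can be written without them by checking the length and the final character separately. A minimal sketch; the two input lines are invented around the example value from the question.

```shell
# Print field 6 only when it is exactly 15 characters long and ends in ":".
out=$(printf '%s\n' 'a b c d e oAFKq7XS001224: g' 'a b c d e short: g' |
  awk 'length($6) == 15 && $6 ~ /:$/ { print $6 }')
printf '%s\n' "$out"
```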