I'm trying to use awk with a user-defined variable ($EVENT, where $EVENT is a filename that also appears in a column of a text file) in the if condition, but awk doesn't seem to recognize the variable. I've tried various combinations of ', ", { and ( but nothing seems to work.
EVENT=19971010_1516.txt
awk '{if ($2=="$EVENT") print $3,$4,$8}' FILENAME.txt > output.txt
Is it possible to use user-defined variables in awk commands? If so, how does the syntax work?
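You cannot use $FOO directly in your awk code. The shell does not expand anything inside single quotes, so awk sees the text $FOO itself. awk reads that as "the field whose number is the value of the awk variable FOO". FOO is an unset (empty) awk variable, so $FOO is $0. (In your code "$EVENT" is even inside awk double quotes, so $2 is compared with the literal string $EVENT.) To use a shell variable, pass it in with -v, like: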
awk -v event="$EVENT" '{print event}' file
You can do:
awk '$2==event {print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt
awk -v event="$EVENT" '$2==event {print $3,$4,$8}' FILENAME.txt > output.txt
See this post for more info:
How do I use shell variables in an awk script?
If you want to include the variable's value in the awk script literally, the shell has to expand it, and it only does that outside single quotes (single quotes do not expand variables). So use something like awk '{if ($2=="'"$EVENT"'") print $3,$4,$8}' FILENAME.txt > output.txt. This keeps single quotes around the rest of the awk script, so you don't need to escape the $ characters. It briefly closes them and uses double quotes around the event variable.
That being said, you almost certainly want to expose the shell variable to awk as an awk variable, which means using the -v flag to awk. So something like awk -v event="$EVENT" '{if ($2==event) print $3,$4,$8}' FILENAME.txt > output.txt. (Alternatively, you could use something like awk '{if ($2==event) print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt.)
You could also simplify your awk body a bit by using '$2 == event {print $3,$4,$8}' and letting the pattern select the lines, which is what patterns are for.
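Here is a minimal, self-contained sketch of the approaches above. The sample file, its contents and the helper name select_event are made up for illustration; only the awk commands follow the answers.

```shell
# Hypothetical sample data: column 2 holds an event file name.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  'a 19971010_1516.txt x1 x2 e f g h1' \
  'b 19980101_0000.txt y1 y2 e f g h2' > "$tmp"

# select_event EVENT FILE (hypothetical helper): print columns 3, 4 and 8
# of the lines whose second column equals EVENT, passed in with -v.
select_event() {
  awk -v event="$1" '$2 == event { print $3, $4, $8 }' "$2"
}

EVENT=19971010_1516.txt
select_event "$EVENT" "$tmp"        # x1 x2 h1

# The original attempt: the shell does not expand $EVENT inside single
# quotes, so awk compares $2 with the literal text $EVENT and prints nothing.
awk '{ if ($2 == "$EVENT") print $3, $4, $8 }' "$tmp"

# Splicing the value into the program text also works, but the value then
# becomes awk source code (a " inside it would break the program).
awk '{ if ($2 == "'"$EVENT"'") print $3, $4, $8 }' "$tmp"   # x1 x2 h1
```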
Related
I am trying to combine several things in awk.
First, using some shell variables, which is shown here.
Second, using a shell variable to match a pattern, for which the answer here points to the ~ operator.
Finally, I want to use some kind of OR operator to match either of two shell variables.
Putting it all together in foo.sh:
#!/bin/bash
START_TEXT="My start text"
END_TEXT="My end text"
awk -v start=$START_TEXT -v end=$END_TEXT '$0 ~ start || $0 ~ end { print $2 }' myfile
Which fails to run:
$ ./foo.sh
awk: fatal: cannot open file `text' for reading (No such file or directory)
So I thought the OR operator (||) does not work well with the regex ~ operator.
I guessed I might need to do the OR inside the regex.
So I tried these two:
awk -v start=$START_TEXT -v end=$END_TEXT '$0~/start|end/ { print $2 }' myfile
awk -v start=$START_TEXT -v end=$END_TEXT '$0~start|end { print $2 }' myfile
With the same failed result.
And even this fails...
awk -v start=$START_TEXT '$0~start { print $2 }' myfile
So I am doing something really wrong...
Any hints on how to achieve this?
You can do the regex OR like this:
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~ start "|" end { print $2 }' myfile
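awk knows that the right-hand operand of the ~ operator is a regex. So we can simply build one by concatenating the two strings with the | (OR) operator between them. String concatenation binds more tightly than ~, so no parentheses are needed.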
Also there's another way to pass variables into awk, like this:
awk '$0~ start "|" end { print $2 }' start="$START_TEXT" end="$END_TEXT" myfile
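This is a bit more concise, but it is less intuitive, so use it with caution. Such assignments only take effect when awk reaches them in the argument list, so the variables are not yet set inside a BEGIN block.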
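Here is a small sketch of both points, run against a made-up three-line file (the helper name match_either is hypothetical). The regex is assembled at run time from start, "|" and end, and an operand-style assignment is not yet visible in BEGIN.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'My start text one' 'middle line two' 'My end text three' > "$tmp"

START_TEXT="My start text"
END_TEXT="My end text"

# match_either START END FILE (hypothetical helper): concatenation binds
# tighter than ~, so the right-hand side is the single regex "start|end".
match_either() {
  awk -v start="$1" -v end="$2" '$0 ~ start "|" end { print $4 }' "$3"
}
match_either "$START_TEXT" "$END_TEXT" "$tmp"     # one, then three

# -v assignments happen before BEGIN; operand assignments happen only
# when awk reaches them in the argument list, i.e. after BEGIN has run.
awk -v a=set 'BEGIN { print "[" a "]" }'          # [set]
awk 'BEGIN { print "[" b "]" }' b=set             # []
```

Because the texts are used as a regex, any metacharacters they contain (such as . or parentheses) are interpreted as regex syntax, not matched literally.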
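Well, it seems @jxc pointed out my problem in the comments: the shell variables need to be quoted. Unquoted, the shell splits My start text into three words. awk then gets start=My as the assignment, takes the next word, start, as the program, and tries to open text as an input file.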
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~start || $0~end { print $2 }' myfile
That made it work!
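You can see the word splitting behind that error without awk at all. This sketch uses a hypothetical count_args helper that only reports how many arguments it receives:

```shell
START_TEXT="My start text"

# count_args (hypothetical helper): print the number of arguments received.
count_args() { echo "$#"; }

count_args -v start=$START_TEXT      # 4: -v, start=My, start, text
count_args -v start="$START_TEXT"    # 2: -v, "start=My start text"
```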
I have a file like this (this is a sample):
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
21.23.51.22|212.152.22.12|71.13.54.12|8.8.8.8
...
I have iplist.txt like this:
71.13.55.
12.33.23.
8.8.
4.2.
...
I need to grep the lines whose 3rd column starts with one of the prefixes in iplist.txt.
Like this:
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
I tried:
for ip in $(cat iplist.txt); do
awk -v var="$ip" -F '|' '{if ($3 ~ /^$var/) print $0;}' text.txt
done
But the bash variable does not work inside the /^.../ regex literal. How can I do that?
First, you can use a concatenation of strings as the regular expression; it doesn't have to be a regex literal (/.../). You can say:
'{if ($3 ~ "^" var) print $0;}'
Second, note above that you don't use a $ with variables inside awk. $ is only used to refer to fields by number (as in $3, or $somevar where somevar has a field number as its value).
Third, you can do everything in awk. That way you avoid the shell loop and don't need var at all:
awk -F'|' 'NR==FNR {a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8
81.23.45.12|212.152.22.12|71.13.55.13|8.8.8.8
61.53.54.62|212.152.22.12|71.13.55.14|8.8.8.8
EDIT
As rightly pointed out in the comments, the dots in the patterns match any character, not just a literal dot. So we need to escape them before doing the match:
awk -F'|' 'NR==FNR {gsub(/\./,"\\."); a["^" $0]; next} { for (i in a) if ($3 ~ i) {print;next} }' iplist.txt r.txt
I'm assuming that you only want to output a given line once, even if it matches multiple patterns from iplist.txt. If you want to output a line multiple times for multiple matches (as your version would have done), remove the next from {print;next}.
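To see why the escaping matters, here is a sketch with made-up files. Without escaping, the prefix 8.8. would also match a third column such as 828.1.1.1. The helper name filter_prefixes is hypothetical.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '71.13.55.' '8.8.' > "$dir/iplist.txt"
printf '%s\n' 'a|b|71.13.55.12|x' 'a|b|71.13.54.12|x' 'a|b|828.1.1.1|x' > "$dir/r.txt"

# filter_prefixes LIST DATA: the first pass stores each escaped, ^-anchored
# prefix; the second pass prints a data line once if its 3rd field matches any.
filter_prefixes() {
  awk -F'|' '
    NR == FNR { gsub(/\./, "\\."); a["^" $0]; next }
    { for (i in a) if ($3 ~ i) { print; next } }
  ' "$1" "$2"
}
filter_prefixes "$dir/iplist.txt" "$dir/r.txt"    # a|b|71.13.55.12|x

# Without the gsub, "^8.8." also matches 828.1.1.1 (. is any character).
awk -F'|' 'NR == FNR { a["^" $0]; next } { for (i in a) if ($3 ~ i) { print; next } }' \
  "$dir/iplist.txt" "$dir/r.txt"
```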
Use var directly instead of inside /^$var/, adding the ^ to the variable first:
awk -v var="^$ip" -F '|' '$3 ~ var' text.txt
By the way, the default action for a true condition is to print the current record, so {if (test) {print $0}} can often be shortened to just test.
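Here is a short sketch of this approach on one of the question's sample lines. It also shows the earlier point that $ inside awk only selects fields. The prefix value and the field number n=4 are just examples.

```shell
line='71.13.55.12|212.152.22.12|71.13.55.12|8.8.8.8'
ip='71.13.55.'

# The ^ is added in the shell; the bare condition prints matching lines.
echo "$line" | awk -v var="^$ip" -F '|' '$3 ~ var'

# The same regex built inside awk by string concatenation.
echo "$line" | awk -v var="$ip" -F '|' '$3 ~ "^" var'

# Inside awk, $ only selects fields: with n=4, $n is the fourth field.
echo "$line" | awk -v n=4 -F '|' '{ print $n }'     # 8.8.8.8
```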
Here is a way with bash, sed and grep. It's straightforward and, I think, may be a bit cleaner than awk in this case:
sed 's/\./\\&/g' iplist.txt | while IFS= read -r ip; do
  grep "^[^|]*|[^|]*|${ip}" r.txt
done
I have a directory of files with filenames of the form file000.txt to filennn.txt. I would like to be able to specify a range of file names and print the content of those files based on a match. I have achieved it with a single file pattern:
$ gawk 'FILENAME ~/file038.txt/ {print FILENAME, $0}' file*.txt
file038.txt Some 038 text here
But I cannot get a pattern that would allow me to specify a range of file names, for instance
gawk 'FILENAME ~/file[038-040].txt/ {print FILENAME, $0}' file*.txt
I'm sure I'm missing something simple here; I'm an AWK newbie. Any suggestions?
You can do some substitution on the filename, for example:
awk '{x=FILENAME;gsub(/[^0-9]/,"",x);x+=0}x>10&&x<50{print FILENAME, $0}' file*.txt
This way, files file011.txt to file049.txt would be handled by the action block (here print FILENAME, $0; put your own logic there). This assumes the path itself contains no other digits.
You can adjust the x>10&&x<50 part, for example to handle only files whose number is odd or even. Any boolean expression works there.
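Here is a runnable sketch of this idea on made-up files file037.txt to file041.txt, selecting 38 to 40 as in the question. The helper name files_in_range is hypothetical. The awk command runs inside the directory so that FILENAME contains no other digits.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for n in 037 038 039 040 041; do
  echo "Some $n text here" > "$dir/file$n.txt"
done

# files_in_range LO HI: strip every non-digit from FILENAME, then print
# the lines of files whose number lies in [LO, HI].
files_in_range() {
  (cd "$dir" && awk -v lo="$1" -v hi="$2" '
    { x = FILENAME; gsub(/[^0-9]/, "", x); x += 0 }
    x >= lo && x <= hi { print FILENAME, $0 }
  ' file*.txt)
}
files_in_range 38 40
# file038.txt Some 038 text here
# file039.txt Some 039 text here
# file040.txt Some 040 text here
```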
An odd way, but something along these lines:
awk '{ if (match(FILENAME,/file0(3[89]|40)\.txt/)) { print FILENAME, $0}}' file*.txt
Solution using gawk and a recent version of bash
bash has a feature that does what file[038-040].txt was meant to do, and it makes the code quite simple:
gawk 'FNR==1 {print FILENAME, $0} {nextfile}' file{038..040}.txt
Key points:
FNR==1 {print FILENAME, $0}
This prints the filename and the first line of each file.
{nextfile}
This saves time by skipping directly to the next file.
file{038..040}.txt
The construct {038..040} is a bash feature called brace expansion. bash will replace this with the file names that you want. If you want to test out brace expansion to see how it works, try it on the command line with this simple statement:
echo file{038..040}.txt
UPDATE 1: Mac OS X currently ships bash v3.2, which does not support leading zeros in brace expansion (zero-padded ranges need bash 4 or later).
UPDATE 2: If there are missing files and you have a modern gawk (v4.0 or later), use this instead:
gawk 'BEGINFILE{ if (ERRNO) nextfile} FNR==1 {print FILENAME, $0} {nextfile}' file{038..040}.txt
Solution using gawk with a plain POSIX shell
gawk '{n=0+substr(FILENAME,5,3)} FNR==1 && n>=38 && n<=40 {print FILENAME, $0} {nextfile}' file*.txt
Explanation:
n=0+substr(FILENAME,5,3)
Extract the number from the filename. 0+ is a trick to force awk to treat n as numeric.
n>=38 && n<=40 {print FILENAME, $0}
This selects the file based on its number and prints the filename and first line.
{nextfile}
As before, this saves time by stopping awk from reading the rest of each file.
file*.txt
This can be expanded by any POSIX shell to the list of file names.
This should work:
awk '(x=FILENAME)~/(3[89]|40)\.txt$/{print x,$0;nextfile}' file*.txt
If your awk lacks nextfile, here is another way. A ~ match evaluates to 1 or 0, so FNR must equal 1 for a matching file, and 0 (which never happens) for any other file:
awk 'FNR==((x=FILENAME)~/(3[89]|40)\.txt$/){print x,$0}' file*.txt
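Here is a quick check of the 1-or-0 behaviour that the FNR trick relies on, followed by the trick itself on made-up files file037.txt to file041.txt. The helper name first_lines is hypothetical.

```shell
# A ~ match is an expression whose value is 1 (match) or 0 (no match).
awk 'BEGIN {
  print ("file039.txt" ~ /(3[89]|40)\.txt$/)    # 1
  print ("file041.txt" ~ /(3[89]|40)\.txt$/)    # 0
}'

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for n in 037 038 039 040 041; do
  printf 'first %s\nsecond %s\n' "$n" "$n" > "$dir/file$n.txt"
done

# first_lines: FNR == 1 only in matching files, FNR == 0 never happens.
first_lines() {
  (cd "$dir" && awk 'FNR == ((x = FILENAME) ~ /(3[89]|40)\.txt$/) { print x, $0 }' file*.txt)
}
first_lines
# file038.txt first 038
# file039.txt first 039
# file040.txt first 040
```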
I have a file with some lines like these:
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
I want to extract the parts after the = but without the surrounding quotes. I tried with gsub like this:
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"|'/, "", $2); print $2}'
This ends with a -bash: syntax error near unexpected token ')' error. It works fine when matching a single character, /"/ or /'/, but not when I try to match either one. What am I doing wrong?
If you are just trying to remove the punctuation, you can do it as below:
# remove all punctuation
awk -F= '{print $2}' n.dat | tr -d '[:punct:]'
# only remove single and double quotes
awk -F= '{print $2}' n.dat | tr -d "'\""
Explanation:
tr -d "'\"" deletes all single and double quotes.
tr -d '[:punct:]' deletes all characters in the punctuation class, so it also removes the dots in mydomain.net and mykey.pem.
Sample output from the second command above (quotes removed):
myenv
mydomain.net
mykey.pem
The problem is not with awk, but with bash. The single quote inside the gsub closes the open quote, so the awk program argument ends at gsub(/"|. After that, bash reads /, (still part of the same word), then "", (which becomes just a comma), then $2. It then hits an unmatched close paren, hence the syntax error. Try replacing the single quote with '"'"'. That way bash terminates the string, adds a double-quoted single quote, and then reopens another single-quoted string.
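Here is a sketch of that fix on the sample lines from the question; the temporary file and the strip_quotes helper are made up. The second variant uses '\'', another common way to write a single quote inside a single-quoted shell string.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'ENVIRONMENT="myenv"' "ENV_DOMAIN='mydomain.net'" 'LOGIN_KEY=mykey.pem' > "$tmp"

# '"'"' = close the single quotes, add a double-quoted ', reopen them.
# awk itself receives: { gsub(/"|'/, "", $2); print $2 }
strip_quotes() {
  awk -F= '{ gsub(/"|'"'"'/, "", $2); print $2 }' "$1"
}
strip_quotes "$tmp"         # myenv, mydomain.net, mykey.pem

# '\'' gives awk the same program: close, escaped quote, reopen.
awk -F= '{ gsub(/"|'\''/, "", $2); print $2 }' "$tmp"
```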
Is awk really a requirement? If not, why don't you use a simple sed command:
sed -rn -e "s/^[^#]+='(.*)'$/\1/p" \
-e "s/^[^#]+=\"(.*)\"$/\1/p" \
-e "s/^[^#]+=(.*)/\1/p" data
This might seem over-engineered, but it handles embedded quotes properly:
sh$ cat data
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
PASSWD="good ol'passwd"
sh$ sed -rn -e "s/^[^#]+='(.*)'/\1/p" -e "s/^[^#]+=\"(.*)\"/\1/p" -e "s/^[^#]+=(.*)/\1/p" data
myenv
mydomain.net
mykey.pem
good ol'passwd
You can use awk like this:
awk -F "=['\"]?|['\"]" '{print $2}' file
myenv
mydomain.net
mykey.pem
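Here is a quick sketch of what this field separator does. It includes a made-up line with an embedded quote, like the one in the sed answer above. The regex FS also splits at that inner quote, so only the part before it is printed. The helper name split_values is hypothetical.

```shell
# split_values (hypothetical helper): FS is the regex
# "= optionally followed by a quote" OR "a lone quote"; print field 2.
split_values() { awk -F "=['\"]?|['\"]" '{ print $2 }'; }

printf '%s\n' 'ENVIRONMENT="myenv"' "ENV_DOMAIN='mydomain.net'" \
  'LOGIN_KEY=mykey.pem' "PASSWD=\"good ol'passwd\"" | split_values
# myenv
# mydomain.net
# mykey.pem
# good ol
```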
This will work with your awk
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"/,"",$2);gsub(q,"",$2); print $2}' q=\' file
It is the single quote in the expression that creates the problem. Put it in a variable and it will work.
I did the following:
awk -F"=\"|='|'|\"|=" '{print $2}' file
myenv
mydomain.net
mykey.pem
This tells awk to use any of =", =', ', " or a plain = as the field separator.
This is because the awk program is normally enclosed in single quotes when run from the command line. A single quote inside the script ends that quoting early and trips up the shell. Special tricks are needed to use single quotes as strings inside the program. See Shell-Quoting Issues in the GNU Awk Manual.
One trick is to save the match string as a variable:
awk -F\= -v s="'|\"" '{gsub(s, "", $2); print $2}' file
Output:
myenv
mydomain.net
mykey.pem
Can we pass NR from awk to a variable?
I have a script which goes like this:
awk -v { blah blah..
..........
count--
print count
}
if (count==0)
{print "The end of function"
print NR
exit
}
This is the awk part of the code. I want to pass NR to var2 and use it as:
sed -n ''"$var1"','"$var2"'p'
This has to be reused several times!
Thanks for your replies.
If you only want to print a certain subset of lines you're almost there. The -v flag is the way to go.
awk -v var1=15 -v var2=25 'NR>=var1 && NR<=var2 {blah blah ...}'
Of course, you have to change 15 and 25 to what you need. Note that inside the awk program the variables must not be wrapped in quotes; "var1" would be the literal string var1, not the variable.
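Here is a self-contained sketch comparing this awk range filter with the sed -n form from the question. The input is generated on the fly, and gen_lines and print_range are hypothetical helpers. The extra NR > var2 { exit } rule is an optional addition that stops reading once the range is done.

```shell
# gen_lines (hypothetical): print the numbers 1..30, one per line.
gen_lines() { awk 'BEGIN { for (i = 1; i <= 30; i++) print i }'; }

# print_range FROM TO: lines FROM..TO of stdin. var1/var2 are unquoted
# inside the program, so awk uses the variables, not literal strings.
print_range() {
  awk -v var1="$1" -v var2="$2" 'NR > var2 { exit } NR >= var1'
}

gen_lines | print_range 15 18           # 15 16 17 18, one per line
var1=15 var2=18
gen_lines | sed -n "${var1},${var2}p"   # the same four lines
```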
As others have suggested, there are better ways to accomplish the overall goal.
However, in order to answer your specific question:
var2=$(awk 'END {print NR}' inputfile)
and add anything else you may need within the AWK script.
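For the original goal of feeding an awk line number into sed, here is a minimal sketch. awk prints NR when it reaches the line of interest and exits, and the shell captures it. The sample file and the search for "end" are assumptions standing in for the question's elided script.

```shell
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' alpha beta 'The end' gamma > "$tmp"

# Capture the line number of the first line containing "end".
var1=1
var2=$(awk '/end/ { print NR; exit }' "$tmp")
echo "$var2"                           # 3

# Reuse it as often as needed, e.g. in the question's sed range.
sed -n "${var1},${var2}p" "$tmp"       # alpha, beta, The end
```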
I don't know what you want to achieve with awk, sed and the NR variable. Do you mean the number of lines of the file?
This command gets it:
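wc -l < infile
Reading from standard input makes wc print only the count, so no sed is needed. Then pass it to awk with the -v switch and use it as you want. The + 0 below turns the value into a number, which drops the leading blanks some wc implementations print. The next command prints 10 because infile has ten lines on my computer.
awk -v num_lines="$(wc -l < infile)" 'BEGIN { print num_lines + 0 }'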