Passing the value of NR to a variable in AWK - regex

Can we pass NR to a variable in awk?
I have a script which goes like this:
awk -v { blah blah..
..........
count--
print count
}
if (count==0)
{print "The end of function"
print NR
exit
}
This is the awk part of the code. I want to pass NR to var2, as in:
sed -n ''"$var1"','"$var2"'p'
Which has to be reused several times!
Thanks for your replies.

If you only want to print a certain subset of lines you're almost there. The -v flag is the way to go.
awk -v var1=15 -v var2=25 'NR>=var1 && NR<=var2 {blah blah ...}'
Of course, you have to change 15 and 25 to the values you need. Note that inside the awk script the variables shouldn't be enclosed in quotes.
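A minimal sketch of the range filter above, assuming a hypothetical six-line input generated with printf instead of a real file. The name `lines` is made up for illustration; the pattern has no action, so awk prints each matching record.

```shell
# Hypothetical sample input: six one-letter lines.
# var1 and var2 are awk variables set with -v; inside the script they are
# written bare (no quotes, no $).
lines=$(printf '%s\n' a b c d e f |
  awk -v var1=2 -v var2=4 'NR >= var1 && NR <= var2')

printf '%s\n' "$lines"
```

Only records 2 through 4 are printed.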

As others have suggested, there are better ways to accomplish the overall goal.
However, in order to answer your specific question:
var2=$(awk 'END {print NR}' inputfile)
and add anything else you may need within the AWK script.
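As a rough sketch of how the captured value could then feed the sed call from the question: the sample file, the value of var1 and the stop condition `$1 == 7` are all assumptions standing in for the asker's real logic.

```shell
# Hypothetical sample input: ten numbered lines in a temporary file.
infile=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$infile"

var1=3
# Stand-in stop condition ($1 == 7): print the current record number and exit.
var2=$(awk '$1 == 7 { print NR; exit }' "$infile")

# Double quotes let the shell expand both variables inside the sed address.
result=$(sed -n "${var1},${var2}p" "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

Because var2 is an ordinary shell variable after the command substitution, it can be reused in as many later commands as needed.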

I don't know what you want to achieve with awk, sed and the NR variable. Do you mean the number of lines of the file?
This command gets it:
wc -l infile | sed -e 's/ .*$//'
So, pass it to awk with the -v switch and use it however you want. The next command prints 10 because infile has ten lines on my computer.
awk -v num_lines=$(wc -l infile | sed -e 's/ .*$//') 'BEGIN { print num_lines }'
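A possible variation, shown only as a sketch with a hypothetical ten-line temporary file: when wc reads from standard input it prints just the count, so the sed step can be dropped. The names `infile` and `shown` are illustrative.

```shell
# Hypothetical sample input: ten numbered lines in a temporary file.
infile=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$infile"

# Reading from stdin makes wc print only the count, with no file name.
num_lines=$(wc -l < "$infile")

# Some wc implementations pad the count with spaces; adding 0 makes awk
# treat the value as a number.
shown=$(awk -v num_lines="$num_lines" 'BEGIN { print num_lines + 0 }')
printf '%s\n' "$shown"

rm -f "$infile"
```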


awk: match regex with or operator of two shell variables

I am trying to use different stuff with awk.
First, passing shell variables into awk, which is covered in another answer.
Second, using a shell variable to match a pattern, for which the suggestion is to use the ~ operator.
Finally, I want to use some kind of or operator to match two shell variables.
Putting all together in foo.sh:
#!/bin/bash
START_TEXT="My start text"
END_TEXT="My end text"
awk -v start=$START_TEXT -v end=$END_TEXT '$0 ~ start || $0 ~ end { print $2 }' myfile
Which fails to run:
$ ./foo.sh
awk: fatal: cannot open file `text' for reading (No such file or directory)
So I think the OR-operator (||) does not work well with regex ~ operator.
I was guessing I may need to do the OR-thing inside the regex.
So I tried these two:
awk -v start=$START_TEXT -v end=$END_TEXT '$0~/start|end/ { print $2 }' myfile
awk -v start=$START_TEXT -v end=$END_TEXT '$0~start|end { print $2 }' myfile
With same failed result.
And even this thing fails...
awk -v start=$START_TEXT '$0~start { print $2 }' myfile
So I am doing something really wrong...
Any hints how to achieve this?
You can do the regex OR like this:
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~ start "|" end { print $2 }' myfile
awk knows that the operand of the ~ operator is a regex, so we can build it simply by concatenating the two strings with the | (alternation) operator in between.
Also there's another way to pass variables into awk, like this:
awk '$0~ start "|" end { print $2 }' start="$START_TEXT" end="$END_TEXT" myfile
This is more concise, but it is also less intuitive, so use it with caution. Note that variables assigned this way are not yet set when the BEGIN block runs.
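Well, it seems @jxc pointed out my problem in the comments: the shell variables need to be quoted. Unquoted, $START_TEXT is split into three words, so awk sees only start=My as the assignment, takes the next word as the program and then tries to open `text' as an input file.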
awk -v start="$START_TEXT" -v end="$END_TEXT" '$0~start || $0~end { print $2 }' myfile
That made it work!
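To make the two working forms easy to compare, here is a small sketch run against a hypothetical three-line sample file (the file contents and the names `with_or` and `with_alt` are assumptions, not part of the original question).

```shell
START_TEXT="My start text"
END_TEXT="My end text"

# Hypothetical sample file: one line per case.
myfile=$(mktemp)
printf '%s\n' 'My start text here' 'nothing relevant' 'My end text there' > "$myfile"

# Two separate matches joined with ||.
with_or=$(awk -v start="$START_TEXT" -v end="$END_TEXT" \
  '$0 ~ start || $0 ~ end { print $2 }' "$myfile")

# One dynamic regex built by concatenation; the parentheses are only there
# for readability.
with_alt=$(awk -v start="$START_TEXT" -v end="$END_TEXT" \
  '$0 ~ (start "|" end) { print $2 }' "$myfile")

printf '%s\n' "$with_or" "$with_alt"
rm -f "$myfile"
```

Both forms select the first and third lines and print their second field.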

Add delimiters at specific indexes

I want to add a delimiter in some indexes for each line of a file.
I have a file with data:
10100100010000
20200200020000
And I know the offset of each column (2, 5 and 9)
With this sed command: sed 's/\(.\{2\}\)/&,/;s/\(.\{6\}\)/&,/;s/\(.\{11\}\)/&,/' myFile
I get the expected output:
10,100,1000,10000
20,200,2000,20000
but with a large number of columns (~200) and rows (300k) it is really slow.
Is there an efficient alternative?
1st solution: With GNU awk could you please try following:
awk -v OFS="," '{$1=$1}1' FIELDWIDTHS="2 3 4 5" Input_file
2nd Solution: Using sed try following.
sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/' Input_file
3rd solution: awk solution using substr.
awk 'BEGIN{OFS=","} {print substr($0,1,2) OFS substr($0,3,3) OFS substr($0,6,4) OFS substr($0,10,5)}' Input_file
In the substr solution above, I have taken 5 characters with substr($0,10,5). If you want everything from the 10th position onward, use substr($0,10), which returns the rest of the line.
Output will be as follows.
10,100,1000,10000
20,200,2000,20000
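With around 200 columns, writing each substr call by hand gets unwieldy. The following is only a sketch of how the substr idea might be generalised in portable awk: it assumes the cut positions are passed as a space-separated list of cumulative offsets (2 5 9, as in the question), and the names `offsets`, `cut` and `out` are made up for illustration. No claim is made here about its speed relative to the other solutions.

```shell
out=$(printf '%s\n' 10100100010000 20200200020000 |
  awk -v offsets="2 5 9" -v OFS="," '
    # Split the offset list once, before any input is read.
    BEGIN { n = split(offsets, cut, " ") }
    {
      line = ""; prev = 0
      # Append each slice between consecutive offsets, followed by OFS.
      for (i = 1; i <= n; i++) {
        line = line substr($0, prev + 1, cut[i] - prev) OFS
        prev = cut[i]
      }
      # Whatever follows the last offset becomes the final column.
      print line substr($0, prev + 1)
    }')

printf '%s\n' "$out"
```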
Modifying your sed command to make it add all the separators in one shot would likely make it perform better :
sed 's/^\(.\{2\}\)\(.\{3\}\)\(.\{4\}\)/\1,\2,\3,/' myFile
Or with extended regular expression:
sed -E 's/(.{2})(.{3})(.{4})/\1,\2,\3,/' myFile
Output:
10,100,1000,10000
20,200,2000,20000
With GNU awk for FIELDWIDTHS:
$ awk -v FIELDWIDTHS='2 3 4 *' -v OFS=',' '{$1=$1}1' file
10,100,1000,10000
20,200,2000,20000
You'll need a newer version of gawk for * at the end of FIELDWIDTHS to mean "whatever's left"; with older versions, just choose a large number like 999.
If you start the substitutions from the back, you can use the number flag to s to specify which occurrence of any character you'd like to append a comma to:
$ sed 's/./&,/9;s/./&,/5;s/./&,/2' myFile
10,100,1000,10000
20,200,2000,20000
You could automate that a bit further by building the command with a printf statement:
printf -v cmd 's/./&,/%d;' 9 5 2
sed "$cmd" myFile
or even wrap that in a little shell function so we don't have to care about listing the columns in reverse order:
gencmd() {
local arr
# Sort arguments in descending order
IFS=$'\n' arr=($(sort -nr <<< "$*"))
printf 's/./&,/%d;' "${arr[@]}"
}
sed "$(gencmd 2 5 9)" myFile
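If bash arrays are not available, the same idea can probably be expressed with a pipeline instead. This is a sketch only; `gencmd_portable` is a hypothetical name, not something from the answer above.

```shell
gencmd_portable() {
  # Sort the offsets in descending order, one per line, then emit one
  # substitution per offset so later commas do not shift earlier positions.
  printf '%s\n' "$@" | sort -nr | while read -r n; do
    printf 's/./&,/%d;' "$n"
  done
}

cmd=$(gencmd_portable 2 5 9)
out=$(printf '%s\n' 10100100010000 | sed "$cmd")
printf '%s\n' "$cmd" "$out"
```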

Regular expression to search column in text file

I am having trouble getting a regular expression that will search for an input term in the specified column. If the term is found in that column, then it needs to output that whole line.
These are my variables:
sreg = search word #Example: Adam
file = text file #Example: Contacts.txt
sfield = column number #Example: 1
The text file is in this format, with a space as the field separator and many contact entries:
First Last Email Phone Category
Adam aster junfmr# 8473847548 word
Jeff Williams 43wadsfddf# 940342221995 friend
JOhn smart qwer#qwer 999999393 enemy
yooun yeall adada 111223123 other
zefir sentr jjdirutk#jd 8847394578 other
I've tried with no success:
grep "$sreg" "$file" | cut -d " " -f"$sfield"-"$sfield"
awk -F, '{ if ($sreg == $sfield) print $0 }' "$file"
awk -v s="$sreg" -v c="$sfield" '$c == s { print $0 }' "$file"
Thanks for any help!
awk may be the best solution for this:
awk -v field="$field" -v name="$name" '$field==name' "$file"
This checks if the field number $field has the value $name. If so, awk automatically prints the full line that contains it.
For example:
$ field=1
$ name="Adam"
$ file="your_file"
$ awk -v field="$field" -v name="$name" '$field==name' "$file"
Adam aster junfmr# 8473847548 word
As you can see, we give the parameters using -v var="$bash_var", so that you can use them inside awk.
Also, the space is the field separator, so you don't need to specify it since it is the default.
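To illustrate the difference between an exact comparison and a regex match on a column chosen at run time, here is a sketch using a hypothetical three-line excerpt of the contacts file. The short variable names `c` and `s` and the result variables are illustrative only.

```shell
# Hypothetical excerpt of the contacts file from the question.
file=$(mktemp)
printf '%s\n' \
  'First Last Email Phone Category' \
  'Adam aster junfmr# 8473847548 word' \
  'Jeff Williams 43wadsfddf# 940342221995 friend' > "$file"

sfield=1
# Exact string comparison on the chosen column.
exact=$(awk -v c="$sfield" -v s="Adam" '$c == s' "$file")
# A partial term does not satisfy ==, so nothing is printed.
partial_exact=$(awk -v c="$sfield" -v s="Ad" '$c == s' "$file")
# The same partial term does match when it is treated as a regex with ~.
partial_regex=$(awk -v c="$sfield" -v s="Ad" '$c ~ s' "$file")

printf '%s\n' "$exact" "$partial_regex"
rm -f "$file"
```

Which operator is appropriate depends on whether the search term is meant as a whole value or as a pattern.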
This works for me:
awk -v f="$sfield" -v reg="$sreg" '{if ($f ~ reg) {print $0}}' "$file"
The major problem is that you need an indirection from $sfield (e.g. "1") to $($sfield) (e.g. $1).
I tried using backticks `, and also ${!sfield}, but they don't work in awk, as awk does not accept them. Finally I found the way to pass a variable into awk: convert it to an awk internal variable (using -v).
Within awk, I found you cannot access shell variables directly. So I had to pass $sreg as well.
Update: I think using "~" instead of "==" is better because the original requirement mentioned matching a regular expression.
For example,
sreg=Ad

how to use awk to match the complete word

I have this script which takes the value of variable $userId as input & uses it in below awk command.
gzip -cd "input.csv.gz"|/usr/xpg4/bin/awk -v search="$userId" -F, 'BEGIN{ OFS=","} { if( match($4, search)) print $0 }' >>$outputFileNameUser
format of input.csv.gz is like below:
gzip -cd input.csv.gz|head -4
Circle,Date,Time,SubscriberId,OperatorId,VoucherNumber,Status
UPE,01-JUN-15,20:23:39,9936596081,,1161504025632821,Used
UPE,01-JUN-15,20:23:39,7755802655,,1161504038349788,Used
UPE,01-JUN-15,20:23:39,9793948511,,1161504027670339,Used
This awk command matches the 4th field of the input file (i.e. SubscriberId) against the variable $userId. In the first row the value of $4 is 9936596081.
So even if I provide "6596" (which is part of 9936596081) as $userId, it matches the first row. I want to match the complete number (9936596081), not any part of it. I tried it like this:
gzip -cd "${outputFileName}"|/usr/xpg4/bin/awk -v search="$userId" -F, 'BEGIN{ OFS=","} { if( $4 == "search")) print $0 }' >>$outputFileNameUser
but it didn't work. Could you please help me with this?
Try it like this:
awk -v search="$userId" -F, '$4==search'
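A small sketch contrasting the two behaviours, assuming the sample rows from the question are held in a shell variable rather than a gzip file. The names `rows`, `substring_hits`, `exact_partial` and `exact_full` are illustrative; each command counts the matching rows.

```shell
# Sample rows taken from the question, held in a variable for the sketch.
rows='Circle,Date,Time,SubscriberId,OperatorId,VoucherNumber,Status
UPE,01-JUN-15,20:23:39,9936596081,,1161504025632821,Used
UPE,01-JUN-15,20:23:39,7755802655,,1161504038349788,Used'

# match() treats the search value as a regex, so a fragment is enough.
substring_hits=$(printf '%s\n' "$rows" |
  awk -F, -v search=6596 'match($4, search) { n++ } END { print n + 0 }')

# == compares the whole field, so the fragment no longer matches.
exact_partial=$(printf '%s\n' "$rows" |
  awk -F, -v search=6596 '$4 == search { n++ } END { print n + 0 }')

# The complete number matches exactly one row.
exact_full=$(printf '%s\n' "$rows" |
  awk -F, -v search=9936596081 '$4 == search { n++ } END { print n + 0 }')

printf '%s %s %s\n' "$substring_hits" "$exact_partial" "$exact_full"
```

When both sides look numeric, awk compares them as numbers; if values with leading zeros must be kept distinct, a string comparison such as $4 "" == search is one way to force that.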

Using a user-set variable in an awk command

I'm trying to use awk with a user-defined variable ($EVENT, where $EVENT is a filename and also a column in a textfile) in the if condition, but it doesn't seem to recognize the variable. I've tried with various combinations of ', ", { and ( but nothing seems to work.
EVENT=19971010_1516.txt
awk '{if ($2=="$EVENT") print $3,$4,$8}' FILENAME.txt > output.txt
Is it possible to use user-defined variables in awk commands? If so, how does the syntax work?
You cannot use $FOO directly in your awk code, because awk treats it as the field (column) numbered by the awk variable FOO, and that awk variable is empty. To use a shell variable, pass it in with -v, like:
awk -v event="$EVENT" '{print event}' file
You can do:
awk '$2==event {print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt
awk -v event="$EVENT" '$2==event {print $3,$4,$8}' FILENAME.txt > output.txt
See this post for more info:
How do I use shell variables in an awk script?
If you want to include the variable's value in the awk script literally, it has to be expanded by the shell, which means it must sit inside double quotes rather than single quotes (single quotes do not expand variables). So something like awk '{if ($2=="'"$EVENT"'") print $3,$4,$8}' FILENAME.txt > output.txt. This keeps single quotes around the rest of the awk script, to avoid needing to escape the $ characters, but switches to double quotes for the event variable.
That being said you almost certainly want to expose the shell variable to awk as an awk variable which means you want to use the -v flag to awk. So something like awk -vevent="$EVENT" '{if ($2==event) print $3,$4,$8}' FILENAME.txt > output.txt. (Alternatively you could use something like awk '{if ($2==event) print $3,$4,$8}' event="$EVENT" FILENAME.txt > output.txt.)
You could also simplify your awk body a bit by using '$2 == event {print $3,$4,$8}' and let patterns do what they are supposed to do.
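As a quick check of the -v approach, the sketch below runs it on two hypothetical eight-column lines fed through printf instead of FILENAME.txt, alongside the original form that compares against the literal text $EVENT. The sample data and the names `out` and `literal` are assumptions.

```shell
EVENT=19971010_1516.txt

# Passing the shell variable in with -v: the first sample line matches.
out=$(printf '%s\n' \
  'r1 19971010_1516.txt c3 c4 c5 c6 c7 c8' \
  'r2 19971011_0000.txt d3 d4 d5 d6 d7 d8' |
  awk -v event="$EVENT" '$2 == event { print $3, $4, $8 }')

# Inside single quotes the shell does not expand $EVENT, so awk compares
# column 2 with the literal string "$EVENT" and nothing matches.
literal=$(printf '%s\n' \
  'r1 19971010_1516.txt c3 c4 c5 c6 c7 c8' |
  awk '$2 == "$EVENT" { print $3, $4, $8 }')

printf '%s\n' "$out"
```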