I'm new to AWK. Does anyone know how to print the line number of the last match in a file using awk?
Here's a small part of the Test.txt file content:
CLOSE #140,value=140
WAIT = #14039,value=143
CLOSE #140,value=144
WAIT #0,value=155
WAIT = #14039,value=158
CLOSE #140,value=160
This is the code I have so far.
This works for the first matching line:
awk -F= '{if($NF >= 143 && $NF <= 158){print NR; exit}}' Test.txt
But for the last matching line:
awk -F= '{if($NF >= 143 && $NF <= 158){a=$0}} END{print a,NR}' Test.txt
it only prints the whole matching line and the line number of the last line of the file.
How can I get the line number of the last match?
Please help me with some advice.
Use a = NR instead of a = $0 (because it's the line number you want to remember, not the line itself).
Apart from that, it would arguably be more awkish to write
awk -F= '$NF >= 143 && $NF <= 158 { a = NR } END { print a }' Test.txt
{if(){}} is a bit ugly.
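As a minimal sketch of the same idea, the snippet below records both the first and the last matching line number in one pass over the sample data from the question. The temporary directory and the names first, last and result are illustrative assumptions, not part of the original posts.

```shell
# Sketch: first and last matching line numbers in a single pass.
tmp=$(mktemp -d)
cat > "$tmp/Test.txt" <<'EOF'
CLOSE #140,value=140
WAIT = #14039,value=143
CLOSE #140,value=144
WAIT #0,value=155
WAIT = #14039,value=158
CLOSE #140,value=160
EOF

# "first" is set only once; "last" is overwritten on every match.
result=$(awk -F= '
  $NF >= 143 && $NF <= 158 { if (!first) first = NR; last = NR }
  END { print first, last }
' "$tmp/Test.txt")

echo "$result"   # prints: 2 5
rm -rf "$tmp"
```

If nothing matches, both variables stay unset and the END block prints only the separator, so a real script would probably want to test for that case.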
I have 2 files,
file1:
YARRA2
file2:
59204.9493055556
59205.5930555556
So, file1 has 1 line and file2 has 2 lines. If file1 has 1 line, and file2 has more than 1 line, I want to repeat the lines in file1 according to the number of lines in file2.
So, my code is this:
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
but the output is
YARRA2
Any idea? I have also tried with
awk '{ if( NR<2 && $eprows>1 ) {print} {print}}' file1
but it is the same
You may use this awk solution:
awk '
NR == FNR {
   ++n2
   next
}
{
   s = $0
   print;
   ++n1
}
END {
   if (n1 == 1)
      for (n1=2; n1 <= n2; ++n1)
         print s
}' file2 file1
YARRA2
YARRA2
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
Oops! You stepped hip-deep in mixed languages.
The eprows variable is a shell variable. It's not accessible to other processes except through the environment, unless explicitly passed somehow. The awk program is inside single-quotes, which would prevent interpreting eprows even if used correctly.
The value of a shell variable is obtained with $, so
echo $eprows
2
One way to insert the value into your awk script is by interpolation:
awk '{ if( NR<2 && '"$eprows"'>1 ) {print} {print}}' file1
That uses a lesser known trick: you can switch between single- and double-quotes as long as you don't introduce spaces. Because double-quoted strings in the shell are interpolated, awk sees
{ if( NR<2 && 2>1 ) {print} {print} }
Awk also lets you pass values to awk variables on the command line, thus:
awk -v eprows="$eprows" '{ if( NR<2 && eprows >1 ) {print} {print}}' file1
but you'd have nicer awk this way:
awk -v eprows="$eprows" 'NR < 2 && eprows > 1 { print } { print }' file1
whitespace and brevity being elixirs of clarity.
That works because in the awk pattern / action paradigm, pattern is anything that can be reduced to true/false. It need not be a regex, although it usually is.
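To make the shell-versus-awk variable point concrete, here is a small sketch that runs the same comparison twice, once without passing the value and once with -v. The names without_v, with_v and r are made up for this illustration.

```shell
# Sketch: a shell variable is invisible inside a single-quoted awk program
# unless it is passed in explicitly, for example with -v.
eprows=2

# Here awk sees its own, unset variable named eprows (treated as 0).
without_v=$(echo YARRA2 | awk '{ r = (eprows > 1) ? "yes" : "no"; print r }')

# Here -v copies the shell value into an awk variable of the same name.
with_v=$(echo YARRA2 | awk -v eprows="$eprows" '{ r = (eprows > 1) ? "yes" : "no"; print r }')

echo "without -v: $without_v"   # prints: without -v: no
echo "with -v: $with_v"         # prints: with -v: yes
```

This is why the original command printed YARRA2 only once: the condition eprows>1 was always false inside awk.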
One awk idea:
awk '
FNR==NR { cnt++; next }             # count number of records in 1st file
                                    # no specific processing for 2nd file => just scan through to end of file
END { if (FNR==1 && cnt >=2)        # if 2nd file has just 1 record (ie, FNR==1) and 1st file had 2+ records then ...
         for (i=1;i<=cnt;i++)       # for each record in 1st file ...
             print                  # print current (and only) record from 2nd file
    }
' file2 file1
This generates:
YARRA2
YARRA2
I want to modify the following script:
awk 'NR>242 && $1 =='$t' {print $4, "\t" '$t'}' test.txt > file
I want to add a condition for the first "1 to 121" data (corresponding to the first 121 points) and then for the "122 to 242" data (which corresponds to the other 121 points).
So it becomes:
when NR>242, take the values corresponding to rows 1 to 121 and print them to file1
when NR>242, take the values corresponding to rows 122 to 242 and print them to file2
Thanks!
Generic solution: list the line numbers at which the output should switch in the awk variable lines. Each time FNR reaches one of those numbers, the output file counter is incremented (file1 to file2, file2 to file3, and so on), so the listed line itself is the first one written to the next file. For example, use lines="122,243" to send rows 1 to 121 to file1 and rows 122 to 242 to file2.
awk -v val="$t" -v lines="121,242" -v count=1 '
BEGIN{
  num=split(lines,arr,",")
  for(i=1;i<=num;i++){
    line[arr[i]]                  # remember each line number at which to switch files
  }
  outputfile="file" count
}
(FNR in line){                    # a listed line number starts the next output file
  close(outputfile)
  outputfile="file" (++count)
}
($1 == val){
  print $4 "\t" val > (outputfile)
}
' Input_file
With your shown samples, please try the following. Matching lines found in lines 1 to 242 are written to file1, and matching lines from line 243 onwards are written to file2. The shell variable t is passed into the awk variable val.
awk -v val="$t" '
FNR==1{
  outputfile="file1"
}
FNR==243{
  outputfile="file2"
}
($1 == val){
  print $4 "\t" val > (outputfile)
}
' Input_file
$ awk -v val="$t" '{c=int((NR-1)%242/121)+1}
$1==val {print $4 "\t" $1 > ("output" c)}' file
This should send the first, third, etc. blocks of 121 records to output1 and the second, fourth, etc. blocks of 121 records to output2, for the records that satisfy the condition.
If you want to skip the first two blocks (the first 242 records), just add && NR>242 to the existing condition.
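The block arithmetic is easy to check in isolation. The sketch below evaluates the same expression for a few record numbers at the block boundaries; the list of record numbers and the names nr, out and sep are chosen for illustration only.

```shell
# Sketch: which output number the block formula assigns to selected record numbers.
result=$(awk 'BEGIN {
  n = split("1 121 122 242 243 363 364 484", nr, " ")
  for (i = 1; i <= n; i++) {
    c = int((nr[i] - 1) % 242 / 121) + 1   # same expression as in the answer
    out = out sep c
    sep = " "
  }
  print out
}')
echo "$result"   # prints: 1 1 2 2 1 1 2 2
```

Records 1 to 121 map to 1, records 122 to 242 map to 2, and the pattern repeats every 242 records.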
I have two files.
File 1 includes various types of SeriesDescriptions
"SeriesDescription": "Type_*"
"SeriesDescription": "OtherType_*"
...
File 2 contains information with only one SeriesDescription
"Name":"Joe"
"Age":"18"
"SeriesDescription":"Type_(Joe_text)"
...
I want to
compare the two files and find the lines that match for "SeriesDescription" and
print the line number of the matched text from File 1.
Expected Output:
"SeriesDescription": "Type_*" 24 11 (the correct line numbers in my files)
"SeriesDescription" will always be found on line 11 of File 2. I am having trouble matching given the * and have also tried changing it to .* without luck.
Code I have tried:
grep -nf File1.txt File2.txt
Successfully matches, but I want the line number from File1
awk 'FNR==NR{l[$1]=NR; next}; $1 in l{print $0, l[$1], FNR}' File2.txt File1.txt
This finds a match and prints the line number from both files, however, this is matching on the first column and prints the last line from File 1 as the match (since every line has the same column 1 for File 1).
awk 'FNR==NR{l[$2]=$3;l[$2]=NR; next}; $2 in l{print $0, l[$2], FNR}' File2.txt File1.txt
Does not produce a match.
I have also tried various settings of FS=":" without luck. I am not sure if the trouble is coming from the regex or the use of "" in the files or something else. Any help would be greatly appreciated!
With your shown samples, please try following. Written and tested in GNU awk, should work in any awk.
awk '
{ val="" }
match($0,/^[^_]*_/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+/,"",val)
}
FNR==NR{
  if(val){
    arr[val]=$0 OFS FNR
  }
  next
}
(val in arr){
  print arr[val] OFS FNR
}
' SeriesDescriptions file2
With your shown samples output will be:
"SeriesDescription": "Type_*" 1 3
Explanation: Adding detailed explanation for above.
awk '                                ##Starting awk program from here.
{ val="" }                           ##Nullifying val here.
match($0,/^[^_]*_/){                 ##Using match to match value till 1st occurrence of _ here.
  val=substr($0,RSTART,RLENGTH)      ##Creating val which has sub string of above matched regex.
  gsub(/[[:space:]]+/,"",val)        ##Globally substituting spaces with NULL in val here.
}
FNR==NR{                             ##This will execute when first file is being read.
  if(val){                           ##If val is NOT NULL.
    arr[val]=$0 OFS FNR              ##Create arr with index of val, which has value of current line OFS and FNR in it.
  }
  next                               ##next will skip all further statements from here.
}
(val in arr){                        ##Checking if val is present in arr then do following.
  print arr[val] OFS FNR             ##Printing arr value with OFS, FNR value.
}
' SeriesDescriptions file2           ##Mentioning Input_file name here.
Bonus solution: If the above works for you AND the match occurs only once in file2, you can exit the program after the first match to make it faster. In that case, write the code as follows.
awk '
{ val="" }
match($0,/^[^_]*_/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+/,"",val)
}
FNR==NR{
  if(val){
    arr[val]=$0 OFS FNR
  }
  next
}
(val in arr){
  print arr[val] OFS FNR
  exit
}
' SeriesDescriptions file2
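To see why the two differently spaced spellings end up matching, the following sketch prints only the key that the match()/substr()/gsub() steps produce for each sample line. The "(no key)" marker and the variable keys are assumptions added for the illustration.

```shell
# Sketch: show the normalized key that match()/substr() extract from each line.
keys=$(printf '%s\n' \
  '"SeriesDescription": "Type_*"' \
  '"SeriesDescription":"Type_(Joe_text)"' \
  '"Name":"Joe"' |
awk '
  { val = "" }
  match($0, /^[^_]*_/) {
    val = substr($0, RSTART, RLENGTH)   # text up to and including the first underscore
    gsub(/[[:space:]]+/, "", val)       # drop spaces so both spellings compare equal
  }
  { print (val == "" ? "(no key)" : val) }
')
printf '%s\n' "$keys"
```

The first two lines yield the same key, "SeriesDescription":"Type_, while the line without an underscore yields no key at all, so it is never stored or looked up.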
I am trying to match every line of file against column 3 of bigfile. The awk below handles exact matches; the problem is that a line with no exact match should still count as a match if its value is within plus or minus 10 of a coordinate. I am not sure how to tell awk that, when no exact match is found, it should try the coordinates plus or minus 10. If no match is found after that, then there is no match in the file. Thank you :).
file
955763
957852
976270
bigfile
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2
chr1 970621 970740 chr1:970621-970740 AGRN-8|gc=57.1
awk
awk 'NR==FNR{A[$1];next}$3 in A' file bigfile > output
desired output (same as bigfile)
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2
If there's no difference between a row that matches and one that's close, you could just set all of the keys in the range in the array:
awk 'NR == FNR { for (i = -10; i <= 10; ++i) A[$1+i]; next }
$3 in A' file bigfile > output
The advantage of this approach is that only one lookup is performed per line of the big file.
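A quick way to try the range-expansion approach is sketched below. It uses the question's data plus one made-up fourth row whose third column (976261) is 9 away from 976270, so that a near match exists; the temporary directory and the names matched, out and sep are illustrative.

```shell
# Sketch: pre-expand every key to key-10 .. key+10, then do one lookup per row.
tmp=$(mktemp -d)
printf '%s\n' 955763 957852 976270 > "$tmp/file"
cat > "$tmp/bigfile" <<'EOF'
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2
chr1 970621 970740 chr1:970621-970740 AGRN-8|gc=57.1
chr1 976251 976261 chr1:976251-976261 AGRN-8|gc=57.1
EOF

# Collect the third column of every row that falls inside an expanded range.
matched=$(awk '
  NR == FNR { for (i = -10; i <= 10; ++i) A[$1 + i]; next }
  $3 in A   { out = out sep $3; sep = " " }
  END       { print out }
' "$tmp/file" "$tmp/bigfile")

echo "$matched"   # prints: 955763 957852 976261
rm -rf "$tmp"
```

The trade-off is memory: each value in file creates 21 array entries, which is usually fine for a small key file.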
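You need to run a loop on array a:
awk 'NR==FNR {
  a[$1]
  next
}
{
  for (i in a)
    if (i+0 <= $3+10 && i+0 >= $3-10) {   # i+0: array indices are strings, so force a numeric comparison
      print
      break                                # print each line at most once
    }
}' file bigfile > output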
Your data already produces the desired output (all exact match).
$ awk 'NR==FNR{a[$1];next} $3 in a{print; next}
{for(k in a)
if((k-$3)^2<=10^2) {print $0, " --> within 10 margin"; next}}' file bigfile
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2
chr1 976251 976261 chr1:976251-976261 AGRN-8|gc=57.1 --> within 10 margin
I added a fake 4th row to get the margin match
I am trying to write an AWK script to parse a file of the form
> field1 - field2 field3 ...
lineoftext
anotherlineoftext
anotherlineoftext
and I am checking using regex if the first line is correct (begins with a > and then has something after it) and then print all the other lines. This is the script I wrote but it only verifies that the file is in a correct format and then doesn't print anything.
#!/bin/bash
# FASTA parser
awk ' BEGIN { x = 0; }
{ if ($1 !~ />.*/ && x == 0)
{ print "Not a FASTA file"; exit; }
else { x = 1; next; }
print $0 }
END { print " - DONE - "; }'
Basically you can use the following awk command:
awk 'NR==1 && /^>./ {p=1} p' file
On the first row NR==1 it checks whether the line starts with a > followed by "something" (/^>./). If that condition is true the variable p will be set to one. The p at the end checks whether p evaluates true and prints the line in that case.
If you want to print the error message, you need to invert the logic a bit:
awk 'NR==1 && !/^>./ {print "Not a FASTA file"; exit 1} 1' file
In this case the program prints the error message and exits if the first line does not start with a > followed by something. Otherwise all lines get printed, because 1 always evaluates to true.
For this OP, literally:
awk 'NR==1{p=$0~/^>/}p' YourFile
# shorter version (tip from EdMorton)
awk 'NR==1{p=/^>/}p' YourFile
To print from the first line that starts with > onward (that line included):
awk '!p{p=$0~/^>/}p' YourFile
# shorter version (tip from EdMorton)
awk '!p{p=/^>/}p' YourFile
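The flag idiom in the last command can be tried on a few inline lines. In this sketch the input, including the stray leading line, is invented for the demonstration, and out is just a holder for the result.

```shell
# Sketch: p stays 0 until a line starting with ">" is seen, then every line prints.
out=$(printf '%s\n' 'stray line' '> seq1 - human chr1' 'ACGT' 'TTGA' |
  awk '!p { p = /^>/ } p')
printf '%s\n' "$out"
```

The stray first line is dropped, and the header line plus the two sequence lines are printed.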
Since all you care about is the first line, you can just check that, then exit.
awk 'NR > 1 { exit (0) }
! /^>/ { print "Not a FASTA file" >"/dev/stderr"; exit (1) }' file
As noted in comments, the >"/dev/stderr" is a nonportable hack which may not work for you. Regard it as a placeholder for something slightly more sophisticated if you want a tool which behaves as one would expect from a standard Unix tool (run silently if no problems; report problems to standard error).
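One possible way to avoid the /dev/stderr special file is to pipe the message through cat with its output redirected to standard error, an approach the GNU awk manual describes as a portable alternative. The sketch below wraps the check in a shell function; the function name check_fasta and the two sample files are hypothetical.

```shell
# Sketch: validate only the first line, report on stderr, signal via exit status.
tmp=$(mktemp -d)
printf '%s\n' '> seq1 - human chr1' 'ACGT' 'TTGA' > "$tmp/good.fa"
printf '%s\n' 'ACGT' 'TTGA' > "$tmp/bad.fa"

check_fasta() {
  awk '
    NR == 1 && !/^>./ { print "Not a FASTA file" | "cat 1>&2"; exit 1 }
    NR > 1            { exit 0 }   # first line was fine, no need to read further
  ' "$1"
}

if check_fasta "$tmp/good.fa"; then good=ok; else good=rejected; fi
if check_fasta "$tmp/bad.fa" 2>/dev/null; then bad=ok; else bad=rejected; fi

echo "good.fa: $good"   # prints: good.fa: ok
echo "bad.fa: $bad"     # prints: bad.fa: rejected
rm -rf "$tmp"
```

Note that an empty file passes this check, since no first line is ever examined; whether that is acceptable depends on what the caller expects.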