Awk Replace Header of a Text File

I need to fix the header of a text file.
Format:
Key firstRowColumn1 lastRowColumn1
1 Data
2 Data
3 Data
4 Data
Basically, the header has to contain the first and last index, which appear in the first column of my actual data. I have a set of files that look like this:
Key 0 0
1 Data
2 Data
3 Data
4 Data
How can I use awk to fix them to look like the following?
Key 1 4
1 Data
2 Data
3 Data
4 Data

You can for example use this:
$ awk 'NR==2 && FNR==2 {first=$1} NR>2 && FNR==1 {print "Key", first, prev; f=1; next} f{print} {prev=$1}' file file
Key 1 4
1 Data
2 Data
3 Data
4 Data
Explanation
It reads the file twice: the first pass gathers the data, the second prints it.
NR==2 && FNR==2 {first=$1} in the first loop, get the value of the 1st field on the 2nd line. This is the "first" value.
NR>2 && FNR==1 {print "Key", first, prev; f=1; next} in case we are reading the file for the second time, print the header with the information gathered. Set the flag f as true, so that the lines will be printed from now on. Skip the record so that the current line is not printed.
f{print} in case the flag f is set, print the line. This will be done during the second read of the file.
{prev=$1} store the value of the first field, to be used in the next line to get the "end" value.
In case you want to update the current file, do:
awk '...' file file > new_file && mv new_file file
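For example, combining that rewrite idiom with the exact program shown above (new_file is just a scratch name):
awk 'NR==2 && FNR==2 {first=$1} NR>2 && FNR==1 {print "Key", first, prev; f=1; next} f{print} {prev=$1}' file file > new_file && mv new_file file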

Try this:
echo "Key $(sed -n '2s/^\([^ ]\+\) .*/\1/p' yourfile) $(tac yourfile | sed -n '1s/^\([^ ]\+\) .*/\1/p')" && sed -n '2,$p' yourfile

Related

awk: concatenate strings till they contain a substring

I have an awk script from this example:
awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
Here's a sample file with lines:
$ cat file
START
1
2
3
4
5
end
6
7
START
1
2
3
end
5
6
7
So I need to stop concatenating when the destination string contains the word end, so the desired output is:
START,1,2,3,4,5,end
START,1,2,3,end
Short Awk solution (though it will check the /end/ pattern twice):
awk '/START/,/end/{ printf "%s%s",$0,(/^end/? ORS:",") }' file
The output:
START,1,2,3,4,5,end
START,1,2,3,end
/START/,/end/ - range pattern
A range pattern is made of two patterns separated by a comma, in the
form ‘begpat, endpat’. It is used to match ranges of consecutive
input records. The first pattern, begpat, controls where the range
begins, while endpat controls where the range ends.
/^end/? ORS:"," - set delimiter for the current item within a range
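As a quick standalone illustration of a range pattern (made-up input):
$ printf 'a\nSTART\n1\nend\nb\n' | awk '/START/,/end/'
START
1
end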
here is another awk
$ awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' file
START,1,2,3,4,5,end
START,1,2,3,end
Note that /end/ && ORS=RS; is a shortened form of /end/{ORS=RS; print}
You can use this awk:
awk '/START/{p=1; x=""} p{x = x (x=="" ? "" : ",") $0} /end/{if (x) print x; p=0}' file
START,1,2,3,4,5,end
START,1,2,3,end
Another way, similar to answers in How to select lines between two patterns?
$ awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' ip.txt
START,1,2,3,4,5,end
START,1,2,3,end
This doesn't need a buffer, but it doesn't check whether START had a corresponding end.
/START/{ORS=","; f=1} set ORS as , and set a flag (which controls what lines to print)
/end/{ORS=RS; print; f=0} set ORS to newline on ending condition. Print the line and clear the flag
f print input record as long as this flag is set
Since we seem to have gone down the rabbit hole with ways to do this, here's a fairly reasonable approach with GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='end' -v OFS=',' 'RT{$0=gensub(/.*(START)/,"\\1",1); $NF=$NF OFS RT; print}' file
START,1,2,3,4,5,end
START,1,2,3,end
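To see why the gensub() step is needed, here is a quick probe of what each RS='end' record actually contains, abbreviated to its first and last field (GNU awk only, since RT is a gawk extension):
$ awk -v RS='end' 'RT{print "rec " NR ": " $1 "..." $NF " RT=" RT}' file
rec 1: START...5 RT=end
rec 2: 6...3 RT=end
The second record starts with the leftover 6, 7 lines, which is exactly what gensub(/.*(START)/,"\\1",1) strips off.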

get the last word in body of text

Given a body of text that can span a varying number of lines, I need to use a grep, sed or awk solution to search through many files for the same pattern and get the last word in the body.
A file can include formats such as these, where the word I want can be named anything:
call function1(input1,
input2, #comment
input3) #comment
returning randomname1,
randomname2,
success3
call function1(input1,
input2,
input3)
returning randomname3,
randomname2,
randomname3
call function1(input1,
input2,
input3)
returning anothername3,
randomname2, anothername3
I need to print out results as
success3
randomname3
anothername3
Also, I need the filename and line information for each match.
I've tried
pcregrep -M 'function1.*(\s*.*){6}(\w+)$' filename.txt
which is too greedy, and I still need to print out just the specific grouped value and not the whole pattern. The words function1 and returning in my sample code will always be named like this and can be hard-coded in my expression.
Last word of code blocks
Split the file into blocks using awk's record separator RS. A record will be defined as a block of text, and records are separated by double newlines.
A record consists of fields; consecutive fields are separated by whitespace or a single newline.
Now all we have to do is print the last field of each record, resulting in the following code:
awk 'BEGIN{ FS="[\n\t ]"; RS="\n\n"} { print $NF }' file
Explanation:
FS this is the field separator and is set to either a newline, a tab or a space: [\n\t ].
RS this is the record separator and is set to a double newline: \n\n
print $NF this will print the field with index NF, where NF is a variable containing the number of fields. Hence this prints the last field.
Note: To capture all paragraphs the file should end in a double newline; this can easily be achieved by preprocessing the file using: $ echo -e '\n\n' >> file.
Alternate solution based on comments
A more elegant and simpler solution is as follows:
awk -v RS='' '{ print $NF }' file
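A quick demo of paragraph mode with made-up input; RS='' treats blank-line-separated blocks as records (and, unlike a literal RS="\n\n", it copes with a missing trailing blank line):
$ printf 'a b\nc d\n\ne f\n' | awk -v RS='' '{print $NF}'
d
f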
How about the following awk solution:
awk 'NF == 0 {if(last) print last; last=""} NF > 0 {last=$NF} END {print last}' file
Here $NF gets the value of the last "word", where NF stands for number of fields. The last variable always stores the last word of a line and is printed when an empty line is encountered, representing the end of a paragraph.
New version that matches the function1 condition:
awk 'NF == 0 {if(last && hasF) print last; last=hasF=""}
NF > 0 {last=$NF; if(/function1/)hasF=1}
END {if(hasF) print last}' filename.txt
This will produce the output you show from the input file you posted:
$ awk -v RS= '{print $NF}' file
success3
randomname3
anothername3
If you want to print FILENAME and line number like you mention then this may be what you want:
$ cat tst.awk
NF { nr=NR; last=$NF; next }
{ prt() }
END { prt() }
function prt() { if (nr) print FILENAME, nr, last; nr=0 }
$ awk -f tst.awk file
file 6 success3
file 13 randomname3
file 20 anothername3
If that doesn't do what you want, edit your question to provide clearer, more truly representative and accurate sample input and expected output.
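Side note, since the question mentions searching many files: tst.awk prints NR, which keeps counting across input files. A sketch with per-file line numbers instead (this assumes each file ends with a blank line, so the final block of one file is flushed before the next file starts):
NF { nr=FNR; last=$NF; next }
{ prt() }
END { prt() }
function prt() { if (nr) print FILENAME, nr, last; nr=0 }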
This is the perl version of Shellfish's awk solution (plus the keywords):
perl -00 -nE '/function1/ and /returning/ and say ((split)[-1])' file
or, with one regex:
perl -00 -nE '/^(?=.*function1)(?=.*returning).*?(\S+)\s*$/s and say $1' file
But the key is the -00 option which reads the file a paragraph at a time.

Sed replace string, based on input file

Wanting to see if there is a better/quicker way to do this.
Basically, I have a file and I need to add some more information to it, based on one of its fields. e.g.
File to edit:
USER|ROLE
user1|role1
user1|role2
user2|role1
user2|role11
Input File:
Role|Application
role1|applicationabc
role2|application_qwerty
role3|application_new_app_new
role4|qwerty_abc_123
role11|applicationabc123
By the end, I want to be left with something like this:
USER|ROLE|Application
user1|role1|applicationabc
user1|role2|application_qwerty
user2|role11|applicationabc123
user2|role3|application_new_app_new
My idea:
cat inputfile | while IFS='|' read src rep
do
sed -i "s#\<$src\>#$src\|$rep#" /path/to/file/filename.csv
done
What I've written works to an extent, but it is very slow. Also, if it finds a match anywhere in the line, it will replace it. For example, for user2 and role11, the script would match role1 before it matches role11.
So my questions are:
Is there a quicker way to do this?
Is there a way to match against the exact expression/string? Putting quotes in my input file doesn't seem to work.
With join:
join -i -t "|" -1 2 -2 1 <(sort -t '|' -k2b,2 file) <(sort -t '|' -k 1b,1 input)
From the join manpage:
Important: FILE1 and FILE2 must be sorted on the join fields.
That's why we need to sort the two files first: file on its second field and input on its first.
Then join joins the two files on those fields (-1 2 -2 1). The output would then be:
ROLE|USER|Application
role1|user1|applicationabc
role1|user2|applicationabc
role11|user2|applicationabc123
role2|user1|application_qwerty
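If you want the original USER|ROLE|Application column order back, join's -o option lets you list the output fields explicitly as FILENUM.FIELD:
join -i -t "|" -1 2 -2 1 -o 1.1,1.2,2.2 <(sort -t '|' -k2b,2 file) <(sort -t '|' -k 1b,1 input)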
Piece of cake with awk:
$ cat file1
USER|ROLE
user1|role1
user1|role2
user2|role1
user2|role11
$ cat file2
ROLE|Application
role1|applicationabc
role2|application_qwerty
role3|application_new_app_new
role4|qwerty_abc_123
role11|applicationabc123
$ awk -F'\\|' 'NR==FNR{a[$1]=$2; next}; {print $0 "|" a[$2]}' file2 file1
USER|ROLE|Application
user1|role1|applicationabc
user1|role2|application_qwerty
user2|role1|applicationabc
user2|role11|applicationabc123
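One thing to watch with this approach: roles missing from file2 get an empty third column. If you'd rather flag them, a small variation (the UNMATCHED marker is just an example):
awk -F'\\|' 'NR==FNR{a[$1]=$2; next} {print $0 "|" ($2 in a ? a[$2] : "UNMATCHED")}' file2 file1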
Please try the following:
awk 'FNR==NR{A[$1]=$2;next}s=$2 in A{ $3=A[$2] }s' FS='|' OFS='|' file2 file1
or:
awk 'FNR==NR{A[$1]=$2;next} $3 = $2 in A ? A[$2] : 0' FS='|' OFS='|' file2 file1
Explanation
awk '
# FNR==NR this is true only when awk reading first file
FNR==NR{
# Create array A where index = field1($1) and value = field2($2)
A[$1]=$2
# stop processing and go to next line
next
}
# Here we read 2nd file that is file1 in your case
# var in Array returns either 1=true or 0=false
# if array A has index field2 ($2) then s will be 1 otherwise 0
# whenever s is 1 that is nothing but true state, we create new field
# $3 and its value will be array element corresponds to array index field2
s=$2 in A{
$3=A[$2]
}s
# An awk program is a series of condition-action pairs,
# conditions being outside of curly braces and actions being enclosed in them.
# A condition is considered false if it evaluates to zero or the empty string,
# anything else is true (uninitialized variables are zero or empty string,
# depending on context, so they are false).
# Either a condition or an action can be implied;
# braces without a condition are considered to have a true condition and
# are always executed if they are hit,
# and any condition without an action will print the line
# if and only if the condition is met.
# So the }s at the end of the script closes the action block
# and leaves s as a bare condition with the default action:
# the line is printed whenever s is 1, i.e. true;
# note s may have been modified by the previous action in braces.
# FS = Input Field Separator
# OFS = Output Field Separator
' FS='|' OFS='|' file2 file1

Delete Specific Lines with AWK [or sed, grep, whatever]

Is it possible to remove lines from a file using awk? I'd like to find any lines that have Y in the last column and then remove any lines that match the value in column 2 of said line.
Before:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,N
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,N
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,Y
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,Y
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
So awk would find that row 3 has Y in the last column, then look at column 2 [TRACKINGKEY1] and remove all lines that have TRACKINGKEY1 in column 2.
Expected result:
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
The reason for this is that our shipping program puts out a file whenever a shipment is processed, as well as when that shipment gets voided [in case of an error]. So what I end up with is the initial package info, then the same info indicating that it was voided, then yet another set of lines with the new shipment info. Unfortunately our ERP software has a fairly simple scripting language in which I can't even make an array so I'm limited to shell tools.
Thanks in advance!
One way is to make two passes over the same file using awk:
awk -F, 'NR == FNR && $NF=="Y" && !($2 in seen){seen[$2]}
NR != FNR && !($2 in seen)' file file
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
Explanation:
NR == FNR          # if processing the file the 1st time
&& $NF=="Y"        # and the last field is Y
&& !($2 in seen) { # and we haven't seen field 2 before
  seen[$2]         # store field 2 in array seen
}
NR != FNR # when processing the file 2nd time
&& !($2 in seen) # array seen doesn't have field 2
# take default action and print the line
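If reading the file twice is inconvenient (say, the data arrives on a pipe), a single-pass sketch that buffers everything in memory instead:
awk -F, '{line[NR]=$0; key[NR]=$2; if ($NF=="Y") bad[$2]}
         END{for (i=1; i<=NR; i++) if (!(key[i] in bad)) print line[i]}' file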
This solution is kind of gross, but kind of fun.
grep ',Y$' file | cut -d, -f2 | sort -u | grep -vwFf - file
grep ',Y$' file -- find the lines with Y in the last column
cut -d, -f2 -- print just the tracking key from those lines
sort -u -- give just the unique keys
grep -vwFf - file --
read the unique tracking keys from stdin (-f -)
only consider them a match if they are whole words (-w)
they are fixed strings, not regular expressions (-F)
then exclude lines matching these patterns (-v) from file

Add consecutive entries in a column

I have a file that has the format
0.99987799 17743.000
1.9996300 75.000000
2.9993899 75.000000
3.9991500 102.00000
4.9988999 131.00000
5.9986601 130.00000
6.9984102 152.00000
7.9981699 211.00000
8.9979200 256.00000
9.9976797 259.00000
10.997400 341.00000
11.997200 373.00000
What I would like to do is add the data in the second column, every four lines. So a desired output would be
1 17743+75+75+102
2 131+130+152+211
3 256+259+341+373
How can this be done in awk?
I know that I can find a specific element in the file using
awk 'FNR == 5 {print $2}' file
but I don't know how to add 4 elements in a row. If I try for instance
awk '$2 {print FNR == 5}' file
I get nothing but zeros, so I don't know how to parse the column first and then the line. I also tried
awk 'BEGIN{i=4}
{
for (NR>=1 || NR<=i)
{
print $2
}
}' filename
but I get a syntax error at NR<=i. I also don't have any idea how to loop over the entire file. Any help or ideas would be more than welcome! Or would it perhaps be better to do it in C++? I don't know which is more convenient...
I also tried
awk 'BEGIN{sum=0} {{sum += $2} if(FNR%4 == 0) { print sum; sum=0}}' infile.dat
but it doesn't seem to work properly...
awk 'NR%4==1{sum=$2; next}{sum+=$2} NR%4==0{print ++j,sum;}' input.txt
Output:
1 17995
2 624
3 1229
For the first row of a group it stores the value of the second column in sum; for the next three rows it adds the value of the second column to sum. For the last row of a group (NR%4==0) it prints the result.
If you don't need the row numbers before the sum results just remove ++j,.
awk '{print $2}' file | paste -d+ - - - - | bc
Here paste -d+ - - - - joins every four lines with + signs, and bc evaluates each resulting sum.
This works fine for me:
awk '{sum += $2}
FNR%4==0 {print FNR/4, sum; sum = 0}
END {if(FNR%4){print int(FNR/4)+1, sum}}' awktest.txt
with the result of:
1 17995
2 624
3 1229
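The group size is also easy to parameterize with -v, in case you ever need sums over, say, every five lines (n is just an assumed variable name):
awk -v n=4 '{sum += $2}
            FNR%n==0 {print FNR/n, sum; sum = 0}
            END {if (FNR%n) {print int(FNR/n)+1, sum}}' awktest.txt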