Condition on print in an awk if statement

I'm working on a little awk script: I'm looking for a pattern, and if it's found I would like to print the next 3 lines. Without an if, there's no problem:
awk '/\/1/ {x=NR+3}(NR<=x) {print > "out"}' input
The file I use:
#_5:1:7:9569:21200/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
+
CCCFFFFFHHHHHJJJJIJJIHIJJIJJI
#_5:1:7:9569:21200/2
GCACCATCATCACCGGTTCCGGGCAGCGC
+
CCCFFFFFHHFHHJJJGHJJJJJJJIGGI
#_5:1:11:12099:7543/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
I would like to split this file into two others, as follows:
File 1
#_5:1:7:9569:21200/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
+
CCCFFFFFHHHHHJJJJIJJIHIJJIJJI
#_5:1:11:12099:7543/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
File 2
#_5:1:7:9569:21200/2
GCACCATCATCACCGGTTCCGGGCAGCGC
+
CCCFFFFFHHFHHJJJGHJJJJJJJIGGI
But with the if, I get a syntax error on the print:
awk '{ if (/\/1/) {x=NR+3}(NR<=x) {print > "file1"};} else (/\/2/) {x=NR+3}(NR<=x) {print > "file2"}' "input_file"
Does anyone have an idea how to fix that?
Thanks!
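For reference, the if/else attempt can be written in valid awk. A pattern-action pair like (NR<=x) {print ...} cannot appear inside an action, so inside the braces each condition has to become an if statement. This is a sketch under assumptions: headers start with # and end in /1 or /2, each record is the header plus the next three lines as in the sample, and split_reads is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: split a FASTQ-like file into file1 (/1 reads)
# and file2 (/2 reads), writing each header plus the next 3 lines.
split_reads() {
  awk '{
    # Inside an action, conditions must be written with if/else.
    if (/^#.*\/1$/)      { out = "file1"; x = NR + 3 }
    else if (/^#.*\/2$/) { out = "file2"; x = NR + 3 }
    # Print the header and the 3 lines that follow it.
    if (NR <= x) print > out
  }' "$1"
}
```

Call it as split_reads input_file. As in the one-liner, > truncates file1 and file2 once per run, and later prints append to them.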

Something like this:
awk -F"/" '/^#/ {f=$2} {print > ("file"f+0)}' data
The +0 is needed to strip any trailing space after the digit: it converts f to a number.
cat file1
#_5:1:7:9569:21200/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
+
CCCFFFFFHHHHHJJJJIJJIHIJJIJJI
#_5:1:11:12099:7543/1
CAGAATGCCGTGGAACTGAAACGTCTGGC
cat file2
#_5:1:7:9569:21200/2
GCACCATCATCACCGGTTCCGGGCAGCGC
+
CCCFFFFFHHFHHJJJGHJJJJJJJIGGI
-F/ splits each line into $1 and $2 on the / separator.
/^#/ {f=$2} stores the read number (the part after the /) from each header line starting with #.
Every line is then written to "file" f, so when f=1 it goes to file1, and so on.
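The +0 works because concatenation binds more loosely than + in awk, so "file"f+0 is read as "file" (f+0), and converting a string such as "1 " or "1\r" (from a CRLF file) to a number keeps only the leading digits. A sketch using a hypothetical target_file helper that prints the name of the file each line would be written to, instead of writing it:

```shell
# Hypothetical helper: show which output file each input line would go to.
# "file" f+0 parses as "file" (f+0); converting "1\r" or "2 " to a number
# keeps only the leading digits, so the file name stays clean.
target_file() {
  awk -F/ '/^#/ { f = $2 } { print "file" f+0 }' "$@"
}
```

Lines before the first header would go to file0, since an unset f converts to 0.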

awk: concatenate lines until one contains a substring

I have an awk script based on this example:
awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
Here's a sample file with lines:
$ cat file
START
1
2
3
4
5
end
6
7
START
1
2
3
end
5
6
7
So I need to stop concatenating once the current line contains the word end, so the desired output is:
START,1,2,3,4,5,end
START,1,2,3,end
Short awk solution (though it checks for the /end/ pattern twice):
awk '/START/,/end/{ printf "%s%s",$0,(/^end/? ORS:",") }' file
The output:
START,1,2,3,4,5,end
START,1,2,3,end
/START/,/end/ - range pattern
A range pattern is made of two patterns separated by a comma, in the
form ‘begpat, endpat’. It is used to match ranges of consecutive
input records. The first pattern, begpat, controls where the range
begins, while endpat controls where the pattern ends.
/^end/? ORS:"," - set delimiter for the current item within a range
Here is another awk:
$ awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' file
START,1,2,3,4,5,end
START,1,2,3,end
Note that /end/ && ORS=RS; is a shortened form of /end/{ORS=RS; print}
You can use this awk:
awk '/START/{p=1; x=""} p{x = x (x=="" ? "" : ",") $0} /end/{if (x) print x; p=0}' file
START,1,2,3,4,5,end
START,1,2,3,end
Another way, similar to answers in How to select lines between two patterns?
$ awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' ip.txt
START,1,2,3,4,5,end
START,1,2,3,end
This doesn't need a buffer, but it doesn't check whether START has a corresponding end.
/START/{ORS=","; f=1} sets ORS to , and sets a flag (which controls which lines are printed)
/end/{ORS=RS; print; f=0} sets ORS back to newline on the ending condition, prints the line and clears the flag
f prints the input record as long as the flag is set
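The flag version prints a START block even when no end ever follows it. A buffered sketch that only emits a block once its end line arrives; join_blocks is a hypothetical name, and like the answers above it treats any line containing end as the closing line:

```shell
# Hypothetical helper: join each START..end block with commas, but
# only print blocks that actually reach an "end" line.
join_blocks() {
  awk '
    /START/        { buf = $0; inblk = 1; next }  # begin a new buffer
    inblk          { buf = buf "," $0 }           # append lines inside a block
    inblk && /end/ { print buf; inblk = 0 }       # emit on the closing line
  ' "$@"
}
```

A second START before an end discards the unfinished buffer, and a trailing unterminated block is never printed.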
Since we seem to have gone down the rabbit hole with ways to do this, here's a fairly reasonable approach with GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='end' -v OFS=',' 'RT{$0=gensub(/.*(START)/,"\\1",1); $NF=$NF OFS RT; print}' file
START,1,2,3,4,5,end
START,1,2,3,end

How to replace a text sequence that includes "\n" in a text file

This may sound like a duplicate, but I can't make this work.
Consider:
_ = space
- = minus sign
particle_little.csv is a file of this form:
waste line to be deleted
__data__data__data
_-data__data_-data
__data_-data__data
I need to get a standard csv format in particle_std.csv, like this:
data,data,data
-data,data,-data
data,-data,data
I am trying to use tail and tr for the conversion. Here is the command, split into steps:
tail -n +2 particle_little.csv to delete the first line
| tr -s ' ' to remove duplicated spaces
| tr '/\b\n \b/' '\n' to delete the very beginning space
| tr ' ' ',' to change spaces for commas
> particle_std.csv to put it in a output file
But I get this (without the 4th step):
data
data
data
-data
...
Finally, the file is huge, so it is almost impossible to open in an editor (I know some heavy-duty editors might manage it).
I would suggest using awk:
$ cat file
waste line to be deleted
data data data
-data data -data
data -data data
$ awk -v OFS=, '{ $1 = $1 } NR > 1' file
data,data,data
-data,data,-data
data,-data,data
The script sets the output field separator OFS to , and reassigns the first field to itself ($1 = $1), which makes awk rebuild each line from its fields joined by OFS (replacing the runs of spaces with commas). Lines after the first (NR > 1) are printed; the default action is to print the line.
So if I'm reading you right: ignore lines that don't start with whitespace, and comma-separate everything else.
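The $1 = $1 assignment is what triggers the conversion: without it, print emits each record exactly as read, while assigning to any field makes awk rebuild $0 from the fields joined by OFS, and default field splitting has already dropped the leading and repeated spaces. A sketch comparing both, with hypothetical function names:

```shell
# Hypothetical helpers over the question's whitespace-separated data.
# Without a field assignment, awk prints the record unchanged.
no_rebuild() { awk -v OFS=, 'NR > 1' "$@"; }
# Assigning $1 = $1 rebuilds $0 from the fields joined by OFS.
to_csv()     { awk -v OFS=, '{ $1 = $1 } NR > 1' "$@"; }
```

With the question's files this would be to_csv particle_little.csv > particle_std.csv; awk reads the input one line at a time, so the size of the file does not matter.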
I'd suggest perl:
perl -lane 'next unless /^\s/; print join ",", @F'
This, when given:
waste line to be deleted
data data data
-data data -data
data -data data
on STDIN (or in a file given as an argument), it outputs:
data,data,data
-data,data,-data
data,-data,data
This is because:
-l strips linefeeds (and replaces them after each print);
-a autosplits on any whitespace
-n wraps it in a while ( <> ) { loop which iterates line by line - functionally it means it works just like sed/grep/tr and reads STDIN or files specified as args.
-e allows specifying a perl snippet.
In this case:
skip any lines that don't start with whitespace (\s).
for any other line, join the fields (@F, generated by -a) with , as the delimiter. (-l adds the linefeed back automatically.)
Then you can either redirect the output to a file (>output.csv) or use -i.bak to edit inplace.
You should probably use sed or awk for this:
sed -e 1d -e 's/^ *//' -e 's/  */,/g'
One way to do it in Awk is:
awk 'NR == 1 { next }
{ pad=""; for (i = 1; i <= NF; i++) { printf "%s%s", pad, $i; pad="," } print "" }'
but there's a better way to do it in Awk:
awk 'BEGIN { OFS=","} NR == 1 { next } { $1 = $1; print }' data
The BEGIN block sets the output field separator; the assignment $1 = $1 forces awk to rebuild the record using OFS; the print prints it.
I've left the first Awk version around because it shows there's more than one way to do it, and in some circumstances, such methods can be useful. But for this task, the second Awk version is better — simpler, more compact (and isomorphic with Tom Fenech's answer).

get the last word in body of text

Given a body of text that can span a varying number of lines, I need a grep, sed or awk solution to search many files for the same pattern and get the last word in the body.
A file can include formats such as these, where the word I want can be named anything:
call function1(input1,
input2, #comment
input3) #comment
returning randomname1,
randomname2,
success3

call function1(input1,
input2,
input3)
returning randomname3,
randomname2,
randomname3

call function1(input1,
input2,
input3)
returning anothername3,
randomname2, anothername3
I need to print out results as
success3
randomname3
anothername3
I also need the filename and line number for each result.
I've tried
pcregrep -M 'function1.*(\s*.*){6}(\w+)$' filename.txt
which is too greedy, and I still need to print just the captured group rather than the whole match. The words function1 and returning in my sample code will always be named like this and can be hard-coded in my expression.
Last word of code blocks
Split the file into blocks using awk's record separator RS. A record is defined as a block of text; records are separated by double newlines.
A record consists of fields; consecutive fields are separated by whitespace or a single newline.
Now all we have to do is print the last field for each record, resulting in following code:
awk 'BEGIN{ FS="[\n\t ]"; RS="\n\n"} { print $NF }' file
Explanation:
FS this is the field separator and is set to either a newline, a tab or a space: [\n\t ].
RS this is the record separator and is set to a double newline: \n\n (a multi-character RS works in GNU awk and mawk; POSIX leaves its behavior unspecified).
print $NF this will print the field $ with index NF, which is a variable containing the number of fields. Hence this prints the last field.
Note: to capture all paragraphs, the file should end in a double newline. This is easy to ensure by preprocessing the file with: $ printf '\n\n' >> file.
Alternate solution based on comments
A more elegant and simple solution is as follows:
awk -v RS='' '{ print $NF }' file
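Setting RS to the empty string is paragraph mode: records are separated by runs of empty lines, blank lines at the start or end of the input do not create empty records, and newline always acts as a field separator whatever FS is. That is why $NF is the last word of the whole block, and why no trailing double newline is needed. A sketch with a hypothetical last_word wrapper:

```shell
# Hypothetical wrapper: print the last word of each paragraph.
# RS= (empty) enables paragraph mode; newline is then always a field
# separator, so $NF is the last word of the whole block.
last_word() { awk -v RS= '{ print $NF }' "$@"; }
```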
How about the following awk solution:
awk 'NF == 0 {if(last) print last; last=""} NF > 0 {last=$NF} END {print last}' file
$NF gets the value of the last "word" on the line, since NF is the number of fields. The last variable always holds the last word seen so far, and it is printed when an empty line (the end of a paragraph) is encountered.
A new version that only prints blocks matching function1:
awk 'NF == 0 {if(last && hasF) print last; last=hasF=""}
NF > 0 {last=$NF; if(/function1/)hasF=1}
END {if(hasF) print last}' filename.txt
This will produce the output you show from the input file you posted:
$ awk -v RS= '{print $NF}' file
success3
randomname3
anothername3
If you want to print FILENAME and line number like you mention then this may be what you want:
$ cat tst.awk
NF { nr=NR; last=$NF; next }
{ prt() }
END { prt() }
function prt() { if (nr) print FILENAME, nr, last; nr=0 }
$ awk -f tst.awk file
file 6 success3
file 13 randomname3
file 20 anothername3
If that doesn't do what you want, edit your question to provide clearer, more truly representative and accurate sample input and expected output.
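tst.awk uses NR, which keeps counting across input files, and it only prints a block on a blank line or at END. With several files the line numbers therefore drift, and a file that does not end in a blank line has its last block overwritten by the next file. A sketch that uses FNR and flushes at each file boundary; last_words is a hypothetical name:

```shell
# Hypothetical helper: last word of each block with file name and
# per-file line number; safe when given several files.
last_words() {
  awk '
    function prt() { if (nr) print fname, nr, last; nr = 0 }
    FNR == 1 { prt(); fname = FILENAME }  # flush the previous file first
    NF       { nr = FNR; last = $NF; next }
             { prt() }                    # a blank line ends a block
    END      { prt() }
  ' "$@"
}
```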
This is the perl version of Shellfish's awk solution (plus the keywords):
perl -00 -nE '/function1/ and /returning/ and say ((split)[-1])' file
or, with one regex:
perl -00 -nE '/^(?=.*function1)(?=.*returning).*?(\S+)\s*$/s and say $1' file
But the key is the -00 option which reads the file a paragraph at a time.

Select rows in a CSV not matching any pattern in pattern file in GNU Linux (AWK/SED/GREP)

I need to print all lines in a CSV file whose 3rd field does not match a pattern in a pattern file. So far I've been doing the opposite, printing matches with the following script:
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' FILE2 FILE1
FILE1
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
FILE2
00567
98745
45486
54543
48349
96349
56485
19615
56496
39493
OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
How can I print lines not matching patterns in pattern file?
Thank you very much!
Invert the selection:
# note the ! in front of patterns[$3]
awk -F, 'FNR == NR { patterns[$0] = 1; next } !patterns[$3]' FILE2 FILE1
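A small variation: referencing patterns[$3] creates an empty array entry for every third field that is not found, whereas ($3 in patterns) only tests membership. A sketch with a hypothetical exclude_rows wrapper, taking the pattern file first as in the command above:

```shell
# Hypothetical wrapper: print rows of $2 whose 3rd CSV field is not a
# line of $1. "in" tests membership without creating array entries.
exclude_rows() {
  awk -F, 'FNR == NR { patterns[$0] = 1; next } !($3 in patterns)' "$1" "$2"
}
```

If the pattern file is empty, FNR == NR stays true while reading the data file and nothing is printed; the original one-liner behaves the same way.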
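You just need to invert the match from your previous question:
grep -vf <( sed -e 's/^\|$/,/g' FILE2) FILE1
The -v flag inverts grep's match; the sed step wraps each pattern in commas so that it only matches a complete comma-delimited field.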
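One caveat with the grep one-liner: a comma-wrapped pattern such as ,00567, matches that value in any middle column, not only the third, so a row can be dropped because of field 2, 4 or 5. A sketch that anchors each pattern to the third field instead. It assumes plain CSV without quoted commas and patterns without regex metacharacters; exclude_field3 is a hypothetical name, and a temporary file replaces the bash-only process substitution:

```shell
# Hypothetical helper: like grep -vf, but each pattern must match field 3.
# Assumes plain CSV (no quoted commas) and patterns free of regex syntax.
exclude_field3() {
  pats=$(mktemp) || return 1
  # 00567 becomes ^[^,]*,[^,]*,00567(,|$)
  sed 's/.*/^[^,]*,[^,]*,&(,|$)/' "$1" > "$pats"
  grep -Ev -f "$pats" "$2"
  rc=$?
  rm -f "$pats"
  return "$rc"
}
```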

File comparison using awk in Linux

I have two files
File A.txt (groupname:groupid)
wheel:1
www:2
ftpshare:3
others:4
File B.txt (username:UserID:Groupid)
pi:1:1
useradmin:2:3
usertwo:3:3
trout:4:3
apachecri:5:2
guestthree:6:4
I need to create output that shows username:userID:groupname, like below:
pi:1:wheel
useradmin:2:ftpshare
(and so on)
This needs to be done using awk for a Unix class. After spending countless hours trying to figure it out, here is what I came up with:
awk -F ':' 'NR==FNR{ if ($2==[a-z]) a[$1] = $4;next} NF{ print $1, $2, $4}' fileA.txt fileB.txt
OR
awk -F, 'NR==FNR{ a[$2]=$2$1;next } NF{ print $1, $2 ((a[$2]==$2$3)?",ok":",nok") }' FileA.txt FileB.txt
Can someone help me get the right output and explain what I am doing wrong?
You can use awk:
awk 'BEGIN{FS=OFS=":"} FNR==NR{a[$2]=$1; next} $3 in a{print $1, $2, a[$3]}' a.txt b.txt
pi:1:wheel
useradmin:2:ftpshare
usertwo:3:ftpshare
trout:4:ftpshare
apachecri:5:www
guestthree:6:others
How it works:
BEGIN{FS=OFS=":"} - Make input and output field separator as colon
FNR==NR - Execute this block for fileA only
{a[$2]=$1; next} - Create an associative array a with key as $2 and value as $1 and then skip to next record
$3 in a - Execute this block for 2nd file if $3 is found in array a
print $1, $2, a[$3] Print field1, field2 and a[field3]
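Users whose group ID has no entry in the group file are silently skipped by $3 in a. If they should still appear, a sketch that prints a placeholder instead; the ? marker and the user_groups name are assumptions, not part of the answer:

```shell
# Hypothetical wrapper: username:userID:groupname, with "?" for
# group IDs that are not listed in the group file ($1).
user_groups() {
  awk 'BEGIN { FS = OFS = ":" }
       FNR == NR { name[$2] = $1; next }   # groupid -> groupname
       { print $1, $2, (($3 in name) ? name[$3] : "?") }
  ' "$1" "$2"
}
```

Run it as user_groups A.txt B.txt.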
I know you said you want to use awk, but you should also consider the standard tool designed for a task like this, namely join. Here is one way you could apply it:
join -o '2.1 2.2 1.1' -t: -1 2 -2 3 <(sort -t: -k2,2 fileA.txt) \
<(sort -t: -k3,3 fileB.txt)
Because the input to join must be sorted on the join field (lexically, as plain sort does, not numerically with -n), this method does not preserve the original order; if that is important, use anubhava's answer.
Output in this case:
pi:1:wheel
apachecri:5:www
trout:4:ftpshare
useradmin:2:ftpshare
usertwo:3:ftpshare
guestthree:6:others