AWK if/then conditional on column values

I have 3 columns, and I want to create a 4th column that is equal to the 2nd column only when the 3rd column equals 1 (otherwise, the value should be 0).
For example,
4 3 1
would become
4 3 1 3
whereas
4 3 2
would become
4 3 2 0
I tried it 3 ways; in all cases the 4th column is all zeroes:
'BEGIN {FS = "\t"}; {if ($3!=1) last=0; else last=$2} {print $1, $2, $3, last}'
'BEGIN {FS = "\t"}; {if ($3!=1) print $1, $2, $3, 0; else print $1, $2, $3, $2}'
'BEGIN {FS = "\t"}; {if ($3==1) print $1, $2, $3, $2; else print $1, $2, $3, 0}'

awk to the rescue
awk '{$(NF+1)=$3==1?$2:0}1'

$ awk '{print $0, ($3==1?$2:0)}' file
4 3 1 3
4 3 2 0
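A likely reason all three original attempts printed zeroes (an assumption, since the input file itself is not shown): they set FS to a tab while the sample rows are space-separated, so $3 never compares equal to 1. Dropping the FS override and letting awk split on whitespace makes the same logic work:

```shell
# default whitespace splitting; append $2 or 0 depending on $3
printf '4 3 1\n4 3 2\n' | awk '{print $0, ($3 == 1 ? $2 : 0)}'
# 4 3 1 3
# 4 3 2 0
```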

Related

Awk: From CSV to PDB (Protein Data Bank)

I have a CSV file with this format:
ATOM,3662,H,VAL,A,257,6.111,31.650,13.338,1.00,0.00,H
ATOM,3663,HA,VAL,A,257,3.180,31.995,13.768,1.00,0.00,H
ATOM,3664,HB,VAL,A,257,4.726,32.321,11.170,1.00,0.00,H
ATOM,3665,HG11,VAL,A,257,2.387,31.587,10.892,1.00,0.00,H
And I would like to format it according to PDB standards (fixed position):
ATOM 3662 H VAL A 257 6.111 31.650 13.338 1.00 0.00 H
ATOM 3663 HA VAL A 257 3.180 31.995 13.768 1.00 0.00 H
ATOM 3664 HB VAL A 257 4.726 32.321 11.170 1.00 0.00 H
ATOM 3665 HG11 VAL A 257 2.387 31.587 10.892 1.00 0.00 H
One can consider that everything is right-justified except for the first and the third columns. The first is not a problem. The third, however, is left-justified when its length is 1-3 but shifted one position to the left when it is 4.
I have this AWK one-liner that almost does the trick:
awk -F, 'BEGIN {OFS=FS} {if(length($3) == 4 ) {pad=" "} else {pad=" "}} {printf "%-6s%5s%s%-4s%4s%2s%4s%11s%8s%8s%6s%6s%12s\n", $1, $2, pad, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12}' < 1iy8_min.csv
Except for two things:
The special case of the third column. I was thinking about adding a condition that changes the padding before the third column according to the field length, but I cannot get it to work (the idea is illustrated in the above one-liner).
The other problem is that if there are no spaces between the fields, the padding does not work at all.
ATOM 3799 HH TYR A 267 -5.713 16.149 26.838 1.00 0.00 H
HETATM 3801 O7N NADA12688.285 19.839 10.489 1.00 20.51 O
In the above example, the second line should be:
HETATM 3801 O7N NAD A1268 8.285 19.839 10.489 1.00 20.51 O
But because there is no space between fields 5 and 6, everything gets shuffled. I think that A1268 is perceived as one field, presumably because the default awk delimiter is whitespace. Is it possible to make the split position-dependent?
UPDATE: The following solves the problem with the exception on the third column:
awk 'BEGIN {FS = ",";OFS = ""} { if(length($3) == 4 ) {pad = sprintf("%s", " ")} else {pad = sprintf("%2s", " ")} } { if(length($3) == 4 ) {pad2 = sprintf("%s", " ")} else {pad2 = sprintf("%s", "")} } {printf "%-6s%5s%s%-4s%s%3s%2s%4s%11s%8s%8s%6s%6s%12s\n", $1, $2, pad, $3, pad2, $4, $5, $6, $7, $8, $9, $10, $11, $12}' 1iy8_min.csv
However, OFS seems to be ignored... (which is expected here: printf formats and joins its arguments itself and never inserts OFS).
UPDATE2: The problem was in the input file. Sorry about that. Solved.
The working script:
awk 'BEGIN{OFS=FS=","}{$7=sprintf("%.3f",$7);$8=sprintf("%.3f",$8);$9=sprintf("%.3f",$9)}1' ${file} | awk 'BEGIN {FS =","; OFS=""} { if(length($3) == 4 ) {pad = sprintf("%s", " ")} else {pad = sprintf("%2s", " ")} } { if(length($3) == 4 ) {pad2 = sprintf("%s", " ")} else {pad2 = sprintf("%s", "")} } {printf "%-6s%5s%s%-4s%s%3s%2s%4s%12s%8s%8s%6s%6s%12s\n", $1, $2, pad, $3, pad2, $4, $5, $6, $7, $8, $9, $10, $11, $12}' > ${root}_csv.pdb
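The length-dependent padding can also be written more compactly with a ternary. A minimal sketch on just the first three CSV fields (the column widths here are illustrative, not the full PDB record layout):

```shell
printf 'ATOM,3662,H\nATOM,3665,HG11\n' |
awk -F, '{
  pad = (length($3) == 4) ? " " : "  "   # 4-char atom names start one column earlier
  printf "%-6s%5s%s%-4s\n", $1, $2, pad, $3
}'
```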

replace patterns in a text file with an update file

I have these 2 CSV files:
old.csv
station,32145,80
station,32145,60
new.csv
station,32145,80
station,32145,801
Expected result:
result.csv
station,32145,80,no change
station,32145,801,new
station,32145,60,Delete
I have used diff and awk to do the job, but I have a slight issue. The rows with no change and the deleted row are handled correctly, but the new one is not. Can anyone show me where my mistake is?
diff -W999 --side-by-side old.csv new.csv |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" No Change"};/[\t] *<$/{split($0,a,"[|][\t]* *<$");print a[1]" Delete"};/>[\t]/{split($0,a,">[\t]");print a[2]" New"}'
This should work:
awk -F, '
NR==FNR && NF {a[$0]++; next}
NF {print (($0 in a) ? $0 ",no change" : $0 ",new"); delete a[$0]}
END {for (x in a) print x ",delete"}' old.csv new.csv
Output:
station,32145,80,no change
station,32145,801,new
station,32145,60,delete
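The `NR==FNR` pattern used above is the standard two-file awk idiom: `NR==FNR` holds only while the first file is being read, so old.csv fills the lookup array and new.csv is checked against it. A stripped-down demonstration (the file names are throwaway):

```shell
printf 'a\nb\n' > old.txt
printf 'b\nc\n' > new.txt
# first file populates seen[]; second file is tested against it
awk 'NR==FNR {seen[$0]; next} ($0 in seen) {print $0, "in both"}' old.txt new.txt
# b in both
```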
Update based on comments: handle a stray "." in the second column:
awk 'BEGIN{FS=OFS=","}
NR==FNR {gsub(/[.]/,"",$2); a[$0]++; next}
NF {gsub(/[.]/,"",$2); print (($0 in a) ? $0 OFS "no change" : $0 OFS "new"); delete a[$0]}
END {for (x in a) print x OFS "delete"}' old.csv new.csv
Code for awk:
new without trailing commas:
awk -v OFS="," 'NR==FNR {a[$0]=$0;next};{b[$0]=$0};$0==a[$0] {print $0, "no change"};a[$0]==0 {print $0, "new"};END {for (x in a) {if (b[x]==0) {print a[x], "Delete"}}}' old new
new with trailing commas:
awk -v OFS="" 'NR==FNR {a[$0","]=$0",";next};{b[$0]=$0};$0==a[$0] {print $0, "no change"};a[$0]==0 {print $0, "new"};END {for (x in a) {if (b[x]==0) {print a[x], "Delete"}}}' old new

Awk gensub transformation

echo "0.123e2" | gawk '{print gensub(/([0-9]+\.[0-9]+)e([0-9]+)/, "\\1 * 10 ^ \\2", "g")}'
gives me "0.123 * 10 ^ 2" as a result as expected.
Is there a way to actually tell it to calculate the term to "12.3" ?
In general: Is there a way to modify/transform the matches (\\1,\\2,...)?
It could be easier with perl:
perl -pe 's/(\d+\.\d+e\d+)/ sprintf("%.1f",$1) /ge' filename
With your test data:
echo '0.123e2 xyz/$&" 0.3322e12)282 abc' | perl -pe 's/(\d+\.\d+e\d+)/ sprintf("%.1f",$1) /ge'
12.3 xyz/$&" 332200000000.0)282 abc
With awk:
awk '{
  while (match($0, /[0-9]+\.[0-9]+e[0-9]+/) > 0) {
    num = sprintf("%.1f", substr($0, RSTART, RLENGTH))
    sub(/[0-9]+\.[0-9]+e[0-9]+/, num)
  }
  print $0
}' filename
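One property worth noting about this loop (my observation, not part of the original answer): it terminates only because the %.1f replacement can never contain an "e" and therefore never re-matches the pattern. It also handles several matches on one line:

```shell
echo '0.123e2 and 0.5e1' | awk '{
  while (match($0, /[0-9]+\.[0-9]+e[0-9]+/) > 0) {
    num = sprintf("%.1f", substr($0, RSTART, RLENGTH))   # e-notation -> fixed point
    sub(/[0-9]+\.[0-9]+e[0-9]+/, num)
  }
  print
}'
# 12.3 and 5.0
```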
You just want to use printf to specify the output format:
$ echo "0.123e2" | awk '{printf "%.1f\n",$0}'
12.3

Working with AWK regex

I have a file in which have values in following format-
20/01/2012 01:14:27;UP;UserID;User=bob email=abc#sample.com
I want to pick each value from this file (not the labels). By "label" I mean that for the string email=abc#sample.com I only want to pick abc#sample.com, and for the string User=bob I only want to pick bob. All the space-separated values are easy to pick, but I am unable to pick the values separated by a semicolon. Below is the command I am using in awk:
awk '{print "1=",$1} /;/{print "2=",$2,"3=",$3}' sample_file
In $2 I am getting the complete string up to bob, and the rest of the string is assigned to $3. I could work with awk's substr, but I want to be on the safe side; the string lengths may vary.
Can somebody tell me how to design a regex to parse my file?
You can set multiple delimiters using awk -F:
awk -F "[ \t;=]+" '{ print $1, $2, $3, $4, $5, $6, $7, $8 }' file.txt
Results:
20/01/2012 01:14:27 UP UserID User bob email abc#sample.com
EDIT:
You can remove anything before the equals sign in each field using sub(/[^=]*=/, "", $i). This lets you print just the values:
awk 'BEGIN { FS="[ \t;]+"; OFS=" " } { for (i=1; i<=NF; i++) { sub (/[^=]*=/,"", $i); line = (line ? line OFS : "") $i } print line; line = "" }' file.txt
Results:
20/01/2012 01:14:27 UP UserID bob abc#sample.com
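If only the two values after the "=" signs are needed and the field layout is fixed as in the sample line (an assumption), adding "=" to the delimiter set lets you address them by position:

```shell
echo '20/01/2012 01:14:27;UP;UserID;User=bob email=abc#sample.com' |
awk -F '[ \t;=]+' '{print $6, $8}'   # $6 = value after User=, $8 = value after email=
# bob abc#sample.com
```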

Separate string of digits into 3 columns using awk/sed

I have a string of digits in rows as below:
6390212345678912011012112121003574820069121409100000065471234567810
6390219876543212011012112221203526930428968109100000065478765432196
That I need to split into 6 columns as below:
639021234567891,201101211212100,3574820069121409,1000000,654712345678,10
639021987654321,201101211222120,3526930428968109,1000000,654787654321,96
Conditions:
Field 1 = 15 Char
Field 2 = 15 Char
Field 3 = 15 or 16 Char
Field 4 = 7 Char
Field 5 = 12 Char
Field 6 = 2 Char
Final Output:
639021234567891,3574820069121409,654712345678
639021987654321,3526930428968109,654787654321
It's not clear how to detect whether field 3 should have 15 or 16 chars. But as a draft for the first 3 fields you could use something like this:
echo 63902910069758520110121121210035748200670169758510 |
awk '{ printf("%s,%s,%s\n", substr($1,1,15), substr($1,16,15), substr($1,31,15)) }'
Or with sed:
echo $NUM | sed -r 's/^([0-9]{15})([0-9]{15})([0-9]{15,16}) ...$/\1,\2,\3, .../'
This will use 15 or 16 for the length of field 3, based on the length of the whole string.
If you're using gawk:
gawk -v f3w=16 'BEGIN {OFS=","; FIELDWIDTHS="15 15 " f3w " 7 12 2"} {print $1, $3, $5}'
Do you know ahead of time what the width of Field 3 should be? Do you need it to be programmatically determined? How? Based on the total length of the line? Does it change line-by-line?
Edit:
If you don't have gawk, then this is a similar approach:
awk -v f3w=16 'BEGIN {OFS=","; FIELDWIDTHS="15 15 " f3w " 7 12 2"; n=split(FIELDWIDTHS,fw," ")} { p=1; r=$0; for (i=1;i<=n;i++) { $i=substr(r,p,fw[i]); p += fw[i]}; print $1,$3,$5}'
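For the final output requested (fields 1, 3 and 5, comma-separated), here is a plain-awk sketch using substr, with field 3's width inferred from the total line length: the other five widths sum to 51, so a 67-character line implies a 16-character field 3 (this inference is an assumption based on the stated widths):

```shell
printf '%s\n' \
  6390212345678912011012112121003574820069121409100000065471234567810 \
  6390219876543212011012112221203526930428968109100000065478765432196 |
awk '{
  w3 = length($0) - 51              # 15 + 15 + 7 + 12 + 2 = 51 fixed columns
  print substr($0, 1, 15) "," substr($0, 31, w3) "," substr($0, 31 + w3 + 7, 12)
}'
# 639021234567891,3574820069121409,654712345678
# 639021987654321,3526930428968109,654787654321
```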