awk: replace second column if not zero - regex

I'm trying to use awk to check the second column of a three-column set of data and replace its value if it's not zero. I've found this regex to find the non-zero numbers, but I can't figure out how to combine gsub with print to replace the contents and output them to a new file. I only want to run the gsub on the second column, not the first or third. Is there a simple awk one-liner to do this, or am I looking at something more complex? I've even tried writing an expression to check for zero, but I'm not sure how to do an if/else statement in awk.
The command that I had semi-success with was:
awk '$2 != 0 {print $1, 1, $3}' input > output
The problem is that it didn't print out the row if the second column was zero. This is where I thought either gsub or an if/else statement would work, but I can't figure out the awk syntax. Any guidance on this would be appreciated.

Remember that in awk, any nonzero number is true, and so is any non-empty string that doesn't look like the number 0. So:
awk '$2 { $2 = 1; print }' input > output
The $2 condition evaluates to true when the field is not 0 (and not empty). The rest is obvious. This replicates your script.
If you want to print all lines, including the ones with a zero in $2, I'd go with this:
awk '$2 { $2 = 1 } 1' input > output
This does the same replacement as above, but the 1 at the end is shorthand for "true". A pattern with no action runs the default action, { print }.
Is this what you're looking for?
In action, it looks like this:
[ghoti@pc ~]$ printf 'none 0 nada\none 1 uno\ntwo 2 tvo\n'
none 0 nada
one 1 uno
two 2 tvo
[ghoti@pc ~]$ printf 'none 0 nada\none 1 uno\ntwo 2 tvo\n' | awk '$2 { $2 = 1 } 1'
none 0 nada
one 1 uno
two 1 tvo
[ghoti@pc ~]$
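Since the question also asked about if/else: awk has a C-style if statement, so an equivalent, more explicit form of the same one-liner would be (a sketch of the same logic, not a different method):
awk '{ if ($2 != 0) $2 = 1; print }' input > output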

Is this what you want?
awk '$2 != 0 {print $1, 1, $3} $2 == 0 {print}' input > output
or with sed:
sed 's/\([^ ]*\) [0-9]*[1-9][0-9]* /\1 1 /' input > output
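Reusing the printf test data from the answer above, the sed version should behave the same way (my untested illustration):
$ printf 'none 0 nada\none 1 uno\ntwo 2 tvo\n' | sed 's/\([^ ]*\) [0-9]*[1-9][0-9]* /\1 1 /'
none 0 nada
one 1 uno
two 1 tvo
Note that [0-9]*[1-9][0-9]* only matches numbers containing at least one nonzero digit, so a 0 in the second column is left alone.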

Related

Bash AWK and Regex Apply on specific Column

I have the following dataset
Name,quantity,unit
car,6,6
plane,7,5
ship,2,3.44
bike,8,7.66
I want to print only the names whose unit is a whole number.
I have tried the following, but it does not give the expected result:
#!/bin/bash
awk 'BEGIN {
FS=","
}
/^[0-9]*$/ {
print "Has Whole numbers: " $1
}
' file.csv
The result should be
Has Whole numbers: car
Has Whole numbers: plane
Added a couple of lines to your test data:
Name,quantity,unit
car,6,6
plane,7,5
ship,2,3.44
bike,8,7.66
Starship,1,1.0
Super Heavy,2,0
null,0,
And awk:
$ awk -F, 'int($3)==$3 ""' file
Output:
car,6,6
plane,7,5
Super Heavy,2,0
int($3) makes an integer of $3, and $3 "" turns $3 into a string, which forces a string comparison between the two.
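Tracing the comparison on a few rows (my annotation):
int("6")    == "6"     ->  "6" == "6"      true  (printed)
int("3.44") == "3.44"  ->  "3" == "3.44"   false
int("unit") == "unit"  ->  "0" == "unit"   false (header excluded)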
If you are sure the 3rd column is a number:
awk -F, '(NR != 1 && $3 !~ /\./){print "Has Whole numbers:", $1}' file.csv
or, actually, it's better the way you did it:
awk -F, '$3 ~ /^[0-9]+$/{print "Has Whole numbers:", $1}' file
Try changing /^[0-9]*$/ to $3 ~ /^[0-9]*$/ && $3 != 0 in your attempt; it should work then.
In case you DO NOT want to hard-code the field number and want to find the unit field number automatically, then try the following:
awk -F"," -v field_val="unit" '
FNR==1{
  for(j=1;j<=NF;j++){
    if($j==field_val){
      field_number=j
      next
    }
  }
}
$field_number ~ /^[0-9]+$/ && $field_number!=0{
  print "Has whole numbers: " $1
}' Input_file
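With file.csv from the question in place of Input_file, this should print (my untested expectation):
Has whole numbers: car
Has whole numbers: plane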

grep line with exact pattern in first column

I have this script :
while read line; do grep $line my_annot | awk '{print $2}' ; done < foo.txt
But it doesn't return what I want.
The problem is that when foo.txt contains, for instance, Contig1, the script returns column 2 of my_annot even when the pattern found is Contig12 and not just Contig1!
I tried putting $ at the end of the pattern, but that anchors to the end of the line, while the expression I'm searching for is in column 1 and therefore not at the end of the line.
How can I tell it to search for this EXACT pattern and not for lines that merely contain it?
####### ANSWER :
My script is :
annot='/home/mu/myannot'
awk 'NR == FNR { line[$0]; next } $1 in line { print $2 }' $1 $annot > out
It lets me give the list of expressions I want to find as the first argument: ./myscript.sh mylist
And I redirect the result in a file called out.
Thank you guys !!!!
You should use awk to do the whole thing:
awk 'NR == FNR { line[$0]; next } $1 in line { print $2 }' foo.txt my_annot
This reads each line of foo.txt, setting a key in the array line, then prints the second column of any lines whose first column exactly matches one of the keys in the array.
Of course I have made a guess that the format of your data is the same as in the other answer.
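A hypothetical run, using the sample data from the next answer:
$ cat foo.txt
Contig1
$ cat my_annot
Contig1 hugo
Contig12 paul
$ awk 'NR == FNR { line[$0]; next } $1 in line { print $2 }' foo.txt my_annot
hugo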
So you have a file like
Contig1 hugo
Contig12 paul
right?
Then this will help:
awk '$1~/^Contig1$/ {print $2}' my_annot
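To avoid hard-coding Contig1, the pattern can be passed in as a variable and compared as an exact string (my variation, untested):
awk -v id="Contig1" '$1 == id {print $2}' my_annot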
I think this is what you want
while read line; do grep -w "$line" my_annot | awk '{print $2}' ; done < foo.txt
But it's not 100% clear (because of a lack of example data) whether it will work in all cases.

single PCRE regex to swap '0' for '1' and '1' for '0' at a specific location in a string

My input string has either a '0' or a '1' at a specific location.
If it's a '1', I want to replace it with '0' and likewise if it's '0' replace it with a '1'.
Working on the assumption 'x' will never occur and my string is just a single character, I can do it with 3 regexes like so:
s/0/x/
s/1/0/
s/x/1/
but that's pretty messy. I was wondering if PCRE has something fancy that can do this with just one expression?
A stray x elsewhere in the string is not a problem, because this approach doesn't need a placeholder at all:
sed 'y/01/10/' input >output
or without sed:
tr 01 10 <input >output
I'm posting this from the comment by @Casimir et Hippolyte as Community Wiki because I think it's the best answer, even above the one I posted myself.
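A quick check of the tr version (my example):
$ echo 10100 | tr 01 10
01011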
Here is a gnu-awk command to flip 1 to 0 and 0 to 1 in the given input:
awk -v RS='[01]' '{ORS = (RT == "") ? "" : xor(RT, 1)} 1' <<< "abc1foo0bar"
abc0foo1bar
awk -v RS='[01]' '{ORS = (RT == "") ? "" : xor(RT, 1)} 1' <<< "1"
0
awk -v RS='[01]' '{ORS = (RT == "") ? "" : xor(RT, 1)} 1' <<< "0"
1
awk -v RS='[01]' '{ORS = (RT == "") ? "" : xor(RT, 1)} 1' <<< "foo1"
foo0
awk -v RS='[01]' '{ORS = (RT == "") ? "" : xor(RT, 1)} 1' <<< "0bar"
1bar
This awk command uses 0 or 1 as a custom record separator and applies gawk's bitwise xor function to the RT variable, which holds the text that matched RS; the guard against an empty RT keeps a stray character from being appended after the last record.
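Since the matched separator is always 0 or 1, plain arithmetic works in place of xor (my variation, still gawk-specific because of RT):
awk -v RS='[01]' '{ORS = (RT == "") ? "" : 1 - RT} 1' <<< "abc1foo0bar"
abc0foo1bar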
You could use something like
perl -pe 's/([01])/ 0+!$1 /ge' file
where $1 is the captured string, ! produces its negation, and 0+ forces a numeric context. The /g flag says to process every match on a line, and the /e causes the substitution expression to be evaluated as Perl code.
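For example (my illustration):
$ echo 10100 | perl -pe 's/([01])/ 0+!$1 /ge'
01011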
Not sure what your use case is, but what about using sed?
echo 'ah1s' | sed 's/0/x/;s/1/0/;s/x/1/'
echo 'ah0s' | sed 's/0/x/;s/1/0/;s/x/1/'

Is there a way to obtain the current pattern searched in an AWK script?

The basic idea is this. Suppose that you want to search a file for multiple patterns from a pipe with awk:
... | awk -f - '{...}' someFile.txt
* '...' is just short for some code
* '-f -' indicates the patterns are taken from the pipe
Is there a way to know which pattern is being searched at each instant within the awk script? Just as $1 is the first field, is there something like $PATTERN that contains the current pattern being searched, or a way to get something like it?
More Elaboration:
if I have 2 files:
someFile.txt containing:
1
2
4
patterns.txt containing:
1
2
3
4
running this command:
cat patterns.txt | awk -f - '{...}' someFile.txt
What should I type between the braces so that only the patterns in patterns.txt that have not been matched in someFile.txt are printed? (In this case, the number 3 in patterns.txt is not matched.)
Under the requirements that patterns.txt be supplied as stdin and that the processing be done with awk:
$ cat patterns.txt | awk 'FNR==NR{p=p "\n" $0;next;} p !~ $0' someFile.txt -
3
This was tested using GNU awk.
Explanation
We want to remove from patterns.txt anything that matches a line in someFile.txt. To do this, we first read in someFile.txt and create patterns from it. Next, we print only the lines from patterns.txt that do not match any of the patterns from someFile.txt.
FNR==NR{p=p "\n" $0;next;}
NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: someFile.txt. We save all such lines in the newline-separated variable p. We then tell awk to skip the remaining commands and jump to the next line.
p !~ $0
If we got here, then we are now reading the second named file on the command line, which is - for stdin. This boolean condition evaluates to either true or false. If it is true, the line is printed; if not, it is skipped. In other words, the above is awk's cryptic shorthand for:
p !~ $0 {print $0}
Another approach reads the patterns from stdin first, deletes each one as soon as a line of someFile.txt matches it, and prints whatever is left at the end:
cmd | awk 'NR==FNR{pats[$0]; next} {for (p in pats) if ($0 ~ p) delete pats[p]} END{ for (p in pats) print p }' - someFile.txt
Another way in awk
cat patterns.txt | awk 'NR>FNR&&!($0 in a);{a[$0]}' someFile.txt -
This stores every line of someFile.txt as a key of the array a, then prints each line of stdin that is not among those keys (an exact string comparison rather than a regex match).

Awk to skip the blank lines

The output of my script is tab-delimited, produced with awk as:
awk -v variable=$bashvariable '{print variable"\t single\t" $0"\t double"}' myinfile.c
The awk command is run in a while loop which updates the variable value and the file myinfile.c for every cycle.
I am getting the expected results with this command.
But if myinfile.c contains a blank line (which it can), the command prints a line with no relevant information. Can I tell awk to ignore blank lines?
I know it can be done by removing the blank lines from myinfile.c before passing it to awk.
I know the sed and tr ways, but I want awk to handle it within the command above, not as a separate or piped solution such as:
sed '/^$/d' myinfile.c
tr -s "\n" < myinfile.c
Thanks in advance for your suggestions and replies.
There are two approaches you can try to filter out blank lines:
awk 'NF' data.txt
and
awk 'length' data.txt
Just put these at the start of your command, i.e.,
awk -v variable=$bashvariable 'NF { print variable ... }' myinfile
or
awk -v variable=$bashvariable 'length { print variable ... }' myinfile
Both of these act as gatekeepers/if-statements.
The first approach works by only printing lines where the number of fields (NF) is not zero (i.e., greater than zero).
The second method looks at the line length and acts if the length is not zero (i.e., greater than zero).
You can pick the approach that is most suitable for your data/needs.
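The two are not identical on lines that contain only whitespace: NF is 0 for a space-only line (it has no fields), while length is not. A quick illustration (my example; the middle line of the input holds a single space):
$ printf 'a\n\n \nb\n' | awk 'NF'
a
b
$ printf 'a\n\n \nb\n' | awk 'length'
a
 
b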
You could just add
/^\s*$/ {next;}
to the front of your script. That will match the blank lines and skip the rest of the awk matching rules. Put it all together:
awk -v variable=$bashvariable '/^\s*$/ {next;} {print variable"\t single\t" $0"\t double"}' myinfile.c
Maybe you could try this out:
awk -v variable=$bashvariable '$0{print variable"\t single\t" $0"\t double"}' myinfile.c
Try this:
awk -v variable=$bashvariable '/^.+$/{print variable"\t single\t" $0"\t double"}' myinfile.c
I haven't seen this solution, so: awk '!/^\s*$/{print $1}' will run the block for all non-empty lines.
\s metacharacter is not available in all awk implementations, but you can also write !/^[ \t]*$/.
https://www.gnu.org/software/gawk/manual/gawk.html
\s Matches any space character as defined by the current locale. Think of it as shorthand for ‘[[:space:]]’.
Based on Levon's answer, you may just add | awk 'length { print $1 }' to the end of the command.
So change
awk -v variable=$bashvariable '{ whatever }' myinfile.c
to
awk -v variable=$bashvariable '{ whatever }' myinfile.c | awk 'length { print $1 }'
In case this doesn't work, use | awk 'NF { print $1 }' instead.
Another awk way to trim out only truly zero-length lines, while keeping the ones that contain nothing but spaces and tabs:
awk 8 RS=
(The 8 is just an arbitrary always-true pattern, so every record is printed; setting RS to the empty string puts awk into paragraph mode, where records are separated only by completely empty lines.) Just doing awk NF trims out both line 3 (zero length) and line 5 (spaces and tabs):
1 abc
2 def
3
4 3591952
5
6 93253
1 abc
2 def
3 3591952
4 93253
But the RS= approach keeps line 5 for you:
1 abc
2 def
3 3591952
4
5 93253
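For reference, the two outputs above can be reproduced like this (my reconstruction of the test input, where line 5 contains a space and a tab):
$ printf 'abc\ndef\n\n3591952\n \t\n93253\n' | awk NF
$ printf 'abc\ndef\n\n3591952\n \t\n93253\n' | awk 8 RS=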
Note: lines containing only \013 (\v, VT), \014 (\f, FF), or \015 (\r, CR) aren't skipped by NF with the default FS = " ", despite those characters also belonging to POSIX [[:space:]].