AWK if last field equals - if-statement

I've done considerable searching and I'm getting frustrated. I'm trying to have a conditional response on the last field of a file that has a dynamic number of fields. I've tried reworking the field separator, but that has been unsuccessful.
It prints the correct field when I use awk '{print $(NF)}'
I attempted to do something like this:
'{ if ($(NF) ~ /LTO/); print $NF}'
but every line's last field prints, as though the test were always true. I suspect that the variable substitution in the if statement is not returning the value I want it to.

{if ($(NF) ~ /LTO/); print $NF}
                   ^
                   |
                   |____ Here is your problem. The stray ; terminates the if statement,
                         so it has no effect on the subsequent print. Remove the ;.
Of course, the idiomatic way of writing this is
$NF ~ /LTO/ {print $NF}
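For example, given a hypothetical file tapes.txt whose last field sometimes contains LTO (the filename is just a placeholder), either of these forms prints only the matching last fields:
awk '$NF ~ /LTO/ {print $NF}' tapes.txt
awk '{ if ($NF ~ /LTO/) print $NF }' tapes.txt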

Related

remove duplicate lines (only first part) from a file

I have a list like this
ABC|Hello1
ABC|Hello2
ABC|Hello3
DEF|Test
GHJ|Blabla1
GHJ|Blabla2
And I want it to be this:
ABC|Hello1
DEF|Test
GHJ|Blabla1
So I want to remove the duplicates based on the part of each line before the |,
and only keep the first one.
A simple way using awk
$ awk -F"|" '!seen[$1]++ {print $0}' file
ABC|Hello1
DEF|Test
GHJ|Blabla1
The trick here is to set the appropriate field separator, "|" in this case, after which the individual columns can be accessed starting with $1. This answer maintains a unique-value array seen and prints the line only if the value in $1 has not been seen previously.
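Since print $0 is awk's default action when a pattern has no action block, the same filter can be written even more tersely; this is just a sketch of the same idea:
awk -F'|' '!seen[$1]++' file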

Iterate over fields and print only those matching regex

I am having trouble getting this to work:
awk -F ";" '{for(i=1; i < NF;i++) $i ~ /^_.*/ {print $i}}'
I want to iterate over all the fields (records can have 7-9) and print only those that start with an _, but the line above gives me a syntax error at the print statement, and if I omit the {print $i} I don't get any output.
What is the correct way to do this?
You're missing an if:
awk -F ";" '{for(i=1; i < NF;i++) if($i ~ /^_.*/) {print $i}}'
The structure of an awk program is condition { action } but what you have currently is all within an action block (the condition is true by default). Within the action block the if isn't implicit.
As an aside, the .* in the pattern is redundant; you may as well use /^_/ to match any string starting with _.
Note: since fields are 1-indexed, the right loop condition is most likely i <= NF. If you are sure the last field is not needed, the condition i < NF will do the job.
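Putting the two corrections together, a sketch of the full command (covering every field, assuming the ;-separated input from the question) would be:
awk -F ';' '{for (i = 1; i <= NF; i++) if ($i ~ /^_/) print $i}'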

shell script : how to avoid null character when i read values from a file

I have many values in a file, and on one line I am getting a null value; the null value comes as the last value in that file. I want to skip the null value and take the last integer value from the file. Can anyone help me with a regular expression for doing that? I can post what I tried here.
value=`cat $working_tmp_dir/numbers.txt|tail -3| head -1|cut -f2 -d'='|cut -b 1-8`
When I tried the above I am not getting the last integer value; it's giving me null.
sample values in the files are:
date=11052015
date=11062015
date=11092015
date=11122015
date=11192015
date=12172015
date=20160202
date="null value coming here"
The spacing between the numbers is just a formatting issue.
Please help me with that.
This awk command should work:
awk -F= '$2+0 == $2{n=$2} END{print n}' file
20160202
$2+0 == $2 will be true only if $2 represents a number.
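To see that numeric test in action on the sample data (assuming the file is the numbers.txt referred to above), something like this prints each value alongside the result of the check:
awk -F= '{print $2, ($2+0 == $2 ? "number" : "not a number")}' numbers.txt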
You could just parse your file with awk to get the last line that matches the pattern.
Based on your sample update, you could use this if the line begins with date=
value=$(awk '/date=[0-9]/{a=$0}END{print a}' $working_tmp_dir/numbers.txt | grep -oP "\d+")
This finds lines where date= is followed by a digit and assigns each such line to the variable a; it does this for every match, so the last match is the final value left in the variable.
Also you could do this:
value=$(tail -n2 $working_tmp_dir/numbers.txt | grep -m1 -oP "(?<=date=)\d+$")
So based on the sample input this would set the variable value to 20160202.
Any reason why the following isn't sufficient?
egrep '^[0-9]+$' your_file | tail -1
You simply filter the lines with just integers on them with grep and pick the last one with tail.
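Given that the sample lines carry a date= prefix, one sketch of the same idea strips the prefix first and then applies the integer filter (untested against the real file):
cut -d= -f2 "$working_tmp_dir/numbers.txt" | egrep '^[0-9]+$' | tail -1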
Thanks for all the comments. I tried everything you posted, also made some changes myself, and learned a lot from you all. Thank you so much. At last I tried this and got a solution:
value=`grep -Po "=\d+" $working_tmp_dir/numbers.txt |tail -1| sed 's/=//'`
This is working perfectly for me.

Gawk regexp to select sequence

Sorry for the nth simple question on regexps, but I'm not able to get what I need without what seems to me a too-complicated solution. I'm parsing a file containing sequences of only 3 letters, A, E, and D, as in
AADDEEDDA
EEEEEEEE
AEEEDEEA
AEEEDDAAA
and I'd like to identify only those that start with E and end in D, with only one change in the sequence, as for example in
EDDDDDDDD
EEEDDDDDD
EEEEEEEED
I'm fighting with the proper regexp to do that. Here is my last attempt:
echo "1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED" | gawk -F, '{if($2 ~ /^E[(ED){1,1}]*D$/ && $2 !~ /^E[(ED){2,}]*D$/) print $0}'
which does not work. Any help?
Thanks in advance.
If I understand your request correctly, a simple
awk '/^E+D+$/' file.input
will do the trick.
UPDATE: if the line format contains pre/post numbers (with the post number optional), as shown later in the example, this is a possible pure-regex adaptation (an alternative to using the field switch -F,):
awk '/^[0-9]+,E+D+(,[0-9]+)?$/' input.test
First of all, you need the regular expression:
^E+[^ED]*D+$
This matches one or more Es at the beginning, zero or more characters that are neither E nor D in the middle, and one or more Ds at the end.
Then your AWK program will look like
$2 ~ /^E+[^ED]*D+$/
$2 refers to the 2nd field of the current record, ~ is the regex matching operator, and /s delimit a regular expression. Together, these components form what is known in AWK jargon as a "pattern", which amounts to a boolean filter for input records. Note that there is no "action" (a series of statements in {s) specified here. That's because when no action is specified, AWK assumes that the action should be { print $0 }, which prints the entire line.
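On the command line, assuming the comma-separated sample input from the question (with the sequence in the second field), the full invocation would look something like this, printing only the matching record:
printf '1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED\n' | gawk -F, '$2 ~ /^E+[^ED]*D+$/'
2,EEEEDDDD,2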
If I understand you correctly, you want to match strings that start with at least one E and then continue with at least one D until the end.
echo "1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED" | gawk -F, '{if($2 ~ /^E+D+$) print $0}'

Print line after multiline match with sed

I am trying to create a script to pull out an account code from a file. The file itself is long and contains a lot of other data, but I have included below an excerpt of the part I am looking at (there is other content before and after this excerpt).
The section of the file I am interested in sometimes looks like this
Account Customer Order No. Whse Payment Terms Stock No. Original Invoice No.
VIN No.
AAAAAA01 9999 1000 30 days
and sometimes it looks like this
Account Customer Order No. Whse Payment Terms Stock No. Original Invoice No.
AAAAAA01 9999 1000 30 days
(one field cut off the end, where that field had been wrapping down onto its own line)
I know I can use | tr -s ' ' | cut -d ' ' -f 1 to pull the code once I have the line it is on, but it is not on a set line number (the content before this section is dynamic).
I am starting by trying to handle the case with the extra field; I figure it will be easy enough to make that an optional match with ?.
The number of spaces used to separate the fields can change, as the text is essentially OCRed.
A few of my attempts so far - (assume the file is coming in from STDIN)
| sed -n '/\s*Account\s\+Customer Order No\.\s\+Whse\s\+Payment Terms\s\+Stock No\.\s\+Original Invoice No\.\s\+VIN No\.\s*/{n;p;}'
| sed -n '/\s*Account\s\+Customer Order No\.\s\+Whse\s\+Payment Terms\s\+Stock No\.\s\+Original Invoice No\.\s*\n\s*VIN No\.\s*/{n;p;}'
| sed -n '/\s*Account\s\+Customer Order No\.\s\+Whse\s\+Payment Terms\s\+Stock No\.\s\+Original Invoice No\.\s*\r\s*VIN No\.\s*/{n;p;}'
| sed -n '/\s*Account\s\+Customer Order No\.\s\+Whse\s\+Payment Terms\s\+Stock No\.\s\+Original Invoice No\.\s*\r\n\s*VIN No\.\s*/{n;p;}'
These all failed to match anything.
| sed -n '/\s*Account\s\+Customer Order No\.\s\+Whse\s\+Payment Terms\s\+Stock No\.\s\+Original Invoice No\.\s*/,/\s\*VIN No\.\s*/{n;p;}'
This at least matched something, but frustratingly printed the VIN No. line, followed by every second line after it. It also seems like it would be more difficult to mark as an optional part of the expression.
So, given an input of the full file (including either of the above excerpts), I am looking for an output of either
AAAAAA01 9999 1000 30 days
(which I can then trim to the required data) or AAAAAA01 if there is an easier way of getting straight to that.
This might work for you (GNU sed):
sed -n '/Account/{n;/VIN No\./n;p}' file
Use sed with the -n switch; this makes sed act like grep, i.e. it only prints lines explicitly requested via the commands P or (in this case) p.
/Account/ match a line with the pattern Account
For the above match only:
n normally this would print the current line and then read the next line into the pattern space, but as -n is in effect no printing takes place. So now the pattern space contains the next line.
/VIN No\./n If the current line contains VIN No., effectively discard the pattern space and read in the next line.
p print whatever is currently in the pattern space.
So this is a condition within a condition. When we encounter Account, print either the following line or the line after that.
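As a quick check, assuming either excerpt above is saved in a file named, say, invoice.txt (a placeholder name), the command prints just the account line:
sed -n '/Account/{n;/VIN No\./n;p}' invoice.txt
AAAAAA01 9999 1000 30 days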
awk '/^\s*Account\s+Customer Order No\.\s+Whse\s+Payment Terms\s+Stock No\.\s+Original Invoice No\.$/ {
    getline;                          # read the line after the header row
    if (/^\s*VIN No\.$/) getline;     # if that line is the wrapped "VIN No." heading, skip it too
    print;                            # print the account line
    exit;
}'
Going strictly off your input, in both cases the desired field is on the last line. So to print the first field of the last line,
awk 'END {print $1}'
Result
AAAAAA01