How does awk print the lines that a regex matches?

Here is the file contents:
# cat text
16:10:29 DEBUG MY_Output:90 1 5de0d275c2f55: send response
When I match it against a regex with awk:
# cat text | awk '{if($0~/[0-9]{2}:[0-9]{2}:[0-9]{2}.*/) print $0}'
(print nothing)
# cat text | awk '/[0-9]{2}:[0-9]{2}:[0-9]{2}.*/ {print $0}'
(print nothing)
# cat text | awk '/[0-9]{2}:[0-9]{2}:[0-9]{2}.*/1'
16:10:29 DEBUG MY_Output:90 1 5de0d275c2f55: send response
My question is:
Why do {if($0~/[0-9]{2}:[0-9]{2}:[0-9]{2}.*/) print $0} and /[0-9]{2}:[0-9]{2}:[0-9]{2}.*/ {print $0} print nothing, while /[0-9]{2}:[0-9]{2}:[0-9]{2}.*/1 prints the line?
I expected the three expressions to mean the same thing since, as [1] describes, 1, {print}, and {print $0} perform the same action.
Here is another experiment that shows the same behavior:
# awk '{if(/[0-9]{2}:[0-9]{2}:[0-9]{2}/) print $0}' <<< "16:10:29"
(print nothing)
# awk '/[0-9]{2}:[0-9]{2}:[0-9]{2}/{print $0}' <<< "16:10:29"
(print nothing)
# awk '/[0-9]{2}:[0-9]{2}:[0-9]{2}/1' <<< "16:10:29"
16:10:29
Thank you very much.

After learning the OP's awk version (in the comments section), it looks like an OLD one, so [0-9]{2} is NOT supported in it (I believe so). Can you try the following command once?
awk '/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/' Input_file
This should work with your provided awk version.
With an old version of gawk you could also use --re-interval to make it work. I don't have that version with me, so I couldn't test it.
awk --re-interval '/[0-9]{2}:[0-9]{2}:[0-9]{2}/' Input_file
In older versions of gawk, interval expressions such as {2} are recognized only when --re-interval is passed, hence they do NOT work in plain awk code.
NOTE: In new versions of gawk --re-interval is no longer needed, because interval expressions are enabled by default. Since the OP has an old version of awk, I have mentioned it in the solution. Adding the cross-site reference link https://unix.stackexchange.com/questions/354553/awk-repetition-n-is-not-working as per the OP's comments too.
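For readers wondering why the /regex/1 form printed the line at all: in awk, /regex/1 is not a pattern followed by an action. It is one pattern expression that concatenates the match result (0 or 1) with the number 1, giving the string "01" or "11". Any non-empty string is true, so every line is printed whether or not the regex matches. A minimal sketch of that behavior, using made-up sample input and variable names:

```shell
# /zzz/ does not match "abc", so the match result is 0; concatenated with 1 it
# becomes the non-empty string "01", which is true, and the line is printed.
always=$(echo 'abc' | awk '/zzz/1')

# The same regex with an explicit action prints nothing, as expected.
never=$(echo 'abc' | awk '/zzz/ {print $0}')

# A spelling without interval expressions, which does not depend on {2} support.
portable=$(echo '16:10:29 DEBUG' | awk '/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/')

printf 'always=[%s] never=[%s] portable=[%s]\n' "$always" "$never" "$portable"
```

So the third command in the question was never a successful match; it simply printed everything.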

Related

print the last letter of each word to make a string using `awk` command

I have this line
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
I am trying to print the last letter of each word to make a string using the awk command
awk '{ print substr($1,6) substr($2,6) substr($3,6) substr($4,6) substr($5,6) substr($6,6) }'
In case I don't know how many characters a word contains, what is the correct command to print the last character of each column? And instead of repeating the substr command, how can I use it only once to print specific characters in different columns?
If you have just this one single line to handle you can use
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} END{print r}' file
If you have multiple lines in the input:
awk '{r=""; for (i=1;i<=NF;i++) r = r "" substr($i,length($i)); print r}' file
Details:
{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} - iterates over all fields in the current record; i is the field index, $i is the field value, and the last character of each field (retrieved with substr($i,length($i))) is appended to the r variable
END{print r} prints the r variable once awk script finishes processing.
In the second solution, r value is cleared upon each line processing start, and its value is printed after processing all fields in the current record.
See the online demo:
#!/bin/bash
s='UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS'
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($i))} END{print r}' <<< "$s"
Output:
GMUCHOS
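Since the question asks about words of unknown length, here is a small sketch of the per-line variant run on input where the words differ in length. The helper name last_chars and the second sample line are made up for illustration:

```shell
# Hypothetical helper: print the last character of every field, one result per input line.
last_chars() {
  awk '{ r = ""; for (i = 1; i <= NF; i++) r = r substr($i, length($i)); print r }'
}

# First line: equal-length words; second line: mixed lengths, including single letters.
out=$(printf '%s\n' 'UDACBG UYAZAM DJSUBU' 'SINGLE X LETTER Z' | last_chars)
printf '%s\n' "$out"
```

Because length($i) is evaluated for each field, the start position always points at that field's own last character.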
Using GNU awk and gensub:
$ gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' file
Output:
GMUCHOS
1st solution: With GNU awk you could try the following awk program, written and tested with the shown samples.
awk -v RS='.([[:space:]]+|$)' 'RT{gsub(/[[:space:]]+/,"",RT);val=val RT} END{print val}' Input_file
Explanation: Set the record separator to any character followed by whitespace OR the end of the value/line. Then, as per the OP's requirement, remove the unnecessary newlines/spaces from the fetched value and keep appending it to val, which collects the matched values of RS. Finally, when the awk program is done reading the whole Input_file, print the value of the variable.
2nd solution: Set the record separator to null and use the match function with the regex (.[[:space:]]+)|(.$) to get only the last letter with each match found. Keep adding the matched values to a variable and, at the end, print the variable's value in the END block of the awk program.
awk -v RS= '
{
while(match($0,/(.[[:space:]]+)|(.$)/)){
val=val substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}
END{
gsub(/[[:space:]]+/,"",val)
print val
}
' Input_file
Simple substitutions on individual lines is the job sed exists to do:
$ sed 's/[^ ]*\([^ ]\) */\1/g' file
GMUCHOS
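As a quick check that the sed substitution does not depend on word length, here is the same command on a made-up line that contains single-letter words. It assumes the words are separated by spaces only:

```shell
# [^ ]* swallows everything but the last non-space character, which \1 keeps.
single=$(printf '%s\n' 'SINGLE X LETTER Z' | sed 's/[^ ]*\([^ ]\) */\1/g')
printf '%s\n' "$single"
```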
using many tools
$ tr -s ' ' '\n' <file | rev | cut -c1 | paste -sd'\0'
GMUCHOS
Separate the words into lines, reverse each one so that the first character can be picked easily, and finally paste them back together without a delimiter. Not the shortest solution, but I think the most trivial one...
I would harness GNU AWK for this as follows, let file.txt content be
UDACBG UYAZAM DJSUBU WJKMBC NTCGCH DIDEVO RHWDAS
then
awk 'BEGIN{FPAT="[[:alpha:]]\\>";OFS=""}{$1=$1;print}' file.txt
output
GMUCHOS
Explanation: Inform AWK to treat any alphabetic character at the end of a word as a field and to use the empty string as the output field separator. $1=$1 is used to trigger rebuilding of the line with the specified OFS. If you want to know more about start/end of word, read GNU Regexp Operators.
(tested in gawk 4.2.1)
Another solution with GNU awk:
awk '{$0=gensub(/[^[:space:]]*([[:alpha:]])/, "\\1","g"); gsub(/\s/,"")} 1' file
GMUCHOS
gensub() gets here the characters and gsub() removes the spaces between them.
or using patsplit():
awk '(n = patsplit($0, a, /[[:alpha:]]\>/)) { for (i = 1; i <= n; i++) printf "%s", a[i]; print "" }' file
GMUCHOS
An alternate approach with GNU awk is to use FPAT to split by and keep the content:
gawk 'BEGIN{FPAT="\\S\\>"}
{ s=""
for (i=1; i<=NF; i++) s=s $i
print s
}' file
GMUCHOS
Or more tersely and idiomatic:
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' file
GMUCHOS
(Thanks Daweo for this)
You can also use gensub with:
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' file
GMUCHOS
The advantage here of both is that single letter "words" are handled properly:
s2='SINGLE X LETTER Z'
gawk 'BEGIN{FPAT="\\S\\>";OFS=""}{$1=$1}1' <<< "$s2"
EXRZ
gawk '{print gensub(/\S*(\S\>)\s*/,"\\1","g")}' <<< "$s2"
EXRZ
Whereas a loop that uses length($1) instead of length($i), and the first gensub answer, do not:
awk '{for (i=1;i<=NF;i++) r = r "" substr($i,length($1))} END{print r}' <<< "$s2"
ER # WRONG
gawk '{print gensub(/([^ ]+)([^ ])( |$)/,"\\2","g")}' <<< "$s2"
EX RZ # WRONG

Display the same word using the grep with/without regex command [duplicate]

I have:
1 LINUX param1 value1
2 LINUXparam2 value2
3 SOLARIS param3 value3
4 SOLARIS param4 value4
I need awk to print all lines in which $2 is LINUX.
In awk:
awk '$2 == "LINUX" { print $0 }' test.txt
See awk by Example for a good intro to awk.
In sed:
sed -n -e '/^[0-9][0-9]* LINUX/p' test.txt
See sed by Example for a good intro to sed.
This is a case in which you can use the beautiful idiomatic awk:
awk '$2=="LINUX"' file
That is:
The default action of awk when in a True condition is to print the current line.
Since $2 == "LINUX" is true whenever the 2nd field is LINUX, this will print those lines in which this happens.
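To make the default-action rule concrete, here is a small sketch that runs the pattern-only form and the explicit form on two lines taken from the question and compares the results. The variable names are just for illustration:

```shell
data='1 LINUX param1 value1
3 SOLARIS param3 value3'

# Pattern only: awk supplies the default action, which prints the record.
implicit=$(printf '%s\n' "$data" | awk '$2 == "LINUX"')

# The same rule with the action written out.
explicit=$(printf '%s\n' "$data" | awk '$2 == "LINUX" { print $0 }')

printf 'implicit: %s\nexplicit: %s\n' "$implicit" "$explicit"
```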
If you want to print all lines whose second field is LINUX regardless of upper or lower case, use toupper() to uppercase the field before comparing:
awk 'toupper($2)=="LINUX"' file
Or use IGNORECASE (GNU awk only) with either of these syntaxes:
awk 'BEGIN {IGNORECASE=1} $2=="LINUX"' file
awk -v IGNORECASE=1 '$2=="LINUX"' file
My answer is very late, but no one has mentioned:
awk '$2~/LINUX/' file
Try these out:
egrep -i '^\w+ LINUX ' myfile
awk '{IGNORECASE=1}{if ($2 == "LINUX") print}' myfile
sed -ne '/^[0-9]* [Ll][Ii][Nn][Uu][Xx] /p' myfile
edit: modified for case insensitivity
I think it might be a good idea to cover both the exact and the partial matching cases using awk.
So, for exact matching:
OTHER_SHELL_COMMAND | awk '$2 == "LINUX" { print $0 }'
And for partial matching:
OTHER_SHELL_COMMAND | awk '$2 ~ /LINUX/ { print $0 }'
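The difference shows up on the question's own second line, where the second field is LINUXparam2. A short sketch, with the sample data inlined instead of coming from OTHER_SHELL_COMMAND, and with an anchored regex added as a third option that behaves like the exact comparison:

```shell
data='1 LINUX param1 value1
2 LINUXparam2 value2
3 SOLARIS param3 value3'

# Exact comparison: only a field that is exactly LINUX.
exact=$(printf '%s\n' "$data" | awk '$2 == "LINUX"')

# Unanchored regex: any field that contains LINUX, so LINUXparam2 matches too.
partial=$(printf '%s\n' "$data" | awk '$2 ~ /LINUX/')

# Anchored regex: equivalent to the exact comparison here.
anchored=$(printf '%s\n' "$data" | awk '$2 ~ /^LINUX$/')

printf 'exact:\n%s\npartial:\n%s\nanchored:\n%s\n' "$exact" "$partial" "$anchored"
```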
In GNU sed case-insensitive matches can be made using the I modifier:
sed -n '/^[^[:space:]][[:space:]]\+linux[[:space:]]\+/Ip'
This robustly matches "linux", "Linux", "LINUX", "LiNuX" and others as the second field (after a first field consisting of a single non-whitespace character), surrounded by one or more whitespace characters (primarily space and tab; you can use [:blank:] instead to limit it strictly to those).

Why does echo 'hello world' | awk '/hello\s/ {print $0}' produce nothing?

Why does this 'awk' command produce nothing?
echo 'hello world' | awk '/hello\s/ {print $0}'
I suppose the pattern /hello\s/ should match any line that has 'hello' followed by a whitespace, right?
For info, I am using awk in a Mac OS. The awk version is 20070501.
This works on OS X:
echo 'hello world' | awk '/hello[[:space:]]/ {print $0}'
As mentioned in the gawk docs (paraphrasing):
Think of \s as shorthand for [[:space:]]. It is a gawk-specific operator, so other awk implementations may not recognize it.
You can also use [[:blank:]] to limit to space and tab only.
Having trouble finding some 'plain' awk docs. This seems legit, despite the name of the page.
echo 'hello world' | awk '/^hello / {print $0}'
This looks for every line that starts with "hello "
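If the script has to run on several awk implementations, a sketch like the following avoids \s entirely. The first form relies on POSIX character classes; the second spells out space and tab in a bracket expression, for awks that might lack the classes as well (which awks those are is an assumption to check locally):

```shell
# POSIX character class instead of the gawk-only \s.
posix_class=$(echo 'hello world' | awk '/hello[[:space:]]/ {print $0}')

# Explicit space-or-tab bracket expression.
bracket=$(echo 'hello world' | awk '/hello[ \t]/ {print $0}')

printf '%s\n' "$posix_class" "$bracket"
```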

Pipe awk's results to sed (deletion)

I am using an awk command (someawkcommand) that prints these lines (awkoutput):
>Genome1
ATGCAAAAG
CAATAA
and then, I want to use this output (awkoutput) as the input of a sed command. Something like that:
someawkcommand | sed 's/awkoutput//g' file1.txt > results.txt
file1.txt:
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The final objective is to delete all lines in a file (file1.txt) containing the exact pattern found previously by awk.
The file results.txt contains (output of sed):
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
How should I write the sed command? Is there any simple way that sed will recognize the output of awk as its input?
Using GNU awk for multi-char RS:
$ cat file1
>Genome1
ATGCAAAAG
CAATAA
$ cat file2
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
$ gawk -v RS='^$' -v ORS= 'NR==FNR{rmv=$0;next} {sub(rmv,"")} 1' file1 file2
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
Some things that might be non-obvious to newcomers but are very common awk idioms:
-v RS='^$' tells awk to read the whole file as one string (instead of its default of one line at a time).
-v ORS= sets the Output Record Separator to the null string (instead of its default newline) so that when the file is printed as a string awk doesn't add a newline after it.
NR==FNR is a condition that is only true for the first input file.
1 is a true condition invoking the default action of printing the current record.
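The NR==FNR idiom and the trailing pattern with a default action also work line by line in any POSIX awk. The sketch below is a simplified, line-based illustration with made-up file names and contents, not a replacement for the whole-file gawk command above:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'b' 'd' > "$tmp/remove.txt"
printf '%s\n' 'a' 'b' 'c' 'd' 'e' > "$tmp/all.txt"

# NR==FNR holds only while the first file is read: remember its lines and skip on.
# For the second file, the pattern !($0 in drop) prints lines that were not remembered.
# Caveat: if the first file is empty, NR==FNR is also true for the second file.
kept=$(awk 'NR == FNR { drop[$0] = 1; next } !($0 in drop)' "$tmp/remove.txt" "$tmp/all.txt")

printf '%s\n' "$kept"
rm -rf "$tmp"
```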
Here is a possible sed solution:
someawkcommand | sed -n 's_.*_/&/d;_;H;${x;s_\n__g p}' | sed -f - file1.txt
First sed command turns output from someawkcommand into a sed expression.
Concretely, it turns
>Genome1
ATGCAAAAG
CAATAA
into:
/>Genome1/d;/ATGCAAAAG/d;/CAATAA/d;
(in sed language: delete lines containing those patterns; mind that you will have to escape /,[,],*,^,$ in your awk output if there are some, with another substitution for instance).
Second sed command reads it as input expression (-f - reads sed commands from file -, i.e. gets it from pipe) and applies to file file1.txt.
Remark for other readers:
The OP wants to use sed but, as noted in the comments, it may not be the easiest way to solve this question. Deleting lines with awk could be simpler. Another (easy) solution could be to use grep with the -v (invert match) and -f (read patterns from a file) options, in this way:
someawkcommand | grep -v -f - file1.txt
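A hedged variation on the grep approach: adding -F (fixed strings) and -x (whole-line match) sidesteps the need to escape regex metacharacters in the awk output and avoids deleting lines that merely contain a pattern. In this sketch someawkcommand is a stand-in function that prints the three lines from the question, and the temporary file path is made up. Note that this is still line-based: any line identical to one of the awk output lines is removed wherever it occurs in the file.

```shell
tmp=$(mktemp -d)
printf '%s\n' '>Genome1' 'ATGCAAAAG' 'CAATAA' '>Genome2' 'ATGAAAAA' \
  'AAAAAAAA' 'CAA' '>Genome3' 'ACCC' > "$tmp/file1.txt"

# Stand-in for the real awk command from the question.
someawkcommand() { printf '%s\n' '>Genome1' 'ATGCAAAAG' 'CAATAA'; }

# -v invert, -F fixed strings, -x whole-line match, -f - read patterns from stdin.
result=$(someawkcommand | grep -v -F -x -f - "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```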
Edit: Following @rici's comments, here is a new command that takes the output from awk as a single multiline pattern.
Disclaimer: It gets dirty. Kids, don't try this at home. Grown-ups are strongly encouraged to consider avoiding sed for that.
someawkcommand | \
sed -n 'H;${x;s_\n__;s_\n_\\n_g;s_.*_H;${x;s/\\n//;s/&//g p}_ p}' | \
sed -n -f - file1.txt
Output from inner sed is:
H;${x;s/\n//;s/>Genome1\nATGCAAAAG\nCAATAA//g p}
Additional drawback: it will add an empty line in place of the removed pattern. Can't fix it easily (problems if the pattern is at the beginning/end of the file). Add a substitution to remove it if you really feel like it.
This can more easily be done in awk, but the usual "eliminate duplicates" code is not correct. As I understand the question, the goal is to remove entire stanzas from the file.
Here's a possible solution which assumes that the first awk script outputs a single stanza:
awk 'NR == FNR {stanza[nstanza++] = $0; next}
$0 == stanza[i] {++i; next}
/^>/ && i == nstanza {i=0; next}
i {for (j=0; j<i; ++j) print stanza[j]; i=0}
{print $0;}
' <(someawkcommand) file1.txt
This might work for you (GNU sed):
sed '1{h;s/.*/:a;$!{N;ba}/p;d};/^>/!{H;$!d};x;s/\n/\\n/g;s|.*|s/&\\n*//g|p;$s|.*|s/\\n*$//|p;x;h;d' file1 |
sed -f - file2
This builds a script from file1 and then runs it against file2.
The script slurps in file2 and then does global substitution(s) using the contents of file1. Finally it removes any blank lines at the end of the file caused by the deletion of the contents.
To see the script produced from file1, remove the pipe and the second sed command.
An alternative way would be to use diff and sed:
diff -e file2 file1 | sed 's/d/p/g' | sed -nf - file2

awk regex can't match ip addresses when trying to find repeating digits

I can't get the following to match any IP addresses
awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/{print $0}' maillog
or this one...
awk '/[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/' maillog
but this works...
awk '/127.0.0.1/{print $0}' maillog
and so does this...
awk '/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]/{print $0}' maillog
What am I doing wrong in the first two?
To use the interval {1,3} with GNU awk you may need to enable it with --re-interval like this:
awk --re-interval '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/{print $0}' maillog
They are just fine.
The following is working for me.
$ echo "2.168.1.1" | awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/{print $0}'
2.168.1.1
$ echo "2.1.1.1" | awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/{print $0}'
2.1.1.1
$ echo "22.1.1.1" | awk '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/{print $0}'
22.1.1.1
I would investigate your maillog and make sure that everything there is in plaintext.
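Since the two answers disagree, the likely explanation is a difference in awk versions. A small probe like the one below can tell whether the local awk treats {n} as an interval expression, and the second command shows a spelling that does not need intervals at all. The sample log line and variable names are made up:

```shell
# If intervals are unsupported, a{2} is taken literally and "aa" does not match.
if [ -n "$(echo 'aa' | awk '/^a{2}$/')" ]; then
  interval_support=yes
else
  interval_support=no
fi

# Fallback without intervals: one or more digits in each of the four groups.
match=$(echo 'connect from 10.0.0.1 port 25' | awk '/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/')

printf 'interval_support=%s\n%s\n' "$interval_support" "$match"
```

Neither pattern validates that each group is in the range 0 to 255; both only look for something shaped like a dotted quad.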