I would like to understand awk a little better: I often search for regular expressions, and many times I am interested only in the Nth occurrence. I have always done this task using pipes, say:
awk '/regex/' file | awk 'NR%N==0'
How can I do the same task with awk (or perl) without piping?
Are there some instances in which using pipes is the most computationally efficient solution?
Every third:
awk '/line/ && !(++c%3)' infile
For example:
zsh-4.3.12[t]% cat infile
1line
2line
3line
4line
5line
6line
7line
8line
9line
10line
zsh-4.3.12[t]% awk '/line/ && !(++c%3)' infile
3line
6line
9line
zsh-4.3.12[t]% awk '/line/ && !(++c%2)' infile
2line
4line
6line
8line
10line
Just count the occurrences and print every Nth:
BEGIN { n=0 }
/myregex/ { n++; if(n==3) { n=0; print } }
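Saved as, say, nth.awk (a name chosen here just for illustration), it runs as:
awk -f nth.awk file
(The BEGIN block is optional: awk initializes variables to zero anyway.)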
You can use multiple pattern-action pairs. The first example below prints only the Nth match (resetting N to 0 stops any further printing); the second prints every Nth match:
awk -v N=10 '/regex/ { count++ } count == N { N=0; print $0 }'
awk '/regex/ { c=(c+1)%N; if(c==0) print}' N=3
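For example, with the infile from the earlier answer:
$ awk -v N=3 '/line/ { count++ } count == N { N=0; print $0 }' infile
3line
$ awk '/line/ { c=(c+1)%N; if(c==0) print}' N=3 infile
3line
6line
9line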
Try this:
awk '/yourRegex/{i++} i==N{print; exit;}' yourFile
This will print only the Nth match (set N on the command line, e.g. with -v N=3).
Oh, if you need every Nth, how about:
awk '/yourRegex/{i++} (!(i%N) && i){print; i=0}' yourFile
I have 2 files,
file1:
YARRA2
file2:
59204.9493055556
59205.5930555556
So, file1 has 1 line and file2 has 2 lines. If file1 has 1 line, and file2 has more than 1 line, I want to repeat the lines in file1 according to the number of lines in file2.
So, my code is this:
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
but the output is
YARRA2
Any idea? I have also tried with
awk '{ if( NR<2 && $eprows>1 ) {print} {print}}' file1
but it is the same
You may use this awk solution:
awk '
NR == FNR {
++n2
next
}
{
s = $0
print;
++n1
}
END {
if (n1 == 1)
for (n1=2; n1 <= n2; ++n1)
print s
}' file2 file1
YARRA2
YARRA2
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
Oops! You stepped hip-deep in mixed languages.
The eprows variable is a shell variable. It's not accessible to other processes except through the environment, unless explicitly passed somehow. The awk program is inside single-quotes, which would prevent interpreting eprows even if used correctly.
The value of a shell variable is obtained with $, so
echo $eprows
2
One way to insert the value into your awk script is by interpolation:
awk '{ if( NR<2 && '"$eprows"'>1 ) {print} {print}}' file1
That uses a lesser-known trick: you can switch between single and double quotes as long as you don't introduce spaces between the quoted parts. Because double-quoted strings in the shell are interpolated, awk sees
{ if( NR<2 && 2>1 ) {print} {print} }
Awk also lets you pass values to awk variables on the command line, thus:
awk -v eprows=$eprows '{ if( NR<2 && eprows >1 ) {print} {print}}' file1
but you'd have nicer awk this way:
awk -v eprows=$eprows 'NR < 2 && eprows > 1 { {print} {print} }' file1
whitespace and brevity being elixirs of clarity.
That works because in the awk pattern / action paradigm, pattern is anything that can be reduced to true/false. It need not be a regex, although it usually is.
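For instance, these illustrative one-liners use plain comparisons as patterns (the default action being print):
awk 'NR % 2 == 0' file    # arithmetic: every even-numbered record
awk 'NF > 3' file         # comparison: records with more than 3 fields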
One awk idea:
awk '
FNR==NR { cnt++; next } # count number of records in 1st file
# no specific processing for 2nd file => just scan through to end of file
END { if (FNR==1 && cnt >=2) # if 2nd file has just 1 record (ie, FNR==1) and 1st file had 2+ records then ...
for (i=1;i<=cnt;i++) # for each record in 1st file ...
print # print current (and only) record from 2nd file
}
' file2 file1
This generates:
YARRA2
YARRA2
I managed to extract the following response and comma-separate it. It's a comma-separated string, and I'm only interested in the ACCOUNT_ID values. How do you pattern match using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input
I tried the below, but it joins all the numbers together; I would like to preserve the comma separators:
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf sep $i;sep=","}}'
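For example, with the $input from the question:
$ echo "$input" | awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf sep $i;sep=","}}'
711111111119,111111111115
(No trailing newline is printed; add an END{print ""} block if you need one.)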
If you really want sed, you can go for
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
$ awk '
BEGIN {
FS = OFS = ","
}
{
c = 0
for (i = 1; i <= NF; i++) {
if ($i == "ACCOUNT_ID") {
printf "%s%s", (c++ ? OFS : ""), $(i + 1)
}
}
print ""
}' file
711111111119,111111111115
The file I'm searching (fruit.txt) looks something like the snippet below; the data appears in random order that I cannot control.
....fruit=apple,...qty=3,...condition=bad,....
...qty=4,...condition=great,...fruit=orange,...
...condition=ok,...qty=2,...fruit=banana,...
My Grep command is: grep -Eo 'fruit.[^,]*'\|'qty.[^,]*'\|'condition.[^,]*' fruit.txt
This results in output like:
fruit=apple
qty=3
condition=bad
qty=4
condition=great
fruit=orange
condition=ok
qty=2
fruit=banana
Which is correct; however, I'm looking for the output to be ordered as I specified in the grep command, i.e. exactly like the below:
fruit=apple
qty=3
condition=bad
fruit=orange
qty=4
condition=great
fruit=banana
qty=2
condition=ok
A solution with gawk:
First, I added some extra ',' characters to the input:
....,fruit=apple,...,qty=3,...,condition=bad,....
...,qty=4,...,condition=great,...,fruit=orange,...
...,condition=ok,...,qty=2,...,fruit=banana,...
Then I wrote this awk script (fruit.awk):
{ fruit ="";
qty="";
condition="";
for (i = 1;i <= NF; i++){
delete a;
split($i,a,"=");
if (a[1]=="fruit" ) { fruit=a[2]; }
if (a[1]=="qty") { qty=a[2] }
if (a[1]=="condition") { condition=a[2] }
}
}
{ print "fruit=" fruit;
print "qty=" qty;
print "condition=" condition;
}
Output of gawk -F , -f fruit.awk fruit.txt:
fruit=apple
qty=3
condition=bad
fruit=orange
qty=4
condition=great
fruit=banana
qty=2
condition=ok
Using sed in several steps:
sed -E 's/^/,/;
s/(.*),(condition[^,]*)/\2\r,\1/;
s/(.*),(qty=[^,]*)/\2,\1/;
s/(.*),(fruit=[^,]*)/\2,\1/;
s/\r.*//;
s/,/\n/g' input.txt
I start by inserting a , at the beginning of the line, so the pattern also matches when the interesting data appears in the first field.
After condition I add a \r, so I can remove the garbage once the fruit has been found.
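To illustrate on the first input line (carriage return shown as <CR>), after the three move substitutions the line looks roughly like:
fruit=apple,qty=3,condition=bad<CR>,,....,...,...,....
s/\r.*// then trims it to fruit=apple,qty=3,condition=bad, and s/,/\n/g splits that onto three lines.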
I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but that reverses the whole line: taolf/h.e.moc
Is there a way I can get the desired output? Do let me know. Thank you.
Here is the link to the sample file: Sample Text
You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed we insert \n before and after all non-word characters.
Using tac we reverse the output lines.
Using tr we strip all new lines.
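To see the intermediate stages:
$ echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g'
com
.
e
.
h
/
float
$ echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac
float
/
h
.
e
.
com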
If you have gnu-awk, then you can do all this in a single awk command, using the 4-argument split() call that populates the split strings and the delimiters in separate arrays:
awk '{
s = ""
split($0, arr, /\W+/, seps)
for (i=length(arr); i>=1; i--)
s = s seps[i] arr[i]
print s
}' file
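For example, as a one-liner:
$ echo 'com.e.h/float' | awk '{s=""; split($0,arr,/\W+/,seps); for(i=length(arr); i>=1; i--) s=s seps[i] arr[i]; print s}'
float/h.e.com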
For non-gnu awk, you can use:
awk '{
r = $0
i = 0
while (match(r, /[^a-zA-Z0-9_]+/)) {
a[++i] = substr(r, RSTART, RLENGTH) substr(r, 1, RSTART-1)
r = substr(r, RSTART+RLENGTH)
}
s = r
for (j=i; j>=1; j--)
s = s a[j]
print s
}' file
Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses each line of f, according to the OP's criteria.
If you prefer a version with fewer parentheses:
perl -nlE 'say reverse split "([/.])"' f
For portability, this can be done with any awk (not just GNU) using substrings:
$ awk '{
while (match($0,/[[:alnum:]]+/)) {
s=substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s;
$0=substr($0,RLENGTH+2)
} print s
}' <<<"com.e.h/float"
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.
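With the sample input, it prints:
float/h.e.com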
Using GNU awk's split, splitting on the separators . and /; define more if you wish.
$ cat program.awk
{
for(n=split($0,a,"[./]",s); n>=1; n--) # split to a and s, use n from split
printf "%s%s", a[n], (n==1?ORS:s[(n-1)]) # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as a one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--) printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt
Is there a nice bash one-liner to map strings inside a file to a unique number?
For instance,
a
a
b
b
c
c
should be converted into
1
1
2
2
3
3
I am currently implementing it in C++ but a bash one-liner would be great.
awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'
This maintains an associative array called ids. Each time it finds a new string it assigns it a monotonically increasing id ++i.
Example:
jkugelman$ echo $'a\nb\nc\na\nb\nc' | awk '{if (!($0 in ids)) ids[$0] = ++i; print ids[$0]}'
1
2
3
1
2
3
The awk solutions here are fine, but here's the same approach in pure bash (>= 4):
declare -A stringmap
counter=0
while IFS= read -r string; do
    if [[ -z ${stringmap[$string]} ]]; then
        let counter+=1
        stringmap[$string]=$counter
    fi
done < INPUTFILE
for string in "${!stringmap[@]}"; do
printf "%d -> %s\n" "${stringmap[$string]}" "$string"
done
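That prints the final map. If you want one id per input line, as in the question, a minimal variant of the same loop (keeping the INPUTFILE placeholder) would be:
declare -A stringmap
counter=0
while IFS= read -r string; do
    if [[ -z ${stringmap[$string]} ]]; then
        let counter+=1
        stringmap[$string]=$counter
    fi
    printf '%s\n' "${stringmap[$string]}"
done < INPUTFILE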
awk 'BEGIN { num = 0; }
{
if ($0 in seen) {
print seen[$0];
} else {
seen[$0] = ++num;
print num;
}
}' [file]
(Not exactly one line, of course.)
A slight modification without the if:
awk '!($0 in ids){ids[$0]=++i}{print ids[$0]}' file
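For example, with the sample from the question:
$ echo $'a\na\nb\nb\nc\nc' | awk '!($0 in ids){ids[$0]=++i}{print ids[$0]}'
1
1
2
2
3
3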