Grepping only certain lines of files - regex

I have a collection of files in a directory which I would like to search for a particular regular expression (=([14-9]|[23][0-9]), as it happens). But I only care when this pattern falls on the second, sixth, tenth, ..., 4n+2-th line.
Is there a good way to do this?

A modification to the answer below, without using an extra grep:
awk '/=([14-9]|[23][0-9])/ && FNR % 4 == 2 {print FNR":"$0}' inputFile

You should pass it through awk first to get rid of the unwanted lines (and optionally put on line numbers so that you can still tell what the real lines are):
pax> echo 'L1
...> L2
...> L3
...> L4
...> L5
...> L6
...> L7
...> L8
...> L9
...> L10
...> L11
...> L12' | awk '{if ((FNR % 4)==2) {print FNR":"$0}}'
2:L2
6:L6
10:L10
(just use '{if ((FNR % 4)==2) {print}}' if you don't care about the line numbers). So something like:
awk '{if ((FNR % 4)==2) {print FNR":"$0}}' inputFile | grep -E '=([14-9]|[23][0-9])'
should do the trick.
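If GNU sed is available, its first~step addressing can take the place of the awk stage (a sketch, not part of the original answer; note that you lose the line numbers this way):
sed -n '2~4p' inputFile | grep -E '=([14-9]|[23][0-9])'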

Try to do this with awk. Something like:
BEGIN { n = 2 }                                  # next line of interest: 2, 6, 10, ...
FNR == n { n += 4; if ($0 ~ /yourregex/) print }
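If you save that script as, say, every4th.awk (a name chosen purely for illustration), you would run it as:
awk -f every4th.awk inputFile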

Related

The IF statement is not working when working with a file and an external variable

I have 2 files,
file1:
YARRA2
file2:
59204.9493055556
59205.5930555556
So, file1 has 1 line and file2 has 2 lines. If file1 has 1 line, and file2 has more than 1 line, I want to repeat the lines in file1 according to the number of lines in file2.
So, my code is this:
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
but the output is
YARRA2
Any idea? I have also tried with
awk '{ if( NR<2 && $eprows>1 ) {print} {print}}' file1
but it is the same
You may use this awk solution:
awk '
NR == FNR {          # first pass: file2, just count its lines
    ++n2
    next
}
{                    # second pass: file1, print and remember each line
    s = $0
    print
    ++n1
}
END {
    if (n1 == 1)     # file1 had a single line: repeat it up to n2 times
        for (n1 = 2; n1 <= n2; ++n1)
            print s
}' file2 file1
YARRA2
YARRA2
eprows=$(wc -l < file2)
awk '{ if( NR<2 && eprows>1 ) {print} {print}}' file1
Oops! You stepped hip-deep in mixed languages.
The eprows variable is a shell variable. It's not accessible to other processes except through the environment, unless explicitly passed somehow. The awk program is inside single quotes, so the shell wouldn't expand eprows there even if you had referenced it correctly.
The value of a shell variable is obtained with $, so
echo $eprows
2
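(As an aside, the environment route mentioned above would look like this; ENVIRON is a standard awk array, and the variable has to be exported first:)
export eprows
awk '{ if( NR<2 && ENVIRON["eprows"]>1 ) {print} {print}}' file1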
One way to insert the value into your awk script is by interpolation:
awk '{ if( NR<2 && '"$eprows"'>1 ) {print} {print}}' file1
That uses a lesser-known trick: you can switch between single and double quotes as long as you don't introduce spaces between them. Because double-quoted strings in the shell are interpolated, awk sees
{ if( NR<2 && 2>1 ) {print} {print} }
Awk also lets you pass values to awk variables on the command line, thus:
awk -v eprows=$eprows '{ if( NR<2 && eprows >1 ) {print} {print}}' file1
but you'd have nicer awk this way:
awk -v eprows=$eprows 'NR < 2 && eprows > 1 { {print} {print} }' file1
whitespace and brevity being elixirs of clarity.
That works because in the awk pattern / action paradigm, pattern is anything that can be reduced to true/false. It need not be a regex, although it usually is.
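For instance, a bare expression is a perfectly good pattern on its own; a trivial illustration (not tied to the files above):
awk 'NR % 2 == 0' somefile        # the expression selects even-numbered lines; the default action prints them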
One awk idea:
awk '
FNR==NR { cnt++; next }            # count number of records in 1st file
# no specific processing for 2nd file => just scan through to end of file
END {
    if (FNR==1 && cnt >= 2)        # if 2nd file has just 1 record (ie, FNR==1) and 1st file had 2+ records then ...
        for (i=1; i<=cnt; i++)     # for each record in 1st file ...
            print                  # print current (and only) record from 2nd file
}
' file2 file1
This generates:
YARRA2
YARRA2

Removing multiple delimiters between outside delimiters on each line

Using awk or sed in a bash script, I need to remove the comma delimiters that fall between the outer delimiters on each line. The problem is that the wrong values end up in the wrong columns, when only 3 columns are desired.
For example, I want to turn this:
2020/11/04,Test Account,569.00
2020/11/05,Test,Account,250.00
2020/11/05,More,Test,Accounts,225.00
Into this:
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
I've tried a few things, testing regexes, but I cannot find a way to select only the commas that need to be removed.
awk -F, '{ printf "%s,",$1;for (i=2;i<=NF-2;i++) { printf "%s ",$i };printf "%s,%s\n",$(NF-1),$NF }' file
Using awk, print the first comma-delimited field, then loop through the remaining fields up to the last but two, printing each followed by a space. Finally print the last-but-one field, a comma, and the last field. An equivalent spelled-out version follows.
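Spelled out as an equivalent multi-line script (same logic, just formatted; the input file is assumed to be named file as above):
awk -F, '{
    out = $1 ","                       # date plus the first comma
    for (i = 2; i <= NF-2; i++)        # all middle fields except the last-but-one
        out = out $i " "
    out = out $(NF-1) "," $NF          # last-but-one field, comma, last field
    print out
}' file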
With GNU awk for the 3rd arg to match():
$ awk -v OFS=, '{
    match($0,/([^,]*),(.*),([^,]*)/,a)
    gsub(/,/," ",a[2])
    print a[1], a[2], a[3]
}' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
or with any awk:
$ awk '
BEGIN { FS=OFS="," }
{
    n = split($0,a)
    gsub(/^[^,]*,|,[^,]*$/,"")
    gsub(/,/," ")
    print a[1], $0, a[n]
}
' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
Use this Perl one-liner:
perl -F',' -lane 'print join ",", $F[0], "@F[1 .. ($#F-1)]", $F[-1];' in.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
$F[0] : first element of the array @F (= first comma-delimited value).
$F[-1] : last element of @F.
@F[1 .. ($#F-1)] : elements of @F between the second from the start and the second from the end, inclusive.
"@F[1 .. ($#F-1)]" : the above elements, joined on blanks into a string.
join ",", ... : join the LIST "..." on a comma, and return the resulting string.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
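Run against the sample input above (assumed to be saved as in.csv), the one-liner prints the desired three columns:
$ perl -F',' -lane 'print join ",", $F[0], "@F[1 .. ($#F-1)]", $F[-1];' in.csv
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00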
perl -pe 's{,\K.*(?=,)}{$& =~ y/,/ /r}e' file
sed -e ':a' -e 's/\(,[^,]*\),\([^,]*,\)/\1 \2/; t a' file
awk '{$1=$1","; $NF=","$NF; gsub(/ *, */,","); print}' FS=, file
awk '{for (i=2; i<=NF; ++i) $i=(i>2 && i<NF ? " " : ",") $i} 1' FS=, OFS= file
awk doesn't support lookarounds, but we can get the same effect with awk's match() function; using that, could you please try the following, written and tested with the shown samples in GNU awk.
awk '
match($0,/,.*,/){
    val=substr($0,RSTART+1,RLENGTH-2)     # everything between the first and last comma
    gsub(/,/," ",val)                     # replace its inner commas with spaces
    print substr($0,1,RSTART) val substr($0,RSTART+RLENGTH-1)
}
' Input_file
Yet another perl
$ perl -pe 's/(?:^[^,]*,|,[^,]*$)(*SKIP)(*F)|,/ /g' ip.txt
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00
(?:^[^,]*,|,[^,]*$) matches the first/last field along with its comma
(*SKIP)(*F) prevents the text matched by the preceding pattern from being modified
|, provides , as the alternate pattern that is actually replaced
With sed (assuming \n is supported by the implementation; otherwise you'll have to find a character that cannot be present in the input):
sed -E 's/,/\n/; s/,([^,]*)$/\n\1/; y/,/ /; y/\n/,/'
s/,/\n/; s/,([^,]*)$/\n\1/ replace first and last comma with newline character
y/,/ / replace all comma with space
y/\n/,/ change newlines back to comma
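Applied to the sample data (GNU sed assumed, with the input in a file named file for illustration):
$ sed -E 's/,/\n/; s/,([^,]*)$/\n\1/; y/,/ /; y/\n/,/' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00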
A similar answer to Timur's, in awk
awk '
BEGIN { FS = OFS = "," }
function join(start, stop, sep,    str, i) {
    str = $start
    for (i = start + 1; i <= stop; i++) {
        str = str sep $i
    }
    return str
}
{ print $1, join(2, NF-1, " "), $NF }
' file.csv
It's a shame awk doesn't ship with a built-in join function.
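(Strictly speaking, gawk does bundle a join() helper in its awklib, loadable with @include "join" or -i join, though whether that resolves out of the box depends on the installation's AWKPATH; a sketch under that assumption:)
gawk -i join '
BEGIN { FS = "," }
{ n = split($0, a, FS); print a[1] "," join(a, 2, n-1, " ") "," a[n] }   # join(array, start, end, sep)
' file.csv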

How to reverse all the words in a file with bash in Ubuntu?

I would like to reverse the complete text from the file.
Say if the file contains:
com.e.h/float
I want to get output as:
float/h.e.com
I have tried the command:
rev file.txt
but that reverses every character: taolf/h.e.moc
Is there a way I can get the desired output? Do let me know. Thank you.
Here is the link to the sample file: Sample Text
You can use sed and tac:
str=$(echo 'com.e.h/float' | sed -E 's/(\W+)/\n\1\n/g' | tac | tr -d '\n')
echo "$str"
float/h.e.com
Using sed we insert \n before and after all non-word characters.
Using tac we reverse the output lines.
Using tr we strip all new lines.
If you have gnu-awk then you can do all this in a single awk command, using the 4-argument split() call that populates the split strings and the delimiters into separate arrays:
awk '{
    s = ""
    split($0, arr, /\W+/, seps)
    for (i=length(arr); i>=1; i--)
        s = s seps[i] arr[i]
    print s
}' file
For non-gnu awk, you can use:
awk '{
    r = $0
    i = 0
    while (match(r, /[^a-zA-Z0-9_]+/)) {
        a[++i] = substr(r, RSTART, RLENGTH) substr(r, 1, RSTART-1)   # delimiter + the word before it
        r = substr(r, RSTART+RLENGTH)
    }
    s = r
    for (j=i; j>=1; j--)
        s = s a[j]
    print s
}' file
Is it possible to use Perl?
perl -nlE 'say reverse(split("([/.])",$_))' f
This one-liner reverses all the lines of f, according to the OP's criteria.
If you prefer a version with fewer parentheses:
perl -nlE 'say reverse split "([/.])"' f
For portability, this can be done using any awk (not just GNU) using substrings:
$ awk '{
    s = ""                                # reset for each input line
    while (match($0,/[[:alnum:]]+/)) {
        s = substr($0,RLENGTH+1,1) substr($0,1,RLENGTH) s
        $0 = substr($0,RLENGTH+2)
    }
    print s
}' <<<"com.e.h/float"
This steps through the string grabbing alphanumeric strings plus the following character, reversing the order of those two captured pieces, and prepending them to an output string.
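Tracing it on com.e.h/float, the output string s grows like this after each pass through the loop:
.com
.e.com
/h.e.com
float/h.e.com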
Using GNU awk's split(), splitting on the separators . and /; define more if you wish.
$ cat program.awk
{
    for(n=split($0,a,"[./]",s); n>=1; n--)         # split to a and s, use n from split
        printf "%s%s", a[n], (n==1?ORS:s[(n-1)])   # printf it pretty
}
Run it:
$ echo com.e.h/float | awk -f program.awk
float/h.e.com
EDIT:
If you want to run it as one-liner:
awk '{for(n=split($0,a,"[./]",s); n>=1; n--) printf "%s%s", a[n], (n==1?ORS:s[(n-1)])}' foo.txt

Printing the actual field delimiter value not the regular expression

Given the following input:
check1;check2
check1;;check2
check1,check2
and the awk command:
awk -F';+|,' '{print $1 FS $2}'
Shouldn't FS contain the selected delimiter?
How can you print the delimiter that was actually matched, i.e. either ;, ;; or ,, and not the regular expression that describes the delimiters?
If the input is check1;check2 then the output should be check1;check2.
If you're using GNU Awk (gawk) you can use the 4th argument of split():
gawk '{split($0, a, /;+|,/, seps); print a[1] seps[1] a[2]}' file
Output:
check1;check2
check1;;check2
check1,check2
Using it within a loop is also easy to handle:
gawk '{nf = split($0, a, /;+|,/, seps); for (i = 1; i <= nf; ++i) printf "%s%s", a[i], seps[i]; print ""}' file
22011,25029;;3331,25275
6740,16516;;27292,1217
13480,31488;;7947,18804
328,30623;;12470,6883
If you only need the fields, you only have to touch a. The separators are collected in seps, and its indices line up with a (seps[i] is the separator that followed a[i]).
I don't think awk stores the matched delimiter anywhere. If you use GNU awk, you can do it yourself:
gawk '{match($0, /([^;,]*)(;+|,)(.*)/, a); print a[1], a[2], a[3]}'
GNU awk has this feature for records rather than fields, so you could also do something like this:
$ awk '{printf "%s%s",$0,RT}' RS=';+|,|\n' file
check1;check2
check1;;check2
check1,check2
Where RT is the value matched by RS for the given record, which you can see by:
$ awk '{printf "%s",RT}' RS=';+|,|\n' file
;
;;
,
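Because RT holds exactly what RS matched, you can transform each field and still reassemble the line with its original delimiters; a small illustration (GNU awk, same input file):
$ awk '{printf "%s%s", toupper($0), RT}' RS=';+|,|\n' file
CHECK1;CHECK2
CHECK1;;CHECK2
CHECK1,CHECK2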

awk regular expression print every Nth occurrence

I would like to understand awk a little better: I often search for regular expressions, and many times I am interested only in every Nth occurrence. I have always done this task with pipes, say:
awk '/regex/' file | awk 'NR%N==0'
How can I do the same task with awk (or perl) without piping?
Are there some instances in which using pipes is the most computationally efficient solution?
Every third:
awk '/line/ && !(++c%3)' infile
For example:
zsh-4.3.12[t]% cat infile
1line
2line
3line
4line
5line
6line
7line
8line
9line
10line
zsh-4.3.12[t]% awk '/line/ && !(++c%3)' infile
3line
6line
9line
zsh-4.3.12[t]% awk '/line/ && !(++c%2)' infile
2line
4line
6line
8line
10line
Just count the occurrences and print every Nth one:
BEGIN { n=0 }
/myregex/ { n++; if(n==3) { n=0; print } }
You can use multiple conditions, e.g.:
awk -v N=10 '/regex/ { count++ } count == N { count=0; print $0 }'
awk '/regex/ { c=(c+1)%N; if(c==0) print}' N=3
try this:
awk '/yourRegex/{i++} i==N{print; exit;}' yourFile
this will print only the Nth match
Oh, if you need every Nth
how about:
awk '/yourRegex/{i++} (!(i%N) && i){print; i=0}' yourFile
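For the Perl half of the original question, a minimal sketch (the -s switch lets you pass N from the command line; /regex/ and file are placeholders):
perl -sne 'print if /regex/ && !(++$c % $N)' -- -N=3 file
Here $c counts matches and a line is printed whenever the match count is a multiple of N.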