Suppress the match itself in grep - regex

Suppose I have lots of files of the form
First Line Name
Second Line Surname Address
Third Line etc
etc
Now I'm using grep to match the first line, but I'm actually doing this to find the second line. The second line is not a pattern that can be matched (it just depends on the first line). My regex pattern works, and the command I'm using is
grep -rHIin pattern . -A 1 -m 1
Now the -A option prints the line after a match. The -m option stops after 1 match (because there are other lines that match my pattern, but I'm only interested in the first match anyway...).
This actually works, but the output looks like this:
./example_file:1: First Line Name
./example_file-2- Second Line Surname Address
I've read the manual but couldn't find any clue or info about that. Now here is the question:
How can I suppress the match itself? The output should be in the form of:
./example_file-2- Second Line Surname Address

sed to the rescue:
sed -n '2,${p;n;}'
The particular sed command here starts with line 2 of its input and prints every other line. Pipe the output of grep into that and you'll only get the even-numbered lines out of the grep output.
An explanation of the sed command itself:
2,$ - the range of lines from line 2 to the last line of the file
{p;n;} - print the current line, then ignore the next line (this then gets repeated)
(In this special case of printing all even lines, an alternative way of writing this would be sed -n 'n;p' since we don't actually need to special-case any leading lines. If you wanted to skip the first 5 lines of the file, though, that shortcut wouldn't work; you'd have to use the 6,$ syntax.)
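To see the whole pipeline at work on fabricated grep output (the file name and text are made up to mirror the question):

```shell
# Two lines as grep -A 1 would print them (match, then context),
# then keep only the second line of each pair:
printf '%s\n' './example_file:1: First Line Name' \
              './example_file-2- Second Line Surname Address' \
  | sed -n '2,${p;n;}'
# only the "-2-" context line is printed
```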

You can use sed to print the line after each match:
sed -n '/<pattern>/{n;p}' <file>
To get recursion and the file names, you will need something like:
find . -type f -exec sed -n '/<pattern>/{n;s|^|{}:|;p}' {} \;

If you have already read a book on grep, you could also read a manual on awk, another common Unix tool.
In awk, your task can be solved with nice, simple code. (As for me, I always have to refresh my knowledge of awk's syntax by going to the manual (info awk) when I want to use it.)
Or, you could come up with a solution combining find (to iterate over your files), grep (to select the lines) and head/tail (to discard, for each individual file, the lines you don't want). The complication with find is being able to work with each file individually, discarding a line per file.
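As a rough sketch of the awk idea (the pattern First Line, the output format, and the per-file handling via find are all assumptions based on the example above): awk can remember that the previous line matched and print the current one, stopping at the first match per file.

```shell
# For each file, print "name: line-after-first-match" and stop:
find . -type f -exec awk '
  f { print FILENAME ": " $0; exit }   # previous line matched: print this one, stop
  /First Line/ { f = 1 }               # arm the flag on a matching line
' {} \;
```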

You could pipe the results through grep -v pattern
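For instance, on fabricated output (note the -i on the second grep, since the original search was case-insensitive):

```shell
printf '%s\n' './example_file:1: First Line Name' \
              './example_file-2- Second Line Surname Address' \
  | grep -vi 'first line'
# the match line is filtered out; the context line remains
```

Be aware this also drops any context line that happens to contain the pattern itself.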

Related

regex command line with single-line flag

I need to use a regex in a bash script to substitute text in a file that might span multiple lines.
In other regex engines that I know, I would pass s as a flag, but I am having a hard time doing this in bash.
As far as I know, sed doesn't support this feature.
perl obviously does, but I cannot make it work in a one-liner:
perl -i -pe 's/<match.+match>//s' $file
example text:
DONT_MATCH
<match some text here
and here
match>
DONT_MATCH
By default, . doesn't match a line feed; the s flag simply makes . match any character.
You are reading the file a line at a time, so you can't possibly match something that spans multiple lines. Use -0777 to treat the entire input as one line.
perl -i -0777pe's/<match.+match>//s' "$file"
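Run against the example text (piping here instead of editing in place with -i):

```shell
printf 'DONT_MATCH\n<match some text here\nand here\nmatch>\nDONT_MATCH\n' \
  | perl -0777 -pe 's/<match.+match>//s'
# both DONT_MATCH lines survive; the <match ... match> block is gone
```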
This might work for you (GNU sed):
sed '/^<match/{:a;/match>$/!{N;ba};s/.*//}' file
Gather up a collection of lines, from one beginning <match to one ending match>, and replace them with nothing.
N.B. This will act on all such collections throughout the file, and the end-of-file condition will not affect the outcome. To act on the first only, use:
sed '/^<match/{:a;/match>$/!{N;ba};s/.*//;:b;n;bb}' file
To only act on the second such collection use:
sed -E '/^<match/{:a;/match>$/!{N;ba};x;s/^/x/;/^(x{2})$/{x;s/.*//;x};x}' file
The regex /^(x{2})$/ can be tailored to do more intricate matching e.g. /^(x|x{3,6})$/ would match the first and third to sixth collections.
With GNU sed:
$ sed -z 's/<match.*match>//g' file
DONT_MATCH
DONT_MATCH
With any sed:
$ sed 'H;1h;$!d;x; s/<match.*match>//g' file
DONT_MATCH
DONT_MATCH
Both the above approaches read the whole file into memory. If you have a big file (e.g. gigabytes), you might want a different approach.
Details
With GNU sed, the -z option reads in files with NUL as the record separator. For text files, which never contain NUL, this has the effect of reading the whole file in.
For ordinary sed, the whole file can be read in with the following steps:
H - append the current line to the hold space
1h - if this is the first line, overwrite the hold space with it
$!d - if this is not the last line, delete the pattern space and jump to the next line
x - exchange hold and pattern space to put the whole file in the pattern space
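A quick check of the portable variant on the example text:

```shell
# Slurp the whole file into the pattern space, then delete the
# <match ... match> block in one substitution:
printf 'DONT_MATCH\n<match some text\nand here\nmatch>\nDONT_MATCH\n' \
  | sed 'H;1h;$!d;x; s/<match.*match>//g'
```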

"partial grep" to accelerate grep speed?

This is what I am thinking: the grep program tries to match every occurrence of the pattern in the line, just like:
echo "abc abc abc" | grep abc --color
the result is that all three abc are colored red, so grep did a full pattern match on the line.
But consider this scenario: I have many big files to process, and the words I am interested in are very likely to occur among the first few words of a line. My job is to find the lines without those words in them. So if grep could continue to the next line as soon as the words were found, without having to check the rest of the line, it would maybe be significantly faster.
Is there maybe a partial-match option in grep to do this?
like:
echo abc abc abc | grep --partial abc --color
with only the first abc colored red.
See this nice introduction to grep internals:
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
In particular:
GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines
would slow grep down by a factor of several times, because to find the
newlines it would have to look at every byte!
So instead of using line-oriented input, GNU grep reads raw data into
a large buffer, searches the buffer using Boyer-Moore, and only when
it finds a match does it go and look for the bounding newlines.
(Certain command line options like -n disable this optimization.)
So the answer is: No. It is way faster for grep to look for the next occurrence of the search string, rather than to look for a new line.
Edit: Regarding the speculation in the comments that color=never would do the trick: I had a quick glance at the source code. The variable color_option is not used anywhere near the actual search for the regex, or the search for the previous and upcoming newlines once a match has been found.
It might be that one could save a few CPU cycles when searching for those line terminators. Possibly a real-world difference shows up with pathologically long lines and a very short search string.
If your job is to find the lines without the words in them, you can give sed a try, deleting the lines that contain the specific word:
sed '/word/d' input_file
Sed will probably continue to the next line when the first occurrence is found on the current line.
If you want to find lines without specific words, you can use grep to do this.
Try grep -v "abc" which means do the inverse. In this case, find lines without the string "abc".
If I have a file that looks like this:
line one abc
line two abc
line three def
Doing grep -v "abc" file.txt will return line three def.

Sed dynamic backreference replacement

I am trying to use sed to transform wikitext into LaTeX code. I am almost done, but I would like to automate the generation of the figure labels, turning this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For this I would like to keep using sed. The current regex and bash code I am using are the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot make that counter increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use awk for this. However, if somebody has a solution, I would appreciate their help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, the command increments (or primes) the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line by line using shell features, and use a separate sed command for each line. Something like:
exec 0<input_file
while IFS= read -r line; do
  echo "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
  __images_counter=$(expr $__images_counter + 1)
done
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
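To make the awk alternative concrete, here is a sketch (the LaTeX wrapper is simplified from the question's variables; adapt as needed). awk can run ordinary code per match, so the counter is just a variable:

```shell
awk '{
  # Replace each [[Image(...)]] with \includegraphics{...}\label{img-N},
  # where N is a counter incremented once per match.
  while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
    ++n
    img = substr($0, RSTART + 8, RLENGTH - 11)   # text between "[[Image(" and ")]]"
    $0 = substr($0, 1, RSTART - 1) \
         "\\includegraphics{" img "}\\label{img-" n "}" \
         substr($0, RSTART + RLENGTH)
  }
  print
}' file.txt
```

Unlike the shell loop above, this also numbers multiple matches on the same line correctly.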

How to use command grep with several lines?

With a shell script I'm looking for a way to make the grep command do one of the following two things:
a) Use the grep command to display a match and the 10 lines following it; i.e., the command grep "pattern" file.txt currently results in all the lines of the file that have that pattern:
patternrestoftheline
patternrestofanotherline
patternrestofanotherline
...
So I'm looking for this:
patternrestoftheline
following line
following line
...
until the tenth
patternrestofanotherline
following line
following line
...
until the tenth
b) Use the grep command to display all the lines between two patterns, treating them as delimiters:
patternA restoftheline
anotherline
anotherline
...
patternB restoftheline
I do not know if another command instead of grep is a better option.
I'm currently using a loop that solves my problem, but it works line by line, so with extremely large files it takes too much time.
I'm looking for a solution that works on Solaris.
Any suggestions?
In case (a), what do you expect to happen if the pattern occurs within the 10 lines?
Anyway, here are some awk scripts which should work (untested, though; I don't have Solaris):
# pattern plus 10 lines
awk 'n{--n;print}/PATTERN/{if(n==0)print;n=10}'
# between two patterns
awk '/PATTERN1/,/PATTERN2/{print}'
The second one can also be done similarly with sed
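A quick test of the first script, using 2 context lines instead of 10 to keep the example short:

```shell
printf 'PATTERN\na\nb\nc\n' | awk 'n{--n;print}/PATTERN/{if(n==0)print;n=2}'
# prints: PATTERN, a, b  (the match plus the next 2 lines; c is dropped)
```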
For your first task, use the -A ("after") option of grep:
grep -A 10 'pattern' file.txt
The second task is a typical sed problem:
sed -ne '/patternA/,/patternB/p' file.txt
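For example, with patternA and patternB bracketing an inner line:

```shell
printf 'before\npatternA restoftheline\nanotherline\npatternB restoftheline\nafter\n' \
  | sed -n '/patternA/,/patternB/p'
# prints the patternA line, everything between, and the patternB line
```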

Why does sed /^$/d delete only blank lines but /^$/p print all lines?

I'm able to use sed /^$/d <file> to delete all the blank lines in the file, but what if I want to print only the blank lines? The command sed /^$/p <file> prints all the lines in the file.
The reason I want to do this is that we use an EDA program (Expedition) that uses regex to run rules on the names of nets. I'm trying to find a way to search for all nets that don't have names assigned. I thought using ^$ would work, but it just ends up finding all nets, which is what /^$/p is doing too. So is there a different way to do this?
Unless otherwise specified, sed will print the pattern space when it has finished processing it. If you look carefully at your output, you'll notice that you get 2 blank lines for every one in the file. You'll have to use the -n command-line switch to stop sed from printing automatically.
sed -n /^$/p infile
Should work as you want.
You can also use grep as:
grep '^$' infile
Sed prints every line by default, and so the p flag on its own appears useless. To make it useful, you need to give sed the -n switch. Indeed, the following appears to do what you want:
sed -n /^$/p
Or, thinking about it another way: don't p the blank lines, but !d them, i.e. delete every line that is not blank. You may try:
sed '/^$/!d' yourFile
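Both approaches print exactly the blank lines; for example, counting them:

```shell
printf 'one\n\ntwo\n\nthree\n' | sed -n '/^$/p' | wc -l   # 2 blank lines
printf 'one\n\ntwo\n\nthree\n' | sed '/^$/!d' | wc -l     # same result
```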