How to use command grep with several lines? - regex

In a shell script, I'm looking for a way to make the grep command do one of the following two things:
a) Use the grep command to display the 10 lines following each match in a file; i.e., the command grep "pattern" file.txt will result in all lines of the file that have that pattern:
patternrestoftheline
patternrestofanotherline
patternrestofanotherline
...
So I'm looking for this:
patternrestoftheline
following line
following line
...
until the tenth
patternrestofanotherline
following line
following line
...
until the tenth
b) Use the grep command to display all lines within two patterns as if they were limits
patternA restoftheline
anotherline
anotherline
...
patternB restoftheline
I do not know if another command instead of grep is a better option.
I'm currently using a loop that solves my problem but is line by line, so with extremely large files takes too much time.
I'm looking for the solution working on Solaris.
Any suggestions?

In case (a), what do you expect to happen if the pattern occurs again within the 10 lines?
Anyway, here are some awk scripts which should work (untested, though; I don't have Solaris):
# pattern plus 10 lines
awk '/PATTERN/{n=11} n{--n;print}'
# between two patterns
awk '/PATTERN1/,/PATTERN2/{print}'
The second one can also be done similarly with sed

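For your first task, use the -A ("after") option of grep. Note that -A is not part of POSIX grep (GNU grep provides it), so the stock Solaris grep may not support it: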
grep -A 10 'pattern' file.txt
The second task is a typical sed problem:
sed -ne '/patternA/,/patternB/p' file.txt
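Since the question asks for Solaris, here is a minimal sketch of both tasks as small awk wrappers that do not depend on grep -A. The function names context_after and between_patterns are made up for illustration, and the sketch assumes an awk that accepts -v (a POSIX awk; as far as I know, on Solaris that means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Hypothetical helper: print every line matching the regex in $1,
# plus the $2 lines that follow it, from the file named in $3.
context_after() {
    awk -v pat="$1" -v extra="$2" '
        $0 ~ pat { left = extra + 1 }   # (re)start the window on every match
        left > 0 { print; left-- }      # print while the window is open
    ' "$3"
}

# Hypothetical helper: print each block that starts at a line matching $1
# and ends at the next line matching $2 (both inclusive), from file $3.
between_patterns() {
    awk -v start="$1" -v stop="$2" '$0 ~ start, $0 ~ stop' "$3"
}
```

Usage would look like context_after 'pattern' 10 file.txt and between_patterns 'patternA' 'patternB' file.txt. A match inside an open window simply restarts the count, so no line is printed twice.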

Related

Get all Commands without arguments from history (with Regex)

I have just started with learning shell commands and how to script in bash.
Now I would like to solve the task mentioned in the title.
What I get from history command (without line numbers):
ls [options/arguments] | grep [options/arguments]
find [...] -exec sed [...]
du [...]; awk [...] file
And this is what my output should look like:
ls
grep
find
sed
du
awk
I already found a solution, but it doesn't really satisfy me. So far I declared three arrays and used the readarray -t < <(...) command twice: once to save the content of my history and once, in combination with compgen -ac, to get all commands which I can possibly run. Then I compared the contents of both with loops, and saved the command every time it matched a line in the "history" array. A lot of effort for a simple exercise, I guess.
Another solution I thought of, is to do it with regex pattern matching.
A command usually starts at the beginning of the line, or after a pipe, an -exec, or a semicolon. There may be more cases that I just don't know about yet.
So I need a regex which gives me only the next word after it matched one of these conditions. That's the command I've found and it seems to work:
grep -oP '(?<=|\s/)\w+'
Here it uses the pipe | as a condition. But I need to insert the others too. So I have put the pattern in double quotes, created an array with all conditions and tried it as recommended:
grep -oP "(?<=$condition\s/)\w+"
But no matter how I insert the variable, it fails. To keep it short, I couldn't figure out how the command works, especially not the regex part.
So, how can I solve it using regular expressions? Or is there a better approach than mine?
Thank you in advance! :-)
This is simple and works quite well
history -w /dev/stdout | cut -f1 -d ' '
You can use this awk with fc command:
awk '{print $1}' <(fc -nl)
find
mkdir
find
touch
tty
printf
find
ls
fc -nl lists entries from history without the line numbers.
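Both commands above print only the first word of each history line. To also pick up commands that follow a pipe, a semicolon or -exec, as the question asks, one possible sketch is to split each line at those separators in awk and print the first word of every piece. The function name first_words is hypothetical, and the approach is deliberately naive: it does not understand quoting, so a | or ; inside a quoted string is treated as a separator, and operators such as && are not handled.

```shell
# Hypothetical helper: read command lines on stdin and print the first word
# of every segment that starts a line or follows "|", ";" or " -exec ".
first_words() {
    awk '{
        gsub(/ -exec /, "|")                  # treat -exec like one more separator
        n = split($0, seg, /[|;]/)            # cut the line at pipes and semicolons
        for (i = 1; i <= n; i++) {
            # splitting on " " trims blanks; skip segments that are empty
            if (split(seg[i], word, " ") > 0) print word[1]
        }
    }'
}

# In an interactive bash session one might run:
#   fc -nl | first_words | sort -u
```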

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using are the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of that function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN held in a shell variable and, if it is not present, prints the current line unchanged. If the pattern is present, it increments (or primes) the counter in the hold space and then appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
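exec 0<input_file
while IFS= read -r line; do
    # Rebuild the suffix on every pass so that it picks up the current counter value
    __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
    printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
    # Advance the counter only for lines that actually contain an image
    case $line in *'[[Image('*) __images_counter=$((__images_counter + 1)) ;; esac
done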
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
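Since the question mentions awk as a fallback, here is a minimal sketch of the per-match counter in awk. It handles several images on one line by consuming the line match by match. The function name number_images is hypothetical, and the replacement is simplified to the \includegraphics{...}\label{img-N} form shown at the top of the question rather than the full figure environment.

```shell
# Hypothetical helper: replace every [[Image(name)]] on stdin with
# \includegraphics{name}\label{img-N}, where N counts matches across the input.
number_images() {
    awk '{
        out = ""
        rest = $0
        while (match(rest, /\[\[Image\([^)]*\)\]\]/)) {
            # "[[Image(" is 8 characters and ")]]" is 3, so the name sits in between
            name = substr(rest, RSTART + 8, RLENGTH - 11)
            n++
            out = out substr(rest, 1, RSTART - 1) "\\includegraphics{" name "}\\label{img-" n "}"
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
        print out rest
    }'
}
```

Because n is never reset, the numbering runs across the whole input, which is what a sed substitution on its own cannot express without the hold-space trick shown earlier.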

How to print only matches with sed?

Okay, this is an easy one, but I can't figure it out.
Basically I want to extract all links ([^<>]*) from a big html file.
I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:
sed 's_[^<>]*_TEST_g'
If I run that on something like
<div>A google link</div>
<div>A google link</div>
I get
<div>TEST</div>
<div>TEST</div>
How can I get rid of everything else and just print the matches instead? My preferred end result would be:
A google link
A google link
PS. I know that my regexp is not the most flexible one, but it's enough for my intentions.
Match the whole line, put the interesting part in a group, replace by the content of the group. Use the -n option to suppress non-matching lines, and add the p modifier to print the result of the s command.
sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
Note that if there are multiple links on the line, this only prints the last link. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline before any two links, then extract the links.
sed -n -e 's!</a>!&\n!p' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
This still doesn't handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
If you don't mind using perl like sed, it can cope with very diverse input:
perl -n -e 's+(<a href=.*?</a>)+ print $1, "\n" +eg;'
Assuming that there is only one hyperlink per line the following may work...
sed -e 's_.*<a href=_<a href=_' -e 's_>.*_>_'
This might work for you (GNU sed):
sed '/<a href\>/!d;s//\n&/;s/[^\n]*\n//;:a;$!{/>/!{N;ba}};y/\n/ /;s//&\n/;P;D' file
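For comparison, where a grep with the -o option is available (GNU and BSD grep have it; it is not required by POSIX), printing only the matches needs no substitution at all. The sketch below assumes each link sits on a single line and contains no nested tags. The function name extract_links is made up for illustration.

```shell
# Hypothetical helper: print every <a ...>text</a> element found on stdin,
# one per line. -o prints only the matching part, -i ignores case (<A> vs <a>).
extract_links() {
    grep -io '<a [^<>]*>[^<>]*</a>'
}
```

Unlike the single sed substitution, this prints every link on a line, not just the last one. The caveats about comments, multi-line links and real HTML parsing still apply.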

Why does sed /^$/d delete only blank lines but /^$/p print all lines?

I'm able to use sed /^$/d <file> to delete all the blank lines in the file, but what if I want to print all the blank lines only? The command sed /^$/p <file> prints all the lines in file.
The reason I want to do this is that we use an EDA program (Expedition) that uses regex to run rules on the names of nets. I'm trying to find a way to search for all nets that don't have names assigned. I thought using ^$ would work, but it just ends up finding all nets, which is what /^$/p is doing too. So is there a different way to do this?
Unless otherwise specified sed will print the pattern space when it has finished processing it. If you look carefully at your output you'll notice that you get 2 blank lines for every one in the file. You'll have to use the -n command line switch to stop sed from printing.
sed -n '/^$/p' infile
Should work as you want.
You can also use grep as:
grep '^$' infile
Sed prints every line by default, and so the p flag is useless. To make it useful, you need to give sed the -n switch. Indeed, the following appears to do what you want:
sed -n '/^$/p'
Think of it another way: don't p, but !d.
You may try:
sed '/^$/!d' yourFile
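The difference between these variants is easy to check on a tiny input. The following sketch feeds three lines, the middle one empty, through each command and counts the lines that come out. The helper name sample is just for illustration.

```shell
# Sample input: three lines, the middle one empty.
sample() { printf 'first\n\nlast\n'; }

sample | sed '/^$/p'    | wc -l   # 4 lines: default printing plus a second copy of the empty line
sample | sed -n '/^$/p' | wc -l   # 1 line: -n turns off default printing, only p prints
sample | sed '/^$/!d'   | wc -l   # 1 line: every non-empty line is deleted
sample | grep -c '^$'             # 1: grep counts the empty lines directly
```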

Suppress the match itself in grep

Suppose I have lots of files of the form
First Line Name
Second Line Surname Adress
Third Line etc
etc
Now I'm using grep to match the first line, but what I actually want is the second line. The second line cannot be matched by a pattern (it just depends on the first line). My regex pattern works, and the command I'm using is
grep -rHIin pattern . -A 1 -m 1
Now the -A option prints the line after a match. The -m option stops after 1 match (because there are other lines that match my pattern, but I'm only interested in the first match).
This actually works but the output is like that:
./example_file:1: First Line Name
./example_file-2- Second Line Surname Adress
I've read the manual but couldn't find any clue or info about that. Now here is the question.
How can I suppress the match itself ? The output should be in the form of:
./example_file-2- Second Line Surname Adress
sed to the rescue:
sed -n '2,${p;n;}'
The particular sed command here starts with line 2 of its input and prints every other line. Pipe the output of grep into that and you'll only get the even-numbered lines out of the grep output.
An explanation of the sed command itself:
2,$ - the range of lines from line 2 to the last line of the file
{p;n;} - print the current line, then ignore the next line (this then gets repeated)
(In this special case of all even lines, an alternative way of writing this would be sed -n 'n;p;' since we don't actually need to special-case any leading lines. If you wanted to skip the first 5 lines of the file, this wouldn't be possible, you'd have to use the 6,$ syntax.)
You can use sed to print the line after each match:
sed -n '/<pattern>/{n;p}' <file>
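To get recursion and the file names, you will need something like the following (it relies on find replacing {} inside the sed script, as GNU find does, and uses | as the s delimiter because the file names contain slashes):
find . -type f -exec sed -n '/<pattern>/{n;s|^|{}:|;p;}' {} \;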
If you have already read a book on grep, you could also read a manual on awk, another common Unix tool.
In awk, your task will be solved with a nice simple code. (As for me, I always have to refresh my knowledge of awk's syntax by going to the manual (info awk) when I want to use it.)
Or, you could come up with a solution combining find (to iterate over your files) and grep (to select the lines) and head/tail (to discard for each individual file the lines you don't want). The complication with find is to be able to work with each file individually, discarding a line per file.
You could pipe the results through grep -v pattern.
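The awk route mentioned above could look roughly like the sketch below: for each file it prints only the line after the first match, in the file-N-line layout that grep -A uses for context lines. The function name line_after_first_match is hypothetical. The match is made case-insensitive by lowercasing both sides, which mirrors grep -i only for simple patterns and would mangle patterns with uppercase escapes or classes.

```shell
# Hypothetical helper: for each file given after the pattern, print
# "file-N-line" for the line that follows the first line matching $1.
line_after_first_match() {
    pat=$1
    shift
    for f in "$@"; do
        awk -v pat="$pat" -v name="$f" '
            hit { print name "-" NR "-" $0; exit }    # line after the match: print it and stop
            tolower($0) ~ tolower(pat) { hit = 1 }    # first match: remember it, print nothing
        ' "$f"
    done
}
```

A call such as line_after_first_match 'first line' ./* covers one directory. For a recursive search the file list would have to come from find instead.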