How can I exclude one word with grep? - regex

I need something like:
grep ^"unwanted_word"XXXXXXXX

You can do it using -v (for --invert-match) option of grep as:
grep -v "unwanted_word" file | grep XXXXXXXX
grep -v "unwanted_word" file will filter the lines that have the unwanted_word and grep XXXXXXXX will list only lines with pattern XXXXXXXX.
EDIT:
From your comment it looks like you want to list all lines without the unwanted_word. In that case all you need is:
grep -v 'unwanted_word' file

I understood the question as "How do I match a word but exclude another", for which one solution is two greps in series: First grep finding the wanted "word1", second grep excluding "word2":
grep "word1" | grep -v "word2"
In my case: I need to differentiate between "plot" and "#plot" which grep's "word" option won't do ("#" not being a alphanumerical).

If your grep supports Perl regular expression with -P option you can do (if bash; if tcsh you'll need to escape the !):
grep -P '(?!.*unwanted_word)keyword' file
Demo:
$ cat file
foo1
foo2
foo3
foo4
bar
baz
Let us now list all foo except foo3
$ grep -P '(?!.*foo3)foo' file
foo1
foo2
foo4
$

The right solution is to use grep -v "word" file, with its awk equivalent:
awk '!/word/' file
However, if you happen to have a more complex situation in which you want, say, XXX to appear and YYY not to appear, then awk comes handy instead of piping several greps:
awk '/XXX/ && !/YYY/' file
# ^^^^^ ^^^^^^
# I want it |
# I don't want it
You can even say something more complex. For example: I want those lines containing either XXX or YYY, but not ZZZ:
awk '(/XXX/ || /YYY/) && !/ZZZ/' file
etc.

Invert match using grep -v:
grep -v "unwanted word" file pattern

grep provides '-v' or '--invert-match' option to select non-matching lines.
e.g.
grep -v 'unwanted_pattern' file_name
This will output all the lines from file file_name, which does not have 'unwanted_pattern'.
If you are searching the pattern in multiple files inside a folder, you can use the recursive search option as follows
grep -r 'wanted_pattern' * | grep -v 'unwanted_pattern'
Here grep will try to list all the occurrences of 'wanted_pattern' in all the files from within currently directory and pass it to second grep to filter out the 'unwanted_pattern'.
'|' - pipe will tell shell to connect the standard output of left program (grep -r 'wanted_pattern' *) to standard input of right program (grep -v 'unwanted_pattern').

The -v option will show you all the lines that don't match the pattern.
grep -v ^unwanted_word

I excluded the root ("/") mount point by using grep -vw "^/".
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}'
/
/root/.m2
/root
/var
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}' | grep -vw "^/"
/root/.m2
/root
/var

I've a directory with a bunch of files. I want to find all the files that DO NOT contain the string "speedup" so I successfully used the following command:
grep -iL speedup *

Related

How can I pipe a bash variable into a GREP regex in the following command?

I have a command that runs fine as follows:
grep ",162$" xyz.txt | grep -v "^162,162$"
It is basically looking for lines that have a pattern like "number,162" except those where the number is 162.
Now, I want to run many such commands where 162 is replaced with other numbers like
grep ",$159" xyz.txt | grep -v "^159,159$"
grep ",$111" xyz.txt | grep -v "^111,111$"
.. and so on where the same number appears in every part of the command everywhere. So, I need a single place to change the value and the rest of the command can take that value wherever needed.
I tried doing something like this to repeat the number of changes I need to do when I want to try another number
x=162 | grep ",\$x$" xyz.txt | grep -v "^\$x,\$x$"
but it doesn't work. What changes should I do so that everytime I have to change the number value, I just go to the first position of the command and change only the value of x?
Whenever you are piping grep to grep, consider using awk:
var=YOUR_NUM
awk -F, -v pat="$var" '$NF==pat && $1!=pat' file
This plays a little with awk's fields: by setting the field separator to the comma, you are then able to treat each part separately.
You want to match those cases in which the last field is the given value and the first one is not. This is possible using $NF for the last field and using $1 for the first one.
Also, you can add an extra layer of control by checking NF (number of fields) and making the condition be something like !($1==pat && NF==2) && $NF==pat so that a line like 162,162,162 would be printed. But this really depends on what you want to do.
For example:
$ cat file
hello
hello,162
162,162
$ val=162
$ awk -F, -v pat="$val" '$NF==pat && $1!=pat' file
hello,162
Then it is just a matter of changing $var with the value you want.
If I understand it correctcly, you want something like that:
x=162
grep ",$x\$" xyz.txt | grep -v "^$x,$x\$"
Quoting is your friend if you're intending a one-liner:
X=123 /bin/sh -c 'grep ","$X"$" xyz.txt | grep -v "^"$X","$X"$"'
Single grep command can be used with negative lookbehind:
$ cat xyz.txt
12,12
112,12
1,12
42,12
$ grep -P '(?<!^12),12$' xyz.txt
112,12
1,12
42,12
Passing variable is tricky
$ x=12
$ grep -P "(?<!"^$x"),"$x"$" xyz.txt
112,12
1,12
42,12

bash search for multiple patterns on different lines in a file

I have a number of files and I want to filter out the ones that contain 2 patterns. However these patterns are on different lines. I've tried it using grep and awk but in both cases they only seem to work on matches patterns on the same line. I know grep is line based but I'm less familiar with awk. Here's what I came up with but it only works prints lines that match both strings:
awk '/string1/ && /string2/' file
Grep will easily handle this using xargs:
grep -l string1 * | xargs grep -l string2
Use this command in the directory where the files are located, and resulting matches will be displayed.
Depending om whether you really want to search for regexps:
gawk -v RS='^$' '/regexp1/ && /regexp2/ {print FILENAME}' file
or for strings:
gawk -v RS='^$' 'index($0,"string1") && index($0,"string2") {print FILENAME}' file
The above uses GNU awk for multi-char RS to read the whole file as a single record.
You can do it with find
find -type f -exec bash -c "grep -q string1 {} && grep -q string2 {} && echo {}" ";"
You could do it like this with GNU awk:
awk '/foo/{seenFoo++} /bar/{seenBar++} seenFoo&&seenBar{print FILENAME;seenFoo=seenBar=0;nextfile}' file*
That says... if you see foo, increment variable seenFoo, likewise if you see bar, increment variable seenBar. If, at any point, you have seen both foo and bar, print the name of the current file and skip to the next input file ignoring all remaining lines in current file, and, before you start the next file, clear the flags to say we have seen neither foo nor bar in the new file.

grep extract simple url - without scheme

I need to extract n url from a file. I've started with:
grep -E -o 'ftp://\S*' $filename
I know, that this particular url will start with ftp scheme and will end with some white character (space or newline).
I receive something like:
ftp:/dir/some_file.ext
But I need just a path (/dir/some_file.ext). Without scheme (ftp:// part)
Can I do it with the first regexp? Do I have to use a second one?
I cannot use anything else then grep/egrep.
If your grep supports -P (PCRE flag) then you can use:
grep -oP 'ftp:/\K/\S*' $filename
/dir/some_file.ext
If fore some reason you don't have grep -P available then pipe with another grep:
grep -oE 'ftp://\S*' file | grep -oE '/[^/].*'
/dir/some_file.ext
This gnu awk (due to multiple characters in Record Selector) may also do:
awk -v RS="ftp:/" 'NR>1 {print $1}' file

Match two strings in one line with grep

I am trying to use grep to match lines that contain two different strings. I have tried the following but this matches lines that contain either string1 or string2 which not what I want.
grep 'string1\|string2' filename
So how do I match with grep only the lines that contain both strings?
You can use
grep 'string1' filename | grep 'string2'
Or
grep 'string1.*string2\|string2.*string1' filename
I think this is what you were looking for:
grep -E "string1|string2" filename
I think that answers like this:
grep 'string1.*string2\|string2.*string1' filename
only match the case where both are present, not one or the other or both.
To search for files containing all the words in any order anywhere:
grep -ril \'action\' | xargs grep -il \'model\' | xargs grep -il \'view_type\'
The first grep kicks off a recursive search (r), ignoring case (i) and listing (printing out) the name of the files that are matching (l) for one term ('action' with the single quotes) occurring anywhere in the file.
The subsequent greps search for the other terms, retaining case insensitivity and listing out the matching files.
The final list of files that you will get will the ones that contain these terms, in any order anywhere in the file.
If you have a grep with a -P option for a limited perl regex, you can use
grep -P '(?=.*string1)(?=.*string2)'
which has the advantage of working with overlapping strings. It's somewhat more straightforward using perl as grep, because you can specify the and logic more directly:
perl -ne 'print if /string1/ && /string2/'
Your method was almost good, only missing the -w
grep -w 'string1\|string2' filename
You could try something like this:
(pattern1.*pattern2|pattern2.*pattern1)
The | operator in a regular expression means or. That is to say either string1 or string2 will match. You could do:
grep 'string1' filename | grep 'string2'
which will pipe the results from the first command into the second grep. That should give you only lines that match both.
And as people suggested perl and python, and convoluted shell scripts, here a simple awk approach:
awk '/string1/ && /string2/' filename
Having looked at the comments to the accepted answer: no, this doesn't do multi-line; but then that's also not what the author of the question asked for.
Don't try to use grep for this, use awk instead. To match 2 regexps R1 and R2 in grep you'd think it would be:
grep 'R1.*R2|R2.*R1'
while in awk it'd be:
awk '/R1/ && /R2/'
but what if R2 overlaps with or is a subset of R1? That grep command simply would not work while the awk command would. Lets say you want to find lines that contain the and heat:
$ echo 'theatre' | grep 'the.*heat|heat.*the'
$ echo 'theatre' | awk '/the/ && /heat/'
theatre
You'd have to use 2 greps and a pipe for that:
$ echo 'theatre' | grep 'the' | grep 'heat'
theatre
and of course if you had actually required them to be separate you can always write in awk the same regexp as you used in grep and there are alternative awk solutions that don't involve repeating the regexps in every possible sequence.
Putting that aside, what if you wanted to extend your solution to match 3 regexps R1, R2, and R3. In grep that'd be one of these poor choices:
grep 'R1.*R2.*R3|R1.*R3.*R2|R2.*R1.*R3|R2.*R3.*R1|R3.*R1.*R2|R3.*R2.*R1' file
grep R1 file | grep R2 | grep R3
while in awk it'd be the concise, obvious, simple, efficient:
awk '/R1/ && /R2/ && /R3/'
Now, what if you actually wanted to match literal strings S1 and S2 instead of regexps R1 and R2? You simply can't do that in one call to grep, you have to either write code to escape all RE metachars before calling grep:
S1=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R1')
S2=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R2')
grep 'S1.*S2|S2.*S1'
or again use 2 greps and a pipe:
grep -F 'S1' file | grep -F 'S2'
which again are poor choices whereas with awk you simply use a string operator instead of regexp operator:
awk 'index($0,S1) && index($0.S2)'
Now, what if you wanted to match 2 regexps in a paragraph rather than a line? Can't be done in grep, trivial in awk:
awk -v RS='' '/R1/ && /R2/'
How about across a whole file? Again can't be done in grep and trivial in awk (this time I'm using GNU awk for multi-char RS for conciseness but it's not much more code in any awk or you can pick a control-char you know won't be in the input for the RS to do the same):
awk -v RS='^$' '/R1/ && /R2/'
So - if you want to find multiple regexps or strings in a line or paragraph or file then don't use grep, use awk.
git grep
Here is the syntax using git grep with multiple patterns:
git grep --all-match --no-index -l -e string1 -e string2 -e string3 file
You may also combine patterns with Boolean expressions such as --and, --or and --not.
Check man git-grep for help.
--all-match When giving multiple pattern expressions, this flag is specified to limit the match to files that have lines to match all of them.
--no-index Search files in the current directory that is not managed by Git.
-l/--files-with-matches/--name-only Show only the names of files.
-e The next parameter is the pattern. Default is to use basic regexp.
Other params to consider:
--threads Number of grep worker threads to use.
-q/--quiet/--silent Do not output matched lines; exit with status 0 when there is a match.
To change the pattern type, you may also use -G/--basic-regexp (default), -F/--fixed-strings, -E/--extended-regexp, -P/--perl-regexp, -f file, and other.
Related:
How to grep for two words existing on the same line?
Check if all of multiple strings or regexes exist in a file
How to run grep with multiple AND patterns? & Match all patterns from file at once
For OR operation, see:
How do I grep for multiple patterns with pattern having a pipe character?
Grep: how to add an “OR” condition?
Found lines that only starts with 6 spaces and finished with:
cat my_file.txt | grep
-e '^ .*(\.c$|\.cpp$|\.h$|\.log$|\.out$)' # .c or .cpp or .h or .log or .out
-e '^ .*[0-9]\{5,9\}$' # numers between 5 and 9 digist
> nolog.txt
Let's say we need to find count of multiple words in a file testfile.
There are two ways to go about it
1) Use grep command with regex matching pattern
grep -c '\<\(DOG\|CAT\)\>' testfile
2) Use egrep command
egrep -c 'DOG|CAT' testfile
With egrep you need not to worry about expression and just separate words by a pipe separator.
grep ‘string1\|string2’ FILENAME
GNU grep version 3.1
Place the strings you want to grep for into a file
echo who > find.txt
echo Roger >> find.txt
echo [44][0-9]{9,} >> find.txt
Then search using -f
grep -f find.txt BIG_FILE_TO_SEARCH.txt
grep '(string1.*string2 | string2.*string1)' filename
will get line with string1 and string2 in any order
for multiline match:
echo -e "test1\ntest2\ntest3" |tr -d '\n' |grep "test1.*test3"
or
echo -e "test1\ntest5\ntest3" >tst.txt
cat tst.txt |tr -d '\n' |grep "test1.*test3\|test3.*test1"
we just need to remove the newline character and it works!
You should have grep like this:
$ grep 'string1' file | grep 'string2'
I often run into the same problem as yours, and I just wrote a piece of script:
function m() { # m means 'multi pattern grep'
function _usage() {
echo "usage: COMMAND [-inH] -p<pattern1> -p<pattern2> <filename>"
echo "-i : ignore case"
echo "-n : show line number"
echo "-H : show filename"
echo "-h : show header"
echo "-p : specify pattern"
}
declare -a patterns
# it is important to declare OPTIND as local
local ignorecase_flag filename linum header_flag colon result OPTIND
while getopts "iHhnp:" opt; do
case $opt in
i)
ignorecase_flag=true ;;
H)
filename="FILENAME," ;;
n)
linum="NR," ;;
p)
patterns+=( "$OPTARG" ) ;;
h)
header_flag=true ;;
\?)
_usage
return ;;
esac
done
if [[ -n $filename || -n $linum ]]; then
colon="\":\","
fi
shift $(( $OPTIND - 1 ))
if [[ $ignorecase_flag == true ]]; then
for s in "${patterns[#]}"; do
result+=" && s~/${s,,}/"
done
result=${result# && }
result="{s=tolower(\$0)} $result"
else
for s in "${patterns[#]}"; do
result="$result && /$s/"
done
result=${result# && }
fi
result+=" { print "$filename$linum$colon"\$0 }"
if [[ ! -t 0 ]]; then # pipe case
cat - | awk "${result}"
else
for f in "$#"; do
[[ $header_flag == true ]] && echo "########## $f ##########"
awk "${result}" $f
done
fi
}
Usage:
echo "a b c" | m -p A
echo "a b c" | m -i -p A # a b c
You can put it in .bashrc if you like.
grep -i -w 'string1\|string2' filename
This works for exact word match and matching case insensitive words ,for that -i is used
When the both strings are in sequence then put a pattern in between on grep command:
$ grep -E "string1(?.*)string2" file
Example if the following lines are contained in a file named Dockerfile:
FROM python:3.8 as build-python
FROM python:3.8-slim
To get the line that contains the strings: FROM python and as build-python then use:
$ grep -E "FROM python:(?.*) as build-python" Dockerfile
Then the output will show only the line that contain both strings:
FROM python:3.8 as build-python
If git is initialized and added to the branch then it is better to use git grep because it is super fast and it will search inside the whole directory.
git grep 'string1.*string2.*string3'
searching for two String and highlight only string1 and string2
grep -E 'string1.*string2|string2.*string1' filename | grep -E 'string1|string2'
or
grep 'string1.*string2\|string2.*string1' filename | grep -E 'string1\|string2'
ripgrep
Here is the example using rg:
rg -N '(?P<p1>.*string1.*)(?P<p2>.*string2.*)' file.txt
It's one of the quickest grepping tools, since it's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.
Use it, especially when you're working with a large data.
See also related feature request at GH-875.

How to find patterns across multiple lines using grep?

I want to find files that have "abc" AND "efg" in that order, and those two strings are on different lines in that file. Eg: a file with content:
blah blah..
blah blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah blah..
Should be matched.
Grep is an awkward tool for this operation.
pcregrep which is found in most of the modern Linux systems can be used as
pcregrep -M 'abc.*(\n|.)*efg' test.txt
where -M, --multiline allow patterns to match more than one line
There is a newer pcre2grep also. Both are provided by the PCRE project.
pcre2grep is available for Mac OS X via Mac Ports as part of port pcre2:
% sudo port install pcre2
and via Homebrew as:
% brew install pcre
or for pcre2
% brew install pcre2
pcre2grep is also available on Linux (Ubuntu 18.04+)
$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep # Older PCRE
Here is a solution inspired by this answer:
if 'abc' and 'efg' can be on the same line:
grep -zl 'abc.*efg' <your list of files>
if 'abc' and 'efg' must be on different lines:
grep -Pzl '(?s)abc.*\n.*efg' <your list of files>
Params:
-P Use perl compatible regular expressions (PCRE).
-z Treat the input as a set of lines, each terminated by a zero byte instead of a newline. i.e. grep treats the input as a one big line. Note that if you don't use -l it will display matches followed by a NUL char, see comments.
-l list matching filenames only.
(?s) activate PCRE_DOTALL, which means that '.' finds any character or newline.
I'm not sure if it is possible with grep, but sed makes it very easy:
sed -e '/abc/,/efg/!d' [file-with-content]
sed should suffice as poster LJ stated above,
instead of !d you can simply use p to print:
sed -n '/abc/,/efg/p' file
I relied heavily on pcregrep, but with newer grep you do not need to install pcregrep for many of its features. Just use grep -P.
In the example of the OP's question, I think the following options work nicely, with the second best matching how I understand the question:
grep -Pzo "abc(.|\n)*efg" /tmp/tes*
grep -Pzl "abc(.|\n)*efg" /tmp/tes*
I copied the text as /tmp/test1 and deleted the 'g' and saved as /tmp/test2. Here is the output showing that the first shows the matched string and the second shows only the filename (typical -o is to show match and typical -l is to show only filename). Note that the 'z' is necessary for multiline and the '(.|\n)' means to match either 'anything other than newline' or 'newline' - i.e. anything:
user#host:~$ grep -Pzo "abc(.|\n)*efg" /tmp/tes*
/tmp/test1:abc blah
blah blah..
blah blah..
blah blah..
blah efg
user#host:~$ grep -Pzl "abc(.|\n)*efg" /tmp/tes*
/tmp/test1
To determine if your version is new enough, run man grep and see if something similar to this appears near the top:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see
below). This is highly experimental and grep -P may warn of
unimplemented features.
That is from GNU grep 2.10.
This can be done easily by first using tr to replace the newlines with some other character:
tr '\n' '\a' | grep -o 'abc.*def' | tr '\a' '\n'
Here, I am using the alarm character, \a (ASCII 7) in place of a newline.
This is almost never found in your text, and grep can match it with a ., or match it specifically with \a.
awk one-liner:
awk '/abc/,/efg/' [file-with-content]
If you are willing to use contexts, this could be achieved by typing
grep -A 500 abc test.txt | grep -B 500 efg
This will display everything between "abc" and "efg", as long as they are within 500 lines of each other.
You can do that very easily if you can use Perl.
perl -ne 'if (/abc/) { $abc = 1; next }; print "Found in $ARGV\n" if ($abc && /efg/); }' yourfilename.txt
You can do that with a single regular expression too, but that involves taking the entire contents of the file into a single string, which might end up taking up too much memory with large files.
For completeness, here is that method:
perl -e '#lines = <>; $content = join("", #lines); print "Found in $ARGV\n" if ($content =~ /abc.*efg/s);' yourfilename.txt
I don't know how I would do that with grep, but I would do something like this with awk:
awk '/abc/{ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo
You need to be careful how you do this, though. Do you want the regex to match the substring or the entire word? add \w tags as appropriate. Also, while this strictly conforms to how you stated the example, it doesn't quite work when abc appears a second time after efg. If you want to handle that, add an if as appropriate in the /abc/ case etc.
If you need both words are close each other, for example no more than 3 lines, you can do this:
find . -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
Same example but filtering only *.txt files:
find . -name *.txt -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
And also you can replace grep command with egrep command if you want also find with regular expressions.
I released a grep alternative a few days ago that does support this directly, either via multiline matching or using conditions - hopefully it is useful for some people searching here. This is what the commands for the example would look like:
Multiline:
sift -lm 'abc.*efg' testfile
Conditions:
sift -l 'abc' testfile --followed-by 'efg'
You could also specify that 'efg' has to follow 'abc' within a certain number of lines:
sift -l 'abc' testfile --followed-within 5:'efg'
You can find more information on sift-tool.org.
Possible with ripgrep:
$ rg --multiline 'abc(\n|.)+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or some other incantations.
If you want . to count as a newline:
$ rg --multiline '(?s)abc.+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or equivalent to having the (?s) would be rg --multiline --multiline-dotall
And to answer the original question, where they have to be on separate lines:
$ rg --multiline 'abc.*[\n](\n|.)*efg' test.txt
And if you want it "non greedy" so you don't just get the first abc with the last efg (separate them into pairs):
$ rg --multiline 'abc.*[\n](\n|.)*?efg' test.txt
https://til.hashrocket.com/posts/9zneks2cbv-multiline-matches-with-ripgrep-rg
Sadly, you can't. From the grep docs:
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C Shell (instead of bash) will need to escape their bangs:
sed -e '/abc/,/efg/\!d' [file]
Which line unfortunately does not work in bash et al.
With silver searcher:
ag 'abc.*(\n|.)*efg' your_filename
similar to ring bearer's answer, but with ag instead. Speed advantages of silver searcher could possibly shine here.
#!/bin/bash
shopt -s nullglob
for file in *
do
r=$(awk '/abc/{f=1}/efg/{g=1;exit}END{print g&&f ?1:0}' file)
if [ "$r" -eq 1 ];then
echo "Found pattern in $file"
else
echo "not found"
fi
done
you can use grep incase you are not keen in the sequence of the pattern.
grep -l "pattern1" filepattern*.* | xargs grep "pattern2"
example
grep -l "vector" *.cpp | xargs grep "map"
grep -l will find all the files which matches the first pattern, and xargs will grep for the second pattern. Hope this helps.
If you have some estimation about the distance between the 2 strings 'abc' and 'efg' you are looking for, you might use:
grep -r . -e 'abc' -A num1 -B num2 | grep 'efg'
That way, the first grep will return the line with the 'abc' plus #num1 lines after it, and #num2 lines after it, and the second grep will sift through all of those to get the 'efg'.
Then you'll know at which files they appear together.
With ugrep released a few months ago:
ugrep 'abc(\n|.)+?efg'
This tool is highly optimized for speed. It's also GNU/BSD/PCRE-grep compatible.
Note that we should use a lazy repetition +?, unless you want to match all lines with efg together until the last efg in the file.
You have at least a couple options --
DOTALL method
use (?s) to DOTALL the . character to include \n
you can also use a lookahead (?=\n) -- won't be captured in match
example-text:
true
match me
false
match me one
false
match me two
true
match me three
third line!!
{BLANK_LINE}
command:
grep -Pozi '(?s)true.+?\n(?=\n)' example-text
-p for perl regular expressions
-o to only match pattern, not whole line
-z to allow line breaks
-i makes case-insensitive
output:
true
match me
true
match me three
third line!!
notes:
- +? makes modifier non-greedy so matches shortest string instead of largest (prevents from returning one match containing entire text)
you can use the oldschool O.G. manual method using \n
command:
grep -Pozi 'true(.|\n)+?\n(?=\n)'
output:
true
match me
true
match me three
third line!!
I used this to extract a fasta sequence from a multi fasta file using the -P option for grep:
grep -Pzo ">tig00000034[^>]+" file.fasta > desired_sequence.fasta
P for perl based searches
z for making a line end in 0 bytes rather than newline char
o to just capture what matched since grep returns the whole line (which in this case since you did -z is the whole file).
The core of the regexp is the [^>] which translates to "not the greater than symbol"
As an alternative to Balu Mohan's answer, it is possible to enforce the order of the patterns using only grep, head and tail:
for f in FILEGLOB; do tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep "pattern2" &>/dev/null && echo $f; done
This one isn't very pretty, though. Formatted more readably:
for f in FILEGLOB; do
tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null \
| grep -q "pattern2" \
&& echo $f
done
This will print the names of all files where "pattern2" appears after "pattern1", or where both appear on the same line:
$ echo "abc
def" > a.txt
$ echo "def
abc" > b.txt
$ echo "abcdef" > c.txt; echo "defabc" > d.txt
$ for f in *.txt; do tail $f -n +$(grep -n "abc" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep -q "def" && echo $f; done
a.txt
c.txt
d.txt
Explanation
tail -n +i - print all lines after the ith, inclusive
grep -n - prepend matching lines with their line numbers
head -n1 - print only the first row
cut -d : -f 1 - print the first cut column using : as the delimiter
2>/dev/null - silence tail error output that occurs if the $() expression returns empty
grep -q - silence grep and return immediately if a match is found, since we are only interested in the exit code
This should work too?!
perl -lpne 'print $ARGV if /abc.*?efg/s' file_list
$ARGV contains the name of the current file when reading from file_list
/s modifier searches across newline.
The filepattern *.sh is important to prevent directories to be inspected. Of course some test could prevent that too.
for f in *.sh
do
a=$( grep -n -m1 abc $f )
test -n "${a}" && z=$( grep -n efg $f | tail -n 1) || continue
(( ((${z/:*/}-${a/:*/})) > 0 )) && echo $f
done
The
grep -n -m1 abc $f
searches maximum 1 matching and returns (-n) the linenumber.
If a match was found (test -n ...) find the last match of efg (find all and take the last with tail -n 1).
z=$( grep -n efg $f | tail -n 1)
else continue.
Since the result is something like 18:foofile.sh String alf="abc"; we need to cut away from ":" till end of line.
((${z/:*/}-${a/:*/}))
Should return a positive result if the last match of the 2nd expression is past the first match of the first.
Then we report the filename echo $f.
To search recursively across all files (across multiple lines within each file) with BOTH strings present (i.e. string1 and string2 on different lines and both present in same file):
grep -r -l 'string1' * > tmp; while read p; do grep -l 'string2' $p; done < tmp; rm tmp
To search recursively across all files (across multiple lines within each file) with EITHER string present (i.e. string1 and string2 on different lines and either present in same file):
grep -r -l 'string1\|string2' *
Here's a way by using two greps in a row:
egrep -o 'abc|efg' $file | grep -A1 abc | grep efg | wc -l
returns 0 or a positive integer.
egrep -o (Only shows matches, trick: multiple matches on the same line produce multi-line output as if they are on different lines)
grep -A1 abc (print abc and the line after it)
grep efg | wc -l (0-n count of efg lines found after abc on the same or following lines, result can be used in an 'if")
grep can be changed to egrep etc. if pattern matching is needed
This should work:
cat FILE | egrep 'abc|efg'
If there is more than one match you can filter out using grep -v