I want to match a line with foo not followed by bar, e.g.
foo 123 <-- match
foo bar <-- not match
Using the following regex does not work:
echo "foo 123" | grep -E 'foo.*(?!bar).*'
Any idea?
On systems that don't have grep -P like OSX you can use this awk command:
awk -F 'foo' 'NF>1{s=$0; $1=""; if (!index($0,"bar")) print s}' file
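To make the logic of that one-liner easier to follow, here is a minimal sketch that performs roughly the same check with index() and substr() instead of field splitting. The sample input is an assumption (only its first two lines come from the question), and the variable names are made up for illustration.

```shell
# Hypothetical sample input; only the first two lines come from the question.
input='foo 123
foo bar
bar then foo
nothing here'

# Keep lines that contain foo and have no bar anywhere after the first foo.
result=$(printf '%s\n' "$input" | awk '{
    pos = index($0, "foo")               # 0 when the line has no foo
    if (pos == 0) next                   # skip lines without foo
    rest = substr($0, pos + 3)           # text after the first foo
    if (index(rest, "bar") == 0) print   # print when bar does not follow
}')
printf '%s\n' "$result"
```

A bar that appears before foo does not disqualify the line, which is the same behaviour as the lookahead-based patterns.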
You could try the grep command below, which uses the -P (--perl-regexp) option:
grep -P 'foo(?:(?!bar).)*$' file
Example:
$ cat file
foo 123
foo bar
$ grep -P 'foo(?:(?!bar).)*$' file
foo 123
Or use only a negative lookahead to check that the string bar does not appear anywhere after foo, without consuming any of the following characters:
$ grep -P 'foo(?!.*bar)' file
foo 123
You can use -v to invert the match:
grep -v 'foo.*bar' file
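One thing to watch with a bare -v: it also prints lines that contain no foo at all. A possible workaround, sketched here on made-up sample input, is to select the foo lines first and invert afterwards.

```shell
# Hypothetical sample input for illustration.
input='foo 123
foo bar
unrelated line'

# -v alone keeps every line that lacks "foo...bar", including lines without foo.
inverted=$(printf '%s\n' "$input" | grep -v 'foo.*bar')

# Selecting foo first, then inverting, keeps only foo lines not followed by bar.
both=$(printf '%s\n' "$input" | grep 'foo' | grep -v 'foo.*bar')

printf 'inverted:\n%s\n' "$inverted"
printf 'both:\n%s\n' "$both"
```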
Related
I am searching a large codebase for all occurrences of the company acronym, which is a small 3-character word like foo. I normally do this sort of thing with
grep -Rnoi 'foo' *
starting at the top of the code base. However, since this is a small word that can produce an overwhelming amount of false positives, like 'foobar' or 'foocat', how might I go about filtering out the false positives?
I was thinking something along the lines of...
grep -Rnoi 'foo' * | grep [excludeMagicOption] 'foobar'
where the displayed results show all foo occurrences without 'foobar'. What are some options for doing this?
If I understand correctly, you want to match foo but not foocat. Use the -w (--word-regexp) option to match only whole-word occurrences of foo. Example:
Input file
$ cat foo.txt
foo
foocat
foobar
foo
foofighter
Use/Output
$ grep -Roniw 'foo' foo.txt
1:foo
4:foo
You can add more conditions to the initial regex to match a set of whole words. For the foo and foo- case from your comment, you could use:
grep -Roniw 'foo[-]*' foo.txt
Input file
$ cat foo.txt
foo
foocat
foobar
foo
foofighter
foo-
Use/Output
$ grep -Roniw 'foo[-]*' foo.txt
1:foo
4:foo
6:foo-
You can use a word boundary, denoted by \b in most (but not all) extended regex engines and supported by egrep and grep -E. A boundary is found at the start and end of a line and next to any character that is not a letter, digit, or underscore.
For example, given test.txt:
foo
foobar
foocat
foobar = foocat * 3
foobar = foo++
Feel the foo
What are the foo's price?
Strange how football changes.
Where is foo and bar?
Using:
grep -E '\bfoo\b' test.txt
Gives:
foo
foobar = foo++
Feel the foo
What are the foo's price?
Where is foo and bar?
Edit: Some regular expression engines use other character sequences for word boundaries. There is a summary here: http://www.regular-expressions.info/refwordboundaries.html
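If your grep turns out not to support \b, a bracket expression can stand in for it. This is only a sketch: it spells out "start of line or a non-word character" by hand, using the test.txt contents from above as input.

```shell
# Same nine lines as test.txt above.
input="foo
foobar
foocat
foobar = foocat * 3
foobar = foo++
Feel the foo
What are the foo's price?
Strange how football changes.
Where is foo and bar?"

# foo must be preceded by line start or a non-word character,
# and followed by a non-word character or line end.
result=$(printf '%s\n' "$input" | grep -E '(^|[^[:alnum:]_])foo([^[:alnum:]_]|$)')
printf '%s\n' "$result"
```

Unlike \b, this pattern consumes the neighbouring characters, which only matters if you combine it with -o.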
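You want the -v option. Drop -o from the first grep, though: with -o only the matched text (foo) is printed, so the second grep would never see foobar:
grep -Rni 'foo' * | grep -v 'foobar'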
From grep --help:
-v, --invert-match select non-matching lines
Suppose I have this text
The code for 233-CO is the main reason for 45-DFG and this 45-GH
Now I have this regexp \s[0-9]+-\w+ which matches 233-CO, 45-DFG and 45-GH.
How can I display just the third match 45-GH?
sed -re 's/\s[0-9]+-\w+/\3/g' file.txt
where \3 should be the third regexp match.
Is it mandatory to use sed? You could do it with grep and a bash array:
text="The code for 233-CO is the main reason for 45-DFG and this 45-GH"
matches=( $(echo "$text" | grep -o '\s[0-9]\+-\w\+') ) # store every match in an array, one element per match
echo "${matches[0]} ${matches[2]}" # print the first and third matches
To find the last occurrence of your pattern, you can use this:
$ sed -re 's/.*\s([0-9]+-\w+).*/\1/g' file
45-GH
If awk is acceptable, here is a one-liner (GNU awk, since it uses the three-argument match()): you pass the number of the match you want in n, and it prints that match.
awk -vn=$n '{l=$0;for(i=1;i<=n;i++){match(l,/\s[0-9]+-\w+/,a);l=substr(l,RSTART+RLENGTH);}print a[0]}' file
Test:
kent$ echo $STR #so we have 7 matches in str
The code for 233-CO is the main reason for 45-DFG and this 45-GH,foo 004-AB, bar 005-CC baz 006-DDD and 007-AWK
kent$ n=6 #now I want the 6th match
#here you go:
kent$ awk -vn=$n '{l=$0;for(i=1;i<=n;i++){match(l,/\s[0-9]+-\w+/,a);l=substr(l,RSTART+RLENGTH);}print a[0]}' <<< $STR
006-DDD
This might work for you (GNU sed):
sed -r 's/\b[0-9]+-[A-Z]+\b/\n&\n/3;s/.*\n(.*)\n.*/\1/' file
s/\b[0-9]+-[A-Z]+\b/\n&\n/3 wraps the third occurrence of the pattern in newlines (\n).
s/.*\n(.*)\n.*/\1/ deletes the text before and after the wrapped match.
With grep for matching and sed for printing the occurrence:
$ egrep -o '\b[0-9]+-\w+' file | sed -n '1p'
233-CO
$ egrep -o '\b[0-9]+-\w+' file | sed -n '2p'
45-DFG
$ egrep -o '\b[0-9]+-\w+' file | sed -n '3p'
45-GH
Or with a little awk passing the occurrence to print using the variable o:
$ awk -v o=1 '{for(i=0;i++<NF;)if($i~/[0-9]+-\w+/&&j++==o-1)print $i}' file
233-CO
$ awk -v o=2 '{for(i=0;i++<NF;)if($i~/[0-9]+-\w+/&&j++==o-1)print $i}' file
45-DFG
$ awk -v o=3 '{for(i=0;i++<NF;)if($i~/[0-9]+-\w+/&&j++==o-1)print $i}' file
45-GH
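The grep -o plus sed -n idea can be wrapped in a small function so the occurrence number becomes an argument. A minimal sketch: nth_match is a hypothetical name, and the pattern uses a POSIX bracket expression in place of \w so that it does not depend on GNU extensions.

```shell
# nth_match is a hypothetical helper: print the Nth match of an ERE read from stdin.
nth_match() {
    # $1 = extended regular expression, $2 = 1-based match number
    grep -oE -- "$1" | sed -n "${2}p"
}

text='The code for 233-CO is the main reason for 45-DFG and this 45-GH'
third=$(printf '%s\n' "$text" | nth_match '[0-9]+-[[:alnum:]_]+' 3)
printf '%s\n' "$third"
```

Matches are counted across the whole input, not per line, so on a multi-line file the third match may come from a later line.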
Is it possible, using just one grep and regexp combination, to achieve the following? Say I have a file like so:
$ cat f.txt
line 1 foo
line 2 boo
no match
line 3 blank
line X no match
I want to match all the lines that start with the word line followed by a number, but display only what comes after that, i.e. the part matched by (.*).
$ grep -E '^line [0-9]+(.*)' f.txt
line 1 foo
line 2 boo
line 3 blank
Can you say "match but don't display this part" for ^line [0-9]+, like the inverse of grep -o '^line [0-9]+'?
So my expected output would look like this:
$ grep -E ***__magic__*** f.txt
foo
boo
blank
You can use sed:
$ cat 1.txt
line 1 foo
line 2 boo
no match
line 3 blank
line X no match
$ grep -E '^line [0-9]+' 1.txt | sed -E 's/^line [0-9]+ //'
foo
boo
blank
UPDATED
...or without using sed
$ grep -E '^line [0-9]' 1.txt | grep -oE '[a-z]*$'
foo
boo
blank
Given your example file:
$ cat cat_1.txt
line 1 foo
line 2 boo
no match
line 3 blank
line X no match
This is easy with Perl:
perl -lne 'print $1 if /^line \d+ (.*)/' cat_1.txt
Or with sed:
sed -En 's/^line [0-9]+ (.*)/\1/p' cat_1.txt
Either way, this prints:
foo
boo
blank
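For comparison, the same "match the prefix, print the rest" idea can be sketched in awk, which may help on systems without Perl. The input below copies the sample file from the question; result is just an illustrative variable name.

```shell
# Sample data copied from the question.
input='line 1 foo
line 2 boo
no match
line 3 blank
line X no match'

# On lines starting with "line <number> ", strip that prefix and print the remainder.
result=$(printf '%s\n' "$input" | awk '/^line [0-9]+ / { sub(/^line [0-9]+ /, ""); print }')
printf '%s\n' "$result"
```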
I need something like:
grep ^"unwanted_word"XXXXXXXX
You can do it using the -v (--invert-match) option of grep:
grep -v "unwanted_word" file | grep XXXXXXXX
grep -v "unwanted_word" file filters out the lines that contain unwanted_word, and grep XXXXXXXX then lists only the remaining lines that match the pattern XXXXXXXX.
EDIT:
From your comment it looks like you want to list all lines without the unwanted_word. In that case all you need is:
grep -v 'unwanted_word' file
I understood the question as "How do I match a word but exclude another", for which one solution is two greps in series: the first grep finds the wanted "word1", the second excludes "word2":
grep "word1" file | grep -v "word2"
In my case, I need to differentiate between "plot" and "#plot", which grep's "word" option won't do ("#" is not an alphanumeric character).
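If your grep supports Perl regular expressions via the -P option, you can do this (in bash; in tcsh you'll need to escape the !):
grep -P '(?!.*unwanted_word)keyword' file
Note that the lookahead only inspects the text from the start of keyword onward, so an unwanted_word that appears earlier on the line is not rejected; anchoring the pattern as '^(?!.*unwanted_word).*keyword' checks the whole line.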
Demo:
$ cat file
foo1
foo2
foo3
foo4
bar
baz
Let us now list all foo lines except foo3:
$ grep -P '(?!.*foo3)foo' file
foo1
foo2
foo4
$
The right solution is to use grep -v "word" file, with its awk equivalent:
awk '!/word/' file
However, if you have a more complex situation in which you want, say, XXX to appear and YYY not to appear, then awk comes in handy instead of piping several greps:
awk '/XXX/ && !/YYY/' file
#    ^^^^^    ^^^^^^
#  I want it    |
#        I don't want it
You can even say something more complex. For example: I want those lines containing either XXX or YYY, but not ZZZ:
awk '(/XXX/ || /YYY/) && !/ZZZ/' file
etc.
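To see those two awk conditions in action, here is a small sketch on made-up input; the sample lines and variable names are assumptions chosen so that each condition keeps a different subset.

```shell
# Hypothetical sample input covering each combination.
input='XXX only
XXX with YYY
YYY only
YYY with ZZZ
nothing relevant'

# Lines with XXX but without YYY.
first=$(printf '%s\n' "$input" | awk '/XXX/ && !/YYY/')

# Lines with XXX or YYY, but without ZZZ.
second=$(printf '%s\n' "$input" | awk '(/XXX/ || /YYY/) && !/ZZZ/')

printf 'first:\n%s\n' "$first"
printf 'second:\n%s\n' "$second"
```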
Invert the match using grep -v, then search the remaining lines for the pattern:
grep -v "unwanted word" file | grep "pattern"
grep provides the '-v' or '--invert-match' option to select non-matching lines.
e.g.
grep -v 'unwanted_pattern' file_name
This will output all the lines from the file file_name that do not contain 'unwanted_pattern'.
If you are searching for the pattern in multiple files inside a folder, you can use the recursive search option as follows:
grep -r 'wanted_pattern' * | grep -v 'unwanted_pattern'
Here grep lists all the occurrences of 'wanted_pattern' in all the files within the current directory and passes them to the second grep, which filters out 'unwanted_pattern'.
'|' - the pipe tells the shell to connect the standard output of the left program (grep -r 'wanted_pattern' *) to the standard input of the right program (grep -v 'unwanted_pattern').
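Here is a self-contained sketch of that recursive pipeline, run in a throwaway directory. The file names and contents are invented for the demonstration, and it searches . rather than * so that hidden files would be included too.

```shell
# Build a throwaway directory with two hypothetical files.
tmp=$(mktemp -d)
printf 'wanted_pattern here\nwanted_pattern with unwanted_pattern\n' > "$tmp/a.txt"
printf 'other text\nwanted_pattern again\n' > "$tmp/b.txt"

# Recursive search, then drop every hit whose line mentions unwanted_pattern.
result=$(cd "$tmp" && grep -r 'wanted_pattern' . | grep -v 'unwanted_pattern' | sort)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because the second grep sees each hit as filename:line, a file whose name contains unwanted_pattern would be filtered out as well.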
The -v option will show you all the lines that don't match the pattern.
grep -v ^unwanted_word
I excluded the root ("/") mount point by using grep -vw "^/".
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}'
/
/root/.m2
/root
/var
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}' | grep -vw "^/"
/root/.m2
/root
/var
I have a directory with a bunch of files. I want to find all the files that DO NOT contain the string "speedup", so I successfully used the following command:
grep -iL speedup *
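For anyone who wants to try -L safely, here is a minimal sketch in a temporary directory; the three file names and their contents are hypothetical.

```shell
# Throwaway directory with three hypothetical files.
tmp=$(mktemp -d)
printf 'enable SpeedUp mode\n' > "$tmp/one.txt"
printf 'nothing to see\n' > "$tmp/two.txt"
printf 'plain text\n' > "$tmp/three.txt"

# -L lists files with no matching line; -i makes the match case-insensitive.
missing=$(cd "$tmp" && grep -iL speedup *)
printf '%s\n' "$missing"

rm -rf "$tmp"
```

one.txt is left out of the list because -i lets SpeedUp match speedup.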
The pattern I'm looking for is this:
TXT.*\.txt
That pattern can occur multiple times in any given line. I would like to either extract each instance of the pattern out or alternatively delete the text that surrounds each instance using sed (or anything, really).
Thanks!
You can use Perl:
$ cat file
foo TXT1.txt bar TXT2.txt baz
foo TXT3.txt bar TXT4.txt baz
$ perl -ne 'print "$1\n" while(/(TXT.*?\.txt)/g)' file
TXT1.txt
TXT2.txt
TXT3.txt
TXT4.txt
$
Or you can use grep -o with a pattern that cannot run past the first dot:
grep -o 'TXT[^.]*\.txt' file
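The reason for [^.]* instead of .* is greediness: grep has no non-greedy quantifier outside -P, so .* runs to the last .txt on the line. A quick sketch comparing the two on the sample lines from the Perl demo (variable names are illustrative):

```shell
# Sample lines from the Perl demo above.
input='foo TXT1.txt bar TXT2.txt baz
foo TXT3.txt bar TXT4.txt baz'

# Greedy .* spans from the first TXT to the last .txt on each line.
greedy=$(printf '%s\n' "$input" | grep -o 'TXT.*\.txt')

# [^.]* stops at the first dot, so each instance is extracted separately.
bounded=$(printf '%s\n' "$input" | grep -o 'TXT[^.]*\.txt')

printf 'greedy:\n%s\n' "$greedy"
printf 'bounded:\n%s\n' "$bounded"
```

The bounded form assumes the part between TXT and .txt never contains a dot.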