How to make "grep" read patterns from a file?

Suppose there is a large text file and I would like to print only the lines that do not match some patterns. Obviously, I can use egrep -v 'pattern1|pattern2|pattern3'. Now what if all those patterns are in a text file? What is the best way to make egrep read patterns from the file?

grep -v -f pattern_file

egrep has an -f option which does exactly that: you specify a file, and it reads patterns from that file, one per line.
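As a minimal sketch of the -f answer above, the following builds a small log file and a pattern file in a temporary directory and filters one against the other. The file names app.log and patterns.txt and their contents are made up for illustration.

```shell
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'error: disk full' 'info: started' 'debug: cache hit' 'info: stopped' > "$dir/app.log"
printf '%s\n' '^debug' 'disk (full|empty)' > "$dir/patterns.txt"

# -E: treat each line of the pattern file as an extended regex
# -v: print only the lines that match none of the patterns
# -f: read the patterns from the given file, one per line
kept=$(grep -E -v -f "$dir/patterns.txt" "$dir/app.log")
printf '%s\n' "$kept"

rm -r "$dir"
```

One thing to watch for, at least with GNU grep: a blank line in the pattern file is an empty pattern that matches every line, so combined with -v nothing would be printed.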

Related

Grep matching two or more patterns within pattern file

I'd like to use grep to pull in matching patterns from a file only if the line contains two or more patterns contained within my pattern file. This is my rough idea of what the syntax looks like, but it doesn't work. Any pointers?
egrep -f -i pattern.txt {2,} file.txt >> output.txt
grep -e 'pattern1' -e 'pattern2' file
This way you can search for multiple patterns with a single command; each pattern is given with -e, and the surrounding slashes are not needed. Note that this prints lines matching either pattern, not only lines matching both. Try searching for 'regular expressions with grep tutorial' for more background. Also, be specific about the pattern you want to search for in a line.
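grep has no option for "at least two patterns from the file", so one possible approach is to count the matches per line with awk instead. The sketch below assumes every non-empty line of pattern.txt is a regex that awk understands; the sample contents are invented, and the comparison is case-sensitive, unlike the -i in the question.

```shell
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'apple' 'banana' 'cherry' > "$dir/pattern.txt"
printf '%s\n' 'apple pie' 'apple and banana' 'cherry banana apple' 'plum' > "$dir/file.txt"

# While reading the first file (NR == FNR), store each non-empty line as a regex.
# For the second file, count how many stored regexes match the line
# and print the line when at least two of them do.
multi=$(awk '
    NR == FNR { if ($0 != "") pats[$0] = 1; next }
    {
        hits = 0
        for (p in pats) if ($0 ~ p) hits++
        if (hits >= 2) print
    }' "$dir/pattern.txt" "$dir/file.txt")
printf '%s\n' "$multi"

rm -r "$dir"
```

Changing the 2 in the last condition adjusts the minimum number of distinct patterns that must match.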

Using sed to find lines with specific keywords

This is in bash using CentOS
I am attempting to use sed to scan a text file to find lines that contain both the phrases "define" and "REV_NUMBER" (what lies before, in between, and after doesn't matter). However, I also want to ignore lines that have "//" in them because these indicate comments (the source file is a Verilog file).
My code is as follows:
REV=$(sed -n '/define REV_NUMBER/p' text.vh)
RESULT=$(echo "$REV")
This covers all lines that include:
define REV_NUMBER
But I want it to include lines that have, say:
define     REV_NUMBER
Or any number of whitespace between the words.
But ignore lines that have
//define REV_NUMBER
//define     REV_NUMBER
// define REV_NUMBER
I am stumped on how to achieve this. I am new to bash/shell scripting and sed; normally I'm a C++ guy using strings.
Thanks
You can use this sed command:
sed -n '/define *REV_NUMBER/{\~^ *//~!p;}' file
define REV_NUMBER
Wouldn't three greps connected by pipes be more readable?
grep -v "//" "$FILE" | grep "define" | grep "REV_NUMBER"
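To see how the whitespace and comment cases behave, here is a small sketch that runs a grep pipeline over a temporary file standing in for text.vh. The sample lines and revision values are invented; [[:space:]]+ is used so that any run of spaces or tabs between the two words is accepted, and the words must appear in that order.

```shell
# Hypothetical stand-in for text.vh.
tmp=$(mktemp)
printf '%s\n' \
    'define REV_NUMBER 1' \
    'define     REV_NUMBER 2' \
    '//define REV_NUMBER 3' \
    '// define   REV_NUMBER 4' > "$tmp"

# Drop every line containing //, then keep lines where "define" is
# followed by one or more whitespace characters and "REV_NUMBER".
rev=$(grep -v '//' "$tmp" | grep -E 'define[[:space:]]+REV_NUMBER')
printf '%s\n' "$rev"

rm -f "$tmp"
```

Note that grep -v '//' also discards code lines that merely end with a trailing comment, which matches the requirement as stated in the question but may be stricter than intended.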

Grep or in part of a string

Good day All,
A filename can either be
abc_source_201501.csv Or,
abc_source2_201501.csv
Is it possible to do something like grep abc_source|source2_201501.csv to get both options without listing each filename in full? The filenames I'm working with are much longer than the examples given.
Thanks for assistance here.
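Use the extended regex flag in grep.
For example:
grep -E 'abc_source.?_201501\.csv'
would match both names in your example. The pattern is quoted so the shell does not expand it, and the dot before csv is escaped so it matches a literal dot. You can think of other regex patterns that suit your data better.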
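A quick way to check such a pattern is to feed it a list of candidate names. The sketch below uses a slightly tighter variant, 2? instead of .?, anchored at both ends; the two extra file names are made up to show what gets rejected.

```shell
# Candidate names, one per line; the last two are hypothetical non-matches.
names=$(printf '%s\n' \
    abc_source_201501.csv \
    abc_source2_201501.csv \
    abc_source3_201501.csv \
    abc_other_201501.csv)

# 2? makes the digit 2 optional; ^ and $ force the whole name to match.
matched=$(printf '%s\n' "$names" | grep -E '^abc_source2?_201501\.csv$')
printf '%s\n' "$matched"
```

With .? in place of 2?, abc_source3_201501.csv would be accepted as well, since the dot matches any single character.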
You can use Bash globbing to grep in several files at once.
For example, to grep for the string "hello" in all files with a filename that starts with abc_source and ends with 201501.csv, issue this command:
grep hello abc_source*201501.csv
You can also use the -r flag, to recursively grep in all files below a given folder - for example the current folder (.).
grep -r hello .
If you are asking about patterns for file name matching in the shell, the extended globbing facility in Bash lets you say
shopt -s extglob
grep stuff abc_source@(|2)_201501.csv
to search through both files with a single glob expression.
The simplest possibility is to use brace expansion:
grep pattern abc_{source,source2}_201501.csv
That's exactly the same as:
grep pattern abc_source{,2}_201501.csv
You can use several brace patterns in a single word:
grep pattern abc_source{,2}_2015{01..04}.csv
expands to
grep pattern abc_source_201501.csv abc_source_201502.csv \
abc_source_201503.csv abc_source_201504.csv \
abc_source2_201501.csv abc_source2_201502.csv \
abc_source2_201503.csv abc_source2_201504.csv

grep through binary file

I have a binary file that contains lines in the following form:
blabla^A2013.04.03-09:35:04^Ablabla
where ^A is the binary character 001.
I want to be able to perform a grep that will give me only what is between the ^A (not the whole line).
I know that flag -o is only for match but I don't know how to search for that binary character
You should be able to include control-A on the command line by simply typing control-A where you want it to appear. At worst, you might need to type control-V before it. You can also explore notations using bash's ANSI-C quoting such as $'\001'.
Try this:
grep --binary-files=text pattern file.txt
For example, if ^A is stored in the file as the two literal characters ^ and A:
$ grep --binary-files=text -oP '\^\K[^\^]+(?=\^)' file.txt
A2013.04.03-09:35:04
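If the file really contains the control character with octal value 001, one possible sketch is to generate that byte with printf and build the pattern from it. The variable names are made up, and -a and -o are assumed to be available, as they are in GNU and BSD grep.

```shell
# The control-A byte (octal 001), produced portably with printf.
soh=$(printf '\001')

# Sample line in the form given in the question.
line=$(printf 'blabla%s2013.04.03-09:35:04%sblabla' "$soh" "$soh")

# -a: treat binary input as text; -o: print only the matched part.
# The match still includes both delimiters, so tr strips them afterwards.
between=$(printf '%s\n' "$line" | grep -a -o "${soh}[^${soh}]*${soh}" | tr -d '\001')
printf '%s\n' "$between"
```

In bash, $'\001' could be used in place of the soh variable, as suggested above.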

Grep regular expression to find words in any order

Context: I want to find a class definition within a lot of source code files, but I do not know the exact name.
Question: I know a number of words which must appear on the line I want to find, but I do not know the order in which they will appear. Is there a quick way to look for a number of words in any order on the same line?
For situations where you need to search on a large number of words, you can use awk as follows:
awk "/word1/&&/word2/&&/word3/" *.c
(If you are a cygwin user, the command is gawk.)
If you're trying to find foo, bar, and baz, you can just do:
grep foo *.c | grep bar | grep baz
That will find anything that has all three in any order. You can use word boundaries if you use egrep, otherwise that will match substrings.
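As a small illustration of the word-boundary point, the -w option (available in GNU and BSD grep) restricts each stage of the pipeline to whole-word matches. The input lines here are invented.

```shell
# Four hypothetical source lines; only the first two contain all three
# words as whole words.
whole=$(printf '%s\n' \
    'class foo : bar, baz' \
    'baz bar foo' \
    'foobar baz' \
    'foo bar' | grep -w foo | grep -w bar | grep -w baz)
printf '%s\n' "$whole"
```

Without -w, the line "foobar baz" would also pass, because foo and bar both occur in it as substrings.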
While this is not an exact answer to your grep question, you should look at the "ctags" command, which generates a tags file from the source code. For finding source code objects, this should help you much more than a simple grep. See: http://ctags.sourceforge.net/ctags.html
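Using a basic regex, match any of the indicated words, case-insensitively, in the .c files of the current directory (the \| alternation is a GNU extension, and -r only descends into directories, so it has no effect on plain files matched by ./*.c). Note that alternation prints lines containing at least one of the words, not necessarily all of them: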
grep -r -i 'word1\|word2\|word3' ./*.c
Using standard extended regex:
grep -r -i -E 'word1|word2|word3' ./*.c
You can also use perl regex:
grep -r -i -P 'word1|word2|word3' ./*.c
If you need to search with a single grep command (for example, you are searching for multiple pattern alternatives on stdin), you could use:
grep -e 'word1.*word2' -e 'word2.*word1' -e 'alternative-word'
This would find anything which has word1 and word2 in either order, or alternative-word.
(Note that this method gets complicated quickly: covering n words in arbitrary order needs n! alternatives, one per ordering.)
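To avoid writing out every ordering, one option is a single awk process that checks each word in turn. The helper below is a hypothetical sketch, not a standard tool: all_words is a made-up name, it takes a file (or - for stdin) followed by the words, and it treats the words as fixed strings rather than regexes.

```shell
# all_words FILE WORD...
# Print the lines of FILE (or stdin when FILE is -) containing every WORD.
all_words() {
    file=$1
    shift
    awk '
        BEGIN {
            # Copy the words out of ARGV and blank them so awk does not
            # try to open them as input files.
            for (i = 2; i < ARGC; i++) { word[i] = ARGV[i]; ARGV[i] = "" }
            last = ARGC - 1
        }
        {
            # Skip the line as soon as one word is missing.
            for (i = 2; i <= last; i++)
                if (index($0, word[i]) == 0) next
            print
        }' "$file" "$@"
}

# Usage on stdin with three hypothetical words.
found=$(printf '%s\n' \
    'class Foo extends Bar implements Baz' \
    'Baz Bar Foo' \
    'Foo Bar' \
    'nothing here' | all_words - Foo Bar Baz)
printf '%s\n' "$found"
```

The work per line grows linearly with the number of words, so adding a fourth or fifth word only means passing one more argument.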