How do I search files containing a specific string using grep?

How do I search files containing a specific string using grep? - regex

For instance, if a file has a line "blahblah myID=1234567 blahblah", I want to search all files containing 1234567 somewhere in the whole file.
I tried grep -r '.* 1234567.* ' directory, but it didn't work.

Do the following:
grep -rw 'directory' -e "pattern"
-r is recursive and -w stands match the whole word.
example
grep -rw '/home/lib/foldername/' -e "1234567"
you can also use -n which will tell you the line number where it matched the string

files=$(ls -l /dir |awk '/^-/ {print $NF}')
for i in $files
do
cat $i | grep "1234567" >> output.txt
done
List files of /dir, and grep "1234567" and write to output.txt

Related

Extract all numbers from a text file and store them in another file

I have a text file which have lots of lines. I want to extract all the numbers from that file.
File contains text and number and each line contains only one number.
How can i do it using sed or awk in bash script?
i tried
#! /bin/bash
sed 's/\([0-9.0-9]*\).*/\1/' <myfile.txt >output.txt
but this didn't worked.

grep can handle this:
grep -Eo '[0-9\.]+' myfile.txt
-o tells to print only the matches and [0-9\.]+ is a regular expression to match numbers.
To put all numbers on one line and save them in output.txt:
echo $(grep -Eo '[0-9\.]+' myfile.txt) >output.txt
Text files should normally end with a newline characters. The use of echo above assures that this happens.
Non-GNU grep:
If your grep does not support the -o flag, try:
echo $(tr ' ' '\n' <myfile.txt | grep -E '[0-9\.]+') >output.txt
This uses tr to replace all spaces with newlines (so each number appears separately on a line) and then uses grep to search for numbers.

tr -sc '0-9.' ' ' "$file"
Will transform every string of non-digit-or-period characters into a single space.

You can also use Bash:
while read line; do
if [[ $line =~ [0-9\.]+ ]]; then
echo $BASH_REMATCH
fi
done <myfile.txt >output.txt

regex - match a word and replace content before it

I have several files with this line:
<2-10 digits> ; word
I want to replace all of the digits that come before that word with something else. How can I do that?

sed -i -e 's/.*word/something;word/g' <filename>
To loop over multiple files in a directory. I am assuming .txt file as the file extension:
for i in `\ls -1 *.txt`
do
sed -i -e 's/.*word/something;word/g' $i
done
Note: sed -i will modify the file interactively. So, test the command without -i option to check this is what you want and then go for it...

UPDATE: sed example:
s="999 abc 1234 ; word 567"
echo $s | sed 's/^\(.* \)[0-9][0-9]*\( ; word.*\)$/\1something\2/g'
OUTPUT:
999 abc something ; word 567

Match two strings in one line with grep

I am trying to use grep to match lines that contain two different strings. I have tried the following but this matches lines that contain either string1 or string2 which not what I want.
grep 'string1\|string2' filename
So how do I match with grep only the lines that contain both strings?

You can use
grep 'string1' filename | grep 'string2'
Or
grep 'string1.*string2\|string2.*string1' filename

I think this is what you were looking for:
grep -E "string1|string2" filename
I think that answers like this:
grep 'string1.*string2\|string2.*string1' filename
only match the case where both are present, not one or the other or both.

To search for files containing all the words in any order anywhere:
grep -ril \'action\' | xargs grep -il \'model\' | xargs grep -il \'view_type\'
The first grep kicks off a recursive search (r), ignoring case (i) and listing (printing out) the name of the files that are matching (l) for one term ('action' with the single quotes) occurring anywhere in the file.
The subsequent greps search for the other terms, retaining case insensitivity and listing out the matching files.
The final list of files that you will get will the ones that contain these terms, in any order anywhere in the file.

If you have a grep with a -P option for a limited perl regex, you can use
grep -P '(?=.*string1)(?=.*string2)'
which has the advantage of working with overlapping strings. It's somewhat more straightforward using perl as grep, because you can specify the and logic more directly:
perl -ne 'print if /string1/ && /string2/'

Your method was almost good, only missing the -w
grep -w 'string1\|string2' filename

You could try something like this:
(pattern1.*pattern2|pattern2.*pattern1)

The | operator in a regular expression means or. That is to say either string1 or string2 will match. You could do:
grep 'string1' filename | grep 'string2'
which will pipe the results from the first command into the second grep. That should give you only lines that match both.

And as people suggested perl and python, and convoluted shell scripts, here a simple awk approach:
awk '/string1/ && /string2/' filename
Having looked at the comments to the accepted answer: no, this doesn't do multi-line; but then that's also not what the author of the question asked for.

Don't try to use grep for this, use awk instead. To match 2 regexps R1 and R2 in grep you'd think it would be:
grep 'R1.*R2|R2.*R1'
while in awk it'd be:
awk '/R1/ && /R2/'
but what if R2 overlaps with or is a subset of R1? That grep command simply would not work while the awk command would. Lets say you want to find lines that contain the and heat:
$ echo 'theatre' | grep 'the.*heat|heat.*the'
$ echo 'theatre' | awk '/the/ && /heat/'
theatre
You'd have to use 2 greps and a pipe for that:
$ echo 'theatre' | grep 'the' | grep 'heat'
theatre
and of course if you had actually required them to be separate you can always write in awk the same regexp as you used in grep and there are alternative awk solutions that don't involve repeating the regexps in every possible sequence.
Putting that aside, what if you wanted to extend your solution to match 3 regexps R1, R2, and R3. In grep that'd be one of these poor choices:
grep 'R1.*R2.*R3|R1.*R3.*R2|R2.*R1.*R3|R2.*R3.*R1|R3.*R1.*R2|R3.*R2.*R1' file
grep R1 file | grep R2 | grep R3
while in awk it'd be the concise, obvious, simple, efficient:
awk '/R1/ && /R2/ && /R3/'
Now, what if you actually wanted to match literal strings S1 and S2 instead of regexps R1 and R2? You simply can't do that in one call to grep, you have to either write code to escape all RE metachars before calling grep:
S1=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R1')
S2=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< 'R2')
grep 'S1.*S2|S2.*S1'
or again use 2 greps and a pipe:
grep -F 'S1' file | grep -F 'S2'
which again are poor choices whereas with awk you simply use a string operator instead of regexp operator:
awk 'index($0,S1) && index($0.S2)'
Now, what if you wanted to match 2 regexps in a paragraph rather than a line? Can't be done in grep, trivial in awk:
awk -v RS='' '/R1/ && /R2/'
How about across a whole file? Again can't be done in grep and trivial in awk (this time I'm using GNU awk for multi-char RS for conciseness but it's not much more code in any awk or you can pick a control-char you know won't be in the input for the RS to do the same):
awk -v RS='^$' '/R1/ && /R2/'
So - if you want to find multiple regexps or strings in a line or paragraph or file then don't use grep, use awk.

git grep
Here is the syntax using git grep with multiple patterns:
git grep --all-match --no-index -l -e string1 -e string2 -e string3 file
You may also combine patterns with Boolean expressions such as --and, --or and --not.
Check man git-grep for help.
--all-match When giving multiple pattern expressions, this flag is specified to limit the match to files that have lines to match all of them.
--no-index Search files in the current directory that is not managed by Git.
-l/--files-with-matches/--name-only Show only the names of files.
-e The next parameter is the pattern. Default is to use basic regexp.
Other params to consider:
--threads Number of grep worker threads to use.
-q/--quiet/--silent Do not output matched lines; exit with status 0 when there is a match.
To change the pattern type, you may also use -G/--basic-regexp (default), -F/--fixed-strings, -E/--extended-regexp, -P/--perl-regexp, -f file, and other.
Related:
How to grep for two words existing on the same line?
Check if all of multiple strings or regexes exist in a file
How to run grep with multiple AND patterns? & Match all patterns from file at once
For OR operation, see:
How do I grep for multiple patterns with pattern having a pipe character?
Grep: how to add an “OR” condition?

Found lines that only starts with 6 spaces and finished with:
cat my_file.txt | grep
-e '^ .*(\.c$|\.cpp$|\.h$|\.log$|\.out$)' # .c or .cpp or .h or .log or .out
-e '^ .*[0-9]\{5,9\}$' # numers between 5 and 9 digist
> nolog.txt

Let's say we need to find count of multiple words in a file testfile.
There are two ways to go about it
1) Use grep command with regex matching pattern
grep -c '\<\(DOG\|CAT\)\>' testfile
2) Use egrep command
egrep -c 'DOG|CAT' testfile
With egrep you need not to worry about expression and just separate words by a pipe separator.

grep ‘string1\|string2’ FILENAME
GNU grep version 3.1

Place the strings you want to grep for into a file
echo who > find.txt
echo Roger >> find.txt
echo [44][0-9]{9,} >> find.txt
Then search using -f
grep -f find.txt BIG_FILE_TO_SEARCH.txt

grep '(string1.*string2 | string2.*string1)' filename
will get line with string1 and string2 in any order

for multiline match:
echo -e "test1\ntest2\ntest3" |tr -d '\n' |grep "test1.*test3"
or
echo -e "test1\ntest5\ntest3" >tst.txt
cat tst.txt |tr -d '\n' |grep "test1.*test3\|test3.*test1"
we just need to remove the newline character and it works!

You should have grep like this:
$ grep 'string1' file | grep 'string2'

I often run into the same problem as yours, and I just wrote a piece of script:
function m() { # m means 'multi pattern grep'
function _usage() {
echo "usage: COMMAND [-inH] -p<pattern1> -p<pattern2> <filename>"
echo "-i : ignore case"
echo "-n : show line number"
echo "-H : show filename"
echo "-h : show header"
echo "-p : specify pattern"
}
declare -a patterns
# it is important to declare OPTIND as local
local ignorecase_flag filename linum header_flag colon result OPTIND
while getopts "iHhnp:" opt; do
case $opt in
i)
ignorecase_flag=true ;;
H)
filename="FILENAME," ;;
n)
linum="NR," ;;
p)
patterns+=( "$OPTARG" ) ;;
h)
header_flag=true ;;
\?)
_usage
return ;;
esac
done
if [[ -n $filename || -n $linum ]]; then
colon="\":\","
fi
shift $(( $OPTIND - 1 ))
if [[ $ignorecase_flag == true ]]; then
for s in "${patterns[#]}"; do
result+=" && s~/${s,,}/"
done
result=${result# && }
result="{s=tolower(\$0)} $result"
else
for s in "${patterns[#]}"; do
result="$result && /$s/"
done
result=${result# && }
fi
result+=" { print "$filename$linum$colon"\$0 }"
if [[ ! -t 0 ]]; then # pipe case
cat - | awk "${result}"
else
for f in "$#"; do
[[ $header_flag == true ]] && echo "########## $f ##########"
awk "${result}" $f
done
fi
}
Usage:
echo "a b c" | m -p A
echo "a b c" | m -i -p A # a b c
You can put it in .bashrc if you like.

grep -i -w 'string1\|string2' filename
This works for exact word match and matching case insensitive words ,for that -i is used

When the both strings are in sequence then put a pattern in between on grep command:
$ grep -E "string1(?.*)string2" file
Example if the following lines are contained in a file named Dockerfile:
FROM python:3.8 as build-python
FROM python:3.8-slim
To get the line that contains the strings: FROM python and as build-python then use:
$ grep -E "FROM python:(?.*) as build-python" Dockerfile
Then the output will show only the line that contain both strings:
FROM python:3.8 as build-python

If git is initialized and added to the branch then it is better to use git grep because it is super fast and it will search inside the whole directory.
git grep 'string1.*string2.*string3'

searching for two String and highlight only string1 and string2
grep -E 'string1.*string2|string2.*string1' filename | grep -E 'string1|string2'
or
grep 'string1.*string2\|string2.*string1' filename | grep -E 'string1\|string2'

ripgrep
Here is the example using rg:
rg -N '(?P<p1>.*string1.*)(?P<p2>.*string2.*)' file.txt
It's one of the quickest grepping tools, since it's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.
Use it, especially when you're working with a large data.
See also related feature request at GH-875.

How can I exclude one word with grep?

I need something like:
grep ^"unwanted_word"XXXXXXXX

You can do it using -v (for --invert-match) option of grep as:
grep -v "unwanted_word" file | grep XXXXXXXX
grep -v "unwanted_word" file will filter the lines that have the unwanted_word and grep XXXXXXXX will list only lines with pattern XXXXXXXX.
EDIT:
From your comment it looks like you want to list all lines without the unwanted_word. In that case all you need is:
grep -v 'unwanted_word' file

I understood the question as "How do I match a word but exclude another", for which one solution is two greps in series: First grep finding the wanted "word1", second grep excluding "word2":
grep "word1" | grep -v "word2"
In my case: I need to differentiate between "plot" and "#plot" which grep's "word" option won't do ("#" not being a alphanumerical).

If your grep supports Perl regular expression with -P option you can do (if bash; if tcsh you'll need to escape the !):
grep -P '(?!.*unwanted_word)keyword' file
Demo:
$ cat file
foo1
foo2
foo3
foo4
bar
baz
Let us now list all foo except foo3
$ grep -P '(?!.*foo3)foo' file
foo1
foo2
foo4
$

The right solution is to use grep -v "word" file, with its awk equivalent:
awk '!/word/' file
However, if you happen to have a more complex situation in which you want, say, XXX to appear and YYY not to appear, then awk comes handy instead of piping several greps:
awk '/XXX/ && !/YYY/' file
# ^^^^^ ^^^^^^
# I want it |
# I don't want it
You can even say something more complex. For example: I want those lines containing either XXX or YYY, but not ZZZ:
awk '(/XXX/ || /YYY/) && !/ZZZ/' file
etc.

Invert match using grep -v:
grep -v "unwanted word" file pattern

grep provides '-v' or '--invert-match' option to select non-matching lines.
e.g.
grep -v 'unwanted_pattern' file_name
This will output all the lines from file file_name, which does not have 'unwanted_pattern'.
If you are searching the pattern in multiple files inside a folder, you can use the recursive search option as follows
grep -r 'wanted_pattern' * | grep -v 'unwanted_pattern'
Here grep will try to list all the occurrences of 'wanted_pattern' in all the files from within currently directory and pass it to second grep to filter out the 'unwanted_pattern'.
'|' - pipe will tell shell to connect the standard output of left program (grep -r 'wanted_pattern' *) to standard input of right program (grep -v 'unwanted_pattern').

The -v option will show you all the lines that don't match the pattern.
grep -v ^unwanted_word

I excluded the root ("/") mount point by using grep -vw "^/".
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}'
/
/root/.m2
/root
/var
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}' | grep -vw "^/"
/root/.m2
/root
/var

I've a directory with a bunch of files. I want to find all the files that DO NOT contain the string "speedup" so I successfully used the following command:
grep -iL speedup *

find lines containing "^" and replace entire line with ""

I have a file with a string on each line... ie.
test.434
test.4343
test.4343t34
test^tests.344
test^34534/test
I want to find any line containing a "^" and replace entire line with a blank.
I was trying to use sed:
sed -e '/\^/s/*//g' test.file
This does not seem to work, any suggestions?

sed -e 's/^.*\^.*$//' test.file
For example:
$ cat test.file
test.434
test.4343
test.4343t34
test^tests.344
test^34534/test
$ sed -e 's/^.*\^.*$//' test.file
test.434
test.4343
test.4343t34
$
To delete the offending lines entirely, use
$ sed -e '/\^/d' test.file
test.434
test.4343
test.4343t34

other ways
awk
awk '!/\^/' file
bash
while read -r line
do
case "$line" in
*"^"* ) continue;;
*) echo "$line"
esac
done <"file"
and probably the fastest
grep -v "\^" file

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How do I search files containing a specific string using grep? - regex

For instance, if a file has a line "blahblah myID=1234567 blahblah", I want to search all files containing 1234567 somewhere in the whole file. I tried grep -r '.* 1234567.* ' directory, but it didn't work.

Do the following: grep -rw 'directory' -e "pattern" -r is recursive and -w stands match the whole word. example grep -rw '/home/lib/foldername/' -e "1234567" you can also use -n which will tell you the line number where it matched the string

files=$(ls -l /dir |awk '/^-/ {print $NF}') for i in $files do cat $i | grep "1234567" >> output.txt done List files of /dir, and grep "1234567" and write to output.txt

Related

Extract all numbers from a text file and store them in another file

regex - match a word and replace content before it

Match two strings in one line with grep

How can I exclude one word with grep?

find lines containing "^" and replace entire line with ""

Categories

Resources