sed not replacing some spaces - regex

I am having some trouble getting sed to work right.
Input file:
$ cat txt
# nasty comment
blah blah blah this line is invalid
; this also isn't right
foo = 23 # comment here
blah=76876.8768 -- fubar
yoyo=76
tab_moo = -45.99
// comment
fubar = baz
#dfgpo=sf
####
Now how I parse it:
$ cat txt | sed -r 's/(#|--|;|\/\/).*//' | grep '=' | sed -r 's/[[:blank:]]+//'
foo= 23
blah=76876.8768
yoyo=76
tab_moo = -45.99
fubar= baz
The goal is to remove all comments and all inline whitespace.
I don't get why some spaces are left in the output. What am I doing wrong?

In sed, s/// replaces only the first match on each line. To replace every match on the line, add the g flag at the end:
sed -r 's/[[:blank:]]+//g'
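To see the difference the g flag makes, here is a minimal sketch using one line from the question's input. It assumes a sed that accepts -E (the portable spelling of GNU's -r); the variable names are only for illustration.

```shell
line='foo = 23'

# Without g: only the first run of blanks on the line is removed.
first_only=$(printf '%s\n' "$line" | sed -E 's/[[:blank:]]+//')

# With g: every run of blanks on the line is removed.
all=$(printf '%s\n' "$line" | sed -E 's/[[:blank:]]+//g')

printf '%s\n' "$first_only" "$all"
```

The first result still contains the space after the equals sign, which is the behaviour seen in the question's output.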

Related

extracting string from a line in Unix

I've a file having content:
code_name:00:12 vertical 01 1.3489:27 vsftypyre.01 [91.02.01.6] 29.05.2018 {1705}
Expected Output:
code_name:00:12 29.05.2018 {1705}
I'm trying the command below, but it's not giving the expected result:
sed '/\bvertical.*\]/d' file_name
Am I missing something?
You need to use the substitute command; d deletes the entire line when the given regex matches.
$ sed 's/\bvertical.*\]//' ip.txt
code_name:00:12 29.05.2018 {1705}
$ # ] doesn't require escaping
$ sed 's/\bvertical.*]//' ip.txt
code_name:00:12 29.05.2018 {1705}
Note that * is greedy, so .*] will try to match as much as possible
$ echo 'good foo [123] baz [456]' | sed 's/foo.*]//'
good
$ # this will delete only up to first ] after 'foo'
$ echo 'good foo [123] baz [456]' | sed 's/foo[^]]*]//'
good baz [456]
Even though the question wasn't tagged with awk, it's easy to use that tool to extract some columns:
awk '{print $1,$(NF-1),$NF}' <<< "code_name:00:12 vertical 01 1.3489:27 vsftypyre.01 [91.02.01.6] 29.05.2018 {1705}"
NF represents the number of fields of the current line, so $NF is the last element of that line.
If the records in your file are always in that shape, 8 fields separated by white space, then awk may be a simpler solution:
> cat file_name
code_name:00:12 vertical 01 1.3489:27 vsftypyre.01 [91.02.01.6] 29.05.2018 {1705}
> cat file_name | awk '{ print $1, $7, $8 }'
code_name:00:12 29.05.2018 {1705}
The awk script above prints the 1st, 7th and 8th fields of each record.

Using Regex to replace or append line in file | Linux Shell script

I am trying to find the proper command to execute on a file that looks something like this: (ie .cshrc files)
setenv BLAH foo
I need the command to replace the line where it detects the string BLAH and replace the entire line like such:
setenv BLAH newfoo
If BLAH doesn't exist in the file, then append it to the file.
I've played around with sed like such, but this does not achieve my goal.
sed 's/^.*?BLAH.*/setenv BLAH newfoo/g' text.txt > text.tmp && mv text.tmp text.txt
I've also played with awk and also can't seem to get the command working exactly how I want it.
awk -v s="setenv BLAH newfoo" '/^BLAH/{f=1;$0=s}7;END{if(!f)print s}' text.txt > text.tmp && mv text.tmp text.txt
Any ideas on how to achieve this would be awesome.
UPDATE
I am trying to make this script work as expected.
script
ARG1="$1"
sed -i "/BLAH/s/^.*\$/setenv ${ENV_KEY} ${ENV_VAL}/g" file.txt
# If the BLAH keyword isnt there, append the file then.
grep -v -q "BLAH" file.txt && echo "setenv BLAH ${ARG1}" >> file.txt
file.txt (default)
setenv BLAH randomstring
usage of script
script.sh newvalue
file.txt (result)
setenv BLAH newvalue
Currently the script seems to be appending to the file every time. I'd also tried a few of the other recommendations but I cannot get them to accept the incoming arg1 value in the sed string.
Intro
This could be done by this sed script:
sed '/BLAH/{s/ \w\+$/ newfoo/;h};${x;/BLAH/ba;x;s/$/\nsetenv BLAH newfoo/;x;:a;x}'
Explanation
sed -e '
/^\(\|.*\W\)BLAH\(\W.*\|\)$/{ # lines matching word BLAH
s/ \w\+$/ newfoo/; # replace last word by "newfoo"
h; # Store for not adding them at end of file
};
${ # On last line...
x; # Swap stored line with current line
/BLAH/ba; # if match, branch to label a:
x; # Swap back
s/$/\nsetenv BLAH newfoo/; # Replace end of line with newline...
x; # Swap again
:a; # Label a:
x # Swap back
}'
You could combine this with the in-place editing switch (-i):
sed -e'/BLAH/{h};${x;/BLAH/ba;x;s/$/\nsetenv BLAH newfoo/;x;:a;x}' -i text.txt
To be more accurate, search for delimited word BLAH:
sed -e '
/^\(.*\W\|\)BLAH\(\W.*\|\)$/{h}; # Store lines matching word BLAH
${ # On last line...
x; # Swap stored line with current line
/./ba; # if match, branch to label a:
x; # Swap back
s/$/\nsetenv BLAH newfoo/; # Replace end of line with newline...
x; # Swap again
:a; # Label a:
x # Swap back
}' <(echo BLAH=foo;seq 1 4)
BLAH=foo
1
2
3
4
while
sed -e '
/^\(.*\W\|\)BLAH\(\W.*\|\)$/{h}; # Store lines matching word BLAH
${ # On last line...
x; # Swap stored line with current line
/./ba; # if match, branch to label a:
x; # Swap back
s/$/\nsetenv BLAH newfoo/; # Replace end of line with newline...
x; # Swap again
:a; # Label a:
x # Swap back
}' <(echo BLAHBLAH=foo;seq 1 4)
BLAHBLAH=foo
1
2
3
4
setenv BLAH newfoo
Note that the second /BLAH/ test (the one on the last line) was replaced by /./. This works because the hold space stays empty if the first pattern never matched. So if it contains at least one character, the first pattern did occur.
Clean and script
There is a way to make this more scriptable:
sedcmd='/^\(.*\W\|\)%s\(\W.*\|\)$/{s/ \w\+$/ %s/;h};'
sedcmd+='${x;/./ba;x;s/$/\\nsetenv %s %s/;x;:a;x}'
varnam=BLAH
varcnt=foo
filnam=/tmp/file.txt
printf -v sedcmd "$sedcmd" "${varnam}" "${varcnt}" "${varnam}" "${varcnt}"
sed -e "$sedcmd" -i "$filnam"
You could remove -i switch on last line...
You could try this by running:
sed -e "$sedcmd" <(echo setenv BLAH oldfoo;seq 1 4)
setenv BLAH foo
1
2
3
4
sed -e "$sedcmd" <(echo setenv BLAHBLAH oldfoo;seq 1 4)
setenv BLAHBLAH oldfoo
1
2
3
4
setenv BLAH foo
This awk command should work for both cases:
awk -v kw='BLAH' '$2 == kw{$3="newfoo"; seen=1} 1;
END{if (!seen) print "setenv " kw " newfoo"}' file
You can pass any other keyword to search for using the -v kw=... command line option.
END block will execute only when given keyword is not found in the file.
This might work for you (GNU sed):
sed '/blah/h;//s/foo/newfoo/;$!b;G;/blah/!s/.$/&setenv blah newfoo/;t;P;d' file
If blah found store that line in the hold space (HS) and substitute newfoo for foo. Print each line if not end-of-file. At end-of-file append the HS to the pattern space and check for blah. If not found append setenv blah newfoo and print otherwise print the last line and delete the remainder.
I think you can make the sed a bit easier if you pass over the file with a quick grep afterward and append the text if it's still not found.
E.g.
ENV_KEY=BLAH
ENV_VAL=foo
sed -i "/BLAH/s/^.*\$/setenv ${ENV_KEY} ${ENV_VAL}/g" text.tmp
! grep -q "${ENV_KEY}" text.tmp && echo "setenv ${ENV_KEY} ${ENV_VAL}" >> text.tmp
This tells sed to search for the string BLAH, if it finds it then replace the entire line ^.*$. The -i option tells sed to update the file in place.
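The replace-or-append idea can be wrapped in a small function. This is only a sketch: set_env_line is a hypothetical name, it writes through a temporary file instead of using -i, and it assumes the key and value contain no characters that are special to sed or to regular expressions (such as /, & or \). Note that `! grep -q` is not the same test as `grep -v -q`: the latter succeeds as soon as any line lacks the pattern.

```shell
# Hypothetical helper: replace the "setenv KEY ..." line if present, else append it.
# Assumes KEY and VALUE contain no sed or regex metacharacters.
set_env_line() {
    key=$1
    val=$2
    file=$3
    if grep -q "^setenv ${key} " "$file"; then
        # Key already present: rewrite the whole line.
        sed "s/^setenv ${key} .*/setenv ${key} ${val}/" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        # Key absent: append a new line.
        printf 'setenv %s %s\n' "$key" "$val" >> "$file"
    fi
}

demo=$(mktemp)
printf 'setenv BLAH randomstring\n' > "$demo"
set_env_line BLAH newvalue "$demo"   # replaces the existing line
set_env_line OTHER 42 "$demo"        # appends, since OTHER is absent
result=$(cat "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

Anchoring the match on "setenv KEY " followed by a space keeps a key such as BLAHBLAH from being mistaken for BLAH.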

Bash sed match a string with newlines above and below

Here's an excerpt of a text file.
http_server = Server(
uuid = "9a44b850-c54f-11e3-9c1a-0800200c9a66",
)
# https_server = Server(
# uuid = "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66",
# )
I want to use sed (or something similar) to extract the: "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66" out of the file.
I've tried cat server.conf | sed -n 's/.*uuid = "\(.*\)",/\1/p' but it gives me both uuids. When I put in newlines like \n the sed doesn't work at all.
The unique marker for the uuid is https_server, the regex must make sure the uuid was inside the https_server.
Try this instead:
cat server.conf | sed -n -e '/https_server/{N;p}' | sed -n -e 's/.*uuid = "\([^ ]*\)",/\1/p'
Or this invoking sed once only:
cat server.conf | sed -n -e '/https_server/{N;s/.*uuid = "\([^ ]*\)",/\1/p}'
Or if there is chance of multiple empty lines between the https_server and uuid line inside the block:
cat server.conf | sed -n -e '/https_server/,/uuid/p' | sed -n -e 's/.*uuid = "\([^ ]*\)",/\1/p'
For making sure the uuid is inside the https_server, you can skip all lines until you reach that string:
cat server.conf | sed -n '/https_server/,//{//!p}' | sed -n 's/.*uuid = "\(.*\)",/\1/p'
This one looks for a https_server line which is possibly commented out, then extracts any uuid in the block which follows before the next closing parenthesis.
sed -n '/^#* *https_server *=.*(/,/)/!d;/^#* *uuid *= *"/!d;s///;s/",//p' server.conf
This avoids the useless use of cat and the silly multiple sed invocations.
/regex/,/regex/
selects a region. The action !d simply discards any lines outside of this region.
/^#* *uuid *= *"/
selects any line in the region matching this pattern. Again, !d discards any line which is not selected.
s///
deletes the previously matched pattern.
Finally,
s/",//
removes the quote and the comma at the end of the string.
Some sed dialects might want you to backslash the literal parentheses.
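The same region-then-substitute idea can be tried against the excerpt from the question. This is a simplified sketch, not the exact command above: it uses a looser start pattern (any line containing https_server), and the temporary file stands in for server.conf.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
http_server = Server(
    uuid = "9a44b850-c54f-11e3-9c1a-0800200c9a66",
)
# https_server = Server(
#     uuid = "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66",
# )
EOF

# Within the region from the https_server line to the next ")",
# print only the text between the quotes on the uuid line.
uuid=$(sed -n '/https_server/,/)/s/.*uuid *= *"\([^"]*\)".*/\1/p' "$conf")
printf '%s\n' "$uuid"
rm -f "$conf"
```

Because the substitution is restricted to the address range, the uuid inside the http_server block is never considered.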
This might work for you (GNU sed):
sed -rn '/https_server/{n;s/.*uuid = "([^"]*)".*/\1/p;q}' server.conf
If the uuid is not on the next line but some other following line, use:
sed -rn '/https_server/{:a;n;/\)\s*$/b;s/.*uuid = "([^"]*)".*/\1/p;Ta;q}' server.conf

Using sed to delete all lines between two matching patterns

I have a file something like:
# ID 1
blah blah
blah blah
$ description 1
blah blah
# ID 2
blah
$ description 2
blah blah
blah blah
How can I use a sed command to delete all lines between the # and $ line? So the result will become:
# ID 1
$ description 1
blah blah
# ID 2
$ description 2
blah blah
blah blah
Can you please kindly give an explanation as well?
Use this sed command to achieve that:
sed '/^#/,/^\$/{/^#/!{/^\$/!d}}' file.txt
Mac users need to add semicolons before the closing braces (to prevent an "extra characters at the end of d command" error):
sed '/^#/,/^\$/{/^#/!{/^\$/!d;};}' file.txt
OUTPUT
# ID 1
$ description 1
blah blah
# ID 2
$ description 2
blah blah
blah blah
Explanation:
/^#/,/^\$/ will match all the text between lines starting with # to lines starting with $. ^ is used for start of line character. $ is a special character so needs to be escaped.
/^#/! means do following if start of line is not #
/^\$/! means do following if start of line is not $
d means delete
So overall it is first matching all the lines from ^# to ^\$ then from those matched lines finding lines that don't match ^# and don't match ^\$ and deleting them using d.
$ cat test
1
start
2
end
3
$ sed -n '1,/start/p;/end/,$p' test
1
start
end
3
$ sed '/start/,/end/d' test
1
3
In general form, if you have a file with contents of form abcde, where section a precedes pattern b, then section c precedes pattern d, then section e follows, and you apply the following sed commands, you get the following results.
In this demonstration, the output is represented by => abcde, where the letters show which sections would be in the output. Thus, ae shows an output of only sections a and e, ace would be sections a, c, and e, etc.
Note that if b or d appear in the output, those are the patterns appearing (i.e., they're treated as if they're sections in the output).
Also don't confuse the /d/ pattern with the command d. The command is always at the end in these demonstrations. The pattern is always between the //.
sed -n -e '/b/,/d/!p' abcde => ae
sed -n -e '/b/,/d/p' abcde => bcd
sed -n -e '/b/,/d/{//!p}' abcde => c
sed -n -e '/b/,/d/{//p}' abcde => bd
sed -e '/b/,/d/!d' abcde => bcd
sed -e '/b/,/d/d' abcde => ae
sed -e '/b/,/d/{//!d}' abcde => abde
sed -e '/b/,/d/{//d}' abcde => ace
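The table can be checked with a five-line input in which each line is a single letter, so that pattern b matches only line b and pattern d matches only line d. This is a small sketch: run is a hypothetical helper, and a semicolon is added before each closing brace so that BSD sed accepts the commands too.

```shell
sample=$(printf '%s\n' a b c d e)

# Hypothetical helper: run sed on the sample and join the output lines.
run() { printf '%s\n' "$sample" | sed "$@" | tr -d '\n'; }

outside=$(run -n -e '/b/,/d/!p')       # lines outside the range
inside=$(run -n -e '/b/,/d/p')         # the range, patterns included
inner=$(run -n -e '/b/,/d/{//!p;}')    # the range without its end lines
keep_ends=$(run -e '/b/,/d/{//!d;}')   # delete only what lies between
drop_ends=$(run -e '/b/,/d/{//d;}')    # delete only the two end lines

printf '%s\n' "$outside" "$inside" "$inner" "$keep_ends" "$drop_ends"
```

The empty pattern // reuses the most recently applied regular expression, which inside the braces is whichever range address was just tested.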
Another approach with sed:
sed '/^#/,/^\$/{//!d;};' file
/^#/,/^\$/: from line starting with # up to next line starting with $
//!d: delete all lines except those matching the address patterns
I did something like this long time ago and it was something like:
sed -n -e "1,/# ID 1/ p" -e "/\$ description 1/,$ p"
Which is something like:
-n suppress all output
-e "1,/# ID 1/ p" execute from the first line until your pattern and p (print)
-e "/\$ description 1/,$ p" execute from the second pattern until the end and p (print).
I might be wrong with some of the escaping on the strings, so please double check.
The example below removes lines between "if" and "end if".
All files are scanned, and the lines between the two matching patterns are removed (including the matching lines themselves).
IFS='
'
PATTERN_1="^if"
PATTERN_2="end if"
# Search for the 1st pattern in all files under the current directory.
GREP_RESULTS=(`grep -nRi "$PATTERN_1" .`)
# Go through each result
for line in "${GREP_RESULTS[@]}"; do
# Save the file and line number where the match was found.
FILE=${line%%:*}
START_LINE=`echo "$line" | cut -f2 -d:`
# Search on the same file for a match of the 2nd pattern. The search
# starts from the line where the 1st pattern was matched.
GREP_RESULT=(`tail -n +${START_LINE} $FILE | grep -in "$PATTERN_2" | head -n1`)
END_LINE="$(( $START_LINE + `echo "$GREP_RESULT" | cut -f1 -d:` - 1 ))"
# Remove lines between first and second match from file.
# Edit in place (GNU sed -i); redirecting to the same file would truncate it before sed reads it.
sed -i -e "${START_LINE},${END_LINE}d" "$FILE"
done

How to find patterns across multiple lines using grep?

I want to find files that have "abc" AND "efg" in that order, and those two strings are on different lines in that file. Eg: a file with content:
blah blah..
blah blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah blah..
Should be matched.
Grep is an awkward tool for this operation.
pcregrep which is found in most of the modern Linux systems can be used as
pcregrep -M 'abc.*(\n|.)*efg' test.txt
where -M, --multiline allow patterns to match more than one line
There is a newer pcre2grep also. Both are provided by the PCRE project.
pcre2grep is available for Mac OS X via Mac Ports as part of port pcre2:
% sudo port install pcre2
and via Homebrew as:
% brew install pcre
or for pcre2
% brew install pcre2
pcre2grep is also available on Linux (Ubuntu 18.04+)
$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep # Older PCRE
Here is a solution inspired by this answer:
if 'abc' and 'efg' can be on the same line:
grep -zl 'abc.*efg' <your list of files>
if 'abc' and 'efg' must be on different lines:
grep -Pzl '(?s)abc.*\n.*efg' <your list of files>
Params:
-P Use perl compatible regular expressions (PCRE).
-z Treat the input as a set of lines, each terminated by a zero byte instead of a newline. i.e. grep treats the input as a one big line. Note that if you don't use -l it will display matches followed by a NUL char, see comments.
-l list matching filenames only.
(?s) activate PCRE_DOTALL, which means that '.' finds any character or newline.
I'm not sure if it is possible with grep, but sed makes it very easy:
sed -e '/abc/,/efg/!d' [file-with-content]
sed should suffice as poster LJ stated above,
instead of !d you can simply use p to print:
sed -n '/abc/,/efg/p' file
I relied heavily on pcregrep, but with newer grep you do not need to install pcregrep for many of its features. Just use grep -P.
In the example of the OP's question, I think the following options work nicely, with the second best matching how I understand the question:
grep -Pzo "abc(.|\n)*efg" /tmp/tes*
grep -Pzl "abc(.|\n)*efg" /tmp/tes*
I copied the text as /tmp/test1 and deleted the 'g' and saved as /tmp/test2. Here is the output showing that the first shows the matched string and the second shows only the filename (typical -o is to show match and typical -l is to show only filename). Note that the 'z' is necessary for multiline and the '(.|\n)' means to match either 'anything other than newline' or 'newline' - i.e. anything:
user@host:~$ grep -Pzo "abc(.|\n)*efg" /tmp/tes*
/tmp/test1:abc blah
blah blah..
blah blah..
blah blah..
blah efg
user@host:~$ grep -Pzl "abc(.|\n)*efg" /tmp/tes*
/tmp/test1
To determine if your version is new enough, run man grep and see if something similar to this appears near the top:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see
below). This is highly experimental and grep -P may warn of
unimplemented features.
That is from GNU grep 2.10.
This can be done easily by first using tr to replace the newlines with some other character:
tr '\n' '\a' | grep -o 'abc.*efg' | tr '\a' '\n'
Here, I am using the alarm character, \a (ASCII 7) in place of a newline.
This is almost never found in your text, and grep can match it with a ., or match it specifically with \a.
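A minimal sketch of that pipeline on a few sample lines (the input text is made up for illustration, and it assumes the data contains no alarm characters of its own):

```shell
span=$(printf '%s\n' 'one' 'blah abc blah' 'middle' 'blah efg blah' 'tail' |
    tr '\n' '\a' |          # join all lines into one, separated by \a
    grep -o 'abc.*efg' |    # match across the former line boundaries
    tr '\a' '\n')           # restore the newlines
printf '%s\n' "$span"
```

Since .* is greedy, the match runs from the first abc to the last efg in the input.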
awk one-liner:
awk '/abc/,/efg/' [file-with-content]
If you are willing to use contexts, this could be achieved by typing
grep -A 500 abc test.txt | grep -B 500 efg
This will display everything between "abc" and "efg", as long as they are within 500 lines of each other.
You can do that very easily if you can use Perl.
perl -ne 'if (/abc/) { $abc = 1; next }; print "Found in $ARGV\n" if ($abc && /efg/);' yourfilename.txt
You can do that with a single regular expression too, but that involves taking the entire contents of the file into a single string, which might end up taking up too much memory with large files.
For completeness, here is that method:
perl -e '@lines = <>; $content = join("", @lines); print "Found in $ARGV\n" if ($content =~ /abc.*efg/s);' yourfilename.txt
I don't know how I would do that with grep, but I would do something like this with awk:
awk '/abc/{ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo
You need to be careful how you do this, though. Do you want the regex to match the substring or the entire word? add \w tags as appropriate. Also, while this strictly conforms to how you stated the example, it doesn't quite work when abc appears a second time after efg. If you want to handle that, add an if as appropriate in the /abc/ case etc.
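One way to handle the ordering caveat is to remember only whether abc has already been seen, and to stop at the first efg on a later line. The following is a sketch under that reading of the question; files_with_ordered_match is a hypothetical name and the sample files are invented for the demonstration.

```shell
# Print each file in which a line containing "abc" is followed,
# on a later line, by a line containing "efg".
files_with_ordered_match() {
    for f in "$@"; do
        if awk '
            seen && /efg/ { found = 1; exit }   # efg on a line after abc
            /abc/         { seen = 1 }          # remember that abc has appeared
            END           { exit !found }       # exit status 0 only when found
        ' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

dir=$(mktemp -d)
printf 'x abc\ny\nz efg\n' > "$dir/ordered.txt"
printf 'efg\nabc\n'        > "$dir/reversed.txt"
printf 'abc efg\n'         > "$dir/same_line.txt"
matches=$(cd "$dir" && files_with_ordered_match ordered.txt reversed.txt same_line.txt)
printf '%s\n' "$matches"
rm -rf "$dir"
```

Testing for efg before recording abc is what excludes the case where both strings sit on the same line.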
If you need both words to be close to each other, for example no more than 3 lines apart, you can do this:
find . -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
Same example but filtering only *.txt files:
find . -name '*.txt' -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
And also you can replace grep command with egrep command if you want also find with regular expressions.
I released a grep alternative a few days ago that does support this directly, either via multiline matching or using conditions - hopefully it is useful for some people searching here. This is what the commands for the example would look like:
Multiline:
sift -lm 'abc.*efg' testfile
Conditions:
sift -l 'abc' testfile --followed-by 'efg'
You could also specify that 'efg' has to follow 'abc' within a certain number of lines:
sift -l 'abc' testfile --followed-within 5:'efg'
You can find more information on sift-tool.org.
Possible with ripgrep:
$ rg --multiline 'abc(\n|.)+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or some other incantations.
If you want . to count as a newline:
$ rg --multiline '(?s)abc.+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or equivalent to having the (?s) would be rg --multiline --multiline-dotall
And to answer the original question, where they have to be on separate lines:
$ rg --multiline 'abc.*[\n](\n|.)*efg' test.txt
And if you want it "non greedy" so you don't just get the first abc with the last efg (separate them into pairs):
$ rg --multiline 'abc.*[\n](\n|.)*?efg' test.txt
https://til.hashrocket.com/posts/9zneks2cbv-multiline-matches-with-ripgrep-rg
Sadly, you can't. From the grep docs:
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C Shell (instead of bash) will need to escape their bangs:
sed -e '/abc/,/efg/\!d' [file]
Which line unfortunately does not work in bash et al.
With silver searcher:
ag 'abc.*(\n|.)*efg' your_filename
similar to ring bearer's answer, but with ag instead. Speed advantages of silver searcher could possibly shine here.
#!/bin/bash
shopt -s nullglob
for file in *
do
r=$(awk '/abc/{f=1}/efg/{g=1;exit}END{print g&&f ?1:0}' "$file")
if [ "$r" -eq 1 ];then
echo "Found pattern in $file"
else
echo "not found"
fi
done
You can use grep in case you do not care about the order of the patterns.
grep -l "pattern1" filepattern*.* | xargs grep "pattern2"
example
grep -l "vector" *.cpp | xargs grep "map"
grep -l will find all the files which matches the first pattern, and xargs will grep for the second pattern. Hope this helps.
If you have some estimation about the distance between the 2 strings 'abc' and 'efg' you are looking for, you might use:
grep -r . -e 'abc' -A num1 -B num2 | grep 'efg'
That way, the first grep will return the line with 'abc' plus num1 lines after it and num2 lines before it, and the second grep will sift through all of those to find 'efg'.
Then you'll know at which files they appear together.
With ugrep released a few months ago:
ugrep 'abc(\n|.)+?efg'
This tool is highly optimized for speed. It's also GNU/BSD/PCRE-grep compatible.
Note that we should use a lazy repetition +?, unless you want to match all lines with efg together until the last efg in the file.
You have at least a couple options --
DOTALL method
use (?s) to DOTALL the . character to include \n
you can also use a lookahead (?=\n) -- won't be captured in match
example-text:
true
match me
false
match me one
false
match me two
true
match me three
third line!!
{BLANK_LINE}
command:
grep -Pozi '(?s)true.+?\n(?=\n)' example-text
-P for perl regular expressions
-o to only match pattern, not whole line
-z to allow line breaks
-i makes case-insensitive
output:
true
match me
true
match me three
third line!!
notes:
- +? makes modifier non-greedy so matches shortest string instead of largest (prevents from returning one match containing entire text)
you can use the oldschool O.G. manual method using \n
command:
grep -Pozi 'true(.|\n)+?\n(?=\n)'
output:
true
match me
true
match me three
third line!!
I used this to extract a fasta sequence from a multi fasta file using the -P option for grep:
grep -Pzo ">tig00000034[^>]+" file.fasta > desired_sequence.fasta
P for perl based searches
z for making a line end in 0 bytes rather than newline char
o to just capture what matched since grep returns the whole line (which in this case since you did -z is the whole file).
The core of the regexp is the [^>] which translates to "not the greater than symbol"
As an alternative to Balu Mohan's answer, it is possible to enforce the order of the patterns using only grep, head and tail:
for f in FILEGLOB; do tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep "pattern2" &>/dev/null && echo $f; done
This one isn't very pretty, though. Formatted more readably:
for f in FILEGLOB; do
tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null \
| grep -q "pattern2" \
&& echo $f
done
This will print the names of all files where "pattern2" appears after "pattern1", or where both appear on the same line:
$ echo "abc
def" > a.txt
$ echo "def
abc" > b.txt
$ echo "abcdef" > c.txt; echo "defabc" > d.txt
$ for f in *.txt; do tail $f -n +$(grep -n "abc" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep -q "def" && echo $f; done
a.txt
c.txt
d.txt
Explanation
tail -n +i - print all lines after the ith, inclusive
grep -n - prepend matching lines with their line numbers
head -n1 - print only the first row
cut -d : -f 1 - print the first cut column using : as the delimiter
2>/dev/null - silence tail error output that occurs if the $() expression returns empty
grep -q - silence grep and return immediately if a match is found, since we are only interested in the exit code
This should work too?!
perl -0777 -ne 'print "$ARGV\n" if /abc.*?efg/s' file_list
$ARGV contains the name of the current file when reading from file_list
/s modifier searches across newline.
The filepattern *.sh is important to prevent directories to be inspected. Of course some test could prevent that too.
for f in *.sh
do
a=$( grep -n -m1 abc $f )
test -n "${a}" && z=$( grep -n efg $f | tail -n 1) || continue
(( ((${z/:*/}-${a/:*/})) > 0 )) && echo $f
done
The
grep -n -m1 abc $f
searches maximum 1 matching and returns (-n) the linenumber.
If a match was found (test -n ...) find the last match of efg (find all and take the last with tail -n 1).
z=$( grep -n efg $f | tail -n 1)
else continue.
Since the result is something like 18:foofile.sh String alf="abc"; we need to cut away from ":" till end of line.
((${z/:*/}-${a/:*/}))
Should return a positive result if the last match of the 2nd expression is past the first match of the first.
Then we report the filename echo $f.
To search recursively across all files (across multiple lines within each file) with BOTH strings present (i.e. string1 and string2 on different lines and both present in same file):
grep -r -l 'string1' * > tmp; while read p; do grep -l 'string2' $p; done < tmp; rm tmp
To search recursively across all files (across multiple lines within each file) with EITHER string present (i.e. string1 and string2 on different lines and either present in same file):
grep -r -l 'string1\|string2' *
Here's a way by using two greps in a row:
egrep -o 'abc|efg' $file | grep -A1 abc | grep efg | wc -l
returns 0 or a positive integer.
egrep -o (Only shows matches, trick: multiple matches on the same line produce multi-line output as if they are on different lines)
grep -A1 abc (print abc and the line after it)
grep efg | wc -l (0-n count of efg lines found after abc on the same or following lines, result can be used in an 'if")
grep can be changed to egrep etc. if pattern matching is needed
This should work:
cat FILE | egrep 'abc|efg'
If there is more than one match you can filter out using grep -v