I have lines in myfile like this:
mount -t cifs //hostname/path/ /mount/path/ -o username='xxxx',password='xxxxx'
I need to extract a substring from each such line based on the condition "start at // and continue until the next whitespace, including the //".
I can't parse with the position as it won't be the same in all matched lines.
So far I have extracted the substring using grep's Perl lookbehind assertion, but the result does not include the //.
The piece of code I've used is
cat myfile | grep " cifs " | grep -oP "(?<=/)[^\s]*" | grep -v ^/
Output:
hostname/path/
Expected Output:
//hostname/path/
Is there a way to get the desired output by modifying the Perl regex, or perhaps by some other method?
Simple bash one line solution
grep " cifs " myfile | sed -e "s/ /\n/g" | grep '^\/\/'
You may consider using some non-PCRE based solutions like
sed -En '/ cifs /{s,.*(//[^[:space:]]+).*,\1,p}' file
grep -oE '//[^[:space:]]+' file
The grep solution simply extracts, from the whole file, every occurrence of // followed by one or more non-whitespace characters.
The sed solution finds lines containing cifs and then extracts, on each of those lines, the last occurrence of // followed by one or more non-whitespace characters.
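For example, assuming the sample mount line above is saved in myfile, both commands print the same thing:
$ grep -oE '//[^[:space:]]+' myfile
//hostname/path/
$ sed -En '/ cifs /{s,.*(//[^[:space:]]+).*,\1,p}' myfile
//hostname/path/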
The following command should do what you ask for:
grep cifs myfile | cut -d ' ' -f 4
or
grep cifs myfile | nawk '{print $4}'
or
awk '/cifs/ { print $4 }' myfile
or
perl -ne "print $1 if /cifs\s+(\S+)/" myfile
Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but output only the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:. Then, for each of these lines, sed substitutes (s/// - substitute) everything (.*) from the beginning of the line (^) up to and including the last occurrence of ": " (colon followed by space) with the empty string (the second part of s/...// is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by a space (-d' ' - d = delimiter, ' ' = a quoted space character; an escaped space, as in -d\ , would also have worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to discard everything matched so far, so only what follows is reported
.* to match rest of the string(s)
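A quick check against one of the sample lines:
$ echo "potato: 1234" | grep -Po 'potato:\s\K.*'
1234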
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of "*: " (everything up to and including the last ": ") from the front of $line.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or, if you wanted the key rather than the value, %% tells bash to delete the longest match of ": *" (the first ": " and everything after it) from the end of $line.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The pattern is written as ":\ " because the space character must be escaped with a backslash.
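A minimal sketch of the two expansions side by side (the variable name is illustrative):
$ line="potato: 1234"
$ echo ${line%%:\ *}    # %% removes the longest ':\ *' match from the end
potato
$ echo ${line##*:\ }    # ## removes the longest '*:\ ' match from the front
1234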
You can find more like these in the Linux Documentation Project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done < file.txt
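Run against the sample file above (fed in via the redirection shown), this prints the value from each potato: line:
1234
5432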
grep potato file | grep -o "[0-9].*"
I have the file ip.txt which contains the following
ata001dcfe16f85.mm.ph.ph.cox.net (24.252.231.220)
220.231.252.24.xxx.com (24.252.231.220)
and I wrote this bash command to extract the IPs:
grep -Eo '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' ip.txt | sort -u > good.txt
I want to edit the code so it extracts ONLY the IPs between the parentheses, not all the IPs on the line, because the current code also extracts the IP 220.231.252.24.
To get the IP within parentheses, all you need is to wrap the entire regex in an escaped \( \):
grep -Eo '\((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\)'
will give output as
(24.252.231.220)
(24.252.231.220)
If you want to get rid of the parentheses in the output as well, lookarounds would be useful:
grep -oP '(?<=\()(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?=\))'
would produce output as
24.252.231.220
24.252.231.220
a much lighter version would be
grep -oP '(?<=\()(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)){3}(?=\))'
here
[0-9]{2} matches a digit exactly two times
(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)){3} matches a . followed by an octet, repeated three times
The repeated lines can be removed by piping to uniq:
grep -oP '(?<=\()(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)){3}(?=\))' input | uniq
giving the output as
24.252.231.220
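Note that uniq only removes adjacent duplicate lines; if equal addresses are not next to each other, pipe to sort -u instead (as in the question's original command):
grep -oP '(?<=\()(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{2}?)){3}(?=\))' input | sort -u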
You can try awk
awk -F"[()]" '{print $(NF-1)}' file
24.252.231.220
24.252.231.220
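As a quick check with the first sample line: splitting on ( and ) leaves the address as the next-to-last field.
$ echo 'ata001dcfe16f85.mm.ph.ph.cox.net (24.252.231.220)' | awk -F"[()]" '{print $(NF-1)}'
24.252.231.220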
Question
Let's say I have one line of text with a number placed somewhere (it could be at the beginning, in the middle or at the end of the line).
How to match and keep the first number found in a line using sed?
Minimal example
Here is my attempt (following this page of a tutorial on regular expressions) and the output for different positions of the number:
$echo "SomeText 123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$echo "123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$echo "SomeText 123" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
As you can see, only the last digit is kept in the process, whereas the desired output should be 123...
Using sed:
echo "SomeText 123SomeText 456" | sed -r 's/^[^0-9]*([0-9]+).*$/\1/'
123
You can also do this in gnu awk:
echo "SomeText 123SomeText 456" | awk '{print gensub(/^[^0-9]*([0-9]+).*$/, "\\1", $0)}'
123
To complement the sed solutions, here's an awk alternative (assuming that the goal is to extract the 1st number on each line, if any (i.e., ignore lines without any numbers)):
awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
-F'[^0-9]*' defines any sequence of non-digit chars. (including the empty string) as the field separator; awk automatically breaks each input line into fields based on that separator, with $1 representing the first field, $2 the second, and so on.
/[0-9]/ is a pattern (condition) that ensures that output is only produced for lines that contain at least one digit, via its associated action (the {...} block) - in other words: lines containing NO number at all are ignored.
{ print ($1!="" ? $1 : $2) } prints the 1st field, if nonempty, otherwise the 2nd one; rationale: if the line starts with a number, the 1st field will contain the 1st number on the line (because the line starts with a field rather than a separator; otherwise, it is the 2nd field that contains the 1st number (because the line starts with a separator).
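A quick check of both cases (output as the explanation above predicts):
$ echo "SomeText 123SomeText 456" | awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
123
$ echo "123SomeText" | awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
123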
You can also use grep, which is ideally suited to this task. sed is a Stream EDitor, which is only going to indirectly give you what you want. With grep, you only have to specify the part of the line you want.
$ cat file.txt
SomeText 123SomeText
123SomeText
SomeText 123
$ grep -o '[0-9]\+' file.txt
123
123
123
grep -o prints only the matching parts of a line, each on a separate line. The pattern is simple: one or more digits.
If your version of grep is compatible with the -P switch, you can use Perl-style regular expressions and make the command even shorter:
$ grep -Po '\d+' file.txt
123
123
123
Again, this matches one or more digits.
Using grep is a lot simpler and has the advantage that if the line doesn't match, nothing is printed:
$ echo "no number" | grep -Po '\d+' # no output
$ echo "yes 123number" | grep -Po '\d+'
123
edit
As pointed out in the comments, one possible problem is that this won't only print the first matching number on the line. If the line contains more than one number, they will all be printed. As far as I'm aware, this can't be done using grep -o.
In that case, I'd go with perl:
perl -lne 'print $1 if /.*?(\d+).*/'
This uses lazy matching (the question mark) so only non-digit characters are consumed by the .* at the start of the pattern. The $1 is a back reference, like \1 in sed. If there are more than one number on the line, this only prints the first. If there aren't any at all, it doesn't print anything:
$ echo "no number" | perl -ne 'print "$1\n" if /.*?(\d+).*/'
$ echo "yes123number456" | perl -lne 'print $1 if /.*?(\d+).*/'
123
If for some reason you still really want to use sed, you can do this:
sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p'
unlike the other answers, this is compatible with all version of sed and will only print lines that contain a match.
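A quick check (the second input line shows that non-matching lines produce no output):
$ printf 'SomeText 123SomeText\nno number here\n' | sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p'
123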
Try this sed command,
$echo "SomeText 123SomeText" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123
Another example,
$ echo "SomeText 123SomeText 456" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123 456
It prints all the numbers on each line, and the captured numbers are separated by spaces in the output.
I want to find files that have "abc" AND "efg" in that order, and those two strings are on different lines in that file. Eg: a file with content:
blah blah..
blah blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah blah..
Should be matched.
Grep is an awkward tool for this operation.
pcregrep which is found in most of the modern Linux systems can be used as
pcregrep -M 'abc.*(\n|.)*efg' test.txt
where -M, --multiline allows patterns to match more than one line
There is a newer pcre2grep also. Both are provided by the PCRE project.
pcre2grep is available for Mac OS X via Mac Ports as part of port pcre2:
% sudo port install pcre2
and via Homebrew as:
% brew install pcre
or for pcre2
% brew install pcre2
pcre2grep is also available on Linux (Ubuntu 18.04+)
$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep # Older PCRE
Here is a solution inspired by this answer:
if 'abc' and 'efg' can be on the same line:
grep -zl 'abc.*efg' <your list of files>
if 'abc' and 'efg' must be on different lines:
grep -Pzl '(?s)abc.*\n.*efg' <your list of files>
Params:
-P Use perl compatible regular expressions (PCRE).
-z Treat the input as a set of lines, each terminated by a zero byte instead of a newline, i.e. grep treats the input as one big line. Note that if you don't use -l it will display matches followed by a NUL char, see comments.
-l list matching filenames only.
(?s) activate PCRE_DOTALL, which means that '.' finds any character or newline.
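For example, assuming the sample file from the question is saved as test.txt, the second command simply prints the file name:
$ grep -Pzl '(?s)abc.*\n.*efg' test.txt
test.txt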
I'm not sure if it is possible with grep, but sed makes it very easy:
sed -e '/abc/,/efg/!d' [file-with-content]
As poster LJ stated above, sed should suffice;
instead of !d you can simply use p to print:
sed -n '/abc/,/efg/p' file
I relied heavily on pcregrep, but with newer grep you do not need to install pcregrep for many of its features. Just use grep -P.
In the example of the OP's question, I think the following options work nicely, with the second best matching how I understand the question:
grep -Pzo "abc(.|\n)*efg" /tmp/tes*
grep -Pzl "abc(.|\n)*efg" /tmp/tes*
I copied the text to /tmp/test1, then deleted the 'g' and saved the result as /tmp/test2. The output below shows that the first command prints the matched string and the second shows only the filename (as usual, -o shows the match and -l shows only the filename). Note that 'z' is necessary for multiline matching, and '(.|\n)' matches either 'anything other than newline' or 'newline' - i.e. anything:
user@host:~$ grep -Pzo "abc(.|\n)*efg" /tmp/tes*
/tmp/test1:abc blah
blah blah..
blah blah..
blah blah..
blah efg
user@host:~$ grep -Pzl "abc(.|\n)*efg" /tmp/tes*
/tmp/test1
To determine if your version is new enough, run man grep and see if something similar to this appears near the top:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see
below). This is highly experimental and grep -P may warn of
unimplemented features.
That is from GNU grep 2.10.
This can be done easily by first using tr to replace the newlines with some other character:
tr '\n' '\a' | grep -o 'abc.*efg' | tr '\a' '\n'
Here, I am using the alarm character, \a (ASCII 7) in place of a newline.
This is almost never found in your text, and grep can match it with a ., or match it specifically with \a.
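For example, with the sample text saved as test.txt (name assumed), the pipeline prints the span from abc through efg:
$ tr '\n' '\a' < test.txt | grep -o 'abc.*efg' | tr '\a' '\n'
abc blah
blah blah..
blah blah..
blah blah..
blah efg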
awk one-liner:
awk '/abc/,/efg/' [file-with-content]
If you are willing to use contexts, this could be achieved by typing
grep -A 500 abc test.txt | grep -B 500 efg
This will display everything between "abc" and "efg", as long as they are within 500 lines of each other.
You can do that very easily if you can use Perl.
perl -ne 'if (/abc/) { $abc = 1; next }; print "Found in $ARGV\n" if ($abc && /efg/);' yourfilename.txt
You can do that with a single regular expression too, but that involves taking the entire contents of the file into a single string, which might end up taking up too much memory with large files.
For completeness, here is that method:
perl -e '@lines = <>; $content = join("", @lines); print "Found in $ARGV\n" if ($content =~ /abc.*efg/s);' yourfilename.txt
I don't know how I would do that with grep, but I would do something like this with awk:
awk '/abc/{ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo
You need to be careful how you do this, though. Do you want the regex to match the substring or the entire word? add \w tags as appropriate. Also, while this strictly conforms to how you stated the example, it doesn't quite work when abc appears a second time after efg. If you want to handle that, add an if as appropriate in the /abc/ case etc.
If you need both words are close each other, for example no more than 3 lines, you can do this:
find . -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
Same example but filtering only *.txt files:
find . -name '*.txt' -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"
And you can also replace the grep command with egrep if you want to search with regular expressions as well.
I released a grep alternative a few days ago that does support this directly, either via multiline matching or using conditions - hopefully it is useful for some people searching here. This is what the commands for the example would look like:
Multiline:
sift -lm 'abc.*efg' testfile
Conditions:
sift -l 'abc' testfile --followed-by 'efg'
You could also specify that 'efg' has to follow 'abc' within a certain number of lines:
sift -l 'abc' testfile --followed-within 5:'efg'
You can find more information on sift-tool.org.
Possible with ripgrep:
$ rg --multiline 'abc(\n|.)+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or some other incantations.
If you want . to count as a newline:
$ rg --multiline '(?s)abc.+?efg' test.txt
3:blah abc blah
4:blah abc blah
5:blah blah..
6:blah blah..
7:blah blah..
8:blah efg blah blah
Or, equivalent to using (?s), you can pass rg --multiline --multiline-dotall
And to answer the original question, where they have to be on separate lines:
$ rg --multiline 'abc.*[\n](\n|.)*efg' test.txt
And if you want it "non greedy" so you don't just get the first abc with the last efg (separate them into pairs):
$ rg --multiline 'abc.*[\n](\n|.)*?efg' test.txt
https://til.hashrocket.com/posts/9zneks2cbv-multiline-matches-with-ripgrep-rg
Sadly, you can't. From the grep docs:
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C Shell (instead of bash) will need to escape their bangs:
sed -e '/abc/,/efg/\!d' [file]
That line, unfortunately, does not work in bash et al.
With silver searcher:
ag 'abc.*(\n|.)*efg' your_filename
Similar to ring bearer's answer, but with ag instead. The speed advantages of the silver searcher could possibly shine here.
#!/bin/bash
shopt -s nullglob
for file in *
do
r=$(awk '/abc/{f=1}/efg/{g=1;exit}END{print g&&f ?1:0}' "$file")
if [ "$r" -eq 1 ];then
echo "Found pattern in $file"
else
echo "not found"
fi
done
You can use grep in case you are not concerned about the order of the patterns.
grep -l "pattern1" filepattern*.* | xargs grep "pattern2"
example
grep -l "vector" *.cpp | xargs grep "map"
grep -l will find all the files which match the first pattern, and xargs will grep for the second pattern. Hope this helps.
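If the file names may contain spaces, a safer variant (assuming GNU grep and xargs, which support NUL-delimited file names) would be:
grep -lZ "pattern1" filepattern*.* | xargs -0 grep -l "pattern2"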
If you have some estimation about the distance between the 2 strings 'abc' and 'efg' you are looking for, you might use:
grep -r . -e 'abc' -A num1 -B num2 | grep 'efg'
That way, the first grep will return the line with 'abc' plus num1 lines after it and num2 lines before it, and the second grep will sift through all of those to get the 'efg'.
Then you'll know at which files they appear together.
With ugrep released a few months ago:
ugrep 'abc(\n|.)+?efg'
This tool is highly optimized for speed. It's also GNU/BSD/PCRE-grep compatible.
Note that we should use a lazy repetition +?, unless you want to match all lines with efg together until the last efg in the file.
You have at least a couple options --
DOTALL method
use (?s) to DOTALL the . character to include \n
you can also use a lookahead (?=\n) -- won't be captured in match
example-text:
true
match me
false
match me one
false
match me two
true
match me three
third line!!
{BLANK_LINE}
command:
grep -Pozi '(?s)true.+?\n(?=\n)' example-text
-P for perl regular expressions
-o to only match pattern, not whole line
-z to allow line breaks
-i makes case-insensitive
output:
true
match me
true
match me three
third line!!
notes:
- +? makes the quantifier non-greedy, so it matches the shortest string instead of the longest (prevents returning one match containing the entire text)
you can use the oldschool O.G. manual method using \n
command:
grep -Pozi 'true(.|\n)+?\n(?=\n)' example-text
output:
true
match me
true
match me three
third line!!
I used this to extract a fasta sequence from a multi fasta file using the -P option for grep:
grep -Pzo ">tig00000034[^>]+" file.fasta > desired_sequence.fasta
P for perl based searches
z for making a line end in 0 bytes rather than newline char
o to just capture what matched since grep returns the whole line (which in this case since you did -z is the whole file).
The core of the regexp is the [^>] which translates to "not the greater than symbol"
As an alternative to Balu Mohan's answer, it is possible to enforce the order of the patterns using only grep, head and tail:
for f in FILEGLOB; do tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep "pattern2" &>/dev/null && echo $f; done
This one isn't very pretty, though. Formatted more readably:
for f in FILEGLOB; do
tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null \
| grep -q "pattern2" \
&& echo $f
done
This will print the names of all files where "pattern2" appears after "pattern1", or where both appear on the same line:
$ echo "abc
def" > a.txt
$ echo "def
abc" > b.txt
$ echo "abcdef" > c.txt; echo "defabc" > d.txt
$ for f in *.txt; do tail $f -n +$(grep -n "abc" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep -q "def" && echo $f; done
a.txt
c.txt
d.txt
Explanation
tail -n +i - print all lines after the ith, inclusive
grep -n - prepend matching lines with their line numbers
head -n1 - print only the first row
cut -d : -f 1 - print the first cut column using : as the delimiter
2>/dev/null - silence tail error output that occurs if the $() expression returns empty
grep -q - silence grep and return immediately if a match is found, since we are only interested in the exit code
This should work too?!
perl -lpne 'print $ARGV if /abc.*?efg/s' file_list
$ARGV contains the name of the current file when reading from file_list
/s modifier searches across newline.
The file pattern *.sh is important to prevent directories from being inspected. Of course some test could prevent that too.
for f in *.sh
do
a=$( grep -n -m1 abc $f )
test -n "${a}" && z=$( grep -n efg $f | tail -n 1) || continue
(( ((${z/:*/}-${a/:*/})) > 0 )) && echo $f
done
The
grep -n -m1 abc $f
searches for at most one match (-m1) and returns (-n) the line number.
If a match was found (test -n ...) find the last match of efg (find all and take the last with tail -n 1).
z=$( grep -n efg $f | tail -n 1)
else continue.
Since the result is something like 18:foofile.sh String alf="abc";, we need to cut away everything from the ":" to the end of the line.
((${z/:*/}-${a/:*/}))
Should return a positive result if the last match of the 2nd expression is past the first match of the first.
Then we report the filename echo $f.
To search recursively across all files (across multiple lines within each file) with BOTH strings present (i.e. string1 and string2 on different lines and both present in same file):
grep -r -l 'string1' * > tmp; while read p; do grep -l 'string2' "$p"; done < tmp; rm tmp
To search recursively across all files (across multiple lines within each file) with EITHER string present (i.e. string1 and string2 on different lines and either present in same file):
grep -r -l 'string1\|string2' *
Here's a way by using two greps in a row:
egrep -o 'abc|efg' $file | grep -A1 abc | grep efg | wc -l
returns 0 or a positive integer.
egrep -o (Only shows matches, trick: multiple matches on the same line produce multi-line output as if they are on different lines)
grep -A1 abc (print abc and the line after it)
grep efg | wc -l (a count from 0 to n of efg lines found after abc on the same or following lines; the result can be used in an 'if', as sketched below)
grep can be changed to egrep etc. if pattern matching is needed
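A minimal sketch of that 'if' (the file name file is a placeholder for the file being tested):
if (( $(egrep -o 'abc|efg' file | grep -A1 abc | grep efg | wc -l) > 0 )); then
    echo "abc is followed by efg"
fi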
This should work:
cat FILE | egrep 'abc|efg'
If there is more than one match you can filter out using grep -v