Regex for strings nearby other strings? - regex

I want to write a flexible regex for grep that will return search terms found within a certain distance from each other.
The ideal behavior is something like research databases; for example, where you can search for articles that have capital and GDP within 15 words of each other, which would include articles where the strings capital and GDP may be separated by five, six, seven, etc., alphanumeric strings of unspecified length. The regex statement would include punctuation (e.g., commas, periods, hyphens), but also accent marks and diacritics. So, results where chechè and lavi are no more than five strings apart.
I imagine the statement will involve lookaheads, and phrases like {1,15}, or maybe piping one grep thru another grep, but that loses the benefit of GREP_OPTIONS='--color=auto'. Constructing it is really beyond my skill set. I have a set of .txt documents that I want to run the search over, but making the regex flexible to change the distance between strings or to truncate the terms would also be useful for others who have things like fieldnotes or reading notes in a standard format.
EDIT
Below is a sample of passages taken from the Bible.
Ye shall buy meat of them for money, that ye may eat; and ye shall also buy water of them for money, that ye may drink. For the Lord thy God hath blessed thee in all the works of thy hand: he knoweth thy walking through this great wilderness: these forty years the Lord thy God hath been with thee; thou hast lacked nothing... Thou shalt sell me meat for money, that I may eat; and give me water for money, that I may drink: only I will pass through on my feet: (as the children of Esau which dwell in Seir, and the Moabites which dwell in Ar, did unto me:) until I shall pass over Jordan into the land which the Lord our God giveth us. But Sihon king of Heshbon would not let us pass by him: for the Lord thy God hardened his spirit, and made his heart obstinate, that he might deliver him into thy hand, as appeareth this day. And the Lord said unto me, Behold, I have begun to give Sihon and his land before thee: begin to possess, that thou mayest inherit his land. Then Sihon came out against us, he and all his people, to fight at Jahaz. And the Lord our God delivered him before us; and we smote him, and his sons, and all his people. And if the way be too long for thee, so that thou art not able to carry it; or if the place be too far from thee, which the Lord thy God shall choose to set his name there, when the Lord thy God hath blessed thee: then shalt thou turn it into money, and bind up the money in thine hand, and shalt go unto the place which the Lord thy God shall choose: and thou shalt bestow that money for whatsoever thy soul lusteth after, for oxen, or for sheep, or for wine, or for strong drink, or for whatsoever thy soul desireth: and thou shalt eat there before the Lord thy God, and thou shalt rejoice, thou, and thine household, and the Levite that is within thy gates; thou shalt not forsake him: for he hath no part nor inheritance with thee... 
Now it came to pass, that at what time the chest was brought unto the king’s office by the hand of the Levites, and when they saw that there was much money, the king’s scribe and the high priest’s officer came and emptied the chest, and took it, and carried it to his place again. Thus they did day by day, and gathered money in abundance. And when they had finished it, they brought the rest of the money before the king and Jehoiada, whereof were made vessels for the house of the Lord , even vessels to minister, and to offer withal, and spoons, and vessels of gold and silver. And they offered burnt offerings in the house of the Lord continually all the days of Jehoiada. Thou hast bought me no sweet cane with money, neither hast thou filled me with the fat of thy sacrifices; but thou hast made me to serve with thy sins, thou hast wearied me with thine iniquities... Howbeit there were not made for the house of the Lord bowls of silver, snuffers, basins, trumpets, any vessels of gold, or vessels of silver, of the money that was brought into the house of the Lord: but they gave that to the workmen, and repaired therewith the house of the Lord. Moreover they reckoned not with the men, into whose hand they delivered the money to be bestowed on workmen: for they dealt faithfully. The trespass money and sin money was not brought into the house of the Lord: it was the priests’.
If I wanted to grep for instances of where shalt and money are co-present within five words (including punctuation), how would I write that regex?
I'm not sure how to give expected results since grep --context=1 would include more than just the strings with 0-5 strings in between, but I imagine the results would identify:
shalt sell me meat for money
shalt thou turn it into money
money in thine hand, and shalt
shalt bestow that money
But would not return shall buy meat of them for money, since 'money' appears as the sixth string.

Well, it's not grep but this seems to do what you asked for using GNU awk for multi-char RS and word boundaries:
$ cat tst.awk
BEGIN {
RS="^$"
split(words,word)
}
{
gsub(/#/,"#A"); gsub(/{/,"#B"); gsub(/}/,"#C")
gsub("\\<"word[1]"\\>","{")
gsub("\\<"word[2]"\\>","}")
while ( match($0,/{[^{}]+}|}[^{}]+{/) ) {
tgt = substr($0,RSTART,RLENGTH)
gsub(/}/,word[2],tgt)
gsub(/{/,word[1],tgt)
gsub(/#C/,"}",tgt); gsub(/#B/,"{",tgt); gsub(/#A/,"#",tgt)
if ( gsub(/[[:space:]]+/,"&",tgt) <= range ) {
print tgt
}
$0 = substr($0,RSTART+length(word[1]))
}
}
$ awk -v words='money shalt' -v range=5 -f tst.awk file
shalt sell me meat for money
shalt thou turn it into money
money in thine hand, and shalt
shalt bestow that money
$ awk -v words='and him' -v range=10 -f tst.awk file
him: for the Lord thy God hardened his spirit, and
and made his heart obstinate, that he might deliver him
him before us; and
and we smote him
him, and
Note that the above works even with input like shalt sell me meat for money in thine hand, and shalt where one of the words (money) appears 5 words after the first occurrence of the other word (shalt) AND 5 words before a second occurrence of that first word (again, shalt):
$ echo 'shalt sell me meat for money in thine hand, and shalt' |
awk -v words='shalt money' -v range=5 -f tst.awk
shalt sell me meat for money
money in thine hand, and shalt
For colors, file names, and line numbers:
Do this to see the colors available to you in your terminal (each line will be output in a different color):
$ for ((c=0; c<$(tput colors); c++)); do tput setaf "$c"; tput setaf "$c" | cat -v; echo "=$c"; done; tput setaf 0
^[[30m=0
^[[31m=1
^[[32m=2
^[[33m=3
^[[34m=4
^[[35m=5
^[[36m=6
^[[37m=7
Now that you can see what those escape sequences and numbers mean, update the awk script to (\033 = ^[ = Esc):
$ cat tst.awk
BEGIN {
RS="^$"
split(words,word)
c["black"] = "\033[30m"
c["red"] = "\033[31m"
c["green"] = "\033[32m"
c["yellow"] = "\033[33m"
c["blue"] = "\033[34m"
c["pink"] = "\033[35m"
c["teal"] = "\033[36m"
c["grey"] = "\033[37m"
for (color in c) {
print c[color] color c["black"]
}
}
{
gsub(/#/,"#A"); gsub(/{/,"#B"); gsub(/}/,"#C")
gsub("\\<"word[1]"\\>","{")
gsub("\\<"word[2]"\\>","}")
while ( match($0,/{[^{}]+}|}[^{}]+{/) ) {
tgt = substr($0,RSTART,RLENGTH)
gsub(/}/,word[2],tgt)
gsub(/{/,word[1],tgt)
gsub(/#C/,"}",tgt); gsub(/#B/,"{",tgt); gsub(/#A/,"#",tgt)
if ( gsub(/[[:space:]]+/,"&",tgt) <= range ) {
print FILENAME, FNR, c["red"] tgt c["black"]
}
$0 = substr($0,RSTART+length(word[1]))
}
}
When you run it, you'll see a dump of all available colors, and each matching span will be preceded by the file name and record number (FNR) and colored red. Note that because RS="^$" reads each file as a single record, FNR is 1 for every match rather than a true line number.

Short answer:
grep 'shalt\W\+\(\w\+\W\+\)\{0,5\}money'
Maybe in both directions:
grep 'shalt\W\+\(\w\+\W\+\)\{0,5\}money\|money\W\+\(\w\+\W\+\)\{0,5\}shalt'
https://www.gnu.org/software/grep/manual/grep.html:
‘\w’
Match word constituent, it is a synonym for ‘[_[:alnum:]]’.
‘\W’
Match non-word constituent, it is a synonym for ‘[^_[:alnum:]]’.
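To see exactly what the short pattern captures, it can help to print only the matched span with -o. The following is a minimal sketch, assuming GNU grep (for \w, \W and -o); it uses the extended syntax (-E) so that +, ( and { need no backslashes. The function name near is made up for this illustration.

```shell
# near N WORD1 WORD2 [grep options/files...]
# Print each span where WORD1 and WORD2 occur, in either order,
# with at most N other words between them. Assumes GNU grep.
near() {
    n=$1 a=$2 b=$3
    shift 3
    gap='\W+(\w+\W+){0,'"$n"'}'
    grep -oE "$a$gap$b|$b$gap$a" "$@"
}

printf '%s\n' \
    'Thou shalt sell me meat for money, that I may eat' \
    'ye shall also buy water of them for money' \
    'bind up the money in thine hand, and shalt go' |
    near 4 shalt money
```

With a limit of 4 this prints the spans from the first and third lines; the second line contains no shalt. The search terms are not anchored to word boundaries here, so a term can also match inside a longer word.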
Generic answer to construct the grep dynamically, in this case with a shell function:
find_adjacent() {
dist="$1"; shift
grep1="$1"; shift
grep2="$1"; shift
between='\W\+\(\w\+\W\+\)\{0,'"$dist"'\}'
regex="$grep1$between$grep2\|$grep2$between$grep1"
printf 'Using the regex: %s\n' "$regex" 1>&2
grep "$regex" "$@"
}
Example usage:
echo 'shalt sell me meat for money
shalt thou turn it into money
money in thine hand, and shalt
shalt bestow that money
capital and GDP' | find_adjacent 3 shalt money -i --color=auto
or to match across lines:
find_adjacent 5 shalt money -z file_with_the_bible_passages.txt
Edit
As pointed out by EdMorton, this only finds the first part of a continuous match. It would still match the right line, but the color highlighting would be a bit off.
To fix this, the regex has to get more complicated, because it must match any continuous "shalt ... money ... shalt" chain in 4 cases:
"shalt ... money ... shalt"
"shalt ... money ... shalt ... money"
"money ... shalt ... money"
"money ... shalt ... money ... shalt"
This can be done by replacing the regex=... line with:
regex1="$grep1\($between$grep2$between$grep1\)\+"
regex2="$grep1$between$grep2\($between$grep1$between$grep2\)*"
regex3="$grep2\($between$grep1$between$grep2\)\+"
regex4="$grep2$between$grep1\($between$grep2$between$grep1\)*"
regex="$regex1\|$regex2\|$regex3\|$regex4"
Additionally it might be mixed up like this:
"shalt xxx shalt xxx money xxx money"
With a distance of max 3 words between, the above regex still would only find:
"shalt xxx shalt xxx money"
To match those cases, the only viable solution is to match only the words themselves and use lookaheads/lookbehinds (this needs a more advanced regex implementation, e.g. GNU grep's -P for Perl-compatible regular expressions):
find_adjacent() {
dist="$1"; shift
word1="$1"; shift
word2="$1"; shift
ahead='\W+(\w+\W+){0,'"$dist"'}'
behind='(\W+\w+){0,'"$dist"'}\W+'
regex="$word1(?=$ahead$word2)|(?<=$word2)$behind\K$word1|$word2(?=$ahead$word1)|(?<=$word1)$behind\K$word2"
printf 'Using the regex: %s\n' "$regex" 1>&2
grep -P "$regex" "$@"
}
Another example usage (search case insensitive, display filename and line, highlight the words found, search all files in a directory):
find_adjacent 15 capital GDP -i -Hn --color=auto -r folder_to_search
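As a regex-free cross-check on the approaches above, the distance can also be measured by counting whitespace-separated tokens. This is only a sketch under stated assumptions: it works one line at a time (not across line breaks), it compares tokens after lowercasing and trimming a small hard-coded set of punctuation characters, and the function name within is hypothetical. It uses plain POSIX awk features, so it should not need gawk.

```shell
# within N WORD1 WORD2 [file...]
# Print the token span whenever WORD1 and WORD2 appear on the same line
# with at most N other tokens between them (either order).
within() {
    n=$1 a=$2 b=$3
    shift 3
    awk -v n="$n" -v a="$a" -v b="$b" '
    BEGIN { a = tolower(a); b = tolower(b) }
    {
        # Normalise each token: lowercase, trim surrounding punctuation.
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            gsub(/^[.,;:!?()"]+|[.,;:!?()"]+$/, "", w)
            tok[i] = w
        }
        for (i = 1; i <= NF; i++) {
            if (tok[i] != a && tok[i] != b) continue
            other = (tok[i] == a) ? b : a
            # The partner may sit at most n+1 tokens further on.
            for (j = i + 1; j <= NF && j <= i + n + 1; j++) {
                if (tok[j] == other) {
                    span = $i
                    for (k = i + 1; k <= j; k++) span = span " " $k
                    print span
                    break
                }
            }
        }
    }' "$@"
}

printf '%s\n' 'shalt sell me meat for money in thine hand, and shalt' |
    within 4 shalt money
```

Because every occurrence of either word starts its own look-ahead, overlapping spans such as "shalt ... money" and "money ... shalt" are both reported. Runs of whitespace inside a span are printed as single spaces.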

Related

Unix commands and regex to find the total counts for different criteria

I am new to Unix commands and regex. I applied the following commands to this English corpus and I am not sure about them.
a. Count the total number of words (tokens) : I got 2685545
wc -w testFile.txt
b. Count the total number of unique words (types). I wrote two different commands, and I am not sure which one is correct. The number of types: 657286 or 74066
cat testFile.txt |perl -pe 's/\s/\n/g;' |sort |uniq -c
or
cat testFile.txt |perl -pe 's/\s/\n/g;' |sort |uniq -c |wc -w
c. Count the total number of unique words ignoring capitalization. I got 1910951
cat testFile.txt |perl -pe 's/[a-z]\w+/\n/g;' |sort |uniq -c
d. Count the total number of pure digits tokens.
cat testFile.txt |perl -pe 's/\s/\n/g;' |grep '[0-9]{1,}' |sort |uniq -c |wc -w
e. Count the total number of digits with non-word characters with them (e.g. 8,000.00) I got 18666230
wc -c testFile.txt |perl -pe 's/[0-9]{1,}\W+[0-9]{1,}\W+[0-9]{1,}/\n/g;'
f. Count the total number of words starting with capital letters. I got 1048
cat testFile.txt |perl -pe 's/[A-Z]\w+/\n/g;' |egrep '[A-Z]\w+' |wc -w
g. What are the top 15 most common first words of sentences
cat testFile.txt |perl -pe 's/\s/\n/g;' |sort |uniq -c |sort -nr |head -15
h. What are the top most common capitalized words (that are not sentence initial).
perl -nE 'say $1 while /(\w*[A-Z]+\w*)/g' testFile.txt
I got this list (the screenshot showed part of the output):
i. Count all occurrences of Roman numerals 2684068
cat testFile.txt |egrep -i '[IX|IV|V?I{1,3}]' |wc -w
Your help will be very appreciated!
Let's just take the first few lines of the text you posted:
$ cat file.txt
RESOLUTION 55/100
Adopted at the 81st plenary meeting, on 4 December 2000, on the recommendation of the Committee (A/55/602/Add.2 and Corr.1, para. 94), by a recorded vote of 106 to 1, with 67 abstentions, as follows:
55/100. Respect for the right to universal freedom of travel and the vital importance of family reunification
The General Assembly,
Reaffirming that all human rights and fundamental freedoms are universal, indivisible, interdependent and interrelated,
Recalling the provisions of the Universal Declaration of Human Rights, as well as article 12 of the International Covenant on Civil and Political Rights,
Stressing that, as stated in the Programme of Action of the International Conference on Population and Development, family reunification of documented migrants is an important factor in international migration and that remittances by documented migrants to their countries of origin often constitute a very important source of foreign exchange and are instrumental in improving the well-being of relatives left behind,
Recalling its resolution 54/169 of 17 December 1999,
1. Once again calls upon all States to guarantee the universally recognized freedom of travel to all foreign nationals legally residing in their territory;
2. Reaffirms that all Governments, in particular those of receiving countries, must recognize the vital importance of family reunification and promote its incorporation into national legislation in order to ensure protection of the unity of families of documented migrants;
3. Calls upon all States to allow, in conformity with international legislation, the free flow of financial remittances by foreign nationals residing in their territory to their relatives in the country of origin;
4. Also calls upon all States to refrain from enacting, and to repeal if it already exists, legislation intended as a coercive measure that discriminates against individuals or groups of legal migrants by adversely affecting family reunification and the right to send financial remittance to relatives in the country of origin;
5. Decides to continue its consideration of this question at its fifty-seventh session under the item entitled "Human rights questions".
The first thing to do is define 'what is a word'?
RESOLUTION clearly is; what about 55/100?
What about things in quotes or parentheses?
What about 'big' 'Big' 'Big!' 'Big?' 'Big.'? Are those all the same word or five different words? wc counts those as five different words.
Assuming you mean that a 'word' is the lowercase version stripped of all non word characters, you can use a regex to find all words and then lower case so that they compare the same.
In Perl, you can use the regex /\b(\p{L}+)/ to find words.
$ perl -lne 'while (/\b(\p{L}+)/g) {$h{lc($1)}++;} END{foreach (sort { $h{$b} <=> $h{$a} } keys(%h)) {print "$_: $h{$_}"}}' file.txt
of: 25
the: 20
to: 13
and: 11
in: 10
all: 6
that: 5
as: 5
migrants: 4
reunification: 4
family: 4
their: 4
its: 4
by: 4
rights: 4
on: 4
international: 4
...
Adding unique word counts and total word counts against your file, I get:
$ perl -lne 'while (/\b(\p{L}+)/g) {$h{lc($1)}++;}
END{print "unique words: ".scalar keys %h;
foreach (values %h) { $s+=$_ }
print "total words: $s";
foreach (sort { $h{$b} <=> $h{$a} } keys(%h)) {print "$_: $h{$_}"}}' testFile.txt | head
unique words: 11263
total words: 2616047
the: 272618
of: 176015
and: 138295
to: 101670
in: 67440
on: 36025
for: 32558
a: 24742
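The same definition of a word (a run of letters, lowercased) can be sketched with standard tools only. This is an illustration rather than a drop-in replacement for the Perl above: tr typically works on bytes, so it assumes plain ASCII letters, and the function names are made up.

```shell
# words: read text on stdin, print one lowercased word per line.
# A "word" here is a run of ASCII letters (an assumption of this sketch).
words() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | grep .
}

# Total number of words (tokens).
count_tokens() { words | wc -l | tr -d ' '; }

# Number of distinct words (types).
count_types() { words | sort -u | wc -l | tr -d ' '; }

# The N most frequent words with their counts (default 10).
top_words() { words | sort | uniq -c | sort -k1,1nr -k2 | head -n "${1:-10}"; }

printf '%s\n' 'Big big BIG! The big tree. 55/100' | count_tokens
printf '%s\n' 'Big big BIG! The big tree. 55/100' | count_types
```

On this sample the token count is 6 and the type count is 3, since 55/100 contains no letters and the four spellings of big collapse into one type. Usage on a file would look like count_types < testFile.txt.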

Why this working regex does not work with sed?

I have this type of text:
Song of Solomon 1:1: The song of songs, which is Solomon’s.
John 3:16:For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
III John 1:8: We therefore ought to receive such, that we might be fellowhelpers to the truth.
I am trying to remove the verse (or metadata if you will) and just get plain text the content. The example text shows three different types of verses (multiword, singleword and roman + word), I thought that it would be easier to detect from the beginning of each line, anything until "number:number:", and then substitute it with "" (empty string).
I tested a regex that seems to work (as I described):
First find until "number:number:" excluding it [or: .+?(?=(\s+)(\d+)(:)(\d+)(:))],
And then include the "number:number:" pattern [or: (\s+)(\d+)(:)(\d+)(:)]
Which leads to the following regex:
.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)
The regex seems to work fine, you can try it here, the problem is that when I try to use the regex with sed it just does not work:
$ sed 's/.+?(?=(\s+)(\d+)(:)(\d+)(:))(\s+)(\d+)(:)(\d+)(:)//g' testcase.txt
It will produce the same text as the input, when it should produce:
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.
Any help please?
Thank you very much!
This awk should do:
awk -F": *" '{print $3}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.
To make it more secure to the number:number: use this:
awk -F"[0-9]+:[0-9]+: *" '{print $2}' file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.
This will also prevent problems with : within the text.
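A small sketch of the difference between the two separators, using a made-up line (not a real verse) that has a colon inside the text:

```shell
line='John 3:16: He said: go in peace'

# Splitting on every colon: field 3 stops at the colon inside the text.
printf '%s\n' "$line" | awk -F': *' '{print $3}'
# prints: He said

# Splitting only on the number:number: reference keeps the text whole.
printf '%s\n' "$line" | awk -F'[0-9]+:[0-9]+: *' '{print $2}'
# prints: He said: go in peace
```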
Using Adam's regex, we can shorten it some.
awk -F"([0-9]+:){2} ?" '{print $2}' file
or
awk -F"([0-9]+:){2} ?" '{$0=$2}1' file
You can use the following sed command:
sed 's/.*[0-9]\+:[0-9]\+: *//' file.txt
If you have only basic posix regexes available, you need to use the following command:
sed 's/.*[0-9]\{1,\}:[0-9]\{1,\}: \{0,\}//' file.txt
I use \{1,\} because the \+ operator is a GNU extension and is not part of the basic POSIX regex specification. (The * operator is standard, so a plain " *" would work just as well as " \{0,\}".)
Btw, if you have GNU goodies, you can also use grep:
grep -oP '.*([0-9]+:){2} *\K.*' file.txt
I'm using the \K escape here. \K discards everything matched up to that point from the reported match, so it can be used like a lookbehind assertion, but with a variable length.
This:
sed -r 's/.*([0-9]+:){2} ?//' testcase.txt
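For context on why the original attempt printed the input unchanged: sed's regex dialects (POSIX basic, or extended with -E/-r) have no lazy quantifier (+?), no lookahead ((?=...)) and no \d digit class, so the pattern never matches. A minimal sketch that gets the "shortest prefix" effect without those features is to forbid colons before the reference; the function name strip_ref is made up for this example.

```shell
# strip_ref [file...]: delete everything up to and including the first
# "number:number:" reference on each line, plus any spaces after it.
# [^:]* cannot cross a colon, so only the first reference is removed.
strip_ref() {
    sed -E 's/^[^:]*[0-9]+:[0-9]+: *//' "$@"
}

printf '%s\n' \
    'John 3:16:For God so loved the world' \
    'III John 1:8: We therefore ought to receive such' |
    strip_ref
```

Unlike the .* variants, this leaves any later number:number: sequence inside the verse text untouched.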
This is the job cut was invented to do:
$ cut -d: -f3- file
The song of songs, which is Solomon’s.
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.
We therefore ought to receive such, that we might be fellowhelpers to the truth.

How to add `\macro{}` around the first occurrence of each match from a list with Regex

I have a list of words, list.txt, like this:
fish
squirrel
bird
tree
mountain
I also have a file, text.txt, with passages like this:
The fish ate the birds.
The squirrel lived in the tree on the mountain.
The fish did not like eating squirrels as they lived too high in the trees.
I need to mark the first occurrences of all of the words from list.txt in the text.txt file, with a TeX code, like, \macro{}, e.g., the output would look like this:
The \macro{fish} ate the \macro{bird}s.
The \macro{squirrel} lived in the \macro{tree}house on the \macro{mountain}.
The fish did not like eating squirrels as they lived too high in the trees.
How can I add \macro{} to the first occurrence of each of the words that appears in the list in BASH?
Code for GNU sed:
$ sed -nr 's#(\w+)#s/\1/\1/;T\1;x;s/\1/\1/;x;t\1;x;s/.*/\& \1/;x;s/\1/\\\\macro\{\1\}/;:\1;$!N#p' list.txt|sed -rf - text.txt
$ cat list.txt
fish
squirrel
bird
tree
mountain
$ cat text.txt
The fish ate the birds.
The squirrel lived in the tree on the mountain.
The fish did not like eating squirrels as they lived too high in the trees.
$ sed -nr 's#(\w+)#s/\1/\1/;T\1;x;s/\1/\1/;x;t\1;x;s/.*/\& \1/;x;s/\1/\\\\macro\{\1\}/;:\1;$!N#p' list.txt|sed -rf - text.txt
The \macro{fish} ate the \macro{bird}s.
The \macro{squirrel} lived in the \macro{tree} on the \macro{mountain}.
The fish did not like eating squirrels as they lived too high in the trees.
Good & interesting problem.
I could come up with following awk for you:
awk 'NR==FNR{a[$1]=$1;next}
{for (v in a) if (a[v] != "") {r=sub(v, "\\macro{" v "}"); if (r) a[v]=""}
}'1 list.txt text.txt
I'm still new to Awk, but this seems to work. Just beware of words like "propane" when looking for "prop" (and you can't match the exact word because "props" wouldn't be changed to "\macro{prop}s"). You'd need a better dictionary and possibly a lot more than just Awk to handle cases like that.
NR==FNR {
#Skip empty lines.
if ($0 ~ /^$/)
next;
macros[$0] = "\\macro{"$0"}";
next;
}
{
for (name in macros) {
n = name;
#Sometimes a word may have a [ in it or other special chars.
gsub(/[.[\(*+?{|^$]/, "[&]", n);
if (sub(n, macros[name]))
delete macros[name];
}
print;
}
This will preserve white space (unlike any solution that assigns to fields) and won't incorrectly match the first 2 letters of "there" when looking for "the" (unlike any solution that doesn't enclose "word" in word delimiters "\<...\>" or equivalent).
$ gawk 'NR==FNR{list[$0];next}
{
for (word in list)
if ( sub("\\<"word"\\>","\\macro{&}") )
delete list[word]
}
1' list.txt text.txt
The \macro{fish} ate the birds.
The \macro{squirrel} lived in the \macro{tree} on the \macro{mountain}.
The fish did not like eating squirrels as they lived too high in the trees.
The only caveat with this solution is that if "word" contains any RE metacharacters (e.g. *, +), they will be evaluated by sub(). Since you seem to be using English words, that wouldn't happen; if it can, let us know, as you'd need a different solution.
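The \< and \> word-boundary operators are GNU awk extensions rather than POSIX. For other awks, the boundary can be emulated by requiring a non-word character (or the line edge) on each side. The following is a sketch under the assumption that the list contains only words made of ASCII letters, digits and underscores; the function name mark_first is hypothetical.

```shell
# mark_first LIST TEXT: wrap the first whole-word occurrence of each
# word from LIST in \macro{...}, leaving white space untouched.
mark_first() {
    awk '
    NR == FNR { if ($0 != "") list[$0]; next }
    {
        for (word in list) {
            re = "(^|[^A-Za-z0-9_])" word "([^A-Za-z0-9_]|$)"
            if (match($0, re)) {
                # The match may include one leading non-word character,
                # so locate the word itself inside the matched span.
                start = RSTART + index(substr($0, RSTART, RLENGTH), word) - 1
                $0 = substr($0, 1, start - 1) "\\macro{" word "}" \
                     substr($0, start + length(word))
                delete list[word]
            }
        }
        print
    }' "$1" "$2"
}
```

Usage would be mark_first list.txt text.txt. On the sample files it should give the same three lines as the gawk output above, with birds left unmarked because bird is not a whole word there.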
I see you posted that partial matches actually are desirable (e.g. "the" should match the start of "theory") so then you want this:
$ awk 'NR==FNR{list[$0];next}
{
for (word in list)
if ( sub(word,"\\macro{&}") )
delete list[word]
}
1' list.txt text.txt
as long as no RE metacharacters can appear in your matching words from list.txt, or this otherwise:
$ awk 'NR==FNR{list[$0];next}
{
for (word in list) {
start = index($0,word)
if ( start > 0 ) {
$0 = substr($0,1,start-1) \
"\\macro{" word "}" \
substr($0,start+length(word))
delete list[word]
}
}
}
1' list.txt text.txt
That last is the most robust solution as it does a string comparison rather than an RE comparison so is unaffected by RE metacharacters and also will not affect white space (which I know you said you don't care about right now).

Grep Regex - Words in brackets?

I want to know the regex in grep to match everything that isn't a specific word. I know how to match everything that isn't a single character,
gibberish blah[^.]*jack
That would match blah, jack and everything in between as long as the in between didn't contain a period. But is it possible to do something like this?
gibberish blah[^joe]*jack
Match blah, jack and everything in between as long as the in between didn't contain the word "joe"?
UPDATE:
I can also use AWK if that would better suit this purpose.
So basically, I just want to get the sentence "gibberish blah other words jack", as long as "joe" isn't in the other words.
Update 2 (The Answer, to a different question):
Sorry, I am tired. The sentence actually can contain the word "joe", but not two of them. So "gibberish blah jill joe moo jack" would be accepted, but "gibberish blah jill joe moo joe jack" wouldn't.
Anyway, I figured out the solution to my problem. Just grep for "gibberish.*jack" and then do a word count (wc) to see how many "joes" are in that sentence. If wc comes back with 1, then it's ok, but if it comes back with 2 or more, the sentence is wrong.
So, sorry for asking a question that wouldn't even solve my problem. I will mark sputnick's answer as the right one, since his answer looks like it would solve the original post's problem.
What you're looking for is called lookaround; it's an advanced regex technique in PCRE and Perl, and it is also available in most modern languages. grep can handle these expressions if you have the -P switch. If you don't have -P, try pcregrep instead (or any modern language).
See
http://www.perlmonks.org/?node_id=518444
http://www.regular-expressions.info/lookaround.html
NOTE
If you just want to negate a regex, maybe a simple grep -v "regex" will be sufficient (it depends on your needs):
$ echo 'gibberish blah other words jack' | grep -v 'joe'
gibberish blah other words jack
$ echo 'gibberish blah joe other words jack' | grep -v 'joe'
$
See
man grep | less +/invert-match
Try a negative lookahead, checked at every character between the two words (requires grep -P):
gibberish blah((?!joe).)*jack
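For the requirement in the question's second update (one "joe" is fine, two are not), counting occurrences is simpler than any lookaround. A minimal sketch in POSIX awk, relying on gsub() returning the number of replacements it made; the function name is made up, and it counts "joe" anywhere on the line, not only between the two words.

```shell
# at_most_one_joe [file...]: print lines matching gibberish...jack
# in which the string "joe" occurs at most once.
at_most_one_joe() {
    awk '/gibberish.*jack/ {
        copy = $0                       # count on a copy, keep $0 intact
        if (gsub(/joe/, "&", copy) <= 1) print
    }' "$@"
}

printf '%s\n' \
    'gibberish blah jill joe moo jack' \
    'gibberish blah jill joe moo joe jack' \
    'gibberish blah other words jack' |
    at_most_one_joe
```

This prints the first and third sample lines and drops the one with two occurrences.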

Grepping for two regexes at the same time

How to use grep to search for two regex at the same time. Say, I am looking for "My name is" and "my bank account " in a text like:
My name is Mike. I'm 16 years old.
I have no clue how to solve my grep problem,but
if I manage to solve it, then I'll transfer
you some money from my bank account.
I'd like grep to return:
My name is
my bank account
Is it possible to do it with just one grep call or should I write a script to do that for me?
If you do not care about a trailing newline, simply use grep:
< file.txt grep -o "My name is\|my bank account" | tr '\n' ' '
If you would prefer a trailing newline, use awk:
awk -v RS="My name is|my bank account" 'RT != "" { printf "%s ", RT } END { printf "\n" }' file.txt
I'm not quite sure what you're after. The result you give doesn't seem to fit with anything grep can/will do. In particular, grep is line oriented, so if it finds a match in a line, it includes that entire line in the output. Assuming that's what you really want, you can just or the two patterns together:
grep -E 'My name is|my bank account' file.txt
Given the input above, this should produce:
My name is Mike. I'm 16 years old.
you some money from my bank account.
Alternatively, since you haven't included any meta-characters in your patterns, you could use fgrep (or grep -F) and put your patterns in a file, one per line. For two patterns this probably doesn't make a big difference, but if you want to look for lots of patterns, it'll probably be quite a bit faster (it uses the Aho-Corasick string search to search for all the patterns at once instead of searching for them one at a time).
The other possibility would be that you're looking for a single line that includes both my name is and my bank account. That's what @djechlin's answer would do. From the input above, that would produce no output, so I doubt it's what you want, but if it is, his answer is fairly reasonable. An alternative would be a pattern like 'My name is.*my bank account|my bank account.*My name is' (used with grep -E).
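If only the matched phrases are wanted (as in the question's expected output) rather than whole lines, the fixed-string approach can be combined with -o. This is a sketch assuming a grep that supports -o, such as GNU grep; the function name is made up, and the two -e options could equally be replaced by -f with a file holding one pattern per line.

```shell
# find_phrases [file...]: print every occurrence of either phrase,
# one per line. -F treats the patterns as fixed strings, -o prints
# only the matching part of each line.
find_phrases() {
    grep -oF -e 'My name is' -e 'my bank account' "$@"
}

printf '%s\n' \
    "My name is Mike. I'm 16 years old." \
    'I have no clue how to solve my grep problem,but' \
    "if I manage to solve it, then I'll transfer" \
    'you some money from my bank account.' |
    find_phrases
```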
Yes. It is possible. I used sed. You can replace S1 and S2 with whatever you want
sed '/S1/{ s:.*:S1:;H};/S2/{ s:.*:S2:;H};${x;s:\n: :g;p};d'
Sed is much more complex than grep, and in this case I used it to simulate grep's behaviour that you wish.
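To require both expressions on the same line (AND), pipe one grep into another: grep expr1 file | grep expr2
To match either expression (OR): grep -E '(expr1|expr2)' file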