Regular expression not showing multiple line content - regex

I have a file with the following format:
<hello>
<random1>
<random2>
....
....
....
<random100>
<bye>
I want to find whether bye and hello are there, and bye is below hello. I tried this regular expression.
grep "hello.*bye" filename
but it fails to match what I expected.

You could use pcregrep:
pcregrep -M 'hello(\n|.)*bye' filename
The -M option makes it possible to search for patterns that span line boundaries.
For your input, it'd produce:
<hello>
<random1>
<random2>
....
....
....
<random100>
<bye>

If the input file is small enough, you can try:
grep "hello.*bye" <(tr $'\n' ' ' < filename)
This replaces all newlines with spaces and thus turns the file contents into a single line that grep searches at once.
If you'd rather simply remove newlines, use:
grep "hello.*bye" <(tr -d $'\n' < filename)

$ cat file1.txt
<hello>
<bye>
$ awk '/<hello>/ {hello=1} /<bye>/&&hello {bye=1; exit} END {exit !(hello && bye)}' \
file1.txt \
&& echo found || echo not found
found
$ cat file2.txt
<bye>
<hello>
$ awk '/<hello>/ {hello=1} /<bye>/&&hello {bye=1; exit} END {exit !(hello && bye)}' \
file2.txt \
&& echo found || echo not found
not found

Perl:
perl -0777 -lne 'print (/hello.*bye/s ? "y" : "n")'
or
perl -0777 -ne 'exit(! /hello.*bye/s)'
The -0777 option slurps the whole file as a single string. The "s" flag tells perl to allow "." to match a newline.

With GNU awk for a multi-char RS:
awk -v RS='^$' '{print (/hello.*bye/ ? "y" : "n")}'
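Where GNU awk is not available, the same slurp-then-match idea can be sketched in POSIX awk by accumulating the lines by hand. The function name has_hello_then_bye is made up for this illustration, and the check is done in two steps (find "hello", then look for "bye" in the remainder) so that it does not depend on how a given awk treats "." and newlines:

```shell
# has_hello_then_bye FILE: exit status 0 if "hello" occurs somewhere before "bye".
# (Hypothetical helper name, not part of any answer above.)
has_hello_then_bye() {
    awk '
        { buf = buf $0 "\n" }    # collect every line into one string
        END {
            # find the first "hello", then look for "bye" in what follows it
            if (match(buf, /hello/) && substr(buf, RSTART + RLENGTH) ~ /bye/)
                exit 0
            exit 1
        }
    ' "$1"
}
```

It would be used like the other one-liners: has_hello_then_bye filename && echo found || echo not found. Like the slurping approaches, it holds the whole file in memory.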

Related

Find regular expression in a file matching a given value

I have some basic knowledge on using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out to which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while for file reading (following comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
  local value=$1
  while IFS= read -r line; do          # read from input line-by-line
    [[ $line = *=* ]] || continue      # skip lines not containing an =
    regex=${line#*=}                   # prune everything before the = for the regex
    if [[ $value =~ $regex ]]; then    # test whether we match...
      printf '%s\n' "$line"            # ...and print if we do.
    fi
  done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
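To make the two parameter expansions concrete, here is a small standalone sketch using one line of the sample input (the variable names key and regex are chosen only for this illustration):

```shell
line='line_three=[7-9]'

key=${line%%=*}     # remove the longest suffix matching "=*": keeps the text before the first "="
regex=${line#*=}    # remove the shortest prefix matching "*=": keeps the text after the first "="

printf 'key=%s regex=%s\n' "$key" "$regex"
# prints: key=line_three regex=[7-9]
```

Both forms are POSIX parameter expansions, so they behave the same in bash and in other sh-compatible shells.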
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
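As a rough sketch of how the generated program could then be used, the sed output can be captured and handed to awk as its program text. The function name match_rules is invented here, and the same caveat applies: the rules file must not contain characters that need escaping.

```shell
# match_rules RULES VALUE: print the key of every "key=regex" rule in RULES
# whose regex matches VALUE. (Hypothetical helper, for illustration only.)
match_rules() {
    # turn each "key=regex" line into an awk rule: /regex/ { print "key" }
    prog=$(sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#' "$1")
    # feed the value to awk as its only input line
    printf '%s\n' "$2" | awk "$prog"
}
```

With the sample file, match_rules file 8 should print line_three.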
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
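The three-argument match() used for the numeric range test is a gawk extension. A minimal sketch of the same numeric comparison in POSIX awk, assuming every rule line has exactly the form key=[lo-hi] and that the bounds are meant to be inclusive (in_range is a made-up name):

```shell
# in_range FILE N: print each "key=[lo-hi]" line of FILE with lo <= N <= hi.
# (Hypothetical helper; lines in any other format are ignored.)
in_range() {
    awk -v n="$2" '
        /=\[[0-9]+-[0-9]+\]$/ {
            range = $0
            sub(/^.*=\[/, "", range)    # strip everything up to "=["
            sub(/\]$/, "", range)       # strip the closing "]"
            split(range, b, "-")        # b[1] = lo, b[2] = hi
            # adding 0 forces a numeric rather than a string comparison
            if (b[1] + 0 <= n + 0 && n + 0 <= b[2] + 0)
                print
        }
    ' "$1"
}
```

For the second sample file, in_range /tmp/file 92 should print the line_three and line_four lines.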
You would like to do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will grep what you want but will not display where.
With sed you can show some message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you would like to transform your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)

bash regex multiple match in one line

I'm trying to process my text.
For example, I have:
asdf asdf get.this random random get.that
get.it this.no also.this.no
My desired output is:
get.this get.that
get.it
So the regexp should catch only this pattern (get.\w), but it has to match repeatedly because there can be multiple occurrences in one line, so the easiest way with sed
sed 's/.*(REGEX).*/\1/'
does not work (it shows only the first occurrence).
Probably a good way is to use grep -o, but I have an old version of grep and the -o flag is not available.
This grep may give what you need:
grep -o "get[^ ]*" file
Try awk:
awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
You might need to tweak the regex between the slashes for your specific issue. Sample output:
$ awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
get.this
get.that
get.it
With awk:
awk -v patt="^get" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
printf "%s%s", $i, OFS;
print ""
}' <<< "$text"
bash
while read -a words; do
  for word in "${words[@]}"; do
    if [[ $word == get* ]]; then
      echo -n "$word "
    fi
  done
  echo
done <<< "$text"
perl
perl -lane 'print join " ", grep {$_ =~ /^get/} @F' <<< "$text"
This might work for you (GNU sed):
sed -r '/\bget\.\S+/{s//\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1 /g;s/ $//}' file
or if you want one per line:
sed -r '/\n/!s/\bget\.\S+/\n&\n/g;/^get/P;D' file
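If neither grep -o nor GNU sed is available, the "match repeatedly on one line" idea can also be sketched in POSIX awk by calling match() in a loop and cutting off what has already been consumed. The function name extract_get is invented for this example, and the pattern has no word-boundary check, so a word like forget.this would also yield get.this:

```shell
# extract_get [FILE...]: for each input line, print all get.<word> tokens
# found on it, separated by spaces. (Hypothetical helper name.)
extract_get() {
    awk '{
        out = ""; rest = $0
        # keep matching on the unconsumed remainder of the line
        while (match(rest, /get\.[A-Za-z0-9_]+/)) {
            out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (out != "") print out    # skip lines without any match
    }' "$@"
}
```

Run as extract_get file, or with the text on standard input.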

Extract all numbers from a text file and store them in another file

I have a text file which has lots of lines. I want to extract all the numbers from that file.
The file contains text and numbers, and each line contains only one number.
How can I do it using sed or awk in a bash script?
I tried
#! /bin/bash
sed 's/\([0-9.0-9]*\).*/\1/' <myfile.txt >output.txt
but this didn't work.
grep can handle this:
grep -Eo '[0-9\.]+' myfile.txt
-o tells to print only the matches and [0-9\.]+ is a regular expression to match numbers.
To put all numbers on one line and save them in output.txt:
echo $(grep -Eo '[0-9\.]+' myfile.txt) >output.txt
Text files should normally end with a newline character. The use of echo above ensures that this happens.
Non-GNU grep:
If your grep does not support the -o flag, try:
echo $(tr ' ' '\n' <myfile.txt | grep -E '[0-9\.]+') >output.txt
This uses tr to replace all spaces with newlines (so each number appears separately on a line) and then uses grep to search for numbers.
tr -sc '0-9.' ' ' < "$file"
This transforms every run of characters other than digits and periods into a single space.
You can also use Bash:
while IFS= read -r line; do
  if [[ $line =~ [0-9\.]+ ]]; then
    echo "$BASH_REMATCH"
  fi
done <myfile.txt >output.txt
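Since the question asked for sed, here is one possible sed sketch, assuming (as stated in the question) at most one number per line. It skips leading non-digits, captures a digit followed by further digits or periods, and prints only lines where that substitution succeeded. The function name extract_numbers is made up; note that a number directly followed by a sentence-ending period would keep that period.

```shell
# extract_numbers [FILE...]: print the first number on each line, one per line.
# Lines without a digit are dropped. (Hypothetical helper name.)
extract_numbers() {
    # -n plus the p flag: print a line only if the substitution matched
    sed -n 's/[^0-9]*\([0-9][0-9.]*\).*/\1/p' "$@"
}
```

It would be called as extract_numbers myfile.txt >output.txt.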

Regular expression to replace a word with another word on the same line unix

Let A, B, C, D be the words.
Input File :
..
A/B/C/D
W/B/C/Z
L/B/C/O
..
Output file:
..
A/B/C/A
W/B/C/W
L/B/C/L
..
Replace the word D with the word A on the same line, only if the /B/C/ delimiter is present in the line, and likewise for the other lines.
Is there any sed/awk/perl one-liner to accomplish that?
This is an awk solution:
awk -F/ -v OFS=/ '$2=="B" && $3=="C" {$4=$1}1' input.txt
You can do:
sed -re 's/^([^/]*)(\/B\/C\/)([^/]*)$/\1\2\1/' file
Demo:
$ cat file
A/B/C/D
W/B/C/Z
L/B/C/O
$ sed -re 's/^([^/]*)(\/B\/C\/)([^/]*)$/\1\2\1/' file
A/B/C/A
W/B/C/W
L/B/C/L
pearl.306> echo "A/B/C/D"|awk '{split($0,a,"/");print a[1]"/"a[2]"/"a[3]"/"a[1]}'
A/B/C/A
pearl.307>
another way is:
pearl.309> echo "A/B/C/D" | awk -F"/" '{OFS="/"}{$NF=$1;print}'
A/B/C/A
pearl.310>
pearl.318> cat file1
A/B/C/D
W/B/C/Z
L/B/C/O
pearl.319> awk -F"/" '{OFS="/"}{$NF=$1;print}' file1
A/B/C/A
W/B/C/W
L/B/C/L
pearl.320>
This might work for you:
sed 's|^\(\(.\)/B/C/\).|\1\2|' file
if A/B/C/D are real words e.g. wordA/wordB/wordC/wordD, then:
sed 's|^\(\([^/]*\)/wordB/wordC/\).*|\1\2|' file
This should do the trick:
perl -p -e 's/D/A/g'
In sed:
sed -e 's/D/A/'
perl -pe 's#(/B/C/)(.*)#$1$`#' file
This should work.

find lines containing "^" and replace entire line with ""

I have a file with a string on each line, e.g.:
test.434
test.4343
test.4343t34
test^tests.344
test^34534/test
I want to find any line containing a "^" and replace entire line with a blank.
I was trying to use sed:
sed -e '/\^/s/*//g' test.file
This does not seem to work, any suggestions?
sed -e 's/^.*\^.*$//' test.file
For example:
$ cat test.file
test.434
test.4343
test.4343t34
test^tests.344
test^34534/test
$ sed -e 's/^.*\^.*$//' test.file
test.434
test.4343
test.4343t34
$
To delete the offending lines entirely, use
$ sed -e '/\^/d' test.file
test.434
test.4343
test.4343t34
other ways
awk
awk '!/\^/' file
bash
while read -r line
do
case "$line" in
*"^"* ) continue;;
*) echo "$line"
esac
done <"file"
and probably the fastest
grep -v "\^" file