How to use variable in regex?


How do I replace regex with $var in this command?
echo "$many_lines" | sed -n '/regex/{g;1!p;};h'
$var could look like fs2#auto-17.
The sed command will output the line immediately before a regex, but not the line containing the regex.
If all this can be done more easily with a Perl one-liner, that is fine with me.

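It is not pretty, but this gives me the line before each line containing $var, which is what I wanted:
echo "$many_lines" | grep -B1 "$var" | grep -v "$var"
Both greps treat $var as a regular expression (add -F to match it literally), and grep -B1 prints a -- separator between groups of lines that are not adjacent.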

In Perl regexes, you can interpolate a variable's contents, as in /$foo/. However, the contents will be interpreted as a pattern. To match the literal contents of $foo, you have to escape the metacharacters: $safe = quotemeta $foo; /$safe/. This can be shortened to /\Q$foo\E/, which is usually what you want. The trailing \E is optional.
I don't know if the sed regex engine has a similar feature.
A Perl one-liner: perl -ne'$var = "..."; print $p if /\Q$var/; $p=$_'
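For comparison, a pure-Bash sketch of the same task (print the line before each line that contains $var) follows. Inside [[ ... =~ ... ]], quoting the right-hand side makes Bash match it literally, which plays the role of \Q...\E here. The function name lines_before_literal and the sample data are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line immediately before every line of
# $1 (a multi-line string) that contains the literal text $2.
lines_before_literal() {
  local text=$1 needle=$2 line prev='' have_prev=0
  while IFS= read -r line; do
    # The quoted "$needle" is matched literally, like Perl's \Q$needle\E.
    if (( have_prev )) && [[ $line =~ "$needle" ]]; then
      printf '%s\n' "$prev"
    fi
    prev=$line
    have_prev=1
  done <<< "$text"
}

var='fs2#auto-17'
many_lines=$'alpha\nfs2#auto-17 is here\nbeta\nfs2#auto-17 again'
lines_before_literal "$many_lines" "$var"   # prints alpha, then beta
```

A match on the very first line prints nothing, mirroring the 1!p in the sed version.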

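Use double quotes instead of single quotes to allow variable expansion (and quote $many_lines as well, otherwise its newlines are collapsed into spaces):
echo "$many_lines" | sed -n "/$var/"'{g;1!p;};h'
sed still treats the contents of $var as a regular expression, so it must not contain an unescaped /.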
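With the sed version, the contents of $var are still a basic regular expression (BRE), so a . matches any character and a / ends the address early. One way around that, sketched below under the assumption of a GNU or POSIX sed and a value without newlines, is to backslash-escape the BRE metacharacters first. The helper names sed_escape_bre and lines_before_sed are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape [ \ . * ^ $ and / so that sed's BRE
# matches the string literally. The s command uses # as its delimiter,
# so the / inside the bracket expression needs no escaping.
sed_escape_bre() {
  printf '%s\n' "$1" | sed 's#[[\.*^$/]#\\&#g'
}

# The sed program from the question with the escaped value spliced in:
# on a matching line, g copies the held (previous) line into the pattern
# space and 1!p prints it unless this is line 1; h then copies the
# pattern space into the hold space.
lines_before_sed() {
  local esc
  esc=$(sed_escape_bre "$2")
  printf '%s\n' "$1" | sed -n "/$esc/"'{g;1!p;};h'
}

lines_before_sed $'x\nabc\na.c' 'a.c'   # prints abc, not x
```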

Since you are looking for the line before the match, a single one-liner will not be that trivial or pretty, but here is how I would do it (using Perl only). Export var first: a single-quoted Perl program cannot see shell variables, so it reads the value from %ENV instead:
export var
echo "$many_lines" | perl -nle 'print $. if /\Q$ENV{var}/' | while read -r line_no; do
export line_no
echo "$many_lines" | perl -nle 'print if $. == $ENV{line_no} - 1'
done
Or, to do it in a single command:
echo "$many_lines" | perl -ne '$content .= $_; END { print "$1\n" while $content =~ m#^([^\n]*)\n(?=[^\n]*\Q$ENV{var}\E)#gm }'
Or this one, which keeps every line in an array and prints the previous one whenever the current line matches:
echo "$many_lines" | perl -nle 'push @contents, $_; print $contents[-2] if $contents[-1] =~ /\Q$ENV{var}/ and defined $contents[-2]'

Or you can use here-documents too, if you don't want to escape the double quotes!
In Perl it looks like this:
$heredoc = <<HEREDOC;
here is your text and $var is your parameter
HEREDOC
It's important to end the here-document with the same string you began it with (in my example, HEREDOC), on a line of its own.

Related

Bash regex =~ doesn’t support multiline mode?

I'm using the =~ operator to match the output of a command and grab a group from it. The code is as follows:
Commandout=$(cmd)
Match='^hello world'
if [[ $Commandout =~ $Match ]]; then
echo something
fi
The output in $Commandout looks like this:
something
hello world
But the if statement fails.
Does Bash's regex matching support a multiline mode, where every line can start with ^ and end with $?

No, the =~ operator doesn't perform a multiline search. A newline must be matched literally:
string=$(cmd)
regexp='(^|'$'\n'')hello world'
if [[ $string =~ $regexp ]]; then
echo matches
fi
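The question also wanted to grab a group. A sketch of combining the (^|newline) trick with BASH_REMATCH follows; first_line_starting_with is a hypothetical name, and its second argument is treated as a regex fragment, not literal text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of $1 that starts with the
# regex fragment $2. (^|newline) emulates a multiline ^, and [^newline]*
# stops the captured group at the end of that line, because . in Bash's
# =~ would also match newlines.
first_line_starting_with() {
  local text=$1 prefix=$2 nl=$'\n'
  local re="(^|$nl)($prefix[^$nl]*)"
  [[ $text =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[2]}"
}

output=$'something\nhello world, again\nbye'
first_line_starting_with "$output" 'hello world'   # prints: hello world, again
```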

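=~ treats a multi-line string as a single line: ^ anchors only at the start of the whole string, $ only at its end, and . also matches a newline.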
if [[ $(echo -e "abc\nd") =~ ^a.*d$ ]]; then
echo "find a string '$(echo -e "abc\nd")' that starts with a and ends with d"
fi
Output:
find a string 'abc
d' that starts with a and ends with d
P.S.
When processing multiple lines, it is common to use grep or read, fed through a pipeline or a redirection.
For a grep and pipeline example:
# find lines that start with either a or e
echo -e "abc\nd\ne" | grep -E "^[ae]"
Output:
abc
e
For a read and redirect example:
while read -r line; do
if [[ $line =~ ^a ]] ; then
echo "find a line '${line}' start with a"
fi
done <<< "$(echo -e "abc\nd\ne")"
Output:
find a line 'abc' start with a
P.S.
The -e option of echo makes it interpret escape sequences such as \n as a newline. The -E option of grep selects extended regular expressions.

How to match fields regex in bash

I made a regex that matches the field I want to assign to a variable in Bash.
The regex is:
(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)
The substring I am interested in is $3 (group 3).
Could anyone please give me a command line that assigns that substring to my variable?
Example:
MYVARIABLE=$(echo $FULLSTRING | grep -oP '(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)')
But this example obviously did not work
Thanks a lot

You may extract the Group 3 value using Bash regex matching:
text="1.23.23.45 This is what I want"
rx='(,? ?(\.?[0-9]{1,3}){4})+ (.*)'
if [[ $text =~ $rx ]]; then
echo "${BASH_REMATCH[3]}"
else
echo "No match!"
fi
With the sample text above, this prints This is what I want.
If there is a regex match (if [[ $text =~ $rx ]]), the contents of Group 3 are in "${BASH_REMATCH[3]}".
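Since the goal was to assign group 3 to a variable, the same test can be wrapped so that its result lands in MYVARIABLE. A sketch, reusing the regex from this answer; the helper name extract_group3 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print capture group 3 of the answer's regex,
# or fail (status 1) when the text does not match.
extract_group3() {
  local rx='(,? ?(\.?[0-9]{1,3}){4})+ (.*)'
  [[ $1 =~ $rx ]] || return 1
  printf '%s\n' "${BASH_REMATCH[3]}"
}

FULLSTRING=', .123.4.5.6 matchthis'
# The assignment's exit status is that of the command substitution.
if MYVARIABLE=$(extract_group3 "$FULLSTRING"); then
  echo "$MYVARIABLE"   # matchthis
else
  echo "No match!"
fi
```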

If you have Perl installed, then you can match against your regex and print the field you want:
MYVARIABLE=$(echo "$FULLSTRING" | perl -nE '/(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)/;say $3')
Example:
FULLSTRING=', .123.4.5.6 matchthis'
MYVARIABLE=$(echo "$FULLSTRING" | perl -nE '/(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)/;say $3')
echo "$MYVARIABLE"
Outputs: matchthis

One-liner to print all lines between two patterns

Using one line of Perl code, what is the shortest way possible to print all the lines between two patterns not including the lines with the patterns?
If this is file.txt:
aaa
START
bbb
ccc
ddd
END
eee
fff
I want to print this:
bbb
ccc
ddd
I can get most of the way there using something like this:
perl -ne 'print if (/^START/../^END/);'
That includes the START and END lines, though.
I can get the job done like this:
perl -ne 'if (/^START/../^END/) { print unless (/^(START)|(END)/); };' file.txt
But that seems redundant.
What I'd really like to do is use lookbehind and lookahead assertions like this:
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
But that doesn't work and I think I've got something just a little bit wrong in my regex.
These are just some of the variations I've tried that produce no output:
perl -ne 'print if (/^(?<=START)/../^.*$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=^END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/s);' file.txt

Read the whole file, match, and print.
perl -0777 -e 'print <> =~ /START.*?\n(.*?)END.*?/gs;' file.txt
You can drop the .*? after START and END if they are alone on their lines.
Dropping the \n as well leaves a blank line between segments.
Or read the file, split it on START|END, and print every odd-indexed element of @F:
perl -0777 -F"START|END" -ane 'print @F[ grep { $_ & 1 } (0..$#F) ]' file.txt
Or collect the lines and print them after the loop; the }{ closes the implicit -n loop and opens a block that runs after it, standing in for END { }:
perl -ne 'push @r, $_ if (/^START/../^END/); }{ print @r[1..$#r-1]' file.txt
As written, this works only for a single such segment in the file.

It seems kind of arbitrary to place a single-line restriction on this, but here's one way to do it:
$ perl -wne 'last if /^END/; print if $p; $p = 1 if /^START/;' file.txt
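The flag-based approach in this one-liner also works as a plain Bash loop, and clearing the flag at END instead of stopping lets it handle several START/END pairs. A sketch, where print_between is a hypothetical name and both markers are regexes matched against each whole line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines on stdin and print those strictly
# between a line matching $1 and the next line matching $2, for every
# such pair. The marker lines themselves are not printed.
print_between() {
  local start_re=$1 end_re=$2 line inside=0
  while IFS= read -r line; do
    if (( inside )); then
      if [[ $line =~ $end_re ]]; then
        inside=0
      else
        printf '%s\n' "$line"
      fi
    elif [[ $line =~ $start_re ]]; then
      inside=1
    fi
  done
}

printf '%s\n' aaa START bbb ccc ddd END eee fff | print_between '^START' '^END'
# bbb
# ccc
# ddd
```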

perl -e 'print split(/.*START.|END.*/s, join("", <>))' file.txt
perl -ne 'print if /START/../END/' file.txt | perl -ne 'print unless $.==1 or eof'
perl -ne 'print if /START/../END/' file.txt | sed -n -e '$d' -e '1!p'

I don't see why you are so insistent on using lookarounds, but here are a couple of ways to do it.
perl -ne 'print if /^(?=START)/../^(?=END)/'
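This finds the terminators without actually consuming them: the lookahead is satisfied by a zero-length match at the start of the line. Note that the range operator works on whole lines, so the START and END lines themselves are still printed.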
Your lookbehind wasn't working because it was trying to find beginning of line ^ with START before it on the same line, which can obviously never match. Factor the ^ into the zero-width assertion and it will work:
perl -ne 'print if /(?<=^START)/../(?<=^END)/'
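As suggested in the comments by @ThisSuitIsBlackNot, you can use the sequence number returned by the range operator to omit the START and END lines. It is 1 on the first line of a range, and the number for the last line ends in E0: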
perl -ne '$s = /^START/../^END/; print if ($s>1 && $s !~ /E0/)'
The lookarounds don't contribute anything useful so I did not develop those examples fully. You can adapt this to one of the lookaround examples above if you care more about using lookarounds than about code maintainability and speed of execution.

Perl equivalent of grep -Eo

In shell scripting, grep -Eo {regex} {file} returns the matched part of the regex. For example:
$ echo \
'For support, visit <http://www.example1.com/support>
You can also visit <http://www.example2.com/products> for information.' \
| grep -Eo 'http://[a-z0-9_.-]+/'
http://www.example1.com/
http://www.example2.com/
How would I do this with Perl?

Here are two ways:
In Perl, the special variable $& contains the matched part of the regular expression.
perl -ne 'print "$&\n" if m#http://[a-z0-9_.-]+/#' < input
If your regular expression contains capture groups, the patterns matched by those groups are assigned to the variables $1, $2, ...
perl -ne 'print "$1\n" if m#(http://[a-z0-9_.-]+/)#' < input
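Both commands print only the first match on each line, whereas grep -o prints every match. A pure-Bash sketch of the grep -Eo behaviour follows; print_all_matches is a hypothetical name, and it assumes the pattern has no anchors and never matches the empty string (under that assumption, the first literal occurrence of the matched text is where the match was found).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every match of the ERE $1 on each stdin line,
# one per output line, similar to grep -Eo. Bash has no /g flag, so after
# each match the line is cut just past the matched text and searched again.
print_all_matches() {
  local re=$1 line rest
  while IFS= read -r line; do
    rest=$line
    while [[ $rest =~ $re ]]; do
      [[ -n ${BASH_REMATCH[0]} ]] || break   # avoid looping on empty matches
      printf '%s\n' "${BASH_REMATCH[0]}"
      # Drop everything up to and including the first occurrence of the
      # matched text (quoted, so it is taken literally, not as a glob).
      rest=${rest#*"${BASH_REMATCH[0]}"}
    done
  done
}

printf '%s\n' \
  'For support, visit <http://www.example1.com/support>' \
  'You can also visit <http://www.example2.com/products> for information.' |
  print_all_matches 'http://[a-z0-9_.-]+/'
```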

To get -o functionality I suggest the following:
echo abcdabcd | perl -lne 'while ($_ =~ s/(bc)//){print $1}'
bc
bc
echo abcdabcd | grep -Eo 'bc'
bc
bc
But for your example I suggest perl -pe 's|(http://[\w.-]+/).*|$1|g'. Because it keeps any text before the URL, it only fits lines that start with the URL:
echo 'http://www.example1.com/support' | perl -pe 's|(http://[\w.-]+/).*|$1|g'
http://www.example1.com/

Problem with perl multiline matching

I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm seeing:
ABCD START
STOP EFGH
I expected the following to work but it doesn't end up replacing anything:
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
After doing some experimenting I found that the \s+ in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+ doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:
perl -pi -e 's/START\s+/START/s' input.txt
This creates the following intermediate file:
ABCD START STOP EFGH
Then I can run the original regex (although the /s is no longer needed):
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
This creates the final, desired file:
ABCD REPLACE EFGH
It seems like the intermediate step should not be necessary. Am I missing something?

You were close. You need either -00 or -0777:
perl -0777 -pi -e 's/START\s+STOP/REPLACE/' input.txt

perl -p processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.
A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):
$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;
Note, I have added the /g modifier to specify global replacement.
As a shortcut for all that extra boilerplate, you can use your existing script with the -0777 option: perl -0777pi -e 's/START\s+STOP/REPLACE/sg'. Adding /g is still needed if you may need to make multiple replacements within the file.
A hiccup that you might run into, although not with this regex: if the regex were START.+STOP, and a file contains multiple START/STOP pairs, greedy matching of .+ will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?.
If you want to use the ^ and $ anchors for line boundaries anywhere in the string, then you also need the /m regex modifier.
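The same slurp-then-substitute idea can be sketched in plain Bash. Bash's ${var//pattern/replacement} only understands glob patterns, so this loops over [[ =~ ]] instead; replace_start_stop is a hypothetical name, and [[:space:]] (which includes newline) stands in for Perl's \s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every START<whitespace>STOP in $1 with
# REPLACE, even when the whitespace spans a newline, and print the result.
replace_start_stop() {
  local text=$1 out='' re='START[[:space:]]+STOP' m
  while [[ $text =~ $re ]]; do
    m=${BASH_REMATCH[0]}
    out+=${text%%"$m"*}REPLACE   # text before the match, then the replacement
    text=${text#*"$m"}           # continue after the match
  done
  printf '%s\n' "$out$text"
}

replace_start_stop $'ABCD START\n  STOP EFGH'   # prints: ABCD REPLACE EFGH
```

For a file, something like replace_start_stop "$(< input.txt)" > output.txt would work, keeping in mind that command substitution drops trailing newlines and the function adds back exactly one.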

A relatively simple one-liner (reading the file in memory):
perl -pi -e 'BEGIN{undef $/;} s/START\s+STOP/REPLACE/sg;' input.txt
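Another alternative (not so simple), which buffers lines only until the next replacement instead of reading the whole file at once. The remainder is flushed on eof rather than in an END block, because by the time END runs, -i has already closed the output file and prints go to standard output:
perl -ni -e '$a .= $_;
if ($a =~ s/START\s+STOP/REPLACE/s or eof) { print $a; $a = ""; }' input.txt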

perl -MFile::Slurp -e '$content = read_file(shift); $content =~ s/START\s+STOP/REPLACE/s; print $content' input.txt

Here's a one-liner that doesn't read the entire file into memory at once:
perl -i -ne 'if (($x = $last . $_) =~ s/START\n\s*STOP/REPLACE/)
{ print $x; $last = ""; } else { print $last; $last = $_; }
print $last if eof ARGV' input.txt

We Keep Coding
