I am looking for a keyword in a multiline input using a regex like this,
if($input =~ /line/mi)
{
# further processing
}
The data in the input variable could be like this,
this is
multi line text
to be matched
using perl
The code works and matches the keyword line correctly. However, I would also like to obtain the line where the pattern was matched - "multi line text" - and store it into a variable for further processing. How do I go about this?
Thanks for the help.
You can grep out the lines into an array, which will then also serve as your conditional:
my @match = grep /line/mi, split /\n/, $input;
if (@match) {
# ... processing
}
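As a minimal, self-contained sketch of the grep/split approach above (the helper name matching_lines and the sample string are only illustrative, not part of the original answer):

```perl
use strict;
use warnings;

# Hypothetical helper: return every line of $text that matches $re.
sub matching_lines {
    my ($text, $re) = @_;
    return grep { $_ =~ $re } split /\n/, $text;
}

my $input = "this is\nmulti line text\nto be matched\nusing perl\n";
my @match = matching_lines($input, qr/line/i);

# An empty list is false, so the array doubles as the condition.
if (@match) {
    print "$_\n" for @match;    # prints: multi line text
}
```

Because grep returns a list, this also copes with several matching lines; take $match[0] if only the first one matters.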
TLP's answer is better but you can do:
if ($input =~ /([^\n]*line[^\n]*)/i) {
$line = $1;
}
I would first check whether the multiline string matches at all and, if it does, split it into lines and look for the index of each matching line (starting with 0!):
#!/usr/bin/perl
use strict;
use warnings;
my $data=<<END;
this is line
multi line text
to be matched
using perl
END
if ($data =~ /line/mi){
my @lines = split(/\r?\n/,$data);
for (0..$#lines){
if ($lines[$_] =~ /line/i){
print "LineNr of Match: " . $_ . "\n";
}
}
}
Did you try this?
This works for me. $1 holds whatever the part of the regex inside ( and ) captured.
This assumes the match occurs on only one line. If several lines match, only the first one is captured.
if($var=~/(.*line.*)/)
{
print $1
}
If you want to capture all the lines that contain the string line, use the following:
my @a;
push @a,$var=~m/(.*line.*)/g;
print "@a";
I want to use grep with regex to match part of a line, then proceed to print that line and the next 2 lines. But I don’t want to print any match where the second line after the match includes another regex pattern.
Example text:
If the line was there is a loom in the gloom
would you want that line printed?
Just trying to understand if you're just
other than as part of gloom
if you really do want to exclude lines
even when loom appears on it's own elsewhere on the line
Searching for the pattern, gloom; using grep -Pn -A2 '^.*\b(gloom)\b.*$' * will print
If the line was there is a loom in the gloom
would you want that line printed?
Just trying to understand if you're just
..and
other than as part of gloom
if you really do want to exclude lines
even when loom appears on it's own elsewhere on the line
But I don’t want to print the second group that includes the word, elsewhere, on its third line.
Using Perl-regex.
Here is an example in Perl:
use v5.20.0; # signatures requires perl >= 5.20
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
{
my $lines = read_file('file.txt');
for my $i (0..$#$lines) {
my $line = $lines->[$i];
if ($line =~/\b(gloom)\b/) {
if (!match_second_pattern($lines, $i)) {
print_block($lines, $i);
}
}
}
}
sub print_block($lines, $i) {
my $N = $#$lines;
for my $j (0..2) {
last if $i+$j > $N;
print $lines->[$i+$j];
}
}
sub match_second_pattern($lines, $i) {
my $N = $#$lines;
return 0 if ($i + 2) > $N;
return $lines->[$i+2] =~ /elsewhere/;
}
sub read_file( $fn ) {
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my @lines = <$fh>;
close $fh;
return \@lines;
}
This type of problem is usually solved using negative lookahead. Unfortunately, I don't believe you can get command line grep to lookahead across line boundaries so this would take a Perl program to accomplish:
#!/usr/bin/perl
use strict;
my $s = "If the line was there is a loom in the gloom
would you want that line printed?
Just trying to understand if you're just
other than as part of gloom
if you really do want to exclude lines
even when loom appears on it's own elsewhere on the line";
while ($s =~ /^.*?\bgloom\b(?!.*\n.*\n.*?\belsewhere\b).*\n.*\n.*\n?/mg) {
print "$&";
}
If you want the input to come from stdin or from a file named on the command line, then:
#!/usr/bin/perl -w
use strict;
my $s = '';
# read from stdin or the file specified on the command line:
while (<>) {
$s .= $_ ;
}
while ($s =~ /^.*?\bgloom\b(?!.*\n.*\n.*?\belsewhere\b).*\n.*\n.*\n?/mg) {
print "$&";
}
You can use a GNU grep like
grep -oPzn '(?m)^.*?\bgloom\b(?!(?:.*\R){2}.*?\belsewhere\b)(?:.*\R){2}.*\R?' file > outputfile
Details:
-o - outputs matched texts, not just lines where the match occurred
-z - treats the input as NUL-separated, so the whole file is read as one record and line breaks can be matched with the regex
(?m)^ - start of a line
.*? - any zero or more chars other than line break chars, as few as possible
\bgloom\b - whole word gloom
(?!(?:.*\R){2}.*?\belsewhere\b) - on the second line below the gloom word, there should not be elsewhere as a whole word
(?:.*\R){2} - the rest of the current line, the next line and a line break
.*\R? - the whole (second) line with an optional line break (sequence).
See the online demo:
#!/bin/bash
s="If the line was there is a loom in the gloom
would you want that line printed?
Just trying to understand if you're just
other than as part of gloom
if you really do want to exclude lines
even when loom appears on it's own elsewhere on the line"
grep -oPzn '(?m)^.*?\bgloom\b(?!(?:.*\R){2}.*?\belsewhere\b)(?:.*\R){2}.*\R?' <<< "$s"
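For comparison, here is a rough Perl sketch of the same pattern, written with /x so each part carries its own comment. The helper name gloom_blocks and the test string are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return each three-line block that starts at a line
# containing the word "gloom", unless the second line below it contains
# the word "elsewhere".
sub gloom_blocks {
    my ($text) = @_;
    my @blocks;
    while ($text =~ /
        ^ .*? \bgloom\b                         # a line containing the whole word gloom
        (?! (?: .* \R ){2} .*? \belsewhere\b )  # second line below must not contain elsewhere
        (?: .* \R ){2}                          # rest of this line plus the next line
        .* \R?                                  # second line below, optional line break
    /xmg) {
        push @blocks, $&;
    }
    return @blocks;
}

my $s = "a loom in the gloom\nsecond\nthird\n"
      . "part of gloom\nsecond\nloom elsewhere on the line\n";
print for gloom_blocks($s);    # prints only the first three lines
```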
The following perl program has a regex written to serve my purpose. But, this captures results present within a string too. How can I only get strings separated by spaces/newlines/tabs?
The test data I used is present below:
http://sainikhil.me/stackoverflow/dictionaryWords.txt
use strict;
use warnings;
sub print_a_b {
my $file = shift;
my $pattern = qr/(a|b|A|B)\S*(a|b|A|B)/;
open my $fp, '<', $file or die "Could not open '$file': $!";
my $cnt = 0;
while(my $line = <$fp>) {
if($line =~ $pattern) {
print $line;
$cnt = $cnt+1;
}
}
print $cnt;
}
print_a_b @ARGV;
You could consider using an anchor like \b: word boundary
That would help apply the regexp only after and before a word.
\b(a|b|A|B)\S*(a|b|A|B)\b
Simpler, as Avinash Raj adds in the comments:
(?i)\b[ab]\S*[ab]\b
(using the case insensitive flag or modifier)
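A small sketch of how the word-boundary version behaves, assuming a hypothetical helper ab_words that returns the matching words rather than whole lines:

```perl
use strict;
use warnings;

# Hypothetical helper: return the words in $line that both start and end
# with "a" or "b" (case-insensitive). \b keeps the match from starting or
# ending in the middle of a word.
sub ab_words {
    my ($line) = @_;
    return $line =~ /\b([ab]\S*[ab])\b/gi;
}

print join(",", ab_words("xabx blob alpha drab")), "\n";    # prints: blob,alpha
```

Here xabx and drab are skipped because their a and b do not sit at both word boundaries.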
If you have multiple words in the same line then you can use word boundaries in a regex like this:
(?i)\b[ab][a-z]*[ab]\b
The pattern code is:
$pattern = qr/\b[ab][a-z]*[ab]\b/i;
However, if you only want lines that consist of a single such word, you can use:
(?i)^[ab][a-z]*[ab]$
Update: regarding your comment about "lines that begin and end with the same character", you can use this regex:
(?i)\b([a-z])[a-z]*\1\b
But if you want to allow any character, not just letters as above, you can use:
(?i)\b(.)[a-z]*\1\b
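To illustrate the backreference idea, here is a short sketch using the letters-only variant; the helper name same_ends and the sample words are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the words in $line whose first and last
# letters are the same. \1 refers back to whatever ([a-z]) captured.
sub same_ends {
    my ($line) = @_;
    my @words;
    while ($line =~ /\b([a-z])[a-z]*\1\b/gi) {
        push @words, $&;    # $& is the whole matched word
    }
    return @words;
}

print join(",", same_ends("level noon perl data abba")), "\n";
# prints: level,noon,abba
```

Note that the pattern needs at least two letters, so a single-letter word is not reported.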
I have the below code where I am trying to grep for a pattern in a variable. The variable has a multiline text in it.
Multiline text in $output looks like this
_skv_version=1
COMPONENTSEQUENCE=C1-
BEGIN_C1
COMPONENT=SecurityJNI
TOOLSEQUENCE=T1-
END_C1
CMD_ID=null
CMD_USES_ASSET_ENV=null_jdk1.7.0_80
CMD_USES_ASSET_ENV=null_ivy,null_jdk1.7.3_80
BEGIN_C1_T1
CMD_ID=msdotnet_VS2013_x64
CMD_ID=ant_1.7.1
CMD_FILE=path/to/abcI.vc12.sln
BEGIN_CMD_OPTIONS_RELEASE
-useideenv
The code I am using to grep for the pattern
use strict;
use warnings;
my $cmd_pattern = "CMD_ID=|CMD_USES_ASSET_ENV=";
my @matching_lines;
my $output = `cmd to get output` ;
print "output is : $output\n";
if ($output =~ /^$cmd_pattern(?:null_)?(\w+([\.]?\w+)*)/s ) {
print "1 is : $1\n";
push (@matching_lines, $1);
}
I am getting the multiline output as expected from $output but the regex pattern match which I am using on $output is not giving me any results.
Desired output
jdk1.7.0_80
ivy
jdk1.7.3_80
msdotnet_VS2013_x64
ant_1.7.1
Regarding your regular expression:
You need a while, not an if (otherwise you'll only be matching once); when you make this change you'll also need the /gc modifiers
You don't really need the /s modifier, as that one makes . match \n, which you're not making use of (see note at the end)
You want to use the /m modifier so that ^ matches the beginning of every new line, and not just the beginning of the string
You want to add \s* to your regular expression right after ^, because in at least one of your lines you have a leading space
You need parentheses around $cmd_pattern; otherwise, you're getting two options, the first one being ^CMD_ID= and the second one being CMD_USES_ASSET_ENV= followed by the rest of your expression
You can also simplify the (\w+([\.]?\w+)*) bit down to (.+).
The result would be:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
print "1 is : $1\n";
push (@matching_lines, $1);
}
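The point about parentheses is easy to miss, so here is a small sketch of the difference. The two helper names and the test line are hypothetical:

```perl
use strict;
use warnings;

my $cmd_pattern = "CMD_ID=|CMD_USES_ASSET_ENV=";

# Without grouping, ^ belongs only to the first alternative, so the
# second alternative may match anywhere in the line.
sub matches_ungrouped {
    my ($line) = @_;
    return $line =~ /^$cmd_pattern\w+/ ? 1 : 0;
}

# With (?:...), ^ applies to both alternatives.
sub matches_grouped {
    my ($line) = @_;
    return $line =~ /^(?:$cmd_pattern)\w+/ ? 1 : 0;
}

my $line = "xx CMD_USES_ASSET_ENV=null_ivy";
printf "ungrouped: %d, grouped: %d\n",
    matches_ungrouped($line), matches_grouped($line);
# prints: ungrouped: 1, grouped: 0
```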
That being said, your regular expression still won't split ivy and jdk1.7.3_80 on its own; I would suggest adding a split and removing the null_ prefix with something like:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
my $text = $1;
my @text;
if ($text =~ /,/) {
@text = split /,(?:null_)?/, $text;
}
else {
@text = $text;
}
for (@text) {
print "1 is : $_\n";
push (@matching_lines, $_);
}
}
The only problem you're left with is the lone line CMD_ID=null. I'm gonna leave that to you :-)
(I recently wrote a blog post on best practices for regular expressions - http://blog.codacy.com/2016/03/30/best-practices-for-regular-expressions/ - you'll find there a note to always require the /s in Perl; the reason I mention here that you don't need it is that you're not using the ones you actually need, and that might mean you weren't certain of the meaning of /s)
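Putting the pieces together, one possible sketch looks like this. The helper name cmd_values is hypothetical, and skipping a bare null value is an assumption about what the asker wants for the lone CMD_ID=null line:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the values of CMD_ID= and CMD_USES_ASSET_ENV=
# lines, splitting comma-separated values and dropping the "null_" prefix.
sub cmd_values {
    my ($output) = @_;
    my @values;
    while ($output =~ /^\s*(?:CMD_ID|CMD_USES_ASSET_ENV)=(.+)/mg) {
        my $list = $1;    # copy $1 before running other regexes
        for my $item (split /,/, $list) {
            $item =~ s/^null_//;        # strip the prefix when present
            next if $item eq 'null';    # assumption: a bare "null" carries no value
            push @values, $item;
        }
    }
    return @values;
}

my $output = <<'END';
CMD_ID=null
CMD_USES_ASSET_ENV=null_jdk1.7.0_80
CMD_USES_ASSET_ENV=null_ivy,null_jdk1.7.3_80
BEGIN_C1_T1
CMD_ID=msdotnet_VS2013_x64
CMD_ID=ant_1.7.1
CMD_FILE=path/to/abcI.vc12.sln
END

print "$_\n" for cmd_values($output);
```

With the sample above this prints the five values listed under "Desired output" in the question.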
For example, I'm trying to find lines that start with ! but do not contain a colon:
!dkdkdkdkdk: dkdkdkdkdkdkdkadfkldllsls <------ do not find this line of code
!dksdjfslfjlk afldajfdklafjdla;fd <-------- find this line of code
I tried the following regex, but it is not working:
^!(?!:)*$
If you have a multiline input, you need to read the data in line by line and check if each line starts with ! and does not contain : with the following regex:
/^![^:]*$/
Or if you want to split conditions:
if (/^!/ and not /:/)
The character class [^:] will match any character other than a :.
Here is a working sample:
#!/usr/bin/perl
use strict;
use warnings;
my @matches;
while (<DATA>) {
if (/^![^:]*$/) {
push @matches, $_;
print "$_\n";
}
}
__DATA__
!dkdkdkdkdk: dkdkdkdkdkdkdkadfkldllsls
!dksdjfslfjlk afldajfdklafjdla;fd
The @matches array will contain all the lines that passed the test.
Is there anything wrong with
if ( /^!/ and not /:/ ) { ... }
I'm using this regex in Perl to match and replace the following expressions:
_HI2_
_HI_2
HI2_
_HI_2
if ($subject =~ m/_?HI2?_?|HI2?_?/) {
# Successful match
} else {
# Match attempt failed
}
I also want to do this though:
The text is: ABCDEMAFGHIJ
There is a sequence HI in there, but it must be ignored because the line starts with The text is:.
The text is: ABCDEHI2FGHI
As above, but with two HI sequences here.
How can I extend this regex so that a match is ignored when the line has that prefix?
Why not just match twice?
If $subject does not match /^The text is:/, run the replace ..
Try this regex:
/^(?!The text is:).*(?:_?HI2?_?|HI2?_?)/
Or use two matches like:
if($subject !~ /^The text is:/i && $subject =~ /_?HI2?_?|HI2?_?/)
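A quick sketch to check how the negative-lookahead version behaves on the sample inputs; the helper name has_hi is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $subject contains an HI sequence and
# does not start with the prefix "The text is:".
sub has_hi {
    my ($subject) = @_;
    # (?!...) is checked at the start of the string and consumes nothing.
    return $subject =~ /^(?!The text is:).*(?:_?HI2?_?|HI2?_?)/ ? 1 : 0;
}

for my $subject ("_HI2_", "HI2_", "The text is: ABCDEHI2FGHI") {
    printf "%-28s => %d\n", $subject, has_hi($subject);
}
```

The first two subjects report 1 and the prefixed one reports 0.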
I just discovered this brilliant resource here and the section on Perl.
You can find there details of a (*SKIP)(*F) construct which will blow your mind; your described problem as a one-liner:
cat > test.txt <<EOF
_HI2_
_HI_2xxxHI_2
The text is: ABCDEMAFGHIJ
HI2_
The text is: ABCDEHI2FGHI
_HI_2
EOF
perl -ne '/^The text is:.*$(*SKIP)(*F)|.+/ && s/_?HI_?2?_?/HAPPY/; print' test.txt
# or
perl -ne 's/(^The text is:.*$)(*SKIP)(*F)|_?HI_?2?_?/HAPPY/g; print' test.txt
I have newfound love and respect for Perl; Sed is my go-to, but now that I know how to skip lines (read: leave unchanged) in Perl, I will hesitate less to use it
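The same (*SKIP)(*F) trick can be used inside a script. This is only a sketch: the helper name replace_unless_prefixed is made up, and the /r modifier it relies on needs Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every HI sequence with HAPPY, except on
# lines that start with "The text is:".
sub replace_unless_prefixed {
    my ($line) = @_;
    # First alternative: swallow a prefixed line, then force a failure.
    # (*SKIP) stops the engine from retrying inside the swallowed text.
    # Second alternative: the pattern that actually gets replaced.
    return $line =~ s/^The text is:.*(*SKIP)(*F)|_?HI_?2?_?/HAPPY/gr;
}

print replace_unless_prefixed($_), "\n"
    for "_HI_2xxxHI_2", "The text is: ABCDEHI2FGHI";
# prints:
# HAPPYxxxHAPPY
# The text is: ABCDEHI2FGHI
```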
Try anchoring the pattern to the start of the line with "^", and skip leading whitespace if you think that is needed (I always tend to do it). You could also mark the end of the string with "$".
if ($subject =~ m/^\s*_?HI2?_?|HI2?_?/) {
# Successful match
} else {
# Match attempt failed
}
Not the most elegant method but easy to understand (TIMTOWTDI :)
#!/usr/bin/perl
use strict;
use warnings;
my @text = ("ABCDEHI2FGHI", "The text is: ABCDEHI2FGHI");
for (@text) {
my $new = my_replace($_); # do the replacement
print "$new\n"; # print result
}
sub my_replace {
my ($text) = @_;
return $text if ($text =~ m/The text is:/); # return if prefixed / no replacement
$text =~ s/(_?HI2?_?|HI2?_?)/__replacement__/g; # do replace (give a replacement string here)
return $text; # return result of replacement
}
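Otherwise you could use a "negative lookbehind". Note, however, that Perl does not support unbounded variable-length lookbehind, so the pattern below will not compile in Perl; it needs an engine that allows it, such as .NET.
To try it, see regex101 or debuggex.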
/(?<!^The text is.*)(_?HI2?_?|HI2?_?)/