Matching multiline string in file using perl regex

Matching multiline string in file using perl regex - regex

I am reading in another perl file and trying to find all strings surrounded by quotations within the file, single or multiline. I've matched all the single lines fine but I can't match the mulitlines without printing the entire line out, when I just want the string itself. For example, heres a snippet of what I'm reading in:
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
so the output I'd like is
'Hello World!';
"chmod";
"This is a fun multiple line string, please match";
but I am getting:
'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
This is the code I am using to find the strings - all file content is stored in #contents:
my #strings_found = ();
my $line;
for(#contents) {
$line .= $_;
}
if($line =~ /(['"](.?)*["'])/s) {
push #strings_found,$1;
}
print #strings_found;
I am guessing I am only getting 'Hello World!'; correctly because I am using the $1 but I am not sure how else to find the others without looping line by line, which I would think would make it hard to find the multi line string as it doesn't know what the next line is.
I know my regex is reasonably basic and doesn't account for some caveats but I just wanted to get the basic catch most regex working before moving on to more complex situations.
Any pointers as to where I am going wrong?

Couple big things, you need to search in a while loop with the g modifier on your regex. And you also need to turn off greedy matching for what's inside the quotes by using .*?.
use strict;
use warnings;
my $contents = do {local $/; <DATA>};
my #strings_found = ();
while ($contents =~ /(['"](.*?)["'])/sg) {
push #strings_found, $1;
}
print "$_\n" for #strings_found;
__DATA__
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
Outputs
'Hello World!'
"chmod"
"This is a fun
multiple line string, please match"
You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) finding words surround by quotations perl

regexp matching (in perl and generally) are greedy by default. So your regexp will match from 1st ' or " to last. Print the length of your #strings_found array. I think it will always be just 1 with the code you have.
Change it to be not greedy by following * with a ?
/('"*?["'])/s
I think.
It will work in a basic way. Regexps are kindof the wrong way to do this if you want a robust solution. You would want to write parsing code instead for that. If you have different quotes inside a string then greedy will give you the 1 biggest string. Non greedy will give you the smallest strings not caring if start or end quote are different.
Read about greedy and non greedy.
Also note the /m multiline modifier.
http://perldoc.perl.org/perlre.html#Regular-Expressions

Related

Perl regex can't find match

So I'm working on my own little formatting correction script that uses Perl regex for substitution, but I can't get this one to match. I've used similar matching for other fixes but this one doesn't work and I can't figure out why.
# basically takes in a string to modify and the match and substitution strings
perlRegex(){
PERL_BADLANG=0 perl -le '
$string = '"'"''"${1}"''"'"';
$string =~ s/'"${2}"'/'"${3}"'/gm;
print "$string\n";
exit';
}
LINE_BREAK='\n'
# contents is the example below
EDITED=$(cat file.txt);
EDITED=$(perlRegex "${EDITED}" '(?<='"${LINE_BREAK}"')( +)([^{]+{$'"${LINE_BREAK}"')([^\s][^;]+;$)' '$1$2$1$1$3')
My current attempt is https://regex101.com/r/vgatOd/1 which gives me the output I want.
(?<=\n)( +)([^{]+{$\n)([^\s][^;]+;$)
to
$1$2$1$1$3
(?<=\n)( +) $1: copies the spaces at the beginning of the line
([^{]+{$\n) $2: captures the remaining content of the line with ending {
([^\s][^;]+;$) $3: captures the next line without a leading spaces, with ending ;
The substitution will add the spaces twice on before the second line.
Example input:
if (debug) {
Tools.DebugLine("Log");
}
Aim is to pad the Tools line to be at the correct column:
if (debug) {
Tools.DebugLine("Log");
}
Given the regex101 does what I would like it to do, I'm perplexed as to what part of it does not work in Perl regex.

Taken approach to make java file formatting isn't error prone, you should consider a better way to achieve the desired effect.
The chosen regular expression is quite excessive and mixes \n, , \s all of which falls under \s class.
The following demo code strips regular expression for simplification.
use strict;
use warnings;
my $data = do { local $/; <DATA> };
my $re = qr/([^\n]+?\{\s+)(\S+?;)(\s+\})/;
my $indent = ' ' x 8;
$data =~ s/$re/$1${indent}$2$3/gsm;
print $data;
__DATA__
if (debug) {
Tools.DebugLine("Log");
}
Output
if (debug) {
Tools.DebugLine("Log");
}
Please see Perl regular expressions

Solved by changing to just using \n instead of $ along with them
https://regex101.com/r/poQlwm/1
(?<=\n)([ ]+)([^{]+\{\n)([^\s][^;]+;\n)

Perl grep a multi line output for a pattern

I have the below code where I am trying to grep for a pattern in a variable. The variable has a multiline text in it.
Multiline text in $output looks like this
_skv_version=1
COMPONENTSEQUENCE=C1-
BEGIN_C1
COMPONENT=SecurityJNI
TOOLSEQUENCE=T1-
END_C1
CMD_ID=null
CMD_USES_ASSET_ENV=null_jdk1.7.0_80
CMD_USES_ASSET_ENV=null_ivy,null_jdk1.7.3_80
BEGIN_C1_T1
CMD_ID=msdotnet_VS2013_x64
CMD_ID=ant_1.7.1
CMD_FILE=path/to/abcI.vc12.sln
BEGIN_CMD_OPTIONS_RELEASE
-useideenv
The code I am using to grep for the pattern
use strict;
use warnings;
my $cmd_pattern = "CMD_ID=|CMD_USES_ASSET_ENV=";
my #matching_lines;
my $output = `cmd to get output` ;
print "output is : $output\n";
if ($output =~ /^$cmd_pattern(?:null_)?(\w+([\.]?\w+)*)/s ) {
print "1 is : $1\n";
push (#matching_lines, $1);
}
I am getting the multiline output as expected from $output but the regex pattern match which I am using on $output is not giving me any results.
Desired output
jdk1.7.0_80
ivy
jdk1.7.3_80
msdotnet_VS2013_x64
ant_1.7.1

Regarding your regular expression:
You need a while, not an if (otherwise you'll only be matching once); when you make this change you'll also need the /gc modifiers
You don't really need the /s modifier, as that one makes . match \n, which you're not making use of (see note at the end)
You want to use the /m modifier so that ^ matches the beginning of every new line, and not just the beginning of the string
You want to add \s* to your regular expression right after ^, because in at least one of your lines you have a leading space
You need parenthesis around $cmd_pattern; otherwise, you're getting two options, the first one being ^CMD_ID= and the second one being CMD_USES_ASSET_ENV= followed by the rest of your expression
You can also simplify the (\w+([\.]?\w+)*) bit down to (.+).
The result would be:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
print "1 is : $1\n";
push (#matching_lines, $1);
}
That being said, your regular expression still won't split ivy and jdk1.7.3_80 on its own; I would suggest adding a split and removing _null with something like:
while ($output =~ /^\s*(?:$cmd_pattern)(?:null_)?(.+)/gcm ) {
my $text = $1;
my #text;
if ($text =~ /,/) {
#text = split /,(?:null_)?/, $text;
}
else {
#text = $text;
}
for (#text) {
print "1 is : $_\n";
push (#matching_lines, $_);
}
}
The only problem you're left with is the lone line CMD_ID=null. I'm gonna leave that to you :-)
(I recently wrote a blog post on best practices for regular expressions - http://blog.codacy.com/2016/03/30/best-practices-for-regular-expressions/ - you'll find there a note to always require the /s in Perl; the reason I mention here that you don't need it is that you're not using the ones you actually need, and that might mean you weren't certain of the meaning of /s)

Perl regex to find keywords and not variables

I'm trying to create a regex as following :
print $time . "\n"; --> match only print because time is a variable ($ before)
$epoc = time(); --> match only time
My regex for the moment is /(?-xism:\b(print|time)\b)/g but it match time in $time in the first example.
Check here.
I tried things like [^\$] but then it doesn't match print anymore.
(I will have more keyword like print|time|...|...)
Thanks

Parsing perl code is a common and useful teaching tool since the student must understand both the parsing techniques and the code that they're trying to parse.
However, to do this properly, the best advice is to use PPI
The following script parses itself and outputs all of the barewords. If you wanted to, you could compare the list of barewords to the ones that you're trying to match. Note, this will avoid things within strings, comments, etc.
use strict;
use warnings;
use PPI;
#my $src = do {local $/; <DATA>}; # Could analyze the smaller code in __DATA__ instead
my $src = do {
local #ARGV = $0;
local $/;
<>;
};
# Load a document
my $doc = PPI::Document->new( \$src );
# Find all the barewords within the doc
my $barewords = $doc->find( 'PPI::Token::Word' );
for (#$barewords) {
print $_->content, "\n";
}
__DATA__
use strict;
use warnings;
my $time = time;
print $time . "\n";
Outputs:
use
strict
use
warnings
use
PPI
my
do
local
local
my
PPI::Document
new
my
find
for
print
content
__DATA__

What you need is a negative lookbehind (?<!\$), it's zero-width so it doesn't "consume" characters.
(?<!\$)a means match a if not preceded with a literal $. Note that we escaped $ since it means end of string (or line depending on the m modifier).
Your regex will look like (?-xism:\b(?<!\$)(print|time)\b).
I'm wondering why you are turning off the xism modifiers. They are off by default.So just use /\b(?<!\$)(?:print|time)\b/g as pattern.
Online demo
SO regex reference

finding words surround by quotations perl

I am reading another perl file line by line and need to find any words or set of words surround by single or double quotations. This is an example of the code I am reading in:
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!';
print "$string\n";
Basically, I need to find and print out 'Hello World!' and "$string\n".
I've read my file in fine and stored its contents in an array. From there I'm looping over each line and find the desired set of words in the quotations using regex as such:
for(#contents) {
if(/\"|\'[^\"|\']*\"|\'/) {
print $_."\n";
}
}
which gives me the following output:
my $string = 'Hello World!';
print "$string\n";
I tried splitting the contents by whitespace and then trying to find a match, but that gives me this:
'Hello
World!'
"$string\n";
I've tried numerous solutions other suggested on here but to no avail. I have also tried Text::ParseText and using parse_line, but that gives me the complete wrong output.
Any ideas that could help me?

Just need to add some capturing parenthesis to your regex, instead of printing the whole line
use strict;
use warnings;
while (<DATA>) {
if(/(["'][^"']*["'])/) {
print "$1\n";
}
}
__DATA__
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!';
print "$string\n";
Note, there are plenty of flaws in your regex though. For example '\'' Won't match properly. Neither will "He said 'boo'". To get closer you'll have to do some balanced parenthesis checking, but there isn't going to be any perfect solution.
For a solution that is a little closer, you could use the following:
if(/('(?:(?>[^'\\]+)|\\.)*'|"(?:(?>[^"\\]+)|\\.)*")/) {
That would take care of my above exceptions and also strings like print "how about ' this \" and ' more \n";, but there are still edge cases like the use of qq{} or q{}. Not to mention strings that span more than one line.
In other words, if your goal is perfect, this project may be outside of the scope of most people's skills, but hopefully the above will be of some help.

Maybe you can have more than one "string" to capture per line, one solution could be:
while(my $line=<STDIN>) {
while( $line =~ /[\'\"](.*?)[\'\"]/g ) {
print "matched: '$1'\n";
}
}
ie, input:
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!' . 'asdsad';
print "$string\n";
and executing the code will give you:
matched: 'Hello World!'
matched: 'asdsad'
matched: '$string\n'

Print line of pattern match in Perl regex

I am looking for a keyword in a multiline input using a regex like this,
if($input =~ /line/mi)
{
# further processing
}
The data in the input variable could be like this,
this is
multi line text
to be matched
using perl
The code works and matches the keyword line correctly. However, I would also like to obtain the line where the pattern was matched - "multi line text" - and store it into a variable for further processing. How do I go about this?
Thanks for the help.

You can grep out the lines into an array, which will then also serve as your conditional:
my #match = grep /line/mi, split /\n/, $input;
if (#match) {
# ... processing
}

TLP's answer is better but you can do:
if ($input =~ /([^\n]+line[^\n]+)/i) {
$line = $1;
}

I'd look if the match is in the multiline-String and in case it is, split it into lines and then look for the correct index number (starting with 0!):
#!/usr/bin/perl
use strict;
use warnings;
my $data=<<END;
this is line
multi line text
to be matched
using perl
END
if ($data =~ /line/mi){
my #lines = split(/\r?\n/,$data);
for (0..$#lines){
if ($lines[$_] =~ /line/){
print "LineNr of Match: " . $_ . "\n";
}
}
}

Did you try his?
This works for me. $1 represents the capture of regex inside ( and )
Provided there is only one match in one of the lines.If there are matches in multiple lines, then only the first one will be captured.
if($var=~/(.*line.*)/)
{
print $1
}
If you want to capture all the lines which has the string line then use below:
my #a;
push #a,$var=~m/(.*line.*)/g;
print "#a";

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Matching multiline string in file using perl regex - regex

Related

Perl regex can't find match

Perl grep a multi line output for a pattern

Perl regex to find keywords and not variables

finding words surround by quotations perl

Print line of pattern match in Perl regex

Categories

Resources