\Q seems to be ignored - regex

Can someone tell me why the search/replace in my script below fails when I use \Q$btype, although it works when I hard-code center instead?
The script is supposed to insert $$ after \end{center}.
#!/usr/bin/perl
my $line = '\end{tabular}
\end{center}
end:text
';
my $btype = "center";
$line =~ s/\\end\{\Q$btype\}/\\end\{\Q$btype\}\$\$/g;
print "$line\n";

You need to stop the escaping with \E before the closing brace:
$line =~ s/\\end\{\Q$btype\E\}/\\end\{$btype\}\$\$/g;
It could be reduced to:
$line =~ s/\\end\{\Q$btype\E\}\K/\$\$/g; # 5.10+
or
$line =~ s/(\\end\{\Q$btype\E\})/$1\$\$/g;
or
$line =~ s/\\end\{\Q$btype\E\}/$&\$\$/g;
From ThisSuitIsBlackNot's comment:
Don't use $& with Perl before v5.20, because there it slows down every regex match in the program.
or
$line =~ s/(?<=\\end\{\Q$btype\E\})/\$\$/g;
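A minimal sketch of why \Q...\E matters here. The environment name tabular* is a hypothetical example (not from the question), chosen because it contains a regex metacharacter:

```perl
use strict;
use warnings;

# Hypothetical environment name containing a regex metacharacter.
my $btype = 'tabular*';
my $line  = "\\end{tabular*}\n\\end{center}\n";

# Without \Q the * acts as a quantifier on "r", so the literal name is not matched.
my $unquoted = $line =~ /\\end\{$btype\}/ ? 1 : 0;

# With \Q...\E the * is escaped, and \E stops the quoting before the closing \}.
my $quoted = $line =~ /\\end\{\Q$btype\E\}/ ? 1 : 0;

# \K keeps the matched text and inserts $$ right after it (Perl 5.10+).
(my $result = $line) =~ s/\\end\{\Q$btype\E\}\K/\$\$/g;

print "unquoted: $unquoted, quoted: $quoted\n";
print $result;
```

Leaving out \E, as in the question, means the closing \} is quoted as well, so the pattern no longer describes the text in $line.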

Related

How to capture every match in a global regex substitution?

I realize it is possible to achieve this with a slight workaround, but I am hoping there is a simpler way (since I often make use of this type of expression).
Given the example string:
my $str = "An example: sentence!*";
A regex can be used to match each punctuation mark and capture them in an array.
Thereafter, I can simply repeat the regex and replace the matches as in the following code:
push(@matches, $1) while ($str =~ /([\*\!:;])/g);
$str =~ s/([\*\!:;])//g;
Would it be possible to combine this into a single step in Perl where substitution occurs globally while also keeping tabs on the replaced matches?
You can embed code to run in your regular expression:
my @matches;
my $str = 'An example: sentence!*';
$str =~ s/([\*\!:;])(?{push @matches, $1})//g;
But with a match this simple, I'd just do the captures and substitution separately.
Yes, it's possible.
my @matches;
$str =~ s/[*!:;]/ push @matches, $&; "" /eg;
However, I'm not convinced that the above is faster or clearer than the following:
my @matches = $str =~ /[*!:;]/g;
$str =~ tr/*!:;//d;
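As a rough check, the one-pass and two-pass approaches above can be run side by side. This is only a sketch; the variable names (@one_pass, @two_pass, $stripped_a, $stripped_b) are made up for illustration, and it uses $1 with a capture group rather than $&:

```perl
use strict;
use warnings;

my $str = 'An example: sentence!*';

# One pass: with /e the replacement is evaluated as code for every match.
my @one_pass;
(my $stripped_a = $str) =~ s/([*!:;])/ push @one_pass, $1; "" /eg;

# Two passes: m//g in list context collects, tr///d deletes.
my @two_pass = $str =~ /[*!:;]/g;
(my $stripped_b = $str) =~ tr/*!:;//d;

print "@one_pass | $stripped_a\n";
print "@two_pass | $stripped_b\n";
```

Both lines should print the same captured characters and the same stripped string, which makes it easy to swap one approach for the other.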
Use:
use feature 'say';
use Data::Dumper;
my $str = "An example: sentence!*";
my @matches = $str =~ /([\*\!:;])/g;
say Dumper \@matches;
$str =~ tr/*!:;//d;
Output:
$VAR1 = [
':',
'!',
'*'
];
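Is that what you're looking for?
my ($str, @matches) = ("An example: sentence!*");
# first method (records only the last match, because $1 is read once after all substitutions):
($str =~ s/([\*\!:;])//g) && push(@matches, $1);
# second method (no /g, so each pass removes one match and records it):
push(@matches, $1) while ($str =~ s/([\*\!:;])//);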
Try:
my $str = "An example: sentence!*";
push(@mys, ($str=~m/([^\w\s])/g));
print join "\n", @mys;
Thanks.

How to grep capture a multiline pattern of a file in Perl

I have a file that looks something like this:
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"
I want to grep the file for the line starting with /attribute3= and then save the string found inside the quotation marks to a separate variable.
Here's what I have so far:
#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
while (my $line = <$fh>) {
if ($line =~ /\/attribute3=/g){
print $line . "\n";
}
}
That's printing out /attribute3="all*the*things*I'm*interested*in*are*inside*here** but
I want all*the*things*I'm*interested*in*are*inside*here**and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sentence.*I*think*we*have*enough*words.
So what I did next is:
#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
my $part_I_want;
while (my $line = <$fh>) {
if ($line =~ /\/attribute3=/g){
$line =~ /^/\attribute3=\"(.*?)/; # capture everything after the quotation mark
$part_I_want .= $1; # the capture group; save the stuff on line 1
# keep adding to the string until we reach the closing quotation marks
next (unless $line =~ /\"/){
$part_I_want .= $_;
}
}
}
The code above doesn't work. How do I grep capture a multiline pattern between two characters (in this case it's quotation marks)?
my $str = do { local($/); <DATA> };
$str =~ /attribute3="([^"]*)"/;
$str = $1;
$str =~ s/\n/ /g;
__DATA__
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"
Read the entire file into a single variable and use /attribute3=\"([^\"]*)\"/ms
From the command line:
perl -n0e '/\/attribute3="(.*)"/s && print $1' foo.txt
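This is basically what you had, but the -0 flag sets the input record separator ($/) to the null character, so a text file without null bytes is read as a single record, much like undef $/ within the code (-0777 is the conventional way to slurp a whole file). From the man page: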
-0[octal/hexadecimal]
specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator.
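The slurp-then-match idea from the answers above can be sketched as a self-contained script. The sample text is a shortened, made-up stand-in for the file, and an in-memory filehandle replaces data.txt so the sketch runs without any file on disk:

```perl
use strict;
use warnings;

# Shortened stand-in for the contents of the data file (assumption).
my $data = <<'END';
Random words go here
/attribute1
/attribute3="first*line**
second*line"
END

# Open the string as a file, then slurp it by localizing $/ (undef = no separator).
open my $fh, '<', \$data or die $!;
my $content = do { local $/; <$fh> };
close $fh;

my $part_i_want;
if ($content =~ m{/attribute3="([^"]*)"}) {
    # [^"]* happily crosses newlines; strip them to join the wrapped lines.
    ($part_i_want = $1) =~ s/\n//g;
}

print "$part_i_want\n" if defined $part_i_want;
```

Here the newlines are removed rather than replaced with spaces, since the desired output in the question joins the wrapped lines directly.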

Put regex match only into array, not entire line

I am trying to check each line of a document for a regex match.
If the line has a match, I want to push the match only into an array.
In the code below, I thought that using the g modifier at the end of the regex would make $line's value the regex match only. Instead, $line's value is the entire line of the document containing the match...
my $line;
my @table;
while($line = <$input>){
if($line =~ m/foo/g){
push (@table, $line);
}
}
print @table;
If anyone could help me get my matches into an array, it would be much appreciated.
Thanks.
p.s.
Still learning... so any explanations of concepts I may have missed is also much appreciated.
The g modifier in s///g is for global search and replace; it does not change what $line holds.
If you just want to push the matching text into an array, you need to capture it by enclosing the pattern in (). Captured elements are stored in the variables $1, $2, etc.
Try following modification to your code:
my @table;
while(my $line = <$input>){
if($line =~ m/(foo)/){
push (@table, $1);
}
}
print @table;
Refer to this documentation for more details.
Or if you want to avoid needless use of global variables,
my @table;
while(my $line = <$input>){
if(my @captures = $line =~ m/(foo)/){
push @table, @captures;
}
}
which simplifies to
my @table;
while(my $line = <$input>){
push @table, $line =~ m/(foo)/;
}
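A small sketch of the difference between capturing once per line and capturing every occurrence. It reuses the sample lines from this thread; the array names @first_only and @every_match are made up for illustration:

```perl
use strict;
use warnings;

my @lines = ("herp de derp foo\n", "yerp fool foo flerp\n", "heyhey\n");

my (@first_only, @every_match);
for my $line (@lines) {
    # List context without /g: the captures of the first match, or an empty list.
    push @first_only,  $line =~ m/(foo)/;
    # List context with /g: one element per occurrence in the line.
    push @every_match, $line =~ m/(foo)/g;
}

print scalar(@first_only), " vs ", scalar(@every_match), "\n";
```

The second line contains foo twice (once inside fool), so the /g version collects one more element than the version without it.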
Expanding on jkshah's answer a little, I'm explicitly storing the matches in @matches instead of using the magic variable $1, which I find a little harder to read.
"__DATA__" is a simple way to store lines in a filehandle in a perl source file.
use strict;
use warnings;
my @table;
while(my $line = <DATA>){
my @matches = $line =~ m/(foo)/;
if(@matches) {
warn "found: " . join(',', @matches );
push(@table,@matches);
}
}
print @table;
__DATA__
herp de derp foo
yerp fool foo flerp
heyhey
If your file is not very big (100-500 MB is fine with 2 GB of RAM), you can use the code below. Here I am extracting numbers wherever they match in a line. It should be faster than a line-by-line loop.
#!/usr/bin/perl
open my $file_h,"<abc" or die "ERROR-$!";
my @file = <$file_h>;
my $file_cont = join(' ',@file);
@file =();
my @match = $file_cont =~ /\d+/g;
print "@match";

System command execution using Perl

I have a Perl script which runs a perforce command and stores the result in a variable $command.
Then it is stored in a file log.txt, and by using a regex the relevant data is taken out.
When I run that command alone the following things pop out:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
The code goes as follows:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
my $command = qx |p4 monitor show -ale|;
open FH, '>>', "log.txt";
print FH $command;
close FH;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while (my $line = <$log_fh>) {
chomp $line;
next if not $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
my $killpro;
foreach my $kill (keys %stat) {
print "$kill\n";
}
The code above gets the number 8869, but how can I do it without log.txt? Is an array a better way to do it, or is a hash fine?
Please correct me as I am still learning.
Seems like your main stumbling block is getting line-by-line input for your loop?
Splitting on newlines should do the trick:
my $killid;
my @lines = split("\n", $command); #split on newlines
for my $line (@lines) {
next if not $line =~ /(\d+)\s+/;
my $id = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
$killid = $id;
}
}
One caveat: you mentioned an output of 8870, but I'm getting 8869. The regexps you gave are looking for a line with "integration" and "IDLE none", and for your example input that appears to match 8869.
A hash is fine, though if you're using only one key in it (which seems to be the case), you might as well just use a single variable.
If you assign the result of a qx construct to an array instead of a scalar, then it will be split into lines automatically for you. This code demonstrates.
use strict;
use warnings;
my @lines = qx|p4 monitor show -ale|;
my %stat;
for my $line (@lines) {
chomp $line;
next unless $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
$stat{$killid} = $line;
}
}
print "$_\n" for keys %stat;
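To try the filtering logic without a Perforce server, the loop can be wrapped in a small function and fed the sample output from the question. This is only a sketch: idle_integration_ids is a hypothetical helper name, and @sample stands in for the list returned by qx:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading IDs of idle "R integration" lines.
sub idle_integration_ids {
    my @lines = @_;              # copy, so chomp does not touch the caller's data
    my @ids;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /(\d+)\s+/;
        my $id = $1;
        push @ids, $id
            if $line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/;
    }
    return @ids;
}

# Stand-in for: my @lines = qx|p4 monitor show -ale|;
my @sample = (
    "4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none\n",
    "8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none\n",
    "8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52\n",
);

my @ids = idle_integration_ids(@sample);
print "$_\n" for @ids;
```

Returning a plain list keeps the original line order, which a hash would not; a hash remains a reasonable choice when the full line is needed per ID.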

Different ways to test for $1 after regex?

Normally when I check if the regex succeeded I do
if ($var =~ /aaa(\d+)bbb(\d+)/) {
# $1 and $2 should be defined now
}
but I recall seeing a variation of this that seemed shorter. Perhaps it was only with one buffer.
Can anyone think of other ways to test for $1 after a successful regex?
You can avoid $1 and similar altogether:
if (my ($anum, $bnum) = $var =~ /aaa(\d+)bbb(\d+)/) {
# Work with $anum and $bnum
}
The only shorter way that I can think of is if the match is on $_. So for instance:
for (@strings) {
if (m/aaa(\d+)bbb(\d+)/) {
...
If the match succeeds then $1 and $2 will be populated.
Never forget about
use strict;
use warnings;
I like plain syntax in Perl, but not in this way:
my $str = 'abc101abc';
$str =~ m/(\d+)/ and do {print $1;}
OR
$str =~ m/(\d+)/ and print $1;
OR
(with $str in $_, that is, $_ = $str;)
m/(\d+)/ and print $1;
BUT! TIMTOWTDI helps you to dream about your own style :)
I prefer old-if style.
Reading both answers, I now recall that this was what I had seen:
my $str = 'abc101abc';
$str =~ m/(\d+)/;
print $1 if $1;
print $1 if $str =~ m/(\d+)/;
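One caveat with testing $1 on its own is that a failed match leaves $1 set to whatever the last successful match captured. A minimal sketch (the strings and variable names are made up for illustration):

```perl
use strict;
use warnings;

my ($first, $second) = ('abc101abc', 'no digits here');

$first  =~ m/(\d+)/;             # succeeds, $1 is '101'
$second =~ m/(\d+)/;             # fails, $1 is left untouched
my $stale = $1;                  # still '101', although $second has no digits

# Testing the match itself avoids the stale value.
my $checked = $second =~ m/(\d+)/ ? $1 : undef;

# List assignment in boolean context: true only when the match succeeded.
my $number;
if (my ($digits) = $first =~ m/(\d+)/) {
    $number = $digits;
}

print "stale: $stale\n";
print "checked: ", defined $checked ? $checked : 'no match', "\n";
print "number: $number\n";
```

Testing the truth of $1 also misses a legitimate capture of "0", so checking the result of the match, as in the last line of the post above, is the safer of the two forms.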