Extract words between begin and end, \G \K - regex

This pattern does the work
(?:\G(?!\A)|begin).*?\K(keyword)(?=.*end)
String:
begin
keyword
keyword
end
I get what I want (keyword keyword) in just one capture group, but if the string has this:
begin
keyword
keyword
end
keyword
end
I get three matches. How can I stop at the first end?
Can this pattern be improved or optimized?
demo regex

I would hate to run across such a regex in code. Any small change and it's broken.
I'd open a filehandle on a reference to the string then read its lines. Skip everything until you run into the starting line, then read everything up to the ending line:
use v5.26;
my $string =<<~'HERE';
begin
keyworda
keywordb
end
keywordc
end
HERE
open my $fh, '<', \$string;
while( <$fh> ) { last if /\Abegin/ }
my @keywords;
while( <$fh> ) {
last if /^end/;
chomp;
push @keywords, $_;
}
say join "\n", @keywords;
This outputs:
keyworda
keywordb
Or, break it up into two regexes. One sets the starting position, then you repeatedly match as long as the line isn't the ending line. This is a bit cleaner, but some people may be confused by the global matching in scalar context:
use v5.26;
my $string =<<~'HERE';
begin
keyworda
keywordb
end
keywordc
end
HERE
my @keywords;
if( $string =~ / ^ begin \R /gmx ) {
while( $string =~ /\G (?!end \R) (\N+) \R /gx ) {
push @keywords, $1;
}
}
say join "\n", @keywords;
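For anyone unsure about global matching in scalar context, here is a minimal sketch of the mechanism the two-regex version relies on: a successful /g match in scalar context leaves pos() just after the match, and \G anchors the next attempt at that position. The sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "begin\nalpha\nbeta\nend\ngamma\n";
my ( @seen, $start );

# A scalar-context /g match records where it stopped in pos($text).
if ( $text =~ /^begin\R/gm ) {
    $start = pos $text;    # offset just past "begin\n"

    # \G pins each attempt to the remembered position, so the loop walks
    # line by line and fails (ending the loop) at the first "end" line.
    while ( $text =~ /\G(?!end\R)(\N+)\R/g ) {
        push @seen, $1;
    }
}

print "started at offset $start\n";
print join( ",", @seen ), "\n";
```

Because the second pattern is anchored with \G, it cannot skip ahead to "gamma" once the "end" line blocks it.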

Use regular expression and store match in an array
my @result = $data =~ /begin\n(.*?)\nend/sg;
then output to console
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $data = do { local $/; <DATA> };
my @result = $data =~ /begin\n(.*?)\nend/sg;
say '-' x 35 . "\n" . $_ for @result;
__DATA__
begin
keyword 1
keyword 2
end
keyword
end
keyword
begin
keyword 3
keyword 4
end
keyword
keyword
Output
-----------------------------------
keyword 1
keyword 2
-----------------------------------
keyword 3
keyword 4

You can use a negative lookahead inside the group to fetch the data from begin to end without running past another begin or end.
my @keyws = ($data=~/begin((?:(?!begin|end).)*)end/sg);
use Data::Dumper;
print Dumper @keyws;
This is the approach I use when processing LaTeX.
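A small sketch of how that lookahead behaves, using made-up sample data with a stray extra begin. The trimming step and the variable names are additions for illustration, not part of the answer above.

```perl
use strict;
use warnings;

my $data = "begin\nkeyword 1\nend\nbegin\nbegin\nkeyword 2\nend\n";

# (?:(?!begin|end).)* consumes one character at a time, but only while
# neither "begin" nor "end" starts at the current position.
my @blocks = $data =~ /begin((?:(?!begin|end).)*)end/sg;

# Trim the newlines that surround each captured block before printing.
s/\A\n|\n\z//g for @blocks;
print "[$_]\n" for @blocks;
```

The unmatched begin is skipped, so only the innermost begin/end pairs are captured. One caveat: the lookahead also stops at "end" inside a longer word such as "friend", so this probably needs word boundaries if the keywords can contain the marker text.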

Related

Save result from flip-flop in variable?

I have about 1kB of text from STDIN
my $f = join("", <STDIN>);
and I would like to get the content between open1 and close1, so /open1/../close1/ comes to mind.
I have only seen it used in one-liners and in scripts inside while loops that operate on $_.
Question
How can I get the result from /open1/../close1/ in my script when everything is in $f?
Capturing all matches with a single regular expression
If you want to capture all the lines between the open1 and close1 markers (excluding the markers), it is easily done with a single regular expression:
my $f = join("", <STDIN>);
my @matches = ( $f =~ m/\bopen1\b(.*?)\bclose1\b/gs );
for my $m (@matches) {
print "$m";
}
where
the s modifier treats the string as a single line, so . also matches newlines;
the g modifier captures all the matches;
(.*?) matches a group of any characters using the lazy quantifier, so each match stops at the nearest close1.
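To see what each piece contributes, here is a rough comparison on a made-up input with two marker pairs and an embedded newline. The sample string and array names are assumptions for illustration.

```perl
use strict;
use warnings;

my $f = "x open1 A\nB close1 y open1 C close1 z";

# Lazy quantifier with /s: one capture per open1/close1 pair.
my @lazy = $f =~ /\bopen1\b(.*?)\bclose1\b/gs;

# Greedy quantifier: a single capture running to the last close1.
my @greedy = $f =~ /\bopen1\b(.*)\bclose1\b/gs;

# Without /s the dot cannot cross the newline inside the first pair.
my @no_s = $f =~ /\bopen1\b(.*?)\bclose1\b/g;

printf "lazy: %d, greedy: %d, without /s: %d\n",
    scalar @lazy, scalar @greedy, scalar @no_s;
```

This prints "lazy: 2, greedy: 1, without /s: 1": only the lazy pattern with /s returns both blocks separately.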
Using the range operator
The range operator (the so-called flip-flop) is not very convenient for this task if you want to avoid capturing the markers, because an expression like /open1/ .. /close1/ also returns true for the lines matching the patterns.
The expression /^open1$/ .. /^close1$/ returns false until /^open1$/ is true. Once the left regular expression matches a line, it stops being evaluated, and the operator keeps returning true until /^close1$/ matches. After the right expression matches, the cycle restarts. Thus, in the code below, the open1 and close1 markers are included in $matches.
It is even less convenient, if the input is stored in a variable, because you will need to read the contents of the variable line by line, e.g.:
my $matches = "";
my @lines = split /\n/, $f;
foreach my $line (@lines) {
if ($line =~ m/^open1$/ .. $line =~ m/^close1$/) {
$matches .= "$line\n";
}
}
Note, it is possible to use arbitrary Perl expressions as operands of the range operator. I wouldn't recommend this code, as it is not very efficient, and not very readable. At the same time it is easy to adapt the first example to the case where the open1 and close1 markers are included into the set of matches, e.g.:
my @matches = ( $f =~ m/\bopen1\b(.*?)\bclose1\b/gs );
for my $m (@matches) {
print "open1${m}close1\n";
}
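If you do want to stay with the range operator but leave the markers out, its return value can help: in scalar context it yields a sequence number that is 1 on the opening line and ends in "E0" on the closing line. A minimal sketch, with made-up input and variable names:

```perl
use strict;
use warnings;

my $f     = "skip\nopen1\nline a\nline b\nclose1\nskip\n";
my $inner = "";

for my $line ( split /\n/, $f ) {
    # False (empty string) outside the range, a sequence number inside it.
    my $seq = ( $line =~ /^open1$/ .. $line =~ /^close1$/ ) or next;

    # 1 marks the opening line, a trailing "E0" marks the closing line.
    next if $seq == 1 or $seq =~ /E0$/;

    $inner .= "$line\n";
}

print $inner;
```

This keeps only "line a" and "line b". It is still line-by-line processing, so the single regular expression remains the shorter option when the text is already in one variable.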
You can rewrite how $f is generated so that it takes advantage of the flip-flop inside a while loop:
my ( $f, $matched );
while ( <> ) {
$f .= $_;
$matched .= $_ if /open1/ .. /close1/;
}
Another way is to create a new input stream out of the contents of $f.
open my $fh, '<', \$f;
while (<$fh>) {
if (/open1/ .. /close1/) {
...
}
}
You can also employ split. To get what is between the first pair of open1 and close1
my $open_to_close = (split /open1|close1/, $f)[1];
The delimiter can be either open1 or close1, so split returns a list of three elements: before open1, between them, and after close1. We take the second element.
If there are more open1/close1 pairs, take all odd-indexed elements.
Either go through an intermediate array
my @parts = split /open1|close1/, $f;
my @all_open_to_close = @parts[ grep { $_ & 1 } 0..$#parts ];
or get it directly from the list
my @all_open_to_close =
grep { CORE::state $i; ++$i % 2 == 0 } split /open1|close1/, $f;
state is a feature available from v5.10. If you already enable it, you don't need the CORE:: prefix.
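A quick sketch of the odd-index idea on a made-up string with two pairs. It assumes the markers strictly alternate (open1, close1, open1, close1); with unbalanced markers the indices no longer line up.

```perl
use strict;
use warnings;

my $f = "a open1 first close1 b open1 second close1 c";

# Pieces alternate: outside, inside, outside, inside, outside.
my @parts = split /open1|close1/, $f;

# Odd indices (1, 3, ...) hold the text between each open1/close1 pair.
my @between = @parts[ grep { $_ % 2 } 0 .. $#parts ];

print "[$_]\n" for @between;
```

This prints "[ first ]" and "[ second ]", surrounding spaces included, since split removes only the delimiters themselves.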

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ # I want to match the char, searching from the end; $ doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force the Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead requires the absence of another $char after any zero or more characters other than line break characters; it fails the match if one is found (use the s modifier if you are running the regex against a multiline string input).
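A minimal sketch of the lookahead in use, reporting where the last occurrence sits. The \Q...\E quoting is an addition here, on the assumption that $char may contain regex metacharacters such as a dot; the sample values are made up.

```perl
use strict;
use warnings;

my $string = "a.b.c";
my $char   = ".";
my $offset;

# \Q...\E quotes metacharacters, so "." means a literal dot.
# The lookahead rejects any occurrence that has another one after it.
if ( $string =~ /\Q$char\E(?!.*\Q$char\E)/s ) {
    $offset = $-[0];    # start offset of the successful match
    print "last '$char' is at offset $offset\n";
}
```

Here the first dot is rejected because another dot follows it, and the match settles on the dot at offset 3.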
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
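Both alternatives could be sketched roughly like this. The sample values and variable names are assumptions, and the reversed-string variant uses index on the assumption that $char is a single character.

```perl
use strict;
use warnings;

my $string = "abcabc";
my $char   = "b";

# Collect the start offset of every occurrence, then take the last one.
my @offsets;
while ( $string =~ /\Q$char\E/g ) {
    push @offsets, $-[0];
}
my $last = $offsets[-1];

# Reverse the string, find the first occurrence, and map the position back.
my $rev_index    = index( scalar reverse($string), $char );
my $from_reverse = length($string) - 1 - $rev_index;

print "last occurrence at $last (reversed search gives $from_reverse)\n";
```

Both approaches agree on offset 4 for this input. The scalar in front of reverse matters: in list context reverse would reverse the argument list, not the characters.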
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = @_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$_[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
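For completeness, a small sketch of the in-place variant. It relies on the elements of @_ being aliases for the caller's arguments; the subroutine name is hypothetical, and the \Q...\E quoting is an addition in case $char holds a metacharacter.

```perl
use strict;
use warnings;

# Hypothetical name. Assigning through $_[0] changes the caller's variable,
# because @_ aliases the arguments instead of copying them.
sub remove_char_from_end_in_place {
    my $char = $_[1];
    return $_[0] =~ s/\Q$char\E\z//;    # true if something was removed
}

my $s = 'abc';
remove_char_from_end_in_place( $s, 'c' );
print "$s\n";
```

Copying the argument first (my $string = shift) would break the alias, which is why the substitution has to operate on $_[0] directly.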
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift @_;
my $char = quotemeta reverse shift @_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

A non-greedy Perl regular expression

I need to write a script which does the following:
$ cat testdata.txt
this is my file containing data
for checking pattern matching with a patt on the back!
only one line contains the p word.
$ ./mygrep5 pat th testdata.txt
this is my file containing data
for checking PATTERN MATCHING WITH a PATT ON THe back!
only one line contains the p word.
I have been able to print the line which is amended with the "a" capitalized as well. I have no idea how to only take what is needed.
I have been messing around (below is my script so far) and all I manage to return is the "PATT ON TH" part.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dump 'pp';
my ($f, $s, $t) = @ARGV;
my @output_lines;
open(my $fh, '<', $t);
while (my $line = <$fh>) {
if ($line =~ /$f/ && $line =~ /$s/) {
$line =~ s/($f.+?$s)/$1/g;
my $sub_phrase = uc $1;
$line =~ s/$1/$sub_phrase/g;
print $line;
}
#else {
# print $line;
#}
}
close($fh);
which returns: "for checking pattern matching with a PATT ON THe back!"
How can I fix this problem?
It sounds like you want to capitalize from pat to th except for instances of a surrounded by spaces. The easiest way is to uppercase the whole thing, and then fix any instances of A surrounded by spaces.
sub capitalize {
my $s = shift;
my $uc = uc($s);
$uc =~ s/ \s \K A (?=\s) /a/xg;
return $uc;
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The downside is that it will replace any existing A surrounded by spaces with a. The following is more complicated, but it doesn't suffer from that problem:
sub capitalize {
my $s = shift;
my @parts = $s =~ m{ \G ( \s+ | \S+ ) }xg;
for (@parts) {
$_ = uc($_) if $_ ne "a";
}
return join('', @parts);
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The rest of the code can be simplified:
#!/usr/bin/perl
use strict;
use warnings;
sub capitalize { ... }
my $f = shift;
my $s = shift;
while (<>) {
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
print;
}
So, if you want to match each sequence that starts with pat and ends with th, non-greedily, and uppercase that sequence, you can simply use an expression on the right side of your substitution:
$line =~ s/($f.+?$s)/uc($1)/eg;
And that's it.
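To make the difference visible, here is a short comparison of the lazy and greedy forms on the line from the question. The \Q...\E quoting is an addition, on the assumption that the command-line patterns are meant literally.

```perl
use strict;
use warnings;

my ( $f, $s ) = ( 'pat', 'th' );
my $line = "for checking pattern matching with a patt on the back!";

# Lazy: each match ends at the nearest "th", so the " a " in between is untouched.
( my $lazy = $line ) =~ s/(\Q$f\E.+?\Q$s\E)/uc($1)/eg;

# Greedy: one match from the first "pat" to the last "th", which uppercases the "a" too.
( my $greedy = $line ) =~ s/(\Q$f\E.+\Q$s\E)/uc($1)/eg;

print "$lazy\n$greedy\n";
```

The lazy version yields "for checking PATTERN MATCHING WITH a PATT ON THe back!", which is the output requested in the question; the greedy one turns the standalone "a" into "A".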

perl - extract only from function definitions and not from function declarations

FILE CONTAINS FUNCTION DEFINITIONS AND FUNCTION DECLARATIONS
eErrorT ChainCtrlUpdateCameraRoute(ChainCtrlT* pChainCtrl, RouteListItemT* pNewRoute, RouteListItemT* pCurrRoute);
eErrorT ChainCtrlSetJpgSnapshotFile(ChainCtrlT* pChainCtrl, RouteListItemT* pRoute, char * dst_chain, char *jpg_file_path)
{
}
MY CODE
use strict;
use warnings;
use vars qw(@temp $index $i);
open(my $FILE, "< a.c") or die $!;
my @arr = <$FILE>;
foreach(@arr){
if($_ =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg) { # extracts function arguments
my $arg = $1;
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
print @temp = map "$_\n", @arr;
}
}
GIVES OUTPUT
pChainCtrl
pNewRoute
pCurrRoute
pChainCtrl
pRoute
dst_chain
jpg_file_path
OUTPUT NEEDED
pChainCtrl
pRoute
dst_chain
jpg_file_path
I need to extract arguments only from the function definition (ChainCtrlSetJpgSnapshotFile) and not from the declaration (ChainCtrlUpdateCameraRoute).
I need to check that the line with (..) doesn't have ";" on the same line, but I am unable to come up with a regex for it.
Looking for the technical (lexical) difference, I see a ";" at the end of the declaration, and no ";" at the end of the definition. In addition, the definition has a { on the next line.
Utilizing these features (if they are consistent, fingers crossed):
#ignore lines with ; (followed by optional spaces) at the end
print @temp = map "$_\n", @arr
unless $_ =~ /;\s*$/;
Another option is to look for the { on the next line, and only print the previous line if it matched and the current line starts with a {. (This is left as an exercise.)
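One possible way to approach the exercise is to work on the whole file as a single string and require a { after the closing parenthesis. This is only a sketch: the source is inlined here instead of being read from a.c, the variable names are made up, and it assumes argument lists contain no nested parentheses.

```perl
use strict;
use warnings;

# The two sample lines from the question, inlined for illustration.
my $source = <<'END';
eErrorT ChainCtrlUpdateCameraRoute(ChainCtrlT* pChainCtrl, RouteListItemT* pNewRoute, RouteListItemT* pCurrRoute);
eErrorT ChainCtrlSetJpgSnapshotFile(ChainCtrlT* pChainCtrl, RouteListItemT* pRoute, char * dst_chain, char *jpg_file_path)
{
}
END

my @names;

# Require "{" (after optional whitespace, including a newline) right after
# the closing ")"; a declaration ending in ";" never satisfies that.
while ( $source =~ /^ \S+ \s+ \S+ \s* \( ([^()]*) \) \s* \{ /xmg ) {
    # The last word of each comma-separated argument is its name.
    push @names, map { /(\w+)\s*$/ ? $1 : () } split /,/, $1;
}

print "$_\n" for @names;
```

For the sample input this prints pChainCtrl, pRoute, dst_chain and jpg_file_path, one per line, matching the output the question asks for.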

perl insert and or replace string variable into a string

The requirement I have is to check a string and, based on a particular set of chars, either insert a prefix string or replace chars with it.
$prefix = "DV1";
Following are my source $input strings:
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
The first chars before the first . can be of any length,
but after the first . the chars ABC remain constant, followed by any one character. These four chars will always be there in my input string.
After these 4 chars, the input string may have two alphanumeric chars (T2 in this case).
What needs to be done is to check whether $input has "T2" (it can be any two alphanumeric chars) and, if it has, replace those 2 chars with D1 (any two chars from $prefix).
If $input does not have "T2", then insert $prefix.
This can be done quite straightforwardly with a single substitution, as this program demonstrates.
The pattern looks for the sequence .ABC followed by any non-dot character. The \K protects that part of the match from being changed. Then there may be two optional non-dot characters, followed by a dot. The replacement string is D1 if the two optional characters were present, or the value of $prefix if not.
use strict;
use warnings;
my $prefix = 'DV1';
while (<DATA>) {
s/\.ABC[^.]\K([^.]{2})?(?=\.)/$1 ? 'D1' : $prefix/e;
print;
}
__DATA__
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
output
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
9C5.ABCSDV1.MIM.EDR2.QR
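To show what \K is saving you from, here is a rough side-by-side with an equivalent substitution that captures the leading part and puts it back by hand. The second form and the variable names are illustrative assumptions, not part of the program above.

```perl
use strict;
use warnings;

my $prefix = 'DV1';
my @inputs = ( 'SS7.ABCWT2.RSND.LTE1.QR', '9C5.ABCS.MIM.EDR2.QR' );
my ( @with_k, @without_k );

for my $input (@inputs) {
    # With \K: everything matched before it is kept as-is automatically.
    ( my $a = $input ) =~ s/\.ABC[^.]\K([^.]{2})?(?=\.)/$1 ? 'D1' : $prefix/e;

    # Without \K: capture the leading part and re-emit it in the replacement.
    ( my $b = $input ) =~ s/(\.ABC[^.])([^.]{2})?(?=\.)/$1 . ( $2 ? 'D1' : $prefix )/e;

    push @with_k,    $a;
    push @without_k, $b;
}

print "$_\n" for @with_k;
```

Both forms give the same strings; \K simply keeps the replacement expression shorter.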
Here's some code you can try.
I am assuming that T2 can be any string of two alphanumeric characters. It can be A4, or 5B, and so on.
#!/perl/bin
use v5.14;
use warnings;
my $str = "9C5.ABCS.MIM.EDR2.QR";
my $str1 = "SS7.ABCWT2.RSND.LTE1.QR";
my $prefix = "DV1";
my $file = 'D:\Programming\Perl\Learning Perl\chapter_1\demo.txt';
open my $fh, '<', $file or die $!;
foreach (<$fh>) {
if (m/(^.*\.ABC\w)\w{2}\./g) {
s/(^.*\.ABC\w)\w{2}\./$1D1\./;
} else {
s/(^.*\.ABC\w)\./$1$prefix\./;
}
say; # Takes current line as default($_). We don't need to specify it.
}
Input File: -
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCEX4.VCE2.QR
Y88.ABCN5W.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
Output: -
SS7.ABCWD1.RSND.LTE1.QR # Replace T2
IT4.ABCED1.VCE2.QR # Replace X4
Y88.ABCND1.MIM.EDR2.QR # Replace 5W
9C5.ABCSDV1.MIM.EDR2.QR # Does not contain T2. Add DV1
Try the following code, and tell me if it fits your needs :
#!/usr/bin/perl -l
use strict;
use warnings;
my $text =<<EOF;
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
EOF
my $prefix = "DV1";
for (split "\n", $text) {
s/^(\w+\.ABC\w)T2/$1D1/ || s/^/$prefix/;
print;
}
OUTPUT
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
DV19C5.ABCS.MIM.EDR2.QR