perl - extracting function arguments from multi line function definitions - regex

eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,
char* name,
int instance)
{
....
}
eErrorT ChainCtrlInit(ChainCtrlT* pChainCtrl, void* pOwner)
{
....
}
MY CODE
open(my $FILE, "< a.c") or die $!;
my @arr = <$FILE>;
foreach(@arr){
if ($_ =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg) {
my $arg = $1;
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
print my @temp = map "$_\n", @arr
unless $_ =~ /;\s*$/;
}
}
Basically I am extracting function arguments from the function definitions.
But in this case I am only able to extract arguments for the 2nd function (ChainCtrlInit) and not for the 1st function (ChainCtrlInitChains).
DESIRED OUTPUT
pChainCtrl
name
instance
pChainCtrl
pOwner
OUTPUT I AM GETTING
pChainCtrl
pOwner

The problem is that foreach aliases $_ to one array element at a time, and each element is a single line of the file.
For example, on the first iteration of foreach (@arr), $_ is "eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,\n". That line has no closing parenthesis, so your regex fails.
You also use temporary variables that are not really needed. I have improved the code as follows:
use feature 'say'; # needed for say below
my $arr = do { local $/; <$FILE> }; # slurp the whole file, as suggested in one of the comments above
# note the slight difference between this 'while' regex and the one in your code
while ($arr =~ /^ \S+ \s+ \S+ \s* (\( .+? \)) /xsmg) {
my @args = $1 =~ /(\w+)[,)]/g; # This assumes what you want always ends in
# a ',' or a ')', hence the small change to the 'while' regex.
local $" = "\n";
say "@args";
}

The problem is that you are reading the file line by line, so the regex can never match across multiple lines. If you load the whole file into a single string with embedded newlines, it will work.
For example, a quick hack gives:
open(my $FILE, "< a.c") or die $!;
my @arr = <$FILE>;
my $file = join('',@arr);
my @matches = $file =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg;
foreach (@matches) {
my $arg = $_;
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
print my @temp = map "$_\n", @arr
unless $_ =~ /;\s*$/;
print "\n";
}

@Ad-vic, in addition to @atleypnorth's solution of slurping the whole file into a string, there could be a problem in your split-map statement.
This:
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
should be:
my @arr = map /(\w+)\W*$/, split /\W*?,\W*/, $arg;
because the last element returned by split can carry residual non-word characters.
Once you get the hang of Perl, you could streamline it into this:
$/ = undef; # slurp mode; "" would select paragraph mode instead
open(my $FILE, "< a.c") or die $!;
my $str = <$FILE>;
foreach ( $str =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg ) {
print map "$_\n", /(\w+) [^\w,]* (?:,|$)/xg ;
print "\n";
}
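Putting the answers above together, here is a minimal, self-contained sketch of the slurp-then-match approach. The sub name extract_arg_names and the inline sample source are made up for illustration, and it assumes every header looks like "ReturnType Name(Type a, Type b)" starting at the beginning of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the parameter names of every function
# header found in $source (one string holding the whole file).
sub extract_arg_names {
    my ($source) = @_;
    my @names;
    # /s lets . cross newlines, /m anchors ^ at each line start,
    # /x permits the spaces inside the pattern, /g finds every header.
    while ( $source =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg ) {
        my $arg_list = $1;
        # Keep the last identifier of each comma-separated parameter.
        push @names, map { /(\w+)\W*$/ ? $1 : () } split /,/, $arg_list;
    }
    return @names;
}

# Sample input mirroring the question's a.c (assumed content).
my $source = <<'END_C';
eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,
char* name,
int instance)
{
}
eErrorT ChainCtrlInit(ChainCtrlT* pChainCtrl, void* pOwner)
{
}
END_C

print "$_\n" for extract_arg_names($source);
```

Because the whole file is one string, the first header matches even though its parameter list spans three lines.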

Related

A non-greedy Perl regular expression

I need to write a script which does the following:
$ cat testdata.txt
this is my file containing data
for checking pattern matching with a patt on the back!
only one line contains the p word.
$ ./mygrep5 pat th testdata.txt
this is my file containing data
for checking PATTERN MATCHING WITH a PATT ON THe back!
only one line contains the p word.
I have been able to print the line which is amended with the "a" capitalized as well. I have no idea how to only take what is needed.
I have been messing around (below is my script so far) and all I manage to return is the "PATT ON TH" part.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dump 'pp';
my ($f, $s, $t) = @ARGV;
my @output_lines;
open(my $fh, '<', $t);
while (my $line = <$fh>) {
if ($line =~ /$f/ && $line =~ /$s/) {
$line =~ s/($f.+?$s)/$1/g;
my $sub_phrase = uc $1;
$line =~ s/$1/$sub_phrase/g;
print $line;
}
#else {
# print $line;
#}
}
close($fh);
which returns: "for checking pattern matching with a PATT ON THe back!"
How can I fix this problem?
It sounds like you want to capitalize from pat to th except for instances of a surrounded by spaces. The easiest way is to uppercase the whole thing, and then fix any instances of A surrounded by spaces.
sub capitalize {
my $s = shift;
my $uc = uc($s);
$uc =~ s/ \s \K A (?=\s) /a/xg;
return $uc;
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The downside is that it will also replace any existing A surrounded by spaces with a. The following is more complicated, but it doesn't suffer from that problem:
sub capitalize {
my $s = shift;
my @parts = $s =~ m{ \G ( \s+ | \S+ ) }xg;
for (@parts) {
$_ = uc($_) if $_ ne "a";
}
return join('', @parts);
}
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
The rest of the code can be simplified:
#!/usr/bin/perl
use strict;
use warnings;
sub capitalize { ... }
my $f = shift;
my $s = shift;
while (<>) {
s{ ( \Q$f\E .* \Q$s\E ) }{ capitalize($1) }xseg;
print;
}
So, if you want to match each sequence that starts with pat and ends with th, non-greedily, and uppercase that sequence, you can simply use an expression on the right side of your substitution:
$line =~ s/($f.+?$s)/uc($1)/eg;
And that's it.
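As a rough, self-contained illustration of that last substitution, the sketch below wraps it in a sub and runs it on the sample line from the question. The sub name upcase_between is hypothetical, and the \Q...\E quoting is an addition that assumes the two words should be treated as literal text rather than as regex fragments.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase each shortest span that starts with
# $first and ends with $last.
sub upcase_between {
    my ($line, $first, $last) = @_;
    # .+? is non-greedy, so each match stops at the nearest $last;
    # /e evaluates uc($1) as code, /g repeats for every span.
    $line =~ s/(\Q$first\E.+?\Q$last\E)/uc($1)/eg;
    return $line;
}

print upcase_between(
    "for checking pattern matching with a patt on the back!",
    "pat", "th"
), "\n";
# prints: for checking PATTERN MATCHING WITH a PATT ON THe back!
```

With the greedy .* instead, the single match would run from the first "pat" to the last "th" and the lone "a" would be uppercased too.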

Perl parsing JavaScript file regex, to catch quotes only at the beginning and end of the returned string

I'm just starting to learn Perl. I need to parse JavaScript file. I came up with the following subroutine, to do it:
sub __settings {
my ($_s) = @_;
my $f = $config_directory . "/authentic-theme/settings.js";
if ( -r $f ) {
for (
split(
'\n',
$s = do {
local $/ = undef;
open my $fh, "<", $f;
<$fh>;
}
)
)
{
if ( index( $_, '//' ) == -1
&& ( my @m = $_ =~ /(?:$_s\s*=\s*(.*))/g ) )
{
my $m = join( '\n', @m );
$m =~ s/[\'\;]//g;
return $m;
}
}
}
}
I have the following regex, that removes ' and ; from the string:
s/[\'\;]//g;
It works, but if those characters (' and ;) also appear inside the string, they are removed as well. This is undesirable, and this is where I'm stuck: I'm not sure how to change the regex above so that it only does the following:
Remove only the first ' in the string
Remove only the last ' in the string
Remove only the last ; in the string, if it exists
Any help, please?
You can use the following to match:
^'|';?$|;$
And replace with '' (empty string)
Remove only first ' in string
Remove only last ' in string
^[^']*\K'|'(?=[^']*$)
Try this. See demo:
https://regex101.com/r/oF9hR9/8
Remove only the last ; in the string, if it exists
;(?=[^;]*$)
Try this. See demo:
https://regex101.com/r/oF9hR9/9
All three in one
^[^']*\K'|'(?=[^']*$)|;(?=[^;]*$)
You can use this code:
#!/usr/bin/perl
$str = "'string; 'inside' another;";
$str =~ s/^'|'?;?$//g;
print $str;
The main idea is to use anchors: ^ matches the beginning of the string, $ matches the end, and ;? matches the ";" at the end only if it is present (the ? quantifier makes the preceding pattern optional). EDIT: Also, ; will get removed even if there is no preceding '.
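One way to make the three requirements explicit is to apply three small anchored substitutions in turn. This is only a sketch: the sub name strip_outer is hypothetical, and it assumes the quotes to remove sit at the very start and end of the value (ignoring surrounding whitespace), which is one reading of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing semicolon, one leading quote
# and one trailing quote, leaving the same characters inside untouched.
sub strip_outer {
    my ($value) = @_;
    $value =~ s/;\s*$//;    # last ; at the end, if present
    $value =~ s/^\s*'//;    # ' at the start, if present
    $value =~ s/'\s*$//;    # ' at the end, if present
    return $value;
}

print strip_outer(q{'it's; here';}), "\n";    # prints: it's; here
```

Without /g, each substitution can remove at most one character sequence, so quotes and semicolons in the middle of the value survive.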
I suggest that your original code should look more like this. It is much more idiomatic Perl and, I think, more straightforward to follow:
sub __settings {
my ($_s) = @_;
my $file = "$config_directory/authentic-theme/settings.js";
return unless -r $file;
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my @file = <$fh>;
chomp @file;
for ( @file ) {
next if m{//};
if ( my @matches = $_ =~ /(?:$_s\s*=\s*(.*))/g ) {
my $matches = join "\n", @matches;
$matches =~ tr/';//d;
return $matches;
}
}
}

perl - extract only from function definitions and not from function declarations

FILE CONTAINS FUNCTION DEFINITIONS AND FUNCTION DECLARATIONS
eErrorT ChainCtrlUpdateCameraRoute(ChainCtrlT* pChainCtrl, RouteListItemT* pNewRoute, RouteListItemT* pCurrRoute);
eErrorT ChainCtrlSetJpgSnapshotFile(ChainCtrlT* pChainCtrl, RouteListItemT* pRoute, char * dst_chain, char *jpg_file_path)
{
}
MY CODE
use strict;
use warnings;
use vars qw(@temp $index $i);
open(my $FILE, "< a.c") or die $!;
my @arr = <$FILE>;
foreach(@arr){
if($_ =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg) { # extracts function arguments
my $arg = $1;
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
print @temp = map "$_\n", @arr;
}
}
GIVES OUTPUT
pChainCtrl
pNewRoute
pCurrRoute
pChainCtrl
pRoute
dst_chain
jpg_file_path
OUTPUT NEEDED
pChainCtrl
pRoute
dst_chain
jpg_file_path
I need to extract arguments only from the function definition (ChainCtrlSetJpgSnapshotFile) and not from the declaration (ChainCtrlUpdateCameraRoute).
I need to check that the line with (..) doesn't have a ";" on the same line, but I am unable to come up with a regex for it.
Looking for the technical (lexical) difference: I see a ";" at the end of the declaration and no ";" at the end of the definition. In addition, the definition has a { on the next line.
Using these features (if they are consistent, fingers crossed):
# ignore lines with ; (followed by optional spaces) at the end
print @temp = map "$_\n", @arr
unless $_ =~ /;\s*$/;
Another option is to look for the { on the next line and print the previous line's arguments only if it matched and the current line starts with a {. (This is left as an exercise.)
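The second option can be sketched on a slurped string by requiring that the closing parenthesis be followed by an opening brace. The sub name definition_arg_names and the sample input are hypothetical; the sketch assumes parameter lists contain no nested parentheses (so no function-pointer parameters) and that only whitespace separates ")" from "{".

```perl
use strict;
use warnings;

# Hypothetical helper: collect parameter names from definitions only.
# A declaration ends in ");" and therefore fails the trailing \{ test.
sub definition_arg_names {
    my ($source) = @_;
    my @names;
    while ( $source =~ /^ \S+ \s+ \S+ \s* \( ([^()]*) \) \s* \{ /xmg ) {
        my $arg_list = $1;
        # Keep the last identifier of each comma-separated parameter.
        push @names, map { /(\w+)\W*$/ ? $1 : () } split /,/, $arg_list;
    }
    return @names;
}

# Sample input mirroring the question (assumed content).
my $source = <<'END_C';
eErrorT ChainCtrlUpdateCameraRoute(ChainCtrlT* pChainCtrl, RouteListItemT* pNewRoute, RouteListItemT* pCurrRoute);
eErrorT ChainCtrlSetJpgSnapshotFile(ChainCtrlT* pChainCtrl, RouteListItemT* pRoute, char * dst_chain, char *jpg_file_path)
{
}
END_C

print "$_\n" for definition_arg_names($source);
```

Using [^()]* instead of .+? keeps a failed match on a declaration from running on into the next header.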

perl - extracting arguments from function definitions and putting it as comment above it

............
########### NEED TO PUT ARGUMENTS HERE AS COMMENT #########
eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,
char* name,
int instance,
void* pOwner,
)
{
....
}
.........
I want to extract the arguments and put them above the function definition as a comment. There are many similar function definitions.
open(my $FILE1, "< a.c") or die $!;
@arr = <$FILE1>;
foreach(@arr){
if($_ =~ /^ \S+ \s+ \S+ \s* \( (.+?) \) /xsmg) {
my $arg = $1;
my @arr = map /(\w+)$/, split /\W*?,\W*/, $arg;
print @temp = map ' * @param[in/out] '."$_\n", @arr
unless $_ =~ /;\s*$/;
}
}
It works when I use $str, but then I can't splice the arguments in as comments above the function definitions.
$str = <$FILE1>;
Here you go:
use File::Copy;
open my $FILE,'<','a.c' or die "open failed: $!\n";
$file_slurp = do { local $/;<$FILE>};
$file_slurp =~ s{ ^ ( \w+ \s+ \w+ \s* \( (.+?) \) )}{&print_args($2,$1)}xmesg;
close($FILE) or die "Couldn't close file: $!\n";
copy "a.c","a.c.bak" or die "Copy failed: $!\n";
open my $NEW_FILE,'>','a.c' or die "Truncating a.c failed: $!\n";
print $NEW_FILE $file_slurp and unlink "a.c.bak";
sub print_args {
($args,$proto) = @_;
@comments = map { ' * @param[in/out] '."$_" } split /\s*(?:^|,)\s*/,$args;
return join "\n",(@comments,$proto)
}
Test the code first by removing the unlink so that a backup copy of your source is kept on disk. When you're confident it does what you want, you can put the unlink back so that it looks as if your original file was modified in place.
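To try the substitution without touching any file, the same idea can be run on an in-memory string. This is a sketch under assumptions: the sub names annotate_definitions and param_comment are hypothetical, the comment is wrapped in /** ... */ so the result stays valid C, only parameter names (not types) are listed, and a header counts as a definition only when "{" follows the closing parenthesis.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comment block from a raw parameter list.
sub param_comment {
    my ($arg_list) = @_;
    my @names = map { /(\w+)\W*$/ ? $1 : () } split /,/, $arg_list;
    return join '', "/**\n", ( map { " * \@param[in/out] $_\n" } @names ), " */\n";
}

# Hypothetical helper: put that block in front of every definition header.
sub annotate_definitions {
    my ($source) = @_;
    $source =~ s{ ^ ( \w+ \s+ \w+ \s* \( ( [^()]* ) \) \s* \{ ) }
                { my ( $header, $args ) = ( $1, $2 ); param_comment($args) . $header }xmge;
    return $source;
}

print annotate_definitions(<<'END_C');
eErrorT ChainCtrlInit(ChainCtrlT* pChainCtrl, void* pOwner)
{
}
END_C
```

Copying $1 and $2 into lexicals before calling param_comment keeps the captures safe from the regex matches that run inside the helper.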

Match different variant of a word using regex Perl

I am splitting sentences at individual space characters and then matching the resulting terms against the keys of hashes. I only get a match if the term is identical to a key, and I am struggling to find a regex that matches the different variants of the same word. For example, if I have the term 'antagon', it matches 'antagon' exactly but fails to match antagonists, antagonistic, pre-antagonistic, hydro-antagonist and so on. I also need a regex that matches words like MCF-7 against MCF7 or MC-F7, ignoring the special characters.
This is the code that I have so far; the commented part is where I am struggling.
(Note: Terms in the hash are stemmed to root form of a word).
use warnings;
use strict;
use Drug;
use Stop;
open IN, "sample.txt" or die "cannot find sample";
open OUT, ">sample1.txt" or die "cannot find sample";
while (<IN>) {
chomp $_;
my $flag = 0;
my $line = lc $_;
my @full = ();
if ( $line =~ /<Sentence.*>(.*)<\/Sentence>/i ) {
my $string = $1;
chomp $string;
$string =~ s/,/ , /g;
$string =~ s/\./ \. /g;
$string =~ s/;/ ; /g;
$string =~ s/\(/ ( /g;
$string =~ s/\)/ )/g;
$string =~ s/\:/ : /g;
$string =~ s/\::/ :: )/g;
my @array = split / /, $string;
foreach my $word (@array) {
chomp $word;
if ( $word =~ /\,|\;|\.|\(|\)/g ) {
push( @full, $word );
}
if ( $Stop_words{$word} ) {
push( @full, $word );
}
if ( $Values{$word} ) {
my $term = "<Drug>$word<\/Drug>";
push( @full, $term );
}
else {
push( @full, $word );
}
# if($word=~/.*\Q$Values{$word}\E/i)#Changed this
# {
# $term="<Drug>$word</$Drug>";
# print $term,"\n";
# push(@full,$term);
# }
}
}
my $mod_str = join( " ", @full );
print OUT $mod_str, "\n";
}
I need a regex to match occurrences of words like MCF-7 with MCF7 or MC-F7
The most straightforward approach is just to strip out the hyphens, i.e.
my $ignore_these = "[-_']";
$word =~ s{$ignore_these}{}g;
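As a small illustration of that idea, the sketch below normalizes a term before comparing it. The sub names normalize_term and contains_stem are hypothetical, and it assumes the stemmed root (for example 'antagon') is looked up as a plain substring of the normalized word.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a term and drop hyphens, underscores
# and apostrophes so that spelling variants collapse to one form.
sub normalize_term {
    my ($word) = @_;
    ( my $plain = lc $word ) =~ s/[-_']//g;
    return $plain;
}

# Hypothetical helper: true if the normalized word contains the root.
sub contains_stem {
    my ( $word, $stem ) = @_;
    return index( normalize_term($word), normalize_term($stem) ) >= 0;
}

print normalize_term($_), "\n" for qw(MCF-7 MCF7 MC-F7);    # mcf7 three times
print contains_stem( 'pre-antagonistic', 'antagon' ) ? "match\n" : "no match\n";
```

Normalizing both the hash keys and the incoming words the same way lets an exact hash lookup handle the MCF-7 style variants.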
I am not sure what is stored in your %Values hash, so it's hard to tell what you expect to happen with
if($word=~/.*\Q$Values{$word}\E/i)
However, the kind of thing I imagine you want is this (I have simplified your code somewhat):
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use 5.10.0;
use Data::Dumper;
while (<>) {
chomp $_;
my $flag = 0;
my $line = lc $_;
my @full = ();
if ( $line =~ /<Sentence.*>(.*)<\/Sentence>/i ) {
my $string = $1;
chomp $string;
$string =~ s/([,\.;\(\)\:])/ $1 /g; # squished these together
$string =~ s/\:\:/ :: )/g; # typo in original
my @array = split /\s+/, $string; # split on one /or more/ spaces
foreach my $word (@array) {
chomp $word;
my $term=$word;
my $word_chars = "[\\w\\-_']";
my $word_part = "antagon";
if ($word =~ m{$word_chars*?$word_part$word_chars+}) {
$term="<Drug>$word</Drug>";
}
push(@full,$term); # push
}
}
my $mod_str = join( " ", @full );
say "<Sentence>$mod_str</Sentence>";
}
This gives me the following output, which is my best guess at what you expect:
$ cat tmp.txt
<Sentence>This in antagonizing the antagonist's antagonism pre-antagonistically.</Sentence>
$ cat tmp.txt | perl x.pl
<Sentence>this in <Drug>antagonizing</Drug> the <Drug>antagonist's</Drug> <Drug>antagonism</Drug> <Drug>pre-antagonistically</Drug> .</Sentence>
$
perl -ne '$things{$1}++while s/([^ ;.,!?]*?antagon[^ ;.,!?]++)//;END{print "$_\n" for sort keys %things}' FILENAME
If the file contains the following:
he was an antagonist
antagonize is a verb
why are you antagonizing her?
this is an alpha-antagonist
This will return:
alpha-antagonist
antagonist
antagonize
antagonizing
Below is a regular (not one-liner) version:
#!/usr/bin/perl
use warnings;
use strict;
open my $in, "<", "sample.txt" or die "could not open sample.txt for reading!";
open my $out, ">", "sample1.txt" or die "could not open sample1.txt for writing!";
my %things;
while (<$in>){
$things{$1}++ while s/([^ ;.,!?]*?antagon[^ ;.,!?]++)//
}
print $out "$_\n" for sort keys %things;
You may want to take another look at the assumptions behind your approach. It sounds to me as though you are looking for words that are within a certain distance of the words in a list. Take a look at the Levenshtein distance to see if it is what you want. Be aware, however, that computing it is not free: the standard dynamic-programming algorithm takes time proportional to the product of the two word lengths for each comparison, which adds up when every word is checked against a large list.
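For reference, here is a minimal sketch of the usual dynamic-programming computation of Levenshtein distance, keeping only two rows of the table. The sub name levenshtein is my own; List::Util ships with Perl. How small a distance should count as "the same word" is left open, since the question does not say.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Edit distance between $s and $t: the minimum number of single-character
# insertions, deletions and substitutions that turn one into the other.
sub levenshtein {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length $t );    # distances from the empty prefix of $s
    for my $i ( 1 .. length $s ) {
        my @curr = ($i);              # distance to the empty prefix of $t
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,            # deletion
                $curr[ $j - 1 ] + 1,      # insertion
                $prev[ $j - 1 ] + $cost,  # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein( 'kitten',  'sitting' ),    "\n";    # 3
print levenshtein( 'antagon', 'antagonist' ), "\n";    # 3
```

Note that a long suffix such as "-istically" produces a large distance from the root, so for stemmed terms a substring test may fit better than a distance threshold.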