Perl: identification of unknown code

Perl: identification of unknown code - regex

I have to include a perl script into a web system for my work, however the original author is no longer here and no one else in the office knows perl.
I've only first dealt with perl this morning and i'm stuck trying to figure out a couple lines that maybe someone might be able to help with
1] $customer = $q->param('account') || '';
2] $customer =~ s/[^\d]//g;
3] $customer ||= '';
4] if( $customer and ( $customer =~ /^10\d{5}$/ or $customer eq '1' ) ) {
5] $no_error = 1;
6] }
I found out that line 2 removes all non-numeric values, but I am not so sure how that statement actually functions..?
Again, i'm unsure to what =~ /^10\d{5}$/ means.
Line 3 is the main one i cannot figure out, i'm used to || meaning logical OR.

1) Accept the parameter if it's a value that Perl considers "true", otherwise, an empty string.
2) A regex substitution, eliminating any non-digits found in $customer.
3) If $customer is a "false" Perl value, set it to an empty string.
4) If $customer is a 'true' value, and either a 7-digit number starting with 10, or the string, '1'...
5) Set $no_error to the numeric value 1.
6) Close a block.
The || and ||= operators are explained in perldoc perlop. In some cases, they're a lurking bug because "0" may be a legitimate value for the parameter, yet would trigger the 'or' clause, which is one reason why the // and //= operators were introduced in Perl 5.10. Of course if the current code isn't broken in its use of ||, don't introduce a new bug by "fixing" it. ;)
Regular expressions are explained in perlre, perlrequick, perlretut, and perlop.
What constitutes true and false values is described in perlintro, perlsyn and perldata.

$customer ||= '';
is same as
$customer = $customer || '';
or
if (!$customer) { $customer = ""; }

Related

Converting string into mathematical function in Tcl

Let's say I have a string, maybe "Enable && Signal" for simplicity's sake.
I'd like to convert this string to standard && operations in tcl, such that Enable && Signal would return 0 if any of the value is false and 1 only when both are true.
Is there an easy way to do this, As for my case i would need a generic method where the number of arguments can be any and perform logical/relational operations like && || == <= > != etc
Any help and insights would be very much appreciated.
Thanks
I initially tried to split the arguments into conditions list and data list but could not handled the precedence of operations. Like == need to be done first and later && operations for n^n combinations

I'm assuming that in your example, Enable and Signal are Tcl variables. So, all that would be needed to be able to pass the string to expr is to prepend a '$' to all identifiers. That can be done with regsub as follows:
set str "Enable && Signal"
regsub -all {\m[A-Za-z]\w*\M} $str {$&} expr
set result [expr $expr]
Due to the \m\M, This will properly leave numbers like 1e3 alone. But this method falls short if you also want to be able to use functions, like sin(x). If that is also a requirement, a negative lookahead may help:
set str "sin(x) * cos(y)"
regsub -all {\m[A-Za-z]\w*\M(?!\()} $str {$&} expr
puts $expr
This produces: sin($x) * cos($y)

Why is division parsed as regular expression?

This is part of my code:
my $suma = U::round $item->{ suma }; # line 36
$ts += $suma;
$tnds += U::round $suma /6;
}
return( $ts, $tnds );
}
sub create { #line 46
my( $c ) = shift;
my $info = $c->req->json;
my $header = #$info[0];
my $details = #$info[1];
my $agre = D::T Agreement => $header->{ agreement_id };
my( $total_suma, $total_nds ) = total( $details );
my $saldo = 0;
my $iid = #$details[0]->{ period };
my $interval = D::T Period => $iid //7; # line 58
# This is first Invoice if operator do not provide activation date
my $is_first = !$details->[0]{valid_from} && $iid && $interval;
When this module is loaded I gen an error:
Can't load application from file "lib/MaitreD/Controller/ManualDocument.pm line 38, near "my $interval = D::T Period => $iid /"
Unknown regexp modifier "/6" at lib/MaitreD/Controller/ManualDocument.pm line 38, at end of line
Global symbol "$pkg" requires explicit package name (did you forget to declare "my $pkg"?) at lib/MaitreD/Controller/ManualDocument.pm line 41.
...
Is this indirect object call guilty?
Because when I put parentheses at U::round( $suma /6 ) there is no errors

Here are some thoughts on this, and a plausible explanation. A simple reproduction
perl -wE'sub tt { say "#_" }; $v = 7; tt $v /3'
gives me
Search pattern not terminated at -e line 1.
So it tries to parse a regex in that subroutine call, as stated, and the question is: why?
With parenthesis around argument(s) it works as expected. With more arguments following it it fails the same way, but with arguments preceding it it works
perl -wE'sub tt { say "#_" }; $v = 7; tt $v /3, 3' # fails the same way
perl -wE'sub tt { say "#_" }; $v = 7; tt 3, $v /3' # works
Equipping the tt sub with a prototype doesn't change any of this.
By the error it appears that the / triggers the search for the closing delimiter and once it's not found the whole thing fails. So why is this interpreted as a regex and not division?
It seems that tt $v are grouped in parsing, and interpreted as a sub and its arguments, since they're followed by a space; then /3 is taken separately and then that does look like a regex.† That would still fail as a syntax error but perhaps the regex parsing failure comes first.
Then the difference between other comma-separated terms coming before or after is clear -- with tt 3, ... the following $v /3 is a term for the next argument, and is parsed as division.
This still leaves another issue. All builtins that I tried don't have this problem, be they list or unary operators, with a variety of prototypes (push, chr, splice, etc) -- except for print, which does have the same looking problem. And which fails both with and without parens.
perl -wE'$v=110; say for unpack "A1A1", $v /2' #--> 5 5
perl -wE'$v=200; say chr $v /2' #--> d
perl -wE'$v=3; push #ary, $v /2; say "#ary"' #--> 1.5
perl -wE'$v = 7; say $v /3' # fails, the same way
perl -wE'$v = 7; say( $v /3 )' # fails as well, same way
A difference is that print obeys "special" parsing rules, and which allow the first argument to be a filehandle. (Also, it has no prototype but that doesn't appear to matter.)
Then the expression print $v /3... can indeed be parsed as print filehandle EXPR, and the EXPR starting with / is parsed as a regex. The same works with parenthesis.‡
All this involves some guesswork as I don't know how the parser does it. But it is clearly a matter of details of how a subroutine call is parsed, what (accidentally?) includes print as well.
An obvious remedy of using parens on (user-defined) subroutines is reasonable in my view. The other fix is to be consistent with spaces around math operators, to either not have them on either side or to use them on both sides -- that is fine as well, even as it's itchy (spaces? really?).
I don't know what to say about there being a problem with say( $v /3 ) though.
A couple more comments on the question.
By the text of the error message in the question, Unknown regexp modifier "/6", it appears that there the / is taken as the closing delimiter, unlike in the example above. And there is more in that message, which is unclear. In the end, we do have a very similar parsing question.
As for
Is this indirect object call guilty?
I don't see an indirect object call there, only a normal subroutine call. Also, the example from this answer displays very similar behavior and rules out the indirect object syntax.
† Another possibility may be that $v /3 is parsed as a term, since it follows the (identifiable!) subroutine name tt. Then, the regex binding operator =~ binds more tightly than the division, and here it is implied by clearly attempting to bind to $_ by default.
I find this less likely, and it also can't explain the behavior of builtins, print in particular.
‡
Then one can infer that other builtins with an optional comma-less first argument (and so without a prototype) go the same way but I can't readily think of any.

Perl thinks that the symbol / is a start of a regular expression and not a division operator. https://perldoc.perl.org/perlre - You can check the perldoc for regular expressions.
You can try adding a whitespace character before 6 like so: $tnds += U::round $suma / 6;

Regex : conditions in captured variables

This is my data (in a file):
5807035;Fab;2015/01/05;04;668100;18:06:01,488;18:06:02,892
5807028;Opt;2015/01/05;04;836100;17:12:45,223;17:12:47,407
5807028;Fab;2015/01/05;04;836100;17:12:47,470;17:12:48,172
5807027;Opt;2015/01/05;04;926100;17:12:31,807;17:12:34,365
5807027;Fab;2015/01/05;04;926100;17:12:34,443;17:12:37,095
5807026;Opt;2015/01/05;04;682100;17:12:11,698;17:12:19,062
5807026;Fab;2015/01/05;04;682100;17:12:19,124;17:12:21,667
5807025;Opt;2015/01/05;04;217100;17:12:00,669;17:12:02,635
This is my Perl code :
while ( $data =~ m/(\d+);(Opt|Fab);(.+);(\d{2});(.+);(.+);(.+)\n(\d+);(Opt|Fab);.+;\d{2};.+;(.+);(.+)\n/g ) {
if ( "$1" eq "$8" && "$2" ne "$9" ) {
print OUTFILE "$1;$3;$4;$5;$6;$7;$10;$11\n";
}
}
The lines 1 and 2 match the regex, but do not satisfy the condition of the if statement. That's fine.
On the other hand, the lines 2 and 3 satisfy the regex, AND the condition of the if statement. However, it these lines are not retrieved.
I suppose it's because the regex read two lines, then the next two lines, etc. I think I should include the condition of the if statement in the regex (if I'm not mistaken).
What do you guys think ?
The variable $data holds the content of my CSV file.

Since you want to check line 1 & 2, then 2 & 3, you need to prevent the regex engine from consuming the 2nd line by placing the regex to match the second line in a look-ahead:
while ( $data =~ m/(\d+);(Opt|Fab);(.+);(\d{2});(.+);(.+);(.+)\n(?=(\d+);(Opt|Fab);.+;\d{2};.+;(.+);(.+)\n)/g ) {
I didn't think too much when I first answer, but as #ThisSuitIsBlackNot suggested in the comment, using regular expression to parse CSV results in low maintainability code. Using CSV library to parse the data and process them is a better idea here.

Perl - Regexp to manipulate .csv

I've got a function in Perl that reads the last modified .csv in a folder, and parses it's values into variables.
I'm finding some problems with the regular expressions.
My .csv look like:
Title is: "NAME_NAME_NAME"
"Period end","Duration","Sample","Corner","Line","PDP OUT TOTAL","PDP OUT OK","PDP OUT NOK","PDP OUT OK Rate"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","ARG - NAME 1","536","536","0","100%"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","USA - NAME 2","1850","1438","412","77.72%"
"04/12/2014 11:00:00","3600","1","GPRS_OUT","AUS - NAME 3","8","6","2","75%"
.(ignore this dot, you will understand later)
So far, I've had some help to parse the values into some variables, by:
open my $file, "<", $newest_file
or die qq(Cannot open file "$newest_file" for reading.);
while ( my $line = <$file> ) {
my ($date_time, $duration, $sample, $corner, $country_name, $pdp_in_total, $pdp_in_ok, $pdp_in_not_ok, $pdp_in_ok_rate)
= parse_line ',', 0, $line;
my ($date, $time) = split /\s+/, $date_time;
my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
print "$date, $time, $country, $name, $pdp_in_total, $pdp_in_ok_rate";
}
The problems are:
I don't know how to make the first AND second line (that are the column names from the .csv) to be ignored;
The file sometimes come with 2-5 empty lines in the end of the file, as I show in my sample (ignore the dot in the end of it, it doesn't exists in the file).
How can I do this?

When you have a csv file with column headers and want to parse the data into variables, the simplest choice would be to use Text::CSV. This code shows how you get your data into the hash reference $row. (I.e. my %data = %$row)
use strict;
use warnings;
use Text::CSV;
use feature 'say';
my $csv = Text::CSV->new({
binary => 1,
eol => $/,
});
# open the file, I use the DATA internal file handle here
my $title = <DATA>;
# Set the headers using the header line
$csv->column_names( $csv->getline(*DATA) );
while (my $row = $csv->getline_hr(*DATA)) {
# you can now access the variables via their header names, e.g.:
if (defined $row->{Duration}) { # this will skip the blank lines
say $row->{Duration};
}
}
__DATA__
Title is: "NAME_NAME_NAME"
"Period end","Duration","Sample","Corner","Line","PDP IN TOTAL","PDP IN OK","PDP IN NOT OK","PDP IN OK Rate"
"04/12/2014 10:00:00","3600","1","GRPS_INB","CHN - Name 1","1198","1195","3","99.74%"
"04/12/2014 10:00:00","3600","1","GRPS_INB","ARG - Name 2","1198","1069","129","89.23%"
"04/12/2014 10:00:00","3600","1","GRPS_INB","NLD - Name 3","813","798","15","98.15%"
If we print one of the $row variables with Data::Dumper, it shows the structure we are getting back from Text::CSV:
$VAR1 = {
'PDP IN TOTAL' => '1198',
'PDP IN NOT OK' => '3',
'PDP IN OK' => '1195',
'Period end' => '04/12/2014 10:00:00',
'Line' => 'CHN - Name 1',
'Duration' => '3600',
'Sample' => '1',
'PDP IN OK Rate' => '99.74%',
'Corner' => 'GRPS_INB'
};

open ...
my $names_from_first_line = <$file>; # you can use them or just ignore them
while($my line = <$file>) {
unless ($line =~ /\S/) {
# skip empty lines
next;
}
..
}
Also, consider using Text::CSV to handle CSV format

1) I don't know how to make the first line (that are the column names from the .csv) to be ignored;
while ( my $line = <$file> ) {
chomp $line;
next if $. == 1 || $. == 2;
2) The file sometimes come with 2-5 empty lines in the end of the file, as I show in my sample (ignore the dot in the end of it, it doesn't exists in the file).
while ( my $line = <$file> ) {
chomp $line;
next if $. == 1 || $. == 2;
next if $line =~ /^\s*$/;

You know that the valid lines will start with dates. I suggest you simply skip lines that don't start with dates in the format you expect:
while ( my $line = <$file> ) {
warn qq(next if not $line =~ /^"\d{2}-\d{2}-d{4}/;); # Temp debugging line
next if not $line =~ /^"\d{2}-\d{2}-d{4}/;
warn qq($line matched regular expression); # Temp debugging line
...
}
The /^"\d{2}-\d{2}-d{4}",/ is a regular expression pattern. The pattern is between the /.../:
^ - Beginning of the line.
" - Quotation Mark.
\d{2} - Followed by two digits.
- - Followed by a dash.
\d{2] - Followed by two more digits.
- - Followed by a dash.
\d{4} - Followed by four more digits
This should be describing the first part of your line which is the date in MM-DD-YYYY format surrounded by quotes and followed by a comma. The =~ tells Perl that you want the thing on the left to match the regular expression on the right.
Regular expressions can be difficult to understand, and is one of the reasons why Perl has such a reputation of being a write-only language. Regular expressions have been likened to sailor cussing. However, regular expressions is an extremely powerful tool, and worth the effort to learn. And with some experience, you'll be able to easily decode them.
The next if... syntax is similar to:
if (...) {
next;
}
Normally, you shouldn't use post-fix if and never use unless (which is if's opposite). They can make your program more difficult to understand. However, when placed right after the opening line of a loop like this, they make a clear statement that you're filtering out lines you don't want. I could have written this (and many people would argue this is preferable):
next unless $line =~ /^"\d{2}-\d{2}-d{4}",/;
This is saying you want to skip lines unless they match your regular expression. It's all a matter of personal preference and what do you think is easier for the poor schlub who comes along next year and has to figure out what your program is doing.
I actually thought about this and decided that if not ... was saying that I expect almost all lines in the file to match my format, and I want to toss away the few exceptions. To me, next unless ... is saying that there are some lines that match my regular expression, and many lines that don't, and I want to only work on lines that match.
Which gets us to the next part of programming: Watching for things that will break your program. My previous answer didn't do a lot of error checking, but it should. What happens if a line doesn't match your format? What if the split didn't work? What if the fields are not what I expect? You should really check each statement to make sure it actually worked. Almost all functions in Perl will return a zero, a null string, or an undef if they don't work. For example, the open statement.
open my $file, "<", $newest_file
or die qq(Cannot open file "$newest_file" for reading.);
If open doesn't work, it returns a file handle value of zero. The or states that if open doesn't return a non-zero file handle, execute the line that follows which kills your program.
So, look through your program, and see any place where you make an assumption that something works as expected and think what happens if it didn't. Then, add checks in your program to something if you get that exception. It could be that you want to report the error or log the error and skip to the next line. It could be that you want your program to come to a screeching halt. It could be that you can recover from the error and continue. What ever you do, check for possible errors (especially from user input) and handle possible errors.
Debugging
I told you regular expressions are tricky. Yes, I made a mistake assuming that your date was a separate field. Instead, it's followed by a space then the time which means that the final ", in the regular expression should not be there. I've fixed the above code. However, you may still need to test and tweak. Which brings us into debugging in Perl.
You can use warn statements to help debug your program. If you copy a statement, then surround it with warn qq(...);, Perl will print out the line (filling out variables) and the line number. I even create macros in my various editors to do this for me.
The qq(...) is a quote like operator. It's another way to do double quotes around a string. The nice thing is that the string can contain actual quotation marks, and the qq(...); will still work.
Once you've finished debugging, you can search for your warn statements and delete them. Perl comes with a powerful built in debugger, and many IDEs integrate with it. However, sometimes it's just easier to toss in a few warn statements to see what's going on in your code -- especially if you're having issues with regular expressions acting up.

Perl: "Quantifier in {,} bigger than 32766 in regex"

Let's say I want to find in a large (300,000 letters) the word "dogs" with the distance between letters exactly 40,000 letters in between. So I do:
$mystring =~ m/d.{40000}o.{40000}g.{40000}s/;
This will work quite well in other (slower) languages but in Perl it throws me "Quantifier in {,} bigger than 32766 in regex".
So:
Can we use a bigger number as the quantifier somehow?
If not, is there another good way to find what I want? Note that "dogs" is only an example; I want to do this for any word and any jump size (and fast).

If you really need to do this fast I would look at a custom search based on the ideas of Boyer-Moore string search. A regular expression is parsed into a finite state machine. Even a clever, compact representation of such a FSM is not going to be a very effective way to execute a search like you describe.
If you really want to continue along the lines you are now you can just concatenate two expressions like .{30000}.{10000} which is the same as .{40000} in practice.

I think index might be better suited for this task. Something along the lines of the completely untested:
sub has_dogs {
my $str = shift;
my $start = 0
while (-1 < (my $pos = index $$str, 'd', $start)) {
no warnings 'uninitialized';
if ( ('o' eq substr($$str, $pos + 40_000, 1)) and
('g' eq substr($$str, $pos + 80_000, 1)) and
('s' eq substr($$str, $pos + 120_000, 1)) ) {
return 1;
}
}
return;
}

40,000 = 2 * 20,000
/d(?:.{20000}){2}o(?:.{20000}){2}g(?:.{20000}){2}s/s

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js