Perl pattern match and arithmetic operation at the same time - regex

Can I do a pattern match and an arithmetic operation in the same expression?
print 5 / 3 !~ /\.\d*/;
This prints 5. Why?
$str = 5 / 3;
print $str !~ /\.\d*/;
This works as expected.
How can I do this in a single expression?

Default order of operations is giving you the unexpected result. Instead, try:
print +(5 / 3) !~ /\.\d*/;
But, as pointed out by others, that's a terrible way to test whether 3 divides 5. You have the modulus operator for that:
print 5 % 3 == 0;

It returns 5 because 3 !~ /\.\d*/ evaluates to 1, and 5 / 1 = 5.
You can wrap your arithmetic expression in parens to have Perl evaluate it first:
print ((5 / 3) !~ /\.\d*/);
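To see the precedence effect side by side, here is a minimal sketch. The variable names are illustrative only; the point is that !~ binds more tightly than /, so the unparenthesized form divides 5 by the result of the match.

```perl
use strict;
use warnings;

# !~ binds tighter than /, so this is parsed as 5 / (3 !~ /\.\d*/).
# "3" contains no dot, so the negated match is true (1), and 5 / 1 is 5.
my $unparenthesized = 5 / 3 !~ /\.\d*/;

# With parentheses the division happens first. 5 / 3 stringifies with a
# decimal point, so the negated match is false (the empty string).
my $parenthesized = (5 / 3) !~ /\.\d*/;

print "without parentheses: $unparenthesized\n";
print "with parentheses: '$parenthesized'\n";
```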

You just need to use brackets!
What happens in your code is basically:
print 5 / (3 !~ /\.\d*/);
So the regex match is evaluated first, then the division.
I think you want to do something like:
print ((5 / 3) !~ /\.\d*/);
# or
my $division = 5 / 3;
print $division if $division !~ /\.\d*/;
# or
# print (5 / 3) if (5 / 3) !~ /\.\d*/;
# but the calculation has to be done twice here!
If I understand your problem correctly, you just want to print if the division does not return a float:
print "test" if 5 / 3 == int(5 / 3);
print "test 2" if 5 / 5 == int(5 / 5);
Output:
test 2
There are much better, faster and more elegant ways to check this than using a regex.
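As a small sketch of the regex-free check, the modulus operator can be wrapped in a helper. The name divides_evenly is hypothetical, and the sketch assumes both operands are integers with a non-zero divisor.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $divisor divides $number with no remainder.
# Assumes integer arguments and a non-zero divisor.
sub divides_evenly {
    my ($number, $divisor) = @_;
    return $number % $divisor == 0 ? 1 : 0;
}

for my $pair ([5, 3], [5, 5], [12, 4]) {
    my ($number, $divisor) = @$pair;
    print "$number / $divisor is a whole number\n"
        if divides_evenly($number, $divisor);
}
```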

Related

how to match content except and, or, ||, && in perl regex

for example like this
$str = "1 < 4 and 8 > 2 or 4 * 3 or $m =~ /^\d+&\$/";
I would like to capture
1 < 4
8 > 2
4 * 3
$m =~ /^\d+&\$/
However, $str =~ /\s+(?<operators>and|or|&&|\|\|){1,}\s+/; doesn't work. How should I modify it?
To set $str to that, you should use single quotes (or escape all the meta characters).
my $str = '1 < 4 and 8 > 2 or 4 * 3 or $m =~ /^\d+&\$/';
my @capture = split /\s+(?:and|or|&&|\|\|)\s+/, $str;
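A runnable version of the split approach, printing each captured piece on its own line, might look like this. It only adds the pragmas and the print loop to the two lines above.

```perl
use strict;
use warnings;

# Single quotes keep $m, \d and \$ literal instead of interpolating them.
my $str = '1 < 4 and 8 > 2 or 4 * 3 or $m =~ /^\d+&\$/';

# Split on any of the logical operators, surrounded by whitespace.
my @capture = split /\s+(?:and|or|&&|\|\|)\s+/, $str;

print "$_\n" for @capture;
```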

Get next 5 lines after regexp is matched in tcl

How can I get the next 5 lines after a certain pattern is matched in Tcl?
I have some 30 lines of output and need only a few lines in between...
Might be easier to split the output into a list of lines so you can use lsearch:
% set output [exec seq 10]
1
2
3
4
5
6
7
8
9
10
% set lines [split $output \n]
1 2 3 4 5 6 7 8 9 10
% set idx [lsearch -regexp $lines {4}]
3
% set wanted [lrange $lines $idx+1 $idx+5]
5 6 7 8 9
Just append something to your regular expression! Like this:
([^\n]*\n){5}
Glenn Jackman's solution is probably better, but the line processing command in fileutil can be preferable for some variations.
package require fileutil
Given a file that looks like this:
% cat file.txt
1
2
3
4
5
6
7
8
9
10
Now, for each line in the file
set n 0
set re 4
set nlines 5
::fileutil::foreachLine line file.txt {
if {$n > 0} {
puts $line
incr n -1
}
if {$n == 0 && [regexp $re $line]} {
set n $nlines
}
}
If the counter n is greater than 0, print the line and decrement. If n is equal to 0 and the regular expression matches the line, set n to $nlines (5).
# output:
5
6
7
8
9
Documentation: fileutil package, if, incr, package, puts, Syntax of Tcl regular expressions, regexp, set

Pattern matching in perl (Lookahead and Condition on word Index)

I have a long string, containing alphabetic words and each delimited by one single character ";" . The whole string also starts and ends with a ";" .
How do I count the number of occurrences of a pattern (starting with ";") where the word index of a successful match is divisible by 5?
Example:
$String = ";the;fox;jumped;over;the;dog;the;duck;and;the;frog;"
$Pattern = ";the(?=;f)"
OUTPUT: 1
Since:
Note 1: In the above case, the $Pattern ;the(?=;f) matches at the 1st and 10th words in the $String; however, the output is 1, since only the index of the second match (10) is divisible by 5.
Note 2: Every word delimited by ";" counts toward the index set.
Index of the = 1 -> this does not match since 1 is not divisible by 5
Index of fox = 2
Index of jumped = 3
Index of over = 4
Index of the = 5 -> this does not match since the next word (dog) starts with "d" not "f"
Index of dog = 6
Index of the = 7 -> this does not match since 7 is not divisible by 5
Index of duck = 8
Index of and = 9
Index of the = 10 -> this does match since 10 is divisible by 5 and the next word (frog) starts with "f"
Index of frog = 11
If possible, I am wondering if there is a way to do this with a single pattern matching without using list or array as the $String is extremely long.
Use Backtracking control verbs to process the string 5 words at a time
One solution is to add a boundary condition that the pattern is preceded by 4 other words.
Then set up an alternation so that if your pattern is not matched, the 5th word is gobbled and then skipped using backtracking control verbs.
The following demonstrates:
#!/usr/bin/env perl
use strict;
use warnings;
my $string = ";the;fox;jumped;over;the;dog;the;duck;and;the;frog;";
my $pattern = qr{;the(?=;f)};
my @matches = $string =~ m{
(?: ;[^;]* ){4} # Preceded by 4 words
(
$pattern # Match Pattern
|
;(*SKIP)(*FAIL) # Or consume 5th word and skip to next part of string.
)
}xg;
print "Number of Matches = " . @matches . "\n";
Outputs:
Number of Matches = 1
Supplemental Example using Numbers 1 through 100 in words
For additional testing, the following constructs a string of all numbers in word format from 1 to 100 using Lingua::EN::Numbers.
For the pattern it looks for a number that's a single word with the next number that begins with the letter S.
use Lingua::EN::Numbers qw(num2en);
my $string = ';' . join( ';', map { num2en($_) } ( 1 .. 100 ) ) . ';';
my $pattern = qr{;\w+(?=;s)};
my @matches = $string =~ m{(?:;[^;]*){4}($pattern|;(*SKIP)(*FAIL))}g;
print "@matches\n";
Outputs:
;five ;fifteen ;sixty ;seventy
Reference for more techniques
The following question from last month is a very similar problem. There, I provided 5 different solutions in addition to the one demonstrated here:
In Perl, how to count the number of occurences of successful matches based on a condition on their absolute positions
You can count the number of semicolons in each substring up to the matching position. For a million-word string, it takes 150 seconds.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = join ';', q(),
map { qw( the fox jumped over the dog the duck and the frog)[int rand 11] }
1 .. 1000;
$string .= ';';
my $pattern = qr/;the(?=;f)/;
while ($string =~ /$pattern/g) {
my $count = substr($string, 0, pos $string) =~ tr/;//;
say $count if 0 == $count % 5;
}
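The loop above recounts the semicolons from the start of the string for every match. A possible variation, sketched here as an assumption rather than as part of the original answer, keeps a running total and only counts the semicolons since the previous match. The names $seen and $last are illustrative.

```perl
use strict;
use warnings;

my $string  = ';the;fox;jumped;over;the;dog;the;duck;and;the;frog;';
my $pattern = qr/;the(?=;f)/;

my ($count, $seen, $last) = (0, 0, 0);
while ($string =~ /$pattern/g) {
    my $start = $-[0];    # offset of the ';' that begins this match
    # Count only the semicolons between the previous match and this one,
    # including the ';' at $start, so $seen is the matched word's index.
    $seen += (substr($string, $last, $start - $last + 1) =~ tr/;//);
    $last = $start + 1;
    $count++ if $seen % 5 == 0;
}
print "$count\n";
```

Each character is scanned by tr at most once this way, instead of once per match.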
Revised Answer
One relatively simple way to achieve what you want is by replacing the delimiters in the original text that occur on a 5-word-index boundary:
my $idx = 0;  # semicolons are numbered from 1, so the ';' before words 5, 10, ... is kept
$text =~ s/;/++$idx % 5 ? ',' : ';'/eg;
Now you just need to trivially adjust your $pattern to look for ;the,f instead of ;the;f. You can use the =()= pseudo-operator to return the count:
my $count =()= $text =~ /;the(?=,f)/g;
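Put together as a runnable sketch on the question's sample string, the delimiter-rewriting idea might look like this. It assumes word indices are 1-based as in the question, so the counter is incremented before the test and only the ";" in front of words 5, 10, ... survives. $marked is a copy, so the original text is left untouched.

```perl
use strict;
use warnings;

my $text = ';the;fox;jumped;over;the;dog;the;duck;and;the;frog;';

# Keep ';' only in front of every 5th word; turn all other delimiters into ','.
my $idx = 0;
(my $marked = $text) =~ s/;/++$idx % 5 ? ',' : ';'/eg;
print "$marked\n";

# Count matches of the adjusted pattern in list context.
my $count = () = $marked =~ /;the(?=,f)/g;
print "$count\n";
```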
Original answer after the break. (Thanks to @choroba for pointing out the correct interpretation of the question.)
Character-Based Answer
This uses the /g regex modifier in combination with pos() to look at matching words. For illustration, I print out all matches (not just those on 5-character boundaries), but I print (match) beside those on 5-char boundaries. The output is:
;the;fox;jumped;over;the;dog;the;duck;and;the;frog
^....^....^....^....^....^....^....^....^....^....
`the' #0 (match)
`the' #41
And the code is:
#!/usr/bin/env perl
use 5.010;
my $text = ';the;fox;jumped;over;the;dog;the;duck;and;the;frog';
say $text;
say '^....^....' x 5;
my $pat = qr/;(the)(?=;f)/;
#$pat = qr/;([^;]+)/;
while ($text =~ /$pat/g) {
my $pos = pos($text) - length($1) - 1;
say "`$1' \#$pos". ($pos % 5 ? '' : ' (match)');
}
First off, pos can also be used as an lvalue (on the left-hand side of an assignment). You could use the \G assertion in combination with index (since speed is a concern for you). I expanded your example to show that it only "matches" for indices divisible by 5 (your example also allowed indices not divisible by 5 to be a solution). Since you only wanted the number of matches, I just increment a $count variable. If you want something more, use a normal if {} block and do the work there.
my $string = ";the;fox;jumped;over;the;dog;the;duck;and;the;frog;or;the;fish";
my $pattern = qr/;the(?=;f)/;
my ($index,$count, $position) = (0,0,0);
while(0 <= ($position = index $string, ';',$position)){
pos $string = $position++; #add one to $position, to terminate the loop
++$count if (!(++$index % 5) and $string =~/\G$pattern/);
}
say $count; # says 1, not 2
You could use the experimental features of regexes to solve your problem (especially the (?{}) blocks). Before you do, you really should read the corresponding section in the perldocs.
my ($index, $count) = (0,0);
while ($string =~ /; # the `;'
(?(?{not ++$index % 5}) # if with a code condition
the(?=;f) # almost your pattern, but we'll have to count
|(*FAIL)) # else fail
/gx) {
$count++;
}

How to check if a variable (e.g. $3) exists so that I don't get a warning about "uninitialized value"?

I am trying to run a script on a .csv file that converts sizes in feet and inches to separate columns of width in inches and height in inches.
It works fine when not using strict mode, but returns a warning message when use warnings; is set.
My problem occurs here:
if ($data[$index_num] =~ /(\d+)'( (\d+)'')? x (\d+)'( (\d+)'')?/) {
my $width = $1*12+$3;
my $height = $4*12+$6;
...
}
Since sometimes the size could be 5' x 8' the special variables $3 and $6 do not exist.
All possible size formats are as follows:
5' 1'' x 6' 7'' (All Single Digits)
10' 1'' x 5' 10'' (Combination of Single and Double Digits)
5' x 8' (No Inches)
4' 6'' x 7' (Combination of No Inches and Inches)
The warning I receive is:
Use of uninitialized value $6 in addition (+) at script.pl line 47, line 567.
This is happening in either case 3 or 4 where the inches are missing on either the width or height.
How can I check to see if either $3 or $6 exists before trying to use them?
You can use the defined function:
my $height;
if(defined($6))
{
$height = $4*12+$6;
}
else
{
$height = $4*12;
}
First, let's get rid of useless captures:
/(\d+)'(?: (\d+)'')? x (\d+)'(?: (\d+)'')?/
Captures that didn't match are set to undef, so you can use defined or //.
my $width = $1 * 12 + ($2 // 0);
my $height = $3 * 12 + ($4 // 0);
// was introduced in Perl 5.10. For backwards compatibility, you can use the following:
my $width = $1 * 12 + ($2 || 0);
my $height = $3 * 12 + ($4 || 0);
The second version falls back to zero whenever $2 (or $4) is false (undef, zero, or the empty string), not only when it is undef, but that's perfectly fine here.
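As a quick check, here is a sketch that runs the non-capturing regex and the // default over the four size formats listed in the question. The @sizes list and the %inches hash are only there for the demonstration.

```perl
use strict;
use warnings;

# The four formats from the question.
my @sizes = ("5' 1'' x 6' 7''", "10' 1'' x 5' 10''", "5' x 8'", "4' 6'' x 7'");

my %inches;
for my $size (@sizes) {
    if ($size =~ /(\d+)'(?: (\d+)'')? x (\d+)'(?: (\d+)'')?/) {
        # $2 and $4 are undef when the inches part is absent; // supplies 0.
        my $width  = $1 * 12 + ($2 // 0);
        my $height = $3 * 12 + ($4 // 0);
        $inches{$size} = [$width, $height];
        print "$size => $width x $height\n";
    }
}
```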
If you just want to test for a number AND you're happy to calculate a result when there is no match then you could say:
if ($data[$index_num] =~ /(\d+)\'( (\d+)\'\')? x (\d+)\'( (\d+)\'\')?/) {
my $width = ( $1 || 0 ) * 12 + ( $3 || 0 );
my $height = ( $4 || 0 ) * 12 + ( $6 || 0 );
...
}
Why does this work? Because if, say, $1 is undefined, then ($1 || 0) returns the right-hand side, or 0. Likewise if $1 is zero then the left-hand-side of the OR condition will also fail but return 0 nonetheless. And if $1 is non-zero and defined then the left-hand-side will be returned ($1).
If you're going to use the || technique, make sure you put the left-hand and right-hand sides inside parentheses to avoid operator-precedence side effects.
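The precedence point can be illustrated with a short sketch. Because + binds more tightly than ||, leaving out the parentheses means the undefined value still reaches the addition and the warning still fires. The $SIG{__WARN__} handler here is only a way to count warnings for the demonstration.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, shift };

my $inches;    # undef, as when the inches part did not match

# Grouped: the default is applied before the addition, so no warning.
my $grouped = 5 * 12 + ($inches || 0);

# Ungrouped: parsed as (5 * 12 + $inches) || 0, so undef is added first.
my $ungrouped = 5 * 12 + $inches || 0;

printf "grouped=%d ungrouped=%d warnings=%d\n",
    $grouped, $ungrouped, scalar @warnings;
```

Both expressions produce the same number here; the difference is only whether the "uninitialized value" warning is avoided.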

How can I use Perl to validate this data containing balanced text?

I have a text file filled with sentences with unique pattern. The unique pattern is:
NAME [ e_NAME ]
Simple rule: the "NAME" must follow the "e_" if "e_" appears inside the brackets!
The problem arises when the string is complicated. Here are the edge cases that may be hard to analyse:
Lines that won't match the rule:
(1) NAME1[blabla+NAME2[blabla+e_BAD2]+e_NAME1]
(2) NAME1[blabla] + NAME2[e_BAD2]
(3) NAME1[NAME2[blabla+e_BAD2]] + NAME3[e_BAD3]
(4) NAME1[e_NAME1BAD1] -> means it has to be only NAME1
Lines that match the rule:
(1) FOO1[blabla + 1]
(2) [blalbla] + bla
(3) bla + blabla
(4) FOO1[ccc + ddd + FOO2[e_FOO2]] = 123
(5) FOO1[cc + FOO2[ dd ] ] + FOO3[e_FOO3]
I already asked this question, but I couldn't catch these edge cases...
Edited after requirements were clarified
Either Text::Balanced or Regexp::Common might be useful. I initially posted an answer using the former but didn't like it very much. The following example uses Regexp::Common and seems fairly straightforward.
use strict;
use warnings;
use Regexp::Common;
my $PRE = '[^[]*?';
my $VAR = '\w+';
my $BRACK = $RE{balanced}{-parens=>'[]'};
my $POST = '.*';
while (<DATA>){
my ($bad, $full);
# Brackets, if any, must balance
$bad = 1 unless s/\[/[/g == s/\]/]/g;
$full = $_;
until ($bad){
# Find some bracketed text and store all components.
my ($pre, $var, $brack, $post) =
$full =~ /^($PRE)($VAR)($BRACK)($POST)$/;
last unless defined $brack;
# Create a copy of the bracketed text, removing both the outer
# brackets and all instances of inner-bracketed text.
chop (my $clean = substr $brack, 1);
$clean =~ s/$BRACK/ /g;
# If e_FOO exists, FOO must equal $var.
$bad = 1 if $clean =~ /e_(\w+)/ and $1 ne $var;
# Remove the part of $full we've already checked.
substr($full, 0, length($pre) + length($var) + 1, '');
}
print if $bad;
}
# Your test data, with some trailing comments.
__DATA__
NAME1[blabla+NAME2[blabla+e_BAD2]+e_NAME1] NOT OK 1
NAME1[blabla] + NAME2[e_BAD2] NOT OK 2
NAME1[NAME2[blabla+e_BAD2]] + NAME3[e_BAD3] NOT OK 3
NAME1[e_NAME1BAD1] NOT OK 4
FOO1[blabla + 1] OK 1
[blalbla] + bla OK 2
bla + blabla OK 3
FOO1[ccc + ddd + FOO2[e_FOO2]] = 123 OK 4
FOO1[cc + FOO2[ dd ] ] + FOO3[e_FOO3] OK 5
Maybe you are looking for something like:
if ($string =~ /(\w+)\[e_(\w+)/ && $1 eq $2) {
print "Pattern '$1' contained in string '$string'\n";
}
Based on the accepted answer to your first question, I came up with this:
use strict;
use warnings;
while (<DATA>) {
my $l = $_;
while (s/(\w+)\[([^\[\]]*)\]//) {
my ($n, $chk) = ($1, $2);
if ($chk =~ /\be_(?!$n\b)/) {  # an e_ token that is not followed by exactly $n
warn "Bad line: $l";
last;
}
}
}
The \b checks for a word boundary. This version still doesn't check for unbalanced brackets, but it does seem to catch all the examples you gave, and will also complain when the e_NAME1 is inside another nested block, like so:
NAME1[stuff + NAME2[e_NAME1 + e_NAME2] + morestuff]
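For experimenting with the innermost-first idea outside a DATA loop, here is a sketch wrapped in a function. The name bad_names is hypothetical, and the sketch assumes the intended rule is that a bracket group is only bad when it directly contains an e_ token whose suffix is not exactly the enclosing name. Like the loop above, it does not check for unbalanced brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names whose bracket group directly
# contains an e_ token that is not exactly e_<name>.
sub bad_names {
    my ($line) = @_;
    my @bad;
    # Repeatedly strip a NAME[...] group that has no nested brackets left.
    while ($line =~ s/(\w+)\[([^\[\]]*)\]//) {
        my ($name, $inner) = ($1, $2);
        push @bad, $name if $inner =~ /\be_(?!$name\b)/;
    }
    return @bad;
}

for my $line ('NAME1[blabla] + NAME2[e_BAD2]',
              'FOO1[ccc + ddd + FOO2[e_FOO2]] = 123') {
    my @bad = bad_names($line);
    printf "%s: %s\n", $line, @bad ? "bad (@bad)" : 'ok';
}
```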
use Text::Balanced;
CPAN is wonderful.