Manipulating backreferences for substitution in perl

As a part of an attempt to replace scientific numbers with decimal numbers I want to save a backreference into a string variable, but it doesn't work.
My input file is:
,8E-6,
,-11.78E-16,
,-17e+7,
I then run the following:
open FILE, "+<C:/Perl/input.txt" or die $!;
open(OUTPUT, "+>C:/Perl/output.txt") or die;
while (my $lines = <FILE>){
$find = "(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
$noofzeroesbeforecomma = eval("$7-length($4)");
$replace = '"foo $noofzeroesbeforecomma bar"';
$lines =~ s/$find/$replace/eeg;
print (OUTPUT $lines);
}
close(FILE);
I get
foo bar
foo bar
foo bar
where I would have expected
foo 6 bar
foo 14 bar
foo 7 bar
$noofzeroesbeforecomma seems to be empty or nonexistent.
Even with the following adjustment I get an empty result
$noofzeroesbeforecomma = $2;
Only inserting $2 directly in the replace string gives me something (which is then, unfortunately, not what I want).
Can anyone help?
I'm running Strawberry Perl (5.16.1.1-64bit) on a 64-bit Windows 7 machine, and I am quite inexperienced with Perl.

Your main problem is not using
use strict;
use warnings;
warnings would have told you
Use of uninitialized value $7 in concatenation (.) or string at ...
Use of uninitialized value $4 in concatenation (.) or string at ...
I would recommend you try and find a module that can handle scientific notation, rather than trying to hack your own.
Your code, in working order, might look something like this. As you can see, I have put a q() around your eval string to prevent it from being evaluated before $7 and $4 exist. I also removed the eval itself, since an eval on top of the double eval (/ee) is somewhat excessive.
use strict;
use warnings;
while (my $lines = <DATA>) {
my $find="(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
my $noof = q|$7-length($4)|;
$lines =~ s/$find/$noof/eeg;
print $lines;
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
6
14
7
As a side note, not using strict is asking for trouble. Doing it while using a variable name such as $noofzeroesbeforecomma is asking for twice the trouble, as it is rather easy to make typos.
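A variant that avoids string eval entirely might look like the sketch below. It assumes the same input format as the question and reuses its "foo ... bar" placeholder; the helper name rewrite_line and the simplified capture groups are made up for illustration. The pattern is compiled with qr// so the backslashes survive, and a single /e evaluates the replacement as code at match time, when the capture variables are actually set.

```perl
use strict;
use warnings;

# Precompiled pattern. Capture 1 = fraction digits (may be undefined),
# capture 2 = exponent digits without the sign.
my $find = qr/,-?(?:0|[1-9][0-9]*)(?:\.([0-9]*))?[eE][+-]?([0-9]+),/;

# Hypothetical helper: replaces each ",<number>," with "foo N bar",
# where N = exponent digits minus the number of fraction digits.
sub rewrite_line {
    my ($line) = @_;
    $line =~ s{$find}{
        my $frac = defined $1 ? $1 : '';
        'foo ' . ($2 - length($frac)) . ' bar';
    }eg;
    return $line;
}

print rewrite_line($_), "\n" for ',8E-6,', ',-11.78E-16,', ',-17e+7,';
```

Because the replacement is ordinary Perl code compiled once with the program, strict and warnings can check it, which is not the case for code hidden in a string.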

This is not about backreferences but the original problem, transforming numbers from scientific notation. I'm sure there are some cases in which this fails:
#!/usr/bin/env perl
use strict;
use warnings;
use bignum;
for (<DATA>) {
next unless /([+-]?\d+(?:\.\d+)?)[Ee]([+-]\d+)/;
print $1 * 10 ** $2 . "\n";
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
0.000008
-0.000000000000001178
-170000000

I suggest you use the Regexp::Common::number plugin for the Regexp::Common module, which will find all real numbers for you and allow you to replace those that have an exponent marker.
This code shows the idea. Using the -keep option makes the module put each component into one of the $N variables. The exponent marker - e or E - is in $7, so the number can be transformed depending on whether it was present.
use strict;
use warnings;
use Regexp::Common;
my $real_re = $RE{num}{real}{-keep};
while (<>) {
s/$real_re/ $7 ? sprintf '%.20f', $1 : $1 /eg;
print;
}
output
Given your example input, this code produces the following. The values can be tidied up further using additional code in the substitution
,0.00000800000000000000,
,-0.00000000000000117800,
,-170000000.00000000000000000000,

The thing is that Perl can already handle all those types of expressions. And since the standard item of data in Perl is the string, you only need to capture the expression to use it. So, use this expression:
/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/
to extract it from the surrounding text, and use sprintf to format it, as Borodin showed.
However, if it helps you to see a better version of what you tried to do, this works better:
my ( $whole, $frac, $expon )
= $line =~ m/(?:,)-?(0|[1-9]\d*)(?:\.(\d*))?[eE]([+\-]?\d+)(?:,)/
;
my $num = $expon - length( $frac );
Why not capture the sign with the exponent anyway, if you're going to do arithmetic with it?
It's better to name your captures and eschew eval when it's not necessary.
The substitution--as is--doesn't make much sense.
Really, since neither the symbols nor the digits can be case sensitive, just put a (?i) at the beginning, and avoid the E "character class" [Ee]:
/((?i)-?\d+(?:\.\d+)?e[+-]?\d+)/
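As a rough illustration of the advice about naming captures, the sketch below uses named groups and the %+ hash (available from Perl 5.10). The group names and the helper shifted_exponent are invented for this example; the arithmetic is the same signed-exponent-minus-fraction-length calculation as in the snippet above, so a negative exponent gives a negative result.

```perl
use strict;
use warnings;

# Named captures instead of numbered ones; /i replaces the [eE] class.
my $sci_re = qr/,-?(?<whole>0|[1-9]\d*)(?:\.(?<frac>\d*))?e(?<expon>[+-]?\d+),/i;

# Hypothetical helper: returns signed exponent minus fraction length,
# or nothing if the line holds no number in scientific notation.
sub shifted_exponent {
    my ($line) = @_;
    return unless $line =~ $sci_re;
    my $frac = defined $+{frac} ? $+{frac} : '';
    return $+{expon} - length($frac);
}

for my $line (',8E-6,', ',-11.78E-16,', ',-17e+7,') {
    print "$line => ", shifted_exponent($line), "\n";
}
```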

Related

Perl regex to find keywords and not variables

I'm trying to create a regex as following :
print $time . "\n"; --> match only print because time is a variable ($ before)
$epoc = time(); --> match only time
My regex for the moment is /(?-xism:\b(print|time)\b)/g but it matches time in $time in the first example.
I tried things like [^\$] but then it doesn't match print anymore.
(I will have more keywords like print|time|...|...)
Thanks

Parsing perl code is a common and useful teaching tool since the student must understand both the parsing techniques and the code that they're trying to parse.
However, to do this properly, the best advice is to use PPI
The following script parses itself and outputs all of the barewords. If you wanted to, you could compare the list of barewords to the ones that you're trying to match. Note, this will avoid things within strings, comments, etc.
use strict;
use warnings;
use PPI;
#my $src = do {local $/; <DATA>}; # Could analyze the smaller code in __DATA__ instead
my $src = do {
local @ARGV = $0;
local $/;
<>;
};
# Load a document
my $doc = PPI::Document->new( \$src );
# Find all the barewords within the doc
my $barewords = $doc->find( 'PPI::Token::Word' );
for (@$barewords) {
print $_->content, "\n";
}
__DATA__
use strict;
use warnings;
my $time = time;
print $time . "\n";
Outputs:
use
strict
use
warnings
use
PPI
my
do
local
local
my
PPI::Document
new
my
find
for
print
content
__DATA__

What you need is a negative lookbehind (?<!\$), it's zero-width so it doesn't "consume" characters.
(?<!\$)a means match a if not preceded with a literal $. Note that we escaped $ since it means end of string (or line depending on the m modifier).
Your regex will look like (?-xism:\b(?<!\$)(print|time)\b).
I'm wondering why you are turning off the xism modifiers. They are off by default. So just use /\b(?<!\$)(?:print|time)\b/g as the pattern.
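A small check of the lookbehind against the two lines from the question could look like this. The helper name keywords_in is hypothetical, and the sketch only guards against a leading $; other sigils such as @ or % and keywords inside strings are not handled.

```perl
use strict;
use warnings;

# Keyword list from the question; (?<!\$) rejects a match right after "$".
my $keyword_re = qr/(?<!\$)\b(print|time)\b/;

# Hypothetical helper: returns every keyword found in a line of code.
sub keywords_in {
    my ($code) = @_;
    my @found = $code =~ /$keyword_re/g;
    return @found;
}

print join(',', keywords_in('print $time . "\n";')), "\n";
print join(',', keywords_in('$epoc = time();')), "\n";
```

The first line should yield only print and the second only time, since $time and $epoc are skipped.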

Regex for Evalue substitution in Perl

What I am trying to achieve is convert the Evalue 1e-2 to 0.01.
my $cutoff = "1e-12";
if ($cutoff =~ m/^\de-{1}\d+?$/){
$cutoff = s/e-/*10^(-/;
$cutoff .= ")";
}
print "$cutoff\n";
This is part of a bigger script and running it under use warnings; always gives me Use of uninitialized value $_ in substitution (s///) at test.pl line 4, <STDIN> line 1.
Does anyone spot the mistake here? I cannot seem to be able to do so.

The warning you get is because you used = rather than =~ in front of the substitution operator. You need:
$cutoff =~ s/e-/*10^(-/;
But that isn't the only problem here. You would also have to eval the statement to get what you wanted, which would not only be bad design, but completely unnecessary. Perl natively treats values like "1e-12" as numbers, so you should not be doing this with a regex at all. You can simply format the output:
printf '%.2f', $val;
That will print 1e-2 as 0.01; pick a precision large enough for the smallest value you expect. If you need to create very long numbers like this, look into an appropriate module.
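One way to choose the precision automatically is to take it from the exponent. The sketch below assumes the single-digit mantissa form from the question (such as 1e-12); the helper name evalue_to_decimal is made up, and values with a fractional mantissa would need more digits than this provides.

```perl
use strict;
use warnings;

# Hypothetical helper: formats an E-value such as "1e-12" as a plain decimal.
# Assumes a single-digit mantissa, so the exponent equals the digits needed.
sub evalue_to_decimal {
    my ($evalue) = @_;
    my ($exp) = $evalue =~ /e-(\d+)\z/i;    # undefined if there is no "e-NN"
    return sprintf('%.*f', $exp || 0, $evalue);
}

print evalue_to_decimal('1e-2'), "\n";
print evalue_to_decimal('1e-12'), "\n";
```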

Do you realise that "1e-2" is already a valid format for a number in Perl? You just need to persuade Perl to treat it as a number.
$ perl -E'$x= "1e-2"; say $x'
1e-2
$ perl -E'$x= "1e-2"; $x+=0; say $x'
0.01
Adding zero to it ensures that Perl knows it is a number.

How to pass a replacing regex as a command line argument to a perl script

I am trying to write a simple perl script to apply a given regex to a filename among other things, and I am having trouble passing a regex into the script as an argument.
What I would like to be able to do is something like this:
> myscript 's/hi/bye/i' hi.h
bye.h
>
I have produced this code
#!/utils/bin/perl -w
use strict;
use warnings;
my $n_args = $#ARGV + 1;
my $regex = $ARGV[0];
for(my $i=1; $i<$n_args; $i++) {
my $file = $ARGV[$i];
$file =~ $regex;
print "OUTPUT: $file\n";
}
I cannot use qr because apparently it cannot be used on replacing regexes (although my source for this is a forum post so I'm happy to be proved wrong).
I would rather avoid passing the two parts in as separate strings and manually doing the regex in the perl script.
Is it possible to pass the regex as an argument like this, and if so what is the best way to do it?

There's more than one way to do it, I think.
The Evial Way:
As you basically send in a regex expression, it can be evaluated to get the result. Like this:
my @args = ('s/hi/bye/', 'hi.h');
my ($regex, @filenames) = @args;
for my $file (@filenames) {
eval("\$file =~ $regex");
print "OUTPUT: $file\n";
}
Of course, following this way will open you to some very nasty surprises. For example, consider passing this set of arguments:
...
my @args = ('s/hi/bye/; print qq{MINE IS AN EVIL LAUGH!\n}', 'hi.h');
...
Yes, it will laugh at you most evailly.
The Safe Way:
my ($regex_expr, @filenames) = @args;
my ($substr, $replace) = $regex_expr =~ m#^s/((?:[^/]|\\/)+)/((?:[^/]|\\/)+)/#;
for my $file (@filenames) {
$file =~ s/$substr/$replace/;
print "OUTPUT: $file\n";
}
As you can see, we parse the expression given to us into two parts, then use these parts to build a full operator. Obviously, this approach is less flexible, but, of course, it's much safer.
The Easiest Way:
my ($search, $replace, @filenames) = @args;
for my $file (@filenames) {
$file =~ s/$search/$replace/;
print "OUTPUT: $file\n";
}
Yes, that's right - no regex parsing at all! What happens here is that we decided to take two arguments - 'search pattern' and 'replacement string' - instead of a single one. Will it make our script less flexible than the previous one? No, as we still had to parse the regex expression more-or-less regularly. But now the user clearly understands all the data that is given to the command, which is usually quite an improvement.
@args in both examples corresponds to the @ARGV array.

The s/a/b/i is an operator, not simply a regular expression, so you need to use eval if you want it to be interpreted properly.
#!/usr/bin/env perl
use warnings;
use strict;
my $regex = shift;
my $sub = eval "sub { \$_[0] =~ $regex; }";
foreach my $file (@ARGV) {
&$sub($file);
print "OUTPUT: $file\n";
}
The trick here is that I'm substituting this "bit of code" into a string to produce Perl code that defines an anonymous subroutine, sub { $_[0] =~ s/a/b/i; } (or whatever code you pass it), then using eval to compile that code and give me a code reference I can call from within the loop.
$ test.pl 's/foo/bar/' foo nicefood
OUTPUT: bar
OUTPUT: nicebard
$ test.pl 'tr/o/e/' foo nicefood
OUTPUT: fee
OUTPUT: nicefeed
This is more efficient than putting an eval "\$file =~ $regex;" inside the loop as then it'll get compiled and eval-ed at every iteration rather than just once up-front.
A word of warning about eval - as raina77ow's answer explains, you should avoid eval unless you're 100% sure you are always getting your input from a trusted source...
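If the expression passed in does not compile, eval returns undef and the later call fails with a less helpful error. A minimal sketch of checking $@ right after compilation is shown below; the wrapper name compile_operator is hypothetical, and it does nothing to make eval safe for untrusted input.

```perl
use strict;
use warnings;

# Hypothetical wrapper: compiles an operator string such as 's/foo/bar/'
# into a code reference once, and dies with the compile error if that fails.
# This still runs arbitrary code, so use it with trusted input only.
sub compile_operator {
    my ($expr) = @_;
    my $sub = eval "sub { \$_[0] =~ $expr; }";
    die "Could not compile '$expr': $@" unless $sub;
    return $sub;
}

my $op = compile_operator('s/foo/bar/');
for my $file ('foo', 'nicefood') {
    my $name = $file;
    $op->($name);              # modifies $name through the @_ alias
    print "OUTPUT: $name\n";
}
```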

s/a/b/i is not a regex. It is a regex plus a substitution. Unless you use string eval, making this work might be pretty tough (consider s{a}<b>e and so on).

The trouble is that you are trying to pass a perl operator when all you really need to pass is the arguments:
myscript hi bye hi.h
In the script:
my ($find, $replace, @files) = @ARGV;
...
$file =~ s/$find/$replace/i;
Your code is a bit clunky. This is all you need:
use strict;
use warnings;
my ($find, $replace, @files) = @ARGV;
for my $file (@files) {
$file =~ s/$find/$replace/i;
print "$file\n";
}
Note that this way allows you to use meta characters in the regex, such as \w{2}foo?. This can be both a good thing and a bad thing. To make all characters interpreted literally (disable meta characters), you can use \Q ... \E like so:
... s/\Q$find\E/$replace/i;
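To see the difference \Q ... \E makes, the sketch below wraps both forms in one function. The name rename_with and its $literal switch are invented for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies find/replace to each name, treating $find
# either as a regex ($literal false) or as plain text ($literal true).
sub rename_with {
    my ($find, $replace, $literal, @files) = @_;
    my $re = $literal ? qr/\Q$find\E/i : qr/$find/i;
    my @renamed;
    for my $file (@files) {
        (my $new = $file) =~ s/$re/$replace/;   # work on a copy
        push @renamed, $new;
    }
    return @renamed;
}

print join(' ', rename_with('.', '_', 0, 'a.b')), "\n";   # "." matches any character
print join(' ', rename_with('.', '_', 1, 'a.b')), "\n";   # "." matches a literal dot
```

With the regex form, the dot matches the first character of the name; with the literal form, only the actual dot is replaced.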

What regular expression do I use for replacing numbers with leading zeros?

I have a bunch of strings like this:
my $string1 = "xg0000";
my $string2 = "fx0015";
What do I do to increase the number in the string by 1 while also maintaining the leading zeros, so that the length of the string stays the same?
I tried this:
$string =~ s/(\d+)/0 x length(int($1)) . ($1+1)/e;
It doesn't seem to work on all numbers. Is regex what I'm supposed to use to do this or is there a better way?

How about a little perl magic? The ++ operator will work even on strings, and 0000 will magically turn into 0001.
Now, we can't modify $1 since it is readonly, but we can use an intermediate variable.
use strict;
use warnings;
my $string = "xg0000";
$string =~ s/(\d+)/my $x=$1; ++$x/e;
Update:
I didn't think of this before, but it actually works without a regex:
C:\perl>perl -we "$str = 'xg0000'; print ++$str;"
xg0001
This still does not solve the problem DavidO pointed out, with 9999. You would have to decide what to do with those numbers. Perl has a rather interesting solution for it:
C:\perl>perl -we "$str = 'xg9999'; print ++$str;"
xh0000
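The same behaviour can be tried in a script rather than a one-liner. This is only a sketch with a made-up helper name, next_id; note that the magic string increment applies only when the value has been used purely as a string and consists of optional letters followed by optional digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the "next" identifier using Perl's
# magic string increment (letters followed by digits only).
sub next_id {
    my ($id) = @_;
    $id++;
    return $id;
}

print "$_ -> ", next_id($_), "\n" for 'xg0000', 'fx0015', 'xg9999';
```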

You can do it with sprintf too, and use the length you compute from the number of digits that you capture:
use strict;
use warnings;
my $string = "xg00000";
foreach ( 0 .. 9 ) {
$string =~ s/([0-9]+)\z/
my $l = length $1;
sprintf "%0${l}d", $1 + 1;
/e;
print "$string\n";
}

This is a really bad task to solve with a regexp. Increasing a number can change an unlimited number of digits, and can in fact also change the number of non-zero digits! Unless you have sworn an oath to use only regexes for a year, use a regex to extract the number and then sprintf "%06d", $x+1 to regenerate the new number with the desired width.
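Put together, extracting the digits and regenerating them with sprintf might look like the following sketch. The function name increment_padded is hypothetical, the width is taken from the captured digits rather than fixed at 6, and a value such as 9999 is allowed to grow by one digit; that overflow policy is an assumption, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: increments the trailing number of a string and
# pads the result with zeros to the original number of digits.
sub increment_padded {
    my ($string) = @_;
    $string =~ s/(\d+)\z/ sprintf('%0*d', length($1), $1 + 1) /e;
    return $string;
}

print increment_padded($_), "\n" for 'xg0000', 'fx0015', 'xg0099', 'xg9999';
```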

changing several expressions in one line in perl

I want to take a line containing several expressions of the same structure, each containing a 4-digit hex number, and change the number in that structure according to a hash table. I tried using this piece of code:
while ($line =~ s/14'h([0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f])/14'h$hash_point->{$1}/g){};
Where $hash_point is a pointer to the hash table.
But it tells me that I am trying to use an undefined value. When I tried running the following code:
while ($line =~ s/14'h([0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f])/14'h----/g){print $1," -> ",$hash_point->{$1},"\n";};
It changed all the wanted numbers to "----" but printed out the values only 2 times (there were much more changes).
Where is the problem?

This is what I used in the end:
$line =~ s/14'h([0-9a-f][0-9a-f][0-9a-f][0-9a-f])/"14'h".$hash_point->{$1}/ge;
and in order to account for numbers not in the hash I've added:
$line =~ s/14'h([0-9a-f][0-9a-f][0-9a-f][0-9a-f])/"14'h".(($hash_point->{$1}) or ($1))/ge;
I also wanted to know what numbers don't appear at the hash:
$line =~ s/14'h([0-9a-f][0-9a-f][0-9a-f][0-9a-f])/"14'h".(($hash_point->{$1}) or (print "number $1 didn't change\n") &&($1))/ge;
and finally, I wanted to be able to control whether the message from the previous stage would be printed, so I've added the use of $flag, which is defined only if I want the messages to appear:
$line =~ s/14'h([0-9a-f][0-9a-f][0-9a-f][0-9a-f])/"14'h".(($hash_point->{$1}) or (((defined($flag)) && (print "number $1 didn't change\n")) or ($1)))/ge;

Your regexp seems to work well for me except when the hex number is not present in the hash.
I tried:
#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;
use Data::Dumper;
my $line = q!14'hab63xx14'hab88xx14'hab64xx14'hab65xx14'hcdef!;
my $hash_point = {
ab63 => 'ONE',
ab64 => 'TWO',
ab65 => 'THREE',
};
while ($line =~ s/14'h([0-9,a-f][0-9,a-f][0-9,a-f][0-9,a-f])/14'h$hash_point->{$1}/g){};
say $line;
This produces:
Use of uninitialized value in concatenation (.) or string at C:\tests\perl\test5.pl line 15.
Use of uninitialized value in concatenation (.) or string at C:\tests\perl\test5.pl line 15.
14'hONExx14'hxx14'hTWOxx14'hTHREExx14'h
The warnings are for the numbers ab88 and cdef, which are not keys in the hash.

Just a small correction, but neither of your regexes does what you think it does.
/[a-f,0-9]/
Matches any character from a to f, 0 to 9, and a comma. You are looking for
/[a-f0-9]/
Not that this is what is breaking your program (M42 probably got it right, but we can't be sure unless you show us the hash).
Also, apologies, not enough rep to actually answer to other posts.
EDIT:
Well, you go through a lot of hoops in that answer, so here's how I'd do it instead:
s/14'h\K(\p{AHex}{4})/if (defined($hash_point->{$1})) {
$hash_point->{$1};
} else {
say $1 if $flag;
$1;
}/ge
Mainly because chaining and's, &&'s and such generally makes for fairly hard-to-understand code. All whitespace is optional, so squash it for the one-liner!
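Wrapped in a function, the same idea could be sketched as follows. The name remap_hex and the optional $report array reference (used here in place of printing) are assumptions made for this example; exists is used instead of defined or a truth test so that a mapped value of 0 or an empty string is still honoured.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each 4-digit hex number after "14'h" with
# its value from %$map, leaving unknown numbers untouched. If $report is
# an array reference, unknown numbers are collected there.
sub remap_hex {
    my ($line, $map, $report) = @_;
    $line =~ s{14'h\K([0-9a-f]{4})}{
        exists $map->{$1} ? $map->{$1} : do { push @$report, $1 if $report; $1 }
    }ge;
    return $line;
}

my %map = (ab63 => 'ONE', ab64 => 'TWO');
my @unknown;
print remap_hex("14'hab63xx14'hab88xx14'hab64", \%map, \@unknown), "\n";
print "number $_ didn't change\n" for @unknown;
```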
