Perl - Why does shift lose its value after being used? - regex

This code works - It takes an array of full txt file paths and strips them so that when $exam_nums[$x] is called, it returns the file name
for (0..$#exam_nums)
{
$exam_nums[$_] =~ s/\.txt$//; #remove extension
$exam_nums[$_] =~ s/$dir//g; #remove path
}
When I try to do this for a single variable, it doesn't work. I'm calling a subroutine and sending it a present, but the variable is empty at the end. (It is getting into the if statement block, because the other lines in there run fine.) Here's the code:
Call to the sub:
notify($_);
The $_ is from a foreach(#files) loop that works
The sub:
sub notify
{
if(shift)
{
$ex_num = shift;
$ex_num =~ s/\.txt$//; #remove extension
$ex_num =~ s/$dir//g; #remove path
print $ex_num;
print "\nanything";
}
}
I tried taking out the $ in the "remove extension" portion of the regex, but that didn't help.

You're shifting TWICE. The first shift in the if statement removes the value, the second shift gets nothing. shift has a side-effect of actually modifying #_. in addition to returning the first element, it removes the first element permanently from #_.
EDIT: from man perlfunc
shift ARRAY
shift Shifts the first value of the array off and returns it,
shortening the array by 1 and moving everything down. If there
are no elements in the array, returns the undefined value. If
ARRAY is omitted, shifts the #_ array within the lexical scope
of subroutines and formats, ...

You are attempting to extract your ex_num argument from #_ (the argument list) twice: shift (which alters #_) is not the same as $_[0] (which just looks at the first element of #_ but does not alter it). See perldoc -f shift.
Also, your function is closing over $dir, which may or may not be your intent. (See perldoc perlfaq7 for more information about closures.) I've taken that out and added it as an additional function parameter:
sub notify
{
my ($ex_num, $dir) = #_;
return unless $ex_num;
$ex_num =~ s/\.txt$//; # remove extension
$ex_num =~ s/$dir//g; # remove path
print $ex_num . "\n";
}

I'd use File::Basename instead of rolling my own. It allows you to parse file paths into their directory, filename and suffix.

As per Jim Garrison's info, I pulled a switch to fix the problem:
sub notify
{
$ex_num = shift;
if($ex_num)
{
$ex_num =~ s/\.txt$//; #remove extension
$ex_num =~ s/$dir//g; #remove path
}
}

Uses a core module, local variables and Perl 5.10.
use 5.010;
use File::Basename;
sub notify {
my $ex_num = shift;
my $name = basename($ex_num, '.txt');
say $name;
}

Related

perl regex error: Modification of a read-only value attempted

I have this perl script:
use strict;
use warnings;
foreach my $line (" ^?[?12;12A", " ^?[A") {
print "$line\n";
$line =~ s/\s?[[:cntrl:]]\[(\?)?([0-9]{1,2}(;[0-9]{1,2})?)?[a-zA-Z]//g;
print "$line\n";
}
Those are two strings that start with a space, then a control character, then some regular ascii characters. It results with this error:
$ perl foo.pl
[?12;12A
Modification of a read-only value attempted at foo.pl line 6.
$
What am I doing wrong?
In a foreach loop the loop variable ("topicalizer") is but an alias for the currently processed list element; by changing it we really change the element.
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
The loop in the question iterates over a list of string literals, and those are read-only. Attempting to change that is a fatal error (perldiag) (this case, of foreach, is given as one example.
Some ways around this are shown in Hameed's answer, to store them in an array, or to assign the string literal to a variable first.
Or, use the "non-destructive" modifier on the substitution operator, s///r, which doesn't change the original but returns the changed value (or the original if it didn't change)
my $new_line = $line =~ s/.../.../r;
In your case $line is a read-only value.
You can fix this in two ways:
Work with an actual array like my #testarray = (" ^?[?12;12A", " ^?[A");
Assign the value of $line to another variable and modify that:
my $tmp = $line;
$tmp =~ s/\s?[[:cntrl:]]\[(\?)?([0-9]{1,2}(;[0-9]{1,2})?)?[a-zA-Z]//g;

Perl search and replace enters endless loop

I am trying to match and replace in multiple files some string using
local $/;
open(FILE, "<error.c");
$document=<FILE>;
close(FILE);
$found=0;
while($document=~s/([a-z_]+)\.h/$1_new\.h/gs){
$found=$found+1;
};
open(FILE, ">error.c");
print FILE "$document";
close(FILE);'
It enters an endless loop, because the result of the substitution is matched again by the regular expression searched for. But shouldn't this be avoided by the s///g construct?
EDIT:
I found that also a foreach loop will not do exactly what I want (it will replace all occurrences, but print only one of them). The reason seems to be that the perl substitution and and search behave quite differently in the foreach() and while() constructs. To have a solution to replace in multiple files which outputs also all individual replacements, I came up with the following body:
# mandatory user inputs
my #files;
my $subs;
my $regex;
# additional user inputs
my $fileregex = '.*';
my $retval = 0;
my $opt_testonly=0;
foreach my $file (#files){
print "FILE: $file\n";
if(not($file =~ /$fileregex/)){
print "filename does not match regular expression for filenames\n";
next;
}
# read file
local $/;
if(not(open(FILE, "<$file"))){
print STDERR "ERROR: could not open file\n";
$retval = 1;
next;
};
my $string=<FILE>;
close(FILE);
my #locations_orig;
my #matches_orig;
my #results_orig;
# find matches
while ($string =~ /$regex/g) {
push #locations_orig, [ $-[0], $+[0] ];
push #matches_orig, $&;
my $result = eval("\"$subs\"");
push #results_orig, $result;
print "MATCH: ".$&." --> ".$result." #[".$-[0].",".$+[0]."]\n";
}
# reverse order
my #locations = reverse(#locations_orig);
my #matches = reverse(#matches_orig);
my #results = reverse(#results_orig);
# number of matches
my $length=$#matches+1;
my $count;
# replace matches
for($count=0;$count<$length;$count=$count+1){
substr($string, $locations[$count][0], $locations[$count][1]-$locations[$count][0]) = $results[$count];
}
# write file
if(not($opt_testonly) and $length>0){
open(FILE, ">$file"); print FILE $string; close(FILE);
}
}
exit $retval;
It first reads the file creates lists of the matches, their positions and the replacement text in each file (printing each match). Second it will replace all occurrences starting from the end of the string (in order not to change the position of previous messages). Finally, if matches were found, it writes the string back to the file. Can surely be more elegant, but it does what I want.
$1_new will still match ([a-z_]+). It enters an endless loop because you use while there. With the s///g construct, ONE iteration will replace EVERY occurence in the string.
To count the replacements use:
$replacements = () = $document =~ s/([a-z_]+)\.h/$1_new\.h/gs;
$replacements will contain the number of replaced matches.
If you essentially just want the matches, not the replacements:
#matches = $document =~ /([a-z_]+)\.h/gs;
You can then take $replacement = scalar #matches to obtain their count.
I'd say you're over-engineering this. I did this in the past with:
perl -i -p -e 's/([a-z_]+)\.h/$1_new\.h/g' error.c
This works correctly when the substituted string contains the matching pattern.
the /g option is like a loop in itself. I think you want this:
while($document=~s/([a-z_]+)(?!_new)\.h/$1_new\.h/s){
$found=$found+1;
};
Because you are replacing the match with itself and more, you need a negative lookahead assertion.

Ignore blank variable return from qx

I'm having difficulty with one little bit of my code.
open ("files","$list");
while (my $sort = <files>) {
chomp $sort;
foreach my $key (sort keys %ips) {
if ($key =~ $sort) {
print "key $key\n";
my $match =qx(iptables -nL | grep $key 2>&1);
print "Match Results $match\n";
chomp $match;
my $banned = $1 if $match =~ (/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/);
print "Banned Results $banned\n";
if ($key =~ $banned) {
print "Already banned $banned\n";
} else {
system ("iptables -A INPUT -s $key -j DROP");
open my $fh, '>>', 'banned.out';
print "Match Found we need to block it $key\n";
print $fh "$key:$timestamp\n";
close $fh;
}
}
}
}
So basically what I'm doing is opening a list of addresses 1 per line.
Next I'm sorting down my key variable from another section of my script and matching it with my list, if it matches then it continues on to the if statement.
Now with that matched key I need to check and see if its blocked already or not, so I'm using a qx to execute iptables and grep for that variable. If it matches everything works perfectly.
If it does not match, in other words my iptables -nL | grep $key returns a blank value instead of moving on to my else statement it "grabs" that blank value for $match and continues to execute.
For the life of me I can't figure out how to strip that blank value out and basically show it as no return.
I know there are modules for iptables etc however I have to keep this script as generic as possible.
The problem is that, when iptables returns no results, $banned is left at its default value of undef. Used as a regex, $banned matches every string, so your condition:
if ($key =~ $banned) {
always matches. I think what you meant to write was probably
if ($key eq $banned) {
which will fail if either $banned is undef (because $matched was empty or didn't match the regex) or if the IP address you pulled out with the regex was somehow different from $key.
If you're confident that the first IP in the iptables result will be the same as $key then you could simplify your condition to just
if ($match =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
I suggest you put the entire output from iptables -nL into an array and grep it using Perl. That way you will be calling the utility only once, and it is easy to detect an empty list.
If you write
my #iptables = qx(iptables -nL);
at the top of your code, then you can query this output by
my #match = grep /\b$key\b/, #iptables;
and if there are no records that contain the IP address then a subsequent
if (#match) { ... }
will fail.
There are a few other problems with your code. Firstly, you must always use strict and use warnings at the start of your program, and declare all variables at their first point of use. This will uncover many simple errors that you may otherwise easily overlook, and applies especially if you are asking for help with your code.
Your open call should look like
open my $fh, '<', $file or die $!;
together with
while (my $sort = <$fh>) { ... }
And you seem to have missed the point of hashes. There is no need to read through all of the keys in a hash looking for a match, as the hash elements can be accessed directly with $ips{$sort}. If the value returned is undef then the element doesn't exist, or you can explicitly check for its existence with if (exists $ips{$sort}) { ... }.
I cannot help further as I have no access to a platform that provides iptables. If you need more help then please post some output from the utility.

Removing newline character from a string in Perl

I have a string that is read from a text file, but in Ubuntu Linux, and I try to delete its newline character from the end.
I used all the ways. But for s/\n|\r/-/ (I look whether it finds any replaces any new line string) it replaces the string, but it still goes to the next line when I print it. Moreover, when I used chomp or chop, the string is completely deleted. I could not find any other solution. How can I fix this problem?
use strict;
use warnings;
use v5.12;
use utf8;
use encoding "utf-8";
open(MYINPUTFILE, "<:encoding(UTF-8)", "file.txt");
my #strings;
my #fileNames;
my #erroredFileNames;
my $delimiter;
my $extensions;
my $id;
my $surname;
my $name;
while (<MYINPUTFILE>)
{
my ($line) = $_;
my ($line2) = $_;
if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
#chop($line2);
$line2 =~ s/^\n+//;
print $line2 . " WRONG FORMAT!\n";
}
else {
#print "INSERTED:".$13."\n";
my($id) = $13;
my($name) = $2;
print $name . "\t" . $id . "\n";
unshift(#fileNames, $line2);
unshift(#strings, $line2 =~ /[^\W_]+/g);
}
}
close(MYINPUTFILE);
The correct way to remove Unicode linebreak graphemes, including CRLF pairs, is using the \R regex metacharacter, introduced in v5.10.
The use encoding pragma is strongly deprecated. You should either use the use open pragma, or use an encoding in the mode argument on 3-arg open, or use binmode.
use v5.10; # minimal Perl version for \R support
use utf8; # source is in UTF-8
use warnings qw(FATAL utf8); # encoding errors raise exceptions
use open qw(:utf8 :std); # default open mode, `backticks`, and std{in,out,err} are in UTF-8
while (<>) {
s/\R\z//;
...
}
You are probably experiencing a line ending from a Windows file causing issues. For example, a string such as "foo bar\n", would actually be "foo bar\r\n". When using chomp on Ubuntu, you would be removing whatever is contained in the variable $/, which would be "\n". So, what remains is "foo bar\r".
This is a subtle, but very common error. For example, if you print "foo bar\r" and add a newline, you would not notice the error:
my $var = "foo bar\r\n";
chomp $var;
print "$var\n"; # Remove and put back newline
But when you concatenate the string with another string, you overwrite the first string, because \r moves the output handle to the beginning of the string. For example:
print "$var: WRONG\n";
It would effectively be "foo bar\r: WRONG\n", but the text after \r would cause the following text to wrap back on top of the first part:
foo bar\r # \r resets position
: WRONG\n # Second line prints and overwrites
This is more obvious when the first line is longer than the second. For example, try the following:
perl -we 'print "foo bar\rbaz\n"'
And you will get the output:
baz bar
The solution is to remove the bad line endings. You can do this with the dos2unix command, or directly in Perl with:
$line =~ s/[\r\n]+$//;
Also, be aware that your other code is somewhat horrific. What do you for example think that $13 contains? That'd be the string captured by the 13th parenthesis in your previous regular expression. I'm fairly sure that value will always be undefined, because you do not have 13 parentheses.
You declare two sets of $id and $name. One outside the loop and one at the top. This is very poor practice, IMO. Only declare variables within the scope they need, and never just bunch all your declarations at the top of your script, unless you explicitly want them to be global to the file.
Why use $line and $line2 when they have the same value? Just use $line.
And seriously, what is up with this:
if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
That looks like an attempt to obfuscate, no offence. Three nested negations and a bunch of unnecessary parentheses?
First off, since it is an if-else, just swap it around and reverse the regular expression. Second, [^\W_] a double negation is rather confusing. Why not just use [A-Za-z0-9]? You can split this up to make it easier to parse:
if ($line =~ /^(.+)(\.docx)\s*$/) {
my $pre = $1;
my $ext = $2;
You can wipe the linebreaks with something like this:
$line =~ s/[\n\r]//g;
When you do that though, you'll need to change the regex in your if statement to not look for them. I also don't think you want a /g in your if. You really shouldn't have a $line2 either.
I also wouldn't do this type of thing:
print $line2." WRONG FORMAT!\n";
You can do
print "$line2 WRONG FORMAT!\n";
... instead. Also, print accepts a list, so instead of concatenating your strings, you can just use commas.
You can do something like:
=~ tr/\n//
But really chomp should work:
while (<filehandle>){
chomp;
...
}
Also s/\n|\r// only replaces the first occurrence of \r or \n. If you wanted to replace all occurrences you would want the global modifier at the end s/\r|\n//g.
Note: if you're including \r for windows it usually ends its line as \r\n so you would want to replace both (e.g. s/(?:\r\n|\n)//), of course the statement above (s/\r|\n//g) with the global modifier would take care of that anyways.
$variable = join('',split(/\n/,$variable))

Quote - capture - question

Could someone explain, why I can use $1 two times and get different results?
perl -wle '"ok" =~ /(.*)/; sub { "huh?" =~ /(.*)/; print for #_ }->( "$1", $1 )'
(Found in: How to exclude submatches in Perl?)
The #_ argument array doesn't behave the way you think it does. The values in #_ in a subroutine are actually aliases for the real arguments:
The array #_ is a local array, but its elements are aliases for the actual scalar parameters.
When you say this:
sub s {
"huh?" =~ /(.*)/;
print for #_;
}
"ok" =~ /(.*)/;
s("$1", $1);
The $1 in the first argument to s is immediately evaluated by the string interpolation but the second argument is not evaluated, it is just noted that the second value in the sub's version of #_ is $1 (the actual variable $1, not its value). Then, inside s, the value of $1 is changed by your regular expression. And now, your #_ has an alias for the string "ok" followed by an alias for $1, these aliases are resolved by the print in your loop.
If you change the function to this:
sub s {
my #a = #_;
"huh?" =~ /(.*)/;
print for #a;
}
or even this:
sub s {
local $1;
"huh?" =~ /(.*)/;
print for #_;
}
Then you'll get the two lines of "ok" that you're expecting. The funny (funny peculiar, not funny ha-ha) is that those two versions of s produce your expected result for different reasons. The my #a = #_; version extracts the current values of the aliases in #_ before the regular expression gets its hands on $1; the local $1; version localizes the $1 variable to the sub leaving the alias in #_ referencing the version of $1 from outside the sub:
A local modifies the listed variables to be local to the enclosing block, file, or eval.
Oddities like this are why you should always copy the values of the numbered regex capture variables to variables of your as soon as possible and why you want to unpack #_ right at the beginning of your functions (unless you know why you don't want to do that).
Hopefully I haven't butchered the terminology too much, this is one of those weird corners of Perl that I've always stayed away from because I don't like juggling razor blades.
The sample code makes use of two facts:
The elements of the #_ array are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (and vice versa).
$1 is a global variable (albeit dynamically scoped to the current BLOCK), which automatically contains the subpattern from () from the last successful pattern match.
The first argument to the subroutine is an ordinary string ("ok"). The second argument is the global variable $1. But it is changed by the successful pattern match inside the subroutine, before the arguments are printed.
That happens because perl passes parameters by reference.
What you are doing is similar to:
my $a = 'ok';
sub foo {
$a = 'huh?';
print for #_;
}
my $b = $a;
foo($b, $a)
When the sub foo is called, $_[1] is actually an alias for $a and so its value gets modified when $a is modified.