I'm struggling to work out what I'm doing wrong here.
The goal of my code is to read a file, movie_script.txt, and then use regexes to sort each line into an array based on the character speaking the line. It does work, but I'm getting output preceded by these warnings:
Use of uninitialized value $char in string eq at filter.pl line 24, <$fh> line 13.
Use of uninitialized value $char in string eq at filter.pl line 26, <$fh> line 13.
Use of uninitialized value $char in string eq at filter.pl line 28, <$fh> line 13.
[...]
Hello, mother.
Oh. Well-- well, I, uh--
Well, uh, I think they must have popped by for something.
Mm, they-- they started following me yesterday.
Here's the code:
use strict;
use warnings;
my $filename = "movie_script.txt";
unless (-e $filename) {
print "Error: File does not exist.";
}
my @brian;
my @mandy;
my @followers;
open(my $fh, '<', $filename);
my $match = qr/^(\w+):(.+)$/i;
while (my $line = <$fh>) {
my $char = "";
my $scriptline = "";
if ($line) {
($char, $scriptline) = $line =~ $match;
if ($char eq "BRIAN") {
push(@brian, $scriptline);
} elsif ($char eq "MANDY") {
push(@mandy, $scriptline);
} elsif ($char eq "FOLLOWERS") {
push(@followers, $scriptline);
} else {
print($line);
}
}
}
foreach (@brian) {
print "$_\n";
}
I suspect that the problem is a line doesn't fit my regex, and it's causing problems for the variables $char and $scriptline, but I don't know how to confirm whether that's true, or how to find out which line is causing the problem.
I've tried running the Perl debugger using `perl -d`, but when I step through each line, I can't find the error. I've tried to set a breakpoint around the `else { print($line) }` line, but I can't work out how to run the debugger until it reaches that line.
Is there an obvious reason in my code why I'm getting an uninitialized value problem?
If you've got lines in the (movie) script that do not have a character speaking them, your regex will not match, and $char and $scriptline will not have values. You will want to skip those lines in some way.
There are many ways to do that, but one way would be to move the match operator into the if condition:
if (($char, $scriptline) = $line =~ $match) {
if ($char eq "BRIAN") {
push(@brian, $scriptline);
} elsif ($char eq "MANDY") {
push(@mandy, $scriptline);
} elsif ($char eq "FOLLOWERS") {
push(@followers, $scriptline);
} else {
print($line);
}
}
The script will now ignore all lines that aren't dialog, push lines spoken by these specific cast members, and print lines spoken by others.
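The reason the condition works is that a list assignment evaluated in scalar (boolean) context returns the number of values on its right-hand side: 2 when the regex matches, 0 when it fails. A minimal self-contained sketch, using a few hypothetical sample lines instead of movie_script.txt:

```perl
use strict;
use warnings;

# Same pattern as in the question.
my $match = qr/^(\w+):(.+)$/i;

my (@brian, @skipped);
for my $line ("BRIAN: Hello, mother.\n", "\n", "(Crowd noise)\n") {
    # List assignment in boolean context yields the number of values on the
    # right-hand side: 2 on a successful match, 0 on a failed one.
    if (my ($char, $scriptline) = $line =~ $match) {
        push @brian, $scriptline if $char eq 'BRIAN';
    } else {
        push @skipped, $line;    # $char is never compared here
    }
}
print scalar(@brian), " matched, ", scalar(@skipped), " skipped\n";
```

Only the first sample line matches; the blank line and the stage direction fall through to the else branch without touching $char.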
Consider asking Perl to tell you what the problem is.
if ($line) {
if (my ($char, $scriptline) = $line =~ $match) {
# Your existing code here
} else {
warn "Line [$line] doesn't match the regex [$match]\n";
}
}
Note I've also moved the declaration of $char and $scriptline into the smallest possible scope. There's no point in declaring them any earlier or pre-populating them (as you're going to overwrite that data in the match line).
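To find out which line of the file is the culprit, the special variable $. (the current input line number of the last-read filehandle) can be included in the warning; it is the same number Perl reports as "<$fh> line 13" in the original warnings. A sketch, assuming an in-memory filehandle with hypothetical sample text in place of movie_script.txt:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for movie_script.txt.
my $script = "BRIAN: Hello, mother.\n\nMANDY: Come on! What have you been up to, my lad?!\n";
open my $fh, '<', \$script or die "Cannot open in-memory file: $!";

my $match = qr/^(\w+):(.+)$/;
my (%count, @bad_lines);
while (my $line = <$fh>) {
    chomp $line;
    if (my ($char, $scriptline) = $line =~ $match) {
        $count{$char}++;
    } else {
        # $. holds the line number of the last-read filehandle.
        push @bad_lines, $.;
        warn "Line $. doesn't match the regex: [$line]\n";
    }
}
close $fh;
```

Here the empty second line is reported. In the debugger, the equivalent is to set a breakpoint with `b LINE` on the else branch and then continue with `c`.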
The reason for your uninitialized warnings is that some lines in your file do not match the regex, but you still use the variables. The solution is to not use the variables when the regex does not match by checking with an if statement as shown in the example below.
A few quick points.
Consider using a hash instead of a number of arrays to store your lines. This will make the script re-usable and flexible.
It is unnecessary to hard code a file name. You can simply supply the file name on the command line, and use the diamond operator <> to read the file: while (my $line = <>) { ....
And run it with
$ lines.pl movie_script.txt
You do not need to add variables for the things you match with your regex. That is already taken care of since you use parentheses () in your regex. The matches are stored in pre-defined variables $1 and $2.
You capture leading spaces in your lines, which can be fixed by adding \s* in the regex in front of the second parenthesis.
You are using the /i modifier in your regex, which has no effect, since your regex does not contain any literal letters. (E.g. if you did /foo/i it would match FOO.)
You can use the /s modifier to allow .+ to match the newline, so you do not have to add it back later.
Your program is a very basic, typical perl one-liner, and it doesn't need to be complicated. In this example, I am using the Data::Dumper module to show you what the resulting data structure looks like:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1; # show newline in Dumper print
my %lines;
my $match = qr/^(\w+):\s*(.+)/s; # remove unused /i, add /s
while (my $line = <DATA>) {
if ($line =~ $match) { # Check before you use the variables
push #{$lines{$1}}, $2; # Store the matches in your hash
} # <--- if you want to do something with unmatched lines, use else
}
print for (@{$lines{BRIAN}});
print Dumper \%lines;
__DATA__
BRIAN: Hello, mother.
MANDY: Hi
BRIAN: Oh. Well-- well, I, uh--
FOLLOWERS: Hello
(I used the <DATA> filehandle to simulate your text file inside my program, just replace <DATA> with <>)
This program will output
Hello, mother.
Oh. Well-- well, I, uh--
$VAR1 = {
"BRIAN" => [
"Hello, mother.\n",
"Oh. Well-- well, I, uh--\n"
],
"MANDY" => [
"Hi\n"
],
"FOLLOWERS" => [
"Hello\n"
]
};
It will work with different file names, and it will capture the lines of any named character, not just the three you hard-coded.
The following Perl script:
takes a filename to process, or uses the default filename `movie_script.txt`
looks in the file for the pattern CHARACTER: LINE and fills a hash with the data
sorts the characters and prints their lines
use strict;
use warnings;
use feature 'say';
my $filename = shift || 'movie_script.txt';
my $match = qr/^(\w+):(.+)$/i;
my %script;
open my $fh, '<', $filename
or die "Couldn't open $filename";
while(<$fh>) {
next if /^\s*\Z/;
push @{$script{$1}}, $2 if /$match/;
}
close $fh;
for my $char ( sort keys %script ) {
say $char;
say "\t$_" for @{$script{$char}};
}
Output
BRIAN
Oh. Well-- well, I, uh--
Well, uh, I think they must have popped by for something.
Mm, they-- they started following me yesterday.
FOLLOWERS
The Messiah! The Messiah! Show us the Messiah!
The Messiah!
The Messiah! The Messiah!
Show us the Messiah! The Messiah! The Messiah! Show us the Messiah!
MANDY
Don't you 'hello mother' me. What are all those people doing out there?!
Come on! What have you been up to, my lad?!
'Popped by'?! 'Swarmed by', more like! There's a multitude out there!
Well, they can stop following you right now. Now, stop following my son! You ought to be ashamed of yourselves.
The who?
Huh, there's no Messiah in here. There's a mess, all right, but no Messiah. Now, go away!
Ooooh.
Now, you listen here! He's not the Messiah. He's a very naughty boy! Now, go away!
RIAN
Hello, mother.
I am trying to deobfuscate code. This code uses a lot of long variable names which are substituted with meaningful names at the time of running the code.
How do I preserve the state while searching and replacing?
For instance, with an obfuscated line like this:
${${"GLOBALS"}["ttxdbvdj"]}=_hash(${$urqboemtmd}.substr(${${"GLOBALS"}["wkcjeuhsnr"]},${${"GLOBALS"}["gjbhisruvsjg"]}-${$rrwbtbxgijs},${${"GLOBALS"}["ibmtmqedn"]}));
There are multiple mappings in mappings.txt which match above obfuscated line like:
$rrwbtbxgijs = hash_length;
$urqboemtmd = out;
At the first run, it will replace $rrwbtbxgijs with hash_length in the obfuscated line above. Now, when it comes across the second mapping during the next iteration of the outer while loop, it will replace $urqboemtmd with out in the obfuscated line.
The problem is:
When it comes across first mapping, it does the substitution. However, when it comes across next mapping in the same line for a different matching string, the previous search/replace result is not there.
It should preserve the previous substitution. How do I do that?
I wrote a Perl script, which would pick one mapping from mapping.txt and search the entire obfuscated code for all the occurrences of this mapping and replace it with the meaningful text.
Here is the code I wrote:
#! /usr/bin/perl
use warnings;
($mapping, $input) = @ARGV;
open MAPPING, '<', $mapping
or die "couldn't read from the file, $mapping with error: $!\n";
while (<MAPPING>) {
chomp;
$line = $_;
($key, $value) = split("=", $line);
open INPUT, '<', $input;
while (<INPUT>) {
chomp;
if (/$key/) {
$_=~s/\Q$key/$value/g;
print $_,"\n";
}
}
close INPUT;
}
close MAPPING;
To match the literal meta characters inside your string, you can use quotemeta or:
s/\Q$key\E/$replace/
Just tell Perl not to interpret the characters in $key:
s/\Q$key/$value/g
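The "lost" substitutions in the question come from reopening INPUT for every mapping, so each mapping is applied to the original text again. A sketch of keeping the state: load all mappings first, then apply every one of them to the same string before printing it. It assumes mappings.txt lines look like `$key = value;` (the whitespace and trailing `;` are stripped, an assumption about the file layout), and uses \Q...\E so the `$` in each key is a literal character rather than an end-of-string anchor:

```perl
use strict;
use warnings;

# Hypothetical mapping lines in the "key = value;" format shown in the question.
my @mapping_lines = ('$rrwbtbxgijs = hash_length;', '$urqboemtmd = out;');
my @pairs;
for my $m (@mapping_lines) {
    # Assumed layout: key, '=', value, optional trailing ';'.
    my ($key, $value) = $m =~ /^\s*(\S+)\s*=\s*(.*?);?\s*$/ or next;
    push @pairs, [$key, $value];
}

my $line = '${$urqboemtmd}.substr($x,$y-${$rrwbtbxgijs},$z)';
# Every mapping is applied to the SAME string, so each substitution
# builds on the result of the previous one.
for my $pair (@pairs) {
    my ($key, $value) = @$pair;
    $line =~ s/\Q$key\E/$value/g;
}
print "$line\n";
```

For a whole file, the same inner loop would run once per input line, with the mappings loaded only once before reading the input.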
Consider using B::Deobfuscate and gradually enter variable names into its configuration file as you figure out what they do.
I'm a little confused about your request to save state. What exactly are you doing/do you intend to do with the output? Here's an (untested) example of doing all the substitutions in one pass, if that helps?
my %map;
while ( my $line = <MAPPING> ) {
chomp $line;
my ($key, $value) = split("=", $line);
$map{$key} = $value;
}
close MAPPING;
my $search = qr/(@{[ join '|', map quotemeta, sort { length $b <=> length $a } keys %map ]})/;
while ( my $line = <INPUT> ) {
$line =~ s/$search/$map{$1}/g;
print OUTPUT $line;
}
open (FH,"report");
read(FH,$text,-s "report");
$fill{"place"} = "Dhahran";
$fill{"wdesc:desc"} = "hot";
$fill{"dayno.days"} = 4;
$text =~ s/%(\w+)%/$fill{$1}/g;
print $text;
This is the content of the "report" template file
"I am giving a course this week in %place%. The weather is %wdesc:desc%
and we're now onto day no %dayno.days%. It's great group of blokes on the
course but the room is like the weather - %wdesc:desc% and it gets hard to
follow late in the day."
For reasons that I won't go into, some of the keys in the hash I'll be using will have dots (.) or colons (:) in them, but the regex stops working for these, so for instance in the example above only %place% gets correctly replaced. By the way, my code is based on this example.
Any help with the regex greatly appreciated, or maybe there's a better approach...
You could loosen it right up and use "any sequence of anything that isn't a %" for the replaceable tokens:
$text =~ s/%([^%]+)%/$fill{$1}/g;
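A quick self-contained check of that pattern against the keys from the question, with the template text inlined as a string instead of being read from the "report" file:

```perl
use strict;
use warnings;

my %fill = (
    'place'      => 'Dhahran',
    'wdesc:desc' => 'hot',
    'dayno.days' => 4,
);
my $text = "I am giving a course this week in %place%. "
         . "The weather is %wdesc:desc% and we're now onto day no %dayno.days%.";

# [^%]+ accepts any run of non-% characters, so keys containing '.' or ':'
# are captured whole.
$text =~ s/%([^%]+)%/$fill{$1}/g;
print "$text\n";
```

All three tokens are replaced, including the ones whose keys contain a dot or a colon.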
Good answers so far, but you should also decide what you want to do with %foo% if foo isn't a key in the %fill hash. Plausible options are:
Replace it with an empty string (that's what the current solutions do, since undef is treated as an empty string in this context)
Leave it alone, so "%foo%" stays as it is.
Do some kind of error handling, perhaps printing a warning on STDERR, terminating the translation, or inserting an error indicator into the text.
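One way to combine the second and third options is the /e modifier, which evaluates the replacement as Perl code for each match. In this sketch, fill_token is a hypothetical helper name; unknown keys are left in place and collected so a warning can be printed afterwards:

```perl
use strict;
use warnings;

my %fill = ('place' => 'Dhahran', 'dayno.days' => 4);
my @unknown;

# Hypothetical helper: returns the replacement for one %key% token.
# Known keys are filled in; unknown ones are recorded and left as-is.
sub fill_token {
    my ($key) = @_;
    return $fill{$key} if exists $fill{$key};
    push @unknown, $key;
    return "%$key%";
}

my $text = "Course in %place%, day %dayno.days%, room %room%.";
$text =~ s/%([^%]+)%/fill_token($1)/ge;
print "$text\n";
warn "Unknown key(s): @unknown\n" if @unknown;
```

Switching to the "empty string" or "die" behaviour only means changing the last two lines of fill_token.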
Some other observations, not directly relevant to your question:
You should use the three-argument version of open.
That's not the cleanest way to read an entire file into a string. For that matter, for what you're doing you might as well process the input one line at a time.
Here's how I might do it (this version leaves unrecognized "%foo%" strings alone):
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
foreach my $key (keys %fill) {
$line =~ s/\Q%$key%/$fill{$key}/g;
}
print $line;
}
And here's a version that dies with an error message if there's an unrecognized key:
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
$line =~ s/%([^%]*)%/Replacement($1)/eg;
print $line;
}
sub Replacement {
my($key) = @_;
if (exists $fill{$key}) {
return $fill{$key};
}
else {
die "Unrecognized key \"$key\" on line $.\n";
}
}
http://codepad.org/G0WEDNyH
$text =~ s/%([a-zA-Z0-9_\.\:]+)%/$fill{$1}/g;
For plain ASCII text, \w is equivalent to [a-zA-Z0-9_], so you'll need to add \. and \: to the character class.
I am reading a string from a file:
2343,0,1,0 ... 500 times ...3
Above is an example of $_ when it is read from a file. It is any number, followed by 500 comma separated 0's/1's then the number 3.
while(<FILE>){
my $string = $_;
chomp($string);
my $a = chop($string);
my $found;
if($string=~m/^[0-9]*\,((0,|1,){$i})/){
$found = $&.$a;
print OTH $found,"\n";
}
}
I am using chop to get the number 3 from the end of the string. Then I match the first number followed by $i occurrences of 0, or 1,. The problem I'm having is that chop is not working on the string for some reason. In the if statement, when I try to concatenate the match and the chopped number, all I get returned is the contents of $&.
I have also tried using my $a = substr $a,-1,1; to get the number 3 and this also hasn't worked.
The thing that's odd is that this code works in Eclipse on Windows, and when I put it onto a Linux server it won't work. Can anyone spot the silly mistake I'm making?
As a rule, I tend always to allow for unseen whitespace in my data. I find that it makes my code more robust expecting that somebody didn't see an extra space at the end of a line or string (as in writing to a log). So I think this would solve your problem:
my ( $a ) = $string =~ /(\S)\s*$/;
Of course, since you know you are looking for a number, it's better to be more precise:
my ( $a ) = $string =~ /(\d+)\s*$/;
Take care of the end-of-line character. I cannot test it here, but I assume you are just chopping a newline. Try trimming your string first, then chop it. See for example http://www.somacon.com/p114.php
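A likely explanation for "works on Windows, fails on Linux" (an assumption, not confirmed in the question) is a data file with Windows CRLF line endings. Windows Perl translates CRLF to "\n" on read, but on Linux chomp removes only the "\n", so chop then removes the leftover "\r" instead of the 3. A sketch contrasting that with stripping the line ending explicitly:

```perl
use strict;
use warnings;

# A line as read on Linux from a file saved with CRLF line endings.
my $string = "2343,0,1,0,1,3\r\n";
chomp $string;               # removes only "\n" ($/ is "\n")
my $naive = chop $string;    # removes the leftover "\r", not the "3"

# More robust: strip an optional CR before the LF, then take the last number.
my $line = "2343,0,1,0,1,3\r\n";
$line =~ s/\r?\n\z//;
my ($last) = $line =~ /(\d+)\s*\z/;

printf "naive chop got %s, regex got %s\n", ($naive eq "\r" ? '\r' : $naive), $last;
```

Running the file through dos2unix on the server would be an alternative to handling "\r" in the code.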
Instead of trying to do it that way, why not use a regexp to pull out everything you need in one go?
my $x = "4123,0,1,0,1,4";
$x =~ /^[0-9]+,((?:0,|1,){4})([0-9]+)/;
print "$1\n$2\n";
Produces:
0,1,0,1,
4
Which is pretty much what you're looking for. Both sets of needed answers are in the match variables.
Note that I included ?: in the front of the 0,1, matching so that it didn't end up in the output match variables.
I'm really not sure what you are trying to achieve here but I've tried the code on Win32 and Solaris and it works. Are you sure $i is the correct number? Might be easier to use * or ?
use strict;
use warnings;
while(<DATA>){
my $string = $_;
chomp($string);
my $a = chop($string);
print "$string\n";
my $found;
if($string=~m/^[0-9]*\,((0,|1,)*)/){
$found = $&.$a;
print $found,"\n";
}
}
__DATA__
2343,0,1,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,3
I don't see much reason to use a regex in this case; just use split.
use strict;
use warnings;
use autodie; # open will now die on failure
my %data;
{
# limit the scope of $fh
open my $fh, '<', 'test.data';
while(<$fh>){
chomp;
s(\s+){}g; # remove all spaces
my($number,@bin) = split ',', $_;
# uncomment if you want to throw away the 3
# pop @bin if $bin[-1] == 3;
$data{$number} = \@bin;
}
close $fh;
}
If all you want is the 3:
while(<$fh>){
# anchoring at the end of the line captures the whole last number
my($last_number) = /([0-9]+)\s*$/;
}
Still plugging away at teaching myself Perl. I'm trying to write some code that will count the lines of a file that contain double letters and then place parentheses around those double letters.
Now what I've come up with will find the first occurrence of double letters, but not any other ones. For instance, if the line is:
Amp, James Watt, Bob Transformer, etc. These pioneers conducted many
My code will render this:
19 Amp, James Wa(tt), Bob Transformer, etc. These pioneers conducted many
The "19" is the count (of lines containing double letters) and it gets the "tt" of "Watt" but misses the "ee" in "pioneers".
Below is my code:
$file = '/path/to/file/electricity.txt';
open(FH, $file) || die "Cannot open the file\n";
my $counter=0;
while (<FH>) {
chomp();
if (/(\w)\1/) {
$counter += 1;
s/$&/\($&\)/g;
print "\n\n$counter $_\n\n";
} else {
print "$_\n";
}
}
close(FH);
What am I overlooking?
use strict;
use warnings;
use 5.010;
use autodie;
my $file = '/path/to/file/electricity.txt';
open my $fh, '<', $file;
my $counter = 0;
while (<$fh>) {
chomp;
if (/(\w)\1/) {
$counter++;
s/
(?<full>
(?<letter>\p{L})
\g{letter}
)
/($+{full})/xg;
$_ = $counter . ' ' . $_;
}
say;
}
You are overlooking a few things: strict and warnings; 5.010 (or higher!) for say; autodie, so you don't have to keep typing those 'or die's; lexical filehandles and the three-argument form of open; a bit nitpicky, but knowing when (and when not) to use parens for function calls; understanding why you shouldn't use $&; and the autoincrement operator.
On the regex part specifically: I originally claimed here that $& is only set by matches (m//), not by substitutions, but ysth is right as usual and s/// sets it too. Sorry!
(I took the liberty of modifying your regex a bit. It uses named captures, (?<name>...) instead of bare parens, which are accessed through the \g{name} notation inside the regex and through the %+ hash outside of it, and Unicode-style properties, \p{...}. There is a lot more about those in perlre and perluniprops, respectively.)
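A minimal sketch of how the named capture, the \g{letter} back-reference and the %+ hash fit together on a single word (requires Perl 5.10 or later):

```perl
use strict;
use warnings;

my $word = 'pioneers';
my ($full, $letter);
# (?<letter>\p{L}) names a one-letter capture; \g{letter} requires the same
# letter again; %+ exposes both named captures after a successful match.
if ($word =~ /(?<full>(?<letter>\p{L})\g{letter})/) {
    ($full, $letter) = ($+{full}, $+{letter});
}
print "full=$full letter=$letter\n";
```

This finds the "ee" in "pioneers", which is the pair the original code missed.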
You need to use a back reference:
#! /usr/bin/env perl
use warnings;
use strict;
my $line = "this is a doubble letter test of my scrippt";
$line =~ s/([[:alpha:]])(\1)/($1$2)/g;
print "$line\n";
And now the test.
$ ./test.pl
this is a dou(bb)le le(tt)er test of my scri(pp)t
It works!
When you do a substitution, you use the $1 to represent what is in the parentheses. When you are referring to a part of the regular expression itself, you use the \1 form.
The [[:alpha:]] is a special POSIX class. You can find out more information by typing in
$ perldoc perlre
at the command line.
You're overcomplicating things by messing around with $&. s///g returns the number of substitutions performed when used in scalar context, so you can do it all in one shot without needing to count matches by hand or track the position of each match:
#!/usr/bin/env perl
use strict;
use warnings;
my $text = 'James Watt, a pioneer of wattage engineering';
my $doubles = $text =~ s/(\w)\1/($1$1)/g;
print "$doubles $text\n";
Output:
4 James Wa(tt), a pion(ee)r of wa(tt)age engin(ee)ring
Edit: OP stated in comments that the exercise in question says not to use =~, so here's a non-regex-based solution, since all regex matches use =~ (implicitly or explicitly):
#!/usr/bin/env perl
use strict;
use warnings;
my $text = 'James Watt, a pioneer of wattage engineering';
my $doubles = 0;
for my $i (reverse 1 .. length $text) {
if (substr($text, $i, 1) eq substr($text, $i - 1, 1)) {
$doubles++;
substr($text, $i - 1, 2) = '(' . substr($text, $i - 1, 2) . ')';
}
}
print "$doubles $text\n";
The problem is that you're using $& in the second regex, and $& only holds the first occurrence of a double-letter pair found by the first match:
if (/(\w)\1/) { #first occurance matched, so the pattern in the replace regex will only be that particular set of double letters
Try doing something like this:
s/(\w)\1/\($1$1\)/g; instead of s/$&/\($&\)/g;
Full code after editing:
$file = '/path/to/file/electricity.txt';
open(FH, $file) || die "Cannot open the file\n";
my $counter=0;
while (<FH>) {
chomp();
if (s/(\w)\1/\($1$1\)/g) {
$counter++;
print "\n\n$counter $_\n\n";
} else {
print "$_\n";
}
}
close(FH);
Notice that you can use the s///g replacement in a conditional statement; it is true when a replacement occurred.
I'm writing a tiny program that takes user input using Getopt, and based on it, the program will either try to match a pattern against some text, or substitute text for what matched.
The problem I'm having is that I can't get the substitution portion to work. I'm looking at the qr// entry in the man pages: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators but I'm not having any luck with it. I tried to model my code exactly like the docs in this case. I compile a match pattern, and substitute that into a substitution.
Could someone point out where I'm going wrong? (Don't worry about security too much, this is only a little script for personal use)
Here's what I'm looking at:
if($options{r}){
my $pattern = $options{r};
print "\nEnter Replacement text: ";
my $rep_text = <STDIN>;
#variable grab, add flags to pattern if they exist.
$pattern .= 'g' if $options{g};
$pattern .= 'i' if $options{i};
$pattern .= 's' if $options{s};
#compile that stuff
my $compd_pattern = qr"$pattern" or die $@;
print $compd_pattern; #debugging
print "Please enter the text you wish to run the pattern on: ";
my $text = <STDIN>;
chomp $text;
#do work and display
if($text =~ s/$compd_pattern/$rep_text/){ #if the text matched or whatever
print $text;
}
else{
print "$compd_pattern on \n\t{$text} Failed. ";
}
} #end R FLAG
When I run it with -r "/matt/" -i, and enter the replacement text 'matthew', on the text 'matt', it fails. Why is this?
EDIT:
Thanks for the answers, guys! That was really very helpful. I combined both of your suggestions into a working solution to the problem. I have to handle the /g flag a little differently. Here is the working sample:
if($options{r}){
my $pattern = $options{r};
print "\nEnter Replacement text: ";
my $rep_text = <STDIN>;
chomp $rep_text;
#variable grab, add flags to pattern if they exist.
my $pattern_flags = '';
$pattern_flags .= 'i' if $options{i};
$pattern_flags .= 's' if $options{s};
print "Please enter the text you wish to run the pattern on: ";
my $text = <STDIN>;
chomp $text;
#do work and display
if($options{g}){
if($text =~ s/(?$pattern_flags:$pattern)/$rep_text/g){ #if the text matched or whatever (with the g flag)
print $text;
}
else{
print "$pattern on \n\t{$text} Failed. ";
}
}
else{
if($text =~ s/(?$pattern_flags:$pattern)/$rep_text/){ #if the text matched or whatever
print $text;
}
else{
print "$pattern on \n\t{$text} Failed. ";
}
}
} #end R FLAG
As chaos points out, you will encounter some difficulties using qr//. Do you really need to precompile the pattern? If not, a strategy like this might work:
my $pattern = 'matt';
my $text = 'Matt';
my $rep_text = 'Matthew';
my $pattern_opts = 'i';
print $text, "\n" if $text =~ s/(?$pattern_opts:$pattern)/$rep_text/;
Update in response to your new code: you might consider using an approach like this:
my ($orig, $patt, $rep, $flags) = qw(FooFooFoo foo bar ig);
my $make_replacement = $flags =~ s/g// ?
sub { $_[0] =~ s/(?$flags:$patt)/$rep/g } :
sub { $_[0] =~ s/(?$flags:$patt)/$rep/ }
;
if ( $make_replacement->($orig) ){
print $orig;
}
else {
print "Failed...";
}
Run it with -r "matt", not -r "/matt/". You don't need to, and in fact can't, supply pattern delimiters in your option string. The quotes are the delimiters in your qr. So it's actually looking for matt with slashes around it, the way you're running it, which isn't what you want. You're trying to use the quotes to tell Perl to treat your pattern string like it were source code, but unfortunately you can't do that.
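A small check of that point: with the quotes as qr's delimiters (which still interpolate), the slashes typed on the command line become literal characters in the pattern:

```perl
use strict;
use warnings;

my $from_cli     = '/matt/';         # what -r "/matt/" delivers
my $with_slashes = qr"$from_cli";    # pattern is /matt/ with literal slashes
my $plain        = qr"matt";

my $hits_plain_text   = ('matt'   =~ $with_slashes) ? 1 : 0;
my $hits_slashed_text = ('/matt/' =~ $with_slashes) ? 1 : 0;
my $plain_hits        = ('matt'   =~ $plain)        ? 1 : 0;
print "$hits_plain_text $hits_slashed_text $plain_hits\n";
```

Only text that itself contains the slashes matches the first pattern.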
All those pattern appends you're doing for the other options also won't work. You'll need to change the way you compile the regex if you want to do all that. Something like this might do it for /i and /s:
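# Re-wrapping with qr/$compd_pattern/i would not work: an interpolated qr//
# keeps its own modifiers, so build the inline flag group once instead.
my $flags = '';
$flags .= 'i' if $options{i};
$flags .= 's' if $options{s};
my $compd_pattern = qr/(?$flags:$pattern)/;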
For /g you'll need to support an alternate version of the search/replace. /g isn't a valid modifier to qr//.
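A sketch of the whole flow, with a hypothetical %options hash standing in for the Getopt::Std result. It also shows why adding flags by re-wrapping a qr// object does not help: the interpolated inner object keeps its own modifiers, so the outer /i is ignored inside it. Building an inline (?flags:...) group once does apply them, and /g is handled by choosing between two substitutions:

```perl
use strict;
use warnings;

# Hypothetical option values standing in for %options from Getopt::Std.
my %options = (i => 1, g => 1);
my ($pattern, $rep_text, $text) = ('matt', 'Matthew', 'matt and Matt');

# Re-wrapping: the inner qr// keeps its default (case-sensitive) modifiers.
my $inner   = qr/$pattern/;
my $wrapped = qr/$inner/i;
my $wrapped_hits = ('MATT' =~ $wrapped) ? 1 : 0;

# Inline modifier group built once from the option flags.
my $flags = '';
$flags .= 'i' if $options{i};
$flags .= 's' if $options{s};
my $re = qr/(?$flags:$pattern)/;

# /g belongs to the substitution, not the pattern, so branch on it.
# s/// returns the number of replacements made.
my $count = $options{g} ? ($text =~ s/$re/$rep_text/g)
                        : ($text =~ s/$re/$rep_text/);
print "$wrapped_hits $count $text\n";
```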