Why doesn't the first replacement have any effect? - regex

Most probably I'm missing something obvious here, but why do I need to call the search/replace regex twice to have any effect in the following code? If I call it only once, the replacement doesn't take place :-(
use strict;
use warnings;
use LWP::Simple;
my $youtubeCN = get(shift @ARGV);
die("Script tag not found!\n")
unless $youtubeCN =~ /<script src="(.*?)">/;
my $youtubeScr = $1;
# WHY ???
$youtubeScr =~ s/&amp;/&/g;
$youtubeScr =~ s/&amp;/&/g;
my $gmodScr = get($youtubeScr);
$gmodScr =~ s/http:\/\/\?container/http:\/\/www.gmodules.com\/ig\/ifr\?/;
print "<script type=\"text/javascript\">$gmodScr</script>\n";
Update: I call this script like this:
perl bork_youtube_channel.pl 'http://www.youtube.com/user/pennsays'
If &amp; isn't properly transformed into &, I will get back an HTML page (probably an error page) rather than JavaScript at step 2.
Update: It turns out that the URL was double encoded after all. Thank you all for your help!

I suspect that if you look at the input data, it is doing the right thing - my guess is that in the middle of encoding and decoding, you're not seeing the real input and output. For example, try this:
use strict;
use warnings;
my $youtubeScr = "a&amp;b";
$youtubeScr =~ s/&amp;/&/g;
print $youtubeScr;
print "\n";
$youtubeScr =~ s/&amp;/&/g;
print $youtubeScr;
print "\n";
This prints
a&b
a&b
In other words, the first substitution already did the job.
Are you sure your original text isn't foo&amp;amp;bar? That would give output of
foo&amp;bar
foo&bar
with the above code.
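If the input might be encoded more than once, one option is to repeat the substitution until nothing changes. This is only a sketch: decode_amp_fully is a made-up helper name, and blindly decoding every layer assumes the text never contains a literal &amp; that is meant to survive.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every layer of "&amp;" encoding.
# Returns the decoded text and the number of passes that changed it.
sub decode_amp_fully {
    my ($text) = @_;
    my $passes = 0;
    # s///g returns the number of replacements, so the loop stops
    # as soon as a pass finds nothing left to decode.
    $passes++ while $text =~ s/&amp;/&/g;
    return ($text, $passes);
}

my ($url, $passes) = decode_amp_fully('watch?v=1&amp;amp;hl=en');
print "$url (decoded in $passes passes)\n";
# prints: watch?v=1&hl=en (decoded in 2 passes)
```

A pass count above 1 tells you the input was encoded more than once, which is exactly what turned out to be the case here.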
PS My perl-fu sucks. Apologies for any language abuses in the above code, but I think it should still be helpful :)

Related

finding words surround by quotations perl

I am reading another Perl file line by line and need to find any word or set of words surrounded by single or double quotes. This is an example of the code I am reading in:
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!';
print "$string\n";
Basically, I need to find and print out 'Hello World!' and "$string\n".
I've read my file in fine and stored its contents in an array. From there I loop over each line and try to find the quoted text using a regex like this:
for (@contents) {
if(/\"|\'[^\"|\']*\"|\'/) {
print $_."\n";
}
}
which gives me the following output:
my $string = 'Hello World!';
print "$string\n";
I tried splitting the contents by whitespace and then trying to find a match, but that gives me this:
'Hello
World!'
"$string\n";
I've tried numerous solutions others suggested on here, but to no avail. I have also tried Text::ParseWords and its parse_line function, but that gives me completely wrong output.
Any ideas that could help me?
You just need to add capturing parentheses to your regex and print the capture instead of the whole line:
use strict;
use warnings;
while (<DATA>) {
if(/(["'][^"']*["'])/) {
print "$1\n";
}
}
__DATA__
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!';
print "$string\n";
Note that there are still plenty of flaws in this regex. For example, '\'' won't match properly, and neither will "He said 'boo'". To get closer you'll have to do some balanced-delimiter checking, but there isn't going to be any perfect solution.
For a solution that is a little closer, you could use the following:
if(/('(?:(?>[^'\\]+)|\\.)*'|"(?:(?>[^"\\]+)|\\.)*")/) {
That would take care of my above exceptions and also strings like print "how about ' this \" and ' more \n";, but there are still edge cases like the use of qq{} or q{}. Not to mention strings that span more than one line.
In other words, if your goal is perfect, this project may be outside of the scope of most people's skills, but hopefully the above will be of some help.
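As a rough sketch of how the escape-aware pattern above could be applied to every quoted literal on a line rather than just the first, here it is wrapped in a helper with the /g flag. The name quoted_strings is made up for illustration, and the same limits apply: q{}, qq{} and multi-line strings are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return every single- or double-quoted literal
# found on one line, honouring backslash escapes inside the quotes.
sub quoted_strings {
    my ($line) = @_;
    my $quoted = qr/'(?:(?>[^'\\]+)|\\.)*'|"(?:(?>[^"\\]+)|\\.)*"/;
    # In list context, /g returns all captures at once.
    return $line =~ /($quoted)/g;
}

# q{} keeps the backslashes in the sample line as-is.
my $line = q{print 'it\'s', "say \"hi\" now";};
print "$_\n" for quoted_strings($line);
```

Because the whole alternation is in one capture group, each returned element includes its surrounding quotes.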
There may be more than one "string" to capture per line. One solution for that could be:
while(my $line=<STDIN>) {
while( $line =~ /[\'\"](.*?)[\'\"]/g ) {
print "matched: '$1'\n";
}
}
For example, given this input:
#!/usr/bin/env perl
use strict;
use warnings;
my $string = 'Hello World!' . 'asdsad';
print "$string\n";
and executing the code will give you:
matched: 'Hello World!'
matched: 'asdsad'
matched: '$string\n'

Perl - Match string between two colons

My string looks like this
important stuff: some text 2: some text 3.
I only want to print "important stuff". So basically I want to print everything up to the first colon. I'm sure this is simple, but my regex-fu is not so good.
Edit: Sorry I was doing something stupid and gave you a bad example line. It has been corrected.
Just restrict what you're matching to non-colons, [^:]*. Note, the ^ and : boundaries aren't actually needed, but they help document the intent behind the regex.
my $text = "important stuff: some text 2: some text 3.";
if ($text =~ /^([^:]*):/) {
print "$1";
}
Consider just splitting on the colon:
use strict;
use warnings;
my $string = 'important stuff: some text 2: some text 3.';
my $important = ( split /:/, $string )[0];
print $important;
Output:
important stuff
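A small variation on the split approach, offered as a sketch: passing a limit of 2 lets split stop at the first colon instead of cutting up the whole string. The helper name before_first_colon is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: everything before the first colon.
# With a limit of 2, split produces at most two fields, so the rest
# of the string is never examined for further colons.
sub before_first_colon {
    my ($string) = @_;
    my ($head) = split /:/, $string, 2;
    return $head;
}

print before_first_colon('important stuff: some text 2: some text 3.'), "\n";
# prints: important stuff
```

If the string contains no colon at all, the whole string comes back unchanged; for an empty string the result is undef.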
Well, assume it's a string:
$test = "sass sg22gssg 22222 2222: important important :";
and assume you want all the characters between the two colons.
Wrong answer (greedy, so it can run past an inner colon):
$test =~ /:(.+):/; # thank you for the change from .{1,}
Corrected:
$test =~ /:([^:]*):/;
print $1; # $1 holds the capture; you can also assign it to a variable
$found = $1;
A Perl regex cheat sheet is a handy reference for this.
I did test it.

Manipulating backreferences for substitution in perl

As a part of an attempt to replace scientific numbers with decimal numbers I want to save a backreference into a string variable, but it doesn't work.
My input file is:
,8E-6,
,-11.78E-16,
,-17e+7,
I then run the following:
open FILE, "+<C:/Perl/input.txt" or die $!;
open(OUTPUT, "+>C:/Perl/output.txt") or die;
while (my $lines = <FILE>){
$find = "(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
$noofzeroesbeforecomma = eval("$7-length($4)");
$replace = '"foo $noofzeroesbeforecomma bar"';
$lines =~ s/$find/$replace/eeg;
print (OUTPUT $lines);
}
close(FILE);
I get
foo bar
foo bar
foo bar
where I would have expected
foo 6 bar
foo 14 bar
foo 7 bar
$noofzeroesbeforecomma seems to be empty or nonexistent.
Even with the following adjustment I get an empty result
$noofzeroesbeforecomma = $2;
Only inserting $2 directly in the replace string gives me something (which is then, unfortunately, not what I want).
Can anyone help?
I'm running Strawberry Perl (5.16.1.1-64bit) on a 64-bit Windows 7 machine, and I'm quite inexperienced with Perl.
Your main problem is not using
use strict;
use warnings;
With warnings enabled, Perl would have told you:
Use of uninitialized value $7 in concatenation (.) or string at ...
Use of uninitialized value $4 in concatenation (.) or string at ...
I would recommend you try and find a module that can handle scientific notation, rather than trying to hack your own.
Your code, in working order, might look something like this. As you can see, I have put a q() around your eval string to avoid it being evaluated before $7 and $4 exist. I also removed the eval itself, since a double /ee on top of an eval is somewhat excessive.
use strict;
use warnings;
while (my $lines = <DATA>) {
# single quotes keep the backslash in \. so it matches only a literal dot
my $find = '(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)';
my $noof = q|$7-length($4)|;
$lines =~ s/$find/$noof/eeg;
print $lines;
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
6
14
7
As a side note, not using strict is asking for trouble. Doing it while using a variable name such as $noofzeroesbeforecomma is asking for twice the trouble, as it is rather easy to make typos.
This is not about backreferences but the original problem, transforming numbers from scientific notation. I'm sure there are some cases in which this fails:
#!/usr/bin/env perl
use strict;
use warnings;
use bignum;
for (<DATA>) {
next unless /([+-]?\d+(?:\.\d+)?)[Ee]([+-]\d+)/;
print $1 * 10 ** $2 . "\n";
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
0.000008
-0.000000000000001178
-170000000
I suggest you use the Regexp::Common::number plugin for the Regexp::Common module, which will find all real numbers for you and allow you to replace those that have an exponent marker.
This code shows the idea. Using the -keep option makes the module put each component into one of the $N variables. The exponent marker - e or E - is in $7, so the number can be transformed depending on whether this was present.
use strict;
use warnings;
use Regexp::Common;
my $real_re = $RE{num}{real}{-keep};
while (<>) {
s/$real_re/ $7 ? sprintf '%.20f', $1 : $1 /eg;
print;
}
Output:
Given your example input, this code produces the following. The values can be tidied up further using additional code in the substitution.
,0.00000800000000000000,
,-0.00000000000000117800,
,-170000000.00000000000000000000,
The thing is, Perl can already handle all of those notations itself. Since the standard item of data in Perl is the string, you only need to capture the expression in order to use it. So, use this expression:
/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/
to extract the number from the surrounding text, then use sprintf to format it, as Borodin showed.
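A minimal sketch of that capture-and-format idea, assuming you know how many decimal places you want. expand_scientific and its $places parameter are made up for illustration, and the result is limited by ordinary floating-point precision.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every scientific-notation number in $text
# as a fixed-point number with $places digits after the decimal point.
sub expand_scientific {
    my ($text, $places) = @_;
    # Perl converts the captured string to a number when sprintf
    # formats it with %f, so no manual exponent arithmetic is needed.
    $text =~ s/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/sprintf "%.${places}f", $1/ge;
    return $text;
}

print expand_scientific(',8E-6,', 6), "\n";     # prints: ,0.000008,
print expand_scientific(',-17e+7,', 0), "\n";   # prints: ,-170000000,
```

Numbers without an exponent are left untouched, because the pattern requires the e or E marker.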
However, if it helps you to see a better case of what you tried to do, this works better
my ( $whole, $frac, $expon )
= $line =~ m/(?:,)-?(0|[1-9]\d*)(?:\.(\d*))?[eE]([+\-]?\d+)(?:,)/
;
my $num = $expon - length( $frac );
Why not capture the sign with the exponent anyway, if you're going to do arithmetic with it?
It's better to name your captures and eschew eval when it's not necessary.
The substitution--as is--doesn't make much sense.
Really, since neither the symbols nor the digits are case sensitive, just put a (?i) at the beginning and avoid the E "character class" [Ee]:
/((?i)-?\d+(?:\.\d+)?e[+-]?\d+)/

How to pass a replacing regex as a command line argument to a perl script

I am trying to write a simple perl script to apply a given regex to a filename among other things, and I am having trouble passing a regex into the script as an argument.
What I would like to be able to do is something like this:
> myscript 's/hi/bye/i' hi.h
bye.h
>
I have produced this code
#!/utils/bin/perl -w
use strict;
use warnings;
my $n_args = $#ARGV + 1;
my $regex = $ARGV[0];
for(my $i=1; $i<$n_args; $i++) {
my $file = $ARGV[$i];
$file =~ $regex;
print "OUTPUT: $file\n";
}
I cannot use qr because apparently it cannot be used on replacing regexes (although my source for this is a forum post so I'm happy to be proved wrong).
I would rather avoid passing the two parts in as separate strings and manually doing the regex in the perl script.
Is it possible to pass the regex as an argument like this, and if so what is the best way to do it?
There's more than one way to do it, I think.
The Evial Way:
As you basically send in a regex expression, it can be evaluated to get the result. Like this:
my @args = ('s/hi/bye/', 'hi.h');
my ($regex, @filenames) = @args;
for my $file (@filenames) {
eval("\$file =~ $regex");
print "OUTPUT: $file\n";
}
Of course, following this way will open you to some very nasty surprises. For example, consider passing this set of arguments:
...
my @args = ('s/hi/bye/; print qq{MINE IS AN EVIL LAUGH!\n}', 'hi.h');
...
Yes, it will laugh at you most evailly.
The Safe Way:
my ($regex_expr, @filenames) = @args;
my ($substr, $replace) = $regex_expr =~ m#^s/((?:[^/]|\\/)+)/((?:[^/]|\\/)+)/#;
for my $file (@filenames) {
$file =~ s/$substr/$replace/;
print "OUTPUT: $file\n";
}
As you can see, we parse the expression given to us into two parts, then use these parts to build the full operator. Obviously, this approach is less flexible, but it's much safer.
The Easiest Way:
my ($search, $replace, @filenames) = @args;
for my $file (@filenames) {
$file =~ s/$search/$replace/;
print "OUTPUT: $file\n";
}
Yes, that's right - no regex parsing at all! What happens here is that we decided to take two arguments, a search pattern and a replacement string, instead of a single one. Does that make our script less flexible than the previous one? No, since the previous one had to parse the expression into the same two parts anyway. But now the user clearly understands all the data that is passed to the command, which is usually quite an improvement.
@args in these examples corresponds to the @ARGV array.
The s/a/b/i is an operator, not simply a regular expression, so you need to use eval if you want it to be interpreted properly.
#!/usr/bin/env perl
use warnings;
use strict;
my $regex = shift;
my $sub = eval "sub { \$_[0] =~ $regex; }";
foreach my $file (@ARGV) {
&$sub($file);
print "OUTPUT: $file\n";
}
The trick here is that I'm substituting this "bit of code" into a string to produce Perl code that defines an anonymous subroutine, sub { $_[0] =~ s/a/b/i; } (or whatever code you pass it), then using eval to compile that code and give me a code reference I can call from within the loop.
$ test.pl 's/foo/bar/' foo nicefood
OUTPUT: bar
OUTPUT: nicebard
$ test.pl 'tr/o/e/' foo nicefood
OUTPUT: fee
OUTPUT: nicefeed
This is more efficient than putting an eval "\$file =~ $regex;" inside the loop as then it'll get compiled and eval-ed at every iteration rather than just once up-front.
A word of warning about eval - as raina77ow's answer explains, you should avoid eval unless you're 100% sure you are always getting your input from a trusted source...
s/a/b/i is not a regex. It is a regex plus a substitution. Unless you use the string eval, making this work might be pretty tough (consider s{a}<b>e and so on).
The trouble is that you are trying to pass a perl operator when all you really need to pass is the arguments:
myscript hi bye hi.h
In the script:
my ($find, $replace, @files) = @ARGV;
...
$file =~ s/$find/$replace/i;
Your code is a bit clunky. This is all you need:
use strict;
use warnings;
my ($find, $replace, @files) = @ARGV;
for my $file (@files) {
$file =~ s/$find/$replace/i;
print "$file\n";
}
Note that this way allows you to use meta characters in the regex, such as \w{2}foo?. This can be both a good thing and a bad thing. To make all characters interpreted literally (disable meta characters), you can use \Q ... \E like so:
... s/\Q$find\E/$replace/i;
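To make the difference between the two modes concrete, here is a small sketch that takes the find/replace pair plus a flag choosing between regex and literal matching. The helper replace_in_names and its $literal parameter are made up for illustration. The replacement is inserted as a plain string, so something like $1 in it is not expanded.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one find/replace pair to a list of names
# and return the new names, leaving the originals untouched.
sub replace_in_names {
    my ($find, $replace, $literal, @names) = @_;

    # \Q...\E escapes metacharacters, so "." matches only a real dot.
    my $pattern = $literal ? qr/\Q$find\E/i : qr/$find/i;

    my @renamed;
    for my $name (@names) {
        (my $new = $name) =~ s/$pattern/$replace/;
        push @renamed, $new;
    }
    return @renamed;
}

print join(' ', replace_in_names('hi', 'bye', 0, 'hi.h')), "\n";   # prints: bye.h
print join(' ', replace_in_names('.',  '_',   1, 'a.b')),  "\n";   # prints: a_b
```

With the flag off, the same '.' pattern would match the first character of a.b instead of the dot, which is the "bad thing" side of allowing meta characters.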

Perl Regex Problem!

I am reading a string from a file:
2343,0,1,0 ... 500 times ...3
Above is an example of $_ when it is read from a file. It is any number, followed by 500 comma separated 0's/1's then the number 3.
while(<FILE>){
my $string = $_;
chomp($string);
my $a = chop($string);
my $found;
if($string=~m/^[0-9]*\,((0,|1,){$i})/){
$found = $&.$a;
print OTH $found,"\n";
}
}
I am using chop to get the number 3 from the end of the string, then matching the first number followed by $i occurrences of "0," or "1,". The problem I'm having is that chop is not working on the string for some reason. In the if statement, when I try to concatenate the match and the chopped number, all I get back is the contents of $&.
I have also tried using my $a = substr $a,-1,1; to get the number 3 and this also hasn't worked.
The thing that's odd is that this code works in Eclipse on Windows, and when I put it onto a Linux server it won't work. Can anyone spot the silly mistake I'm making?
As a rule, I always allow for unseen whitespace in my data. I find it makes my code more robust to expect that somebody didn't notice an extra space at the end of a line or string (as when writing to a log). So I think this would solve your problem:
my ( $a ) = $string =~ /(\S)\s*$/;
Of course, since you know you are looking for a number, it's better to be more precise:
my ( $a ) = $string =~ /(\d+)\s*$/;
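Take care of the end-of-line characters. I cannot test here, but I assume you are chopping part of the line ending rather than the 3. If the file was created on Windows, each line ends in \r\n; on Linux chomp removes only the \n, so chop then removes the \r. Try trimming your string first, then chop it. See for example http://www.somacon.com/p114.php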
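One way to guard against stray line-ending characters, sketched under the assumption that the data file may carry Windows-style \r\n endings: strip all trailing whitespace before taking the last character. The helper name split_last_char is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any trailing line ending or whitespace,
# then split off the final character.
sub split_last_char {
    my ($line) = @_;
    $line =~ s/\s+\z//;        # removes "\n", "\r\n" and stray spaces
    my $last = chop $line;     # now this really is the last visible character
    return ($line, $last);
}

my ($rest, $last) = split_last_char("2343,0,1,0,3\r\n");
print "rest=$rest last=$last\n";
# prints: rest=2343,0,1,0, last=3
```

The same code works unchanged on files with plain \n endings, so it behaves identically on Windows and Linux.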
Instead of trying to do it that way, why not use a regexp to pull out everything you need in one go?
my $x = "4123,0,1,0,1,4";
$x =~ /^[0-9]+,((?:0,|1,){4})([0-9]+)/;
print "$1\n$2\n";
Produces:
0,1,0,1,
4
Which is pretty much what you're looking for. Both sets of needed answers are in the match variables.
Note that I put ?: at the front of the inner (0,|1,) group so that it doesn't capture and end up in the match variables.
I'm really not sure what you are trying to achieve here but I've tried the code on Win32 and Solaris and it works. Are you sure $i is the correct number? Might be easier to use * or ?
use strict;
use warnings;
while(<DATA>){
my $string = $_;
chomp($string);
my $a = chop($string);
print "$string\n";
my $found;
if($string=~m/^[0-9]*\,((0,|1,)*)/){
$found = $&.$a;
print $found,"\n";
}
}
__DATA__
2343,0,1,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,3
I don't see much reason to use a regex in this case, just use split.
use strict;
use warnings;
use autodie; # open will now die on failure
my %data;
{
# limit the scope of $fh
open my $fh, '<', 'test.data';
while(<$fh>){
chomp;
s(\s+){}g; # remove all spaces
my($number,@bin) = split ',', $_;
# uncomment if you want to throw away the 3
# pop @bin if $bin[-1] == 3;
$data{$number} = \@bin;
}
close $fh;
}
If all you want is the 3:
while(<$fh>){
# anchoring at the end of the line captures the last run of digits
my($last_number) = /([0-9]+)\s*$/;
}