I'm new to Perl and I found behaviour which I don't understand and can't solve.
I'm making a small find and replace program and there are some things I need to do. I have a bunch of files that I need to process. Then I have a list of find / replace rules in an external text file. The replacement needs to handle three special things:
Replacing utf-8 characters (Czech diacritics)
Work with adding/removing lines (so working in a slurp mode)
Use regular expressions
I want a program that works alone, so I wrote it so that it takes three arguments:
The file to work on
What to find
What to replace.
I'm sending parameters in a loop from a bash script which parses the rules list and loads other files.
My problem is when I have a "\n" string in a rules list and I send it to the Perl script. If it's in the first part of replacement (in the find section) it looks for a newline correctly, but when it's in the second part (the replace section) it just prints \n instead of a newline.
I tried hardcoding "\n" to the string right into the variable instead of passing it from the list and then it works fine.
What's the reason Perl doesn't interpret the "\n" string there, and how can I make it work?
This is my code:
list.txt - One line from the external replacement list
1\. ?\\n?NÁZEV PŘÍPRAVKU;\\n<<K1>> NÁZEV PŘÍPRAVKU;
farkapitoly.sh - The bash script for parsing list.txt and cycling through all of the files and calling the Perl script
...
FILE="/home/tmp.txt"
while read LINE
do
FIND=`echo "$LINE" | awk -F $';' 'BEGIN {OFS = FS} {print $1}'`
REPLACE=`echo "$LINE" | awk -F $';' 'BEGIN {OFS = FS} {print $2}'`
perl -CA ./pathtiny.pl "$FILE" "$FIND" "$REPLACE"
done < list.txt
...
pathtiny.pl - The Perl script for find and replace
#!/usr/bin/perl
use strict;
use warnings;
use Modern::Perl;
use utf8; # Enable typing Unicode in Perl strings
use open qw(:std :utf8); # Enable Unicode to STDIN/OUT/ERR and filehandles
use Path::Tiny;
my $file = path("$ARGV[0]");
my $searchStr = "$ARGV[1]";
my $replaceStr = "$ARGV[2]";
# $replaceStr="\n<<K1>> NÁZEV PRÍPRAVKU"; # if I hardcode it here \n is replaced right away
print("Search String:", "$searchStr", "\n");
print("Replace String:", "$replaceStr", "\n\n");
my $guts = $file->slurp_utf8;
$guts =~ s/$searchStr/$replaceStr/gi;
$file->spew_utf8($guts);
If it's important, I'm using Linux Mint 13 64-bit on VirtualBox (under Win 8.1) and I have Perl v5.14.2. Every file is UTF-8 with Linux endings.
Example files can be found on pastebin. this should end up like this.
But the examples vary a lot. I need a universal way to write a newline in a replacement string so that it is replaced correctly.
The problem is that the replacement string is read literally from the file, so if your file contains
xx\nyy
then you will read exactly those six characters. Also, the replacement part of a substitution is evaluated as if it was in double quotes. So your replacement string is "$replaceStr" which interpolates the variable and goes no further, so you will again have xx\nyy in the new string. (By the way, please avoid using capital letters in local Perl identifiers as in practice they are reserved for globals such as Module::Names.)
The answer lies in using eval, or its equivalent - the /e modifier on the substitution.
If I write
my $str = '<b>';
my $r = 'xx\nyy';
$str =~ s/b/$r/;
then the replacement string is interpolated to xx\nyy, as you have experienced.
A single /e modifier evaluates the replacement as an expression instead of just a double-quoted string, but of course $r as an expression is xx\nyy again.
What you need is a second /e modifier, which does the same evaluation as a single /e and then does an additional eval of the result on top. For this it is cleanest if you use qq{ .. } as you need two levels of quotation.
If you write
$str =~ s/b/qq{"$r"}/ee
then perl will evaluate qq{"$r"} as an expression, giving "xx\nyy", which, when evaluated again will give you the string you need - the same as the expression 'xx' . "\n" . 'yy'.
Here's a full program
use strict;
use warnings;
my $s = '<b>';
my $r = 'xx\nyy';
$s =~ s/b/qq{"$r"}/ee;
print $s;
output
<xx
yy>
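To see the three levels side by side, here is a small sketch that applies the same replacement with no modifier, with /e, and with /ee. The variable names $plain, $once and $twice are made up for this illustration.

```perl
use strict;
use warnings;

my $r = 'xx\nyy';    # six characters: x x \ n y y

(my $plain = '<b>') =~ s/b/$r/;            # interpolation only
(my $once  = '<b>') =~ s/b/$r/e;           # $r evaluated as an expression
(my $twice = '<b>') =~ s/b/qq{"$r"}/ee;    # that result evaluated a second time

print length($plain), "\n";    # backslash and n are still separate characters
print length($once),  "\n";    # same string as $plain
print length($twice), "\n";    # one character shorter: \n became a newline
```

Only the /ee version turns the two characters \n into a real newline.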
But don't forget that, if your replacement string contains any double quotes, like this
my $r = 'xx\n"yy"'
then they must be escaped before putting them through the substitution, as the expression itself also uses double quotes.
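A minimal sketch of that escaping step follows. The helper variable $safe is my own name, and the sketch assumes the replacement text has no backslash directly in front of a quote and contains no $ or @, since the second evaluation would try to interpolate those.

```perl
use strict;
use warnings;

my $s = '<b>';
my $r = 'xx\n"yy"';              # replacement text containing double quotes

(my $safe = $r) =~ s/"/\\"/g;    # put a backslash in front of every "
$s =~ s/b/qq{"$safe"}/ee;        # now the inner "..." stays balanced

print $s, "\n";
```

The escaped copy is used only to build the expression; the result contains the plain double quotes again.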
All of this is quite hard to grasp, so you may prefer the String::Escape module which has an unbackslash function that will change a literal \n (and any other escapes) within a string to its equivalent character "\n". It's not a core module so you probably will need to install it.
The advantage is that you no longer need a double evaluation, as the replacement can be just unbackslash $r, which gives the right result when it is evaluated as an expression. It also handles double quotes in $r without any problem, as the expression doesn't use double quotes itself.
The code using String::Escape goes like this
use strict;
use warnings;
use String::Escape 'unbackslash';
my $s = '<b>';
my $r = 'xx\nyy';
$s =~ s/b/unbackslash $r/e;
print $s;
and the output is identical to that of the previous code.
Update
Here is a refactoring of your original program that uses String::Escape. I have removed Path::Tiny as I believe it is best to use Perl's built-in inplace-edit extension, which is documented under the General Variables section of perlvar.
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use 5.010;
use open qw/ :std :utf8 /;
use String::Escape qw/ unbackslash /;
our @ARGV;
my ($file, $search, $replace) = @ARGV;
print "Search String: $search\n";
print "Replace String: $replace\n\n";
@ARGV = ($file);
$^I = '';
while (<>) {
s/$search/unbackslash $replace/eg;
print;
}
You have \n as the content of a string: two characters, a backslash followed by n, not a single newline character.
Perl interprets \n as a newline only when it appears in a double-quoted string literal in your code.
The quick-fix would be:
my $replaceStr=eval qq("$ARGV[2]"); # eval of the double-quoted string turns the two characters \n into a newline
Or, if you don't like eval, you can use the unbackslash function from the String::Escape CPAN module.
You're wanting a literal string to be treated as if it were a double quoted string. To do that you'll have to translate any backslash followed by another character.
The other experts have shown you how to do that over the entire string (which is risky since it uses eval with unvalidated data). Alternatively, you could use a module, String::Escape, which requires an install (not a high bar, but too high for some).
However, the following does a translation of the return value string itself in a safe way, and then it can be used like a normal value in your other search and replace:
use strict;
use warnings;
my $r = 'xx\nyy';
$r =~ s/(\\.)/qq{"$1"}/eeg; # Translate \. as a double quoted string would
print $r;
Outputs:
xx
yy
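If you would rather avoid eval entirely, one alternative is a small lookup table. This is only a sketch: the %escape table and the unescape_basic function are names I made up, and the table covers just a few common escapes, leaving anything it doesn't know untouched.

```perl
use strict;
use warnings;

# Escapes this sketch understands; extend the table as needed.
my %escape = ( n => "\n", t => "\t", r => "\r", '\\' => '\\' );

sub unescape_basic {
    my ($text) = @_;
    # Replace backslash + character with its meaning, or keep it as it was.
    $text =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : "\\$1"/ge;
    return $text;
}

print unescape_basic('xx\nyy'), "\n";
```

Because nothing from the input is ever run as code, quotes, $ and @ in the replacement text need no special treatment.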
Related
I have a string such as this
word <gl>aaa</gl> word <gl>aaa-bbb=ccc</gl>
where there are one or more words enclosed in tags. In those instances where there is more than one word (usually separated by - or = and potentially other non-word characters), I'd like to make sure that the tags enclose each word individually so that the resulting string would be:
word <gl>aaa</gl> word <gl>aaa</gl>-<gl>bbb</gl>=<gl>ccc</gl>
So I'm trying to come up with a regex that would find any number of iterations of \W*?(\w+) and then enclose each word individually with the tags. And ideally I'd have this as a one-liner that I can execute from the command line with perl, like so:
perl -pe 's///g;' in out
This is how far I've gotten after a lot of trial and error and googling - I'm not a programmer :( ... :
/<gl>\W*?(\w+)\W*?((\w+)\W*?){0,10}<\/gl>/
It finds the first and last word (aaa and ccc). Now, how can I make it repeat the operation and find other words if present? And then how to get the replacement? Any hints on how to do this or where I can find further information would be much appreciated?
EDIT:
This is part of a workflow that does some other transformations within a shell script:
#!/bin/sh
perl -pe '#
s/replace/me/g;
s/replace/me/g;
' $1 > tmp
... some other commands ...
This needs a mini nested-parser and I'd recommend a script, as easier to maintain
use warnings;
use strict;
use feature 'say';
my $str = q(word <gl>aaa</gl> word <gl>aaa-bbb=ccc</gl>);
my $tag_re = qr{(<[^>]+>) (.+?) (</[^>]+>)}x; # / (stop markup highlighter)
$str =~ s{$tag_re}{
my ($o, $t, $c) = ($1, $2, $3); # open (tag), text, close (tag)
$t =~ s/(\w+)/$o$1$c/g;
$t;
}ge;
say $str;
The regex gives us its built-in "parsing," where words that don't match the $tag_re are unchanged. Once the $tag_re is matched, it is processed as required inside the replacement side. The /e modifier makes the replacement side be evaluated as code.
One way to provide input for a script is via command-line arguments, available in the @ARGV global array in the script. For the use indicated in the question's "Edit", replace the hardcoded
my $str = q(...);
with
my $str = shift @ARGV; # first argument on the command line
and then use that script in your shell script as
#!/bin/sh
...
script.pl $1 > output_file
where $1 is the shell variable as shown in the "Edit" to the question.
In a one-liner
echo "word <gl>aaa</gl> word <gl>aaa-bbb=ccc</gl>" |
perl -wpe'
s{(<[^>]+>) (.+?) (</[^>]+>)}
{($o,$t,$c)=($1,$2,$3);$t=~s/(\w+)/$o$1$c/g; $t}gex;
'
which in your shell script becomes echo $1 | perl -wpe'...' > output_file. Or you can change the code to read from @ARGV, drop the -p switch, and add a print
#!/bin/sh
...
perl -wE'$_=shift; ...; say' $1 > output_file
where ... in the one-liner stands for the same code as above, and say is now needed since we don't have the -p with which $_ is printed out once it's processed.
The shift takes an element off the front of an array and returns it. Without an argument it does that to @ARGV when outside a subroutine, as here (inside a subroutine its default target is @_).
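A short sketch of those two defaults of shift. The values assigned to @ARGV stand in for real command-line arguments, and first_of is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for real command-line arguments (hypothetical values).
@ARGV = ('input.txt', 'extra');

sub first_of { return shift }          # inside a sub, shift works on @_

my $from_argv = shift;                 # outside a sub, shift works on @ARGV
my $from_args = first_of('a', 'b');    # returns the first element of @_

print "$from_argv $from_args ", scalar(@ARGV), "\n";
```

After the bare shift, @ARGV holds one element fewer, which is why the script can pass the remaining arguments on unchanged.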
This will do it:
s/(\w+)([\-=])(?=\w+)/$1<\/gl>$2<gl>/g;
The /g at the end is the repeat and stands for "global". It will pick up matching at the end of the previous match and keep matching until it doesn't match anymore, so we have to be careful about where the match ends. That's what the (?=...) is for. It's a "followed by pattern" that tells the repeat to not include it as part of "where you left off" in the previous match. That way, it picks up where it left off by re-matching the second "word".
The s/ at the beginning is a substitution, so the command would be something like:
cat in | perl -pne 's/(\w+)([\-=])(?=\w+)/$1<\/gl>$2<gl>/g;$_' > out
Strictly, the trailing $_ is not required: the substitution returns the number of substitutions made, but -p prints $_ itself after each line, so the modified line is written either way.
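The effect of the lookahead is easiest to see by comparing it with a version that consumes the second word. This is a sketch using the sample string from the question; $with_lookahead and $without are names chosen for the illustration.

```perl
use strict;
use warnings;

my $str = 'word <gl>aaa</gl> word <gl>aaa-bbb=ccc</gl>';

# With the lookahead, the second word is not consumed,
# so it can start the next match.
(my $with_lookahead = $str) =~ s/(\w+)([\-=])(?=\w+)/$1<\/gl>$2<gl>/g;

# Without it, "bbb" is consumed by the first match,
# so "bbb=ccc" is never split.
(my $without = $str) =~ s/(\w+)([\-=])(\w+)/$1<\/gl>$2<gl>$3/g;

print "$with_lookahead\n$without\n";
```

The first line printed has every word in its own pair of tags; the second still contains bbb=ccc inside a single pair.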
This will only match one line. If your pattern spans multiple lines, you'll need some fancier code. It also assumes the XML is correct and that there are no words surrounding dashes or equals signs outside of tags. To account for this would necessitate an extra pattern match in a loop to pull out the values surrounded by gl tags so that you can do your substitution on just those portions, like:
my $e = $in;
while($in =~ /(.*?<gl>)(.*?)(?=<\/gl>)/g){
my $p = $1;
my $s = $2;
print($p);
$s =~ s/(\w+)([\-=])(?=\w+)/$1<\/gl>$2<gl>/g;
print($s);
$e = $'; # ' (stop markup highlighter)
}
print($e);
You'd have to write your own surrounding loop to read STDIN and put the lines read in into $in. (You would also need to not use -p or -n flags to the perl interpreter since you're reading the input and printing the output manually.) The while loop above however grabs everything inside the gl tags and then performs your substitution on just that content. It prints everything occurring between the last match (or the beginning of the string) and before the current match ($p) and saves everything after in $e which gets printed after the last match outside the loop.
I've got a CSV that looks as follows:
A,01,ALPHA
00,D,CHARLIE
E,F,02
This is the desired file after transformation:
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
As you can see, the fields that are entirely numeric are left unquoted, whilst the alpha (or alphanumeric ones) are quoted.
What would be a sensible way to go about this in Perl ?
Already commented below, but I've tried stuff like
perl -pe 's/(\w+)/"$1"/g'
And that doesn't work because \w obviously picks up the numerics.
I recommend not reinventing the wheel, but rather to use an already existing module, as zdim recommends. Here is your example using Text::CSV_XS
test.pl
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
use Scalar::Util qw( looks_like_number );
my $csv = Text::CSV_XS->new();
while (my $row = $csv->getline(*STDIN)) {
my @quoted_row = map { looks_like_number($_) ? $_ : '"'. $_ .'"' } @$row;
print join(',',@quoted_row) . "\n";
}
Output
cat input | perl test.pl
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
Another one-liner, input file modified to add a line with alphanumeric fields
$ cat ip.csv
A,01,ALPHA
00,D,CHARLIE
E,F,02
23,AB12,53C
$ perl -F, -lane 's/.*[^0-9].*/"$&"/ foreach(@F); print join ",", @F' ip.csv
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
23,"AB12","53C"
To modify OP's attempt:
$ perl -pe 's/(^|,)\K\d+(?=,|$)(*SKIP)(*F)|\w+/"$&"/g' ip.csv
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
23,"AB12","53C"
(^|,)\K\d+(?=,|$)(*SKIP)(*F) this will skip the fields with digits alone and the alternate pattern \w+ will get replaced
It seems that you are after a one-liner. Here is a basic one
perl -lpe '$_ = join ",", map /^\d+$/ ? $_ : "\"$_\"", split ",";' input.csv
It splits each line on , and passes the resulting list to map. There each element is tested for digits only with /^\d+$/ and is either passed through untouched or wrapped in " otherwise. The list returned by map is then joined with ,.
The -l removes the newline on input (and adds it back on output), which is needed because otherwise the newline would stay attached to the last field and end up inside its quotes. The result is assigned back to $_ so that -p can be used and there is no need for an explicit print.
The code is very easily used in a script, if you don't insist on a one-liner.
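As a rough idea of what the script form could look like, here is a sketch with the same split/map/join logic wrapped in a function. The name quote_non_numeric is made up, and the sample lines are hardcoded instead of being read from a file. Like the one-liner, it assumes plain fields with no embedded commas or quotes.

```perl
use strict;
use warnings;

# Quote every field that is not made of digits only.
sub quote_non_numeric {
    my ($line) = @_;
    return join ',', map { /^\d+$/ ? $_ : qq{"$_"} } split /,/, $line;
}

for my $line ('A,01,ALPHA', '00,D,CHARLIE', 'E,F,02', '23,AB12,53C') {
    print quote_non_numeric($line), "\n";
}
```

To process a real file, the for loop would be replaced by a while loop over the input that chomps each line before calling the function.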
Processing of csv files is far better done by modules, for example Text::CSV
The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches #ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want words like precious final to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more s characters rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and don't escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
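The quoting variants can be compared in one small sketch. The sample line and the variable names are my own; $broken holds what the original double-quoted string actually turns into.

```perl
use strict;
use warnings;

my $line = "Prenatal \n";    # note the space before the newline

my $single = '^Pre(.*)al(\s*)$';      # single quotes keep \s intact
my $double = "^Pre(.*)al(\\s*)\$";    # double quotes need \\ and \$
my $regex  = qr/^Pre(.*)al(\s*)$/;    # qr builds a compiled regex object
my $broken = '^Pre(.*)al(s*)$';       # what "...(\s*)\$" really contains

for my $pattern ($single, $double, $regex, $broken) {
    print $line =~ $pattern ? "match\n" : "no match\n";
}
```

The first three patterns match the sample line; the last one does not, because after "al" it allows only the letter s, not the trailing space.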
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w means any letter of a word, not just any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
I am trying to search for a substring and replace the whole string if the substring is found. In the example below, someVal could be any value that is unknown to me.
How can I search for someServer.com and replace the whole string $oldUrl with $newUrl?
I can do it on the whole string just fine:
$directory = "/var/tftpboot";
my $oldUrl = "someVal.someServer.com";
my $newUrl = "someNewVal.someNewServer.com";
opendir( DIR, $directory ) or die $!;
while ( my $files = readdir(DIR) ) {
next unless ( $files =~ m/\.cfg$/ );
open my $in, "<", "$directory/$files";
open my $out, ">", "$directory/temp.txt";
while (<$in>) {
s/.*$oldUrl.*/$newUrl/;
print $out $_;
}
rename "$directory/temp.txt", "$directory/$files";
}
Your script will delete much of your content because you are surrounding the match with .*. This will match any character except newline, as many times as it can, from start to end of each line, and replace it.
The functionality that you are after already exists in Perl, the use of the -pi command line switches, so it would be a good idea to make use of it rather than trying to make your own, which works exactly the same way. You do not need a one-liner to use the in-place edit. You can do this:
perl -pi script.pl *.cfg
The script should contain the name definitions and substitutions, and any error checking you need.
my $old = "someVal.someServer.com";
my $new = "someNewVal.someNewServer.com";
s/\Q$old\E/$new/g;
This is the simplest possible solution, when running with the -pi switches, as I showed above. The \Q ... \E is the quotemeta escape, which escapes meta characters in your string (highly recommended).
You might want to prevent partial matches. If you are matching foo.bar, you may not want to match foo.bar.baz, or snafoo.bar. To prevent partial matching, you can put in anchors of different kinds.
(?<!\S) -- do not allow any non-whitespace before match
\b -- match word boundary
Word boundary would be suitable if you want to replace server1.foo.bar in the above example, but not snafoo.bar. Otherwise use whitespace boundary. The reason we do a double negation with a negative lookaround assertion and negated character class is to allow beginning and end of line matches.
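A sketch comparing the two kinds of anchors on the example names above. The replacement text NEW and the variable names are placeholders for the illustration.

```perl
use strict;
use warnings;

my $old = 'foo.bar';
my $new = 'NEW';

for my $text ('foo.bar', 'server1.foo.bar', 'snafoo.bar', 'foo.bar.baz') {
    (my $ws = $text) =~ s/(?<!\S)\Q$old\E(?!\S)/$new/g;    # whitespace boundary
    (my $wb = $text) =~ s/\b\Q$old\E\b/$new/g;             # word boundary
    print "$text => $ws | $wb\n";
}
```

The whitespace version changes only the bare foo.bar. The word-boundary version also rewrites server1.foo.bar and foo.bar.baz, since a dot counts as a boundary, but leaves snafoo.bar alone.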
So, to sum up, I would do:
use strict;
use warnings;
my $old = "someVal.someServer.com";
my $new = "someNewVal.someNewServer.com";
s/(?<!\S)\Q$old\E(?!\S)/$new/g;
And run it with
perl -pi script.pl *.cfg
If you want to try it out beforehand (highly recommended!), just remove the -i switch, which will make the script print to standard output (your terminal) instead. You can then run a diff on the files to inspect the difference. E.g.:
$ perl -p script.pl test.cfg > test_replaced.cfg
$ diff test.cfg test_replaced.cfg
You will have to decide whether word boundary is more desirable, in which case you replace the lookaround assertions with \b.
Always use
use strict;
use warnings;
Even in small scripts like this. It will save you time and headaches.
If you want to match and replace any subdomain, then you should devise a specific regular expression to match them.
\b(?i:(?!-)[a-z0-9-]+\.)*someServer\.com
The following is a rewrite of your script using more Modern Perl techniques, including Path::Class to handle file and directory operations in a cross platform way and $INPLACE_EDIT to automatically handle the editing of a file.
use strict;
use warnings;
use autodie;
use Path::Class;
my $dir = dir("/var/tftpboot");
while (my $file = $dir->next) {
next unless $file =~ m/\.cfg$/;
local @ARGV = "$file";
local $^I = '.bak';
while (<>) {
s/\b(?i:(?!-)[a-z0-9-]+\.)*someServer\.com\b/someNewVal.someNewServer.com/;
print;
}
#unlink "$file$^I"; # Optionally delete backup
}
Watch for the Dot-Star: it matches everything that surrounds the old URL, so the only thing remaining on the line will be the new URL:
s/.*$oldUrl.*/$newUrl/;
Better:
s/$oldUrl/$newUrl/;
Also, you might need to close the output file before you try to rename it.
If the old URL contains special characters (dots, asterisks, dollar signs...) you might need to use \Q$oldUrl to suppress their special meaning in the regex pattern.
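A quick sketch of why the quoting matters for a host name. The $lookalike string is invented for the illustration.

```perl
use strict;
use warnings;

my $old_url   = 'someVal.someServer.com';
my $lookalike = 'someValXsomeServer-com';    # no dots at all

# Unquoted, each . in the pattern matches any character.
print "unquoted pattern matches the lookalike\n" if $lookalike =~ /$old_url/;

# With \Q...\E the dots are literal, so only the real name matches.
print "quoted pattern matches the lookalike\n"   if $lookalike =~ /\Q$old_url\E/;
print "quoted pattern matches the real URL\n"    if $old_url   =~ /\Q$old_url\E/;
```

Only the first and third messages are printed.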
I have a string that is read from a text file on Ubuntu Linux, and I am trying to delete the newline character from its end.
I have tried everything I could think of. With s/\n|\r/-/ (to check whether it finds and replaces any newline) the substitution happens, but the output still moves to the next line when I print it. Moreover, when I use chomp or chop, the string appears to be completely deleted. I could not find any other solution. How can I fix this problem?
use strict;
use warnings;
use v5.12;
use utf8;
use encoding "utf-8";
open(MYINPUTFILE, "<:encoding(UTF-8)", "file.txt");
my @strings;
my @fileNames;
my @erroredFileNames;
my $delimiter;
my $extensions;
my $id;
my $surname;
my $name;
while (<MYINPUTFILE>)
{
my ($line) = $_;
my ($line2) = $_;
if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
#chop($line2);
$line2 =~ s/^\n+//;
print $line2 . " WRONG FORMAT!\n";
}
else {
#print "INSERTED:".$13."\n";
my($id) = $13;
my($name) = $2;
print $name . "\t" . $id . "\n";
unshift(@fileNames, $line2);
unshift(@strings, $line2 =~ /[^\W_]+/g);
}
}
close(MYINPUTFILE);
The correct way to remove Unicode linebreak graphemes, including CRLF pairs, is using the \R regex metacharacter, introduced in v5.10.
The use encoding pragma is strongly deprecated. You should either use the use open pragma, or use an encoding in the mode argument on 3-arg open, or use binmode.
use v5.10; # minimal Perl version for \R support
use utf8; # source is in UTF-8
use warnings qw(FATAL utf8); # encoding errors raise exceptions
use open qw(:utf8 :std); # default open mode, `backticks`, and std{in,out,err} are in UTF-8
while (<>) {
s/\R\z//;
...
}
You are probably experiencing a line ending from a Windows file causing issues. For example, a string such as "foo bar\n", would actually be "foo bar\r\n". When using chomp on Ubuntu, you would be removing whatever is contained in the variable $/, which would be "\n". So, what remains is "foo bar\r".
This is a subtle, but very common error. For example, if you print "foo bar\r" and add a newline, you would not notice the error:
my $var = "foo bar\r\n";
chomp $var;
print "$var\n"; # Remove and put back newline
But when you concatenate the string with another string, the second string overwrites the first on screen, because \r moves the cursor back to the beginning of the line. For example:
print "$var: WRONG\n";
It would effectively be "foo bar\r: WRONG\n", but the text after \r would cause the following text to wrap back on top of the first part:
foo bar\r # \r resets position
: WRONG\n # Second line prints and overwrites
This is more obvious when the first line is longer than the second. For example, try the following:
perl -we 'print "foo bar\rbaz\n"'
And you will get the output:
baz bar
The solution is to remove the bad line endings. You can do this with the dos2unix command, or directly in Perl with:
$line =~ s/[\r\n]+$//;
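The difference between these approaches can be checked on a single CRLF-terminated string. This is a sketch with made-up variable names; it prints lengths because a leftover \r is invisible on screen.

```perl
use v5.10;    # \R and say need Perl 5.10 or later
use strict;
use warnings;

my $line = "foo bar\r\n";    # a line with a Windows (CRLF) ending

my $chomped = $line;
chomp $chomped;                            # removes only "\n" (the default $/)

(my $stripped = $line) =~ s/[\r\n]+$//;    # removes any run of CR/LF at the end
(my $generic  = $line) =~ s/\R\z//;        # removes one generic line break

say length $chomped;     # one longer than the others: the \r is still there
say length $stripped;
say length $generic;
```

The chomped copy keeps its trailing \r, which is what makes later output appear to overwrite itself.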
Also, be aware that your other code is somewhat horrific. What do you for example think that $13 contains? That'd be the string captured by the 13th parenthesis in your previous regular expression. I'm fairly sure that value will always be undefined, because you do not have 13 parentheses.
You declare two sets of $id and $name. One outside the loop and one at the top. This is very poor practice, IMO. Only declare variables within the scope they need, and never just bunch all your declarations at the top of your script, unless you explicitly want them to be global to the file.
Why use $line and $line2 when they have the same value? Just use $line.
And seriously, what is up with this:
if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
That looks like an attempt to obfuscate, no offence. Three nested negations and a bunch of unnecessary parentheses?
First off, since it is an if-else, just swap the branches around and reverse the regular expression. Second, the double negation in [^\W_] is rather confusing. Why not just use [A-Za-z0-9]? You can split this up to make it easier to parse:
if ($line =~ /^(.+)(\.docx)\s*$/) {
my $pre = $1;
my $ext = $2;
You can wipe the linebreaks with something like this:
$line =~ s/[\n\r]//g;
When you do that though, you'll need to change the regex in your if statement to not look for them. I also don't think you want a /g in your if. You really shouldn't have a $line2 either.
I also wouldn't do this type of thing:
print $line2." WRONG FORMAT!\n";
You can do
print "$line2 WRONG FORMAT!\n";
... instead. Also, print accepts a list, so instead of concatenating your strings, you can just use commas.
You can do something like:
$variable =~ tr/\n//d; # the /d is required; without it tr only counts the newlines
But really chomp should work:
while (<filehandle>){
chomp;
...
}
Also s/\n|\r// only replaces the first occurrence of \r or \n. If you wanted to replace all occurrences you would want the global modifier at the end s/\r|\n//g.
Note: if you're including \r for windows it usually ends its line as \r\n so you would want to replace both (e.g. s/(?:\r\n|\n)//), of course the statement above (s/\r|\n//g) with the global modifier would take care of that anyways.
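A small sketch of the difference the global modifier makes, using an invented two-line string with Windows line endings. Lengths are printed because the leftover characters are not visible.

```perl
use strict;
use warnings;

my $text = "one\r\ntwo\r\n";

(my $first_only = $text) =~ s/\n|\r//;     # no /g: only the first \r is removed
(my $all        = $text) =~ s/\r|\n//g;    # /g: every \r and \n is removed

print length($text), " ", length($first_only), " ", length($all), "\n";
```

Without /g just one character disappears; with /g only the letters remain.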
$variable = join('',split(/\n/,$variable))