Search and replace a special character in perl - regex

I want to search for a character and replace it with a string. First, I search for ':' and replace it with 'to'. Next, I want to search for '$' and replace it with 'END'. This is the code I've tried. It works for the first character but not the second. I tried using a backslash to escape the special character '$', but it still did not work. What else can I do?
$string = "[9:8],
if ($string =~ /^.*:+/){
$stringreplaced =~ s/:/to/g;
}
elsif ($string =~ /^.*\$+/){
$stringreplaced =~ s/\$/END/g;
}

First of all, the code you posted doesn't even compile, yet you say it actually ran. Only post code that you've run.
Second, you're matching against the wrong string. You're checking if $string contains the character, but you replace the characters in $stringreplaced. ALWAYS use use strict; use warnings;. This would have caught this error.
Third, you only check if the character (: or $) is on the first line. This is because . doesn't match line feeds without /s.
Finally, you only check whether the string contains $ when it doesn't contain :, because you used elsif.
The following is all you need:
$string =~ s/:/to/g;
$string =~ s/\$/END/g;
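As a quick check, here is a minimal runnable sketch of those two substitutions. The input string is made up for illustration, since the one in the question is truncated.

```perl
use strict;
use warnings;

# Hypothetical input containing both special characters.
my $string = '[9:8], [7:$]';

$string =~ s/:/to/g;     # ':' has no special meaning in a regex
$string =~ s/\$/END/g;   # '$' is an anchor, so it has to be escaped

print "$string\n";       # prints: [9to8], [7toEND]
```

No if/elsif is needed: a substitution that finds nothing simply leaves the string alone.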

Related

regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches @ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want words like precious final to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more literal s characters rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and not escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
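To see the single-quote and qr variants side by side, here is a small sketch. The word list is hypothetical and stands in for the dictionary file; the pattern is the \S* version suggested in the other answer.

```perl
use strict;
use warnings;

# Hypothetical lines standing in for the dictionary file.
my @lines = ("Prenatal\n", "Prealss\n", "Predial \n", "Apple\n");

my $as_string = '^Pre\S*al\s*$';     # single quotes: the backslashes survive
my $as_qr     = qr/^Pre\S*al\s*$/;   # qr//: a compiled regex object

my @by_string = grep { $_ =~ /$as_string/ } @lines;
my @by_qr     = grep { $_ =~ $as_qr } @lines;

print "string pattern matched: ", scalar(@by_string), "\n";
print "qr pattern matched:     ", scalar(@by_qr), "\n";
```

Both forms select the same two lines (Prenatal and Predial); the difference is only in how the pattern is stored.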
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches a word character, i.e. a letter, digit, or underscore, rather than any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i

Regex not working, at least in command line

I have a regex:
($value) = $line =~ /\ABC(.+?)\#/;
For input, e.g.:
(32321213321) ABC 24432.232 #Junk
Which is meant to catch the number between ABC and #.
When I run it through the command line, it returns a space. Through Padre, it returns a space + the number before #.
Is there something wrong with the regex?
In your regex, you have escaped the A. This turns it into the escape sequence \A, an assertion that matches only at the beginning of the string (^ is a similar anchor). Your string does not have BC at its beginning, so the regex cannot match. You also have a redundant escape before #. The regex you need is
/ABC(.+?)#/
You can use:
$line =~ /ABC *([0-9 ]+?) *#/;
OR better:
$line =~ /ABC *(\d+(?: \d+)*) *#/;
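A short sketch comparing the original pattern with the corrected one on the sample input. The third pattern is my own assumption rather than something from the answers above: the sample number contains a decimal point, which a digits-and-spaces class would not match, so it uses [\d.]+ and trims the surrounding spaces.

```perl
use strict;
use warnings;

my $line = '(32321213321) ABC 24432.232 #Junk';

# \A anchors at the start of the string, so /\ABC/ means "BC at position 0".
my $escaped = $line =~ /\ABC(.+?)\#/ ? 'match' : 'no match';

# Without the stray backslash, the capture includes the surrounding spaces.
my ($loose) = $line =~ /ABC(.+?)#/;

# Assumed variant: skip the spaces and accept digits with a decimal point.
my ($tight) = $line =~ /ABC\s*([\d.]+)\s*#/;

print "escaped: $escaped\n";   # escaped: no match
print "loose:   [$loose]\n";   # loose:   [ 24432.232 ]
print "tight:   [$tight]\n";   # tight:   [24432.232]
```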

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $`, and $' variables have well-known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-performance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so you don't match an entire string that has no digits), you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
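Here is that last form wrapped in a small helper so the failure cases are visible too. The sub name prefix_before_digit and the two extra test strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading non-digit run, or undef when the
# string starts with a digit or contains no digit at all.
sub prefix_before_digit {
    my ($string) = @_;
    my ($prefix) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
    return $prefix;
}

for my $string ('AAAA_BBBB_12_13_14', 'no_digits_here', '42_leading') {
    my $prefix = prefix_before_digit($string);
    printf "%-20s => %s\n", $string, defined $prefix ? "'$prefix'" : 'undef';
}
```

Because the match is in list context, a failed match leaves $prefix undefined instead of holding a stale value from an earlier match.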
$str =~ /(\d)/; print $`;
This code prints the part of the string that precedes the match.
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_

In Perl, how can I correctly extract URLs that are enclosed in parentheses?

I've got two questions about Regexp::Common qw/URI/ and regexes in Perl.
I use Regexp::Common qw/URI/ to find URIs in strings and delete them. But I get an error when a URI is between parentheses.
For example: (http://www.example.com)
The error is caused by the ')': when the app tries to parse the URI, it crashes. I've thought of two fixes:
Write a simple regex (or so I thought) that inserts a whitespace before each ) character
Use a function in Regexp::Common qw/URI/ that implements a fix, if there is one
In my code I've tried to implement the regex, but the app freezes. The code that I've tried is this:
use strict;
use Regexp::Common qw/URI/;
my $str = "Hello!!, I love (http://www.example.com)";
while ($str =~ m/\)/){
$str =~ s/\)/ \)/;
}
my ($uri) = $str =~ /$RE{URI}{-keep}/;
print "$uri\n";
print $str;
The output that I want is: (http://www.example.com )
I'm not sure, but I think that the problem is in $str =~ s/\)/ \)/;
BTW, I've got a question about Regexp::Common qw/URI/. I've got two types of string:
ablalbalblalblalbal http://www.example.com
asfasdfasdf http://www.example.com aasdfasdfasdf
I want to remove the URI if it is the last component (and save it). And, if not, save it without removing it from the text.
You don't have to first test for a match to be able to use the s/// operator correctly: If the string does not match the search pattern, it will not do anything.
#!/usr/bin/perl
use strict; use warnings;
my $str = "Hello!!, I love (GOOGLE)";
$str =~ s/\)/ )/g;
print "$str\n";
The general problem of detecting URLs correctly in text is error-prone. See for example Jeff's thoughts on this.
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\)/){
$str =~ s/\)/ )/;
}
Your program goes into an infinite loop at this point. To see why, try printing the value of $str each time round the loop.
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\)/){
$str =~ s/\)/ )/;
print $str, "\n";
}
The first time it prints "Hello!!, I love (GOOGLE )". The while loop condition is then evaluated again. Your string still matches your regular expression (it still contains a closing parenthesis), so the replacement is run again, and this time it prints out "Hello!!, I love (GOOGLE  )" with two spaces.
And so it goes on. Each time round the loop another space is added, but each time you still have a closing parenthesis, so another substitution is run.
The simplest solution I can see is to only match the closing parenthesis if it is preceded by a non-whitespace character (using \S).
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\S\)/){
$str =~ s/\)/ )/;
print $str, "\n";
}
In this case the loop is only executed once.
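The same idea can be written without a loop at all. This is a sketch of an alternative, not the code from the answer: a lookbehind for \S plus the /g flag inserts the space only where one is missing. The second parenthesised group in the input is made up to show that an already-spaced ) is left alone.

```perl
use strict;
use warnings;

my $str = 'Hello!!, I love (http://www.example.com) and (this )';

# Insert a space before every ')' that directly follows a non-space character.
$str =~ s/(?<=\S)\)/ )/g;

print "$str\n";   # Hello!!, I love (http://www.example.com ) and (this )
```

Running the substitution a second time changes nothing, which is why no while loop is required.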
Why not just include the parentheses in the search? If the URLs will always be bracketed, then something like this:
#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common qw/URI/;
my $str = "Hello!!, I love (http://www.google.com)";
my ($uri) = $str =~ / \( ( $RE{URI} ) \) /x;
print "$uri\n";
The regex from Regexp::Common can be used as part of a longer regex; it doesn't have to be used on its own. Also, I've used the 'x' modifier on the regex to allow whitespace so you can see more clearly what is going on: the brackets with the backslashes are treated as characters to match, and those without define what is to be captured (presumably like the {-keep} option, which I've not used before).
You could also make the brackets optional, with something like:
/ (?: \( ( $RE{URI} ) \) | ( $RE{URI} ) ) /x
although that would result in two match variables, one undefined - so something like following would be needed:
my $uri = $1 || $2 || die "Didn't match a URL!";
There's probably a better way to do this, and also if you're not bothered about matching parentheses then you could simply make the brackets optional (via a '?') in the first regex...
To answer your second question about only matching URLs at the end of the line - have a look at Regex 'anchors' which can force a match against the beginning or end of a line: ^ and $ (or \A and \Z if you prefer). e.g. matching a URL at the end of a line only:
/$RE{URI}\Z/
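A rough sketch of the "remove it only when it is the last component" logic. To keep it runnable without Regexp::Common, it uses a deliberately crude stand-in pattern ($url below) where you would put $RE{URI}; the sub name extract_url is also made up.

```perl
use strict;
use warnings;

# Crude stand-in for $RE{URI}, assumed here for illustration only.
my $url = qr{https?://\S+};

# Returns (text, url). A trailing URL is removed from the text;
# a URL elsewhere is returned but left in place.
sub extract_url {
    my ($text) = @_;
    if ($text =~ s/\s*($url)\s*\z//) {
        my $found = $1;
        return ($text, $found);
    }
    my ($found) = $text =~ /($url)/;
    return ($text, $found);
}

for my $line ('ablalbalblalblalbal http://www.example.com',
              'asfasdfasdf http://www.example.com aasdfasdfasdf') {
    my ($text, $found) = extract_url($line);
    print "text: [$text] url: [$found]\n";
}
```

The \z anchor is what distinguishes the two cases: the substitution only fires when nothing but optional whitespace follows the URL.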

Why does my regular expression fail with certain substitutions?

I am new to perl and not sure how to achieve the following.
I am reading a file and putting the lines in a variable called $tline. Next, I am trying to replace some character from the $tline.
This substitution fails if $tline has some special characters like (, ?,= etc in it. How to escape the special characters from this variable $tline?
if ($tline ne "") {
$tline =~ s/\//\%;
}
EDIT
Sorry for the confusion. Here is what I am trying to do.
$tline =~ s/"\//"\<\%\=request\.getContextPath\(\)\%\>\//;
This is working for most of the cases. But when the input file has ? in it, it is failing.
How about:
$tline =~ s/\Q$var\E/;
That will cause quotemeta to be applied to contents of $var which is being used as the pattern.
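A minimal sketch of what \Q...\E changes. The contents of $var and $tline, and the replacement text FOUND, are invented for the example.

```perl
use strict;
use warnings;

my $var   = 'a(b)?=c';        # hypothetical literal text full of metacharacters
my $tline = 'x a(b)?=c y';    # hypothetical input line

# Unquoted, $var is read as a pattern: "a", an optional "b", then "=c".
my $unquoted_hit = $tline =~ /$var/ ? 1 : 0;

# Quoted, every metacharacter is matched literally.
(my $quoted = $tline) =~ s/\Q$var\E/FOUND/;

print "unquoted matched: $unquoted_hit\n";   # unquoted matched: 0
print "$quoted\n";                           # x FOUND y
print quotemeta($var), "\n";                 # a\(b\)\?\=c
```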
This isn't a valid regex:
$tline =~ s/\//\%;
Perl reads it like this:
$tline =~ s/a/%;
Where a = /
What you wanted to do is replace a forward slash with a percent sign, so you probably want
$tline =~ s/\//%/;
Which is better written like this:
$tline =~ s,/,%,;
You probably also want to replace more than just the first forward-slash, so you want the /g flag:
$tline =~ s,/,%,g;
And this is exactly what tr (transliteration) does:
$tline =~ tr,/,%,;
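A quick sketch showing that the two forms agree, using a made-up path. It also shows the count that tr returns, which s///g gives you only as a side effect.

```perl
use strict;
use warnings;

my $path = '/app/img/logo.png';   # hypothetical input

(my $subst = $path) =~ s,/,%,g;   # regex substitution, all occurrences
(my $trans = $path) =~ tr,/,%,;   # character-for-character transliteration

# With an empty replacement list, tr only counts and leaves $path unchanged.
my $count = ($path =~ tr,/,,);

print "$subst\n";                 # %app%img%logo.png
print "$trans\n";                 # %app%img%logo.png
print "slashes: $count\n";        # slashes: 3
```

tr never treats its arguments as a regex, so there is nothing to escape apart from the delimiter itself.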
UPDATE I think what you want is a simple quotemeta() which takes your input, and regex-escapes the meta characters
$ perl -e'print quotemeta("</foo?>")'
\<\/foo\?\>
You could place all your special characters between square brackets (called a "character class"). The following will replace all left parentheses, question marks and equal signs in your string with percent signs:
my $tline = 'fo(?=o';
$tline =~ s/[(?=]/%/g;
print "$tline\n";
Prints:
fo%%%o
quotemeta is a good function for getting an exact literal with special characters into a regex. And \Q and \E are good operators for doing the same thing inside the regex.
However, your search expression is not that complex. In your edit, you're simply looking for a double quote and a slash. In fact, I've simplified your expression quite a bit, so that the replacement contains not a single backslash. So it's not a problem for quotemeta nor, for that matter, \Q and \E.
Once pared down, I don't see anything in your revised substitution that would cause a problem with '?' in $tline.
Key to the simplification is that '.', '(', and ')' mean nothing special to the replacement section of your expression, so this is equivalent:
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
Not to mention easier to read. Of course this is even easier:
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
Because in Perl, you can choose the delimiter you wish with the s operator.
But with any of these, this works:
use Test::More tests => 1;
my $tline = '"/?"';
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
ok( $tline =~ /getContextPath/ );
It passes the test. Perhaps you're having a problem with more than one substitution on a line. That can be fixed with:
$tline =~ s|"/|"<%=request.getContextPath()%>/|g;
That g is the global switch on the end, saying make this substitution for as many times as it occurs in the input.
However, since I can see what you are doing, I suggest an even tighter specification of what you want to search:
$tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g;
And when I run this:
use Test::More tests => 2;
my $tline = '"/?"';
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
ok( $tline =~ /getContextPath/ );
$tline = 'src="/?/?/beer"';
ok( $tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g
);
I get two successes.
Your true problem is yet unspecified.
Well, one way to do it is to put all the characters you want to replace in square brackets. Like so:
$string =~ s/[,?=\/]//; # This will remove the first ',', '?', '=', or '/' from your string.
If you want to remove all the '?' in a string, for example, use a g on the end of it like so:
$string =~ s/[?]//g;
I'm a little rusty, but I believe that you only need a '\' in front of \ or /, (and of course the other special characters like \n,\t, etc...). Like so:
$string =~ s/[\\]/\//g; # Switch from DOS to Unix delimiters.
$string =~ s/[\n\t]//g; # Remove all newlines and tabs
As others have said, the code you've posted isn't going to work since you forgot the last /. That's another nice reason to keep the "weird" characters in a box.