Space character in regex is not recognised - regex

I'm writing a simple program - please see below for my code with comments. Does anyone know why the space character is not recognised in line 10? When I run the code, it finds the :: but does not replace it with a space.
1 #!/usr/bin/perl
2
3 # This program replaces :: with a space
4 # but ignores a single :
5
6 $string = 'this::is::a:string';
7
8 print "Current: $string\n";
9
10 $string =~ s/::/\s/g;
11 print "New: $string\n";

Try s/::/ /g instead of s/::/\s/g.
The \s is actually a character class representing all whitespace characters, so it only makes sense to have it in the regular expression (the first part) rather than in the replacement string.

Use s/::/ /g. \s only denotes whitespace on the matching side, on the replacement side it becomes s.

Replace the \s with a real space.
The \s is shorthand for a whitespace matching pattern. It isn't used when specifying the replacement string.

Replace string should be a literal space, i.e.:
$string =~ s/::/ /g;

Related

Perl 6 regex not terminated

I have a Perl 6 code where I am doing the following:
if ($line ~~ /^\s*#/) { print "matches\n"; }
I'm getting this error:
===SORRY!===
Regex not terminated.
at line 2
------> <BOL>�<EOL>
Unable to parse regex; couldn't find final '/'
at line 2
------> <BOL>�<EOL>
expecting any of:
infix stopper
This is part of a Perl 5 code:
if ($line =~ /^\s*#/)
It used to work fine to identify lines that have an optional space and a #.
What's causing this error in Perl 6?
In Perl 6, everything from a lone1 # to the end of the line is considered a comment, even inside regexes.
To avoid this, make it a string literal by placing it inside quotes:
if $line ~~ / ^ \s* '#' / { say "matches"; }
(Escaping with \ should also work, but Rakudo seems to have a parsing bug which makes that not work when preceded by a space.
And quoting the character as shown here is the recommended way anyway – Perl 6 specifically introduced quoted strings inside regexes and made spaces insignificant by default, in order to avoid the backslash clutter that many Perl 5 regexes suffer from.)
More generally, all non-alphanumeric characters need to be quoted or escaped inside Perl 6 regexes in order to match them literally.
This is another deliberate non-backwards-compatible change from Perl 5, where this is a bit more complicated.
In Perl 6 there is a simple rule:
alphanumeric --> matches literally only when not escaped.
(When escaped, they either have special meanings, e.g. \s etc., or are forbidden.)
non-alphanumeric --> matches literally only when escaped.
(When not escaped, they either have special meanings, e.g. ., +, #, etc., or are forbidden.)
1 'Lone' meaning not part of a larger token such as a quoted string or the opener of an embedded comment.
A hash # is used as a comment marker in Perl 6 regexes.
Add a backslash \ to escape it like this
if ( $line =~ /^\s*\#/ )

Substitution with \s does not work as expected

I write regex to remove more than 1 space in a string. The code is simple:
my $string = 'A string has more than 1 space';
$string = s/\s+/\s/g;
But, the result is something bad: 'Asstringshassmoresthans1sspace'. It replaces every single space with 's' character.
There's a work around is instead of using \s for substitution, I use ' '. So the regex becomes:
$string = s/\s+/ /g;
Why doesn't the regex with \s work?
\s is only a metacharacter in a regular expression (and it matches more than just a space, for example tabs, linebreak and form feed characters), not in a replacement string. Use a simple space (as you already did) if you want to replace all whitespace by a single space:
$string = s/\s+/ /g;
If you only want to affect actual space characters, use
$string = s/ {2,}/ /g;
(no need to replace single spaces with themselves).
The answer to your question is that \s is a character class, not a literal character. Just as \w represents alphanumeric characters, it cannot be used to print an alphanumeric character (except w, which it will print, but that's beside the point).
What I would do, if I wanted to preserve the type of whitespace matched, would be:
s/\s\K\s*//g
The \K (keep) escape sequence will keep the initial whitespace character from being removed, but all subsequent whitespace will be removed. If you do not care about preserving the type of whitespace, the solution already given by Tim is the way to go, i.e.:
s/\s+/ /g
\s stands for matching any whitespace. It's equivalent to this:
[\ \t\r\n\f]
When you replace with $string = s/\s+/\s/g;, you are replacing one or more whitespace characters with the letter s. Here's a link for reference: http://perldoc.perl.org/perlrequick.html
Why doesn't the regex with \s work?
Your regex with \s does work. What doesn't work is your replacement string. And, of course, as others have pointed out, it shouldn't.
People get confused about the substitution operator (s/.../.../). Often I find people think of the whole operator as "a regex". But it's not, it's an operator that takes two arguments (or operands).
The first operand (between the first and second delimiters) is interpreted as a regex. The second operand (between the second and third delimiters) is interpreted as a double-quoted string (of course, the /e option changes that slightly).
So a substitution operation looks like this:
s/REGEX/REPLACEMENT STRING/
The regex recognises special characters like ^ and + and \s. The replacement string doesn't.
If people stopped misunderstanding how the substitution operator is made up, they might stop expecting regex features to work outside of regular expressions :-)

Wildcard beginning of a line in perl

How to use wildcard for beginning of a line?
Example, I want to replace abc with def.
This is what my file looks like
abc
abc
abc
hg abc
Now I want that abc should be replaced in only first 3 lines. How to do it?
$_ =~ s/['\s'] * abc ['\s'] * /def/g;
What condition to be put before beginning of first space?
Thanks
What about:
s/(^ *)abc/$1def/g
(^ *) -> zero or morespaces at start of line
This will strictly replace abc with def.
Also note I've used a real space and not \s because you said "beginning of first space". \s matches more characters than only space.
You are making a couple of mistakes in your regex
$_ =~ s/['\s'] * abc ['\s'] * /def/g;
You don't need /g (global, match as many times as possible) if you only want to replace from the beginning of the string (since that can only match once).
Inside a character class bracket all characters are literal except ], - and ^, so ['\s'] means "match whitespace or apostrophe '"
Spaces inside the regex is interpreted literally, unless the /x modifier is used (which it is not)
Quantifiers apply to whatever they immediately precede, so \s* means "zero or more whitespace", but \s * means "exactly one whitespace, followed by zero or more space". Again, unless /x is used.
You do not need to supply $_ =~, since that is the variable any regex uses unless otherwise specified.
If you want to replace abc, and only abc when it is the first non-whitespace in a line, you can do this:
s/^\s*\Kabc/def/
An alternate for the \K (keep) escape is to capture and put back
s/^(\s*)abc/$1def/
If you want to keep the whitespace following the target string abc, you do not need to do anything. If you want it removed, just add \s* at the end
s/^\s*\Kabc\s*/def/
Also note that this is simply a way to condense logic into one statement. You can also achieve the same by using very simple building blocks:
if (/^\s*abc/) { # if abc is the first non-whitespace
s/abc/def/; # ...substitute it
}
Since the substitution only happens once (if the /g modifier is not used), and only the first match is affected, this will flawlessly substitute abc for def.
Try this:
$_ =~ s/^['\s'] * abc ['\s'] * /def/g;
If you need to check from start of a line then use ^.
Also, I am not sure why you have ' and spaces in your regex. This should also work for you:
$_ =~ s/^[\s]*abc[\s]*/def/g;
Use ^ character, and remove unnecessary apostrophes, spaces and [ ] :
$_ =~ s/^\s*abc/def/g
If you want to keep those spaces that were before the "abc":
$_ =~ s/^(\s*)abc/\1def/g

perl regex - quantifier * not greedy enough to pickup the newline at end of string

Is it not quantifier * , greedy ? Should not \s* match 0 or more occurence of white spaces,and which in turn would match everything till end of the given input string ?
#!/usr/bin/perl
use strict;
use warnings;
my $input="Name : www.devserver.com\n";
$input=~s/\w+.:\s*//; # /s* should not it match everthing till \n at the end ?
print $input;
Please help me understand this behaviour.
\s* will match only a string consisting entirely of characters of the same class (namely, whitespace).
In your case, there is www.devserver.com between the leading and trailing spaces.
You may have tried to use . class instead of \s:
$input=~s/\w+.:.*//;
This also wouldn't touch the trailing newline! According to perlre:
To simplify multi-line substitutions, the "." character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't.
So, wrapping it up: the behavior you are expecting can be reproduced with the following substitution:
$input=~s/\w+.:.*//s;

How can I get my Perl regex not to use special characters from an interpolated variable? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How can I escape meta-characters when I interpolate a variable in Perl's match operator?
I am using the following regex to search for a string $word in the bigger string $referenceLine as follows :
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$word\b)/g
The problem happens when my $word substring contains some (, etc. Because it takes it as a part of the regex rather than the string to match and gives the following error :
Unmatched ( in regex; marked by <-- HERE in
m/( <-- HERE ?=\b( darsheel safary\b)/
at ./bleu.pl line 119, <REFERENCE> line 1.
Can somone please tell me a solution to this? I think If I could somehow get perl to understand that we want to look for the whole $word as it is without evaluating it, it might work out.
Use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b\Q$word\E\b)/g
to tell the regex engine to treat every character in $word as a literal character.
\Q marks the start, \E marks the end of a literal string in Perl regex.
Alternatively, you could do
$quote_word = quotemeta($word);
and then use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$quote_word\b)/g
One more thing (taken up here from the comments where it's harder to find:
Your regex fails in your example case because of the word boundary anchor \b. This anchor matches between a word character and a non-word character. It only makes sense if placed around actual words, i. e. \bbar\b to ensure that only bar is matched, not foobar or barbaric. If you put it around non-words (as in \b( darsheel safary\b) then it will cause the match to fail (unless there is a letter, digit or underscore right before the ().