What do these Perl regular expressions mean?

chomp(); # Remove the newline.
$_ =~ s/\s*//; # Remove leading whitespace
$_ =~ s/\\//; # Remove the first backslash (the line connector)
I am not very familiar with Perl or regular expressions. I am modifying an existing Perl script that contains the above snippet. This chunk of code removes the newline character, spaces, and the trailing '\' used as a line connector.
My question is about this line:
$_ =~ s/\s*//;
I understand that anything between '/ /' is a regular expression. But,
1) What does the s preceding the '/ /' mean?
2) What does the second '/' at the end mean?

s means "do replace".
$_ =~ s/SEARCH/REPLACE/;
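So s/\s*//; finds the first match of \s* and removes it (replaces it with the empty string). Because \s* can also match an empty string, that first match is always at the very start of the string, so only leading whitespace is removed. To remove every whitespace character, add the /g modifier: s/\s+//g.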
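To make the difference concrete, here is a small sketch with made-up sample strings. It contrasts s/\s*// (first match only, which is the leading whitespace) with s/\s+//g, which removes every run of whitespace.

```perl
use strict;
use warnings;

# Without /g, s/// replaces only the first match. Because \s* can match an
# empty string, that first match is always at position 0.
my $leading = "  a b  c ";
$leading =~ s/\s*//;      # removes only the leading spaces
print "[$leading]\n";     # prints [a b  c ]

# /g repeats the substitution for every run of whitespace.
my $all = "  a b  c ";
$all =~ s/\s+//g;
print "[$all]\n";         # prints [abc]

# No leading whitespace: \s* matches the empty string at position 0,
# so nothing changes.
my $none = "a b";
$none =~ s/\s*//;
print "[$none]\n";        # prints [a b]
```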

The "s" is the substitution operator, so the expression reads: "substitute" / "what matches here" / "with what's here" /.
The "/" characters are the delimiters around the two parts: s/something_to_match/replacement/. '/' is by far the most common delimiter, but Perl allows pretty much any non-whitespace character (ideally one that doesn't occur in the pattern), and bracketing pairs such as s{...}{...} work too.
See: http://perldoc.perl.org/functions/s.html
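A short sketch of the delimiter point, using a hypothetical path string: the three substitutions below should give the same result, but only the first one needs every slash in the pattern escaped.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";

# With / as the delimiter, every slash inside the pattern must be escaped.
(my $a1 = $path) =~ s/\/usr\/local/\/opt/;

# With another delimiter, the slashes can be written as-is.
(my $a2 = $path) =~ s{/usr/local}{/opt};
(my $a3 = $path) =~ s#/usr/local#/opt#;

print "$a1\n$a2\n$a3\n";   # each line prints /opt/bin
```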

Related

perl match consecutive newlines: `echo "aaa\n\n\nbbb" | perl -pe "s/\\n\\n/z/gm"`

This works:
echo "aaa\n\n\nbbb" | perl -pe "s/\\n/z/gm"
aaazzzbbbz
This doesn't match anything:
echo "aaa\n\n\nbbb" | perl -pe "s/\\n\\n/z/gm"
aaa
bbb
How do I fix, so the regex matches two consecutive newlines?
A linefeed is matched by \n
echo "a\n\nb" | perl -pe's/\n/z/'
This prints azzbz without a trailing newline, so the next prompt appears on the same line. Note that the program is fed one line at a time, so there is no need for the /g modifier. (That is also why \n\n doesn't match: no single line contains two newlines.) The /m modifier is unrelated to this example.†
I don't know how this is used in practice, but presumably not with echo feeding the input. In that case it's better to test with input from a file or in a multi-line string (in which case /g may be needed).
An example
use warnings;
use strict;
use feature 'say';
# Test with multiline string
my $ml_str = "a\n\nb\n";
$ml_str =~ s/\n/z/g; #--> azzbz (no newline at the end)
print $ml_str;
say ''; # to terminate the line above
# Or to replace two consecutive newlines (everywhere)
$ml_str = "a\n\nb\n"; # restore the example string
$ml_str =~ s/\n\n/z/g; #--> azb\n
print $ml_str;
# To replace the consecutive newlines in a file read it into a string
my $file = join '', <DATA>; # lines of data after __DATA__
$file =~ s/\n\n/z/g;
print $file;
__DATA__
one
two

last
This prints
azzbz
azb
one
twoz
last
As a side note, with the /s modifier the . matches a newline as well. This is handy for matching substrings that may contain newlines with .* (or .+); without the /s modifier, that pattern stops at a newline.
See perlrebackslash and search for newline.
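A minimal sketch of the /s difference, on a made-up three-line string.

```perl
use strict;
use warnings;

my $text = "start\nmiddle\nend";

# Without /s, . does not match "\n", so .* cannot cross the line break.
my ($no_s) = $text =~ /start(.*)end/;

# With /s, . matches "\n" too, so .* spans the lines.
my ($with_s) = $text =~ /start(.*)end/s;

print defined($no_s) ? "matched\n" : "no match\n";   # prints: no match
print "[$with_s]\n";   # prints the captured text, including both newlines
```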
† The /m modifier makes ^ and $ also match at the beginning and end of each line inside a multi-line string. Then
$multiline_string =~ s/$/z/mg;
will add a z at the end of every line. However, it does not replace the newlines: $ is a zero-width assertion, so all the newlines stay in place.
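A sketch contrasting the two approaches on the example string used above ("a\n\nb\n"): s/$/z/mg inserts a z before the newlines, while s/\n/z/g replaces them.

```perl
use strict;
use warnings;

my $str = "a\n\nb\n";

# With /m, $ matches just before every "\n" and at the very end of the string.
# $ is zero-width, so this inserts a z at each of those spots; every newline
# is still present in the result.
(my $inserted = $str) =~ s/$/z/mg;

# To actually replace the newlines, match the "\n" character itself.
(my $replaced = $str) =~ s/\n/z/g;

print "$inserted\n$replaced\n";
```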
You are applying substitution to only one line at a time, and one line will never have two newlines. Apply the substitution to the entire file instead:
perl -0777 -pe 's/\n\n/z/g'
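Inside a script, the same idea can be sketched by undefining the input record separator $/, which is what -0777 does. The in-memory filehandle and the sample input are assumptions made so the example is self-contained; with a real file you would open its path instead.

```perl
use strict;
use warnings;

# Hypothetical input standing in for a file's contents.
my $input = "aaa\n\n\nbbb\n";

# Line by line (what -p does): no single record ever contains "\n\n".
my $line_by_line = '';
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    $line =~ s/\n\n/z/g;   # never matches here
    $line_by_line .= $line;
}
close $fh;

# Slurp mode (what -0777 does): with $/ undefined, one read returns everything.
my $slurped = do {
    open my $in, '<', \$input or die "Cannot open in-memory file: $!";
    local $/;              # undefined record separator = read the whole input
    <$in>;
};
$slurped =~ s/\n\n/z/g;

print $slurped;            # prints "aaaz" then "bbb" on the next line
```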

Search and replace a special character in perl

I want to search for a character and replace it with a string. First, I search for ':' and replace it with 'to'. Next, I want to search for '$' and replace it with 'END'. This is the code that I've tried. In the code below, it works for the first character but not the second. I tried using a backslash to escape the special character '$', but it still did not work. What else can I do?
$string = "[9:8],
if ($string =~ /^.*:+/){
$stringreplaced =~ s/:/to/g;
}
elsif ($string =~ /^.*\$+/){
$stringreplaced =~ s/\$/END/g;
}
First of all, the code you posted doesn't even compile, yet you say it actually ran. Only post code that you've run.
Second, you're matching against the wrong string. You're checking if $string contains the character, but you replace the characters in $stringreplaced. ALWAYS use use strict; use warnings;. This would have caught this error.
Third, you only check if the character (: or $) is on the first line. This is because . doesn't match line feeds without /s.
Finally, you only check whether the string contains $ if it doesn't contain :, because you used elsif.
The following is all you need:
$string =~ s/:/to/g;
$string =~ s/\$/END/g;
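A quick sketch with a hypothetical input string. Note the single quotes around the sample: in a double-quoted string, $5 would be interpolated as a variable, which is a common reason a dollar sign seems to "disappear" before the regex ever sees it.

```perl
use strict;
use warnings;

my $string = '[9:8] costs $5';   # hypothetical sample; single quotes keep $ literal

$string =~ s/:/to/g;      # every ':' becomes 'to'
$string =~ s/\$/END/g;    # \$ is a literal dollar sign, not the end-of-line anchor

print "$string\n";        # prints [9to8] costs END5
```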

regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me the correct results when I try it in the Notepad++ editor, but with the Perl program below I get wrong results. I'd appreciate the right answer and an explanation.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches(@ARGV);
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want phrases like "precious final" to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more ses rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and not escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
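A sketch of the qr// approach, with a few made-up words standing in for the dictionary file: the compiled pattern keeps \s and $ as regex syntax, so no extra escaping is needed.

```perl
use strict;
use warnings;

# qr// compiles the pattern; \s and $ keep their regex meaning.
my $pattern = qr/^Pre.*al\s*$/;

# Hypothetical lines as they would come from <$fp>, newline included.
my @words = ("Prenatal\n", "Preamble\n", "Pretrial\n");

# \s* absorbs the trailing newline, so no chomp is needed for the match.
my @matches = grep { $_ =~ $pattern } @words;
print @matches;            # prints Prenatal and Pretrial, one per line
```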
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches a word character, i.e. a letter, digit, or underscore, rather than any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
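Putting these suggestions together, here is a sketch over some hypothetical input lines (one with a Windows-style \r\n ending, one containing a hyphen that \w does not match).

```perl
use strict;
use warnings;

# Hypothetical lines, as they might be read from a file.
my @lines = ("Prenatal\r\n", "pretrial\n", "Pre-final\n", "Preamble\n");

my @found;
for my $line (@lines) {
    $line =~ s/[\r\n]+$//;                        # strip trailing CR/LF
    push @found, $line if $line =~ /^Pre\w*al$/i;  # /i also accepts "pre"
}

print join(',', @found), "\n";   # prints Prenatal,pretrial
```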

Regex not working, at least in command line

I have a regex:
($value) = $line =~ /\ABC(.+?)\#/;
For input, e.g.:
(32321213321) ABC 24432.232 #Junk
It is meant to capture the number between ABC and #.
When I run it through the command line, it returns a space. Through Padre, it returns a space + the number before #.
Is there something wrong with the regex?
In your regex, you have escaped the A. That turns it into an escape sequence: \A is an assertion that matches only at the beginning of the string (without the /m modifier, ^ does the same). So \ABC means "BC at the very start of the string", and since your string does not start with BC, the regex cannot match. You also have a redundant escape before #. The regex you need is
/ABC(.+?)#/
You can use:
$line =~ /ABC *([0-9. ]+?) *#/;
OR better:
$line =~ /ABC *(\d+(?:[. ]\d+)*) *#/;
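A sketch on the question's sample line. It first shows the corrected form of the original regex, which keeps the surrounding spaces in the capture, and then one that captures only the number. The \d+(?:\.\d+)? form assumes the value is a plain decimal number.

```perl
use strict;
use warnings;

my $line = '(32321213321) ABC 24432.232 #Junk';

# Without the stray backslash, ABC is matched literally anywhere in the line.
my ($value) = $line =~ /ABC(.+?)#/;     # captures ' 24432.232 ' (spaces included)

# Matching the digits explicitly leaves the surrounding spaces out.
my ($number) = $line =~ /ABC\s*(\d+(?:\.\d+)?)\s*#/;

print "[$value]\n";     # prints [ 24432.232 ]
print "$number\n";      # prints 24432.232
```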

Perl - remove first word in a string with regexps

I'm new to both Perl and regexes, and I'm trying to remove the first word in a string (or the first word of a line in a text file), along with any whitespace that follows it.
For example, if my string is 'one two abd123words', I want to remove 'one '.
The code I was trying is: $line =~/(\S)$/i;
but this only gives me the last word.
If it makes any difference, the word I'm trying to remove is an input, stored in $arg.
To remove the first word of each line use:
$line =~ s/^\S+\s*//;
EDIT, with an explanation:
s/.../.../ # Substitute command.
^ # (Zero-width) Beginning of the string.
\S+ # One or more non-whitespace characters.
\s* # Zero or more whitespace characters.
// # Substitute with nothing, i.e. remove them.
You mean, like this? :
my $line = 'one two abd123words';
$line =~ s/^\s*\S+\s*//;
# now $line is 'two abd123words'
(That removes any leading whitespace, followed by one or more non-whitespace characters, followed by any whitespace that comes after them.)
In one-liner form:
$ perl -pi.bak -e 's{^\s*\S+\s*}//' file.txt
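A sketch of the substitution above, plus two variations. The first removes a known word held in $arg, with \Q...\E quoting any regex metacharacters it might contain. The second uses \h* (horizontal whitespace only) so that a one-word line keeps its newline when lines are processed with their "\n" still attached, as under perl -p; with \s* that newline would be removed and the next line would be joined on. The sample strings and the value of $arg are hypothetical, and \h requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my $line = 'one two abd123words';

# Remove the first word, whatever it is.
(my $rest = $line) =~ s/^\s*\S+\s*//;              # 'two abd123words'

# Remove a specific word taken from input (hypothetical value).
my $arg = 'one';
(my $without_arg = $line) =~ s/^\s*\Q$arg\E\s*//;  # 'two abd123words'

# A one-word line that still ends in "\n": \h* leaves the newline alone.
my $one_word = "one\n";
(my $kept_nl = $one_word) =~ s/^\s*\S+\h*//;       # "\n"

print "$rest\n$without_arg\n";
```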