Perl Regex unable to select word with Special character $ - regex

I tried to split a string into words (one of the words contains the special character $), but the split didn't work. I want to split the string below into "Test" and "Str$ing":
$test = "Test Str$ing";
my @words = split(" ",$test);
print "@words";
print "-------1End------------\n";
foreach my $str (split /(\s)+/, $test) {
print "$str\n";
}
print "-------End------------\n";
I executed the above code and got the below result, and, as you can see, the 2nd word is only half there:
Test Str
-------1End------------
Test
Str
-------End------------
Any help on this?

In Perl, a dollar sign inside a double-quoted string triggers interpolation. So this assignment:
$test = "Test Str$ing";
sets $test to the string Test Str followed by the value of the variable $ing. If $ing is not set (and you don't have strictures enabled, which would cause the program to fail at this point), the result is just Test Str.
To get a literal dollar sign you have to escape it with a backslash, or use single quotes instead:
$test = "Test Str\$ing";
# or
$test = 'Test Str$ing';
In any case, the first line in your program, after the #! anyway, should be use strict;. Then Perl will catch these errors and blow up instead of silently letting you shoot yourself in the foot. For good measure, you should add use warnings; too:
#!/usr/bin/env perl
use strict;
use warnings;
$test = "Test Str$ing";
Watch what happens when I try to run the above:
Global symbol "$test" requires explicit package name (did you forget to declare "my $test"?) at foo.pl line 5.
Global symbol "$ing" requires explicit package name (did you forget to declare "my $ing"?) at foo.pl line 5.
Execution of foo.pl aborted due to compilation errors.
Your program should look more like this, with minimal changes to pass strict:
#!/usr/bin/env perl
use strict;
use warnings;
my $test = 'Test Str$ing';
my @words = split ' ', $test;
print "@words";
print "-------1End------------\n";
foreach my $str (split /(\s)+/, $test) {
print "$str\n";
}
print "-------End------------\n";
That still seems odd to me, as you're printing all the words on a single line without a newline before the 1End sentinel, and then printing each of them on a line with blank lines between (not really blank, though - containing the whitespace from the original string). But if that's what you want, the above works. Output:
Test Str$ing-------1End------------
Test
Str$ing
-------End------------
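The extra whitespace-only lines come from the capture group in the split pattern. A minimal sketch of the difference, assuming the same sample string (the array names @plain and @with_sep are only illustrative):

```perl
use strict;
use warnings;

my $test = 'Test Str$ing';

# No capture group: only the words come back.
my @plain = split ' ', $test;

# Capture group in the pattern: the captured separator is returned as a field too.
my @with_sep = split /(\s)+/, $test;

print scalar(@plain),    " fields: ", join('|', @plain),    "\n";
print scalar(@with_sep), " fields: ", join('|', @with_sep), "\n";
```

If you only want the words, dropping the parentheses (split /\s+/, $test) or using split ' ' avoids the separator fields.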

You can do it with simple quotation marks:
#!/usr/bin/perl
$test = 'Test Str$ing';
my @words = split(' ',$test);
print "@words";
Otherwise Perl considers $ing to be a variable (which is empty).

The best practice when assigning literal data to a variable is to use single quotes.
Ex:
my $var_name = 'TEST MESSAGE';
If the data contains a variable whose value should be expanded, use double quotes instead. Perl interpolates variables inside double-quoted strings.
Ex:
my $var1 = 'TEST';
my $var2 = "$var1 MESSAGE";
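The quoting rules above can be put side by side in a short sketch. The variable names here are hypothetical and chosen only to show each form:

```perl
use strict;
use warnings;

my $name = 'ing';

my $interpolated = "Test Str$name";   # $name is expanded inside double quotes
my $escaped      = "Test Str\$ing";   # the backslash keeps the dollar sign literal
my $single       = 'Test Str$ing';    # single quotes never interpolate
my $generic      = q{Test Str$ing};   # q{} behaves like single quotes

print "$_\n" for $interpolated, $escaped, $single, $generic;
```

Only the first string changes; the other three all contain a literal dollar sign.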

Related

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for(@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the Perl variables below to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH variable (available when you use English; $& is the short form). This is covered in the perlvar manual page.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
next unless /$regex|^\s*description/i;
push @matches, $&; # or $MATCH if you "use English"
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
I'll step through that as:
Assign the string ^\s*owner # to $regex.
Assign $sentence the result of matching $regex against the regular expression /^\s*owner #/ (which won't match; if it did, $sentence would be 1, but since it doesn't, it is false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
use feature 'say';
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $`, $&, and $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
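A minimal sketch of the /p approach, assuming perl 5.10 or later (where /p was introduced). The variable names $pre, $match, and $post are only illustrative:

```perl
use strict;
use warnings;

my $str = 'abcdefghi';
my ($pre, $match, $post) = ('', 'no match found', '');

# /p asks perl to preserve the matched string and its surroundings.
if ($str =~ /def/p) {
    # Copy the values right away: the next successful match overwrites them.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "$pre:$match:$post\n";   # prints abc:def:ghi
```

Because the values are copied into ordinary lexical variables, they stay available after the if block ends.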

Perl regex: How do you match multiple words in perl?

I'm writing a small script which is supposed to match all strings within another file (words in between "" and '', including the "" and '' symbols as well).
Below is the regex statement I am currently using; however, it only produces results for '(.*)' and not "(.*)":
my @string_matches = ($file_string =~ /'(.*)' | "(.*)"/g);
print "\n@string_matches";
Also, how would I be able to include the "" or '' symbols in the results as well (print out "string" instead of just string)?
I've tried searching online but couldn't find any material on this.
$file_string is basically a string version of an entire file.
Use this: '(.*?)' | "(.*?)"
I guess the greedy quantifier is selecting your string up to the last '. Make it lazy.
IMHO, use this regex:
['"][^'"]*?['"]
This will also solve your problem of not getting the quotes inside the match.
Demo here: http://regex101.com/r/dI6gD7
#!/usr/local/bin/perl
open my $fh, '<', "strings.txt"; #read the content of the file and assign it to $string;
read $fh, my $string, -s $fh;
close $fh;
while ($string =~ m/^['"]{1}(.*?)['"]{1,}$/mg) {
print $&;
}
You could use '[^']*' to match a string between single quotes, "[^"]*" for double quotes.
If you want to support other features, such as escape sequence, then you should consider using modules Text::ParseWords or Text::Balanced.
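As a rough sketch of the module route, Text::ParseWords (a core module) exports quotewords, which splits on a delimiter while keeping quoted sections together. The sample text and the filtering step below are assumptions for illustration, not part of the question:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $file_string = q{Test "test in double quotes" test 'test in single quotes' test};

# Split on whitespace; the true second argument keeps the quote characters.
my @tokens = quotewords('\s+', 1, $file_string);

# Keep only the tokens that begin with a quote character.
my @string_matches = grep { defined && /^['"]/ } @tokens;

print "$_\n" for @string_matches;
```

This treats the input like a shell-style word list, so it suits simple text better than real source code, where a quote can appear in the middle of a token.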
Note:
Because * is greedy, '.*' will match all characters between the first and last single quote. If your string has more than one single-quoted substring, this will give only one match instead of several.
You can use ('[^']*') instead of '([^']*)' to capture the single quotes and the substring between them; double quotes work the same way.
Because '[^']*' and "[^"]*" cannot both match at the same position, m/('[^']*')|("[^"]*")/ with /g will put some undefs in the returned list in list context. Using m/('[^']*'|"[^"]*")/g fixes this problem.
Here is a test program:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(switch say);
use Data::Dumper;
my $file_string = q{Test "test in double quotes" test 'test in single quotes' and "test in double quotes again" test};
my @string_matches = ($file_string =~ /('[^']*'|"[^"]*")/g);
local $" = "\n";
print "@string_matches\n";
Testing:
$ perl t.pl
"test in double quotes"
'test in single quotes'
"test in double quotes again"

Matching multiline string in file using perl regex

I am reading in another Perl file and trying to find all strings surrounded by quotation marks within the file, single-line or multiline. I've matched all the single lines fine, but I can't match the multiline ones without printing the entire line out, when I just want the string itself. For example, here's a snippet of what I'm reading in:
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
so the output I'd like is
'Hello World!';
"chmod";
"This is a fun multiple line string, please match";
but I am getting:
'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
This is the code I am using to find the strings - all file content is stored in @contents:
my @strings_found = ();
my $line;
for(@contents) {
$line .= $_;
}
if($line =~ /(['"](.?)*["'])/s) {
push @strings_found,$1;
}
print @strings_found;
I am guessing I am only getting 'Hello World!'; correctly because I am using the $1 but I am not sure how else to find the others without looping line by line, which I would think would make it hard to find the multi line string as it doesn't know what the next line is.
I know my regex is reasonably basic and doesn't account for some caveats but I just wanted to get the basic catch most regex working before moving on to more complex situations.
Any pointers as to where I am going wrong?
A couple of big things: you need to search in a while loop with the g modifier on your regex, and you also need to turn off greedy matching for what's inside the quotes by using .*?.
use strict;
use warnings;
my $contents = do {local $/; <DATA>};
my @strings_found = ();
while ($contents =~ /(['"](.*?)["'])/sg) {
push @strings_found, $1;
}
print "$_\n" for @strings_found;
__DATA__
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
Outputs
'Hello World!'
"chmod"
"This is a fun
multiple line string, please match"
You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) finding words surround by quotations perl
Regexp matching (in Perl and generally) is greedy by default, so your regexp will match from the first ' or " to the last. Print the length of your @strings_found array; I think it will always be just 1 with the code you have.
Change it to be non-greedy by following * with a ?:
/(['"].*?["'])/s
I think.
It will work in a basic way. Regexps are kind of the wrong way to do this if you want a robust solution; you would want to write parsing code for that instead. If you have different quotes inside a string, greedy matching will give you the one biggest string. Non-greedy matching will give you the smallest strings, not caring whether the start and end quotes are different.
Read about greedy and non greedy.
Also note the /m multiline modifier.
http://perldoc.perl.org/perlre.html#Regular-Expressions
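One way to make the non-greedy match care about which quote opened the string is a backreference. This is a sketch under the assumption that strings contain no escaped quotes; the sample data is made up for illustration:

```perl
use strict;
use warnings;

my $contents = <<'END';
my $string = 'Hello World!';
my $string3 = "This is a fun
multiple line string, don't stop early";
END

my @strings_found;

# Group 2 captures the opening quote; \2 requires the same character to close.
# Group 1 wraps the whole quoted string, quotes included.
while ($contents =~ /((['"]).*?\2)/sg) {
    push @strings_found, $1;
}

print "$_\n" for @strings_found;
```

Without the backreference, the apostrophe in don't would end the second string early.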

Match string in Perl with space or with no-space in it

$search_buffer="this text has teststring in it, it has a Test String too";
@waitfor=('Test string','some other string');
foreach my $test (@waitfor)
{
eval ('if (lc $search_buffer =~ lc ' . $test . ') ' .
'{' .
' $prematch = $`;' .
' $match = $&; ' .
' $postmatch = ' . "\$';" .
'}');
print "prematch=$prematch\n";
print "match=$match\n"; #I want to match both "teststring" and "Test String"
print "postmatch=$postmatch\n";
}
I need to print both teststring and Test String, can you please help? thanks.
use feature 'say';
my $search_buffer="this text has teststring in it, it has a Test String too";
my $pattern = qr/test ?string/i;
say "Match found: $1" while $search_buffer =~ /($pattern)/g;
That is a horrible piece of code you have there. Why are you using eval and trying to concatenate strings into code, remembering to interpolate some variables and forgetting about some? There is no reason to use eval in that situation at all.
I assume that by using lc you are trying to make the match case-insensitive. This is best done with the /i modifier on your regex:
$search_buffer =~ /$test/i; # case insensitive match
In your case, you are trying to match some strings against another string, and you want to compensate for case and for possible whitespace inside. I assume that your strings are generated in some way, and not hard coded.
What you could do is simply make use of the /x modifier, which makes the regex ignore literal whitespace inside the pattern.
Something that you should take into consideration is metacharacters inside your strings. If, for example, you have a string such as foo?, the metacharacter ? will alter the meaning of your regex. You can disable metacharacters inside a regex with the \Q ... \E escape sequence.
So the solution is
use strict;
use warnings;
use feature 'say';
my $s = "this text has teststring in it, it has a Test String too";
my @waitfor= ('Test string','some other string', '#test string');
for my $str (@waitfor) {
if ($s =~ /\Q$str/xi) {
say "prematch = $`";
say "match = $&";
say "postmatch = $'";
}
}
Output:
prematch = this text has teststring in it, it has a
match = Test String
postmatch = too
Note that I use
use strict;
use warnings;
These two pragmas are vital to learning how to write good Perl code, and there is no (valid) reason you should ever write code without them.
This would work for your specific example.
test\s?string
Basically it marks the space as optional [\s]?.
The problem that I'm seeing with this is that it requires you to know where exactly there might be a space inside the string you're searching.
Note: You might also have to use the case-insensitive flag which would be /Test[\s]?String/i
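If you do not know in advance where the space might be, one option is to build the pattern from each search phrase. The sketch below is an assumption about what is wanted: it treats every space in the phrase as optional whitespace and quotes each word with quotemeta so metacharacters stay literal. The @found array is a hypothetical name:

```perl
use strict;
use warnings;

my $search_buffer = "this text has teststring in it, it has a Test String too";
my @waitfor = ('Test string', 'some other string');

my @found;
for my $phrase (@waitfor) {
    # Quote each word literally, then allow any amount of whitespace between words.
    my $pattern = join '\s*', map { quotemeta($_) } split ' ', $phrase;

    # /g in scalar context walks through every match; /i ignores case.
    while ($search_buffer =~ /($pattern)/gi) {
        push @found, $1;
    }
}

print "match=$_\n" for @found;
```

For the phrase 'Test string' the generated pattern is Test\s*string, which matches both teststring and Test String.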

Perl regex to substitute a pattern excluding another pattern

I have a string as below.
$line = 'this is my string "hello world"';
I want to have a regex to delete all space characters inside the string except the region "Hello world".
I use below to delete space chars but it deletes all of them.
$line=~s/ +//g;
How can I exclude the quoted region "hello world" so that I get the string below?
thisismystring"hello world"
Thanks
Since you probably want to handle quoted strings properly, you should have a look at the Text::Balanced module.
Use that to split your text into quoted parts and non-quoted parts, then do the replacement on the non-quoted parts only, and finally join the string together again.
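The split, replace, and join steps can be sketched without any module, using plain split with a capture group in place of Text::Balanced. This assumes the double quotes are balanced and that quoted parts contain no escaped quotes:

```perl
use strict;
use warnings;

my $line = 'this is my string "hello world"';

# The capture group makes split return the quoted pieces as separate fields.
my @parts = split /("[^"]*")/, $line;

for my $part (@parts) {
    next if $part =~ /^"/;   # leave quoted parts untouched
    $part =~ s/\s+//g;       # strip whitespace everywhere else
}

my $result = join '', @parts;
print "$result\n";   # prints thisismystring"hello world"
```

The loop variable is an alias for each array element, so the substitution changes @parts in place before the join.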
Well, here's one way to do it:
use warnings;
use strict;
my $l = 'this is my string "hello world some" one two three "some hello word"';
$l =~ s/ +(?=[^"]*(?:"[^"]*"[^"]*)+$)//g;
print $l;
# thisismystring"hello world some"onetwothree"some hello word"
Demo.
But I really wonder whether it shouldn't be done another way (by tokenizing the string, for example), especially if the quotes may be unbalanced.
Another regex to do it:
s/(\s+(".*?")?)/$2/g
#!/usr/bin/perl
use warnings;
use strict;
sub main {
my $line = 'this is my string "hello world"';
while ($line =~ /(\w*|(?:"[^"]*"))\s*/g) { print $1;}
print "\n";
}
main;
s/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)//g
Test the code here.