Perl regex to substitute a pattern excluding another pattern - regex

I have a string as below.
$line = 'this is my string "hello world"';
I want a regex that deletes all space characters in the string except those inside the quoted region "hello world".
I use the substitution below to delete space characters, but it deletes all of them.
$line=~s/ +//g;
How can I exclude the quoted region "hello world" so that I get the string below?
thisismystring"hello world"
Thanks

Since you probably want to handle quoted strings properly, you should have a look at the Text::Balanced module.
Use that to split your text into quoted parts and non-quoted parts, then do the replacement on the non-quoted parts only, and finally join the string together again.
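The split/replace/join idea above can be sketched without any module. This is only a rough illustration, not Text::Balanced itself: it swaps in the built-in split with a capturing group, assumes double quotes are balanced and never escaped, and the name strip_unquoted_spaces is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whitespace outside double-quoted parts.
# Assumes balanced, unescaped double quotes.
sub strip_unquoted_spaces {
    my ($line) = @_;

    # A capturing group makes split return the separators too, so the
    # list alternates: unquoted, quoted, unquoted, quoted, ...
    my @parts = split /("[^"]*")/, $line;

    for my $i (0 .. $#parts) {
        next if $i % 2;              # odd index: a quoted part, keep as-is
        $parts[$i] =~ s/\s+//g;      # even index: unquoted text
    }
    return join '', @parts;
}

print strip_unquoted_spaces('this is my string "hello world"'), "\n";
```

For input with escaped or nested quotes, a real parser such as Text::Balanced is the safer choice.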

Well, here's one way to do it:
use warnings;
use strict;
my $l = 'this is my string "hello world some" one two three "some hello word"';
$l =~ s/ +(?=[^"]*(?:"[^"]*"[^"]*)+$)//g;
print $l;
# thisismystring"hello world some"onetwothree"some hello word"
Demo.
But I really wonder whether it shouldn't be done the other way (by tokenizing the string, for example), especially if the quotes may be unbalanced.

Another regex to do it:
s/(\s+(".*?")?)/$2/g

#!/usr/bin/perl
use warnings;
use strict;
sub main {
my $line = 'this is my string "hello world"';
while ($line =~ /(\w*|(?:"[^"]*"))\s*/g) { print $1;}
print "\n";
}
main;

s/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)//g
Test the code here.

Related

Perl regular expression to get 'This text':(

For example, if I have 'Sample text':(
my $1 should be Sample text, i.e. any string that is enclosed in ' ' and comes right before :(
The following will do the trick:
/'([^']*)':\(/
For example,
my $str = "'Sample text':(";
my ($match) = $str =~ /'([^']*)':\(/
or die("No match\n");
say $match; # Sample text
Here is the simplest code to answer the question:
#!/usr/bin/perl
use strict;
"'Sample text':(" =~ /'(.*)':\(/;
print "$1\n"
I'm not adding anything new here but just some samples and a place for you to test it yourself and see perhaps how it works. There are two assumptions here:
No escape characters, such as 'Sam\'ple text'.
:( is directly after your string. If it's not, you'd want to do a lookahead instead.
+ requires at least one character in your string. So 'a':( would be valid but '':( would not. If you want to allow empty strings, use * instead of +.
'([^']+)':\(
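To see the difference between + and * described above, here is a small sketch. The helper names capture_plus and capture_star are invented for this illustration; each returns the captured text, or undef when the pattern does not match.

```perl
use strict;
use warnings;

# Hypothetical helpers: return the captured text, or undef on no match.
sub capture_plus { my ($s) = @_; return $s =~ /'([^']+)':\(/ ? $1 : undef }
sub capture_star { my ($s) = @_; return $s =~ /'([^']*)':\(/ ? $1 : undef }

for my $s ("'Sample text':(", "'':(") {
    my $plus = capture_plus($s);
    my $star = capture_star($s);
    # Brackets make an empty capture visible in the output.
    printf "%-18s +: %-14s *: %s\n", $s,
        defined $plus ? "[$plus]" : 'no match',
        defined $star ? "[$star]" : 'no match';
}
```

With the empty string '':( only the * version matches, and its capture is the empty string.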

Perl Regex unable to select word with Special character $

I tried to split a string into words (one of the words has the special character $) , but the split didn't work. I want to split the string below into "Test" "Str$ing"
$test = "Test Str$ing";
my @words = split(" ",$test);
print "@words";
print "-------1End------------\n";
foreach my $str (split /(\s)+/, $test) {
print "$str\n";
}
print "-------End------------\n";
I executed the above code and got the below result, and, as you can see, the 2nd word is only half there:
Test Str
-------1End------------
Test
Str
-------End------------
Any help on this?
In Perl, a dollar sign inside a double-quoted string triggers interpolation. So this assignment:
$test = "Test Str$ing";
sets $test to the string Test Str followed by the value of the variable $ing. If $ing is not set (and you don't have strictures enabled, which would cause the program to fail at this point), the result is just Test Str.
To get a literal dollar sign you have to escape it with a backslash, or use single quotes instead:
$test = "Test Str\$ing";
# or
$test = 'Test Str$ing';
In any case, the first line in your program, after the #! anyway, should be use strict;. Then Perl will catch these errors and blow up instead of silently letting you shoot yourself in the foot. For good measure, you should add use warnings; too:
#!/usr/bin/env perl
use strict;
use warnings;
$test = "Test Str$ing";
Watch what happens when I try to run the above:
Global symbol "$test" requires explicit package name (did you forget to declare "my $test"?) at foo.pl line 5.
Global symbol "$ing" requires explicit package name (did you forget to declare "my $ing"?) at foo.pl line 5.
Execution of foo.pl aborted due to compilation errors.
Your program should look more like this, with minimal changes to pass strict:
#!/usr/bin/env perl
use strict;
use warnings;
my $test = 'Test Str$ing';
my @words = split ' ', $test;
print "@words";
print "-------1End------------\n";
foreach my $str (split /(\s)+/, $test) {
print "$str\n";
}
print "-------End------------\n";
That still seems odd to me, as you're printing all the words on a single line without a newline before the 1End sentinel, and then printing each of them on a line with blank lines between (not really blank, though - containing the whitespace from the original string). But if that's what you want, the above works. Output:
Test Str$ing-------1End------------
Test
Str$ing
-------End------------
You can do it with simple quotation marks:
#!/usr/bin/perl
$test = 'Test Str$ing';
my @words = split(' ',$test);
print "@words";
Otherwise Perl considers $ing to be a variable (which is empty).
The best practice for assigning literal data to a variable is to use single quotes.
Ex:
my $var_name = 'TEST MESSAGE';
If the data contains a variable and is to be assigned to another variable, use double quotes. Data in double quotes will be interpolated by Perl.
Ex:
my $var1 = 'TEST';
my $var2 = "$var1 MESSAGE";

Matching multiline string in file using perl regex

I am reading in another Perl file and trying to find all strings surrounded by quotation marks within the file, single-line or multiline. I've matched all the single-line strings fine, but I can't match the multiline ones without printing the entire line out, when I just want the string itself. For example, here's a snippet of what I'm reading in:
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
so the output I'd like is
'Hello World!';
"chmod";
"This is a fun multiple line string, please match";
but I am getting:
'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
This is the code I am using to find the strings - all file content is stored in @contents:
my @strings_found = ();
my $line;
for(@contents) {
$line .= $_;
}
if($line =~ /(['"](.?)*["'])/s) {
push @strings_found,$1;
}
print @strings_found;
I am guessing I am only getting 'Hello World!'; correctly because I am using the $1 but I am not sure how else to find the others without looping line by line, which I would think would make it hard to find the multi line string as it doesn't know what the next line is.
I know my regex is reasonably basic and doesn't account for some caveats but I just wanted to get the basic catch most regex working before moving on to more complex situations.
Any pointers as to where I am going wrong?
A couple of big things: you need to search in a while loop with the g modifier on your regex, and you also need to turn off greedy matching for what's inside the quotes by using .*?.
use strict;
use warnings;
my $contents = do {local $/; <DATA>};
my @strings_found = ();
while ($contents =~ /(['"](.*?)["'])/sg) {
push @strings_found, $1;
}
print "$_\n" for @strings_found;
__DATA__
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
Outputs
'Hello World!'
"chmod"
"This is a fun
multiple line string, please match"
You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) finding words surround by quotations perl
Regexp matching (in Perl and generally) is greedy by default, so your regexp will match from the first ' or " to the last. Print the length of your @strings_found array; I think it will always be just 1 with the code you have.
Change it to be non-greedy by following the * with a ?
/(['"].*?["'])/s
I think.
It will work in a basic way. Regexps are kindof the wrong way to do this if you want a robust solution. You would want to write parsing code instead for that. If you have different quotes inside a string then greedy will give you the 1 biggest string. Non greedy will give you the smallest strings not caring if start or end quote are different.
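One way to make the closing quote match the opening one, still with a plain regex, is a backreference. This is a minimal sketch and not a full parser: it assumes quotes are never escaped, and find_quoted is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every quoted string whose closing quote
# is the same character as its opening quote.
sub find_quoted {
    my ($text) = @_;
    my @found;
    # Group 2 captures the opening quote; \2 requires the same one to close.
    # /s lets . cross newlines, /g walks through all matches.
    while ($text =~ /((['"]).*?\2)/sg) {
        push @found, $1;
    }
    return @found;
}

my $sample = qq{my \$a = 'it is "fine"';\nmy \$b = "two\nlines";\n};
print "$_\n" for find_quoted($sample);
```

Here the single-quoted string keeps its inner double quotes, because the match only ends at the next single quote.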
Read about greedy and non greedy.
Also note the /m multiline modifier, which changes where ^ and $ match; it is /s that lets . match a newline.
http://perldoc.perl.org/perlre.html#Regular-Expressions

Perl - Match string between two colons

My string looks like this
important stuff: some text 2: some text 3.
I want to only print "important stuff". So basically I want to print everything up to the first colon. I'm sure this is simple, but my regex foo is not so good.
Edit: Sorry I was doing something stupid and gave you a bad example line. It has been corrected.
Just restrict what you're matching to non-colons, [^:]*. Note, the ^ and : boundaries aren't actually needed, but they help document the intent behind the regex.
my $text = "important stuff: some text 2: some text 3.";
if ($text =~ /^([^:]*):/) {
print "$1";
}
Consider just splitting on the colon:
use strict;
use warnings;
my $string = 'important stuff: some text 2: some text 3.';
my $important = ( split /:/, $string )[0];
print $important;
Output:
important stuff
Well, assume it's a string
$test = "sass sg22gssg 22222 2222: important important :";
Assume you want all characters between the colons.
Wrong answer: $test =~ /:(.+):/; # thank you for the change from .{1,}
Corrected:
$test =~ /:([^:]*):/;
print $1; # the captured text; you can also assign it to a variable
$found = $1;
As a cheat sheet of regex in perl. cheat sheet
I did test it.

Match string in Perl with space or with no-space in it

$search_buffer="this text has teststring in it, it has a Test String too";
@waitfor=('Test string','some other string');
foreach my $test (@waitfor)
{
eval ('if (lc $search_buffer =~ lc ' . $test . ') ' .
'{' .
' $prematch = $`;' .
' $match = $&; ' .
' $postmatch = ' . "\$';" .
'}');
print "prematch=$prematch\n";
print "match=$match\n"; #I want to match both "teststring" and "Test String"
print "postmatch=$postmatch\n";
}
I need to print both teststring and Test String, can you please help? thanks.
my $search_buffer="this text has teststring in it, it has a Test String too";
my $pattern = qr/test ?string/i;
say "Match found: $1" while $search_buffer =~ /($pattern)/g;
That is a horrible piece of code you have there. Why are you using eval and trying to concatenate strings into code, remembering to interpolate some variables and forgetting about some? There is no reason to use eval in that situation at all.
I assume that you by using lc are trying to make the match case-insensitive. This is best done by using the /i modifier on your regex:
$search_buffer =~ /$test/i; # case insensitive match
In your case, you are trying to match some strings against another string, and you want to compensate for case and for possible whitespace inside. I assume that your strings are generated in some way, and not hard coded.
What you could do is simply to make use of the /x modifier, which will make literal whitespace inside your regex ignored.
Something that you should take into consideration is meta characters inside your strings. If you for example have a string such as foo?, the meta character ? will alter the meaning of your regex. You can disable meta characters inside a regex with the \Q ... \E escape sequence.
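To illustrate the metacharacter point, here is a small sketch comparing an interpolated string with and without \Q ... \E. The helper names raw_match and quoted_match are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helpers: return the matched text, or undef on no match.
sub raw_match    { my ($t, $n) = @_; return $t =~ /($n)/     ? $1 : undef }
sub quoted_match { my ($t, $n) = @_; return $t =~ /(\Q$n\E)/ ? $1 : undef }

my $needle = 'foo?';
for my $text ('fo bar', 'foo? bar') {
    # Without \Q the ? makes the second "o" optional;
    # with \Q it is a literal question mark.
    printf "%-10s raw: %-10s quoted: %s\n", "'$text'",
        raw_match($text, $needle)    // '(no match)',
        quoted_match($text, $needle) // '(no match)';
}
```

Without quoting, 'foo?' matches the text 'fo bar' because the ? is treated as a quantifier; with \Q it only matches the literal four characters.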
So the solution is
use strict;
use warnings;
use feature 'say';
my $s = "this text has teststring in it, it has a Test String too";
my @waitfor= ('Test string','some other string', '#test string');
for my $str (@waitfor) {
if ($s =~ /\Q$str/xi) {
say "prematch = $`";
say "match = $&";
say "postmatch = $'";
}
}
Output:
prematch = this text has teststring in it, it has a
match = Test String
postmatch = too
Note that I use
use strict;
use warnings;
These two pragmas are vital to learning how to write good Perl code, and there is no (valid) reason you should ever write code without them.
This would work for your specific example.
test\s?string
Basically it marks the space as optional [\s]?.
The problem that I'm seeing with this is that it requires you to know where exactly there might be a space inside the string you're searching.
Note: You might also have to use the case-insensitive flag which would be /Test[\s]?String/i
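If the search strings are not hard coded, one possible way around having to know where the space goes is to build the pattern from the phrase itself. This is only a sketch under an assumption: whitespace is made optional exactly where the phrase has whitespace between words. The name flexible_pattern is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'Test string' into a case-insensitive regex
# equivalent to /Test\s*string/i, with each word quoted via quotemeta.
sub flexible_pattern {
    my ($phrase) = @_;
    my $body = join '\s*', map { quotemeta($_) } split ' ', $phrase;
    return qr/$body/i;
}

my $search_buffer = "this text has teststring in it, it has a Test String too";
my $re = flexible_pattern('Test string');

# In list context, /g returns every captured match.
my @hits = $search_buffer =~ /($re)/g;
print "match: $_\n" for @hits;
```

This finds both teststring and Test String in the sample buffer; quotemeta keeps characters such as ? or # in a phrase from being treated as regex syntax.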