Extract string after a symbol in Perl - regex

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.

That's because pattern matches in a scalar context are boolean tests. If you want to capture bracket content (capture groups), you need a list context. It's ok if the list is only one element though:
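Try this:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
The difference may be a bit subtle, but basically we are assigning all the hits from the pattern match to a list that comprises one element. Note that the colon sits inside the parentheses here, so it is captured as well; move it outside the group, as in /:\s*(.*)$/, to get only the text after it.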
Note that this is so you can do:
my @matches = $string =~ m/(.)/g;
and get multiple hits returned. If you do as above, you will only get the first match, which is irrelevant given your pattern, but you can also do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
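A minimal sketch of the scalar versus list context difference described above, using the example string from the question. The sub name after_colon is made up for illustration, and the pattern keeps the colon outside the parentheses so that it is not part of the result.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the first colon, or undef.
sub after_colon {
    my ($string) = @_;
    # The parentheses on the left force list context, so the match
    # returns the captured text rather than a true/false value.
    my ($substring) = $string =~ /:\s*(.*?)\s*$/;
    return $substring;
}

my $string = "day1: string over here";

# Scalar context: only reports whether the pattern matched.
my $matched = $string =~ /:\s*(.*?)\s*$/;

# List context (inside the helper): receives the capture group's contents.
my $substring = after_colon($string);

print "matched:   $matched\n";
print "substring: $substring\n";
```

If the string contains no colon, the helper returns undef, so check the result with defined before using it.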

I usually use parentheses to extract part of the text and then refer to the result stored in the $1 variable.
Look at this example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
But a similar result can be obtained with this code too:
my $text = "day1: string over here";
my ($a) = $text =~ /:\s*(.+)$/;
print $a;

You can also get the desired substring by using the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
# The limit of 2 keeps any later colons inside $substring.
my (undef, $substring) = split(/:\s*/, $string, 2);
print $substring, "\n";
Output:
string over here
Or you can get this by using a capturing group () in a regex:
my $string = "day1: string over here";
$string =~ m/(.*)\:\s+(.*)$/;
my $substring = $2;
print $substring, "\n";

Related

Applying Filters in Perl using Regex

I'm trying to extract text and numbers from a string using regex in perl. Here is my code:
$line = "finish=100\n";
($var) = $line =~ /[a-z]+/;
($val) = $line =~ /[0-9]+/;
My expected output is that $var = "finish" and $val = 100. However when I run the code $var = 1 and $val = 1.
Any help would be appreciated!!
Use capturing parentheses inside your regular expressions:
$line = "finish=100\n";
($var) = $line =~ /([a-z]+)/;
($val) = $line =~ /([0-9]+)/;
print "$var $val\n";
Refer to perlre
A regex match in list context (where the regex doesn't use the /g flag) returns:
- the empty list if it fails
- a list of captured substrings ($1, $2, ...) if it succeeds and the pattern contains capturing groups
- the list (1) if it succeeds and the pattern doesn't capture anything
Your regexes match, but they don't contain any capturing groups, so that's why you get 1 in $var and $val.
If you add capturing groups (/([a-z]+)/, /([0-9]+)/), you get the matched substrings instead.
Note that it might be easier to just do it all in one match:
my ($var, $val) = $line =~ /^([a-z]+)=([0-9]+)$/;
This way you also validate that the input string has the expected form and isn't just something like "Cat o' 9 tails", which (with your original regexes) would extract $var = "at" and $val = "9".
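The three return cases listed above can be checked with a short sketch like the following. The array names are only illustrative; the input line is the one from the question.

```perl
use strict;
use warnings;

my $line = "finish=100\n";

# No capturing groups: a successful match returns the list (1).
my @no_groups = $line =~ /[a-z]+/;

# Capturing groups: a successful match returns the captured substrings.
my @groups = $line =~ /^([a-z]+)=([0-9]+)$/;

# Failed match (no uppercase letters at the start): the empty list.
my @failed = $line =~ /^([A-Z]+)=/;

print scalar(@no_groups), " element(s): @no_groups\n";
print scalar(@groups),    " element(s): @groups\n";
print scalar(@failed),    " element(s)\n";
```

Because a failed match yields an empty list, the assignment itself can be used as the condition of an if statement to tell success from failure.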
You can also get both values in one array, for example with this:
$line = "finish=100\n";
@matches = $line =~ /(\w+)\W(\d+)/;
print "$matches[0], $matches[1]";

Regex Word Boundary in Perl not yield expected results

So I'm having an issue with pulling data from a string between 2 keywords. I understand that in regex I'm supposed to use the \b boundary tags and I've written the following as a test example, however it seems to only match the whole string instead of just the portion I want.
For example, the string: "here are more string words START OF INFORMATION SECTION some other stuff"
I am gathering text between "START" and "SECTION".
So I'm expecting "START OF INFORMATION SECTION", I believe.
This is the following snippet I have written in Perl specifically, but it doesn't yield the results I expected.
#!/usr/bin/perl
# This is perl 5, version 22, subversion 1 (v5.22.1) built for cygwin-thread-multi
use POSIX;
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
# this should provide me with the specific text between my two boundary words
$text =~ /\bSTART\b(.*?)\bSECTION\b/;
print "New String: $text\n";
Your code is simply testing whether the regex pattern matches the string, returning a true or false value to indicate whether there was a match. You discard that indicator.
If there was a match then the strings captured using parentheses in the regex pattern will be assigned to the capture variables $1, $2 etc.
It's unclear what you need to do, but this program prints everything between START and SECTION: in this case OF INFORMATION
There's no need for use POSIX, but use strict and use warnings 'all' are essential
#!/usr/bin/perl
use strict;
use warnings 'all';
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String: $section\n";
}
output
Original String: here are more string words START OF INFORMATION SECTION some other stuff
New String: OF INFORMATION
You should use this
$text =~ /\b(START\b(.*?)\bSECTION)\b/;
print "New String: $1\n";
$1 is the first captured group.
As suggested by borodin
if ( $text =~ /\b(START\b(.*?)\bSECTION)\b/ ) {
    my $tmp = $1;
    print "New String: $tmp\n";
}
The match operator doesn't change the string it matches.
You can use either of the following to inspect the captured string:
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String: $section\n";
}
or
if ( my ($section) = $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    print "New String: $section\n";
}
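As a sketch of the same idea wrapped in a reusable form: the helper below (between_words is a made-up name, not part of any module) returns the text between two keywords and leaves the original string untouched. It also trims the spaces next to the keywords, which the patterns above keep inside the capture, and it assumes both keywords start and end with word characters so that \b behaves as expected.

```perl
use strict;
use warnings;

# Hypothetical helper: text between two whole words, without the
# whitespace that touches them. Returns undef if there is no match.
sub between_words {
    my ($text, $first, $last) = @_;
    # \Q...\E quotes the keywords so they are matched literally.
    my ($inner) = $text =~ /\b\Q$first\E\b\s*(.*?)\s*\b\Q$last\E\b/;
    return $inner;
}

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
my $section = between_words($text, 'START', 'SECTION');

print "Original String: $text\n";
print "New String: $section\n" if defined $section;
```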

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
    if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
        my $name = $1;
        print " processing Email for $name \n";
        push @Qmail, $msg->{Body};
    }
}
for (@Qmail) {
    next unless /$regex|^\s*description/i;
    print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the Perl variables below to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you use the /p flag on the regex).
For the whole last match you're looking at the $MATCH variable (the use English name for $&). This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
    next unless /$regex|^\s*description/i;
    push @matches, $MATCH;    # $MATCH requires "use English;" at the top of the script
    print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regex to the string ^\s*owner #.
Assign $sentence to the value of running a match within $regex with the regular expression /^s*owner $/ (which won't match; if it did, $sentence would be 1, but since it didn't, it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $`, $& or $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
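A minimal sketch of the /p approach, reusing the sample string from the earlier answer. The variable names $before, $match and $after are only illustrative; ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} are available from Perl 5.10 onwards.

```perl
use strict;
use warnings;

my $str = "hello what world";

# Defaults that survive a failed match, so the values can be used
# later, outside the if block.
my ($before, $match, $after) = ('', 'no match found', '');

if ( $str =~ /what/p ) {
    $before = ${^PREMATCH};     # text before the match
    $match  = ${^MATCH};        # the matched text itself
    $after  = ${^POSTMATCH};    # text after the match
}

print "[$before][$match][$after]\n";
```

Copying the values into ordinary lexical variables right after the match keeps them safe from being overwritten by a later successful match.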

perl regular expression finding pattern only in the front of the text

Suppose there is a text like this:
|-SAMPLE-D2
|---SAMPLE-D1
|---SAMPLE3
I want to count the number of "-" after |.
I tried to parse that by using the following regular expression in perl
$count=()= /-/g;
but this is problematic because the first two have "-" elsewhere in the text as well as at the front. How should I form my regex, or what other function in Perl should I use, to get the number of "-" right after "|"?
Regex to match the dashes after the starting |:
/^\|([\-]*)/
To count dashes that are not preceded by a letter, use a negative look-behind assertion.
$count = () = /(?<!\w)-/g;
If the vertical line only ever comes at the start you can get the string of repeating minuses with:
my ($match) = $txt =~ /^\|(-*)/;
The brackets around $match cause the captured portion of the regex to be put into it.
Then get the number of minuses using:
my $minus_count = length($match || '');
The || '' bit supplies an empty string if the regex above found no match at all, to stop length complaining about an uninitialised value (if you have warnings on).
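Put together, the approach above might look like the sketch below, run against the three sample lines from the question. The sub name count_leading_dashes is hypothetical, and it assumes the vertical bar is always the first character of the line.

```perl
use strict;
use warnings;

# Hypothetical helper: number of "-" characters directly after a leading "|".
sub count_leading_dashes {
    my ($line) = @_;
    # List context: $dashes receives the run of dashes, or undef
    # when the line does not start with "|".
    my ($dashes) = $line =~ /^\|(-*)/;
    return defined $dashes ? length $dashes : 0;
}

for my $line ( '|-SAMPLE-D2', '|---SAMPLE-D1', '|---SAMPLE3' ) {
    printf "%-14s %d\n", $line, count_leading_dashes($line);
}
```

The dashes later in the line (as in SAMPLE-D2) are never counted, because the pattern is anchored at the start and the capture stops at the first character that is not a dash.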
Not sure if you can count in a regex directly, but you can extract capture groups and do simple arithmetic with their string lengths:
#!/usr/bin/perl
use warnings;
my $inFile = $ARGV[0];
open(FILEHANDLE, "<", $inFile) || die("Could not open file ".$inFile);
my @fileLines = <FILEHANDLE>;
my $lineNo = 0;
my $rslt;
foreach my $line (@fileLines) {
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    $lineNo++;
    print "\n".$lineNo." = <".$line.">";
    if($line =~ m/^\|-+(.+)/) {
        my $text = $1;
        print "\n\ttext = <".$text.">";
        # total length minus the text minus the leading "|"
        my $minCnt = length($line) - length($text) - 1;
        print "\n\tminus count = <".$minCnt.">";
    }
}
close(FILEHANDLE);

perl regex match closest

I'm trying to match from the last item closest to a final word.
For instance, closest b to dog
"abcbdog"
Should be "bdog"
But instead I'm getting "bcbdog"
How can I only match from the last occurrence "b" before "dog"
Here is my current regex:
/b.*?dog/si
Thank you!
Regexes want to go from left to right but you want to go from right to left so just reverse your string, reverse your pattern, and reverse the match:
my $search_this = 'abcbdog';
my $item_name = 'dog';
my $find_closest = 'b';
my $pattern = reverse($item_name)
. '.*?'
. reverse($find_closest);
my $reversed = reverse($search_this);
$reversed =~ /$pattern/si;
my $what_matched = reverse($&);
print "$what_matched\n";
# prints bdog
Try this:
/b[^b]*dog/si
Match b, then anything that isn't a b (including nothing), and then dog.
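The [^b] trick only works when the start marker is a single character. As a sketch of how the same idea could be extended to a longer start string, the helper below (closest_match is a made-up name) replaces [^b]* with a negative lookahead that refuses to step over another occurrence of the start string. Both markers are quoted with \Q...\E, so they are treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the shortest substring that begins with
# $start, ends with $end, and contains no further occurrence of $start.
sub closest_match {
    my ($text, $start, $end) = @_;
    my ($found) = $text =~ /
        ( \Q$start\E                 # the start marker
          (?: (?! \Q$start\E ) . )*? # any characters, but never another start marker
          \Q$end\E                   # the end marker
        )
    /xsi;
    return $found;
}

print closest_match('abcbdog', 'b', 'dog'), "\n";
print closest_match('abcbmoretextdog', 'b', 'dog'), "\n";
```

For a single-character start such as b this behaves like /b[^b]*dog/si; the lookahead form is only needed when the start marker is longer than one character.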
TIMTOWTDI:
This method can even find multiple matches through the string, or may be optimized if the start or end words will be more common. Edit: Now uses zero-width matches to avoid removing then adding the start and end strings.
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10; #say
my $string = 'abcbdog';
my $start = 'b';
my $end = 'dog';
my @found =
    grep { s/(?<=$end).*// }
    split( /(?=$start)/, $string );
say for @found;
When you don't already know what the last character before dog is, this just works:
my $str = 'abcbdog';
my @r = $str =~ /(.dog)/;
print @r;
Prints bdog
The accepted answer seems a little complicated. If you're just trying to match the 'b' closest to 'dog', including dog, you just need to make the part of the pattern before the term you're looking for greedy. For example:
# First example
my $string1 = 'abcbdog';
if ( $string1 =~ /.+(b.*dog)/ ) {
    print $1;
    # Returns 'bdog'
}
# Second example, different string, same regex.
my $string2 = 'abcbmoretextdog';
if ( $string2 =~ /.+(b.*dog)/ ) {
    print $1;
    # Returns 'bmoretextdog'
}
Or am I missing something? If you want to change the captured string to match what you want, just shift the brackets.
Try this code:
~/.* b.*?dog/si