Perl regex - Having the delimiter as part of the string itself - regex

I have a long string in the format
id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56 ...
I am trying to extract the 'date' and 'comment' part out of this, based on the id, which is the input.
For example, if the input is id2, I'd want the comment as 'Does this work, too?' and date as '2014-08-04 13:22'. Here is the regex I have so far.
if($string =~ m/\b$id:(.*?);(.*,?)/){
my $date = $1;
my $comment = substr($2,0,-1); #to remove the last ,
}
Now since there is a ',' as part of the string itself, my regex treats it as a delimiter and just returns 'Does this work' as the comment, leaving out the ',too?' part.
Any help on how to handle a string that contains the delimiter within itself would be appreciated.

I think the best way to do this is to form a hash out of the string. If you start by splitting the string on any comma that's immediately followed by some alphanumeric characters and a colon then the commas within the comments will be ignored and most of your work is done.
Then just use a regex to divide each split into three chunks: the ID, the date/time, and the comment, and put them into a hash. After that you can get the date/time for an ID as $data{id1}[0] and the comment as $data{id1}[1]
This program demonstrates
use strict;
use warnings;
my $s = 'id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56 ...';
my %data;
for (split /,(?=\w+:)/, $s) {
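# Capture id, date/time and comment; skip chunks that lack a ';' (such as the truncated last one)
my ($id, $date, $comment) = /([^:]+):([^;]+);\s*(.+)/ or next;
$data{$id} = [ $date, $comment ];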
}
print $data{id2}[1], "\n";
output
Does this work,too?

$str = "id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56; bla";
$id = "id2";
# I need comma to set the end of the last "record"
$str = $str . ",";
if ($str =~ /$id:([\d\-\: ]+);([ \w\?\,]+)\,/) {
print "date = $1\n";
print "comment = $2\n";
}
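If only one id is needed, the lookahead idea from the split-based answer can also be used in a single match. The following is a minimal sketch, not a definitive solution: lookup_record is a hypothetical helper name, and it assumes that a comma followed by word characters and a colon never occurs inside a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, comment) for one id, or an empty list.
sub lookup_record {
    my ($string, $id) = @_;
    # The comment runs lazily up to a comma that starts the next "word:" record,
    # or to the end of the string for the last record.
    if ($string =~ /\b\Q$id\E:([^;]+);\s*(.*?)(?=,\w+:|\z)/s) {
        return ($1, $2);
    }
    return;
}

my $string = 'id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56; bla';
my ($date, $comment) = lookup_record($string, 'id2');
print "date = $date\ncomment = $comment\n";
```

The \Q...\E around $id keeps any regex metacharacters in the id from being interpreted as part of the pattern.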

Related

Extract string after a symbol in Perl

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.
That's because pattern matches in a scalar context are boolean tests. If you want to capture bracket content (capture groups), you need a list context. It's ok if the list is only one element though:
try this:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
The difference may be a bit subtle, but basically we are assigning 'all the hits' from the pattern match to a list that comprises one element.
Note - that's so you can do:
my @matches = $string =~ m/(.)/g;
And get multiple 'hits' returned. If you do as above, you will only get the first match - which is irrelevant given your pattern, but you can do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
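The scalar versus list context difference described above can be seen side by side in a short sketch. The helper name after_colon is made up for illustration, and the pattern is adjusted to skip the colon and any following whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the first colon, or undef.
sub after_colon {
    my ($string) = @_;
    # Parentheses on the left give list context, so the match returns its captures.
    my ($substring) = $string =~ /:\s*(.*)$/;
    return $substring;
}

my $string = 'day1: string over here';
my $flag   = $string =~ /:\s*(.*)$/;   # scalar context: only a success flag
print "flag      = $flag\n";
print "substring = ", after_colon($string), "\n";
```

The first print shows the success flag that the question kept seeing; the second shows the captured text.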
I usually use parentheses to extract a part from text and then refer to the result stored in $1 variable.
look at example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
but a similar result can be obtained with this code too:
my $text = "day1: string over here";
my ($a) = $text =~ /:\s*(.+)$/;
print $a;
You can also get the desired substring by using the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(':\s*', $string);
print $substring, "\n";
Output:
string over here
Or you can get this by using capturing group () in regex:
my $string = "day1: string over here";
$string =~ m/(.*)\:\s+(.*)$/;
my $substring = $2;
print $substring, "\n";

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for (@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the following Perl variables:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH variable (available with use English), or $& for short. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
next unless /$regex|^\s*description/i;
push @matches, $&; # $& is $MATCH under "use English"
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regex to the string ^\s*owner #.
Assign $sentence the result of matching $regex against the regular expression /^s*owner $/. That won't match, so $sentence is false; if it had matched, $sentence would be 1.
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
use feature 'say'; # needed for say
my $str = "hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $` $& $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
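As a rough sketch of the /p approach just described: the helper name split_around_match is hypothetical, and the pattern is passed as a plain string so that it is compiled together with the /p flag.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (before, matched, after), or an empty list.
# $pattern is a plain string so the /p flag applies when it is compiled here.
sub split_around_match {
    my ($text, $pattern) = @_;
    return unless $text =~ /$pattern/p;
    return (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

my ($pre, $match, $post) = split_around_match('abcdefghi', 'def');
print "$pre:$match:$post\n";
```

This prints the same three pieces as the $`, $& and $' example earlier in the thread.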

need regexp to help me extract data from within double-quotes

I have sought the answer to this question here in stackoverflow but can't get acceptable results. (Sorry!)
I have a data file that looks like this:
share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST
... from which I need to extract the values inside double-quotes. In other words, I'd like to get something like this:
share,SHARE1,/path/to/some/share/,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
The tricky part I've found is in trying to extract the data inside quotes. Suggestions made here have not worked for me, so I'm guessing I'm just doing it wrong. I also need to extract BOTH values from each line's double-quoted strings, not just the first one; I figure the remaining stuff could easily be parsed by splitting on whitespace.
In case it's relevant, I'm running this on a RHEL box and I need to pull it out with a regexp using Perl.
Thx!
One option is to treat your data as a CSV file and use Text::CSV_XS to parse it, setting the separator character to a space:
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new( { binary => 1, sep_char => ' ' } )
or die "Cannot use CSV: " . Text::CSV_XS->error_diag();
open my $fh, "<:encoding(utf8)", "data.txt" or die "data.txt: $!";
while ( my $row = $csv->getline($fh) ) {
print join ',', @$row;
print "\n";
}
$csv->eof or $csv->error_diag();
close $fh;
Output on your dataset:
share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
Hope this helps!
You can do this:
if literal quotes inside quotes are escaped with a backslash: share "SHA \" RE1" ...
$str =~ s/(?|"((?>[^"\\]++|\\{2}|\\.)*)"|()) /$1,/gs;
if literal quotes are escaped with another quote: share "SHA "" RE1" ...
$str =~ s/(?|"((?>[^"]++|"")*)"|()) /$1,/g;
if you are absolutely sure that there is no escaped quote between quotes in all your data:
$str =~ s/(?|"([^"]*)"|()) /$1,/g;
Try this.
[^\" ]*
It selects every char but the quotation marks and the spaces.
my $str = 'share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST';
$str =~ s/"?\s*"\s*/,/g;
print $str;
This regex replaces like below:
"space" = ,
"space = ,
space" = ,
"" = ,
Not sure if I understand the question; you say one thing in the text but the example says something different. Anyway, try this:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my @matches = $_ =~ /"(.*?)"/g;
print "@matches\n";
}
__DATA__
share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST
output:
$ ./p.pl
SHARE1 /path/to/some/share
SHARE2 /path/to/a/different/share with spaces in the dir name
#!/usr/bin/env perl
while(<>){
my @a = split /\s+\"|\"\s+/ , $_; # split on any spaces + ", or any " + spaces
for my $item ( @a ) {
if ( $item =~ /\"/ ) { # if there's a quote, remove
$item =~ s/\"//g;
} elsif ( $item !~ /\"/ ){ # else just replace spaces with comma
$item =~ s/\s+/,/g;
}
}
print join(",", @a);
print "\n";
}
output:
share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST,
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST,
Leave it to you to remove the last comma :)
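The regex ideas above can also be combined into a single tokenizing pass that leaves no trailing comma. This is only a sketch under the assumption that quoted values never contain escaped quotes; quoted_fields is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: splits one line into fields, treating "..." as one field.
sub quoted_fields {
    my ($line) = @_;
    my @fields;
    # Each token is either a double-quoted run (captured without the quotes)
    # or a run of non-whitespace characters.
    while ($line =~ /"([^"]*)"|(\S+)/g) {
        push @fields, defined($1) ? $1 : $2;
    }
    return @fields;
}

my $line = 'share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST';
print join(',', quoted_fields($line)), "\n";
```

Because the quoted alternative is tried first, spaces inside the quotes stay within one field.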

Regex parsing string substitution question

I would like to know if there is an easy way for parsing a string like this
set PROMPT = Yes, Master?
What I would like to do, is parse one part of this string up to the equal sign and parse the second part after the equal sign into another string.
Something like...
$phrase = 'set PROMPT = Yes, Master?';
@parts = split /=/, $phrase;
or
($set, $value) = split /=/, $phrase, 2;
[updated] Changes per comments.
Try matching this regex /\s*set\s*(\w+)\s*=\s*(.*)\s*$/ and setting the parts with $1 and $2:
my $str = 'set PROMPT = Yes, Master?';
my ($k, $v);
($k, $v) = ($1, $2) if $str =~ /\s*set\s*(\w+)\s*=\s*(.*)\s*$/;
print "OK: k=$k, v=$v\n"; # prints: OK: k=PROMPT, v=Yes, Master?
while ($subject =~ m/([^\s]+)\s*=\s*([^\$]+)/img) {
# $1 = $2
}

In Perl, how can I get the matched substring from a regex?

My program reads other programs' source code and collects information about the SQL queries they use. I have a problem with getting a substring.
...
$line = <FILE_IN>;
until( ($line =~m/$values_string/i && $line !~m/$rem_string/i) || eof )
{
if($line =~m/ \S{2}DT\S{3}/i)
{
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
$line =~/\S{2}DT\S{3}/i;
print $line."\n";
...
As a result, print outputs the whole line and not the substring I expect. I tried a different approach, but I seldom use Perl and am probably making a basic conceptual error. (The position of the table name in the line is not fixed. Another problem is multiple occurrences, i.e. [... SELECT * FROM AADTTAB, BBDTTAB, ...].) How can I obtain that substring?
Use grouping with parentheses and store the first group.
if( $line =~ /(\S{2}DT\S{3})/i )
{
my $substring = $1;
}
The code above fixes the immediate problem of pulling out the first table name. However, the question also asked how to pull out all the table names. So:
# FROM\s+ match FROM followed by one or more spaces
# (.+?) match (non-greedy) and capture any character until...
# (?:x|y) match x OR y - next 2 matches
# [^,]\s+[^,] match non-comma, 1 or more spaces, and non-comma
# \s*; match 0 or more spaces followed by a semi colon
if( $line =~ /FROM\s+(.+?)(?:[^,]\s+[^,]|\s*;)/i )
{
# $1 will be table1, table2, table3
my @tables = split(/\s*,\s*/, $1);
# delim is a space/comma
foreach (@tables)
{
# $_ = table name
print $_ . "\n";
}
}
Result:
If $line = "SELECT * FROM AADTTAB, BBDTTAB;"
Output:
AADTTAB
BBDTTAB
If $line = "SELECT * FROM AADTTAB;"
Output:
AADTTAB
Perl Version: v5.10.0 built for MSWin32-x86-multi-thread
I prefer this:
my ( $table_name ) = $line =~ m/(\S{2}DT\S{3})/i;
This
scans $line and captures the text corresponding to the pattern
returns "all" the captures (1) to the "list" on the other side.
This pseudo-list context is how we catch the first item in a list. It's done the same way as parameters passed to a subroutine.
my ( $first, $second, @rest ) = @_;
my ( $first_capture, $second_capture, @others ) = $feldman =~ /$some_pattern/;
NOTE:: That said, your regex assumes too much about the text to be useful in more than a handful of situations. Not capturing any table name that doesn't have dt as in positions 3 and 4 out of 7? It's good enough for 1) quick-and-dirty, 2) if you're okay with limited applicability.
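The list-context assignment shown above also extends to the multiple-occurrence case from the question: with the /g modifier, a match in list context returns the capture from every match. A minimal sketch follows, assuming table names are seven ASCII letters with DT in positions 3 and 4; table_names is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every 7-letter name with "DT" in positions 3-4.
sub table_names {
    my ($line) = @_;
    # /g in list context returns the capture from every match, not just the first.
    return $line =~ /\b([A-Z]{2}DT[A-Z]{3})\b/gi;
}

my @tables = table_names('SELECT * FROM AADTTAB, BBDTTAB, OTHER');
print "$_\n" for @tables;
```

The helper should be called in list context; in scalar context the match would only report success or failure.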
It would be better to match the pattern if it follows FROM. I assume table names consist solely of ASCII letters. In that case, it is best to say what you want. With those two remarks out of the way, note that a successful capturing regex match in list context returns the matched substring(s).
#!/usr/bin/perl
use strict;
use warnings;
my $s = 'select * from aadttab, bbdttab';
if ( my ($table) = $s =~ /FROM ([A-Z]{2}DT[A-Z]{3})/i ) {
print $table, "\n";
}
__END__
Output:
C:\Temp> s
aadttab
Depending on the version of perl on your system, you may be able to use a named capturing group which might make the whole thing easier to read:
if ( $s =~ /FROM (?<table>[A-Z]{2}DT[A-Z]{3})/i ) {
print $+{table}, "\n";
}
See perldoc perlre.
Parens will let you grab part of the regex into special variables: $1, $2, $3...
So:
$line = ' abc andtabl 1234';
if($line =~m/ (\S{2}DT\S{3})/i) {
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
print $1."\n";
}
Use a capturing group:
$line =~ /(\S{2}DT\S{3})/i;
my $substr = $1;
$& contains the string matched by the last pattern match.
Example:
$str = "abcdefghijkl";
$str =~ m/cdefg/;
print $&;
# Output: "cdefg"
So you could do something like
if($line =~m/ \S{2}DT\S{3}/i) {
print $&."\n";
}
WARNING:
If you use $& in your code, it will slow down all pattern matches (on Perl versions before 5.20).