I'm currently learning about regular expressions and I'm trying to create a regex to match any legal variable name in Perl.
This is what I wrote so far:
^\$[A-Za-z_][a-zA-Z0-9_]*
The only problem is the regex returns true for special signs, for example the string $a& will return true.
What I did wrong?
Thanks!
Rotem
Parsing Perl is difficult, and the rules for what is and is not a variable are complicated. If you're attempting to parse Perl, consider using PPI instead. It can parse a Perl program and do things like find all the variables. PPI is what perlcritic uses to do its job.
If you want to try and do it anyway, here's some edge cases to consider...
$^F
$/
${^ENCODING}
$1
$élite # with utf8 on
${foo}
*{foo} = \42;
*{$name} = \42; # with strict off
${$name} = 42; # with strict off
And of course the other sigils #%*. And detecting if something is inside a single quoted string. Which is my way of strongly encouraging you to use PPI rather than try to do it yourself.
If you want practice, realistic practice is to pull the variable out of a larger string, rather than do exact matches.
# Match the various sigils.
my $sigils = qr{ [\$\#\%*] }x;
# Match $1 and #1 and so on
my $digit_var = qr{ $sigils \d+ }x;
# Match normal variables
my $named_var = qr{ $sigils [\w^0-9] \w* }x;
# Combine all the various variable matches
my $match_variable = qr{ ( $named_var | $digit_var ) }x;
This uses the () capture operator to grab just the variable. It also uses the /x modifier to make the regex easier to read and alternative delimiters to avoid leaning toothpick syndrome. Using \w instead of A-Z ensures that Unicode characters will be picked up when utf8 is on, and that they won't when its off. Finally, qr is used to build up the regex in pieces. Filling in the gaps is left as an exercise.
You need a $ at the end, otherwise it's just matches as far as it can and ignores the rest. So it should be:
^\$[A-Za-z_][A-Za-z0-9]*$
I needed to solve this problem to create a simple source code analyzer.
This subroutine extracts Perl user variables from an input section of code
sub extractVars {
my $line = shift;
chomp $line;
$line =~ s/#.*//; # Remove comments
$line =~ s/\s*;\s*$//; # Remove trailing ;
my #vars = ();
my $match = 'junk';
while ($match ne '') {
push #vars, $match if $match ne 'junk';
$match = '';
if ($line =~ s/(
[\#\$\%] # $#%
{? # optional brace
\$? # optional $
[\w^0-9] # begin var name
[\w\-\>\${}\[\]'"]* # var name
[\w}\]] # end var name
|
[\#\$\%] # $#%
{? # optional brace
\$? # optional $
[\w^0-9] # one letter var name
[}\]]? # optional brace or bracket
)//x) {
$match = $1;
next;
}
}
return #vars;
}
Test it with this code:
my #variables = extractVars('$a $a{b} $a[c] $scalar #list %hash $list[0][1] $list[-1] $hash{foo}{bar} $aref->{foo} $href->{foo}->{bar} #$aref %$hash_ref %{$aref->{foo}} $hash{\'foo\'} "$a" "$var{abc}"');
It does NOT work if the variable name contains spaces, for example:
$hash{"baz qux"}
${ $var->{foo} }[0]
Related
I am seeking advice on extracting a section of a string, that is always occurs as the first instance data between parenthesis using perl and regex and assign that value to a variable.
Here is the precise situation, I am using perl and regex to extract the courseID from a university catalog and assign it to a variable. Please consider the following:
BIO-2109-01 (12345) Introduction to Biology
CHM-3501-F2-01 (54321) Introduction to Chemistry
IDS-3250-01 (98765) History of US (1860-2000)
SPN-1234-02-F1 (45678) Spanish History (1900-2010)
The typical format is [course-section-name] [(courseID)] [courseName]
My goal is to create a script which can take each entry, one at a time, assign it to a variable and then use regex to extract only the courseID and assign only the courseID to a variable.
My approach has been to use search and replace to replace everything not matching that with '' and then saving what is left (the courseID) to the variable. Here are a few examples of what I have tried the following:
$string = "BIO-2109-01 (12345) Introduction to Biology";
($courseID = $string) =~ s/[^\d\d\d\d\d]//g;
print $courseID;
Result: 21090112345 --- printing the course-section-name and courseID
$string = "BIO-2109-01 (12345) Introduction to Biology";
$($courseID = $string) =~ s/[^\b\(\d{5}\)]\b//g;
print $courseID;
Result: 210901(12345) --- printing course-section-name, parens, and courseID
So I haven't had much luck with search and replace - however I found this nugget:
\(([^\)]+)\)
On http://regexr.com/ that will match the parens section. However, it would also match multiple parans, including for example (abc).
I'm not really sure at this point how to do something like this:
$string = "BIO-2109-01 (12345) Introduction to Biology";
($courseID = $string) =~ [magicRegex_goes_here];
print courseID;
result 12345
OR, better:
$string = IDS-3250-01 (98765) History of US (1860-2000)
($courseID = $string) =~ [magicRegex_goes_here];
print courseID;
result 98765
Any advice or direction would be greatly appreciated. I have tried everything I know and can research in regards to regex to solve this problem. If there is anymore information I can include please ask away.
UPDATE
use warnings 'all';
use strict;
use feature 'say';
my $file = './data/enrollment.csv'; #File this script generates
my $course = ""; #Complete course string [name-of-course] [(courseID)] [course_name]
my #arrayCourses = ""; #Array of courseIDs
my $i = ""; #i in for loop
my $courseID = ""; #Extracted course ID
my $userName = ""; #Username of person we are enrolling
my $action = "add,"; #What we are doing to user
my $permission = "teacher,"; #What permissions to assign to user
my $stringToPrint = ""; #Concatinated string to write to file
my $n = "\n"; #\n
my $c = ","; #,
#BEGIN PROGRAM
print "Enter the username \n";
chomp($userName = <STDIN>); #Get the enrollee username from user
print "\n";
print "Enter course name and press enter. Enter 'x' to end. \n"; #prompt for course names
while ($course ne 'x') {
chomp($course = <STDIN>);
if ($course ne "x") {
if (($courseID) = ($course =~ /[^(]+\(([^)]+)\)/) ) { #nasty regex to extract courseID - thnx PerlDuck and zdim
push #arrayCourses, $courseID; #put the courseID into array
}
else {
print "Cannot process last entry check it";
}
}
else {
last;
}
}
shift #arrayCourses; #Remove first entry from array - add,teacher,,username
open(my $fh,'>', $file); #open file
for $i (#arrayCourses) #write array to file
{
$stringToPrint= join "", $action, $permission, $i, $c, $userName, $n ;
print $fh $stringToPrint;
}
close $fh;
That'll do it! Suggestions or improvements are always welcome! Thanks #PerlDuck and #zdim
#!/usr/bin/env perl
use strict;
use warnings;
while( my $line = <DATA> ) {
if (my ($courseID) = ($line =~ /[^(]+\(([^)]+)\)/) ) {
print "course-ID = $courseID; -- line was $line";
}
}
__DATA__
BIO-2109-01 (12345) Introduction to Biology
CHM-3501-F2-01 (54321) Introduction to Chemistry
IDS-3250-01 (98765) History of US (1860-2000)
SPN-1234-02-F1 (45678) Spanish History (1900-2010)
Output:
course-ID = 12345; -- line was BIO-2109-01 (12345) Introduction to Biology
course-ID = 54321; -- line was CHM-3501-F2-01 (54321) Introduction to Chemistry
course-ID = 98765; -- line was IDS-3250-01 (98765) History of US (1860-2000)
course-ID = 45678; -- line was SPN-1234-02-F1 (45678) Spanish History (1900-2010)
The pattern I used, /[^(]+\(([^)]+)\)/, can also be written as
/ [^(]+ # 1 or more characters that are not a '('
\( # a literal '('. You must escape that because you don't want
# to start it a capture group.
([^)]+) # 1 or more chars that are not a ')'.
# The sorrounding '(' and ')' capture this match
\) # a literal ')'
/x
The /x modifier allows you to insert spaces, comments, and even newlines right in the pattern.
Just in case you're unsure about the /x. You can indeed write:
while( my $line = <DATA> ) {
if (my ($courseID) = ($line =~ / [^(]+ # …
\( # …
([^)]+) # …
\) # …
/x ) ) {
print "course-ID = $courseID; -- line was $line";
}
}
That's probably not nice to read but you can also store the regex in a separate variable:
my $pattern =
qr/ [^(]+ # 1 or more characters that are not a '('
\( # a literal '(' (you must escape it)
([^)]+) # 1 or more chars that are not a ')'.
# The sorrounding '(' and ')' capture this match
\) # a literal ')'
/x;
And then:
if (my ($courseID) = ($line =~ $pattern)) {
…
}
Since you nailed down the format
my ($section, $id, $name) =
$string =~ /^\s* ([^(]+) \(\s* ([^)]+) \)\s* (.+) $/x;
The key here is the negated character class, [^...], which matches any one character other than those listed inside following the ^ (which makes it "negated"). The un-escaped parenthesis capture the match, except inside a character class [] where they are taken as literal.
It first matches all consecutive characters other than (, so up to first (, what is captured by the pair of ( ) around it. Then all other than ), so up to the first closing paren, also captured by its own pair ( ). This comes between literal parenthesis \( ... \), which are outside of ( ) since we don't want them captured. Then all the rest is captured, (.+), requiring at least some characters since + means one or more. Note though that these can be spaces. We exclude possible leading white space from the first capture, by matching it specifically before the capturing parenthesis, and extract (some of) possible spaces around id-parenthesis.
The /x modifier allows use of spaces (and comments and newlines) inside, what helps reaadbility. The match operator returns a list of all matches, which we assign to variables. Note, even if there is only one match it still returns (it as) a list. See Regular Expressions Tutorial (perlretut).
Then, assuming that you have the catalog in a file
use warnings 'all';
use strict;
use feature 'say';
my $file = 'catalog.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>)
{
next if $line =~ /^\s*$/; # skip empty lines
# Strip leading and trailing white space
$line =~ s{^\s*|\s*$}{}g;
my ($section, $id, $name) =
$line =~ /^ ([^(]+) \(\s* ([^)]+) \)\s* (.+) $/x
or do {
warn "Error with expected format -- ";
next;
};
say "$section, $id, $name";
}
close $fh;
I use s{}{} delimiters since s/// confuse markup's syntax highlighter with this pattern, which is also a good demonstration since these sometimes help readability a lot.
You would store the retrieved variables in a suitable data structure. Any combination of arrays and hashes (and their references) comes to mind, depending on what need be done with them later. See Cookbook of Data Structures (perldsc).
Note on the error handling. Since none of the matches involve * (allowing zero matches -- nothing), if any component of your format isn't as expected there won't be a match at all and we get an error. The .+ is extremely permissive but it still requires something to be there. This is why the trailing space is first stripped, so that the last pattern (.+) cannot be satisfied by spaces alone.
If the only objective is the course id and we are certain that the first parenthesis are around it
my ($id) = $line =~ / \(\s* ([^)]+) \) /x or do { ... };
We now only need to match and capture the middle piece, something inside parenthesis.
I am trying to remove commas between double quotes in a string, while leaving other commas intact? (This is an email address which sometimes contains spare commas). The following "brute force" code works OK on my particular machine, but is there a more elegant way to do it, perhaps with a single regex?
Duncan
$string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93#gmail.com>,1,2';
print "Initial string = ", $string, "<br>\n";
# Extract stuff between the quotes
$string =~ /\"(.*?)\"/;
$name = $1;
print "name = ", $1, "<br>\n";
# Delete all commas between the quotes
$name =~ s/,//g;
print "name minus commas = ", $name, "<br>\n";
# Put the modified name back between the quotes
$string =~ s/\"(.*?)\"/\"$name\"/;
print "new string = ", $string, "<br>\n";
You can use this kind of pattern:
$string =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;
pattern details:
(?: # two possible beginnings:
\G(?!\A) # contiguous to the previous match
| # OR
[^"]*" # all characters until an opening quote
)
[^",]* #"# all that is not a quote or a comma
\K # discard all previous characters from the match result
(?: # two possible cases:
, # a comma is found, so it will be replaced
| # OR
"(*SKIP)(*FAIL) #"# when the closing quote is reached, make the pattern fail
# and force the regex engine to not retry previous positions.
)
If you use an older perl version, \K and the backtracking control verbs may be not supported. In this case you can use this pattern with capture groups:
$string =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1$2/g;
One way would be to use the nice module Text::ParseWords to isolate the specific field and perform a simple transliteration to get rid of the commas:
use strict;
use warnings;
use Text::ParseWords;
my $str = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <ubabankoffice93#gmail.com>,1,2';
my #row = quotewords(',', 1, $str);
$row[2] =~ tr/,//d;
print join ",", #row;
Output:
06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <ubabankoffice93#gmail.com>,1,2
I assume that no commas can appear legitimately in your email field. Otherwise some other replacement method is required.
How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
I'm writing an abstraction function that will ask the user a given question and validate the answer based on a given regular expression. The question is repeated until the answer matches the validation regexp.
However, I also want the client to be able to specify whether the answer must match case-sensitively or not.
So something like this:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
# What I have to do:
$isValid = ($caseSensitive) ?
($ans =~ /$validationRe/) :
($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}
Upshot: is there a way to dynamically specify a regular expression's modifiers?
Upshot: is there a way to dynamically specify a regular expression's modifiers?
From perldoc perlre:
"(?adlupimsx-imsx)"
"(?^alupimsx)"
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by "-") for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read
in from a configuration file, taken from an argument, or specified in
a table somewhere. Consider the case where some patterns want to be
case-sensitive and some do not: The case-insensitive ones merely need
to include "(?i)" at the front of the pattern.
Which gives you something along the lines of
$isValid = $ans =~ m/(?$modifier)$validationRe/;
Just be sure to take the appropriate security precautions when accepting user input in this way.
You might also like the qr operator which quotes its STRING as a regular expression.
my $rex = qr/(?$mod)$pattern/;
$isValid = <STDIN> =~ $rex;
Get rid of your $caseSensitive parameter, as it will be useless in many cases. Instead, users of that function can encode the necessary information directly in the $validationRe regex.
When you create a regex object like qr/foo/, then the pattern is at that point compiled into instructions for the regex engine. If you stringify a regex object, you'll get a string that when interpolated back into a regex will have exactly the same behaviour as the original regex object. Most importantly, this means that all flags provided or omitted from the regex object literal will be preserved and can't be overridden! This is by design, so that a regex object will continue to behave identical no matter what context it is used in.
That's a bit dry, so let's use an example. Here is a match function that tries to apply a couple similar regexes to a list of strings. Which one will match?
use strict;
use warnings;
use feature 'say';
# This sub takes a string to match on, a regex, and a case insensitive marker.
# The regex will be recompiled to anchor at the start and end of the string.
sub match {
my ($str, $re, $i) = #_;
return $str =~ /\A$re\z/i if $i;
return $str =~ /\A$re\z/;
}
my #words = qw/foo FOO foO/;
my $real_regex = qr/foo/;
my $fake_regex = 'foo';
for my $re ($fake_regex, $real_regex) {
for my $i (0, 1) {
for my $word (#words) {
my $match = 0+ match($word, $re, $i);
my $output = qq("$word" =~ /$re/);
$output .= "i" if $i;
say "$output\t-->" . uc($match ? "match" : "fail");
}
}
}
Output:
"foo" =~ /foo/ -->MATCH
"FOO" =~ /foo/ -->FAIL
"foO" =~ /foo/ -->FAIL
"foo" =~ /foo/i -->MATCH
"FOO" =~ /foo/i -->MATCH
"foO" =~ /foo/i -->MATCH
"foo" =~ /(?^:foo)/ -->MATCH
"FOO" =~ /(?^:foo)/ -->FAIL
"foO" =~ /(?^:foo)/ -->FAIL
"foo" =~ /(?^:foo)/i -->MATCH
"FOO" =~ /(?^:foo)/i -->FAIL
"foO" =~ /(?^:foo)/i -->FAIL
First, we should notice that the string representation of regex objects has this weird (?^:...) form. In a non-capturing group (?: ... ), modifiers for the pattern inside the group can be added or removed between the question mark and colon, while the ^ indicates the default set of flags.
Now when we look at the fake regex that's actually just a string being interpolated, we can see that the addition of the /i flag makes a difference as expected. But when we use a real regex object, it doesn't change anything: The outside /i cannot override the (?^: ... ) flags.
It is probably best to assume that all regexes already are regex objects and should not be interfered with. If you load the regex patterns from a file, you should require the regexes to use the (?: ... ) syntax to apply flages (e.g. (?^i:foo) as an equivalent to qr/foo/i). E.g. loading one regex per line from a file handle could look like:
my #regexes;
while (<$fh>) {
chomp;
push #regexes, qr/$_/; # will die here on regex syntax errors
}
You need to use the eval function. The below code will work:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
$isValid = eval "$ans =~ /$validationRe/$modifier";
# What I have to do:
#$isValid = ($caseSensitive) ?
# ($ans =~ /$validationRe/) :
# ($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}
My program read other programs source code and colect information about used SQL queries. I have problem with getting substring.
...
$line = <FILE_IN>;
until( ($line =~m/$values_string/i && $line !~m/$rem_string/i) || eof )
{
if($line =~m/ \S{2}DT\S{3}/i)
{
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
$line =~/\S{2}DT\S{3}/i;
print $line."\n";
...
In result print prints whole line and not a substring I expect. I tried different approach, but I use Perl seldom and probably make basic concept error. ( position of tablename in line is not fixed. Another problem is multiple occurrence i.e.[... SELECT * FROM AADTTAB, BBDTTAB, ...] ). How can I obtain that substring?
Use grouping with parenthesis and store the first group.
if( $line =~ /(\S{2}DT\S{3})/i )
{
my $substring = $1;
}
The code above fixes the immediate problem of pulling out the first table name. However, the question also asked how to pull out all the table names. So:
# FROM\s+ match FROM followed by one or more spaces
# (.+?) match (non-greedy) and capture any character until...
# (?:x|y) match x OR y - next 2 matches
# [^,]\s+[^,] match non-comma, 1 or more spaces, and non-comma
# \s*; match 0 or more spaces followed by a semi colon
if( $line =~ /FROM\s+(.+?)(?:[^,]\s+[^,]|\s*;)/i )
{
# $1 will be table1, table2, table3
my #tables = split(/\s*,\s*/, $1);
# delim is a space/comma
foreach(#tables)
{
# $_ = table name
print $_ . "\n";
}
}
Result:
If $line = "SELECT * FROM AADTTAB, BBDTTAB;"
Output:
AADTTAB
BBDTTAB
If $line = "SELECT * FROM AADTTAB;"
Output:
AADTTAB
Perl Version: v5.10.0 built for MSWin32-x86-multi-thread
I prefer this:
my ( $table_name ) = $line =~ m/(\S{2}DT\S{3})/i;
This
scans $line and captures the text corresponding to the pattern
returns "all" the captures (1) to the "list" on the other side.
This psuedo-list context is how we catch the first item in a list. It's done the same way as parameters passed to a subroutine.
my ( $first, $second, #rest ) = #_;
my ( $first_capture, $second_capture, #others ) = $feldman =~ /$some_pattern/;
NOTE:: That said, your regex assumes too much about the text to be useful in more than a handful of situations. Not capturing any table name that doesn't have dt as in positions 3 and 4 out of 7? It's good enough for 1) quick-and-dirty, 2) if you're okay with limited applicability.
It would be better to match the pattern if it follows FROM. I assume table names consist solely of ASCII letters. In that case, it is best to say what you want. With those two remarks out of the way, note that a successful capturing regex match in list context returns the matched substring(s).
#!/usr/bin/perl
use strict;
use warnings;
my $s = 'select * from aadttab, bbdttab';
if ( my ($table) = $s =~ /FROM ([A-Z]{2}DT[A-Z]{3})/i ) {
print $table, "\n";
}
__END__
Output:
C:\Temp> s
aadttab
Depending on the version of perl on your system, you may be able to use a named capturing group which might make the whole thing easier to read:
if ( $s =~ /FROM (?<table>[A-Z]{2}DT[A-Z]{3})/i ) {
print $+{table}, "\n";
}
See perldoc perlre.
Parens will let you grab part of the regex into special variables: $1, $2, $3...
So:
$line = ' abc andtabl 1234';
if($line =~m/ (\S{2}DT\S{3})/i) {
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
print $1."\n";
}
Use a capturing group:
$line =~ /(\S{2}DT\S{3})/i;
my $substr = $1;
$& contains the string matched by the last pattern match.
Example:
$str = "abcdefghijkl";
$str =~ m/cdefg/;
print $&;
# Output: "cdefg"
So you could do something like
if($line =~m/ \S{2}DT\S{3}/i) {
print $&."\n";
}
WARNING:
If you use $& in your code it will slow down all pattern matches.