Problems with perl regex

I need a Perl regex to match A.CC3 on a line beginning with something, followed by anything, then my "A.CC3", and then anything...
I am surprised this (text =~ /^\W+\CC.*\A\.CC\[3].*/) is not working
Thanks

\A is an escape sequence that anchors the match at the beginning of the string, much like the ^ at the beginning of your regex. Remove the backslash to make it match a literal A.
Edit: You also seem to have \C in there. You should only use backslash to escape meta characters such as period ., or to create escape sequences, such as \Q .. \E.
At its simplest, a regex to match A.CC3 would be
$text =~ /A\.CC3/
That's all you need. This will match any string with A.CC3 in it. In the comments you mention the string you are matching is this:
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";
You might want to avoid partial matches, in which case you can use word boundary \b
$text =~ /\bA\.CC3\b/
You might require that a line begins with //%
$text =~ m#^//%.*\bA\.CC3\b#
Of course, only you know which parts of the string should be matched and in what way. "Something followed by anything followed by A.CC3 followed by anything" really just needs the first simple regex.
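As a rough check of the patterns above, here is a minimal sketch that runs each one against the sample line from the question. The variable names and the extra "x A.CC30 y" test string are only illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";

# \A anchors at the start of the string, so this only matches a string
# that begins with ".CC3".
my $with_anchor = $text =~ /\A\.CC3/ ? 1 : 0;

# Without the backslash, A is an ordinary letter.
my $literal = $text =~ /A\.CC3/ ? 1 : 0;

# Word boundaries reject a longer token such as "A.CC30".
my $bounded = "x A.CC30 y" =~ /\bA\.CC3\b/ ? 1 : 0;

# Additionally require the line to start with //%.
my $prefixed = $text =~ m#^//%.*\bA\.CC3\b# ? 1 : 0;

print "anchor=$with_anchor literal=$literal bounded=$bounded prefixed=$prefixed\n";
# prints: anchor=0 literal=1 bounded=0 prefixed=1
```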

It doesn't seem like you're trying to capture anything. If that's the case, and all you need to do is find lines that contain A.CC3 then you can simply do
if ( index( $str, 'A.CC3' ) >= 0 ) # Found it...
No need for a regex.

Try to give this a shot:
^.*?A\.CC.*$
That will match anything until it reaches A, then a literal ., followed by CC, then anything until end of string.

It depends what you want to match. If you want to pull back the whole line in which the A.CC3 pattern occurs then something like this should work:
^.*A\.CC3.*$

Related

Need help in matching regexp

I am having a string say
my $str = "FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA";
I want to match a pattern say
my $pattern = "FILLER-1-1";
I am using the below regexp
$reg = $str =~ /$pattern/;
This is working fine
Now the problem is it is also matching if our string is
FILLER-1-10/FILLER-1-11/FILLER-1-12 so on ...
I dont want to match this. Also I don't want my regexp to be like
$reg = $str =~ /$pattern\W+/;
This one works against the above-mentioned issue, but \W may or may not be present: it appears in some strings and not in others. So I need the regexp to match only FILLER-1-1 without using \W+, and it should specifically not match FILLER-1-10.
Note: If somebody is doing -(minus) rating to my question, please let me know what's wrong in the code. It will be appreciable if the person write the comment too

As \w matches [a-zA-Z0-9_], you can use the zero-width assertion \b, which denotes a change in \w state (called a "word boundary", hence the "b" shortcut):
/FILLER-1-1\b/
This means that there needs to be a character that differs from the previous word state - a word state change.
It will match
FILLER-1-1.
FILLER-1-1&
FILLER-1-1,
It will not match
FILLER-1-1a
FILLER-1-16
Read more about it here.
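A small sketch of the word-boundary approach with the pattern held in a variable, as in the question. The helper name matches_exactly is hypothetical, and \Q...\E is added here on the assumption that the pattern should be treated as literal text.

```perl
use strict;
use warnings;

my $pattern = "FILLER-1-1";

# Hypothetical helper: true when $pattern occurs in $str and is not
# immediately followed by another word character.
sub matches_exactly {
    my ($str) = @_;
    return $str =~ /\Q$pattern\E\b/ ? 1 : 0;
}

for my $str ("FILLER-1-1,EQPT:MN", "FILLER-1-1", "FILLER-1-10,EQPT:MN", "FILLER-1-1a") {
    printf "%-22s %s\n", $str, matches_exactly($str) ? "match" : "no match";
}
```

The first two strings should report a match and the last two should not, since \b needs a word character on exactly one side of the position.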

If you want to match FILLER at the start of the input (line) followed by two numbers, this simple regex should work:
/^FILLER-\d+-\d+/
^ matches the beginning of the input
\d matches any digit ([0-9])
+ matches at least one, but can match any number

use ? quantifier like so:
/FILLER-\d-\d\W?/
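The \W? means a non-word character, zero or one time. Note that because \W? is optional, this pattern still matches the start of FILLER-1-10.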

Why doesn't zero-width match regex work?

I wrote a Perl function to replace job name in JCL script. Zero-width match was used here.
sub modify_jcl_jobname ()
{
    my ($jcl, $old, $new) = @_;
    $jcl =~ s/
        # The name must begin in column 3.
        ^(?<=\/\/)
        # The first character must be alphabetic or national.
        ($old)
        # The name must be followed by at least one blank.
        # Append JCL keyword JOB
        (?=\s+JOB)
    /$new/xmig; # Multi-lines, ignore case.
    return $jcl;
}
But this function didn't work until I did a simple modification that just deleted the leading sign "^".
#before ^(?<=\/\/)
#after (?<=\/\/)
So I'd like to understand the cause of the problem. Any reply would be appreciated. Thanks.

The problem lies with
^(?<=\/\/)
That pattern will only match if the spot after which ^ matched is preceded by the two characters //. That's never going to happen since /^/m matches the start of the string and after a newline.
But you don't want to start matching at the start of the line. You want to start matching 2 characters in. What you want is actually:
(?<=^\/\/)
After doing some improvements, the code looks like:
sub modify_jcl_jobname {
    my ($jcl, $old, $new) = @_;
    $jcl =~ s{
        (?<= ^// )
        \Q$old\E
        (?= \s+ JOB )
    }{$new}xmig;
    return $jcl;
}
Improvements:
Removed the incorrect prototype (()). It forced the caller to tell Perl to ignore the prototype (by using &).
Added code (\Q...\E) to convert the contents of $old into a regex pattern before using it as such.
Removed the needless capture ((...)).
Switched the delimiters of the substitution (from s/// to s{}{}) to require less escaping.
Removed highly redundant comments. (Good comments explain why something is being done rather than what is being done.)
The optimiser might handle this version better:
$jcl =~ s{
^// \K
\Q$old\E
(?= \s+ JOB )
}{$new}xmig;
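To see the difference between the two anchor placements, here is a minimal sketch. The two-line JCL fragment and the job names OLDJOB and NEWJOB are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical two-line JCL fragment, used only for illustration.
my $jcl = "//OLDJOB  JOB (ACCT),'TEST'\n//STEP1   EXEC PGM=IEFBR14\n";

# ^ asserts "start of line", then the look-behind asserts "preceded by //"
# at that same position. Both cannot hold at once, so nothing is replaced.
(my $broken = $jcl) =~ s{ ^ (?<= // ) OLDJOB (?= \s+ JOB ) }{NEWJOB}xmi;

# With ^ inside the look-behind, the assertion becomes
# "preceded by // that sits at the start of a line".
(my $fixed = $jcl) =~ s{ (?<= ^// ) OLDJOB (?= \s+ JOB ) }{NEWJOB}xmi;

print $broken eq $jcl ? "broken: unchanged\n" : "broken: changed\n";
print $fixed;
```

The first substitution should leave the text unchanged, while the second should rename the job on the first line and keep the leading //.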

The ^ sign matches the beginning of the line. You then want something preceded by two slashes - where should these slashes go if the next character is the very first character of the line?
s{^//
($old)
...
}{//$new}xmig
should work: you need no look behind.
Update: Thanks to ikegami, I now see why you used it. You want to keep the // in the string: well, you can repeat them in the substitution, or move the ^ character into the look-behind.

Perl: Substitute any non-alphanumeric character with "_"

In Perl I want to replace any character that is not in [A-Z] (case-insensitively) or [0-9] with "_", but only if this non-alphanumeric character occurs between two alphanumeric characters. I do not want to touch non-alphanumerics at the beginning or end of the string.
I know enough regex to replace them, just not to only replace ones in the middle of the string.

s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
Of course that would hurt your chances with "#A#B%C", because the match on A#B consumes the B that the following % needs in front of it, so you might use look-arounds:
s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
That way you isolate it to just the non "alnum" character.
Or you could use the "keep flag", as well and get the same thing done.
s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
EDIT based on input:
To not eat a newline, you could do the following:
s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;
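The following sketch compares the capturing and look-around versions on the "#A#B%C" example. The variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $input = "#A#B%C";

# Capturing version: the match "A#B" consumes B, so "B%C" is never tried.
(my $captured = $input) =~ s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;

# Look-around version: only the separator itself is consumed.
(my $lookaround = $input) =~ s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;

print "$captured\n";    # prints: #A_B%C
print "$lookaround\n";  # prints: #A_B_C
```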

Try this:
my $str = 'a-2=c+a()_';
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;

How can I extract a substring enclosed in double quotes in Perl?

I'm new to Perl and regular expressions and I am having a hard time extracting a string enclosed by double quotes. Like for example,
"Stackoverflow is
awesome"
Before I extract the string, I want to check whether the whole text in the variable ends with a double quote:
if($wholeText =~ /\"$/) #check the last character if " which is the end of the string
{
$wholeText =~ s/\"(.*)\"/$1/; #extract the string, removed the quotes
}
My code didn't work; it is not getting inside of the if condition.

You need to do:
if($wholeText =~ /"$/)
{
$wholeText =~ s/"(.*?)"/$1/s;
}
. doesn't match newlines unless you apply the /s modifier.
There's no need to escape the quotes like you're doing.

The above poster who recommended using the "m" flag in the regular expression is correct, however the regex provided won't quite work. When you say:
$wholeText =~ s/\"(.*)\"/$1/m; #extract the string, removed the quotes
...the regular expression is too "greedy", which means the (.*) part will gobble up too much of the text. If you have a sample like this:
"The quick brown fox," he said, "jumped over the lazy dog."
...then the above regex will capture everything from "The" through "dog.", which is probably not what you intend. There are two ways to make the regex less greedy. Which one is better has everything to do with how you choose to handle extra " marks inside your string.
One:
$wholeText =~ s/\"([^"]*)\"/$1/m;
Two:
$wholeText =~ s/\"(.*?)\"/$1/m;
In One, the regex says "start with quote, then find everything that is not a quote and remember it, until you see another quote." In Two, the regex says "Start with quote, then find everything until you find another quote." The extra ? inside the ( ) tells the regex processor to not be greedy. Without considering quote escaping within the string, both regular expressions should behave the same.
By the way, this is a classic problem when parsing a CSV ("Comma Separated Values") file, so looking up some references on that may help you out.
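A short sketch of the greedy, non-greedy, and negated-class variants on the fox example, using a plain match to show what each capture holds. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = '"The quick brown fox," he said, "jumped over the lazy dog."';

my ($greedy)  = $line =~ /"(.*)"/;     # longest possible span
my ($lazy)    = $line =~ /"(.*?)"/;    # stops at the first closing quote
my ($negated) = $line =~ /"([^"]*)"/;  # cannot cross a quote at all

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";
```

The greedy capture should run from "The" through "dog.", while the other two should stop at "fox,".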

If you want to anchor a match to the very end of the string (not line, entire string), use the \z anchor:
if( $wholeText =~ /"\z/ ) { ... }
You don't need a guard condition for this. Just use the right regex in the substitution. If it doesn't match the regex, nothing happens:
$wholeText =~ s/"(.*?)"\z/$1/s;
I think you really have a different question though. Why are you trying to anchor it to the end of the string? What problems are you trying to avoid?

For multi-line strings, you need to include the 'm' modifier with the search pattern.
if ($wholeText =~ m/\"$/m) # First m for match operator; second multi-line modifier
{
$wholeText =~ s/\"(.*?)\"/$1/s; #extract the string, removed the quotes
}
You will also need to consider whether you allow double quotes inside the string and if so, which convention to use. The primary ones are backslash and double quote (also backslash backslash), or double quote double quote in the string. These slightly complicate your regex.
The answer by @chaos uses 's' as a multi-line modifier. There's a small difference between the two:
m
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.
s
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
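The difference between the two modifiers can be sketched as follows. The sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $quoted = "\"Stackoverflow is\nawesome\"";
my $lines  = "first line\nsecond line\n";

# /s changes what . matches: without it, . stops at the newline.
my $dot_plain = $quoted =~ /"(.*)"/  ? 1 : 0;
my $dot_s     = $quoted =~ /"(.*)"/s ? 1 : 0;

# /m changes where ^ matches: without it, only at the start of the string.
my @no_m   = $lines =~ /^(\w+)/g;
my @with_m = $lines =~ /^(\w+)/mg;

print "dot_plain=$dot_plain dot_s=$dot_s\n";  # prints: dot_plain=0 dot_s=1
print "no_m=@no_m with_m=@with_m\n";          # prints: no_m=first with_m=first second
```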

Assuming you have a single substring in quotes, this will extract it:
s/.*"(.*?)".*/$1/
And the answer above (s/"(.*?)"/$1/s) will just remove quotes.
Test code:
my $text = "no \"need this\" again, no\n";
my $text2 = $text;
print $text;
$text2 =~ s/.*\"(.*?)\".*/$1/;
print $text2;
$text =~ s/"(.*?)"/$1/s;
print $text;
Output:
no "need this" again, no
need this
no need this again, no

How can I preserve whitespace when I match and replace several words in Perl?

Let's say I have some original text:
here is some text that has a substring that I'm interested in embedded in it.
I need the text to match a part of it, say: "has a substring".
However, the original text and the matching string may have whitespace differences. For example the match text might be:
has a
substring
or
has a substring
and/or the original text might be:
here is some
text that has
a substring that I'm interested in embedded in it.
What I need my program to output is:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
I also need to preserve the whitespace pattern in the original and just add the start and end markers to it.
Any ideas about a way of using Perl regexes to get this to happen? I tried, but ended up getting horribly confused.

Been some time since I've used perl regular expressions, but what about:
$text =~ s/(has\s+a\s+substring)/[$1]/ig
This would match one or more whitespace characters, including newlines, between the words. It will wrap the entire match with brackets while maintaining the original separation. It ain't automatic, but it does work.
You could play games with this, like taking the string "has a substring" and doing a transform on it to make it "has\s+a\s+substring" to make this a little less painful.
EDIT: Incorporated ysth's comments that the \s metacharacter matches newlines and hobbs corrections to my \s usage.

This pattern will match the string that you're looking to find:
(has\s+a\s+substring)
So, when the user enters a search string, replace any whitespace in the search string with \s+ and you have your pattern. Then, just replace every match with [match starts here]$1[match ends here] where $1 is the matched text.

In regexes, you can use + to mean "one or more." So something like this
/has\s+a\s+substring/
matches has followed by one or more whitespace chars, followed by a followed by one or more whitespace chars, followed by substring.
Putting it together with a substitution operator, you can say:
my $str = "here is some text that has a substring that I'm interested in embedded in it.";
$str =~ s/(has\s+a\s+substring)/\[match starts here]$1\[match ends here]/gs;
print $str;
And the output is:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

As many have suggested, use \s+ to match whitespace. Here is how you do it automatically:
my $original = "here is some text that has a substring that I'm interested in embedded in it.";
my $search = "has a\nsubstring";
my $re = $search;
$re =~ s/\s+/\\s+/g;
$original =~ s/\b$re\b/[match starts here]$&[match ends here]/g;
print $original;
Output:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
You might want to escape any meta-characters in the string. If someone is interested, I could add it.
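One possible way to add that escaping, as a minimal sketch: split the search phrase on whitespace, escape each word with quotemeta, and join the pieces with \s+. The helper name phrase_to_regex and the sample strings are hypothetical. The \b anchors are left out here because they would not behave as expected next to non-word characters such as $ or (.

```perl
use strict;
use warnings;

# Hypothetical helper: build a whitespace-tolerant regex from a literal phrase.
sub phrase_to_regex {
    my ($phrase) = @_;
    # Escape each word separately so the \s+ joiners stay active.
    my $body = join '\s+', map { quotemeta($_) } split ' ', $phrase;
    return qr/$body/;
}

my $original = "cost is  \$5.00\n(approx) today";
my $re       = phrase_to_regex("\$5.00 (approx)");

(my $marked = $original) =~ s/($re)/[match starts here]$1\[match ends here]/g;
print "$marked\n";
```

The whitespace inside the match, here a newline, is kept exactly as it appears in the original text.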

This is an example of how you could do that.
#! /opt/perl/bin/perl
use strict;
use warnings;
my $submatch = "has a\nsubstring";
my $str = "
here is some
text that has
a substring that I'm interested in, embedded in it.
";
print substr_match($str, $submatch), "\n";
sub substr_match{
my($string,$match) = @_;
$match =~ s/\s+/\\s+/g;
# This isn't safe the way it is now, you will need to sanitize $match
$string =~ /\b$match\b/;
}
This currently does not do anything to check the $match variable for unsafe characters.
