How can I create a regex pattern that spans multiple lines [duplicate] - regex

This question already has an answer here:
split one line regex in a multiline regexp in perl
(1 answer)
Closed 2 years ago.
I have a long regex pattern that I want to split over multiple lines.
Is there a way to take
$s !~ m"((\d)|(\d))";
and split it so you get something like
$s !~ m"((\d)|"+
"(\d))";
The above guess is incorrect and results in
Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE (\d)|/ at perl5.pl line 20.
What can I use?

You can't split a pattern into two like that. You can, however, spread a single pattern neatly over multiple lines by using the /x modifier, which tells the regex parser to ignore whitespace inside the pattern. For example:
$s !~ m{
((\d)
|
(\d))
}x;
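As a minimal sketch of the same idea, the /x modifier also permits # comments inside the pattern, and smaller qr// pieces can be interpolated into a larger pattern. The variable names ($digit_re, $first, $second, $joined) are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

# /x lets whitespace and # comments appear inside the pattern.
my $digit_re = qr{
    (
        (\d)    # first alternative
        |
        (\d)    # second alternative
    )
}x;

# Smaller qr// pieces can also be interpolated into a larger pattern.
my $first  = qr/(\d)/;
my $second = qr/(\d)/;
my $joined = qr/($first|$second)/;

for my $s ('abc', 'a1c') {
    my $verdict = $s !~ $digit_re ? 'no digit' : 'has digit';
    printf "%s: %s\n", $s, $verdict;
}
```

Building the pattern from qr// pieces is the closest working equivalent of the string concatenation attempted in the question.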

Related

Changing the results of a regex capture group to lowercase in powershell [duplicate]

This question already has answers here:
PowerShell Replace number in string
(3 answers)
Closed 3 years ago.
I have this regex capture group:
$lowerPattern='(href[\s]?=[\s]?\"[^"]*[^"]*\")'
which is returning all the matches I need just fine. However I need to replace the capture group with the results all lowercase:
$lowerPatternReplace = '$1'.ToLower()
This doesn't seem to be working. How do you lowercase a capture group in a PowerShell regex?
This code seems to work for me. It's just a bit less shorthand. I didn't see a way to do it with backreferences, due to the order of execution (you're lowering the literal string '$1').
$Entry = 'asdHREFasd'
$RegEx = '(href)'
$match = $Entry -match $RegEx
[string]$upper = $Matches[1] #first capture group
[string]$lower = $upper.ToLower()
[string]$Entry.replace($upper,$lower)

Perl Regular Expression issues [duplicate]

This question already has answers here:
In Perl, how can I get the matched substring from a regex?
(6 answers)
Extract the required substring from another string -Perl
(4 answers)
How do I access captured substrings after a successful regex match in Perl?
(4 answers)
Closed 5 years ago.
How do I capture everything after T and everything before T using a regex?
What I have so far is only giving me the number 1.
my $string = '2014-06-09T01:59:54.998Z';
my $mystring = $string =~ m/T(.*)Z/;
I am not very good with regex. I assumed this would capture everything between T and Z. I tried leaving off the Z, and it still prints 1. It only prints
1
my ($date, $time) = split /T/, $string;
In your case, you forgot to put your match in list context.
my ($mystring) = ($string =~ m/T(.*)Z/);
In scalar context, the match operator returns only true (1) or false, not the captured text, which is why you got 1.
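A short sketch of the difference between the two contexts, using the string from the question. The variable names ($ok, $time, $date, $clock) are assumptions made for illustration.

```perl
use strict;
use warnings;

my $string = '2014-06-09T01:59:54.998Z';

# Scalar context: the result is only a success flag.
my $ok = $string =~ m/T(.*)Z/;

# List context: the result is the list of captured substrings.
my ($time) = $string =~ m/T(.*)Z/;

# Two capture groups give both halves in one match.
my ($date, $clock) = $string =~ m/^(.*)T(.*)Z$/;

print "ok=$ok\n";
print "time=$time\n";
print "date=$date clock=$clock\n";
```

The parentheses around the variable on the left-hand side are what put the match in list context.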

Split into words by an uncommented comma that is not inside matching parentheses

Consider the following string:
blah, foo(a,b), bar(c,d), yo
I want to extract a list of strings:
blah
foo(a,b)
bar(c,d)
yo
It seems to me that I should be able to use quote words here, but I'm struggling with the regex. Can someone help me out?
Perl has a little thing called regex recursion, so you might be able to look for:
either a bare word like blah containing no parentheses (\w+)
a "call", like \w+\((?R)(, *(?R))*\)
The total regex is (\w+(\((?R)(, ?(?R))*\))?), which seems to work.
You can use the following regex to use in split:
\([^()]*\)(*SKIP)(*F)|\s*,\s*
With \([^()]*\), we match a ( followed with 0 or more characters other than ( or ) and then followed with ). We fail the match with (*SKIP)(*F) if that parenthetical construction is found, and then we only match the comma surrounded with optional whitespaces.
#!/usr/bin/perl
my $string= "blah, foo(a,b), bar(c,d), yo";
my @string = split /\([^()]*\)(*SKIP)(*F)|\s*,\s*/, $string;
foreach (@string) {
print "$_\n";
}
To account for commas inside nested balanced parentheses, you can use
my @string = split /\((?>[^()]|(?R))*\)(*SKIP)(*F)|\s*,\s*/, $string;
With \((?>[^()]|(?R))*\) we match all balanced ()s and fail the match if found with the verbs (*SKIP)(*F), and then we match a comma with optional whitespace around (so as not to manually trim the strings later).
For a blah, foo(b, (a,b)), bar(c,d), yo string, the result is:
blah
foo(b, (a,b))
bar(c,d)
yo
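For comparison, here is a sketch that avoids regex recursion entirely and instead walks the string while counting parenthesis depth. This is a different technique from the (*SKIP)(*F) pattern above, and split_top_level is a hypothetical helper name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit at parenthesis depth 0.
sub split_top_level {
    my ($text) = @_;
    my @parts = ('');
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch eq ',' && $depth == 0) {
            push @parts, '';          # a top-level comma starts a new field
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')' && $depth > 0;
        $parts[-1] .= $ch;            # everything else belongs to the current field
    }
    s/^\s+|\s+$//g for @parts;        # trim whitespace around each field
    return @parts;
}

print "$_\n" for split_top_level('blah, foo(b, (a,b)), bar(c,d), yo');
```

Counting depth handles arbitrary nesting and may be easier to debug than a recursive pattern, at the cost of a few more lines.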
There is a solution given by Borodin for one of your questions (which is similar to this one). A small change to the regex will give you the desired output (this will not work for nested parentheses):
use strict;
use warnings;
use 5.010;
my $line = q<blah, foo(a,b), bar(c,d), yo>;
my @words = $line =~ / (?: \([^)]*\) | [^,] )+ /xg;
say for @words;
Output:
blah
foo(a,b)
bar(c,d)
yo

Regex pattern matching for ; and / on multiple lines

I just started learning Perl today, and I am working with regular expressions to match text from within a file.
I am checking to see if my file contains.
/
;
This is what I have attempted so far:
if ($Text =~ /;\n//)
{
//dostuff
}
Is this syntax correct? Do I need to use the \n or is there a character for end of line? Also, can I search for / or do I need some sort of escape character?
To search for / you have to escape it with \.
To check your example however, you need to turn it around because you want to match the / before the ;.
In the end it should look like this: if ($Text =~ /\/\n;/)
For more information, see perlretut for an introduction to regular expressions in Perl.
Use a backslash to escape your forward slash. Also why are you trying to match it in reverse order? Try $Text =~ /\/\n;/.
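A small sketch of both spellings: the escaped slash from the answers above, and the same pattern written with alternate m{...} delimiters so that the slash needs no escaping. The sample contents of $Text are an assumption made for illustration.

```perl
use strict;
use warnings;

# Assumed sample input: a line holding "/" followed by a line holding ";".
my $Text = "select 1\n/\n;\nselect 2";

# Escaped slash inside the default /.../ delimiters.
my $escaped = $Text =~ /\/\n;/ ? 1 : 0;

# Alternate delimiters: "/" is an ordinary character inside m{...}.
my $braces = $Text =~ m{/\n;} ? 1 : 0;

print "escaped=$escaped braces=$braces\n";
```

Alternate delimiters tend to be easier to read when a pattern contains several slashes, for example file paths.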

How can I get my Perl regex not to use special characters from an interpolated variable? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How can I escape meta-characters when I interpolate a variable in Perl's match operator?
I am using the following regex to search for a string $word in the bigger string $referenceLine as follows :
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$word\b)/g
The problem occurs when $word contains characters such as (, because Perl treats them as part of the regex rather than as literal text to match, and gives the following error:
Unmatched ( in regex; marked by <-- HERE in
m/( <-- HERE ?=\b( darsheel safary\b)/
at ./bleu.pl line 119, <REFERENCE> line 1.
Can someone please tell me a solution to this? I think if I could somehow get Perl to understand that we want to look for the whole $word as it is, without evaluating it, it might work out.
Use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b\Q$word\E\b)/g
to tell the regex engine to treat every character in $word as a literal character.
\Q marks the start, \E marks the end of a literal string in Perl regex.
Alternatively, you could do
$quote_word = quotemeta($word);
and then use
$wordRefMatchCount =()= $referenceLine =~ /(?=\b$quote_word\b)/g
One more thing (taken up here from the comments, where it's harder to find):
Your regex fails in your example case because of the word boundary anchor \b. This anchor matches between a word character and a non-word character. It only makes sense if placed around actual words, i.e. \bbar\b to ensure that only bar is matched, not foobar or barbaric. If you put it around non-words (as in \b( darsheel safary\b), then it will cause the match to fail (unless there is a letter, digit or underscore right before the ().
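The following sketch illustrates both points: \Q...\E makes the ( literal, yet \b in front of it still blocks the match. The lookaround form (?<!\w) ... (?!\w) is an assumed alternative that is not taken from the answer above, and the sample $referenceLine is made up for the example.

```perl
use strict;
use warnings;

my $word          = '( darsheel safary';
my $referenceLine = 'with ( darsheel safary and ( darsheel safary again';

# \Q...\E escapes the "(", but \b still demands a word character right before it.
my $with_b =()= $referenceLine =~ /(?=\b\Q$word\E\b)/g;

# Assumed alternative: lookarounds that only forbid adjacent word characters.
my $with_look =()= $referenceLine =~ /(?=(?<!\w)\Q$word\E(?!\w))/g;

printf "with \\b: %d, with lookarounds: %d\n", $with_b, $with_look;
```

Here the \b version counts 0 occurrences and the lookaround version counts 2, because each ( in the sample line is preceded by a space rather than a word character.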