Perl regex to extract digits from string with parenthesis - regex

I have the following string:
my $string = "Ethernet FlexNIC (NIC 1) LOM1:1-a FC:15:B4:13:6A:A8";
I want to extract the number that is in brackets (1) in another variable.
The following statement does not work:
my ($NAdapter) = $string =~ /\((\d+)\)/;
What is the correct syntax?

\d+(?=[^(]*\))
You can use this. See the demo. Yours will not work because there is more data inside the () than just \d+.
https://regex101.com/r/fM9lY3/57
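As a rough sketch of how that lookahead behaves in Perl: the digits are captured only when a ")" follows with no "(" in between. The helper name digits_before_close is made up for this illustration and is not part of the question or the answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): return the first run of digits
# that is followed by a ")" with no "(" in between, or undef if there is none.
sub digits_before_close {
    my ($string) = @_;
    my ($digits) = $string =~ /(\d+)(?=[^(]*\))/;
    return $digits;
}

my $string = "Ethernet FlexNIC (NIC 1) LOM1:1-a FC:15:B4:13:6A:A8";
my $NAdapter = digits_before_close($string);
print "$NAdapter\n";
```

Digits that appear before the opening parenthesis are skipped, because the lookahead refuses to cross a "(".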

You could try something like
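my ($NAdapter) = $string =~ /\([^)]*?(\d+)[^)]*\)/;
The lazy [^)]*? leaves the whole run of digits to the capture group (a greedy .* there would swallow all but the last digit). After that, $NAdapter should contain the number that you want.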

my $string = "Ethernet FlexNIC (NIC 1) LOM1:1-a FC:15:B4:13:6A:A8";
I want to extract the number that is in brackets (1) in another
variable
Your regex (with some spaces for clarity):
/ \( (\d+) \) /x;
says to match:
A literal opening parenthesis, immediately followed by...
A digit, one or more times (captured in group 1), immediately followed by...
A literal closing parenthesis.
Yet, the substring you want to match:
(NIC 1)
is of the form:
A literal opening parenthesis, immediately followed by...
Some capital letters
STOP EVERYTHING! NO MATCH!
As an alternative, your substring:
(NIC 1)
could be described as:
Some digits, immediately followed by...
A literal closing parenthesis.
Here's the regex:
use strict;
use warnings;
use 5.020;
my $string = "Ethernet FlexNIC (NIC 1234) LOM1:1-a FC:15:B4:13:6A:A8";
my ($match) = $string =~ /
(\d+) #Match any digit, one or more times, captured in group 1, followed by...
\) #a literal closing parenthesis.
#Parentheses have a special meaning in a regex--they create a capture
#group--so if you want to match a parenthesis in your string, you
#have to escape the parenthesis in your regex with a backslash.
/xms; #Standard flags that some people apply to every regex.
say $match;
--output:--
1234
Another description of your substring:
(NIC 1)
could be:
A literal opening parenthesis, immediately followed by...
Some non-digits, immediately followed by...
Some digits, immediately followed by...
A literal closing parenthesis.
Here's the regex:
use strict;
use warnings;
use 5.020;
my $string = "Ethernet FlexNIC (ABC NIC789) LOM1:1-a FC:15:B4:13:6A:A8";
my ($match) = $string =~ /
\( #Match a literal opening parenthesis, followed by...
\D+ #a non-digit, one or more times, followed by...
(\d+) #a digit, one or more times, captured in group 1, followed by...
\) #a literal closing parenthesis.
/xms; #Standard flags that some people apply to every regex.
say $match;
--output:--
789
If there might be spaces on some lines and not others, such as:
    spaces
      ||
      VV
(NIC 1 )
(NIC 2)
You can insert a \s* (any whitespace, zero or more times) in the appropriate place in the regex, for instance:
my ($match) = $string =~ /
#Parentheses have special meaning in a regex--they create a capture
#group--so if you want to match a parenthesis in your string, you
#have to escape the parenthesis in your regex with a backslash.
\( #Match a literal opening parenthesis, followed by...
\D+ #a non-digit, one or more times, followed by...
(\d+) #a digit, one or more times, captured in group 1, followed by...
\s* #any whitespace, zero or more times, followed by...
\) #a literal closing parenthesis.
/xms; #Standard flags that some people apply to every regex.
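To try that last pattern end to end, here is a minimal sketch that wraps it in a function and runs it against a string with and without the extra space. The name adapter_number and the second sample string are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the digits that sit just before the closing
# parenthesis, or undef when the string has no such group.
sub adapter_number {
    my ($string) = @_;
    my ($match) = $string =~ /
        \(      # a literal opening parenthesis
        \D+     # one or more non-digits
        (\d+)   # the digits we want, captured in group 1
        \s*     # optional whitespace
        \)      # a literal closing parenthesis
    /xms;
    return $match;
}

for my $s ("Ethernet FlexNIC (NIC 1) LOM1:1-a", "Ethernet FlexNIC (NIC 2 ) LOM1:1-a") {
    my $n = adapter_number($s);
    print defined $n ? "$n\n" : "no match\n";
}
```

Because of the \D+, a group such as (1) with nothing before the digits would not match this particular pattern.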

Related

Perl Regexp::Common package not matching certain real numbers when used with word boundary

The following code prints "34" instead of the expected ".34":
use strict;
use warnings;
use Regexp::Common;
my $regex = qr/\b($RE{num}{real})\s*/;
my $str = "This is .34 meters of cable";
if ($str =~ /$regex/) {
print $1;
}
Do I need to fix my regex? (The word boundary is needed, because without it the regex would match inside a string like xx34, which I don't want.)
Or is it a bug in Regexp::Common? I always thought that the longest match should win.
The word boundary is a context-dependent regex construct. When it is followed with a word char (letter, digit or _) this location should be preceded either with the start of a string or a non-word char. In this concrete case, the word boundary is followed with a non-word char and thus requires a word char to appear right before this character.
You may use a non-ambiguous word boundary expressed with a negative lookbehind:
my $regex = qr/(?<!\w)($RE{num}{real})/;
               ^^^^^^^
The (?<!\w) negative lookbehind always denotes one thing: fail the match if there
is a word character immediately to the left of the current location.
Or, use a whitespace boundary if you want your matches to only occur after whitespace or start of string:
my $regex = qr/(?<!\S)($RE{num}{real})/;
               ^^^^^^^
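The difference between the three boundaries can be checked with a small sketch. To keep it runnable without the Regexp::Common module, it uses a simplified hand-written number pattern in place of $RE{num}{real}; that pattern and the helper first_number are assumptions for the illustration and are far less complete than the real module.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{num}{real} (an assumption, not the module's pattern):
# an optional sign, then either digits with an optional fraction, or a bare fraction.
my $real = qr/[-+]?(?:\d+(?:\.\d*)?|\.\d+)/;

# Hypothetical helper: return the first number found after the given boundary
# assertion, or undef when there is no match.
sub first_number {
    my ($str, $boundary) = @_;
    return $str =~ /$boundary($real)/ ? $1 : undef;
}

my $str = "This is .34 meters of cable";
print first_number($str, qr/\b/), "\n";        # \b only fits between "." and "3"
print first_number($str, qr/(?<!\w)/), "\n";   # no word char to the left of "."
print first_number($str, qr/(?<!\S)/), "\n";   # whitespace to the left of "."
```

With either lookbehind, a string like xx34 yields no match at all, which is the behaviour the question asked for.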
Try this pattern: (?:^| )(\d*\.?\d+)
Explanation:
(?:...) - non-capturing group
^| - match either ^ (the beginning of the string) or a space
\d* - match zero or more digits
\.? - match a literal dot, zero or one time
\d+ - match one or more digits
The matched number will be stored in the first capturing group.

Regular expression to replace parentheses and their contents

I am looking for a regular expression that will replace parentheses and the string within them if the string contains anything that is not a digit.
The string can be any combination of characters including numbers, letters, spaces etc.
For example:
(3) will not be replaced
(1234) will not be replaced
(some letters) will be replaced
(some letters, spaces - and numbers 123) will be replaced
So far I have a regex that will replace any parentheses and its content
str = str.replaceAll("\\(.*?\\)","");
I am not good with the syntax of replaceAll, so I am just going to write the way you have written it. But I think I can help you with the regex.
Try this Regex:
\((?=[^)]*[a-zA-Z ])[^)]+?\)
OR an even better one:
\((?!\d+\))[^)]+?\)
Explanation (for 1st Regex)
\( - matches the opening parenthesis
(?=[^)]*[a-zA-Z ]) - Positive Lookahead - checks for 0 or more characters which are not ) followed by a space or a letter
[^)]+? - Matches 1 or more characters which are not )
\) - Finally matches the closing parenthesis
Explanation (for 2nd Regex)
\( - matches the opening parenthesis
(?!\d+\)) - Negative Lookahead - rejects the match when everything between the opening and the closing parenthesis is digits
[^)]+? - Matches 1 or more characters which are not )
\) - Finally matches the closing parenthesis
Now, you can try your Replace statement as:
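str = str.replaceAll("\\((?=[^)]*[a-zA-Z ])[^)]+?\\)","");
OR
str = str.replaceAll("\\((?!\\d+\\))[^)]+?\\)","");
(Inside a Java string literal every regex backslash has to be doubled, as in your original statement.)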
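The second regex can also be tried outside Java. The sketch below applies the same pattern with Perl's s///g, since the regex syntax is the same in both languages; the function name strip_non_numeric_parens and the sample text are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every parenthesized group unless it holds
# digits only. The negative lookahead (?!\d+\)) rejects groups like (1234).
sub strip_non_numeric_parens {
    my ($str) = @_;
    $str =~ s/\((?!\d+\))[^)]+?\)//g;
    return $str;
}

print strip_non_numeric_parens("a (3) b (some letters) c (numbers 123) d (1234)"), "\n";
```

Groups such as (3) and (1234) survive, while any group containing a letter, a space, or punctuation is removed.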

Perl greedy regex is not acting greedy

Given the following code:
use strict;
use warnings;
my $text = "asdf(blablabla)";
$text =~ s/(.*?)\((.*)\)/$2/;
print "\nfirst match: $1";
print "\nsecond match: $2";
I expected that $2 would catch my last bracket, yet my output is:
If .* is greedy by default, why did it stop at the bracket?
The .* is a greedy subpattern, but it does not account for grouping. Grouping is defined with a pair of unescaped parentheses (see Use Parentheses for Grouping and Capturing).
See where your group boundaries are:
s/(.*?)\((.*)\)/$2/
  | G1|  |G2|
So, the \( and \) matching ( and ) are outside the groups, and will be part of neither $1 nor $2.
If you need the ) to be part of $2, use
s/(.*?)\((.*\))/$2/
            ^^
A regex engine processes both the string and the pattern from left to right. The first (.*?) is handled first; because it is lazy (it matches as few chars as possible while still allowing a valid match), it matches up to the first literal ( symbol, and the whole part before the ( is placed into Group 1. Then the ( is matched but not captured, and (.*) matches any 0+ characters other than a newline up to the last ) symbol and places the capture into Group 2. Then the ) is just matched. The point is that .* grabs the whole string up to the end, but then backtracking happens because the engine still has to match the final ) in the pattern. The ) must be matched, but it is not captured in your pattern; thus it is not part of Group 2, due to the group boundary placement. You can see the regex debugger at this regex demo page to see how the pattern matches your string.
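The effect of the group boundary can be seen by running both patterns side by side. This is only a small sketch; the helper groups is a made-up name that returns the captured groups of a match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list of captured groups for one match.
sub groups {
    my ($text, $re) = @_;
    my @g = $text =~ $re;
    return @g;
}

my $text    = "asdf(blablabla)";
my @outside = groups($text, qr/(.*?)\((.*)\)/);    # closing paren outside group 2
my @inside  = groups($text, qr/(.*?)\((.*\))/);    # closing paren inside group 2
print "paren outside group 2: $outside[1]\n";
print "paren inside group 2:  $inside[1]\n";
```

On a string with several parentheses, such as "a(b)c(d)", group 2 of the first pattern runs up to the last ")", which shows that .* is still greedy; only the final ")" is left out of the capture.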

Matching first letter of word

I want to match the first letter of a word in one string to another with the similar letter. In this example the letter H:
25HB matches to HC
I am using the match operator shown below:
my ($match) = ( $value =~ m/^d(\w)/ );
to not match the digit, but the first matching word character. How could I correct this?
That regex doesn't do what you think it does:
m/^d(\w)/
Matches 'start of line' - letter d then a single word character.
You may want:
m/^\d+(\w)/
Which will then match one or more digits from the start of line, and grab the first word character after that.
E.g.:
my $string = '25HC';
my ( $match ) =( $string =~ m/^\d+(\w)/ );
print $match,"\n";
Prints H
You are not clear about what you want. If you want to match the first letter in a string to the same letter later in the string:
m{
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\1 # match the captured letter
}msx; # /m means multilines, /s means . matches newlines, /x means ignore whitespace in pattern
See perldoc perlre for more details.
Addendum:
If by word, you mean any alphanumeric sequence, this may be closer to what you want:
m{
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
\1 # match the captured letter
}msx; # /m means multilines, /s means . matches newlines, /x means ignore whitespace in pattern
You could try ^.*?([A-Za-z]).
The script below prints:
ITEM: 22hb
MATCH: h
ITEM: 33HB
MATCH: H
ITEM: 3333
MATCH:
ITEM: 43 H
MATCH: H
ITEM: HB33
MATCH: H
Script:
#!/usr/bin/perl
my @array = ('22hb','33HB','3333','43 H','HB33');
for my $item (@array) {
    # list assignment leaves $match undefined when nothing matches
    my ($match) = $item =~ /^.*?([A-Za-z])/;
    $match = '' unless defined $match;
    print "ITEM: $item \nMATCH: $match\n\n";
}
I believe this is what you are looking for:
(If you can provide more clear example of what you are looking for we may be able to help you better)
The following code takes two strings and finds the first non-digit character common in both the strings:
my $string1 = '25HB';
my $string2 = 'HC';
#strip all digits
$string1 =~ s/\d//g;
foreach my $alpha (split //, $string1) {
# for each non-digit check if we find a match
if ($string2 =~ /\Q$alpha\E/) {
print "First matching non-numeric character: $alpha\n";
exit;
}
}

Can't save pattern matches in an array using Perl and regex

I am trying to save matched patterns in an array using Perl and regex. The problem is that when the match is saved, it is missing some characters.
Example:
my @array;
my @temp_array;
my @types_U8 = ("uint8","vuint8","UCHAR");
foreach my $type (@types_U8)
{
    @temp_array = $str =~ /\(\s*\Q$type\E\s*\)\s*(0x[0-9ABCDEF]{3,}|\-[1-9]+)/g;
    push(@array,@temp_array);
    @temp_array = ();
}
So if $str = "any text (uint8)-1"
The saved string in @temp_array is only ever "-1"
Your current regular expression is:
/\(\s*\Q$type\E\s*\)\s*(0x[0-9ABCDEF]{3,}|\-[1-9]+)/g
This means:
1. match a literal left paren: \(
2. match zero or more whitespace characters: \s*
3. match the value that is stored in $type: \Q$type\E
4. match zero or more whitespace characters: \s*
5. match a literal right paren: \)
6. match zero or more whitespace characters: \s*
7. START capturing group: (
8. match 0x followed by 3 or more hexadecimal digits
OR
match a literal dash, followed by 1 or more digits from 1 to 9: 0x[0-9ABCDEF]{3,}|\-[1-9]+
9. END capturing group: )
If you notice above, your capturing group doesn't start until step #7, when you would also like to capture $type and the literal parens.
Extend your capturing group to enclose those areas:
/(\(\s*\Q$type\E\s*\)\s*(?:0x[0-9ABCDEF]{3,}|\-[1-9]+))/;
This means:
1. START a capturing group: (
2. match a literal left paren: \(
3. match zero or more whitespace characters: \s*
4. match the value that is stored in $type: \Q$type\E
5. match zero or more whitespace characters: \s*
6. match a literal right paren: \)
7. match zero or more whitespace characters: \s*
8. START non-capturing group: (?:
9. match 0x followed by 3 or more hexadecimal digits
OR
match a literal dash, followed by 1 or more digits from 1 to 9: 0x[0-9ABCDEF]{3,}|\-[1-9]+
10. END non-capturing group: )
11. END capturing group: )
(Note: I removed the g (global) modifier because it is unnecessary)
This change gives me a result of (uint8)-1
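Putting the extended capture group back into the loop from the question might look like the sketch below. The function name find_casts and the second cast in the sample string are assumptions for the illustration. Unlike the answer above, this sketch keeps the /g modifier, on the assumption that a string may contain more than one cast of the same type.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every "(type)value" occurrence for the given
# type names, including the parentheses and the type itself.
sub find_casts {
    my ($str, @types) = @_;
    my @found;
    for my $type (@types) {
        # The outer group captures the whole cast; (?:...) only groups the alternation.
        push @found, $str =~ /(\(\s*\Q$type\E\s*\)\s*(?:0x[0-9A-F]{3,}|-[1-9]+))/g;
    }
    return @found;
}

my @hits = find_casts("any text (uint8)-1 and (UCHAR) 0x1FF", "uint8", "vuint8", "UCHAR");
print "$_\n" for @hits;
```

In list context with /g, the match returns the group 1 capture of every match, so no temporary array is needed.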