Regex to get text from within parenthesis including delimited parenthsis - regex

I am using powershell, if that matters.
Let's say I have
$s = "One two (three) four \(five\) six (\(seven\)) eight"
I want a regex that will return
three
(seven)
I need all matches, and I know how PowerShell stores the matches in $matches, similar to perl's $1 $2 $3 (but that's the easy part).

Use the below regex and get the string you want from group index 1.
(?<!\\)\(((?:\\[()]|[^()])*)\)
Negative lookbehind (?<!\\) which asserts that the match wouldn't be preceded by \ symbol.
DEMO

(?<!\\)\(([^)]+\))
Try this.See demo.
http://regex101.com/r/kT6vO6/3

Related

How can I get the count of a capture group then replace the characters with a specific character?

How can I get the count of a capture group and replace it with the same number of characters that I specify?
For example here is a string...
123456789ABCD00001DDD
My regex with capture groups is as follows...
^([123456789]{9})([ABCD]{1,4})([0]{1,5})([0-9]{1,5})([D]*)$
When I use something like Notepad++ I want to find the above and replace it with something like...
\1\2 \4\5
Making the end results look like...
123456789ABCD 1DDD
Example located at https://regex101.com/r/fykEnn/1
You may use this regex with \G and a positive lookahead:
(?:^([1-9]{9}[ABCD]{1,4})(?=0{1,5}\d{1,5}D*$)|\G)0
RegEx Demo
\G asserts position at the end of the previous match or the start of the string for the first match.
In bash, using sed:
$ echo 123456789ABCD00001DDD | sed -re 's/([123456789]{9})([ABCD]{1,4})([0]{1,5})([0-9]{1,5})([D]*)$/\1\2 \4\5 /g'
123456789ABCD 1DDD
I think you should to use both lookbehind & lookahead assertion with a single digit ...
example ...
Find what: (?<=[A-Z])\d{4}(?=\d[A-Z])
Replace with: a space

What is the regex to match exactly an alphanumeric 16 character string?

Here is a regex string I need to use but I only want it to match exactly 16 alphanumeric characters not the 16 within a longer string.
[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}
Its matches this: PLDTLL47S04L424T and MRTMTT25D09F205Z perfectly But what i dont want it to match is something like this in bold thats in middle of this long string:
FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
Thanks in advance!
You didn't say which regex flavor you're using, but the issue is that you're missing start and end anchors.
Add ^ and $ to your regex as such:
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
^ means match at the start of a string, or the point after any newline in multiline mode.
$ means the opposite: the end of a string, or the point before the newline in multiline mode.
In addition to my predecessors:
assuming that you want to match if and only if the line starts with something that matches your pattern, both anchor ^ and word boundary \b will do.
Ending the pattern with anchor $ and/or \b is, however, - taken into account the assumption that a line starting with something that matches, NOT correct.
See some example code:
#!/usr/bin/perl -w
my #tests = qw/
AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
0AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
/;
foreach my $test (#tests){
if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5})/ ) {
print "$1 matches\n";
} else {
print "NO MATCH\n";
}
}
generates output:
marc:tmp marc$ perl test.pl
AAAAAA00A00AAAAA matches
NO MATCH
if you change the pattern to
if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}$)/ ) {
the result is:
marc:tmp marc$ perl test.pl
NO MATCH
NO MATCH
You can use Boundry Matchers to match the beginning and endings of lines, strings, words or other things. What is available depends on your flavour of regex. The start and end of string/input matchers are pretty universal.
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
Again depending on the flavour of regex you are using you can also POSIX character classes to match alpha numerics with \p{Alpha} and \p{Digit}. This will simplfy your regex a bit.
You should use ^ and $ to bound the regex
You can use word boundaries \b for this purpose:
\b[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}\b
^ ^
Edit: Word boundaries and not start ^ and end $ anchors because I am assuming you just want to avoid matches as a substring and your patterns are more like your sample string but with spaces
You may try this regex: ^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+){16}$

Regex to get all character to the right of first space?

I am trying to craft a regular expression that will match all characters after (but not including) the first space in a string.
Input text:
foo bar bacon
Desired match:
bar bacon
The closest thing I've found so far is:
\s(.*)
However, this matches the first space in addition to "bar bacon", which is undesirable. Any help is appreciated.
You can use a positive lookbehind:
(?<=\s).*
(demo)
Although it looks like you've already put a capturing group around .* in your current regex, so you could just try grabbing that.
I'd prefer to use [[:blank:]] for it as it doesn't match newlines just in case we're targetting mutli's. And it's also compatible to those not supporting \s.
(?<=[[:blank:]]).*
You don't need look behind.
my $str = 'now is the time';
# Non-greedily match up to the first space, and then get everything after in a group.
$str =~ /^.*? +(.+)/;
my $right_of_space = $1; # Keep what is in the group in parens
print "[$right_of_space]\n";
You can also try this
(?s)(?<=\S*\s+).*
or
(?s)\S*\s+(.*)//group 1 has your match
With (?s) . would also match newlines

Replace specific capture group instead of entire regex in Perl

I've got a regular expression with capture groups that matches what I want in a broader context. I then take capture group $1 and use it for my needs. That's easy.
But how to use capture groups with s/// when I just want to replace the content of $1, not the entire regex, with my replacement?
For instance, if I do:
$str =~ s/prefix (something) suffix/42/
prefix and suffix are removed. Instead, I would like something to be replaced by 42, while keeping prefix and suffix intact.
As I understand, you can use look-ahead or look-behind that don't consume characters. Or save data in groups and only remove what you are looking for. Examples:
With look-ahead:
s/your_text(?=ahead_text)//;
Grouping data:
s/(your_text)(ahead_text)/$2/;
If you only need to replace one capture then using #LAST_MATCH_START and #LAST_MATCH_END (with use English; see perldoc perlvar) together with substr might be a viable choice:
use English qw(-no_match_vars);
$your_string =~ m/aaa (bbb) ccc/;
substr $your_string, $LAST_MATCH_START[1], $LAST_MATCH_END[1] - $LAST_MATCH_START[1], "new content";
# replaces "bbb" with "new content"
This is an old question but I found the below easier for replacing lines that start with >something to >something_else. Good for changing the headers for fasta sequences
while ($filelines=~ />(.*)\s/g){
unless ($1 =~ /else/i){
$filelines =~ s/($1)/$1\_else/;
}
}
I use something like this:
s/(?<=prefix)(group)(?=suffix)/$1 =~ s|text|rep|gr/e;
Example:
In the following text I want to normalize the whitespace but only after ::=:
some text := a b c d e ;
Which can be achieved with:
s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e
Results with:
some text := a b c d e ;
Explanation:
(?<=::=): Look-behind assertion to match ::=
(.*): Everything after ::=
$1 =~ s|\s+| |gr: With the captured group normalize whitespace. Note the r modifier which makes sure not to attempt to modify $1 which is read-only. Use a different sub delimiter (|) to not terminate the replacement expression.
/e: Treat the replacement text as a perl expression.
Use lookaround assertions. Quoting the documentation:
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
If the beginning of the string has a fixed length, you can thus do:
s/(?<=prefix)(your capture)(?=suffix)/$1/
However, ?<= does not work for variable length patterns (starting from Perl 5.30, it accepts variable length patterns whose length is smaller than 255 characters, which enables the use of |, but still prevents the use of *). The work-around is to use \K instead of (?<=):
s/.*prefix\K(your capture)(?=suffix)/$1/

Insertion with Regex to format a date (Perl)

Suppose I have a string 04032010.
I want it to be 04/03/2010. How would I insert the slashes with a regex?
To do this with a regex, try the following:
my $var = "04032010";
$var =~ s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
print $var;
The \d means match single digit. And {n} means the preceding matched character n times. Combined you get \d{2} to match two digits or \d{4} to match four digits. By surrounding each set in parenthesis the match will be stored in a variable, $1, $2, $3 ... etc.
Some of the prior answers used a . to match, this is not a good thing because it'll match any character. The one we've built here is much more strict in what it'll accept.
You'll notice I used extra spacing in the regex, I used the x modifier to tell the engine to ignore whitespace in my regex. It can be quite helpful to make the regex a bit more readable.
Compare s{(\d{2})(\d{2})(\d{4})}{$1/$2/$3}x; vs s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
Well, a regular expression just matches, but you can try something like this:
s/(..)(..)(..)/$1/$2/$3/
#!/usr/bin/perl
$var = "04032010";
$var =~ s/(..)(..)(....)/$1\/$2\/$3/;
print $var, "\n";
Works for me:
$ perl perltest
04/03/2010
I always prefer to use a different delimiter if / is involved so I would go for
s| (\d\d) (\d\d) |$1/$2/|x ;